Understanding Counter in MapReduce along with code

In this blog, we will learn about counters in MapReduce. We will cover the types of counters and how to implement them in the Java programming language.

What is a counter in MapReduce?

A counter in MapReduce keeps track of how often an event occurs while a MapReduce job runs, which makes counters useful for monitoring the status of the job. Counters are organized into groups, and each group can contain multiple counters. For user-defined counters, a Java enum defines the group.

Types of Counters in MapReduce

There are two types of counters available in MapReduce.

● Built-in counters

● User-defined counters

Built-in counters

Apache Hadoop maintains several built-in counters for every MapReduce job. They provide metrics that help verify whether the job executed successfully and whether the result is as expected. Each built-in counter is defined as an enum constant, and the counters are divided into several groups. Let’s take each group one by one to understand its purpose.

Below is the list of built-in counter groups.

FileInputFormat Counter

MapReduce jobs often read their input records through FileInputFormat. The counters in this group help identify how many bytes the map tasks of a job have read using FileInputFormat.

FileOutputFormat Counter

MapReduce jobs often write their output records through FileOutputFormat. The counters in this group help identify how many bytes the map tasks or reduce tasks of a job have written using FileOutputFormat.

Job Counter

Job counters store job-level statistics, such as the number of map and reduce tasks that were launched. They do not measure values that change while an individual task is running. Job counters are maintained on the master machine at the job level, so they do not need to be sent across the network.

File System Counter

The counters in this group report the number of bytes read by the file system as well as the number of bytes written by the file system. Below are the counters in this group:

FileSystem bytes written – The number of bytes written by the file system.

FileSystem bytes read – The number of bytes read by the file system.

MapReduce Task Counter

Whereas job counters hold job-level statistics, task counters collect information about tasks while they are executing, and the values are aggregated over all the tasks of a job. Some examples of such counters are the number of records read and the number of records written while running a job. We can also see the records read or written by a particular mapper or reducer.

User-defined counters

So far, we have discussed the built-in counters. But what if we want a statistic that the existing Hadoop MapReduce counters do not provide? For this case, Hadoop lets us define our own counters based on custom requirements. These are called user-defined counters, or custom counters. In Java, we define them with an enum type.

In a Hadoop job, we can define any number of enums as per our requirement. Each enum is a counter group, and each constant of the enum is a counter in that group. This is a compile-time approach, so we can’t define or change these counters at runtime. We need to specify them before the job runs.
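The enum-to-counter mapping described above can be sketched in plain Java. This sketch does not use Hadoop. It only shows how a group name and a counter name can be derived from an enum constant, mirroring the convention that the enum's class name identifies the group and the constant's name identifies the counter. The class `EnumCounterNames` and its two helper methods are hypothetical names used for illustration.

```java
// Hypothetical illustration class; not part of Hadoop.
final class EnumCounterNames {

    // Each enum is one counter group; each constant is one counter in it.
    enum Counter { ODD_COUNT, EVEN_COUNT }

    // Group name derived from the enum type that declares the constant.
    static String groupName(Enum<?> counter) {
        return counter.getDeclaringClass().getName();
    }

    // Counter name derived from the enum constant itself.
    static String counterName(Enum<?> counter) {
        return counter.name();
    }

    public static void main(String[] args) {
        // Print every counter together with the group it belongs to.
        for (Counter c : Counter.values()) {
            System.out.println(groupName(c) + " -> " + counterName(c));
        }
    }
}
```

Because both names come from the enum definition, adding or renaming a counter means editing the enum and recompiling, which is why this approach is fixed at compile time.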

Dynamic Counters in Hadoop

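Enum-based user-defined counters are fixed at compile time, which means we cannot add or change them at runtime. What if we want to add new counters while the job is running? For this, Hadoop offers dynamic counters. Instead of an enum constant, a dynamic counter is identified by two strings, a group name and a counter name, passed to context.getCounter(groupName, counterName). Because the names are ordinary strings, they can be built from the data at runtime rather than being declared up front.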
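The idea behind dynamic counters can be sketched without Hadoop as a registry keyed by two strings. The class `DynamicCounters` below is a hypothetical in-memory stand-in, not Hadoop code, and the group name, counter names, and input values are assumptions made up for the example. In a real mapper, the equivalent step would be a call such as `context.getCounter(groupName, counterName).increment(1)`.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a counter registry; not Hadoop code.
final class DynamicCounters {

    // group name -> (counter name -> value)
    private final Map<String, Map<String, Long>> groups = new TreeMap<>();

    // Creates the group and the counter on first use, then adds the amount.
    void increment(String group, String counter, long amount) {
        groups.computeIfAbsent(group, g -> new TreeMap<>())
              .merge(counter, amount, Long::sum);
    }

    // Returns the current value, or 0 if the counter was never incremented.
    long value(String group, String counter) {
        Map<String, Long> g = groups.get(group);
        return g == null ? 0L : g.getOrDefault(counter, 0L);
    }

    public static void main(String[] args) {
        DynamicCounters counters = new DynamicCounters();
        // Assumed input lines; the counter name depends on each value's length,
        // so the set of counters is only known once the data has been seen.
        for (String line : new String[] {"7", "12", "30", "451"}) {
            counters.increment("NumberLength", "DIGITS_" + line.length(), 1);
        }
        System.out.println("DIGITS_2 = " + counters.value("NumberLength", "DIGITS_2"));
    }
}
```

The trade-off is that string names are not checked by the compiler, so a typo silently creates a new counter instead of failing the build.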

Implementation of Counter in MapReduce

Now, let’s implement a sample program that creates two counters, ODD_COUNT and EVEN_COUNT. Suppose we have an input file with a single number on each line; how do we count the even and odd numbers? Let’s see.

Take a sample input file that contains six numbers, one per line, three of them odd and three even.

So, the output should be

ODD_COUNT     3

EVEN_COUNT   3
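Before moving to the Hadoop program, the counting logic itself can be checked locally with a small plain-Java sketch. It is not Hadoop code, and the six input values are an assumption chosen only so that three are odd and three are even. The class name `OddEvenCountSketch` is hypothetical.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical local simulation of the counting logic; not Hadoop code.
final class OddEvenCountSketch {

    enum Counter { ODD_COUNT, EVEN_COUNT }

    // Classifies each line as odd or even and tallies the result.
    static Map<Counter, Long> count(List<String> lines) {
        Map<Counter, Long> counters = new EnumMap<>(Counter.class);
        for (Counter c : Counter.values()) {
            counters.put(c, 0L); // start every counter at zero
        }
        for (String line : lines) {
            int number = Integer.parseInt(line.trim());
            Counter c = (number % 2 == 0) ? Counter.EVEN_COUNT : Counter.ODD_COUNT;
            counters.merge(c, 1L, Long::sum);
        }
        return counters;
    }

    public static void main(String[] args) {
        // Assumed sample input: six numbers, one per line.
        List<String> lines = Arrays.asList("1", "2", "3", "4", "5", "6");
        count(lines).forEach((name, value) -> System.out.println(name + "\t" + value));
    }
}
```

The mapper in the Hadoop program applies the same odd/even test, but increments job counters through the task context instead of a local map.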

Now, let’s look at the program to see how we have implemented the counters:

Mapper Class

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

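// Mapper: reads one number per line and counts odd and even values
public class MapperClass extends Mapper<LongWritable, Text, Text, IntWritable> {
  // Reused output objects, so no new objects are allocated per record
  private final Text parity = new Text();
  private final IntWritable parsed = new IntWritable();

  @Override
  public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
     // Each input line holds a single integer; trim stray whitespace first
     int number = Integer.parseInt(value.toString().trim());

     if (number % 2 == 0) {
        // incrementing the counter for even numbers
        context.getCounter(Driver.Counter.EVEN_COUNT).increment(1);
        parity.set("EVEN");
     } else {
        // incrementing the counter for odd numbers
        context.getCounter(Driver.Counter.ODD_COUNT).increment(1);
        parity.set("ODD");
     }
     // Emit the parity label together with the number as the map output
     parsed.set(number);
     context.write(parity, parsed);
  }
}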

Driver Class

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

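public class Driver extends Configured implements Tool {

    // The enum is the counter group; each constant is one counter
    enum Counter {
        ODD_COUNT,
        EVEN_COUNT
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner parses generic Hadoop options before calling run()
        int exitFlag = ToolRunner.run(new Driver(), args);
        System.exit(exitFlag);
    }

    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: Driver <input path> <output path>");
            return 2;
        }
        // Use the configuration prepared by ToolRunner instead of a fresh one
        Configuration conf = getConf();
        Job job = Job.getInstance(conf, "ODD-EVEN Counter");
        job.setJarByClass(getClass());
        job.setMapperClass(MapperClass.class);
        // Map-only job: the counters are incremented in the mapper
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Blocks until the job finishes; true prints progress and counters
        return job.waitForCompletion(true) ? 0 : 1;
    }

}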

Here, we have two classes.

One is the Driver class, where we have created an enum to implement the counters. As discussed before, the enum named Counter is the counter group, and ODD_COUNT and EVEN_COUNT are the two counters we use to count the odd numbers and the even numbers.

The Driver class also sets the job configuration: which class is the mapper, that there is no reducer, the output key and output value class types, the input path from which we read the input, and the output path where we store the output.

The other class is the mapper class, where we actually increment the counters for even numbers and odd numbers.

context.getCounter(Driver.Counter.EVEN_COUNT).increment(1);

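This line increments the counter for even numbers, and a matching line increments the counter for odd numbers.

Because the driver calls job.waitForCompletion(true), Hadoop prints all counters, including ours, to the console when the job finishes. Here are the counter values we get when we run the job on the input file described above.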

Output

Conclusion

After reading this post, you should understand what a counter is and which types of counters are available in MapReduce. We also saw how to implement counters in Java.

If you like this blog post, you can like it and comment. If you have any doubts regarding the implementation, feel free to comment, and share the blog with your friends and colleagues.

You can connect with me on social media profiles like LinkedIn, Twitter, and Instagram.

LinkedIn – https://www.linkedin.com/in/abhishek-kumar-singh-8a6326148

Twitter- https://twitter.com/Abhi007si

Instagram- www.instagram.com/dataspoof
