In this post I would like to explain the meaning of the Hadoop counters (the ones you usually see after a job completes). I have been analyzing the starvation of long-running jobs in our relatively small cluster, and Hadoop counters were extremely important in this investigation. Unfortunately, I could not find any resource that explains them in detail. In the table below, I try to describe clearly what each counter means in the Hadoop 2.6 release.
Counter Name | Counter Display Name | Detailed explanation |
---|---|---|
File System Counters | ||
FILE_BYTES_READ | FILE: Number of bytes read | Amount of data read from local filesystem. |
FILE_BYTES_WRITTEN | FILE: Number of bytes written | Amount of data written to local filesystem. |
FILE_READ_OPS | FILE: Number of read operations | Number of read operations from local filesystem. |
FILE_LARGE_READ_OPS | FILE: Number of large read operations | Number of large read operations on the local filesystem, such as listing the files of a large directory. |
FILE_WRITE_OPS | FILE: Number of write operations | Number of write operations to the local filesystem. |
HDFS_BYTES_READ | HDFS: Number of bytes read | Amount of data read from HDFS. |
HDFS_BYTES_WRITTEN | HDFS: Number of bytes written | Amount of data written to HDFS. |
HDFS_READ_OPS | HDFS: Number of read operations | Number of read operations from HDFS. |
HDFS_LARGE_READ_OPS | HDFS: Number of large read operations | Number of large read operations on HDFS, such as listing the files of a large directory. |
HDFS_WRITE_OPS | HDFS: Number of write operations | Number of write operations to HDFS. |
Job Counters | ||
TOTAL_LAUNCHED_MAPS | Launched map tasks | Total number of launched map tasks. |
TOTAL_LAUNCHED_REDUCES | Launched reduce tasks | Total number of launched reduce tasks. |
DATA_LOCAL_MAPS | Data-local map tasks | Number of map tasks which were launched on the nodes containing required data. |
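SLOTS_MILLIS_MAPS | Total time spent by all maps in occupied slots (ms) | Total time map tasks were executing, multiplied by the number of slots each task occupied. Under YARN, a task's slot count is its memory request divided by the scheduler's minimum allocation (`yarn.scheduler.minimum-allocation-mb`), rounded up. |
SLOTS_MILLIS_REDUCES | Total time spent by all reduces in occupied slots (ms) | Same as SLOTS_MILLIS_MAPS, for reduce tasks. |
MILLIS_MAPS | Total time spent by all map tasks (ms) | Total wall-clock time during which map tasks occupied their resources, summed over all map task attempts. |
MILLIS_REDUCES | Total time spent by all reduce tasks (ms) | Total wall-clock time during which reduce tasks occupied their resources, summed over all reduce task attempts. |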
VCORES_MILLIS_MAPS | Total vcore-seconds taken by all map tasks | Aggregated number of vCores allocated to the mappers multiplied by the time the mappers have been running. Despite the display name, the value is in vcore-milliseconds. |
VCORES_MILLIS_REDUCES | Total vcore-seconds taken by all reduce tasks | Aggregated number of vCores allocated to the reducers multiplied by the time the reducers have been running. Despite the display name, the value is in vcore-milliseconds. |
MB_MILLIS_MAPS | Total megabyte-seconds taken by all map tasks | Aggregated amount of memory (in megabytes) allocated to the mappers multiplied by the time the mappers have been running. Despite the display name, the value is in megabyte-milliseconds. |
MB_MILLIS_REDUCES | Total megabyte-seconds taken by all reduce tasks | Aggregated amount of memory (in megabytes) allocated to the reducers multiplied by the time the reducers have been running. Despite the display name, the value is in megabyte-milliseconds. |
Map-Reduce Framework | ||
MAP_INPUT_RECORDS | Map input records | Total number of records processed by all of the mappers. Updated when a record is passed from the RecordReader to the mapper. |
MAP_OUTPUT_RECORDS | Map output records | Total number of records produced by all of the mappers. Updated when a record is passed to the OutputCollector. |
MAP_OUTPUT_BYTES | Map output bytes | Total amount of data produced by mappers (uncompressed). Updated when the record is passed to OutputCollector. |
MAP_OUTPUT_MATERIALIZED_BYTES | Map output materialized bytes | Amount of map output actually written to disk. If map output compression is enabled, this is the compressed size. |
SPLIT_RAW_BYTES | Input split bytes | Amount of data consumed by the metadata representation of the input splits (the serialized split descriptions, not the input data itself). |
COMBINE_INPUT_RECORDS | Combine input records | Total number of records processed by the combiners (if the application uses one). Updated each time a value is read from the combiner's iterator. |
COMBINE_OUTPUT_RECORDS | Combine output records | Total number of records produced by the combiners (if the application uses one). Updated when a record is passed to the OutputCollector. |
REDUCE_INPUT_GROUPS | Reduce input groups | Total number of unique keys (the number of distinct key groups processed by all reducers). |
REDUCE_SHUFFLE_BYTES | Reduce shuffle bytes | Amount of map output data copied to the reducers during the shuffle phase. |
REDUCE_INPUT_RECORDS | Reduce input records | Total number of records processed by all reducers. |
REDUCE_OUTPUT_RECORDS | Reduce output records | Total number of records produced by all reducers. |
SPILLED_RECORDS | Spilled Records | Total number of records (from both mappers and reducers) that were spilled to disk. Map output is always written to disk at least once; additional spills happen when there is not enough buffer memory. |
SHUFFLED_MAPS | Shuffled Maps | Total number of map outputs that went through the shuffle phase (copied to reducers). |
FAILED_SHUFFLE | Failed Shuffles | Total number of map output copies that failed during the shuffle phase. |
MERGED_MAP_OUTPUTS | Merged Map outputs | Total number of map output files merged on the reduce side after the shuffle phase. |
GC_TIME_MILLIS | GC time elapsed (ms) | Wall-clock time spent in garbage collection. |
CPU_MILLISECONDS | CPU time spent (ms) | Cumulative CPU time for all tasks. |
PHYSICAL_MEMORY_BYTES | Physical memory (bytes) snapshot | Snapshot of the total physical memory used by all tasks. |
VIRTUAL_MEMORY_BYTES | Virtual memory (bytes) snapshot | Total virtual memory used by all tasks. |
COMMITTED_HEAP_BYTES | Total committed heap usage (bytes) | Total amount of heap memory committed by the task JVMs (reserved by the JVMs and available for use). |
Shuffle Errors | ||
BAD_ID | BAD_ID | Total number of errors caused by IDs in shuffle headers that could not be interpreted (for example, the mapper ID). |
CONNECTION | CONNECTION | The source code does not reveal any usage of this counter. |
IO_ERROR | IO_ERROR | Total number of errors related to reading and writing intermediate data. |
WRONG_LENGTH | WRONG_LENGTH | Total number of errors related to invalid compressed or decompressed lengths of intermediate data. |
WRONG_MAP | WRONG_MAP | Total number of errors related to duplicated mapper output data (when the framework tries to process mapper output that was already processed). |
WRONG_REDUCE | WRONG_REDUCE | Total number of errors related to shuffling data for the wrong reducer (when the shuffle for one reducer receives data intended for a different reducer). |
File Input Format Counters | ||
BYTES_READ | Bytes Read | Amount of data read by all tasks, summed across all filesystems. |
File Output Format Counters | ||
BYTES_WRITTEN | Bytes Written | Amount of data written by all tasks, summed across all filesystems. |
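
Several of the counters above are easier to interpret when combined into ratios. The Python sketch below is one way to do that: it computes the fraction of data-local maps, materialized versus raw map output bytes (roughly the compression ratio when map output compression is enabled), the combiner's output-to-input ratio, and the time-weighted average memory per map task (MB_MILLIS_MAPS / MILLIS_MAPS). It assumes the counters were fetched as JSON from the MapReduce History Server REST API (`/ws/v1/history/mapreduce/jobs/{jobid}/counters`); the sample response is made up and trimmed to the fields the code reads, and the metric names are hypothetical labels, not Hadoop counters.

```python
import json

# Hypothetical response, shaped like the JSON returned by the MapReduce
# History Server endpoint /ws/v1/history/mapreduce/jobs/{jobid}/counters.
# Real responses also carry mapCounterValue/reduceCounterValue and many more
# counters; every number here is made up for illustration.
SAMPLE_RESPONSE = json.dumps({
    "jobCounters": {
        "id": "job_1400000000000_0001",
        "counterGroup": [
            {
                "counterGroupName": "org.apache.hadoop.mapreduce.JobCounter",
                "counter": [
                    {"name": "TOTAL_LAUNCHED_MAPS", "totalCounterValue": 40},
                    {"name": "DATA_LOCAL_MAPS", "totalCounterValue": 30},
                    {"name": "MILLIS_MAPS", "totalCounterValue": 400000},
                    {"name": "MB_MILLIS_MAPS", "totalCounterValue": 819200000},
                ],
            },
            {
                "counterGroupName": "org.apache.hadoop.mapreduce.TaskCounter",
                "counter": [
                    {"name": "MAP_OUTPUT_BYTES", "totalCounterValue": 80000000},
                    {"name": "MAP_OUTPUT_MATERIALIZED_BYTES", "totalCounterValue": 20000000},
                    {"name": "COMBINE_INPUT_RECORDS", "totalCounterValue": 1000000},
                    {"name": "COMBINE_OUTPUT_RECORDS", "totalCounterValue": 250000},
                ],
            },
        ],
    }
})


def flatten_counters(response_text):
    """Return {counter name: total value} from a /counters JSON response.

    Assumes counter names are unique across groups, which holds for the
    built-in counters listed in the table above.
    """
    groups = json.loads(response_text)["jobCounters"]["counterGroup"]
    return {c["name"]: c["totalCounterValue"] for g in groups for c in g["counter"]}


def ratio(numerator, denominator):
    """Divide, returning None instead of raising when the denominator is 0."""
    return numerator / denominator if denominator else None


def derived_metrics(counters):
    """Compute a few illustrative ratios; missing counters are treated as 0."""

    def c(name):
        return counters.get(name, 0)

    return {
        # Share of launched map tasks that ran on a node holding their input.
        "data_local_map_fraction": ratio(c("DATA_LOCAL_MAPS"), c("TOTAL_LAUNCHED_MAPS")),
        # Bytes written to disk vs. uncompressed map output; roughly the
        # compression ratio when map output compression is enabled.
        "materialized_to_raw_map_output": ratio(
            c("MAP_OUTPUT_MATERIALIZED_BYTES"), c("MAP_OUTPUT_BYTES")),
        # How much the combiner shrank the record count (None if no combiner ran).
        "combiner_output_to_input": ratio(
            c("COMBINE_OUTPUT_RECORDS"), c("COMBINE_INPUT_RECORDS")),
        # MB-milliseconds divided by milliseconds: time-weighted average
        # memory (MB) allocated per running map task.
        "avg_map_container_mb": ratio(c("MB_MILLIS_MAPS"), c("MILLIS_MAPS")),
    }


if __name__ == "__main__":
    metrics = derived_metrics(flatten_counters(SAMPLE_RESPONSE))
    for name, value in metrics.items():
        print(f"{name}: {value}")
```

With the made-up sample this prints 0.75 for the data-local fraction, 0.25 for both the materialized/raw bytes and combiner output/input ratios, and 2048.0 MB per map. Returning None on a zero denominator keeps jobs without a combiner (or without map output) from raising ZeroDivisionError.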
The sources of the information:
- https://www.mapr.com/blog/managing-monitoring-and-testing-mapreduce-jobs-how-work-counters
- http://liveramp.com/engineering/tracking-mapreduce-job-performance-with-counters/
- https://hadoop.apache.org/docs/current/api/org/apache/hadoop/yarn/api/records/ApplicationResourceUsageReport.html
- http://hadoop.apache.org/docs/r1.0.4/releasenotes.html
- Hadoop sources - Fetcher.java class