This article contains different configurations and procedures to enable logging for various daemons on an AWS EMR cluster.
[Please contribute to this article to add additional ways to enable logging]
HBASE on S3:
```json
{
  "Classification": "hbase-log4j",
  "Properties": {
    "log4j.logger.com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.request": "DEBUG",
    "log4j.logger.com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.latency": "ERROR"
  }
}
```
This enables logging of the calls EMRFS makes from HBase. It is important for troubleshooting S3 consistency issues and failures on an HBase-on-S3 cluster.
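At cluster creation the same classification can be passed through the EMR Configuration API. A minimal sketch in Python: the cluster parameters and the actual `run_job_flow` call are elided, and only the `configurations` list comes from this article.

```python
import json

# The hbase-log4j classification from above, in the shape the EMR
# Configuration API expects (e.g. boto3 emr run_job_flow's
# `Configurations=` argument, or the CLI's --configurations file).
configurations = [
    {
        "Classification": "hbase-log4j",
        "Properties": {
            "log4j.logger.com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.request": "DEBUG",
            "log4j.logger.com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.latency": "ERROR",
        },
    }
]

# Written to a file, the same JSON works with:
#   aws emr create-cluster ... --configurations file://hbase-log4j.json
print(json.dumps(configurations, indent=2))
```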
Enabling DEBUG on the Hive Metastore daemon (its Datastore) on EMR:
```
vim /etc/hive/conf/hive-log4j2.properties

# set:
status = DEBUG
name = HiveLog4j2

logger.DataNucleus.name = DataNucleus
logger.DataNucleus.level = DEBUG

logger.Datastore.name = Datastore
logger.Datastore.level = DEBUG

# then restart the metastore:
sudo stop hive-hcatalog-server
sudo start hive-hcatalog-server
```
or
```json
{
  "Classification": "hive-log4j2",
  "Properties": {
    "logger.Datastore.level": "DEBUG",
    "logger.DataNucleus.level": "INFO"
  }
}
```
Logs are written to /var/log/hive/user/hive/hive.log.
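The edit above can also be scripted. Below is a small sketch that appends the logger lines only when they are missing, so it is safe to re-run; the path used here is a hypothetical scratch copy. On the master node, point it at /etc/hive/conf/hive-log4j2.properties and restart hive-hcatalog-server afterwards.

```python
# Idempotently append the DataNucleus/Datastore DEBUG loggers.
# PATH is a scratch copy for illustration; on an EMR master node use
# /etc/hive/conf/hive-log4j2.properties instead.
PATH = "/tmp/hive-log4j2.properties"

LOGGER_LINES = [
    "logger.DataNucleus.name = DataNucleus",
    "logger.DataNucleus.level = DEBUG",
    "logger.Datastore.name = Datastore",
    "logger.Datastore.level = DEBUG",
]

with open(PATH, "a+") as f:
    f.seek(0)                      # "a+" opens at EOF; rewind to read
    existing = f.read()
    missing = [l for l in LOGGER_LINES if l not in existing]
    if missing:                    # only write lines not already present
        f.write("\n".join(missing) + "\n")
```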
HUE:
Set the following in the [beeswax] section of the hue.ini configuration file:

```ini
[beeswax]
use_get_log_api=true
```
Hadoop and MapReduce:
```json
[
  {
    "Classification": "hadoop-log4j",
    "Properties": {
      "log4j.logger.com.amazonaws": "DEBUG",
      "log4j.logger.com.amazonaws.http.AmazonHttpClient": "DEBUG",
      "log4j.logger.org.apache.hadoop.fs.s3a.S3AFileSystem": "DEBUG",
      "log4j.logger.emr": "DEBUG",
      "hadoop.root.logger": "DEBUG,console"
    }
  },
  {
    "Classification": "mapred-env",
    "Configurations": [
      {
        "Classification": "export",
        "Configurations": [],
        "Properties": {
          "HADOOP_MAPRED_ROOT_LOGGER": "DEBUG,DRFA"
        }
      }
    ],
    "Properties": {}
  }
]
```
Enable GC verbose on the Hive Server 2 JVM:
```bash
#!/bin/bash
# This script appends custom HADOOP_OPTS for hiveserver2 to hive-env.sh on
# EMR 4.x.x/5.x.x and restarts the HS2 process. These options enable GC
# verbose logging on HS2, which goes to /var/log/hive/hive-server2.out.
# On OOM, just before HS2 is killed (with kill -9), the JVM also writes a
# heap dump to /var/log/hive.

echo '' | sudo tee --append /usr/lib/hive/conf/hive-env.sh
echo 'if [ "$SERVICE" = "hiveserver2" ]; then
  export HADOOP_OPTS="$HADOOP_OPTS -server -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/hive"
fi' | sudo tee --append /usr/lib/hive/conf/hive-env.sh

echo "stopping HS2"
sudo stop hive-server2
sleep 5
echo "starting HS2"
sudo start hive-server2
```
WIRE or DEBUG logging on EMR to check calls to S3 and DynamoDB from the DynamoDB connector library:
Add the following to the log4j configuration files of Hadoop / Hive / Spark, etc., as appropriate:
/etc/hadoop/conf/log4j.properties
/etc/hadoop/conf/container-log4j.properties
/etc/hive/conf/hive-log4j2.properties
/etc/spark/conf/..
```properties
log4j.rootCategory=DEBUG, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ABSOLUTE} %5p %t %c{2} - %m%n
log4j.logger.org.apache.hadoop.hive=ALL
```
Debug on S3 calls from EMR Hive:
These metrics can be obtained from the hive.log when debug logging is enabled in aws-java-sdk. To enable this logging, add the following line to /etc/hive/conf/hive-log4j.properties. The Configuration API can be used as well.
```properties
log4j.logger.com.amazonaws=DEBUG
```
To count the number of HEAD calls made, execute:

```
$ cat /var/log/hive/tmp/hadoop/hive.log | grep "Sending Request: HEAD" | wc -l
4
```

To count the number of GET calls made, execute:

```
$ cat /var/log/hive/tmp/hadoop/hive.log | grep "Sending Request: GET" | wc -l
```
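Instead of running one grep per verb, all S3 request types can be tallied in a single pass. Here is a sketch over a hypothetical log excerpt — the "Sending Request:" line format is assumed from the aws-java-sdk request logger enabled above; on a cluster, read /var/log/hive/tmp/hadoop/hive.log instead of the sample string.

```python
import re
from collections import Counter

# Hypothetical hive.log excerpt; the "Sending Request: <VERB>" lines are
# what the aws-java-sdk request logger emits at DEBUG.
sample = """\
DEBUG request: Sending Request: HEAD https://mybucket.s3.amazonaws.com /warehouse/t/
DEBUG request: Sending Request: GET https://mybucket.s3.amazonaws.com /warehouse/t/part-0
DEBUG request: Sending Request: HEAD https://mybucket.s3.amazonaws.com /warehouse/t/part-0
"""

# On a cluster: sample = open("/var/log/hive/tmp/hadoop/hive.log").read()
counts = Counter(re.findall(r"Sending Request: ([A-Z]+)", sample))
print(counts)  # e.g. Counter({'HEAD': 2, 'GET': 1})
```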
Enable DEBUG logging for the HTTP connection pool (from Spark) by adding the following to /etc/spark/conf/log4j.properties:
```properties
log4j.logger.com.amazon.ws.emr.hadoop.fs.shaded.org.apache.http.impl.conn.PoolingHttpClientConnectionManager=DEBUG
```
*Tez overwrites the log-level options we have passed; please see the related items.*
```json
{
  "Classification": "tez-site",
  "Properties": {
    "tez.task.log.level": "DEBUG",
    "tez.am.log.level": "DEBUG",
    "tez.root.logger": "DEBUG,CLA",
    "tez.task-specific.launch.cmd-opts": "-Dlog4j.configuration=log4j.properties"
  }
}
```
Enabling DEBUG on the Hadoop log to log calls made by EMRFS:
/etc/hadoop/conf/log4j.properties
```properties
hadoop.root.logger=DEBUG,console

# Jets3t library
log4j.logger.org.jets3t.service.impl.rest.httpclient.RestS3Service=DEBUG

# AWS SDK & S3A FileSystem
log4j.logger.com.amazonaws.http.AmazonHttpClient=DEBUG
log4j.logger.org.apache.hadoop.fs.s3a.S3AFileSystem=WARN
log4j.logger.com.amazonaws=DEBUG
log4j.logger.com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.request=DEBUG

# Logs the AWS request id and S3 "extended request id" from each response
log4j.logger.com.amazon.ws.emr.hadoop.fs.s3.lite.handler.RequestIdLogger=DEBUG
```
You can use the same logging configuration for other applications like Spark or HBase via their respective log4j config files. You can also use EMR log4j configuration classifications like hadoop-log4j or spark-log4j to set these configs when starting the EMR cluster (see below for a sample JSON for the Configuration API):
https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html
DEBUG on EMR Logpusher logs:
Edit this file manually on the master and slave nodes, then restart Logpusher:
/etc/logpusher/logpusher-log4j.properties
```properties
log4j.rootLogger=DEBUG,DRFA
log4j.threshhold=INFO
log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFA.File=/emr/logpusher/log/logpusher.log
log4j.appender.DRFA.DatePattern=.yyyy-MM-dd-HH
log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %t: %m%n
log4j.logger.org.apache.commons.httpclient.contrib.ssl.AuthSSLX509TrustManager=INFO
log4j.logger.aws157.instancecontroller=DEBUG
```
```
[ec2-user@ip-10-1-2-175 ~]$ sudo service logpusher stop
Stopped process in pidfile `/emr/logpusher/run/logpusher.pid' (pid 13677).
[ec2-user@ip-10-1-2-175 ~]$ sudo service logpusher status
Running [ OK ]
```
(You might need to stop service-nanny before stopping Logpusher for the stop/start to take effect properly; otherwise service-nanny restarts Logpusher on its own, as the `status` output above shows.)
DEBUG on Spark classes:
Use the following EMR config to set DEBUG level for relevant class files.
```json
[
  {
    "Classification": "hadoop-log4j",
    "Properties": {
      "log4j.logger.org.apache.spark.network.crypto": "DEBUG"
    }
  },
  {
    "Classification": "spark-log4j",
    "Properties": {
      "log4j.logger.org.apache.spark.network.crypto": "DEBUG"
    }
  }
]
```
DEBUG using spark-shell:
Execute the following commands after invoking spark-shell to enable DEBUG logging on the respective Spark classes, like MemoryStore. You can use the same approach to reduce the amount of logging from INFO (the default coming from log4j.properties in the Spark conf directory) to ERROR.
```scala
import org.apache.log4j.{Level, Logger}
Logger.getLogger("org.apache.spark.storage.BlockManagerMasterEndpoint").setLevel(Level.DEBUG)
Logger.getLogger("org.apache.spark.storage.BlockManagerInfo").setLevel(Level.DEBUG)
Logger.getLogger("org.apache.spark.storage.BlockManagerMaster").setLevel(Level.DEBUG)
Logger.getLogger("org.apache.spark.storage.MemoryStore").setLevel(Level.DEBUG)
Logger.getLogger("org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend").setLevel(Level.DEBUG)
Logger.getLogger("org.apache.spark.scheduler.cluster.YarnScheduler").setLevel(Level.DEBUG)
Logger.getLogger("org.apache.spark.scheduler.cluster.YarnSchedulerBackend").setLevel(Level.DEBUG)
Logger.getLogger("org.apache.spark.scheduler.DAGScheduler").setLevel(Level.DEBUG)
Logger.getLogger("org.apache.spark.scheduler.TaskSetManager").setLevel(Level.DEBUG) // good for progress tracking
Logger.getLogger("org.apache.spark.ContextCleaner").setLevel(Level.DEBUG)
Logger.getLogger("akka.remote.ReliableDeliverySupervisor").setLevel(Level.DEBUG)
```
EMRFS CLI commands like emrfs sync:
/etc/hadoop/conf/log4j.properties
```properties
hadoop.root.logger=DEBUG,console
```
Logs will appear on the console output; you may need to redirect them to a file, or do both.
Enable DEBUG on a Boto3 client:
```python
import boto3
import json
import logging

# Root-level DEBUG: also surfaces botocore's request/response detail.
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger()

# Create the boto3 client
client = boto3.client('sagemaker-runtime')

endpoint_name = "my-model"                    # Your endpoint name.
payload = {"data": "this makes me ", "k": 5}  # Payload for inference.

response = client.invoke_endpoint(EndpointName=endpoint_name,
                                  ContentType='application/json',
                                  Body=json.dumps(payload))
response_body = response['Body']
print(response_body.read())
```