Hadoop Study Notes (5)

uhippo

Category: Cloud Computing. Tags: hadoop descriptor network access java thread

Article link: https://blog.csdn.net/uhippo/article/details/6133190


HDFS Details for Multimachine Clusters (Part 2)

Checking the NameNodes
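    ${JAVA_HOME}/bin/jps lists the running Java processes, one per line, with the pid of each process in the first column. On the master, the list should include a NameNode entry.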
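The jps check above can be scripted. The following is a minimal Python sketch, assuming each line of jps output has the form "<pid> <name>". The function name `find_java_processes` and the sample output are hypothetical and only illustrate the parsing.

```python
def find_java_processes(jps_output, name):
    """Return the pids of all processes whose jps name equals `name`."""
    pids = []
    for line in jps_output.splitlines():
        fields = line.split()
        # Expect "<pid> <name>"; skip blank or malformed lines.
        if len(fields) >= 2 and fields[0].isdigit() and fields[1] == name:
            pids.append(int(fields[0]))
    return pids


# Hypothetical jps output from a master node (pids are made up).
sample = """\
4821 NameNode
4990 SecondaryNameNode
5102 Jps
"""
print(find_java_processes(sample, "NameNode"))
```

The match is exact, so SecondaryNameNode is not mistaken for NameNode. An empty result means no NameNode process was found.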
Checking the DataNodes
    bin/slaves.sh jps | grep DataNode | sort
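    If any slave turns out to have failed during this check, you have to go to that machine and read its log files. Could this put too much of a burden on the administrator?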
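One way to reduce that burden is to have a script report which slaves need a visit. The sketch below is in Python and assumes that bin/slaves.sh prefixes every output line with "<host>: ", and that the host list comes from conf/slaves. The function name and the sample data are hypothetical.

```python
def hosts_missing_datanode(slave_hosts, slaves_jps_output):
    """Return the hosts from `slave_hosts` that show no DataNode process."""
    running = set()
    for line in slaves_jps_output.splitlines():
        # Assumed line format: "<host>: <pid> <name>"
        host, separator, rest = line.partition(": ")
        fields = rest.split()
        if separator and len(fields) == 2 and fields[1] == "DataNode":
            running.add(host)
    return [host for host in slave_hosts if host not in running]


# Hypothetical host names and output of "bin/slaves.sh jps".
slaves = ["slave1", "slave2", "slave3"]
output = """\
slave1: 2301 DataNode
slave1: 2450 Jps
slave3: 1988 DataNode
slave2: 3120 Jps
"""
print(hosts_missing_datanode(slaves, output))
```

Only the hosts in the returned list need their logs inspected by hand.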
    In fact, I had half of a new cluster fail to start, and it took some time to realize that the newly installed machines had a default firewall that blocked the HDFS port.
    bin/hadoop dfsadmin -report shows some information about the DataNodes that are currently online.
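The firewall anecdote above suggests a quick test to run from a slave before digging through logs: can it open a TCP connection to the HDFS port at all? The following is a minimal Python sketch. The helper name `port_is_reachable` is hypothetical, and the real host and port must be taken from the cluster's own fs.default.name setting.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # The with-block closes the probe connection right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (typical when a firewall
        # silently drops packets) and name-resolution failures.
        return False


# Usage with a placeholder host and port (both are assumptions):
# port_is_reachable("namenode.example.com", 8020)
```

A False result does not prove that a firewall is responsible, because a stopped NameNode gives the same answer. It does show that the problem lies before HDFS itself.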
Tuning Factors
    The most important factors are network bandwidth and disk throughput. Memory use and the CPU overhead of thread handling may also be issues.
    A large input-split size reduces the ratio of task setup time to task run time.
    Limit the maximum number of requests in progress. The more requests in progress, the more contention there is for storage operations and network bandwidth, with a corresponding increase in memory requirements and in the CPU overhead of handling all of the outstanding requests.
    The limiting factors differ from cluster to cluster.
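The point about input-split size can be made concrete with a little arithmetic. The Python sketch below assumes a simplified model in which a task's run time is its split size divided by a constant read throughput. The setup time, the throughput and the function name are hypothetical figures chosen only for illustration.

```python
def setup_to_run_ratio(split_mb, read_mb_per_s, setup_s):
    """Return task setup time divided by task run time.

    Simplified model: run time = split size / read throughput.
    """
    run_s = split_mb / read_mb_per_s
    return setup_s / run_s


# Hypothetical figures: 2 s of setup per task, 64 MB/s read throughput.
for split_mb in (64, 128, 256):
    ratio = setup_to_run_ratio(split_mb, read_mb_per_s=64.0, setup_s=2.0)
    print(split_mb, ratio)
```

Under this model the ratio halves each time the split size doubles, so larger splits spend proportionally less time on setup.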
File Descriptors (http://en.wikipedia.org/wiki/File_descriptor)
    Any user that runs processes that access HDFS should have a large limit on file descriptor access, and all applications that open files need careful checking to make sure that the files are explicitly closed.
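Both halves of this advice can be checked from code. The sketch below uses Python's standard resource module (Unix only) to read the current process's descriptor limit, and a with-block to guarantee that a file is closed. It works on a local file as a stand-in for HDFS access, and the function names are hypothetical.

```python
import resource


def descriptor_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def limit_is_at_least(required):
    """Return True if the soft descriptor limit allows `required` open files."""
    soft, _hard = descriptor_limits()
    return soft == resource.RLIM_INFINITY or soft >= required


def count_lines(path):
    """Count the lines in a file, closing it even if reading fails."""
    # The with-block closes the descriptor on normal exit and on exceptions.
    with open(path) as handle:
        return sum(1 for _ in handle)


print(descriptor_limits())
```

The limits are per user and per process, so the check is only meaningful when run as the same user that runs the HDFS-accessing processes.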