Hadoop Notes

Hadoop shell commands:
Grant permissions recursively: hdfs dfs -chmod -R 755 /
Change the owner recursively: hdfs dfs -chown -R larry /
Why combine/compress many small files before uploading to HDFS: the NameNode keeps the block locations of every file resident in memory, so a large number of small files hurts file lookup performance and inflates NameNode memory use.
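A rough illustration of the memory cost, assuming the commonly cited rule of thumb of roughly 150 bytes of NameNode heap per namespace object (one object per file plus one per block) — the figure and the file sizes below are illustrative, not exact:

```shell
# Rule-of-thumb NameNode heap cost: ~150 bytes per namespace object
# (one object per file + one per block). 150 is an estimate, not exact.
BYTES_PER_OBJECT=150

# 10 million 1 MB files -> 10M file objects + 10M block objects
SMALL=$(( 10000000 * 2 * BYTES_PER_OBJECT / 1024 / 1024 ))
echo "10M x 1MB files : ~${SMALL} MB of NameNode heap"

# Roughly the same data as 10 thousand 1 GB files
# -> 10k file objects + 8 block objects each (128 MB block size)
LARGE=$(( 10000 * (1 + 8) * BYTES_PER_OBJECT / 1024 / 1024 ))
echo "10k x 1GB files : ~${LARGE} MB of NameNode heap"
```

Under these assumptions the small-file layout costs on the order of gigabytes of NameNode heap versus megabytes for the large-file layout, before any lookup-performance effects.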

Different Hadoop users have different permissions and see different directory structures; the root OS user does not necessarily have full HDFS permissions.

Hadoop commands can also operate on local files (via a file:// URI, e.g. hdfs dfs -ls file:///tmp).

List which class files a jar contains:
jar tf spark-examples-1.6.0-hadoop2.6.0.jar
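Since a jar is just a zip archive, the class list can also be pulled out with the Python standard library when the JDK `jar` tool is not on the PATH. A sketch — the tiny in-memory jar below is a stand-in for the real Spark jar so it runs anywhere:

```shell
python3 - <<'EOF'
import io, zipfile

# Build a tiny in-memory "jar" as a stand-in for a real jar file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as z:
    z.writestr('META-INF/MANIFEST.MF', b'')
    z.writestr('org/example/Job.class', b'')

# Equivalent of `jar tf`, filtered to .class entries.
with zipfile.ZipFile(buf) as z:
    for name in z.namelist():
        if name.endswith('.class'):
            print(name)
EOF
```

Against a real file, zipfile.ZipFile('spark-examples-1.6.0-hadoop2.6.0.jar') lists entries the same way.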

Run a Hadoop jar:
hadoop jar mahout-examples-0.11.2-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

By default, hdfs dfs -put uploads a local file in a single thread.
hdfs dfs -get file downloads a file to the local filesystem.

DistCp does distributed cluster-to-cluster copying, and can also be used for multi-threaded upload of local data (its -m option sets the number of map tasks, i.e. the copy parallelism). Reference: https://hadoop.apache.org/docs/r1.0.4/cn/distcp.html
bash$ hadoop distcp hdfs://nn1:8020/foo/bar hdfs://nn2:8020/bar/foo


Spark running on YARN — check its progress:
> yarn application -list | grep SPARK
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1468582953415_1281732 Spark shell SPARK recsys root.bdp_jmart_recsys.recsys_szda RUNNING UNDEFINED 10% http://172.18.149.130:36803
application_1468582953415_1275117 sparksql SPARK mart_risk root.bdp_jmart_risk.bdp_jmart_risk_hkh RUNNING UNDEFINED 10% http://172.18.143.152:5396
> yarn application -status application_1468582953415_1275117 (check the status of a single application)

Collect logs with YARN: yarn logs -applicationId <application ID>
This gathers an application's run logs, but only after the application has finished (YARN aggregates logs on completion), and log aggregation must be enabled first (it is off by default): set yarn.log-aggregation-enable to true.
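The corresponding yarn-site.xml fragment might look like the following; the retain-seconds property is optional and shown only as an example of controlling how long aggregated logs are kept:

```xml
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <!-- optional: keep aggregated logs for 7 days -->
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>604800</value>
</property>
```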

Check running applications with the yarn command:
> yarn application -list
Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):2
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1470800211073_0030 17778-thriftserver SPARK hadoop root.default RUNNING UNDEFINED 10% http://192.168.177.78:4041
application_1470800211073_0021 org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 SPARK hadoop root.default RUNNING UNDEFINED 10% http://192.168.177.77:4040 (Spark applications expose a Tracking-URL)
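The listing can be trimmed to just id/state/progress with awk. The sample line below mirrors the output above so the parsing can be tried without a cluster; note that the real output is tab-separated, so awk -F'\t' is safer when application names contain spaces:

```shell
# Sample row copied from `yarn application -list` (whitespace-separated here).
line='application_1470800211073_0030 17778-thriftserver SPARK hadoop root.default RUNNING UNDEFINED 10% http://192.168.177.78:4041'

# Fields: 1=id 2=name 3=type 4=user 5=queue 6=state 7=final-state 8=progress
echo "$line" | awk '{print $1, $6, $8}'

# Against a live cluster (tab-delimited output):
#   yarn application -list 2>/dev/null | awk -F'\t' '/^application_/ {print $1, $6, $8}'
```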
Query application status via the YARN REST API:
>curl --compressed -H "Accept: application/json" -X GET "http://BDS-TEST-002:8088/ws/v1/cluster/apps/application_1470800211073_0030"
{
"app": {
"id": "application_1470800211073_0030",
"user": "hadoop",
"name": "17778-thriftserver",
"queue": "root.default",
"state": "RUNNING",
"finalStatus": "UNDEFINED",
"progress": 10,
"trackingUI": "ApplicationMaster",
"trackingUrl": "http://BDS-TEST-002:8088/proxy/application_1470800211073_0030/",
"diagnostics": "",
"clusterId": 1470800211073,
"applicationType": "SPARK",
"applicationTags": "",
"startedTime": 1471260225117,
"finishedTime": 0,
"elapsedTime": 64056828,
"amContainerLogs": "http://BDS-TEST-002:8042/node/containerlogs/container_1470800211073_0030_01_000001/hadoop",
"amHostHttpAddress": "BDS-TEST-002:8042",
"allocatedMB": 7168,
"allocatedVCores": 3,
"runningContainers": 3,
"memorySeconds": 459101287,
"vcoreSeconds": 192149,
"preemptedResourceMB": 0,
"preemptedResourceVCores": 0,
"numNonAMContainerPreempted": 0,
"numAMContainerPreempted": 0
}
}
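Usually only a couple of fields are needed from that response; piping the body through python3's json module is less brittle than grepping. A trimmed copy of the JSON above stands in for the live curl call so this runs offline:

```shell
# Stand-in for:
#   curl -s -H "Accept: application/json" http://<rm>:8088/ws/v1/cluster/apps/<app-id>
json='{"app":{"id":"application_1470800211073_0030","state":"RUNNING","finalStatus":"UNDEFINED","progress":10}}'

# Print id, state, and progress from the "app" object.
echo "$json" | python3 -c '
import sys, json
app = json.load(sys.stdin)["app"]
print(app["id"], app["state"], str(app["progress"]) + "%")
'
```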
The URL format is: http://{http address of service}/ws/{version}/{resourcepath}
Also, look up the ResourceManager address in yarn-site.xml; the configuration below has two webapp addresses because the cluster uses ZooKeeper-based HA.
<property>
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>BDS-TEST-001:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm2</name>
  <value>BDS-TEST-002:8088</value>
</property>


hadoop HA
