SSH passwordless login:
vi /etc/ssh/sshd_config
Remove the leading # to enable these lines:
RSAAuthentication yes
PubkeyAuthentication yes
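Enabling public-key auth is only half of it; the login becomes passwordless once a key pair is generated and the public key is distributed to each node. A minimal sketch (the user name and hostnames are examples):

```shell
# Generate an RSA key pair with no passphrase (skip if ~/.ssh/id_rsa already exists)
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
# Append the public key to each node's ~/.ssh/authorized_keys
ssh-copy-id hadoop@slave1
ssh-copy-id hadoop@slave2
# Verify: should log in without a password prompt
ssh hadoop@slave1 hostname
```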
Format HDFS:
hdfs namenode -format
After Hadoop starts, the Hadoop web GUI can be opened in a browser at http://IP_address:50070 (NameNode UI)
http://IP_address:8088 shows cluster information (YARN ResourceManager UI)
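The usual sequence between formatting and opening the two web UIs, assuming the standard Hadoop start scripts are on the PATH:

```shell
hdfs namenode -format   # first setup only: re-formatting wipes HDFS metadata
start-dfs.sh            # NameNode + DataNodes (serves the :50070 UI)
start-yarn.sh           # ResourceManager + NodeManagers (serves the :8088 UI)
jps                     # should list NameNode, DataNode, ResourceManager, NodeManager
```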
Sqoop hands-on usage and troubleshooting: http://www.cnblogs.com/avivaye/p/6197123.html http://www.dataguru.cn/thread-577912-1-1.html
Kafka: http://www.mincoder.com/article/3942.shtml
Run the Hadoop example program:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.0.jar wordcount /input /output
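The wordcount job reads its input from HDFS, so the input directory has to be populated first. A typical end-to-end run, with the local file name as an example:

```shell
hdfs dfs -mkdir -p /input
hdfs dfs -put words.txt /input/     # words.txt is an example local file
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.0.jar wordcount /input /output
hdfs dfs -cat /output/part-r-00000  # one "word<TAB>count" line per word
```

Note the job fails if /output already exists; remove it with `hdfs dfs -rm -r /output` before re-running.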
Run the Spark example program:
Local: cd spark/bin
# ./run-example SparkPi 1000 (or with --master spark://master:7077) // 1000 is the number of partitions (slices) to sample over
On a cluster:
./spark-submit --master spark://master:7077 --class org.apache.spark.examples.SparkPi --executor-memory 512M ../lib/spark-examples-xx-hadoop2.x.jar (../examples/jars/xxx in Spark 2.x) 1000
Interactive shells:
spark-shell
pyspark
sparkR
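By default these shells run with a local master; each can instead attach to the standalone cluster started above:

```shell
spark-shell --master spark://master:7077
pyspark --master spark://master:7077
sparkR --master spark://master:7077
```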
Flume:
$ bin/flume-ng agent --conf conf --conf-file conf/flume-conf.properties --name producer -Dflume.root.logger=INFO,console
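A minimal flume-conf.properties sketch for the agent named producer above. The netcat source and its port are illustrative assumptions; the Kafka sink targets the test0 topic used in the Kafka section below, and the property names follow Flume 1.7+:

```properties
producer.sources = s1
producer.channels = c1
producer.sinks = k1

# Example source: lines typed into "nc localhost 44444" become events
producer.sources.s1.type = netcat
producer.sources.s1.bind = localhost
producer.sources.s1.port = 44444

producer.channels.c1.type = memory

# Kafka sink feeding the test0 topic
producer.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
producer.sinks.k1.kafka.topic = test0
producer.sinks.k1.kafka.bootstrap.servers = master:9092

producer.sources.s1.channels = c1
producer.sinks.k1.channel = c1
```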
Kafka:
- > bin/zookeeper-server-start.sh config/zookeeper.properties
- > bin/kafka-server-start.sh config/server.properties
./bin/kafka-topics.sh --create --zookeeper master:2181 --partitions 1 --replication-factor 1 --topic test0
./bin/kafka-topics.sh --list --zookeeper master:2181
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test0 // show status information for topic test0:
- leader: the node that handles all reads and writes for the partition; it is elected from among the replica nodes.
- replicas: all replica nodes for the partition, whether or not they are currently in service.
- isr: the in-sync replicas, i.e. the subset currently alive and caught up with the leader.
./bin/kafka-console-producer.sh --broker-list master:9092 --topic test0 // write to topic test0 from the command line as a producer, i.e. besides Flume writing to test0, the command line writes to it too
./bin/kafka-console-consumer.sh --zookeeper master:2181 --topic test0 --from-beginning // consume the data in topic test0
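A quick non-interactive smoke test of the pipeline above, using the broker and topic names already configured:

```shell
# Publish one message by piping into the console producer...
echo "hello kafka" | ./bin/kafka-console-producer.sh --broker-list master:9092 --topic test0
# ...then read everything back in another terminal (Ctrl-C to stop)
./bin/kafka-console-consumer.sh --zookeeper master:2181 --topic test0 --from-beginning
```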
Storm:
./bin/storm nimbus > /dev/null 2>&1 &
./bin/storm supervisor > /dev/null 2>&1 &
./bin/storm logviewer > /dev/null 2>&1 &
./bin/storm ui > /dev/null 2>&1 & // set ui.port=8090 in storm.yaml; the default 8080 clashes with spark_webui_port
http://master:8090 shows the Storm UI with the cluster status
The nimbus, supervisor, logviewer, and ui daemons can be started on different machines across the cluster
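The port change mentioned above is a one-line storm.yaml edit. A sketch of the surrounding minimum configuration (hostnames are examples; nimbus.seeds is the Storm 1.x key, older releases use nimbus.host instead):

```yaml
# conf/storm.yaml
storm.zookeeper.servers:
  - "master"
nimbus.seeds: ["master"]
ui.port: 8090   # default 8080 collides with Spark's web UI port
```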