MapReduce on the Linux Command Line

Creating a file on Linux

Create the WordCount.java source file (the file name must match the public class name compiled in the later steps):

  1. cd /home/hadoop/Documents
  2. touch WordCount.java

Starting and stopping Hadoop

start-all.sh
stop-all.sh

1. WordCount

  1. cd ~/Documents/wordcount/
  2. javac -classpath /usr/local/hadoop/share/hadoop/common/hadoop-common-2.9.2.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.9.2.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar -d ./ ./WordCount.java
  3. jar -cvf WordCount.jar ./*.class
  4. hdfs dfs -rm -r output
  5. hadoop jar ./WordCount.jar WordCount input output
  6. hdfs dfs -ls ./output
  7. hdfs dfs -cat output/*
  8. hdfs dfs -rm -r output
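The WordCount job compiled and run above follows the classic map/shuffle/reduce pattern. As a rough local sketch of that logic in plain Python (not the Hadoop Java API, and not the actual WordCount.java, which is not shown here):

```python
from collections import defaultdict

def map_phase(lines):
    # Mapper: emit a (word, 1) pair for every token in every input line
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Shuffle + Reducer: group pairs by word and sum the counts
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] += one
    return dict(counts)

lines = ["Apache Hadoop MapReduce", "Apache Spark"]
print(reduce_phase(map_phase(lines)))
# {'Apache': 2, 'Hadoop': 1, 'MapReduce': 1, 'Spark': 1}
```

In the real job, the shuffle (grouping by key) is done by the Hadoop framework between the mapper and reducer tasks; here it is folded into `reduce_phase`.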

2. MatrixMultiply

  1. cd ~/Documents/Matrix1/
  2. javac -classpath /usr/local/hadoop/share/hadoop/common/hadoop-common-2.9.2.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.9.2.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar -d ./ ./MatrixMultiply.java
  3. jar -cvf MatrixMultiply.jar ./*.class
  4. Inspect the generator script: sudo gedit ./genMatrix.sh
    Generate two matrices (30×50 and 50×100): ./genMatrix.sh 30 50 100
    Upload the two matrix files from the local filesystem to HDFS (only needs to be done once):
    hdfs dfs -put ./M_30_50 input
    hdfs dfs -put ./N_50_100 input
  5. hdfs dfs -rm -r output
  6. hadoop jar ./MatrixMultiply.jar MatrixMultiply input/M_30_50 input/N_50_100 output
  7. hdfs dfs -ls ./output
  8. hdfs dfs -cat output/*
  9. hdfs dfs -rm -r output
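The standard one-pass MapReduce matrix multiply works by keying every intermediate value on the output cell it contributes to: for P = M×N, element M[i][j] is emitted once per output cell (i, k), element N[j][k] once per (i, k), and the reducer for key (i, k) joins the two sides on j and sums the products. A local plain-Python sketch of that scheme (the actual MatrixMultiply.java is not shown, so this is an illustration, not the author's code):

```python
from collections import defaultdict

def matmul_mapreduce(M, N):
    """One-pass MapReduce matrix multiply sketch: M is I x J, N is J x K."""
    I, J, K = len(M), len(N), len(N[0])
    pairs = []
    # Mapper for M: each M[i][j] is needed by every output cell (i, k)
    for i in range(I):
        for j in range(J):
            for k in range(K):
                pairs.append(((i, k), ('M', j, M[i][j])))
    # Mapper for N: each N[j][k] is needed by every output cell (i, k)
    for j in range(J):
        for k in range(K):
            for i in range(I):
                pairs.append(((i, k), ('N', j, N[j][k])))
    # Shuffle: group intermediate values by output cell
    groups = defaultdict(list)
    for key, val in pairs:
        groups[key].append(val)
    # Reducer: for each cell (i, k), join on j and sum M[i][j] * N[j][k]
    P = [[0] * K for _ in range(I)]
    for (i, k), vals in groups.items():
        m = {j: v for tag, j, v in vals if tag == 'M'}
        n = {j: v for tag, j, v in vals if tag == 'N'}
        P[i][k] = sum(m[j] * n[j] for j in m)
    return P

M = [[1, 2], [3, 4]]
N = [[5, 6], [7, 8]]
print(matmul_mapreduce(M, N))  # [[19, 22], [43, 50]]
```

For the 30×50 and 50×100 matrices generated above, each M entry is replicated 100 times and each N entry 30 times, which is why this scheme trades network volume for a single MapReduce pass.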

3. InvertedIndex

  1. cd ~/Documents/index/
  2. javac -classpath /usr/local/hadoop/share/hadoop/common/hadoop-common-2.9.2.jar:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.9.2.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar -d ./ ./InvertedIndex.java
  3. jar -cvf InvertedIndex.jar ./*.class
  4. hdfs dfs -rm -r output
  5. Upload the stopword list to the input directory (only needs to be done once):
    hdfs dfs -put ./stopwords.txt input
  6. hadoop jar ./InvertedIndex.jar InvertedIndex indexinput output
  7. hdfs dfs -ls ./output
  8. hdfs dfs -cat output/*
  9. hdfs dfs -rm -r output
  10. Create the indexinput directory (only needed once, before the first run of step 6):
    cd /usr/local/hadoop/bin
    hdfs dfs -mkdir indexinput
  11. Upload the input files (only needed once, before the first run of step 6):
    cd ~/Documents/index/
    hdfs dfs -put ./input/file1.txt indexinput
    hdfs dfs -put ./input/file2.txt indexinput
    hdfs dfs -put ./input/file3.txt indexinput
    hdfs dfs -put ./input/file4.txt indexinput
    hdfs dfs -put ./input/file5.txt indexinput
  12. hdfs dfs -rm -r indexinput
  • File contents:
    file1:Apache Spark Scala Hadoop Java C Python Do And Will KNN
    file2:SVM Scala News Play Akka Yes GBDT
    file3:LDA SVM RF GBDT Adaboost Kmeans KNN
    file4:QQ BAT I Great All LDA
    file5:Apache Hadoop MapReduce Git SVN SVM
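An inverted index maps each word to the list of files containing it, and the stopword list filters out words that should not be indexed. A local plain-Python sketch of that logic over the files listed above (the stopword set here is an assumption for illustration, since the contents of stopwords.txt are not shown; the real InvertedIndex.java is likewise not shown):

```python
from collections import defaultdict

# Sample documents taken from the file listing above
docs = {
    "file1": "Apache Spark Scala Hadoop Java C Python Do And Will KNN",
    "file2": "SVM Scala News Play Akka Yes GBDT",
    "file3": "LDA SVM RF GBDT Adaboost Kmeans KNN",
    "file4": "QQ BAT I Great All LDA",
    "file5": "Apache Hadoop MapReduce Git SVN SVM",
}
# Hypothetical stopword set -- the actual stopwords.txt may differ
stopwords = {"Do", "And", "Will", "Yes", "I", "All"}

def inverted_index(docs, stopwords):
    # Mapper: emit (word, filename) for every non-stopword token;
    # Shuffle + Reducer: collect the set of filenames per word
    index = defaultdict(set)
    for name, text in docs.items():
        for word in text.split():
            if word not in stopwords:
                index[word].add(name)
    return {w: sorted(fs) for w, fs in index.items()}

idx = inverted_index(docs, stopwords)
print(idx["Apache"])  # ['file1', 'file5']
```

Stopwords such as "And" and "Will" never appear in the index; every other word maps to the sorted list of files that contain it.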