Big Data Technology - Lab 05: MapReduce Practice (Verified Working)

Exercise 1: Running a MapReduce Job

1) Copy 02-上机实验/ds.txt to the /opt directory on the client machine, then upload it to HDFS.

# hadoop fs -put ds.txt /user/root/ds.txt

# hadoop fs -ls /user/root

Found 1 items

-rw-r--r--   3 root supergroup       9135 2015-05-29 19:49 /user/root/ds.txt

2) Copy the MapReduce examples jar from the Hadoop installation directory to /opt.

# sudo cp ~/local/opt/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar /opt

# ls /opt/hadoop-mapreduce*

/opt/hadoop-mapreduce-examples-2.6.0.jar

3) Run the wordcount example on the uploaded file.

[hadoop@master hadoop-2.6.0]$ hadoop jar /opt/hadoop-mapreduce-examples-2.6.0.jar wordcount /user/root/ds.txt /user/root/ds_out

17/04/16 12:21:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

17/04/16 12:21:32 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.164.5:8032

17/04/16 12:21:35 INFO input.FileInputFormat: Total input paths to process : 1

17/04/16 12:21:36 INFO mapreduce.JobSubmitter: number of splits:1

17/04/16 12:21:40 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1492366567239_0001

17/04/16 12:21:45 INFO impl.YarnClientImpl: Submitted application application_1492366567239_0001

17/04/16 12:21:46 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1492366567239_0001/

17/04/16 12:21:46 INFO mapreduce.Job: Running job: job_1492366567239_0001

17/04/16 12:22:35 INFO mapreduce.Job: Job job_1492366567239_0001 running in uber mode : false

17/04/16 12:22:35 INFO mapreduce.Job:  map 0% reduce 0%

17/04/16 12:23:26 INFO mapreduce.Job:  map 100% reduce 0%

17/04/16 12:23:45 INFO mapreduce.Job:  map 100% reduce 100%

17/04/16 12:23:46 INFO mapreduce.Job: Job job_1492366567239_0001 completed successfully

17/04/16 12:23:46 INFO mapreduce.Job: Counters: 49

         File System Counters

                   FILE: Number of bytes read=10341

                   FILE: Number of bytes written=231931

                   FILE: Number of read operations=0

                   FILE: Number of large read operations=0

                   FILE: Number of write operations=0

                   HDFS: Number of bytes read=9230

                   HDFS: Number of bytes written=9375

                   HDFS: Number of read operations=6

                   HDFS: Number of large read operations=0

                   HDFS: Number of write operations=2

         Job Counters

                   Launched map tasks=1

                   Launched reduce tasks=1

                   Data-local map tasks=1

                   Total time spent by all maps in occupied slots (ms)=44983

                   Total time spent by all reduces in occupied slots (ms)=14322

                   Total time spent by all map tasks (ms)=44983

                   Total time spent by all reduce tasks (ms)=14322

                   Total vcore-seconds taken by all map tasks=44983

                   Total vcore-seconds taken by all reduce tasks=14322

                   Total megabyte-seconds taken by all map tasks=46062592

                   Total megabyte-seconds taken by all reduce tasks=14665728

         Map-Reduce Framework

                   Map input records=240

                   Map output records=240

                   Map output bytes=9855

                   Map output materialized bytes=10341

                   Input split bytes=95

                   Combine input records=240

                   Combine output records=240

                   Reduce input groups=240

                   Reduce shuffle bytes=10341

                   Reduce input records=240

                   Reduce output records=240

                   Spilled Records=480

                   Shuffled Maps =1

                   Failed Shuffles=0

                   Merged Map outputs=1

                   GC time elapsed (ms)=873

                   CPU time spent (ms)=23610

                   Physical memory (bytes) snapshot=301469696

                   Virtual memory (bytes) snapshot=1954639872

                   Total committed heap usage (bytes)=136450048

         Shuffle Errors

                   BAD_ID=0

                   CONNECTION=0

                   IO_ERROR=0

                   WRONG_LENGTH=0

                   WRONG_MAP=0

                   WRONG_REDUCE=0

         File Input Format Counters

                   Bytes Read=9135

         File Output Format Counters

                   Bytes Written=9375

4) View the job output.

# hadoop fs -cat /user/root/ds_out/part-r-00000

16.75481160342442,0.5590169943749481   1

17.759065824032646,0.6708203932499373 1

17.944905786933322,0.5852349955359809 1

18.619213022043585,0.5024937810560444 1

18.664436259885097,0.7433034373659246 1

……
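For reference, the data flow of the wordcount job above can be sketched in plain Python. This is a minimal, hypothetical re-implementation of the map/reduce logic, not the Hadoop example's own code. Note that each line of ds.txt contains no whitespace, so every line is a single "word"; that is why the counters show Map output records equal to Map input records (240) and why every count in the output is 1.

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every whitespace-separated token.
    for line in lines:
        for word in line.split():
            yield word, 1

def reduce_phase(pairs):
    # Shuffle + reduce: sum the counts for each distinct word.
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

# Hypothetical sample input: two duplicate lines and one distinct line.
lines = ["16.75,0.55", "16.75,0.55", "17.75,0.67"]
result = reduce_phase(map_phase(lines))
# result: {"16.75,0.55": 2, "17.75,0.67": 1}
```

In the real job, the combiner runs the same summing logic on each mapper's local output before the shuffle, which is why Combine input records equals Map output records in the counters.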

Exercise 2: HBase MapReduce Commands

1) Copy 02-上机实验/user.csv to the /opt directory on the client machine, then upload it to HDFS.

# hadoop fs -put /opt/user.csv /user/root/user.csv

# hadoop fs -ls /user/root/user.csv

Found 1 items

-rw-r--r--   3 root supergroup       8393 2015-08-13 11:04 /user/root/user.csv

2) Create the user table in the HBase shell.

hbase(main):001:0> create 'user','info'

0 row(s) in 1.2520 seconds

=> Hbase::Table - user

3) Run a MapReduce job to import the data.

① Generate HFiles.

# export HADOOP_CLASSPATH=$HBASE_HOME/lib/*:classpath

# hadoop jar $HBASE_HOME/lib/hbase-server-1.0.3.jar importtsv -Dimporttsv.separator="," -Dimporttsv.bulk.output=/user/root/hbase_tmp -Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:age user /user/root/user.csv

15/08/13 11:16:50 INFO zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0xdcfda20 connecting to ZooKeeper ensemble=master:2181,slave1:2181,slave2:2181

15/08/13 11:16:50 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT

15/08/13 11:16:50 INFO zookeeper.ZooKeeper: Client environment:host.name=master.example.com

15/08/13 11:16:50 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_75

15/08/13 11:16:50 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation

15/08/13 11:16:50 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.75.x86_64/jre

15/08/13 11:16:50 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/opt/hadoop-2.6.0/etc/hadoop:/opt/hadoop-2.6.0/share/hadoop/common/lib/log4j-1.2.17.jar:/opt/hadoop-2.6.0/share/hadoop/common/lib/paranamer-2.3.jar:/opt/hadoop-2.6.0/share/hadoop/common/lib/commons-logging-1.1.3.jar:/opt/hadoop-2.6.0/share/hadoop/common/lib/asm-3.2.jar……

15/08/13 11:16:50 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/opt/hadoop-2.6.0/lib/native

15/08/13 11:16:50 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp

15/08/13 11:16:50 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>

15/08/13 11:16:50 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux

15/08/13 11:16:50 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64

15/08/13 11:16:50 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.32-504.el6.x86_64

15/08/13 11:16:50 INFO zookeeper.ZooKeeper: Client environment:user.name=root

15/08/13 11:16:50 INFO zookeeper.ZooKeeper: Client environment:user.home=/root

15/08/13 11:16:50 INFO zookeeper.ZooKeeper: Client environment:user.dir=/root

15/08/13 11:16:50 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181 sessionTimeout=90000 watcher=hconnection-0xdcfda200x0, quorum=master:2181, baseZNode=/hbase

15/08/13 11:16:50 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)

15/08/13 11:16:50 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session

15/08/13 11:16:50 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x14f2495fb0f000c, negotiated timeout = 90000

15/08/13 11:16:52 INFO mapreduce.HFileOutputFormat2: Looking up current regions for table user

15/08/13 11:16:52 INFO mapreduce.HFileOutputFormat2: Configuring 1 reduce partitions to match current region count

15/08/13 11:16:52 INFO mapreduce.HFileOutputFormat2: Writing partition information to /tmp/hadoop-root/partitions_aa02a3fe-23be-40a6-844f-cd2d64a38e92

15/08/13 11:16:52 INFO compress.CodecPool: Got brand-new compressor [.deflate]

15/08/13 11:16:52 INFO mapreduce.HFileOutputFormat2: Incremental table user output configured.

15/08/13 11:16:52 INFO client.ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService

15/08/13 11:16:52 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x14f2495fb0f000c

15/08/13 11:16:52 INFO zookeeper.ZooKeeper: Session: 0x14f2495fb0f000c closed

15/08/13 11:16:52 INFO zookeeper.ClientCnxn: EventThread shut down

15/08/13 11:16:53 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.222.131:8032

15/08/13 11:16:55 INFO input.FileInputFormat: Total input paths to process : 1

15/08/13 11:16:55 INFO mapreduce.JobSubmitter: number of splits:1

15/08/13 11:16:55 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum

15/08/13 11:16:55 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1439427807631_0002

15/08/13 11:16:56 INFO impl.YarnClientImpl: Submitted application application_1439427807631_0002

15/08/13 11:16:56 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1439427807631_0002/

15/08/13 11:16:56 INFO mapreduce.Job: Running job: job_1439427807631_0002

15/08/13 11:17:11 INFO mapreduce.Job: Job job_1439427807631_0002 running in uber mode : false

15/08/13 11:17:11 INFO mapreduce.Job:  map 0% reduce 0%

15/08/13 11:17:19 INFO mapreduce.Job:  map 100% reduce 0%

15/08/13 11:17:29 INFO mapreduce.Job:  map 100% reduce 100%

15/08/13 11:17:29 INFO mapreduce.Job: Job job_1439427807631_0002 completed successfully

15/08/13 11:17:29 INFO mapreduce.Job: Counters: 50

File System Counters

                   FILE: Number of bytes read=42188

                   FILE: Number of bytes written=356921

                   FILE: Number of read operations=0

                   FILE: Number of large read operations=0

                   FILE: Number of write operations=0

                   HDFS: Number of bytes read=8496

                   HDFS: Number of bytes written=44391

                   HDFS: Number of read operations=8

                   HDFS: Number of large read operations=0

                   HDFS: Number of write operations=3

         Job Counters

                   Launched map tasks=1

                   Launched reduce tasks=1

                   Data-local map tasks=1

                   Total time spent by all maps in occupied slots (ms)=5813

                   Total time spent by all reduces in occupied slots (ms)=11916

                   Total time spent by all map tasks (ms)=5813

                   Total time spent by all reduce tasks (ms)=5958

                   Total vcore-seconds taken by all map tasks=5813

                   Total vcore-seconds taken by all reduce tasks=5958

                   Total megabyte-seconds taken by all map tasks=5952512

                   Total megabyte-seconds taken by all reduce tasks=7888392

Map-Reduce Framework

                   Map input records=538

                   Map output records=538

                   Map output bytes=41106

                   Map output materialized bytes=42188

Input split bytes=103

                   Combine input records=538

                   Combine output records=538

                   Reduce input groups=538

                   Reduce shuffle bytes=42188

                   Reduce input records=538

                   Reduce output records=1076

                   Spilled Records=1076

                   Shuffled Maps =1

                   Failed Shuffles=0

                   Merged Map outputs=1

                   GC time elapsed (ms)=241

                   CPU time spent (ms)=4890

                   Physical memory (bytes) snapshot=476753920

                   Virtual memory (bytes) snapshot=5710151680

                   Total committed heap usage (bytes)=384303104

         ImportTsv

                   Bad Lines=0

         Shuffle Errors

                   BAD_ID=0

                   CONNECTION=0

                   IO_ERROR=0

                   WRONG_LENGTH=0

                   WRONG_MAP=0

                   WRONG_REDUCE=0

         File Input Format Counters

                   Bytes Read=8393

         File Output Format Counters

                   Bytes Written=44391

After the job completes, the generated HFile is visible in HDFS:

# hadoop fs -ls -R /user/root/hbase_tmp

-rw-r--r--   3 root supergroup          0 2015-08-13 11:17 /user/root/hbase_tmp/_SUCCESS

drwxr-xr-x   - root supergroup          0 2015-08-13 11:17 /user/root/hbase_tmp/info

-rw-r--r--   3 root supergroup      44391 2015-08-13 11:17 /user/root/hbase_tmp/info/e8cf8a1ac70d40e2a985711dfb678cdd
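The -Dimporttsv.columns mapping above determines how each CSV line becomes HBase cells: the field in the HBASE_ROW_KEY position becomes the row key, and every other field becomes one cell in the listed column. A rough Python sketch of that parsing (illustrative only; the helper name parse_line is hypothetical, not the ImportTsv source):

```python
def parse_line(line, columns, sep=","):
    # columns mirrors -Dimporttsv.columns and must contain HBASE_ROW_KEY.
    fields = line.rstrip("\n").split(sep)
    rowkey = fields[columns.index("HBASE_ROW_KEY")]
    cells = []
    for column, value in zip(columns, fields):
        if column != "HBASE_ROW_KEY":
            # One KeyValue per mapped column: (row key, column, value).
            cells.append((rowkey, column, value))
    return cells

cols = ["HBASE_ROW_KEY", "info:name", "info:age"]
parse_line("99,user99,57", cols)
# -> [("99", "info:name", "user99"), ("99", "info:age", "57")]
```

Note that importtsv applies this mapping to every input line, including any CSV header line, which is then loaded as an ordinary row.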

② Load the HFiles into the user table.

# hadoop jar $HBASE_HOME/lib/hbase-server-1.0.3.jar completebulkload /user/root/hbase_tmp user

15/08/13 11:29:02 INFO zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x5ed731d0 connecting to ZooKeeper ensemble=master:2181,slave1:2181,slave2:2181

15/08/13 11:29:02 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT

15/08/13 11:29:02 INFO zookeeper.ZooKeeper: Client environment:host.name=master.example.com

15/08/13 11:29:02 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_75

15/08/13 11:29:02 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation

15/08/13 11:29:02 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.75.x86_64/jre

15/08/13 11:29:02 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/opt/hadoop-2.6.0/etc/hadoop:/opt/hadoop-2.6.0/share/hadoop/common/lib/log4j-1.2.17.jar:/opt/hadoop-2.6.0/share/hadoop/common/lib/paranamer-2.3.jar:/opt/hadoop-2.6.0/share/ha……:/opt/hbase-1.0.1.1/lib/hbase-thrift-1.0.1.1.jar:classpath:/opt/hadoop-2.6.0/contrib/capacity-scheduler/*.jar

15/08/13 11:29:02 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/opt/hadoop-2.6.0/lib/native

15/08/13 11:29:02 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp

15/08/13 11:29:02 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>

15/08/13 11:29:02 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux

15/08/13 11:29:02 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64

15/08/13 11:29:02 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.32-504.el6.x86_64

15/08/13 11:29:02 INFO zookeeper.ZooKeeper: Client environment:user.name=root

15/08/13 11:29:02 INFO zookeeper.ZooKeeper: Client environment:user.home=/root

15/08/13 11:29:02 INFO zookeeper.ZooKeeper: Client environment:user.dir=/root

15/08/13 11:29:02 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181 sessionTimeout=90000 watcher=hconnection-0x5ed731d00x0, quorum=master:2181, baseZNode=/hbase

15/08/13 11:29:02 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)

15/08/13 11:29:02 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session

15/08/13 11:29:02 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x14f2495fb0f000f, negotiated timeout = 90000

15/08/13 11:29:03 INFO zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x4bee18dc connecting to ZooKeeper ensemble=localhost:2181

15/08/13 11:29:03 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181 sessionTimeout=90000 watcher=hconnection-0x4bee18dc0x0, quorum=master:2181, baseZNode=/hbase

15/08/13 11:29:03 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)

15/08/13 11:29:03 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session

15/08/13 11:29:03 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x14f2495fb0f0010, negotiated timeout = 90000

15/08/13 11:29:04 WARN mapreduce.LoadIncrementalHFiles: Skipping non-directory hdfs://master:8020/user/root/hbase_tmp/_SUCCESS

15/08/13 11:29:04 INFO hfile.CacheConfig: CacheConfig:disabled

15/08/13 11:29:04 INFO mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://master:8020/user/root/hbase_tmp/info/e8cf8a1ac70d40e2a985711dfb678cdd first=1 last=rowkey

15/08/13 11:29:05 INFO client.ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService

15/08/13 11:29:05 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x14f2495fb0f0010

15/08/13 11:29:05 INFO zookeeper.ZooKeeper: Session: 0x14f2495fb0f0010 closed

15/08/13 11:29:05 INFO zookeeper.ClientCnxn: EventThread shut down

4) View the data in the HBase user table.

hbase(main):007:0> scan 'user'

……

99                    column=info:age, timestamp=1439435809424, value=57

99                    column=info:name, timestamp=1439435809424, value=user99     

 rowkey                column=info:age, timestamp=1439435809424, value=age         

 rowkey                column=info:name, timestamp=1439435809424, value=name       

538 row(s) in 2.2050 seconds
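The last two cells in the scan (row key rowkey, values name and age) are the CSV header line, which importtsv loaded as ordinary data, so the 538-row total includes it. The stray row can be filtered or deleted afterwards; a small Python sketch over hypothetical (rowkey, column, value) cell tuples:

```python
def drop_header_row(cells, header_key="rowkey"):
    # Remove all cells whose row key is the bulk-loaded CSV header.
    return [cell for cell in cells if cell[0] != header_key]

# Hypothetical cells mirroring the tail of the scan output above.
cells = [
    ("99", "info:age", "57"),
    ("99", "info:name", "user99"),
    ("rowkey", "info:age", "age"),
    ("rowkey", "info:name", "name"),
]
drop_header_row(cells)
# -> keeps only the two "99" cells
```

In the HBase shell itself, the equivalent cleanup is deleteall 'user', 'rowkey'.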

Note: if a warning appears that the classpath contains multiple SLF4J bindings (for example, one under /usr/local/hbase/lib and another under /usr/local/hadoop/hadoop-1.0.3/lib), the fix is simply to remove one of them. After removing slf4j-log4j12-1.4.3.jar from /usr/local/hadoop/hadoop-1.0.3/lib, the warning no longer appears.

If a jar cannot be found, using an absolute path such as /home/Hadoop/local/opt/hbase-1.0.3 avoids the problem; setting the appropriate environment variables also works.
