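MongoDB is one of the most popular non-relational databases in the NoSQL space. It offers powerful sharded storage and querying, which makes it a good fit for storing and querying historical data such as logs, and it also ships with its own MapReduce feature. Not every MongoDB user runs a sharded cluster, though; a replica set is the more likely deployment, since there are sometimes not many machines available. Hadoop, for its part, provides HDFS and distributed computation. We can therefore use Hadoop's MapReduce in place of MongoDB's MapReduce, and a MongoDB replica set in place of Hadoop's HDFS. That is the role of the connector (adapter) between Hadoop and MongoDB, the mongo-hadoop-master project, which can currently be downloaded from GitHub.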
1. Download URL: https://github.com/mongodb/mongo-hadoop
2. After downloading, unpack the archive:
- [root@bigdata2 software]# cd mongo-hadoop-master
- [root@bigdata2 mongo-hadoop-master]# ll
- total 140
- drwxr-xr-x 3 root root 4096 Oct 15 11:53 bin
- -rw-r--r-- 1 root root 5848 Oct 15 11:53 BSON_README.md
- drwxr-xr-x 4 root root 4096 Nov 30 13:06 build
- -rwxr-xr-x 1 root root 168 Oct 15 11:53 build-all.sh
- -rw-r--r-- 1 root root 12731 Oct 15 11:53 build.gradle
- drwxr-xr-x 2 root root 4096 Oct 15 11:53 clusterConfigs
- drwxr-xr-x 2 root root 4096 Oct 15 11:53 config
- -rw-r--r-- 1 root root 7458 Oct 15 11:53 CONFIG.md
- drwxr-xr-x 4 root root 4096 Nov 30 13:06 core
- drwxr-xr-x 6 root root 4096 Oct 15 11:53 docs
- drwxr-xr-x 7 root root 4096 Oct 15 11:53 examples
- drwxr-xr-x 3 root root 4096 Oct 15 11:53 flume
- drwxr-xr-x 3 root root 4096 Oct 15 11:53 gradle
- -rwxr-xr-x 1 root root 5080 Oct 15 11:53 gradlew
- -rw-r--r-- 1 root root 2314 Oct 15 11:53 gradlew.bat
- -rw-r--r-- 1 root root 1862 Oct 15 11:53 History.md
- drwxr-xr-x 3 root root 4096 Oct 15 11:53 hive
- drwxr-xr-x 3 root root 4096 Oct 15 11:53 integration-tests
- -rw-r--r-- 1 root root 6764 Oct 15 11:53 mongo-defaults.xml
- -rw------- 1 root root 4843 Nov 30 13:12 nohup.out
- drwxr-xr-x 3 root root 4096 Oct 15 11:53 pig
- -rw-r--r-- 1 root root 5106 Oct 15 11:53 README.md
- -rw-r--r-- 1 root root 137 Oct 15 11:53 settings.gradle
- drwxr-xr-x 5 root root 4096 Oct 15 11:53 streaming
- -rwxr-xr-x 1 root root 682 Oct 15 11:53 test.sh
- drwxr-xr-x 2 root root 4096 Oct 15 11:53 tools
- [root@bigdata2 mongo-hadoop-master]#
The examples directory contains the bundled sample projects. Here I will use the JSON data under src/main/resources/ of the mongo-hadoop-master/examples/treasury_yield example.
Sample data:
{ "_id" : { "$date" : 631238400000 }, "dayOfWeek" : "TUESDAY", "bc3Year" : 7.9, "bc5Year" : 7.87, "bc10Year" : 7.94, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 7.87, "bc3Month" : 7.83, "bc30Year" : 8, "bc1Year" : 7.81, "bc7Year" : 7.98, "bc6Month" : 7.89 }
{ "_id" : { "$date" : 631324800000 }, "dayOfWeek" : "WEDNESDAY", "bc3Year" : 7.96, "bc5Year" : 7.92, "bc10Year" : 7.99, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 7.94, "bc3Month" : 7.89, "bc30Year" : 8.039999999999999, "bc1Year" : 7.85, "bc7Year" : 8.039999999999999, "bc6Month" : 7.94 }
{ "_id" : { "$date" : 631411200000 }, "dayOfWeek" : "THURSDAY", "bc3Year" : 7.93, "bc5Year" : 7.91, "bc10Year" : 7.98, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 7.92, "bc3Month" : 7.84, "bc30Year" : 8.039999999999999, "bc1Year" : 7.82, "bc7Year" : 8.02, "bc6Month" : 7.9 }
{ "_id" : { "$date" : 631497600000 }, "dayOfWeek" : "FRIDAY", "bc3Year" : 7.94, "bc5Year" : 7.92, "bc10Year" : 7.99, "bc20Year" : null, "bc1Month" : null, "bc2Year" : 7.9, "bc3Month" : 7.79, "bc30Year" : 8.06, "bc1Year" : 7.79, "bc7Year" : 8.029999999999999,
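To make the shape of these records concrete, here is a minimal Python sketch that parses a few of the complete sample lines above and averages `bc10Year` per year in a map/reduce style. It only illustrates the kind of computation such a job might run; it is not the connector's own code. The function names `parse_record`, `map_record`, and `reduce_by_year` are hypothetical, and the choice of `bc10Year` as the field to aggregate is an assumption made for illustration.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone

# Three complete records from the sample data above,
# trimmed to the fields this sketch actually uses.
SAMPLE_LINES = [
    '{ "_id" : { "$date" : 631238400000 }, "dayOfWeek" : "TUESDAY", "bc10Year" : 7.94 }',
    '{ "_id" : { "$date" : 631324800000 }, "dayOfWeek" : "WEDNESDAY", "bc10Year" : 7.99 }',
    '{ "_id" : { "$date" : 631411200000 }, "dayOfWeek" : "THURSDAY", "bc10Year" : 7.98 }',
]


def parse_record(line):
    """Parse one JSON line and turn {"$date": <millis>} into a UTC datetime."""
    doc = json.loads(line)
    millis = doc["_id"]["$date"]
    doc["_id"] = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
    return doc


def map_record(doc):
    """Map step (hypothetical): emit (year, bc10Year), skipping null values."""
    value = doc.get("bc10Year")
    if value is not None:
        yield doc["_id"].year, value


def reduce_by_year(pairs):
    """Reduce step (hypothetical): average the emitted values for each year."""
    grouped = defaultdict(list)
    for year, value in pairs:
        grouped[year].append(value)
    return {year: sum(values) / len(values) for year, values in grouped.items()}


if __name__ == "__main__":
    docs = [parse_record(line) for line in SAMPLE_LINES]
    pairs = [pair for doc in docs for pair in map_record(doc)]
    print(reduce_by_year(pairs))
```

Note that the `_id` of each record is a date stored as milliseconds since the Unix epoch, and that several fields (for example `bc20Year` and `bc1Month`) are `null` in these rows, so any aggregation over them needs the same null check used in `map_record`.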