Hadoop/Hypertable/Nutch Series
lovejuan1314
Hypertable Installation on Ubuntu
Ubuntu 8.10 Intrepid Ibex 32-bit, EC2: ami-5c709435. Basic dependencies: apt-get update; apt-get install g++ cmake libboost-dev liblog4cpp5-dev git-core cronolog libgoogle-perftoo... · 2009-09-17 12:12:29
Nutch-0.9 Study: Whole-web Crawling (Part 1)
Whole-web: Bootstrapping the Web Database. Download: wget http://www.alliedquotes.com/mirrors/apache/lucene/nutch/nutch-0.9.tar.gz; unpack: tar xzvf nutch-0.9.tar.gz; mv nutch-0... · 2009-08-17 17:38:23
Hypertable: Creating a Table and Inserting Data
Create a simple People table: hypertable> create table People (age,gender,weight,height); Elapsed time: 0.15 s. Then inspect the table just created: hypertable> describe table People -> ; ... · 2009-06-03 18:20:06
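The People table above is declared only by its column families (age, gender, weight, height). The following is a minimal in-memory Java sketch that makes this Bigtable-style data model concrete. It rests on two assumptions: each cell is addressed by row key, column family, optional qualifier, and timestamp, and writes to an undeclared column family are rejected. PeopleTableSketch and its methods are hypothetical illustrations, not the Hypertable client API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical in-memory model of a Bigtable-style table; not the Hypertable client API.
public class PeopleTableSketch {
    // Declared column families, mirroring: create table People (age,gender,weight,height);
    private final Set<String> families;
    // row key -> column ("family" or "family:qualifier") -> timestamp (newest first) -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    public PeopleTableSketch(Set<String> families) {
        this.families = families;
    }

    // Store one cell version; writes to undeclared column families are rejected.
    public void insert(String row, String family, String qualifier, long timestamp, String value) {
        if (!families.contains(family)) {
            throw new IllegalArgumentException("Unknown column family: " + family);
        }
        String column = qualifier.isEmpty() ? family : family + ":" + qualifier;
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    // Newest value stored for a cell, or null if the cell does not exist.
    public String latest(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null || !columns.containsKey(column)) {
            return null;
        }
        return columns.get(column).firstEntry().getValue();
    }

    // Row keys in sorted order, as a full-table scan would return them.
    public List<String> rowKeys() {
        return new ArrayList<>(rows.keySet());
    }

    public static void main(String[] args) {
        PeopleTableSketch people = new PeopleTableSketch(Set.of("age", "gender", "weight", "height"));
        people.insert("bob", "age", "", 1L, "30");
        people.insert("alice", "age", "", 1L, "28");
        people.insert("alice", "age", "", 2L, "29");
        System.out.println(people.latest("alice", "age")); // 29
        System.out.println(people.rowKeys());              // [alice, bob]
        try {
            people.insert("alice", "salary", "", 1L, "100");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());            // Unknown column family: salary
        }
    }
}
```

Keeping rows in a sorted map means they come back in key order, much as rows are returned by a scan in a Bigtable-style store. Each cell keeps its versions newest first, keyed by timestamp.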
Google Bigtable Translation: Bigtable: A Distributed Storage System for Structured Data
Special note: reposted from http://my.donews.com/eraera/category/%E6%8A%9B%E7%A0%96%E5%BC%95%E7%8E%89/. Original paper: http://labs.google.com/papers/bigtable-osdi06.pdf. {The translation in this article may... · 2009-05-21 10:31:40
Hypertable Apache log
To compile this example, you will need to modify the following variables in the Makefile: BOOST_INCLUDE_DIR=/usr/local/include/boost-1_34_1 BOOST_THREAD_LIB=boost_thread-gcc41-mt HY... · 2009-04-28 16:49:51
Hypertable Installation
Basic dependencies: 1. rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-3.noarch.rpm 2. yum install gcc-c++ make cmake python-devel bzip2-devel zlib-devel ... · 2009-04-28 16:46:44
Cloudera Hadoop
Install the Java JDK: 1. wget -O jdk-6u13-linux-i586-rpm.bin http://cds.sun.com/is-bin/INTERSHOP.enfinity/WFS/CDS-CDS_Developer-Site/en_US/-/USD/VerifyItem-Start/jdk-6u13-linux-i586-rpm.bin?Bundle... · 2009-04-28 16:45:30
Hadoop Installation
1. wget http://apache.freelamp.com/hadoop/core/hadoop-0.19.1/hadoop-0.19.1.tar.gz 2. tar xvzf hadoop-0.19.1.tar.gz 3. vi conf/hadoop-env.sh and add the JAVA_HOME parameter: expor... · 2009-04-28 16:44:45
About Hypertable Configuration Properties
List all Hypertable properties: /opt/hypertable/0.9.2.7/bin/ht shell --help-config. To address the memory problem: Hypertable.RangeServer.MemoryLimit.Percentage=50. Be very careful when setting this property; otherwise, Hyper... · 2009-10-28 11:00:21
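The setting Hypertable.RangeServer.MemoryLimit.Percentage=50 appears to cap RangeServer memory as a percentage. Assuming that percentage is taken of physical RAM (verify this against the configuration reference for your Hypertable version), the arithmetic is simple. The Java sketch below reads the property with java.util.Properties and computes the cap for a hypothetical 4 GB machine. MemoryLimitSketch is an illustrative name, not part of Hypertable.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper; assumes the percentage is applied to the host's physical RAM.
public class MemoryLimitSketch {
    // Property name as used in the post.
    static final String KEY = "Hypertable.RangeServer.MemoryLimit.Percentage";

    // Read KEY from a properties-style snippet and return the implied cap in bytes.
    static long memoryLimitBytes(String config, long physicalRamBytes) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(config));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String raw = props.getProperty(KEY);
        if (raw == null) {
            throw new IllegalArgumentException("Missing property: " + KEY);
        }
        int percent = Integer.parseInt(raw.trim());
        if (percent <= 0 || percent > 100) {
            throw new IllegalArgumentException("Percentage out of range: " + percent);
        }
        // Multiply before dividing so integer division does not truncate early.
        return physicalRamBytes * percent / 100;
    }

    public static void main(String[] args) {
        long ram = 4L * 1024 * 1024 * 1024; // hypothetical 4 GB host
        long limit = memoryLimitBytes(KEY + "=50\n", ram);
        System.out.println(limit / (1024 * 1024) + " MB"); // 2048 MB
    }
}
```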
An Error in Hypertable 0.9.2.7
[root@localhost nutch]# /opt/hypertable/0.9.2.7/bin/hypertable fails with: /opt/hypertable/0.9.2.7/bin/hypertable: error while loading shared libraries: libboost_iostreams-gcc41-mt-1_37.so.1.37... · 2009-09-28 15:47:58
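Nutch Study (Part 3): Putting Nutch Crawl Results into Hypertable
I want to put the web pages crawled by Nutch into Hypertable. There are currently three main approaches: 1. Modify the Nutch source code so that Nutch runs on top of Hypertable, using the HBase implementation as a reference. That implementation lacks many of Nutch's features and is hard to upgrade, so I decided against it. 2. Export the Nutch crawl results to a text dump file with a command, then parse the file with MapReduce and load the relevant information into Hypertable. ... · 2009-09-22 17:12:40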
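Approach 2 above has three stages: dump the crawl results to text, parse the dump with MapReduce, and load the relevant fields into Hypertable. It depends on a per-record parsing step. The plain-Java sketch below shows only that step, with no Hadoop dependency; in a real job the same logic would sit inside a mapper. It assumes each record in the dump starts with a "Recno::" line and carries "URL:: <url>" and "Title: <title>" lines. Those labels are assumptions about the dump layout. The "title" column family and the tab-separated row/column/value output are also hypothetical choices, so check Hypertable's LOAD DATA INFILE documentation for the exact input format it expects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-record parser for a Nutch text dump; the field labels are assumptions.
public class DumpToTsvSketch {
    // Turn a text dump into "row<TAB>column<TAB>value" lines, using the URL as row key.
    static List<String> toTsv(String dump) {
        List<String> out = new ArrayList<>();
        String currentUrl = null;
        for (String line : dump.split("\\r?\\n")) {
            if (line.startsWith("Recno::")) {
                currentUrl = null; // a new record starts; forget the previous URL
            } else if (line.startsWith("URL:: ")) {
                currentUrl = line.substring("URL:: ".length()).trim();
            } else if (currentUrl != null && line.startsWith("Title: ")) {
                String title = clean(line.substring("Title: ".length()));
                out.add(currentUrl + "\ttitle\t" + title);
                currentUrl = null; // emit at most one title per record
            }
        }
        return out;
    }

    // Tabs and line breaks would corrupt the tab-separated layout, so turn them into spaces.
    static String clean(String value) {
        return value.replace('\t', ' ').replace('\r', ' ').replace('\n', ' ').trim();
    }

    public static void main(String[] args) {
        String dump = "Recno:: 0\n"
                + "URL:: http://example.com/\n"
                + "Title: Example Domain\n"
                + "Recno:: 1\n"
                + "URL:: http://example.org/no-title\n";
        // Only the first record has a title, so one line is printed:
        // http://example.com/ <TAB> title <TAB> Example Domain
        toTsv(dump).forEach(System.out::println);
    }
}
```

The URL serves directly as the row key here for simplicity. A real schema might choose a different key layout.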
Hadoop Safe Mode
hadoop fs -rmr /hypertable fails with: rmr: org.apache.hadoop.dfs.SafeModeException: Cannot delete /hypertable. Name node is in safe mode. First turn off Hadoop safe mode: hadoop dfsadmi... · 2009-09-17 15:59:35
Nutch-0.9 Study: Whole-web Crawling (Part 2)
Getting related links and dynamic content with Nutch: 1. vi conf/crawl-urlfilter.txt: #+[?*!@=] # accept links containing the ?, = and & characters # accept URLs containing certain characters as probable queries, etc. +[?=&] ## crawl program lin... · 2009-09-09 19:10:59
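Nutch's URL filter files are read from top to bottom. Each rule is '+' (accept) or '-' (reject) followed by a regular expression. The first rule whose pattern matches the URL decides, and a URL that matches no rule is ignored. In the stock Nutch 0.9 file, the query rule reads -[?*!@=] and skips URLs containing those characters. The simplified Java re-implementation below shows why commenting that rule out and adding +[?=&] lets dynamic URLs through. UrlFilterSketch is hypothetical, not Nutch's RegexURLFilter class, and example.com stands in for your own domain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Simplified, hypothetical re-implementation of first-match URL filtering; not Nutch's RegexURLFilter.
public class UrlFilterSketch {
    private static final class Rule {
        final boolean accept;   // true for '+', false for '-'
        final Pattern pattern;  // regular expression after the sign

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    // Parse rule lines; blank lines and lines starting with '#' are skipped.
    public UrlFilterSketch(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    // The first rule whose pattern occurs anywhere in the URL decides; no match means reject.
    public boolean accepts(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String domainRule = "+^http://([a-z0-9]*\\.)*example.com/";
        List<String> stock = List.of("-[?*!@=]", domainRule);
        List<String> modified = List.of("#-[?*!@=]", "+[?=&]", domainRule);
        String dynamicUrl = "http://www.example.com/list.php?page=2";
        System.out.println(new UrlFilterSketch(stock).accepts(dynamicUrl));    // false
        System.out.println(new UrlFilterSketch(modified).accepts(dynamicUrl)); // true
    }
}
```

Because the first match wins, a +[?=&] rule accepts any URL containing those characters, including off-site ones, unless an earlier '-' rule has already rejected it.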