I. Goal
Install Ubuntu 14.04 (64-bit) in a virtual machine, then install Hadoop 2.6.0 (pseudo-distributed mode), Pig, Hive, and Mahout for development and debugging.
II. Installation
1. Configure SSH
ssh-keygen -t rsa
cd ~/.ssh
cat id_rsa.pub >> ~/.ssh/authorized_keys
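Note that the redirection above appends unconditionally, so re-running the setup duplicates the key. A minimal idempotent variant might look like this (the `append_key` helper is my own sketch, not a standard tool):

```shell
# Append a public key to authorized_keys only if it is not already there.
# Usage: append_key <pubkey-file> <authorized_keys-file>   (hypothetical helper)
append_key() {
  pub="$1"; auth="$2"
  mkdir -p "$(dirname "$auth")"
  touch "$auth"
  # -F: fixed string, -x: whole-line match, -q: quiet
  grep -qxF "$(cat "$pub")" "$auth" || cat "$pub" >> "$auth"
  chmod 600 "$auth"
}
```

After appending the key, verify passwordless login with `ssh localhost`.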
2. Prepare the software
Install the JDK and mysql-server directly with apt-get:
sudo apt-get install openjdk-7-jre
sudo apt-get install openjdk-7-jdk
sudo apt-get install mysql-server
Download the following packages and test data:
hadoop-2.6.0.tar.gz
pig-0.15.0.tar.gz
apache-hive-1.1.1-bin.tar.gz
apache-mahout-distribution-0.9.tar.gz
mysql-connector-java-5.1.39.tar.gz
synthetic_control.data
3. Set environment variables
Extract the packages and copy them to /usr/local, then add the following to ~/.bashrc (run source ~/.bashrc afterwards to apply):
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export PIG_HOME=/usr/local/pig
export PIG_CLASSPATH=$HADOOP_HOME/etc/hadoop/
export HIVE_HOME=/usr/local/hive
export HIVE_CLASSPATH=$HADOOP_HOME/etc/hadoop/
export MAHOUT_HOME=/usr/local/mahout
export MAHOUT_CONF_DIR=/usr/local/mahout/conf
export PATH=.:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PIG_HOME/bin:$HIVE_HOME/bin:$MAHOUT_HOME/bin:$PATH
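A typo in any of these exports tends to fail silently later. After sourcing ~/.bashrc, a quick sanity check like the following can flag paths that do not exist (the `check_home` helper is illustrative, not part of any tool):

```shell
# Print "<NAME> OK" or "<NAME> MISSING" for a named directory variable.
check_home() {
  name="$1"; dir="$2"
  if [ -n "$dir" ] && [ -d "$dir" ]; then
    echo "$name OK"
  else
    echo "$name MISSING"
  fi
}
check_home JAVA_HOME "$JAVA_HOME"
check_home HADOOP_HOME "$HADOOP_HOME"
check_home PIG_HOME "$PIG_HOME"
check_home HIVE_HOME "$HIVE_HOME"
check_home MAHOUT_HOME "$MAHOUT_HOME"
```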
Check the MySQL installation:
sudo /etc/init.d/mysql status
Check that Java runs:
java -version
4. Configure Hadoop in pseudo-distributed mode
core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/local/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/data</value>
</property>
</configuration>
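A stray character in either file makes the daemons die at startup with a parse error, so a quick well-formedness check before formatting is worthwhile. One way, assuming python3 is available on the machine (xmllint works equally well):

```shell
# Report "<file> OK" or "<file> BROKEN" by parsing the XML with Python's stdlib.
check_xml() {
  if python3 -c 'import sys, xml.etree.ElementTree as ET; ET.parse(sys.argv[1])' "$1" 2>/dev/null; then
    echo "$1 OK"
  else
    echo "$1 BROKEN"
  fi
}
check_xml /usr/local/hadoop/etc/hadoop/core-site.xml
check_xml /usr/local/hadoop/etc/hadoop/hdfs-site.xml
```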
Format the NameNode:
hdfs namenode -format
Start HDFS and YARN:
start-dfs.sh
start-yarn.sh
Check the running daemons:
jps
View the NameNode web UI in a browser:
http://localhost:50070
5. Configure Pig
No dedicated configuration is needed; verify it works with the following commands:
hdfs dfs -put /etc/passwd /user/oliver/passwd
pig -x mapreduce
A = load 'passwd' using PigStorage(':');
B = foreach A generate $0 as id;
dump B;
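The Pig script above just projects the first colon-separated field of passwd. For reference, the plain-shell equivalent below produces the same output that `dump B;` should show, one username per line:

```shell
# Same projection as the Pig script: field 1 of ':'-delimited /etc/passwd.
cut -d: -f1 /etc/passwd
```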
6. Configure Hive
Generate the config files:
cp hive-env.sh.template hive-env.sh
cp hive-default.xml.template hive-site.xml
Edit hive-env.sh:
export HADOOP_HOME=/usr/local/hadoop
export HIVE_CONF_DIR=/usr/local/hive/conf
Edit hive-site.xml (note: no trailing spaces inside the name/value tags, or Hive will not match the properties):
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/tmp/hive</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/tmp/hive</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
Copy the jar files:
cp mysql-connector-java-5.1.39-bin.jar /usr/local/hive/lib
cp jline-2.12.jar /usr/local/hadoop/share/hadoop/yarn/lib
Pig does not work with jline-2.12.jar; when running Pig, switch back to the jline jar Hadoop originally shipped.
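Since Hive needs jline-2.12.jar in Hadoop's yarn lib while Pig needs the jar Hadoop shipped with, a small helper that swaps in whichever jline jar the current tool needs saves repeated manual copying. A sketch (the `use_jline` function and its argument layout are my own, not part of any tool):

```shell
# Replace any jline-*.jar in <libdir> with <srcdir>/jline-<version>.jar.
# Usage: use_jline <version> <srcdir> <libdir>   (hypothetical helper)
use_jline() {
  ver="$1"; srcdir="$2"; libdir="$3"
  rm -f "$libdir"/jline-*.jar
  cp "$srcdir/jline-$ver.jar" "$libdir/"
}
```

For example, keep both jars in a directory of your own and run `use_jline 2.12 ~/jars /usr/local/hadoop/share/hadoop/yarn/lib` before starting Hive, then swap back before running Pig.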
Create the metastore database and user in MySQL:
insert into mysql.user(Host,User,Password) values("localhost","hive",password("hive"));
create database hive;
grant all on hive.* to hive@'%' identified by 'hive';
grant all on hive.* to hive@'localhost' identified by 'hive';
flush privileges;
Initialize the metastore schema with Hive's schematool:
schematool -dbType mysql -initSchema
Check the database:
mysql -uhive -phive
use hive;
show tables;
Start the metastore service; if it starts cleanly, the installation succeeded:
hive --service metastore
7. Configure Mahout
Extract the package, copy it to /usr/local/mahout, and set the environment variables (already done above); no further configuration is needed.
Download the test data and run a clustering test:
wget http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data
The example job reads its input from testdata under the current user's HDFS home directory, so use a relative path:
hdfs dfs -mkdir -p testdata
hdfs dfs -put ./synthetic_control.data testdata
hadoop jar /usr/local/mahout/mahout-examples-0.9-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
This completes the development and debugging environment.