1. Flashing the firmware
The Cubieboard ships with Android, but we need Linux to install Hadoop. Download a Lubuntu image from:
http://dl.cubieboard.org/software/a20-cubietruck/lubuntu/
①. Use PhoenixSuit's one-click flashing and select the lubuntu system image.
②. Plug the USB cable into the PC first. Disconnect the Cubieboard's power supply and battery, hold down the FEL button (it sits next to the RESET button), and while holding it connect the mini USB end to the Cubieboard. A dialog prompting for a forced upgrade will pop up, after which you can release the FEL button.
③. Choose Yes in the dialog to start flashing the system.
④. When flashing finishes, unplug the USB cable from the PC, then connect power and the network cable.
2. System configuration
①. Log in as the linaro user and set a password for root:
$ sudo passwd root
②. Repartition the Cubieboard's NAND to reclaim space
Install the partitioning tool nand-part (part of sunxi-tools):
#apt-get install git
#apt-get install build-essential
#apt-get install pkg-config libusb-1.0-0-dev
#git clone https://github.com/linux-sunxi/sunxi-tools.git
#cd sunxi-tools
#make all
Now let's take a look at the NAND flash:
# ls /dev/nand* -l
brw-rw---- 1 root disk 93, 0 Jan 1 2010 /dev/nand
brw-rw---- 1 root disk 93, 1 Jan 1 2010 /dev/nanda
brw-rw---- 1 root disk 93, 2 Jan 1 2010 /dev/nandb
brw-rw---- 1 root disk 93, 3 Jan 1 2010 /dev/nandc
Here nand represents the whole NAND flash, while nanda, nandb, and nandc are its three partitions:
nanda contains bootlogo, script.bin, uEnv.txt, etc.
nandb holds the rootfs
nandc has about 5 GB of space, and merging it into nandb seems like a good idea. Running the nand-part command prints roughly the following (main part only):
partition 1: class = DISK, name = bootloader, partition start = 32768, partition size = 131072 user_type=0
partition 2: class = DISK, name = rootfs, partition start = 163840, partition size = 4194304 user_type=0
partition 3: class = DISK, name = UDISK, partition start = 4358144, partition size = 10584064 user_type=0
This shows the size of each partition, so we can lay out a new plan:
# nand-part -f a20 /dev/nand 32768 'bootloader 131072' 'rootfs 14778368'
This command prints:
ready to write new partition tables:
mbr: version 0x00000200, magic softw411
2 partitions
partition 1: class = DISK, name = bootloader, partition start = 32768, partition size = 131072 user_type=0
partition 2: class = DISK, name = rootfs, partition start = 163840, partition size = 14778368 user_type=0
Note that bootloader (nanda) is unchanged in size, while rootfs (nandb) and UDISK (nandc) have been merged (4194304 + 10584064 = 14778368). Now reboot the system, then run the following command to grow the filesystem on nandb:
# resize2fs /dev/nandb
It is worth noting that this repartitioning process does not destroy any data.
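Before committing a new partition table it is worth double-checking the arithmetic. A quick shell sanity check, with the sector counts taken from the nand-part output above (sectors are 512 bytes):

```shell
# rootfs and UDISK sizes, in 512-byte sectors, as reported by nand-part
ROOTFS=4194304
UDISK=10584064
MERGED=$((ROOTFS + UDISK))
echo "merged rootfs sectors: $MERGED"
# convert sectors to MiB: sectors * 512 bytes / 1024^2
echo "merged rootfs size: $((MERGED * 512 / 1024 / 1024)) MiB"
```

The merged value matches the 14778368 passed to nand-part above, i.e. roughly 7 GiB of rootfs.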
With the NAND sorted out, the HDD can be set up. Use fdisk to check that the HDD is present: run fdisk -l
Partition it: fdisk /dev/sda
Format it: mkfs.ext4 /dev/sda1
Mount it: mount /dev/sda1 /data
Make it mount at boot: vim /etc/fstab and add
/dev/sda1 /data ext4 defaults 1 2
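For reference, the six fields of that fstab line are device, mount point, filesystem type, mount options, dump flag, and fsck pass number. A small sketch pulling them apart:

```shell
# split the fstab entry into its six fields (plain word splitting is enough
# here, since none of the fields contain spaces)
fstab_line="/dev/sda1 /data ext4 defaults 1 2"
set -- $fstab_line
echo "device=$1 mountpoint=$2 fstype=$3 options=$4 dump=$5 fsck_pass=$6"
```

fsck pass 2 means the disk is checked after the root filesystem; dump=1 marks it for backup by dump(8).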
Allow root to log in remotely over SSH
Install the OpenSSH server:
1. Install openssh-server with apt
$ sudo apt-get install openssh-server
2. Configure the OpenSSH server if needed
$ sudo vi /etc/ssh/sshd_config
Find the line PermitRootLogin no and change it to PermitRootLogin yes
3. Restart the OpenSSH server
$ sudo service ssh restart
4. If the client is Ubuntu, the SSH client is already installed and you can connect to the remote server with:
$ ssh xxx.xxx.xxx.xxx
On Windows, use an SSH client such as SSH Secure Shell to connect.
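The sshd_config edit can also be done non-interactively with sed. A sketch, demonstrated on a sample file so nothing real is touched (point the same sed at /etc/ssh/sshd_config on the board):

```shell
# make a sample config fragment standing in for /etc/ssh/sshd_config
cat > /tmp/sshd_config.sample <<'EOF'
Port 22
PermitRootLogin no
EOF
# flip the PermitRootLogin line in place
sed -i 's/^PermitRootLogin no/PermitRootLogin yes/' /tmp/sshd_config.sample
grep PermitRootLogin /tmp/sshd_config.sample
```

The grep at the end should print the updated line; remember to restart the ssh service afterwards as above.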
3. Installing Hadoop
Set up the Java environment by adding the following to ~/.bashrc:
vim ~/.bashrc
export JAVA_HOME=/usr/lib/java/jdk1.7.0_71
export JRE_HOME=${JAVA_HOME}/jre
export CLASS_PATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
export PATH=${JAVA_HOME}/bin:/usr/local/hadoop/hadoop-2.2.0/bin:$PATH
source ~/.bashrc
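After `source ~/.bashrc`, it is worth confirming that the new directories sit at the front of PATH. The exports are repeated here so the snippet stands alone; the paths are this guide's install locations:

```shell
export JAVA_HOME=/usr/lib/java/jdk1.7.0_71
export PATH=${JAVA_HOME}/bin:/usr/local/hadoop/hadoop-2.2.0/bin:$PATH
# the first two PATH entries should now be the JDK and Hadoop bin directories
echo "$PATH" | tr ':' '\n' | head -2
```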
In hadoop-env.sh, set:
export JAVA_HOME=/usr/lib/java/jdk1.7.0_71
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib"
Set each node's name in /etc/hostname, and map every node's name to its IP in /etc/hosts on all nodes, for example (addresses as used by the web UIs later in this guide):
10.6.4.226 master
10.6.4.227 slave001
10.6.4.228 slave002
Generate a passwordless SSH key on every node:
ssh-keygen -t rsa -P ""
root@m1:/home/hadoop# scp -r root@m2:/root/.ssh/id_rsa.pub ~/.ssh/m2.pub
root@m1:/home/hadoop# scp -r root@s1:/root/.ssh/id_rsa.pub ~/.ssh/s1.pub
root@m1:/home/hadoop# scp -r root@s2:/root/.ssh/id_rsa.pub ~/.ssh/s2.pub
root@m1:/home/hadoop# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
root@m1:/home/hadoop# cat ~/.ssh/m2.pub >> ~/.ssh/authorized_keys
root@m1:/home/hadoop# cat ~/.ssh/s1.pub >> ~/.ssh/authorized_keys
root@m1:/home/hadoop# cat ~/.ssh/s2.pub >> ~/.ssh/authorized_keys
root@m1:/home/hadoop# scp -r ~/.ssh/authorized_keys root@m2:~/.ssh/
root@m1:/home/hadoop# scp -r ~/.ssh/authorized_keys root@s1:~/.ssh/
root@m1:/home/hadoop# scp -r ~/.ssh/authorized_keys root@s2:~/.ssh/
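The per-host fetch-and-append sequence above can be written as one loop. Host names m2, s1, s2 are the ones used in the commands above; this is a dry run (each command is echoed rather than executed) so it can be inspected first:

```shell
# dry run: print the key-collection commands for each remote host
for h in m2 s1 s2; do
  echo "scp root@$h:/root/.ssh/id_rsa.pub ~/.ssh/$h.pub"
  echo "cat ~/.ssh/$h.pub >> ~/.ssh/authorized_keys"
done
```

Drop the echo (and quotes) to actually run it, then scp the merged authorized_keys back to each host as shown above.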
core-site.xml (the <property> elements here, and those for the configuration files below, go inside each file's <configuration> element):
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000/</value>
<description>The name of the default file system</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/hadoop-2.2.0/tmp</value>
<description>A base for other temporary directories</description>
</property>
hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop/hadoop-2.2.0/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop/hadoop-2.2.0/dfs/data</value>
</property>
mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
yarn-site.xml
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
slaves — list the worker nodes:
slave001
slave002
scp -r hadoop-2.2.0/ hadoop@slave001:/usr/local/hadoop/
scp -r hadoop-2.2.0/ hadoop@slave002:/usr/local/hadoop/
hadoop@master:/usr/local/hadoop/hadoop-2.2.0/sbin$ scp -r /usr/lib/java/jdk1.7.0_71/ hadoop@slave001:/usr/lib/java/
hadoop@master:/usr/local/hadoop/hadoop-2.2.0/sbin$ scp -r /usr/lib/java/jdk1.7.0_71/ hadoop@slave002:/usr/lib/java/
hadoop@master:/usr/local/hadoop/hadoop-2.2.0/bin$ hadoop namenode -format
hadoop@master:/usr/local/hadoop/hadoop-2.2.0/sbin$ ./start-dfs.sh
hadoop@master:/usr/local/hadoop/hadoop-2.2.0/sbin$ jps
3197 NameNode
3387 SecondaryNameNode
4236 Jps
hadoop@slave001:/usr/lib/java$ jps
6129 DataNode
6199 Jps
hadoop@slave002:/usr/lib/java$ jps
5229 DataNode
5301 Jps
http://10.6.4.226:50070/dfshealth.jsp
hadoop@master:/usr/local/hadoop/hadoop-2.2.0/sbin$ ./start-yarn.sh
hadoop@master:/usr/local/hadoop/hadoop-2.2.0/sbin$ jps
3197 NameNode
3387 SecondaryNameNode
4557 Jps
4310 ResourceManager
hadoop@slave001:/usr/lib/java$ jps
6129 DataNode
6377 NodeManager
6492 Jps
hadoop@slave002:/usr/lib/java$ jps
5229 DataNode
5478 NodeManager
5592 Jps
http://10.6.4.226:8088/cluster
http://10.6.4.227:8042/node
http://10.6.4.228:8042/node
/usr/local/hadoop/hadoop-2.2.0/sbin# ./mr-jobhistory-daemon.sh start historyserver
hadoop@master:/usr/local/hadoop/hadoop-2.2.0/sbin$ jps
3197 NameNode
3387 SecondaryNameNode
4609 JobHistoryServer
4310 ResourceManager
4665 Jps
http://10.6.4.226:19888/jobhistory
hadoop@master:/usr/local/hadoop/hadoop-2.2.0/bin$ hadoop fs -mkdir -p /data/wordcount
hadoop@master:/usr/local/hadoop/hadoop-2.2.0/bin$ hadoop fs -mkdir -p /output/
hadoop@master:/usr/local/hadoop/hadoop-2.2.0/bin$ hadoop fs -put ../etc/hadoop/*.xml /data/wordcount/
hadoop@master:/usr/local/hadoop/hadoop-2.2.0/bin$ hadoop fs -ls /data/wordcount
hadoop@master:/usr/local/hadoop/hadoop-2.2.0/bin$ hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /data/wordcount /output/wordcount
hadoop@master:/usr/local/hadoop/hadoop-2.2.0/bin$ hadoop fs -cat /output/wordcount/part-r-00000 |head
Eclipse development environment setup
Put hadoop-eclipse-plugin-2.2.0.jar into /opt/eclipse/plugins
Restart Eclipse:
cd /opt/eclipse
./eclipse
Location Name: anything you like; it just labels this "Map/Reduce Location"
Map/Reduce Master
Host: master, Port: 9001
DFS Master
Use M/R Master host: check this box (our NameNode and JobTracker are on the same machine).
Port: 9000
User name: hadoop
Then click "Advanced parameters", find "hadoop.tmp.dir", and change it to the value our Hadoop cluster uses, i.e. "/usr/local/hadoop/hadoop-2.2.0/tmp" as configured in "core-site.xml".
Pay attention to the following parameters:
fs.default.name: must agree with the default file system set in core-site.xml (fs.defaultFS).
mapred.job.tracker: must agree with the mapred.job.tracker setting in mapred-site.xml.
dfs.replication: must agree with dfs.replication in hdfs-site.xml.
hadoop.tmp.dir: must agree with hadoop.tmp.dir in core-site.xml.
hadoop.job.ugi: this is not a username and password; it is the user and group, so enter hadoop,hadoop.
Note: the first time around, the hadoop.job.ugi and dfs.replication parameters may not be present; that is fine, just confirm and save. Open the DFS Locations node in the Project Explorer and you should now see the structure of the file system.
That completes the development environment setup.
HDFS and MapReduce development tests
1. HDFS test
1.1 Create the project
1.2 Create the class
Copy core-site.xml and hdfs-site.xml into the project directory /home/hadoop/workspace/HDFSTest/bin
Write the code:
package com.hdfs;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
public class DFSOperator {
    private static final String ROOT_PATH = "hdfs://";
    private static final int BUFFER_SIZE = 4096;

    public DFSOperator() {}

    public static boolean createFile(String path, boolean overwrite) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path f = new Path(ROOT_PATH + path);
        // close the stream returned by create() so the (empty) file is flushed
        fs.create(f, overwrite).close();
        fs.close();
        return true;
    }

    public static boolean deleteFile(String path, boolean recursive) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path f = new Path(ROOT_PATH + path);
        fs.delete(f, recursive);
        fs.close();
        return true;
    }

    public static String readDFSFileToString(String path) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path f = new Path(ROOT_PATH + path);
        String str = null;
        StringBuilder sb = new StringBuilder(BUFFER_SIZE);
        if (fs.exists(f)) {
            InputStream in = fs.open(f);
            BufferedReader bf = new BufferedReader(new InputStreamReader(in));
            while ((str = bf.readLine()) != null) {
                sb.append(str);
                sb.append("\n");
            }
            bf.close();   // closes the underlying input stream as well
            fs.close();
            return sb.toString();
        } else {
            fs.close();
            return null;
        }
    }

    public static boolean writeStringToDFSFile(String path, String string) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path f = new Path(ROOT_PATH + path);
        FSDataOutputStream os = fs.create(f, true);
        os.writeBytes(string);
        os.close();
        fs.close();
        return true;
    }

    public static void main(String[] args) {
        try {
            DFSOperator.createFile("/hadoop/test1.txt", true);
            DFSOperator.deleteFile("/hadoop/test1.txt", true);
            DFSOperator.writeStringToDFSFile("/hadoop/test1.txt", "u u u good man.\nReally?\n");
            System.out.println(DFSOperator.readDFSFileToString("/hadoop/test1.txt"));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Run it.
This code exercises creating, deleting, writing to, and reading HDFS files.
2. MapReduce test
2.1 Write wordcount.
Create a new MapreduceTest project
As before, copy core-site.xml and hdfs-site.xml into the project's bin directory
Create a MapClass class:
package com.test.mr;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class MapClass extends Mapper<Object, Text, Text, IntWritable> {
    public Text keyText = new Text("key");
    public IntWritable intValue = new IntWritable(1);

    @Override
    protected void map(Object key, Text value,
            Mapper<Object, Text, Text, IntWritable>.Context context)
            throws IOException, InterruptedException {
        String str = value.toString();
        StringTokenizer stringTokenizer = new StringTokenizer(str);
        while (stringTokenizer.hasMoreTokens()) {
            keyText.set(stringTokenizer.nextToken());
            System.out.print("mapkey" + keyText.toString());
            System.out.println("mapvalue" + intValue);
            context.write(keyText, intValue);
        }
    }
}
新建ReduceClass类
package com.test.mr;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class ReduceClass extends Reducer<Text, IntWritable, Text, IntWritable> {
    public IntWritable intValue = new IntWritable(0);

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
            Reducer<Text, IntWritable, Text, IntWritable>.Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        int count = 0;
        for (IntWritable v : values) {
            sum += v.get();
            count += 1;
        }
        System.out.println("sum:" + sum);
        System.out.println("count:" + count);
        intValue.set(sum);
        context.write(key, intValue);
    }
}
新建MR类
package com.test.mr;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class MR {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "word count");
        job.setJarByClass(MR.class);
        job.setMapperClass(MapClass.class);
        job.setReducerClass(ReduceClass.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Run the code, passing the input and output paths as the two arguments.
HBase setup
1. Installing HBase
On master, extract hbase-0.98.9-hadoop2-bin.tar.gz into /usr/local/hadoop
1.1 Configure hbase-env.sh
Go to /usr/local/hadoop/hbase-0.98.9-hadoop2/conf
vim hbase-env.sh
Set JAVA_HOME as follows:
export JAVA_HOME=/usr/lib/java/jdk1.7.0_71
1.2 Configure regionservers
The contents here should match Hadoop's slaves file
master
slave1
slave2
1.3 Configure hbase-site.xml
Add the following:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://master:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.master</name>
<value>master:60000</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>master,slave1,slave2</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/zookeeper</value>
</property>
</configuration>
1.4 Install HBase on slave1 and slave2
Copy HBase from master to slave1 and slave2 with scp (use -r so subdirectories are included):
scp -r /usr/local/hadoop/hbase-0.98.9-hadoop2/* hadoop@slave1:/usr/local/hadoop/hbase-0.98.9-hadoop2/
scp -r /usr/local/hadoop/hbase-0.98.9-hadoop2/* hadoop@slave2:/usr/local/hadoop/hbase-0.98.9-hadoop2/
1.5 Configure the profile
Edit the profile on all three nodes:
vim /etc/profile
export JAVA_HOME=/usr/lib/java/jdk1.7.0_71
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.2.0
export HBASE_HOME=/usr/local/hadoop/hbase-0.98.9-hadoop2
export PATH=${JAVA_HOME}/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$PATH
Apply the changes:
source /etc/profile
1.6 Start HBase
On master,
switch to the hadoop user:
su - hadoop
start-hbase.sh
Use jps to check the processes on master, slave1, and slave2
The master's processes are as follows
slave1's processes are as follows
2. Basic HBase operations
2.1 Run the hbase shell command to enter the HBase console
2.2 List existing tables
Enter the list command; if it executes normally, HBase started successfully
test is a table created earlier
Operation | Command
Create a table | create 'table_name', 'family1', 'family2', 'familyN'
Insert a record | put 'table_name', 'rowkey', 'family:column', 'value'
Fetch a record | get 'table_name', 'rowkey'
Count the records in a table | count 'table_name'
Delete a cell | delete 'table_name', 'rowkey', 'family:column'
Delete a whole row | deleteall 'table_name', 'rowkey'
Drop a table | first disable 'table_name', then drop 'table_name'
Scan all records | scan 'table_name' — risky on large tables, better with a limit: scan 'table_name', {LIMIT => 10}
Scan all data in given columns of a table | scan 'table_name', {COLUMNS => ['family1', 'family2'], VERSIONS => 2} (VERSIONS is optional)
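Putting a few of the commands above together, a short hbase shell session might look like the following (the scores table, grade family, and values are made-up names for illustration):

```text
create 'scores', 'grade'
put 'scores', 'tom', 'grade:math', '90'
get 'scores', 'tom'
scan 'scores', {LIMIT => 10}
count 'scores'
delete 'scores', 'tom', 'grade:math'
disable 'scores'
drop 'scores'
```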