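Outline
280. Use the Java API to create the /tmp/demo1 directory under the HDFS root directory
282. Partition employee records by department number
283. The following is the Main class from the word-count experiment; complete the code according to the hints
295. Access HBase with the Java API; complete the code according to the hints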
280. Use the Java API to create the /tmp/demo1 directory under the HDFS root directory
package com.myhdfs.mypro;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CreateDir {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // NameNode address
        URI uri = new URI("hdfs://192.168.229.141:8020");
        // Get a FileSystem object acting as user "dy"; try-with-resources closes it
        try (FileSystem fs = FileSystem.get(uri, conf, "dy")) {
            // Path to inspect: the HDFS root directory
            Path dfs = new Path("/");
            // List the directories and files under that path
            FileStatus[] fileStatuses = fs.listStatus(dfs);
            for (FileStatus fileStatus : fileStatuses) {
                if (fileStatus.isDirectory()) {
                    System.out.println("dir:" + fileStatus.getPath());
                } else {
                    System.out.println("file:" + fileStatus.getPath());
                }
            }
            // Create the directory together with any missing parents (cascading creation)
            boolean created = fs.mkdirs(new Path("/tmp/demo1"));
            System.out.println(created ? "Mkdirs successfully" : "Mkdirs failed");
        }
    }
}
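`fs.mkdirs` creates the full path in one call, including a missing parent such as /tmp, much like `mkdir -p`. That cascading behaviour can be tried without a cluster. The sketch below is only a local-filesystem analogy built on `java.nio.file.Files.createDirectories`, not the HDFS API; the class `CascadeMkdirDemo` and its method are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CascadeMkdirDemo {
    // Creates base/tmp/demo1, including the missing intermediate "tmp" directory.
    // Local analogy only: on HDFS the equivalent call is FileSystem.mkdirs.
    static Path createCascade(Path base) throws IOException {
        return Files.createDirectories(base.resolve("tmp").resolve("demo1"));
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("mkdirs-demo");
        Path created = createCascade(base);
        System.out.println("created: " + Files.isDirectory(created));
        // A second call does not fail: directories that already exist are accepted.
        createCascade(base);
        System.out.println("second call ok: " + Files.isDirectory(created));
    }
}
```

Unlike `Files.createDirectory`, which throws when the parent is missing, `createDirectories` walks up the path, which is the behaviour the HDFS exercise relies on.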
282. Partition employee records by department number
The file emp.csv has the following contents:
7369,SMITH,CLERK,7902,1980/12/17,800,,20
7499,ALLEN,SALESMAN,7698,1981/2/20,1600,300,30
7521,WARD,SALESMAN,7698,1981/2/22,1250,500,30
7566,JONES,MANAGER,7839,1981/4/2,2975,,20
7654,MARTIN,SALESMAN,7698,1981/9/28,1250,1400,30
7698,BLAKE,MANAGER,7839,1981/5/1,2850,,30
7782,CLARK,MANAGER,7839,1981/6/9,2450,,10
7788,SCOTT,ANALYST,7566,1987/4/19,3000,,20
7839,KING,PRESIDENT,,1981/11/17,5000,,10
7844,TURNER,SALESMAN,7698,1981/9/8,1500,0,30
7876,ADAMS,CLERK,7788,1987/5/23,1100,,20
7900,JAMES,CLERK,7698,1981/12/3,950,,30
7902,FORD,ANALYST,7566,1981/12/3,3000,,20
7934,MILLER,CLERK,7782,1982/1/23,1300,,10
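Field (in column order) | Meaning |
---|---|
EMPNO | Employee ID |
ENAME | Employee name |
JOB | Job title |
MGR | Employee ID of the direct manager |
HIREDATE | Hire date |
SAL | Salary |
COMM | Commission (bonus) |
DEPTNO | Department number |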
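Partitioning decides which reducer receives each record; here the goal is one output per department (10, 20, 30). The sketch below simulates that routing in plain Java without Hadoop. The class name `DeptPartitionDemo`, the mapping 10 -> 0, 20 -> 1, everything else -> 2, and the choice of three partitions are assumptions for illustration. DEPTNO is read from the last field, so the logic works whether or not empty MGR/COMM fields are present in a row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DeptPartitionDemo {
    // Hypothetical rule: department 10 -> partition 0, 20 -> 1, anything else -> 2.
    static int partitionFor(int deptno, int numPartitions) {
        int p;
        if (deptno == 10) {
            p = 0;
        } else if (deptno == 20) {
            p = 1;
        } else {
            p = 2;
        }
        return p % numPartitions; // keep the result within the available reducers
    }

    // Groups employee names (ENAME) by partition number.
    static Map<Integer, List<String>> partition(List<String> lines, int numPartitions) {
        Map<Integer, List<String>> result = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.split(",", -1); // -1 keeps empty fields such as ",,"
            int deptno = Integer.parseInt(f[f.length - 1].trim()); // DEPTNO is the last column
            result.computeIfAbsent(partitionFor(deptno, numPartitions), k -> new ArrayList<>())
                  .add(f[1]); // f[1] is ENAME
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "7369,SMITH,CLERK,7902,1980/12/17,800,,20",
                "7499,ALLEN,SALESMAN,7698,1981/2/20,1600,300,30",
                "7782,CLARK,MANAGER,7839,1981/6/9,2450,,10");
        System.out.println(partition(rows, 3)); // prints {0=[CLARK], 1=[SMITH], 2=[ALLEN]}
    }
}
```

In an actual MapReduce job, the same mapping would live in the `getPartition(key, value, numPartitions)` method of a subclass of `org.apache.hadoop.mapreduce.Partitioner`, registered with `job.setPartitionerClass(...)` together with `job.setNumReduceTasks(3)`.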
Serialization: create the Employee type to hold one employee record
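The Employee class that follows implements Hadoop's `Writable` interface, whose two methods, `write(DataOutput)` and `readFields(DataInput)`, must handle the fields in exactly the same order. As a stdlib-only sketch of that contract, the hypothetical `MiniEmployee` below keeps just three of the eight fields and does not depend on Hadoop; it writes an object to bytes and reads it back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class WritableRoundTrip {
    // Simplified, hypothetical stand-in for Employee (no Hadoop dependency).
    static class MiniEmployee {
        int empno;
        String ename;
        int deptno;

        // Same shape as Writable.write: fields go out in a fixed order.
        void write(DataOutput out) throws IOException {
            out.writeInt(empno);
            out.writeUTF(ename);
            out.writeInt(deptno);
        }

        // Same shape as Writable.readFields: fields are read back in the SAME order.
        void readFields(DataInput in) throws IOException {
            empno = in.readInt();
            ename = in.readUTF();
            deptno = in.readInt();
        }
    }

    // Serializes e to a byte array, then deserializes it into a fresh object.
    static MiniEmployee roundTrip(MiniEmployee e) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            e.write(out);
        }
        MiniEmployee copy = new MiniEmployee();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            copy.readFields(in);
        }
        return copy;
    }

    public static void main(String[] args) throws IOException {
        MiniEmployee e = new MiniEmployee();
        e.empno = 7369;
        e.ename = "SMITH";
        e.deptno = 20;
        MiniEmployee copy = roundTrip(e);
        System.out.println(copy.empno + "," + copy.ename + "," + copy.deptno); // prints 7369,SMITH,20
    }
}
```

Swapping the order of two reads in `readFields` would silently assign bytes to the wrong fields (or fail on `readUTF`), which is why the real class must mirror its `write` method line for line.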
package com.mytest.myMapReduce;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;
public class Employee implements Writable{
// Fields of the employee record, one per column of emp.csv
private int empno;
private String ename;
private String job;
private int mgr;
private String hiredate;
private int sal;
private int comm;
private int deptno;
// Override toString() so that an Employee renders as the output line the Reduce phase needs
@Override
public String toString() {
return empno+","+ename+","+job+","+mgr+","+hiredate+","+sal+","+comm+","+deptno;
}
// Deserialization: read the fields from the byte stream and assign them to this object
@Override
public void readFields(DataInput input) throws IOException {
this.empno = input.readInt();
this.ename = input.readUTF();this.job = input.readUTF(