2019.12.28 Final Exam Review: Hadoop and Big Data

qq_40330524

Tags: hadoop

Article link: https://blog.csdn.net/qq_40330524/article/details/103748790

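1. Google's "three carriages" are GFS, MapReduce, and BigTable.
Their Hadoop counterparts are HDFS, MapReduce, and HBase.
2. Installing the NameNode and the DataNodes on different servers is called fully distributed mode.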
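3. The start-all.sh script starts all the required daemons. (In Hadoop 2.x it is deprecated in favor of start-dfs.sh plus start-yarn.sh.)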
4. In HDFS, the NameNode and the DataNodes stay in contact through a heartbeat mechanism.
5. In Hadoop, the number of replicas is configured with dfs.replication.
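6. In MapReduce programming, the default number of reduce tasks is set with mapreduce.job.reduces (or per job with job.setNumReduceTasks()). It is not set with dfs.replication, which only controls the HDFS replica count.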
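7. In the Hadoop ecosystem, the core storage technology is HDFS and the core analysis technology is MapReduce.
8. The HDFS shell creates a directory with the command hadoop fs -mkdir dir.
9. In HDFS, the node that stores metadata is the NameNode; the nodes that store the actual data are the DataNodes.
10. In MapReduce, the Map phase is responsible for splitting up the data and the Reduce phase is responsible for aggregating it.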
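11. The daemons needed to run a Hadoop 1.x cluster are DataNode, NameNode, TaskTracker, and JobTracker. (In Hadoop 2.x, YARN's ResourceManager and NodeManager take over the roles of JobTracker and TaskTracker.)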
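12. Data skew is caused by:
1) Unevenly distributed keys
2) Characteristics of the business data itself
3) Poorly thought-out table design
4) Flaws in the SQL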
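13. CSV files are commonly used to exchange data between Hadoop and external systems, and they are convenient for bulk loads from a database into Hadoop or into an analytic database. Their support for schema evolution is limited, because new fields can only be appended to the end of a record and existing fields cannot be removed. CSV does not support block compression, so compressing a CSV file carries a significant read-performance cost.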
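14. Advantages of a Hadoop-based data hub:
1) It improves the overall SLA (service level agreement) as data volume and complexity grow
2) It avoids the high cost of scaling a data warehouse
3) It allows new avenues and leads to be explored
4) It offers better flexibility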
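15. Steps to start the cluster:
1) Format the NameNode (only before the first start; reformatting erases the HDFS metadata)
2) Start the NameNode
3) Start the DataNodes
4) Run jps to check that the daemons are up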
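16. Basic MapReduce execution flow:
1) First, the input data source is divided into splits
2) Next, the master schedules workers to run the map tasks
3) Then, the master schedules workers to run the reduce tasks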
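17. The program usually started on the same node as the NameNode is the JobTracker. After user code is submitted to the cluster, the JobTracker decides which files will be processed and assigns nodes to the individual tasks. Because the NameNode manages the file storage information, the JobTracker is generally started on the same node as the NameNode.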
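
Item 12 lists uneven key distribution as the first cause of data skew. The following is a minimal sketch of why that matters, assuming the job partitions keys with the formula `(hash & Integer.MAX_VALUE) % numReducers`, which is what Hadoop's default HashPartitioner uses. The class `SkewDemo`, its method names, and the sample keys are hypothetical, and `String.hashCode()` stands in for the hash of a real Writable key, so the partition numbers are illustrative only.

```java
import java.util.Arrays;
import java.util.List;

public class SkewDemo {
    // Hash partitioning: clear the sign bit, then take the remainder.
    // String.hashCode() is an assumption standing in for the real key's hash.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Counts how many records each reducer would receive.
    static int[] recordsPerReducer(List<String> keys, int numReducers) {
        int[] counts = new int[numReducers];
        for (String k : keys) {
            counts[partition(k, numReducers)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical input in which the key "a" dominates.
        List<String> keys = Arrays.asList("a", "a", "a", "a", "a", "a", "b", "c");
        // All records with the same key land on the same reducer,
        // so the reducer that owns "a" gets most of the work.
        System.out.println(Arrays.toString(recordsPerReducer(keys, 2)));
    }
}
```

Adding reducers does not help with a single hot key, because every record with that key still goes to one partition. That is why skew is usually attacked at the key or table design level, as the list above suggests.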

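1. Characteristics of big data technology: high variety, high value, and large data volume.
2. The program responsible for image backup and edit-log merging is the SecondaryNameNode.
3. In Hadoop 2.x, files are stored with 3 replicas by default.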
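4. fs.defaultFS, mapreduce.framework.name, and yarn.resourcemanager.address are not hdfs-site.xml properties: they belong to core-site.xml, mapred-site.xml, and yarn-site.xml respectively. hdfs-site.xml holds HDFS settings such as dfs.replication.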
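5. In a Hadoop 2.7.3 cluster, the default HDFS block size is 128 MB.
6. The HDFS daemons are SecondaryNameNode, DataNode, and NameNode.
7. The key steps of a big data solution are ingesting data, storing data, and processing data.
8. In the MapReduce programming model, the Reducer component runs last.
9. A DataNode does not perform a backup when it is forcibly shut down or loses power abnormally.
10. HDFS is made up of a NameNode and DataNodes.
11. Hadoop's core configuration file is named core-site.xml.
12. When developing a job, writing a reducer is optional; write one only if it is needed.
13. The combiner runs after the map step.
14. On Linux, the cat command can be used to view a file's contents.
15. The block size can be changed.
16. Slave nodes have to store data, but a bigger disk is not always better.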
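
The default block size (128 MB) and replica count (3) from the list above determine how a file is laid out in HDFS. Here is a rough sketch of that arithmetic. The class `BlockMath`, its method names, and the 300 MB sample file are hypothetical, and the sketch relies on the fact that the last block of a file only occupies as much space as the data it holds.

```java
public class BlockMath {
    static final long MB = 1024L * 1024L;

    // Number of blocks: file size divided by block size, rounded up.
    static long blockCount(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    // Size of the final block, which is usually smaller than a full block.
    static long lastBlockBytes(long fileBytes, long blockBytes) {
        if (fileBytes == 0) {
            return 0;
        }
        long rest = fileBytes % blockBytes;
        return rest == 0 ? blockBytes : rest;
    }

    // Raw disk space used across the cluster: every byte is stored once per replica.
    static long rawBytes(long fileBytes, int replication) {
        return fileBytes * replication;
    }

    public static void main(String[] args) {
        long file = 300 * MB;   // hypothetical file size
        long block = 128 * MB;  // default block size in Hadoop 2.7.3
        int replicas = 3;       // default replica count
        System.out.println("blocks: " + blockCount(file, block));
        System.out.println("last block MB: " + lastBlockBytes(file, block) / MB);
        System.out.println("raw MB: " + rawBytes(file, replicas) / MB);
    }
}
```

For the sample file this gives two full 128 MB blocks plus one 44 MB block, and 900 MB of raw storage once all three replicas are written.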

Code:
1) Find words spelled with the same letters (anagrams)

Map
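import java.io.IOException;
import java.util.Arrays;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits (letters of the word in sorted order, original word),
// so that all anagrams of a word end up under the same key.
public class WordMap extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String val = value.toString().trim();
        char[] vals = val.toCharArray();
        Arrays.sort(vals);
        String result = new String(vals);
        context.write(new Text(result), value);
    }
}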
Reduce
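import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Receives every word that shares one sorted-letter key and writes
// the group only when it contains more than one word.
public class WordRed extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        int count = 0;
        StringBuilder words = new StringBuilder();
        for (Text val : values) {
            words.append(' ').append(val.toString());
            count++;
        }
        if (count > 1) {
            context.write(key, new Text(words.toString()));
        }
    }
}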

Job
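import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordJob extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // Use the Configuration that ToolRunner passed in.
        Configuration conf = getConf();
        conf.set("fs.defaultFS", "hdfs://192.168.83.141:9000");
        Job job = Job.getInstance(conf, "WordTest");
        job.setJarByClass(WordJob.class);
        job.setMapperClass(WordMap.class);
        job.setReducerClass(WordRed.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path("/SimpleWord/data.txt"));
        FileOutputFormat.setOutputPath(job, new Path("/SimpleWord/out"));
        // Return 0 on success and 1 on failure.
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new WordJob(), args));
    }
}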
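
The grouping logic of this job can be checked without a cluster. Below is a minimal plain-Java sketch of the same idea: sort each word's letters to get a key, collect words per key, and keep only groups with more than one word. The class `AnagramLocal`, its method names, and the sample words are hypothetical and only mirror what WordMap and WordRed do; this is not how Hadoop executes the job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class AnagramLocal {
    // Same key as in WordMap: the word's letters in sorted order.
    static String signature(String word) {
        char[] letters = word.trim().toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    // Groups words by signature and drops groups of size 1, as WordRed does.
    static Map<String, List<String>> group(List<String> words) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String w : words) {
            groups.computeIfAbsent(signature(w), k -> new ArrayList<>()).add(w);
        }
        groups.values().removeIf(g -> g.size() < 2);
        return groups;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("listen", "silent", "enlist", "cat", "act", "dog");
        System.out.println(group(words));
    }
}
```

With the sample input, "dog" has no partner and is dropped, while "cat"/"act" and "listen"/"silent"/"enlist" each form a group.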

2) wordcount: count how many times each word occurs

Map
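import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits (word, 1) for every whitespace-separated token in the line.
public class Map1 extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] vals = value.toString().split("\\s+");
        for (String content : vals) {
            // Skip the empty token produced by leading whitespace or a blank line.
            if (!content.isEmpty()) {
                context.write(new Text(content), new IntWritable(1));
            }
        }
    }
}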

Reduce
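import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sums the 1s emitted for each word to get its total count.
public class Reducer1 extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}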
Job
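import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJob extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // Use the Configuration that ToolRunner passed in.
        Configuration conf = getConf();
        conf.set("fs.defaultFS", "hdfs://192.168.83.141:9000");
        Job job = Job.getInstance(conf, "WordCount");
        job.setJarByClass(MyJob.class);
        job.setMapperClass(Map1.class);
        job.setReducerClass(Reducer1.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/neuedu/neuedu.txt"));
        FileOutputFormat.setOutputPath(job, new Path("/neuedu/out"));
        // Return 0 on success and 1 on failure.
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new MyJob(), args));
    }
}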
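
The notes above say that a combiner runs after the map step. For word count the sum is associative and commutative, so the reducer class can usually double as the combiner (for example `job.setCombinerClass(Reducer1.class)`). The plain-Java sketch below only illustrates the effect on the number of records that reach the reduce step. The class `CombinerDemo`, its method names, and the two sample splits are hypothetical, and each map task is simulated by a simple method call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CombinerDemo {
    // Map step for one split: each returned token stands for a (token, 1) pair.
    static List<String> mapTokens(String split) {
        List<String> out = new ArrayList<>();
        for (String t : split.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Combine step: pre-sums the pairs produced by a single map task.
    static Map<String, Integer> combine(List<String> tokens) {
        Map<String, Integer> partial = new TreeMap<>();
        for (String t : tokens) {
            partial.merge(t, 1, Integer::sum);
        }
        return partial;
    }

    // Reduce step: adds up the partial counts from every map task.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> p : partials) {
            p.forEach((word, n) -> total.merge(word, n, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList("a b a", "b a a");
        int withoutCombiner = 0;
        int withCombiner = 0;
        List<Map<String, Integer>> partials = new ArrayList<>();
        for (String s : splits) {
            List<String> tokens = mapTokens(s);
            Map<String, Integer> partial = combine(tokens);
            withoutCombiner += tokens.size();  // one record per token
            withCombiner += partial.size();    // one record per distinct word
            partials.add(partial);
        }
        System.out.println("records without combiner: " + withoutCombiner);
        System.out.println("records with combiner: " + withCombiner);
        System.out.println("final counts: " + reduce(partials));
    }
}
```

In this sample, six (word, 1) records shrink to four partial sums before the reduce step, and the final counts are unchanged.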

3) Compute the average temperature

Map
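import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Emits (name taken from the input file name, temperature) for each usable line.
public class WeaMap extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Characters 5..9 of the file name form the key to average over.
        FileSplit is = (FileSplit) context.getInputSplit();
        String name = is.getPath().getName().substring(5, 10);
        // Characters 13..18 of the line hold the temperature.
        String line = value.toString().substring(13, 19).trim();
        int a = Integer.parseInt(line);
        // Readings of -9999 are skipped.
        if (a != -9999) {
            context.write(new Text(name), new IntWritable(a));
        }
    }
}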
Reduce
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Averages all temperatures for one key. Integer division drops the fraction.
public class WeaRed extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        int count = 0;
        for (IntWritable is : values) {
            sum += is.get();
            count++;
        }
        int avg = sum / count;
        context.write(key, new IntWritable(avg));
    }
}

Job
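import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WeaTest extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // Use the Configuration that ToolRunner passed in.
        Configuration conf = getConf();
        conf.set("fs.defaultFS", "hdfs://192.168.83.141:9000");
        Job job = Job.getInstance(conf, "Weather Test");
        job.setJarByClass(WeaTest.class);
        job.setMapperClass(WeaMap.class);
        job.setReducerClass(WeaRed.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/weather/data"));
        FileOutputFormat.setOutputPath(job, new Path("/weather/out"));
        // Return 0 on success and 1 on failure.
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new WeaTest(), args));
    }
}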
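
The parsing and averaging steps of this job are easy to get subtly wrong, so it can help to try them on a few lines locally. The sketch below is a plain-Java approximation under stated assumptions: the class `AvgTempLocal`, its method names, and the sample line are hypothetical, and the only things taken from the job are the `substring(13, 19)` slice, the -9999 filter, and the integer division.

```java
public class AvgTempLocal {
    // Value that the mapper filters out.
    static final int MISSING = -9999;

    // Same slicing as WeaMap: characters 13..18 of the line, trimmed, parsed as int.
    static int parseTemp(String line) {
        return Integer.parseInt(line.substring(13, 19).trim());
    }

    // Same arithmetic as WeaRed, with the mapper's filter folded in.
    static int average(int[] readings) {
        int sum = 0;
        int count = 0;
        for (int r : readings) {
            if (r != MISSING) {
                sum += r;
                count++;
            }
        }
        if (count == 0) {
            throw new IllegalArgumentException("no valid readings");
        }
        // int / int truncates toward zero, so the fractional part is lost.
        return sum / count;
    }

    public static void main(String[] args) {
        // Hypothetical line: 13 characters of date and hour, then a 6-character field.
        String line = "2019 01 01 00   -56";
        System.out.println(parseTemp(line));
        System.out.println(average(new int[] {-56, MISSING, 11, 20}));
    }
}
```

The sample readings sum to -25 over three valid values, so the exact mean is about -8.33 but the job's integer division yields -8. If the fraction matters, one option is to emit a DoubleWritable from the reducer instead.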
