Day 5 Internship Diary

Learning MapReduce

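Partitioning
Partitioning is a default step in MapReduce execution. By default,
its job is to send records with the same key to the same ReduceTask.
MapReduce provides an abstract class named Partitioner, and the
implementation used by default is HashPartitioner.
Partitions are numbered from 0. The number of partitions matches the
number of ReduceTasks, and each ReduceTask writes one result file
named part-r-0000x.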


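Characteristics of Partitions

  • In MapReduce, partition numbers start at 0 and increase by one.
  • Each partition is handled by its own ReduceTask, and each ReduceTask produces one result file.
  • If no partitioner is specified, the default is HashPartitioner: it takes the key's hashCode, masks it to a non-negative value, and takes the remainder modulo the configured number of ReduceTasks. The default numReduceTasks is 1, so the remainder is always 0, every record lands in the same partition, and only one result file is produced.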
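
The default rule described above can be sketched in plain Java. This is a minimal illustration, not Hadoop's own source: the class name HashPartitionSketch and the method partitionFor are made up for this example, and the code runs without Hadoop on the classpath.

```java
// Plain-Java sketch of the default hash partitioning rule (no Hadoop dependency).
// HashPartitionSketch and partitionFor are illustrative names, not Hadoop APIs.
class HashPartitionSketch {

    // Clear the sign bit so the hash is non-negative, then take the
    // remainder modulo the number of ReduceTasks.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] keys = {"apple", "banana", "cherry"};
        for (String key : keys) {
            // With a single ReduceTask the remainder is always 0.
            System.out.println(key
                    + " -> partition " + partitionFor(key, 1) + " with 1 task, "
                    + "partition " + partitionFor(key, 3) + " with 3 tasks");
        }
    }
}
```

Masking with Integer.MAX_VALUE rather than calling Math.abs avoids the one case where Math.abs still returns a negative number (Integer.MIN_VALUE).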

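Custom Partitioning

  • Define a class that extends org.apache.hadoop.mapreduce.Partitioner
    and override its getPartition() method.

  • In the job driver, register the custom partitioner
    (CustomPartitioner stands for your own class):
    job.setPartitionerClass(CustomPartitioner.class);

  • After defining a custom partitioner, set the number of ReduceTasks
    to match the partitioner's logic:
    job.setNumReduceTasks(numReduceTasks);

  • Number of partitions = number of ReduceTasks = number of result files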
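
As a rough sketch of the steps above, the following shows what a custom getPartition() might look like. To keep it runnable without Hadoop, it declares a local stand-in class with the same method signature as org.apache.hadoop.mapreduce.Partitioner; the first-letter routing rule, the class names, and the use of String/Integer instead of Hadoop's Writable types are all assumptions made for illustration.

```java
// Sketch of a custom partitioner. The nested Partitioner class is a local
// stand-in that mirrors the getPartition signature of
// org.apache.hadoop.mapreduce.Partitioner, so this compiles without Hadoop.
class CustomPartitionSketch {

    // Stand-in for the Hadoop abstract class (assumption for this sketch).
    static abstract class Partitioner<KEY, VALUE> {
        public abstract int getPartition(KEY key, VALUE value, int numPartitions);
    }

    // Hypothetical rule: route records by the first character of the key.
    static class FirstLetterPartitioner extends Partitioner<String, Integer> {
        @Override
        public int getPartition(String key, Integer value, int numPartitions) {
            if (key.isEmpty()) {
                return 0;
            }
            char c = Character.toLowerCase(key.charAt(0));
            int partition;
            if (c >= 'a' && c <= 'm') {
                partition = 0;          // keys starting with a..m
            } else if (c >= 'n' && c <= 'z') {
                partition = 1;          // keys starting with n..z
            } else {
                partition = 2;          // everything else (digits, symbols)
            }
            // Keep the index below the number of ReduceTasks.
            return partition % numPartitions;
        }
    }

    public static void main(String[] args) {
        FirstLetterPartitioner partitioner = new FirstLetterPartitioner();
        int numReduceTasks = 3; // this rule can return 0, 1, or 2
        for (String key : new String[] {"apple", "zebra", "42"}) {
            System.out.println(key + " -> partition "
                    + partitioner.getPartition(key, 1, numReduceTasks));
        }
    }
}
```

In a real job, a class like this would extend the Hadoop Partitioner directly with Writable key and value types, and the driver would call job.setPartitionerClass(...) together with job.setNumReduceTasks(3), since this rule produces three distinct partition numbers.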


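Shuffle
Shuffle is a core part of the MapReduce processing flow. Viewed as a
whole, it consists of three operations:

  • Partitioning (Partitioner): decides which reduce task processes each piece of Map output. The number of partitions equals the number of reduce tasks, and each partition's data is processed by exactly one reduce task.
  • Sorting: records are sorted by key.
  • Combiner: merges values locally, with the aim of reducing the amount of data transferred over the network.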
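
The three operations can be imitated in a few lines of plain Java. This is a simplified in-memory model under stated assumptions, not how Hadoop implements shuffle: it ignores buffering, spilling, and the network copy, it assumes a word-count style job where each word is emitted with the value 1, and ShuffleSketch is a made-up name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// In-memory model of partition + sort + combine for one map task's output.
// Assumes a word-count job: every input word is emitted as (word, 1).
class ShuffleSketch {

    static List<TreeMap<String, Integer>> shuffle(List<String> words, int numReduceTasks) {
        // One sorted map per partition; TreeMap keeps keys in sorted order.
        List<TreeMap<String, Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            partitions.add(new TreeMap<>());
        }
        for (String word : words) {
            // Partition: same rule as the default hash partitioning.
            int p = (word.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
            // Combine: sum the values for the same key locally.
            partitions.get(p).merge(word, 1, Integer::sum);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("b", "a", "b", "c");
        List<TreeMap<String, Integer>> result = shuffle(words, 2);
        for (int i = 0; i < result.size(); i++) {
            System.out.println("partition " + i + ": " + result.get(i));
        }
    }
}
```

Four map outputs shrink to three combined records here, which is the kind of reduction in transferred data that a Combiner is meant to achieve.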
