Hadoop Study Notes (1)

Translation, November 17, 2015, 20:22:01

A course I am taking on Coursera:

Hadoop Platform and Application Framework

by University of California, San Diego

https://www.coursera.org/learn/hadoop/home/welcome

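The course is very well taught. The only problem is that my network connection cannot download one of the Cloudera software packages. I am still working through the course, and it has been very helpful for understanding Hadoop. Below are some notes.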


Lesson1:

Common: libraries and utilities.

YARN: a resource-management platform that handles scheduling and enhances the power of a Hadoop compute cluster.

MapReduce: a programming model for large-scale data processing.

HDFS: Hadoop Distributed File System.
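
To make the MapReduce programming model listed above more concrete, here is a minimal word-count sketch in plain Python. It does not use Hadoop at all; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and everything runs in a single process, whereas a real job would spread the map and reduce tasks across the cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence (single process, illustration only)."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["hadoop yarn hdfs", "hdfs stores blocks", "yarn schedules"]))
```

The point of the model is that map_phase and reduce_phase only ever see one line or one key at a time, which is what lets a framework run many copies of them in parallel.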



week2:
YARN, Tez and Spark: all are frameworks.

HDFS2: storage layer
YARN: essentially the basic execution engine in the next generation of Hadoop
HBase and other apps: work through YARN


week3:
HDFS:
1.Introduction to HDFS:
HDFS Design Concept:
• Scalable distributed filesystem
• Distribute data on local disks on several nodes
• Low cost commodity hardware

HDFS Design Factors :
• Hundreds/thousands of nodes => need to handle node/disk failures
• Portability across heterogeneous hardware/software
• Handle large data sets
• High throughput

Approach to meet HDFS design goals:
• Simplified coherency model – write once read many.
• Data Replication – helps handle hardware failures
• Move computation close to data
• Relax POSIX requirements – increase throughput
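
The "move computation close to data" idea can be sketched as a small scheduling rule: when choosing where to run a task, prefer a node that already stores a replica of the block the task reads. The function pick_node and the node names below are hypothetical; real Hadoop schedulers also take racks, queues and resource limits into account.

```python
def pick_node(block_replicas, free_nodes):
    """Choose a node to run a task that reads one block.

    block_replicas: names of the nodes that store a replica of the block.
    free_nodes: names of the nodes with spare capacity, in preference order.
    Returns (node, is_local), where is_local says whether the data is already there.
    """
    if not free_nodes:
        raise ValueError("no free nodes available")
    for node in free_nodes:
        if node in block_replicas:
            return node, True   # data-local: no block transfer over the network
    return free_nodes[0], False  # fall back: the block must be read remotely
```

With three replicas per block, the chance that some free node is also a replica holder goes up, so replication helps locality as well as fault tolerance.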

2.HDFS Architecture and Configuration:
Summary of HDFS Architecture
• Single NameNode - a master server that manages the file system namespace and regulates access to files by clients.
• Multiple DataNodes – typically one per node in the cluster. Functions:
  • Manage storage
  • Serve read/write requests from clients
  • Block creation, deletion, replication based on instructions from NameNode
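
The split of responsibilities above can be illustrated with a toy model: the NameNode holds only metadata (which blocks make up a file and which DataNodes hold each replica), while the DataNodes hold the bytes. ToyNameNode and its round-robin placement are assumptions made for this sketch, not Hadoop's API; real HDFS uses a rack-aware placement policy.

```python
import itertools


class ToyNameNode:
    """Toy metadata server: maps file paths to blocks and blocks to DataNodes."""

    def __init__(self, datanodes, replication=3):
        if replication > len(datanodes):
            raise ValueError("replication factor exceeds number of DataNodes")
        self.datanodes = list(datanodes)
        self.replication = replication
        self.block_map = {}                 # path -> [(block_id, [datanode, ...])]
        self._block_ids = itertools.count() # monotonically increasing block ids
        self._cursor = 0                    # round-robin start position (assumption)

    def add_file(self, path, num_blocks):
        """Record a new file and assign each of its blocks to `replication` DataNodes."""
        blocks = []
        n = len(self.datanodes)
        for _ in range(num_blocks):
            block_id = next(self._block_ids)
            replicas = [self.datanodes[(self._cursor + i) % n]
                        for i in range(self.replication)]
            self._cursor = (self._cursor + 1) % n
            blocks.append((block_id, replicas))
        self.block_map[path] = blocks
        return blocks

    def locate(self, path):
        """Answer a client's question: where are the blocks of this file?"""
        return self.block_map[path]
```

A client would call locate() once and then read the blocks directly from the listed DataNodes, which is why the single NameNode does not become a bottleneck for data traffic.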

Performance Envelope of HDFS :
• Able to determine number of blocks for a given file size
• Key HDFS and system components impacted by block size
• Impact of small files on HDFS and system
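
The impact of small files can be estimated roughly, because the NameNode keeps an in-memory object for every file and every block. The sketch below assumes about 150 bytes per object, a commonly quoted rule of thumb rather than an exact figure, and a 64 MB block size; the function names are made up for illustration.

```python
import math

BYTES_PER_OBJECT = 150  # rough rule of thumb (assumption), not an exact figure


def namenode_objects(num_files, file_size_mb, block_size_mb=64):
    """Count namespace objects: one per file plus one per block of each file."""
    blocks_per_file = max(1, math.ceil(file_size_mb / block_size_mb))
    return num_files * (1 + blocks_per_file)


def namenode_memory_bytes(num_files, file_size_mb, block_size_mb=64):
    """Very rough NameNode heap estimate for the given set of equal-sized files."""
    return namenode_objects(num_files, file_size_mb, block_size_mb) * BYTES_PER_OBJECT


if __name__ == "__main__":
    # The same 10 GB of data stored as one large file versus 10240 files of 1 MB.
    print(namenode_objects(1, 10 * 1024))   # one 10 GB file
    print(namenode_objects(10240, 1))       # 10240 files of 1 MB each
```

Under these assumptions the small-file layout needs far more NameNode objects for the same amount of data, and it also means many more map tasks, each processing very little input.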

Default block size is 64 MB (the Hadoop 1.x default; Hadoop 2.x raised it to 128 MB).
10 GB = 10 × 1024 MB, so the number of blocks = 10 × 1024 / 64 = 160 blocks.
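
The same arithmetic as a small helper, in case the file size is not an exact multiple of the block size (the last block is then only partly filled, so the count is rounded up). The function names and the replication default of 3 are choices made for this sketch.

```python
import math


def num_blocks(file_size_mb, block_size_mb=64):
    """Number of HDFS blocks needed for a file, rounding up for a partial last block."""
    return math.ceil(file_size_mb / block_size_mb)


def raw_storage_mb(file_size_mb, replication=3):
    """Total disk space consumed across the cluster once every block is replicated."""
    return file_size_mb * replication


if __name__ == "__main__":
    print(num_blocks(10 * 1024))        # 10 GB file, 64 MB blocks
    print(num_blocks(10 * 1024, 128))   # same file, 128 MB blocks
    print(raw_storage_mb(10 * 1024))    # raw storage with 3 replicas
```

Doubling the block size halves the number of blocks, which reduces NameNode metadata but also reduces how many tasks can work on the file in parallel.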
3.Read / Write process in HDFS:

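Also attached is a chart from the course showing where the students come from: there are really a lot of students from India, and many learners from North America as well. We still need to work hard on our studies!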

