HDPCD-Java Review Notes (1)

1. Understand Hadoop HDFS

Pig -- A scripting language that simplifies the creation of MapReduce jobs and excels at exploring and transforming data.

Hive -- Provides SQL-like access to your Big Data.

HBase -- A distributed, column-oriented NoSQL database that runs on top of HDFS.

Accumulo -- A robust, scalable, high-performance data storage and retrieval system built on Hadoop and ZooKeeper.

Ambari -- For provisioning, managing, and monitoring Apache Hadoop clusters.

Sqoop -- For efficiently transferring bulk data between Hadoop and relational databases.

Falcon -- A data processing and management solution for Hadoop, designed for data motion, coordination of data pipelines, life cycle management, and data discovery.

Oozie -- A workflow scheduler system to manage Apache Hadoop jobs.

Solr -- A standalone enterprise search server with a REST-like API.

Flume -- For efficiently collecting, aggregating, and moving large amounts of log data.

ZooKeeper -- An open-source server which enables highly reliable distributed coordination.

Mahout -- An Apache project whose goal is to build scalable machine learning libraries.


The Apache Hadoop 2.x project consists of the following modules:

Hadoop Common -- The utilities that provide support for the other Hadoop modules.

HDFS -- The Hadoop Distributed File System.

YARN -- A framework for job scheduling and cluster resource management.

MapReduce -- For processing large data sets in a scalable and parallel fashion.


YARN splits up the functionality of the JobTracker in Hadoop 1.x into two separate processes:

ResourceManager -- A daemon process that allocates cluster resources to applications.

ApplicationMaster -- A per-application process that provides the runtime for executing applications.


Putting a file into HDFS involves the following steps:

1) A client application sends the NameNode a request that specifies where in the file system it wants to put the file.

2) The NameNode determines how the data is broken down into blocks and which DataNodes will be used to store those blocks. That information is given to the client application.

3) The client application communicates directly with each DataNode, writing the blocks onto the DataNode.

4) The DataNode then replicates the newly created block to two other DataNodes (assuming the replication factor is 3).
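
The steps above can be illustrated with a toy model in plain Java. This is only a sketch and not Hadoop code: the class BlockPlacementSketch, its methods, the node names dn1..dn4, the 128 MB block size and the 300 MB file size are all assumptions made for illustration, and the round-robin choice of DataNodes merely stands in for the NameNode's real placement policy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of an HDFS write; not Hadoop code. All names are hypothetical. */
class BlockPlacementSketch {

    /** Number of blocks needed for a file: fileSize / blockSize, rounded up. */
    static long blockCount(long fileSizeBytes, long blockSizeBytes) {
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    /**
     * Picks up to `replication` distinct DataNodes for one block, round-robin.
     * Round-robin is an assumption that stands in for the real placement policy.
     */
    static List<String> pipelineFor(long blockIndex, List<String> dataNodes, int replication) {
        int copies = Math.min(replication, dataNodes.size());
        List<String> pipeline = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            pipeline.add(dataNodes.get((int) ((blockIndex + i) % dataNodes.size())));
        }
        return pipeline;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;   // assumed block size: 128 MB
        long fileSize = 300L * 1024 * 1024;    // assumed file size: 300 MB
        List<String> nodes = Arrays.asList("dn1", "dn2", "dn3", "dn4");

        long blocks = blockCount(fileSize, blockSize);
        for (long b = 0; b < blocks; b++) {
            // First node receives the block from the client; the rest hold the replicas.
            List<String> pipeline = pipelineFor(b, nodes, 3);
            System.out.println("block " + b + ": client -> " + String.join(" -> ", pipeline));
        }
    }
}
```

With a replication factor of 3, each block ends up on three different nodes, which matches step 4: one copy written by the client plus two replicas.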


The NameNode has the following characteristics:

It is the master of the DataNodes and executes file system namespace operations like opening, closing, and renaming files and directories. 

It determines the mapping of blocks to DataNodes and maintains the file system namespace.


The NameNode performs these tasks by maintaining two files:

fsimage_N -- Contains the entire file system namespace, including the mapping of blocks to files and file system properties.

edits_N -- A transaction log that persistently records every change that occurs to file system metadata.
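
The relationship between the two files can be sketched as a snapshot plus a replayable log. The following plain-Java toy is not Hadoop code: NamespaceSketch, its MKDIR/DELETE record format and its method names are hypothetical, and the real on-disk formats are far richer. It only shows the idea that the current namespace is the last snapshot with the logged edits applied in order, and that a checkpoint folds the edits into a new snapshot.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy model of fsimage + edits; not Hadoop code. All names are hypothetical. */
class NamespaceSketch {
    /** Snapshot of the namespace at the last checkpoint (the role of fsimage_N). */
    private TreeSet<String> image = new TreeSet<>();
    /** Changes recorded since that checkpoint (the role of edits_N). */
    private final List<String[]> edits = new ArrayList<>();

    /** Every change is appended to the log instead of rewriting the snapshot. */
    void mkdir(String path)  { edits.add(new String[] {"MKDIR", path}); }
    void delete(String path) { edits.add(new String[] {"DELETE", path}); }

    /** Current namespace = snapshot with every logged edit replayed in order. */
    TreeSet<String> currentNamespace() {
        TreeSet<String> ns = new TreeSet<>(image);
        for (String[] e : edits) {
            if (e[0].equals("MKDIR")) {
                ns.add(e[1]);
            } else {
                ns.remove(e[1]);
            }
        }
        return ns;
    }

    /** Checkpoint: fold the edits into a new snapshot and start an empty log. */
    void checkpoint() {
        image = currentNamespace();
        edits.clear();
    }

    int pendingEdits() { return edits.size(); }

    public static void main(String[] args) {
        NamespaceSketch nn = new NamespaceSketch();
        nn.mkdir("/data");
        nn.mkdir("/tmp");
        nn.delete("/tmp");
        System.out.println(nn.currentNamespace() + " pending=" + nn.pendingEdits());
        nn.checkpoint();
        System.out.println(nn.currentNamespace() + " pending=" + nn.pendingEdits());
    }
}
```

The namespace is the same before and after the checkpoint; only the number of pending edits changes, which is why a long edit log slows down recovery but does not change the result.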


The DataNodes are responsible for:

Handling read and write requests from application clients.

Performing block creation, deletion, and replication upon instruction from the NameNode.

Sending heartbeats to the NameNode.

Sending block reports to the NameNode.


Overview of HDFS High Availability (NameNode HA)

Quorum Journal Manager

All namespace modifications are logged durably to a majority of the JournalNode daemons (hence the name quorum).

As the Standby NameNode sees the edits in the JournalNodes, it applies them to its own namespace.
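
The majority rule can be sketched in a few lines of plain Java. This is a toy model, not Hadoop code: QuorumJournalSketch, the Journal class and the `up` flag are hypothetical names, and the real protocol also handles epochs, recovery and timeouts, which are left out here. The sketch only shows why an edit still counts as durable when a minority of JournalNodes is unavailable.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of quorum-based edit logging; not Hadoop code. Names are hypothetical. */
class QuorumJournalSketch {

    /** One simulated JournalNode: stores edits unless it is marked as down. */
    static class Journal {
        final List<String> log = new ArrayList<>();
        boolean up = true;

        boolean write(String edit) {
            if (!up) {
                return false;
            }
            log.add(edit);
            return true;
        }
    }

    /** Smallest number of acknowledgements that forms a majority. */
    static int majority(int journalCount) {
        return journalCount / 2 + 1;
    }

    /** Sends the edit to every journal; it is durable only if a majority acknowledged it. */
    static boolean logEdit(List<Journal> journals, String edit) {
        int acks = 0;
        for (Journal j : journals) {
            if (j.write(edit)) {
                acks++;
            }
        }
        return acks >= majority(journals.size());
    }

    public static void main(String[] args) {
        List<Journal> journals = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            journals.add(new Journal());
        }
        journals.get(2).up = false;                              // one of three is down
        System.out.println(logEdit(journals, "mkdir /data"));    // 2 of 3 acknowledge
        journals.get(1).up = false;                              // two of three are down
        System.out.println(logEdit(journals, "mkdir /tmp"));     // 1 of 3 acknowledges
    }
}
```

With three simulated JournalNodes the first edit succeeds and the second does not, since one acknowledgement is below the majority of two.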

Configuring Automatic Failover

ZKFailoverController (ZKFC) -- A new component that is a ZooKeeper client that monitors and manages the state of a NameNode.


HDFS Commands

ls, du, count, chgrp, chown, chmod, stat, cat, text, tail, get, copyFromLocal, put, copyToLocal, getmerge, mv, cp, mkdir, rm, rm -R, touchz

test -- Checks a path: -e tests whether it exists, -d whether it is a directory, and -z whether the file is zero length.

expunge -- Empties the user’s Trash folder.


The Hadoop Filesystem API

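    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Create the "results" directory in the default file system if it is missing.
    Configuration conf = new Configuration();  // picks up the cluster config files on the classpath
    Path dir = new Path("results");            // relative path
    FileSystem fs = FileSystem.get(conf);
    if (!fs.exists(dir)) {
        fs.mkdirs(dir);
    }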

