Cloudera Certified Administrator for Apache Hadoop (CCA-500)
Number of Questions: 60
Time Limit: 90 minutes
Passing Score: 70%
Language: English, Japanese

Exam Sections and Blueprint

1. HDFS (17%)

  • Describe the function of HDFS daemons

  • Describe the normal operation of an Apache Hadoop cluster, both in data storage and in data processing

  • Identify current features of computing systems that motivate a system like Apache Hadoop

  • Classify major goals of HDFS Design

  • Given a scenario, identify an appropriate use case for HDFS Federation

  • Identify the components and daemons of an HDFS HA-Quorum cluster

  • Analyze the role of HDFS security (Kerberos)

  • Determine the best data serialization choice for a given scenario

  • Describe file read and write paths

  • Identify the commands to manipulate files in the Hadoop File System Shell (see the sketch after this list)
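
For reference, a few representative File System Shell commands (the paths are hypothetical):

    hadoop fs -mkdir -p /user/alice/data            # create a directory tree
    hadoop fs -put records.txt /user/alice/data/    # copy a local file into HDFS
    hadoop fs -ls /user/alice/data                  # list directory contents
    hadoop fs -cat /user/alice/data/records.txt     # print a file to stdout
    hadoop fs -rm -r /user/alice/data               # delete a directory recursively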

2. YARN and MapReduce version 2 (MRv2) (17%)

  • Understand how upgrading a cluster from Hadoop 1 to Hadoop 2 affects cluster settings

  • Understand how to deploy MapReduce v2 (MRv2 / YARN), including all YARN daemons

  • Understand basic design strategy for MapReduce v2 (MRv2)

  • Determine how YARN handles resource allocations

  • Identify the workflow of a MapReduce job running on YARN

  • Determine which files you must change, and how, in order to migrate a cluster from MapReduce version 1 (MRv1) to MapReduce version 2 (MRv2) running on YARN (see the sketch after this list)
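
As a minimal illustration of the migration, two of the key changes are pointing MapReduce at YARN in mapred-site.xml and enabling the shuffle auxiliary service in yarn-site.xml (a representative excerpt, not a complete migration):

    <!-- mapred-site.xml: run MapReduce jobs on YARN rather than the MRv1 JobTracker -->
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>

    <!-- yarn-site.xml: enable the MapReduce shuffle handler on each NodeManager -->
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>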

3. Hadoop Cluster Planning (16%)

  • Principal points to consider in choosing the hardware and operating systems to host an Apache Hadoop cluster

  • Analyze the choices in selecting an OS

  • Understand kernel tuning and disk swapping (see the sketch after this list)

  • Given a scenario and workload pattern, identify a hardware configuration appropriate to the scenario

  • Given a scenario, determine the ecosystem components your cluster needs to run in order to fulfill the SLA

  • Cluster sizing: given a scenario and frequency of execution, identify the specifics for the workload, including CPU, memory, storage, and disk I/O

  • Disk Sizing and Configuration, including JBOD versus RAID, SANs, virtualization, and disk sizing requirements in a cluster

  • Network Topologies: understand network usage in Hadoop (for both HDFS and MapReduce) and propose or identify key network design components for a given scenario
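
Two commonly cited Linux kernel tweaks for Hadoop worker nodes, as an illustration (suggested values only; verify the exact paths and settings against your OS and distribution documentation):

    sysctl -w vm.swappiness=1    # strongly discourage swapping out daemon heap pages
    # Disable transparent huge page defragmentation, which can cause CPU stalls:
    echo never > /sys/kernel/mm/transparent_hugepage/defrag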

4. Hadoop Cluster Installation and Administration (25%)

  • Given a scenario, identify how the cluster will handle disk and machine failures

  • Analyze a logging configuration and logging configuration file format (see the excerpt after this list)

  • Understand the basics of Hadoop metrics and cluster health monitoring

  • Identify the function and purpose of available tools for cluster monitoring

  • Be able to install all the ecosystem components in CDH 5, including (but not limited to): Impala, Flume, Oozie, Hue, Cloudera Manager, Sqoop, Hive, and Pig

  • Identify the function and purpose of available tools for managing the Apache Hadoop file system
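
Hadoop daemons log through log4j; the excerpt below is representative of the rolling-file appender section of a log4j.properties file (the specific values are illustrative):

    hadoop.root.logger=INFO,RFA
    log4j.rootLogger=${hadoop.root.logger}
    # Rolling file appender: cap each log at 256 MB and keep 20 backups
    log4j.appender.RFA=org.apache.log4j.RollingFileAppender
    log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
    log4j.appender.RFA.MaxFileSize=256MB
    log4j.appender.RFA.MaxBackupIndex=20
    log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
    log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n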

5. Resource Management (10%)

  • Understand the overall design goals of each of Hadoop's schedulers

  • Given a scenario, determine how the FIFO Scheduler allocates cluster resources

  • Given a scenario, determine how the Fair Scheduler allocates cluster resources under YARN (see the sketch after this list)

  • Given a scenario, determine how the Capacity Scheduler allocates cluster resources
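
For the Fair Scheduler, queues and their shares are defined in an allocation file; a minimal sketch (queue names and values are hypothetical):

    <allocations>
      <queue name="production">
        <weight>3.0</weight>    <!-- receives 3x the share of "adhoc" under contention -->
        <minResources>10000 mb, 10 vcores</minResources>
      </queue>
      <queue name="adhoc">
        <weight>1.0</weight>
      </queue>
    </allocations>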

6. Monitoring and Logging (15%)

  • Understand the functions and features of Hadoop’s metric collection abilities

  • Analyze the NameNode and JobTracker Web UIs

  • Understand how to monitor cluster daemons

  • Identify and monitor CPU usage on master nodes

  • Describe how to monitor swap and memory allocation on all nodes (see the sketch after this list)

  • Identify how to view and manage Hadoop’s log files

  • Interpret a log file
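
Standard Linux tools cover the memory and swap checks, for example:

    free -m       # physical memory and swap usage, in megabytes
    vmstat 5      # the si/so columns show pages swapped in/out each interval
    top           # per-process CPU and memory usage on the node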

CCA Spark and Hadoop Developer Exam (CCA175)

Number of Questions: 10–12 performance-based (hands-on) tasks on a CDH 5 cluster. See below for full cluster configuration

Time Limit: 120 minutes

Passing Score: 70%

Language: English, Japanese (forthcoming)

Required Skills

Data Ingest

The skills to transfer data between external systems and your cluster. This includes the following:

  • Import data from a MySQL database into HDFS using Sqoop

  • Export data to a MySQL database from HDFS using Sqoop

  • Change the delimiter and file format of data during import using Sqoop (see the sketch after this list)

  • Ingest real-time and near-real-time (NRT) streaming data into HDFS using Flume

  • Load data into and out of HDFS using the Hadoop File System (FS) commands
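
A hypothetical Sqoop import that also overrides the delimiter and file format (the database, user, and paths are made up for illustration):

    sqoop import \
      --connect jdbc:mysql://dbhost/retail_db \
      --username dbuser \
      --password-file /user/alice/.dbpass \
      --table orders \
      --target-dir /user/alice/orders \
      --fields-terminated-by '\t' \
      --as-textfile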

Transform, Stage, Store

Convert a set of data values in a given format stored in HDFS into new data values and/or a new data format and write them into HDFS. This includes writing Spark applications in both Scala and Python (a Python sketch follows the list below):

  • Load data from HDFS and store results back to HDFS using Spark

  • Join disparate datasets together using Spark

  • Calculate aggregate statistics (e.g., average or sum) using Spark

  • Filter data into a smaller dataset using Spark

  • Write a query that produces ranked or sorted     data using Spark
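
A minimal PySpark sketch exercising several of these skills with the CDH 5-era RDD API (the input layout and paths are hypothetical):

    from pyspark import SparkContext

    sc = SparkContext(appName="OrderTotals")

    # Load tab-delimited order records from HDFS: order_id, customer_id, amount
    orders = sc.textFile("hdfs:///user/alice/orders") \
               .map(lambda line: line.split("\t"))

    # Aggregate: total amount per customer, sorted highest-first
    totals = orders.map(lambda f: (f[1], float(f[2]))) \
                   .reduceByKey(lambda a, b: a + b) \
                   .sortBy(lambda kv: kv[1], ascending=False)

    # Store the results back to HDFS
    totals.saveAsTextFile("hdfs:///user/alice/order_totals")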

Data Analysis

Use Data Definition Language (DDL) to create tables in the Hive metastore for use by Hive and Impala (a sample follows the list below).

  • Read and/or create a table in the Hive metastore in a given schema

  • Extract an Avro schema from a set of data files using avro-tools

  • Create a table in the Hive metastore using the Avro file format and an external schema file

  • Improve query performance by creating partitioned tables in the Hive metastore

  • Evolve an Avro schema by changing JSON files
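
For example, a schema can be pulled from an existing data file and then referenced by an external Avro-backed table (file names and paths are hypothetical):

    # Extract the embedded schema from an Avro data file
    avro-tools getschema part-m-00000.avro > order.avsc

Then, in Hive:

    CREATE EXTERNAL TABLE orders_avro
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    LOCATION '/user/alice/orders_avro'
    TBLPROPERTIES ('avro.schema.url'='hdfs:///user/alice/schemas/order.avsc');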

That's all. If you have any questions, you can ask via QQ 1438118790.