Table of Contents
- Chapter 1 Meet Hadoop
  - Data!
  - Data Storage and Analysis
  - Comparison with Other Systems
  - A Brief History of Hadoop
  - Apache Hadoop and the Hadoop Ecosystem
  - Hadoop Releases
- Chapter 2 MapReduce
  - A Weather Dataset
  - Analyzing the Data with Unix Tools
  - Analyzing the Data with Hadoop
  - Scaling Out
  - Hadoop Streaming
  - Hadoop Pipes
- Chapter 3 The Hadoop Distributed Filesystem
  - The Design of HDFS
  - HDFS Concepts
  - The Command-Line Interface
  - Hadoop Filesystems
  - The Java Interface
  - Data Flow
  - Data Ingest with Flume and Sqoop
  - Parallel Copying with distcp
  - Hadoop Archives
- Chapter 4 Hadoop I/O
  - Data Integrity
  - Compression
  - Serialization
  - Avro
  - File-Based Data Structures
- Chapter 5 Developing a MapReduce Application
  - The Configuration API
  - Setting Up the Development Environment
  - Writing a Unit Test with MRUnit
  - Running Locally on Test Data
  - Running on a Cluster
  - Tuning a Job
  - MapReduce Workflows
- Chapter 6 How MapReduce Works
  - Anatomy of a MapReduce Job Run
  - Failures
  - Job Scheduling
  - Shuffle and Sort
  - Task Execution
- Chapter 7 MapReduce Types and Formats
  - MapReduce Types
  - Input Formats
  - Output Formats
- Chapter 8 MapReduce Features
  - Counters
  - Sorting
  - Joins
  - Side Data Distribution
  - MapReduce Library Classes
- Chapter 9 Setting Up a Hadoop Cluster
  - Cluster Specification
  - Cluster Setup and Installation
  - SSH Configuration
  - Hadoop Configuration
  - YARN Configuration
  - Security
  - Benchmarking a Hadoop Cluster
  - Hadoop in the Cloud
- Chapter 10 Administering Hadoop
  - HDFS
  - Monitoring
  - Maintenance
- Chapter 11 Pig
  - Installing and Running Pig
  - An Example
  - Comparison with Databases
  - Pig Latin
  - User-Defined Functions
  - Data Processing Operators
  - Pig in Practice
- Chapter 12 Hive
  - Installing Hive
  - An Example
  - Running Hive
  - Comparison with Traditional Databases
  - HiveQL
  - Tables
  - Querying Data
  - User-Defined Functions
- Chapter 13 HBase
  - HBasics
  - Concepts
  - Installation
  - Clients
  - Example
  - HBase Versus RDBMS
  - Praxis
- Chapter 14 ZooKeeper
  - Installing and Running ZooKeeper
  - An Example
  - The ZooKeeper Service
  - Building Applications with ZooKeeper
  - ZooKeeper in Production
- Chapter 15 Sqoop
  - Getting Sqoop
  - Sqoop Connectors
  - A Sample Import
  - Generated Code
  - Imports: A Deeper Look
  - Working with Imported Data
  - Importing Large Objects
  - Performing an Export
  - Exports: A Deeper Look
- Chapter 16 Case Studies
  - Hadoop Usage at Last.fm
  - Hadoop and Hive at Facebook
  - Nutch Search Engine
  - Log Processing at Rackspace
  - Cascading
  - TeraByte Sort on Apache Hadoop
  - Using Pig and Wukong to Explore Billion-edge Network Graphs
- Appendix Installing Apache Hadoop
- Appendix Cloudera’s Distribution Including Apache Hadoop
- Appendix Preparing the NCDC Weather Data
- Colophon