Bigdata: Recommended Hardware Configuration for a Cloudera CDH5 Production Environment
First, let's look at the officially recommended hardware configuration:
Master Node Hardware Recommendations
▪ Carrier-class hardware
▪ Dual power supplies
▪ Dual Ethernet cards
─ Bonded to provide failover
▪ RAIDed hard drives
▪ Reasonable amount of RAM
─ 128GB recommended
Typical Configurations for Worker Nodes
Midline: deep storage, 1Gb Ethernet
─ 16 x 3TB SATA II hard drives, in a non-RAID, JBOD configuration
─ 1 or 2 of the 16 drives for the OS, with RAID-1 mirroring
─ 2 x 8-core 3.0GHz CPUs, 15MB cache
─ 256GB RAM
─ 2x1 Gigabit Ethernet
High-end: high memory, spindle dense, 10Gb Ethernet
─ 24 x 1TB Nearline/MDL SAS hard drives, in a non-RAID, JBOD configuration
─ 2 x 8-core 3.0GHz CPUs, 15MB cache
─ 512GB RAM (or more)
─ 1x10 Gigabit Ethernet
Worker Nodes—CPU
▪ Hex- and octo-core CPUs are commonly available
▪ Hyper-threading and QuickPath Interconnect (QPI) should be enabled
▪ Hadoop nodes are typically disk- and network-I/O bound
─ Therefore, top-of-the-range CPUs are usually not necessary
▪ Some types of Hadoop jobs do make heavy use of CPU resources
─ Clustering and classification
─ Complex text mining
─ Natural language processing
─ Feature extraction
─ Image manipulation
▪ You might need more processing power on your worker nodes if your specific workload requires it
Worker Nodes—RAM
▪ Worker node configuration specifies the amount of memory and number of
cores that map tasks, reduce tasks, and ApplicationMasters can use on that
node
▪ Each map and reduce task typically takes 2GB to 4GB of RAM
▪ Each ApplicationMaster typically takes 1GB of RAM
▪ Worker nodes should not be using virtual memory
▪ Ensure you have enough RAM to run all tasks, plus overhead for the DataNode and NodeManager daemons, plus the operating system
▪ Rule of thumb (see the worked example at the end of this section):
─ Total number of tasks = Number of physical processor cores minus one
─ This is a starting point, and should not be taken as a definitive setting for all clusters
▪ New, memory-intensive processing frameworks are being deployed on many Hadoop clusters
─ Impala
─ Spark
▪ HDFS caching can also take advantage of extra RAM on worker nodes
▪ Good practice to equip your worker nodes with as much RAM as you can
─ Memory configurations up to 512GB per worker node are not unusual for workloads with high memory requirements
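To make the rule of thumb above concrete, here is a minimal back-of-the-envelope sketch in Python. The hardware and overhead figures (two 8-core CPUs, 4GB per task, two concurrent ApplicationMasters, 8GB for daemons and the OS) are illustrative assumptions, not prescribed values.

```python
# Back-of-the-envelope RAM estimate for one worker node.
# Assumptions (illustrative, adjust to your own hardware and workload):
#   - 2 x 8-core CPUs, as in the "Midline" configuration above
#   - 4GB per map/reduce task (upper end of the 2GB-4GB range)
#   - 2 concurrent ApplicationMasters at 1GB each
#   - 8GB overhead for the DataNode and NodeManager daemons plus the OS

physical_cores = 2 * 8                 # two 8-core CPUs
total_tasks = physical_cores - 1       # rule of thumb: cores minus one

ram_per_task_gb = 4                    # typical range is 2GB-4GB; use the upper end
concurrent_app_masters = 2             # assumed; depends on workload
ram_per_am_gb = 1                      # typical ApplicationMaster footprint

daemon_and_os_overhead_gb = 8          # DataNode + NodeManager + OS (assumed)

required_ram_gb = (total_tasks * ram_per_task_gb
                   + concurrent_app_masters * ram_per_am_gb
                   + daemon_and_os_overhead_gb)

print(f"{total_tasks} concurrent tasks -> roughly {required_ram_gb}GB of RAM needed")
# 15 tasks x 4GB + 2GB + 8GB = 70GB, comfortably within the 256GB of the
# "Midline" worker configuration listed earlier.
```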
Worker Nodes—Disk
▪ Hadoop's architecture impacts disk space requirements (a worked capacity example appears at the end of this section)
─ By default, HDFS data is replicated three times
─ Temporary data storage typically requires 20-30 percent of a cluster's raw disk capacity
▪ In general, more spindles (disks) are better
─ In practice, we see anywhere from four to 24 disks (or even more) per node
▪ Use 3.5 inch disks
─ Faster, cheaper, higher capacity than 2.5 inch disks
▪ 7,200 RPM SATA/SATA II drives are fine
─ No need to buy 15,000 RPM drives
▪ 8 x 1.5TB drives is likely to be better than 6 x 2TB drives
─ Different tasks are more likely to be accessing different disks
▪ A good practical maximum is 36TB per worker node (see the re-replication estimate at the end of this section)
─ More than that will result in massive network traffic if a node dies and block re-replication must take place
▪ Recommendation: dedicate 1 disk for OS and logs, use the other disks for
Hadoop data
▪ Mechanical hard drives currently provide a significantly better cost/performance ratio than solid-state drives (SSDs)
▪ For hybrid clusters (both SSDs and HDDs), using SSDs for non-compressed
intermediate shuffle data leads to significant performance gains
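As a worked capacity example for the disk-space points above, the following Python sketch estimates how much unique data a "Midline" worker node can actually hold once temporary space and replication are accounted for. The 25 percent temporary-data figure is an assumed midpoint of the 20-30 percent range, and the one-drive-for-the-OS layout follows the recommendation above.

```python
# Rough estimate of usable HDFS capacity for one "Midline" worker node.
# Assumptions (illustrative): 16 x 3TB drives with one drive reserved for
# the OS, 25% of raw capacity set aside for temporary/intermediate data
# (the midpoint of the 20-30 percent range above), and the default HDFS
# replication factor of 3.

raw_tb = 15 * 3                 # 15 data drives x 3TB each
temp_fraction = 0.25            # assumed midpoint of the 20-30% range
replication_factor = 3          # HDFS default

hdfs_capacity_tb = raw_tb * (1 - temp_fraction)
unique_data_tb = hdfs_capacity_tb / replication_factor

print(f"Raw: {raw_tb}TB, available to HDFS: {hdfs_capacity_tb:.1f}TB, "
      f"unique (pre-replication) data: {unique_data_tb:.1f}TB")
# 45TB raw -> ~33.8TB for HDFS -> ~11.2TB of unique data per node
```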
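And to illustrate why roughly 36TB is a practical ceiling per node, here is a rough re-replication estimate. The 10Gb/s aggregate bandwidth figure is an assumption for illustration; real clusters spread this traffic across many source and target nodes and throttle it further.

```python
# Back-of-the-envelope: network traffic caused by losing a dense worker node.
# Assumptions (illustrative): the dead node held 36TB of block data, and the
# cluster can sustain an aggregate of 10Gb/s of re-replication traffic.

node_data_tb = 36
data_bits = node_data_tb * 1e12 * 8      # decimal TB -> bits

aggregate_bandwidth_bps = 10e9           # assumed 10Gb/s of aggregate bandwidth

hours = data_bits / aggregate_bandwidth_bps / 3600
print(f"~{hours:.0f} hours to re-replicate {node_data_tb}TB at 10Gb/s")
# ~8 hours of sustained network traffic; denser nodes stretch this window
# (and the risk of a second failure) even further.
```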