
HDFS Summary

1. HDFS: Motivation
(1) Based on Google's GFS
(2) Redundant storage of massive amounts of data on cheap, unreliable machines
(3) Why not use an existing file system?
        – Different workload and design priorities
        – Handles much larger dataset sizes than other filesystems
2. HDFS Design Decisions
(1) Files stored as blocks: much larger block size than in most filesystems (default is 64 MB)
(2) Reliability through replication
        – Each block is replicated across 3+ DataNodes
(3) Single master (NameNode) coordinates access and metadata
        – Simple, centralized management
(4) No data caching: little benefit, due to large datasets and streaming reads
(5) Familiar interface, but a customized API
        – Simplifies the problem; focuses on distributed apps
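The first two design decisions above can be sketched in a few lines of plain Python. This is a conceptual illustration, not Hadoop code: the names `split_into_blocks` and `place_replicas` are our own, and real HDFS uses rack-aware placement rather than the simple round-robin shown here.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # HDFS default block size: 64 MB

def split_into_blocks(file_size: int) -> list[int]:
    """Return the size of each block a file of `file_size` bytes occupies."""
    full, rest = divmod(file_size, BLOCK_SIZE)
    return [BLOCK_SIZE] * full + ([rest] if rest else [])

def place_replicas(block_id: int, datanodes: list[str], replication: int = 3) -> list[str]:
    """Place `replication` copies on distinct DataNodes (simple round-robin)."""
    assert len(datanodes) >= replication
    start = block_id % len(datanodes)
    rotated = datanodes[start:] + datanodes[:start]
    return rotated[:replication]

# A 200 MB file occupies four blocks: three full 64 MB blocks plus an 8 MB remainder.
sizes = split_into_blocks(200 * 1024 * 1024)
print(len(sizes))                                        # → 4
print(place_replicas(0, ["dn1", "dn2", "dn3", "dn4"]))   # → ['dn1', 'dn2', 'dn3']
```

Note how a file smaller than 64 MB still occupies only one (partial) block; unlike some filesystems, HDFS does not pad the last block to the full block size.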
3. HDFS Client Block Diagram (diagram not reproduced here)
4. Based on the GFS Architecture (diagram not reproduced here)
5. Metadata
(1) A single NameNode stores all metadata
        – Filenames, and the locations on DataNodes of each file's blocks
(2) Maintained entirely in RAM for fast lookup
(3) DataNodes store opaque file contents in "block" objects on the underlying local filesystem
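The metadata layout above amounts to two in-memory maps. A minimal sketch, assuming a toy `NameNode` class of our own invention (the real NameNode also persists an edit log and image file, which this omits):

```python
class NameNode:
    """Toy model of the NameNode's in-RAM metadata: two dictionaries."""

    def __init__(self) -> None:
        self.file_to_blocks: dict[str, list[int]] = {}  # filename -> block IDs
        self.block_to_nodes: dict[int, list[str]] = {}  # block ID -> replica DataNodes
        self._next_block = 0

    def create_file(self, name: str, n_blocks: int, nodes: list[str]) -> None:
        """Register a new file's blocks and record 3 replica locations per block."""
        ids = list(range(self._next_block, self._next_block + n_blocks))
        self._next_block += n_blocks
        self.file_to_blocks[name] = ids
        for b in ids:
            self.block_to_nodes[b] = nodes[:3]

    def lookup(self, name: str) -> list[list[str]]:
        """Pure in-memory lookup: which DataNodes hold each block of a file."""
        return [self.block_to_nodes[b] for b in self.file_to_blocks[name]]
```

Because every lookup is a couple of dictionary reads, clients can locate all replicas of a file without touching disk; the trade-off is that total filesystem size is bounded by the NameNode's RAM.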
6. HDFS Conclusions
(1) HDFS supports large-scale processing workloads on commodity hardware
        – Designed to tolerate frequent component failures
        – Optimized for huge files that are mostly appended to and read
        – The filesystem interface is customized for the job, but still familiar to developers
        – Simple solutions can work (e.g., a single master)
(2) Reliably stores several TB per cluster
