Big Data Analysis: Week 1

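This post introduces the 4 Vs of big data, contrasts scale-up with scale-out, and focuses on two distributed computing frameworks, Apache Hadoop and Apache Spark. Spark's speed and in-memory computation give it an advantage over Hadoop's MapReduce. The post also outlines Hadoop's components and storage infrastructure such as HDFS, and the role of columnar storage in big data processing.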

Week 1: Introduction to Big Data Analysis

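The four dimensions of big data: the 4 Vs

Volume (amount of data): the large quantity of data generated and stored, typically on the order of terabytes or petabytes
Variety (forms of data): the range of data types and data sources in use, including unstructured data
Velocity (speed of data): the rate at which data is collected, shared, and analyzed, often as real-time streams (for example, from social media)
Veracity (reliability of data): uncertainty about data quality (accuracy, provenance, relevance, and consistency)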

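Scale-up vs. Scale-out

Scale-up (vertical scaling): increase the capability of a single computer (disk, memory, processors). This only works up to a point.

Scale-out (horizontal scaling): use many standard computers and distribute both the data and the computation across them.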
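
The scale-out idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: threads stand in for separate machines, and the helper names `partition` and `scale_out_sum` are hypothetical, not part of any framework. Each "worker" receives one chunk of the data, computes a partial result, and the partial results are combined at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, n_workers):
    """Split data into n_workers nearly equal contiguous chunks."""
    size, extra = divmod(len(data), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def scale_out_sum(data, n_workers=4):
    """Sum data by giving each worker one chunk, then combining partial sums."""
    chunks = partition(data, n_workers)
    # Threads are a stand-in for machines here (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums)


values = list(range(1, 101))
total = scale_out_sum(values, n_workers=4)
print(total)  # 5050
```

Adding capacity in this model means raising `n_workers`, whereas scale-up would mean making a single worker faster.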

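Distributed computing

High-performance computing (HPC, scale-up)

- CPU/GPU-intensive problems (AI, 3D graphics)

SETI@home

- Download radio telescope data and analyze it on your own computer

P2P

- Decentralized (e.g. BitTorrent)

Hadoop -> parallel processing of large datasets
Spark -> faster, in memory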

Apache Hadoop

1. Open source
2. Based on HDFS (a distributed file system)
3. Map/Reduce
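
To make the Map/Reduce model concrete, here is a minimal single-process sketch of a word count in plain Python. It does not use the Hadoop API. The function names `map_phase`, `shuffle_phase`, and `reduce_phase` are illustrative assumptions, and on a real cluster the map and reduce calls would run in parallel on different machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key, between the map and reduce steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


documents = ["big data is big", "data about data"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
grouped = shuffle_phase(mapped)
word_counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(word_counts)  # {'big': 2, 'data': 3, 'is': 1, 'about': 1}
```

Because each map call looks at one document and each reduce call looks at one key, both phases can be spread across many machines without coordination beyond the shuffle.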

Apache Spark vs Hadoop

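1. Up to 100 times faster than Hadoop MapReduce (for workloads that fit in memory)
2. Can run on top of HDFS
3. Data is kept in memory
4. More than just Map/Reduce (an integrated system for volume, velocity, graph-shaped data, and so on)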
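
The "data is kept in memory" point matters most for iterative jobs. The toy model below is a sketch, not Spark or Hadoop code: `step`, `run_disk_based`, and `run_in_memory` are hypothetical names, and a local JSON file stands in for intermediate results written to a distributed file system. It contrasts a job that round-trips through disk on every iteration with one that keeps its working set in memory.

```python
import json
import os
import tempfile


def step(values):
    """One iteration of a hypothetical iterative job: increment every value."""
    return [v + 1 for v in values]


def run_disk_based(values, iterations, workdir):
    """Write each intermediate result to disk and read it back (chained-job style)."""
    path = os.path.join(workdir, "intermediate.json")
    disk_round_trips = 0
    for _ in range(iterations):
        with open(path, "w") as f:
            json.dump(step(values), f)
        with open(path) as f:
            values = json.load(f)
        disk_round_trips += 1
    return values, disk_round_trips


def run_in_memory(values, iterations):
    """Keep the intermediate result in memory between iterations."""
    for _ in range(iterations):
        values = step(values)
    return values, 0


with tempfile.TemporaryDirectory() as workdir:
    disk_result, disk_trips = run_disk_based([1, 2, 3], 5, workdir)
memory_result, memory_trips = run_in_memory([1, 2, 3], 5)

print(disk_result == memory_result, disk_trips, memory_trips)  # True 5 0
```

Both variants compute the same answer. The difference is the number of disk round trips, which is where an in-memory engine is expected to save time.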

Week 2: Storage foundations for large data volumes
