Big Data Course Study Notes (1)

1. Perform a word count over a given set of web pages in parallel using MapReduce

[Figure: MapReduce Framework]
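
The word count task can be sketched in plain Python as below. This is only a single-machine illustration of the map, shuffle, and reduce phases, not the course's or any real framework's implementation. The function names (`map_page`, `shuffle`, `reduce_counts`, `word_count`) and the sample pages are assumptions made up for this example, and the thread pool merely stands in for the worker machines of a real cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_page(text):
    """Map step: emit a (word, 1) pair for every word on one page."""
    return [(word.lower(), 1) for word in text.split()]


def shuffle(mapped):
    """Shuffle step: group all emitted counts by word."""
    groups = defaultdict(list)
    for pairs in mapped:
        for word, count in pairs:
            groups[word].append(count)
    return groups


def reduce_counts(groups):
    """Reduce step: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(pages):
    # Each page is mapped independently, so the map calls can run concurrently.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_page, pages))
    return reduce_counts(shuffle(mapped))


# Hypothetical page contents, used only for illustration.
pages = ["the cat sat", "the dog sat", "The end"]
counts = word_count(pages)
```

Because each page is mapped without reference to any other page, the map phase needs no shared state; only the shuffle step brings the partial results together.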


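2. Structured and semi-structured data

 Structured data: information represented in tabular form.

 Unstructured data: relatively free-form.

 Semi-structured data: in practice, almost no data is truly unstructured.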
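
As a rough illustration of the distinction, the sketch below reads the same kind of information once as a table and once as JSON. Treating CSV as the structured case and JSON as the semi-structured case is an assumption of this example, and the sample records are invented.

```python
import csv
import io
import json

# Structured: every row has exactly the same columns.
table = io.StringIO("name,age\nAlice,30\nBob,25\n")
rows = list(csv.DictReader(table))

# Semi-structured: records share a loose shape, but fields may be
# missing or extra, so readers must tolerate absent keys.
records = json.loads(
    '[{"name": "Alice", "age": 30}, {"name": "Bob", "tags": ["new"]}]'
)
ages = [record.get("age") for record in records]
```

Here `rows` always exposes both columns, while `ages` contains `None` for the record that carries no age field.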

3. IR (Information Retrieval)

 Ultimate focus of IR: satisfying the user's information need. The emphasis is on retrieval of information, not data. Examples of user information needs: printer reviews, book prices and availability, words in which all vowels appear, and so on. Meeting such a need means predicting which documents are relevant and then linearly ranking them.
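
The "predict relevance, then rank linearly" idea can be sketched with a deliberately crude scoring rule: count how many query terms appear in each document. This term-overlap score is an assumption chosen for brevity, not the ranking method taught in the course, and the function names and documents are hypothetical.

```python
def score(query, document):
    """Count how many query terms occur in the document (a crude relevance guess)."""
    doc_terms = set(document.lower().split())
    return sum(1 for term in query.lower().split() if term in doc_terms)


def rank(query, documents):
    """Return the documents in one linear order, most relevant first."""
    return sorted(documents, key=lambda doc: score(query, doc), reverse=True)


# Hypothetical document collection.
docs = [
    "laser printer reviews and ratings",
    "book prices and availability",
    "printer ink prices",
]
ranked = rank("printer reviews", docs)
```

For the query "printer reviews", the first document matches both terms, the third matches one, and the second matches none, which gives the linear order.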

4.DIKW Hierarchy

 D is Data: symbolic units. E.g.: records of customers, or bytes from sensors.

 I is Information: data with an interpretation (who? what? when? where?). E.g.: user information grouped by age.

 K is Knowledge: information organized with theoretical concepts or abstract ideas (how?). E.g.: how many users are cutting back on spending during an economic crisis?

 W is Wisdom: understanding of fundamental principles, and human judgement.
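
The step from Data to Information in the hierarchy above can be sketched as follows: raw customer records become information once they are interpreted, here by grouping customers by age. The records, the ten-year age bands, and the helper `age_group` are all assumptions made up for this example.

```python
from collections import defaultdict

# Data: raw symbolic units (hypothetical customer records).
records = [
    {"customer": "c1", "age": 23},
    {"customer": "c2", "age": 37},
    {"customer": "c3", "age": 29},
]


def age_group(age):
    """Map an age to a ten-year band such as '20-29'."""
    decade = age // 10 * 10
    return f"{decade}-{decade + 9}"


# Information: the same data with an interpretation (who falls in which age band).
by_age = defaultdict(list)
for record in records:
    by_age[age_group(record["age"])].append(record["customer"])
```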

 

Thinking at scale

1. Problem: we can process data very quickly, but we can read and write it only very slowly.

     Sharing is slow, so we should distribute the data.

     Sharing is tricky: exchanging data requires synchronization (deadlock becomes a problem); only finite bandwidth is available (distributed systems can "drown themselves", and failovers can cause cascading failure); and temporal dependencies are complicated.

2.Reliability demands

 Support partial failure: the total system must support a graceful decline in application performance rather than a full halt.

 Data recoverability: if components fail, their workload must be picked up by still-functioning units.

 Individual recoverability: nodes that fail and restart must be able to rejoin the group activity without a full group restart.

 Consistency: concurrent operations or partial internal failures should not cause externally visible nondeterminism.

 Scalability: adding load to a system should not cause outright failure, but a graceful decline; increasing resources should support a proportional increase in load capacity.
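
The data recoverability demand above (a failed component's workload is picked up by still-functioning units) can be sketched as a simple reassignment. The round-robin policy, the function `reassign`, and the node and task names are assumptions for illustration only; real systems use their own scheduling policies.

```python
def reassign(assignments, failed):
    """Move the failed worker's tasks onto the remaining workers, round-robin."""
    survivors = [worker for worker in assignments if worker != failed]
    if not survivors:
        raise RuntimeError("no functioning workers left")
    # Copy the surviving workers' task lists so the input is not modified.
    result = {w: list(tasks) for w, tasks in assignments.items() if w != failed}
    for i, task in enumerate(assignments.get(failed, [])):
        result[survivors[i % len(survivors)]].append(task)
    return result


# Hypothetical cluster state: worker name -> tasks currently assigned to it.
assignments = {"node-a": ["t1", "t2"], "node-b": ["t3"], "node-c": ["t4", "t5"]}
after = reassign(assignments, "node-c")
```

The system keeps running with two nodes carrying more work each, which is a graceful decline in performance rather than a full halt.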
