Big Data - Tim Smith

1
00:00:00,000 --> 00:00:07,000
Translator: Andrea McDonough
Reviewer: Jessica Ruby

2
00:00:31,085 --> 00:00:33,847
Big data is an elusive concept.

3
00:00:35,987 --> 00:00:38,675
It represents an amount of digital information,

4
00:00:38,675 --> 00:00:40,845
which is uncomfortable to store,

5
00:00:40,845 --> 00:00:41,973
transport,

6
00:00:41,973 --> 00:00:43,851
or analyze.

7
00:00:43,851 --> 00:00:45,766
Big data is so voluminous

8
00:00:45,766 --> 00:00:48,474
that it overwhelms the technologies of the day

9
00:00:48,474 --> 00:00:50,899
and challenges us to create the next generation

10
00:00:50,899 --> 00:00:54,004
of data storage tools and techniques.

11
00:00:59,557 --> 00:01:01,336
So, big data isn't new.

12
00:01:01,336 --> 00:01:03,694
In fact, physicists at CERN have been wrangling

13
00:01:03,694 --> 00:01:08,093
with the challenge of their ever-expanding big data for decades.

14
00:01:09,431 --> 00:01:11,754
Fifty years ago, CERN's data could be stored

15
00:01:11,754 --> 00:01:13,506
in a single computer.

16
00:01:13,506 --> 00:01:15,660
OK, so it wasn't your usual computer,

17
00:01:15,660 --> 00:01:17,077
this was a mainframe computer

18
00:01:17,077 --> 00:01:19,387
that filled an entire building.

19
00:01:21,494 --> 00:01:22,663
To analyze the data,

20
00:01:22,663 --> 00:01:25,611
physicists from around the world traveled to CERN

21
00:01:25,611 --> 00:01:28,637
to connect to the enormous machine.

22
00:01:31,075 --> 00:01:33,928
In the 1970s, our ever-growing big data

23
00:01:33,928 --> 00:01:36,678
was distributed across different sets of computers,

24
00:01:36,678 --> 00:01:38,708
which mushroomed at CERN.

25
00:01:38,708 --> 00:01:40,150
Each set was joined together

26
00:01:40,150 --> 00:01:42,678
in dedicated, homegrown networks.

27
00:01:42,678 --> 00:01:44,464
But physicists collaborated without regard

28
00:01:44,464 --> 00:01:46,413
for the boundaries between sets,

29
00:01:46,413 --> 00:01:49,302
and hence needed access to data on all of them.

30
00:01:49,302 --> 00:01:51,287
So, we bridged the independent networks together

31
00:01:51,287 --> 00:01:54,379
in our own CERNET.

32
00:01:54,379 --> 00:01:57,227
In the 1980s, islands of similar networks

33
00:01:57,227 --> 00:01:58,771
speaking different dialects

34
00:01:58,771 --> 00:02:01,311
sprang up all over Europe and the States,

35
00:02:01,311 --> 00:02:04,402
making remote access possible but torturous.

36
00:02:04,402 --> 00:02:06,546
To make it easy for our physicists across the world

37
00:02:06,546 --> 00:02:08,951
to access the ever-expanding big data

38
00:02:08,951 --> 00:02:10,744
stored at CERN without traveling,

39
00:02:10,744 --> 00:02:12,043
the networks needed to speak

40
00:00:12,043 --> 00:02:13,413
the same language.

41
00:02:13,413 --> 00:02:17,208
We adopted the fledgling internetworking standard from the States,

42
00:02:17,208 --> 00:02:18,584
followed by the rest of Europe,

43
00:02:18,584 --> 00:02:20,752
and we established the principal link at CERN

44
00:02:20,752 --> 00:02:23,255
between Europe and the States in 1989,

45
00:02:23,255 --> 00:02:26,041
and the truly global internet took off!

46
00:02:28,580 --> 00:02:30,371
Physicists could then easily access

47
00:02:30,371 --> 00:02:32,183
the terabytes of big data

48
00:02:32,183 --> 00:02:33,846
remotely from around the world,

49
00:02:33,846 --> 00:02:35,225
generate results,

50
00:02:35,225 --> 00:02:37,520
and write papers in their home institutes.

51
00:02:37,520 --> 00:02:39,021
Then, they wanted to share their findings

52
00:02:39,021 --> 00:02:40,813
with all their colleagues.

53
00:02:40,813 --> 00:02:42,416
To make this information sharing easy,

54
00:02:42,416 --> 00:02:45,358
we created the web in the early 1990s.

55
00:02:45,358 --> 00:02:47,196
Physicists no longer needed to know

56
00:02:47,196 --> 00:02:48,833
where the information was stored

57
00:02:48,833 --> 00:02:51,402
in order to find it and access it on the web,

58
00:02:51,402 --> 00:02:53,536
an idea which caught on across the world

59
00:02:53,536 --> 00:02:55,912
and has transformed the way we communicate

60
00:02:55,912 --> 00:02:57,580
in our daily lives.

61
00:03:00,226 --> 00:03:01,633
During the early 2000s,

62
00:03:01,633 --> 00:03:03,623
the continued growth of our big data

63
00:03:03,623 --> 00:03:06,914
outstripped our capability to analyze it at CERN,

64
00:03:06,914 --> 00:03:10,499
despite having buildings full of computers.

65
00:03:10,499 --> 00:03:12,805
We had to start distributing the petabytes of data

66
00:03:12,805 --> 00:03:14,387
to our collaborating partners

67
00:03:14,387 --> 00:03:17,139
in order to employ local computing and storage

68
00:03:17,139 --> 00:03:19,974
at hundreds of different institutes.

69
00:03:19,974 --> 00:03:22,269
In order to orchestrate these interconnected resources

70
00:03:22,269 --> 00:03:24,313
with their diverse technologies,

71
00:03:24,313 --> 00:03:26,064
we developed a computing grid,

72
00:03:26,064 --> 00:03:27,640
enabling the seamless sharing

73
00:03:27,640 --> 00:03:30,068
of computing resources around the globe.

74
00:03:30,068 --> 00:03:34,459
This relies on trust relationships and mutual exchange.

75
00:03:34,459 --> 00:03:36,752
But this grid model could not be transferred

76
00:03:36,752 --> 00:03:39,036
out of our community so easily,

77
00:03:39,036 --> 00:03:41,330
since not everyone has resources to share,

78
00:03:41,330 --> 00:03:43,206
nor could companies be expected

79
00:03:43,206 --> 00:03:45,959
to have the same level of trust.

80
00:03:45,959 --> 00:03:48,254
Instead, an alternative, more business-like approach

81
00:03:48,254 --> 00:03:50,090
for accessing on-demand resources

82
00:03:50,090 --> 00:03:51,798
has been flourishing recently,

83
00:03:51,798 --> 00:03:53,466
called cloud computing,

84
00:03:53,466 --> 00:03:55,342
which other communities are now exploiting

85
00:03:55,342 --> 00:03:57,342
to analyze their big data.

86
00:03:57,342 --> 00:04:00,329
It might seem paradoxical for a place like CERN,

87
00:04:00,329 --> 00:04:01,900
a lab focused on the study

88
00:04:01,900 --> 00:04:05,071
of the unimaginably small building blocks of matter,

89
00:04:05,071 --> 00:04:08,448
to be the source of something as big as big data.

90
00:04:08,448 --> 00:04:10,530
But the way we study the fundamental particles,

91
00:04:10,530 --> 00:04:13,143
as well as the forces by which they interact,

92
00:04:13,143 --> 00:04:15,246
involves creating them fleetingly,

93
00:04:15,246 --> 00:04:17,614
colliding protons in our accelerators

94
00:04:17,614 --> 00:04:19,041
and capturing a trace of them

95
00:04:19,041 --> 00:04:21,314
as they zoom off near light speed.

96
00:04:21,314 --> 00:04:22,308
To see those traces,

97
00:04:22,308 --> 00:04:25,756
our detector, with 150 million sensors,

98
00:04:25,756 --> 00:04:28,231
acts like a really massive 3-D camera,

99
00:04:28,231 --> 00:04:30,341
taking a picture of each collision event -

100
00:04:30,341 --> 00:04:32,891
that's up to 14 million times per second.

101
00:04:32,891 --> 00:04:35,424
That makes a lot of data.

102
00:04:37,194 --> 00:04:39,353
But if big data has been around for so long,

103
00:04:39,353 --> 00:04:41,980
why do we suddenly keep hearing about it now?

104
00:04:41,980 --> 00:04:43,691
Well, as the old saying goes,

105
00:04:43,691 --> 00:04:46,479
the whole is greater than the sum of its parts,

106
00:04:46,479 --> 00:04:50,256
and it is no longer just science that is exploiting this.

107
00:04:50,256 --> 00:04:51,860
The fact that we can derive more knowledge

108
00:04:51,860 --> 00:04:54,190
by joining related information together

109
00:04:54,190 --> 00:04:55,741
and spotting correlations

110
00:04:55,741 --> 00:04:59,132
can inform and enrich numerous aspects of everyday life,

111
00:04:59,132 --> 00:05:00,160
either in real time,

112
00:05:00,160 --> 00:05:02,451
such as traffic or financial conditions,

113
00:05:02,451 --> 00:05:04,206
in short-term evolutions,

114
00:05:04,206 --> 00:05:06,333
such as medical or meteorological,

115
00:05:06,333 --> 00:05:08,058
or in predictive situations,

116
00:05:08,058 --> 00:05:11,078
such as business, crime, or disease trends.

117
00:05:13,369 --> 00:05:16,432
Virtually every field is turning to gathering big data,

118
00:05:16,432 --> 00:05:18,769
with mobile sensor networks spanning the globe,

119
00:05:18,769 --> 00:05:21,056
cameras on the ground and in the air,

120
00:05:21,056 --> 00:05:24,067
archives storing information published on the web,

121
00:05:24,067 --> 00:05:26,196
and loggers capturing the activities

122
00:05:26,196 --> 00:05:28,895
of Internet citizens the world over.

123
00:05:28,895 --> 00:05:31,486
The challenge is on to invent new tools and techniques

124
00:05:31,486 --> 00:05:33,439
to mine these vast stores,

125
00:05:33,439 --> 00:05:35,240
to inform decision making,

126
00:05:35,240 --> 00:05:37,496
to improve medical diagnosis,

127
00:05:37,496 --> 00:05:39,706
and otherwise to answer needs and desires

128
00:05:39,706 --> 00:05:43,663
of tomorrow's society in ways that are unimagined today.