Book Review of “Big Data: A Revolution That Will Transform How We Live, Work and Think”

There have been a number of attempts to chronicle exactly what “big data” is and why anyone should care. Last year’s The Human Face of Big Data by Rick Smolan and Jennifer Erwitt focused on telling the personal stories behind big data (and accompanied these stories with some great photographs). The year before, James Gleick wrote The Information: A History, A Theory, A Flood, which chronicled how information (and not just big data) has changed our world. The latest entrant is Big Data: A Revolution That Will Transform How We Live, Work and Think by Viktor Mayer-Schönberger and Kenneth Cukier, which focuses heavily on explaining some of the more interesting impacts of living in a big data world. (Personally, I’m still not a fan of the term big data because 1) it scares off people who equate it with “Big Oil” and 2) it underrepresents the innovation happening around “small” data. But since this is the term used in the book, I’ll stick with it for this review.)

The first part of this book provides a fairly compelling vision of how big data is changing the way we use data. Unlike some technology proponents who simply ignore the past, Mayer-Schönberger and Cukier make a point of highlighting that the use of data itself is not new, but that information technology (IT) has made it possible to collect and analyze data on a scale not seen before. The authors explore three main changes they see arising from big data. First, we will have significantly more data available than in the past, which means that for some datasets we will be able to approach N = all rather than relying on samples of the population. Second, as we increasingly quantify the world, our data will contain more measurement error, but that is acceptable because with much larger datasets the messiness of the data matters less. Third, we will focus much less on understanding causation (“why”) and more on understanding correlation (“what”). (For a detailed look at this last point, see Chris Anderson’s essay “The End of Theory.”)
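The second claim, that messiness matters less at scale, rests on a standard statistical fact: when measurement errors are independent and average out to zero, the error of an average shrinks roughly in proportion to 1/√N. The short Python simulation below is a minimal sketch of that idea, not anything taken from the book; the true value of 10.0, the Gaussian noise level and the function name mean_abs_error are illustrative assumptions. It does not model systematic bias, which a larger dataset does not remove.

```python
import random
import statistics


def mean_abs_error(n: int, noise_sd: float, trials: int = 50, seed: int = 0) -> float:
    """Average absolute error of the sample mean of n noisy readings.

    Illustrative assumptions: every reading measures the same true value
    (10.0) and carries independent, zero-mean Gaussian noise.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    true_value = 10.0
    errors = []
    for _ in range(trials):
        # n "messy" readings of the same quantity
        readings = [true_value + rng.gauss(0.0, noise_sd) for _ in range(n)]
        errors.append(abs(statistics.fmean(readings) - true_value))
    return statistics.fmean(errors)


if __name__ == "__main__":
    # Same per-reading noise, growing dataset: the error of the mean shrinks.
    for n in (10, 100, 1_000, 10_000):
        print(f"N = {n:>6}: mean absolute error = {mean_abs_error(n, noise_sd=2.0):.4f}")
```

With the per-reading noise held fixed, each hundredfold increase in N should cut the printed error by roughly a factor of ten.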

While these chapters are interesting, Mayer-Schönberger and Cukier are at their best later in the book, when they describe the economic consequences of big data, both in terms of how data is creating economic value and how it is disrupting many industries. Unlike most other economic resources, data does not lose its value after its initial use. Instead, it can be reused an unlimited number of times, either directly or in combination with additional information. In addition, “data exhaust” that would have been discarded in the past can now be put to practical use, such as Google using the typos users enter into its search engine to build a better spell checker.

This is a crucial point. It is not always possible to know how data will be used when it is collected, and even if some uses are identified, the value of big data comes from its reuse. Policymakers stuck in the old way of thinking want to impose data minimization requirements, which would effectively create a “use once” policy for data. Instead, to take advantage of data-driven economic value, we need policies that allow and encourage responsible reuse of data.

Mayer-Schönberger and Cukier offer one of the best metaphors for the new type of thinking that we need around data. With a conventional camera, a photographer must decide where to focus the lens at the moment the photo is taken. In contrast, plenoptic cameras, like the new Lytro camera, capture light field information and allow photographers to change the focus of a picture after it has been taken. Like photographers, most data users have been stuck deciding how to use data at the outset. But with ever-lower costs for collection, storage and processing, users are now free to explore possible uses after the data has been collected.

The authors also discuss the new value chain created by companies involved in big data. They identify three primary groups in this chain: those providing the data; those providing the skills, such as the technology and the analytics; and those providing business opportunities. One of their more interesting insights is that new business models are being created to take advantage of data opportunities that do not fit into existing organizations. For example, health insurers formed the non-profit Health Care Cost Institute to combine data sets for research that none of them could perform individually. Similarly, UPS spun off its internal data analytics unit because the unit could provide substantially more value with access to data from UPS’s competitors, which would never happen while it remained part of the parent company. The authors argue that most of the value will eventually lie in the data part of the value chain, though it does not lie there today. Unfortunately, such an assertion is impossible to prove or disprove. We are still in the early stages of assigning value to data, at both the macroeconomic level and the firm level. Government statistical agencies need to measure more than just goods and services if they want to accurately capture the data economy (Mike Mandel has written a thoughtful piece on this exact point).

While the authors also devote a chapter to the “dark side” of big data, including privacy and misuse, they mostly avoid the overwrought hand-wringing that typically characterizes writing on this subject. They also recognize that much of the big data revolution does not involve personal data. With regard to personal data, my primary criticism is that they unfairly dismiss de-identification techniques, relying mostly on the critiques leveled by Paul Ohm while ignoring both the shortcomings of his work described by individuals such as Jane Yakowitz and the continued advancement of differential privacy research. They also get wrapped up in a surprisingly lengthy discussion of the risk of criminal profiling similar to what was seen in the movie Minority Report, where individuals were arrested for crimes before they were actually committed. While this is perhaps an interesting thought experiment, the authors provide little evidence that it is anything but a far-fetched science-fiction nightmare.

Overall, the book is an enjoyable read, if for no other reason than the great nuggets of big data trivia that show just how much data has changed. For example, Mayer-Schönberger and Cukier report that the Sloan Digital Sky Survey generated 140 terabytes of information in about 10 years; its successor, the Large Synoptic Survey Telescope in Chile, will generate as much every five days. In addition, the way they handle the risks section of their book bodes well for the future of data: it seems that the more people come to understand it, the fewer concerns they have.

Photo credit: Chatham House

Source: http://www.innovationfiles.org/book-review-of-big-data-a-revolution-that-will-transform-how-we-live-work-and-think/