Hadoop 1.0 Release: Big Data for Everyone

culi3182

Tags: big data, java, artificial intelligence, python, data analysis

Original article: https://www.sitepoint.com/hadoop-1-0-release-big-data-for-everyone/

Roughly 90% of the data in the world was produced in the last two years, which should give you some idea of just how much data is being accumulated around the world, especially by large companies like Google. The data field is so enormous that traditional methods of linking, searching and retrieving data no longer work. This is Big Data.

Big Data

The term “Big Data” was popularized by Roger Magoulas of O’Reilly in 2005, although avid net trawlers have found evidence of the term being used occasionally as far back as 2001. It is a catchall term for the conundrums posed by the massive accumulation of data in disparate forms, most often collated from internet sources.

Big Data presents a number of problems to developers and analysts, not least of which is the need to file and compare information sources with wildly different structures and contents. This contrasts with the old way of working with relational or object-based data, where you know what a record looks like and the relational links are concrete.

To work with Big Data effectively, you need to be able to crawl petabytes of information spread across multiple nodes and reduce it to a human-navigable format. Relationships are fuzzy and flexible, and data structures are not always known ahead of time. For many, the parallel processing required is an entirely new learning curve in itself.
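
The idea of processing data on several nodes and then reducing the results can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it does not use Hadoop or any Hadoop API, the function names (`map_chunk`, `merge_counts`, `word_count`) are hypothetical, each string stands in for the slice of data held by one node, and a thread pool stands in for the cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(chunk):
    """Map step: count the words in one chunk, as a single node might."""
    return Counter(chunk.lower().split())


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


def word_count(chunks, workers=4):
    # Assumption: a thread pool simulates independent nodes; a real
    # cluster would ship each chunk to a separate machine instead.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    # Fold all partial results into a single summary.
    return reduce(merge_counts, partials, Counter())


chunks = [
    "big data is big",
    "data about data",
    "hadoop handles big data",
]
totals = word_count(chunks)
print(totals.most_common(2))  # [('data', 4), ('big', 3)]
```

The map step never needs to see another node's data, which is what lets the work be spread out; only the small partial counts have to be brought together at the end.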

Introducing Hadoop

With the emergence of enormous quantities of data came the need to analyze it and make use of it. Most companies around the world are sitting on top of stupendous amounts of information with no intelligent way to benefit from it.

To this end, Hadoop was born. Hadoop is an Apache Foundation project that has produced a software library capable of abstracting and simplifying big data queries while handling latency, fault tolerance and asynchronous data availability. Most brilliantly, it can analyze unstructured data as well as structured data, so when it comes to Big Data, Hadoop is the killer app of the day.

Although heavily used around the world, Hadoop was effectively still beta software until very recently. On January 4th, 2012, the project announced the first stable release of Hadoop, 1.0, which marks a major milestone in the public's ability to handle big data. Since it’s an Apache project, anyone can use Hadoop and build their own solutions on top of it, so the realm of big data and massive intelligent searching is more open to developers than ever before.

The software has been stable and in production use in many places for a long time, but the official release of a 1.0 version paves the way for easier adoption in corporate environments and gives more assurance to developers hoping to tie big data intelligence into their apps.

Learn more

In the coming weeks we will be publishing articles that introduce Big Data, and concepts like Google’s MapReduce, in more depth, so keep coming back for your intro to the new world of intelligent computing.

Apache Hadoop

Stock photo Copyright Bellanixie via Shutterstock.com
