Hadoop Operations Explained (Part 1): Introduction

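This article is the first part of the Hadoop Operations series. It introduces the basic concepts of Hadoop operations and why they matter, and aims to give readers an initial understanding of Hadoop cluster management and operations.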
Hadoop Operations Explained


Chapter 1. Introduction

     Over the past few years, there has been a fundamental shift in data storage, management, and processing. Companies are storing more data from more sources in more formats than ever before. This isn’t just about being a “data packrat” but rather building products, features, and intelligence predicated on knowing more about the world (where the world can be users, searches, machine logs, or whatever is relevant to an organization). Organizations are finding new ways to use data that was previously believed to be of little value, or far too expensive to retain, to better serve their constituents. Sourcing and storing data is one half of the equation. Processing that data to produce information is fundamental to the daily operations of every modern business.
     Data storage and processing isn’t a new problem, though. Fraud detection in commerce and finance, anomaly detection in operational systems, demographic analysis in advertising, and many other applications have had to deal with these issues for decades. What has happened is that the volume, velocity, and variety of this data have changed, and in some cases, rather dramatically. This makes sense, as many algorithms benefit from access to more data. Take, for instance, the problem of recommending products to a visitor of an ecommerce website. You could simply show each visitor a rotating list of products they could buy, hoping that one would appeal to them. It’s not exactly an informed decision, but it’s a start. The question is, what do you need in order to improve the chance of showing the right person the right product? Maybe it makes sense to show them what you think they like, based on what they’ve previously looked at. For some products, it’s useful to know what they already own. Customers who already bought a specific brand of laptop computer from you may be interested in compatible accessories and upgrades. [1] One of the most common techniques is to cluster users by similar behavior (such as purchase patterns) and recommend products purchased by “similar” users. No matter the solution, all of the algorithms behind these options require data and generally improve in quality with more of it. Knowing more about a problem space generally leads to better decisions (or algorithm efficacy), which in turn leads to happier users, more money, reduced fraud, healthier people, safer conditions, or whatever the desired result might be.
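     The idea of recommending products purchased by “similar” users can be illustrated with a small sketch. It is only an illustration under stated assumptions: it swaps true clustering for a simpler nearest-neighbor lookup based on Jaccard similarity of purchase sets, and the user names, products, and the functions `jaccard` and `recommend` are all invented for this example rather than taken from the book.

```python
from collections import Counter


def jaccard(a, b):
    """Similarity of two purchase sets: shared items / all items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(purchases, user, k=2, top_n=3):
    """Suggest items bought by the k users most similar to `user`."""
    owned = purchases[user]
    # Rank every other user by how much their purchases overlap with ours.
    neighbors = sorted(
        (other for other in purchases if other != user),
        key=lambda other: jaccard(owned, purchases[other]),
        reverse=True,
    )[:k]
    # Count items the neighbors bought that this user does not own yet.
    votes = Counter(
        item
        for other in neighbors
        for item in purchases[other] - owned
    )
    return [item for item, _ in votes.most_common(top_n)]


# Hypothetical purchase history, invented for this example.
purchases = {
    "alice": {"laptop", "mouse"},
    "bob": {"laptop", "mouse", "dock"},
    "carol": {"laptop", "dock", "sleeve"},
    "dave": {"blender"},
}

print(recommend(purchases, "alice"))
```

     With only four users the result is easy to check by hand, but the same scoring becomes more trustworthy as more purchase histories are available, which is the point the paragraph above makes about data volume.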
     Apache Hadoop is a platform that provides pragmatic, cost-effective, scalable infrastructure for building many of the types of applications described earlier. Made up of a distributed filesystem called the Hadoop Distributed Filesystem (HDFS) and a computation layer that implements a processing paradigm called MapReduce, Hadoop is an open source, batch data processing system for enormous amounts of data. We live in a flawed world, and Hadoop is designed to survive in it by not only tolerating hardware and software failures, but also treating them as first-class conditions that happen regularly. Hadoop uses a cluster of plain old commodity servers with no specialized hardware or network infrastructure to form a single, logical, storage and compute platform, or cluster, that can be shared by multiple individuals or groups. Computation in Hadoop MapReduce is performed in parallel, automatically, with a simple abstraction for developers that obviates complex synchronization and network programming. Unlike many other distributed data processing systems, Hadoop runs the user-provided processing logic on the machine where the data lives rather than dragging the data across the network, which is a huge win for performance.
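     To make the MapReduce abstraction more concrete, here is a minimal single-process sketch of a word count. It is not Hadoop's API and does no distribution at all; the function names `map_phase`, `shuffle`, `reduce_phase`, and `run_job` are assumptions chosen for this example. It only shows the division of labor: the developer writes the map and reduce logic, and the framework takes care of grouping values by key between the two steps.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key (the framework's job)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence inside one process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Two tiny input records, invented for this example.
lines = ["hadoop stores data", "hadoop processes data"]
counts = run_job(lines)
print(counts)
```

     In a real cluster, many map tasks would run in parallel on the machines holding the input blocks, and the shuffle would move the grouped pairs across the network to the reduce tasks. None of that synchronization appears in the user-written functions.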






