Big Data Development Tutorial: NoSQL Overview and Apache HBase Basics

比屋大数据

Last modified 2022-06-30 09:56:37


First published 2022-06-13 13:51:29

Article link: https://blog.csdn.net/qq_42285599/article/details/125258404


NoSQL stands for "not only SQL" and refers to non-relational databases.

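NoSQL is an umbrella term for databases that do not follow the traditional RDBMS model. Their data is non-relational, and SQL is not their primary query language. NoSQL systems set out to solve database scalability and availability problems, not atomicity or consistency problems.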

Key-value stores
The most common type, though not necessarily the most popular
Each row holds something like a big dictionary/associative array
Common on cloud platforms
e.g. Amazon SimpleDB, Azure Table Storage
Other examples: MemcacheDB, Voldemort, Couchbase, DynamoDB (AWS), Dynomite, Redis and Riak
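The following is a minimal in-memory sketch of the key-value model described above. Each row key maps to its own dictionary, and different rows do not have to share attributes. The class and method names (`KeyValueStore`, `put`, `get`, `delete`) are hypothetical. They do not reproduce the API of any product listed above.

```python
from typing import Any, Dict, Optional


class KeyValueStore:
    """Hypothetical toy key-value store: each row key maps to its own
    dictionary (associative array). Single-process, not thread-safe."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, Any]] = {}

    def put(self, row_key: str, attrs: Dict[str, Any]) -> None:
        # Merge attributes into the row; rows need not share a schema.
        self._rows.setdefault(row_key, {}).update(attrs)

    def get(self, row_key: str) -> Optional[Dict[str, Any]]:
        # Access is by key only; there is no query language.
        row = self._rows.get(row_key)
        return dict(row) if row is not None else None

    def delete(self, row_key: str) -> None:
        self._rows.pop(row_key, None)


store = KeyValueStore()
store.put("user:1", {"name": "Alice", "city": "Hangzhou"})
store.put("user:2", {"name": "Bob", "plan": "pro"})  # different attributes per row
print(store.get("user:1"))  # {'name': 'Alice', 'city': 'Hangzhou'}
print(store.get("user:3"))  # None
```

The defining trait is that every lookup goes through the key. Anything resembling a secondary query would have to scan all rows.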

Column-family stores
Have tables with declared column families
Each column family has "columns", which are KV pairs that can vary from row to row
Some databases call column families "super columns" and tables "super column families"
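The sketch below illustrates the column-family layout. Families are declared when the table is created, but the columns inside a family are free-form and can differ from row to row. `ColumnFamilyTable` and its methods are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List


class ColumnFamilyTable:
    """Hypothetical toy table: column families are fixed up front, while the
    columns (KV pairs) inside each family may vary per row."""

    def __init__(self, families: List[str]) -> None:
        self.families = set(families)
        # row key -> column family -> column name -> value
        self._rows: Dict[str, Dict[str, Dict[str, str]]] = defaultdict(
            lambda: defaultdict(dict)
        )

    def put(self, row_key: str, family: str, column: str, value: str) -> None:
        # Only declared families are accepted; column names are not checked.
        if family not in self.families:
            raise KeyError(f"undeclared column family: {family}")
        self._rows[row_key][family][column] = value

    def get_family(self, row_key: str, family: str) -> Dict[str, str]:
        # .get() avoids creating empty rows/families as a side effect.
        return dict(self._rows.get(row_key, {}).get(family, {}))


t = ColumnFamilyTable(["info", "stats"])
t.put("row1", "info", "name", "Alice")
t.put("row1", "info", "email", "alice@example.com")
t.put("row2", "info", "name", "Bob")      # row2 has no "email" column
t.put("row2", "stats", "logins", "42")
print(t.get_family("row1", "info"))  # {'name': 'Alice', 'email': 'alice@example.com'}
print(t.get_family("row2", "info"))  # {'name': 'Bob'}
```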

Document stores
Documents are typically JSON objects
Each document has properties and values
Values can be scalars, arrays, links to documents in other databases, or sub-documents (i.e. contained JSON objects, which allows for hierarchical storage)
Old versions are retained
Most popular with developers, startups and VCs
Examples: CouchDB, MongoDB, Couchbase
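This is a minimal sketch of a document store that stores JSON objects with arrays and nested sub-documents, and keeps every earlier version instead of overwriting it. `DocumentStore`, `save`, `get` and `history` are hypothetical names. Real systems differ in how long they keep old versions.

```python
import copy
import json
from typing import Any, Dict, List


class DocumentStore:
    """Hypothetical toy document store: each document is a JSON object and
    every save appends a new version instead of overwriting the old one."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Dict[str, Any]]] = {}

    def save(self, doc_id: str, doc: Dict[str, Any]) -> int:
        # Round-trip through JSON: rejects non-JSON values and stores a copy.
        stored = json.loads(json.dumps(doc))
        history = self._versions.setdefault(doc_id, [])
        history.append(stored)
        return len(history)  # 1-based version number

    def get(self, doc_id: str) -> Dict[str, Any]:
        # Latest version; deep copy so callers cannot mutate stored data.
        return copy.deepcopy(self._versions[doc_id][-1])

    def history(self, doc_id: str) -> List[Dict[str, Any]]:
        return copy.deepcopy(self._versions[doc_id])


store = DocumentStore()
store.save("order-1001", {
    "customer": "Alice",
    "items": ["book", "pen"],           # array value
    "address": {"city": "Hangzhou"},    # sub-document (nested JSON object)
})
store.save("order-1001", {
    "customer": "Alice",
    "items": ["book", "pen", "lamp"],
    "address": {"city": "Shanghai"},
})
print(store.get("order-1001")["address"]["city"])  # Shanghai
print(len(store.history("order-1001")))            # 2
```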

HBase is a leading NoSQL database
HBase is a column-oriented database
HBase is a distributed map (often loosely called a distributed hash map, although rows are actually kept sorted by row key)
HBase is based on the Google Bigtable paper
HBase uses HDFS as its storage layer and leverages its reliability
Data can be accessed quickly: roughly 2-20 ms response time
Great support for random reads and writes: 20k to 100k+ ops/s per node
Scales to 20,000+ nodes
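Logically, an HBase table can be pictured as a sorted, multidimensional map: row key → column (`family:qualifier`) → timestamp → value. The Bigtable paper describes its tables as a "sparse, distributed, persistent multidimensional sorted map". HBase keeps rows in lexicographic row-key order, and a read returns the newest timestamped version by default. The single-process sketch below models only that logical view. `ToyHTable` is a hypothetical name, not the HBase client API. In a real cluster, the sorted key space is also partitioned into regions spread across servers, and this sketch leaves that out.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class ToyHTable:
    """Hypothetical toy model of HBase's logical layout:
    row key -> "family:qualifier" -> timestamp -> value, rows sorted by key."""

    def __init__(self) -> None:
        self._row_keys: List[bytes] = []  # kept in lexicographic byte order
        self._rows: Dict[bytes, Dict[str, Dict[int, bytes]]] = {}

    def put(self, row: bytes, column: str, ts: int, value: bytes) -> None:
        if row not in self._rows:
            bisect.insort(self._row_keys, row)  # insert, keeping keys sorted
            self._rows[row] = {}
        # Each cell keeps multiple versions keyed by timestamp.
        self._rows[row].setdefault(column, {})[ts] = value

    def get(self, row: bytes, column: str) -> Optional[bytes]:
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]  # newest timestamp wins

    def scan(self, start: bytes, stop: bytes) -> List[Tuple[bytes, Dict[str, bytes]]]:
        # Range scan over [start, stop) in row-key order, latest version per cell.
        lo = bisect.bisect_left(self._row_keys, start)
        hi = bisect.bisect_left(self._row_keys, stop)
        result = []
        for row in self._row_keys[lo:hi]:
            latest = {col: vers[max(vers)] for col, vers in self._rows[row].items()}
            result.append((row, latest))
        return result


t = ToyHTable()
t.put(b"user#003", "info:name", 1, b"Carol")
t.put(b"user#001", "info:name", 1, b"Alice")
t.put(b"user#001", "info:name", 2, b"Alicia")  # newer version of the same cell
t.put(b"user#002", "info:name", 1, b"Bob")
print(t.get(b"user#001", "info:name"))                         # b'Alicia'
print([row for row, _ in t.scan(b"user#001", b"user#003")])    # [b'user#001', b'user#002']
```

Because the keys are sorted, a range scan is a contiguous slice. A true hash map could not offer that, which is why row-key design matters so much in HBase.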

HBase has been a core storage system in Alibaba's search system since 2010.

Upgrade path: 2010-2014: 0.20 → 0.94; 2014-2015: 0.94 → 0.98; 2016: 0.98 → 1.1.2
[Figure: Current scale]

HBase Architecture Advantages
HBase provides the following benefits:
Strong consistency model
  When a write returns, all readers will see the same value
Scales automatically
  Regions split automatically when they become too large
  Uses HDFS to spread data and manage space
Built-in recovery
  Uses a Write Ahead Log (WAL), similar to journaling on a file system
Integrated with Hadoop
  MapReduce on HBase is straightforward
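Three of the points above can be illustrated together in one sketch: log-before-apply writes, recovery by WAL replay, and splitting an oversized region. `ToyRegion`, `recover` and `max_rows` are hypothetical. The sketch also makes several simplifying assumptions. The "WAL" is a Python list rather than a durable log on HDFS, and the split threshold counts rows. Real HBase bases its split threshold on store file size (for example the `hbase.hregion.max.filesize` setting) and flushes the memstore to files on HDFS.

```python
from typing import Dict, List, Optional, Tuple


class ToyRegion:
    """Hypothetical toy region covering row keys in [start_key, end_key).
    Writes go to a write-ahead log (WAL) first, then to an in-memory store."""

    def __init__(self, start_key: str, end_key: Optional[str], max_rows: int = 4) -> None:
        self.start_key = start_key
        self.end_key = end_key                  # None = open-ended (end of key space)
        self.max_rows = max_rows                # toy split threshold (rows, not bytes)
        self.wal: List[Tuple[str, str]] = []    # stands in for a durable log on HDFS
        self.memstore: Dict[str, str] = {}

    def _owns(self, key: str) -> bool:
        return key >= self.start_key and (self.end_key is None or key < self.end_key)

    def put(self, key: str, value: str) -> None:
        if not self._owns(key):
            raise KeyError(f"{key!r} is outside this region")
        self.wal.append((key, value))   # 1. log the edit first
        self.memstore[key] = value      # 2. then apply it; reads see it once put returns

    def get(self, key: str) -> Optional[str]:
        return self.memstore.get(key)

    def too_large(self) -> bool:
        return len(self.memstore) > self.max_rows

    def split(self) -> Tuple["ToyRegion", "ToyRegion"]:
        # Split at the middle row key; each daughter region logs its own rows.
        keys = sorted(self.memstore)
        mid = keys[len(keys) // 2]
        left = ToyRegion(self.start_key, mid, self.max_rows)
        right = ToyRegion(mid, self.end_key, self.max_rows)
        for key in keys:
            (left if key < mid else right).put(key, self.memstore[key])
        return left, right


def recover(start_key: str, end_key: Optional[str],
            wal: List[Tuple[str, str]], max_rows: int = 4) -> ToyRegion:
    """Rebuild a region after a crash by replaying its WAL in order."""
    region = ToyRegion(start_key, end_key, max_rows)
    for key, value in wal:
        region.put(key, value)
    return region


region = ToyRegion("", None, max_rows=4)
for i in range(5):
    region.put(f"row{i}", f"v{i}")

# Simulate a crash: the memstore is lost, only the WAL survives.
restored = recover(region.start_key, region.end_key, list(region.wal))
print(restored.too_large())      # True: 5 rows exceed max_rows=4
left, right = restored.split()   # left: row0-row1, right: row2-row4
```

Each row belongs to exactly one region, and every write to it goes through that region's `put`. A read issued after `put` returns therefore sees the new value. This is the intuition behind the strong-consistency point above, in a deliberately simplified form.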