A General Discussion of Databases

Oracle

What Is a Database

Database defined

A database is an organized collection of structured information, or data, typically stored electronically in a computer system. A database is usually controlled by a database management system (DBMS). Together, the data and the DBMS, along with the applications that are associated with them, are referred to as a database system, often shortened to just database.

Data within the most common types of databases in operation today is typically modeled in rows and columns in a series of tables to make processing and data querying efficient. The data can then be easily accessed, managed, modified, updated, controlled, and organized. Most databases use structured query language (SQL) for writing and querying data.

What is Structured Query Language (SQL)?

SQL is a programming language used by nearly all relational databases to query, manipulate, and define data, and to provide access control. SQL was first developed at IBM in the 1970s with Oracle as a major contributor, which led to implementation of the SQL ANSI standard. SQL has since spurred many extensions from companies such as IBM, Oracle, and Microsoft. Although SQL is still widely used today, new programming languages are beginning to appear.
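As a rough illustration of the "define, manipulate, and query" roles described above, here is a minimal sketch using Python's built-in `sqlite3` module. The `employees` table and its rows are made up for this example. SQLite has no user accounts, so the access-control side of SQL (for example `GRANT`) is not shown.

```python
import sqlite3


def run_sql_demo():
    """Define a table, modify its rows, and query it with SQL."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    cur = conn.cursor()

    # Define data (DDL): describe the table's columns up front.
    cur.execute(
        "CREATE TABLE employees ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, dept TEXT NOT NULL)"
    )

    # Manipulate data (DML): insert rows, then update one of them.
    cur.executemany(
        "INSERT INTO employees (name, dept) VALUES (?, ?)",
        [("Ada", "Engineering"), ("Grace", "Engineering"), ("Linus", "Support")],
    )
    cur.execute(
        "UPDATE employees SET dept = ? WHERE name = ?", ("Research", "Grace")
    )

    # Query data: fetch the names still in Engineering.
    cur.execute(
        "SELECT name FROM employees WHERE dept = ? ORDER BY name",
        ("Engineering",),
    )
    names = [row[0] for row in cur.fetchall()]
    conn.close()
    return names
```

Calling `run_sql_demo()` returns the names that remain in the Engineering department after the update.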

Evolution of the database

Databases have evolved dramatically since their inception in the early 1960s. Navigational databases such as the hierarchical database (which relied on a tree-like model and allowed only a one-to-many relationship), and the network database (a more flexible model that allowed multiple relationships), were the original systems used to store and manipulate data. Although simple, these early systems were inflexible. In the 1980s, relational databases became popular, followed by object-oriented databases in the 1990s. More recently, NoSQL databases came about as a response to the growth of the internet and the need for faster speed and processing of unstructured data. Today, cloud databases and self-driving databases are breaking new ground when it comes to how data is collected, stored, managed, and utilized.

Types of databases

There are many different types of databases. The best database for a specific organization depends on how the organization intends to use the data.

  • Relational databases. Relational databases became dominant in the 1980s. Items in a relational database are organized as a set of tables with columns and rows. Relational database technology provides the most efficient and flexible way to access structured information.
  • Object-oriented databases. Information in an object-oriented database is represented in the form of objects, as in object-oriented programming.
  • Distributed databases. A distributed database consists of two or more files located in different sites. The database may be stored on multiple computers, located in the same physical location, or scattered over different networks.
  • Data warehouses. A central repository for data, a data warehouse is a type of database specifically designed for fast query and analysis.
  • NoSQL databases. A NoSQL, or nonrelational database, allows unstructured and semistructured data to be stored and manipulated (in contrast to a relational database, which defines how all data inserted into the database must be composed). NoSQL databases grew popular as web applications became more common and more complex.
  • Graph databases. A graph database stores data in terms of entities and the relationships between entities.
  • OLTP databases. An OLTP (online transaction processing) database is a fast, transaction-oriented database designed for large numbers of transactions performed by multiple users.

These are only a few of the several dozen types of databases in use today. Other, less common databases are tailored to very specific scientific, financial, or other functions. In addition to the different database types, changes in technology development approaches and dramatic advances such as the cloud and automation are propelling databases in entirely new directions. Some of the latest databases include

  • Open source databases. An open source database system is one whose source code is open source; such databases could be SQL or NoSQL databases.
  • Cloud databases. A cloud database is a collection of data, either structured or unstructured, that resides on a private, public, or hybrid cloud computing platform. There are two types of cloud database models: traditional and database as a service (DBaaS). With DBaaS, administrative tasks and maintenance are performed by a service provider.
  • Multimodel database. Multimodel databases combine different types of database models into a single, integrated back end. This means they can accommodate various data types.
  • Document/JSON database. Designed for storing, retrieving, and managing document-oriented information, document databases are a modern way to store data in JSON format rather than rows and columns.
  • Self-driving databases. The newest and most groundbreaking type of database, self-driving databases (also known as autonomous databases) are cloud-based and use machine learning to automate database tuning, security, backups, updates, and other routine management tasks traditionally performed by database administrators.

What is database software?

Database software is used to create, edit, and maintain database files and records, enabling easier file and record creation, data entry, data editing, updating, and reporting. The software also handles data storage, backup and reporting, multi-access control, and security. Strong database security is especially important today, as data theft becomes more frequent. Database software is sometimes also referred to as a “database management system” (DBMS).

Database software makes data management simpler by enabling users to store data in a structured form and then access it. It typically has a graphical interface to help create and manage the data and, in some cases, users can construct their own databases by using database software.

What is a database management system (DBMS)?

A database typically requires a comprehensive database software program known as a database management system (DBMS). A DBMS serves as an interface between the database and its end users or programs, allowing users to retrieve, update, and manage how the information is organized and optimized. A DBMS also facilitates oversight and control of databases, enabling a variety of administrative operations such as performance monitoring, tuning, and backup and recovery.

Some examples of popular database software or DBMSs include MySQL, Microsoft Access, Microsoft SQL Server, FileMaker Pro, Oracle Database, and dBASE.

What is a MySQL database?

MySQL is an open source relational database management system based on SQL. It was designed and optimized for web applications and can run on any platform. As new and different requirements emerged with the internet, MySQL became the platform of choice for web developers and web-based applications. Because it’s designed to process millions of queries and thousands of transactions, MySQL is a popular choice for ecommerce businesses that need to manage multiple money transfers. On-demand flexibility is the primary feature of MySQL.

MySQL is the DBMS behind some of the top websites and web-based applications in the world, including Airbnb, Uber, LinkedIn, Facebook, Twitter, and YouTube.

Database challenges

Today’s large enterprise databases often support very complex queries and are expected to deliver nearly instant responses to those queries. As a result, database administrators are constantly called upon to employ a wide variety of methods to help improve performance. Some common challenges that they face include:

  • Absorbing significant increases in data volume. The explosion of data coming in from sensors, connected machines, and dozens of other sources keeps database administrators scrambling to manage and organize their companies’ data efficiently.
  • Ensuring data security. Data breaches are happening everywhere these days, and hackers are getting more inventive. It’s more important than ever to ensure that data is secure but also easily accessible to users.
  • Keeping up with demand. In today’s fast-moving business environment, companies need real-time access to their data to support timely decision-making and to take advantage of new opportunities.
  • Managing and maintaining the database and infrastructure. Database administrators must continually watch the database for problems and perform preventative maintenance, as well as apply software upgrades and patches. As databases become more complex and data volumes grow, companies are faced with the expense of hiring additional talent to monitor and tune their databases.
  • Removing limits on scalability. A business needs to grow if it’s going to survive, and its data management must grow along with it. But it’s very difficult for database administrators to predict how much capacity the company will need, particularly with on-premises databases.

Addressing all of these challenges can be time-consuming and can prevent database administrators from performing more strategic functions.

How autonomous technology is improving database management

Self-driving databases are the wave of the future—and offer an intriguing possibility for organizations that want to use the best available database technology without the headaches of running and operating that technology.

Self-driving databases use cloud-based technology and machine learning to automate many of the routine tasks required to manage databases, such as tuning, security, backups, and updates. With these tedious tasks automated, database administrators are freed up to do more strategic work. The self-driving, self-securing, and self-repairing capabilities of self-driving databases are poised to revolutionize how companies manage and secure their data, enabling performance advantages, lower costs, and improved security.

Future of databases and autonomous databases

The first autonomous database was announced in late 2017, and multiple independent industry analysts quickly recognized the technology and its potential impact on computing.

The February 2018 IDC Perspective praised autonomous database technology for making “enterprise software easier to deploy, use, and administer, using artificial intelligence and machine learning to provide capabilities requiring little or no human intervention to manage software.”

And KuppingerCole’s January 2018 report (PDF) said, “This approach has immense potential benefits, not just for reducing labor and costs for customers, but for dramatically improving databases’ resiliency against both human errors and malicious activities, internal or external. Each database is also designed to have security features enabled by default and relevant parameters automatically configured according to current security best practices.”

MongoDB

What is NoSQL

NoSQL databases (aka “not only SQL”) are non-tabular and store data differently than relational tables. NoSQL databases come in a variety of types based on their data model. The main types are document, key-value, wide-column, and graph. They provide flexible schemas and scale easily with large amounts of data and high user loads.

When people use the term “NoSQL database”, they typically use it to refer to any non-relational database. Some say the term “NoSQL” stands for “non SQL” while others say it stands for “not only SQL.” Either way, most agree that NoSQL databases are databases that store data in a format other than relational tables.

A common misconception is that NoSQL databases or non-relational databases don’t store relationship data well. NoSQL databases can store relationship data—they just store it differently than relational databases do. In fact, when compared with SQL databases, many find modeling relationship data in NoSQL databases to be easier than in SQL databases, because related data doesn’t have to be split between tables.

NoSQL data models allow related data to be nested within a single data structure.
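To make the idea of nesting concrete, here is a small sketch in Python. The `customer` record and its field names are invented for illustration. A relational design would typically split the addresses into their own table, while here they live inside the record and survive a JSON round trip unchanged.

```python
import json

# One customer with related data nested inside a single structure.
customer = {
    "name": "Mary Jones",
    "addresses": [
        {"type": "home", "city": "Springfield"},
        {"type": "work", "city": "Shelbyville"},
    ],
    "preferences": {"newsletter": True},
}


def cities(doc):
    """Read the nested addresses directly, with no join against another table."""
    return [address["city"] for address in doc["addresses"]]


# Round-trip through JSON, the interchange format document databases resemble.
serialized = json.dumps(customer)
restored = json.loads(serialized)
```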

NoSQL databases emerged in the late 2000s as the cost of storage dramatically decreased. Gone were the days of needing to create a complex, difficult-to-manage data model simply for the purposes of reducing data duplication. Developers (rather than storage) were becoming the primary cost of software development, so NoSQL databases optimized for developer productivity.

As storage costs rapidly decreased, the amount of data that applications needed to store and query increased. This data came in all shapes and sizes (structured, semistructured, and polymorphic), and defining the schema in advance became nearly impossible. NoSQL databases allow developers to store huge amounts of unstructured data, giving them a lot of flexibility.

Additionally, the Agile Manifesto was rising in popularity, and software engineers were rethinking the way they developed software. They were recognizing the need to rapidly adapt to changing requirements. They needed the ability to iterate quickly and make changes throughout their software stack—all the way down to the database model. NoSQL databases gave them this flexibility.

Cloud computing also rose in popularity, and developers began using public clouds to host their applications and data. They wanted the ability to distribute data across multiple servers and regions to make their applications resilient, to scale out instead of scale up, and to intelligently geo-place their data. Some NoSQL databases like MongoDB provided these capabilities.
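One common way to spread data across servers is to hash each record's key and use the result to pick a shard. The sketch below shows the idea in plain Python. It is a deliberately simplified assumption, not how MongoDB or any particular database places data, and the function names are hypothetical. Real systems tend to use schemes such as range-based or consistent hashing so that adding a server does not reshuffle most keys.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a stable hash (simple modulo scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def distribute(keys, shard_count):
    """Group keys by the shard they hash to."""
    shards = [[] for _ in range(shard_count)]
    for key in keys:
        shards[shard_for(key, shard_count)].append(key)
    return shards
```

Because the hash is deterministic, every client that knows the shard count can work out where a key lives without consulting a central directory.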

What is SQL?

Now that we have an understanding of NoSQL databases, let’s contrast them with what have traditionally been the most popular databases: relational databases accessed by SQL (Structured Query Language). You can use SQL when interacting with relational databases where data is stored in tables that have fixed columns and rows.

SQL databases were developed in the 1970s. At the time, storage was extremely expensive, so software engineers normalized their databases in order to reduce data duplication.

Software engineers in the 1970s also commonly followed the waterfall software development model. Projects were planned in detail before development began. Software engineers painstakingly created complex entity-relationship (E-R) diagrams to ensure they had carefully thought through all the data they would need to store. Due to this upfront planning model, software engineers struggled to adapt if requirements changed during the development cycle. As a result, projects frequently went over budget, exceeded deadlines and failed to deliver against user needs.

What are the Types of NoSQL Databases?

Over time, four major types of NoSQL databases emerged: document databases, key-value databases, wide-column stores, and graph databases. Let’s examine each type.

  • Document databases store data in documents similar to JSON (JavaScript Object Notation) objects. Each document contains pairs of fields and values. The values can typically be a variety of types, including strings, numbers, booleans, arrays, or objects, and their structures typically align with the objects developers work with in code. Because of their variety of field value types and powerful query languages, document databases suit a wide variety of use cases and can be used as a general purpose database. They can scale out horizontally to accommodate large data volumes. MongoDB, a document database, is consistently ranked as the world’s most popular NoSQL database according to DB-Engines. For more on document databases, see What is a Document Database?

  • Key-value databases are a simpler type of database where each item contains keys and values. A value can typically only be retrieved by referencing its key, so learning how to query for a specific key-value pair is typically simple. Key-value databases are great for use cases where you need to store large amounts of data but you don’t need to perform complex queries to retrieve it. Common use cases include storing user preferences or caching. Redis and DynamoDB are popular key-value databases.

  • Wide-column stores store data in tables, rows, and dynamic columns. Wide-column stores provide a lot of flexibility over relational databases because each row is not required to have the same columns. Many consider wide-column stores to be two-dimensional key-value databases. Wide-column stores are great for when you need to store large amounts of data and you can predict what your query patterns will be. Wide-column stores are commonly used for storing Internet of Things data and user profile data. Cassandra and HBase are two of the most popular wide-column stores.

  • Graph databases store data in nodes and edges. Nodes typically store information about people, places, and things while edges store information about the relationships between the nodes. Graph databases excel in use cases where you need to traverse relationships to look for patterns such as social networks, fraud detection, and recommendation engines. Neo4j and JanusGraph are examples of graph databases.
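The four data models above can be approximated with ordinary Python data structures. This is only a sketch of the shapes involved, with invented keys and values. It says nothing about how Redis, Cassandra, Neo4j, or MongoDB store data internally.

```python
# Document: each record is a nested, JSON-like object.
document_store = {
    "user:1": {
        "name": "Mary",
        "tags": ["admin", "beta"],
        "address": {"city": "Springfield"},
    }
}

# Key-value: an opaque value looked up by its key.
key_value_store = {"session:abc": "user:1"}

# Wide-column: row key -> {column name: value}; rows need not share columns.
wide_column_store = {
    "sensor-1": {"temp": 21.5, "humidity": 40},
    "sensor-2": {"temp": 19.0},
}

# Graph: nodes hold entities, edges hold the relationships between them.
nodes = {
    "mary": {"kind": "person"},
    "bob": {"kind": "person"},
    "acme": {"kind": "company"},
}
edges = [("mary", "FOLLOWS", "bob"), ("bob", "WORKS_AT", "acme")]


def neighbors(node, relation):
    """Traverse outgoing edges of one relationship type from a node."""
    return [dst for src, rel, dst in edges if src == node and rel == relation]
```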

How NoSQL Databases Work

One way of understanding the appeal of NoSQL databases from a design perspective is to look at how the data models of a SQL and a NoSQL database might look in an oversimplified example using address data.

The SQL Case. For an SQL database, setting up a database for addresses begins with the logical construction of the format and the expectation that the records to be stored are going to remain relatively unchanged. After analyzing the expected query patterns, an SQL database might optimize storage in two tables, one for basic information and one pertaining to being a customer, with last name being the key to both tables. Each row in each table is a single customer, and each column has the following fixed attributes:

Last name :: first name :: middle initial :: address fields :: email address :: phone number
Last name :: date of birth :: account number :: customer years :: communication preferences

The NoSQL Case. In the section Types of NoSQL Databases above, there were four types described, and each has its own data model.

Each type of NoSQL database would be designed with a specific customer situation in mind, and there would be technical reasons for how each kind of database would be organized. The simplest type to describe is the document database, in which it would be natural to combine both the basic information and the customer information in one JSON document. In this case, each of the SQL column attributes would be fields and the details of a customer’s record would be the data values associated with each field.

For example: Last_name: “Jones”, First_name: “Mary”, Middle_initial: “S”, etc
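The contrast between the two cases can be sketched in Python. The SQL side uses the built-in `sqlite3` module with two tables keyed on last name, as in the description above, and needs a join to reassemble one customer. The document side holds the same fields in a single record. The table names, the account number, and the customer years are hypothetical values added for this example, and only a subset of the columns is included.

```python
import sqlite3


def load_customer_sql(last_name):
    """Reassemble one customer from two tables with a join."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE basic_info ("
        "last_name TEXT PRIMARY KEY, first_name TEXT, middle_initial TEXT)"
    )
    conn.execute(
        "CREATE TABLE customer_info ("
        "last_name TEXT PRIMARY KEY, account_number TEXT, customer_years INTEGER)"
    )
    conn.execute("INSERT INTO basic_info VALUES ('Jones', 'Mary', 'S')")
    conn.execute("INSERT INTO customer_info VALUES ('Jones', 'A-1001', 5)")
    row = conn.execute(
        "SELECT b.last_name, b.first_name, b.middle_initial, "
        "c.account_number, c.customer_years "
        "FROM basic_info AS b "
        "JOIN customer_info AS c ON c.last_name = b.last_name "
        "WHERE b.last_name = ?",
        (last_name,),
    ).fetchone()
    conn.close()
    return row


# The document version keeps the same fields together in one record.
customer_document = {
    "Last_name": "Jones",
    "First_name": "Mary",
    "Middle_initial": "S",
    "Account_number": "A-1001",
    "Customer_years": 5,
}
```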

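In other words, NoSQL maintains relationships by embedding data. For example, a user table (id, name, birthAt, address, …) that references an administrative-region table (id, province, city, provinceCode, cityCode, …) becomes {id, name, birthAt, address: {province, city, …}} in NoSQL. As a result, NoSQL duplicates a lot of data and takes up more space.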
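The storage cost of embedding mentioned in the note above can be seen in a small Python sketch. The users and the single region are invented sample data, and serialized JSON length is used only as a rough stand-in for storage size.

```python
import json

# Normalized form: users point at a shared region row by id.
regions = {1: {"province": "Guangdong", "city": "Shenzhen"}}
users = [
    {"id": 1, "name": "Li", "region_id": 1},
    {"id": 2, "name": "Wang", "region_id": 1},
    {"id": 3, "name": "Zhang", "region_id": 1},
]


def embed(user):
    """Build a document that carries its own copy of the region data."""
    doc = {k: v for k, v in user.items() if k != "region_id"}
    doc["address"] = dict(regions[user["region_id"]])
    return doc


documents = [embed(user) for user in users]

# Compare serialized sizes: the embedded form repeats the region per user.
normalized_size = len(json.dumps(users)) + len(json.dumps(regions))
embedded_size = len(json.dumps(documents))
```

Each document can now be read without a lookup in `regions`, at the price of storing the same province and city once per user.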

NoSQL vs SQL Databases

TLDR: NoSQL (“non SQL” or “not only SQL”) databases were developed in the late 2000s with a focus on scaling, fast queries, allowing for frequent application changes, and making programming simpler for developers. Relational databases accessed with SQL (Structured Query Language) were developed in the 1970s with a focus on reducing data duplication as storage was much more costly than developer time. SQL databases tend to have rigid, complex, tabular schemas and typically require expensive vertical scaling.

Differences between SQL and NoSQL

The table below summarizes the main differences between SQL and NoSQL databases.

| | SQL Databases | NoSQL Databases |
| --- | --- | --- |
| Data Storage Model | Tables with fixed rows and columns | Document: JSON documents, Key-value: key-value pairs, Wide-column: tables with rows and dynamic columns, Graph: nodes and edges |
| Development History | Developed in the 1970s with a focus on reducing data duplication | Developed in the late 2000s with a focus on scaling and allowing for rapid application change driven by agile and DevOps practices |
| Examples | Oracle, MySQL, Microsoft SQL Server, and PostgreSQL | Document: MongoDB and CouchDB, Key-value: Redis and DynamoDB, Wide-column: Cassandra and HBase, Graph: Neo4j and Amazon Neptune |
| Primary Purpose | General purpose | Document: general purpose, Key-value: large amounts of data with simple lookup queries, Wide-column: large amounts of data with predictable query patterns, Graph: analyzing and traversing relationships between connected data |
| Schemas | Rigid | Flexible |
| Scaling | Vertical (scale-up with a larger server) | Horizontal (scale-out across commodity servers) |
| Multi-Record ACID Transactions | Supported | Most do not support multi-record ACID transactions. However, some, like MongoDB, do. |
| Joins | Typically required | Typically not required |
| Data to Object Mapping | Requires ORM (object-relational mapping) | Many do not require ORMs. MongoDB documents map directly to data structures in most popular programming languages. |

What are the Benefits of NoSQL Databases?

NoSQL databases offer many benefits over relational databases. NoSQL databases have flexible data models, scale horizontally, have incredibly fast queries, and are easy for developers to work with.

  • Flexible data models

NoSQL databases typically have very flexible schemas. A flexible schema allows you to easily make changes to your database as requirements change. You can iterate quickly and continuously integrate new application features to provide value to your users faster.

  • Horizontal scaling

Most SQL databases require you to scale-up vertically (migrate to a larger, more expensive server) when you exceed the capacity requirements of your current server. Conversely, most NoSQL databases allow you to scale-out horizontally, meaning you can add cheaper, commodity servers whenever you need to.

  • Fast queries

Queries in NoSQL databases can be faster than in SQL databases. Why? Data in SQL databases is typically normalized, so queries for a single object or entity require you to join data from multiple tables. As your tables grow in size, the joins can become expensive. However, data in NoSQL databases is typically stored in a way that is optimized for queries. The rule of thumb when you use MongoDB is that data that is accessed together should be stored together. Queries typically do not require joins, so the queries are very fast.

  • Easy for developers

Some NoSQL databases like MongoDB map their data structures to those of popular programming languages. This mapping allows developers to store their data in the same way that they use it in their application code. While it may seem like a trivial advantage, this mapping can allow developers to write less code, leading to faster development time and fewer bugs.
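As a sketch of what this mapping looks like from the application side, the Python below converts a pair of hypothetical dataclasses (`User` and `Address`) to a nested dictionary and back. It uses only the standard library and does not call any database driver. The point is that the in-memory object and the stored document have the same shape, so no separate mapping layer is needed.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Address:
    city: str
    country: str


@dataclass
class User:
    name: str
    address: Address
    tags: list = field(default_factory=list)


def to_document(user: User) -> dict:
    """Turn the object graph into a nested, JSON-compatible dictionary."""
    return asdict(user)


def from_document(doc: dict) -> User:
    """Rebuild the object from a document with the same nested shape."""
    return User(
        name=doc["name"],
        address=Address(**doc["address"]),
        tags=list(doc.get("tags", [])),
    )
```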

What are the Drawbacks of NoSQL Databases?

One of the most frequently cited drawbacks of NoSQL databases is that they don’t support ACID (atomicity, consistency, isolation, durability) transactions across multiple documents. With appropriate schema design, single record atomicity is acceptable for lots of applications. However, there are still many applications that require ACID across multiple records.

To address these use cases MongoDB added support for multi-document ACID transactions in the 4.0 release, and extended them in 4.2 to span sharded clusters.
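To show what atomicity across multiple records means in practice, here is a minimal sketch using Python's built-in `sqlite3` module. It is not MongoDB's transaction API, which is not shown here. The `accounts` table and the helper names are hypothetical. The transfer touches two rows, and either both updates are committed or neither is.

```python
import sqlite3


def make_accounts():
    """Create an in-memory database with two sample account rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` between two rows: both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
```

A failed transfer leaves both balances exactly as they were, which is the guarantee applications look for when they need ACID behavior across several records.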

Since data models in NoSQL databases are typically optimized for queries and not for reducing data duplication, NoSQL databases can be larger than SQL databases. Storage is currently so cheap that most consider this a minor drawback, and some NoSQL databases also support compression to reduce the storage footprint.

Depending on the NoSQL database type you select, you may not be able to achieve all of your use cases in a single database. For example, graph databases are excellent for analyzing relationships in your data but may not provide what you need for everyday retrieval of the data such as range queries. When selecting a NoSQL database, consider what your use cases will be and if a general purpose database like MongoDB would be a better option.

Document Database

Built around JSON-like documents, document databases are both natural and flexible for developers to work with. They promise higher developer productivity and faster evolution alongside application needs. Within the class of non-relational databases, sometimes called NoSQL databases, the document data model has become the most popular alternative to tabular, relational databases.

文档数据库围绕类似JSON的文档构建,对开发人员而言既自然又灵活。它们有望带来更高的开发效率,并能随应用程序需求更快地演进。在非关系数据库(有时称为NoSQL数据库)这一类别中,文档数据模型已成为表格式关系数据库最流行的替代方案。

What makes document databases different from relational databases?

  1. Intuitive Data Model: Faster and Easier for Developers

Documents map to the objects in your code, so they are much more natural to work with. There is no need to decompose data across tables, run expensive JOINs, or integrate a separate ORM layer. Data that is accessed together is stored together, so you have less code to write and your users get higher performance.

  2. Flexible Schema: Dynamically Adapt to Change

A document’s schema is dynamic and self-describing, so you don’t need to first pre-define it in the database. Fields can vary from document to document and you modify the structure at any time, avoiding disruptive schema migrations. Some document databases offer JSON Schema so you can optionally enforce rules governing document structures.

  3. Universal: JSON Documents are Everywhere

Lightweight, language-independent, and human readable, JSON has become an established standard for data interchange and storage. Documents are a superset of all other data models so you can structure data any way your application needs – rich objects, key-value pairs, tables, geospatial and time-series data, and the nodes and edges of a graph. You can work with documents using a single query language, giving you a consistent development experience however you’ve chosen to model your data.

  4. Powerful: Query Data Any Way You Need

An important difference between document databases is the expressivity of the query language and richness of indexing. The MongoDB Query Language is comprehensive and expressive. Ad hoc queries, indexing, and real time aggregations provide powerful ways to access, transform, and analyze your data. With ACID transactions you maintain the same guarantees you’re used to in SQL databases, whether manipulating data in a single document, or across multiple documents living in multiple shards.

  5. Distributed: Resilient and Globally Scalable

Unlike monolithic, scale-up relational databases, document databases are distributed systems at their core. Documents are independent units which makes it easier to distribute them across multiple servers while preserving data locality. Replication with self-healing recovery keeps your applications highly available while giving you the ability to isolate different workloads from one another in a single cluster. Native sharding provides elastic and application-transparent horizontal scale-out to accommodate your workload’s growth, along with geographic data distribution for data sovereignty.

是什么使文档数据库与关系数据库不同?

  1. 直观的数据模型:开发人员更快,更轻松

文档映射到代码中的对象,因此使用它们会更加自然。无需跨表分解数据,运行昂贵的JOIN或集成单独的ORM层。一起访问的数据存储在一起,因此您编写的代码更少,用户获得了更高的性能。

  2. 灵活的架构:动态适应变化

文档的架构是动态的并且可以自我描述,因此您无需先在数据库中预先定义它。字段随文档的不同而不同,您可以随时修改结构,以避免破坏性的架构迁移。某些文档数据库提供JSON Schema,因此您可以选择强制实施用于管理文档结构的规则。

  3. 通用:JSON文档无处不在

JSON轻巧,独立于语言且易于阅读,已成为数据交换和存储的既定标准。文档是所有其他数据模型的超集,因此您可以按照应用程序需要的任何方式来构建数据:丰富的对象,键值对,表,地理空间和时间序列数据,以及图的节点和边。无论您选择如何对数据建模,都可以使用同一种查询语言处理文档,从而获得一致的开发体验。

  4. 功能强大:以任何需要的方式查询数据

文档数据库之间的一个重要区别是查询语言的表达能力和索引的丰富性。 MongoDB查询语言是全面而富有表现力的。临时查询,索引编制和实时聚合提供了访问,转换和分析数据的强大方法。使用ACID事务,无论在单个文档中还是在多个分片中的多个文档中,您都可以保持与SQL数据库相同的保证。

  5. 分布式:弹性且可全球扩展

与整体式,纵向扩展的关系数据库不同,文档数据库本质上就是分布式系统。文档是独立的单元,因此可以更轻松地将它们分布在多个服务器上,同时保留数据的本地性。具有自我修复恢复功能的复制使您的应用程序保持高可用性,同时使您能够在单个群集中将不同的工作负载相互隔离。本机分片可提供弹性的,对应用程序透明的水平扩展,以适应工作负载的增长,并提供地理数据分布以实现数据主权。
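The flexible-schema point above can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the `users` collection, the `REQUIRED_FIELDS` rule set, and the `validate` helper are hypothetical. The helper is a hand-rolled check that only loosely mimics optional validation; it is not JSON Schema and not any database's validation API.

```python
import json

# Two documents in the same hypothetical "users" collection.
# Their fields differ, and no table definition had to be altered first.
users = [
    {"_id": 1, "name": "Ada", "email": "ada@example.com"},
    {"_id": 2, "name": "Lin", "phones": ["555-0100"], "address": {"city": "Oslo"}},
]

# Hypothetical, optional rules: only these fields are enforced.
REQUIRED_FIELDS = {"_id": int, "name": str}


def validate(doc):
    """Return a list of problems; an empty list means the document passes."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected_type):
            problems.append(f"wrong type for {field}")
    return problems


def to_json(doc):
    """Serialize a document; nested objects and arrays need no extra mapping."""
    return json.dumps(doc, sort_keys=True)
```

Both documents pass validation even though their shapes differ, while a document such as `{"name": 3}` is rejected for the missing `_id` and the wrongly typed `name`.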

How much easier are documents to work with than tables?

[Image not available]

Why not just use JSON in a relational database?

With document databases empowering developers to build faster, most relational databases have added support for JSON. However, simply adding a JSON data type does not bring the benefits of a native document database. Why? Because the relational approach detracts from developer productivity rather than improving it. These are some of the things developers have to deal with.

为什么不只在关系数据库中使用JSON?

随着文档数据库使开发人员能够更快地构建,大多数关系数据库都增加了对JSON的支持。 但是,仅添加JSON数据类型并不能带来本机文档数据库的好处。 为什么? 因为关系方法损害了开发人员的生产力,而不是提高生产力。 这些是开发人员必须处理的事情。

Proprietary Extensions
Working with documents means using custom, vendor-specific SQL functions which will not be familiar to most developers, and which don’t work with your favorite SQL tools. Add low-level JDBC/ODBC drivers and ORMs and you face complex development processes resulting in low productivity.

Primitive Data Handling
Presenting JSON data as simple strings and numbers rather than the rich data types supported by native document databases such as MongoDB makes computing, comparing, and sorting data complex and error prone.

Poor Data Quality & Rigid Tables
Relational databases offer little to validate the schema of documents, so you have no way to apply quality controls against your JSON data. And you still need to define a schema for your regular tabular data, with all the overhead that comes when you need to alter your tables as your application’s features evolve.

Low Performance
Most relational databases do not maintain statistics on JSON data, preventing the query planner from optimizing queries against documents, and you from tuning your queries.

No scale-out
Traditional relational databases offer no way for you to partition (“shard”) the database across multiple instances to scale as workloads grow. Instead you have to implement sharding yourself in the application layer, or rely on expensive scale-up systems.

专有扩展
使用文档意味着要使用大多数开发人员都不熟悉的,特定于供应商的自定义SQL函数,而且这些函数无法与您喜欢的SQL工具配合使用。再加上低级JDBC / ODBC驱动程序和ORM,您将面临复杂的开发过程,从而导致生产率降低。

原始数据处理
将JSON数据显示为简单的字符串和数字,而不是像MongoDB这样的本机文档数据库所支持的丰富数据类型,这使得计算,比较和排序数据变得复杂且容易出错。

数据质量差和刚性表
关系数据库几乎无法验证文档的架构,因此您无法对JSON数据应用质量控制。而且,您仍然需要为常规表格数据定义架构,并承担随着应用程序功能演进而需要修改表时产生的全部开销。

性能低下
大多数关系数据库不维护有关JSON数据的统计信息,这使查询计划程序无法优化针对文档的查询,也使您无法对查询进行调优。

无横向扩展
传统的关系数据库无法为您提供跨多个实例进行分区(“分片”)以随工作量增长而扩展的方法。相反,您必须在应用程序层中实现分片,或者依靠昂贵的向上扩展系统。
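The "Primitive Data Handling" point can be made concrete with a short Python sketch. It assumes, purely for illustration, that dates were stored as `MM/DD/YYYY` text inside JSON; the sample values and the `parse` helper are hypothetical.

```python
from datetime import date, datetime

# Hypothetical dates stored as plain strings in a JSON column.
dates_as_text = ["03/15/2021", "12/01/2020", "01/20/2022"]

# Sorting the raw strings compares characters, not calendar dates.
sorted_as_text = sorted(dates_as_text)


def parse(text):
    """Convert MM/DD/YYYY text into a real date value."""
    return datetime.strptime(text, "%m/%d/%Y").date()


# Sorting typed values gives chronological order.
sorted_as_dates = sorted(parse(text) for text in dates_as_text)
```

Here `sorted_as_text` starts with `"01/20/2022"`, the latest date, while `sorted_as_dates` starts with 1 December 2020. With a native date type the conversion step, and the chance to forget it, goes away.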

Key-Value Databases

How does a key-value database work?

A key-value database, aka key-value store, associates a value (which can be anything from a number or simple string, to a complex object) with a key, which is used to keep track of the object. In its simplest form, a key-value store is like a dictionary/array/map object as it exists in most programming paradigms, but which is stored in a persistent way and managed by a Database Management System (DBMS).

Key-value databases use compact, efficient index structures to quickly and reliably locate a value by its key, making them ideal for systems that need to find and retrieve data in constant time. Redis, for instance, is a key-value database that is optimized for tracking relatively simple data structures (strings, lists, sets, sorted sets, and hashes) in a persistent database. By only supporting a limited number of value types, Redis is able to expose an extremely simple interface for querying and manipulating them, and when configured optimally is capable of extremely high throughput.

键值数据库如何工作?

键值数据库(又名键值存储)将值(可以是数字或简单字符串,也可以是复杂对象)与一个键相关联,该键用于跟踪对象。以最简单的形式,键值存储就像大多数编程范例中存在的字典/数组/映射对象一样,但是以持久方式存储并由数据库管理系统(DBMS)管理。

键值数据库使用紧凑,高效的索引结构来通过键快速,可靠地定位值,使其成为需要在恒定时间内查找和检索数据的系统的理想选择。例如,Redis是一个键值数据库,它针对在持久数据库中跟踪相对简单的数据结构(字符串,列表,集合,有序集合和哈希)进行了优化。通过仅支持有限数量的值类型,Redis可以为查询和操作它们提供极其简单的接口,并且在进行最佳配置时,其吞吐量非常高。

What are the features of a key-value database?

A key-value database is defined by the fact that it allows programs or users of programs to retrieve data by keys, which are essentially names, or identifiers, that point to some stored value. Because key-value databases are defined so simply, but can be extended and optimized in numerous ways, there is no global list of features, but there are a few common ones:

  • Retrieving a value (if there is one) stored and associated with a given key
  • Deleting the value (if there is one) stored and associated with a given key
  • Setting, updating, and replacing the value (if there is one) associated with a given key

Key-value databases can also have numerous other features, but at the very least support a system of operating on data in the above ways.

键值数据库的功能是什么?

键值数据库的定义是,它允许程序或程序的用户通过键检索数据,这些键本质上是指向某个存储值的名称或标识符。 由于键值数据库的定义非常简单,但是可以通过多种方式进行扩展和优化,因此没有全局功能列表,但有一些常见的功能:

  • 检索与给定键关联并存储的值(如果有)
  • 删除与给定键关联并存储的值(如果有)
  • 设置,更新和替换与给定键关联的值(如果有)

键值数据库还可以具有许多其他功能,但至少要支持以上述方式对数据进行操作的系统。
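The three common operations listed above (retrieve, delete, set) can be sketched as a tiny Python class. This is a minimal illustration, not how Redis or any real key-value database is implemented: the `KeyValueStore` class and its optional JSON file are assumptions made for the example, and because JSON is used for persistence, keys should be strings and values must be JSON-serializable.

```python
import json
import os


class KeyValueStore:
    """A dictionary with optional persistence to a JSON file (hypothetical)."""

    def __init__(self, path=None):
        self._path = path  # None keeps the store in memory only
        self._data = {}
        if path and os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self._data = json.load(f)

    def get(self, key, default=None):
        """Retrieve the value stored for key, if there is one."""
        return self._data.get(key, default)

    def set(self, key, value):
        """Set, update, or replace the value associated with key."""
        self._data[key] = value
        self._flush()

    def delete(self, key):
        """Delete the value for key; return True if something was removed."""
        existed = key in self._data
        if existed:
            del self._data[key]
            self._flush()
        return existed

    def _flush(self):
        # Rewrite the whole file on every change: simple, but slow at scale.
        if self._path:
            with open(self._path, "w", encoding="utf-8") as f:
                json.dump(self._data, f)
```

For example, after `store = KeyValueStore()` and `store.set("session:42", {"user": "ada"})`, calling `store.get("session:42")` returns the stored dictionary and `store.delete("session:42")` removes it. The underlying `dict` gives average constant-time lookups, which mirrors the retrieval property described earlier.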

Is MongoDB a key-value database?

MongoDB is a document database, which means that it stores data in the form of “objects” which have properties which can be changed, added to, deleted, and queried against.

While in an academic sense MongoDB stores values (documents) for keys (identifiers), it would be a bit of a simplification to call MongoDB simply a key-value database (though it can certainly do the job). MongoDB document values are rich objects which can contain entire hierarchies and sub-values, and sophisticated indexing allows documents to be retrieved by any number of different keys.

Further, MongoDB’s document values allow nested key-value structures, allowing not only for accessing data by key in a global sense, but accessing and manipulating data associated with keys within documents, and even creating indexes that allow fast retrieval by these secondary kinds of keys.

MongoDB是键值数据库吗?

MongoDB是一个文档数据库,这意味着它以“对象”形式存储数据,这些对象具有可以更改,添加,删除和查询的属性。

虽然从学术角度讲,MongoDB存储键(标识符)的值(文档),但是将MongoDB简称为键值数据库(虽然它确实可以完成这项工作)会有些简化。 MongoDB文档值是丰富的对象,可以包含整个层次结构和子值,并且复杂的索引允许使用任意数量的不同键来检索文档。

此外,MongoDB的文档值允许嵌套键值结构,不仅允许在全局意义上按键访问数据,而且还允许访问和处理与文档内键相关的数据,甚至创建索引以允许通过这些辅助键快速检索。
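The idea of nested keys and secondary indexes can be sketched in plain Python. This is a conceptual illustration only and does not reflect MongoDB's internals or driver API: the `documents` mapping and the `get_path` and `build_index` helpers are hypothetical, and the index assumes the indexed values are hashable.

```python
from collections import defaultdict

# Documents stored by primary key, as in a plain key-value store.
documents = {
    "u1": {"name": "Ada", "address": {"city": "Oslo"}},
    "u2": {"name": "Lin", "address": {"city": "Lima"}},
    "u3": {"name": "Sam", "address": {"city": "Oslo"}},
}


def get_path(doc, dotted_path):
    """Follow a dotted path such as 'address.city'; return None if absent."""
    current = doc
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def build_index(docs, dotted_path):
    """Map each value found at dotted_path to the ids of documents holding it."""
    index = defaultdict(list)
    for doc_id, doc in docs.items():
        value = get_path(doc, dotted_path)
        if value is not None:
            index[value].append(doc_id)
    return index


# A secondary index on a key nested inside the documents.
city_index = build_index(documents, "address.city")
```

With the index in place, `city_index["Oslo"]` returns the ids `u1` and `u3` without scanning every document, which is the kind of retrieval "by these secondary kinds of keys" that the text describes.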
