How Data Is Stored: Down the Rabbit Hole

This post starts with databases and ventures into how the physical components of a computer store data and how those components differ in the way they work. Knowing how a particular database stores data is important for understanding its performance and for weighing tradeoffs between databases.

Have you ever started learning about a topic and then you’re suddenly 3 hours and 500 tabs in? That’s what this is, but make it a blog post. I’m going to start with a topic I am interested in: databases. This topic is, uh, pretty huge. Therefore, the focus will be on one part of databases: how data is stored. Here is a brief outline of the path down the rabbit hole! Welcome!

Figure: The path down the rabbit hole

*Note on style: This is typically how I learn — I start with a broad topic and dig deep into various concepts. This may not be the best format for everyone, but that’s okay!

How is data stored and used in databases?

Storage of the data is controlled by a DBMS (database management system). The DBMS is software that does all the coordination for a database. It controls the way the data is stored, queried, secured, created, etc. Some DBMS examples are MySQL and PostgreSQL.

What kinds of database management systems are out there?

There are different kinds of DBMS. For example, you can have a row-oriented or column-oriented DBMS. We will not focus on the details of these now; that is a whole other rabbit hole. Basically, data is grouped together either by row or by column, and this affects how we think about our access patterns to this data.
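
To make the row versus column grouping a little more concrete, here is a minimal sketch in Python. The table, its column names, and the two helper functions are made up for illustration; a real DBMS lays data out in pages on disk, not in Python lists.

```python
# Hypothetical three-row table; names and values are invented for this sketch.
rows = [
    {"id": 1, "name": "Ada", "age": 36},
    {"id": 2, "name": "Grace", "age": 45},
    {"id": 3, "name": "Alan", "age": 41},
]

def row_layout(rows):
    """Row-oriented: all values of one row sit next to each other."""
    return [value for row in rows for value in row.values()]

def column_layout(rows):
    """Column-oriented: all values of one column sit next to each other."""
    columns = list(rows[0].keys())
    return [row[col] for col in columns for row in rows]

row_store = row_layout(rows)
column_store = column_layout(rows)
print(row_store)     # [1, 'Ada', 36, 2, 'Grace', 45, 3, 'Alan', 41]
print(column_store)  # [1, 2, 3, 'Ada', 'Grace', 'Alan', 36, 45, 41]

# Reading only the ages: one contiguous slice in the column layout,
# but every third value in the row layout.
ages_from_columns = column_store[6:9]
ages_from_rows = row_store[2::3]
print(ages_from_columns, ages_from_rows)
```

Both layouts hold the same nine values. What changes is which values end up as neighbors, and that is why the access pattern (whole rows versus one column across many rows) matters.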

Data is stored in all shapes and sizes, and each is beautiful in its own way!

Figure: drawn by me

Our rabbit hole starts to get deeper here. I was reading the wiki on column-oriented databases when I came across this:

The most expensive operations involving hard disks are seeks.

I wanted to know: why is using the hard disk so expensive? I know we also rely on caching and in-memory storage. How do all of these really work?

What types of memory do databases use?

Memory and storage are physical components of a computer that store data. They are not pieces of software like the database management system. First, let’s talk about the difference between memory and storage. Then, let’s dig deeper into each one.

What is storage?

Storage holds data that can theoretically be kept forever unless it is manually removed. Think HDDs (hard disk drives) and SSDs (solid-state drives). Data is stored as files on the disk and isn’t meant to be retrieved as quickly as data in memory. I find this easy to remember because it is the same as when we put things we don’t need right now in a storage unit.

*Really cool note: When I was an intern at Oak Ridge Leadership Computing Facility, I learned that they have a storage option where they store data via robotic-tape and disk storage components. It is their HPSS (high-performance storage system). The disks probably store tons of scientific data from years ago. Disks are not only known for holding permanent data, but also for being able to store a lot of it.

What is memory?

Memory, in my opinion, is the most fun to discuss! Memory is also known as RAM (random access memory), main memory, or in-memory. The takeaway for memory is that it is faster than storage because it is closer to the CPU and doesn’t need to be read serially. Since the CPU can reach it faster and more easily, it is likely to be used frequently. The data in memory is the data you need to have handy.

Although improvements are being made, RAM is known for being “volatile”, which means that the data isn’t persistent like it is in storage, and you will lose it at some point. You may only need to keep the data in memory for a certain amount of time or your computer may turn off. If a power source is removed, the data in RAM disappears. This isn’t the data you want to keep around forever. It isn’t holding all of your files like storage is.

Now that we have a general idea of the differences between memory and storage, let’s explore how they work!

STORAGE

What is the difference between an HDD and an SSD? Solid-state drives are made of different materials than hard disk drives. SSDs are made of semiconductor chips. HDDs, by contrast, store data on rotating disks coated with magnetic material. HDDs were the first to be widely used in computing, but SSDs are gaining in popularity. HDDs are less expensive, but SSDs seem to outperform HDDs in some big areas like latency, data-transfer times, and reliability. Again, this is a whole rabbit hole in itself, so we will move on for now.

How do HDDs and SSDs work? HDDs use magnetism to store data. The “platter”, the magnetized surface of the disk, contains tiny regions whose magnetization represents a 1 or a 0. A small magnet called the “read-write head” moves over the platter to read and store data. Here is a cool video explaining how a hard drive works and showing a real hard drive in action! SSDs, by contrast, use semiconductor chips to store data and don’t require a read-write head. Neither SSDs nor HDDs are located on the motherboard, so data must travel to and from them via a bus. Memory is also connected to the CPU via a bus.

Why are disk seeks the most expensive operations in a column-store? This goes back to where we started this journey: seeks on hard disks are the most expensive operations. There are two components to hard disk drive performance: access time and data transfer time.

Access time describes how long it takes before the drive can start transferring data. Rotating disk drives are physically limited in access time. The platter spins while the “head” reads the data, which looks a lot like a record player. Access time depends on the time it takes to:

  • seek: travel to where the data needs to be read/written

  • rotate: wait for the platter to spin until the data is positioned under the “head”

  • process commands: organize the communication for reading/writing data

  • settle: the “head” stabilizing on the track so it doesn’t read/write incorrectly

The data transfer rate is the rate at which data is transferred to or from the disk (read or written). It determines how long it takes for the data to physically go from point A to point B once the transfer has started.
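
The two components above can be put into a rough back-of-the-envelope calculation. This is only a sketch: the seek, command, and settle times, the 7200 RPM spindle speed, and the 150 MB/s transfer rate are assumed round numbers, not measurements of any particular drive. The one piece of real arithmetic is that, on average, the platter has to turn half a rotation before the data arrives under the head.

```python
def rotational_latency_ms(rpm):
    """Average wait for the data to come around: half of one rotation."""
    ms_per_rotation = 60_000 / rpm
    return ms_per_rotation / 2

def access_time_ms(seek_ms, rpm, command_ms, settle_ms):
    """Sum of the four parts listed above: seek, rotate, process commands, settle."""
    return seek_ms + rotational_latency_ms(rpm) + command_ms + settle_ms

def transfer_time_ms(size_mb, rate_mb_per_s):
    """Time to move the data once the head is in place."""
    return size_mb / rate_mb_per_s * 1000

# Assumed figures for a hypothetical 7200 RPM drive reading a 4 KB block.
access = access_time_ms(seek_ms=9.0, rpm=7200, command_ms=0.5, settle_ms=0.1)
transfer = transfer_time_ms(size_mb=0.004, rate_mb_per_s=150)

print(f"access time:   {access:.2f} ms")
print(f"transfer time: {transfer:.2f} ms")
```

With these assumed numbers, getting into position takes far longer than moving a small block of data, which fits the claim that seeks are the expensive part.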

Additionally, disk storage is serial in nature: the head has to physically move to the data and read it in sequence, so where the data sits affects how long it takes to reach (unlike memory).

It makes sense, then, that disk drives consume more energy than RAM! The disk has to pay the cost of starting to spin or of spinning at all times.

MEMORY

What is the significance of “random access” in RAM? The data in RAM can be accessed in any order. The implication of this is that there is no difference in the time it takes to read data at address A versus address B.
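
As a toy illustration of that difference, here is a tiny cost model in Python. It is deliberately simplistic and entirely hypothetical: costs are counted in abstract “steps”, the serial reader pays one step for every position it passes, and the random-access reader pays a single step for any address.

```python
def serial_cost(head_position, target):
    """Serial medium: pay for every position passed on the way to the target."""
    return abs(target - head_position)

def random_access_cost(target):
    """RAM-like: one step, no matter which address is requested."""
    return 1

address_a, address_b = 10, 1_000_000
head = 0  # where the serial reader currently sits

print(serial_cost(head, address_a), serial_cost(head, address_b))
print(random_access_cost(address_a), random_access_cost(address_b))
```

In the serial model the far address costs vastly more than the near one, while in the random-access model address A and address B cost exactly the same.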

What types of RAM exist and how are they used by databases? IMDBs (in-memory databases) primarily use RAM instead of a disk drive (storage). Databases can scale vertically with RAM, which allows more computation to be done, but not necessarily faster computation, because you are just adding more resources, not faster ones. In the same way, having more RAM in your laptop lets more applications run at once. There are two main types of RAM: SRAM (static RAM) and DRAM (dynamic RAM).

What are the differences between static RAM and dynamic RAM? DRAM is structurally simple, with one transistor and one capacitor per bit. This is important to note because it is why DRAM needs to be refreshed constantly (see “What is a memory refresh?” below). Keep in mind that both SRAM and DRAM are volatile: data is lost once the power source is cut off.

SRAM is typically smaller in capacity, faster, and more expensive than DRAM. It is closer to the CPU and built structurally differently from DRAM. SRAM uses more transistors per bit (4–6 typically) and, therefore, requires more area on a chip per bit and has a lower data density. Needing more chip area per bit is what makes SRAM more expensive than DRAM. The structural difference (4–6 transistors versus 1 transistor and a capacitor) is also why SRAM doesn’t need memory refreshes like DRAM.

When cost and size aren’t concerns, SRAM is used. SRAM is typically what the cache in the CPU is made of. On the other hand, DRAM is more widely used and is referred to as “main memory” because it is cheaper and can fit more data onto a chip.

*Note: there is also NVRAM (non-volatile RAM, a category that includes flash memory), which is common in devices like cameras. It is expensive and not typically used as main memory in computers. We won’t discuss it here, but it does exist.

What is a memory refresh? In a memory refresh, data is rewritten to the chip. The data is stored as the presence or absence of an electrical charge (1/0). As time goes on, the capacitors on the chip start to leak their charge, and each rewrite restores it. In my opinion, this is pretty neat!
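
Here is a small simulation of that idea, as a sketch only. The `DramCell` class, the charge scale of 0 to 10, the leak of one unit per tick, and the read threshold are all invented for illustration; real DRAM timing and electrical behavior are far more involved.

```python
class DramCell:
    """Toy DRAM cell: one capacitor whose charge leaks away over time."""

    FULL = 10       # assumed charge level of a freshly written 1
    THRESHOLD = 5   # assumed level above which the cell still reads as 1
    LEAK = 1        # assumed charge lost per tick

    def __init__(self, bit):
        self.charge = self.FULL if bit else 0

    def tick(self):
        """Let one unit of time pass; the capacitor leaks a little."""
        self.charge = max(0, self.charge - self.LEAK)

    def read(self):
        return 1 if self.charge > self.THRESHOLD else 0

    def refresh(self):
        """Read the bit and write it back at full strength."""
        self.charge = self.FULL if self.read() else 0


def bit_after(ticks, refresh_every=None):
    """Store a 1, let time pass, optionally refreshing, then read the cell."""
    cell = DramCell(1)
    for t in range(1, ticks + 1):
        cell.tick()
        if refresh_every and t % refresh_every == 0:
            cell.refresh()
    return cell.read()


print(bit_after(10))                   # 0: never refreshed, the 1 leaked away
print(bit_after(10, refresh_every=3))  # 1: refreshed in time, the 1 survives
print(bit_after(10, refresh_every=6))  # 0: refreshed too late, the 1 was already lost
```

The last case shows why the refresh has to happen constantly: once the charge has dropped below the threshold, a refresh just rewrites the wrong value.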

Before we fall any deeper…

That concludes our trip down the rabbit hole for now, but it definitely doesn’t encompass the full memory/storage story. Another rabbit hole is how L1, L2, and L3 caches are used, their latencies, and physical layouts, for example! I will save that for another time, though. Here, we reviewed the differences between memory and storage and how each of them works in different ways.

Figure: storage and memory differences

Why should you care?

The way in which a database stores data impacts its performance and capabilities. Compared to a traditional on-disk database, an in-memory database will have lower latency and a more limited capacity, and its data will be volatile.
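
One easy way to see the volatility part for yourself is with SQLite, which ships with Python and can run either on disk or purely in memory (by connecting to the special name `:memory:`). The sketch below is only an illustration: the `notes` table and the `write_then_reopen` helper are made up, and closing the connection stands in for the machine losing power.

```python
import os
import sqlite3
import tempfile

def write_then_reopen(path):
    """Insert one row, close the connection, reopen it, and count the rows."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS notes (body TEXT)")
    conn.execute("INSERT INTO notes VALUES ('hello')")
    conn.commit()
    conn.close()

    conn = sqlite3.connect(path)
    # Recreate the table if it is gone so the count query cannot fail.
    conn.execute("CREATE TABLE IF NOT EXISTS notes (body TEXT)")
    count = conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
    conn.close()
    return count

with tempfile.TemporaryDirectory() as tmp:
    on_disk = write_then_reopen(os.path.join(tmp, "notes.db"))

in_memory = write_then_reopen(":memory:")

print(on_disk)    # 1: the row was still in the file after reopening
print(in_memory)  # 0: the in-memory database vanished with its connection
```

The same code and the same data give a different outcome depending only on where the database keeps its bytes.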

I used a ton of resources to write this post. Check them out!

Original article: https://medium.com/swlh/how-data-is-stored-down-the-rabbit-hole-976a46347726
