Software Engineering 101: Stuffing Multiple Values into a Single 64-bit Value

weixin_26746861

Tags: python

Original article: https://medium.com/from-the-scratch/software-engineering-101-stuffing-multiple-values-into-a-single-64bit-value-50aebe396559

There are some things every software engineer should know that, unfortunately, are not taught in school. One of them is how to stuff multiple values into a single value.

Software Engineering 101 Series:

Stuffing multiple values into a single 64-bit value (you are here)
Time Zones and Working with Dates

If you don't know about this, you may be wondering:

How does that help?
Why can't I use a separate variable for each value?
Why complicate things unnecessarily? I have 8 GiB of RAM, so why should I care about saving 8 bytes?

The main aim of stuffing multiple values into a single large value is not to save memory or decrease the number of variables in the program.

Most of the time, the main aim is to encode multiple useful values into a single value that identifies something uniquely.

Instead of starting with "how to do it", let's take an unconventional route and first look at a use case to understand why this technique is needed and where you can use it. That should make the "how to do it" part intuitive.

Use Case

Let's consider a scenario where this technique is used extensively: database sharding at the application level.

Database sharding at the application level means the application itself knows where the data resides among your swarm of database instances. Instead of relying on some kind of routing mechanism or hash table to locate the data, the application knows which database has the data it needs. This type of sharding doesn't depend on the database you are using, so it works even when the database, like MySQL, doesn't provide any sharding features.

For example, say you are just starting your website and designing a database for it. Normally, if you don't have a natural key, you would use an auto-increment value as the primary key. You would also use that auto-incremented ID as part of the URL that maps to the data (like Stack Overflow does).

After a year, your database server is struggling to keep up with the traffic. To solve that, you have two options:

Vertical Scaling
Horizontal Scaling

Vertical scaling is costly and finite. So you want to opt for horizontal scaling, but MySQL doesn't provide a horizontal scaling feature out of the box. You therefore add a new MySQL instance and shard the data across the instances. Now you need a new mapping that finds the sharded data directly instead of searching every instance.

This is where we can use the "stuffing multiple values into a single 64-bit value" technique.

Before doing that, we need to do a few things:

Give every database shard a unique ID and call it dbID.
You already have an auto-incremented postID for every row in a database.
Decide how many bits you want to allocate to each ID above. For example, allocate 10 bits for dbID and 42 bits for postID.

10 bits can store 2¹⁰ = 1,024 values, and 42 bits can store 2⁴² = 43,98,04,65,11,104 values.

This means you can create up to 1,024 shards, and each shard can have up to 43,98,04,65,11,104 post IDs.
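
Both capacities follow from 2ⁿ for an n-bit field. The short Java sketch below checks them and confirms that 10 + 42 bits fit in the 63 non-sign bits of a `long`. The class and method names (`ShardCapacity`, `capacity`) are made up for illustration and are not part of the original article.

```java
// Hypothetical helper class: how many distinct values an n-bit field can hold.
class ShardCapacity {
    // 2^bits distinct values (0 .. 2^bits - 1); only meaningful for 0 <= bits <= 62
    static long capacity(int bits) {
        return 1L << bits;
    }

    public static void main(String[] args) {
        int dbBits = 10;
        int postBits = 42;
        System.out.println("shards:         " + capacity(dbBits));   // 1024
        System.out.println("post IDs/shard: " + capacity(postBits)); // 4398046511104
        // Both fields together must fit in the 63 non-sign bits of a Java long.
        System.out.println("bits used: " + (dbBits + postBits) + " of 63"); // 52 of 63
    }
}
```

The 11 bits left over (63 - 52) stay unused in this layout.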

So, in addition to identifying the post by its postID, you also need to identify the shard, i.e., the dbID.

So, instead of building your URL like "your_domain.com/posts/postID", we build it like "your_domain.com/posts/newID". The newID contains both the postID and the dbID, so your app knows which shard to query directly, without any middleware.

Now, we are going to encode postID and dbID into the newID like:

(I'm using Java. Java's long is 64 bits, but one of them is the sign bit, so only 63 bits are usable here. Why is this important? We'll see in "How to do it" below.)

Disclaimer: Don't be intimidated if you don't understand the code yet. We will talk about how it works right after this.

long dbID = 997;                               // shard ID, 10 bits
long postID = 8_04_65_11_104L;                 // auto-increment post ID, 42 bits
long newID = (dbID << 53) | (postID << 11);    // 53 = 63 - 10, 11 = 63 - 10 - 42
System.out.println("newID: " + newID);
System.out.println("dbID: " + ((newID >> 53) & 0x3FF));
System.out.println("postID: " + ((newID >> 11) & 0x3FFFFFFFFFFL));

The output will be:

newID: 8980194136231510016
dbID: 997 
postID: 8046511104

So now your new URL can be "your_domain.com/posts/8980194136231510016" instead of "your_domain.com/posts/8046511104".

This number, 8980194136231510016, carries both the shard ID and the post ID.

This is called application-level sharding because your application itself knows that postID 8046511104 lives in the database with ID 997. There is no need for middleware to check which database holds postID 8046511104.
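
To make the routing concrete, here is a minimal Java sketch that assumes the 10-bit dbID / 42-bit postID layout above. `ShardRouter`, `shardUrls`, and the idea of indexing a list by dbID are hypothetical. A real application would map each dbID to a connection pool or data source. The sketch uses the unsigned shift `>>>`, which gives the same result as `>>` here because the packed value is never negative.

```java
import java.util.List;

// Hypothetical router: reads the shard ID embedded in a public post ID
// (10-bit dbID starting at bit 53, 42-bit postID starting at bit 11).
class ShardRouter {
    private final List<String> shardUrls; // index = dbID; entries are placeholders

    ShardRouter(List<String> shardUrls) {
        this.shardUrls = shardUrls;
    }

    static long dbIdOf(long newID)   { return (newID >>> 53) & 0x3FFL; }
    static long postIdOf(long newID) { return (newID >>> 11) & 0x3FFFFFFFFFFL; }

    // Pick the shard that owns this post without asking any middleware.
    // Throws IndexOutOfBoundsException if no shard is registered for the dbID.
    String shardFor(long newID) {
        return shardUrls.get((int) dbIdOf(newID));
    }

    public static void main(String[] args) {
        long newID = 8980194136231510016L;
        System.out.println(dbIdOf(newID));   // 997
        System.out.println(postIdOf(newID)); // 8046511104
    }
}
```

The query for the post itself would then go to `shardFor(newID)` and use `postIdOf(newID)` as the primary key within that shard.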

I'm pretty sure that, with a real-world use case in hand, you now see why this is useful. Now, let's see how it is done.

How to do it

First and foremost, you need to be careful about the size of integers in your language of choice.

For example,

JavaScript has only one numeric type for both floating-point numbers and integers: Number, an IEEE 754 double. It represents integers exactly only up to 2⁵³ - 1 (Number.MAX_SAFE_INTEGER), not 2⁶⁴ - 1. The sign is stored in a separate bit, so you get 53 bits for non-negative values. Also note that JavaScript's bitwise operators (<<, >>, |, &) convert their operands to 32-bit integers, so for anything wider you need plain arithmetic or BigInt.

Java has a 64-bit integer type, long (boxed as Long). But unlike C and C++, Java has no unsigned integer types. So if you want the packed value to stay positive, you can only use 63 bits, because 1 bit is the sign bit.
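
To see why the sign bit matters, the following sketch (not from the original article) shifts a 10-bit value one position too far, so its top bit lands on bit 63. The long turns negative, and the arithmetic right shift `>>` then copies the sign bit, while the unsigned shift `>>>` fills with zeros.

```java
// Sketch: what happens when a packed field spills into bit 63 (the sign bit) of a Java long.
class SignBitDemo {
    public static void main(String[] args) {
        long ok = 1023L << 53;        // 10-bit value in bits 53..62: stays positive
        long spilled = 1023L << 54;   // same value one position further: bits 54..63
        System.out.println(ok > 0);        // true
        System.out.println(spilled < 0);   // true: the long is now negative
        // Arithmetic right shift (>>) copies the sign bit into the vacated positions...
        System.out.println(spilled >> 54);             // -1
        System.out.println((spilled >> 54) & 0x3FF);   // 1023, recovered only because of the mask
        // ...whereas the unsigned right shift (>>>) fills them with zeros.
        System.out.println(spilled >>> 54);            // 1023
    }
}
```

Keeping to 63 bits avoids all of this and also keeps the ID positive, which matters when it appears in a URL.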

So, let "B" be the number of bits you can use. The example above uses Java, so B = 63.

NOTE: Whatever you want to encode, the sum of the individual bit widths cannot exceed "B", and only non-negative numbers are allowed.
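
Both rules in the note can be checked before anything is packed. This is a minimal sketch. `LayoutCheck`, `fits`, and `inRange` are hypothetical names, and B is fixed at 63 as in the Java case above.

```java
// Hypothetical helper: checks the two rules from the note before packing anything.
class LayoutCheck {
    static final int B = 63; // usable bits in a Java long when the result must stay non-negative

    // Rule 1: the widths of all fields together must not exceed B.
    static boolean fits(int... widths) {
        int sum = 0;
        for (int w : widths) sum += w;
        return sum <= B;
    }

    // Rule 2: a value fits an n-bit field only if 0 <= value <= 2^n - 1 (valid for 0 <= n <= 63).
    static boolean inRange(long value, int bits) {
        return value >= 0 && value <= (1L << bits) - 1;
    }

    public static void main(String[] args) {
        System.out.println(fits(10, 42));  // true: 52 bits
        System.out.println(fits(32, 32));  // false: 64 bits
        System.out.println(inRange(-1, 3)); // false: negatives are not allowed
        System.out.println(inRange(8, 3));  // false: 3 bits hold only 0..7
    }
}
```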

Let’s assume we want to store the following values in a single value:

A with 15 bits: range 0 to 32,767 (2¹⁵ - 1)
B with 35 bits: range 0 to 34,35,97,38,367 (2³⁵ - 1)
C with 10 bits: range 0 to 1,023 (2¹⁰ - 1)
D with 3 bits: range 0 to 7 (2³ - 1)
E with 1 bit: range 0 to 1 (2¹ - 1)

The sum of individual values in bits is 15 + 35 + 10 + 3 + 1 = 64 bits.

But our "B" is only 63, so we have to drop one of the values to bring the total down to 63 bits or fewer. We will drop "E", which is 1 bit. Now our sum of bits is 63, which is equal to "B".

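This is how we want to arrange all our values in the end, from the leftmost (most significant) usable bit to the rightmost: A (15 bits), then B (35 bits), then C (10 bits), then D (3 bits).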

There are two parts to this:

Encoding
Decoding

Encoding

This is how "A" with 15 bits looks in a 63-bit data type: before shifting, it occupies the 15 rightmost bits (positions 14 to 0), and the remaining 48 bits are 0.

We need to move all 15 bits of A to the leftmost free positions, so we left shift by 63 - 15 (A) = 48 bits:

A << 48

After the left shift, A occupies the 15 leftmost bits (positions 62 to 48), and the 48 bits to its right are all 0.
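
The original article showed this step as an image. The small Java sketch below prints a value as a 63-character binary string so the effect of `A << 48` is visible. `ShiftView` and `bits63` are illustrative names, and `String.repeat` requires Java 11 or later.

```java
// Sketch: print a non-negative long as a 63-bit binary string so a shifted field is visible.
class ShiftView {
    static String bits63(long v) {
        String s = Long.toBinaryString(v);
        // left-pad with zeros to 63 characters (v is assumed non-negative)
        return "0".repeat(63 - s.length()) + s;
    }

    public static void main(String[] args) {
        long a = 0x7FFF;                      // A with all 15 bits set
        System.out.println(bits63(a));        // 48 zeros, then 15 ones
        System.out.println(bits63(a << 48));  // 15 ones, then 48 zeros
    }
}
```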

This is how "B" with 35 bits looks in a 63-bit data type: before shifting, it occupies the 35 rightmost bits (positions 34 to 0).

"A" already occupies the 15 leftmost bits, so we need to shift all 35 bits of B to start right where A ends.

We need to left shift by 63 - 15 (A) - 35 (B) = 13 bits:

B << 13

After the left shift, B occupies positions 47 to 13, directly to the right of A.

This is how "C" with 10 bits looks in a 63-bit data type: before shifting, it occupies the 10 rightmost bits (positions 9 to 0).

From the left side, "A" already occupies 15 bits and "B" 35 bits, so we need to shift all 10 bits of C to start right where B ends.

We need to left shift by 63 - 15 (A) - 35 (B) - 10 (C) = 3 bits:

C << 3

After the left shift, C occupies positions 12 to 3.

This is how "D" with 3 bits looks in a 63-bit data type: it occupies the 3 rightmost bits (positions 2 to 0).

From the left side, "A" already occupies 15 bits, "B" 35 bits, and "C" 10 bits, so all 3 bits of D go right where C ends.

We need to left shift by 63 - 15 (A) - 35 (B) - 10 (C) - 3 (D) = 0 bits:

D << 0

After the "shift" (by 0, so nothing moves), D occupies positions 2 to 0.

Now all the "A", "B", "C", and "D" bits are at their respective positions, from the most significant bit down:
A: bits 62 to 48 (15 bits)
B: bits 47 to 13 (35 bits)
C: bits 12 to 3 (10 bits)
D: bits 2 to 0 (3 bits)
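
Every shift amount above follows one rule: shift for a field = B minus the widths of that field and all fields to its left. A small Java sketch of that rule follows. `ShiftPlan` and `shifts` are hypothetical names, and the sketch packs fields from the left, as the article does.

```java
import java.util.Arrays;

// Sketch: derive each field's left-shift amount from the field widths, packing from the left.
class ShiftPlan {
    // shift[i] = b - (widths[0] + ... + widths[i])
    static int[] shifts(int b, int... widths) {
        int[] result = new int[widths.length];
        int used = 0;
        for (int i = 0; i < widths.length; i++) {
            used += widths[i];
            if (used > b) {
                throw new IllegalArgumentException("fields need " + used + " bits, only " + b + " available");
            }
            result[i] = b - used;
        }
        return result;
    }

    public static void main(String[] args) {
        // A = 15, B = 35, C = 10, D = 3 bits in a 63-bit budget
        System.out.println(Arrays.toString(shifts(63, 15, 35, 10, 3))); // [48, 13, 3, 0]
        // dbID = 10, postID = 42 bits from the sharding example
        System.out.println(Arrays.toString(shifts(63, 10, 42)));        // [53, 11]
    }
}
```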

Now we want to combine these into one variable. The | operator works by looking at each bit, and returning 1 if the bit is 1 in either of the inputs. So:

0011 | 0101 = 0111

If a bit is 0 in one input, you get the bit from the other input. Looking at (A << 48), (B << 13), (C << 3), and (D << 0), you'll see that if a bit is 1 in one of these, it's 0 in all the others. So:

encodedValue = (A << 48) | (B << 13) | (C << 3) | (D << 0)
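
As a runnable version of that line, here is a minimal Java sketch. The range check is an addition based on the earlier note (only non-negative values that fit their width) and is not part of the original formula. `Packer` and `encode` are hypothetical names.

```java
// Sketch of the encoding step for the A/B/C/D layout (15 + 35 + 10 + 3 = 63 bits).
class Packer {
    static long encode(long a, long b, long c, long d) {
        // Assumed safeguard: reject values that would overflow their field and corrupt a neighbour.
        if (a < 0 || a > 0x7FFF || b < 0 || b > 0x7FFFFFFFFL
                || c < 0 || c > 0x3FF || d < 0 || d > 0x7) {
            throw new IllegalArgumentException("value out of range for its field");
        }
        // Each field sits in its own bit range, so OR simply merges them.
        return (a << 48) | (b << 13) | (c << 3) | (d << 0);
    }

    public static void main(String[] args) {
        long packed = encode(32767, 34359738367L, 1023, 7); // every field at its maximum
        System.out.println(packed == Long.MAX_VALUE);        // true: all 63 bits set
    }
}
```

Because the fields never share a 1 bit, the OR here gives the same result as adding the shifted values.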

Now that we have completed the encoding part, let's move on to decoding.

Decoding

In encoding, we used left shift (<<) and OR (|). In decoding, we use their counterparts: right shift (>>) and AND (&).

Now we want to unpack the bits. Let's start with D: we want the last 3 bits and to ignore the first 60 bits.

To do this, we use the & operator, which returns 1 only if both of the input bits are 1. So:

0011 & 0101 = 0001

So, to unpack only the rightmost "n" bits, AND the value with a mask whose lowest "n" bits are all 1 and whose other bits are all 0, i.e. 2ⁿ - 1. For D, that mask is 0b111 = 0x7.
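
The mask rule above translates directly into Java as `(1L << n) - 1`. The sketch below builds such a mask and uses it to keep only the low bits of a sample value. `LowBits` and `mask` are hypothetical names.

```java
// Sketch: build an n-bit mask of ones and use it to keep only the lowest n bits of a value.
class LowBits {
    static long mask(int n) {
        return (1L << n) - 1; // e.g. n = 3 -> 0b111; valid for 0 <= n <= 63
    }

    public static void main(String[] args) {
        long value = 0b1011_0110L;
        System.out.println(Long.toBinaryString(value & mask(3))); // 110
        System.out.println(Long.toHexString(mask(10)));          // 3ff
    }
}
```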

To get "C", we first right shift the value until C reaches the rightmost bits: shift right by the same amount you used to shift left during encoding (3), then AND with a mask of 10 bits set to 1 (because "C" is 10 bits wide).

Repeat the same steps for "B" and "A":

Right shift "B" by 13 bits and AND it with a mask of 35 bits set to 1 (because "B" is 35 bits wide).

Right shift "A" by 48 bits and AND it with a mask of 15 bits set to 1 (because "A" is 15 bits wide).

Now we have decoded the single encoded value back into the individual values:

decodedD = (encodedValue >> 0)  & 0x7
decodedC = (encodedValue >> 3)  & 0x3FF
decodedB = (encodedValue >> 13) & 0x7FFFFFFFFL   // in Java, this 35-bit constant needs the L suffix
decodedA = (encodedValue >> 48) & 0x7FFF
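
Putting both halves together, here is a minimal round-trip sketch in Java for the same A/B/C/D layout. `RoundTrip`, `encode`, and `decode` are illustrative names, and the sample values are arbitrary picks within each field's range.

```java
// Sketch: a full encode/decode round trip for the A/B/C/D layout (15/35/10/3 bits).
class RoundTrip {
    static long encode(long a, long b, long c, long d) {
        return (a << 48) | (b << 13) | (c << 3) | d;
    }

    // Returns {A, B, C, D}: shift each field down, then mask off its neighbours.
    static long[] decode(long v) {
        return new long[] {
            (v >> 48) & 0x7FFFL,
            (v >> 13) & 0x7FFFFFFFFL,
            (v >> 3)  & 0x3FFL,
            v & 0x7L
        };
    }

    public static void main(String[] args) {
        long packed = encode(12345, 9876543210L, 512, 5);
        System.out.println(java.util.Arrays.toString(decode(packed))); // [12345, 9876543210, 512, 5]
    }
}
```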

The magic numbers

You are very likely wondering what those hex values are and how to arrive at them.

In many programming languages, you can write numbers in different radixes, such as 2 (binary), 8 (octal), 10 (decimal), and 16 (hexadecimal).

Here, we use hex to write the masks. Why? Because runs of 1 bits are much shorter to write in hex than in binary or decimal: every hex digit F stands for exactly four 1 bits.

For example, a number with 4 bits set to 1 is written as 0b1111 in binary and 0xF in hex. If you don't know hexadecimal, binary might seem easier, but writing 48 1s in binary is tedious and error-prone. In hex it is just 0xFFFFFFFFFFFF: 12 Fs instead of 48 1s. When the width is not a multiple of 4, the leading hex digit holds the remainder. For example, 15 bits = 3 + 4 × 3, so the mask is 0x7FFF (7 = 0b111 followed by three Fs). Likewise, 10 bits give 0x3FF and 35 bits give 0x7FFFFFFFF.
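
Instead of typing the masks by hand, you can also derive them from the field widths and compare them with the constants used above. This is a small sketch. `HexMasks` and `hexMask` are hypothetical names.

```java
// Sketch: derive the hex masks used above from the field widths instead of typing them by hand.
class HexMasks {
    static String hexMask(int bits) {
        // (1L << bits) - 1 sets the lowest `bits` bits; valid for 0 <= bits <= 63
        return "0x" + Long.toHexString((1L << bits) - 1).toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(hexMask(15)); // 0x7FFF
        System.out.println(hexMask(35)); // 0x7FFFFFFFF
        System.out.println(hexMask(10)); // 0x3FF
        System.out.println(hexMask(3));  // 0x7
        System.out.println(hexMask(48)); // 0xFFFFFFFFFFFF
    }
}
```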

So, if you don’t know how hexadecimal system works, you need to learn about that. It’s an interesting topic of its own.

Want to contact me? Contact me on Twitter.com/@SkrewEverything.

Found any mistakes? Comment down below.

Liked it? 👏👏👏 it and Share it.
