Enumerable: how to yield business value

cullen2012

Original article: https://habr.com/en/post/444358/

To demonstrate the ideas, this article uses the C# language, but most of them can be carried over to other languages.

In my opinion, 'yield' is the most undervalued keyword among the language's features. You can read the documentation and find plenty of examples on the Internet. In short, 'yield' lets you create iterators implicitly. By design, an iterator exposes an IEnumerable source for public use. And here the tricky part starts, because the language has many implementations of IEnumerable: list, dictionary, hash set, queue, and so on. In my experience, picking one of them to satisfy the requirements of a business task is often the wrong choice. The problem is aggravated by the fact that, whichever implementation is chosen, the program 'just works', and that is what the business really needs, isn't it? Usually it does work, but only until the service is deployed to a production environment.
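To make "creating iterators implicitly" concrete, here is a minimal sketch. The names YieldBasics, Numbers and Run are hypothetical and exist only for this illustration. It records the order of events in a log to show that the body of an iterator method does not run when the method is called, only when the caller asks for the next element.

```csharp
using System;
using System.Collections.Generic;

public static class YieldBasics
{
    // Hypothetical helper: the compiler turns this method into an iterator.
    // Each element is produced only when the consumer requests it.
    public static IEnumerable<int> Numbers(int count, List<string> log)
    {
        for (var i = 1; i <= count; i++)
        {
            log.Add("produce " + i);
            yield return i;
        }
    }

    public static List<string> Run()
    {
        var log = new List<string>();

        // Calling the method does not execute its body yet.
        var numbers = Numbers(3, log);
        log.Add("before foreach");

        foreach (var n in numbers)
        {
            log.Add("consume " + n);
        }

        return log;
    }
}
```

The log returned by Run starts with "before foreach" and then alternates between "produce" and "consume" entries, so at any moment only the current element is in play.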

To demonstrate the problem, I suggest picking a business flow that is very common in enterprise projects. We can extend it during the article and substitute parts of it to understand how widely this approach affects enterprise projects. It should also help you find your own case in this set and fix it.

Example task:

1. Load a set of records line by line from a file or DB into memory.
2. For each column of the record, change the value to some other value.
3. Save the results of the transformation into a file or DB.

Let's consider several cases where this logic may apply. At the moment, I see two:

- It may be part of the flow of a console ETL application.
- It may be the logic inside an action of a controller in an MVC application.

If we rephrase the task in more technical terms, it may sound like this: "(1) Allocate an amount of memory, (2) load information into memory from persistent storage, (3) modify it, and (4) flush the changed records from memory back to persistent storage." The first phrase, "(1) Allocate an amount of memory", may correlate directly with your non-functional requirements. Your job or service has to 'live' in a hosting environment that may impose limits (for instance, 150 MB per microservice), and to budget the spending on your service we have to predict how much memory it will use (usually we talk about the maximum amount). In other words, we should determine the memory 'footprint' of the service.

Let's consider the memory footprint of a really common implementation that I see from time to time in the codebases of enterprise projects. You can try to find it in your projects too, for example 'under the hood' of a 'repository' pattern implementation: just search for words like 'ToList', 'ToArray', 'ToReadonlyCollection', and so on. Every such implementation means that:

1. For each line/record in the file/DB, memory is allocated to hold the properties of the record (i.e. var user = new User() { FirstName = "Test", LastName = "Test2" }).

2. Next, with the help of 'ToArray', for example, or manually, the object references are stored in a collection (i.e. var users = new List<User>(); users.Add(user)). So some amount of memory is allocated for each record in the file, and, so that the object is not lost, a reference to it is kept in a collection.

Here is an example:

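// Requires: using System.Collections.Generic; using System.IO; using System.Linq;

// Variant 1: fill a list manually.
private static IEnumerable<User> LoadUsers2()
{
    var list = new List<User>();
    foreach (var line in File.ReadLines("text.txt"))
    {
        var splittedLine = line.Split(';');

        list.Add(new User()
        {
            FirstName = splittedLine[0],
            LastName = splittedLine[1]
        });
    }

    // Every User created above stays referenced by the list.
    return list;
}

// Variant 2 (the "or" case; the method name is chosen here only to keep both variants compilable):
// the same result through LINQ, where ToArray materializes every record.
private static IEnumerable<User> LoadUsers2WithToArray()
{
    return File.ReadLines("text.txt")
        .Select(line => line.Split(';'))
        .Select(splittedLine => new User()
        {
            FirstName = splittedLine[0],
            LastName = splittedLine[1]
        }).ToArray();
}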

Memory profiler results:


This is exactly the picture I saw every time in a production environment before a container stopped or reloaded because of the hosting's per-container resource limit.

So the footprint in this case depends, roughly, on the number of records in the file, because memory is allocated for every record. The sum of these small pieces of memory gives us the maximum amount of memory that our service may consume: this is the footprint of the service. But is this footprint predictable? Apparently not, because we cannot predict the number of records in the file. And in most cases the file is several times larger than the amount of memory the hosting allows. This means that such an implementation is hard to use in a production environment.

It looks like the right moment to rethink such an implementation. The following assumption gives us a better chance of calculating a footprint for the service: «the footprint should depend on the size of only ONE record in the file». Roughly speaking, in this case we can take the maximum size of each column of a single record and sum them. Predicting the size of a record is much easier than predicting the number of records in the file.
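The "sum of maximum column sizes" can be sketched as a small calculation. This is only a rough estimate under stated assumptions: the columns are held as .NET strings (UTF-16, two bytes per character), per-object overhead is ignored, and the class and method names are hypothetical.

```csharp
using System;
using System.Linq;

public static class RecordSizeEstimate
{
    // Assumption: every column is stored as a .NET string (2 bytes per char).
    // Object headers and other runtime overhead are deliberately ignored.
    public static long MaxRecordBytes(params int[] maxColumnChars)
    {
        return maxColumnChars.Sum(chars => (long)chars * sizeof(char));
    }
}
```

For a record with two columns of at most 50 characters each, RecordSizeEstimate.MaxRecordBytes(50, 50) gives 200 bytes of character data.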

It may be surprising that we can implement a service that handles an unpredictable number of records and consistently consumes only a couple of megabytes, with the help of a single keyword: 'yield'.

Time for an example:

using System.Collections.Generic;
using System.IO;

class Program
{
	static void Main(string[] args)
	{
		// 1. Load a set of records line by line from a file or DB into memory.
		var users = LoadUsers();

		// 2. For each column of the record, change the value to some other value.
		users = ModifyFirstName(users);

		// 3. Save the results of transformation into a file or DB.
		SaveUsers(users);
	}

	private static IEnumerable<User> LoadUsers()
	{
		foreach(var line in File.ReadLines("text.txt"))
		{
			var splitedLine = line.Split(';');

			yield return new User() 
			{ 
				FirstName = splitedLine[0],
				LastName = splitedLine[1]
			};
		}
	}

	private static IEnumerable<User> ModifyFirstName(IEnumerable<User> users)
	{
		foreach (var user in users)
		{
			user.FirstName += "_1";

			yield return user;
		}
	}

	private static void SaveUsers(IEnumerable<User> users)
	{
		foreach(var user in users)
		{
			File.AppendAllLines("results.txt", new string []{ user.FirstName + ';' + user.LastName });
		}
	}

	private class User
	{
		public string FirstName { get; set; }

		public string LastName { get; set; }
	}
}

As you can see in the example above, memory is allocated for only one object at a time ('yield return new User()') instead of creating a collection and filling it with objects. This is the main point of the optimization: it lets us calculate a more predictable memory footprint for the service, because we only need to know the size of two fields, in our case FirstName and LastName. Once a modified user has been saved to the file (see File.AppendAllLines), nothing references that user object any more, so it becomes eligible for garbage collection. The next iteration of the 'foreach' statement in LoadUsers then creates the next user object, and the memory held by the previous one can be reclaimed. In other words, roughly the same amount of memory is replaced by the same amount of memory on each iteration. That is why we need roughly no more memory than the size of a single record in the file.
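The difference between the streaming and the buffered variant can be observed without a profiler. The sketch below is an illustration only: it reads from an in-memory string instead of a file, the class and counters are hypothetical, and it counts how many records have been loaded but not yet saved at the worst moment.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class InFlightDemo
{
    private static int created;
    private static int saved;
    private static int maxInFlight;

    // Lazily reads records; tracks how many were loaded but not yet saved.
    private static IEnumerable<string[]> Load(TextReader reader)
    {
        for (var line = reader.ReadLine(); line != null; line = reader.ReadLine())
        {
            created++;
            maxInFlight = Math.Max(maxInFlight, created - saved);
            yield return line.Split(';');
        }
    }

    private static void Save(IEnumerable<string[]> records, TextWriter writer)
    {
        foreach (var record in records)
        {
            writer.WriteLine(record[0] + "_1;" + record[1]);
            saved++;
        }
    }

    // Returns the largest number of records that were loaded but not yet saved.
    // With buffer == true the sequence is materialized by ToList before saving.
    public static int Run(string input, bool buffer)
    {
        created = saved = maxInFlight = 0;
        var records = Load(new StringReader(input));
        if (buffer)
        {
            records = records.ToList();
        }

        Save(records, new StringWriter());
        return maxInFlight;
    }
}
```

For a three-line input, Run(input, false) reports 1 record in flight, while Run(input, true) reports 3, which is the record-count dependency described earlier.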

Memory profiler results after optimization:


From another perspective, if we slightly rename a couple of methods in the implementation above, you can recognize some meaningful logic for controllers in an MVC application:

private static void GetUsersAction()
{
    // 1. Load a set of records line by line from a file or DB into memory.
    var users = LoadUsers();
    // 2. For each column of the record, change the value to some other value.
    var usersDTOs = MapToDTO(users);
    // 3. Save the results of the transformation into a file or DB.
    OkResult(usersDTOs);
}

One important note before the code listing: most of the widely used libraries, such as EntityFramework, ASP.NET MVC, AutoMapper, Dapper, NHibernate, ADO.NET, and so on, expose or consume IEnumerable sources. In the example above, this means that LoadUsers may be replaced by an implementation that uses EntityFramework, for example, and loads data row by row from a DB table instead of a file. MapToDTO may be replaced by AutoMapper, and OkResult may be replaced by a 'real' implementation of IActionResult in some MVC framework or by our own implementation based on a network stream, for example:

private static void OkResult(IEnumerable<User> users)
{
    // A network stream implementation could be used here instead of a file.
    using (StreamWriter sw = new StreamWriter("result.txt"))
    {
        foreach (var user in users)
        {
            sw.WriteLine(user.FirstName + ';' + user.LastName);
        }
    }
}
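The article does not show MapToDTO itself. A minimal hand-written sketch could look like the following, assuming a hypothetical UserDto type with a single FullName property; it uses plain LINQ rather than AutoMapper. Select is deferred, so the mapping keeps the one-record-at-a-time behaviour of the pipeline.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Mirrors the User class from the article.
public class User
{
    public string FirstName { get; set; } = "";

    public string LastName { get; set; } = "";
}

// Hypothetical DTO, not part of the original article.
public class UserDto
{
    public string FullName { get; set; } = "";
}

public static class UserMapping
{
    // Select does not enumerate the source here; each DTO is created
    // only when the consumer (for example OkResult) asks for it.
    public static IEnumerable<UserDto> MapToDTO(IEnumerable<User> users)
    {
        return users.Select(user => new UserDto
        {
            FullName = user.FirstName + " " + user.LastName
        });
    }
}
```

Calling ToList or ToArray on the result of MapToDTO would bring back the record-count-dependent footprint, so the consumer should enumerate it directly.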

This 'MVC-like' example shows that we are still able to predict and calculate a memory footprint for a web application too. In this case, though, it also depends on the number of requests. For example, the non-functional requirement may sound like this: «The maximum amount of memory for 1000 requests is no more than 200 KB per user object x 1000 requests ~ 200 MB».

Such calculations are very useful for performance optimization when scaling a web application. For instance, suppose you need to scale your web application to 100 containers/VMs. To decide how many resources to allocate from the hosting provider, you can adjust the formula like this: 200 KB per user object x 1000 requests x 100 VMs ~ 20 GB. Moreover, this is the maximum amount of memory, and this amount is under the control of your project's budget.
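The two formulas above can be written down as a tiny helper. This is just the arithmetic from the text, with a hypothetical class name and decimal units assumed (1 KB = 1000 bytes, 1 GB = 1,000,000,000 bytes).

```csharp
using System;

public static class FootprintEstimate
{
    // Maximum memory = size of one record x concurrent requests x instances.
    // long is used because the product easily exceeds the range of int.
    public static long MaxBytes(long bytesPerRecord, long concurrentRequests, long instances)
    {
        return bytesPerRecord * concurrentRequests * instances;
    }
}
```

With 200 KB per user object, FootprintEstimate.MaxBytes(200_000, 1000, 1) gives 200,000,000 bytes (about 200 MB) for one instance, and FootprintEstimate.MaxBytes(200_000, 1000, 100) gives 20,000,000,000 bytes (about 20 GB) for 100 VMs.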

I hope the information in this article is helpful and saves you a lot of money and time in your projects.