Literature translation

chouzhizhong2013

Tags: database, big data

Original link: https://my.oschina.net/u/2526410/blog/676928

When I added rivers way back in time in the early days of Elasticsearch, the idea was somewhat novel.

One of the first things users do with Elasticsearch is get data into it to make it searchable, so why not allow our community to write plugins that can be installed directly on an Elasticsearch cluster and pull data into Elasticsearch automatically?

The first few river implementations were quite successful and very helpful. The CouchDB river was immensely simple and popular (thanks to the CouchDB changes API). Others were popular as well, for example, the RabbitMQ one. They also slowly started to show the problematic nature of rivers. (Translator's note: CouchDB is an open-source, document-oriented database management system; RabbitMQ is a message queue.)

What was the problem we were witnessing? Cluster stability. You see, by their nature, rivers deal with external systems, and those external systems require external libraries to work with. Those are great to use, but they come with an overhead. Part of it is built-in overhead: additional memory usage, more sockets, more file descriptors and so on. The rest, sadly, is bugs.

Part of our effort in the past couple of years has been to improve Elasticsearch resiliency, and we kept seeing, time and time again, that rivers are a big cause of cluster instability, due to their inherent notion of working with external systems and external libraries. When Found joined us a couple of months ago, we learned that they see the same thing: river plugins cause most of the cluster instabilities across the thousands of clusters they manage.

We have known this for some time, and in the spirit of helping users build more resilient systems, we decided to deprecate rivers and ask users to focus on getting data into Elasticsearch from "outside" the cluster. Rivers are deprecated from 1.5 onward. We will probably keep the infrastructure around in 2.0, and only remove it in a later version, to ease the migration.

Ease of use in getting data into Elasticsearch is still important though, so where should you go from here?

For more than a year, we've had official client libraries for Elasticsearch in most programming languages. This means that hooking into your application and getting data in through an existing codebase should be relatively simple. This technique also makes it easy to munge the data before it gets to Elasticsearch. A common example is an application that already uses an ORM to map its domain model to a database; hooking in and indexing that domain model into Elasticsearch tends to be simple to implement.
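As a rough illustration of indexing a domain model from application code, the sketch below turns a few objects into a newline-delimited JSON body of the kind the Elasticsearch `_bulk` endpoint accepts. The `Article` class, the `to_bulk_body` helper, and the index name are hypothetical and not from the original post; sending the body (through an official client or plain HTTP) is deliberately left out. Elasticsearch versions from the era of this post also expect a `_type` in the action line, which this sketch omits.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable


@dataclass
class Article:
    # Hypothetical domain object, standing in for an ORM-mapped entity.
    id: int
    title: str
    body: str


def to_bulk_body(index: str, articles: Iterable[Article]) -> str:
    """Build a newline-delimited JSON body for the _bulk endpoint.

    Each document contributes two lines: an action line naming the
    target index and id, then the document source itself.
    """
    lines = []
    for article in articles:
        doc = asdict(article)
        doc_id = doc.pop("id")  # reuse the database key as the document id
        doc["title"] = doc["title"].strip()  # munge before indexing
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc_id)}}))
        lines.append(json.dumps(doc))
    # The bulk format requires every line, including the last, to end in "\n".
    return "".join(line + "\n" for line in lines)
```

Because the conversion lives in the application, any cleanup (here, trimming the title) happens before the data reaches the cluster, which is the point the paragraph above makes about munging.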

Also, Logstash, or similar tools, can be used to ship data into Elasticsearch. For example, some of the rivers Elasticsearch came with are now implemented as Logstash plugins (like the CouchDB one) in the forthcoming Logstash 1.5.

I love the community and the work that has gone into a vibrant set of river plugins, and the decision to deprecate rivers was not taken lightly. What should you do if you are a river plugin author?

Since rivers are relatively self-sufficient, the code can be extracted into a common library that can be used to get data into Elasticsearch. This library can then be used in various places. A simple "main class" can be written to run it as a standalone process.
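One way to picture that split is a small library function that pulls changes from a source and hands them to a sender, plus a thin entry point that runs it outside the cluster. This is only a sketch under assumptions: the `(sequence, document)` change shape, the `pump` and `batches` names, and the checkpoint handling are all hypothetical, and `send` stands in for whatever actually writes to Elasticsearch.

```python
import json
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical change shape: (sequence number, document).
Change = Tuple[int, dict]


def batches(changes: Iterable[Change], size: int) -> Iterator[List[Change]]:
    """Group changes into lists of at most `size` items."""
    batch: List[Change] = []
    for change in changes:
        batch.append(change)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def pump(
    changes: Iterable[Change],
    send: Callable[[List[dict]], None],
    since: int = 0,
    batch_size: int = 100,
) -> int:
    """Forward changes newer than `since` and return the new checkpoint."""
    last_seq = since
    fresh = (change for change in changes if change[0] > since)
    for batch in batches(fresh, batch_size):
        send([doc for _, doc in batch])
        last_seq = batch[-1][0]  # advance only after the batch was sent
    return last_seq


def main() -> None:
    # Standalone entry point: an in-memory source and a sender that prints.
    demo = [(1, {"title": "a"}), (2, {"title": "b"})]
    checkpoint = pump(demo, lambda docs: print(json.dumps(docs)))
    print("checkpoint:", checkpoint)
```

Calling `main()` from a script gives the standalone process; the same `pump` function could equally be called from another application or wrapped by a different tool, since it knows nothing about where it runs.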

Another option is to turn the plugin into a Logstash input. Logstash inputs are very simple to write, and in 1.5 the Logstash team has made the process of writing, maintaining, and discovering plugins super simple.

It is a hard decision to take something away, especially with all the effort that has gone into writing those river plugins by the wonderful authors. I deeply apologize for it, and we would love to help out with any questions and ideas on how to move forward adapting them. The issue for it is #10345.

Original article: https://www.elastic.co/blog/deprecating-rivers

Reposted from: https://my.oschina.net/u/2526410/blog/676928