Build User Interfaces for Big Data

In the modern world, data has become one of the most valuable commodities available. Companies collectively spend billions sourcing, shaping and analysing it.

As a result of this shift, the amount of data we have available to us has outgrown our traditional ways of viewing it

Now engineers all over the world are rethinking the technologies we use, and employing various tactics to ensure they are capable of handling this increase in information.

The user-facing element is no exception to this. It is unrealistic to assume it could render millions of data points out of the box.

I was recently in a team at a startup that was responsible for building the front-end to a network monitoring tool. The challenge here was the sheer amount of events we were expecting to have to process and present to the client in a valuable way.

Within this overarching challenge lay several questions we’d have to answer if the product were to be successful, with one standing out in particular.

How do we create a stable user experience?

There is no single solution to this question. A combination of technology and tricks was needed to achieve it, which can be broken down into several categories:

  1. Enhancing cognitive stimulation — The user needs cognitive stimulation almost immediately. Increases in digital consumption have caused the human brain to become impatient. Every second the screen is void of content is detrimental to the user’s engagement with the app, and in the realm of big data, those seconds can build up.

  2. Controlling the influx of data — Too much, and the app will become unresponsive. Too little, and your product loses its value. You cannot expect your product to thrive without providing meaningful insights in a highly performant way, so finding a balance is crucial.

  3. Maximising data processing power — We need to maximise the amount of data we’re able to process in the front-end. Controlling the amount coming in becomes less of an issue if there are efficient processing methods in place for when it does.

  4. Improving external response times — By spending some time improving API response time, we can accelerate data refreshing, leading to a more fluid experience.

Let’s take a look at each of these categories in more detail.

Photo by Amanda Dalbjörn on Unsplash

Enhancing cognitive stimulation

There is not a lot that can be done to make dynamic content instantaneously available to the user. The server has to process requests, query for data, and then respond over a network of undetermined quality.

Traditionally, we would display a single loader or blank screen during this process, neither of which is sufficient to keep the human brain stimulated. What we needed to do was incorporate some intelligent design into the UI to increase perceived performance.

“Perceived performance is a measure of how quick a user thinks your site is” — Matt West

Optimistic UI

To give the impression your app is quicker than it is, we can decouple the API responses from the UI updates. This way, we no longer have to wait for a success or failure message before pushing ahead with UI changes. Instead, we retroactively notify the user when the response comes in.

An excellent example of this occurs in popular messaging apps. When you send a message, it’s pushed to the main chat window, pending delivery. If successful, a ‘Delivered’ notification is displayed. If the delivery fails, an error message is displayed, often paired with a CTA to re-send.

What does this mean in the context of big data?

Rather than lying idle while API requests are processed (which could be several seconds considering the large amount of data we’re dealing with), we start setting the UI up for the response. For example, scaling chart axes to the correct timeframe, or moving new filters to a “selected filters” list.

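To make the idea a little more concrete, here is a minimal sketch of an optimistic filter update. The `Ui` interface, `fetchEvents` function and field names are illustrative stand-ins rather than the actual API of the tool described above.

```ts
// A minimal sketch of an optimistic filter update (names are illustrative).
interface Filter {
  id: string;
  from: number; // epoch ms
  to: number;   // epoch ms
}

interface Ui {
  addSelectedFilter(f: Filter): void;
  removeSelectedFilter(f: Filter): void;
  setTimeRange(from: number, to: number): void;
  setData(points: number[]): void;
  notifyError(message: string): void;
}

async function applyFilterOptimistically(
  ui: Ui,
  fetchEvents: (f: Filter) => Promise<number[]>,
  filter: Filter,
): Promise<void> {
  // Update the UI before the server has responded: move the filter into the
  // "selected filters" list and rescale the chart axes to the new range.
  ui.addSelectedFilter(filter);
  ui.setTimeRange(filter.from, filter.to);

  try {
    const points = await fetchEvents(filter); // the real request runs in the background
    ui.setData(points);                       // reconcile the UI with the actual response
  } catch {
    // Retroactively notify the user and roll back the optimistic change.
    ui.removeSelectedFilter(filter);
    ui.notifyError('Could not apply the filter. Please retry.');
  }
}
```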

Optimistic UI works best when mutating data. What do we do if we’re retrieving it?

Skeleton screens

Skeleton screens have become increasingly popular over the last few years. They are now a fundamental part of interfaces constructed by companies such as Facebook and YouTube.

A skeleton screen is a variant of a page that imitates the layout without providing the content

Upon retrieval of the content, we replace individual elements of the mock layout with live data. Often skeleton screens will include a pulsing or shimmering animation to give the impression of progress.

A written explanation will not do the concept justice, so here’s a Codepen of a skeleton screen in action.

Bill Chung has written a great article on skeleton screens if you’re interested in reading more about them.

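If you prefer reading code, here is a minimal sketch of the structure in React; the component and prop names are invented for illustration, and a real implementation would normally add a CSS keyframe animation for the pulse or shimmer effect.

```tsx
import React from 'react';

interface Metric {
  title: string;
  value: number;
}

// While the data is loading we render grey blocks in the same layout as the
// real card, then swap in the live content once it arrives.
function MetricCard({ metric }: { metric?: Metric }) {
  if (!metric) {
    return (
      <div className="card">
        <div style={{ width: '60%', height: 16, background: '#e0e0e0', marginBottom: 8 }} />
        <div style={{ width: '40%', height: 32, background: '#e0e0e0' }} />
      </div>
    );
  }

  return (
    <div className="card">
      <h3>{metric.title}</h3>
      <strong>{metric.value}</strong>
    </div>
  );
}

export default MetricCard;
```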

Employing these two methods should hold the user’s attention until the data required returns from the server, which brings us to the next category.

Photo by Prateek Srivastava on Unsplash

Controlling the influx of data

We’ve covered perceived performance; now let’s talk about actual performance.

When building our network monitoring tool, we needed to think about the variety of devices that would be used to view it. Not all of them would be powerful enough to handle the vast amounts of data we expected to generate.

Two techniques were utilised to reduce this risk.

Bucketing

In software engineering, data buckets have several definitions. In this case, it means batching data up into time intervals. Let’s use an example to demonstrate the concept.

Say you have a time series chart with the x-axis being the time in minutes, and the y-axis being an arbitrary measurement. Over the course of an hour, the server receives 2 million events.

Without bucketing, these would all be passed to the UI, causing performance problems. With bucketing, we could break the hour down into sixty 1-minute intervals.

For each 1-minute interval, we would take the average measurement of all the data points that occurred.

There are now only 60 data points we need to worry about

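A rough sketch of that aggregation step, with the event shape assumed for illustration:

```ts
interface RawEvent {
  timestamp: number; // epoch ms
  value: number;     // the arbitrary measurement from the example above
}

interface Bucket {
  start: number;   // start of the interval, epoch ms
  average: number; // mean of all values that fell into the interval
  count: number;   // how many raw events the bucket represents
}

// Collapse raw events into fixed-width intervals (60_000 ms = 1 minute) and
// average the values in each one.
function bucketEvents(events: RawEvent[], intervalMs = 60_000): Bucket[] {
  const totals = new Map<number, { sum: number; count: number }>();

  for (const event of events) {
    const start = Math.floor(event.timestamp / intervalMs) * intervalMs;
    const total = totals.get(start) ?? { sum: 0, count: 0 };
    total.sum += event.value;
    total.count += 1;
    totals.set(start, total);
  }

  return [...totals.entries()]
    .sort(([a], [b]) => a - b)
    .map(([start, { sum, count }]) => ({ start, average: sum / count, count }));
}
```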

The trade-off here is that we’ve severely reduced the value of the data. In response to this, the front-end will allow the user to change the time range, triggering a bucket refresh with the new time parameters. By doing this, the ability to dive into the data on a granular level is retained.

Lazy loading

“Lazy loading is the approach of waiting to load resources until they are needed, rather than loading them in advance. This can improve performance by reducing the amount of resources that need to be loaded and parsed on initial page load.” — Sheila Simmons

In the context of “controlling the influx of data”, we are talking about reducing the number of API requests by breaking them down into two categories — onscreen and offscreen.

If a component is offscreen, we delay its requests by not rendering it until its intended position is nearly onscreen. Libraries such as react-lazyload make this painless to implement.

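Under the hood, such libraries boil down to viewport detection. A stripped-down sketch of the pattern using the native IntersectionObserver (component and prop names are illustrative) might look like this:

```tsx
import React, { useEffect, useRef, useState } from 'react';

// Render a fixed-height placeholder until the element is close to the viewport,
// then mount the real child, which triggers its own API request on mount.
function LazyPanel({
  children,
  placeholderHeight = 300,
}: {
  children: React.ReactNode;
  placeholderHeight?: number;
}) {
  const ref = useRef<HTMLDivElement>(null);
  const [visible, setVisible] = useState(false);

  useEffect(() => {
    const node = ref.current;
    if (!node) return;

    const observer = new IntersectionObserver(
      ([entry]) => {
        if (entry.isIntersecting) setVisible(true);
      },
      { rootMargin: '200px' } // start rendering slightly before it scrolls into view
    );

    observer.observe(node);
    return () => observer.disconnect();
  }, []);

  return (
    <div ref={ref}>
      {visible ? children : <div style={{ height: placeholderHeight }} />}
    </div>
  );
}

export default LazyPanel;
```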

Delaying unnecessary requests will help speed up the initial load, delivering valuable information to the user faster

Consequently, there is now an issue where there are multiple data loads as the user scrolls. This may not sound like much of a problem, but again when dealing with big data, this can very quickly become one.

To remedy this, we can iterate on the technique. Rather than deferring the offscreen API calls until the components are onscreen, we delay them until the completion of in-view API requests instead.

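One way to sketch that iteration is a small scheduler: in-view requests run immediately, while offscreen requests are queued and only released once nothing in view is still in flight. The class below is illustrative, not the implementation we shipped.

```ts
// In-view requests run straight away; offscreen requests are queued and only
// flushed once no in-view request is still in flight.
class RequestScheduler {
  private deferred: Array<() => void> = [];
  private inFlight = 0;

  runInView<T>(request: () => Promise<T>): Promise<T> {
    this.inFlight += 1;
    return request().finally(() => {
      this.inFlight -= 1;
      if (this.inFlight === 0) this.flushDeferred();
    });
  }

  deferUntilIdle(request: () => Promise<unknown>): void {
    if (this.inFlight === 0) {
      void request();
      return;
    }
    this.deferred.push(() => void request());
  }

  private flushDeferred(): void {
    const queued = this.deferred;
    this.deferred = [];
    queued.forEach((run) => run());
  }
}
```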

Combining these methods should minimise any freezing when loading data. That is only half the job though. The focus now needs to shift to increasing performance when interacting with that data.

Photo by Thomas Kelley on Unsplash

Maximising data processing power

Once our data is in the client, the user will begin mutating it in various ways.

This can demand a large amount of resource on the user’s device. To minimise this, we pre-aggregated our data and used Big O notation to make sure any functions we performed on it were as efficient as possible.

Pre-aggregating data

To avoid unnecessary computing in the front-end, we perform calculations in advance and store them in a database.

Let’s say the application consistently needs to display the total of two measurements. If this is only happening once — it’s not expensive, and we don’t have to worry about pre-aggregation. If this is happening millions of times — pre-emptively running these computations will take the pressure off of the client, freeing it up to focus on user-driven interactions.

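As a toy example of the idea (the field names are invented), the sum can be computed once when an event is ingested and stored alongside the raw fields, instead of being derived in the browser on every render:

```ts
interface RawNetworkEvent {
  timestamp: number;
  bytesIn: number;
  bytesOut: number;
}

interface StoredNetworkEvent extends RawNetworkEvent {
  totalBytes: number; // pre-computed once, read many times by the front-end
}

// Runs once per event on the way into the database, so the client never has to
// derive the total itself while rendering.
function toStoredEvent(event: RawNetworkEvent): StoredNetworkEvent {
  return { ...event, totalBytes: event.bytesIn + event.bytesOut };
}
```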

As an alternative to pre-aggregating using databases, it is also possible to do it at runtime.

Whilst you can do this manually, using a library like Crossfilter can simplify the process.

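A rough sketch of what that runtime aggregation can look like, assuming the crossfilter2 package and an invented event shape:

```ts
import crossfilter from 'crossfilter2';

interface NetworkEvent {
  device: string;
  severity: string;
  value: number;
}

// Paying the indexing cost once, when the data first arrives, makes later
// filtering and reducing much cheaper.
function buildIndexes(events: NetworkEvent[]) {
  const cf = crossfilter(events);

  const byDevice = cf.dimension((e) => e.device);
  const bySeverity = cf.dimension((e) => e.severity);
  const totalPerSeverity = bySeverity.group().reduceSum((e) => e.value);

  // Later, user-driven interactions stay fast: filtering one dimension only
  // re-aggregates the records that actually changed.
  byDevice.filter('router-01'); // hypothetical device id
  console.log(totalPerSeverity.all());

  return { byDevice, bySeverity, totalPerSeverity };
}
```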

The caveat of doing it at runtime is that additional processing power is needed on app load. Once that has completed, performing functions such as filtering and reducing the data will be a lot faster.

The benefit of doing it at runtime is you have more control over what you can aggregate, as you are not tied to using persistent storage.

Big O notation

Often code efficiency is overlooked, in favour of developing features at speed. Learning how to measure and then reduce code complexity is an excellent way of ingraining efficiency into your programming, and Big O notation is a great system to help with this.

If we ignore efficiency, user actions may crash the interface

When dealing with resource-heavy code such as iterations, it’s best to be cautious and consider the following (a short sketch contrasting a nested scan with a map lookup follows the list):

  • How much data is this code likely to be handling?
  • Are there unnecessarily complicated iterations?
  • Are there nested iterations?
  • Can the iteration be broken early?

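To make the last two questions concrete, here is a small invented example: joining alerts to device names with a nested scan versus a pre-built lookup.

```ts
interface Alert {
  deviceId: string;
  message: string;
}

interface Device {
  id: string;
  name: string;
}

// O(n * m): for every alert we scan the whole device list again.
function attachDeviceNamesSlow(alerts: Alert[], devices: Device[]) {
  return alerts.map((alert) => ({
    ...alert,
    deviceName: devices.find((d) => d.id === alert.deviceId)?.name,
  }));
}

// O(n + m): build the lookup once, then every alert is a constant-time get.
function attachDeviceNamesFast(alerts: Alert[], devices: Device[]) {
  const nameById = new Map<string, string>();
  for (const device of devices) nameById.set(device.id, device.name);

  return alerts.map((alert) => ({
    ...alert,
    deviceName: nameById.get(alert.deviceId),
  }));
}
```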
There’s a lot to consider when it comes to code complexity, but it’s well worth investing the time in learning how to reduce it.

You can read more about Big O notation here.

Photo by Shiro hatori on Unsplash

Improving external response times

We’ve covered how to optimise the internal responsiveness of your UI. Now, what about optimising externally too?

Caching client-ready data using tools such as Redis is common these days. There is, however, a limit to how much data we can store in-memory, as it is not the cheapest of storage options.

Considering the amount of data, we had to introduce smarter ways of caching to really get the most out of it.

Adjacent caching

Instead of just caching static data, we also cache data that is similar to what is currently in-view.

For example, we talked earlier about bucketing data into 1-minute intervals. We predict with a high rate of success that the user will want to see the refined data within those intervals. By caching this data beforehand, the response time of the API used to retrieve it is decreased.

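A sketch of the idea, with a generic cache interface standing in for Redis and invented key names and TTLs:

```ts
interface Cache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

type LoadBuckets = (from: number, to: number) => Promise<unknown>;

async function getBucketsWithAdjacentCaching(
  cache: Cache,
  loadBuckets: LoadBuckets,
  from: number,
  to: number,
): Promise<unknown> {
  const key = `buckets:${from}:${to}`;
  const hit = await cache.get(key);
  if (hit) return JSON.parse(hit);

  const data = await loadBuckets(from, to);
  await cache.set(key, JSON.stringify(data), 300);

  // Fire-and-forget: also warm the previous and next window of the same width,
  // on the assumption the user is likely to pan or zoom into them next.
  const width = to - from;
  for (const [f, t] of [
    [from - width, from],
    [to, to + width],
  ]) {
    void loadBuckets(f, t).then((adjacent) =>
      cache.set(`buckets:${f}:${t}`, JSON.stringify(adjacent), 300),
    );
  }

  return data;
}
```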

Adaptive caching

A user will usually have a consistent set of behaviours. Often they will visit the same pages or select the same filters. Over time, we can start to predict these patterns and pre-emptively cache the relevant data upon login. This will help strike a balance between the cost of caching and maintaining a high level of cache hits.

Adaptive caching can be seen as a smarter, albeit more complex, form of adjacent caching: because it is tailored to the user, it tends to be more accurate.

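A simplified sketch of the pattern-tracking half; how the usage counts are persisted and how the cache is actually filled are left abstract:

```ts
type FilterKey = string;

// Counts how often each filter is applied so the most common ones can be
// warmed on the user's next login.
class FilterUsageTracker {
  private counts = new Map<FilterKey, number>();

  record(filter: FilterKey): void {
    this.counts.set(filter, (this.counts.get(filter) ?? 0) + 1);
  }

  topFilters(limit = 3): FilterKey[] {
    return [...this.counts.entries()]
      .sort((a, b) => b[1] - a[1])
      .slice(0, limit)
      .map(([key]) => key);
  }
}

async function preWarmOnLogin(
  tracker: FilterUsageTracker,
  fetchAndCache: (filter: FilterKey) => Promise<void>,
): Promise<void> {
  // Warm only the handful of views this user is most likely to open first.
  await Promise.all(tracker.topFilters().map(fetchAndCache));
}
```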

These two forms of caching aren’t mutually exclusive, but ensure data duplication does not occur when used together.

How do we avoid regressing on our performance enhancements?

Once you’ve built your app, it’s essential not to regress on the hard work with new features that don’t comply with the previous performance-driven standards.

Developers should live and breathe performance

Too often it’s seen as an afterthought, which may be ok for apps with minimal data throughput. However, if we’re building a system that involves big data, it has to be at the forefront of everyone’s mind.

Documentation & onboarding

Assuming your app performs well in the market, scaling your team will become a priority. It’s essential during this time to be disciplined and ensure you grow in the right way.

Remember — Twice as many developers does not immediately equal twice the work

New engineers need time to onboard and familiarise themselves with the system. Pushing them into the deep end without a structured onboarding process may lead to an increase in feature implementation, but will also hurt the quality and performance of your product.

Automation

Automation can catch anything a developer may have missed. Ideally, you want the developers to be building features with performance in mind. However, as the last line of defence, automation can be a lifesaver.

It can cover anything from page load time, using tools such as Pingdom, to checking for memory leaks with a custom build pipeline job.

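As one illustration of the load-time half, a custom pipeline job could drive a headless browser against a staging build and fail when a budget is exceeded. The sketch below assumes Puppeteer, and the URL and budget are placeholders rather than anything prescribed.

```ts
import puppeteer from 'puppeteer';

const TARGET_URL = 'https://staging.example.com/dashboard'; // placeholder staging URL
const BUDGET_MS = 3000;                                     // our own budget, not a standard

async function checkLoadTime(): Promise<void> {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(TARGET_URL, { waitUntil: 'networkidle0' });

  // Read the Navigation Timing entry from inside the page.
  const loadMs = await page.evaluate(() => {
    const [nav] = performance.getEntriesByType('navigation');
    return nav.duration; // total navigation duration in milliseconds
  });

  await browser.close();

  if (loadMs > BUDGET_MS) {
    console.error(`Page load took ${Math.round(loadMs)}ms, over the ${BUDGET_MS}ms budget`);
    process.exit(1); // fails the pipeline job
  }
  console.log(`Page load took ${Math.round(loadMs)}ms, within budget`);
}

void checkLoadTime();
```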

These techniques represent a high-level response to the challenges we needed to overcome to take an interface dealing with large amounts of data from ideation through to delivery.

There are many other great tactics out there. I strongly suggest thoroughly researching and exploring multiple avenues before settling on a roadmap, especially with the front-end ecosystem continually evolving at such a high speed.

Every year, innovations surface that improve the way we handle data in the front-end

Keeping up with them is a fantastic way to ensure the app you’re building is consistently performant, allowing the focus to shift to delivering great features.

Source: https://medium.com/dev-genius/build-user-interfaces-for-big-data-e8c436d81d61
