Over the last few years, Instagram Direct has grown to be a core part of sharing content and experiences with your close friends on Instagram. This isn’t a privilege we take lightly; it’s incredibly important that we deliver each message as reliably and as quickly as possible. The Internet transfers data at the speed of light, but in many cases, due to large distances and network inefficiencies, the human eye can still pick up on the delay between a request’s start and finish. Additionally, network requests can fail for a wide variety of reasons, such as connection loss or transient server issues. In this blog post, we’ll discuss Instagram’s infrastructure on iOS and Android that not only retries these actions, but also makes the whole experience feel much faster and more reliable to the user.
How mobile apps make network requests look fast
Often, when someone is building a mobile app that wants to “mutate” state on the server (such as tell the server to send a message), they have their view layer initiate a network request. It doesn’t take long before the developer realizes that this request could easily take a second or two even in good network conditions.
A common pattern used to make it seem like the application is fast and responsive (i.e., reduce the user-perceived latency) is to “optimistically” guess the expected output of the successful request and immediately apply it to the view — all before the request is even made. We call this concept “optimistic state”.
In this iOS example, I have an app that stores a color. The existing color, Red, is stored in _savedColor, but when MyViewController sets off a network request to change it, the app immediately overwrites the view's color with Blue, held in the _updatingToColor value. This makes the app feel much faster than waiting for the request to complete.
This pattern, however, becomes unmanageable as the application grows. If I leave MyViewController, the other views in the app that depend on the same color value don't reflect this ongoing request. This confuses the user, and makes the app look inconsistent, buggy, and slow!
To handle this, many developers simply apply the Color change to the app's global data-cache. In fact, Direct also used to apply optimistic changes to our global caches. But this poses many problems, too. What happens if some other event (such as fetching the Color from the network) overwrites my in-flight Blue color back to Red? This problem is referred to as “Clobbering”. It creates weird experiences for the user, and it's difficult for developers to debug and reproduce.
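The two approaches described above can be sketched roughly as follows. This is a minimal Python stand-in, not the original iOS listing: `_savedColor`, `_updatingToColor`, and `MyViewController` come from the text, while `displayed_color`, `change_color`, `request_finished`, `global_cache`, `apply_optimistic`, and `on_fetch_from_network` are hypothetical names introduced only for illustration.

```python
# Hypothetical sketch; names mirror the prose, not Instagram's actual code.

class MyViewController:
    """View-local optimistic state: fast, but invisible to other views."""

    def __init__(self, saved_color):
        self._savedColor = saved_color   # last value confirmed by the server
        self._updatingToColor = None     # optimistic guess while a request is in flight

    @property
    def displayed_color(self):
        # Prefer the optimistic value whenever a request is in flight.
        if self._updatingToColor is not None:
            return self._updatingToColor
        return self._savedColor

    def change_color(self, new_color):
        # Shown immediately, before the network request completes.
        self._updatingToColor = new_color

    def request_finished(self, succeeded):
        if succeeded:
            self._savedColor = self._updatingToColor
        self._updatingToColor = None     # on failure, fall back to the saved color


# The "global cache" variant: optimistic writes and server data share one slot.
global_cache = {"color": "Red"}

def apply_optimistic(color):
    global_cache["color"] = color

def on_fetch_from_network(server_color):
    # Overwrites (clobbers) any in-flight optimistic value.
    global_cache["color"] = server_color


vc = MyViewController("Red")
vc.change_color("Blue")
print(vc.displayed_color)      # Blue

apply_optimistic("Blue")
on_fetch_from_network("Red")   # a stale fetch arrives while the change is in flight
print(global_cache["color"])   # Red
```

The second half shows why a shared slot is fragile: nothing distinguishes "Blue because the user just asked for it" from "Red because the server said so", so whichever write lands last wins.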
Additionally, tying a network request to a short-lived ViewController causes its own set of issues. If my request fails for a retriable reason, such as a loss of network, we should be able to perform this request again later, even if MyViewController is deallocated.
As you can quickly see, optimistic state and network-request retry-logic are easy to build, but difficult to get right.
Direct’s Mutation Manager
Given the number of different network conditions we must operate within, and the number of product surfaces we must support, building a consistent retry and optimistic state policy is a difficult task. To solve these problems, we built a piece of infrastructure that we call the Mutation Manager. The Mutation Manager is designed to answer the questions above. Specifically, we wanted to make it effortless for mobile engineers to get:
- Intelligent and customizable auto-retry strategies for their network requests, with backoff behavior and retries across cold-starts.
- Optimistic State applied to all surfaces of the application, and free lifecycle management (adding, removing, handling clobbering, etc.).
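The post does not describe the actual retry policy, so the following is only a generic sketch of retry with exponential backoff, one common way to implement "backoff behavior". The function names and every parameter (`base`, `factor`, `cap`, `jitter`, `max_attempts`) are assumptions made for illustration; persisting pending requests so they can be retried after a cold start is not shown here.

```python
import random
import time

def backoff_delays(max_retries, base=1.0, factor=2.0, cap=60.0, jitter=0.0):
    """Yield the wait in seconds before each retry (exponential, capped)."""
    for attempt in range(max_retries):
        delay = min(cap, base * factor ** attempt)
        # Optional random jitter spreads out retries from many clients.
        yield delay + random.uniform(0, jitter * delay)

def send_with_retry(send, max_attempts=5, sleep=time.sleep, **backoff):
    """Call send() until it returns True or attempts run out.

    Returns the attempt number that succeeded, or None if all failed.
    """
    delays = backoff_delays(max_attempts - 1, **backoff)
    for attempt in range(1, max_attempts + 1):
        if send():
            return attempt
        delay = next(delays, None)
        if delay is None:   # no retries left
            break
        sleep(delay)
    return None


# Usage: fail twice, then succeed; record the waits instead of really sleeping.
outcomes = iter([False, False, True])
waits = []
attempts = send_with_retry(lambda: next(outcomes), sleep=waits.append)
print(attempts, waits)   # 3 [1.0, 2.0]
```

Passing `sleep` as a parameter keeps the policy testable without real delays; a production version would likely also distinguish retriable failures (connection loss, transient server errors) from permanent ones.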
How the Mutation Manager Works
Direct’s Mutation Manager (DMM for short) achieves these goals (and more) by creating a centralized service that owns the network requests, serializes them to disk for retries, and safely manages the flow of their resulting optimistic state. In Instagram Direct, all surfaces implement this pattern. Let’s follow an example: imagine you navigate into a Direct thread with a message your friend just sent you. In that scenario, these steps occur:
- Submit a “Mark Thread as Read” mutation to the DMM.
- The DMM saves an entry into the OptimisticState cache. This entry is an instruction object, which describes the desired data change before the data is given to any UI.
- The mutation is saved to disk, in case we need to retry after an app termination such as a crash.
- The UI then uses ViewModels, which represent the merged state of the published-data and the entry that was saved to the OptimisticState cache.
- The network request is sent out. Each mutation has a MutationToken in its payload, which is a unique id created on the client.
- Once we have received the new confirmed state (with the matching MutationToken) from the server and updated the published-data, we remove the entry from the OptimisticState cache and thus stop applying it.
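The flow in the list above can be sketched as a small Python model. This is an illustration under assumptions, not the DMM's real API: the class and method names (`MutationManager`, `submit_mark_read`, `view_model`, `on_server_state`), the JSON file format, and the `send` callback are all hypothetical. Only the concepts (OptimisticState cache, published data, MutationToken, merge at the view layer) come from the text.

```python
import json
import os
import tempfile
import uuid
from dataclasses import asdict, dataclass

@dataclass
class Mutation:
    token: str       # MutationToken: unique id created on the client
    kind: str
    thread_id: str

class MutationManager:
    def __init__(self, disk_path, send):
        self.published = {}    # confirmed server data: thread_id -> {"read": bool}
        self.optimistic = {}   # OptimisticState cache: token -> Mutation
        self.disk_path = disk_path
        self.send = send       # hypothetical transport callback

    def submit_mark_read(self, thread_id):
        m = Mutation(uuid.uuid4().hex, "mark_read", thread_id)
        self.optimistic[m.token] = m   # record the instruction, not the result
        self._persist()                # survive a crash or app termination
        self.send(m)                   # the request carries the token
        return m.token

    def view_model(self, thread_id):
        # Merge published data and optimistic entries on the fly.
        state = dict(self.published.get(thread_id, {"read": False}))
        for m in self.optimistic.values():
            if m.thread_id == thread_id and m.kind == "mark_read":
                state["read"] = True
        return state

    def on_server_state(self, thread_id, state, token=None):
        self.published[thread_id] = state   # server data only ever lands here
        if token in self.optimistic:        # confirmed: stop applying the entry
            del self.optimistic[token]
            self._persist()

    def _persist(self):
        with open(self.disk_path, "w") as f:
            json.dump([asdict(m) for m in self.optimistic.values()], f)


sent = []
path = os.path.join(tempfile.mkdtemp(), "mutations.json")
dmm = MutationManager(path, sent.append)
dmm.published["t1"] = {"read": False}

token = dmm.submit_mark_read("t1")
print(dmm.view_model("t1"))                  # {'read': True}
dmm.on_server_state("t1", {"read": False})   # stale fetch, no matching token
print(dmm.view_model("t1"))                  # {'read': True}
dmm.on_server_state("t1", {"read": True}, token)
print(dmm.optimistic)                        # {}
```

Because the stale fetch only touches `published`, the optimistic entry keeps being applied until the server confirms the matching token, which is the property that rules out clobbering in this model.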
After all our mutations and surfaces were migrated to this pattern, optimistic state became an afterthought of product development, and yet all UX surfaces remain consistent and the app feels fast to the user. Since optimistic state and server data are stored separately, and only merged on-the-fly at the View layer, Clobbering is impossible.
Of course, nothing is free. The amount of client-side processing happening here has definitely increased. But, in practice, we’ve been able to mitigate any performance issues by keeping the application of the Optimistic State entry to the View Model as cheap as possible.
The DMM also preserves the order in which requests were sent, so mutations that should be visually ordered now get this support for free. For example, the DMM sends messages in exactly the order in which the API was invoked, which is the order users expect from a messaging service.
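One simple way to get this ordering guarantee is a first-in, first-out queue that never lets a later item overtake a failed earlier one. The post does not say how the DMM implements ordering, so the sketch below is an assumption; `OrderedSender`, `submit`, `flush`, and `flaky_send` are hypothetical names.

```python
from collections import deque

class OrderedSender:
    """Sends queued items strictly in submission order, one at a time."""

    def __init__(self, send):
        self._queue = deque()
        self._send = send   # hypothetical transport: returns True on success

    def submit(self, item):
        self._queue.append(item)

    def flush(self):
        """Send from the head; stop at the first failure so later items wait."""
        while self._queue:
            if not self._send(self._queue[0]):
                return False   # head stays queued and is retried on the next flush
            self._queue.popleft()
        return True


delivered = []
fail_once = {"hello"}

def flaky_send(msg):
    # Fails the first time it sees "hello", succeeds otherwise.
    if msg in fail_once:
        fail_once.discard(msg)
        return False
    delivered.append(msg)
    return True

sender = OrderedSender(flaky_send)
for msg in ["hello", "how are you?", "bye"]:
    sender.submit(msg)

sender.flush()     # "hello" fails, so nothing is delivered yet
sender.flush()     # the retry succeeds and the rest follow in order
print(delivered)   # ['hello', 'how are you?', 'bye']
```

The trade-off in this design is head-of-line blocking: one stuck message delays everything behind it, which is usually acceptable for chat messages in a single thread.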
The Developer Experience
As seen above, there are many benefits to centralizing mutations and the flow of optimistic state. The Mutation Manager not only enforces good patterns for the app, but it also makes adding new mutations to this system extremely simple and quick. When adding a new mutation type, the compiler will guide you to answer all necessary questions about this request (network payloads, optimistic entries, etc.). This ensures that as our team grows, our UX remains performant and reliable.
Let’s take the “Mark Thread as Read” mutation as an example. Previously, this mutation applied optimistic state directly to the Server-Data cache. As a result, that data could accidentally be clobbered back to an Unread state. To prevent this, we introduced merging logic directly in the Server-Data cache, which, while functional, was unfortunately quite complicated. However, once the mutation was moved onto the DMM, not only did it drastically simplify the merging logic, but it also resulted in a more consistent experience for the user.
Additionally, requests within the Mutation Manager are easier to debug. In our employee-only debug builds, the Mutation Manager logs events to a file that can be uploaded by the bug reporter. The engineer is then able to easily parse these logs and diagnose the request. In this example, we can see the request failed on the first attempt with a 500 error code, retried a second later, and succeeded.
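Returning to the point about new mutation types having to answer a fixed set of questions: one way to express that requirement is an abstract interface that every mutation must implement. The sketch below uses a Python abstract base class, which rejects incomplete types when they are instantiated rather than at compile time; a Swift protocol or Kotlin interface would give the compile-time guidance the post describes. The method names (`network_payload`, `optimistic_entry`, `is_retriable`) and the rule that 5xx responses are retriable are assumptions for illustration.

```python
from abc import ABC, abstractmethod

class MutationType(ABC):
    """Questions every mutation must answer (hypothetical interface)."""

    @abstractmethod
    def network_payload(self):
        """What is sent to the server?"""

    @abstractmethod
    def optimistic_entry(self):
        """What should the UI assume while the request is in flight?"""

    @abstractmethod
    def is_retriable(self, status_code):
        """Should a failure with this status code be retried?"""


class MarkThreadAsRead(MutationType):
    def __init__(self, thread_id):
        self.thread_id = thread_id

    def network_payload(self):
        return {"action": "mark_read", "thread_id": self.thread_id}

    def optimistic_entry(self):
        return {"thread_id": self.thread_id, "read": True}

    def is_retriable(self, status_code):
        return status_code >= 500   # assumption: treat server errors as transient


class Incomplete(MutationType):
    # Forgets optimistic_entry and is_retriable.
    def network_payload(self):
        return {}


try:
    Incomplete()
except TypeError:
    print("Incomplete mutation type rejected")
```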
As you can see, this infrastructure allows our product teams to move quickly without compromising performance and reliability. Check out our open engineering roles to join a fast-moving team at Instagram today.
Translated from: https://instagram-engineering.com/making-direct-messages-reliable-and-fast-a152bdfd697f