Citymobil: a manual for improving availability amid business growth for startups. Part 1

In this first part of the article series «Citymobil: a manual for improving availability amid business growth for startups», I’m going to break down how we managed to dramatically improve the availability of Citymobil services. The article opens with the story of our business, our task, why the need to increase availability emerged, and the constraints we faced. Citymobil is a fast-growing taxi aggregator. In 2018, its number of successfully completed trips grew more than 15-fold. In some months, trips grew by 50% over the previous month.

The business grew like a weed in every direction (and it still does): server load, team size, and the number of deployments all increased. At the same time, new threats to service availability emerged. The company faced a task of the utmost importance: how to increase availability without compromising growth. In this article, I’ll talk about how we managed to solve this task in a relatively short time.

1. Defining the task: what exactly do we want to improve?

Before improving something, we need to learn how to measure it so that we can track the improvements. The closer the metric is to business terms, the better. For our business, the most important success metric is the number of successfully completed trips (hereafter, «number of trips»). This is the metric investors look at when deciding whether to invest. The more trips, the more valuable the company.

Some trips are profitable, and some yield a loss. But we care equally about all trips, even unprofitable ones, since they increase our market share (in fact, that loss is the price we pay for market share growth). Therefore, every extra trip is a good thing, and every lost one is a bad thing. In terms of business success, all trips are equal.

Now we have an easy-to-understand availability metric: the number of lost trips, that is, trips we definitely lost due to technical issues. By «technical issue» we mean, for example, a bug in the code, an HTTP 500 Internal Server Error, an infrastructure failure, or a broken integration with a partner service (e.g., Google Maps).

2. How do we count lost trips?

Sometimes it’s easy to count lost trips, and sometimes it’s hard. For instance, in the case of a total service outage, when nothing works at all (knock on wood), counting lost trips is very easy. We know the trend of the trip-count graph before the crash, and we see its trend after the crash; we draw a line connecting the point where the downtime started to the point where it ended. The area between this line and the actual trip-count graph below it represents our lost trips.
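
The interpolation described above can be sketched in a few lines of Python. This is a minimal illustration, not Citymobil's actual tooling: it assumes trip counts are available as a list of per-window values, that `start` is the last normal window before the outage and `end` the first normal window after it, and the function name `lost_trips_total_outage` is hypothetical.

```python
def lost_trips_total_outage(counts, start, end):
    """Estimate lost trips for a total outage between windows start and end.

    Hypothetical helper: the trend is a straight line from counts[start] to
    counts[end]; lost trips are the area between that line and the actual
    counts, summed over the windows strictly inside the outage.
    """
    if not 0 <= start < end < len(counts):
        raise ValueError("need 0 <= start < end < len(counts)")
    lost = 0.0
    for i in range(start + 1, end):
        # Linear interpolation of the trend line at window i.
        frac = (i - start) / (end - start)
        trend = counts[start] + frac * (counts[end] - counts[start])
        # Only count windows where actual trips fall below the trend.
        lost += max(0.0, trend - counts[i])
    return lost


# Illustrative trips per window; the service is down in windows 3..5.
counts = [100, 102, 104, 0, 0, 0, 112]
print(lost_trips_total_outage(counts, 2, 6))  # → 324.0
```

The trend values inside the outage are 106, 108 and 110, so the estimate is their sum. Clipping at zero keeps upward noise from cancelling out the loss.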

In the graph below, the black line shows the trips on a given day and the green line shows the trips one week earlier. The x-axis is time; the y-axis is the number of trips in a time window around each x point. You can see an obvious drop shaped like an acute-angled triangle. The area of this triangle is the number of lost trips. Naturally, this number is approximate, since the graph fluctuates. However, even 10–20% precision is enough to gauge the magnitude of an incident for the business.
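
As a rough worked formula (our own approximation, not taken from the graph's data): treat the drop as a triangle whose base T is the outage duration, counted in the same time windows the y-axis uses, and whose height h is the depth of the drop in trips per window. Then:

```latex
N_{\text{lost}} \approx \tfrac{1}{2}\, T\, h
```

For example, an outage spanning T = 6 windows with a depth of h = 200 trips per window gives roughly ½ · 6 · 200 = 600 lost trips (illustrative numbers). Measuring x in windows keeps the area in units of trips.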

If the downtime is partial rather than total (again, knock on wood), the evaluation is a bit more complicated. For example, if a bug causes 10% of orders not to be dispatched to vehicles, the trip graph shows a dip followed by a rebound once the bug is fixed. In this situation, the lost trips are represented by the area bounded by the trend line on top, the actual trip-count graph at the bottom, the start of the downtime on the left, and the end of the downtime on the right.
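
Once a trend estimate exists, this bounded area can be computed numerically. A minimal sketch, assuming `trend` and `actual` are hypothetical per-window trip counts aligned on the same windows, and that the downtime covers windows `start` through `end - 1`:

```python
def lost_trips_partial_outage(trend, actual, start, end):
    """Area between the trend (top) and actual counts (bottom) over [start, end).

    Windows where actual trips exceed the trend contribute nothing, so random
    upward noise does not cancel out the loss.
    """
    if len(trend) != len(actual):
        raise ValueError("trend and actual must cover the same windows")
    pairs = zip(trend[start:end], actual[start:end])
    return float(sum(max(0.0, t - a) for t, a in pairs))


# Illustrative numbers: a bug drops about 10% of orders in windows 2..4.
trend = [200, 210, 220, 230, 240, 250]
actual = [201, 209, 198, 207, 216, 251]
print(lost_trips_partial_outage(trend, actual, 2, 5))  # → 69.0
```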

As the graph below shows, the dip isn’t that evident, but the previous week’s curve, which has no such dip, helps us see that this dip means lost trips. Conversely, comparing the day’s trips with the same day last week makes it clear that the rightmost dip does not indicate lost trips: it is a regular dip for that time of day, since the previous week shows the same dip.
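
The week-to-week check above can be expressed as a simple rule. This is an illustrative heuristic of our own, not a method described in the article: compare today's count in each window with the same window one week earlier, and flag the window only if today's ratio falls well below the typical ratio. The median-based typical ratio and the `tolerance` of 0.05 are assumptions.

```python
from statistics import median


def flag_anomalous_windows(today, last_week, tolerance=0.05):
    """Return indices of windows where today's trips dip relative to last week.

    A dip that also appears last week keeps the ratio today/last_week near its
    typical value, so it is treated as a regular daily pattern, not a loss.
    """
    ratios = [t / w for t, w in zip(today, last_week) if w > 0]
    typical = median(ratios)
    return [
        i
        for i, (t, w) in enumerate(zip(today, last_week))
        if w > 0 and t / w < typical * (1 - tolerance)
    ]


last_week = [100, 120, 60, 130, 140]
today = [110, 132, 66, 110, 154]  # window 2 dips as usual; window 3 does not
print(flag_anomalous_windows(today, last_week))  # → [3]
```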

A trend line is generally hard to build, since the graph is sawtoothed. This is where week-to-week comparison comes in handy. If we plot two lines on the same graph, the past week and the current one, we see that the curves are almost identical in shape; the only difference is that one lies above the other (usually the current week is higher than the previous one, though exceptions do happen). Week-to-week comparison is important because, for various reasons, each day of the week has a differently shaped graph. Looking at the week-to-week graph, we can tell where today’s trip trend line should be.
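
One way to turn the week-to-week comparison into a trend line for today is to scale last week's curve by the growth observed before the incident. This is a sketch under a simplifying assumption of our own: the article only says that one curve lies above the other, and here the gap is treated as a constant multiplicative factor estimated over a hypothetical baseline of windows `[0, baseline_end)`.

```python
def trend_from_last_week(today, last_week, baseline_end):
    """Estimate today's trend by scaling last week's curve.

    The scale factor is the ratio of total trips today vs. last week over the
    windows [0, baseline_end), i.e. before the incident started.
    """
    base_today = sum(today[:baseline_end])
    base_last = sum(last_week[:baseline_end])
    if base_last == 0:
        raise ValueError("no trips last week in the baseline windows")
    factor = base_today / base_last
    return [factor * w for w in last_week]


last_week = [100, 120, 60, 130, 140]
today = [110, 132, 66, 110, 154]
trend = trend_from_last_week(today, last_week, baseline_end=3)
print([round(x) for x in trend])  # → [110, 132, 66, 143, 154]
```

Here the estimated trend for window 3 is about 143 trips while only 110 were completed, so roughly 33 trips in that window would count as lost. An additive offset instead of a factor is an equally plausible reading of "one lies above the other".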

Obviously, a lost trip is a much bigger problem than just one lost trip. A client who needs a ride will find a way to get there; for example, she may use a competing service and never come back to us. Or she may come back, but only after being disappointed by our competitor, which is unlikely, since our competitors are strong. Moreover, even if the competitor disappoints the client, it’s not a given that she will return to us: she may decide that every service is equally bad and there’s no point in switching from one to another.

Therefore, one lost trip due to technical issues means, in fact, several lost trips.

To avoid confusion, let’s call the trips lost due to actual technical problems primary lost trips, and the trips lost because a client left us for a competitor secondary lost trips.

Ideally, to estimate the total business loss from one primary lost trip, we need to figure out how many secondary lost trips it generates. To do that, we multiply the number of primary lost trips by a coefficient K, which can be calculated from the average service usage rate and the average time it takes a client to return after leaving us for a competitor.
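
A back-of-the-envelope version of this estimate, under assumptions of our own: a client who leaves after a primary lost trip would otherwise have taken `rides_per_day` trips with us and stays away for `days_to_return` days, and only a fraction `leave_probability` of affected clients actually leave. The article gives neither the exact formula nor any values; all names and numbers below are hypothetical.

```python
def secondary_loss_coefficient(rides_per_day, days_to_return, leave_probability=1.0):
    """Hypothetical K: expected trips a client skips while using a competitor."""
    return leave_probability * rides_per_day * days_to_return


def total_lost_trips(primary_lost, k):
    """Primary plus secondary lost trips: primary * (1 + K)."""
    return primary_lost * (1 + k)


k = secondary_loss_coefficient(rides_per_day=0.5, days_to_return=24, leave_probability=0.25)
print(k, total_lost_trips(1000, k))  # → 3.0 4000.0
```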

Assuming that K doesn’t change much over time, counting primary lost trips is enough to understand the trend in lost trips, since the period-to-period ratio of primary lost trips is the same as the period-to-period ratio of secondary lost trips. Example: if we lost 1000 primary trips last month, then we lost 1000*K secondary trips, or 1000*(1+K) in total. If we lost 500 primary trips this month, then we lost 500*K secondary trips, or 500*(1+K) in total. So, whatever the value of K, we now lose 1000*(1+K) / (500*(1+K)) = 2 times fewer trips.
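
In general form, with P_last and P_this denoting the primary lost trips in two periods (our notation) and K constant, the factor 1 + K cancels, which is why the ratio does not depend on K:

```latex
\frac{P_{\text{last}}\,(1+K)}{P_{\text{this}}\,(1+K)}
  = \frac{P_{\text{last}}}{P_{\text{this}}}
  = \frac{1000}{500} = 2
```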

Even if K changes over time (that is, it is a function of time, K(t)), we still want to lower the number of primary lost trips. If K(t) grows over time, we definitely have to work to lose fewer primary trips, since the financial loss caused by each one keeps growing. If, on the other hand, K(t) decreases over time, it means that for some reason our users are becoming more and more loyal to us, which means we absolutely must live up to their expectations!
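
To make the argument concrete, split time into periods t (e.g., months) and let P(t) be the primary lost trips in period t; this notation is ours, not the article's. The total loss is then:

```latex
L_{\text{total}} = \sum_{t} P(t)\,\bigl(1 + K(t)\bigr)
```

Since K(t) ≥ 0 in every period, each primary lost trip avoided in period t lowers L_total by 1 + K(t) trips, however K(t) evolves; the larger K(t), the bigger the saving.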

To sum up: we are striving to reduce the number of primary lost trips. In the next part, I’m going to talk about how our process works and what we’ve done to improve it. Stay tuned!

Translated from: https://habr.com/en/company/mailru/blog/449034/