
A LinkedIn friend with an amazing proof-of-concept web application recently asked me how to scale it. That’s an excellent question, but the short and probably very unsatisfying answer I gave was to wait until performance drops, then address the most significant contributor to the drop. Lather, rinse, repeat.

I can picture readers with thought bubbles full of objections at this point: what about Content Delivery Networks (CDNs), caching, load balancers, auto-scalers, and so on? Yes, they are all great tactics, but only when needed. If your web site traffic peaks at 1,000 requests per hour, then setting up caching, load balancers, and multiple web server instances adds complexity and operational cost without adding any value. On the other hand, if you are certain your web site will receive millions of users on a daily basis, then you should definitely architect for that kind of load from the very beginning. Doing so takes significant effort and operational cost, so I hope your projections are right.
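
To put 1,000 requests per hour in perspective, here’s a back-of-the-envelope sketch. It assumes requests are spread evenly over the hour (real traffic is burstier) and a hypothetical 0.5-second average response time. It then applies Little’s law: the average number of requests in flight equals the arrival rate multiplied by the time each request spends in the system.

```python
def requests_per_second(requests_per_hour: float) -> float:
    """Convert an hourly request count into an average per-second arrival rate."""
    return requests_per_hour / 3600


def requests_in_flight(rate_per_second: float, response_time_s: float) -> float:
    """Little's law: average concurrency = arrival rate * average time in system."""
    return rate_per_second * response_time_s


rate = requests_per_second(1_000)          # roughly 0.28 requests per second
in_flight = requests_in_flight(rate, 0.5)  # assumed 0.5 s average response time
print(f"{rate:.2f} req/s, {in_flight:.2f} requests in flight on average")
```

With well under one request in flight on average, there is little for a load balancer or a cache to smooth out at that scale.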

You will likely be able to keep up with your site’s growth in most scenarios as long as you watch the site performance like a hawk, and swoop in when growth causes the numbers to trend upwards.


The following advice is a distillation of experiences I’ve picked up throughout my career as a software engineer and software architect. I’ve worked on large-scale e-commerce web sites and on mission-critical enterprise systems with millions of lines of code serving hundreds of sites across vast geographical areas. I also performance-engineered an e-commerce web site that prepared for tens of millions of customers trying to claim limited inventory in scheduled sales events.

This article will not make you an expert in performance engineering, but hopefully it will give you a broad idea of indicators and matching tactics, pointers to techniques and concepts to read up on, and maybe some ideas on where to look next when a performance indicator goes red and you need to figure out where to start.


Page Response Times

Companies promoting or doing business over the Internet have learned that Search Engine Optimization (SEO) and page response times are key factors to grow, retain and convert visitors. In my experience, SEO is largely driven by marketing and product input, while page load time is very much an engineering responsibility.

Bounce rate (the share of visitors who arrive and quickly leave the site) and conversion rate (the share of visitors who complete a purchase) are top key performance indicators (KPIs) for sites that generate revenue from visitors. Higher bounce rates and lower conversion rates mean lower revenue.
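
Both KPIs are simple ratios over sessions. Here’s a minimal sketch, assuming a hypothetical `Session` record and one common definition of a bounce (a session that views a single page). Analytics tools each define these slightly differently.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One visit to the site (hypothetical shape; real analytics exports differ)."""
    pages_viewed: int
    purchased: bool


def bounce_rate(sessions: list[Session]) -> float:
    """Share of sessions that viewed a single page and left."""
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if s.pages_viewed <= 1) / len(sessions)


def conversion_rate(sessions: list[Session]) -> float:
    """Share of sessions that completed a purchase."""
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if s.purchased) / len(sessions)


visits = [Session(1, False), Session(4, True), Session(1, False), Session(3, False)]
print(bounce_rate(visits), conversion_rate(visits))  # 0.5 0.25
```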

Longer page load times lead to higher bounce rates and lower conversion rates. Google has been factoring speed into page rankings since July 2018, and it announced further updates originally scheduled to land in the spring of 2020, which were postponed to 2021 due to the COVID-19 pandemic.

Page load times can be affected by a range of factors, including the device the browser is running on, the network between the browser and the web server, and the design and implementation of the web application that handles the request. While you don’t have control over the browsers and the network, you can optimize your content to work better with challenging environments.

Sample 2-second page load time broken down by time spent in the browser (0.3 seconds), transport over the Internet (0.8 seconds), and the web application (0.9 seconds)

Web application performance is the sum of many parts: the code, the other services it calls such as databases and file systems, the third-party dependencies it uses, the infrastructure it is deployed on, the network that web requests travel over, the devices the browsers run on, and the browsers that make the requests. Each of those parts has the potential to be the weakest link in the chain, and thus the bottleneck that limits overall performance no matter how well every other link performs.
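
To find the weakest link, compare each part’s share of the total time. Here’s a minimal sketch using the sample 2-second breakdown (300 ms browser, 800 ms Internet, 900 ms web application). The component names are illustrative.

```python
def find_bottleneck(breakdown_ms: dict[str, int]) -> tuple[str, float]:
    """Return the largest contributor and its share of the total time."""
    total = sum(breakdown_ms.values())
    name = max(breakdown_ms, key=breakdown_ms.get)
    return name, breakdown_ms[name] / total


# Sample page load from the breakdown above, in milliseconds.
sample = {"browser": 300, "internet": 800, "web application": 900}
name, share = find_bottleneck(sample)
print(f"{name}: {share:.0%} of {sum(sample.values())} ms")  # web application: 45% of 2000 ms
```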

Example of a browser sending a request over the Internet to a web server that talks to file storage and database services to render a response
Web server running two instances of the web application in Docker containers

Data-Driven Optimization

A very common source of opportunity cost and unnecessary complexity is premature optimization. Software engineers learn to rely on past experience to quickly identify solutions to new challenges by pattern matching against solutions they’ve come up with or used in the past. That generally works well, except when it comes to solving performance issues: the issue that turned out to be the most significant performance bottleneck in one project will not necessarily be a significant contributor in another.

Unfortunately, scaling web applications is not a once-and-done thing.


Different web sites have different access patterns based on their usage and their visitors. Even the exact same code base and deployment infrastructure may expose different performance bottlenecks based on concrete access patterns and behaviors of visitors — which could change over time.

In other words, addressing performance issues needs to be strictly data driven. Datadog and New Relic are examples of Application Performance Management (APM) services that automatically instrument your entire code base and help identify where the majority of the time was spent while handling requests in your production, staging, and other environments. That might be one or more slow database queries, file loading, or time spent in a particular hierarchy of function calls.
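
APM agents do this automatically and far more thoroughly, but the underlying idea is simple: time named spans of work and see which ones dominate. The following hand-rolled sketch only illustrates that idea. It does not show how Datadog or New Relic work internally, and `handle_request` and its sleeps are hypothetical stand-ins for real work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per named span.
spans: dict[str, float] = defaultdict(float)


@contextmanager
def span(name: str):
    """Time the wrapped block and add the elapsed time to spans[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        spans[name] += time.perf_counter() - start


def handle_request() -> None:
    # Hypothetical request handler; the sleeps stand in for real work.
    with span("database"):
        time.sleep(0.05)
    with span("render"):
        time.sleep(0.01)


handle_request()
total = sum(spans.values())
for name, seconds in sorted(spans.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:10s} {seconds * 1000:6.1f} ms  {seconds / total:.0%}")
```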

Setting up a dashboard on a big screen with real-time, high level indicators of your web site’s performance is a great way to spot unusual behavior at a glance so you can take action to look up the detailed metrics before things spin out of control.
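
A single red/green indicator on such a dashboard might be driven by a rolling 95th-percentile response time. Here’s a minimal sketch. It assumes response times arrive one at a time in milliseconds and uses a hypothetical 500 ms threshold for red.

```python
from collections import deque
from statistics import quantiles


class RollingP95:
    """Track the last `window` response times and flag a high 95th percentile."""

    def __init__(self, window: int, threshold_ms: float) -> None:
        self.samples: deque[float] = deque(maxlen=window)
        self.threshold_ms = threshold_ms

    def record(self, response_ms: float) -> None:
        self.samples.append(response_ms)  # the oldest sample drops off when full

    def p95(self) -> float:
        if len(self.samples) < 2:
            return self.samples[0] if self.samples else 0.0
        # With n=20, quantiles() returns 19 cut points; the last is the 95th percentile.
        return quantiles(self.samples, n=20, method="inclusive")[-1]

    def is_red(self) -> bool:
        return self.p95() > self.threshold_ms


monitor = RollingP95(window=100, threshold_ms=500)
for ms in [120, 180, 95, 2400, 130]:
    monitor.record(ms)
print(f"p95={monitor.p95():.0f} ms, red={monitor.is_red()}")
```

A percentile is used here instead of an average because a handful of very slow requests can hide behind a healthy-looking mean.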


Dashboard with key web site performance indicators
Dashboard with key AWS performance indicators

High-Level Approach

The following “flow chart” maps common issues to general tactics that are each worth a book or article in their own right. The list below is by no means exhaustive, and particular circumstances of any specific web application could reasonably dictate other tactics.

If you’re using free web application hosting, you’ve probably noticed your application takes seconds to respond. It’s a great way to try out concepts and host portfolio projects, but operating this kind of service is not cheap, so the providers throttle performance and host many applications on a limited infrastructure. When you get serious about growing a web site, you’ll have to pay to play.

If monitoring shows


  • most of the time is spent serving static assets like images, CSS and JavaScript, then move those assets to an edge cached CDN such as CloudFlare or CloudFront, optimize the images for web and performance, and minify static CSS and JavaScript,

  • most of the time is still spent serving images from the CDN, then lazy load images below-the-fold,


  • most of the time is spent in database queries, then consider whether query results can be cached, and use the query analyzer built into all major databases to understand whether to add indexes, rearrange inefficient queries, or apply whatever other tactics the query analyzer suggests (PostgreSQL, MySQL, Microsoft SQL Server, Oracle). A common pitfall when accessing the database via an ORM (Object Relational Mapping) is inadvertently causing n+1 selects,

  • most of the time is spent rendering responses on the server, then check if all or parts of responses can be cached, and examine what is eating up the time — maybe there’s inefficient looping, unused data that could be filtered out and so on,

  • the disk I/O bandwidth is high, then examine whether the web server is loading large files, or a lot of files. If those files are passed straight through as responses, consider caching them with a CDN or with a memory cache. If the files are used in computations, consider whether some or all of the data can be precomputed and/or cached,

  • most of the time is spent transferring dynamically generated content, or rendering in the browser, then consider lazy loading content below-the-fold,

  • the CPU load is high, then see if most of the time is spent in a particular function. If so, consider the trade-off between refactoring and optimizing (which might not be possible) versus scaling up, or scaling out with a load balancer and multiple parallel instances,

  • the memory load is high, then check for memory leaks and consider increasing the RAM. In some cases handling big query results may eat up all available memory, or an algorithm may use more memory than anticipated; sometimes the fix is refactoring code, other times scaling vertically,

  • network latency grows into the most significant bottleneck the farther browsers are from your web server, then figure out how to run your web servers closer to your users (I’m currently writing a blog post about this), or

  • network latency is the most significant bottleneck in general, then scaling horizontally or vertically may solve the problem if you’re hosting the web application in the cloud; otherwise you’ll have to figure out how to increase the network bandwidth to the machine that hosts your web application.
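
The list above is essentially a lookup from the dominant contributor to a tactic. Here’s a minimal sketch of that lookup. It covers only the “most of the time is spent in…” cases, and the metric names are hypothetical: they would need to match what your APM reports. The utilization-based cases (disk I/O, CPU, memory) would need thresholds rather than a simple maximum.

```python
# Hypothetical names for "where most of the time is spent";
# map them to whatever your APM actually reports.
TACTICS = {
    "static_assets": "Move assets to an edge-cached CDN, optimize images, minify CSS and JavaScript.",
    "cdn_images": "Lazy load images below the fold.",
    "database": "Cache query results, run the query analyzer, and look for n+1 selects.",
    "server_render": "Cache all or part of the response and profile the rendering code.",
    "client_transfer": "Lazy load dynamically generated content below the fold.",
    "network_latency": "Run web servers closer to users, or scale for more bandwidth.",
}


def suggest(time_spent_ms: dict[str, float]) -> tuple[str, str]:
    """Return the component with the most time spent and the matching tactic."""
    dominant = max(time_spent_ms, key=time_spent_ms.get)
    return dominant, TACTICS.get(dominant, "No mapping yet; dig into the traces manually.")


component, tactic = suggest({"database": 640, "server_render": 120, "static_assets": 90})
print(f"{component}: {tactic}")
```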

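The n+1 pitfall is easiest to see without an ORM in the way. This sketch uses Python’s built-in `sqlite3` module and a hypothetical authors/posts schema, and it counts executed statements with `set_trace_callback`. An ORM that lazily loads each post’s author issues the same pattern of queries behind the scenes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO posts VALUES (1, 1, 'Engines'), (2, 2, 'Compilers'), (3, 1, 'Notes');
    """
)

# Record every SQL statement the connection executes from here on.
statements: list[str] = []
conn.set_trace_callback(statements.append)

# n+1 pattern: one query for the posts, then one more query per post for its author.
posts = conn.execute("SELECT id, author_id, title FROM posts").fetchall()
n_plus_one = [
    (title, conn.execute("SELECT name FROM authors WHERE id = ?", (author_id,)).fetchone()[0])
    for _, author_id, title in posts
]
n_plus_one_count = len(statements)

# Fix: a single JOIN fetches the same data in one query.
statements.clear()
joined = conn.execute(
    "SELECT p.title, a.name FROM posts p JOIN authors a ON a.id = p.author_id ORDER BY p.id"
).fetchall()
join_count = len(statements)

print(f"n+1 approach: {n_plus_one_count} queries, join: {join_count} query")
```

Most ORMs offer some form of eager loading (a join or a batched `IN (...)` query) to collapse the per-row queries; the exact API depends on the ORM.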

Iterate on the Currently Most Significant Bottleneck

Performance metrics output from your APM will likely show the time and percentage spent across the tech stack when serving a particular request.


You might be tempted to simply work down the ranked list of contributors from a single performance measurement, but once the most significant bottleneck has been addressed, there’s a good chance the next most significant bottleneck will turn out to be something other than the one ranked second in the first measurement.
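
The difference is re-measuring after every fix instead of working down the original ranking. Here’s a minimal sketch with a simulated system; `measure`, `fix`, and the numbers are hypothetical. In this made-up scenario, network time ranks second at first, but once the database is fixed, rendering becomes the largest contributor.

```python
from typing import Callable

Breakdown = dict[str, float]


def optimize(measure: Callable[[], Breakdown], fix: Callable[[str], None],
             budget_ms: float, max_rounds: int = 5) -> list[str]:
    """Fix the biggest contributor, re-measure, and repeat until within budget."""
    fixed: list[str] = []
    for _ in range(max_rounds):
        breakdown = measure()  # fresh numbers every round, not the first ranking
        if sum(breakdown.values()) <= budget_ms:
            break
        worst = max(breakdown, key=breakdown.get)
        fix(worst)
        fixed.append(worst)
    return fixed


# Simulated system with hypothetical timings in milliseconds.
state = {"database": 900.0, "render": 200.0, "network": 300.0}


def measure() -> Breakdown:
    return dict(state)


def fix(component: str) -> None:
    if component == "database":
        state["database"] = 100.0
        state["render"] = 500.0  # made-up side effect: rendering is now the hot spot
    else:
        state[component] *= 0.5


rounds = optimize(measure, fix, budget_ms=800)
print(rounds)  # ['database', 'render']
```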


Cycle of monitoring and optimizing

If your web application is deployed to a cloud Platform as a Service (PaaS) or Infrastructure as a Service (IaaS), then beware that your local development environment is likely not a good place to evaluate performance.

Stack Ranking of Performance Tactics

Here’s my highly opinionated, stack-ranked list of general performance tactics. I’m well aware that particular circumstances can turn it upside down, but in my 30+ years of building enterprise applications, web applications, and odds and ends in between, this is what I generally consider when thinking about scalability.

My ranking is broadly based on impact versus cost and effort. In that sense, the best tactics are the ones with the highest impact and lowest cost/effort. It’s often relatively easy to identify and stay away from low impact and high cost/effort tactics, but low impact and low cost/effort can be tempting because the cost/effort is low, so what’s the harm? Opportunity cost. Hunter Walk calls this snacking.
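
One way to make such a ranking explicit is to score each tactic’s impact and effort, label its quadrant, and sort by impact per unit of effort. The scores below are hypothetical team estimates. Apart from “snacking”, the quadrant labels are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Tactic:
    name: str
    impact: int  # 1 (low) to 10 (high), a team estimate
    effort: int  # 1 (low) to 10 (high), a team estimate


def quadrant(t: Tactic, midpoint: int = 5) -> str:
    """Place a tactic in the impact/effort grid."""
    high_impact = t.impact > midpoint
    high_effort = t.effort > midpoint
    if high_impact and not high_effort:
        return "do first"
    if high_impact:
        return "plan"
    if not high_effort:
        return "snacking"
    return "avoid"


def rank(tactics: list[Tactic]) -> list[Tactic]:
    """Order by impact per unit of effort, highest first."""
    return sorted(tactics, key=lambda t: t.impact / t.effort, reverse=True)


backlog = [
    Tactic("Serve static assets via a CDN", impact=8, effort=2),
    Tactic("Rewrite the service in another language", impact=3, effort=9),
    Tactic("Tweak a rarely used admin page", impact=2, effort=2),
]
for t in rank(backlog):
    print(f"{quadrant(t):9s} {t.name}")
```

Sorting by ratio alone would still put a snack above a costly rewrite, so the quadrant label is printed alongside it: a cheap snack still carries opportunity cost.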

Impact (low to high, vertical axis) versus effort (low to high, horizontal axis); the low-effort, low-impact quadrant is snacking. From Des Traynor’s blog post “The first rule of prioritization: No snacking”

Scalability Tactics That Often Matter

The tactics in this cohort are fairly low effort and often have a high impact since assets like video, images, CSS, and JavaScript are usually some of the biggest responses returned to web browsers.


  • Scale images to intended display resolutions and compress them. This is particularly important for mobile devices; they are often on slower networks, have less RAM and less CPU power to process incoming data, thus resulting in a slower experience.

  • Serve static assets via a CDN. The major cloud service providers all offer CDN solutions like AWS CloudFront, GCP Cloud CDN, and Azure CDN. If you’re not deploying to the cloud, CloudFlare has your back.

  • Scale the web server hosts vertically and horizontally to a size where CPU, memory, and network bandwidth are not the limiting factor. This is easy to test if you’re deploying to the cloud: start with a small instance type and see if the response time drops when you add more instances. Whenever CPU or memory runs high, or the maximum number of requests handled per second plateaus, bump the instance type, go back to one instance, and repeat the process. AWS often ties network and I/O bandwidth to the instance size.

  • Lazy load assets below-the-fold.
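
On the markup side, image scaling and lazy loading come down to a few attributes. This minimal sketch assumes you already produce resized image files; the path and alt text are hypothetical. It computes display dimensions that preserve aspect ratio and adds the browser’s native `loading="lazy"` attribute to images below the fold. Setting `width` and `height` also lets the browser reserve space before the image arrives.

```python
from html import escape


def fit_within(src_w: int, src_h: int, max_w: int) -> tuple[int, int]:
    """Scale dimensions down to max_w, preserving aspect ratio (never upscale)."""
    if src_w <= max_w:
        return src_w, src_h
    return max_w, round(src_h * max_w / src_w)


def img_tag(src: str, alt: str, width: int, height: int, above_fold: bool) -> str:
    """Render an <img> sized for its slot; defer loading if it is below the fold."""
    attrs = [
        f'src="{escape(src)}"',
        f'alt="{escape(alt)}"',
        f'width="{width}"',    # explicit size reserves layout space before load
        f'height="{height}"',
    ]
    if not above_fold:
        attrs.append('loading="lazy"')  # native browser lazy loading
    return f"<img {' '.join(attrs)}>"


w, h = fit_within(4000, 3000, 800)  # (800, 600)
print(img_tag("/img/sofa-800.jpg", "Grey sofa", w, h, above_fold=False))
```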

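The scale-out step above needs a working definition of “plateau”. One simple rule, assumed here for illustration: stop adding instances of a given type once one more instance raises the maximum requests per second by less than 10%. The measurements are hypothetical load-test results.

```python
def find_plateau(throughput: list[float], min_gain: float = 0.10) -> int:
    """Return the instance count after which one more instance gains < min_gain.

    throughput[i] is the measured max requests/second with i + 1 instances.
    Returns len(throughput) if every step still gained at least min_gain.
    """
    for i in range(1, len(throughput)):
        gain = (throughput[i] - throughput[i - 1]) / throughput[i - 1]
        if gain < min_gain:
            return i  # i instances was the last worthwhile step
    return len(throughput)


# Hypothetical load-test results for 1 to 5 instances of one instance type.
measured = [180.0, 350.0, 510.0, 530.0, 535.0]
print(find_plateau(measured))  # 3
```

Here the fourth instance adds under 4%, so three instances is the point to try the next instance type.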

Scalability Tactics That Sometimes Matter

The tactics in this cohort are commonly the ones developers think of first, without data to pinpoint performance bottlenecks. They are often high effort and could very well have low impact when done on a hunch.


Scalability Tactics That Rarely Matter

I have a very pragmatic view of technologies. When applied thoughtfully, one technology may have a number of advantages over another in a particular situation, but I’ve rarely seen raw performance come up as a key deciding point when choosing between similar technologies.

Of course there are outliers, like preparing an online ticket system to handle Beyoncé opening up ticket sales for 5 concerts at a 70,000 seat stadium at noon, July 1st.


Level Up and Prevent Performance Regression

Once you’ve got detailed performance metrics in place and the team has become used to applying data-driven optimization, it is time to level up and prevent performance regressions from hitting production.


The approach is straightforward: Set response time goals for all pages that contribute to revenue and add tests to your continuous integration/continuous deployment (CI/CD) pipeline to automatically fail builds that cause performance regression.

This can be achieved by testing for response times in integration tests, integrating Google Chrome Lighthouse, or using other third party services that test response times.
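
The integration-test option can be as small as a timing assertion. This is a minimal sketch using Python’s `unittest`. `render_product_page` and the 200 ms budget are hypothetical placeholders; a real suite would request the page from a test deployment.

```python
import time
import unittest
from statistics import quantiles
from typing import Callable


def p95_ms(fn: Callable[[], object], runs: int = 20) -> float:
    """Call fn `runs` times (at least 2) and return the 95th-percentile latency in ms."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000)
    return quantiles(timings, n=20, method="inclusive")[-1]


def render_product_page() -> str:
    # Hypothetical stand-in: a real suite would request the page from a
    # test deployment instead of building a string in-process.
    return "<html>" + "<li>item</li>" * 1000 + "</html>"


class ResponseTimeBudget(unittest.TestCase):
    BUDGET_MS = 200  # hypothetical budget for a revenue-critical page

    def test_product_page_within_budget(self) -> None:
        self.assertLess(p95_ms(render_product_page), self.BUDGET_MS)
```

Run it as a pipeline step with your usual test runner (for example `python -m unittest`) so that a blown budget fails the build. Timings on shared CI runners are noisy, so leave headroom in the budget.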

Level Up and Stay Ahead of Growth

When your site has been growing for a while, revenue is going up, and you begin to worry about hockey stick growth and peak demand, it is time to think about load testing.


Load testing is about simulating realistic user behaviors such as customers browsing the site, putting items in the cart, creating accounts, and checking out. You’ll know from your analytics what the frequency and ratio are for each of these actions. Model user behaviors in tools like Locust or Apache JMeter, then watch how your application performs as you ramp up above your current level of user interactions.
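
Locust expresses this kind of mix with weighted tasks (for example `@task(3)`). This stdlib-only sketch shows the same idea without depending on Locust’s API. It draws actions in proportion to the ratios your analytics report, then steps the simulated user count up past today’s level. The weights and user counts are hypothetical.

```python
import random
from collections import Counter

# Hypothetical ratios from analytics: out of every 100 actions,
# 70 are browsing, 15 add to cart, 10 create accounts, and 5 check out.
BEHAVIOR_WEIGHTS = {
    "browse": 70,
    "add_to_cart": 15,
    "create_account": 10,
    "checkout": 5,
}


def simulated_actions(count: int, seed: int = 42) -> list[str]:
    """Draw `count` user actions in proportion to BEHAVIOR_WEIGHTS."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    actions = list(BEHAVIOR_WEIGHTS)
    weights = list(BEHAVIOR_WEIGHTS.values())
    return rng.choices(actions, weights=weights, k=count)


def ramp(current_users: int, factor: float, steps: int) -> list[int]:
    """Concurrent-user targets from today's level up to `factor` times today's level."""
    return [round(current_users * (1 + (factor - 1) * s / steps)) for s in range(steps + 1)]


print(Counter(simulated_actions(10_000)).most_common())
print(ramp(current_users=200, factor=3.0, steps=4))  # [200, 300, 400, 500, 600]
```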

If you understand how your users are behaving on your web site, then load testing will help you expose the next weakest link when traffic grows. However, growth might come from a cohort that doesn’t behave like existing customers. Oh well, as your sophistication grows, the complexity grows as well…

Conclusion

Scaling web applications is much less of an art when tactics are driven by data rather than hunches. “Measure thrice, cut once” is sage advice that also applies to focusing your efforts on optimizing the performance of your web application.

Set page load thresholds, test against them in your CI/CD pipeline, and you’ll be able to free up time for building new features without worrying about breaking performance and page ranking.

Model known user behavior and stay ahead of growth with load testing.


Want to learn more details? Here are some great resources:

We welcome thoughts, comments and counterpoints to help us learn, evolve and grow. Participate and let us know what you think!

Please take a look at our careers page if you found this interesting and would like to come help us build and scale our software for the future.

A big shout out to Stephanie Cheney and Sophie Parker for very helpful feedback!

Translated from: https://medium.com/fernish-tech/how-to-scale-a-web-application-f1250a9dbf59

