System Design Framework - Design A Code-Deployment System
- Questions
- Solution
- 1. Gathering System Requirements
- 2. Coming Up With A Plan
- 3. Build System -- General Overview
- 4. Build System -- Job Queue
- 5. Build System -- SQL Job Queue
- 6. Build System -- Concurrency
- 7. Build System -- Lost Jobs
- 8. Build System -- Scale Estimation
- 9. Build System -- Storage
- 10. Deployment System -- General Overview
- 11. Deployment System -- Replication-Status Service
- 12. Deployment System -- Blob Distribution
- 13. Deployment System -- Trigger
- 14. System Diagram
Design a global and fast code-deployment system.
Many systems design questions are intentionally left very vague and are literally given in the form of "Design Foobar". It’s your job to ask clarifying questions to better understand the system that you have to build.
Questions
Question 1
Q: What exactly do we mean by a code-deployment system? Are we talking about building, testing, and shipping code?
A: We want to design a system that takes code, builds it into a binary (an opaque blob of data—the compiled code), and deploys the result globally in an efficient and scalable way. We don’t need to worry about testing code; let’s assume that’s already covered.
Question 2
Q: What part of the software-development lifecycle, so to speak, are we designing this for? Is this process of building and deploying code happening when code is being submitted for code review, when code is being merged into a codebase, or when code is being shipped?
A: Once code is merged into the trunk or master branch of a central code repository, engineers should be able to trigger a build and deploy that build (through a UI, which we’re not designing). At that point, the code has already been reviewed and is ready to ship. So to clarify, we’re not designing the system that handles code being submitted for review or being merged into a master branch—just the system that takes merged code, builds it, and deploys it.
Question 3
Q: Are we essentially trying to ship code to production by sending it to, presumably, all of our application servers around the world?
A: Yes, exactly.
Question 4
Q: How many machines are we deploying to? Are they located all over the world?
A: We want this system to scale massively to hundreds of thousands of machines spread across 5-10 regions throughout the world.
Question 5
Q: This sounds like an internal system. Is there any sense of urgency in deploying this code? Can we afford failures in the deployment process? How long should a single deployment take?
A: This is an internal system, but we’ll want to have decent availability, because many outages are resolved by rolling forward or rolling back buggy code, so this part of the infrastructure may be necessary to avoid certain terrible situations. In terms of failure tolerance, any build should eventually reach a SUCCESS or FAILURE state. Once a binary has been successfully built, it should be shippable to all machines globally within 30 minutes.
Question 6
Q: So it sounds like we want our system to be available, but not necessarily highly available, we want a clear end-state for builds, and we want the entire process of building and deploying code to take roughly 30 minutes. Is that correct?
A: Yes, that’s correct.
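The "clear end-state" requirement can be pictured as a small state machine in which every build eventually lands in SUCCESS or FAILURE. The sketch below is only an illustration under assumptions: the QUEUED and RUNNING states, the transition table, and the helper names `is_terminal` and `advance` are hypothetical and do not come from the requirements above, which name only SUCCESS and FAILURE.

```python
from enum import Enum


class BuildStatus(Enum):
    QUEUED = "QUEUED"    # assumed intermediate state (not in the requirements)
    RUNNING = "RUNNING"  # assumed intermediate state (not in the requirements)
    SUCCESS = "SUCCESS"  # terminal state named in the requirements
    FAILURE = "FAILURE"  # terminal state named in the requirements


# Hypothetical transition table: every path ends in SUCCESS or FAILURE,
# and terminal states have no outgoing transitions.
TRANSITIONS = {
    BuildStatus.QUEUED: {BuildStatus.RUNNING, BuildStatus.FAILURE},
    BuildStatus.RUNNING: {BuildStatus.SUCCESS, BuildStatus.FAILURE},
    BuildStatus.SUCCESS: set(),
    BuildStatus.FAILURE: set(),
}


def is_terminal(status: BuildStatus) -> bool:
    """Return True if no further transitions are allowed from this state."""
    return not TRANSITIONS[status]


def advance(current: BuildStatus, new: BuildStatus) -> BuildStatus:
    """Move a build to a new state, rejecting transitions not in the table."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {new.name}")
    return new


if __name__ == "__main__":
    status = BuildStatus.QUEUED
    status = advance(status, BuildStatus.RUNNING)
    status = advance(status, BuildStatus.SUCCESS)
    print(status.name, is_terminal(status))
```

Modeling the terminal states explicitly makes it easy to check that a finished build is never reopened, which is one way to read the requirement that every build reaches a definite outcome.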
Question 7
Q: How often will we be building and deploying code, how long does it take to build code, and how big can the binaries that we’ll be deploying get?
A: Engineering teams deploy