Designing and Deploying Scalable Internet-Scale Services - On Designing and Deploying Internet-Scale Services

 

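James's first lesson, "Design for failure", is a key to the success of every Internet architecture. The engineering theory behind Internet systems is actually very simple. The content of James's paper can hardly be called theory; it is a collection of practical lessons, and how well each company understands and executes them determines whether its architecture succeeds or fails.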

 

http://www.mvdirona.com/jrh/talksAndPapers/JamesRH_Lisa.pdf

 

Three simple tenets
1. Expect failures
2. Keep things simple
3. Automate everything

1. Overall Application Design
1. Design for failure
2. Redundancy and fault recovery
3. Commodity hardware slice
4. Single-version software
5. Multi-tenancy
6. Quick service health check
7. Develop in the full environment
8. Zero trust of underlying components
9. Do not build the same functionality in multiple components
10. One pod or cluster should not affect another pod or cluster
11. Allow rare emergency human intervention
12. Keep things simple and robust
13. Enforce admission control at all levels
14. Partition the service
15. Understand the network design
16. Analyze throughput and latency
17. Treat operations utilities as part of the service
18. Understand access patterns
19. Version everything
20. Keep the unit/functional tests from the last release
21. Avoid single points of failure

2. Automatic Management and Provisioning
1. Be restartable and redundant
2. Support geo-distribution
3. Automatic provisioning and installation
4. Configuration and code as a unit
5. Manage server roles or personalities rather than servers
6. Multi-system failures are common
7. Recover at the service level
8. Never rely on local storage for non-recoverable information
9. Keep deployment simple
10. Fail services regularly

3. Dependency Management
1. Expect latency
2. Isolate failures
3. Use shipping and proven components
4. Implement inter-service monitoring and alerting
5. Dependent services require the same design point
6. Decouple components
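
As one illustration of "Expect latency" and "Isolate failures", a caller can stop waiting on a dependency that keeps failing and fail fast instead. The following is a minimal sketch of a circuit breaker, not code from the paper; the names CircuitBreaker, CircuitOpenError, max_failures and reset_after are hypothetical and chosen only for this example.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected without contacting the dependency."""


class CircuitBreaker:
    """Fail fast after repeated failures instead of piling up slow calls.

    Hypothetical helper for illustration; thresholds are assumptions.
    """

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("dependency marked unhealthy")
            # Cool-down elapsed: allow one trial call (half-open state).
            # A single further failure re-opens the circuit.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Any success resets the failure count.
        self.failures = 0
        return result
```

In use, every call to a remote dependency would go through breaker.call(...), with the wrapped function carrying its own timeout so that a slow dependency counts as a failure rather than blocking the caller.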

4. Release Cycle and Testing
1. Ship often
2. Use production data to find problems
3. Invest in engineering
4. Support version roll-back
5. Maintain forward and backward compatibility
6. Single-server deployment
7. Stress test for load
8. Perform capacity and performance testing prior to new releases
9. Build and deploy shallowly and iteratively
10. Test with real data
11. Run system-level acceptance tests
12. Test and develop in full environments

5. Hardware Selection and Standardization
1. Use only standard SKUs
2. Purchase full racks
3. Write to a hardware abstraction
4. Abstract the network and naming

6. Operations and Capacity Planning
1. Make the development team responsible
2. Soft delete only
3. Track resource allocation
4. Make one change at a time
5. Make everything configurable
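
"Soft delete only" can be pictured as marking a record deleted rather than removing it, so that an operator mistake stays recoverable. The following is a minimal in-memory sketch under that reading; SoftDeleteStore and its methods are hypothetical names, and a real service would keep the deletion marker in its database.

```python
import time


class SoftDeleteStore:
    """Key-value store where delete only marks a record as deleted."""

    def __init__(self, clock=time.time):
        self._rows = {}     # key -> value, never removed by delete()
        self._deleted = {}  # key -> timestamp of the soft delete
        self._clock = clock

    def put(self, key, value):
        self._rows[key] = value
        self._deleted.pop(key, None)  # writing a key makes it live again

    def get(self, key):
        if key in self._deleted:
            raise KeyError(key)  # hidden from readers, but still stored
        return self._rows[key]

    def delete(self, key):
        if key not in self._rows:
            raise KeyError(key)
        self._deleted[key] = self._clock()

    def restore(self, key):
        # Undo a mistaken delete; the data was never lost.
        del self._deleted[key]
```

Readers see a deleted key as missing, while restore brings it back with its original value.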

7. Auditing, Monitoring and Alerting
1. Instrument everything
2. Data is the most valuable asset
3. Have a customer view of service
4. Instrumentation required for production testing
5. Latencies are the toughest problem
6. Have sufficient production data
7. Configurable logging
8. Expose health information for monitoring
9. Make all reported errors actionable
10. Enable quick diagnosis of production problems

8. Graceful Degradation and Admission Control
1. Support a "big red switch"
2. Control admission
3. Meter admission
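
The three items above can be sketched together: a "big red switch" that sheds non-critical work, plus a token bucket that meters how fast requests are admitted, for example while a recovering service is brought back slowly. This is a minimal sketch and not the paper's design; AdmissionController, rate_per_sec, burst and red_switch are hypothetical names, and the token bucket is one common way to meter admission.

```python
import time


class AdmissionController:
    """Token-bucket admission with a switch to shed non-critical load."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = rate_per_sec    # tokens added per second
        self.burst = burst          # maximum tokens held at once
        self.clock = clock
        self.tokens = float(burst)
        self.last = clock()
        self.red_switch = False     # True: admit only critical requests

    def admit(self, critical=False):
        # Big red switch: reject non-critical work outright.
        if self.red_switch and not critical:
            return False
        # Refill tokens for the time elapsed since the last refill.
        now = self.clock()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Raising rate_per_sec step by step after an outage lets traffic ramp up gradually instead of hitting a cold service all at once.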

9. Customer and Press Communication Plan

10. Customer Self-Provisioning and Self-Help
