From: http://netmesh.info/jernst/Comments/sdforum-ebay-architecture.html
This week, I attended a very interesting presentation by Dan Pritchett and Randy Shoup, both senior technologists at eBay, on eBay's architecture. Some of it was as I would have expected; other things were, shall we say, counter-intuitive. Here is a random collection of notes, with some special exclamation marks:
- 212 million registered users, 1 billion photos
- 1 billion page views a day, 105 million listings, 2 petabytes of data, 3 billion API calls a month
- growth by something like a factor of 35 in page views, e-mails sent, and bandwidth from June 1999 to Q3/2006.
- 99.94% availability, measured as "all parts of site functional to everybody" vs. at least one part of a site not functional to some users somewhere
- 15,000 application servers, all J2EE. About 100 groups of functionality aka "apps". Notion of a "pool": "all the machines that deal with selling"... Well over 200 databases.
- Everything is planned with the question "what if load increases by 10x". Scaling only horizontal, not vertical: many parallel boxes.
- leverages MSXML framework for presentation layer (even in Java)
- Oracle databases, WebSphere Java (still 1.3.1)
- split databases by primary access path, modulo on a key
- every database has at least 3 on-line copies. Distributed over 8 data centers
- some database copies run 15 min behind, others 4 hours behind
- no stored procedures. some very simple triggers.
- CPU-intensive work moved out of the database layer to the application layer: referential integrity, joins, sorting done in the application layer! Reasoning: app servers are cheap, databases are the bottleneck.
- no client-side transactions. no distributed transactions
- J2EE: use servlets, JDBC, connection pools (with rewrite). Not much else.
- no state information in application tier. transient state maintained in cookie or scratch database
- app servers do not talk to each other -- strict layering of architecture
- Search, in 2002: 9 hours to update the index running on largest Sun box available -- not keeping up
- Average item on site changes its search data 5 times before it is sold (e.g. price), so real-time search results are extremely important.
- "Voyager": real-time feeder infrastructure built by eBay. Uses reliable multicast from primary database to search nodes, in-memory search index, horizontal segmentation into N slices, load-balancing over M instances, and cached queries
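
To make two of the more counter-intuitive notes concrete (splitting databases by "modulo on a key", and doing joins, sorting and referential-integrity checks in the application layer), here is a rough sketch. It is my own illustration, not eBay's code: the talk described a Java stack, while this uses Python with in-memory dictionaries standing in for databases. The shard count and the names `shard_for`, `ShardedTable` and `items_with_sellers` are all made up for the example.

```python
NUM_SHARDS = 4  # hypothetical shard count; the talk did not give a number


def shard_for(key: int, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard by taking the key modulo the number of shards."""
    return key % num_shards


class ShardedTable:
    """In-memory stand-in for one logical table split over several databases."""

    def __init__(self, key_field: str, num_shards: int = NUM_SHARDS):
        self.key_field = key_field
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, row: dict) -> None:
        key = row[self.key_field]
        self.shards[shard_for(key, len(self.shards))][key] = row

    def get(self, key: int):
        # A lookup by the primary access key touches exactly one shard.
        return self.shards[shard_for(key, len(self.shards))].get(key)


def items_with_sellers(items: ShardedTable, users: ShardedTable) -> list:
    """Join items to sellers and sort by price, all in the application tier."""
    joined = []
    for shard in items.shards:
        for item in shard.values():
            seller = users.get(item["seller_id"])
            if seller is None:
                # Referential integrity is checked here, not by a foreign key.
                raise ValueError(f"item {item['item_id']} has unknown seller")
            joined.append({**item, "seller_name": seller["name"]})
    # Sorting also happens in the application, not via ORDER BY.
    return sorted(joined, key=lambda row: row["price"])


users = ShardedTable("user_id")
items = ShardedTable("item_id")
users.put({"user_id": 7, "name": "alice"})
users.put({"user_id": 10, "name": "bob"})
items.put({"item_id": 101, "seller_id": 7, "price": 25.0})
items.put({"item_id": 102, "seller_id": 10, "price": 9.5})
items.put({"item_id": 103, "seller_id": 7, "price": 14.0})

result = items_with_sellers(items, users)
```

The point of the design, as I understood it, is that the join costs CPU on cheap, horizontally scaled app servers while each database only answers simple keyed lookups. A real system would of course batch those lookups per shard rather than fetch sellers one at a time.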