http://www.infoq.com/presentations/Social-Networks-NoSQL
monolithic single service, synchronous
Asynchronous Services
php-amqp (http://code.google.com/p/php-amqp/)
Activity Stream
Social Network Problem (Twitter Problem)
• >15 different Events
• Timelines
• Aggregation
• Filters
• Privacy
18M Events/day sent to ~150 friends
=> 2700M timeline inserts / day
20% during peak hour
=> 3.6M event inserts/hour - 1000/s
=> 540M timeline inserts/hour - 150000/s
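The load figures above can be re-derived with a few lines of arithmetic. This is only a worked check of the numbers in the notes; the variable names are made up for illustration.

```python
# Back-of-the-envelope load estimate using the figures from the notes.
EVENTS_PER_DAY = 18_000_000   # events generated per day
AVG_RECIPIENTS = 150          # friends that receive each event
PEAK_HOUR_PERCENT = 20        # share of daily traffic that falls in the peak hour
SECONDS_PER_HOUR = 3600

# Every event is copied into the timeline of each recipient.
timeline_inserts_per_day = EVENTS_PER_DAY * AVG_RECIPIENTS

# Traffic during the single busiest hour.
peak_events_per_hour = EVENTS_PER_DAY * PEAK_HOUR_PERCENT // 100
peak_timeline_per_hour = timeline_inserts_per_day * PEAK_HOUR_PERCENT // 100

# Per-second rates during that hour.
peak_events_per_sec = peak_events_per_hour // SECONDS_PER_HOUR
peak_timeline_per_sec = peak_timeline_per_hour // SECONDS_PER_HOUR

print(timeline_inserts_per_day, peak_events_per_sec, peak_timeline_per_sec)
```

The fan-out factor of 150 is what turns a modest 1000 events/s into 150000 timeline writes/s.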
• Nginx + Janitor
• Embedded Jetty + RESTeasy
• NoSQL Storage Backends
Schema:
Message → [distribute uid, body]
(index)recipient_time_line → [itemId, time, type]
(index)sender_time_line → [itemId, time, type]
• Identify Users with recipient lists >{limit}
• Only push updates with recipients <{limit} to MRI (recipient_time_line)
• Pull special profiles and users with >{limit} from ORI (sender_time_line)
• Identify active users with a bloom/bit filter for pull
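The push/pull split above can be sketched roughly as follows. This is a minimal in-memory sketch and not the implementation from the talk: plain dicts stand in for the NoSQL indexes, PUSH_LIMIT is a hypothetical value for {limit}, all function names are invented, and treating a recipient count equal to the limit as "pull" is an assumption because the notes leave that boundary open.

```python
from collections import defaultdict

PUSH_LIMIT = 3  # hypothetical {limit}; the notes do not give the real value

followers = defaultdict(set)             # sender -> recipients of that sender
following = defaultdict(set)             # recipient -> senders they follow
recipient_time_line = defaultdict(list)  # MRI: recipient -> [(item_id, time, type)]
sender_time_line = defaultdict(list)     # ORI: sender -> [(item_id, time, type)]


def follow(recipient, sender):
    followers[sender].add(recipient)
    following[recipient].add(sender)


def publish(sender, item_id, time, type_):
    entry = (item_id, time, type_)
    # Every update is recorded once in the sender's own index.
    sender_time_line[sender].append(entry)
    # Fan out on write only for senders with a small recipient list.
    if len(followers[sender]) < PUSH_LIMIT:
        for recipient in followers[sender]:
            recipient_time_line[recipient].append(entry)


def read_timeline(recipient):
    # Start with what was pushed, then pull from senders over the limit.
    entries = set(recipient_time_line[recipient])
    for sender in following[recipient]:
        if len(followers[sender]) >= PUSH_LIMIT:
            entries.update(sender_time_line[sender])
    # Newest first; the set removes duplicates if a sender crossed the limit.
    return sorted(entries, key=lambda e: e[1], reverse=True)
```

The trade-off is that a sender with a huge recipient list costs one write per update instead of one write per recipient, at the price of extra reads when a timeline is assembled.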
Activity Filter
Scan a full day of updates for 16M users at per-minute granularity for 1000 friends in < 100 ms
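One way such a filter could work is a per-user bitmap with one bit per minute of the day. The notes do not describe the actual data structure, so everything below is an assumption: a Python int is used as a 1440-bit bitmap, and the names activity, mark_active and active_friends are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60  # 1440 bits per user per day

# user_id -> int used as a bitmap; bit m is set if the user posted in minute m
activity = {}


def mark_active(user_id, minute):
    if not 0 <= minute < MINUTES_PER_DAY:
        raise ValueError("minute must be in [0, 1440)")
    activity[user_id] = activity.get(user_id, 0) | (1 << minute)


def active_friends(friend_ids, start_minute, end_minute):
    """Return the friends with at least one update in [start_minute, end_minute)."""
    # Mask with ones for every minute in the requested window.
    window = ((1 << (end_minute - start_minute)) - 1) << start_minute
    return [f for f in friend_ids if activity.get(f, 0) & window]
```

Under this assumption each user needs 180 bytes per day, so 16M users come to roughly 2.9 GB, and checking 1000 friends is 1000 lookups plus one bitwise AND each.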
NoSQL: Redis
• Fast in memory Data-Structure Server
• Easy protocol
• Asynchronous Persistence
• Master-Slave Replication
• Virtual-Memory (>250 GB of RAM needed, swaps less frequently accessed values to disk)
• JRedis - The Java client (No consistent hashing, No rebalancing, Pipelining support)
• Persistence - AOF (less memory hungry) and BGSAVE (for additional backups)
NoSQL: Voldemort
• Key-Value Store
• Replication
• Versioning
• Eventual Consistency
• Pluggable Routing / Hashing Strategy
• Rebalancing
• Pluggable Storage-Engine
• Reduce the size of the BDB append log
• Balance Client and Server Threadpools
• Choose a big number of partitions
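A rough illustration of why a large partition count helps, assuming keys hash to a fixed set of partitions and rebalancing moves whole partitions between nodes. This is a toy model and not Voldemort's code: NUM_PARTITIONS, the MD5-based hash and the functions below are all hypothetical.

```python
import hashlib

NUM_PARTITIONS = 64  # hypothetical; fixed for the lifetime of the cluster


def partition_for(key):
    # Stable hash so every client maps a key to the same partition.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS


def assign_round_robin(nodes):
    # Initial layout: partition p is owned by node p mod len(nodes).
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}


def add_node(assignment, new_node):
    """Hand the new node an even share of partitions; all others stay put."""
    new_assignment = dict(assignment)
    counts = {}
    for owner in assignment.values():
        counts[owner] = counts.get(owner, 0) + 1
    target = NUM_PARTITIONS // (len(counts) + 1)
    moved = 0
    for p in sorted(assignment):
        if moved == target:
            break
        donor = assignment[p]
        if counts[donor] > target:  # only take from nodes above the target share
            new_assignment[p] = new_node
            counts[donor] -= 1
            moved += 1
    return new_assignment
```

With 64 partitions on 3 nodes, adding a fourth node moves 16 partitions and leaves the other 48 where they are. With only 3 partitions no even split across 4 nodes would be possible, which is the intuition behind choosing a big number up front.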
NoSQL: Hazelcast
• In Memory Data Grid
• Dynamically Scales
• Distributed java.util.{Queue|Set|List|Map} and more
• Dynamic Partitioning with Backups
• Configurable Eviction
• Persistence
• IDs of Stream Entries generated via Hazelcast
• Nodes get ID ranges assigned (node1: 10000-19999, node2: 20000-29999)
• IDs per range locally incremented on the node (thread safe/atomic)
• Distributed locks secure range assignment for nodes
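The range-based ID scheme above might look roughly like this. It is a single-process sketch, not Hazelcast code: a threading.Lock stands in for the distributed lock, the class names and RANGE_SIZE are hypothetical, and claiming a fresh range when the current one runs out is an assumption the notes do not spell out.

```python
import threading

RANGE_SIZE = 10_000  # matches the example ranges in the notes


class RangeCoordinator:
    """Cluster-wide range assignment; the lock plays the distributed lock."""

    def __init__(self, first_start=10_000):
        self._lock = threading.Lock()
        self._next_start = first_start

    def claim_range(self):
        # Only one node at a time may take the next free range.
        with self._lock:
            start = self._next_start
            self._next_start += RANGE_SIZE
        return start, start + RANGE_SIZE - 1


class NodeIdGenerator:
    """Per-node generator that increments locally inside its own range."""

    def __init__(self, coordinator):
        self._coordinator = coordinator
        self._lock = threading.Lock()  # makes the local increment thread safe
        self._next, self._last = coordinator.claim_range()

    def next_id(self):
        with self._lock:
            if self._next > self._last:
                # Assumption: an exhausted node simply claims a new range.
                self._next, self._last = self._coordinator.claim_range()
            value = self._next
            self._next += 1
            return value
```

The coordinator is only contacted once per 10000 IDs, so the common path is a purely local increment.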
Start benchmarking and profiling your app early!
Glossary:
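Nginx: a reverse proxy server. A forward proxy handles outbound traffic, for example an ISP's proxy that forwards requests from users (such as broadband subscribers) to target servers on the internet and may cache pages to improve performance. A reverse proxy adds a proxy layer on the server side that distributes requests to different hosts, so it can also act as a load balancer.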
Janitor: possibly Computer Janitor (?), a tool that cleans up a system so that it is more like a freshly installed one.
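AMQP: Advanced Message Queuing Protocol, an open specification for messaging middleware that is comparable to JMS. Unlike JMS, AMQP defines both the semantic layer and the protocol layer of the middleware. It is also language-neutral, whereas JMS is tied to Java. Because AMQP defines the semantics, it is not limited to working in a predefined way like JMS or other MQs; it is "programmable" messaging middleware.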
RabbitMQ: an MQ implemented in Erlang
Jetty: a lightweight (?) Java JSP/servlet container, said to handle high concurrency better than Tomcat or JBoss
RESTeasy: a JAX-RS implementation that integrates well with JBoss; a web service framework with both a client side and a server side
Voldemort: the distributed key-value store used by LinkedIn, implemented in Java. Pluggable serialization, storage engine, and routing/hashing strategy (all three key modules are pluggable)
Hazelcast: In Memory Data Grid, Distributed java.util.{Queue|Set|List|Map} (quite interesting)
Redis: a key-value store written in C that stores data structures directly (e.g. list, set, sorted set)