TAO: a graph database for big data

Facebook is currently the world's best-known social networking site. From the point of view of data abstraction, Facebook's social graph includes not only relationships between friends but also relationships between people and entities and between entities themselves. Every user, page, picture, application, place, and comment can be treated as an independent entity. When a user likes a page, a relationship is established between the user and the page; when a user checks in at a place, a relationship is established between the user and the location; and so on. If each entity is treated as a node in a graph and each relationship between entities as a directed edge, all of Facebook's data forms a giant entity graph (Entity Graph) with more than 100 billion edges. Some relationships in the entity graph are bidirectional, such as friendship; others are one-way, such as a user checking in at a place. Entities also have their own attributes: for example, that a user graduated from Stanford University and was born in 1988 are both attributes of the user entity. Figure 14-2 is a schematic diagram of a fragment of the Facebook entity graph.

Figure 14-2 Facebook entity graph (fbid is Facebook's unique ID number)
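The data model described above can be sketched in a few lines of Python. This is only a toy in-memory illustration, not TAO's storage format: the class names `Entity` and `EntityGraph`, the relationship labels `FRIEND` and `CHECKIN`, and the sample IDs are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Entity:
    fbid: int                      # unique ID, as in Figure 14-2
    etype: str                     # e.g. "USER", "PAGE", "LOCATION" (assumed labels)
    attrs: dict = field(default_factory=dict)


class EntityGraph:
    """Toy in-memory entity graph (hypothetical, for illustration only)."""

    def __init__(self):
        self.entities = {}
        # (source fbid, relationship type) -> list of target fbids
        self.edges = defaultdict(list)

    def add_entity(self, fbid, etype, **attrs):
        self.entities[fbid] = Entity(fbid, etype, attrs)

    def add_edge(self, src, rel, dst, bidirectional=False):
        self.edges[(src, rel)].append(dst)
        if bidirectional:
            # A two-way relationship is stored as two directed edges.
            self.edges[(dst, rel)].append(src)

    def neighbors(self, src, rel):
        return list(self.edges.get((src, rel), []))


graph = EntityGraph()
graph.add_entity(1, "USER", school="Stanford University", birth_year=1988)
graph.add_entity(2, "USER")
graph.add_entity(3, "LOCATION")

graph.add_edge(1, "FRIEND", 2, bidirectional=True)   # friendship is two-way
graph.add_edge(1, "CHECKIN", 3)                      # a check-in is one-way
```

Here user 2 sees user 1 as a friend, while the location has no edge pointing back to the user who checked in.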

Facebook's entities, their attributes, and the relationships between entities are all stored in the TAO graph database, and TAO serves the data read and write requests of the web pages. TAO is an eventually consistent, distributed graph database that spans data centers and consists of thousands of servers spread across multiple data centers. To respond to application requests in real time, TAO gives up strong consistency; its architecture emphasizes high availability and low latency, and it is heavily optimized for read operations so that web pages can be generated efficiently under high load.

The TAO client wraps the data access API for graph operations. Through it, a client can access entities and their attributes and can easily access the various kinds of entity relationship data. For relationship data, for example, it can provide a relationship-list query interface of the following form:

(ID, aType) -> [a_new, ..., a_old]

Here, ID is the unique identifier of an entity and aType is the type of relationship (friendship, for example). The relationship list contains the IDs of the other entities that ID points to through relationships of type aType, ordered by time from newest to oldest. For example, (I, COMMENT) lists all the comments on I.
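A minimal sketch of such a relationship-list query, assuming a simple in-memory store: the class `AssocStore`, the method name `assoc_list`, and the `limit` parameter are hypothetical and are not taken from TAO's actual client API.

```python
from collections import defaultdict


class AssocStore:
    """Hypothetical in-memory stand-in for the relationship-list query."""

    def __init__(self):
        # (id, atype) -> list of (timestamp, target_id)
        self._lists = defaultdict(list)

    def add(self, id1, atype, id2, timestamp):
        self._lists[(id1, atype)].append((timestamp, id2))

    def assoc_list(self, id1, atype, limit=None):
        """Return the target IDs for (id1, atype), newest first."""
        entries = sorted(self._lists.get((id1, atype), []), reverse=True)
        ids = [id2 for _, id2 in entries]
        return ids if limit is None else ids[:limit]


store = AssocStore()
store.add(100, "COMMENT", 501, timestamp=10)
store.add(100, "COMMENT", 502, timestamp=30)
store.add(100, "COMMENT", 503, timestamp=20)

comments = store.assoc_list(100, "COMMENT")          # newest comment first
latest_two = store.assoc_list(100, "COMMENT", limit=2)
```

Sorting on every query keeps the sketch short; a real system would presumably keep each list ordered as entries arrive.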

1. The overall architecture of TAO

TAO is a near-real-time graph database that spans multiple data centers; its overall architecture is shown in Figure 14-3. First, TAO groups data centers that are close to one another into a region (Region), forming multiple regions; the caches within each region are responsible for storing all of the entity and relationship data. The original copy of the data is stored in the database and caches of a single master region, and the other slave regions store replicas of the data. (This is a distinctive design; the reader is advised to take some time here to think about the motivation behind it before reading on.)


The architecture is motivated by the following considerations. The cache structure is a very important part of TAO and helps enormously in responding quickly to user read requests, and the cache must be kept in memory. If memory were cheap and plentiful enough, the ideal would be to keep a complete copy of the data in every data center so that user reads could be answered quickly and the time-consuming operation of reading data across data centers could be avoided. But the amount of data to be stored is too large (petabyte scale), and having every data center store a complete copy would cost too much. As the next best option, several data centers that are close to one another form a region, and the region as a whole stores a complete copy of all the data. Because the data centers in a region are close together, communication between them is efficient, so this design strikes a tradeoff between cost and efficiency.

Each region stores the entity and relationship data in full. Within a region, TAO's storage architecture can be divided into three layers (see Figure 14-3). The bottom layer is the MySQL database layer. Because the volume of data is so large, the data tables are split into many shards (Shard); each shard is stored in one logical relational database, and a single server can store multiple shards. The second layer is a cache layer that corresponds one-to-one with the underlying shards, called the leader cache layer (Leader Cache). A leader cache is responsible for caching the contents of its logical database and for read and write communication with the database. The top layer is the follower cache layer (Follower Cache): multiple follower caches correspond to one leader cache and are responsible for caching the contents of that leader cache. Splitting the cache into two levels reduces the coupling between caches and helps the whole system scale: when the system load increases, the system can be expanded easily just by adding follower cache servers.
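To make the shard layout concrete, here is a small sketch of how entity IDs might map to shards and shards to database servers. The text does not say how TAO assigns IDs to shards, so the modulo placement, the shard count, and the server names below are purely assumptions for illustration.

```python
NUM_SHARDS = 8                          # assumed; chosen small for readability
DB_SERVERS = ["db-0", "db-1", "db-2"]   # hypothetical server names


def shard_of(entity_id: int) -> int:
    # Assumption: modulo placement, used here only for illustration.
    return entity_id % NUM_SHARDS


def server_of(shard: int) -> str:
    # Several shards land on the same server, as the text describes.
    return DB_SERVERS[shard % len(DB_SERVERS)]


# Build the server -> shards table to show that one server holds many shards.
placement = {}
for shard in range(NUM_SHARDS):
    placement.setdefault(server_of(shard), []).append(shard)
```

Under the same assumption, each shard would be paired with exactly one leader cache, and any number of follower caches could be attached to that leader.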

2. TAO read and write operations

Client programs interact only with the outermost follower caches and do not communicate with leader caches directly (see Figure 14-4). When a client issues a data request, it contacts the nearest follower cache. If the request is a read and the follower cache holds the data, the data can be returned directly. In Internet applications, reads far outnumber writes, so the follower caches can absorb most of the site's load.

If a request misses in the follower cache (Cache Miss), the follower cache forwards it to its corresponding leader cache. If the leader cache also misses, the leader cache reads the data from the database and updates itself (the paths marked A and D in Figure 14-4 show this logic), then sends a message to the corresponding follower cache asking it to load the new data from the leader cache.
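The read path can be sketched as a two-level read-through cache. This is a simplified model under stated assumptions: the class names are invented for the example, the follower simply keeps the value the leader returns (rather than receiving a separate load message), and eviction and invalidation are ignored.

```python
class Database:
    """Stand-in for one logical database; counts reads for demonstration."""

    def __init__(self, rows=None):
        self.rows = dict(rows or {})
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)


class LeaderCache:
    def __init__(self, db):
        self.db = db
        self.data = {}

    def get(self, key):
        if key not in self.data:
            # Leader miss: only the leader talks to the database.
            self.data[key] = self.db.get(key)
        return self.data[key]


class FollowerCache:
    def __init__(self, leader):
        self.leader = leader
        self.data = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.data:
            self.hits += 1
            return self.data[key]
        # Follower miss: forward to the leader, then keep the result.
        self.misses += 1
        value = self.leader.get(key)
        self.data[key] = value
        return value


db = Database({"user:1": "Alice"})
leader = LeaderCache(db)
f1, f2 = FollowerCache(leader), FollowerCache(leader)

f1.get("user:1")   # misses in follower and leader, so the database is read
f1.get("user:1")   # served by the follower
f2.get("user:1")   # follower miss, but the leader already holds the value
```

Three client reads cause only one database read in this sketch, which is the effect the two cache levels are meant to achieve.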


For read operations, all regions follow the logic above regardless of whether they are the master or a slave. For write operations from clients, however, the master region and the slave regions behave somewhat differently. In the master region, when a follower cache receives a write request, it passes the request to its corresponding leader cache, which is responsible for writing to the corresponding logical database. Once the database write succeeds, the leader cache sends messages to its follower caches telling them that the old data is invalid or that they must reload it. In a slave region, when a follower cache receives a write request, it passes the request to the corresponding leader cache in its own region. That leader cache does not write to the local database directly; instead, it forwards the request to the leader cache of the master region (position C in Figure 14-4 illustrates this case), which writes to the master database.

In other words, for a write operation, whether it originates in the master region or in a slave region, it is the leader cache of the master region that updates the master database. After the master database is updated successfully, it sends a message notifying the databases in the slave regions of the change so that the data stays consistent. A slave region's database in turn tells the leader cache in its region about the change, which triggers the leader cache to notify the follower caches in that region to update their cached contents (see position B in Figure 14-4).
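The routing of writes and the asynchronous propagation to slave regions can be sketched as follows. The `Region` class, its `inbox` queue, and the `apply_replication` method are assumptions invented for the example; follower caches and the cache refresh on the writing side are left out to keep the sketch short.

```python
class Region:
    """Toy region: one database and one leader cache (followers omitted)."""

    def __init__(self, name, master=None):
        self.name = name
        self.master = master      # None marks the master region
        self.db = {}              # stand-in for the region's database
        self.leader_cache = {}
        self.slaves = []
        self.inbox = []           # replication messages not yet applied
        if master is not None:
            master.slaves.append(self)

    def write(self, key, value):
        if self.master is not None:
            # Slave region: the leader cache forwards the write to the
            # master region instead of writing the local database.
            self.master.write(key, value)
            return
        # Master region: the leader cache writes the master database.
        self.db[key] = value
        self.leader_cache[key] = value
        for slave in self.slaves:
            # Asynchronous notification, modeled here as a queued message.
            slave.inbox.append((key, value))

    def apply_replication(self):
        """Apply queued changes to the local database and refresh the cache."""
        for key, value in self.inbox:
            self.db[key] = value
            self.leader_cache[key] = value
        self.inbox.clear()


master = Region("master")
slave = Region("slave", master=master)

slave.write("user:1:name", "Alice")
before = dict(slave.db)          # local database has not caught up yet
slave.apply_replication()
after = dict(slave.db)           # now consistent with the master database
```

Between the write and the call to `apply_replication`, the slave database lags behind the master database; that gap is the window in which stale reads are possible.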

A question to consider: on a read miss, why does the leader cache in a slave region read from the local database instead of forwarding the request to the master region as it does for writes? The drawback of reading from the local database is obvious: the slave database may hold stale data at that moment, so reads can return inconsistent data. What, then, is the purpose or benefit of doing so?

Answer: reads that miss in the cache are far more frequent than writes (at Facebook, by a factor of about 20). Performing writes across regions therefore has little effect on overall efficiency, but if the many read misses also went across regions, read efficiency would drop sharply. TAO sacrifices data consistency in order to keep read latency low.
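A rough back-of-the-envelope calculation illustrates the argument. Only the 20-to-1 ratio comes from the text; the two latency figures are assumed values picked for the example and do not describe Facebook's actual network.

```python
READ_MISSES_PER_WRITE = 20   # ratio reported in the text
LOCAL_MS = 5.0               # assumed latency of a local database read
CROSS_REGION_MS = 100.0      # assumed latency of a cross-region round trip


def mean_latency(read_miss_ms, write_ms):
    """Average latency over 20 read misses and 1 write."""
    total_ops = READ_MISSES_PER_WRITE + 1
    return (READ_MISSES_PER_WRITE * read_miss_ms + write_ms) / total_ops


# Policy described in the text: read misses stay local, writes cross regions.
local_reads = mean_latency(LOCAL_MS, CROSS_REGION_MS)

# Alternative: read misses are also forwarded to the master region.
forward_all = mean_latency(CROSS_REGION_MS, CROSS_REGION_MS)

print(f"{local_reads:.1f} ms vs {forward_all:.1f} ms")
```

With these assumed numbers, keeping read misses local gives a mean of roughly 9.5 ms per operation, against 100 ms if every miss crossed regions.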

3. Data consistency in TAO

To give priority to read efficiency, TAO makes sacrifices in data consistency and adopts eventual consistency rather than strong consistency. When the master database notifies the slave databases of a data change, it uses asynchronous rather than synchronous notification; that is, it returns the response to the client without waiting for confirmation that the slave databases have finished updating. As a result, there is a time lag before the master database and the slave databases become consistent, and during that window clients in slave regions may read stale data. After a short delay, however, the change is reflected in all slave databases, so the system satisfies eventual consistency.

More specifically, in most cases TAO guarantees "read what you write" consistency for its data. That is, a client that issues a write must subsequently be able to read the updated value rather than stale data. This is necessary in many situations. For example, if a user deletes a friend but can still see that friend's posts in the message stream, that is intolerable.

How does TAO achieve this? First, if the update happens in the master region, the write process described above already guarantees "read what you write" consistency. The difficult case is a write request from a client in a slave region. In this case, the follower cache forwards the request to the leader cache in its region, which forwards the write request to the leader cache of the master region, which writes it to the master database. After the write succeeds, the slave region's leader cache notifies the follower caches in its region to update their cached values. All of these steps are performed synchronously. At this point the slave region's database may not yet have received the update message from the master database, but the caches at every level in the slave region have already been updated synchronously, so subsequent reads in this region can obtain the newly written content from the caches. This is how "read what you write" consistency is guaranteed in slave regions.
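The slave-region case can be illustrated with a small sketch in which the caches are refreshed synchronously while the local replica lags. The `SlaveRegion` class and the `friends:1` key are hypothetical, and the master region is reduced to a plain dictionary for brevity.

```python
class SlaveRegion:
    """Toy slave region with one follower cache and one leader cache."""

    def __init__(self, master_db):
        self.master_db = master_db        # stand-in for the master region
        self.local_db = dict(master_db)   # replica, refreshed asynchronously
        self.leader = {}
        self.follower = {}

    def write(self, key, value):
        # Synchronous chain: follower -> local leader -> master region.
        self.master_db[key] = value
        # Once the master write succeeds, the caches in this region are
        # refreshed before returning, although local_db is still unchanged.
        self.leader[key] = value
        self.follower[key] = value

    def read(self, key):
        if key not in self.follower:
            if key not in self.leader:
                self.leader[key] = self.local_db.get(key)
            self.follower[key] = self.leader[key]
        return self.follower[key]


master_db = {"friends:1": (2, 3)}
region = SlaveRegion(master_db)

first = region.read("friends:1")       # warms both caches from the replica
region.write("friends:1", (3,))        # user 1 removes friend 2
second = region.read("friends:1")      # served from the refreshed caches
stale = region.local_db["friends:1"]   # replica has not caught up yet
```

The read after the write returns the new friend list even though the local replica still holds the old one. The sketch only covers reads served by the caches that were refreshed.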

