CMS - Configuration management service based on MongoDb



Author: Su, Ralph

 

Abstract

A configuration management database (CMDB) is commonly used to store the configuration items managed inside an organization or company. A CMDB is typically designed as a centralized database access point.

As the CMDB of the eBay cloud service, CMS is a configuration management service built on top of MongoDb. It provides a REST service and its own query language. CMS now stores eBay marketplace configuration items ranging from assets and networks to application service topologies. With a peak load of 10 million requests per day, CMS now serves as a reliable infrastructure service for the eBay cloud service.

In this article, we present the architecture considerations and design of CMS.

Architecture Consideration

Why MongoDb?

Compared to most current CMDBs, which are built on relational databases, CMS chooses MongoDb as its backend database. This choice has a number of pros and cons.

Pros

1. CMS is designed to store configuration items, so its data set is expected to be small enough to fit in memory, which matches the usage pattern MongoDb handles best.

2. CMS is designed to serve many more read requests than write requests. Combined with #1, MongoDb can provide easily maintained read scalability through its replica set deployment.

3. CMS is not designed to provide the strong transactions of an RDB. Instead, to ensure data consistency, CMS provides MVCC (multiversion concurrency control) at the object level.

4. A CMDB requires frequent schema changes and evolution (compared to a typical RDB migration). MongoDB's schema-less document design makes it a feasible option for a CMDB.

Cons

1. MongoDB is a schema-less document store. Although configuration items need a flexible schema, they are not schema-less. Solution: CMS provides metadata definitions to describe the schema.

2. No transactions. Solution: CMS implements its own MVCC on top of MongoDb.

3. MongoDb only provides single-collection queries based on key/value matching. Solution: CMS provides its own query language to support join queries.

Why does CMS, as a CMDB, have repository and metadata concepts that are outside the typical CMDB scope?

As a software product, CMS is designed as a configuration management service based on its core components: metadata management, entity management, and query services. This design makes CMS more than a product that serves the requirements of a CMDB. It also makes CMS a more general persistence service in which users can flexibly define metadata and store data according to their own metadata definitions.

Design

CMS is a metadata-driven system. The metadata definition describes how data is stored in and fetched from CMS.

<<CMS Architecture - Diagram>>

Metadata module

Metadata is the “table” definition for data stored in CMS. This module is simple and intuitive: it reads the metadata definition from, and stores it to, the backend database. The main concepts of the metadata module are the metaclass and the relationship. It also provides the definition of indexes, which the query service uses to improve database query performance. All metadata information is cached in memory using a simple write-through cache.
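A write-through cache of the kind mentioned above could look like the following minimal sketch. The names MetadataStore, InMemoryMetadataStore, and MetadataCache are hypothetical and are not taken from the CMS code base; the in-memory store only stands in for the backend database, and metadata definitions are reduced to plain strings.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical backend abstraction; in CMS this role is played by MongoDb. */
interface MetadataStore {
    void save(String name, String definition);
    String load(String name);
}

/** Trivial in-memory store standing in for the database. */
class InMemoryMetadataStore implements MetadataStore {
    final Map<String, String> rows = new ConcurrentHashMap<>();
    int loads = 0; // counts backend reads, for demonstration only

    public void save(String name, String definition) { rows.put(name, definition); }
    public String load(String name) { loads++; return rows.get(name); }
}

/** Write-through cache: every write goes to the store and to the cache together. */
class MetadataCache {
    private final MetadataStore store;
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    MetadataCache(MetadataStore store) { this.store = store; }

    void put(String name, String definition) {
        store.save(name, definition);   // persist first
        cache.put(name, definition);    // then refresh the cached copy
    }

    String get(String name) {
        String cached = cache.get(name);
        if (cached != null) return cached;
        String loaded = store.load(name);   // cache miss: fall back to the store
        if (loaded != null) cache.put(name, loaded);
        return loaded;
    }
}
```

Because writes update both places, a read that follows a write is served from memory without touching the backend.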

Entity management module and data access module

The entity management module provides CRUD operations on CMS runtime data. A piece of runtime data (called an entity) is a JSON structure stored in the backend database and interpreted through the metadata definition. This module controls the data storage strategy and provides the MVCC check, the data relationship check (strong reference and dangling check), default value handling, and access control. A typical visitor pattern is used here to process the data.
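The object-level MVCC check can be pictured as an optimistic version comparison: an update is applied only if the caller's version matches the stored one. The sketch below is an assumption about the general technique, not the CMS implementation; VersionedStore and its methods are hypothetical, and an in-memory map replaces the database. The version counter is analogous to the _v field visible in the stored documents later in this article.

```java
import java.util.HashMap;
import java.util.Map;

class VersionedStore {
    /** Stored entity: a payload plus a version counter (compare the _v field). */
    static final class Entity {
        final String payload;
        final int version;
        Entity(String payload, int version) { this.payload = payload; this.version = version; }
    }

    private final Map<String, Entity> entities = new HashMap<>();

    synchronized void create(String id, String payload) {
        entities.put(id, new Entity(payload, 0));
    }

    synchronized Entity get(String id) { return entities.get(id); }

    /**
     * Applies the update only if the caller read the latest version.
     * Returns false on a version conflict so the caller can re-read and retry.
     */
    synchronized boolean update(String id, int expectedVersion, String newPayload) {
        Entity current = entities.get(id);
        if (current == null || current.version != expectedVersion) {
            return false;
        }
        entities.put(id, new Entity(newPayload, current.version + 1));
        return true;
    }
}
```

Two writers that both read version 0 cannot both succeed: the first update bumps the version to 1, and the second is rejected until it re-reads the entity.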

Data Storage

A couple of storage strategies have been taken into consideration.

Data distribution

“Every repository would have a mongo database as its storage.”

  1. All data in the same collection

This is a trivial solution for small data sets. It has some limitations:

    1. Indexes on different metaclasses must avoid naming conflicts, since they share the same namespace from the storage point of view.
    2. A unique index must also be sparse, since documents of other metaclasses in the same collection do not contain the indexed field.
    3. When the data set of one metaclass grows, it also increases the access cost for the other metaclasses (queries need to search through more documents).

  2. Data in a different collection per metaclass

This is a much more RDB-style data store. Every metaclass has a dedicated collection for its data, so different metaclasses can have independent index definitions. This is the suggested data distribution strategy.

  3. Separate metaclasses into different databases/replica sets

This is a CMS capability to work around the MongoDb database-level write lock, for cases where some metaclasses grow too quickly and impact other collections in the same database.

Storage Format

To store data in MongoDb, CMS introduces an encoded storage format. Every field of an entity has a dbname, which is the key actually used in storage. With this design, renaming a field only requires an update to the metadata, while the dbname stays unmodified.

  1. Hierarchy format

The hierarchy format has a separate JSON key inside the entity for each field, holding the field properties such as _lastmodified and _length. This design makes the data easy to manipulate, but loading it from MongoDb creates more Java maps and thus consumes more memory. The flatten format was introduced to reduce this overhead.
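The indirection between field names and dbnames can be illustrated with a small mapping table. This is only a sketch under assumptions: FieldNameMapping and its methods are hypothetical, and the pairing of a field name such as healthStatus with the dbname LY is made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

class FieldNameMapping {
    // Logical field name -> dbname used as the key in stored documents.
    private final Map<String, String> dbNames = new HashMap<>();

    void define(String fieldName, String dbName) { dbNames.put(fieldName, dbName); }

    /** Renames a field in metadata only; the dbname and stored data stay untouched. */
    void rename(String oldName, String newName) {
        String dbName = dbNames.remove(oldName);
        if (dbName == null) throw new IllegalArgumentException("unknown field: " + oldName);
        dbNames.put(newName, dbName);
    }

    String dbNameOf(String fieldName) { return dbNames.get(fieldName); }
}
```

After a rename, lookups under the new field name resolve to the same dbname, so no stored document needs to be rewritten.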

 
 

{
    "E6v": {
        "v": "Staging",
        "t": ISODate("2013-12-11T08:36:01.017Z")
    },
    "E6w": {
        "v": "QA Class Of Service",
        "t": ISODate("2014-04-15T08:22:19.179Z")
    },
    "E9M": {
        "v": "unknown",
        "t": ISODate("2013-12-11T08:36:01.017Z")
    },
    "E9O": {
        "v": "Staging",
        "t": ISODate("2013-12-11T08:36:01.017Z")
    },
    "EOE": {
        "v": "CloudMgr",
        "t": ISODate("2013-12-11T08:36:01.017Z")
    },
    "EOG": {
        "v": {
            "_i": "52930a8f4f5a34725cf9d332",
            "_t": "PolicyGroup"
        },
        "t": ISODate("2013-12-11T08:36:01.017Z")
    },
    "EQ5": {
        "v": "",
        "t": ISODate("2013-12-11T08:36:01.017Z")
    },
    "EQ6": {
        "v": "ebay.com",
        "t": ISODate("2013-12-11T08:36:01.017Z")
    },
    "_b": "main",
    "_c": ISODate("2013-08-09T07:46:53.818Z"),
    "_cmt": "",
    "_i": "51d675cd171b3cb034c00d68",
    "_id": ObjectId("51d675cd171b3cb034c00d69"),
    "_l": ISODate("2014-04-15T08:22:19.179Z"),
    "_m": "CloudMgr",
    "_mv": 8,
    "_o": "CloudMgr",
    "_pv": 0,
    "_s": "active",
    "_t": "ClassOfService",
    "_u": "_datasync_tocms",
    "_v": 8
}

 

  2. Flatten format

The flatten format is the system default format for storing entities.

 
 

{
    "_id" : ObjectId("541fbe097700f8b891d5249c"),
    "_t" : "ClassOfService",
    "LY_v" : "healthy",
    "_o" : "unitTestUser",
    "_i" : "51d675cd171b3cb034c00d48",
    "LU_v" : "Production",
    "_pv" : -1,
    "_c" : ISODate("2013-08-09T07:46:53.818Z"),
    "_b" : "main",
    "_v" : 0,
    "LX_v" : "Production",
    "LW_v" : "CloudMgr",
    "_m" : "unitTestUser",
    "_cmt" : "unit test create comments.",
    "_s" : "active",
    "LY_t" : ISODate("2014-09-22T06:13:29.807Z"),
    "_l" : ISODate("2014-09-22T06:13:29.807Z"),
    "LU_t" : ISODate("2014-09-22T06:13:29.807Z"),
    "La_v" : "",
    "La_t" : ISODate("2014-09-22T06:13:29.807Z"),
    "LX_t" : ISODate("2014-09-22T06:13:29.807Z"),
    "LW_t" : ISODate("2014-09-22T06:13:29.807Z"),
    "LZ_v" : "ebay.com",
    "LZ_t" : ISODate("2014-09-22T06:13:29.807Z"),
    "_mv" : 1
}
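Comparing the two examples, the flatten format appears to turn each nested field object into sibling keys of the form dbname_v and dbname_t. The sketch below shows that transformation under this assumption; FlattenFormat is a hypothetical name, and the rule that keys starting with an underscore are system fields copied unchanged is inferred from the samples rather than from the CMS source.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FlattenFormat {
    /**
     * Flattens one level of nesting for user fields: {"LY": {"v": x, "t": y}}
     * becomes {"LY_v": x, "LY_t": y}. Keys starting with '_' are treated as
     * system fields and copied unchanged, as are non-map values.
     */
    static Map<String, Object> flatten(Map<String, Object> hierarchical) {
        Map<String, Object> flat = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : hierarchical.entrySet()) {
            String dbName = field.getKey();
            Object value = field.getValue();
            if (!dbName.startsWith("_") && value instanceof Map) {
                for (Map.Entry<?, ?> property : ((Map<?, ?>) value).entrySet()) {
                    flat.put(dbName + "_" + property.getKey(), property.getValue());
                }
            } else {
                flat.put(dbName, value);
            }
        }
        return flat;
    }
}
```

The result holds one map per entity instead of one map per field, which is the memory saving the flatten format is meant to achieve.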

 

Query module

The query module provides CMS's own query language, which adds query capabilities beyond the simple key/value queries of MongoDb.

Query Interface

GET repositories/raptor-paas/branches/main/query/ServiceInstance<@healthStatus>{ @healthStatus, $count() }

 

 

POST repositories/raptor-paas/branches/main/query

Payload:

ApplicationService{*}.(services[@name=~\"srp-app.*\"]{*} && updateStrategies{*})

A CMS query is submitted to the service through a REST call. There are two interfaces: GET with the query in the URL, or POST with the query as the body.

A typical flow when a query goes through CMS:
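As an illustration of the two interfaces, the following sketch builds both request variants with the JDK HTTP client types (Java 11 or later) without sending them. The base URL cms.example.com is a placeholder, CmsQueryRequests is a hypothetical helper, and percent-encoding the query in the path is an assumption about what a server would expect.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class CmsQueryRequests {
    // Hypothetical base URL; replace with the address of a real CMS deployment.
    static final String BASE = "http://cms.example.com/";

    /** Percent-encodes a query so it can be used as a URL path segment. */
    static String encodePathSegment(String query) {
        // URLEncoder targets form encoding, so turn its '+' (space) into %20.
        return URLEncoder.encode(query, StandardCharsets.UTF_8).replace("+", "%20");
    }

    /** GET variant: the query string is appended to the query resource path. */
    static HttpRequest get(String repo, String branch, String query) {
        URI uri = URI.create(BASE + "repositories/" + repo + "/branches/" + branch
                + "/query/" + encodePathSegment(query));
        return HttpRequest.newBuilder(uri).GET().build();
    }

    /** POST variant: the query string travels as the request body. */
    static HttpRequest post(String repo, String branch, String query) {
        URI uri = URI.create(BASE + "repositories/" + repo + "/branches/" + branch + "/query");
        return HttpRequest.newBuilder(uri)
                .POST(HttpRequest.BodyPublishers.ofString(query))
                .build();
    }
}
```

The POST form avoids URL length and escaping concerns for long queries, which is presumably why both interfaces are offered.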

  • Parse query string to AST (using ANTLR) 
  • Build query plan based on AST 
  • Query optimization  
  • Query execution 
  • Result population 

Syntax

The CMS query syntax is defined in a grammar interpreted by ANTLR. An overview of the syntax is shown below:

<< CMS Query Language Syntax>>

Query optimization

1. Single table query

With index definitions supported by the metadata, a single table query can simply be handed to the backend database.

2. Multiple join query support based on metadata definition

CMS provides join queries to overcome the limits of the backend database. The CMS join query is an in-memory hash join.

The query optimizer provides cost-based optimization of the query execution order. Query optimization is the problem of finding the best execution order for a chain of database queries.
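For readers unfamiliar with the technique, a generic in-memory hash join is sketched below: one side is indexed by the join key (build phase) and the other side is looked up against that index (probe phase). This is a textbook version, not the CMS executor; HashJoin is a hypothetical class and entities are represented as plain maps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HashJoin {
    /**
     * Joins left rows to right rows where left[leftKey] equals right[rightKey].
     * Build phase: index the right side by key. Probe phase: look up each left row.
     */
    static List<Map<String, Object>> join(List<Map<String, Object>> left, String leftKey,
                                          List<Map<String, Object>> right, String rightKey) {
        Map<Object, List<Map<String, Object>>> index = new HashMap<>();
        for (Map<String, Object> row : right) {
            Object key = row.get(rightKey);
            if (key == null) continue;       // rows without a key cannot match
            index.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
        }
        List<Map<String, Object>> joined = new ArrayList<>();
        for (Map<String, Object> row : left) {
            Object key = row.get(leftKey);
            List<Map<String, Object>> matches = (key == null) ? null : index.get(key);
            if (matches == null) continue;   // inner join: drop unmatched rows
            for (Map<String, Object> match : matches) {
                Map<String, Object> merged = new HashMap<>(match);
                merged.putAll(row);          // left values win on key clashes
                joined.add(merged);
            }
        }
        return joined;
    }
}
```

In general, building the index on the smaller input keeps memory use down, which is one reason the join order chosen by the optimizer matters.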

“The performance of a query plan is determined largely by the order in which the tables are joined. For example, when joining 3 tables A, B, C of size 10 rows, 10,000 rows, and 1,000,000 rows, respectively, a query plan that joins B and C first can take several orders-of-magnitude more time to execute than one that joins A and C first.” –Wikipedia

public enum QueryCostEnum {
    // Estimated relative cost of each search type; lower is cheaper.
    EqualityIndex(1),
    RangeIndex(10),
    AllowFullTableScan(100),
    EqualityScan(1000),
    RangeScan(2000),
    RangeRegxScan(3000),
    NegativeScan(10000),
    FullScan(20000);

    private final int value;

    private QueryCostEnum(int cost) {
        this.value = cost;
    }

    public int getValue() {
        return value;
    }

    // Any cost at or above EqualityScan is treated as a full table scan.
    public static boolean isFullTableScan(int cost) {
        return cost >= EqualityScan.getValue();
    }
}

 

CMS defines the query cost based on index hits and table size. The cost definition is listed above. This cost is populated into the search action tree during optimization, and the query executor later relies on it to find an appropriate execution order.

Optimization is quite query-specific, so the query language also provides a “hint” that lets the user help the optimizer find the best execution order.

Additional features provided by the query module to make CMS query a more complete query system are:

1. Reverse/Tree/Sub queries - to provide more flexible ways of traversing the object graph in CMS.
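A heavily simplified picture of how such costs can drive the execution order is to run the cheapest search first. The sketch below only sorts a flat list of steps; the real optimizer works on a search action tree, so CostOrdering and SearchStep are hypothetical names and the flat model is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class CostOrdering {
    /** Hypothetical search step: one metaclass lookup with an estimated cost. */
    static final class SearchStep {
        final String metaclass;
        final int cost;
        SearchStep(String metaclass, int cost) { this.metaclass = metaclass; this.cost = cost; }
    }

    /** Returns a copy of the steps sorted so that the cheapest one runs first. */
    static List<SearchStep> order(List<SearchStep> steps) {
        List<SearchStep> ordered = new ArrayList<>(steps);
        ordered.sort(Comparator.comparingInt((SearchStep s) -> s.cost));
        return ordered;
    }
}
```

With costs taken from the enum above, a step that hits an equality index (cost 1) would be scheduled before one that needs a full scan (cost 20000), so the expensive search can be narrowed by the results of the cheap one.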

2. Aggregation/Projection - to provide support for 

3. Query explanation - to provide details of the generated queries for query tuning

Sample Visualized Query execution

Sample query: 

Topology{*}.applicationServices{*}.serviceInstances[@resourceId=~"^srp-app.*"]{@resourceId}.runsOn{*}

 

 ParseNode Tree & Action Tree:

Execution Flow (each line for one execution step):

System management module

The system management module is built on top of the metadata, entity management, and query modules. It provides system-level service control, including:

  • System state maintenance - Based on system measurements (load, QPS, latency), the system management module maintains and changes the system state. With these states, the system can accept or throttle incoming requests depending on its current state. The states are listed below.

    public enum State {
        startup, normal, maintain, check, overload, critical,
        readonly, severe, shutdown
    }

 

 

  • Metric - the system management module provides metrics such as QPS, percentile read/write latency, and top cost queries at different granularities. See reference 4 for a detailed description of the evolving design of the sliding window metric.
  • Memory-based throttling - Memory is the main performance concern for the system under high throughput; CMS provides throttling based on memory monitoring to protect the system.
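The interplay between memory monitoring, system state, and throttling can be sketched as follows. Everything specific here is an assumption: MemoryThrottle is a hypothetical class, the thresholds are invented, only three of the states listed above are used, and the policy of shedding writes before reads is just one plausible choice.

```java
class MemoryThrottle {
    /** Subset of the states listed above, used only for this illustration. */
    enum State { normal, overload, critical }

    // Hypothetical thresholds on heap usage; real values would need tuning.
    static final double OVERLOAD_RATIO = 0.80;
    static final double CRITICAL_RATIO = 0.95;

    /** Maps a heap usage ratio in [0, 1] to a state. */
    static State stateFor(double usedRatio) {
        if (usedRatio >= CRITICAL_RATIO) return State.critical;
        if (usedRatio >= OVERLOAD_RATIO) return State.overload;
        return State.normal;
    }

    /** Current heap usage ratio as reported by the JVM. */
    static double currentUsedRatio() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return (double) used / rt.maxMemory();
    }

    /** Assumed policy: writes are shed in overload, everything in critical. */
    static boolean accept(State state, boolean isWrite) {
        switch (state) {
            case normal:   return true;
            case overload: return !isWrite;
            default:       return false;
        }
    }
}
```

A request filter could call accept(stateFor(currentUsedRatio()), isWrite) before dispatching each request and reject it with an error response when the result is false.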

Monitoring

Ganglia is used to monitor the metrics exposed by the system management module. The charts below are screenshots taken from one of the application servers at the time of writing.

Read-QPS monitoring 

Read latency monitoring (mainly monitoring based on 95th percentile curve)

Write-QPS

Write-latency

Conclusion

CMS is a configuration service built on MongoDb and, as a by-product, a general persistence service. In this article, we presented the architecture considerations and detailed design of CMS. CMS provides object-oriented metadata definition and easy-to-use query features. Given the pros and cons discussed in the architecture considerations, CMS fits cases where configuration or metadata is read much more often than written and requires frequent metadata changes. Because CMS lacks full transaction support, users who need transactions must handle them on the client side.

On its product roadmap, CMS aims to continuously improve its query module and to add application-level sharding support at the data management level. Furthermore, we are also building surrounding services, such as auditing and an event system, on top of CMS.

Reference

1. http://en.wikipedia.org/wiki/Hash_join

2. http://en.wikipedia.org/wiki/Query_optimization

3. https://github.com/ebay/yidb

4. http://ccoetech.ebay.com/improve-api-gateway-throttling

5. http://en.wikipedia.org/wiki/Multiversion_concurrency_control

 

 

 

 

 
