Configuring Sharding——mongo分片

最新推荐文章于 2023-05-13 12:35:37 发布

laiahu

最新推荐文章于 2023-05-13 12:35:37 发布

阅读量3.9k

点赞数

分类专栏： mongodb 文章标签： sharding collections mongodb database components server

mongodb 专栏收录该内容

7 篇文章 0 订阅

订阅专栏

Configuring Sharding

http://www.mongodb.org/display/DOCS/A+Sample+Configuration+Session

理论：http://www.mongodb.org/display/DOCS/Configuring+Sharding

A Sample Configuration Session

The following example uses two shards (each with a single mongod process), one config db, and one mongos process, all running on a single test server. In addition to the script below, a python script for starting and configuring shard components on a single machine is available.

Creating the Shards

First, start up a couple mongods to be your shards.

$ mkdir /data/db/a /data/db/b
$ ./mongod --shardsvr --dbpath /data/db/a --port 10000 > /tmp/sharda.log &
$ cat /tmp/sharda.log
$ ./mongod --shardsvr --dbpath /data/db/b --port 10001 > /tmp/shardb.log &
$ cat /tmp/shardb.log

Now you need a configuration server and mongos:

$ mkdir /data/db/config
$ ./mongod --configsvr --dbpath /data/db/config --port 20000 > /tmp/configdb.log &
$ cat /tmp/configdb.log
$ ./mongos --configdb localhost:20000 > /tmp/mongos.log &
$ cat /tmp/mongos.log

mongos does not require a data directory, it gets its information from the config server.

In a real production setup, mongod's, mongos's and configs would live on different machines. The use of hostnames or IP addresses is mandatory in that case. 'localhost' appearance here is merely illustrative – but fully functional – and should be confined to single-machine, testing scenarios only.

You can toy with sharding by using a small --chunkSize, e.g. 1MB. This is more satisfying when you're playing around, as you won't have to insert 64MB of documents before you start seeing them moving around. It should not be used in production.

$ ./mongos --configdb localhost:20000 --chunkSize 1 > /tmp/mongos.log &

Setting up the Cluster

We need to run a few commands on the shell to hook everything up. Start the shell, connecting to the mongos process (at localhost:27017 if you followed the steps above).

To set up our cluster, we'll add the two shards (a and b).

$ ./mongo
MongoDB shell version: 1.6.0
connecting to: test
> use admin
switched to db admin
> db.runCommand( { addshard : "localhost:10000" } )
{ "shardadded" : "shard0000", "ok" : 1 }
> db.runCommand( { addshard : "localhost:10001" } )
{ "shardadded" : "shard0001", "ok" : 1 }

Now you need to tell the database that you want to spread out your data at a database and collection level. You have to give the collection a key (or keys) to partition by.
This is similar to creating an index on a collection.

> db.runCommand( { enablesharding : "test" } )
{"ok" : 1}
> db.runCommand( { shardcollection : "test.people", key : {name : 1} } )
{"ok" : 1}

Administration

To see what's going on in the cluster, use the config database.

> use config
switched to db config
> show collections
chunks
databases
lockpings
locks
mongos
settings
shards
system.indexes
version

These collections contain all of the sharding configuration information.

理论： http://www.mongodb.org/display/DOCS/Configuring+Sharding

Configuring Sharding

This document describes the steps involved in setting up a basic sharding cluster. A sharding cluster has three components:

1. One to 1000 shards. Shards are partitions of data. Each shard consists of one or more mongod processes which store the data for that shard. When multiple mongod's are in a single shard, they are each storing the same data – that is, they are replicating to each other.
2. Either one or three config server processes. For production systems use three.
3. One or more mongos routing processes.

For testing purposes, it's possible to start all the required processes on a single server, whereas in a production situation, a number ofserver configurations are possible.

Once the shards (mongod's), config servers, and mongos processes are running, configuration is simply a matter of issuing a series of commands to establish the various shards as being part of the cluster. Once the cluster has been established, you can begin sharding individual collections.

This document is fairly detailed; for a terse, code-only explanation, see the sample shard configuration. If you'd like a quick script to set up a test cluster on a single machine, we have a python sharding script that can do the trick.

Sharding Components

First, start the individual shards (mongod's), config servers, and mongos processes.

Shard Servers

A shard server consists of a mongod process or a replica set of mongod processes. For production, use a replica set for each shard for data safety and automatic failover. To get started with a simple test, we can run a single mongod process per shard, as a test configuration doesn't demand automated failover.

Config Servers

Run a mongod --configsvr process for each config server. If you're only testing, you can use only one config server. For production, use three.

Note: Replicating data to each config server is managed by the router (mongos); they have a synchronous replication protocol optimized for three machines, if you were wondering why that number. Do not run any of the config servers with --replSet; replication between them is automatic.

Note: As the metadata of a MongoDB cluster is fairly small, it is possible to run the config server processes on boxes also used for other purposes.

`mongos` Router

Run mongos on the servers of your choice. Specify the --configdb parameter to indicate location of the config database(s). Note: use dns names, not ip addresses, for the --configdb parameter's value. Otherwise moving config servers later is difficult.

Note that each mongos will read from the first config server in the list provided. If you're running config servers across more than one data center, you should put the closest config servers early in the list.

Configuring the Shard Cluster

Once the shard components are running, issue the sharding commands. You may want to automate or record your steps below in a .js file for replay in the shell when needed.

Start by connecting to one of the mongos processes, and then switch to the admin database before issuing any commands.

The mongos will route commands to the right machine(s) in the cluster and, if commands change metadata, the mongos will update that on the config servers. So, regardless of the number of mongos processes you've launched, you'll only need run these commands on one of those processes.

You can connect to the admin database via mongos like so:

./mongo <mongos-hostname>:<mongos-port>/admin
> db
admin

Adding shards

Each shard can consist of more than one server (see replica sets); however, for testing, only a single server with one mongod instance need be used.

You must explicitly add each shard to the cluster's configuration using the addshard command:

> db.runCommand( { addshard : "<serverhostname>[:<port>]" } );
{"ok" : 1 , "added" : ...}

Run this command once for each shard in the cluster.

If the individual shards consist of replica sets, they can be added by specifying replicaSetName/<serverhostname>[:port][,serverhostname2[:port],...], where at least one server in the replica set is given.

> db.runCommand( { addshard : "foo/<serverhostname>[:<port>]" } );
{"ok" : 1 , "added" : "foo"}

Any databases and collections that existed already in the mongod/replica set will be incorporated to the cluster. The databases will have as the "primary" host that mongod/replica set and the collections will not be sharded (but you can do so later by issuing a shardCollectioncommand).

Optional Parameters

name
Each shard has a name, which can be specified using the name option. If no name is given, one will be assigned automatically.

maxSize
The addshard command accepts an optional maxSize parameter. This parameter lets you tell the system a maximum amount of disk space in megabytes to use on the specified shard. If unspecified, the system will use the entire disk. maxSize is useful when you have machines with different disk capacities or when you want to prevent storage of too much data on a particular shard.

As an example:

> db.runCommand( { addshard : "sf103", maxSize:100000/*MB*/ } );

Listing shards

To see current set of configured shards, run the listshards command:

> db.runCommand( { listshards : 1 } );

This way, you can verify that all the shard have been committed to the system.

Removing a shard

See the removeshard command.

Enabling Sharding on a Database

In versions prior to v2.0, dropping a sharded database causes issues - see SERVER-2253 for workaround.

Once you've added one or more shards, you can enable sharding on a database. Unless enabled, all data in the database will be stored on the same shard. After enabling you then need to run shardCollection on the relevant collections (i.e., the big ones).

> db.runCommand( { enablesharding : "<dbname>" } );

Once enabled, mongos will place new collections on the primary shard for that database. Existing collections within the database will stay on the original shard. To enable partitioning of data, we have to shard an individual collection.

Sharding a Collection

When sharding a collection, "pre-splitting", that is, setting a seed set of key ranges, is recommended. Without a seed set of ranges, sharding works, however the system must learn the key distribution and this will take some time; during this time performance is not as high. The presplits do not have to be particularly accurate; the system will adapt to the actual key distribution of the data regardless.

Use the shardcollection command to shard a collection. When you shard a collection, you must specify the shard key. If there is data in the collection, mongo will require an index to be created upfront (it speeds up the chunking process); otherwise, an index will be automatically created for you.

 
> db.runCommand( { shardcollection : "<namespace>",
                   key : <shardkeypatternobject> });

Running the "shardcollection" command will mark the collection as sharded with a specific key. Once called, there is currently no way to disable sharding or change the shard key, even if all the data is still contained within the same shard. It is assumed that the data may already be spread around the shards. If you need to "unshard" a collection, drop it (of course making a backup of data if needed), and recreate the collection (loading the backup data).

For example, let's assume we want to shard a GridFS chunks collection stored in the test database. We'd want to shard on thefiles_id key, so we'd invoke the shardcollection command like so:

 > db.runCommand( { shardcollection : "test.fs.chunks", key : { files_id : 1 } } )
{ "collectionsharded" : "mydb.fs.chunks", "ok" : 1 }

You can use the {unique: true} option to ensure that the underlying index enforces uniqueness so long as the unique index is a prefix of the shard key. (note: prior to version 2.0 this worked only if the collection is empty).

db.runCommand( { shardcollection : "test.users" , key : { email : 1 } , unique : true } );

If the "unique: true" option is not used, the shard key does not have to be unique.

db.runCommand( { shardcollection : "test.products" , key : { category : 1, _id : 1 } } );

You can shard on multiple fields if you are using a compound index.

In the end, picking the right shard key for your needs is extremely important for successful sharding. Choosing a Shard Key.

Examples

Sample configuration session
The following example shows how to run a simple shard setup on a single machine for testing purposes: Sharding JS Test.

Procedure

Complete this procedure by connecting to any mongos in the cluster using the mongo shell.

You can only remove a shard by its shard name. To discover or confirm the name of a shard using the listshards or printShardingStatus commands or the sh.status() shell helper.

The following example will remove shard named mongodb0.

Note

To successfully migrate data from a shard, the balancer process must be active. Check the balancer state using the sh.getBalancerState() helper in the mongo shell. See this section on balancer operations for more information.

Remove Chunks from the Shard

Start by running the removeShard command. This begins “draining” chunks from the shard you are removing.

 
   db.runCommand( { removeshard: "mongodb0" } )

This operation will return a response immediately. For example:

 
   { msg : "draining started successfully" , state: "started" , shard :"mongodb0" , ok : 1 }

Depending on your network capacity and the amount of data in your cluster, this operation can take anywhere from a few minutes to several days to complete.

Check the Status of the Migration

You can run the removeShard again at any stage of the process to check the progress of the migration, as follows:

 
   db.runCommand( { removeshard: "mongodb0" } )

The output will resemble the following document:

 
   { msg: "draining ongoing" , state: "ongoing" , remaining: { chunks: 42, dbs : 1 }, ok: 1 }

In the remaining sub document, a counter displays the remaining number of chunks that MongoDB must migrate to other shards, and the number of MongoDB databases that have “primary” status on this shard.

Continue checking the status of the removeshard command until the number of chunks remaining is 0, then you can proceed to the next step.

Move Unsharded Databases

Databases with non-sharded collections store these collections on a single shard, known as the “primary” shard for that database. The following step is necessary only when the shard you want to remove is also the “primary” shard for one or more databases.

Issue the following command at the mongo shell:

 
   db.runCommand( { movePrimary: "myapp", to: "mongodb1" })

This command will migrate all remaining non-sharded data in the database named myapp to the shard named mongodb1.

Warning

Do not run the movePrimary until you have finished draining the shard.

This command will not return until MongoDB completes moving all data, which may take a long time. The response from this command will resemble the following:

 
   { "primary" : "mongodb1", "ok" : 1 }

Finalize the Migration

Run removeShard again to clean up all metadata information and finalize the removal, as follows:

 
   db.runCommand( { removeshard: "mongodb0" } )

When successful, the response will be the following:

 
   { msg: "remove shard completed succesfully" , stage: "completed", host: "mongodb0", ok : 1 }

When the value of “state” is “completed”, you may safely stop the mongodb0 shard.

laiahu

关注

0
点赞
踩
2

收藏

觉得还不错? 一键收藏
0
评论
Configuring Sharding——mongo分片

Configuring Shardinghttp://www.mongodb.org/display/DOCS/A+Sample+Configuration+Session理论：http://www.mongodb.org/display/DOCS/Configuring+ShardingA Sample Configuration Session
复制链接

扫一扫

专栏目录

Configuring Sharding——mongo分片

Configuring Sharding

A Sample Configuration Session

Creating the Shards

Setting up the Cluster

Administration

Configuring Sharding

Sharding Components

Shard Servers

Config Servers

mongos Router

Configuring the Shard Cluster

Adding shards

Optional Parameters

Listing shards

Removing a shard

Enabling Sharding on a Database

Sharding a Collection

Examples

See Also

Procedure

Remove Chunks from the Shard

Check the Status of the Migration

Move Unsharded Databases

Finalize the Migration

“相关推荐”对你有帮助么？

`mongos` Router