Installing Ceph on CentOS

Introduction

I created this document based on my experience setting up a small test cluster for the Ceph distributed file system. I used VMware Server for this setup since it is quick and easy to get going with, and I do not have any spare machines lying around that I could use. Plus, the fact that VMware Server is free sure doesn't hurt! This document mainly serves as personal notes for myself, as I tend to forget things like this rather quickly.

Machine Setup and Configuration

My simple cluster has 3 nodes in total, so I will need to create 3 virtual machines. If you have never used VMware before, here is a simple guide for creating a Linux virtual machine using the CentOS distribution. If you already know how to create Linux virtual machines, just create 3 of them. Also, add a second hard disk to each virtual machine; this will be used for the btrfs file system and can be relatively small. In my case, I made it 1 GB.
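
Once a virtual machine comes up with the extra disk attached, it is worth confirming that the guest actually sees it before going any further. On my setup the additional virtual disk appeared as /dev/sdb, but the device name can vary, so adjust the btrfs steps later on if yours shows up elsewhere.

# fdisk -l /dev/sdb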

After we have our 3 Linux nodes up and running, I like to modify the /etc/hosts file on each node so I don't have to remember IP addresses all the time. My /etc/hosts file looks as follows.

 

# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1               localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
192.168.221.137         ceph0
192.168.221.138         ceph1
192.168.221.139         ceph2

 

Checking out Ceph Source Code

Now we are ready to check out the Ceph source code. We will build it at a later stage. If git is not present on your machine, you can follow these instructions to install it.
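
If you would rather not build git from source, it can usually be installed straight from a package repository. The exact package name and whether you need an extra repository such as EPEL depend on the CentOS release (on some older releases the package is named git-core), so the command below is only a sketch of the usual approach.

# yum install git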

 

# cd /usr/src
# git clone git://ceph.newdream.net/ceph.git
Initialize ceph/.git
Initialized empty Git repository in /usr/src/ceph/.git/
remote: Generating pack...
remote: Counting objects: 498
remote: Done counting 37941 objects.
remote: Deltifying 37941 objects...
remote:  100% (37941/37941) done
remote: Total 37941 (delta 30117), reused 34536 (delta 27139)
Receiving objects: 100% (37941/37941), 8.46 MiB | 568 KiB/s, done.
Resolving deltas: 100% (30117/30117), done.
#

 

We need to export the ceph directory so that each node in the cluster can access the binaries and subdirectories it needs. We will use NFS for this.

On the ceph0 host (or whichever host has the Ceph source code), edit the /etc/exports file. It should have a line similar to the following (assuming you modified the /etc/hosts file as I described; otherwise, you will need to enter IP addresses in this file):

 

/usr/src/ceph ceph1(rw,async,no_subtree_check) ceph2(rw,async,no_subtree_check)

 

This export entry is extremely simple. I am not taking security concerns into account here as this is a very simple test cluster we are setting up. Next, restart (or start if it was never started) the NFS service as follows.

 

# service nfs restart
Shutting down NFS mountd:                                  [  OK  ]
Shutting down NFS daemon:                                  [  OK  ]
Shutting down NFS quotas:                                  [  OK  ]
Shutting down NFS services:                                [  OK  ]
Starting NFS services:                                     [  OK  ]
Starting NFS quotas:                                       [  OK  ]
Starting NFS daemon:                                       [  OK  ]
Starting NFS mountd:                                       [  OK  ]
#
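
Before mounting the export on the other nodes, it can save some head scratching to confirm that the share is actually visible over the network. Both commands below ship with nfs-utils and are purely an optional sanity check.

ceph0# exportfs -v
ceph1# showmount -e ceph0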

 

Now mount this directory on the other nodes in the cluster.

 

ceph1# mount -t nfs -o rw ceph0:/usr/src/ceph /usr/src/ceph
ceph2# mount -t nfs -o rw ceph0:/usr/src/ceph /usr/src/ceph

 

Building the Kernel Client

Before we build Ceph, we want to get the kernel client up and running. For this guide, I am going to build the client into the kernel. We will do this on every node. First, we need to download the latest kernel source code using git.

 

# cd /usr/src/kernels
# git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
Initialize linux-2.6/.git
Initialized empty Git repository in /usr/src/kernels/linux-2.6/.git/
remote: Counting objects: 882421, done.
remote: Compressing objects: 100% (155150/155150), done.
remote: Total 882421 (delta 736090), reused 872174 (delta 725984)
Receiving objects: 100% (882421/882421), 209.92 MiB | 336 KiB/s, done.
Resolving deltas: 100% (736090/736090), done.
Checking out files: 100% (24247/24247), done.
# cd /usr/src/kernels/linux-2.6
# patch -p1 < /usr/src/ceph/src/kernel/kconfig.patch
patching file fs/Kconfig
Hunk #1 succeeded at 1557 with fuzz 2 (offset 38 lines).
patching file fs/Makefile
Hunk #1 succeeded at 122 (offset 4 lines).
# ln -s /usr/src/ceph/src/kernel fs/ceph
# ln -s /usr/src/ceph/src/include/ceph_fs.h fs/ceph/ceph_fs.h
# cd /usr/src/kernels/linux-2.6
# make mrproper
# make menuconfig

 

A lot of configuration options will be presented to you in this menu. Rather than go into that here, I'll point you to a much more in-depth guide here which discusses the configuration options in more detail.

Ensure that you enable Ceph; it should be the first item under File Systems->Network File Systems.
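
After saving the configuration and exiting menuconfig, it does not hurt to double check that the Ceph option actually made it into .config. I am assuming the Kconfig symbol added by the kernel patch contains CEPH in its name (typically CONFIG_CEPH_FS); if your copy of the patch uses a different symbol, grep for that instead.

# cd /usr/src/kernels/linux-2.6
# grep CEPH .config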

Now we are ready to build the kernel. This is only a few commands but will take quite some time depending on your machine.

 

# cd /usr/src/kernels/linux-2.6
# make bzImage
# make modules
# make modules_install
# mkinitrd /boot/initrd-2.6.26.img 2.6.26
# cp /usr/src/kernels/linux-2.6/arch/i386/boot/bzImage /boot/bzImage-2.6.26
# cp /usr/src/kernels/linux-2.6/System.map /boot/System.map-2.6.26
# ln -s /boot/System.map-2.6.26 /boot/System.map

 

Finally, we need to configure the GRUB bootloader to be able to boot the new kernel. The GRUB configuration is located in /boot/grub/menu.lst. Once finished editing on a fresh CentOS installation, the file should look as follows:

 

 

# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/VolGroup00/LogVol00
#          initrd /initrd-version.img
#boot=/dev/sda
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title CentOS (2.6.18-53.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-53.el5 ro root=/dev/VolGroup00/LogVol00
        initrd /initrd-2.6.18-53.el5.img
title LatestKernel (2.6.26)
        root (hd0,0)
        kernel /bzImage-2.6.26 ro root=/dev/VolGroup00/LogVol00
        initrd /initrd-2.6.26.img

 

Now reboot and select the kernel you just built.
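
Once the machine is back up, a quick way to confirm that you are actually running the new kernel rather than the stock CentOS one is to check the kernel release string; it should report the 2.6.26 kernel built above rather than 2.6.18-53.el5.

# uname -r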

Installing btrfs and Creating btrfs File System

In this guide, I am using btrfs instead of ebofs for each OSD. I am only performing these steps on the storage nodes, ceph1 and ceph2. I will only show the steps for one node, but you should obviously repeat this on both nodes. Mercurial is the SCM used by the btrfs developers. Some easy-to-follow instructions on installing this tool are provided by Mercurial here.
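
As with git, Mercurial does not have to be built by hand; it is usually available from a package repository (EPEL provides it for CentOS, for instance). The availability and version depend on which repositories you have enabled, so take the following as a sketch rather than the definitive route.

# yum install mercurial
# hg version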

First, we obtain the latest sources:

 

# mkdir -p /usr/src/btrfs
# cd /usr/src/btrfs
# hg clone http://www.kernel.org/hg/btrfs/progs-unstable
destination directory: progs-unstable
requesting all changes
adding changesets
adding manifests
adding file changes
added 247 changesets with 888 changes to 58 files
updating working directory
53 files updated, 0 files merged, 0 files removed, 0 files unresolved
# hg clone http://www.kernel.org/hg/btrfs/kernel-unstable
destination directory: kernel-unstable
requesting all changes
adding changesets
adding manifests
adding file changes
added 650 changesets with 2137 changes to 64 files (+1 heads)
updating working directory
54 files updated, 0 files merged, 0 files removed, 0 files unresolved
#

 

After obtaining the latest sources, a patch by Sage Weil needs to be applied in order for btrfs to work correctly with Ceph. The original email with the patch from Sage is available here, but I have a local copy of the patch that can be downloaded with wget as follows.

 

# cd /usr/src/btrfs/kernel-unstable
# wget http://www.ece.umd.edu/~posulliv/ceph/sage_btrfs.patch
# patch < sage_btrfs.patch
patching file ctree.h
patching file ioctl.c
patching file transaction.c
patching file transaction.h
#

 

Now we are ready to build and install everything btrfs related.

 

# cd /usr/src/btrfs/kernel-unstable
# make
bash version.sh
make -C /lib/modules/`uname -r`/build M=`pwd` modules
make[1]: Entering directory `/usr/src/kernels/linux-2.6'
  CC [M]  /usr/src/btrfs/kernel-unstable/super.o
  CC [M]  /usr/src/btrfs/kernel-unstable/ctree.o
  CC [M]  /usr/src/btrfs/kernel-unstable/extent-tree.o
  CC [M]  /usr/src/btrfs/kernel-unstable/print-tree.o
  CC [M]  /usr/src/btrfs/kernel-unstable/root-tree.o
  CC [M]  /usr/src/btrfs/kernel-unstable/dir-item.o
  CC [M]  /usr/src/btrfs/kernel-unstable/hash.o
  CC [M]  /usr/src/btrfs/kernel-unstable/file-item.o
  CC [M]  /usr/src/btrfs/kernel-unstable/inode-item.o
  CC [M]  /usr/src/btrfs/kernel-unstable/inode-map.o
  CC [M]  /usr/src/btrfs/kernel-unstable/disk-io.o
  CC [M]  /usr/src/btrfs/kernel-unstable/transaction.o
  CC [M]  /usr/src/btrfs/kernel-unstable/bit-radix.o
  CC [M]  /usr/src/btrfs/kernel-unstable/inode.o
  CC [M]  /usr/src/btrfs/kernel-unstable/file.o
  CC [M]  /usr/src/btrfs/kernel-unstable/tree-defrag.o
  CC [M]  /usr/src/btrfs/kernel-unstable/extent_map.o
  CC [M]  /usr/src/btrfs/kernel-unstable/sysfs.o
  CC [M]  /usr/src/btrfs/kernel-unstable/struct-funcs.o
  CC [M]  /usr/src/btrfs/kernel-unstable/xattr.o
  CC [M]  /usr/src/btrfs/kernel-unstable/ordered-data.o
  CC [M]  /usr/src/btrfs/kernel-unstable/extent_io.o
  CC [M]  /usr/src/btrfs/kernel-unstable/volumes.o
  CC [M]  /usr/src/btrfs/kernel-unstable/async-thread.o
  CC [M]  /usr/src/btrfs/kernel-unstable/ioctl.o
  CC [M]  /usr/src/btrfs/kernel-unstable/locking.o
  CC [M]  /usr/src/btrfs/kernel-unstable/orphan.o
  CC [M]  /usr/src/btrfs/kernel-unstable/ref-cache.o
  CC [M]  /usr/src/btrfs/kernel-unstable/acl.o
  LD [M]  /usr/src/btrfs/kernel-unstable/btrfs.o
  Building modules, stage 2.
  MODPOST 1 modules
  CC      /usr/src/btrfs/kernel-unstable/btrfs.mod.o
  LD [M]  /usr/src/btrfs/kernel-unstable/btrfs.ko
make[1]: Leaving directory `/usr/src/kernels/linux-2.6'
# insmod /usr/src/btrfs/kernel-unstable/btrfs.ko
# cd /usr/src/btrfs/progs-unstable
# make
# make install

 

Next, we need to prepare the disk on which we will create the btrfs file system. We will be using /dev/sdb. If you do not have another hard drive, it is quite easy to shut down your virtual machine and add one. Now, we will use fdisk to create a partition. This has to be done on all storage nodes.

 

# fdisk /dev/sdb

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-130, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-130, default 130):
Using default value 130

Command (m for help): p

Disk /dev/sdb: 1073 MB, 1073741824 bytes
255 heads, 63 sectors/track, 130 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1         130     1044193+  83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
# 

 

Now we are ready to create our btrfs file system. The steps to follow for this are:

 

# mkdir -p /mnt/btrfs
# mkfs.btrfs /dev/sdb1
# mount -t btrfs /dev/sdb1 /mnt/btrfs
# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      6.7G  5.7G  662M  90% /
/dev/sda1              99M   19M   76M  20% /boot
tmpfs                 192M     0  192M   0% /dev/shm
/dev/sdb1            1020M   40K 1020M   1% /mnt/btrfs
ceph0:/usr/src/ceph   6.7G  5.4G 1003M  85% /usr/src/ceph
#

 

Building Ceph

We are now ready to perform the build. We will compile with debugging symbols since we will be interested in debugging. This only needs to be done on 1 node since this directory is exported via NFS.

 

# cd /usr/src/ceph
# ./autogen.sh
# export CXXFLAGS="-g"
# ./configure
# make
# cd src
# mkdir out log

 

Setting up a Small Cluster

Now we can start setting up our cluster. The first step is to set up the monitor.

 

ceph0# cd /usr/src/ceph/src
ceph0# ./monmaptool --create --clobber --add 192.168.221.137:12345 --print .ceph_monmap
ceph0# ./mkmonfs --clobber mondata/mon0 --mon 0 --monmap .ceph_monmap

 

Now, we start up the monitor for the first time. We will enable extensive logging, which will be written to the /usr/src/ceph/src/log and /usr/src/ceph/src/out directories.

 

ceph0# ./cmon mondata/mon0 -d --debug_mon 10 --debug_ms 1

 

Next, we build the OSD cluster map, which is a compact, hierarchical description of the devices comprising the storage cluster. For this simple setup, I have 2 storage nodes, ceph1 and ceph2. After creating the cluster map, we must inform the monitor of it.

 

ceph0# ./osdmaptool --clobber --createsimple .ceph_monmap 2 --print .ceph_osdmap
ceph0# ./cmonctl osd setmap -i .ceph_osdmap

 

Now we move to the storage nodes. On each storage node, we first initialize the individual object stores.

 

ceph1# mkdir -p /mnt/btrfs/osd0
ceph2# mkdir -p /mnt/btrfs/osd1

ceph1# cd /usr/src/ceph/src
ceph1# ./cosd --mkfs_for_osd 0 /mnt/btrfs/osd0

ceph2# cd /usr/src/ceph/src
ceph2# ./cosd --mkfs_for_osd 1 /mnt/btrfs/osd1

 

Next, the OSD daemons are started up on each storage node. Again, we will enable extensive logging so we can troubleshoot any issues that arise. Log files will be placed in the same directories as mentioned previously.

 

ceph1# cd /usr/src/ceph/src
ceph1# ./cosd /mnt/btrfs/osd0 /mnt/btrfs/osd0 -d --debug_osd 10

ceph2# cd /usr/src/ceph/src
ceph2# ./cosd /mnt/btrfs/osd1 /mnt/btrfs/osd1 -d --debug_osd 10

 

Finally, we start the metadata server on ceph0.

 

ceph0# cd /usr/src/ceph/src
ceph0# ./cmds --debug_ms 1 --debug_mds 10 -d

 

Verification of the Cluster

Now, we want to verify the file system is up and working.

 

ceph0# cd /usr/src/ceph/src
ceph0# ./cmonctl osd stat
mon0 <- 'osd stat'
mon0 -> 'e4: 2 osds: 2 up, 2 in' (0)
ceph0# ./cmonctl pg stat
mon0 <- 'pg stat'
mon0 -> 'v27: 1152 pgs: 1152 active+clean; 4 MB used, 2035 MB / 2039 MB free' (0)
ceph0# ./cmonctl mds stat
mon0 <- 'mds stat'
mon0 -> 'e3: 1 nodes: 1 up:active' (0)
ceph0# ./csyn --syn makedirs 1 1 1 --syn walk
starting csyn at 0.0.0.0:57466/13601/0
mounting and starting 1 syn client(s)
waiting for client(s) to finish
10000000000 drwxr-xr-x  1     0     0        0 Sat Aug  2 04:24:15 2008 /syn.0.0
10000000002 drwxr-xr-x  1     0     0        0 Sat Aug  2 04:24:15 2008 /syn.0.0/dir.0
10000000001 -rw-r--r--  1     0     0        0 Sat Aug  2 04:24:15 2008 /syn.0.0/file.0
10000000003 -rw-r--r--  1     0     0        0 Sat Aug  2 04:24:15 2008 /syn.0.0/dir.0/file.0
ceph0#

 

If you do not see output similar to that shown above, then it's time to start troubleshooting!
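
When something does go wrong, the first things I check are whether the daemons are actually running and what the debug logs have to say. The commands below are just a starting point; the exact names of the files that appear in the log and out directories depend on the daemon and the Ceph version, so adjust accordingly.

# ps aux | egrep 'cmon|cosd|cmds'
# ls -lt /usr/src/ceph/src/log /usr/src/ceph/src/out
# tail -n 50 /usr/src/ceph/src/log/*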

Using the Kernel Client

Since we went to all that effort of building the kernel client, we may as well utilize it. We will mount the Ceph file system on the 2 storage nodes. This is quite simple to do.

 

# mkdir -p /mnt/ceph
# mount -t ceph 192.168.221.137:/ /mnt/ceph/
# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      6.7G  5.3G  1.1G  84% /
/dev/sda1              99M   19M   76M  20% /boot
tmpfs                 192M     0  192M   0% /dev/shm
/dev/sdb1            1020M  3.9M 1016M   1% /mnt/btrfs
192.168.221.137:/usr/src/ceph
                      6.7G  5.4G  984M  85% /usr/src/ceph
192.168.221.137:/     2.0G  7.0M  2.0G   1% /mnt/ceph
# ls -l /mnt/ceph
total 1
drwxr-xr-x 1 root root  0 Aug  2 05:01 syn.0.0
# ls -l /mnt/ceph/syn.0.0
total 0
drwxr-xr-x 1 root root 0 Aug  2 04:24 dir.0
-rw-r--r-- 1 root root 0 Aug  2 04:24 file.0
#
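
Before moving on to a proper benchmark, a trivial write from one node followed by a directory listing on the other is a quick way to confirm that data written through one kernel client is visible to the other. The file name hello.tmp below is just an arbitrary example.

ceph1# dd if=/dev/zero of=/mnt/ceph/hello.tmp bs=1M count=4
ceph2# ls -l /mnt/ceph/hello.tmp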

 

Testing the File System

iozone is a file system benchmarking tool. Download it and play around with it. It's a nice tool for stressing a file system. I have not done too much with it in the context of a distributed file system, so I'm still messing around with it. It is quite simple to install, though. In the output below, I was compiling on an AMD64 platform.

 

# cd /usr/src/
# wget http://www.iozone.org/src/current/iozone3_308.tar
# tar xvf iozone3_308.tar
# cd /usr/src/iozone3_308/src/current
# make linux-AMD64
# ./iozone
        Usage: For usage information type iozone -h

# ./iozone -g 1024M -f /mnt/ceph/iozone-file.tmp
        Iozone: Performance Test of File I/O
                Version $Revision: 3.308 $
                Compiled for 64 bit mode.
                Build: linux-AMD64

        Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
                     Al Slater, Scott Rhine, Mike Wisner, Ken Goss
                     Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
                     Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
                     Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy,
                     Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root.

        Run began: Sat Aug  2 05:27:50 2008

        Using maximum file size of 1048576 kilobytes.
        Command line used: ./iozone -g 1024M -f /mnt/ceph/iozone-file.tmp
        Output is in Kbytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 Kbytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
                                                            random  random    bkwd   record   stride                          
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
             512       4  135952  585814  1854787  3481620 2600470 1672748  595397  2007358  1712772    95665  1815584 1968713  3325278

iozone test complete.
#
