memcache source code analysis: items and slabs

http://www.cnblogs.com/xianbei/archive/2011/01/18/1924893.html

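items is memcache's wrapper for managing items. It organizes them with a hash table and LRU lists. For the hash table operations, see my article from a few days ago, "memcache source code analysis: assoc".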

 

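  A brief introduction to how item contents are stored

  Item contents are stored in slabs. To manage memory efficiently, slabs store items in buckets of fixed block sizes. A simple example: at initialization, buckets with different block sizes are created. Say the blocks in bucket 1 are all 80 bytes and hold items of up to 80 bytes; the blocks in bucket 2 are 100 bytes and hold items of up to 100 bytes; the blocks in bucket 3 are 120 bytes, and so on. An item whose size is 90 bytes therefore goes into the 100-byte bucket and nowhere else, not even the 120-byte one. For details, see my later article on slabs.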

  Question: what does memcache do when the 100-byte bucket is full?

  The answer is in the material covered in this article.

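  When space is allocated for an item, the steps are as follows:

  1. First, compute how much space the item occupies; only once its size is known can the right bucket be chosen. An item's size is the sum of the item struct, the key length, the flags suffix, the value length, and so on. For the exact calculation, see the item_make_header function in the code analysis below.

  2. Then find a suitable slab class to store it in. This mainly compares the item's size with each slab bucket's size to find the best fit. The code is the slabs_clsid function in slabs.c, which I analyze in detail in my later article on slabs.

  3. Search from the tail of the corresponding slab class's LRU queue for an expired item. If one is found, it is reclaimed and its memory is reused. At most 50 items are examined.

  4. If step 3 finds nothing and allocating from the corresponding slab class also fails, evict an unreferenced item from the tail of that class's queue, again examining at most 50 items.

  5. Try to allocate space from the slab class.

  6. If step 5 fails, remove from the tail of the queue an item that is still referenced but was last accessed more than 3 hours ago (the default).

  7. Then try to allocate from the slab class again. On failure, return NULL; on success, fill in the item's fields and return the item.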

 

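  The item deletion process:

  1. Clear the item's linked flag and remove it from the hash table. This calls the assoc_delete function described in "memcache source code analysis: assoc".

  2. Remove it from the LRU list with item_unlink_q.

  3. If the item's resources are to be released, do_item_remove and item_free are called to free the memory it occupies.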


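  Several other operations are also provided: getting an item (with an expiry check), getting an item (without an expiry check), removing the items invalidated by a client's flush_all, replacing an item with a new value, updating the access time, and so on.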

  Naturally, since items can be deleted, there are matching operations that add them to the hash table and the LRU list.

  Some item and slab statistics functions are provided as well.


  Readers who want the detailed code can look at the brief analysis below. Corrections are welcome.


  items.h

/* See items.c */
uint64_t get_cas_id( void );
  
/*@null@*/
item *do_item_alloc( char *key, const size_t nkey, const int flags, const rel_time_t exptime, const int nbytes);
void item_free(item *it);
bool item_size_ok( const size_t nkey, const int flags, const int nbytes);
  
int  do_item_link(item *it);     /** may fail if transgresses limits */
void do_item_unlink(item *it);
void do_item_remove(item *it);
void do_item_update(item *it);   /** update LRU time to current and reposition */
int  do_item_replace(item *it, item *new_it);
  
/*@null@*/
char *do_item_cachedump( const unsigned int slabs_clsid, const unsigned int limit, unsigned int *bytes);
void do_item_stats(ADD_STAT add_stats, void *c);
/*@null@*/
void do_item_stats_sizes(ADD_STAT add_stats, void *c);
void do_item_flush_expired( void );
  
item *do_item_get( const char *key, const size_t nkey);
item *do_item_get_nocheck( const char *key, const size_t nkey);
void item_stats_reset( void );
extern pthread_mutex_t cache_lock;

 

 

  items.c

 


/* -*- Mode: C; tab-width: 4; c-basic-offset: 4; indent-tabs-mode: nil -*- */

#include "memcached.h"

#include <sys/stat.h>

#include <sys/socket.h>

#include <sys/signal.h>

#include <sys/resource.h>

#include <fcntl.h>

#include <netinet/in.h>

#include <errno.h>

#include <stdlib.h>

#include <stdio.h>

#include <string.h>

#include <time.h>

#include <assert.h>

  

/* Forward Declarations */

static void item_link_q(item *it);

static void item_unlink_q(item *it);

  

/*

 * We only reposition items in the LRU queue if they haven't been repositioned

 * in this many seconds. That saves us from churning on frequently-accessed

 * items.

 */

#define ITEM_UPDATE_INTERVAL 60

  

#define LARGEST_ID POWER_LARGEST

  

// per-slab-class item statistics

typedef struct {

    unsigned int evicted;

    unsigned int evicted_nonzero;

    rel_time_t evicted_time;

    unsigned int reclaimed;

    unsigned int outofmemory;

    unsigned int tailrepairs;

} itemstats_t;

  

static item *heads[LARGEST_ID];

static item *tails[LARGEST_ID];

static itemstats_t itemstats[LARGEST_ID];

static unsigned int sizes[LARGEST_ID]; // number of items linked in each slab class's LRU list

  

void item_stats_reset(void) {

    pthread_mutex_lock(&cache_lock);

    memset(itemstats, 0, sizeof(itemstats));

    pthread_mutex_unlock(&cache_lock);

}

  

  

// get a new CAS value

uint64_t get_cas_id(void) {

    static uint64_t cas_id = 0;

    return ++cas_id;

}

  

/* Enable this for reference-count debugging. */

#if 0

# define DEBUG_REFCNT(it,op) \

                fprintf(stderr, "item %x refcnt(%c) %d %c%c%c\n", \

                        it, op, it->refcount, \

                        (it->it_flags & ITEM_LINKED) ? 'L' : ' ', \

                        (it->it_flags & ITEM_SLABBED) ? 'S' : ' ')

#else

# define DEBUG_REFCNT(it,op) while(0)

#endif

  

/**

 * Generates the variable-sized part of the header for an object.

 *

 * key     - The key

 * nkey    - The length of the key

 * flags   - key flags

 * nbytes  - Number of bytes to hold value and addition CRLF terminator

 * suffix  - Buffer for the "VALUE" line suffix (flags, size).

 * nsuffix - The length of the suffix is stored here.

 *

 * Returns the total size of the header.

 */

// compute the total space an item occupies

static size_t item_make_header(const uint8_t nkey, const int flags, const int nbytes,char *suffix, uint8_t *nsuffix) {

    /* suffix is defined at 40 chars elsewhere.. */

    *nsuffix = (uint8_t) snprintf(suffix, 40, " %d %d\r\n", flags, nbytes - 2);

    return sizeof(item) + nkey + *nsuffix + nbytes;

}

  

  

// allocate space for an item

item *do_item_alloc(char *key, const size_t nkey, const int flags, const rel_time_t exptime, const int nbytes) {

    uint8_t nsuffix;

    item *it = NULL;

    char suffix[40];

    size_t ntotal = item_make_header(nkey + 1, flags, nbytes, suffix, &nsuffix); // total space the item occupies

    if (settings.use_cas) {

        ntotal += sizeof(uint64_t);

    }

  

    unsigned int id = slabs_clsid(ntotal); // find a suitable slab class

    if (id == 0)

        return 0;

  

    /* do a quick check if we have any expired items in the tail.. */

    int tries = 50;

    item *search;

  

    for (search = tails[id];tries > 0 && search != NULL;tries--, search=search->prev) {

        if (search->refcount == 0 && (search->exptime != 0 && search->exptime < current_time)) { // unreferenced and expired

            it = search;

            /* I don't want to actually free the object, just steal

             * the item to avoid to grab the slab mutex twice ;-)

             */

            STATS_LOCK();

            stats.reclaimed++;

            STATS_UNLOCK();

            itemstats[id].reclaimed++;

            it->refcount = 1;

            do_item_unlink(it); // remove from the hash table and the LRU list

            /* Initialize the item block: */

            it->slabs_clsid = 0;

            it->refcount = 0;

            break;

        }

    }

  

    if (it == NULL && (it = slabs_alloc(ntotal, id)) == NULL) { // no expired item found and allocation from the slab class failed

  

        tries = 50;

  

        /* If requested to not push old items out of cache when memory runs out,

         * we're out of luck at this point...

         */

  

        if (settings.evict_to_free == 0) {

            itemstats[id].outofmemory++;

            return NULL;

        }

  

        /*

         * try to get one off the right LRU

         * don't necessariuly unlink the tail because it may be locked: refcount>0

         * search up from tail an item with refcount==0 and unlink it; give up after 50

         * tries

         */

  

        if (tails[id] == 0) {

            itemstats[id].outofmemory++;

            return NULL;

        }

  

        for (search = tails[id]; tries > 0 && search != NULL; tries--, search=search->prev) {

            if (search->refcount == 0) { // evict it only if nobody holds a reference

                if (search->exptime == 0 || search->exptime > current_time) {

                    itemstats[id].evicted++;

                    itemstats[id].evicted_time = current_time - search->time;

                    if (search->exptime != 0)

                        itemstats[id].evicted_nonzero++;

                    STATS_LOCK();

                    stats.evictions++;

                    STATS_UNLOCK();

                } else {

                    itemstats[id].reclaimed++;

                    STATS_LOCK();

                    stats.reclaimed++;

                    STATS_UNLOCK();

                }

                do_item_unlink(search);

                break;

            }

        }

        it = slabs_alloc(ntotal, id);

        if (it == 0) {

            itemstats[id].outofmemory++;

            /* Last ditch effort. There is a very rare bug which causes

             * refcount leaks. We've fixed most of them, but it still happens,

             * and it may happen in the future.

             * We can reasonably assume no item can stay locked for more than

             * three hours, so if we find one in the tail which is that old,

             * free it anyway.

             */

            tries = 50;

            for (search = tails[id]; tries > 0 && search != NULL; tries--, search=search->prev) {

                if (search->refcount != 0 && search->time + TAIL_REPAIR_TIME < current_time) { // still referenced, but untouched for more than TAIL_REPAIR_TIME (3 hours)

                    itemstats[id].tailrepairs++;

                    search->refcount = 0;

                    do_item_unlink(search);

                    break;

                }

            }

            it = slabs_alloc(ntotal, id);

            if (it == 0) {

                return NULL;

            }

        }

    }

  

    assert(it->slabs_clsid == 0);

  

    it->slabs_clsid = id;

  

    assert(it != heads[it->slabs_clsid]);

  

    it->next = it->prev = it->h_next = 0;

    it->refcount = 1;     /* the caller will have a reference */

    DEBUG_REFCNT(it, '*');

    it->it_flags = settings.use_cas ? ITEM_CAS : 0;

    it->nkey = nkey;

    it->nbytes = nbytes;

    memcpy(ITEM_key(it), key, nkey);

    it->exptime = exptime;

    memcpy(ITEM_suffix(it), suffix, (size_t)nsuffix);

    it->nsuffix = nsuffix;

    return it;

}

  

  

// free an item

void item_free(item *it) {

    size_t ntotal = ITEM_ntotal(it);

    unsigned int clsid;

    assert((it->it_flags & ITEM_LINKED) == 0); // must not be in the hash table or the LRU list

    assert(it != heads[it->slabs_clsid]);

    assert(it != tails[it->slabs_clsid]);

    assert(it->refcount == 0);

  

    /* so slab size changer can tell later if item is already free or not */

    clsid = it->slabs_clsid;

    it->slabs_clsid = 0;

    it->it_flags |= ITEM_SLABBED; // the memory is free and handed back to the slab allocator

    DEBUG_REFCNT(it, 'F');

    slabs_free(it, ntotal, clsid);

}

  

  

// check whether some slab class can hold an item of this size

bool item_size_ok(const size_t nkey, const int flags, const int nbytes) {

    char prefix[40];

    uint8_t nsuffix;

  

    return slabs_clsid(item_make_header(nkey + 1, flags, nbytes,prefix, &nsuffix)) != 0;

}

  

  

// link into the LRU queue as the new head

static void item_link_q(item *it) { /* item is the new head */

    item **head, **tail;

    assert(it->slabs_clsid < LARGEST_ID); // the slab class id must be valid

    assert((it->it_flags & ITEM_SLABBED) == 0); // the item must not have been freed to the slab

  

    head = &heads[it->slabs_clsid];

    tail = &tails[it->slabs_clsid];

    assert(it != *head);

    assert((*head && *tail) || (*head == 0 && *tail == 0));

    it->prev = 0;

    it->next = *head;

    if (it->next) it->next->prev = it;

    *head = it;

    if (*tail == 0) *tail = it; // the list was empty, so the new item is also the tail

    sizes[it->slabs_clsid]++;

    return;

}

  

  

// unlink from the LRU list of its slab class

static void item_unlink_q(item *it) {

    item **head, **tail;

    assert(it->slabs_clsid < LARGEST_ID);

    head = &heads[it->slabs_clsid];

    tail = &tails[it->slabs_clsid];

  

    if (*head == it) {

        assert(it->prev == 0);

        *head = it->next;

    }

    if (*tail == it) {

        assert(it->next == 0);

        *tail = it->prev;

    }

    assert(it->next != it);

    assert(it->prev != it);

  

    if (it->next) it->next->prev = it->prev;

    if (it->prev) it->prev->next = it->next;

    sizes[it->slabs_clsid]--;

    return;

}

  

  

// add an item to the hash table and the LRU list

int do_item_link(item *it) {

    MEMCACHED_ITEM_LINK(ITEM_key(it), it->nkey, it->nbytes); // ITEM_key is defined in memcached.h

    assert((it->it_flags & (ITEM_LINKED|ITEM_SLABBED)) == 0); // must be neither linked (hash table/LRU) nor freed

    it->it_flags |= ITEM_LINKED; // mark as linked

    it->time = current_time; // set the last access time

    assoc_insert(it); // insert into the hash table (assoc.c)

  

    STATS_LOCK();

    stats.curr_bytes += ITEM_ntotal(it); // add the item's total size: item struct plus contents

    stats.curr_items += 1;

    stats.total_items += 1;

    STATS_UNLOCK();

  

    /* Allocate a new CAS ID on link. */

    ITEM_set_cas(it, (settings.use_cas) ? get_cas_id() : 0); // set a new CAS value; CAS is memcache's mechanism for handling concurrent updates

  

    item_link_q(it); // add to the LRU list

  

    return 1;

}

  

  

// remove an item from the hash table and the LRU list

void do_item_unlink(item *it) {

    MEMCACHED_ITEM_UNLINK(ITEM_key(it), it->nkey, it->nbytes);

    if ((it->it_flags & ITEM_LINKED) != 0) {

        it->it_flags &= ~ITEM_LINKED; // clear the linked flag

        STATS_LOCK();

        stats.curr_bytes -= ITEM_ntotal(it);

        stats.curr_items -= 1;

        STATS_UNLOCK();

        assoc_delete(ITEM_key(it), it->nkey); // remove from the hash table

        item_unlink_q(it); // remove from the LRU list

        if (it->refcount == 0) item_free(it);

    }

}

  

  

//remove item

void do_item_remove(item *it) {

    MEMCACHED_ITEM_REMOVE(ITEM_key(it), it->nkey, it->nbytes);

    assert((it->it_flags & ITEM_SLABBED) == 0);

    if (it->refcount != 0) {

        it->refcount--;

        DEBUG_REFCNT(it, '-');

    }

    if (it->refcount == 0 && (it->it_flags & ITEM_LINKED) == 0) { // no references left and not in the hash table or the LRU list

        item_free(it);

    }

}

  

  

// update an item's last access time

void do_item_update(item *it) {

    MEMCACHED_ITEM_UPDATE(ITEM_key(it), it->nkey, it->nbytes);

    if (it->time < current_time - ITEM_UPDATE_INTERVAL) {

        assert((it->it_flags & ITEM_SLABBED) == 0); // must not have been freed

  

        if ((it->it_flags & ITEM_LINKED) != 0) {

            item_unlink_q(it);

            it->time = current_time;

            item_link_q(it);

        }

    }

}

  

  

// replace an item

int do_item_replace(item *it, item *new_it) {

    MEMCACHED_ITEM_REPLACE(ITEM_key(it), it->nkey, it->nbytes,ITEM_key(new_it), new_it->nkey, new_it->nbytes);

    assert((it->it_flags & ITEM_SLABBED) == 0); // make sure it has not been freed

  

    do_item_unlink(it);

    return do_item_link(new_it);

}

  

  

/*@null@*/

char *do_item_cachedump(const unsigned int slabs_clsid, const unsigned int limit, unsigned int *bytes) {

    unsigned int memlimit = 2 * 1024 * 1024;   /* 2MB max response size */

    char *buffer;

    unsigned int bufcurr;

    item *it;

    unsigned int len;

    unsigned int shown = 0;

    char key_temp[KEY_MAX_LENGTH + 1];

    char temp[512];

  

    it = heads[slabs_clsid];

  

    buffer = malloc((size_t)memlimit);

    if (buffer == 0) return NULL;

    bufcurr = 0;

  

    while (it != NULL && (limit == 0 || shown < limit)) {

        assert(it->nkey <= KEY_MAX_LENGTH);

        /* Copy the key since it may not be null-terminated in the struct */

        strncpy(key_temp, ITEM_key(it), it->nkey);

        key_temp[it->nkey] = 0x00; /* terminate */

        len = snprintf(temp, sizeof(temp), "ITEM %s [%d b; %lu s]\r\n",key_temp, it->nbytes - 2,(unsigned long)it->exptime + process_started);

        if (bufcurr + len + 6 > memlimit)  /* 6 is END\r\n\0 */

            break;

        memcpy(buffer + bufcurr, temp, len);

        bufcurr += len;

        shown++;

        it = it->next;

    }

  

    memcpy(buffer + bufcurr, "END\r\n", 6);

    bufcurr += 5;

  

    *bytes = bufcurr;

    return buffer;

}

  

  

// per-slab-class item statistics

void do_item_stats(ADD_STAT add_stats, void *c) {

    int i;

    for (i = 0; i < LARGEST_ID; i++) {

        if (tails[i] != NULL) {

            const char *fmt = "items:%d:%s";

            char key_str[STAT_KEY_LEN];

            char val_str[STAT_VAL_LEN];

            int klen = 0, vlen = 0;

  

            APPEND_NUM_FMT_STAT(fmt, i, "number", "%u", sizes[i]);

  

            APPEND_NUM_FMT_STAT(fmt, i, "age", "%u", tails[i]->time);

  

            APPEND_NUM_FMT_STAT(fmt, i, "evicted","%u", itemstats[i].evicted);

  

            APPEND_NUM_FMT_STAT(fmt, i, "evicted_nonzero","%u", itemstats[i].evicted_nonzero);

  

            APPEND_NUM_FMT_STAT(fmt, i, "evicted_time","%u", itemstats[i].evicted_time);

  

            APPEND_NUM_FMT_STAT(fmt, i, "outofmemory","%u", itemstats[i].outofmemory);

  

            APPEND_NUM_FMT_STAT(fmt, i, "tailrepairs","%u", itemstats[i].tailrepairs);;

  

            APPEND_NUM_FMT_STAT(fmt, i, "reclaimed","%u", itemstats[i].reclaimed);;

        }

    }

  

    /* getting here means both ascii and binary terminators fit */

    add_stats(NULL, 0, NULL, 0, c);

}

  

  

/** dumps out a list of objects of each size, with granularity of 32 bytes */

/*@null@*/

void do_item_stats_sizes(ADD_STAT add_stats, void *c) {

  

    /* max 1MB object, divided into 32 bytes size buckets */

    const int num_buckets = 32768;

    unsigned int *histogram = calloc(num_buckets, sizeof(int));

  

    if (histogram != NULL) {

        int i;

  

        /* build the histogram */

        for (i = 0; i < LARGEST_ID; i++) {

            item *iter = heads[i];

            while (iter) {

                int ntotal = ITEM_ntotal(iter);

                int bucket = ntotal / 32;

                if ((ntotal % 32) != 0) bucket++;

                if (bucket < num_buckets) histogram[bucket]++;

                iter = iter->next;

            }

        }

  

        /* write the buffer */

        for (i = 0; i < num_buckets; i++) {

            if (histogram[i] != 0) {

                char key[8];

                int klen = 0;

                klen = snprintf(key, sizeof(key), "%d", i * 32);

                assert(klen < sizeof(key));

                APPEND_STAT(key, "%u", histogram[i]);

            }

        }

        free(histogram);

    }

    add_stats(NULL, 0, NULL, 0, c);

}

  

  

// get an item (checks flush time and expiry)

item *do_item_get(const char *key, const size_t nkey) {

    item *it = assoc_find(key, nkey);

    int was_found = 0;

  

    if (settings.verbose > 2) { // print debug output

        if (it == NULL) {

            fprintf(stderr, "> NOT FOUND %s", key);

        } else {

            fprintf(stderr, "> FOUND KEY %s", ITEM_key(it));

            was_found++;

        }

    }

  

    // drop items last touched at or before the flush time (oldest_live)

    if (it != NULL && settings.oldest_live != 0 && settings.oldest_live <= current_time && it->time <= settings.oldest_live) {

        do_item_unlink(it);           /* MTSAFE - cache_lock held */

        it = NULL;

    }

  

    if (it == NULL && was_found) {

        fprintf(stderr, " -nuked by flush"); // debug note: the item was invalidated by a flush

        was_found--;

    }

  

    if (it != NULL && it->exptime != 0 && it->exptime <= current_time) {//过期

        do_item_unlink(it);           /* MTSAFE - cache_lock held */

        it = NULL;

    }

  

    if (it == NULL && was_found) {

        fprintf(stderr, " -nuked by expire"); // debug note: the item had expired

        was_found--;

    }

  

    if (it != NULL) {

        it->refcount++;

        DEBUG_REFCNT(it, '+');

    }

  

    if (settings.verbose > 2)

        fprintf(stderr, "\n");

  

    return it;

}

  

  

// get an item whether or not it has expired

item *do_item_get_nocheck(const char *key, const size_t nkey) {

    item *it = assoc_find(key, nkey);

    if (it) {

        it->refcount++;

        DEBUG_REFCNT(it, '+');

    }

    return it;

}

  

  

//flush all items

void do_item_flush_expired(void) {

    int i;

    item *iter, *next;

    if (settings.oldest_live == 0)

        return;

    for (i = 0; i < LARGEST_ID; i++) {

        /* The LRU is sorted in decreasing time order, and an item's timestamp

         * is never newer than its last access time, so we only need to walk

         * back until we hit an item older than the oldest_live time.

         * The oldest_live checking will auto-expire the remaining items.

         */

        for (iter = heads[i]; iter != NULL; iter = next) {

            if (iter->time >= settings.oldest_live) {

                next = iter->next;

                if ((iter->it_flags & ITEM_SLABBED) == 0) { // not freed yet, so unlink it

                    do_item_unlink(iter);

                }

            } else {

                break;

            }

        }

    }

}

 
 
 

http://www.cnblogs.com/xianbei/archive/2011/02/26/1938494.html

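slabs is the part of memcache that manages storage for item contents.

  When allocating memory, memcache divides the memory limit set with the -m option among the slab classes.

  1. There are at most 200 slab classes by default. But since an item is at most 1MB and the item size each slab class stores is determined by the factor, the number of slab classes actually created is less than 200.

  2. The growth factor (the -f option) defaults to 1.25: the item size each slab class can hold is the previous class's size multiplied by the factor.

  3. Each slab class contains one or more pages, and the items are stored in those pages. An item is at most 1MB, so a page is at most 1MB.

  4. Each slab class keeps a free list of items. When a new item needs to be stored, the free list is checked first and a slot is taken from it. When the free list array fills up, it is grown automatically.

  5. Each slab class has two fields, end_page_ptr and end_page_free. The former points to the next unused chunk in the most recently allocated page; the latter counts the unused chunks left in that page. When the free list from point 4 is empty and end_page_ptr is not null, the item is stored in this page. If no page has room, a new page is allocated automatically.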

  The benefit of managing memory this way is that it greatly reduces memory fragmentation and makes storing and reading items more efficient.

  Below are some annotations on the source code.

  slabs.h

/* stats */
void stats_prefix_init( void );
void stats_prefix_clear( void );
void stats_prefix_record_get( const char *key, const size_t nkey, const bool is_hit);
void stats_prefix_record_delete( const char *key, const size_t nkey);
void stats_prefix_record_set( const char *key, const size_t nkey);
/*@null@*/
char *stats_prefix_dump( int *length);

  slabs.c

/* -*- Mode: C; tab-width: 4; c-basic-offset: 4; indent-tabs-mode: nil -*- */
/*
  * Slabs memory allocation, based on powers-of-N. Slabs are up to 1MB in size
  * and are divided into chunks. The chunk sizes start off at the size of the
  * "item" structure plus space for a small key and value. They increase by
  * a multiplier factor from there, up to half the maximum slab size. The last
  * slab size is always 1MB, since that's the maximum item size allowed by the
  * memcached protocol.
  */
#include "memcached.h"
#include <sys/stat.h>
#include <sys/socket.h>
#include <sys/signal.h>
#include <sys/resource.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <errno.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>
#include <pthread.h>
  
/* powers-of-N allocation structures */
  
typedef struct {
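     unsigned int size;      /* size of each item chunk */
     unsigned int perslab;   /* number of items per page */
  
     void **slots;           // free item list
     unsigned int sl_total;  // capacity of the free list
     unsigned int sl_curr;   // number of entries currently in the free list
  
     void *end_page_ptr;         // next unused chunk in the last allocated page
     unsigned int end_page_free; // number of unused chunks left in that page
  
     unsigned int slabs;     // number of pages allocated to this class
  
     void **slab_list;       // array of page pointers
     unsigned int list_size; // capacity of slab_list
  
     unsigned int killing;  /* index+1 of dying slab, or zero if none */
     size_t requested; // total number of bytes requested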
} slabclass_t;
  
static slabclass_t slabclass[MAX_NUMBER_OF_SLAB_CLASSES];
static size_t mem_limit = 0; // memory limit in bytes
static size_t mem_malloced = 0; // bytes allocated so far
static int power_largest;
  
static void *mem_base = NULL;
static void *mem_current = NULL; // current position in the preallocated memory
static size_t mem_avail = 0; // preallocated memory still available
  
/**
  * lock protecting the slab allocator across threads
  */
static pthread_mutex_t slabs_lock = PTHREAD_MUTEX_INITIALIZER;
  
/*
  * Forward Declarations
  */
static int do_slabs_newslab( const unsigned int id);
static void *memory_allocate( size_t size);
  
#ifndef DONT_PREALLOC_SLABS
/* Preallocate as many slab pages as possible (called from slabs_init)
    on start-up, so users don't get confused out-of-memory errors when
    they do have free (in-slab) space, but no space to make new slabs.
    if maxslabs is 18 (POWER_LARGEST - POWER_SMALLEST + 1), then all
    slab types can be made.  if max memory is less than 18 MB, only the
    smaller ones will be made.  */
static void slabs_preallocate ( const unsigned int maxslabs);
#endif
  
  
// find the slab class that fits an item of the given size
unsigned int slabs_clsid( const size_t size) {
     int res = POWER_SMALLEST;
  
     if (size == 0)
         return 0;
     while (size > slabclass[res].size) // stop at the first slab class whose chunk size is >= the item size
         if (res++ == power_largest)
             return 0;
     return res;
}
  
  
/* slab initialization */
/* limit: memory size in bytes; factor: growth factor; prealloc: whether to allocate all memory up front */
void slabs_init( const size_t limit, const double factor, const bool prealloc) {
     int i = POWER_SMALLEST - 1; //0
     unsigned int size = sizeof (item) + settings.chunk_size; // chunk_size: minimum space allocated for key and value
  
     mem_limit = limit;
  
     if (prealloc) { // allocate all of the configured memory at once
         /* Allocate everything in a big chunk with malloc */
         mem_base = malloc (mem_limit);
         if (mem_base != NULL) {
             mem_current = mem_base;
             mem_avail = mem_limit;
         } else {
             fprintf (stderr, "Warning: Failed to allocate requested memory in one large chunk.\nWill allocate in smaller chunks\n" );
         }
     }
  
     memset (slabclass, 0, sizeof (slabclass));
  
     while (++i < POWER_LARGEST && size <= settings.item_size_max / factor) {
         /* Make sure items are always n-byte aligned */
         if (size % CHUNK_ALIGN_BYTES) // round up to a multiple of 8 bytes
             size += CHUNK_ALIGN_BYTES - (size % CHUNK_ALIGN_BYTES);
  
         slabclass[i].size = size; // chunk (item) size
         slabclass[i].perslab = settings.item_size_max / slabclass[i].size; // number of items per page
         size *= factor; // multiply by the growth factor
         if (settings.verbose > 1) {
             fprintf (stderr, "slab class %3d: chunk size %9u perslab %7u\n" ,i, slabclass[i].size, slabclass[i].perslab);
         }
     }
  
     power_largest = i;
     slabclass[power_largest].size = settings.item_size_max;
     slabclass[power_largest].perslab = 1; // the largest class holds only one item per page
     if (settings.verbose > 1) {
         fprintf (stderr, "slab class %3d: chunk size %9u perslab %7u\n" ,i, slabclass[i].size, slabclass[i].perslab);
     }
  
     /* for the test suite:  faking of how much we've already malloc'd */
     {
         char *t_initial_malloc = getenv ( "T_MEMD_INITIAL_MALLOC" );
         if (t_initial_malloc) {
             mem_malloced = ( size_t ) atol (t_initial_malloc);
         }
  
     }
  
#ifndef DONT_PREALLOC_SLABS
     {
         char *pre_alloc = getenv ( "T_MEMD_SLABS_ALLOC" );
  
         if (pre_alloc == NULL || atoi (pre_alloc) != 0) {
             slabs_preallocate(power_largest);
         }
     }
#endif
}
  
  
// preallocate one page for each slab class
#ifndef DONT_PREALLOC_SLABS
static void slabs_preallocate ( const unsigned int maxslabs) {
     int i;
     unsigned int prealloc = 0;
  
     /* pre-allocate a 1MB slab in every size class so people don't get
        confused by non-intuitive "SERVER_ERROR out of memory"
        messages.  this is the most common question on the mailing
        list.  if you really don't want this, you can rebuild without
        these three lines.  */
  
     for (i = POWER_SMALLEST; i <= POWER_LARGEST; i++) {
         if (++prealloc > maxslabs)
             return ;
         do_slabs_newslab(i);
     }
  
}
#endif
  
  
// grow the array of page pointers
static int grow_slab_list ( const unsigned int id) {
     slabclass_t *p = &slabclass[id];
     if (p->slabs == p->list_size) {
         size_t new_size =  (p->list_size != 0) ? p->list_size * 2 : 16;
         void *new_list = realloc (p->slab_list, new_size * sizeof ( void *));
         if (new_list == 0) return 0;
         p->list_size = new_size;
         p->slab_list = new_list;
     }
     return 1;
}
  
  
  
// allocate a new page for a slab class
static int do_slabs_newslab( const unsigned int id) {
     slabclass_t *p = &slabclass[id];
     int len = p->size * p->perslab; //1MB
     char *ptr;
  
     if ((mem_limit && mem_malloced + len > mem_limit && p->slabs > 0) || (grow_slab_list(id) == 0) || ((ptr = memory_allocate(( size_t )len)) == 0)) {
         MEMCACHED_SLABS_SLABCLASS_ALLOCATE_FAILED(id);
         return 0;
     }
  
     memset (ptr, 0, ( size_t )len);
     p->end_page_ptr = ptr;
     p->end_page_free = p->perslab;
  
     p->slab_list[p->slabs++] = ptr;
     mem_malloced += len;
     MEMCACHED_SLABS_SLABCLASS_ALLOCATE(id);
  
     return 1;
}
  
  
  
/* allocate a chunk to store an item */
static void *do_slabs_alloc( const size_t size, unsigned int id) {
     slabclass_t *p;
     void *ret = NULL;
  
     if (id < POWER_SMALLEST || id > power_largest) {
         MEMCACHED_SLABS_ALLOCATE_FAILED(size, 0);
         return NULL;
     }
  
     p = &slabclass[id];
     assert (p->sl_curr == 0 || ((item *)p->slots[p->sl_curr - 1])->slabs_clsid == 0);
  
#ifdef USE_SYSTEM_MALLOC
     if (mem_limit && mem_malloced + size > mem_limit) {
         MEMCACHED_SLABS_ALLOCATE_FAILED(size, id);
         return 0;
     }
     mem_malloced += size;
     ret = malloc (size);
     MEMCACHED_SLABS_ALLOCATE(size, id, 0, ret);
     return ret;
#endif
  
     /* fail unless we have space at the end of a recently allocated page,
        we have something on our freelist, or we could allocate a new page */
     if (! (p->end_page_ptr != 0 || p->sl_curr != 0 || do_slabs_newslab(id) != 0)) { // no free chunk and the class cannot grow
         ret = NULL;
     } else if (p->sl_curr != 0) {
         /* return off our freelist */
         ret = p->slots[--p->sl_curr];
     } else {
         /* if we recently allocated a whole page, return from that */
         assert (p->end_page_ptr != NULL);
         ret = p->end_page_ptr;
         if (--p->end_page_free != 0) {
             p->end_page_ptr = ((caddr_t)p->end_page_ptr) + p->size;
         } else {
             p->end_page_ptr = 0;
         }
     }
  
     if (ret) {
         p->requested += size;
         MEMCACHED_SLABS_ALLOCATE(size, id, p->size, ret);
     } else {
         MEMCACHED_SLABS_ALLOCATE_FAILED(size, id);
     }
  
     return ret;
}
  
  
// Free an item's memory
static void do_slabs_free( void *ptr, const size_t size, unsigned int id) {
     slabclass_t *p;
  
     assert (((item *)ptr)->slabs_clsid == 0);
     assert (id >= POWER_SMALLEST && id <= power_largest);
     if (id < POWER_SMALLEST || id > power_largest)
         return ;
  
     MEMCACHED_SLABS_FREE(size, id, ptr);
     p = &slabclass[id];
  
#ifdef USE_SYSTEM_MALLOC
     mem_malloced -= size;
     free (ptr);
     return ;
#endif
  
     if (p->sl_curr == p->sl_total) { // the free list is full and needs to grow
         int new_size = (p->sl_total != 0) ? p->sl_total * 2 : 16;  /* 16 is arbitrary */
         void **new_slots = realloc (p->slots, new_size * sizeof ( void *));
         if (new_slots == 0)
             return ;
         p->slots = new_slots;
         p->sl_total = new_size;
     }
     p->slots[p->sl_curr++] = ptr;
     p->requested -= size;
     return ;
}
  
  
static int nz_strcmp( int nzlength, const char *nz, const char *z) {
     int zlength= strlen (z);
     return (zlength == nzlength) && ( strncmp (nz, z, zlength) == 0) ? 0 : -1;
}
  
  
// Get stats
bool get_stats( const char *stat_type, int nkey, ADD_STAT add_stats, void *c) {
     bool ret = true ;
  
     if (add_stats != NULL) {
         if (!stat_type) {
             /* prepare general statistics for the engine */
             STATS_LOCK();
             APPEND_STAT( "bytes" , "%llu" , (unsigned long long )stats.curr_bytes);
             APPEND_STAT( "curr_items" , "%u" , stats.curr_items);
             APPEND_STAT( "total_items" , "%u" , stats.total_items);
             APPEND_STAT( "evictions" , "%llu" ,(unsigned long long )stats.evictions);
             APPEND_STAT( "reclaimed" , "%llu" ,(unsigned long long )stats.reclaimed);
             STATS_UNLOCK();
         } else if (nz_strcmp(nkey, stat_type, "items" ) == 0) {
             item_stats(add_stats, c);
         } else if (nz_strcmp(nkey, stat_type, "slabs" ) == 0) {
             slabs_stats(add_stats, c);
         } else if (nz_strcmp(nkey, stat_type, "sizes" ) == 0) {
             item_stats_sizes(add_stats, c);
         } else {
             ret = false ;
         }
     } else {
         ret = false ;
     }
  
     return ret;
}
  
  
/* Slab stats */
static void do_slabs_stats(ADD_STAT add_stats, void *c) {
     int i, total;
     /* Get the per-thread stats which contain some interesting aggregates */
     struct thread_stats thread_stats;
     threadlocal_stats_aggregate(&thread_stats);
  
     total = 0;
     for (i = POWER_SMALLEST; i <= power_largest; i++) {
         slabclass_t *p = &slabclass[i];
         if (p->slabs != 0) {
             uint32_t perslab, slabs;
             slabs = p->slabs;
             perslab = p->perslab;
  
             char key_str[STAT_KEY_LEN];
             char val_str[STAT_VAL_LEN];
             int klen = 0, vlen = 0;
  
             APPEND_NUM_STAT(i, "chunk_size" , "%u" , p->size);
             APPEND_NUM_STAT(i, "chunks_per_page" , "%u" , perslab);
             APPEND_NUM_STAT(i, "total_pages" , "%u" , slabs);
             APPEND_NUM_STAT(i, "total_chunks" , "%u" , slabs * perslab);
             APPEND_NUM_STAT(i, "used_chunks" , "%u" ,slabs*perslab - p->sl_curr - p->end_page_free);
             APPEND_NUM_STAT(i, "free_chunks" , "%u" , p->sl_curr);
             APPEND_NUM_STAT(i, "free_chunks_end" , "%u" , p->end_page_free);
             APPEND_NUM_STAT(i, "mem_requested" , "%llu" ,(unsigned long long )p->requested);
             APPEND_NUM_STAT(i, "get_hits" , "%llu" ,(unsigned long long )thread_stats.slab_stats[i].get_hits);
             APPEND_NUM_STAT(i, "cmd_set" , "%llu" ,(unsigned long long )thread_stats.slab_stats[i].set_cmds);
             APPEND_NUM_STAT(i, "delete_hits" , "%llu" ,(unsigned long long )thread_stats.slab_stats[i].delete_hits);
             APPEND_NUM_STAT(i, "incr_hits" , "%llu" ,(unsigned long long )thread_stats.slab_stats[i].incr_hits);
             APPEND_NUM_STAT(i, "decr_hits" , "%llu" ,(unsigned long long )thread_stats.slab_stats[i].decr_hits);
             APPEND_NUM_STAT(i, "cas_hits" , "%llu" ,(unsigned long long )thread_stats.slab_stats[i].cas_hits);
             APPEND_NUM_STAT(i, "cas_badval" , "%llu" ,(unsigned long long )thread_stats.slab_stats[i].cas_badval);
  
             total++;
         }
     }
  
     /* add overall slab stats and append terminator */
  
     APPEND_STAT( "active_slabs" , "%d" , total);
     APPEND_STAT( "total_malloced" , "%llu" , (unsigned long long )mem_malloced);
     add_stats(NULL, 0, NULL, 0, c);
}
  
  
// Allocate raw memory for a new page: from the preallocated region if one exists, otherwise via malloc
static void *memory_allocate( size_t size) {
     void *ret;
  
     if (mem_base == NULL) {
         /* We are not using a preallocated large memory chunk */
         ret = malloc (size);
     } else {
         ret = mem_current;
  
         if (size > mem_avail) {
             return NULL;
         }
  
         /* mem_current pointer _must_ be aligned!!! */
         if (size % CHUNK_ALIGN_BYTES) {
             size += CHUNK_ALIGN_BYTES - (size % CHUNK_ALIGN_BYTES);
         }
  
         mem_current = (( char *)mem_current) + size;
         if (size < mem_avail) {
             mem_avail -= size;
         } else {
             mem_avail = 0;
         }
     }
  
     return ret;
}
  
  
// Allocate (wrapper that holds slabs_lock)
void *slabs_alloc( size_t size, unsigned int id) {
     void *ret;
  
     pthread_mutex_lock(&slabs_lock);
     ret = do_slabs_alloc(size, id);
     pthread_mutex_unlock(&slabs_lock);
     return ret;
}
  
  
// Free (wrapper that holds slabs_lock)
void slabs_free( void *ptr, size_t size, unsigned int id) {
     pthread_mutex_lock(&slabs_lock);
     do_slabs_free(ptr, size, id);
     pthread_mutex_unlock(&slabs_lock);
}
  
  
// Stats (wrapper that holds slabs_lock)
void slabs_stats(ADD_STAT add_stats, void *c) {
     pthread_mutex_lock(&slabs_lock);
     do_slabs_stats(add_stats, c);
     pthread_mutex_unlock(&slabs_lock);
}
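
The allocation order used by do_slabs_alloc and do_slabs_free above can be hard to see among the tracing macros and the USE_SYSTEM_MALLOC branches. The following is a minimal sketch of the same idea, not memcached's actual code: a freed chunk is pushed onto a per-class stack, and an allocation first pops from that stack, then falls back to the unused tail of the newest page, and only then carves a new page. The names toy_slabclass, toy_newpage, toy_alloc and toy_free are hypothetical and exist only for this illustration. The sketch also leaves out the memory limit, the slab_list bookkeeping and locking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for slabclass_t: one size class. */
typedef struct {
    size_t size;            /* chunk size in bytes */
    unsigned perslab;       /* chunks per page */
    void **slots;           /* stack of freed chunks (the free list) */
    unsigned sl_total;      /* capacity of slots */
    unsigned sl_curr;       /* number of chunks currently on the stack */
    char *end_page_ptr;     /* next never-used chunk in the newest page */
    unsigned end_page_free; /* never-used chunks left in that page */
    unsigned pages;         /* pages carved so far */
} toy_slabclass;

/* Carve one zeroed page. The toy does not record the page pointer,
 * so pages are never released (the real code keeps them in slab_list). */
static int toy_newpage(toy_slabclass *p) {
    char *ptr = calloc(p->perslab, p->size);
    if (ptr == NULL)
        return 0;
    p->end_page_ptr = ptr;
    p->end_page_free = p->perslab;
    p->pages++;
    return 1;
}

/* Allocate one chunk: free list first, then page tail, then a new page. */
static void *toy_alloc(toy_slabclass *p) {
    void *ret;

    if (p->sl_curr != 0)
        return p->slots[--p->sl_curr];          /* reuse a freed chunk */

    if (p->end_page_ptr == NULL && !toy_newpage(p))
        return NULL;                            /* nothing free, cannot grow */

    assert(p->end_page_ptr != NULL);
    ret = p->end_page_ptr;
    if (--p->end_page_free != 0)
        p->end_page_ptr += p->size;             /* advance to the next chunk */
    else
        p->end_page_ptr = NULL;                 /* page is used up */
    return ret;
}

/* Return a chunk to the class by pushing it onto the free list.
 * The list doubles in capacity when full, starting at 16 entries. */
static int toy_free(toy_slabclass *p, void *ptr) {
    if (p->sl_curr == p->sl_total) {
        unsigned new_total = (p->sl_total != 0) ? p->sl_total * 2 : 16;
        void **new_slots = realloc(p->slots, new_total * sizeof(void *));
        if (new_slots == NULL)
            return 0;
        p->slots = new_slots;
        p->sl_total = new_total;
    }
    p->slots[p->sl_curr++] = ptr;
    return 1;
}
```

With a class of two 64-byte chunks per page, two allocations consume the first page, a free followed by an allocation hands back the same chunk without touching a new page, and the next allocation after that carves a second page. Chunks are never handed back to the system here, which mirrors the listing above: memory assigned to a size class stays with that class.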

 
