Mongo Memory Mapping and Journaling

Latest recommended article published on 2023-06-20 21:24:48

lmm2003

Article link: https://blog.csdn.net/lmm2003/article/details/8179067

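Memory Mapping

Purpose:

MongoDB's storage engine relies on the operating system's memory-mapping mechanism.

Principle:

Taking Linux memory mapping as an example: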
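
A minimal sketch of what memory mapping looks like on Linux, assuming a POSIX system. The helper names writeThroughMapping and readBack are made up for illustration and are not MongoDB code. It maps a file, modifies it with an ordinary memory write, and uses msync to push the change to the file:

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: maps `len` bytes of `path`, writes `text` at offset 0
// through the mapping, flushes with msync, then unmaps. Returns true on success.
bool writeThroughMapping(const std::string& path, size_t len, const std::string& text) {
    if (text.size() > len) return false;
    int fd = ::open(path.c_str(), O_RDWR | O_CREAT, 0600);
    if (fd < 0) return false;
    // The file must be at least as long as the mapping, otherwise touching
    // the pages beyond its end raises SIGBUS.
    if (::ftruncate(fd, static_cast<off_t>(len)) != 0) { ::close(fd); return false; }
    void* view = ::mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    ::close(fd);  // the mapping stays valid after the descriptor is closed
    if (view == MAP_FAILED) return false;
    std::memcpy(view, text.data(), text.size());  // an ordinary memory write
    bool ok = ::msync(view, len, MS_SYNC) == 0;   // force dirty pages to the file
    ::munmap(view, len);
    return ok;
}

// Hypothetical helper: reads the first `n` bytes back with plain read().
std::string readBack(const std::string& path, size_t n) {
    std::string out(n, '\0');
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) return "";
    ssize_t got = ::read(fd, &out[0], n);
    ::close(fd);
    out.resize(got > 0 ? static_cast<size_t>(got) : 0);
    return out;
}
```

Calling writeThroughMapping on a scratch file and then readBack on the same path should return the text that was written through the mapping.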

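Known problems:

Problem 1: it consumes a large amount of memory. With memory mapping, a block is loaded from the file only when the data in it is actually read (writes behave the same way, since the data must be read in before it can be modified in memory). Even so, over time all of the data in the file will inevitably end up loaded into memory.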

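Implementation:

MongoFile

MongoFile (noncopyable): an instance represents a single memory-mapped file, while the class as a whole also manages every MongoFile through static functions and static variables. Because it supports both Windows and POSIX, some of its functions are empty on POSIX. It is an abstract base class. Its subclass MemoryMappedFile is the concrete memory-mapped class, and the grandchild class MongoMMF is used by the journal, introduced later.

#define O_NOATIME (0) // open flag: do not update the access time; defined as 0 here, so the access time is updated
#define MAP_NORESERVE (0) // one of mmap's flags: do not reserve swap; defined as 0 here, so swap is used

File access options: 1 means the file will be accessed sequentially, a hint the OS can use to optimize (on Windows); 2 means read-only.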

enum Options {
SEQUENTIAL = 1, // hint - e.g. FILE_FLAG_SEQUENTIAL_SCAN on windows
READONLY = 2 // not contractually guaranteed, but if specified the impl has option to fault writes
};
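
Because the two options are distinct bits, they can be combined in one mask. The sketch below shows one way such a mask might be translated into POSIX arguments. The helpers protectionFor and adviceFor are assumptions for illustration, not functions from the MongoDB source, and the MADV_* constants assume Linux:

```cpp
#include <cassert>
#include <sys/mman.h>

enum Options {
    SEQUENTIAL = 1,  // hint: the file will be read front to back
    READONLY = 2     // the mapping does not need write access
};

// Hypothetical helper: chooses the mmap protection bits from the option mask.
int protectionFor(int options) {
    return (options & READONLY) ? PROT_READ : (PROT_READ | PROT_WRITE);
}

// Hypothetical helper: chooses an madvise() hint from the option mask.
int adviceFor(int options) {
    return (options & SEQUENTIAL) ? MADV_SEQUENTIAL : MADV_NORMAL;
}
```

With these helpers, a mask of SEQUENTIAL | READONLY would map to PROT_READ together with the MADV_SEQUENTIAL hint.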

Static variables: used by MongoFile's static functions, which operate on the two collections below.

mmfiles stands for "mmap files".

set<MongoFile*> MongoFile::mmfiles;
map<string,MongoFile*> MongoFile::pathToFile; // a helper class, MongoFileFinder, has a single function that looks up a MongoFile by its path

// lock order: lock dbMutex before this if you lock both
static RWLockRecursive mmmutex;

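Static functions:

static exist(filesystem::path) // checks whether the file exists

static void closeAllFiles( stringstream &message );

static void markAllWritable() { } // the helper class MongoFileAllowWrites has only a constructor and a destructor:

static void unmarkAllWritable() { } // its constructor calls markAllWritable, its destructor calls unmarkAllWritable

static void forEach( F fun ); // calls fun on every file in the mmfiles set

static getAllfiles()

static *notifyPreFlush() = nullFunc // notifier: a function that must run before the flush is registered here

static *notifyPostFlush() = nullFunc // both are function pointers, initially an empty function, called from flushAll

void nullFunc() {} // declared at the same scope as MongoFile

static int flushAll( bool sync ); // returns n flushed

notifyPreFlush()

ret = _flushAll(sync)

notifyPostFlush()

return ret

static int _flushAll(bool sync);

_flushAll(sync) // with sync=false, calls flush on each member in turn (MongoFile does not implement flush itself; subclasses do). Synchronous and asynchronous mode behave the same, possibly for compatibility with other platforms, dio

static long long totalMappedLength() // returns the total size of all mmfiles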
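
The flushAll sequence above (pre-flush notifier, flush every registered file, post-flush notifier) can be sketched in isolation. This is a simplified stand-in rather than the MongoDB code: File, LoggingFile, Registry and the g_events log are names assumed for the example, and locking is left out:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Assumed for the example: records events so the call order can be inspected.
static std::vector<std::string> g_events;

struct File {  // stand-in for the abstract MongoFile
    virtual ~File() {}
    virtual void flush(bool sync) = 0;
};

struct LoggingFile : File {  // stand-in for a concrete subclass
    void flush(bool) override { g_events.push_back("flush"); }
};

void nullFunc() {}  // default notifier that does nothing

struct Registry {
    static std::set<File*> files;
    static void (*notifyPreFlush)();
    static void (*notifyPostFlush)();

    // Runs the pre hook, flushes every file, runs the post hook.
    static int flushAll(bool sync) {
        notifyPreFlush();
        int n = _flushAll(sync);
        notifyPostFlush();
        return n;  // number of files flushed
    }
private:
    static int _flushAll(bool sync) {
        int n = 0;
        for (File* f : files) { f->flush(sync); ++n; }
        return n;
    }
};
std::set<File*> Registry::files;
void (*Registry::notifyPreFlush)() = nullFunc;
void (*Registry::notifyPostFlush)() = nullFunc;

// Example notifiers a caller might register.
void onPre()  { g_events.push_back("pre"); }
void onPost() { g_events.push_back("post"); }
```

Starting both pointers at nullFunc means flushAll never has to test for a null pointer before calling them.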

Regular member functions:

void created(); /* subclass must call after create */ // body: mmfiles.insert(this);

/* subclass must call in destructor (or at close).
removes this from pathToFile and other maps
safe to call more than once, albeit might be wasted work
ideal to call close to the close, if the close is well before object destruction
*/
void MongoFile::destroyed() {
mmmutex.assertExclusivelyLocked();
mmfiles.erase(this);
pathToFile.erase( filename() );
}

virtual bool isMongoMMF() { return false; }

string filename()

setFilename(fn) {

MongoFile *&ptf = pathToFile[fn];
massert(13617, "MongoFile : multiple opens of same filename", ptf == 0);
ptf = this; // ptf is a reference to the value stored inside the map, so this assignment updates pathToFile

}
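
The bookkeeping done by created, destroyed and setFilename can be sketched as a small self-registering class. This is an illustration under assumptions, not the MongoDB code: TrackedFile is a made-up name, an exception replaces massert, the mmmutex locking is omitted, and destroyed only erases a path entry that this object owns:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>

class TrackedFile {
public:
    static std::set<TrackedFile*> allFiles;                 // plays the role of mmfiles
    static std::map<std::string, TrackedFile*> pathToFile;

    ~TrackedFile() { destroyed(); }

    // Registers the object once it has been created.
    void created() { allFiles.insert(this); }

    // Safe to call more than once: erasing a missing entry is a no-op.
    void destroyed() {
        allFiles.erase(this);
        auto it = pathToFile.find(_filename);
        if (it != pathToFile.end() && it->second == this) pathToFile.erase(it);
    }

    void setFilename(const std::string& fn) {
        // operator[] inserts a null pointer if the path is new and returns a
        // reference to the stored value, so assigning to slot updates the map.
        TrackedFile*& slot = pathToFile[fn];
        if (slot != nullptr)
            throw std::runtime_error("multiple opens of same filename");
        slot = this;
        _filename = fn;
    }

    const std::string& filename() const { return _filename; }

private:
    std::string _filename;
};
std::set<TrackedFile*> TrackedFile::allFiles;
std::map<std::string, TrackedFile*> TrackedFile::pathToFile;
```

Holding the map slot by reference avoids a second lookup: the same expression both checks for a duplicate open and records the new owner.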

string _filename

Unimplemented (pure virtual) member functions:

/** Flushable has to fail nicely if the underlying object gets killed */
class Flushable { // its purpose on Linux is unclear
public:
virtual ~Flushable() {}
virtual void flush() = 0;
};

create=0

flush(bool sync)=0

Flushable prepareFlush()=0 // of little use on Linux, where it just calls the object's flush; it is mainly intended for Windows. It builds the flush function for the mmf.

close()=0

void created(); /* subclass must call after create = init */ // adds this object to the set and the map

/* subclass must call in destructor (or at close).
removes this from pathToFile and other maps
safe to call more than once, albeit might be wasted work
ideal to call close to the close, if the close is well before object destruction
*/
void destroyed(); // removes this object from the set and the map

long long length()=0

_lock() { assert(mprotect(views[0], len, PROT_READ | PROT_WRITE) == 0); } // grants read and write permission

_unlock() // leaves only read permission
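
The effect of _lock and _unlock can be tried on a scratch mapping. The sketch below assumes Linux and uses an anonymous mapping instead of a data file; ProtectedView and writeUnderLock are names invented for the example. It also checks the return value of mprotect outside of assert, since a call placed inside assert disappears when NDEBUG is defined:

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Hypothetical wrapper around one anonymous, initially read-only mapping.
class ProtectedView {
public:
    explicit ProtectedView(size_t len) : _len(len) {
        _view = ::mmap(nullptr, len, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    }
    ~ProtectedView() { if (ok()) ::munmap(_view, _len); }
    ProtectedView(const ProtectedView&) = delete;
    ProtectedView& operator=(const ProtectedView&) = delete;

    bool ok() const { return _view != MAP_FAILED; }

    // Counterpart of _lock(): make the pages readable and writable.
    bool lock() { return ::mprotect(_view, _len, PROT_READ | PROT_WRITE) == 0; }
    // Counterpart of _unlock(): drop back to read-only.
    bool unlock() { return ::mprotect(_view, _len, PROT_READ) == 0; }

    char* data() { return static_cast<char*>(_view); }

private:
    void* _view;
    size_t _len;
};

// Writes one byte between lock() and unlock(), then reads it back
// while the pages are read-only again.
char writeUnderLock(ProtectedView& v, char c) {
    if (!v.ok() || !v.lock()) return '\0';
    v.data()[0] = c;
    v.unlock();
    return v.data()[0];
}
```

A write attempted while the view is still read-only would be stopped by the kernel with SIGSEGV, which is what makes this protection useful for catching stray writes.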

MemoryMappedFile:{

Members:

Overrides of the unimplemented functions:

}

Journal

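Purpose:

Used for crash recovery: after Mongo shuts down abnormally, the data can be repaired from the redo journal.

Principle:

Why use a journal?

Mongo stores its data through memory mapping (MMap, see Appendix 1), so all modifications are made in memory. Calling msync after every modification (that is, flushing the in-memory data to the file) would cost too much I/O.

MongoDB therefore writes each modification to memory and to a journal buffer first, and every 100 ms (the default interval) writes the journal buffer to the journal file. If the server crashes (Mongo exits abnormally, so msync is never called), the data files no longer match what was in memory before the crash, but thanks to the journal file the changes that never reached the data files can be written to them again on restart.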
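
The write path and the recovery step can be modelled with a toy in-memory store. Everything here is an assumption made for illustration: ToyStore and its members are invented names, maps and vectors stand in for the data file and the journal file, and commitJournal is called by hand instead of by a 100 ms timer:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, std::string>;  // key and its new value

struct ToyStore {
    std::map<std::string, std::string> dataFile;    // durable: survives a crash
    std::map<std::string, std::string> memoryView;  // mapped pages: lost on a crash
    std::vector<Entry> journalBuffer;               // in memory: lost on a crash
    std::vector<Entry> journalFile;                 // durable: survives a crash

    // A write touches memory and the journal buffer only.
    void write(const std::string& k, const std::string& v) {
        memoryView[k] = v;
        journalBuffer.push_back({k, v});
    }
    // Stands in for the periodic group commit of the journal buffer.
    void commitJournal() {
        journalFile.insert(journalFile.end(), journalBuffer.begin(), journalBuffer.end());
        journalBuffer.clear();
    }
    // Stands in for msync: the data file catches up with memory.
    void flushData() {
        dataFile = memoryView;
        journalFile.clear();
    }
    // Abnormal exit: everything that lived only in memory is gone.
    void crash() {
        memoryView.clear();
        journalBuffer.clear();
    }
    // Restart: replay (redo) the journal file onto the data file.
    void recover() {
        for (const Entry& e : journalFile) dataFile[e.first] = e.second;
        memoryView = dataFile;
        journalFile.clear();
    }
};
```

In this model a write that was committed to the journal before the crash is restored by recover, while a write that was still sitting in the journal buffer is lost.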

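Problems in the journaling mechanism

Problem 1: the journal buffer. The journal is itself a file, so writing every journal entry straight to the journal file would also cause heavy I/O. The journal therefore has to be buffered, holding 100 ms worth of entries (or a certain amount of data). Because the buffer is flushed to the file only every 100 ms, Mongo can still lose data that was in the journal buffer but had not yet been flushed to the journal file.

Solution to problem 1: pass the j option to getLastError, so that the result of the previous write operation is returned only after the journal buffer has been flushed to the file. (A getLastError call with the j option shortens the wait until the next journal-buffer flush to one third of the normal interval.)

Problem 2: when msync is called. Every 100 ms a groupcommit runs, and groupcommit calls RecoverJob to perform…

Problem 3: the private_view mechanism. This arises because a write is applied to memory first and only then written to the journal.

Problem 4: the read mechanism.

Implementation:

See Figure 1-1.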