Mongo Memory Mapping and Journaling

lmm2003

Article link: https://blog.csdn.net/lmm2003/article/details/8180277

Memory Map

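File operations are faster with a memory map.

Memory mapping maps a given region of a file into the process's address space. Once an ordinary file is mapped, the process can access it like ordinary memory, without calling read(), write(), and so on.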

#include<sys/mman.h>

void *mmap(void *start,size_t length,int prot,int flags,int fd,off_t offset) ;

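Maps length bytes of fd, starting at offset, into the process's virtual address space. start is only a placement hint (unless MAP_FIXED is given); the actual address is the return value. The whole file is not loaded at once: pages are brought in on demand and paged in and out as needed.

prot describes the access permissions requested for the memory; it must not conflict with the mode the file was opened with:

PROT_READ //the memory can be read

PROT_WRITE //the memory can be written

PROT_EXEC //the memory can be executed

PROT_NONE //the memory cannot be accessed

flags options:

MAP_PRIVATE: the mapping is private (copy-on-write); changes are not visible to other processes and are not written back to the file

MAP_SHARED: changes to the memory are visible to other processes that map the same file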

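To update the file on disk, call:

msync(start, length, flags)

flags: MS_ASYNC //schedule the write-back and return immediately (asynchronous flush)

MS_SYNC //return only after the write-back has completed (synchronous flush)

MS_INVALIDATE //ask the kernel to invalidate other mappings of the same file so that they pick up the fresh contents. Only used in special cases

munmap() //removes the mapping. Changes made through a shared mapping are still carried through to the file by the kernel; call msync first when the timing of the write-back matters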
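The calls above can be put together in a small round trip. This is a minimal sketch, not code from MongoDB: the function name writeThroughMapping and the file path used in the usage note are made up for illustration. It writes a string through a MAP_SHARED mapping, flushes with MS_SYNC, and then reads the file back with a plain pread() to confirm the bytes reached the file.

```cpp
#include <sys/mman.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: writes `text` into `path` through a shared mapping,
// flushes it with msync, and returns what pread() then sees in the file.
// Returns an empty string on any failure.
std::string writeThroughMapping(const std::string& path, const std::string& text) {
    const size_t length = text.size();
    if (length == 0) return std::string();  // mmap rejects a zero-length mapping

    int fd = open(path.c_str(), O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return std::string();

    // The file must be at least as long as the range we touch through the mapping.
    if (ftruncate(fd, static_cast<off_t>(length)) != 0) { close(fd); return std::string(); }

    void* view = mmap(nullptr, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (view == MAP_FAILED) { close(fd); return std::string(); }

    std::memcpy(view, text.data(), length);                    // plain memory write, no write() call
    const bool flushed = (msync(view, length, MS_SYNC) == 0);  // wait for the write-back
    munmap(view, length);

    std::string fromFile(length, '\0');
    const bool readBack = flushed &&
        pread(fd, &fromFile[0], length, 0) == static_cast<ssize_t>(length);
    close(fd);
    if (!readBack) return std::string();
    return fromFile;
}
```

Calling writeThroughMapping("/tmp/mmap_demo.bin", "hello mmap") should hand back the same string. The ftruncate step matters: touching a mapped page that lies beyond the end of the file raises SIGBUS.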

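Known problems:

Problem 1: high memory consumption. A memory-mapped file loads a block from the file only when that data is actually read (writes behave the same way: the page has to be read in before it can be modified in memory). Even so, over time the whole file can end up loaded in memory.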

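Implementation:

MongoFile

MongoFile : noncopyable. An instance represents a single memory-mapped file, and the class also manages all MongoFile instances (through static functions and static variables). Because it supports both Windows and POSIX platforms, some functions are empty on POSIX. It is an abstract base class. Its subclass MemoryMappedFile is the concrete memory-mapped class, and the grandchild class MongoMMF is used by the journal, described later.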

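#define O_NOATIME (0) //open flag: do not update the access time; defined as 0 here, so the access time is still updated
#define MAP_NORESERVE (0) //one of the mmap flags: do not reserve swap space; defined as 0 here, so swap is reserved as usual

File access options: 1 means the file will be accessed sequentially, which the OS can use as an optimization hint; 2 means read-only.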

enum Options {
SEQUENTIAL = 1, // hint - e.g. FILE_FLAG_SEQUENTIAL_SCAN
READONLY = 2 // not contractually guaranteed, but if specified the impl has option to fault writes
};

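Static variables, used by MongoFile's static functions (the static functions operate on the two containers below):

mmfiles stands for "mmap files".

set<MongoFile*> MongoFile::mmfiles;
map<string,MongoFile*> MongoFile::pathToFile;

A helper class, MongoFileFinder, has a single function that looks up a MongoFile by its path.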

// lock order: lock dbMutex before this if you lock both
static RWLockRecursive mmmutex;

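Static functions:

static exist(filesystem::path) //checks whether the file exists

static void closeAllFiles( stringstream &message );

static void markAllWritable() { } //the helper class MongoFileAllowWrites has only a constructor and a destructor

static void unmarkAllWritable() { } //its constructor calls markAllWritable and its destructor calls unmarkAllWritable

static void forEach( F fun ); //calls fun on every MongoFile in the mmfiles set

static getAllfiles()

static *notifyPreFlush() = nullFunc //notification hook: a function that has to run before a flush is registered here

static *notifyPostFlush() = nullFunc //two function pointers are defined, initially pointing at an empty function, and called from flush

void nullFunc() {} //defined at the same scope as MongoFile

static int flushAll( bool sync ); // returns n flushed. A background thread calls this periodically (every 60 s); munmap and msync only write back dirty pages

notifyPreFlush()

ret = _flushAll(sync)

notifyPostFlush()

return ret

static int _flushAll(bool sync); //currently every caller uses the synchronous variant

_flushAll(sync) //sync=false; calls flush on each member in turn (MongoFile does not implement flush itself; subclasses do). The synchronous and asynchronous modes behave the same, possibly for compatibility with other platforms, dio

static long long totalMappedLength() //returns the total size of all mmfiles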
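The flushAll wrapper described above (pre-flush hook, flush every file, post-flush hook) can be sketched roughly as follows. Everything here is illustrative: DemoFile, flushAllImpl, and trace are invented names, the file list is passed in explicitly instead of living in a static set, and flush only records that it was called rather than calling msync.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order of calls so the sequence can be inspected.
std::vector<std::string>& trace() {
    static std::vector<std::string> t;
    return t;
}

// Hypothetical stand-in for a mapped file; a real flush would call msync.
struct DemoFile {
    std::string name;
    void flush(bool sync) { trace().push_back((sync ? "sync:" : "async:") + name); }
};

using Hook = void (*)();
void nullFunc() {}                 // default hook: does nothing
Hook notifyPreFlush = nullFunc;    // callers may overwrite these two pointers
Hook notifyPostFlush = nullFunc;

// Flushes every file and returns how many were flushed.
int flushAllImpl(std::vector<DemoFile>& files, bool sync) {
    int flushed = 0;
    for (DemoFile& f : files) {
        f.flush(sync);
        ++flushed;
    }
    return flushed;
}

// Wrapper: pre-hook, flush everything, post-hook.
int flushAll(std::vector<DemoFile>& files, bool sync) {
    notifyPreFlush();
    const int flushed = flushAllImpl(files, sync);
    notifyPostFlush();
    return flushed;
}
```

Starting the hooks as an empty function rather than a null pointer means flushAll never has to test them before calling.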

Ordinary member functions:

void created(); /* subclass must call after create */ //does mmfiles.insert(this);

/* subclass must call in destructor(or at close).
removes this from pathToFile and other maps
safe to call more than once, albeit might be wasted work
ideal to call close to the close, if the close is well before object destruction
*/
void MongoFile::destroyed() {
mmmutex.assertExclusivelyLocked();
mmfiles.erase(this);
pathToFile.erase( filename() );
}

virtual bool isMongoMMF() { return false; }

string filename()

setFilename(fn) {

MongoFile *&ptf = pathToFile[fn];
massert(13617, "MongoFile : multiple opens of same filename", ptf == 0);
ptf = this; //ptf is a reference to the slot inside the map, so this assignment updates pathToFile[fn] itself

}

string _filename
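The bookkeeping done by created(), destroyed(), and setFilename() amounts to a self-registering object. Below is a simplified sketch of that pattern under some assumptions: TrackedFile, find, and count are made-up names, a thrown exception stands in for massert, and the mmmutex locking is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for MongoFile's registry; not the real class.
class TrackedFile {
public:
    TrackedFile() { allFiles().insert(this); }   // plays the role of created()
    virtual ~TrackedFile() { destroyed(); }
    TrackedFile(const TrackedFile&) = delete;    // noncopyable
    TrackedFile& operator=(const TrackedFile&) = delete;

    void setFilename(const std::string& fn) {
        TrackedFile*& slot = byPath()[fn];       // reference to the map's own slot
        if (slot != nullptr) throw std::runtime_error("multiple opens of same filename");
        slot = this;                             // writes through the reference into the map
        filename_ = fn;
    }

    // Safe to call more than once: the second call finds nothing to erase.
    void destroyed() {
        allFiles().erase(this);
        if (!filename_.empty()) byPath().erase(filename_);
        filename_.clear();
    }

    static TrackedFile* find(const std::string& fn) {
        auto it = byPath().find(fn);
        return it == byPath().end() ? nullptr : it->second;
    }
    static std::size_t count() { return allFiles().size(); }

private:
    static std::set<TrackedFile*>& allFiles() {
        static std::set<TrackedFile*> files;
        return files;
    }
    static std::map<std::string, TrackedFile*>& byPath() {
        static std::map<std::string, TrackedFile*> paths;
        return paths;
    }
    std::string filename_;
};
```

Taking the slot by reference is what lets one lookup serve both the duplicate check and the assignment.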

Pure virtual (unimplemented) member functions:

/** Flushable has to fail nicely if the underlying object gets killed */
class Flushable { //unclear what this is for on Linux
public:
virtual ~Flushable() {}
virtual void flush() = 0;
};

create() = 0

flush(bool sync) = 0

Flushable prepareFlush() = 0 //of little use on Linux, where it just calls the object's flush; it mainly exists for Windows. It builds the flush function for the mmf.

close() = 0

void created(); /* subclass must call after create (= init) */ //adds this object to the set and the map; called from the constructor

/* subclass must call in destructor (or at close).
removes this from pathToFile and other maps
safe to call more than once, albeit might be wasted work
ideal to call close to the close, if the close is well before object destruction
*/
void destroyed(); //removes this object from the set and the map

long long length() = 0

_lock() { assert(mprotect(views[0], len, PROT_READ | PROT_WRITE) == 0); } //obtains read and write permission

_unlock() //leaves only read permission

typedef MemoryMappedFile MMF;

MemoryMappedFile:{

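Members:

HANDLE fd;
HANDLE maphandle;
vector<void *> views;
unsigned long long len; //open first, then map. views holds the mmap addresses of one file (a file can be mapped several times); maphandle is a handle used on Windows

Overrides of the pure virtual functions:

create(string filename, unsigned long long len, bool zero) //creates the file: if it already exists it is mapped directly, otherwise it is created first

calls p = map(filename.c_str(), len);

if zero is true, fills the mapping with zeros

return p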

flush

msync(viewForFlushing(), len, sync ? MS_SYNC : MS_ASYNC); //this class currently has only one viewForFlushing

prepareFlush

in essence this also calls msync(viewForFlushing)

Its own functions:

map(filename)

calls map(filename, len(filename))

mapWithOptions(filename, options)

calls map(name, len, options)

map(name, length, options) //every map variant ends up here

extends the file to the requested length, len = length

calls fd = open(rw, noatime),

void * view = mmap(NULL, length, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); //MAP_SHARED makes changes visible to other processes

views.push_back( view );

if the options include SEQUENTIAL

calls madvise(address, len, sequential)

calls _unlock

return view

It allows an application to tell the kernel how it expects to
       use some mapped or shared memory areas, so that the kernel can choose
       appropriate read-ahead and caching techniques.  This call does not influence
       the semantics of the application (except in the case of MADV_DONTNEED), but
       may influence its performance.  The kernel is free to ignore the advice.
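To make the SEQUENTIAL path concrete, here is a small sketch that maps a file read-only, passes MADV_SEQUENTIAL to madvise, and then scans the bytes in order. The helper names writeFile and sumBytesSequential are invented for this example, and the return value of madvise is deliberately ignored because the advice is only a hint.

```cpp
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: creates `path` with the given contents.
bool writeFile(const std::string& path, const std::string& data) {
    int fd = open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return false;
    const bool ok = write(fd, data.data(), data.size()) == static_cast<ssize_t>(data.size());
    close(fd);
    return ok;
}

// Maps `path` read-only, advises sequential access, and returns the sum of
// all bytes in the file, or -1 on error (including an empty file).
long sumBytesSequential(const std::string& path) {
    int fd = open(path.c_str(), O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) { close(fd); return -1; }
    const size_t len = static_cast<size_t>(st.st_size);

    void* view = mmap(nullptr, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (view == MAP_FAILED) return -1;

    madvise(view, len, MADV_SEQUENTIAL);  // advisory: the kernel may read ahead more aggressively

    const unsigned char* bytes = static_cast<const unsigned char*>(view);
    long sum = 0;
    for (size_t i = 0; i < len; ++i) sum += bytes[i];

    munmap(view, len);
    return sum;
}
```

The result is the same with or without the madvise call; only the kernel's read-ahead behavior may differ.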

MemoryMappedFile::createReadOnlyMap()

void * x = mmap( /*start*/0 , len , PROT_READ , MAP_SHARED , fd , 0 ); //not added to views

void* createPrivateMap(); //no caller found, though

void * x = mmap( /*start*/0 , len , PROT_READ|PROT_WRITE , MAP_PRIVATE|MAP_NORESERVE , fd , 0 );
//added to views

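void* MemoryMappedFile::remapPrivateView(void *oldPrivateAddr); //the result is added to views; calls the system mmap directly

It only calls: void * x = mmap( oldPrivateAddr, len , PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_NORESERVE|MAP_FIXED , fd , 0 );

Mapping the file again at the same address appears to reduce memory usage, presumably because MAP_FIXED replaces the old mapping and the private copy-on-write pages it had accumulated are discarded.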
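The effect of remapping a private view can be observed directly. In the sketch below (privateRemapDemo is an invented name, and the 4096-byte file is an arbitrary choice), a write to a MAP_PRIVATE view changes only the process's own copy of the page, the file keeps its old byte, and mapping again at the same address with MAP_FIXED drops the private copy.

```cpp
#include <sys/mman.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <string>

// Returns three characters: the first byte as seen (1) in the private view
// right after writing to it, (2) in the file itself, and (3) in the private
// view after it has been remapped. Returns an empty string on failure.
std::string privateRemapDemo(const std::string& path) {
    int fd = open(path.c_str(), O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return std::string();

    const size_t len = 4096;
    const std::string contents(len, 'A');
    if (write(fd, contents.data(), len) != static_cast<ssize_t>(len)) { close(fd); return std::string(); }

    void* mapped = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_NORESERVE, fd, 0);
    if (mapped == MAP_FAILED) { close(fd); return std::string(); }
    char* view = static_cast<char*>(mapped);

    view[0] = 'B';  // copy-on-write: only this process's copy of the page changes
    const char afterWrite = view[0];

    char inFile = 0;
    if (pread(fd, &inFile, 1, 0) != 1) { munmap(view, len); close(fd); return std::string(); }

    // Map the file again at the same address; MAP_FIXED replaces the old mapping.
    void* again = mmap(view, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_NORESERVE | MAP_FIXED, fd, 0);
    if (again == MAP_FAILED) { munmap(view, len); close(fd); return std::string(); }
    const char afterRemap = static_cast<char*>(again)[0];

    munmap(again, len);
    close(fd);
    return std::string{afterWrite, inFile, afterRemap};
}
```

On a typical Linux system the function should return "BAA": the private view saw the write, the file did not, and the remapped view shows the file's contents again.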

long shortLength() const { return (long) len; }

Constructor:

calls created

Destructor:

calls close()

close() calls munmap on every view, closes the file, and calls destroyed

Static functions:

void updateLength(filename, long long &length) //gets the length of the file filename

Ordinary member functions:

void clearWritableBits(void *privateView) { }

void MemoryMappedFile::_lock() { //grant write permission on the memory
if (! views.empty() && isMongoMMF() )
assert(mprotect(views[0], len, PROT_READ | PROT_WRITE) == 0);
}

void MemoryMappedFile::_unlock() { //remove write permission from the memory
if (! views.empty() && isMongoMMF() )
assert(mprotect(views[0], len, PROT_READ) == 0);
}

} // namespace mongo
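The _lock/_unlock pair above toggles page protection with mprotect. One way to use such a pair safely is a scoped guard, in the same spirit as MongoFileAllowWrites. The sketch below is illustrative only: WriteWindow and writeUnderWindow are invented names, and an anonymous one-page mapping stands in for a mapped data file.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>

// Hypothetical scoped guard: the range is writable while the object lives
// and read-only again once it is destroyed.
class WriteWindow {
public:
    WriteWindow(void* addr, size_t len)
        : addr_(addr), len_(len),
          ok_(mprotect(addr, len, PROT_READ | PROT_WRITE) == 0) {}
    ~WriteWindow() { mprotect(addr_, len_, PROT_READ); }
    WriteWindow(const WriteWindow&) = delete;
    WriteWindow& operator=(const WriteWindow&) = delete;
    bool ok() const { return ok_; }

private:
    void* addr_;
    size_t len_;
    bool ok_;
};

// Maps one read-only page, writes `value` into it inside a WriteWindow, and
// returns the byte read back after the page is read-only again (0 on failure).
char writeUnderWindow(char value) {
    const size_t len = static_cast<size_t>(sysconf(_SC_PAGESIZE));
    void* view = mmap(nullptr, len, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (view == MAP_FAILED) return 0;

    {
        WriteWindow window(view, len);
        if (window.ok()) static_cast<char*>(view)[0] = value;
    }   // protection drops back to PROT_READ here

    const char seen = static_cast<char*>(view)[0];  // reads are still allowed
    munmap(view, len);
    return seen;
}
```

A write attempted outside the window would fault, which is the point of keeping the view read-only by default: stray writes are caught instead of silently corrupting mapped data.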

Virtual function, for subclasses to use:

viewForFlushing(); //only returns the first view

}

Journal

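Purpose:

Crash recovery. After Mongo shuts down abnormally, the data can be repaired by replaying the redo journal.

How it works:

Why use a journal?

Mongo stores its data through memory mapping (MMap, Appendix 1). All modifications happen in memory, and calling msync on every modification (that is, flushing the in-memory data to the file) would cost too much I/O.

So MongoDB first writes each modification to memory and to a journal buffer, and writes the journal buffer to the journal file every 100 ms (the default interval). If the process then crashes (Mongo exits abnormally and never calls msync), the data files no longer match what was in memory before the crash. Because the journal file exists, the data that never reached the data files can be written to them again on restart.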
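The replay step on restart can be illustrated with a toy redo log. This is only a sketch of the general idea and makes no claim about MongoDB's actual journal format: RedoEntry and replay are invented names, and the data file is modeled as an in-memory string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical journal record: "write these bytes at this offset".
struct RedoEntry {
    std::size_t offset;
    std::string bytes;
};

// Re-applies every journaled write to the data image, in journal order.
// Replaying the same journal twice gives the same result, so it is safe to
// replay entries that had already reached the data file before the crash.
void replay(std::string& data, const std::vector<RedoEntry>& journal) {
    for (const RedoEntry& entry : journal) {
        const std::size_t end = entry.offset + entry.bytes.size();
        if (end > data.size()) data.resize(end, '\0');  // grow the image if needed
        data.replace(entry.offset, entry.bytes.size(), entry.bytes);
    }
}
```

Because each entry records absolute bytes at an absolute offset rather than a delta, recovery does not need to know which writes had already been flushed.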

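Problems in the journaling mechanism

Problem 1: journal buffering. The journal is a file too, so writing every journal record straight to the journal file would also cause heavy I/O. The journal is therefore buffered, holding 100 ms of journal data (or a certain amount of data). Because the buffer reaches the file only every 100 ms, Mongo can still lose data that exists only in the journal buffer and has not yet been flushed to the journal file.

Solution to Problem 1: pass the j option to getLastError, which returns the result of the previous write only after the journal buffer has been flushed to the file. (A getLastError with j shortens the interval before the next journal flush to 1/3 of the original.)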
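The trade-off in Problem 1 can be seen in a toy journal buffer. The class below is a simplified model, not MongoDB code: JournalBuffer, appendDurable, and readJournal are invented names, commit() is called by hand instead of by a 100 ms timer, and appendDurable only approximates the idea behind the j option (do not report success until the record is on disk).

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical journal buffer: records pile up in memory until commit().
class JournalBuffer {
public:
    explicit JournalBuffer(const std::string& path)
        : fd_(open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0600)) {}
    // The destructor does NOT flush, so dropping the object models a crash.
    ~JournalBuffer() { if (fd_ >= 0) close(fd_); }
    JournalBuffer(const JournalBuffer&) = delete;
    JournalBuffer& operator=(const JournalBuffer&) = delete;

    // Cheap: only touches memory.
    void append(const std::string& record) { pending_ += record + "\n"; }

    // Expensive: writes the whole buffer and waits for it to reach the disk.
    bool commit() {
        if (fd_ < 0) return false;
        std::size_t done = 0;
        while (done < pending_.size()) {
            const ssize_t n = write(fd_, pending_.data() + done, pending_.size() - done);
            if (n <= 0) return false;
            done += static_cast<std::size_t>(n);
        }
        pending_.clear();
        return fsync(fd_) == 0;
    }

    // Returns only once the record is in the journal file.
    bool appendDurable(const std::string& record) {
        append(record);
        return commit();
    }

private:
    int fd_;
    std::string pending_;
};

// Reads the journal file back, one record per line.
std::vector<std::string> readJournal(const std::string& path) {
    std::vector<std::string> records;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) records.push_back(line);
    return records;
}
```

If a record is appended and the object is destroyed before the next commit(), that record never appears in the file, which is exactly the window of loss described above. appendDurable closes the window at the price of one write and fsync per call.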

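Problem 2: when msync is called. Every 100 ms a groupcommit runs, and groupcommit calls RecoverJob.

Problem 3: the private_view mechanism. A write goes to memory first and is written to the journal afterwards.

Problem 4: the read mechanism.

Implementation:

See Figure 1-1.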