About FileObject
• A file can have multiple file objects (FOs); each FO represents one open instance.
• All FOs that represent the same stream share the same Section Object Pointers value and the same FsContext value; FsContext is identical across all FOs for a given file.
• FsContext2 — per-handle user context; metadata streams have no user handle context.
• SectionObjectPointers — the single-instance (per-stream) pointers.
  – DataSection — non-NULL if a mapped section has been created.
  – SharedCacheMap — non-NULL if this stream has been set up by the cache manager.
  – ImageSection — present only for executable files.
• PrivateCacheMap — per-handle Cc context (readahead) that also serves as a reference from this file object to the shared cache map.
Single Instancing and Metadata
• Filesystems use streams to represent metadata, but those streams are not visible to users.
• Directories require a level of indirection to escape single instancing exposing the data.
• Filesystems create a second, internal "stream" file object
  – the user's file object has NULL members in its Section Object Pointers
  – stream file objects have no FsContext2 (user handle context)
• All metadata streams are built like this (MFTs, FATs, etc.)
• FsContext2 == NULL plays an important role in how Cc treats these streams, which we'll discuss later.
View Management
• A Shared Cache Map has an array of View Access Control Block (VACB) pointers which record the base cache address of each view
  – promoted to a sparse form for files > 32MB
• Access interfaces map File+FileOffset to a cache address. Taking a view miss results in a new mapping, possibly unmapping an unreferenced view in another file (views are recycled LRU).
• Since a view is fixed size, mapping across a view is impossible – Cc returns one address
• Fixed size means no fragmentation …
Interface Summary
• File objects start out unadorned
• CcInitializeCacheMap to initiate caching via Cc on a file object
  – sets up the Shared/Private Cache Map & Mm if necessary
• Access methods (Copy, Mdl, Mapping/Pinning)
• Maintenance Functions
• CcUninitializeCacheMap to terminate caching on a file object
  – tears down the S/P Cache Maps
  – Mm lives on… its data section is the cache!
The Cache Manager Doesn’t Stand Alone
• Cc is an extension of either Mm or the FS, depending on how you look at it
• Cc is intimately tied into the filesystem model
• Understanding Cc means we have to take a slight detour to mention some concepts filesystem folks think are interesting. Raise your hand if you’re a filesystem person :-)
The Slight Filesystem Digression
• Three basic types of IO in NT: cached, noncached and “paging”
• Paging IO is simply IO generated by Mm – flushing or faulting
– the data section implies the file is big enough
– can never extend a file
• A filesystem will re-enter itself on the same callstack as Mm dispatches cache pagefaults
• This makes things exciting! (ERESOURCEs)
The Three File Sizes
• FileSize – how big the file looks to the user
– 1 byte, 102 bytes, 1040592 bytes
• AllocationSize – how much backing store is allocated on the volume
– multiple of cluster size, which is 2^n * sector size
– ... a more practical definition shortly
• ValidDataLength – how much of the file has been written by the user in cache; zeros are seen beyond it (some OSes use sparse allocation)
• ValidDataLength <= FileSize <= AllocationSize
Why not use Fast IO all the time?
– file locks
– oplocks
– extending files (and so forth)
Pagefault Cluster Hints
• Taking a pagefault can result in Mm opportunistically bringing surrounding pages in (up to 7/15, depending)
• Since Cc takes pagefaults on streams, but knows a lot about which pages are useful, Mm provides a hinting mechanism in the TLS
– MmSetPageFaultReadAhead()
• Not exposed to usermode …
Readahead
• CcScheduleReadAhead detects patterns on a handle and schedules readahead into the next suspected ranges
– Regular motion, backwards and forwards, with gaps
– Private Cache Map contains the per-handle info
– Called by CcCopyRead and CcMdlRead
• Readahead granularity (64KB) controls the scheduling trigger points and length
– Small IOs – don’t want readahead every 4KB
– Large IOs – ya get what ya need (up to 8MB, thanks to Jim Gray)
• CcPerformReadAhead maps and touch-faults pages in a Cc worker thread, will use the new Mm prefetch APIs in a future release
Unmap Behind
• Recall how views are managed (misses)
• On view miss, Cc will unmap two views behind the current (missed) view before mapping
• Unmapped valid pages go to the standby list in LRU order and can be soft-faulted. In practice, this is where much of the actual cache is as of Windows 2000.
• Unmap behind logic is the default because large file read/write operations cause huge swings in working set. Mm’s working set trim cannot keep up with the speed at which a disk can produce pages, so Cc must help.
Write Throttling
• Avoids out of memory problems by delaying writes to the cache
– Filling memory faster than writeback speed is not useful, we may as well run into it sooner
• Throttle limit is twofold
– CcDirtyPageThreshold – dynamic, but ~1500 on all current machines (small, but see above)
– MmAvailablePages & pagefile page backlog
• CcCanIWrite sees if write is ok, optionally blocking, also serving as the restart test
• CcDeferWrite sets up for callback when write should be allowed (async case)
• !defwrites debugger extension triages and shows the state of the throttle
Writing Cached Data
• There are three basic sets of threads involved, only one of which is Cc’s
– Mm’s modified page writer
• the paging file
– Mm’s mapped page writer
• almost anything else
– Cc’s lazy writer pool
• executing in the kernel critical work queue
• writes data produced through Cc interfaces
The Lazy Writer
• The name is misleading – it’s really delayed
• All files with dirty data have been queued onto CcDirtySharedCacheMapList
• Work queueing – CcLazyWriteScan()
– Once per second, queues work to arrive at writing 1/8th of dirty data given current dirty and production rates
– Fairness considerations are interesting
• CcLazyWriterCursor rotated around the list, pointing at the next file to operate on (fairness)
– 16th pass rule for user and metadata streams
• Work issuing – CcWriteBehind()
– Uses a special mode of CcFlushCache() which flushes front to back (HotSpots – fairness again)
Letting the Filesystem Into The Cache
• Two distinct access interfaces
– Map – given File+FileOffset, return a cache address
– Pin – same, but acquires synchronization – this is a range lock on the stream
• Lazy writer acquires synchronization, allowing it to serialize metadata production with metadata writing
• Pinning also allows setting of a log sequence number (LSN) on the update, for transactional FS
– FS receives an LSN callback from the lazy writer prior to range flush
Remember FsContext2?
• Synchronization on Pin interfaces requires that Cc be the writer of the data
• Mm provides a method to turn off the mapped page writer for a stream, MmDisableModifiedWriteOfSection()
– confusing name, I know (modified writer is not involved)
• Serves as the trigger for Cc to perform synchronization on write
BCBs and Lies Thereof
• Mapping and Pinning interfaces return opaque Buffer Control Block (BCB) pointers
• Unpin receives BCBs to indicate regions
• BCBs for Map interfaces are usually VACB pointers
• BCBs for Pin interfaces are pointers to a real BCB structure in Cc, which references a VACB for the cache address
Cache Manager Summary
Virtual block cache for files not logical block cache for disks
Memory manager is the ACTUAL cache manager
Cache Manager context integrated into FileObjects
Cache Manager manages views on files in kernel virtual address space
I/O has special fast path for cached accesses
The Lazy Writer periodically flushes dirty data to disk
Filesystems need two interfaces to Cc: map and pin
Reposted from: https://blog.51cto.com/laokaddk/126770