About FileObjects

A file can have multiple file objects (FOs); each FO represents one open instance.
For FOs that represent the same stream, the SectionObjectPointers values are identical,
and the FsContext values are identical.


FsContext is the same for all FOs of a given file (stream).
FsContext2 is the per-user-handle context; metadata streams have no user handle context.

SectionObjectPointers  per-stream (single-instance) pointers.
    DataSection       non-NULL once a mapped data section has been created.
    SharedCacheMap    non-NULL once this stream has been set up by the cache manager.
    ImageSection      only present for executable images.

PrivateCacheMap --- per-handle Cc context (readahead) that also serves as a
                    reference from this file object to the shared cache map.
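
As a rough sketch of how these per-stream pointers might be checked (a hypothetical helper; FILE_OBJECT and SECTION_OBJECT_POINTERS are as declared in ntifs.h):

    #include <ntifs.h>

    /* Hypothetical helper: every file object opened against the same stream
       shares the structure behind SectionObjectPointer, so any one of them
       tells us whether Cc/Mm state already exists for the stream. */
    BOOLEAN
    ExampleIsStreamCached(_In_ PFILE_OBJECT FileObject)
    {
        PSECTION_OBJECT_POINTERS Sop = FileObject->SectionObjectPointer;

        /* SharedCacheMap is non-NULL once Cc has set the stream up;
           DataSectionObject is non-NULL once a mapped data section exists;
           ImageSectionObject exists only for executable images. */
        return (BOOLEAN)(Sop != NULL && Sop->SharedCacheMap != NULL);
    }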
   
Single Instancing and Metadata
Filesystems represent their metadata as streams, but these streams are not visible to users.
Directories require a level of indirection to escape single instancing exposing the data.

Filesystems create a second, internal "stream" file object
-- the user's file object has NULL members in its Section Object Pointers
-- stream file objects have no FsContext2 (user handle context)

All metadata streams are built like this (MFTs, FATs, etc.)
FsContext2 == NULL plays an important role in how Cc treats these streams,
which we'll discuss later.
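
A minimal sketch, assuming a hypothetical Vcb layout, of how a filesystem might wire up such an internal stream file object (IoCreateStreamFileObject and CcInitializeCacheMap are the real ntifs.h routines; the Example* names are not):

    #include <ntifs.h>

    /* Hypothetical per-volume context; only the fields used below. */
    typedef struct _EXAMPLE_VCB {
        PDEVICE_OBJECT          TargetDeviceObject;
        PFILE_OBJECT            MetadataStreamFileObject;
        CC_FILE_SIZES           MetadataFileSizes;
        CACHE_MANAGER_CALLBACKS CacheCallbacks;      /* filled in elsewhere */
    } EXAMPLE_VCB, *PEXAMPLE_VCB;

    VOID
    ExampleCreateMetadataStream(
        _Inout_ PEXAMPLE_VCB Vcb,
        _In_    PVOID MetadataFcb,                        /* shared stream context */
        _In_    PSECTION_OBJECT_POINTERS SectionPointers) /* lives in that context */
    {
        /* Internal "stream" file object: never handed to a user, so FsContext2
           stays NULL, which is the cue Cc keys off for metadata streams. */
        PFILE_OBJECT StreamFO =
            IoCreateStreamFileObject(NULL, Vcb->TargetDeviceObject);

        StreamFO->FsContext            = MetadataFcb;
        StreamFO->FsContext2           = NULL;
        StreamFO->SectionObjectPointer = SectionPointers; /* single-instance pointers */

        /* Bring the metadata stream under Cc with pin access enabled. */
        CcInitializeCacheMap(StreamFO,
                             &Vcb->MetadataFileSizes,
                             TRUE,                        /* PinAccess */
                             &Vcb->CacheCallbacks,
                             Vcb);

        Vcb->MetadataStreamFileObject = StreamFO;
    }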
 
 
View Management
 
 
A Shared Cache Map has an array of Virtual Address Control Block (VACB) pointers which record the base cache address of each view;
the array is promoted to a sparse form for files > 32MB
 
Access interfaces map File+FileOffset to a cache address. Taking a view miss results in
a new mapping, possibly unmapping an unreferenced view in another file (views are recycled LRU).
 
 
Since a view is fixed size, mapping across a view boundary is impossible – Cc returns one address
 
 
 
Fixed size means no fragmentation …
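
A small sketch of the view arithmetic implied above, assuming the commonly documented fixed view size of 256KB (the constant and function here are illustrative, not the real Cc internals):

    #include <ntifs.h>

    /* Assumed fixed view size: published Windows internals material describes
       Cc mapping the cache in 256KB views. */
    #define EXAMPLE_VIEW_SIZE (256 * 1024)

    /* Which view a file offset falls in, and how many bytes remain before the
       view boundary. A single mapping never spans views, so a caller wanting a
       range that crosses the boundary has to loop, one view at a time. */
    VOID
    ExampleLocateView(
        _In_  LONGLONG FileOffset,
        _Out_ LONGLONG *ViewIndex,
        _Out_ ULONG    *BytesLeftInView)
    {
        *ViewIndex       = FileOffset / EXAMPLE_VIEW_SIZE;
        *BytesLeftInView = (ULONG)(EXAMPLE_VIEW_SIZE - (FileOffset % EXAMPLE_VIEW_SIZE));
    }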
 
 
 
Interface Summary

 

 

    File objects start out unadorned

    CcInitializeCacheMap to initiate caching via Cc on a file object

  sets up the Shared/Private Cache Map & Mm section if necessary

    Access methods (Copy, Mdl, Mapping/Pinning)

    Maintenance Functions

    CcUninitializeCacheMap to terminate caching on a file object

  tears down the Shared/Private Cache Maps

  Mm lives on. Its data section is the cache!
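
A hedged sketch of how a filesystem's cached read path typically strings these calls together (CcInitializeCacheMap, CcCopyRead and CcUninitializeCacheMap are the real ntifs.h routines; the FCB layout is assumed and locking/error handling is omitted):

    #include <ntifs.h>

    /* Assumed per-file context carrying the three file sizes and the
       lazy-write callbacks, as most NT filesystems do. */
    typedef struct _EXAMPLE_FCB {
        CC_FILE_SIZES           FileSizes;
        CACHE_MANAGER_CALLBACKS CacheCallbacks;
    } EXAMPLE_FCB, *PEXAMPLE_FCB;

    NTSTATUS
    ExampleCachedRead(
        _In_  PFILE_OBJECT     FileObject,
        _In_  PEXAMPLE_FCB     Fcb,
        _In_  LARGE_INTEGER    FileOffset,
        _In_  ULONG            Length,
        _Out_ PVOID            Buffer,
        _Out_ PIO_STATUS_BLOCK IoStatus)
    {
        /* First cached access on this file object: set up the shared/private
           cache maps (the file object starts out "unadorned"). */
        if (FileObject->PrivateCacheMap == NULL) {
            CcInitializeCacheMap(FileObject,
                                 &Fcb->FileSizes,
                                 FALSE,               /* no pin access for user data */
                                 &Fcb->CacheCallbacks,
                                 Fcb);
        }

        /* Copy interface: Cc maps a view and copies into the caller's buffer.
           CcCopyRead returns FALSE only when Wait is FALSE and it cannot
           proceed immediately; a real FSD would then post the request. */
        if (!CcCopyRead(FileObject, &FileOffset, Length, TRUE, Buffer, IoStatus)) {
            return STATUS_PENDING;
        }
        return IoStatus->Status;
    }

    /* At cleanup time the filesystem tears the cache maps down; Mm's data
       section (the cache itself) lives on until its references go away. */
    VOID
    ExampleCleanup(_In_ PFILE_OBJECT FileObject)
    {
        CcUninitializeCacheMap(FileObject, NULL, NULL);
    }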

 

The Cache Manager Doesn’t Stand Alone

 

 

    Cc is an extension of either Mm or the FS, depending on how you look at it

    Cc is intimately tied into the filesystem model

    Understanding Cc means we have to take a slight detour to mention some concepts filesystem folks think are interesting. Raise your hand if you’re a filesystem person :-)


 

 

The Slight Filesystem Digression

    Three basic types of IO in NT: cached, noncached and “paging”

    Paging IO is simply IO generated by Mm – flushing or faulting

  the data section implies the file is big enough

  can never extend a file

    A filesystem will re-enter itself on the same callstack as Mm dispatches cache pagefaults

    This makes things exciting! (ERESOURCEs)
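
As a sketch, the three IO types can be told apart in a filesystem's read/write dispatch by the IRP flags Mm and the I/O manager set (IRP_PAGING_IO and IRP_NOCACHE are real flags; the enum and helper are illustrative):

    #include <ntifs.h>

    typedef enum _EXAMPLE_IO_KIND {
        ExampleCachedIo,     /* goes through Cc (copy/Mdl interfaces)          */
        ExampleNoncachedIo,  /* caller asked for no intermediate buffering     */
        ExamplePagingIo      /* generated by Mm: faulting in or flushing pages */
    } EXAMPLE_IO_KIND;

    EXAMPLE_IO_KIND
    ExampleClassifyIo(_In_ PIRP Irp)
    {
        if (Irp->Flags & IRP_PAGING_IO) {
            /* Paging IO: already sized to the file, may never extend it, and
               may arrive on the same callstack as a cached pagefault. */
            return ExamplePagingIo;
        }
        if (Irp->Flags & IRP_NOCACHE) {
            return ExampleNoncachedIo;
        }
        return ExampleCachedIo;
    }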

 

 

The Three File Sizes

 

    FileSize – how big the file looks to the user

  1 byte, 102 bytes, 1040592 bytes

    AllocationSize – how much backing store is allocated on the volume

  multiple of cluster size, which is 2^n * sector size

  ... a more practical definition shortly

    ValidDataLength – how much of the file has been written by the user; in the cache, zeros are seen beyond it (some OSes use sparse allocation)

    ValidDataLength <= FileSize <= AllocationSize
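
These are the sizes a filesystem hands to Cc; a minimal sketch using the real CC_FILE_SIZES structure and CcSetFileSizes (the extend helper and its rounding policy are assumptions, not any particular filesystem's code):

    #include <ntifs.h>

    /* CC_FILE_SIZES (ntifs.h) carries exactly these three sizes, and the
       invariant ValidDataLength <= FileSize <= AllocationSize must hold. */
    VOID
    ExampleExtendFile(
        _In_    PFILE_OBJECT   FileObject,
        _Inout_ PCC_FILE_SIZES Sizes,
        _In_    LONGLONG       NewFileSize,
        _In_    LONGLONG       ClusterSize)   /* 2^n * sector size on the volume */
    {
        Sizes->FileSize.QuadPart = NewFileSize;

        /* Round allocation up to the next cluster boundary. */
        if (Sizes->AllocationSize.QuadPart < NewFileSize) {
            Sizes->AllocationSize.QuadPart =
                ((NewFileSize + ClusterSize - 1) / ClusterSize) * ClusterSize;
        }

        ASSERT(Sizes->ValidDataLength.QuadPart <= Sizes->FileSize.QuadPart);
        ASSERT(Sizes->FileSize.QuadPart <= Sizes->AllocationSize.QuadPart);

        /* Tell Cc (and, through it, Mm's section) about the new sizes. */
        CcSetFileSizes(FileObject, Sizes);
    }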

 

 

    Why not use Fast IO all the time?

  file locks

  oplocks

  extending files (and so forth)
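
A hedged sketch of the fast IO side of this: a simple filesystem can point its fast IO entries at FsRtlCopyRead/FsRtlCopyWrite (real FsRtl routines), which call straight into Cc when it is safe and return FALSE (forcing the regular IRP path) when fast IO has been marked not possible because of locks or oplock state, or when a write would extend the file:

    #include <ntifs.h>

    /* Assumed driver-global fast IO dispatch table. */
    FAST_IO_DISPATCH ExampleFastIoDispatch;

    VOID
    ExampleInitFastIo(_Inout_ PDRIVER_OBJECT DriverObject)
    {
        RtlZeroMemory(&ExampleFastIoDispatch, sizeof(ExampleFastIoDispatch));
        ExampleFastIoDispatch.SizeOfFastIoDispatch = sizeof(FAST_IO_DISPATCH);

        /* FsRtl's canned routines call CcCopyRead/CcCopyWrite directly when the
           FCB header says fast IO is possible, and bail out to the IRP path
           otherwise (locks, oplocks, extending writes, and so on). */
        ExampleFastIoDispatch.FastIoRead  = FsRtlCopyRead;
        ExampleFastIoDispatch.FastIoWrite = FsRtlCopyWrite;

        DriverObject->FastIoDispatch = &ExampleFastIoDispatch;
    }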

 

 

Pagefault Cluster Hints

    Taking a pagefault can result in Mm opportunistically bringing in surrounding pages (up to 7/15, depending)

    Since Cc takes pagefaults on streams, but knows a lot about which pages are useful, Mm provides a hinting mechanism in the TLS

  MmSetPageFaultReadAhead()

    Not exposed to usermode …

 

 

 

Readahead

 

    CcScheduleReadAhead detects patterns on a handle and schedules readahead into the next suspected ranges

  Regular motion, backwards and forwards, with gaps

  Private Cache Map contains the per-handle info

  Called by CcCopyRead and CcMdlRead

    Readahead granularity (64KB) controls the scheduling trigger points and length

  Small IOs – don’t want readahead every 4KB

  Large IOs – ya get what ya need (up to 8MB, thanks to Jim Gray)

    CcPerformReadAhead maps and touch-faults pages in a Cc worker thread, and will use the new Mm prefetch APIs in a future release

 

 

Unmap Behind

 

    Recall how views are managed (misses)

    On view miss, Cc will unmap two views behind the current (missed) view before mapping

    Unmapped valid pages go to the standby list in LRU order and can be soft-faulted. In practice, this is where much of the actual cache is as of Windows 2000.

    Unmap behind logic is the default because large file read/write operations cause huge swings in the working set. Mm’s working set trim falls down at the speed a disk can produce pages, so Cc must help.

 

 

Write Throttling

    Avoids out of memory problems by delaying writes to the cache

  Filling memory faster than writeback speed is not useful; we may as well run into the limit sooner

    Throttle limit is twofold

  CcDirtyPageThreshold – dynamic, but ~1500 on all current machines (small, but see above)

  MmAvailablePages & pagefile page backlog

    CcCanIWrite sees if write is ok, optionally blocking, also serving as the restart test

    CcDeferWrite sets up for callback when write should be allowed (async case)

    !defwrites debugger extension triages and shows the state of the throttle
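
A sketch of how a filesystem's write path might cooperate with the throttle (CcCanIWrite and CcDeferWrite are the real routines; the event-based retry shown here is just one possible arrangement):

    #include <ntifs.h>

    /* Hypothetical callback Cc calls when the deferred write should be
       retried; here it simply wakes whoever queued the write. */
    VOID
    ExamplePostDeferredWrite(_In_ PVOID Context1, _In_ PVOID Context2)
    {
        UNREFERENCED_PARAMETER(Context2);
        KeSetEvent((PKEVENT)Context1, IO_NO_INCREMENT, FALSE);
    }

    /* Returns TRUE if the caller may proceed with a cached write of Length
       bytes. Synchronous callers just block inside CcCanIWrite; asynchronous
       callers register the post routine and complete the request later. */
    BOOLEAN
    ExampleThrottleWrite(
        _In_ PFILE_OBJECT FileObject,
        _In_ ULONG        Length,
        _In_ BOOLEAN      CanBlock,
        _In_ PKEVENT      RetryEvent)
    {
        if (CcCanIWrite(FileObject, Length, CanBlock, FALSE /* not a retry */)) {
            return TRUE;
        }

        /* Too many dirty pages or too little available memory: defer. Cc will
           call the post routine when CcCanIWrite would succeed again. */
        CcDeferWrite(FileObject,
                     ExamplePostDeferredWrite,
                     RetryEvent,
                     NULL,
                     Length,
                     FALSE /* not a retry */);
        return FALSE;
    }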

 

 

Writing Cached Data

 

    There are three basic sets of threads involved, only one of which is Cc’s

  Mm’s modified page writer

    the paging file

  Mm’s mapped page writer

    almost anything else

  Cc’s lazy writer pool

    executing in the kernel critical work queue

    writes data produced through Cc interfaces

 

 

 

The Lazy Writer

    Name is misleading; it’s really delayed write

    All files with dirty data have been queued onto CcDirtySharedCacheMapList

    Work queueing – CcLazyWriteScan()

  Once per second, queues work to arrive at writing 1/8th of dirty data given current dirty and production rates

  Fairness considerations are interesting

    CcLazyWriterCursor rotated around the list, pointing at the next file to operate on (fairness)

  16th pass rule for user and metadata streams

    Work issuing – CcWriteBehind()

  Uses a special mode of CcFlushCache() which flushes front to back (HotSpots – fairness again)
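
A sketch of the acquire/release callbacks a filesystem registers in CACHE_MANAGER_CALLBACKS so the lazy writer (and readahead) can serialize with it; the callback typedefs match ntifs.h, while the stream context and its single ERESOURCE are assumptions:

    #include <ntifs.h>

    /* Assumed per-stream context, passed to CcInitializeCacheMap as the
       LazyWriteContext parameter. */
    typedef struct _EXAMPLE_STREAM_CONTEXT {
        ERESOURCE Resource;   /* serializes metadata production vs. writing */
    } EXAMPLE_STREAM_CONTEXT, *PEXAMPLE_STREAM_CONTEXT;

    /* Matches PACQUIRE_FOR_LAZY_WRITE: called by the lazy writer before it
       flushes the stream; Wait == FALSE means "don't block, just try". */
    BOOLEAN
    ExampleAcquireForLazyWrite(_In_ PVOID Context, _In_ BOOLEAN Wait)
    {
        PEXAMPLE_STREAM_CONTEXT Stream = (PEXAMPLE_STREAM_CONTEXT)Context;
        return ExAcquireResourceSharedLite(&Stream->Resource, Wait);
    }

    /* Matches PRELEASE_FROM_LAZY_WRITE. */
    VOID
    ExampleReleaseFromLazyWrite(_In_ PVOID Context)
    {
        PEXAMPLE_STREAM_CONTEXT Stream = (PEXAMPLE_STREAM_CONTEXT)Context;
        ExReleaseResourceLite(&Stream->Resource);
    }

    /* Wired up once per stream (the readahead callbacks, omitted here, are set
       the same way):
           Callbacks.AcquireForLazyWrite  = ExampleAcquireForLazyWrite;
           Callbacks.ReleaseFromLazyWrite = ExampleReleaseFromLazyWrite;  */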

 

 

 

 

Letting the Filesystem Into The Cache

 

    Two distinct access interfaces

  Map – given File+FileOffset, return a cache address

  Pin – same, but acquires synchronization – this is a range lock on the stream

    Lazy writer acquires synchronization, allowing it to serialize metadata production with metadata writing

    Pinning also allows setting of a log sequence number (LSN) on the update, for transactional FS

  FS receives an LSN callback from the lazy writer prior to range flush
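
A sketch of a metadata update through the pin interface (CcPinRead, CcSetDirtyPinnedData and CcUnpinData are the real routines; the flag form of CcPinRead follows current ntifs.h headers, and the field/LSN plumbing is assumed):

    #include <ntifs.h>

    /* Update a 4-byte field inside a cached metadata stream, stamping it with
       a log sequence number so the lazy writer can tell the log what must be
       flushed first. Exception handling around CcPinRead is omitted. */
    VOID
    ExampleUpdateMetadataField(
        _In_ PFILE_OBJECT  MetadataStreamFO,   /* internal stream file object */
        _In_ LONGLONG      FieldOffset,
        _In_ ULONG         NewValue,
        _In_ LARGE_INTEGER Lsn)                /* from the FS's own log */
    {
        LARGE_INTEGER Offset;
        PVOID         Bcb;
        PVOID         Buffer;

        Offset.QuadPart = FieldOffset;

        /* Pin: map the range and take the range lock on the stream. */
        if (CcPinRead(MetadataStreamFO, &Offset, sizeof(ULONG),
                      PIN_WAIT, &Bcb, &Buffer)) {

            *(PULONG)Buffer = NewValue;          /* modify in the cache      */
            CcSetDirtyPinnedData(Bcb, &Lsn);     /* mark dirty, record LSN   */
            CcUnpinData(Bcb);                    /* drop the range lock      */
        }
    }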

 

 

Remember FsContext2?

 

    Synchronization on the Pin interfaces requires that Cc be the writer of the data

    Mm provides a method to turn off the mapped page writer for a stream, MmDisableModifiedWriteOfSection()

  confusing name, I know (modified writer is not involved)

    Serves as the trigger for Cc to perform synchronization on write
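
A heavily hedged sketch of where this call might sit when a filesystem sets up a pinned metadata stream; MmDisableModifiedWriteOfSection is the real ntifs.h export, but the ordering relative to CcInitializeCacheMap and the surrounding scaffolding are assumptions:

    #include <ntifs.h>

    VOID
    ExampleEnablePinnedCaching(
        _In_ PFILE_OBJECT             MetadataStreamFO,
        _In_ PSECTION_OBJECT_POINTERS SectionPointers,
        _In_ PCC_FILE_SIZES           FileSizes,
        _In_ PCACHE_MANAGER_CALLBACKS Callbacks,
        _In_ PVOID                    LazyWriteContext)
    {
        /* Turn off Mm's mapped page writer for this stream so that Cc's lazy
           writer is the only writer and pin synchronization covers all writes.
           (Despite the name, the modified page writer is not involved.) */
        MmDisableModifiedWriteOfSection(SectionPointers);

        /* Caching with pin access for the metadata stream. */
        CcInitializeCacheMap(MetadataStreamFO,
                             FileSizes,
                             TRUE,              /* PinAccess */
                             Callbacks,
                             LazyWriteContext);
    }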

 

 

BCBs and Lies Thereof

 

    Mapping and Pinning interfaces return opaque Buffer Control Block (BCB) pointers

    Unpin receives BCBs to indicate regions

    BCBs for Map interfaces are usually VACB pointers

    BCBs for Pin interfaces are pointers to a real BCB structure in Cc, which references a VACB for the cache address

 

 

Cache Manager Summary

Virtual block cache for files, not a logical block cache for disks

Memory manager is the ACTUAL cache manager

Cache Manager context integrated into FileObjects

Cache Manager manages views of files in kernel virtual address space

I/O has a special fast path for cached accesses

The Lazy Writer periodically flushes dirty data to disk

Filesystems need two interfaces to CC: map and pin