Byte Buffers and Non-Heap Memory

Most Java programs spend their time working with objects on the JVM heap, using getter and setter methods to retrieve or change the data in those objects. A few programs, however, need to do something different. Perhaps they're exchanging data with a program written in C. Or they need to manage large chunks of data without the risk of garbage collection pauses. Or maybe they need efficient random access to files. For all these programs, a ByteBuffer is the answer.

Prologue: The Organization of Objects

Let's start by comparing two class definitions. The first is C++, the second is Java. They both declare the same member variables, with (mostly) the same types, but there's an important difference: the C++ class describes the layout of a block of memory, the Java class doesn't.

    class TCP_Header
    {
        unsigned short  sourcePort;
        unsigned short  destPort;
        unsigned int    seqNum;
        unsigned int    ackNum;
        unsigned short  flags;
        unsigned short  windowSize;
        unsigned short  checksum;
        unsigned short  urgentPtr;
        char            data[1];
    };

    public class TcpHeader
    {
        private short   sourcePort;
        private short   destPort;
        private int     seqNum;
        private int     ackNum;
        private short   flags;
        private short   windowSize;
        private short   checksum;
        private short   urgentPtr;
        private byte[]  data;
    }

C++, like C before it, is a systems programming language. That means it will be used to directly access objects like network protocol buffers, which are defined in terms of byte offsets from a base address. One way to do this is with pointer arithmetic and casts, but that's error-prone. Instead, C (and C++) allows you to use a structure or class definition as a "view" on arbitrary memory: you can take any pointer, cast it as a pointer-to-structure, and then access that memory with member references such as hdr->destPort. In a real C++ program, of course, you'd define such structures using the fixed-width types from <cstdint>, so that field sizes don't vary between platforms.

However, Java comes with its own data-access caveats. Foremost among them is that the in-memory layout of instance data is explicitly not defined: a Java class can never be used as a view on a block of raw memory. By giving the JVM the flexibility to arrange its objects' fields as it sees fit, different implementations can make the most efficient use of their hardware. For example, a machine that accesses memory in 32-bit increments, given an object with several short fields, might align each field on a 32-bit boundary rather than packing them together.

ByteBuffer

The fact that Java objects may be laid out differently than defined is irrelevant to most programmers. Since a Java class cannot be used as a view on arbitrary memory, you'll never notice if the JVM has decided to shuffle its members. However, there are situations where it would be nice to have this ability; there is a lot of structured binary data in the real world. Prior to JDK 1.4, Java programmers had limited options: they could read the data into a byte[] and extract values with hand-written shift-and-mask code, or wrap the array in a DataInputStream and give up random access. ByteBuffer fills the gap, providing random access to binary data as values of any primitive type. The following example wraps a 16-byte array and stores values of different sizes within it:

    byte[] data = new byte[16];
    ByteBuffer buf = ByteBuffer.wrap(data);
    buf.putShort(0, (short)0x1234);
    buf.putInt(2, 0x12345678);
    buf.putLong(8, 0x1122334455667788L);

    for (int ii = 0 ; ii < data.length ; ii++)
        System.console().printf("index %2d = %02x\n", ii, data[ii]);

When working with a ByteBuffer, you're responsible for tracking the indices yourself, and nothing prevents you from retrieving a value from the wrong offset:

    System.console().printf(
            "retrieving value from wrong index = %04x\n",
            buf.getInt(0));

The best way to ensure that you're always using the correct indices is to create a class that wraps the actual buffer and provides bean-style setters and getters.
Taking the TCP header as an example:

    public class TcpHeaderWrapper
    {
        private ByteBuffer buf;

        public TcpHeaderWrapper(byte[] data)
        {
            buf = ByteBuffer.wrap(data);
        }

        public short getSourcePort()
        {
            return buf.getShort(0);
        }

        public void setSourcePort(short value)
        {
            buf.putShort(0, value);
        }

        public short getDestPort()
        {
            return buf.getShort(2);
        }

        // and so on
    }

Slicing a ByteBuffer

Continuing with the TCP example, let's consider the actual content of the TCP packet: an arbitrary-length array of bytes that follows the fixed header fields. The C++ version declares data as a single-element array at the end of the structure, a common C idiom for a variable-length field. There are two ways to extract an arbitrary array of bytes from a ByteBuffer. The first is to copy those bytes into a new array:

    public byte[] getData()
    {
        buf.position(getDataOffset());
        int size = buf.remaining();
        byte[] data = new byte[size];
        buf.get(data);
        return data;
    }

This is usually the wrong approach, because it copies the data from the buffer into your array. Most of the time, you'll want to access the data via a slice: a new ByteBuffer that shares the original buffer's backing storage:

    public ByteBuffer getDataAsBuffer()
    {
        buf.position(getDataOffset());
        return buf.slice();
    }

If you looked closely, you may have noticed that the last two code snippets both set the buffer's position() before retrieving the data: both the relative get() and slice() operate from the current position. (The getDataOffset() method isn't shown; in a real implementation it would compute the header length from the header's data-offset field.) Something else to remember: when you call slice(), the new buffer's position is zero and its limit is the number of bytes remaining in the original buffer, and all of its indices are relative to the start of the slice, not the original buffer.

Beware Endianness

In Gulliver's Travels, the two societies of Lilliputians break their eggs from different ends, and that minor difference has led to eternal strife. Computer architectures suffer from a similar strife, based on the way that multi-byte values (eg, 32-bit integers) are stored in memory. "Little-endian" machines, such as the PDP-11, 8080, and 80x86, store the low-order bytes first in memory: the integer value 0x12345678 appears as the byte sequence 78 56 34 12. "Big-endian" machines, such as the 68000 and SPARC, store the high-order byte first: 12 34 56 78.

Java manages data in Big-Endian form. However, most Java programs run on Intel processors, which are Little-Endian. This can cause a lot of problems if you're trying to exchange data between a Java program and a C or C++ program running on the same machine. For example, here's a C program that writes a 4-byte signed integer to a file:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(int argc, char** argv)
    {
        int fd = creat("/tmp/example.dat", 0777);
        if (fd < 0)
        {
            perror("unable to create file");
            return 1;
        }

        int value = 0x12345678;
        write(fd, &value, sizeof(value));
        close(fd);
        return 0;
    }

On a Linux system, you can use the od command to dump the file's content:

    ~, 524> od -tx1 /tmp/example.dat
    0000000 78 56 34 12
    0000004

When you write a naive Java program to retrieve that data, you see the bytes in that same order, which means the retrieved value is wrong:

    byte[] data = new byte[4];
    FileInputStream in = new FileInputStream("/tmp/example.dat");
    if (in.read(data) < 4)
        throw new Exception("unable to read file contents");

    ByteBuffer buf = ByteBuffer.wrap(data);
    System.console().printf("data = %x\n", buf.getInt(0));

If you want to see the correct value, you must explicitly tell the buffer that its contents are Little-Endian:

    buf.order(ByteOrder.LITTLE_ENDIAN);
    System.console().printf("data = %x\n", buf.getInt(0));

Here's the problem with that code: how do you know that the data is Little-Endian? One common solution is to start files with a "magic number" that indicates the byte order. For example, UTF-16 files begin with the byte-order mark 0xFEFF; a reader that instead sees 0xFFFE knows the file was written with the opposite ordering. An alternative is to specify the ordering as part of a protocol or format, and require writers to follow that specification. For example, the set of protocols collectively known as TCP/IP all require Big-Endian ordering, while the GIF graphics file format is Little-Endian.
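As a sketch of how the magic-number approach works in practice, here's a helper that mimics the UTF-16 convention. The class and method names are my own illustration, not part of any standard API; it assumes the buffer still has its default Big-Endian order when called.

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;

    public class ByteOrderMark
    {
        // Examines the two-byte mark at the start of the buffer and sets the
        // buffer's order to match the writer's. Reading with the default
        // Big-Endian order, a Big-Endian writer's 0xFEFF comes back unchanged,
        // while a Little-Endian writer's mark comes back as 0xFFFE.
        public static ByteBuffer applyByteOrder(ByteBuffer buf)
        {
            short mark = buf.getShort(0);
            if (mark == (short)0xFEFF)
                buf.order(ByteOrder.BIG_ENDIAN);
            else if (mark == (short)0xFFFE)
                buf.order(ByteOrder.LITTLE_ENDIAN);
            else
                throw new IllegalArgumentException("no byte-order mark");
            return buf;
        }
    }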
Interlude: A Short Tour of Virtual Memory

A program running on a modern operating system thinks that it has a large, contiguous allotment of memory: 2 gigabytes in the case of 32-bit editions of Windows and Linux, 8 terabytes or more for x64 editions (limited both by the operating system and the hardware itself). Behind the scenes, the operating system maintains a "page table" that identifies where in physical memory (or on disk) the data for a given virtual address resides.

I've written elsewhere about how the JVM uses virtual memory: it assigns space for the Java heap, per-thread stacks, shared native libraries (including the JVM itself), and memory-mapped files (primarily JAR files). On Linux, the program pmap will show you the virtual address space of a running process, divided into segments of different sizes, with different access permissions.

In thinking about virtual memory, there are two concepts that every programmer should understand: resident set size and commit charge. The second is easiest to explain: it's the total amount of memory that your program might be able to modify (ie, it excludes memory-mapped files and read-only program code). The potential commit charge for an entire system is the sum of RAM and swap space, and no program can exceed it. It doesn't matter how big your virtual address space is: if you have 2G of RAM and 2G of swap, you can never work with more than 4G of in-memory data; there's no place to store it. In practice, no one program can reach that maximum commit charge either, because there are always other programs running, and they have their own claims upon memory. If you try to allocate memory that would exceed the available commit charge, the allocation fails; in Java, that means an OutOfMemoryError.

The first concept, resident set size (RSS), refers to how many of your program's virtual pages currently reside in RAM. If a page isn't in RAM, it must be read from disk — faulted into RAM — before your program can access it. The important thing to know about RSS is that you have very little control over it. The operating system tries to minimize the number of system-wide page faults, typically by managing RSS on the basis of time and access frequency: pages that are infrequently accessed get swapped out, making room for pages that are actively accessed. RSS is one reason that "full" garbage collections can take a long time: the GC has to walk the list of live objects, which involves touching every page in the heap and faulting-in those that haven't been accessed recently.

One final concept: pages in the resident set can be "dirty," meaning that the program has changed their content. A dirty page must be written to swap space before its physical memory can be reused for another page. By comparison, a clean (unmodified) page may simply be discarded; it will be reloaded from disk when needed. And if you can guarantee that a page will never be modified, it doesn't count against the program's commit charge at all — we'll return to this topic when discussing memory-mapped files.

Direct ByteBuffers

There are three ways to create a ByteBuffer: wrapping an existing byte[], calling the static allocate() method to create a buffer backed by a new heap array, or calling allocateDirect() to create a buffer whose storage lives outside the Java heap (a sketch of all three appears below). Knowing this, you might think that a direct buffer is a great way to extend the memory that your program can use. It isn't. The JVM is very good about growing the heap to the limits of physical and virtual memory, so if you've already maxed out your heap, there won't be any place to put a direct buffer. In fact, the only reason that I can see for using direct buffers in a pure Java program is that they won't be moved during garbage collection.
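Here's the promised sketch of the three creation methods; the class name and the capacities are arbitrary, and everything else is the standard java.nio API:

    import java.nio.ByteBuffer;

    public class BufferCreation
    {
        public static void main(String[] argv)
        {
            byte[] array = new byte[1024];

            ByteBuffer wrapped = ByteBuffer.wrap(array);           // view on an existing array
            ByteBuffer onHeap  = ByteBuffer.allocate(1024);        // backed by a new heap array
            ByteBuffer direct  = ByteBuffer.allocateDirect(1024);  // backed by non-heap memory

            System.out.println("wrapped: " + wrapped.isDirect());  // false
            System.out.println("onHeap:  " + onHeap.isDirect());   // false
            System.out.println("direct:  " + direct.isDirect());   // true
        }
    }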
If you've read my article on reference objects, you'll remember that the garbage collector compacts the heap after disposing of dead objects. If you have large blocks of heap memory allocated as buffers, they may get moved as part of that compaction, and no matter how fast your CPU, that takes time; it's not something you want happening on every full collection. Since a direct buffer lives outside the heap, it isn't affected by collections. On the other hand, every data access is a call through JNI. Only benchmarking will tell you whether the trade-off helps or hurts your particular application.

Direct buffers are also useful in a program that mixes Java and native libraries: JNI provides methods to access the physical memory behind a direct buffer, and to allocate new buffers at known locations. Since this technique has a limited audience, it's outside the scope of this article; if you're interested, I link to an example program at the end.

Mapped Files

While I don't see much reason to use direct buffers in a pure Java program, they're the foundation for mapping files into the virtual address space — a feature that is rarely used, but invaluable when you need it. Mapping a file gives you random access with — depending on your access patterns — a significant performance boost. To understand why, we'll need to take a short detour into the way that Java file I/O works.

The first thing to understand is that the Java file classes are simply wrappers around native file operations. When you call a method such as FileInputStream.read(), the JVM immediately turns it into a read request to the operating system. The key point here is that "immediately" does not mean "quickly": you're invoking the operating system kernel to do the read, which means that the computer has to perform a "context switch" from application mode to kernel mode. To make this switch, it will save the CPU registers and page table for your application, and load the registers and page table for the kernel; when the kernel call is done, the reverse happens. This is a matter of a few microseconds, but those add up if you're constantly accessing a file. At worst, the OS scheduler will decide that your program has had the CPU for long enough, and suspend it while another program runs.

With a memory-mapped file, by comparison, there's no need to invoke the OS unless the data isn't already in memory. And since the amount of RAM devoted to programs is larger than that devoted to disk buffers, the data is far more likely to be in memory.

Of course, whether or not your data is in memory depends on many things. Foremost is whether you're accessing the data sequentially: there's no point in replacing a BufferedInputStream with a mapped file if you simply read from front to back, because the stream's buffering already amortizes the cost of kernel calls. The second important question is how big your file is, and how randomly you access it. If you have a multi-gigabyte file and bounce from one spot to another, then you'll be constantly waiting for pages to be read from disk. But most programs don't access their data in a truly random manner. Typically there's one group of blocks that is hit far more frequently than others, and those blocks will remain in RAM. For example, a database server reads the root node of an index on almost every query, while individual data blocks are accessed far less frequently.

Even if you don't gain a speed benefit from memory-mapping your files, you may gain a maintenance benefit by accessing them via a bean-style wrapper class. This will also improve testability, as you can construct buffers around known test data, without any files involved.
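For example, a unit test for the TcpHeaderWrapper shown earlier needs nothing but an array of known bytes; the values here are arbitrary, and the asserts require running with -ea:

    byte[] testData = new byte[20];
    testData[0] = (byte)0x12;   // source port, high byte
    testData[1] = (byte)0x34;   // source port, low byte
    testData[2] = (byte)0xCA;   // destination port, high byte
    testData[3] = (byte)0xFE;   // destination port, low byte

    TcpHeaderWrapper header = new TcpHeaderWrapper(testData);
    assert header.getSourcePort() == (short)0x1234;
    assert header.getDestPort() == (short)0xCAFE;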
Creating the Mapping

Creating a mapped file is a multi-step process: you start with a RandomAccessFile (or a FileInputStream/FileOutputStream), retrieve its FileChannel, and call that channel's map() method:

    File file = new File("/tmp/example.dat");
    FileChannel channel = new RandomAccessFile(file, "r").getChannel();
    ByteBuffer buf = channel.map(MapMode.READ_ONLY, 0L, file.length());
    buf.order(ByteOrder.LITTLE_ENDIAN);

    System.console().printf("data = %x\n", buf.getInt(0));

Although I assign the return value from map() to a plain ByteBuffer variable, it's actually a MappedByteBuffer, which provides a few methods of its own. The first is force(), which flushes changes out to disk; I'll have more to say about it below. The second method, load(), touches every page of the mapping, faulting the entire file into memory (with no guarantee of how long it stays there).

Read-Only versus Read-Write Mappings

You'll note that I created this mapping with MapMode.READ_ONLY. Read-only mappings are the simple case: their pages are always clean, so the OS can discard and reload them at will, and they don't count against your program's commit charge.

Read-write mappings require some more thought. The first thing to consider is just how important your writes are. As I noted above, the memory manager doesn't want to constantly write dirty pages to disk, which means that your changes may remain in memory, unwritten, for a very long time — which will become a problem if the power goes out. To flush dirty pages to disk, call the buffer's force() method:

    buf.putInt(0, 0x87654321);
    buf.force();

Those two lines of code are actually an anti-pattern: you don't want to flush dirty pages after every write, or you'll make your program IO-bound. Instead, take a lesson from database developers, and group your changes into atomic units (or better, if you're planning on a lot of updates, use a real database).

Mapping Files Bigger than 2 GB

Depending on your filesystem, you can create files far larger than 2 GB. But if you look at the signature of FileChannel.map(), you'll see that the mapping size is limited, like everything else in the ByteBuffer API, by the size of an int. To work with a larger file, you need multiple buffers.

One solution is to create those buffers as needed: the same underlying FileChannel supports any number of active mappings, so you could map regions of the file as you access them. A better approach, in my opinion, is to create a "super buffer" that maps the entire file up front and presents an API that accepts long indices, delegating each access to the appropriate sub-buffer:

    public int getInt(long index)
    {
        return buffer(index).getInt();
    }

    private ByteBuffer buffer(long index)
    {
        ByteBuffer buf = _buffers[(int)(index / _segmentSize)];
        buf.position((int)(index % _segmentSize));
        return buf;
    }

That's straightforward, but what's a good value for _segmentSize? The obvious approach, carving the file into non-overlapping segments, doesn't work: a multi-byte value that straddles a segment boundary couldn't be retrieved from either buffer. Instead, you should overlap the buffers, with the size of the overlap being the maximum sub-buffer (or primitive value) that you plan to retrieve as a unit:

    public MappedFileBuffer(File file, int segmentSize, boolean readWrite)
    throws IOException
    {
        // MAX_SEGMENT_SIZE (not shown) can be at most Integer.MAX_VALUE / 2,
        // since each mapping below covers two segments
        if (segmentSize > MAX_SEGMENT_SIZE)
            throw new IllegalArgumentException(
                    "segment size too large (max " + MAX_SEGMENT_SIZE + "): " + segmentSize);

        _segmentSize = segmentSize;
        _fileSize = file.length();

        RandomAccessFile mappedFile = null;
        try
        {
            String mode = readWrite ? "rw" : "r";
            MapMode mapMode = readWrite ? MapMode.READ_WRITE : MapMode.READ_ONLY;

            mappedFile = new RandomAccessFile(file, mode);
            FileChannel channel = mappedFile.getChannel();

            _buffers = new MappedByteBuffer[(int)(_fileSize / segmentSize) + 1];
            int bufIdx = 0;
            for (long offset = 0 ; offset < _fileSize ; offset += segmentSize)
            {
                long remainingFileSize = _fileSize - offset;
                long thisSegmentSize = Math.min(2L * segmentSize, remainingFileSize);
                _buffers[bufIdx++] = channel.map(mapMode, offset, thisSegmentSize);
            }
        }
        finally
        {
            // close quietly
            if (mappedFile != null)
            {
                try { mappedFile.close(); }
                catch (IOException ignored) { /* */ }
            }
        }
    }

There are two things to notice here. The first is my use of RandomAccessFile, which lets the same code open the file for either read-only or read-write access, as requested. The second — and perhaps more important — is that I close the file once the buffers have been created. A mapping, once established, is independent of the channel used to create it: closing the channel doesn't invalidate the buffers, while leaving it open would needlessly consume a file descriptor.

Garbage Collection of Direct/Mapped Buffers

That brings up another topic: how does the non-heap memory for direct buffers and mapped files get released? After all, there's no method to explicitly close or release a buffer. The answer is that they get garbage-collected like any other object, but with one twist: if you don't have enough virtual address space or commit charge to allocate a direct buffer, that will trigger a full collection even if there's plenty of heap memory available.
Normally, this won't be an issue: you probably won't be allocating and releasing direct buffers more often than heap-resident objects. If, however, you see full GCs appearing when you don't think they should, take a look at your program's use of buffers.

Along the same lines, when you're using direct buffers and mapped files you'll get to see some of the more esoteric variants of OutOfMemoryError, such as "Direct buffer memory," which won't go away no matter how large you make the heap.

Enabling Large Direct Buffers

You may be surprised, the first time that you try to allocate direct buffers on a 64-bit machine, to get an OutOfMemoryError even though the machine has memory to spare. Two JVM options control how much direct memory you can use:
-d64

    On platforms where the JVM packages 32- and 64-bit binaries together, this selects the 64-bit JVM. Without a 64-bit JVM, you're limited to a 32-bit virtual address space, no matter how much memory the machine has.
-XX:MaxDirectMemorySize

    Sets the total number of bytes that the program may allocate in direct buffers; requests beyond this limit fail with an OutOfMemoryError. If you don't set it, the JVM chooses a default that is, historically, quite small.
To summarize, if you're running a program that needs to allocate 12 GB of direct buffers, you'd use a command line like this:

    java -d64 -XX:MaxDirectMemorySize=12g com.example.MyApp

If you're working with large buffers (direct buffers or memory-mapped files), you should also consider the -XX:+UseLargePages option:

    java -d64 -XX:MaxDirectMemorySize=12g -XX:+UseLargePages com.example.MyApp

By default, the memory manager maps physical memory to the virtual address space in small chunks (4K is typical). This means that page faults can be handled more efficiently, because there's less data to read or write. However, small pages mean that the memory-management hardware has to keep track of more information to translate virtual addresses to physical ones. At best, this means less efficient use of the TLB, which makes every memory access slower. At worst, you'll run out of entries in the page table.
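If you want to verify that your command-line settings took effect, a disposable test program is enough. This sketch (my own, not one of the article's examples) allocates 1 GB direct buffers until the JVM refuses; run it with and without -XX:MaxDirectMemorySize and compare where it stops:

    import java.nio.ByteBuffer;
    import java.util.ArrayList;
    import java.util.List;

    public class DirectLimitTest
    {
        public static void main(String[] argv)
        {
            // hold strong references so the buffers can't be collected
            List<ByteBuffer> buffers = new ArrayList<ByteBuffer>();
            try
            {
                while (true)
                {
                    buffers.add(ByteBuffer.allocateDirect(1024 * 1024 * 1024));
                    System.out.println("allocated " + buffers.size() + " GB");
                }
            }
            catch (OutOfMemoryError ex)
            {
                System.out.println("limit reached after " + buffers.size()
                                   + " GB: " + ex.getMessage());
            }
        }
    }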
Thread Safety

There are two methods that let you create a new buffer from an existing one: duplicate(), which gives you a second buffer with the same content but independent position and limit, and slice(), which you saw earlier. The JavaDoc for these methods states that "[c]hanges to this buffer's content will be visible in the new buffer, and vice versa." However, I don't think this takes the Java memory model into account. To be safe, consider buffers with a shared backing store to be equivalent to any other object shared between threads: it's possible that concurrent accesses will see different values. Of course, this only matters when you're writing to the buffer; for read-only buffers, simply having a unique buffer per thread is sufficient.

That said, you still have the issue of creating those per-thread buffers: you need to synchronize access to the original while calling duplicate(). One easy way to do that is with a ThreadLocal:

    public class ByteBufferThreadLocal
    extends ThreadLocal<ByteBuffer>
    {
        private ByteBuffer _src;

        public ByteBufferThreadLocal(ByteBuffer src)
        {
            _src = src;
        }

        @Override
        protected synchronized ByteBuffer initialValue()
        {
            return _src.duplicate();
        }
    }

In this example, the original buffer is never accessed by application code. Instead, it serves as a master for producing copies in a synchronized method, and those copies are used by the application. Once a thread finishes, the garbage collector will dispose of the buffer(s) that it used, leaving the master untouched.
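Using it takes one line per thread. In this sketch, master stands in for whatever buffer your program actually shares (a mapped file, say), and each worker reads through its own duplicate:

    ByteBuffer master = ByteBuffer.allocate(1024);  // stand-in for a shared (eg, mapped) buffer
    master.putInt(0, 12345);

    final ByteBufferThreadLocal localBuf = new ByteBufferThreadLocal(master);

    Runnable worker = new Runnable()
    {
        @Override
        public void run()
        {
            ByteBuffer buf = localBuf.get();        // this thread's private duplicate
            System.out.println(Thread.currentThread().getName()
                               + " read " + buf.getInt(0));
        }
    };

    new Thread(worker).start();
    new Thread(worker).start();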
For More Information

There are several example programs that go with this article. I've also written some utility classes for working with buffers; they are all licensed for open-source consumption under the Apache 2.0 license, and are available on SourceForge (at present this library is not available from Maven Central).
I gave a presentation on ByteBuffers to the Philadelphia Java Users Group in November 2010, which focused on implementing an off-heap cache similar to EHCache BigMemory. You'll probably find it a bit sparse; I tend to use slides only as a starting point for an extended monologue. However, it does contain a nearly complete off-heap cache implementation in a couple dozen lines of code.

Wikipedia has some nice articles on virtual memory and paging. However, I recommend ignoring the article on virtual address space; it is simultaneously too detailed and not detailed enough. There's also the article on the translation lookaside buffer that I linked earlier.

To enable large pages, you might need to change your OS configuration. The specific instructions are quite likely to change over time, so I won't repeat them here. I've found a couple of blog postings that are useful and authoritative: one from Sun, and one from Andrig Miller of JBoss. If you try using large pages and get an error message, take a look at these blogs and/or Google the message text.

Copyright © Keith D Gregory, all rights reserved