What is an efficient way for a Java multithreaded application where many threads have to read the exact same file (> 1GB in size) and expose it as an input stream? I've noticed that if there are many threads (> 32), the system starts to contend over I/O and has a lot of I/O waits.
I've considered loading the file into a byte array that's shared by all the threads - each thread would create a ByteArrayInputStream, but allocating a 1GB byte array just won't work well.
I've also considered using a single FileChannel and each thread creating an InputStream on top of it using Channels.newInputStream(), however it seems that it's the FileChannel that's maintaining the state for the InputStream.
解决方案
It seems to me that you're going to have to load the file into memory if you want to avoid IO contention. The operating system will do some buffering, but if you're finding that's not enough, you're going to have to do it yourself.
Do you really need 32 threads though? Presumably you don't have nearly that many cores - so use fewer threads and you'll get less context switching etc.
Do your threads all process the file from start to finish? If so, could you effectively split the file into chunks? Read the first (say) 10MB of data into memory, let all the threads process it, then move on to the next 10MB etc.
If that doesn't work for you, how much memory do you have compared with the size of the file? If you have plenty of memory but you don't want to allocate one huge array, you could read the whole file into memory, but into lots of separate smaller byte arrays. You'd then have to write an input stream which spans all of those byte arrays, but that should be doable.