How to write super fast file streaming code in C#?

From: http://stackoverflow.com/questions/955911/how-to-write-super-fast-file-streaming-code-in-c?answertab=votes#tab-top

I have to split a huge file into many smaller files. Each destination file is defined by an offset and a length in bytes. I'm using the following code:

private void copy(string srcFile, string dstFile, int offset, int length)
{
    BinaryReader reader = new BinaryReader(File.OpenRead(srcFile));
    reader.BaseStream.Seek(offset, SeekOrigin.Begin);
    byte[] buffer = reader.ReadBytes(length);

    BinaryWriter writer = new BinaryWriter(File.OpenWrite(dstFile));
    writer.Write(buffer);
}

Considering that I have to call this function about 100,000 times, it is remarkably slow.

  1. Is there a way to connect the Writer directly to the Reader? (i.e. without actually loading the contents into a buffer in memory)

 
File.OpenRead and File.OpenWrite 100,000 times will be slow all right... –  siz  Jun 5 '09 at 14:13
 
Are you splitting the file perfectly, i.e. could you rebuild the large file by just joining all the small files together? If so there are savings to be had there. If not, do the ranges of the small files overlap? Are they sorted in order of offset? –  jamie  Jan 29 '11 at 5:39

9 Answers

Accepted answer (24 upvotes)

I don't believe there's anything within .NET to allow copying a section of a file without buffering it in memory. However, it strikes me that this is inefficient anyway, as it needs to open the input file and seek many times. If you're just splitting up the file, why not open the input file once, and then just write something like:

public static void CopySection(Stream input, string targetFile, int length)
{
    byte[] buffer = new byte[8192];

    using (Stream output = File.OpenWrite(targetFile))
    {
        int bytesRead = 1;
        // This will finish silently if we couldn't read "length" bytes.
        // An alternative would be to throw an exception
        while (length > 0 && bytesRead > 0)
        {
            bytesRead = input.Read(buffer, 0, Math.Min(length, buffer.Length));
            output.Write(buffer, 0, bytesRead);
            length -= bytesRead;
        }
    }
}

This has a minor inefficiency in creating a buffer on each invocation - you might want to create the buffer once and pass that into the method as well:

public static void CopySection(Stream input, string targetFile,
                               int length, byte[] buffer)
{
    using (Stream output = File.OpenWrite(targetFile))
    {
        int bytesRead = 1;
        // This will finish silently if we couldn't read "length" bytes.
        // An alternative would be to throw an exception
        while (length > 0 && bytesRead > 0)
        {
            bytesRead = input.Read(buffer, 0, Math.Min(length, buffer.Length));
            output.Write(buffer, 0, bytesRead);
            length -= bytesRead;
        }
    }
}

Note that this also closes the output stream (due to the using statement) which your original code didn't.

The important point is that this will use the operating system file buffering more efficiently, because you reuse the same input stream, instead of reopening the file at the beginning and then seeking.

I think it'll be significantly faster, but obviously you'll need to try it to see...
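For illustration, a caller reusing a single input stream might look like the sketch below. The file name `huge.dat`, the chunk table, and the `part{i}.dat` naming scheme are all hypothetical; real code would take the offsets and lengths from wherever the split definition lives. `CopySection` is reproduced from the answer above so the sketch compiles on its own.

```csharp
using System;
using System.IO;

class Splitter
{
    // Same shape as the answer's CopySection, reproduced so this compiles alone.
    public static void CopySection(Stream input, string targetFile,
                                   int length, byte[] buffer)
    {
        using (Stream output = File.OpenWrite(targetFile))
        {
            int bytesRead = 1;
            while (length > 0 && bytesRead > 0)
            {
                bytesRead = input.Read(buffer, 0, Math.Min(length, buffer.Length));
                output.Write(buffer, 0, bytesRead);
                length -= bytesRead;
            }
        }
    }

    static void Main()
    {
        // Hypothetical source file and chunk table for demonstration only.
        File.WriteAllBytes("huge.dat", new byte[3500]);
        int[] lengths = { 1000, 2000, 500 };

        byte[] buffer = new byte[8192];

        // Open the source once; with contiguous chunks the stream position
        // advances naturally from one section to the next, so no Seek is needed.
        using (Stream input = File.OpenRead("huge.dat"))
        {
            for (int i = 0; i < lengths.Length; i++)
                CopySection(input, "part" + i + ".dat", lengths[i], buffer);
        }
    }
}
```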

This assumes contiguous chunks, of course. If you need to skip bits of the file, you can do that from outside the method. Also, if you're writing very small files, you may want to optimise for that situation too - the easiest way to do that would probably be to introduce a BufferedStream wrapping the input stream.
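A minimal sketch of that last suggestion, assuming a 64 KB buffer (the file name and buffer size are placeholders to tune for your workload):

```csharp
using System.IO;

class BufferedExample
{
    static void Main()
    {
        // Hypothetical source file for demonstration only.
        File.WriteAllBytes("huge.dat", new byte[100000]);

        using (Stream raw = File.OpenRead("huge.dat"))
        using (Stream input = new BufferedStream(raw, 64 * 1024))
        {
            // Pass "input" to CopySection exactly as before: many small reads
            // are now served from one 64 KB in-memory buffer instead of each
            // read going all the way to the OS.
            byte[] tiny = new byte[16];
            int read = input.Read(tiny, 0, tiny.Length);
        }
    }
}
```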

 
I know this is a two year old post, just wondered... is this still the fastest way? (i.e. Nothing new in .Net to be aware of?). Also, would it be faster to perform the Math.Min prior to entering the loop? Or better yet, to remove the length parameter as it can be calculated by means of the buffer? Sorry to be picky and necro this! Thanks in advance. –  Smudge202  Apr 19 at 9:13
 
@Smudge202: Given that this is performing IO, the call to Math.Min is certainly not going to be relevant in terms of performance. The point of having both the length parameter and the buffer length is to allow you to reuse a possibly-oversized buffer. –  Jon Skeet  Apr 19 at 9:20
 
Gotcha, and thanks for getting back to me. I'd hate to start a new question when there is likely a good enough answer right here, but would you say, that if you wanted to read the first x bytes of a large number of files (for the purpose of grabbing the XMP metadata from a large number of files), the above approach (with some tweaking) would still be recommended? –  Smudge202  Apr 19 at 9:33
 
@Smudge202: Well the code above is for copying. If you only want to read the first x bytes, I'd still loop round, but just read into a right-sized buffer, incrementing the index at which the read will write into the buffer appropriately on each iteration. –  Jon Skeet  Apr 19 at 9:34
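A sketch of the loop Jon describes, reading exactly the first x bytes into a right-sized buffer. The helper name, the sample file, and the header size of 4 are all made up for illustration:

```csharp
using System;
using System.IO;

class HeaderReader
{
    // Reads up to "count" bytes, advancing the index at which each Read
    // writes into the buffer, until the buffer is full or the stream ends.
    public static int ReadExactly(Stream stream, byte[] buffer, int count)
    {
        int index = 0;
        while (index < count)
        {
            int read = stream.Read(buffer, index, count - index);
            if (read == 0) break; // end of stream before "count" bytes
            index += read;
        }
        return index; // total bytes actually read
    }

    static void Main()
    {
        // Hypothetical test file for demonstration only.
        File.WriteAllBytes("sample.bin", new byte[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 });

        byte[] header = new byte[4]; // right-sized buffer for the first x bytes
        using (Stream s = File.OpenRead("sample.bin"))
        {
            int got = ReadExactly(s, header, header.Length);
            Console.WriteLine(got); // 4
        }
    }
}
```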
 
Yup, I'm less interested in the writing part, I just wanted to confirm that the fastest way to read one file is also the fastest way to read many files. I imagined being able to P/Invoke for file pointers/offsets and from there being able to scan across multiple files with the same/less streams/buffers, which in my imaginary world of make believe, would possibly be even faster for what I want to achieve (though not applicable to the OP). If I'm not barking mad, probably best I start a new question. If I am, could you let me know so I don't waste even more peoples' time? :-) –  Smudge202  Apr 19 at 9:44
 
@Smudge202: Do you actually have a performance problem right now? Have you written the simplest code that works and found that it's too slow? Bear in mind that a lot can depend on context - reading in parallel may help if you're using solid state, but not on a normal hard disk, for example. –  Jon Skeet  Apr 19 at 10:25
 
SO is advising I take this to a chat, I think I'll actually start a new question. Thank you thus far! –  Smudge202 Apr 19 at 10:34
