How to split large files efficiently

From: http://stackoverflow.com/questions/3967541/how-to-split-large-files-efficiently

I'd like to know how I can split a large file without using too many system resources. I'm currently using this code:

public static void SplitFile(string inputFile, int chunkSize, string path)
{
    byte[] buffer = new byte[chunkSize];

    using (Stream input = File.OpenRead(inputFile))
    {
        int index = 0;
        while (input.Position < input.Length)
        {
            using (Stream output = File.Create(path + "\\" + index))
            {
                int chunkBytesRead = 0;
                while (chunkBytesRead < chunkSize)
                {
                    int bytesRead = input.Read(buffer, 
                                               chunkBytesRead, 
                                               chunkSize - chunkBytesRead);

                    if (bytesRead == 0)
                    {
                        break;
                    }
                    chunkBytesRead += bytesRead;
                }
                output.Write(buffer, 0, chunkBytesRead);
            }
            index++;
        }
    }
}

The operation takes 52.3707742 seconds to split a 1.6 GB file into 14 MB files. I'm not concerned about how long the operation takes; I'm more concerned about the system resources used, as this app will be deployed to a shared-hosting environment. Currently this operation maxes out my system's HDD IO usage at 100% and slows my system down considerably. CPU usage is low; RAM ramps up a bit, but seems fine.

Is there a way I can restrict this operation from using too many resources?

Thanks


 
Could you not run it on a separate lower-priority thread? –  w69rdy  Oct 19 '10 at 11:08
 
@w69rdy - note "CPU usage is low" - CPU is not the bottleneck here. –  Marc Gravell  Oct 19 '10 at 11:14
 
@Marc Ok fair point –  w69rdy  Oct 19 '10 at 12:14

3 Answers

Accepted answer

It seems odd to assemble each output file in memory; I suspect you should be running an inner buffer (maybe 20k or something) and calling Write more frequently.

Ultimately, if you need IO, you need IO. If you want to be courteous to a shared hosting environment you could add deliberate pauses - maybe short pauses within the inner loop, and a longer pause (maybe 1s) in the outer loop. This won't affect your overall timing much, but may help other processes get some IO.

Example of a buffer for the inner-loop:

public static void SplitFile(string inputFile, int chunkSize, string path)
{
    const int BUFFER_SIZE = 20 * 1024;
    byte[] buffer = new byte[BUFFER_SIZE];

    using (Stream input = File.OpenRead(inputFile))
    {
        int index = 0;
        while (input.Position < input.Length)
        {
            using (Stream output = File.Create(path + "\\" + index))
            {
                int remaining = chunkSize, bytesRead;
                while (remaining > 0 && (bytesRead = input.Read(buffer, 0,
                        Math.Min(remaining, BUFFER_SIZE))) > 0)
                {
                    output.Write(buffer, 0, bytesRead);
                    remaining -= bytesRead;
                }
            }
            index++;
            Thread.Sleep(500); // deliberate pause to yield IO; requires using System.Threading - tune experimentally
        }
    }
}
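As a quick sanity check, the method above can be exercised end to end. The sketch below (file names, sizes, and the temp directory are illustrative, not from the original post) writes a small test file, splits it with the answer's method, and confirms the chunks concatenate back to the original. The courtesy sleep is dropped so the demo runs quickly.

```csharp
using System;
using System.IO;
using System.Linq;

class SplitDemo
{
    // The SplitFile method from the answer above, with Path.Combine
    // in place of string concatenation and the sleep removed.
    public static void SplitFile(string inputFile, int chunkSize, string path)
    {
        const int BUFFER_SIZE = 20 * 1024;
        byte[] buffer = new byte[BUFFER_SIZE];

        using (Stream input = File.OpenRead(inputFile))
        {
            int index = 0;
            while (input.Position < input.Length)
            {
                using (Stream output = File.Create(Path.Combine(path, index.ToString())))
                {
                    int remaining = chunkSize, bytesRead;
                    while (remaining > 0 && (bytesRead = input.Read(buffer, 0,
                            Math.Min(remaining, BUFFER_SIZE))) > 0)
                    {
                        output.Write(buffer, 0, bytesRead);
                        remaining -= bytesRead;
                    }
                }
                index++;
            }
        }
    }

    static void Main()
    {
        // Build a 100 KB file of pseudo-random bytes and split it into
        // 30 KB chunks; we expect 4 chunks (30 + 30 + 30 + 10 KB).
        string dir = Path.Combine(Path.GetTempPath(), "split-demo");
        Directory.CreateDirectory(dir);
        string source = Path.Combine(dir, "source.bin");

        byte[] data = new byte[100 * 1024];
        new Random(42).NextBytes(data);
        File.WriteAllBytes(source, data);

        SplitFile(source, 30 * 1024, dir);

        // Reassemble the 4 chunks and compare against the original bytes.
        byte[] reassembled = Enumerable.Range(0, 4)
            .SelectMany(i => File.ReadAllBytes(Path.Combine(dir, i.ToString())))
            .ToArray();
        Console.WriteLine(data.SequenceEqual(reassembled)); // True
    }
}
```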
 
Thanks. I'll give this a try... –  Bruce Adams  Oct 19 '10 at 11:18
 
Your code works great! Thanks. I'll play around with the thread sleep time to see what works best. –  Bruce Adams  Oct 19 '10 at 11:33