快速读写excel

source==



Faster MS Excel Reading using Office Interop Assemblies

Summary:

OleDB provides fast read access to Excel data, but it didn't meet my specific needs, which included accessing only certain columns and data validation. While this article will not get into these specifics, it does explain the concepts used to read Excel data in a fairly quick manner. My first pass accessed each cell one by one, which is slow and used a lot of CPU. So in looking for a better way to accomplish my goals with reasonable CPU and speed, I experimented with the Office 2003 Interop Assemblies. I found, in my opinion, a decent way to accomplish the needed speed. CPU usage can still be high, but at an acceptable trade off for my needs. My attempts at finding an article to address this situation came up short, therefore, I am writing one.

The Implementation:

Setup

We first need to setup a project with references to the Interop Assemblies. Lars-Inge Tnnessen has written a good article about this here: An introduction on how to control Excel 2003 with J#.NET. It's for J# but should translate to C# without too much effort.

Once referenced, you can add the following using statement:

    using Microsoft.Office.Interop.Excel;

I've created a console application and just kept all the code within the main method. I've done this to make it a bit easier to follow. Next, we need to setup the objects that we'll be working with.

    ApplicationClass app = new ApplicationClass(); // the Excel application.
// the reference to the workbook,
// which is the xls document to read from.
Workbook book = null;
// the reference to the worksheet,
// we'll assume the first sheet in the book.
Worksheet sheet = null;
Range range = null;
// the range object is used to hold the data
// we'll be reading from and to find the range of data.

The following options will help speed up the Excel application. They can also be set to true which will help us see what's going on with the document while debugging.

    app.Visible = false;
app.ScreenUpdating = false;
app.DisplayAlerts = false;

Now that the application setup is out of the way, we can open an XLS document and get the first worksheet in the workbook. Excel seems to prefer a full path to the document. As such, I get the current executing directory and move up two directories to the XLS document.

    string execPath = 
Path.GetDirectoryName(Assembly.GetExecutingAssembly().CodeBase);

book = app.Workbooks.Open(execPath + @"/../../Book1.xls",
Missing.Value, Missing.Value, Missing.Value,
Missing.Value, Missing.Value, Missing.Value, Missing.Value,
Missing.Value, Missing.Value, Missing.Value, Missing.Value,
Missing.Value, Missing.Value, Missing.Value);
sheet = (Worksheet)book.Worksheets[1];

Finding the range of data

Next, we need to get an initial range to work with. We'll start with A1, you can start with the row your data starts on, to exclude the header information.

    range = sheet.get_Range("A1", Missing.Value);

Now that we have a range to work with, we can use the get_End method of the Range object and the XlDirection enumeration to specify which direction to find the end. We'll go to the right first and down second. The get_End stops at the first empty cell. And works on the first row or column in the range. So based on our initial range selection of A1, it will look for the first empty cell in row 1 moving to the right from column A.

    range = range.get_End(XlDirection.xlToRight);

In this example, it will find cell F1 is empty, and return E1 as the end range. We'll use this range, meaning cell E1, to find the end of the data moving down.

    range = range.get_End(XlDirection.xlDown);

This will get us to cell E20. Using this method, we get the bottom right cell with data. Now we can obtain the full range of data using the starting cell, A1, and the ending cell, E20. In order to get the address of the cell from the Range, we use the get_Address method of the Range object. The XlReferenceStyle specifies the format of the address returned. We want xlA1 because the get_Range method expects that format. The following returns a string containing E20:

    string downAddress = range.get_Address(
false, false, XlReferenceStyle.xlA1,
Type.Missing, Type.Missing);

We'll use the get_Range method to get a range from A1 to E20.

    range = sheet.get_Range("A1", downAddress);

We now have a reference to the data.

Reading the data

Range objects will return their data in a two dimensional array of objects with the Value2 property. Dimension one represents the rows, while dimension two represents the columns. This is much faster than reading the data cell by cell.

    object[,] values = (object[,])range.Value2;

Console.WriteLine("Row Count: " + values.GetLength(0).ToString());
Console.WriteLine("Col Count: " + values.GetLength(1).ToString());

With the values object array, all we need to do is loop through to get the data. We'll start by writing out the column numbers.

    Console.Write("");
for (int j = 1; j <= values.GetLength(1); j++) {
Console.Write("{0}", j);
}

What we really want is the data, so we'll need to loop through the rows, then the columns to access each value in values.

    Console.WriteLine();
for (int i = 1; i <= values.GetLength(0); i++) {
Console.Write("{0}", i);
for (int j = 1; j <= values.GetLength(1); j++) {
Console.Write("{0}", values[i, j]);
}
Console.WriteLine();
}

Clean up

In order for the GC to collect the objects, which can have a large memory footprint, we want to set the references to null, close the workbook, and quit the Excel application.

    range = null;
sheet = null;
if (book != null)
book.Close(false, Missing.Value, Missing.Value);
book = null;
if (app != null)
app.Quit();
app = null;

Timed access

Included in the sample code is a timed access class which gets the range and then reads it two different ways (see screen shot at top). First it reads it using the method described above. Secondly, it loops through the rows and columns, reading each value from a Range object. Running this will illustrate the difference in time between the methods.

Conclusion:

The interop assemblies provide a lot of options for working with Excel data, both reading and writing. Some experimenting with the object model can yield decent performance and expose some very useful options which are not obvious from reading the documentation. Writing to Excel can be done fairly quickly using the same technique. See An introduction on how to control Excel 2003 with J#.NET for more detail in writing to Excel using this method.

======================================

Main.cs

/*
 * Created by Dusty Candland.
 * Date: 3/2/2005 8:27 PM
 *
 */
using System;
using Microsoft.Office.Interop.Excel;
using System.Reflection;
using System.IO;

namespace FasterExcelAccess
{
 class MainClass
 {
  public static void Main(string[] args)
  {
   #region Uncomment the code in this region to run the TimedAccess class
   // TimedAccess timedAccess = new TimedAccess();
   // timedAccess.Read();
   // return;
   #endregion
   
   ApplicationClass app = new ApplicationClass();
   Workbook book = null;
   Worksheet sheet = null;
   Range range = null;
   
   try {
    app.Visible = false;
    app.ScreenUpdating = false;
    app.DisplayAlerts = false;
    
    string execPath = Path.GetDirectoryName(Assembly.GetExecutingAssembly().CodeBase);
    
    book = app.Workbooks.Open(execPath + @"/../../Book1.xls", Missing.Value, Missing.Value, Missing.Value
                                      , Missing.Value, Missing.Value, Missing.Value, Missing.Value
                                     , Missing.Value, Missing.Value, Missing.Value, Missing.Value
                                    , Missing.Value, Missing.Value, Missing.Value);
    sheet = (Worksheet)book.Worksheets[1];
    
    // get a range to work with
    range = sheet.get_Range("A1", Missing.Value);
    // get the end of values to the right (will stop at the first empty cell)
    range = range.get_End(XlDirection.xlToRight);
    // get the end of values toward the bottom, looking in the last column (will stop at first empty cell)
    range = range.get_End(XlDirection.xlDown);
    
    // get the address of the bottom, right cell
    string downAddress = range.get_Address(
        false, false, XlReferenceStyle.xlA1,
        Type.Missing, Type.Missing);
   
    // Get the range, then values from a1
    range = sheet.get_Range("A1", downAddress);
    object[,] values = (object[,])range.Value2;
    
    // Value2 is a two dimenial array dime one = row, dime two = column.
    Console.WriteLine("Col Count: " + values.GetLength(1).ToString());
    Console.WriteLine("Row Count: " + values.GetLength(0).ToString());
    
    // View the values
    Console.Write("/t");
    for (int j = 1; j <= values.GetLength(1); j++) {
     Console.Write("{0}/t", j);
    }
    Console.WriteLine();
    for (int i = 1; i <= values.GetLength(0); i++) {
     Console.Write("{0}/t", i);
     for (int j = 1; j <= values.GetLength(1); j++) {
      Console.Write("{0}/t", values[i, j]);
     }
     Console.WriteLine();
    }
    
   }
   catch (Exception e) {
    Console.WriteLine(e);
   }
   finally {
    range = null;
    sheet = null;
    if (book != null)
     book.Close(false, Missing.Value, Missing.Value);
    book = null;
    if (app != null)
     app.Quit();
    app = null;
   }
   
  }
 }
}


 =====================================================

TimedAccess.cs

/*
 * Created by Dusty Candland
 * Date: 3/27/2005 3:36 PM
 *
 */

using System;
using Microsoft.Office.Interop.Excel;
using System.Reflection;
using System.IO;

namespace FasterExcelAccess
{
 /// <summary>
 /// Description of TimedAccess.
 /// </summary>
 public class TimedAccess
 {
  public TimedAccess() {}
  
  public void Read() {
   
   ApplicationClass app = new ApplicationClass();
   Workbook book = null;
   Worksheet sheet = null;
   Range range = null;
   
   try {
    DateTime startTime = DateTime.Now;
    
    app.Visible = false;
    app.ScreenUpdating = false;
    app.DisplayAlerts = false;
    
    string execPath = Path.GetDirectoryName(Assembly.GetExecutingAssembly().CodeBase);
    
    book = app.Workbooks.Open(execPath + @"/../../Book1.xls", Missing.Value, Missing.Value, Missing.Value
                                      , Missing.Value, Missing.Value, Missing.Value, Missing.Value
                                     , Missing.Value, Missing.Value, Missing.Value, Missing.Value
                                    , Missing.Value, Missing.Value, Missing.Value);
    sheet = (Worksheet)book.Worksheets[1];

    Console.WriteLine("Open document Seconds: {0}", DateTime.Now.Subtract(startTime).TotalSeconds);
    startTime = DateTime.Now;
    
    // get a range to work with
    range = sheet.get_Range("A1", Missing.Value);
    // get the end of values to the right (will stop at the first empty cell)
    range = range.get_End(XlDirection.xlToRight);
    // get the end of values toward the bottom, looking in the last column (will stop at first empty cell)
    range = range.get_End(XlDirection.xlDown);
    
    // get the address of the bottom, right cell
    string downAddress = range.get_Address(
        false, false, XlReferenceStyle.xlA1,
        Type.Missing, Type.Missing);
   
    // Get the range, then values from a1
    range = sheet.get_Range("A1", downAddress);
    
    Console.WriteLine("Get Data Range Seconds: {0}", DateTime.Now.Subtract(startTime).TotalSeconds);
    
    ReadRange(range);
    
    ReadCells(range);
    
   }
   catch (Exception e) {
    Console.WriteLine(e);
   }
   finally {
    range = null;
    sheet = null;
    if (book != null)
     book.Close(false, Missing.Value, Missing.Value);
    book = null;
    if (app != null)
     app.Quit();
    app = null;
   }
  }

  private void ReadRange(Range range) {
   DateTime startTime = DateTime.Now;
   
   object[,] values = (object[,])range.Value2;
   
   // Value2 is a two dimenial array dime one = row, dime two = column.
   Console.WriteLine("Col Count: " + values.GetLength(1).ToString());
   Console.WriteLine("Row Count: " + values.GetLength(0).ToString());
   
   // View the values
   Console.Write("/t");
   for (int j = 1; j <= values.GetLength(1); j++) {
    Console.Write("{0}/t", j);
   }
   Console.WriteLine();
   for (int i = 1; i <= values.GetLength(0); i++) {
    Console.Write("{0}/t", i);
    for (int j = 1; j <= values.GetLength(1); j++) {
     Console.Write("{0}/t", values[i, j]);
    }
    Console.WriteLine();
   }
   
   Console.WriteLine("Range Read Seconds: {0}", DateTime.Now.Subtract(startTime).TotalSeconds);
  }
  
  private void ReadCells(Range range) {
   DateTime startTime = DateTime.Now;
   
   // View the values
   Console.Write("/t");
   for (int j = 1; j <= range.Columns.Count; j++) {
    Console.Write("{0}/t", j);
   }
   Console.WriteLine();
   for (int i = 1; i <= range.Rows.Count; i++) {
    Console.Write("{0}/t", i);
    for (int j = 1; j <= range.Columns.Count; j++) {
     Console.Write("{0}/t", ((Range)range.Cells[i, j]).Value2);
    }
    Console.WriteLine();
   }
   
   Console.WriteLine("Cells Read Seconds: {0}", DateTime.Now.Subtract(startTime).TotalSeconds);
  }
 }
}

=======================================================

Fast Exporting from DataSet to Excel

 

Introduction

Exporting data from a .NET application to Excel is a very common requirement. A simple search on the Web results in several examples that show us the method to copy data and put it into the Excel cells. However, there is a payload with this method: each interaction to put a value into an Excel cell requires an InterOp invocation. If the amount of data to transfer is huge, we have a problem with the performance of the method. Is there a better way to accomplish this?

Traditional "COPY CELL-BY-CELL" Method

Searching the Web for a method to transfer data to Excel, the most commonly used method consists of copying the values cell by cell into Excel. The following C# code shows how to transfer data from a DataTable to an Excel sheet, copying each value cell by cell:

// Copy the values from a DataTable to an Excel Sheet (cell-by-cell)
for (int col = 0; col < dataTable.Columns.Count; col++)
{
    for (int row = 0; row < dataTable.Rows.Count; row++)
    {
        excelSheet.Cells[row + 1, col + 1] = 
                dataTable.Rows[row].ItemArray[col];
    }
}

Each InterOp invocation has an associated payload in performance, so a large amount of data can degenerate our application.

A "Fast Bulk-Copy" Method

Our method consists of using the Value2 property for the Range class provided by the Microsoft Excel Object Library. We can select a range of cells, and assign a value for all of them, with just one InterOp invocation. To correctly assign a value to a range of cells, we can use a bi-dimensional object array. The following C# code shows how to transfer data from a bi-dimensional object array to a range of cells:

// Copy a bi-dimensional object array to an Excel cell range
excelSheet.get_Range("A1:H25", Type.Missing).Value2 = 
    bidimensionalObjectArray;

Measuring the Performance

The source code included with this article shows a small Windows application which uses the two described methods to export the same data to Excel. It shows the time that it takes for each method to finish. This DEMO uses the Northwind database to create an SQL Server local connection. It generates a DataSet with the content of the Customers table. To make the amount of data more significant, we duplicate the DataTable to obtain 24 copies from it. Then we apply the two methods to generate two Excel books, one for each method.

The source code includes a C# and a VB.NET version for the DEMO application. My own testing shows me that this method is about 35 times faster. Test it and arrive at your own conclusions.

History

  • November 28, 2007: First publication

source

FastExportingMethod.cs

using Microsoft.Office.Interop.Excel;
using System;
using System.Collections.Generic;
using System.Data;
using System.Text;

 

namespace FastExcelExportingDemoCs
{
 static class FastExportingMethod
 {

  public static void ExportToExcel(DataSet dataSet, string outputPath)
  {
   // Create the Excel Application object
   ApplicationClass excelApp = new ApplicationClass();

   // Create a new Excel Workbook
   Workbook excelWorkbook = excelApp.Workbooks.Add(Type.Missing);

   int sheetIndex = 0;

   // Copy each DataTable
   foreach (System.Data.DataTable dt in dataSet.Tables)
   {

    // Copy the DataTable to an object array
    object[,] rawData = new object[dt.Rows.Count + 1, dt.Columns.Count];

    // Copy the column names to the first row of the object array
    for (int col = 0; col < dt.Columns.Count; col++)
    {
     rawData[0, col] = dt.Columns[col].ColumnName;
    }

    // Copy the values to the object array
    for (int col = 0; col < dt.Columns.Count; col++)
    {
     for (int row = 0; row < dt.Rows.Count; row++)
     {
      rawData[row + 1, col] = dt.Rows[row].ItemArray[col];
     }
    }

    // Calculate the final column letter
    string finalColLetter = string.Empty;
    string colCharset = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    int colCharsetLen = colCharset.Length;

    if (dt.Columns.Count > colCharsetLen) {
     finalColLetter = colCharset.Substring(
      (dt.Columns.Count - 1) / colCharsetLen - 1, 1);
    }

    finalColLetter += colCharset.Substring(
      (dt.Columns.Count - 1) % colCharsetLen, 1);

    // Create a new Sheet
    Worksheet excelSheet = (Worksheet) excelWorkbook.Sheets.Add(
     excelWorkbook.Sheets.get_Item(++sheetIndex),
     Type.Missing, 1, XlSheetType.xlWorksheet);

    excelSheet.Name = dt.TableName;

    // Fast data export to Excel
    string excelRange = string.Format("A1:{0}{1}",
     finalColLetter, dt.Rows.Count + 1);

    excelSheet.get_Range(excelRange, Type.Missing).Value2 = rawData;

    // Mark the first row as BOLD
    ((Range) excelSheet.Rows[1, Type.Missing]).Font.Bold = true;
   }

   // Save and Close the Workbook
   excelWorkbook.SaveAs(outputPath, XlFileFormat.xlWorkbookNormal, Type.Missing,
    Type.Missing, Type.Missing, Type.Missing, XlSaveAsAccessMode.xlExclusive,
    Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing);
   excelWorkbook.Close(true, Type.Missing, Type.Missing);
   excelWorkbook = null;

   // Release the Application object
   excelApp.Quit();
   excelApp = null;

   // Collect the unreferenced objects
   GC.Collect();
   GC.WaitForPendingFinalizers();

  }

 }
}

=====================================

MainForm.cs
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Data.SqlClient;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using System.Diagnostics;

namespace FastExcelExportingDemoCs
{

 internal partial class MainForm : Form
 {

  [STAThread]
  static void Main()
  {
   Application.EnableVisualStyles();
   Application.SetCompatibleTextRenderingDefault(false);
   Application.Run(new MainForm());
  }

  public MainForm()
  {
   InitializeComponent();
  }

  private DataSet getDemoDataSet()
  {
   SqlConnection cnn = new SqlConnection("server=.;database=Northwind;integrated security=sspi;");
   SqlDataAdapter da = new SqlDataAdapter("select * from Customers", cnn);

   DataSet ds = new DataSet();
   da.Fill(ds);
   ds.Tables[0].TableName = "Customers0";

   for (int i = 1; i < 25; i++)
   {
    DataTable dt = ds.Tables[0].Copy();
    dt.TableName = "Customers" + i.ToString();
    ds.Tables.Add(dt);
   }

   return ds;
  }

  private void startDemoButton_Click(object sender, EventArgs e)
  {
   // Prepare the output filenames
   string timeMark = DateTime.Now.ToString("yyyyMMdd HHmmss");
   string cellByCellFilePath = "C://ExcelExportCellByCell_" + timeMark + ".xls";
   string fastExportFilePath = "C://ExcelExportFastExport_" + timeMark + ".xls";

   this.demoResultListBox.Items.Clear();

   // Object to mark the times for each process
   Stopwatch stopwatch = new Stopwatch();
   this.UseWaitCursor = true;

   try
   {
    // Get the demo DataSet
    stopwatch.Start();
    DataSet demoDataSet = this.getDemoDataSet();
    stopwatch.Stop();

    this.demoResultListBox.Items.Add("* Generate DEMO DataSet: " + stopwatch.Elapsed.ToString());
    stopwatch.Reset();

    // Use the "Copy-cell-by-cell" method
    stopwatch.Start();
    ExportingCellByCellMethod.ExportToExcel(demoDataSet, cellByCellFilePath);
    stopwatch.Stop();

    this.demoResultListBox.Items.Add("* COPY CELL-BY-CELL method: " + stopwatch.Elapsed.ToString());
    stopwatch.Reset();

    // Use the "fast export" method
    stopwatch.Start();
    FastExportingMethod.ExportToExcel(demoDataSet, fastExportFilePath);
    stopwatch.Stop();

    this.demoResultListBox.Items.Add("* FAST EXPORT method: " + stopwatch.Elapsed.ToString());
    stopwatch.Reset();

   }
   finally
   {
    this.UseWaitCursor = false;
   }

  }
 }
}

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值