An open-source C# project for merging, splitting, and converting DOCX, PPTX, and HTML files

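Open XML is an XML-based format for Office documents, covering DOCX, XLSX, and PPTX files as well as charts; the specification was developed and published by Microsoft. Open XML is powerful, but the functionality it exposes is fairly low-level, so real-world development with it still involves a good deal of difficulty.

So today I'd like to recommend a project built on the programming interface for Open XML documents (DOCX, XLSX, and PPTX). It adds many improvements on top of that interface and implements practical features such as merging, splitting, and converting between DOCX, PPTX, and HTML files.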

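Project overview

The project is built on the Open XML document programming interface and extends the functionality of the Open XML SDK.

It supports the following features: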

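1. Split DOCX and PPTX files into multiple files;

2. Merge multiple DOCX or PPTX files into a single file;

3. Generate DOCX files from templates populated with XML data;

4. Convert DOCX documents to HTML pages with high fidelity;

5. Convert HTML pages to DOCX documents with high fidelity;

6. Search and replace content in DOCX/PPTX files, with regular-expression support;

7. Manage tracked revisions in DOCX and PPTX files, including detecting and accepting tracked revisions;

8. Update charts in DOCX/PPTX files, including the cached data and the embedded XLSX;

9. Compare two DOCX files and produce a DOCX document with tracked-revision markup, with support for retrieving the list of revisions;

10. Retrieve information from DOCX documents, including the styles used, the hierarchy, and the languages and fonts used;

11. Write XLSX files with far simpler code than writing the markup directly, including a streaming approach that can write XLSX documents with millions of rows;

12. Extract data from Excel files, including the formatting of the content.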
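As an illustration of feature 2, merging DOCX files can be sketched with the library's DocumentBuilder class. This is a minimal sketch, not the project's own sample: the class name MergeSketch, the method name MergeDocx, and its parameters are hypothetical, and it assumes the OpenXmlPowerTools package is referenced.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using OpenXmlPowerTools;

// Hypothetical helper class, for illustration only.
public static class MergeSketch
{
    // Merges the given DOCX files, in order, into a single output file.
    public static void MergeDocx(IEnumerable<string> inputPaths, string outputPath)
    {
        // Each Source wraps one input document; passing true asks the
        // builder to keep that document's section settings.
        List<Source> sources = inputPaths
            .Select(path => new Source(new WmlDocument(path), true))
            .ToList();

        // Build the combined document and write it to outputPath.
        DocumentBuilder.BuildDocument(sources, outputPath);
    }
}
```

A call such as MergeSketch.MergeDocx(new[] { "a.docx", "b.docx" }, "merged.docx") would then write the combined document; the file names here are placeholders.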

Technical architecture

1. Target frameworks: net45, net46, netstandard2.0

2. Development tool: Visual Studio 2017

Project structure

(Image: project structure)

Usage

The project includes usage samples for each of its features; a few commonly used ones are shown below:

Converting DOCX to HTML

// Namespaces used: System, System.Drawing.Imaging, System.IO, System.Linq, System.Text,
// System.Xml.Linq, DocumentFormat.OpenXml.Packaging, OpenXmlPowerTools.
public static void ConvertToHtml(string file, string outputDirectory)
{
    var fi = new FileInfo(file);
    Console.WriteLine(fi.Name);

    // Load the DOCX into a writable in-memory stream so the file on disk is left untouched.
    byte[] byteArray = File.ReadAllBytes(fi.FullName);
    using (MemoryStream memoryStream = new MemoryStream())
    {
        memoryStream.Write(byteArray, 0, byteArray.Length);
        using (WordprocessingDocument wDoc = WordprocessingDocument.Open(memoryStream, true))
        {
            // The output file has the same name as the input, with an .html extension.
            var destFileName = new FileInfo(fi.Name.Replace(".docx", ".html"));
            if (outputDirectory != null && outputDirectory != string.Empty)
            {
                DirectoryInfo di = new DirectoryInfo(outputDirectory);
                if (!di.Exists)
                {
                    throw new OpenXmlPowerToolsException("Output directory does not exist");
                }
                destFileName = new FileInfo(Path.Combine(di.FullName, destFileName.Name));
            }

            // Images are saved next to the HTML file in "<name>_files".
            var imageDirectoryName = destFileName.FullName.Substring(0, destFileName.FullName.Length - 5) + "_files";
            int imageCounter = 0;

            // Use the document title as the page title, falling back to the file path.
            var pageTitle = fi.FullName;
            var part = wDoc.CoreFilePropertiesPart;
            if (part != null)
            {
                pageTitle = (string) part.GetXDocument().Descendants(DC.title).FirstOrDefault() ?? fi.FullName;
            }

            // TODO: Determine max-width from size of content area.
            HtmlConverterSettings settings = new HtmlConverterSettings()
            {
                AdditionalCss = "body { margin: 1cm auto; max-width: 20cm; padding: 0; }",
                PageTitle = pageTitle,
                FabricateCssClasses = true,
                CssClassPrefix = "pt-",
                RestrictToSupportedLanguages = false,
                RestrictToSupportedNumberingFormats = false,
                ImageHandler = imageInfo =>
                {
                    DirectoryInfo localDirInfo = new DirectoryInfo(imageDirectoryName);
                    if (!localDirInfo.Exists)
                        localDirInfo.Create();
                    ++imageCounter;

                    // Map the image content type to a System.Drawing image format.
                    string extension = imageInfo.ContentType.Split('/')[1].ToLower();
                    ImageFormat imageFormat = null;
                    if (extension == "png")
                        imageFormat = ImageFormat.Png;
                    else if (extension == "gif")
                        imageFormat = ImageFormat.Gif;
                    else if (extension == "bmp")
                        imageFormat = ImageFormat.Bmp;
                    else if (extension == "jpeg")
                        imageFormat = ImageFormat.Jpeg;
                    else if (extension == "tiff")
                    {
                        // Convert tiff to gif.
                        extension = "gif";
                        imageFormat = ImageFormat.Gif;
                    }
                    else if (extension == "x-wmf")
                    {
                        extension = "wmf";
                        imageFormat = ImageFormat.Wmf;
                    }

                    // If the image format isn't one that we expect, ignore it,
                    // and don't return markup for the link.
                    if (imageFormat == null)
                        return null;

                    string imageFileName = imageDirectoryName + "/image" +
                        imageCounter.ToString() + "." + extension;
                    try
                    {
                        imageInfo.Bitmap.Save(imageFileName, imageFormat);
                    }
                    catch (System.Runtime.InteropServices.ExternalException)
                    {
                        return null;
                    }

                    // Reference the saved image with a path relative to the HTML file.
                    string imageSource = localDirInfo.Name + "/image" +
                        imageCounter.ToString() + "." + extension;

                    XElement img = new XElement(Xhtml.img,
                        new XAttribute(NoNamespace.src, imageSource),
                        imageInfo.ImgStyleAttribute,
                        imageInfo.AltText != null ?
                            new XAttribute(NoNamespace.alt, imageInfo.AltText) : null);
                    return img;
                }
            };
            XElement htmlElement = HtmlConverter.ConvertToHtml(wDoc, settings);

            // Produce HTML document with <!DOCTYPE html > declaration to tell the browser
            // we are using HTML5.
            var html = new XDocument(
                new XDocumentType("html", null, null, null),
                htmlElement);
            var htmlString = html.ToString(SaveOptions.DisableFormatting);
            File.WriteAllText(destFileName.FullName, htmlString, Encoding.UTF8);
        }
    }
}

Updating charts in DOCX and PPTX documents

// Create a timestamped output directory.
var n = DateTime.Now;
var tempDi = new DirectoryInfo(string.Format("ExampleOutput-{0:00}-{1:00}-{2:00}-{3:00}{4:00}{5:00}", n.Year - 2000, n.Month, n.Day, n.Hour, n.Minute, n.Second));
tempDi.Create();

// Copy the sample DOCX and PPTX files into the output directory.
var sourceDi = new DirectoryInfo("../../");
foreach (var file in sourceDi.GetFiles("*.docx"))
    File.Copy(file.FullName, Path.Combine(tempDi.FullName, file.Name));
foreach (var file in sourceDi.GetFiles("*.pptx"))
    File.Copy(file.FullName, Path.Combine(tempDi.FullName, file.Name));

// Update Chart1 through Chart4 in a copy of each DOCX file.
var fileList = Directory.GetFiles(tempDi.FullName, "*.docx");
foreach (var file in fileList)
{
    var fi = new FileInfo(file);
    Console.WriteLine(fi.Name);
    var newFileName = "Updated-" + fi.Name;
    var fi2 = new FileInfo(Path.Combine(tempDi.FullName, newFileName));
    File.Copy(fi.FullName, fi2.FullName);

    using (var wDoc = WordprocessingDocument.Open(fi2.FullName, true))
    {
        var chart1Data = new ChartData
        {
            SeriesNames = new[] { "Car", "Truck", "Van", "Bike", "Boat" },
            CategoryDataType = ChartDataType.String,
            CategoryNames = new[] { "Q1", "Q2", "Q3", "Q4" },
            Values = new double[][] {
                new double[] { 100, 310, 220, 450 },
                new double[] { 200, 300, 350, 411 },
                new double[] { 80, 120, 140, 600 },
                new double[] { 120, 100, 140, 400 },
                new double[] { 200, 210, 210, 480 },
            },
        };
        ChartUpdater.UpdateChart(wDoc, "Chart1", chart1Data);

        var chart2Data = new ChartData
        {
            SeriesNames = new[] { "Series" },
            CategoryDataType = ChartDataType.String,
            CategoryNames = new[] { "Cars", "Trucks", "Vans", "Boats" },
            Values = new double[][] {
                new double[] { 320, 112, 64, 80 },
            },
        };
        ChartUpdater.UpdateChart(wDoc, "Chart2", chart2Data);

        var chart3Data = new ChartData
        {
            SeriesNames = new[] { "X1", "X2", "X3", "X4", "X5", "X6" },
            CategoryDataType = ChartDataType.String,
            CategoryNames = new[] { "Y1", "Y2", "Y3", "Y4", "Y5", "Y6" },
            Values = new double[][] {
                new double[] { 3.0, 2.1,  .7,  .7, 2.1, 3.0 },
                new double[] { 3.0, 2.1,  .8,  .8, 2.1, 3.0 },
                new double[] { 3.0, 2.4, 1.2, 1.2, 2.4, 3.0 },
                new double[] { 3.0, 2.7, 1.7, 1.7, 2.7, 3.0 },
                new double[] { 3.0, 2.9, 2.5, 2.5, 2.9, 3.0 },
                new double[] { 3.0, 3.0, 3.0, 3.0, 3.0, 3.0 },
            },
        };
        ChartUpdater.UpdateChart(wDoc, "Chart3", chart3Data);

        // Chart4 uses dates as categories: 1 to 20 September 2013.
        var chart4Data = new ChartData
        {
            SeriesNames = new[] { "Car", "Truck", "Van" },
            CategoryDataType = ChartDataType.DateTime,
            CategoryFormatCode = 14,
            CategoryNames = new[] {
                ToExcelInteger(new DateTime(2013, 9, 1)),
                ToExcelInteger(new DateTime(2013, 9, 2)),
                ToExcelInteger(new DateTime(2013, 9, 3)),
                ToExcelInteger(new DateTime(2013, 9, 4)),
                ToExcelInteger(new DateTime(2013, 9, 5)),
                ToExcelInteger(new DateTime(2013, 9, 6)),
                ToExcelInteger(new DateTime(2013, 9, 7)),
                ToExcelInteger(new DateTime(2013, 9, 8)),
                ToExcelInteger(new DateTime(2013, 9, 9)),
                ToExcelInteger(new DateTime(2013, 9, 10)),
                ToExcelInteger(new DateTime(2013, 9, 11)),
                ToExcelInteger(new DateTime(2013, 9, 12)),
                ToExcelInteger(new DateTime(2013, 9, 13)),
                ToExcelInteger(new DateTime(2013, 9, 14)),
                ToExcelInteger(new DateTime(2013, 9, 15)),
                ToExcelInteger(new DateTime(2013, 9, 16)),
                ToExcelInteger(new DateTime(2013, 9, 17)),
                ToExcelInteger(new DateTime(2013, 9, 18)),
                ToExcelInteger(new DateTime(2013, 9, 19)),
                ToExcelInteger(new DateTime(2013, 9, 20)),
            },
            Values = new double[][] {
                new double[] { 1, 2, 3, 2, 3, 4, 5, 4, 5, 6, 5, 4, 5, 6, 7, 8, 7, 8, 8, 9 },
                new double[] { 2, 3, 3, 4, 4, 5, 6, 7, 8, 7, 8, 9, 9, 9, 7, 8, 9, 9, 10, 11 },
                new double[] { 2, 3, 3, 3, 3, 2, 2, 2, 3, 2, 3, 3, 4, 4, 4, 3, 4, 5, 5, 4 },
            },
        };
        ChartUpdater.UpdateChart(wDoc, "Chart4", chart4Data);
    }
}

// Update the chart in a copy of each PPTX file.
fileList = Directory.GetFiles(tempDi.FullName, "*.pptx");
foreach (var file in fileList)
{
    var fi = new FileInfo(file);
    Console.WriteLine(fi.Name);
    var newFileName = "Updated-" + fi.Name;
    var fi2 = new FileInfo(Path.Combine(tempDi.FullName, newFileName));
    File.Copy(fi.FullName, fi2.FullName);

    using (var pDoc = PresentationDocument.Open(fi2.FullName, true))
    {
        var chart1Data = new ChartData
        {
            SeriesNames = new[] { "Car", "Truck", "Van" },
            CategoryDataType = ChartDataType.String,
            CategoryNames = new[] { "Q1", "Q2", "Q3", "Q4" },
            Values = new double[][] {
                new double[] { 320, 310, 320, 330 },
                new double[] { 201, 224, 230, 221 },
                new double[] { 180, 200, 220, 230 },
            },
        };
        ChartUpdater.UpdateChart(pDoc, 1, chart1Data);
    }
}
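The chart sample above calls ToExcelInteger for its date categories, but the excerpt does not define it. A minimal sketch of such a helper follows, assuming it converts a DateTime to the serial date number that Excel stores, using DateTime.ToOADate. The class name ChartDateHelper is hypothetical, and the project's own helper may differ in detail.

```csharp
using System;
using System.Globalization;

// Hypothetical container class, for illustration only.
public static class ChartDateHelper
{
    // Converts a date to its OLE Automation serial number (days since
    // 30 December 1899) and formats it as a culture-independent string.
    public static string ToExcelInteger(DateTime dateTime)
    {
        return dateTime.ToOADate().ToString(CultureInfo.InvariantCulture);
    }
}
```

If the method is placed in the same class as the chart code, the unqualified calls in the sample resolve as written; otherwise they would need to be written as ChartDateHelper.ToExcelInteger(...).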

Working with Excel formulas

// Create a timestamped output directory.
var n = DateTime.Now;
var tempDi = new DirectoryInfo(string.Format("ExampleOutput-{0:00}-{1:00}-{2:00}-{3:00}{4:00}{5:00}", n.Year - 2000, n.Month, n.Day, n.Hour, n.Minute, n.Second));
tempDi.Create();

// Change sheet name in formulas
using (OpenXmlMemoryStreamDocument streamDoc = new OpenXmlMemoryStreamDocument(
    SmlDocument.FromFileName("../../Formulas.xlsx")))
{
    using (SpreadsheetDocument doc = streamDoc.GetSpreadsheetDocument())
    {
        WorksheetAccessor.FormulaReplaceSheetName(doc, "Source", "'Source 2'");
    }
    streamDoc.GetModifiedSmlDocument().SaveAs(Path.Combine(tempDi.FullName, "FormulasUpdated.xlsx"));
}

// Copy a range of cells within the "References" sheet
using (OpenXmlMemoryStreamDocument streamDoc = new OpenXmlMemoryStreamDocument(
    SmlDocument.FromFileName("../../Formulas.xlsx")))
{
    using (SpreadsheetDocument doc = streamDoc.GetSpreadsheetDocument())
    {
        WorksheetPart sheet = WorksheetAccessor.GetWorksheet(doc, "References");
        WorksheetAccessor.CopyCellRange(doc, sheet, 1, 1, 7, 5, 4, 8);
    }
    streamDoc.GetModifiedSmlDocument().SaveAs(Path.Combine(tempDi.FullName, "FormulasCopied.xlsx"));
}

The project's sample code is organized as follows:

(Image: sample code)

Project URL: https://github.com/OpenXmlDev/Open-Xml-PowerTools
