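Aspose.Words for .NET is an advanced Word document processing API for performing a wide range of document management and manipulation tasks. The API supports generating, modifying, converting, rendering, and printing documents without using Microsoft Word directly in cross-platform applications. It also supports all popular word processing file formats and can export or convert Word documents to fixed-layout file formats and the most commonly used image/multimedia formats.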
Extracting the Table of Contents
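To extract the table of contents from a Word document, you can use the following code example. It walks the document's hyperlink fields, treats those whose sub-address starts with "_Toc" as TOC entries, and prints each entry together with the paragraph it points to.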
    using System;
    using Aspose.Words;
    using Aspose.Words.Fields;

    // The path to the documents directory (RunExamples comes from the surrounding example project).
    string dataDir = RunExamples.GetDataDir_WorkingWithDocument();
    string fileName = "TOC.doc";
    Document doc = new Document(dataDir + fileName);

    foreach (Field field in doc.Range.Fields)
    {
        // TOC entries are hyperlink fields that point at "_Toc" bookmarks.
        if (field.Type == FieldType.FieldHyperlink)
        {
            FieldHyperlink hyperlink = (FieldHyperlink)field;
            if (hyperlink.SubAddress != null && hyperlink.SubAddress.StartsWith("_Toc"))
            {
                Paragraph tocItem = (Paragraph)field.Start.GetAncestor(NodeType.Paragraph);

                // Check for null before using the paragraph, not after.
                if (tocItem != null)
                {
                    Console.WriteLine(tocItem.ToString(SaveFormat.Text).Trim());
                    Console.WriteLine("------------------");

                    Bookmark bm = doc.Range.Bookmarks[hyperlink.SubAddress];
                    if (bm != null)
                    {
                        // Get the location this TOC item is pointing to.
                        Paragraph pointer = (Paragraph)bm.BookmarkStart.GetAncestor(NodeType.Paragraph);
                        Console.WriteLine(pointer.ToString(SaveFormat.Text));
                    }
                }
            }
        }
    }
Counting the Lines in a Paragraph
To count the number of lines in each paragraph of a Word document, you can use the following code example.
    using System;
    using Aspose.Words;
    using Aspose.Words.Layout;
    using Aspose.Words.Tables;

    // The path to the documents directory (RunExamples comes from the surrounding example project).
    string dataDir = RunExamples.GetDataDir_WorkingWithDocument();
    string fileName = "Properties.doc";
    Document document = new Document(dataDir + fileName);

    var collector = new LayoutCollector(document);
    var it = new LayoutEnumerator(document);

    foreach (Paragraph paragraph in document.GetChildNodes(NodeType.Paragraph, true))
    {
        var paraBreak = collector.GetEntity(paragraph);

        // Find the layout entity at which counting stops:
        // the last line (or table row) of the previous sibling.
        object stop = null;
        var prevItem = paragraph.PreviousSibling;
        if (prevItem != null)
        {
            if (prevItem is Paragraph)
            {
                it.Current = collector.GetEntity(prevItem); // paragraph break
                it.MoveParent(); // last line
                stop = it.Current;
            }
            else if (prevItem is Table)
            {
                var table = (Table)prevItem;
                it.Current = collector.GetEntity(table.LastRow.LastCell.LastParagraph); // cell break
                it.MoveParent(); // cell
                it.MoveParent(); // row
                stop = it.Current;
            }
            else
            {
                throw new Exception("Unexpected node type before paragraph: " + prevItem.NodeType);
            }
        }

        it.Current = paraBreak;
        it.MoveParent();

        // Move backwards from line to line within the paragraph.
        // When a paragraph spans multiple pages, the enumerator follows it across them.
        var count = 1;
        while (it.Current != stop)
        {
            if (!it.MovePreviousLogical())
                break;
            count++;
        }

        // Print a shortened preview of the paragraph text with its line count.
        const int MAX_CHARS = 16;
        var paraText = paragraph.GetText();
        if (paraText.Length > MAX_CHARS)
            paraText = $"{paraText.Substring(0, MAX_CHARS)}...";

        Console.WriteLine($"Paragraph '{paraText}' has {count} line(-s).");
    }
Using Import Format Options
Aspose.Words for .NET provides the ImportFormatOptions class, which lets you specify various import options that control how the output is formatted.
▲ Setting Smart Style Behavior
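When this option is enabled and the KeepSourceFormatting import mode is used, source styles are expanded into direct attributes in the destination document. When the option is disabled, a source style is expanded only if it is numbered. Existing destination attributes, including lists, are not overridden.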
At present, this option can only be used with a new public method of the DocumentBuilder class, as shown in the following example:
    // dataDir is the documents directory, as in the earlier examples.
    Document srcDoc = new Document(dataDir + "source.docx");
    Document dstDoc = new Document(dataDir + "destination.docx");

    // Start the inserted content on a new page at the end of the destination document.
    DocumentBuilder builder = new DocumentBuilder(dstDoc);
    builder.MoveToDocumentEnd();
    builder.InsertBreak(BreakType.PageBreak);

    ImportFormatOptions options = new ImportFormatOptions();
    options.SmartStyleBehavior = true;
    builder.InsertDocument(srcDoc, ImportFormatMode.UseDestinationStyles, options);
▲ Setting Keep Source Numbering
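When nodes are imported between documents, the source document may contain a list whose identifier is already in use in the destination document. In that case, Word always uses the formatting of the destination list. To let users choose the appropriate behavior, the KeepSourceNumbering property was introduced in the ImportFormatOptions class. It specifies how numbering is imported when it clashes between the source and destination documents. The default value is false.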
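To use this property, a new public overload was introduced that accepts the options object carrying the KeepSourceNumbering setting. The following example passes it to the NodeImporter constructor.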
    // dataDir is the documents directory, as in the earlier examples.
    Document srcDoc = new Document(dataDir + "source.docx");
    Document dstDoc = new Document(dataDir + "destination.docx");

    ImportFormatOptions importFormatOptions = new ImportFormatOptions();
    // Keep source list formatting when importing numbered paragraphs.
    importFormatOptions.KeepSourceNumbering = true;

    NodeImporter importer = new NodeImporter(srcDoc, dstDoc, ImportFormatMode.KeepSourceFormatting, importFormatOptions);

    ParagraphCollection srcParas = srcDoc.FirstSection.Body.Paragraphs;
    foreach (Paragraph srcPara in srcParas)
    {
        // false imports the paragraph node only, without its child nodes;
        // pass true to bring the paragraph's content across as well.
        Node importedNode = importer.ImportNode(srcPara, false);
        dstDoc.FirstSection.Body.AppendChild(importedNode);
    }

    dstDoc.Save(dataDir + "output.docx");
▲ Setting Ignore Text Boxes
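When text boxes are imported between documents, the formatting of the destination document is applied to them. This matches the behavior of Word. To let users choose the appropriate behavior, the IgnoreTextBoxes option was introduced in the ImportFormatOptions class. This property indicates whether the formatting of the source text boxes is ignored during import. The default value is true.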
    // dataDir is the documents directory, as in the earlier examples.
    Document srcDoc = new Document(dataDir + "source.docx");
    Document dstDoc = new Document(dataDir + "destination.docx");

    ImportFormatOptions importFormatOptions = new ImportFormatOptions();
    // Keep the source text boxes formatting when importing.
    importFormatOptions.IgnoreTextBoxes = false;

    NodeImporter importer = new NodeImporter(srcDoc, dstDoc, ImportFormatMode.KeepSourceFormatting, importFormatOptions);

    ParagraphCollection srcParas = srcDoc.FirstSection.Body.Paragraphs;
    foreach (Paragraph srcPara in srcParas)
    {
        // true imports each paragraph together with its child nodes.
        Node importedNode = importer.ImportNode(srcPara, true);
        dstDoc.FirstSection.Body.AppendChild(importedNode);
    }

    dstDoc.Save(dataDir + "output.docx");