Extracting text from an XPS document


I need to extract the text of a specific page from an XPS document.

The extracted text should be written to a string. I need this so that the text can be read aloud using Microsoft's SpeechLib.

Please provide examples in C# only.

Thanks

Solution

Add references to ReachFramework and WindowsBase, and add the following using statements:

using System.Text;
using System.Windows.Xps.Packaging;

Then use this code:

// Open the XPS package read-only.
XpsDocument _xpsDocument = new XpsDocument("/path", System.IO.FileAccess.Read);

IXpsFixedDocumentSequenceReader fixedDocSeqReader = _xpsDocument.FixedDocumentSequenceReader;
IXpsFixedDocumentReader _document = fixedDocSeqReader.FixedDocuments[0];

// FixedPages is zero-based. documentViewerElement is assumed to be the
// DocumentViewer displaying the document; substitute any page index you need.
IXpsFixedPageReader _page = _document.FixedPages[documentViewerElement.MasterPageNumber];

StringBuilder _currentText = new StringBuilder();
System.Xml.XmlReader _pageContentReader = _page.XmlReader;

if (_pageContentReader != null)
{
    while (_pageContentReader.Read())
    {
        // Each Glyphs element carries one run of text in its UnicodeString attribute.
        if (_pageContentReader.Name == "Glyphs" && _pageContentReader.HasAttributes)
        {
            string unicodeString = _pageContentReader.GetAttribute("UnicodeString");
            if (unicodeString != null)
            {
                _currentText.Append(unicodeString);
            }
        }
    }
}

string _fullPageText = _currentText.ToString();

// Release the underlying package once the text has been read.
_xpsDocument.Close();

The text is stored in the UnicodeString attribute of each Glyphs element, so you have to read the fixed page's markup with an XmlReader.
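
To see the Glyphs/UnicodeString idea in isolation, here is a minimal sketch that runs the same kind of XmlReader loop over a hand-written FixedPage fragment instead of a real XPS file. The class name GlyphTextDemo, the method ExtractText and the sample markup are made up for illustration; inserting a space between runs is also an assumption on my part, since real documents may split words across several Glyphs elements.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class GlyphTextDemo
{
    // Collects the UnicodeString attribute of every Glyphs element,
    // separating consecutive runs with a single space (an assumption).
    public static string ExtractText(XmlReader reader)
    {
        var text = new StringBuilder();
        while (reader.Read())
        {
            // Only start tags named Glyphs are of interest.
            if (reader.NodeType != XmlNodeType.Element || reader.LocalName != "Glyphs")
                continue;

            string run = reader.GetAttribute("UnicodeString");
            if (string.IsNullOrEmpty(run))
                continue;

            if (text.Length > 0)
                text.Append(' ');
            text.Append(run);
        }
        return text.ToString();
    }

    public static void Main()
    {
        // Hypothetical page markup; a real page comes from IXpsFixedPageReader.XmlReader.
        const string samplePage =
            @"<FixedPage xmlns=""http://schemas.microsoft.com/xps/2005/06"" Width=""816"" Height=""1056"">
                <Glyphs OriginX=""96"" OriginY=""96"" FontRenderingEmSize=""16"" UnicodeString=""Hello"" />
                <Path Data=""M 0,0 L 10,10"" />
                <Glyphs OriginX=""96"" OriginY=""120"" FontRenderingEmSize=""16"" UnicodeString=""XPS world"" />
              </FixedPage>";

        using (XmlReader reader = XmlReader.Create(new StringReader(samplePage)))
        {
            Console.WriteLine(ExtractText(reader)); // prints "Hello XPS world"
        }
    }
}
```

With a real document you would pass _page.XmlReader to ExtractText instead of the reader built from the sample string.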
