i need to extract the text of a specific page from a XPS document.
The extracted text should be written in a string. I need this to read out the extracted text using Microsofts SpeechLib.
Please examples only in C#.
Thanks
解决方案
Add References to ReachFramework and WindowsBase and the following using statement:
using System.Windows.Xps.Packaging;
Then use this code:
XpsDocument _xpsDocument=new XpsDocument("/path",System.IO.FileAccess.Read);
IXpsFixedDocumentSequenceReader fixedDocSeqReader
=_xpsDocument.FixedDocumentSequenceReader;
IXpsFixedDocumentReader _document = fixedDocSeqReader.FixedDocuments[0];
IXpsFixedPageReader _page
= _document.FixedPages[documentViewerElement.MasterPageNumber];
StringBuilder _currentText = new StringBuilder();
System.Xml.XmlReader _pageContentReader = _page.XmlReader;
if (_pageContentReader != null)
{
while (_pageContentReader.Read())
{
if (_pageContentReader.Name == "Glyphs")
{
if (_pageContentReader.HasAttributes)
{
if (_pageContentReader.GetAttribute("UnicodeString") != null )
{
_currentText.
Append(_pageContentReader.
GetAttribute("UnicodeString"));
}
}
}
}
}
string _fullPageText = _currentText.ToString();
Text exists in Glyphs -> UnicodeString string attribute. You have to use XMLReader for fixed page.