1.
<object classid="clsid:CA8A9780-280D-11CF-A24D-444553540000" width="50%" height="50%" border="0"
id="Object1" name="pdf">
<param name="toolbar" value="false">
<param name="_Version" value="65539">
<param name="_ExtentX" value="20108">
<param name="_ExtentY" value="10866">
<param name="_StockProps" value="0">
<param name="SRC" value="StarUML_5.0_Developer_Guide.pdf">
</object>
---------------------------------------------------------------------------------------------------------------------------------------------
2. Excerpted from: http://www.codeproject.com/KB/string/pdf2text.aspx
PDFBox is another Java PDF library. It is also ready to use with the original Java Lucene (see LucenePDFDocument).
Fortunately, there is a .NET version of PDFBox that is created using IKVM.NET (just download the PDFBox package, it's in the bin directory).
Using PDFBox in .NET requires adding references to PDFBox-0.7.2.dll and IKVM.GNU.Classpath, and copying IKVM.Runtime.dll to the bin directory.
Using PDFBox to parse PDFs is fairly easy:
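// Namespaces: org.pdfbox.pdmodel (PDDocument), org.pdfbox.util (PDFTextStripper)
private static string parseUsingPDFBox(string filename) {
    PDDocument doc = PDDocument.load(filename);
    try {
        PDFTextStripper stripper = new PDFTextStripper();
        return stripper.getText(doc);  // extract the text of the whole document
    } finally {
        doc.close();  // release the underlying file
    }
}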
----------------------------------------------------------------------------------------------
3. For Java, see the reference document "Processing PDF Documents with PDFBox" (使用PDFBox处理PDF文档).
-------------------------------------------------------------------------------------------------
4. Reference site: http://blog.csdn.net/maicar1235/article/details/2777897