pdfbox

How to read pdf files using C# .NET

including iText, PDFBox, PDF-Excel, etc

A summary of some resources available online for writing C# software that reads files stored in Adobe Portable Document Format (PDF).

First, what PDF is

  • The PDF Reference page of the Adobe PDF Technology Center links to the definitive PDF specification, including the PDF Reference and Related Documentation (over 15MB). Adobe publishes the full specification to "foster the creation of an ecosystem around the PDF format".

C# Resources for reading PDF files

  • A PDF Forms Parser by Michael Ganss addresses the problem of filling data into a pdf form programmatically (for example, with generated content or data read from a database). He writes, "The parser is not a full-fledged PDF parser but rather a small, one-class parser that can be dropped into any project where form field parsing is necessary instead of a whole library that adds a lot of overhead. Although the parser supports all types of PDF objects except for streams, it parses just the form fields of a PDF file by looking at the AcroForm dictionary." The page links to a 22.3KB source code download. (Oct 2004)
  • iText is a library that enables you to generate PDF files on the fly. The documentation says, "the iText classes are very useful for people who need to generate read-only, platform independent documents containing text, lists, tables and images; or who want to perform specific manipulations on existing PDF documents." It is written for Java, but a .NET port is available: iTextSharp (written in C#), implemented as an assembly and downloadable from SourceForge. (Nov 2007)
  • PDFBox is a Java library (see sub-bullet for how to use it in C# .NET) that lets you create new PDF documents, manipulate existing documents and extract content from documents. PDFBox also includes several command line utilities. Functionality includes: PDF to text extraction; merging PDF documents; PDF document encryption/decryption; Lucene search engine integration; filling in form data (FDF and XFDF); creating a PDF from a text file; creating images from PDF pages; printing a PDF.
  • Open source PDF libraries for C# include iTextSharp, PDFsharp, Report.NET, SharpPDF, ASP.NET FO PDF, PDF Clown and PDFjet Open Source Edition.
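As a rough illustration of what reading a PDF from C# can look like with one of the libraries listed above, the sketch below uses iTextSharp to pull the text out of each page. It is only a sketch under stated assumptions: it assumes a 5.x release of iTextSharp, where the `PdfTextExtractor` class lives in the `iTextSharp.text.pdf.parser` namespace (earlier releases may not offer it), and the class name `PdfTextDemo` and the file name `sample.pdf` are hypothetical.

```csharp
using System;
using System.Text;
using iTextSharp.text.pdf;          // PdfReader (assumes iTextSharp 5.x is referenced)
using iTextSharp.text.pdf.parser;   // PdfTextExtractor

public static class PdfTextDemo
{
    // Returns the text of every page in the file, one page after another.
    public static string ExtractText(string pdfPath)
    {
        PdfReader reader = new PdfReader(pdfPath);
        try
        {
            StringBuilder sb = new StringBuilder();

            // iTextSharp numbers pages from 1, not 0.
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                sb.AppendLine(PdfTextExtractor.GetTextFromPage(reader, page));
            }
            return sb.ToString();
        }
        finally
        {
            // Release the underlying file whether or not extraction succeeded.
            reader.Close();
        }
    }

    public static void Main(string[] args)
    {
        // "sample.pdf" is a hypothetical default used when no argument is given.
        string pdfPath = args.Length > 0 ? args[0] : "sample.pdf";
        Console.WriteLine(ExtractText(pdfPath));
    }
}
```

Text extraction of this kind only recovers text that is stored as text in the file; pages that are scanned images will yield little or nothing.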

Writing data to a pdf file:

Chris Hornberger wrote on Jul 2 2003, 6:59 am: "Create a Crystal report with the information you want on it, then simply export it to PDF. The fact that you're using C#, I assume you're also using VS Studio.NET and hence, have Crystal too. This will allow you to create your PDF file. Another choice is to spring for Adobe Pagemill and print to the PDF file format."

Other Assorted PDF Utilities:

Some free utilities are available for download - instead of writing your own software, this section may save you the trouble of re-inventing the wheel...

  • The RubyPDF Blog contains an assortment of free utilities for manipulating pdf files:
    • pdf cropper to remove white margins
    • BookmarkExtractor to extract all bookmarks from a pdf file
    • PDF2PPT converts a pdf to a PowerPoint file
    • Pdfrotate rotates every page a chosen multiple of 90°
    • pdfselect extracts pages, splits a pdf or reverses it
    • PDF N-UP Maker builds an n-up pdf or booklet
  • WOA PDF-Excel is a free, customisable utility that converts pdf files into Excel; it can extract data from a folder of pdf files (e.g. a folder of invoices) into a single spreadsheet. Produced by Wilkie Office Automation (2006)

Other useful links:

Some background reading...

 
