Working with PDFs Using iTextSharp

hohoyu

Last modified 2023-08-24 10:38:11

Category: Program Development. Tags: c#, pdf

First published 2023-08-23 09:02:19

Article link: https://blog.csdn.net/hohoyu/article/details/132443839

Creating a PDF File

using System;
using System.Collections.Generic;
using System.Text;
using iTextSharp.text;
using iTextSharp.text.pdf;
using System.IO;
using System.Windows.Forms;
namespace ConsoleApplication4
{
class Program
{
    static void Main(string[] args)
    {
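        // Create an empty document and bind it to the output file
        iTextSharp.text.Document pdf_doc = new Document();
        PdfWriter pdf_write = PdfWriter.GetInstance(pdf_doc, new FileStream(@"I:\chap1.pdf", FileMode.Create));
        pdf_doc.Open();
        pdf_doc.Add(new Paragraph("new pdf!"));
        // Closing the document finishes the PDF and closes the output stream
        pdf_doc.Close();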
        MessageBox.Show ("OK!",Environment.UserName);
        Console.Read();
    }
}
}

Getting the Page Count of a PDF with iTextSharp

using iTextSharp.text.pdf;
// Count the pages of a PDF document
private int pdf_pages(string filename)
{
    PdfReader pdf = new PdfReader(filename);
    int pages = pdf.NumberOfPages;
    pdf.Close(); // release the reader's resources
    return pages;
}

Getting PDF File Information with iTextSharp

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using System.IO;
using System.Text.RegularExpressions;
using iTextSharp.text.pdf;
namespace WindowsApplication2
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }
        private void btn_browse_Click(object sender, EventArgs e)
        {
            DialogResult dia_dir_res = folderBrowserDialog_dir.ShowDialog();
            if (dia_dir_res == DialogResult.OK)
            {
                tbx_dir.Text = folderBrowserDialog_dir.SelectedPath;
                //tbx_dir.Update();
            }
        }
        private void btn_cancel_Click(object sender, EventArgs e)
        {
            Application.Exit();
        }
        private void btn_ok_Click(object sender, EventArgs e)
        {
            // Check that the folder exists
            if (!Directory.Exists(tbx_dir.Text))
            {
                MessageBox.Show("ERR: NO THIS DIR!");
                return;
            }
           
            // Collect the PDF files in the folder
            string[] str_dirs = Directory.GetFiles(tbx_dir.Text, "*.pdf");
            if (str_dirs.Length == 0)
            {
                MessageBox.Show("ERR: NO PDF FILE!");
                return;
            }
            // Replace spaces in file names with "-"
            string str_new_dir = null;
            foreach (String str_dir in str_dirs)
            {
                if (str_dir.Contains(" "))
                {
                    str_new_dir = str_dir.Replace(" ", "-");
                    File.Copy(str_dir, str_new_dir);
                    File.Delete(str_dir);
                }
            }
           
            // Set the drawing type
            string str_drawing_type = null;
            if (rbtn_final.Checked == true)
                str_drawing_type = "完工图纸"; // "as-built drawing"
            else
                str_drawing_type = "审批图纸"; // "drawing for approval"
            // Collect the PDF files again, now that some may have been renamed
            str_dirs = Directory.GetFiles(tbx_dir.Text, "*.pdf");
            if (str_dirs.Length == 0)
            {
                MessageBox.Show("ERR: NO PDF FILE!");
                return;
            }
           
            // String that will be written to the output file
            string str_flush = null;
            string str_drawing_name = null;
            foreach (string str_dir in str_dirs)
            {
                str_drawing_name = str_dir.Replace(tbx_dir.Text + @"\", "");
                str_drawing_name = str_drawing_name.Replace(".pdf", "");
                str_flush += str_drawing_name + "  ";
                str_flush += str_drawing_type + "  ";
                str_flush += pdf_pages(str_dir).ToString() + "  ";
                str_flush += "无" + System.Environment.NewLine; // "无" means "none"
            }
            // Write the string to the file
            StreamWriter sw = new StreamWriter(tbx_dir.Text + @"\pdf_pages.txt", false, System.Text.Encoding.Default);
            sw.Write(str_flush);
            sw.Close();
            MessageBox.Show("OK!");
        }
        // Count the pages of a PDF document
        private int pdf_pages(string filename)
        {
            PdfReader pdf = new PdfReader(filename);
            int pages = pdf.NumberOfPages;
            pdf.Close(); // release the reader's resources
            return pages;
        }
    }
}

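Extracting Text from a PDF

The contents of a PDF file can be extracted with iTextSharp.

1. Define a reader, a parser, and an extraction strategy, as shown below.

The extraction process is:

a) Create a reader from the path of the file;

b) Create a parser from the reader;

c) Call the parser's generic method ProcessContent with a page number and an extraction strategy object to obtain the strategy (in the code below, i is the page number, the second argument is an instance of the strategy class, and the type in angle brackets is the generic type argument, which for text extraction should be the type shown below);

d) Call the strategy's GetResultantText method to get all the text of the specified page;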

// Define the reader and the parser
PdfReader pr = new PdfReader(tbx_pdf.Text.Trim());
PdfReaderContentParser prcp = new PdfReaderContentParser(pr);
ITextExtractionStrategy ites;
ites = prcp.ProcessContent<SimpleTextExtractionStrategy>(i, new SimpleTextExtractionStrategy());
str_pdf = ites.GetResultantText();

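e) The methods above need the iTextSharp.text.pdf and iTextSharp.text.pdf.parser namespaces, which are imported at the top of the following example.

A concrete example: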

using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;
using System.Text.RegularExpressions;
// Read the PDF file
PdfReader pr = new PdfReader(pdfPath);
// PDF parser
PdfReaderContentParser prcp = new PdfReaderContentParser(pr);
// Text extraction strategy; i is the page to read. Page numbers start at 1, and the total page count is available from PdfReader.NumberOfPages
ITextExtractionStrategy ites = prcp.ProcessContent<SimpleTextExtractionStrategy>(i, new SimpleTextExtractionStrategy());
string pdfText = ites.GetResultantText();
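
The example above reads a single page. To collect the text of a whole document, the same calls can be wrapped in a loop over the pages. The following is only a sketch under some assumptions: the class name PdfTextHelper and the method name ExtractAllText are made up for illustration, and it assumes an iTextSharp 5.x release that provides the parser namespace.

```csharp
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper class, not part of iTextSharp
public static class PdfTextHelper
{
    // Returns the text of every page, one page after another
    public static string ExtractAllText(string pdfPath)
    {
        PdfReader pr = new PdfReader(pdfPath);
        try
        {
            PdfReaderContentParser prcp = new PdfReaderContentParser(pr);
            StringBuilder sb = new StringBuilder();
            // Page numbers start at 1
            for (int i = 1; i <= pr.NumberOfPages; i++)
            {
                // Use a fresh strategy per page so text from different pages is not mixed
                ITextExtractionStrategy ites =
                    prcp.ProcessContent<SimpleTextExtractionStrategy>(i, new SimpleTextExtractionStrategy());
                sb.AppendLine(ites.GetResultantText());
            }
            return sb.ToString();
        }
        finally
        {
            // Release the reader even if parsing fails
            pr.Close();
        }
    }
}
```

A call such as PdfTextHelper.ExtractAllText(pdfPath) would then return the text of all pages as one string. Scanned PDFs that contain only images will most likely yield empty text, since no OCR is involved.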

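Splitting a PDF File (Extracting Specific Pages)

A PDF file can be split with iTextSharp in the following steps (see the code below):

1. Define an empty document (the Document type in iTextSharp); its Open method is called once the PdfCopy object from step 2 has been created;

2. Create a PdfCopy object from the Document object and a file stream (which specifies the stream mode and the path where the split file is stored); this object ties the Document to the output stream;

4. Define a PdfImportedPage object, which holds one page taken from the source PDF;

5. Use the GetImportedPage method of the PdfCopy object to take one page (the second argument is the page number) from a PdfReader that is bound to the source PDF file;

6. Add the extracted PdfImportedPage object (one page) to the associated Document with the AddPage method of the PdfCopy object;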

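Document doc = new Document();
// FileMode.Create replaces an existing file; appending to an existing PDF would corrupt it
PdfCopy pc = new PdfCopy(doc, new FileStream(tbx_targpath.Text + @"\KC" + str_id + ".pdf", System.IO.FileMode.Create));
PdfImportedPage pip = null;
doc.Open();
// pr is the PdfReader bound to the source PDF, i is the page number (starting at 1)
pip = pc.GetImportedPage(pr, i);
pc.AddPage(pip);
// Closing the document finishes the output file
doc.Close();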
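
The steps above can be put together into one method that copies a range of pages into a new file. This is a minimal sketch, not the author's original program: the names PdfSplitHelper, ExtractPages, sourcePath, targetPath, firstPage and lastPage are hypothetical, and the range check is an addition that the original snippet does not contain.

```csharp
using System;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical helper class, not part of iTextSharp
public static class PdfSplitHelper
{
    // Copies pages firstPage..lastPage (1-based, inclusive) of sourcePath into targetPath
    public static void ExtractPages(string sourcePath, string targetPath, int firstPage, int lastPage)
    {
        PdfReader pr = new PdfReader(sourcePath);
        try
        {
            // Reject empty or out-of-range requests up front (assumed behavior)
            if (firstPage < 1 || lastPage < firstPage || lastPage > pr.NumberOfPages)
                throw new ArgumentOutOfRangeException("firstPage", "Page range is outside the document.");

            using (FileStream fs = new FileStream(targetPath, FileMode.Create))
            {
                Document doc = new Document();
                // PdfCopy ties the document to the output stream
                PdfCopy pc = new PdfCopy(doc, fs);
                doc.Open();
                for (int i = firstPage; i <= lastPage; i++)
                {
                    // Import one page from the source and append it to the output
                    PdfImportedPage pip = pc.GetImportedPage(pr, i);
                    pc.AddPage(pip);
                }
                // Closing the document writes the final PDF structure
                doc.Close();
            }
        }
        finally
        {
            pr.Close();
        }
    }
}
```

For instance, PdfSplitHelper.ExtractPages(@"I:\chap1.pdf", @"I:\page1.pdf", 1, 1) would write the first page of the file created earlier into a new file; the second path is only an example.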
