Reading doc, pdf, ppt, and txt files in C#

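Converting doc, pdf, and ppt files to txt:

These components generally read a file's content out as character data. The conversion is not just a matter of changing the file extension, so the text that is read out has to be written to a txt file.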
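As a minimal sketch of that last step, the helper below writes already-extracted text to a txt file and reads it back with the same encoding. The class and method names (`TextDump`, `Save`, `Load`) are hypothetical and not part of the original code. The example passes the encoding in as a parameter; the methods in this post use `Encoding.GetEncoding("gb2312")`, which on .NET Core and later is only available after registering the code pages encoding provider.

```csharp
using System.IO;
using System.Text;

public static class TextDump
{
    // Hypothetical helper: writes already-extracted text to a .txt file.
    public static void Save(string text, string txtPath, Encoding encoding)
    {
        // false = overwrite any existing file instead of appending
        using (StreamWriter writer = new StreamWriter(txtPath, false, encoding))
        {
            writer.Write(text);
        }
    }

    // Reads the file back; the encoding should match the one used by Save.
    public static string Load(string txtPath, Encoding encoding)
    {
        using (StreamReader reader = new StreamReader(txtPath, encoding))
        {
            return reader.ReadToEnd();
        }
    }
}
```

A call such as `TextDump.Save(text, txtfile.FullName, Encoding.UTF8)` would then replace the explicit `StreamWriter` handling, assuming UTF-8 output is acceptable.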

 

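Adding the Office references

When programming against Word and PowerPoint from .NET, make sure the Word and PowerPoint programmability components were installed along with Office (they are visible in a custom installation), or install the "Microsoft Office 2003 Primary Interop Assemblies".

After installation, add the references in your project:

Add Reference > COM > Microsoft PowerPoint 11.0 Object Library / Microsoft Word 11.0 Object Library;

You also need to add the Office component (the code below uses Microsoft.Office.Core.MsoTriState).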

using System.IO;
using System.Text;
using Microsoft.Office.Interop.Word;
using Microsoft.Office.Interop.PowerPoint;
using org.pdfbox.pdmodel;
using org.pdfbox.util;

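    public void pdf2txt(FileInfo file, FileInfo txtfile)
    {
        // Load the PDF and extract its text with PDFBox.
        PDDocument doc = PDDocument.load(file.FullName);
        string text;
        try
        {
            PDFTextStripper pdfStripper = new PDFTextStripper();
            text = pdfStripper.getText(doc);
        }
        finally
        {
            doc.close(); // release the PDF file
        }
        // Write the extracted text as GB2312; false = overwrite.
        StreamWriter swPdfChange = new StreamWriter(txtfile.FullName, false, Encoding.GetEncoding("gb2312"));
        swPdfChange.Write(text);
        swPdfChange.Close();
    }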

For tables in a doc file, the extracted text has the grid lines removed and the cell contents are read row by row.

    public void word2text(FileInfo file, FileInfo txtfile)
    {
        object readOnly = true;
        object missing = System.Reflection.Missing.Value;
        object fileName = file.FullName;
        // Start Word through COM interop and open the document read-only.
        Microsoft.Office.Interop.Word.ApplicationClass wordapp = new Microsoft.Office.Interop.Word.ApplicationClass();
        Document doc = wordapp.Documents.Open(ref fileName,
            ref missing, ref readOnly, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing, ref missing, ref missing);
        string text = doc.Content.Text;
        // Close the document and quit Word before writing the output.
        doc.Close(ref missing, ref missing, ref missing);
        wordapp.Quit(ref missing, ref missing, ref missing);
        // Write the extracted text as GB2312; false = overwrite.
        StreamWriter swWordChange = new StreamWriter(txtfile.FullName, false, Encoding.GetEncoding("gb2312"));
        swWordChange.Write(text);
        swWordChange.Close();
    }

    public void ppt2txt(FileInfo file, FileInfo txtfile)
    {
        Microsoft.Office.Interop.PowerPoint.Application pa = new Microsoft.Office.Interop.PowerPoint.ApplicationClass();
        // Arguments after the file name: ReadOnly, Untitled, WithWindow.
        Microsoft.Office.Interop.PowerPoint.Presentation pp = pa.Presentations.Open(file.FullName,
            Microsoft.Office.Core.MsoTriState.msoTrue,
            Microsoft.Office.Core.MsoTriState.msoFalse,
            Microsoft.Office.Core.MsoTriState.msoFalse);
        string pps = "";
        foreach (Microsoft.Office.Interop.PowerPoint.Slide slide in pp.Slides)
        {
            foreach (Microsoft.Office.Interop.PowerPoint.Shape shape in slide.Shapes)
            {
                // Skip shapes that have no text frame (pictures, for example).
                if (shape.HasTextFrame == Microsoft.Office.Core.MsoTriState.msoTrue)
                    pps += shape.TextFrame.TextRange.Text;
            }
        }
        // Close the presentation and quit PowerPoint before writing the output.
        pp.Close();
        pa.Quit();
        // Write the collected text as GB2312; false = overwrite.
        StreamWriter swPPtChange = new StreamWriter(txtfile.FullName, false, Encoding.GetEncoding("gb2312"));
        swPPtChange.Write(pps);
        swPPtChange.Close();
    }

Reading the different file types

    public StreamReader text2reader(FileInfo file)
    {
        // Stays null when the extension is not one of the four handled below.
        StreamReader st = null;
        switch (file.Extension.ToLower())
        {
            case ".txt":
                st = new StreamReader(file.FullName, Encoding.GetEncoding("gb2312"));
                break;
            case ".doc":
                // Relative paths do not work here; this needs improving.
                FileInfo wordfile = new FileInfo(@"E:/my programs/200807program/FileSearch/App_Data/word2txt.txt");
                word2text(file, wordfile);
                st = new StreamReader(wordfile.FullName, Encoding.GetEncoding("gb2312"));
                break;
            case ".pdf":
                FileInfo pdffile = new FileInfo(@"E:/my programs/200807program/FileSearch/App_Data/pdf2txt.txt");
                pdf2txt(file, pdffile);
                st = new StreamReader(pdffile.FullName, Encoding.GetEncoding("gb2312"));
                break;
            case ".ppt":
                FileInfo pptfile = new FileInfo(@"E:/my programs/200807program/FileSearch/App_Data/ppt2txt.txt");
                ppt2txt(file, pptfile);
                st = new StreamReader(pptfile.FullName, Encoding.GetEncoding("gb2312"));
                break;
        }
        return st;
    }
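The intermediate txt files above are tied to one absolute path on drive E:. One possible way around that, sketched here under assumptions, is to build the path from the application's base directory at run time. The class and method names (`TempPaths`, `ForIntermediateFile`) are hypothetical, and the sketch assumes the intermediate files should live in an `App_Data` folder directly under the directory the application runs from.

```csharp
using System;
using System.IO;

public static class TempPaths
{
    // Hypothetical helper: returns <application base directory>/App_Data/<fileName>
    // instead of a path hard-coded to a specific drive.
    public static FileInfo ForIntermediateFile(string fileName)
    {
        // BaseDirectory is the directory the application was loaded from.
        string dir = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "App_Data");
        Directory.CreateDirectory(dir); // does nothing if the directory already exists
        return new FileInfo(Path.Combine(dir, fileName));
    }
}
```

With such a helper, the `.doc` branch could read `FileInfo wordfile = TempPaths.ForIntermediateFile("word2txt.txt");`, and likewise for the pdf and ppt branches.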

 

 

 
