Reading Word, PDF, and PPT Files in ASP.NET

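Converting doc, pdf, and ppt files to txt:

These components read a file's contents out as text; they do not simply change the file name extension, so the extracted text has to be written to a txt file.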

 

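Adding the Office references

To program against Word and PowerPoint from .NET, make sure the .NET programmability support for Word and PowerPoint was installed along with Office (it is listed in a custom install), or install the "Microsoft Office 2003 Primary Interop Assemblies".

After installation, add the references in your project:

Add Reference > COM > Microsoft PowerPoint 11.0 Object Library / Microsoft Word 11.0 Object Library

You also need to import the Office namespaces: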

using System.IO;
using System.Text;
 
using Microsoft.Office.Interop.Word;
using Microsoft.Office.Interop.PowerPoint;
 
using org.pdfbox.pdmodel;
using org.pdfbox.util;

    public void pdf2txt(FileInfo file, FileInfo txtfile)
    {
        // Load the PDF and extract its text with PDFBox.
        PDDocument doc = PDDocument.load(file.FullName);
        try
        {
            PDFTextStripper pdfStripper = new PDFTextStripper();
            string text = pdfStripper.getText(doc);
            // Write the text as GB2312; 'using' closes the writer even on error.
            using (StreamWriter swPdfChange = new StreamWriter(txtfile.FullName, false, Encoding.GetEncoding("gb2312")))
            {
                swPdfChange.Write(text);
            }
        }
        finally
        {
            doc.close();
        }
    }


 

For tables in a doc file, the extracted text drops the grid lines, and the cell contents are read out row by row.
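The text that Word returns for a table usually still carries its cell markers. The following is a minimal sketch, assuming Word ends each cell in `Content.Text` with a carriage return followed by a BEL character (char 7); `WordTableText` and `StripCellMarkers` are hypothetical names that do not appear in the original code.

```csharp
using System;

static class WordTableText
{
    // Hypothetical helper, not part of the original article.
    // Assumption: each table cell in Content.Text ends with "\r\a".
    public static string StripCellMarkers(string contentText)
    {
        // Drop the BEL markers, then turn Word's bare carriage returns
        // into the platform's line ending so each cell sits on its own line.
        return contentText
            .Replace("\a", string.Empty)
            .Replace("\r", Environment.NewLine);
    }
}
```

It could be applied to `doc.Content.Text` in `word2text` just before the text is written to the txt file.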

 

    public void word2text(FileInfo file, FileInfo txtfile)
    {
        object readOnly = true;
        object missing = System.Reflection.Missing.Value;
        object fileName = file.FullName;
        // Start Word through COM interop and open the document read-only.
        Microsoft.Office.Interop.Word.ApplicationClass wordapp = new Microsoft.Office.Interop.Word.ApplicationClass();
        Microsoft.Office.Interop.Word.Document doc = wordapp.Documents.Open(ref fileName,
            ref missing, ref readOnly, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing, ref missing, ref missing);
        // Content.Text returns the whole document body as plain text.
        string text = doc.Content.Text;
        doc.Close(ref missing, ref missing, ref missing);
        wordapp.Quit(ref missing, ref missing, ref missing);
        // Write the text as GB2312.
        StreamWriter swWordChange = new StreamWriter(txtfile.FullName, false, Encoding.GetEncoding("gb2312"));
        swWordChange.Write(text);
        swWordChange.Close();
    }
 
    public void ppt2txt(FileInfo file, FileInfo txtfile)
    {
        Microsoft.Office.Interop.PowerPoint.Application pa = new Microsoft.Office.Interop.PowerPoint.ApplicationClass();
        // Open read-only, not untitled, and without showing a window.
        Microsoft.Office.Interop.PowerPoint.Presentation pp = pa.Presentations.Open(file.FullName,
                        Microsoft.Office.Core.MsoTriState.msoTrue,
                        Microsoft.Office.Core.MsoTriState.msoFalse,
                        Microsoft.Office.Core.MsoTriState.msoFalse);
        string pps = "";
        foreach (Microsoft.Office.Interop.PowerPoint.Slide slide in pp.Slides)
        {
            foreach (Microsoft.Office.Interop.PowerPoint.Shape shape in slide.Shapes)
            {
                // Pictures, lines, and similar shapes have no text frame; skip them.
                if (shape.HasTextFrame == Microsoft.Office.Core.MsoTriState.msoTrue)
                    pps += shape.TextFrame.TextRange.Text;
            }
        }
        // Release the presentation and PowerPoint before writing the result.
        pp.Close();
        pa.Quit();
        StreamWriter swPPtChange = new StreamWriter(txtfile.FullName, false, Encoding.GetEncoding("gb2312"));
        swPPtChange.Write(pps);
        swPPtChange.Close();
    }

 

Reading different file types

   

    public StreamReader text2reader(FileInfo file)
    {
        StreamReader st = null;
        switch (file.Extension.ToLower())
        {
            case ".txt":
                st = new StreamReader(file.FullName, Encoding.GetEncoding("gb2312"));
                break;
            case ".doc":
                // A relative path does not work here; this needs improving.
                FileInfo wordfile = new FileInfo(@"E:/my programs/200807program/FileSearch/App_Data/word2txt.txt");
                word2text(file, wordfile);
                st = new StreamReader(wordfile.FullName, Encoding.GetEncoding("gb2312"));
                break;
            case ".pdf":
                FileInfo pdffile = new FileInfo(@"E:/my programs/200807program/FileSearch/App_Data/pdf2txt.txt");
                pdf2txt(file, pdffile);
                st = new StreamReader(pdffile.FullName, Encoding.GetEncoding("gb2312"));
                break;
            case ".ppt":
                FileInfo pptfile = new FileInfo(@"E:/my programs/200807program/FileSearch/App_Data/ppt2txt.txt");
                ppt2txt(file, pptfile);
                st = new StreamReader(pptfile.FullName, Encoding.GetEncoding("gb2312"));
                break;
        }
        // st stays null for any extension not handled above.
        return st;
    }
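The hard-coded absolute paths above are flagged in the code as something to improve. One possible approach, shown here only as a sketch, is to build the path from the application's base directory at run time. `TempTextPaths` and `InAppData` are hypothetical names, and the sketch assumes that `AppDomain.CurrentDomain.BaseDirectory` points at the web application's root folder and that an `App_Data` folder exists there.

```csharp
using System;
using System.IO;

static class TempTextPaths
{
    // Hypothetical helper, not part of the original article.
    // Returns <application root>/App_Data/<fileName>.
    public static string InAppData(string fileName)
    {
        // Assumption: in a web application this is the site's physical root.
        string root = AppDomain.CurrentDomain.BaseDirectory;
        // Nested two-argument Combine calls also work on older .NET versions.
        return Path.Combine(Path.Combine(root, "App_Data"), fileName);
    }
}
```

With this in place, the `.doc` branch could use `new FileInfo(TempTextPaths.InAppData("word2txt.txt"))`. Inside ASP.NET, `System.Web.Hosting.HostingEnvironment.MapPath("~/App_Data/word2txt.txt")` is another option. Note that a single shared file name is still not safe when several requests convert files at the same time.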
