Converting RTF to Plain Text in C#

How to: Convert RTF to Plain Text (C# Programming Guide)

Rich Text Format (RTF) is a document format developed by Microsoft in the late 1980s to enable the exchange of documents across operating systems. Both Microsoft Word and WordPad can read and write RTF documents. In the .NET Framework, you can use the RichTextBox control to create a word processor that supports RTF and lets a user apply formatting to text in a WYSIWYG manner.

You can also use the RichTextBox control to programmatically remove the RTF formatting codes from a document and convert it to plain text. You do not need to embed the control in a Windows Form to perform this kind of operation.

To use the RichTextBox control in a project

  1. Add a reference to System.Windows.Forms.dll.

  2. Add a using directive for the System.Windows.Forms namespace (optional).
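With the reference and using directive in place, the conversion can be wrapped in a small reusable helper. This is a minimal sketch, assuming a Windows project that references System.Windows.Forms; the class name RtfConverter and the method name ToPlainText are hypothetical, not part of the article or of the framework.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical helper class (RtfConverter/ToPlainText are illustrative names).
// Assumes a reference to System.Windows.Forms.dll.
public static class RtfConverter
{
    // Returns the plain text of an RTF document. The RichTextBox is never
    // placed on a form; it is used only to parse the RTF.
    public static string ToPlainText(string rtf)
    {
        if (rtf == null)
        {
            throw new ArgumentNullException(nameof(rtf));
        }

        // RichTextBox wraps a native control, so dispose it when done.
        using (var box = new RichTextBox())
        {
            box.Rtf = rtf;    // parse the RTF markup
            return box.Text;  // text with the formatting codes removed
        }
    }
}
```

A caller would typically pass in the contents of a file, for example string text = RtfConverter.ToPlainText(System.IO.File.ReadAllText("test.rtf"));. Disposing the control is worth doing when many documents are converted in a loop.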

Example


The following example provides a sample RTF file to be converted. The file contains RTF formatting, such as font information, and it also contains four Unicode characters and four extended ASCII characters. The file is opened, passed to the RichTextBox as RTF, retrieved as text, displayed in a MessageBox, and written to a file in UTF-8 format.

C#
    // Save the following RTF file to the same folder as your .exe file, and call it "test.rtf".
    /*
    {\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fswiss\fcharset0 Arial;}{\f1\fnil\fprq1\fcharset0 Courier New;}{\f2\fswiss\fprq2\fcharset0 Arial;}}
{\colortbl ;\red0\green128\blue0;\red0\green0\blue0;}
{\*\generator Msftedit 5.41.21.2508;}\viewkind4\uc1\pard\f0\fs20 This is the \i Greek \i0 word "psyche": \cf1\f1\u968?\u965?\u967?\u942?\cf2\f2 . It is encoded in Unicode.\par
Here are four extended \b ASCII \b0 characters (Windows code page 1252):  \'e2\'e4\u1233?\'e5\cf0\par
}
     */
    class ConvertFromRTF
    {
        [System.STAThread]
        static void Main()
        {
            string path = @"test.rtf";

            // Get the contents of the RTF file. Once it is stored
            // in a string, it is encoded as UTF-16.
            string s = System.IO.File.ReadAllText(path);

            // Display the raw RTF text.
            System.Windows.Forms.MessageBox.Show(s);

            // Create the RichTextBox. (Requires a reference to System.Windows.Forms.dll.)
            // It is never shown, and the using block disposes it afterwards.
            using (var rtBox = new System.Windows.Forms.RichTextBox())
            {
                // Convert the RTF to plain text.
                rtBox.Rtf = s;
                string plainText = rtBox.Text;

                // Display the plain text in a MessageBox, because the console
                // may not be able to display the Greek letters.
                System.Windows.Forms.MessageBox.Show(plainText);

                // Write the plain text to a file, encoded as UTF-8 without a BOM.
                System.IO.File.WriteAllText(@"output.txt", plainText);
            }
        }
    }


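RTF is a byte-oriented format: a standard RTF file can consist of only 7-bit ASCII characters, although readers should also accept 8-bit characters. Characters outside that range are written as escapes: \'hh for a byte in the document's ANSI code page (\ansicpg1252 in the sample) and \uN for a Unicode character, where N is a signed 16-bit decimal value followed by a fallback character for readers that do not understand \uN (the ? in the sample, because \uc1 declares one fallback character). Because the RichTextBox.Text property is of type string, the characters are encoded as Unicode UTF-16. Any extended ASCII characters and Unicode characters from the source RTF document are correctly encoded in the text output.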
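To see how the \uN escapes in the sample map to UTF-16 characters, here is a small decoder. It is a sketch under stated assumptions: the class name RtfUnicodeEscapes is hypothetical; it assumes \uc1 (exactly one fallback character after each \uN, as in the sample) and that the fallback is a single literal character such as ?; and it leaves \'hh escapes alone. Those must be decoded with the document's code page, for example Encoding.GetEncoding(1252), which on .NET Core and .NET 5+ only works after registering CodePagesEncodingProvider.Instance.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper (RtfUnicodeEscapes is an illustrative name).
// Assumes \uc1: each \uN escape is followed by exactly one fallback
// character, and that fallback is a single literal character such as '?'.
// \'hh escapes are left untouched.
public static class RtfUnicodeEscapes
{
    // \u, a signed decimal number, then the one fallback character to skip.
    private static readonly Regex UnicodeEscape = new Regex(@"\\u(-?[0-9]+)(.)");

    public static string Decode(string rtfFragment)
    {
        return UnicodeEscape.Replace(rtfFragment, match =>
        {
            int n = int.Parse(match.Groups[1].Value, CultureInfo.InvariantCulture);

            // RTF writes N as a signed 16-bit value, so UTF-16 code units
            // above 32767 appear as negative numbers.
            if (n < 0)
            {
                n += 65536;
            }

            // Replace the escape and its fallback with the UTF-16 code unit.
            return ((char)n).ToString();
        });
    }
}
```

For example, RtfUnicodeEscapes.Decode(@"\u968?\u965?\u967?\u942?") returns ψυχή, the Greek word in the sample. Control words such as \uc1 are not matched, because the pattern requires a digit or minus sign right after \u.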

If you use the File.WriteAllText method to write the text to disk, the text will be encoded as UTF-8 (without a Byte Order Mark).
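If whatever reads output.txt expects a byte order mark, the encoding can be passed explicitly through the File.WriteAllText(string, string, Encoding) overload. A minimal sketch; the class name PlainTextWriter and method name WriteUtf8 are hypothetical.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper (PlainTextWriter/WriteUtf8 are illustrative names).
public static class PlainTextWriter
{
    // Writes text as UTF-8, with or without the EF BB BF byte order mark.
    // File.WriteAllText(path, text) alone never writes a BOM.
    public static void WriteUtf8(string path, string text, bool withBom)
    {
        // The UTF8Encoding constructor argument controls whether the BOM
        // (the encoding's preamble) is emitted.
        File.WriteAllText(path, text, new UTF8Encoding(withBom));
    }
}
```

Calling PlainTextWriter.WriteUtf8("output.txt", plainText, withBom: true) produces a file that starts with the bytes EF BB BF; passing false gives the same bytes as the two-argument File.WriteAllText.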

 

http://msdn.microsoft.com/en-us/library/cc488002.aspx

Rich Text Format (RTF) Version 1.5 Specification

http://www.biblioscape.com/rtf15_spec.htm

Word 2007: Rich Text Format (RTF) Specification, version 1.9.1

http://www.microsoft.com/en-us/download/details.aspx?id=10725
