Word to HTML in UTF-8, C# - Save Word to UTF-8 Encoded HTML - Stack Overflow

I am writing some C# VSTO code that reads a Microsoft Word document and saves it as Filtered HTML. When I do this with a generic Word document, the resulting HTML file declares a Windows charset, as seen here:

If I open a document, go to File -> Options -> Advanced -> Web Options, and choose UTF-8, the resulting filtered HTML output looks like this:

I want to write C# code that saves any Word document to filtered HTML encoded as UTF-8. After some research, I found people saying that the Encoding argument of the "SaveAs2" function does not work (even though Microsoft documents it as a feature). That matches what I see, because this code does not work for me:

doc.SaveAs2("C:\\Temp\\Test.htm", MsWord.WdSaveFormat.wdFormatFilteredHTML, Encoding: "65001");

(Note: I tried passing 65001 both with and without quotes. Neither throws an error, but neither works.)

Next, I tried setting the web options on the document itself, like this:

doc = app.Documents.Open("C:\\Temp\\Test.docx");

doc.WebOptions.Encoding = Microsoft.Office.Core.MsoEncoding.msoEncodingUTF8;

doc.SaveAs2(destFile, MsWord.WdSaveFormat.wdFormatFilteredHTML);

To the best of my knowledge, the code above does exactly what I do by hand when I open a file, go to File -> Options -> Advanced -> Web Options, set the encoding to UTF-8, and save the file as filtered HTML. Yet the output still looks like this:

Is there a way to force Microsoft Word to save a file as UTF-8 without having to configure the document manually first?
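To compare the results of each attempt without opening the output by eye, the charset that a saved file declares can be read programmatically. The following is only a minimal sketch, assuming the filtered HTML declares its encoding with a `charset=` attribute in a meta tag. `CharsetCheck` and `DeclaredCharset` are hypothetical names, not part of the Word object model, and the default path is the one used in the code above.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public static class CharsetCheck
{
    // Hypothetical helper: returns the first charset named in the HTML text,
    // or null when no charset declaration is found.
    public static string DeclaredCharset(string html)
    {
        Match m = Regex.Match(
            html,
            @"charset\s*=\s*[""']?([A-Za-z0-9_\-]+)",
            RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main(string[] args)
    {
        // Assumption: the file was written by the SaveAs2 call shown earlier.
        string path = args.Length > 0 ? args[0] : @"C:\Temp\Test.htm";

        // The charset declaration itself is plain ASCII, so an ASCII read
        // is enough to locate it regardless of the file's real encoding.
        string html = File.ReadAllText(path, Encoding.ASCII);

        Console.WriteLine(DeclaredCharset(html) ?? "(no charset declared)");
    }
}
```

This only reports what the file claims about itself. It does not prove that the bytes on disk are actually UTF-8, so it is a quick comparison aid rather than a full validation.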
