Delphi: HTML source code from TWebBrowser, how to detect the stream encoding?

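If I run this code against an HTML page that uses a Unicode code page, the result is gibberish, because TStringStream is not Unicode in D7. The page might be UTF-8 encoded or use some other (ANSI) code page.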

How can I detect whether a TStream / IPersistStreamInit is Unicode / UTF-8 / ANSI?

How do I always return the correct WideString result from this function?

function GetWebBrowserHTML(const WebBrowser: TWebBrowser): WideString;

If I replace TStringStream with TMemoryStream and save the TMemoryStream to a file, everything is fine. The file can be Unicode / UTF-8 / ANSI. But I always want to return the stream as a WideString:

function GetWebBrowserHTML(const WebBrowser: TWebBrowser): WideString;
var
  // LStream: TStringStream;
  LStream: TMemoryStream;
  Stream: IStream;
  LPersistStreamInit: IPersistStreamInit;
begin
  if not Assigned(WebBrowser.Document) then Exit;
  // LStream := TStringStream.Create('');
  LStream := TMemoryStream.Create;
  try
    LPersistStreamInit := WebBrowser.Document as IPersistStreamInit;
    Stream := TStreamAdapter.Create(LStream, soReference);
    LPersistStreamInit.Save(Stream, True);
    // Result := LStream.DataString;
    LStream.SaveToFile('c:\test\test.txt'); // test only - file is ok
    Result := ??? // WideString
  finally
    LStream.Free();
  end;
end;

This does exactly what I need, but it only works with the Delphi Unicode compilers (D2009). Read the Conclusion section:

"There is obviously a lot more we could do. A couple of things immediately spring to mind. We retro-fit some of the Unicode functionality and support for non-ANSI encodings to the pre-Unicode compiler code. The present code when compiled with anything earlier than Delphi 2009 will not save document content to strings correctly if the document character set is not ANSI."

The magic is apparently in the TEncoding class (TEncoding.GetBufferEncoding). But D7 does not have TEncoding. Any ideas?

Best answer:

I used GpTextStream to handle the conversion (it should work with all Delphi versions):

// Maps the charset name reported by the HTML document to a Windows code page.
// Returns 0 when the charset is not recognized.
function GetCodePageFromHTMLCharSet(Charset: WideString): Word;
const
  WIN_CHARSET = 'windows-';
  ISO_CHARSET = 'iso-';
var
  S: string;
begin
  Result := 0;
  if Charset = 'unicode' then
    Result := CP_UNICODE
  else if Charset = 'utf-8' then
    Result := CP_UTF8
  else if Pos(WIN_CHARSET, Charset) <> 0 then
  begin
    // windows-1252 => 1252
    S := Copy(Charset, Length(WIN_CHARSET) + 1, Maxint);
    Result := StrToIntDef(S, 0);
  end
  else if Pos(ISO_CHARSET, Charset) <> 0 then // ISO-8859 (e.g. iso-8859-1: => 28591)
  begin
    S := Copy(Charset, Length(ISO_CHARSET) + 1, Maxint);
    S := Copy(S, Pos('-', S) + 1, 2);
    if S = '15' then // ISO-8859-15 (Latin 9)
      Result := 28605
    else
      Result := StrToIntDef('2859' + S, 0);
  end;
end;

function GetWebBrowserHTML(WebBrowser: TWebBrowser): WideString;
var
  LStream: TMemoryStream;
  Stream: IStream;
  LPersistStreamInit: IPersistStreamInit;
  TextStream: TGpTextStream;
  Charset: WideString;
  Buf: WideString;
  CodePage: Word;
  N: Integer;
begin
  Result := '';
  if not Assigned(WebBrowser.Document) then Exit;
  LStream := TMemoryStream.Create;
  try
    // Save the raw document bytes into the memory stream.
    LPersistStreamInit := WebBrowser.Document as IPersistStreamInit;
    Stream := TStreamAdapter.Create(LStream, soReference);
    if Failed(LPersistStreamInit.Save(Stream, True)) then Exit;
    // Ask the document for its charset and map it to a code page.
    Charset := (WebBrowser.Document as IHTMLDocument2).charset;
    CodePage := GetCodePageFromHTMLCharSet(Charset);
    // Reserve one WideChar per byte; the buffer is trimmed after reading.
    N := LStream.Size;
    SetLength(Buf, N);
    TextStream := TGpTextStream.Create(LStream, tsaccRead, [], CodePage);
    try
      // Read returns a byte count; convert it to a character count.
      N := TextStream.Read(Buf[1], N * SizeOf(WideChar)) div SizeOf(WideChar);
      SetLength(Buf, N);
      Result := Buf;
    finally
      TextStream.Free;
    end;
  finally
    LStream.Free();
  end;
end;
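Where GpTextStream is not an option, the same conversion can be sketched with nothing but the Win32 API. This is only an illustration, not part of the answer above: the unit name HtmlBytes, the function BytesToWideString and the constant CP_UTF16LE are hypothetical. It assumes the code page has already been determined (for example with GetCodePageFromHTMLCharSet), treats code page 1200 as UTF-16 LE, and lets a byte order mark, if one is present, override the supplied code page. A code page of 0 falls back to the system ANSI code page.

```delphi
unit HtmlBytes; // hypothetical unit name, for illustration only

interface

uses
  Windows;

// Hypothetical helper: decodes Size bytes at Data into a WideString.
// CodePage is the caller's best guess (e.g. derived from the document
// charset); a UTF-16 LE or UTF-8 byte order mark takes precedence.
function BytesToWideString(Data: PAnsiChar; Size: Integer;
  CodePage: UINT): WideString;

implementation

const
  // Assumed value for the "unicode" charset. MultiByteToWideChar does not
  // accept it, so UTF-16 data is copied directly instead.
  CP_UTF16LE = 1200;

function BytesToWideString(Data: PAnsiChar; Size: Integer;
  CodePage: UINT): WideString;
var
  Len: Integer;
begin
  Result := '';
  if (Data = nil) or (Size <= 0) then Exit;

  if (Size >= 2) and (Data[0] = #$FF) and (Data[1] = #$FE) then
  begin
    // UTF-16 LE BOM (FF FE): skip it and treat the rest as UTF-16.
    Inc(Data, 2);
    Dec(Size, 2);
    CodePage := CP_UTF16LE;
  end
  else if (Size >= 3) and (Data[0] = #$EF) and (Data[1] = #$BB) and
    (Data[2] = #$BF) then
  begin
    // UTF-8 BOM (EF BB BF): skip it and force UTF-8.
    Inc(Data, 3);
    Dec(Size, 3);
    CodePage := CP_UTF8;
  end;

  if CodePage = CP_UTF16LE then
  begin
    // Already UTF-16: copy the bytes straight into the WideString.
    Len := Size div SizeOf(WideChar);
    SetLength(Result, Len);
    if Len > 0 then
      Move(Data^, Result[1], Len * SizeOf(WideChar));
    Exit;
  end;

  if Size = 0 then Exit;

  // First call asks for the required length, second call converts.
  Len := MultiByteToWideChar(CodePage, 0, Data, Size, nil, 0);
  if Len <= 0 then Exit;
  SetLength(Result, Len);
  MultiByteToWideChar(CodePage, 0, Data, Size, PWideChar(Result), Len);
end;

end.
```

Inside GetWebBrowserHTML, the TGpTextStream part could then be replaced by something like Result := BytesToWideString(LStream.Memory, LStream.Size, CodePage); after the call to Save succeeds.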
