MFC WebBrowser: Getting a Web Page's Complete Source (Full HTML)

kim-2006

Category: VC/MFC/C/C++

Article link: https://blog.csdn.net/k83133058/article/details/113244422

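For sites with aggressive anti-scraping measures, you can load the page in the WebBrowser control and read the source from the rendered document. An open-source browser engine can be used the same way.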

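#include <atlbase.h>	// CComPtr, CComQIPtr, CComBSTR
#include <atlconv.h>
#include <MsHTML.h>

void CMFCApplication1Dlg::DocumentCompleteExplorer1(LPDISPATCH pDisp, VARIANT* URL)
{
	// DocumentComplete fires once per frame; only react to the top-level document.
	// The generated wrapper hands back an AddRef'd pointer, so attach to it
	// instead of copying it (copying would leak one reference).
	CComDispatchDriver lpDisp;
	lpDisp.Attach(m_web.get_Application());
	if (lpDisp == pDisp)
	{
		// Show the final URL in the address bar
		if (URL && URL->vt == VT_BSTR)
			m_szUrl = URL->bstrVal;
		UpdateData(FALSE);

		// Get the page source.
		// get_Document() also returns an owned reference, so hold it before querying.
		CComPtr<IDispatch> spDocDisp;
		spDocDisp.Attach(m_web.get_Document());
		//IHTMLDocument get_Script();
		//CComQIPtr <IHTMLDocument> pDocument1(spDocDisp);
		CComQIPtr <IHTMLDocument2> pDocument2(spDocDisp);	//get_frames()
		//IHTMLDocument3 get_parentDocument() getElementsByTagName() getElementsByName() getElementById()
		//CComQIPtr <IHTMLDocument3> pDocument3(spDocDisp);
		//CComQIPtr <IHTMLDocument6> pDocument6(spDocDisp);
		//IHTMLDocument7 get_head() getElementsByTagNameNS() getElementsByClassName()
		//CComQIPtr <IHTMLDocument7> pDocument7(spDocDisp);
		if (!pDocument2) return;	// the control is not hosting an HTML document

		CComPtr <IHTMLElement> pBody;
		HRESULT hRet = pDocument2->get_body(&pBody);	// the <body> element
		if (FAILED(hRet) || !pBody) return;

		CComBSTR bszHtml;	// CComBSTR frees the string automatically; a raw BSTR would leak
		//pBody->get_outerHTML(&bszHtml);		// HTML of <body> only

		CComPtr <IHTMLElement> pEle;
		hRet = pBody->get_parentElement(&pEle);	// parent of <body>, i.e. the <html> element
		if (FAILED(hRet) || !pEle) return;
		// outerHTML of <html> is the whole page as serialized from the current DOM.
		// It does not include the DOCTYPE and may differ from the bytes the server sent.
		hRet = pEle->get_outerHTML(&bszHtml);
		if (FAILED(hRet) || !bszHtml) return;
		//pEle->get_innerHTML(&bszHtml);	// partial HTML: <head> + <body>, without the <html> tag
		//pEle->get_innerText(&bszHtml);	// all visible text on the page
		//TRACE(_T("%ls"), (BSTR)bszHtml);

		CString szTemp((LPCWSTR)bszHtml);	// converts from UTF-16 when TCHAR is char (MBCS build)

		CString filename = _T("z:\\test.html");
		CFile oFile;
		if (oFile.Open(filename, CFile::modeCreate | CFile::modeWrite))
		{
#ifdef _UNICODE
			const WCHAR bom = 0xFEFF;	// UTF-16 LE byte-order mark so the encoding can be detected
			oFile.Write(&bom, sizeof(bom));
#endif
			oFile.Write((LPCTSTR)szTemp, (UINT)(szTemp.GetLength() * sizeof(TCHAR)));
			//oFile.Flush();
			oFile.Close();	// close only a file that was actually opened
		}

		// Other entry points, not used here:
		// pDocument2->get_all() returns an IHTMLElementCollection, which tags() can filter;
		// pDocument7->get_head() returns the <head> element, whose get_innerHTML() gives its markup.
	}
}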
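
The handler above writes the string in whatever encoding TCHAR has in the build, which is UTF-16 in a Unicode build. If the saved file should be UTF-8 instead, the text has to be converted before writing. On Windows that is normally done with WideCharToMultiByte and CP_UTF8; the sketch below is a portable stand-in, so the conversion can be read and checked outside an MFC project. The helper name Utf16ToUtf8 is hypothetical, and replacing unpaired surrogates with U+FFFD is an assumption about the desired behavior, not something the original post specifies.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: converts UTF-16 text to UTF-8.
// Unpaired surrogates become U+FFFD (an assumption made for this sketch).
std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF)              // high surrogate
        {
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF)
            {
                // Combine the pair into one code point above U+FFFF.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;                                   // consume the low surrogate
            }
            else
            {
                cp = 0xFFFD;                           // unpaired high surrogate
            }
        }
        else if (cp >= 0xDC00 && cp <= 0xDFFF)
        {
            cp = 0xFFFD;                               // stray low surrogate
        }

        if (cp < 0x80)                                 // 1 byte (ASCII)
        {
            out += static_cast<char>(cp);
        }
        else if (cp < 0x800)                           // 2 bytes
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)                         // 3 bytes
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else                                           // 4 bytes
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

In the handler, the bytes returned by such a helper would be what gets passed to CFile::Write, with the byte count taken from the std::string rather than from the character count. A BSTR is UTF-16 on Windows, so its buffer can serve as the input once copied into a std::u16string.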
