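While building a small web-scraping tool on top of the WebBrowser control, I ran into several problems that took a long time to solve. To save others the same detours, I extracted a sample from the product and am sharing it here for reference.
Source code: https://github.com/CupNoCake/SampleBrowser.git
Features:
1. Configurable IE version. The sample uses the highest IE version installed on the system. You can set a different version if you need to, but it cannot exceed the installed one.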
static class Program
{
#region Browser settings
/// <summary>
/// Updates the registry so that the embedded browser runs in the requested IE mode for this program
/// </summary>
private static void SetWebBrowserFeatures(int ieVersion)
{
if (LicenseManager.UsageMode != LicenseUsageMode.Runtime)
return;
// Get the executable file name of the current process
var appName = System.IO.Path.GetFileName(System.Diagnostics.Process.GetCurrentProcess().MainModule.FileName);
// Get the emulation-mode value for the requested IE version
uint ieMode = GetEmulationMode(ieVersion);
var featureControlRegKey = @"HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\Main\FeatureControl\";
// Set the mode (ieMode) in which the browser runs for this application (appName)
Registry.SetValue(featureControlRegKey + "FEATURE_BROWSER_EMULATION",
appName, ieMode, RegistryValueKind.DWord);
// Not sure what this setting does
Registry.SetValue(featureControlRegKey + "FEATURE_ENABLE_CLIPCHILDREN_OPTIMIZATION",
appName, 1, RegistryValueKind.DWord);
}
/// <summary>
/// Gets the major version of the installed browser (IE)
/// </summary>
/// <returns></returns>
private static int GetBrowserVersion()
{
int browserVersion = 0;
using (var ieKey = Registry.LocalMachine.OpenSubKey(@"SOFTWARE\Microsoft\Internet Explorer",
RegistryKeyPermissionCheck.ReadSubTree,
System.Security.AccessControl.RegistryRights.QueryValues))
{
var version = ieKey.GetValue("svcVersion");
if (null == version)
{
version = ieKey.GetValue("Version");
if (null == version)
throw new ApplicationException("Microsoft Internet Explorer is required!");
}
int.TryParse(version.ToString().Split('.')[0], out browserVersion);
}
// Versions below 7 are not supported
if (browserVersion < 7)
{
throw new ApplicationException("Unsupported browser version!");
}
return browserVersion;
}
/// <summary>
/// Maps a browser version to its emulation-mode value
/// </summary>
/// <param name="browserVersion"></param>
/// <returns></returns>
private static uint GetEmulationMode(int browserVersion)
{
UInt32 mode = 11000; // Internet Explorer 11
switch (browserVersion)
{
case 7:
mode = 7000; // Internet Explorer 7
break;
case 8:
mode = 8000; // Internet Explorer 8
break;
case 9:
mode = 9000; // Internet Explorer 9
break;
case 10:
mode = 10000; // Internet Explorer 10.
break;
case 11:
mode = 11000; // Internet Explorer 11
break;
}
return mode;
}
/// <summary>
/// Checks whether the operating system supports IE versions later than IE8
/// </summary>
private static bool IfWindowsSupport()
{
// Major version above 6 (for example, Windows 10)
bool isNewerThanNt6 = Environment.OSVersion.Version.Major > 6;
// Version 6.1 or later: Windows 7 / Server 2008 R2 and up
bool isWin7OrLater = Environment.OSVersion.Version.Major == 6
&& Environment.OSVersion.Version.Minor >= 1;
return isNewerThanNt6 || isWin7OrLater;
}
private static void SetIEVersion()
{
int ieVersion = GetBrowserVersion();
if (IfWindowsSupport())
{
SetWebBrowserFeatures(ieVersion < 11 ? ieVersion : 11);
}
else
{
// Versions later than IE8 are not supported here, so use the IE version installed on the system
SetWebBrowserFeatures(ieVersion < 7 ? 7 : ieVersion);
}
}
#endregion
/// <summary>
/// The main entry point for the application.
/// </summary>
[STAThread]
static void Main()
{
Application.EnableVisualStyles();
Application.SetCompatibleTextRenderingDefault(false);
SetIEVersion();
Application.Run(new Form1());
}
}
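The rule that a custom version must not exceed the installed one can be sketched as a small pure helper. This is a minimal illustration and not code from the repository: the class `IeEmulation` and its method names are hypothetical, and it only covers the plain document-mode values used above (7000 to 11000).

```csharp
using System;

// Hypothetical helper, not part of the sample repository.
static class IeEmulation
{
    // Extracts the major version from a registry string such as "11.0.9600.17840".
    // Returns 0 when the string is null, empty, or does not start with a number.
    public static int ParseMajorVersion(string versionString)
    {
        if (string.IsNullOrEmpty(versionString))
            return 0;
        int major;
        return int.TryParse(versionString.Split('.')[0], out major) ? major : 0;
    }

    // Clamps the requested version to the range [7, min(installed, 11)],
    // so a caller can never ask for more than the system provides.
    public static int ClampVersion(int requested, int installed)
    {
        int upper = Math.Min(installed, 11);
        if (upper < 7)
            throw new ArgumentOutOfRangeException("installed", "Internet Explorer 7 or later is required.");
        return Math.Max(7, Math.Min(requested, upper));
    }

    // For the values used in the sample, the FEATURE_BROWSER_EMULATION
    // value is simply the major version multiplied by 1000.
    public static uint ToEmulationMode(int version)
    {
        return (uint)(version * 1000);
    }
}
```

Under these assumptions, a caller that wants IE 11 on a machine with IE 9 installed would compute `IeEmulation.ToEmulationMode(IeEmulation.ClampVersion(11, 9))` and pass the result to `SetWebBrowserFeatures` in place of a hard-coded value.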
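2. A better navigation strategy: links that open a new window create a new tab automatically instead of launching the default browser. The sample manages tabs with a TabControl. If you are interested, you can use a better third-party UI library for tab management instead.
To control navigation, you first need to extend the WebBrowser control. The key event is NewWindow3.
Extending WebBrowser requires Interop.SHDocVw, so add a reference to the COM type library Microsoft Internet Controls.
The line axIWebBrowser2.Silent = true; suppresses JavaScript error dialogs. Comment it out if you want them shown.
When a page opens a new window, the NewWindow3 event fires. In the handler, create a new tab containing an ExWebBrowser control and assign that control's ActiveXInstance to the ppDisp argument of NewWindow3. Do not set Cancel to true, or the navigation will not happen. With this in place, the target page opens in a new tab rather than in the default browser.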
public class ExWebBrowser : WebBrowser
{
#region Private fields
SHDocVw.IWebBrowser2 axIWebBrowser2;
AxHost.ConnectionPointCookie cookie;
ExWebBrowserEvents events;
#endregion
#region Properties
/// <summary>
/// Returns the automation object for the web browser
/// </summary>
public object Application
{
get { return axIWebBrowser2.Application; }
}
#endregion
#region override method
/// <summary>
/// This method supports the .NET Framework infrastructure and is not intended to be used directly from your code.
/// Called by the control when the underlying ActiveX control is created.
/// </summary>
/// <param name="nativeActiveXObject"></param>
[PermissionSet(SecurityAction.LinkDemand, Name = "FullTrust")]
protected override void AttachInterfaces(object nativeActiveXObject)
{
axIWebBrowser2 = (SHDocVw.IWebBrowser2)nativeActiveXObject;
axIWebBrowser2.Silent = true;
axIWebBrowser2.RegisterAsBrowser = true;
base.AttachInterfaces(nativeActiveXObject);
}
/// <summary>
/// This method supports the .NET Framework infrastructure and is not intended to be used directly from your code.
/// Called by the control when the underlying ActiveX control is discarded.
/// </summary>
[PermissionSet(SecurityAction.LinkDemand, Name = "FullTrust")]
protected override void DetachInterfaces()
{
axIWebBrowser2 = null;
base.DetachInterfaces();
}
/// <summary>
/// This method will be called to give you a chance to create your own event sink
/// </summary>
[PermissionSet(SecurityAction.LinkDemand, Name = "FullTrust")]
protected override void CreateSink()
{
// Make sure to call the base class or the normal events won't fire
base.CreateSink();
events = new ExWebBrowserEvents(this);
cookie = new AxHost.ConnectionPointCookie(this.ActiveXInstance, events, typeof(SHDocVw.DWebBrowserEvents2));
}
/// <summary>
/// Detaches the event sink
/// </summary>
[PermissionSet(SecurityAction.LinkDemand, Name = "FullTrust")]
protected override void DetachSink()
{
if (null != cookie)
{
cookie.Disconnect();
cookie = null;
}
base.DetachSink();
}
/// <summary>
/// Overridden
/// </summary>
/// <param name="m">The <see cref="Message"/> send to this procedure</param>
[PermissionSet(SecurityAction.LinkDemand, Name = "FullTrust")]
protected override void WndProc(ref Message m)
{
switch (m.Msg)
{
case (int)WindowMessage.WM_DESTROY:
OnQuit();
break;
case (int)WindowMessage.WM_PARENTNOTIFY:
{
int X = (int)(m.WParam.ToInt64() & 0xFFFF); // low word; ToInt64 avoids an overflow on 64-bit
if (X == 0x2/*WM_DESTROY*/) // a child window was destroyed: raise the Quit event
{
OnQuit();
}
}
break;
default:
break;
}
base.WndProc(ref m);
}
#endregion
#region event definition
/// <summary>
/// Fires when downloading of a document begins
/// </summary>
public event EventHandler Downloading;
/// <summary>
/// Fires when downloading is completed
/// </summary>
/// <remarks>
/// Here you could start monitoring for script errors.
/// </remarks>
public event EventHandler DownloadComplete;
/// <summary>
/// Fires before navigation occurs in the given object (on either a window or frameset element).
/// </summary>
public event EventHandler<ExWebBrowserBeforeNavigate2EventArgs> BeforeNavigate2;
/// <summary>
/// Raised when a new window is to be created.
/// </summary>
public event EventHandler<ExWebBrowserNewWindow2EventArgs> NewWindow2;
/// <summary>
/// Raised when a new window is to be created. Extends DWebBrowserEvents2::NewWindow2 with additional information about the new window.
/// </summary>
public event EventHandler<ExWebBrowserNewWindow3EventArgs> NewWindow3;
/// <summary>
/// Raised when StatusText is to be changed.
/// </summary>
public event EventHandler<ExWebBrowserStatusTextChangeEventArgs> StatusTextChange;
/// <summary>
/// Raised when Title is to be changed.
/// </summary>
public event EventHandler<ExWebBrowserTitleChangeEventArgs> TitleChange;
/// <summary>
/// Raised when the browser application quits
/// </summary>
/// <remarks>
/// Do not confuse this with DWebBrowserEvents2.Quit... That's something else.
/// </remarks>
public event EventHandler Quit;
#endregion
#region event invoke
/// <summary>
/// Raises the <see cref="Downloading"/> event
/// </summary>
/// <param name="e">Empty <see cref="EventArgs"/></param>
/// <remarks>
/// You could start an animation or a notification that downloading is starting
/// </remarks>
protected void OnDownloading(EventArgs e)
{
if (e == null)
throw new ArgumentNullException("e");
Downloading?.Invoke(this, e);
}
/// <summary>
/// Raises the <see cref="DownloadComplete"/> event
/// </summary>
/// <param name="e">Empty <see cref="EventArgs"/></param>
protected virtual void OnDownloadComplete(EventArgs e)
{
if (e == null)
throw new ArgumentNullException("e");
DownloadComplete?.Invoke(this, e);
}
/// <summary>
/// Raises the <see cref="NewWindow2"/> event
/// </summary>
/// <exception cref="ArgumentNullException">Thrown when <paramref name="e"/> is null</exception>
protected void OnNewWindow2(ExWebBrowserNewWindow2EventArgs e)
{
if (e == null)
throw new ArgumentNullException("e");
NewWindow2?.Invoke(this, e);
}
/// <summary>
/// Raises the <see cref="NewWindow3"/> event
/// </summary>
/// <exception cref="ArgumentNullException">Thrown when <paramref name="e"/> is null</exception>
protected void OnNewWindow3(ExWebBrowserNewWindow3EventArgs e)
{
if (e == null)
throw new ArgumentNullException("e");
NewWindow3?.Invoke(this, e);
}
/// <summary>
/// Raises the <see cref="BeforeNavigate2"/> event
/// </summary>
/// <exception cref="ArgumentNullException">Thrown when <paramref name="e"/> is null</exception>
protected void OnBeforeNavigate2(ExWebBrowserBeforeNavigate2EventArgs e)
{
if (e == null)
throw new ArgumentNullException("e");
BeforeNavigate2?.Invoke(this, e);
}
/// <summary>
/// Raises the <see cref="StatusTextChange"/> event
/// </summary>
/// <param name="e">The new status text</param>
/// <exception cref="ArgumentNullException">Thrown when <paramref name="e"/> is null</exception>
protected void OnStatusTextChange(ExWebBrowserStatusTextChangeEventArgs e)
{
if (e == null)
throw new ArgumentNullException("e");
StatusTextChange?.Invoke(this, e);
}
/// <summary>
/// Raises the <see cref="TitleChange"/> event
/// </summary>
/// <param name="e">The new title</param>
/// <exception cref="ArgumentNullException">Thrown when <paramref name="e"/> is null</exception>
protected void OnTitleChange(ExWebBrowserTitleChangeEventArgs e)
{
if (e == null)
throw new ArgumentNullException("e");
TitleChange?.Invoke(this, e);
}
/// <summary>
/// Raises the <see cref="Quit"/> event
/// </summary>
protected void OnQuit()
{
Quit?.Invoke(this, EventArgs.Empty);
}
#endregion
#region The Implementation of DWebBrowserEvents2 for firing extra events
//This class will capture events from the WebBrowser
class ExWebBrowserEvents : SHDocVw.DWebBrowserEvents2
{
public ExWebBrowserEvents() { }
ExWebBrowser _Browser;
public ExWebBrowserEvents(ExWebBrowser browser) { _Browser = browser; }
#region DWebBrowserEvents2 Members
public void StatusTextChange(string Text)
{
ExWebBrowserStatusTextChangeEventArgs args = new ExWebBrowserStatusTextChangeEventArgs(Text);
_Browser.OnStatusTextChange(args);
}
public void ProgressChange(int Progress, int ProgressMax)
{
//throw new NotImplementedException();
}
public void CommandStateChange(int Command, bool Enable)
{
//throw new NotImplementedException();
}
public void DownloadBegin()
{
_Browser.OnDownloading(EventArgs.Empty);
}
public void DownloadComplete()
{
_Browser.OnDownloadComplete(EventArgs.Empty);
}
public void TitleChange(string Text)
{
_Browser.OnTitleChange(new ExWebBrowserTitleChangeEventArgs(Text));
}
public void PropertyChange(string szProperty)
{
//throw new NotImplementedException();
}
public void BeforeNavigate2(object pDisp, ref object URL, ref object Flags, ref object TargetFrameName, ref object PostData, ref object Headers, ref bool Cancel)
{
string tFrame = null;
if (TargetFrameName != null)
tFrame = TargetFrameName.ToString();
ExWebBrowserBeforeNavigate2EventArgs args = new ExWebBrowserBeforeNavigate2EventArgs(pDisp, URL.ToString(), tFrame);
_Browser.OnBeforeNavigate2(args);
Cancel = args.Cancel;
pDisp = args.ActiveXInstance;
}
public void NewWindow2(ref object ppDisp, ref bool Cancel)
{
ExWebBrowserNewWindow2EventArgs args = new ExWebBrowserNewWindow2EventArgs(ppDisp);
_Browser.OnNewWindow2(args);
Cancel = args.Cancel;
ppDisp = args.ActiveXInstance;
}
public void NavigateComplete2(object pDisp, ref object URL)
{
//throw new NotImplementedException();
}
public void DocumentComplete(object pDisp, ref object URL)
{
//throw new NotImplementedException();
}
public void OnQuit()
{
_Browser.OnQuit();
}
public void OnVisible(bool Visible)
{
//throw new NotImplementedException();
}
public void OnToolBar(bool ToolBar)
{
//throw new NotImplementedException();
}
public void OnMenuBar(bool MenuBar)
{
//throw new NotImplementedException();
}
public void OnStatusBar(bool StatusBar)
{
//throw new NotImplementedException();
}
public void OnFullScreen(bool FullScreen)
{
//throw new NotImplementedException();
}
public void OnTheaterMode(bool TheaterMode)
{
//throw new NotImplementedException();
}
public void WindowSetResizable(bool Resizable)
{
//throw new NotImplementedException();
}
public void WindowSetLeft(int Left)
{
//throw new NotImplementedException();
}
public void WindowSetTop(int Top)
{
//throw new NotImplementedException();
}
public void WindowSetWidth(int Width)
{
//throw new NotImplementedException();
}
public void WindowSetHeight(int Height)
{
//throw new NotImplementedException();
}
public void WindowClosing(bool IsChildWindow, ref bool Cancel)
{
//throw new NotImplementedException();
}
public void ClientToHostWindow(ref int CX, ref int CY)
{
//throw new NotImplementedException();
}
public void SetSecureLockIcon(int SecureLockIcon)
{
//throw new NotImplementedException();
}
public void FileDownload(bool ActiveDocument, ref bool Cancel)
{
//throw new NotImplementedException();
}
public void NavigateError(object pDisp, ref object URL, ref object Frame, ref object StatusCode, ref bool Cancel)
{
//throw new NotImplementedException();
}
public void PrintTemplateInstantiation(object pDisp)
{
//throw new NotImplementedException();
}
public void PrintTemplateTeardown(object pDisp)
{
//throw new NotImplementedException();
}
public void UpdatePageStatus(object pDisp, ref object nPage, ref object fDone)
{
//throw new NotImplementedException();
}
public void PrivacyImpactedStateChange(bool bImpacted)
{
//throw new NotImplementedException();
}
public void NewWindow3(ref object ppDisp, ref bool Cancel, uint dwFlags, string bstrUrlContext, string bstrUrl)
{
ExWebBrowserNewWindow3EventArgs args = new ExWebBrowserNewWindow3EventArgs(ppDisp, bstrUrl, bstrUrlContext);
_Browser.OnNewWindow3(args);
Cancel = args.Cancel;
ppDisp = args.ActiveXInstance;
}
public void SetPhishingFilterStatus(int PhishingFilterStatus)
{
//throw new NotImplementedException();
}
public void WindowStateChanged(uint dwWindowStateFlags, uint dwValidFlagsMask)
{
//throw new NotImplementedException();
}
public void NewProcess(int lCauseFlag, object pWB2, ref bool Cancel)
{
//throw new NotImplementedException();
}
public void ThirdPartyUrlBlocked(ref object URL, uint dwCount)
{
//throw new NotImplementedException();
}
public void RedirectXDomainBlocked(object pDisp, ref object StartURL, ref object RedirectURL, ref object Frame, ref object StatusCode)
{
//throw new NotImplementedException();
}
public void BeforeScriptExecute(object pDispWindow)
{
//throw new NotImplementedException();
}
public void WebWorkerStarted(uint dwUniqueID, string bstrWorkerLabel)
{
//throw new NotImplementedException();
}
public void WebWorkerFinsihed(uint dwUniqueID)
{
//throw new NotImplementedException();
}
#endregion
}
#endregion
}
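The event-args classes used by ExWebBrowser are not listed in this post. The following is a rough sketch of what the NewWindow3 arguments could look like, inferred only from how ExWebBrowserEvents.NewWindow3 uses them (a three-argument constructor plus the Cancel and ActiveXInstance members). The names `NewWindow3EventArgsSketch` and `NewWindowFlowDemo` are hypothetical, and the real classes in the repository may differ.

```csharp
using System;
using System.ComponentModel;

// Hypothetical shape of the event-args class; CancelEventArgs supplies the Cancel flag.
public class NewWindow3EventArgsSketch : CancelEventArgs
{
    public NewWindow3EventArgsSketch(object activeXInstance, string url, string urlContext)
    {
        ActiveXInstance = activeXInstance;
        Url = url;
        UrlContext = urlContext;
    }

    // Written back to ppDisp by the event sink. A handler sets this
    // to the ActiveX instance of the browser hosted in the new tab.
    public object ActiveXInstance { get; set; }

    // The URL that is about to be opened in the new window.
    public string Url { get; private set; }

    // The URL of the page that requested the new window.
    public string UrlContext { get; private set; }
}

public static class NewWindowFlowDemo
{
    // Mirrors what the sink does: build the args, let the handler run,
    // then copy Cancel and ActiveXInstance back to the COM parameters.
    public static void Dispatch(EventHandler<NewWindow3EventArgsSketch> handler,
                                ref object ppDisp, ref bool cancel,
                                string urlContext, string url)
    {
        var args = new NewWindow3EventArgsSketch(ppDisp, url, urlContext);
        if (handler != null)
            handler(null, args);
        cancel = args.Cancel;
        ppDisp = args.ActiveXInstance;
    }
}
```

A handler that leaves Cancel at false and assigns a new browser's instance to ActiveXInstance reproduces the behavior described above: the navigation proceeds, but inside the browser object the handler supplied.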
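3. ActiveX control support.
At first I tried AxWebBrowser for ActiveX support, but script injection and script execution turned out to be quite cumbersome with it, so after a lot of searching I settled on the approach below.
First, add a reference to Microsoft.VisualStudio.OLE.Interop, then have the main Form implement the IOleClientSite interface and its methods.
After creating the WebBrowser control, register the Form as its client site: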
IOleObject obj = (IOleObject)webPage.GetActiveXInstance();
obj.SetClientSite(this);
public partial class Form1 : Form, IOleClientSite
{
public Form1()
{
InitializeComponent();
}
#region The Implementation of IOleClientSite
public void SaveObject()
{
}
public void GetMoniker(uint dwAssign, uint dwWhichMoniker, out IMoniker ppmk)
{
ppmk = (IMoniker)this;
}
public void GetContainer(out IOleContainer ppContainer)
{
ppContainer = (IOleContainer)this;
}
public void ShowObject()
{
}
public void OnShowWindow(int fShow)
{
}
public void RequestNewObjectLayout()
{
}
#endregion
private object CreateNewWebPage(string url)
{
TabPage tabPage = new TabPage("New Tab");
tabPage.Name = "tabPage" + (tabControl1.TabPages.Count + 1);
WebPage webPage = new WebPage();
webPage.Dock = DockStyle.Fill;
webPage.Tag = tabPage;
webPage.NewPage += WebPage_NewPage;
webPage.StatusTextChange += WebPage_StatusTextChange;
webPage.TitleChange += WebPage_TitleChange;
IOleObject obj = (IOleObject)webPage.GetActiveXInstance();
obj.SetClientSite(this);
tabPage.Controls.Add(webPage);
tabControl1.TabPages.Add(tabPage);
tabControl1.SelectedTab = tabPage;
if (url != null && url.Length > 0)
{
webPage.Navigate(url);
}
else
{
webPage.FocusAddressInput();
}
return webPage.GetActiveXInstance();
}
private void Form1_Load(object sender, EventArgs e)
{
CreateNewWebPage(null);
}
private void WebPage_TitleChange(object sender, ComponentModel.WebPageTitleChangeEventArgs e)
{
TabPage tabPage = (sender as WebPage).Tag as TabPage;
tabPage.Text = e.Title;
tabControl1.Refresh();
}
private void WebPage_StatusTextChange(object sender, ComponentModel.WebPageStatusTextChangeEventArgs e)
{
statusTextLabel.Text = e.Text;
}
private void WebPage_NewPage(object sender, ComponentModel.WebPageNewPageEventArgs e)
{
e.ActiveXInstance = CreateNewWebPage(null);
}
private void tsmi_tabPage_close_Click(object sender, EventArgs e)
{
if(tabControl1.SelectedTab != null)
{
TabPage tabPage = tabControl1.SelectedTab;
tabPage.Controls.Clear();
tabControl1.TabPages.Remove(tabPage);
if(tabControl1.TabPages.Count == 0)
{
Close();
}
}
}
private void tsmi_new_page_Click(object sender, EventArgs e)
{
CreateNewWebPage(null);
}
}
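If you want to control what content a page loads, you can implement the following member in the Form. DispId -5512 is DISPID_AMBIENT_DLCONTROL, the ambient property the browser queries for its download-control flags: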
BrowserOptions webBrowserOptions = BrowserOptions.Images | BrowserOptions.Videos | BrowserOptions.BackgroundSounds;
[DispId(-5512)]
public virtual int IDispatch_Invoke_Handler()
{
//System.Diagnostics.Debug.WriteLine("-5512");
return (int)webBrowserOptions;
}
[Flags]
public enum BrowserOptions : uint
{
/// <summary>
/// No flags are set.
/// </summary>
None = 0,
/// <summary>
/// The browser will operate in offline mode. Equivalent to DLCTL_FORCEOFFLINE.
/// </summary>
AlwaysOffline = 0x10000000,
/// <summary>
/// The browser will play background sounds. Equivalent to DLCTL_BGSOUNDS.
/// </summary>
BackgroundSounds = 0x00000040,
/// <summary>
/// Specifies that the browser will not run Active-X controls. Use this setting
/// to disable Flash movies. Equivalent to DLCTL_NO_RUNACTIVEXCTLS.
/// </summary>
DontRunActiveX = 0x00000200,
/// <summary>
/// Specifies that the browser should fetch the content from the server. If the server's
/// content is the same as the cache, the cache is used. Equivalent to DLCTL_RESYNCHRONIZE.
/// </summary>
IgnoreCache = 0x00002000,
/// <summary>
/// The browser will force the request from the server, and ignore the proxy, even if the
/// proxy indicates the content is up to date. Equivalent to DLCTL_PRAGMA_NO_CACHE.
/// </summary>
IgnoreProxy = 0x00004000,
/// <summary>
/// Specifies that the browser should download and display images. This is set by default.
/// Equivalent to DLCTL_DLIMAGES.
/// </summary>
Images = 0x00000010,
/// <summary>
/// Disables downloading and installing of Active-X controls. Equivalent to DLCTL_NO_DLACTIVEXCTLS.
/// </summary>
NoActiveXDownload = 0x00000400,
/// <summary>
/// Disables web behaviours. Equivalent to DLCTL_NO_BEHAVIORS.
/// </summary>
NoBehaviours = 0x00008000,
/// <summary>
/// The browser suppresses any HTML charset specified. Equivalent to DLCTL_NO_METACHARSET.
/// </summary>
NoCharSets = 0x00010000,
/// <summary>
/// Indicates the browser will ignore client pulls. Equivalent to DLCTL_NO_CLIENTPULL.
/// </summary>
NoClientPull = 0x20000000,
/// <summary>
/// The browser will not download or display Java applets. Equivalent to DLCTL_NO_JAVA.
/// </summary>
NoJava = 0x00000100,
/// <summary>
/// The browser will download framesets and parse them, but will not download the frames
/// contained inside those framesets. Equivalent to DLCTL_NO_FRAMEDOWNLOAD.
/// </summary>
NoFrameDownload = 0x00001000, // 0x00080000 is DLCTL_NOFRAMES, a different flag
/// <summary>
/// The browser will not execute any scripts. Equivalent to DLCTL_NO_SCRIPTS.
/// </summary>
NoScripts = 0x00000080,
/// <summary>
/// If the browser cannot detect any internet connection, this causes it to default to
/// offline mode. Equivalent to DLCTL_OFFLINEIFNOTCONNECTED.
/// </summary>
OfflineIfNotConnected = 0x80000000,
/// <summary>
/// Specifies that UTF8 should be used. Equivalent to DLCTL_URL_ENCODING_ENABLE_UTF8.
/// </summary>
UTF8 = 0x00040000,
/// <summary>
/// The browser will download and display video media. Equivalent to DLCTL_VIDEOS.
/// </summary>
Videos = 0x00000020
}
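As a rough illustration of how these flags combine, the sketch below builds two option sets and converts them to the int that the DispId(-5512) member returns. It uses a trimmed, hypothetical copy of the enum named `DlControl` so that it stands alone. The `unchecked` cast is there because the highest flag (0x80000000) does not fit into a positive int.

```csharp
using System;

// Trimmed, hypothetical copy of BrowserOptions so the sketch is self-contained.
[Flags]
public enum DlControl : uint
{
    None = 0,
    Images = 0x00000010,
    Videos = 0x00000020,
    BackgroundSounds = 0x00000040,
    NoScripts = 0x00000080,
    DontRunActiveX = 0x00000200,
    OfflineIfNotConnected = 0x80000000
}

public static class DlControlDemo
{
    // Converts the flag set to the signed value handed back to the browser.
    // unchecked keeps the bit pattern even when the top bit is set.
    public static int ToAmbientValue(DlControl options)
    {
        return unchecked((int)(uint)options);
    }

    // Default-like set: images, videos and background sounds are loaded.
    public static DlControl Media()
    {
        return DlControl.Images | DlControl.Videos | DlControl.BackgroundSounds;
    }

    // Scraping-oriented set: no media flags, scripts and ActiveX disabled.
    public static DlControl TextOnly()
    {
        return DlControl.NoScripts | DlControl.DontRunActiveX;
    }
}
```

Leaving out Images, Videos and BackgroundSounds is what turns that content off, since those three flags enable downloads while the No*/Dont* flags disable features. Whether a text-only set suits a given scraping job depends on how much of the target page is generated by script.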