(3) Using Selenium and Common Problems

chantor7

Last modified 2022-03-08 09:29:03

Category: Crawlers. Tags: golang, selenium, crawler

First published 2022-03-08 09:26:20

Article link: https://blog.csdn.net/chantor7/article/details/123345016

Selenium Usage Guide

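The previous post covered sending requests directly over HTTP, but many sites have strong anti-crawling measures that plain requests cannot get past. Yet when we open a site in an ordinary browser, we rarely have to think about being blocked. So is there something that can open a browser the way a person does and then grab the page data? This is where our heavy weapon, **Selenium**, comes in. Selenium is a web automation testing tool; in other words, a tool that simulates a person operating a browser.

On the Selenium website we find that it supports Java/Python/C#/Ruby/JS/Kotlin, and the official tutorials are very detailed. Wait... what about Go? Don't panic: look on GitHub and you will find https://github.com/tebeka/selenium. Thanks to the maintainers.

It is worth a mention, though it hardly needs explaining.

Steps:

1. Install the package with go get github.com/tebeka/selenium, download the Selenium server jar and the driver for your browser, and install the browser itself. The browser and driver versions absolutely must match. I use Google Chrome. A reminder to myself to take notes promptly: writing this up now, I went looking for the download links and, laughably, had never bookmarked them. So please search for them yourself, or leave them in the comments for later readers~

2. You may hit an error saying, roughly, that the browser cannot be found. Setting an environment variable fixes it: add an entry such as xxxxx\Google\GoogleChromePortableDev\App\Chrome-bin, i.e. the directory where your browser is installed, to the PATH environment variable.

3. The code: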

package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"time"

	"github.com/tebeka/selenium"
	"github.com/tebeka/selenium/chrome"
	// utils (used in OpenChrome) is my own helper package; its import path is omitted here.
)

const (
	seleniumPath     = `\other\selenium-server-standalone-3.9.1.jar`
	chromeDriverPath = `\other\chromedriver.exe`
	port             = 9515
)

func main() {
	// Start the Selenium server with ChromeDriver.
	opts := []selenium.ServiceOption{
		selenium.ChromeDriver(chromeDriverPath), // path to the ChromeDriver binary
		selenium.Output(ioutil.Discard),         // discard the server's debug output
	}
	service, err := selenium.NewSeleniumService(seleniumPath, port, opts...)
	if err != nil {
		log.Println("start selenium service err", err)
		return
	}
	defer service.Stop()

	// Open the browser.
	wd, err := OpenChrome()
	if err != nil {
		log.Println("open chrome err", err)
		return
	}
	defer wd.Quit()
	// Load the target page here with wd.Get(...) before checking its state.

	// Check whether the page has finished loading.
	jsRt, err := wd.ExecuteScript("return document.readyState", nil)
	if err != nil {
		log.Println("exe js err", err)
	}
	fmt.Println("jsRt", jsRt)
	if jsRt != "complete" {
		log.Println("page has not finished loading")
		time.Sleep(time.Second * 5)
	}
}

// OpenChrome starts a Chrome session through the Selenium server.
func OpenChrome() (wd selenium.WebDriver, err error) {
	caps := selenium.Capabilities{"browserName": "chrome"}
	// Disable image loading to speed up rendering.
	imagCaps := map[string]interface{}{
		"profile.managed_default_content_settings.images": 2,
	}
	chromeCaps := chrome.Capabilities{
		Prefs: imagCaps,
		Path:  "",
		Args: []string{
			//"--headless", // hide the window; note that extensions only load when a window is shown
			"--start-maximized",
			"--window-size=128,128",
			"--no-sandbox",
			"--user-agent=" + utils.GetRandomUserAgent(), // random User-Agent from my own utils package
			"--disable-gpu",
			"--disable-impl-side-painting",
			"--disable-gpu-sandbox",
			"--disable-accelerated-2d-canvas",
			"--disable-accelerated-jpeg-decoding",
			"--test-type=ui",
			//"--proxy-server=" + proxyStr, // or proxyPool.GetIp(), e.g. "xxx.xxx.xxx.xxx:9999"; use a proxy IP
		},
	}

	// The browser options are set above. Next, add an extension; this one is a
	// tunnel-proxy plugin, which I will cover in a later post.
	err = chromeCaps.AddExtension("/other/proxy.zip")
	if err != nil {
		return
	}
	caps.AddChrome(chromeCaps)
	// Open the Chrome browser.
	wd, err = selenium.NewRemote(caps, fmt.Sprintf("http://localhost:%d/wd/hub", port))
	return
}
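
The readiness check in main sleeps once for five seconds and then carries on whether or not the page has loaded. A minimal sketch of polling document.readyState until it is "complete" or a timeout expires could look like the following. The names waitForPageLoad, scriptRunner, errPageNotReady and fakeDriver are made up for this illustration and are not part of github.com/tebeka/selenium. The fake driver stands in for a real browser so the example runs without a Selenium server.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// scriptRunner lists the one method this sketch needs. selenium.WebDriver
// has an ExecuteScript method with this signature, so a real driver can be
// passed in directly.
type scriptRunner interface {
	ExecuteScript(script string, args []interface{}) (interface{}, error)
}

// errPageNotReady is returned when the page is still loading at the deadline.
var errPageNotReady = errors.New("page did not reach readyState complete before the timeout")

// waitForPageLoad polls document.readyState every interval until it reports
// "complete" or timeout has elapsed.
func waitForPageLoad(r scriptRunner, timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		state, err := r.ExecuteScript("return document.readyState", nil)
		if err != nil {
			return fmt.Errorf("execute script: %w", err)
		}
		if state == "complete" {
			return nil
		}
		if time.Now().After(deadline) {
			return errPageNotReady
		}
		time.Sleep(interval)
	}
}

// fakeDriver (hypothetical, for demonstration only) reports "loading" a
// fixed number of times and "complete" afterwards.
type fakeDriver struct{ loadingPolls int }

func (f *fakeDriver) ExecuteScript(string, []interface{}) (interface{}, error) {
	if f.loadingPolls > 0 {
		f.loadingPolls--
		return "loading", nil
	}
	return "complete", nil
}

func main() {
	d := &fakeDriver{loadingPolls: 3}
	err := waitForPageLoad(d, time.Second, 10*time.Millisecond)
	fmt.Println("page ready:", err == nil) // prints "page ready: true"
}
```

With a real session, the call would be waitForPageLoad(wd, 10*time.Second, 200*time.Millisecond), with the timeout and interval chosen to suit the site.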

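Common Problems

1. Running the packaged .exe reports device-not-found errors and SSL errors. The device errors appear because external devices such as headphones or a mouse were used on the computer earlier and are no longer connected, while the system still keeps a record of them. The fix is to add the two certificate flags and the ExcludeSwitches entries shown here: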

		chromeCaps := chrome.Capabilities{
			Prefs: imagCaps,
			Path:  "",
			Args: []string{
				//"--headless",
				"--start-maximized",
				//"--window-size=12,12",
				"--no-sandbox",
				"--user-agent=" + utils.GetRandomUserAgent(),
				"--disable-gpu",
				"--disable-impl-side-painting",
				"--disable-gpu-sandbox",
				"--disable-accelerated-2d-canvas",
				"--disable-accelerated-jpeg-decoding",
				"--test-type=ui",
				//"--proxy-server=" + proxyStr, // or proxyPool.GetIp()
				"--ignore-certificate-errors", // ignore SSL certificate errors
				"--ignore-ssl-errors",
			},
			ExcludeSwitches: []string{ // this is the fix for the device errors
				"enable-automation",
				"enable-logging",
			},
		}

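2. A simulated mouse click reports an element-not-found error. One cause is that the element is not visible, so calling Click() on the element directly fails. The fix is to perform the click with a JavaScript statement:

	element, err := wd.FindElement(selenium.ByXPATH, `//div[@id="order"]//div[text()="默认"]`) // "默认" = "Default"
	if err == nil { // only click when the element was found
		_, err = wd.ExecuteScript(`arguments[0].click();`, []interface{}{element})
	}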
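
One way to combine the two approaches is to try the normal click first and fall back to the JavaScript click only when it fails. The sketch below is an illustration under assumptions: clickWithFallback, clickable, scriptRunner, hiddenElement and recordingDriver are hypothetical names, not part of github.com/tebeka/selenium. The two interfaces list only methods that selenium.WebElement and selenium.WebDriver provide, and the fakes let the example run without a browser.

```go
package main

import (
	"errors"
	"fmt"
)

// clickable and scriptRunner list only the methods this sketch uses.
// selenium.WebElement has Click() error, and selenium.WebDriver has an
// ExecuteScript method with this signature.
type clickable interface {
	Click() error
}

type scriptRunner interface {
	ExecuteScript(script string, args []interface{}) (interface{}, error)
}

// clickWithFallback tries a normal click first and falls back to a
// JavaScript click if the driver rejects it.
func clickWithFallback(r scriptRunner, el clickable) error {
	clickErr := el.Click()
	if clickErr == nil {
		return nil
	}
	if _, err := r.ExecuteScript("arguments[0].click();", []interface{}{el}); err != nil {
		return fmt.Errorf("native click failed (%v); JavaScript click failed: %w", clickErr, err)
	}
	return nil
}

// hiddenElement (hypothetical) always refuses a normal click, the way an
// invisible element would.
type hiddenElement struct{}

func (hiddenElement) Click() error { return errors.New("element not interactable") }

// recordingDriver (hypothetical) records the scripts it is asked to run.
type recordingDriver struct{ scripts []string }

func (d *recordingDriver) ExecuteScript(script string, args []interface{}) (interface{}, error) {
	d.scripts = append(d.scripts, script)
	return nil, nil
}

func main() {
	d := &recordingDriver{}
	err := clickWithFallback(d, hiddenElement{})
	fmt.Println("error:", err, "scripts run:", len(d.scripts)) // prints "error: <nil> scripts run: 1"
}
```

A JavaScript click does not check that the element is visible or scroll it into view, so it can behave differently from a real user's click. That is why the sketch uses it only as a fallback.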

Learning Resources

capabilities

[1] https://peter.sh/experiments/chromium-command-line-switches/ “List of command-line switches”
[2] https://sites.google.com/a/chromium.org/chromedriver/capabilities
[3] https://chromium.googlesource.com/chromium/src/+/master/chrome/common/pref_names.cc “Preference names”