Integrating WebMagic and Selenium with Spring Boot and deploying to Linux (pitfalls)

Latest recommended article published 2024-08-01 16:56:44

孤狼逐月

Categories: linux, JAVA, spring-boot. Tags: spring boot, selenium, linux

Article link: https://blog.csdn.net/qq_36249132/article/details/130315387


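Two source repositories to read first
Spring Boot integration
Class file for org.openqa.selenium.remote.AbstractDriverOptions not found
Proxy IPs: switching to a new proxy IP for every page
Proxy IP sites
Deploying to Linux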

Two source repositories to read first

黄亿华 / webmagic
NAMEniubi / webmagic-selenium

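Reading these two repositories gives a basic idea of how to use both libraries.

Spring Boot integration

Split the Pipeline, PageProcessor, and Spider into separate classes, annotate each with @Component so that Spring manages them, and inject them with @Autowired wherever they are used.

Class file for org.openqa.selenium.remote.AbstractDriverOptions not found

After integration you may occasionally hit a class-not-found error. The cause is that the <selenium.version> managed by Spring Boot is too old, so the Maven dependency has to be added explicitly. I used version 4.8.3, as shown below (add whichever artifact is reported missing):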

        <dependency>
            <groupId>org.seleniumhq.selenium</groupId>
            <artifactId>selenium-remote-driver</artifactId>
            <version>4.8.3</version>
        </dependency>

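Proxy IPs: switching to a new proxy IP for every page

I wanted a proxy IP pool. To be honest, I found plenty of material online on how to add proxy IPs, but nothing on how to switch to a different proxy each time the page changes, so I fell back on a crude approach: after moving to a new page, manually change the current system proxy.

            System.setProperty("proxyType", "4");
            System.setProperty("proxyHost", ip);
            System.setProperty("proxyPort", port); // port must be passed as a String
            System.setProperty("proxySet", "true");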

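Sometimes the proxy has to be removed again to return to the original settings. To do that, clear the system proxy properties:

        System.clearProperty("proxyHost");
        System.clearProperty("proxyPort");
        System.clearProperty("proxyType");
        System.clearProperty("proxySet");

Place these calls immediately before webDriver.get(request.getUrl());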
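The per-page switch described above can be wrapped in a small helper. What follows is only a minimal sketch, not the author's code: the class `ProxyRotator` and its method names are hypothetical, the addresses are placeholder documentation addresses, and the property names simply mirror the snippet above. These properties are JVM-wide, so the sketch assumes pages are fetched one at a time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper (not from the article): rotates through "host:port" proxies. */
final class ProxyRotator {
    private final List<String> proxies;
    private final AtomicInteger cursor = new AtomicInteger();

    ProxyRotator(List<String> proxies) {
        if (proxies.isEmpty()) {
            throw new IllegalArgumentException("proxy list must not be empty");
        }
        for (String proxy : proxies) {
            int colon = proxy.lastIndexOf(':');
            if (colon <= 0 || colon == proxy.length() - 1) {
                throw new IllegalArgumentException("expected host:port but got " + proxy);
            }
        }
        this.proxies = new ArrayList<>(proxies);
    }

    /** Applies the next proxy in round-robin order and returns it. */
    String applyNext() {
        String proxy = proxies.get(Math.floorMod(cursor.getAndIncrement(), proxies.size()));
        int colon = proxy.lastIndexOf(':');
        // Same property names as the article's snippet
        System.setProperty("proxyType", "4");
        System.setProperty("proxyHost", proxy.substring(0, colon));
        System.setProperty("proxyPort", proxy.substring(colon + 1));
        System.setProperty("proxySet", "true");
        return proxy;
    }

    /** Removes the proxy properties again. */
    static void clear() {
        for (String key : Arrays.asList("proxyType", "proxyHost", "proxyPort", "proxySet")) {
            System.clearProperty(key);
        }
    }

    public static void main(String[] args) {
        // Placeholder addresses from the documentation range, not real proxies
        ProxyRotator rotator = new ProxyRotator(Arrays.asList("192.0.2.10:8080", "192.0.2.11:3128"));
        System.out.println(rotator.applyNext());
        System.out.println(rotator.applyNext());
        ProxyRotator.clear();
    }
}
```

In the downloader, `rotator.applyNext()` would be called right before `webDriver.get(request.getUrl());`, and `ProxyRotator.clear()` whenever the original settings should be restored.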

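Proxy IP sites

These are the two free proxy sites I currently use (if you can afford it, a paid proxy service is still the better choice; free ones are... you know how it is):
Kuaidaili (快代理)
66 Free Proxy (66免费代理)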

Deploying to Linux

Google Chrome

Configure the yum repository

Create a file named google-chrome.repo in the /etc/yum.repos.d/ directory

vi /etc/yum.repos.d/google-chrome.repo

and write the following content into it

[google-chrome]
name=google-chrome
baseurl=http://dl.google.com/linux/chrome/rpm/stable/$basearch
enabled=1
gpgcheck=1
gpgkey=https://dl-ssl.google.com/linux/linux_signing_key.pub

Install the Google Chrome browser

 yum -y install google-chrome-stable

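Google's official repository may be unreachable from mainland China, which makes the installation or later updates fail. In that case, install with the following option, which skips the GPG signature check: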

yum -y install google-chrome-stable --nogpgcheck

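Location
After installation the browser is at /usr/bin/google-chrome

Download the Chrome driver (its version must match the browser version)

chromedriver
Download the Linux build, upload it to the server, unzip it, and copy it to /usr/bin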

unzip chromedriver_linux64.zip
cp chromedriver /usr/bin


Check the versions

google-chrome --version
chromedriver --version
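Because the driver has to match the browser, the comparison of the two version strings can also be automated. The following is a minimal sketch under stated assumptions: the class `VersionCheck` is hypothetical, the sample strings in `main` only illustrate the general shape of the two commands' output, and the check assumes that agreeing on the major version number is enough.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not from the article): compares Chrome and ChromeDriver versions. */
final class VersionCheck {
    // Matches a four-part version number and captures its first (major) component
    private static final Pattern VERSION = Pattern.compile("(\\d+)\\.\\d+\\.\\d+\\.\\d+");

    /** Extracts the major version from one line of "--version" output. */
    static int majorVersion(String versionOutput) {
        Matcher matcher = VERSION.matcher(versionOutput);
        if (!matcher.find()) {
            throw new IllegalArgumentException("no version number in: " + versionOutput);
        }
        return Integer.parseInt(matcher.group(1));
    }

    /** Returns true when browser and driver report the same major version. */
    static boolean sameMajor(String chromeOutput, String driverOutput) {
        return majorVersion(chromeOutput) == majorVersion(driverOutput);
    }

    public static void main(String[] args) {
        // Illustrative strings; on the server these would come from the two commands above
        String chrome = "Google Chrome 112.0.5615.49";
        String driver = "ChromeDriver 112.0.5615.49 (build details omitted)";
        System.out.println(sameMajor(chrome, driver) ? "versions match" : "version mismatch");
    }
}
```

Running a check like this at application startup makes a browser/driver mismatch visible early, instead of as a session-creation failure on the first crawl.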


Checking whether a proxy IP is usable

    // Required imports
    import java.io.InputStream;
    import java.net.InetSocketAddress;
    import java.net.Proxy;
    import java.net.URL;
    import java.net.URLConnection;
    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;
    import java.util.List;
    import java.util.Random;
    import org.apache.commons.io.IOUtils;

    // Fetches the target URL through the given HTTP proxy and returns the response body.
    static String fetchViaProxy(String target, String ip, String port) throws Exception {
        InetSocketAddress addr = new InetSocketAddress(ip, Integer.parseInt(port));
        Proxy proxy = new Proxy(Proxy.Type.HTTP, addr);
        URLConnection conn = new URL(target).openConnection(proxy);
        conn.setConnectTimeout(2000);
        conn.setReadTimeout(2000); // added: do not hang on a proxy that connects but never answers
        // try-with-resources closes the stream, which the original version left open
        try (InputStream in = conn.getInputStream()) {
            return IOUtils.toString(in, StandardCharsets.UTF_8);
        }
    }

    // Returns true for an unusable proxy, false for a usable one.
    static boolean ifUselessBaidu(String ip, String port) {
        try {
            // Usable only if the Baidu home page actually comes back through the proxy
            return !fetchViaProxy("http://www.baidu.com", ip, port).contains("baidu");
        } catch (Exception e) {
            return true; // bad port, connection failure, or timeout
        }
    }

    // Services that reply with the caller's public IP as plain text
    static List<String> arrs = Arrays.asList("http://ip.3322.net/", "http://icanhazip.com/");

    // Returns true for an unusable proxy, false for a usable one.
    static boolean ifUseless(String ip, String port) {
        try {
            String target = arrs.get(new Random().nextInt(arrs.size()));
            String resultIp = fetchViaProxy(target, ip, port);
            // Usable only if the service saw the proxy's IP instead of our own
            return !resultIp.trim().equals(ip);
        } catch (Exception e) {
            return true;
        }
    }
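
To clean a scraped list of free proxies in one pass, a check like the ones above can be applied to every candidate. This is a minimal sketch rather than part of the original code: the class `ProxyFilter` is hypothetical, and the check is passed in as a `BiPredicate` so that either `ifUseless` or `ifUselessBaidu` can be plugged in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

/** Hypothetical helper (not from the article): keeps only the proxies that pass a check. */
final class ProxyFilter {
    /**
     * @param candidates entries in "ip:port" form
     * @param isUseless  returns true for an unusable proxy, same convention as ifUseless
     */
    static List<String> usable(List<String> candidates, BiPredicate<String, String> isUseless) {
        List<String> result = new ArrayList<>();
        for (String candidate : candidates) {
            int colon = candidate.lastIndexOf(':');
            if (colon <= 0 || colon == candidate.length() - 1) {
                continue; // skip malformed entries without a host or a port
            }
            String ip = candidate.substring(0, colon);
            String port = candidate.substring(colon + 1);
            if (!isUseless.test(ip, port)) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Placeholder addresses; the lambda stands in for a real network check
        List<String> candidates = Arrays.asList("192.0.2.10:8080", "bad-entry", "192.0.2.11:3128");
        System.out.println(usable(candidates, (ip, port) -> ip.endsWith(".11")));
    }
}
```

Inside the class that holds the two check methods, the call would look like `ProxyFilter.usable(candidates, (ip, port) -> ifUseless(ip, port))`. The checks run one after another, so a long list of dead proxies can take a while with the 2-second timeouts.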