A Multithreaded Crawler for Scraping Douban Tags and Reviews

Latest recommended article published on 2021-11-28 11:13:29

流浪剑客孙

Category: Crawlers. Tags: Java, multithreaded crawler, Java crawler, BFS crawler

Article link: https://blog.csdn.net/qq_40663503/article/details/90813318

This crawler uses multithreading to run nine worker threads that scrape the data in parallel.
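As a minimal sketch of the nine-worker pattern (not the author's code), the following starts nine threads that drain one shared queue and then waits for all of them to finish. The names `NineWorkersSketch` and `runWorkers` and the placeholder processing step are assumptions made for illustration only.

```java
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical illustration class; it does not appear in the original post.
class NineWorkersSketch {

    // Starts workerCount threads that drain a shared queue, then waits for them.
    static Set<String> runWorkers(List<String> tasks, int workerCount) {
        Queue<String> pending = new ConcurrentLinkedQueue<>(tasks);
        Set<String> processed = ConcurrentHashMap.newKeySet();

        Thread[] workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++) {
            workers[i] = new Thread(() -> {
                String task;
                // poll() returns null once the queue is empty, which ends this worker.
                while ((task = pending.poll()) != null) {
                    processed.add(task); // placeholder for the real fetch-and-parse step
                }
            }, "crawler-" + (i + 1));
            workers[i].start();
        }

        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt(); // keep the interrupt flag and stop waiting
                break;
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        Set<String> done = runWorkers(List.of("https://book.douban.com"), 9);
        System.out.println("processed " + done.size() + " URL(s)");
    }
}
```

In this sketch a worker exits as soon as the queue is empty. A real crawler adds newly discovered links back to the queue while other workers are still running, so it would need a more careful stop condition than an empty queue.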

package com;
/**
 * Sun Yuhan's crawler, heavily modified to run at nine-fold speed
 * sunYuhan
 */
import java.io.*;
import java.net.*;
import java.util.*;
import java.util.regex.*;

public class exe2{
	// Not used in the visible part of this listing
	static exe e1;
	public static void main(String[] args)
	{
		exe e=new exe();
		// Seed the crawl before any worker thread starts
		e.firstGo();
		// Start nine worker threads that all share the same crawler object
		for(int i=0;i<9;i++)
		{
			new exeThread(e).start();
		}
	}
}
class exe {
    // Extracted data is saved under this directory,
	// as TXT files converted from the HTML pages
    private static String savepath="C:/Users/54781/Desktop/爬虫文件2/";
    // URLs waiting to be crawled
    private static List<String> allwaiturl=new ArrayList<>();
    // URLs that have already been crawled
    private static Set<String> alloverurl=new HashSet<>();
    // Depth of every known URL, used to decide whether to crawl it
    private static Map<String,Integer> allurldepth=new HashMap<>();
    // Maximum crawl depth
    private static int maxdepth=10;
    public void firstGo()
    {
        // Seed URL, crawled at depth 1
        String strurl="https://book.douban.com";
        workurl(strurl,1);
    }
    public boolean go()
    {
    	while(true)
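
The `exe` class keeps its waiting list, visited set, and depth map in a plain `ArrayList`, `HashSet`, and `HashMap`, none of which is safe for unsynchronized use from several threads. Whether the author guards them with synchronization is not visible in this excerpt. As one possible approach, and only a sketch under that uncertainty, the same depth-limited bookkeeping can be built on concurrent collections. The class `FrontierSketch` and its methods `offer`, `next`, and `depthOf` are hypothetical names, not part of the original code.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical illustration class; it does not appear in the original post.
class FrontierSketch {
    // URLs waiting to be crawled, in first-in first-out (BFS) order
    private final Queue<String> waiting = new ConcurrentLinkedQueue<>();
    // Depth at which each URL was first seen; also serves as the "seen" set
    private final Map<String, Integer> depthByUrl = new ConcurrentHashMap<>();
    private final int maxDepth;

    FrontierSketch(int maxDepth) {
        this.maxDepth = maxDepth;
    }

    // Queues the URL only if it is within the depth limit and has not been seen before.
    boolean offer(String url, int depth) {
        if (depth > maxDepth) {
            return false;
        }
        // putIfAbsent is atomic, so only one thread can claim a given URL.
        if (depthByUrl.putIfAbsent(url, depth) != null) {
            return false;
        }
        waiting.add(url);
        return true;
    }

    // Returns the next URL to crawl, or null if none is waiting right now.
    String next() {
        return waiting.poll();
    }

    // Returns the recorded depth of a URL, or -1 if it was never offered.
    int depthOf(String url) {
        return depthByUrl.getOrDefault(url, -1);
    }
}
```

A worker thread would call `next()` for a URL, fetch the page, and call `offer(link, depthOf(url) + 1)` for each link it finds. Because the seen-check and the insert happen in a single atomic `putIfAbsent`, two threads cannot both enqueue the same URL.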
