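Java image crawler script (code download)

This article explains how to write an image crawler in pure Java and shares the code, using the scraping of "pretty girl" pictures as the example. Use it as a reference if you need one. On with the crawler tinkering!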

1. Core library required

<dependency>
  <groupId>org.apache.httpcomponents</groupId>
  <artifactId>httpclient</artifactId>
  <version>4.5.13</version>
</dependency>

2. Program entry point (named index)

import java.io.InputStream;
import org.apache.http.client.config.CookieSpecs;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
public class index {
	
	// Highest page index; the loop counts down from here to 1
	private static final int page = 1264;
	public static void main(String[] args) {
		// HttpClient timeout configuration
		RequestConfig config = RequestConfig.custom().setCookieSpec(CookieSpecs.STANDARD).setConnectionRequestTimeout(6000).setConnectTimeout(6000).build();
		CloseableHttpClient httpClient = HttpClients.custom().setDefaultRequestConfig(config).build();
		System.out.println("Starting....");
		for (int i = page; i > 0; i--) {
			HttpPost httpPost = new HttpPost("http://www.jf258.com/nansheng/"+ i+"1.html"); // the site to crawl
			httpPost.addHeader("User-Agent", "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)");// pose as a browser
			// try-with-resources closes the response even if reading the page fails
			try (CloseableHttpResponse response = httpClient.execute(httpPost)) {
				InputStream ism = response.getEntity().getContent();
				String context = Utils.convertStreamToString(ism);
				// Parse each page on its own thread
				new Thread(new CheDHtmlParser(context, i)).start();
			} catch (Exception e) {
				e.printStackTrace();
			}
		}
	}
}
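
The entry point above starts one new thread per page, and the parser later starts one per image, so the number of threads is unbounded. As a rough sketch of one alternative, the tasks could be handed to a fixed-size pool instead. The class `PooledRunner`, its method `runAll`, and the pool size of 4 are all hypothetical names and values chosen for illustration; they are not part of the original code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: PooledRunner is not part of the original article.
class PooledRunner {

    // Runs `taskCount` copies of `job` on at most `threads` worker threads
    // and returns how many of them finished before the one-minute timeout.
    static int runAll(int taskCount, int threads, Runnable job) {
        AtomicInteger done = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < taskCount; i++) {
            pool.submit(() -> {
                job.run();
                done.incrementAndGet();
            });
        }
        pool.shutdown(); // accept no new tasks; queued ones still run
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
        return done.get();
    }

    public static void main(String[] args) {
        // In the crawler, the job would fetch and parse one page.
        int finished = runAll(20, 4, () -> { });
        System.out.println("finished tasks: " + finished);
    }
}
```

In the crawler itself, `pool.submit(new CheDHtmlParser(context, i))` could take the place of `new Thread(...).start()`, since the parser already implements `Runnable`.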

2. Parsing the fetched page

import java.util.ArrayList;
import java.util.List;

public class CheDHtmlParser implements Runnable {
	private String html;
	private int page;
	
	public CheDHtmlParser(String html,int page) {
		this.html = html;
		this.page = page;
	}
	@Override
	public void run() {
		List<String> list = new ArrayList<String>();
		// Skip everything before the first "list"; keep the whole page if it is absent
		int start = html.indexOf("list");
		if (start >= 0) {
			html = html.substring(start);
		}
		// Split into <li> fragments and pull the src value out of each <img> tag
		String[] ss = html.split("li>");
		for (String s : ss) {
			if (s.indexOf("<img src=") >= 0) {
				try{
					int i = s.indexOf("<img src=\"") + "<img src=\"".length();
					list.add(s.substring(i, s.indexOf("\"", i + 1)));
				}catch (Exception e) {
					// Print the fragment that could not be parsed
					System.out.println(s);
				}
			}
		}
		
		// Download each image on its own thread
		for(String imageUrl : list){
			new Thread(new CheDImageCreator(imageUrl,page)).start();
		}
	}
}
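
The parser above finds image paths by splitting the page on `li>` and slicing strings by index. As a sketch of a different technique (not the author's method), the same `src` values could be collected with a regular expression from `java.util.regex`. The class name `ImgSrcExtractor` and the sample HTML are made up for illustration, and a regular expression is still only an approximation of real HTML parsing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: ImgSrcExtractor is not part of the original article.
class ImgSrcExtractor {

    // Matches an <img> tag and captures the value of its double-quoted src attribute.
    // Requiring whitespace before "src" skips attributes such as data-src.
    private static final Pattern IMG_SRC =
            Pattern.compile("<img\\b[^>]*?\\ssrc=\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    // Returns the src values in the order they appear in the HTML.
    static List<String> extract(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        String sample = "<ul class=\"list\"><li><img src=\"/a/1.jpg\" alt=\"x\"></li>"
                + "<li><img class=\"c\" src=\"/b/2.png\"></li></ul>";
        System.out.println(extract(sample));
    }
}
```

Unlike the index-based version, this also picks up `<img>` tags whose `src` is not the first attribute.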

3. Stream handling (converts the response stream of the fetched page into a string, so that the image paths can be extracted from it later)

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
public class Utils {

	// Reads the whole stream line by line into one string (platform default charset)
	public static String convertStreamToString(InputStream in) {
		BufferedReader reader = new BufferedReader(new InputStreamReader(in));
		StringBuilder sb = new StringBuilder();
		String line = null;
		try {
			while ((line = reader.readLine()) != null) {
				// "\n" restores the line break that readLine() strips
				sb.append(line).append("\n");
			}
		} catch (IOException e) {
			e.printStackTrace();
		} finally {
			try {
				in.close();
			} catch (IOException e) {
				e.printStackTrace();
			}
		}
		return sb.toString();

	}
}
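
The helper above decodes the page with the platform's default charset, which may not match the encoding the site actually uses. A minimal sketch of the same conversion with an explicit charset follows. The class `StreamText` and its method `read` are hypothetical names, and UTF-8 in the usage example is an assumption; the real charset would have to be taken from the response headers or the page itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: StreamText is not part of the original article.
class StreamText {

    // Reads the whole stream and decodes it with the given charset.
    // The stream is closed when reading finishes or fails.
    static String read(InputStream in, Charset charset) {
        try (InputStream source = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = source.read(buffer)) != -1) {
                out.write(buffer, 0, n); // copy only the bytes actually read
            }
            return new String(out.toByteArray(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] sample = "<li>demo</li>".getBytes(StandardCharsets.UTF_8);
        System.out.println(read(new ByteArrayInputStream(sample), StandardCharsets.UTF_8));
    }
}
```

Decoding the bytes in one step at the end also avoids splitting a multi-byte character across two buffer reads.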

4. Saving the images

import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.concurrent.atomic.AtomicInteger;

public class CheDImageCreator implements Runnable {
	// Shared by all download threads, so the counter has to be atomic
	private static final AtomicInteger count = new AtomicInteger();
	private String imageUrl;
	private int page;
	// Directory the images are saved to
	private static final String basePath = "D:/tupian"; 
	public CheDImageCreator(String imageUrl,int page) {
		this.imageUrl = imageUrl;
		this.page = page;
	}
	@Override
	public void run() {
		File dir = new File(basePath);
		if(!dir.exists()){
			dir.mkdirs();
		}
		String imageName = imageUrl.substring(imageUrl.lastIndexOf("/")+1);// file name of the image
		try {
			File file = new File( basePath+"/"+page+"--"+imageName);// target file: <page>--<name>
			// Build the absolute URL; this assumes imageUrl is a site-relative path
			String u="http://www.jf258.com"+imageUrl;
			URL uri = new URL(u);
			URLConnection connection = uri.openConnection();
			connection.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)");// pose as a browser
			// try-with-resources closes both streams even if the download fails
			try (InputStream is = connection.getInputStream();
					OutputStream os = new FileOutputStream(file)) {
				byte[] buff = new byte[1024];
				int readed;
				while ((readed = is.read(buff)) != -1) {
					// Write only the bytes that were actually read
					os.write(buff, 0, readed);
				}
			}
			System.out.println("Image "+count.incrementAndGet()+": "+file.getAbsolutePath());
		} catch (Exception e) {
			e.printStackTrace();
		}
	}
}
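
The download code above builds the image address by prepending `http://www.jf258.com` to the extracted path, which only works when that path is site-relative. As a hedged sketch, `java.net.URI` can resolve relative and absolute `src` values alike and can also supply the file name without any query string. The class `ImageUrls`, its methods, and the sample paths are hypothetical and only illustrate the idea.

```java
import java.net.URI;

// Hypothetical sketch: ImageUrls is not part of the original article.
class ImageUrls {

    // Resolves an <img> src value against the URL of the page it was found on.
    // Absolute src values are returned unchanged.
    // URI.create throws IllegalArgumentException for malformed input (e.g. spaces).
    static String resolve(String pageUrl, String src) {
        return URI.create(pageUrl).resolve(src).toString();
    }

    // Returns the last path segment of the URL, ignoring any query string.
    static String fileName(String url) {
        String path = URI.create(url).getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        String pageUrl = "http://www.jf258.com/nansheng/";
        String absolute = resolve(pageUrl, "/uploads/a.jpg");
        System.out.println(absolute);           // http://www.jf258.com/uploads/a.jpg
        System.out.println(fileName(absolute)); // a.jpg
    }
}
```

The resolved string can then be passed to `new URL(...)` exactly as the original code does with its concatenated address.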

5. The final result looks like this.

6. Source code download link (personally tested; blame me if it does not work)

https://download.csdn.net/download/zhaoxiangpeng16/21159505
