Java vs. Python crawlers: comparing code that looks up the region of an IP address (full code included), for learning and discussion

Java code example

package javaTest2077; // remember to change this to your own package
import java.io.BufferedReader; // buffers the input to speed up reading
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.*;
import java.util.regex.*;
public class Request {
    public static String getAddressByIp(String ip) {
		if (ip == null || ip.equals("")) {
			return null;
		}
		String httpUrl ="https://www.ip138.com/iplookup.asp";
		BufferedReader reader = null;
		String result = null;
		StringBuilder sbf = new StringBuilder(); // single-threaded use, so no synchronization is needed
		String thisUrl = httpUrl + "?ip=" + ip + "&action=2";
		System.out.println(thisUrl);
		try {
			URL url = new URL(thisUrl);
			HttpURLConnection connection = (HttpURLConnection) url.openConnection();
			// set the request method and headers
			connection.setRequestMethod("GET");
			connection.setRequestProperty("Referer", thisUrl);
			connection.setRequestProperty("User-Agent","Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36 SLBrowser/7.0.0.12151 SLBChan/30");
			connection.connect(); // open the connection
			InputStream is = connection.getInputStream();	
			reader = new BufferedReader(new InputStreamReader(is, "gbk"));
			String strRead = null;
			while ((strRead = reader.readLine()) != null) {
				sbf.append(strRead);
				sbf.append("\r\n");
			}
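			reader.close();
			result = sbf.toString();
			// Lazily match the quoted value that follows the "ASN归属地" key in the page source
			String patternString = "(?s)\"ASN归属地\".*?:(\\\".*?\\\")";
			Pattern pattern = Pattern.compile(patternString);
			Matcher matcher = pattern.matcher(result);
			// Return the first match. Calling group() after find() has returned false
			// throws IllegalStateException, so the value is returned inside the check.
			if (matcher.find()) {
				String address = matcher.group(1); // group 1 is the quoted value after the key
				System.out.println(address);
				return address;
			}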
		} catch (Exception e) {
			System.out.println("Failed to look up the IP location: " + e.getMessage());
		}
		return null;
    }

    public static void main(String[] args) throws Exception {
        // getAddressByIp is static, so no Request instance is needed
        Request.getAddressByIp("137.172.142.47");
    }
}

Python code example

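import re
import requests

ip = "223.172.142.47"
url = f"https://www.ip138.com/iplookup.asp?ip={ip}&action=2"
header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.152 Safari/537.36',
    'Referer': url
}
try:
    # The request and the decoding are the steps that can fail, so they belong inside the try
    result = requests.get(url=url, headers=header, timeout=10)
    text = result.content.decode('gbk')
    # findall returns a list of every captured value; the list is empty when nothing matches
    print(re.findall('"ASN归属地":"(.*?)"', text))
except (requests.RequestException, UnicodeDecodeError):
    print('Failed to look up the IP location!')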

Both versions fetch the same page and use a regular expression to extract the data.
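The extraction step can be tried without touching the network. The following is a minimal sketch, not the site's actual response: only the "ASN归属地" key comes from the code in this post, while the sample string, its other fields, and the helper name `extract_location` are made up for illustration.

```python
import re

# Hypothetical page fragment: only the "ASN归属地" key comes from the post;
# the other fields and all of the values are invented for illustration.
SAMPLE = '{"ip":"203.0.113.7","ASN归属地":"Example City Example ISP","iP段":"203.0.113.0 - 203.0.113.255"}'

# Non-greedy capture of the quoted value that follows the key
ASN_PATTERN = re.compile(r'"ASN归属地"\s*:\s*"(.*?)"')


def extract_location(text):
    """Return the first captured location, or None when the key is absent."""
    match = ASN_PATTERN.search(text)
    return match.group(1) if match else None


print(extract_location(SAMPLE))                  # Example City Example ISP
print(extract_location("<html>no data</html>"))  # None
```

Returning None for a missing key makes the "nothing found" case explicit, which is roughly what the Java version does with its final `return null`.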

Which of the two do you think is better for writing crawlers?
