Scraping a button with Java: a Java crawler

杜耶

Article link: https://blog.csdn.net/weixin_42475535/article/details/114658311

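The network here is terrible, and the new Eclipse on Linux is still downloading, so I'm jotting down some notes in the meantime. The IDEA interface is unfamiliar to me, so I gave up on it. I don't know what Java crawler frameworks exist; I'll cover my own functional needs first and then see whether the frameworks are any good.

A crawler fetches a page and then finds the data you want in it. I remember foolishly thinking that the crawler was the content itself. My foolishness hasn't changed a bit.

Crawling a page = downloading the page source

Topics involved:

1. Downloading the page source of a URL in Java takes the following steps: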

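/* Establish the HTTP connection */
HttpURLConnection connection = null;
// Create the remote URL object; httpurl is the URL you want to download
URL url = new URL(httpurl);
// Open a connection through the URL object and cast it to HttpURLConnection
connection = (HttpURLConnection) url.openConnection();
// Set the request method: GET
connection.setRequestMethod("GET");
// Set the timeout for connecting to the host server: 15000 ms
connection.setConnectTimeout(15000);
// Send the request
connection.connect();

/* The response body is the content of the URL you want to download */
InputStream is = null; // input stream
if (connection.getResponseCode() == 200) {
    is = connection.getInputStream();
}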
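
The steps above stop once the InputStream is obtained. A minimal sketch of the remaining step, reading that stream into a String, might look like the following. The class name PageDownloader, the method names, the read timeout, and the UTF-8 charset are assumptions made for illustration; they are not part of the original notes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name is an assumption for this sketch.
class PageDownloader {

    // Reads every byte from the stream and decodes it as UTF-8 (assumed charset).
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    // Downloads the page source of httpurl; returns null if the status is not 200.
    static String download(String httpurl) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) new URL(httpurl).openConnection();
        connection.setRequestMethod("GET");
        connection.setConnectTimeout(15000);
        connection.setReadTimeout(15000); // assumed value, not in the original notes
        try {
            connection.connect();
            if (connection.getResponseCode() != 200) {
                return null;
            }
            try (InputStream is = connection.getInputStream()) {
                return readAll(is);
            }
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the URL to download as the first command-line argument.
        if (args.length > 0) {
            System.out.println(download(args[0]));
        }
    }
}
```

Keeping readAll separate from download means the stream-reading part can be tried on any InputStream, for example a ByteArrayInputStream, without touching the network.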

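2. Java IO streams.

Another annoying topic that comes back to bother me a few times every year.

1. Input and output streams

2. Abstract classes and wrapper classes

3. Which side reads and which side writes

Get familiar with these. Once these questions are clear, that is generally enough. Annoying stuff.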
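
The three points above can be seen together in one small sketch. InputStream and OutputStream are the abstract byte-stream classes; InputStreamReader and OutputStreamWriter wrap them to convert between bytes and characters; BufferedReader and BufferedWriter wrap those again to add buffering and line-based methods. Reading pulls data from a source into the program, writing pushes data from the program to a destination. The class name StreamDemo and the method copyLines are made up for this illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the name is an assumption for this sketch.
class StreamDemo {

    // Copies text line by line from in to out and returns the number of lines.
    // Both streams are closed when the method returns.
    static int copyLines(InputStream in, OutputStream out) throws IOException {
        int count = 0;
        try (BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8));
             BufferedWriter writer = new BufferedWriter(
                     new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) { // read: source -> program
                writer.write(line);                      // write: program -> destination
                writer.write('\n');
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        InputStream source =
                new ByteArrayInputStream("first\nsecond\n".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        int lines = copyLines(source, sink);
        System.out.println(lines + " lines copied");
        System.out.print(new String(sink.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

The in-memory streams stand in for a real source and destination; the same copyLines call would accept the InputStream of an HTTP connection and a FileOutputStream.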

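3. Stashing the code here for now. Pushing to GitHub from Eclipse is always a hassle; why can't it be as simple as it is with C?

This part first gets the URL and then picks out what I want by id and class.

Dumping a few key lines of code. I'll come back at 4 o'clock to write more.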

package hdfs;

import java.io.File;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Testjsoup {

    public static void main(String[] args) throws IOException {
        // Locally saved copy of the followers page, parsed in the block below
        File input = new File("./copy.html");

        String url = "https://www.zhihu.com/people/liu-cheng-51-70/followers?page=1";
        Document doc1 = Jsoup.connect(url).get();
        // System.out.println(doc1.text());

        // The button of the current page in the pagination bar
        Element e = doc1.select("[class=Button PaginationButton PaginationButton--current Button--plain]").first();
        // System.out.println(e.text());

        // for (int i = 1; i    (the rest of this commented-out loop header is cut off in the source)
        {
            Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");
            Elements links = doc.select("[class=List-item]");
            for (Element l : links) {
                Elements e2 = l.select("[class=UserLink-link]");
                String href = e2.attr("href");
                System.out.println("href:" + href + "\n"); // URL of the follower
                href = "http:" + href;
                message(href);
            }
        }
    }

    public static void message(String href) throws IOException {
        Document doc1 = Jsoup.connect(href).get();

        /* Follower's name */
        Element name = doc1.select("[class=ProfileHeader-name]").first();
        System.out.println(name.text());

        /* Follower's gender */
        String e3 = doc1.select("meta[itemprop=gender]").get(0).attr("content");
        System.out.println(e3);

        /* Follower's last active time */
        Element el = doc1.getElementsByClass("List-item").first();
        Element e2 = el.select("[class=ActivityItem-meta]").first();
        Element e4 = e2.select("span").last();
        System.out.println(e4.text());

        /* Follower's occupation */
        // Lookups by class name can return several elements; take the first
        Element e = doc1.getElementsByClass("ProfileHeader-contentBody").first();
        Element career = e.select("[class=ProfileHeader-infoItem]").first();
        try {
            String c = career.text();
            // Keep only the text before the first space; use the whole text if there is none
            String ret = c;
            if (c.indexOf(" ") != -1) {
                ret = c.substring(0, c.indexOf(" "));
            }
            System.out.println(ret); // occupation / workplace
        } catch (Exception ex) {
            // career is null when the profile shows no occupation
            System.out.println("unemployed");
        }
    }
}
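
The listing above builds each follower URL by prepending "http:" to the href, which only works while every href is protocol-relative (starting with "//"). A hedged alternative, sketched below with the standard java.net.URI class, resolves the href against the URL of the page it was found on, so relative and protocol-relative links both come out absolute. The class name FollowerUrls, its method names, and the profile id "some-user" are made up for illustration. Note that the resolved URL takes the scheme of the page (https here) rather than the "http:" used in the listing.

```java
import java.net.URI;

// Hypothetical helper class; the name is an assumption for this sketch.
class FollowerUrls {

    // Followers page of the profile used in the listing; only the page number varies.
    static String pageUrl(int page) {
        return "https://www.zhihu.com/people/liu-cheng-51-70/followers?page=" + page;
    }

    // Resolves an href found on pageUrl into an absolute URL.
    static String resolveHref(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String page = pageUrl(1);
        // "some-user" is a made-up profile id
        System.out.println(resolveHref(page, "//www.zhihu.com/people/some-user"));
        System.out.println(resolveHref(page, "/people/some-user"));
    }
}
```

Both calls in main print https://www.zhihu.com/people/some-user. A pageUrl helper like this is also one way to drive a loop over follower pages, with the last page number supplied by the caller.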
