How to crawl a page whose HTML has no content: the URL does not return the correct HTML (in my Java crawler)

Latest recommended article published on 2022-11-20 00:36:41

weixin_39897015


I want to download some images from a web page, so I am writing a crawler. I tried several crawlers on this page, but none of them worked the way I wanted.

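As a first step, I collected the links to 770+ camera models (parent_url), and then I want to collect the images behind each link (child_urls). However, the page is organized in such a way that the child_urls return the same HTML as the parent_url.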

Here is my code for collecting the camera links:

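import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.jsoup.select.Selector.SelectorParseException;

// Fetches url, selects the elements matching the CSS query exp, and returns the value of attribute atr for each match.
public List<String> html_compiler(String url, String exp, String atr) {
    List<String> outs = new ArrayList<>();
    try {
        Document doc = Jsoup.connect(url).get();
        Elements links = doc.select(exp);
        for (Element link : links) {
            outs.add(link.attr(atr));
            System.out.println("\nlink : " + link.attr(atr));
        }
    } catch (IOException | SelectorParseException e) {
        e.printStackTrace();
    }
    return outs;
}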

With this code, I collect the links like this:

String expCam = "tr[class='gallery cameras'] > td[class='title'] > a[href]";
String url = "https://www.dpreview.com/sample-galleries?category=cameras";
String atr = "href";

// Gives the links of the individual cameras.
List<String> cams = html_compiler(url, expCam, atr);

String exp2 = "some expression";

// Should give the image links of the first camera, but the web page returns the same HTML as above.
html_compiler(cams.get(0), exp2, "src");
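Before changing any selectors, it can help to confirm that the two requests really do produce identical HTML. The following is a minimal sketch, assuming both page bodies are already available as strings (for example from Jsoup's `doc.outerHtml()`). The class name `HtmlFingerprint` and the sample strings are hypothetical and not part of the original code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class HtmlFingerprint {

    // Returns the SHA-256 digest of the HTML as a lowercase hex string.
    static String sha256(String html) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(html.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // True when both pages have exactly the same content.
    static boolean sameContent(String parentHtml, String childHtml) {
        return sha256(parentHtml).equals(sha256(childHtml));
    }

    public static void main(String[] args) {
        // Invented sample bodies; in the crawler these would be the fetched pages.
        String parent = "<html><body>gallery list</body></html>";
        String child = "<html><body>gallery list</body></html>";
        System.out.println("identical: " + sameContent(parent, child));
    }
}
```

Logging the fingerprint next to each fetched URL makes it easy to spot which of the 770+ camera links come back with the parent page's content.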

How can I solve this problem? I would also like to hear about other sites that categorize images by camera model (other than Flickr).

Edit: for example, in Java the following two links return the same HTML.

2016-08-16

smttsp

Could you try using the 'abs:href' attr? [See the example here](http://stackoverflow.com/a/14205979/1992780).
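For context on this suggestion: in Jsoup, prefixing an attribute name with `abs:` resolves the attribute value against the document's base URI. The sketch below reproduces that kind of resolution with `java.net.URI` from the JDK so that it runs without Jsoup. The class name `AbsoluteLinks` and the relative path `/sample-galleries/1234/example-camera` are invented for illustration.

```java
import java.net.URI;

class AbsoluteLinks {

    // Resolves href against baseUrl; an href that is already absolute is returned unchanged.
    // URI.create throws IllegalArgumentException if either string is not a valid URI.
    static String toAbsolute(String baseUrl, String href) {
        return URI.create(baseUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "https://www.dpreview.com/sample-galleries?category=cameras";
        // Hypothetical relative link, as it might appear in an href attribute.
        System.out.println(toAbsolute(base, "/sample-galleries/1234/example-camera"));
    }
}
```

If the collected links are already absolute, as the reply below indicates, this step changes nothing and the cause lies elsewhere.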

@DavidePastore Both return the same result; I don't think this is about absolute links.

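The second link seems to trigger some JavaScript in the browser that loads the images. Try opening both links with the browser's debugging tools (Ctrl + Shift + Q in Firefox). You have to find out how the image links are created somewhere in the page source.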
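If the image URLs do turn out to be embedded in the raw page source (for instance inside an inline script) rather than in `<img>` tags, one rough way to pull them out is a regular expression over the whole source text. This is only a sketch under that assumption: the class name `ImageUrlScanner`, the sample HTML, and the `img.example.com` URLs are invented, and the pattern will miss URLs that are JSON-escaped (`https:\/\/...`) or assembled at runtime by the script.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImageUrlScanner {

    // Matches http(s) URLs ending in a common image extension.
    private static final Pattern IMAGE_URL = Pattern.compile(
            "https?://[^\\s\"'<>\\\\]+?\\.(?:jpe?g|png|gif)\\b",
            Pattern.CASE_INSENSITIVE);

    // Returns the distinct image URLs found anywhere in the source, in order of first appearance.
    static List<String> findImageUrls(String pageSource) {
        Set<String> found = new LinkedHashSet<>();
        Matcher m = IMAGE_URL.matcher(pageSource);
        while (m.find()) {
            found.add(m.group());
        }
        return new ArrayList<>(found);
    }

    public static void main(String[] args) {
        // Invented sample: image URLs sit inside a script block, not only in <img> tags.
        String source = "<script>var gallery = {\"images\":["
                + "{\"url\":\"https://img.example.com/a/1.jpg\"},"
                + "{\"url\":\"https://img.example.com/a/2.jpg\"}]};</script>"
                + "<img src=\"https://img.example.com/a/1.jpg\">";
        for (String url : findImageUrls(source)) {
            System.out.println(url);
        }
    }
}
```

If the browser's network panel instead shows the images arriving through a separate request, that request's URL is the one the crawler would need to fetch, and this scan of the page source will not find anything.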
