pup: An HTML Processor for the Terminal


幸竹任



Article link: https://blog.csdn.net/gitblog_00016/article/details/138699227


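pup is a command-line tool for processing HTML. It reads from standard input, prints to standard output, and lets you select parts of a page with CSS selectors. Inspired by jq, pup aims to offer a fast and flexible way to explore HTML from the terminal.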

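You can download a prebuilt binary from pup's latest release page. If you already have Go installed, run:

go get github.com/ericchiang/pup

Since Go 1.18, go get no longer builds and installs commands, so on current Go versions the equivalent is:

go install github.com/ericchiang/pup@latest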

On OS X, you can install pup with Homebrew, without needing a Go toolchain:

brew install https://raw.githubusercontent.com/EricChiang/pup/master/pup.rb

Let's look at a few examples of how pup works:

$ curl -s https://news.ycombinator.com/ | pup 'table table tr:nth-last-of-type(n+2) td.title a'

The command above selects the story title links on the Hacker News front page.

$ curl -s https://news.ycombinator.com/ | pup 'table table tr:nth-last-of-type(n+2) td.title a attr{href}'

This command extracts the URLs (the href attributes) of those story links.

$ curl -s https://news.ycombinator.com/ | pup 'table table tr:nth-last-of-type(n+2) td.title a json{}'

Finally, this command outputs the link text together with each link's attributes as JSON.

$ cat index.html | pup [flags] '[selectors] [display function]'

In short: pipe in your HTML, then give pup any flags followed by a CSS selector and an optional display function.

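pup supports a wide range of CSS selectors, including tag names, IDs, classes, attributes and pseudo-classes. It also provides the display functions text{}, attr{} and json{}, which print, respectively, the text of the selected elements, the value of a named attribute (for example attr{href}), or a JSON representation of the elements.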

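All in all, whether you are a developer or a data analyst, pup is a handy tool for exploring and processing HTML and well worth adding to your toolbox. Give it a try!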