Linux: Fetch Web Page Content in the Terminal (curl, lynx, wget, w3m) — Bash: Display Web Page Content In Terminal

How can I fetch the HTML content of a web page from bash and display it on screen using shell utilities?

You can use any one of the following tools, or a combination of them, to get the contents of a web page in a shell:

[1] curl command - It is a tool to transfer data from or to a server using HTTP/HTTPS/FTP and many other protocols.

[2] lynx command - It is a fully-featured World Wide Web (WWW) client/browser for users running terminals.

[3] wget command - It is a free utility for non-interactive download of files from the Web. It supports HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies.

[4] w3m command - It is a text-based Web browser and pager.

Installation

The above tools may not be installed on your Linux or Unix-like operating system.

Note: You need to log in as the root user (or use sudo) to install the required tools.

Debian / Ubuntu Linux install curl, wget, lynx, and w3m

Open a terminal and then type:

$ sudo apt-get install curl wget lynx w3m

Fedora / RHEL / CentOS Linux install curl, wget, lynx, and w3m

Open a terminal and then type:

$ sudo yum install curl wget lynx w3m

FreeBSD Unix install curl, wget, lynx, and w3m (binary package)

Open a terminal and then type:

$ sudo pkg_add -v -r curl lynx w3m wget
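After installing, you can confirm that each tool is actually available on your PATH. A minimal sketch using the standard command -v shell builtin (the loop and messages are only illustrative):

## check which of the four tools are installed ##
for tool in curl wget lynx w3m
do
    if command -v "$tool" >/dev/null 2>&1
    then
        echo "$tool: installed"
    else
        echo "$tool: NOT installed"
    fi
done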

Examples

You can use the curl command to download the page:

curl http://www.cyberciti.biz/

curl http://www.cyberciti.biz/faq/bash-for-loop/

Use curl and store the output in a variable as follows:

page="$(curl http://www.cyberciti.biz/)"

page="$(curl http://www.cyberciti.biz/faq/bash-for-loop/)"

echo "$page"

printf "%s" $page

lynx command examples

Use the lynx command as follows:

lynx -dump www.cyberciti.biz

lynx -dump www.cyberciti.biz/faq/bash-for-loop/

The -dump option dumps the formatted output of the default document or those specified on the command line to standard output. Unlike interactive mode, all documents are processed.
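Two related lynx switches are handy alongside -dump (both are standard lynx options):

## print only the numbered list of links found on the page ##
lynx -dump -listonly http://www.cyberciti.biz/

## dump the rendered text without the trailing link list ##
lynx -dump -nolist http://www.cyberciti.biz/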

wget command examples

The syntax is as follows:

wget -O - http://www.cyberciti.biz

wget -O - http://www.cyberciti.biz/faq/bash-for-loop/

OR use the wget command to grab the page and store it in a variable called page:

page="$(wget -O - http://www.cyberciti.biz)"

## display page ##

echo "$page"

## or pass it to lynx / w3m ##

echo "$page" | w3m -dump -T text/html

echo "$page" | lynx -dump -stdin

w3m command examples

The syntax to dump web page content in the terminal using the w3m command is as follows:

w3m -dump http://www.cyberciti.biz/

w3m -dump http://www.cyberciti.biz/faq/bash-for-loop/

OR use the w3m command to grab the page and store it in a variable called page:

page="$(w3m -dump http://www.cyberciti.biz/)"

echo "$page"

Practical examples

Get the definition of linux from a dictionary:

$ curl dict://dict.org/d:linux

Sample outputs:

220 pan.alephnull.com dictd 1.12.0/rf on Linux 3.0.0-14-server <21853866.27331.1375614736@pan.alephnull.com>

250 ok

150 1 definitions retrieved

151 "linux" wn "WordNet (r) 3.0 (2006)"

Linux

n 1: an open-source version of the UNIX operating system

.

250 ok [d/m/c = 1/0/30; 0.000r 0.000u 0.000s]

221 bye [d/m/c = 0/0/0; 0.000r 0.000u 0.000s]
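The numeric lines (220, 250, 150, 151, 221) are DICT protocol status replies, not part of the definition itself. A minimal sketch that filters them out, assuming definition lines never begin with three digits:

$ curl -s dict://dict.org/d:linux | grep -v '^[0-9][0-9][0-9]'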

Back up your del.icio.us bookmarks:

$ wget --user=Your-Username-Here --password=Your-Password-Here https://api.del.icio.us/v1/posts/all -O my-old-bookmarks.xml

$ more my-old-bookmarks.xml
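Passing --password on the command line stores the password in your shell history; wget's standard --ask-password switch prompts for it interactively instead:

$ wget --user=Your-Username-Here --ask-password https://api.del.icio.us/v1/posts/all -O my-old-bookmarks.xml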

Grab all .mp3 files from a URL:

mp3=$(lynx -dump http://server1.cyberciti.biz/media/index.html | grep 'http://' | awk '/mp3/{print $2}')

for i in $mp3
do
    wget "$i"
done
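A slightly more careful variant reads the URLs one per line in a while loop, so word splitting cannot break them apart. It uses the same example URL and relies only on standard lynx, awk, and wget behavior:

## extract .mp3 links and download them one by one ##
lynx -dump -listonly http://server1.cyberciti.biz/media/index.html |
awk '/http:.*\.mp3/{print $2}' |
while read -r url
do
    wget "$url"
done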

See also

See the man pages for more information: curl(1), w3m(1), lynx(1), wget(1).
