Scraping the Baidu and Youdao dictionaries with shell

GNU bash, version 4.2.25(1)-release (x86_64-pc-linux-gnu)

Baidu dictionary:

#!/bin/sh

# usage: ./dict_baidu.sh word

tmpfile=/tmp/dict_baidu.txt
curl -s "http://dict.baidu.com/s?wd=$1" > "$tmpfile"

# an empty file means curl fetched nothing
if [ ! -s "$tmpfile" ]; then
	echo "error: curl failed" >&2
	exit 1
fi

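# phonetic line: pad [ and ] with spaces so cut can pick out the fields
grep "<strong>$1" "$tmpfile" | tr ' ' '#' | sed -e 's/\[/ [/g' -e 's/\]/] /g' | cut -d ' ' -f 2,4,6,8
# definitions: take the quoted "explain:" string and split it on <br />
grep "explain:" "$tmpfile" | cut -d '"' -f 2 | sed 's/<br \/>/\n/g'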
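The definition-parsing step of the Baidu script can be tried offline by wrapping it in a function that reads page source on stdin. This is a minimal sketch: baidu_explain is a hypothetical helper name, the sample line is made up in the `explain: "...<br />..."` layout the original script assumes (the live page may no longer look like this), and `\n` in the sed replacement relies on GNU sed.

```shell
#!/bin/sh
# Hypothetical helper: prints the definitions from the "explain:" line of a
# Baidu page read on stdin, one per line. Assumes the page contains a line
# like:  explain: "def1<br />def2"
baidu_explain() {
	# keep the explain line, take the text between the first pair of
	# double quotes, then turn each <br /> into a newline (GNU sed)
	grep "explain:" | cut -d '"' -f 2 | sed 's/<br \/>/\n/g'
}

# Offline demo with a made-up line in the assumed format
printf '%s\n' 'explain: "n. apple<br />n. apple tree",' | baidu_explain
```

Reading from stdin instead of a fixed /tmp file also lets the same function be fed straight from `curl -s ... |`.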

Youdao dictionary:

#!/bin/sh

# usage: ./dict_youdao.sh word

tmpf=/tmp/dict_youdao.txt

# fetch the page source
curl -s "http://dict.youdao.com/search?q=$1" > "$tmpf"

# an empty file means curl fetched nothing
if [ ! -s "$tmpf" ]; then
	echo "error: curl failed" >&2
	exit 1
fi

# find the start/end line numbers of the translation area in the page source
div_start_line=$(grep -n '<div id="phrsListTab" class="trans-wrapper clearfix">' "$tmpf" | cut -d ':' -f1)
div_end_line=$(grep -n '<div id="webTrans" class="trans-wrapper trans-tab">' "$tmpf" | cut -d ':' -f1)

# cut the irrelevant parts out of the page source; the '\' before $d is needed
# because inside double quotes the shell would otherwise expand $d as a variable
sed -e "1,${div_start_line:=1}d" -e "${div_end_line:=1},\$d" -i "$tmpf"

# show the result
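# phonetic symbols: the [...] text between > and <
grep -o ">\[.*\]<" "$tmpf" | sed -r 's/>|<//g' | xargs echo "br/us: "
# definitions: strip the <li> and </li> tags
grep -o "<li>.*</li>" "$tmpf" | sed 's#</\{0,1\}li>##g'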
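Instead of computing line numbers with grep -n and deleting around them with sed -i, sed's own /start/,/end/ address range can select the translation area directly, which avoids rewriting the temp file and the `:=1` fallback that deletes the whole file when the end marker is missing. This is a swapped-in technique, not the original script's method; youdao_defs is a hypothetical helper, and the sample HTML is made up to match the div markers the original script searches for. Unlike the original, the range keeps the two marker lines themselves, which does not matter here because they contain no `<li>` items.

```shell
#!/bin/sh
# Hypothetical helper: reads Youdao page source on stdin and prints the
# <li> definitions found between the two div markers, one per line.
youdao_defs() {
	# print only lines from the phrsListTab div through the webTrans div
	sed -n '/<div id="phrsListTab"/,/<div id="webTrans"/p' |
		grep -o '<li>.*</li>' |
		sed 's#</\{0,1\}li>##g'   # drop <li> and </li>
}

# Offline demo with made-up HTML in the layout the original script assumes
youdao_defs <<'EOF'
<li>outside the area, ignored</li>
<div id="phrsListTab" class="trans-wrapper clearfix">
<li>n. apple</li>
<li>n. apple tree</li>
<div id="webTrans" class="trans-wrapper trans-tab">
<li>web translation, ignored</li>
EOF
```

In the real script the input would come from `curl -s "http://dict.youdao.com/search?q=$1" | youdao_defs`.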

Result: [screenshot]

