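The day before yesterday a friend told me she needed to batch-download some Flash sound effects. I said that was easy: just use FlashGet's batch download. Since she doesn't have convenient access to a computer, I naturally volunteered to take care of it for her. When I looked at the site, though, FlashGet turned out to be no use: every download link sits at some specific spot on a second-level page, so FlashGet simply cannot do the job. I realized at once that this was more serious, and a lot harder, than I had imagined. So I hurried to post the question in a chat group and asked how to solve this kind of problem. Basically nobody had a good approach. Someone suggested programming, and I nearly fainted: if I were going to write a program, would I need to ask the question at all?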
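So I decided to write a script under Linux myself to do the job. I did not expect this small decision to eat up all my free time for two days: yesterday I was at it until about 2:30 at night, and today I have only just finished. Yesterday I wrote the script; today I finished debugging it and downloaded everything. It is not much data, only a bit over 80 MB in total after RAR compression, but the original consists of several thousand files. If you had to download them by hand, heh heh...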
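This little episode showed me how thin my shell scripting knowledge is, so I need to work harder from now on. Whether or not something is urgent right now, I should make the effort to learn all sorts of related things, with the aim of broadening my horizons rather than getting better pay and the like...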
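Below is the script I wrote. I am posting it here as an example for anyone writing an automatic downloader, so that even if you rewrite it, it should take very little effort...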
#!/bin/bash
#
# lsosa.BIT
# 2006 12.1
# Linux shell script for downloading thousands of files from a specified web site;
# for backing up.
#
# download the mp3 files of specified web sites;
WEB_SITE=http://www.unkown.com/123.shtml
ORIG_WEB_SITE=http://www.unkown.com/mp3/123
ORIG_DIR=http://www.unkown.com/mp3/
ORIG_FILE=123
# echo $1
# WEB_SITE=$1
topdir=$(pwd)
HTMLDIR="${topdir}/html_tmp"
MP3DIR="${topdir}/mp3"
mkdir -p "${HTMLDIR}"
mkdir -p "${MP3DIR}"
loop=0
seperator="_"
# WEB_SITE=""
suffix=".shtml"
tmpfile="tmp.shtml"
# get the next website with index loop
get_next_website(){
# loop++
echo "get the next web site"
loop=$((loop + 1))
#
WEB_SITE=${ORIG_WEB_SITE}${seperator}${loop}${suffix}
echo ${WEB_SITE}
}
# parse a listing line like this:
#
# then make the corresponding directory, enter it, and download the mp3 file into it;
position_line=""
download_mp3(){
mp3file=""
tmpdir=""
mp3dir=""
mp3filefulllink=""
mp3reallink=""
# echo ${position_line} | \
# awk '{split($0, myarray, "\"")} '
# display the line that contains the file name and the directory name, like
#
# split the line on the double-quote character: field 2 is the page file name,
# field 7 holds the text that the directory name is cut from
echo "${position_line}"
mp3file=$(echo "${position_line}" | awk -F'"' '{print $2}')
tmpdir=$(echo "${position_line}" | awk -F'"' '{print $7}')
#
mp3dir=$(echo ${tmpdir} | awk '{print substr($1,5)}')
echo "mp3dir is " ${mp3dir}
#
mp3filefulllink=${ORIG_DIR}${mp3file}
# download the file
wget ${mp3filefulllink}
# convert its encoding from utf8 to gb18030 so that grep can match it
echo "convert " ${mp3file} " to encoding gb18030 file " ${tmpfile}
iconv -f utf8 -t gb18030 "${mp3file}" -o "${tmpfile}"
# get the line that contains the mp3 link, like : <li><a href=
#
# ("镜像" means "mirror")
mp3line=$(grep "mp3" "${tmpfile}" | grep "镜像")
echo "mp3line is " ${mp3line}
# field 2 of the quote-delimited line is the real mp3 address
mp3reallink=$(echo ${mp3line} | awk -F'"' '{print $2}')
echo "The real link address is :"
echo ${mp3reallink}
# change the dir to ${MP3DIR}
cd "${MP3DIR}"
#
mkdir -p "${mp3dir}"
#
cd "${mp3dir}"
echo "downloading the file " ${mp3reallink} " ..."
# download...
wget "${mp3reallink}"
# change current dir back to the html dir;
cd "${HTMLDIR}"
}
# ======================= main start ============================
# deal with the first web site and download all mp3 files from it;
# cd $(pwd)/html
echo "main download thread start..."
cd ${HTMLDIR}
echo "current dir is " ${HTMLDIR}
#
echo "download file " ${WEB_SITE}
wget ${WEB_SITE}
filename=${ORIG_FILE}${suffix}
echo "current filename is " ${filename}
# change its encoding from utf8 to gb18030 for using grep
echo "convert " ${filename} " to encoding gb18030 file " ${tmpfile}
iconv -f utf8 -t gb18030 ${filename} -o ${tmpfile}
echo "grep key words like target=_blank..."
grep 'target="_blank" alt=' "${tmpfile}" | \
while read -r position_line
do
# fetch the second-level page for this line and save its mp3 file;
echo "start to download the mp3 file from position_line"
echo ${position_line}
download_mp3 # call function - download_mp3()
done
# deal with all other files;
while true
do
#
#
get_next_website
cd ${HTMLDIR}
echo "current dir is " ${HTMLDIR}
echo "download file " ${WEB_SITE}
# stop as soon as a listing page cannot be fetched; $? after the
# grep | while pipeline only reflects the last command run inside the loop
if ! wget "${WEB_SITE}"
then
echo "exit..."
exit 1
fi
#
filename=${ORIG_FILE}${seperator}${loop}${suffix}
echo "current filename is " ${filename}
# convert its encoding from utf8 to gb18030 so that grep can match it
echo "convert " ${filename} " to encoding gb18030 file " ${tmpfile}
iconv -f utf8 -t gb18030 "${filename}" -o "${tmpfile}"
echo "grep key words like target=_blank..."
grep 'target="_blank" alt=' "${tmpfile}" | \
while read -r position_line
do
# fetch the second-level page for this line and save its mp3 file;
echo "start to download the mp3 file from position_line"
echo ${position_line}
download_mp3
done
done
num=0
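
The two ideas the script relies on, walking numbered listing pages until one is missing and pulling an attribute value out of a line by splitting on double quotes, can be tried offline with a small sketch. Everything site-specific here is made up: fetch_page is a hypothetical stand-in for wget, and the item_N_x.shtml names and sample lines are invented, not taken from the real site.

```shell
#!/bin/bash
# Offline sketch of the crawl loop. fetch_page is a hypothetical stand-in
# for wget: it prints made-up listing lines for pages 1 and 2 and fails
# for any later page, the way wget fails on a missing page.
fetch_page() {
    [ "$1" -le 2 ] || return 1
    printf '<a href="item_%s_a.shtml" target="_blank" alt="first">\n' "$1"
    printf '<p>unrelated line</p>\n'
    printf '<a href="item_%s_b.shtml" target="_blank" alt="second">\n' "$1"
}

# Print the N-th double-quote-delimited field of a line
# (field 2 is the value of the first attribute).
quoted_field() {
    printf '%s\n' "$2" | awk -F'"' -v n="$1" '{ print $n }'
}

links=""
count=0
page=0
while true; do
    page=$((page + 1))
    # Stop on the fetch's own exit status, not on $? after a pipeline.
    if ! html=$(fetch_page "${page}"); then
        echo "page ${page} not available, stopping"
        break
    fi
    matches=$(printf '%s\n' "${html}" | grep 'target="_blank" alt=')
    # A here-document keeps the loop in the current shell, so the
    # variables it updates are still set after "done".
    while IFS= read -r line; do
        [ -n "${line}" ] || continue
        links="${links}$(quoted_field 2 "${line}") "
        count=$((count + 1))
    done <<EOF
${matches}
EOF
done

echo "collected ${count} links: ${links}"
```

With the stub as written, the loop stops at page 3 and reports four links. To try it against a real site, fetch_page could be replaced by a call such as wget -q -O - URL, assuming the site answers a missing page with an error status so that wget exits non-zero.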