A script that downloads all songs of a certain kind from certain websites

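The day before yesterday a friend told me she needed to batch-download some Flash sound effects. I said that was easy: just use FlashGet's batch download. Since she isn't comfortable with computers, I naturally volunteered to handle it for her. I went to the site and took a look, and sure enough, FlashGet could not do it: every download link sits at one particular spot on a second-level page, so FlashGet simply cannot handle this. I immediately realized how serious the problem was; this was harder than I had imagined. So I quickly posted the question in a chat group, asking how to solve this kind of problem. Basically nobody had a good idea, and someone suggested writing a program. *faint*. In a sense, if I were going to write code anyway, would I have needed to ask?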
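The two-level structure described above (an index page linking to detail pages, and only the detail pages holding the real file link) can be sketched as follows. This is a minimal example, not the script below: it assumes the links are plain `href="..."` attributes, that the detail links are relative (they are naively resolved by prefixing a base URL), and that the real file link ends in `.mp3`. The function names `extract_links` and `crawl_two_levels` are hypothetical.

```bash
#!/bin/bash
# Hypothetical sketch of a two-level crawl: index page -> detail pages -> mp3 links.

# Print every href="..." value found in the HTML read from stdin, one per line.
extract_links() {
    grep -o 'href="[^"]*"' | sed -e 's/^href="//' -e 's/"$//'
}

# Level 1: fetch the index page and list its links (assumed to be relative).
# Level 2: fetch each linked page and print the links that end in ".mp3".
# $1 = index page URL, $2 = base URL prepended to each relative link.
crawl_two_levels() {
    local index_url=$1 base_url=$2 detail
    wget -q -O - "$index_url" | extract_links |
    while read -r detail; do
        wget -q -O - "${base_url}${detail}" | extract_links | grep '\.mp3$'
    done
}
```

For example, `crawl_two_levels http://www.unkown.com/123.shtml http://www.unkown.com/mp3/ > mp3_links.txt` collects the links, and `wget -i mp3_links.txt` then downloads them in one batch.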
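So I decided to write a script under Linux myself. I never expected that this small decision would eat up all my free time for two days: yesterday I worked until about 2:30 a.m., and I only finished just now. Yesterday I wrote the script; today I finished debugging it and downloaded everything. The total isn't big, just over 80 MB once RAR-compressed, but there were several thousand original files. If you had to download them by hand, heh heh heh...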
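After this little episode I know my shell scripting knowledge is far too thin, and I need to work harder. Whether or not something is urgent right now, I should make the effort to learn all kinds of related things, with the goal of broadening my horizons rather than getting better pay and the like...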
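Below is the script I wrote. I'm posting it here as an example for anyone writing an automatic downloader, so that even if you rewrite it, it will take you very little effort...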

#!/bin/bash
#
# lsosa.BIT
# 2006-12-01
# Linux shell script for downloading thousands of files from a specified web site,
# for backing them up.
#
# Download the mp3 files of the specified web site.
WEB_SITE=http://www.unkown.com/123.shtml
ORIG_WEB_SITE=http://www.unkown.com/mp3/123
ORIG_DIR=http://www.unkown.com/mp3/
ORIG_FILE=123
# echo $1
# WEB_SITE=$1

topdir=$(pwd)
HTMLDIR="${topdir}/html_tmp"
MP3DIR="${topdir}/mp3"
mkdir -p "${HTMLDIR}"
mkdir -p "${MP3DIR}"

loop=0
seperator="_"
# WEB_SITE=""
suffix=".shtml"
tmpfile="tmp.shtml"
# Build the URL of the next page from the index ${loop}, e.g. .../mp3/123_1.shtml.
get_next_website(){
    # loop++
    echo "get the next web site"
    loop=$((loop + 1))
    WEB_SITE=${ORIG_WEB_SITE}${seperator}${loop}${suffix}
    echo "${WEB_SITE}"
}
# Analyse one line of the index page (a line matched by the grep in the
# main part), then create the corresponding directory, enter it and
# download the mp3 file into it.
position_line=""
download_mp3(){
    local mp3file tmpdir mp3dir mp3filefulllink mp3line mp3reallink pagefile
    # Use a separate temp file: ${tmpfile} still holds the index page
    # that the grep in the main part is reading.
    local convfile="tmp_page.shtml"
    # Display the line which includes the file name and the directory name.
    echo "${position_line}"
    # Split the line on double quotes: field 2 is the link to the
    # second-level page, field 7 holds the directory name.
    mp3file=$(echo "${position_line}" | awk -F'"' '{print $2}')
    tmpdir=$(echo "${position_line}" | awk -F'"' '{print $7}')
    [ -n "${mp3file}" ] || return 1
    # Drop the first 4 characters of field 7 to get the directory name.
    mp3dir=$(echo "${tmpdir}" | awk '{print substr($1, 5)}')
    echo "mp3dir is ${mp3dir}"
    mp3filefulllink=${ORIG_DIR}${mp3file}
    # Download the second-level page under the base name of its link.
    pagefile=${mp3file##*/}
    wget -O "${pagefile}" "${mp3filefulllink}" || return 1
    # Change its encoding from UTF-8 to GB18030 so that grep can match the
    # keyword "镜像" ("mirror"); presumably this script file is saved in GB18030.
    echo "convert ${pagefile} to encoding gb18030 file ${convfile}"
    iconv -f utf8 -t gb18030 "${pagefile}" -o "${convfile}"
    # Get the line(s) containing both "mp3" and the mirror keyword,
    # then take field 2 (the href) of the first one.
    mp3line=$(grep "mp3" "${convfile}" | grep "镜像")
    echo "mp3line is ${mp3line}"
    mp3reallink=$(echo "${mp3line}" | awk -F'"' 'NR == 1 {print $2}')
    echo "The real link address is: ${mp3reallink}"
    # Create ${MP3DIR}/${mp3dir}, enter it and download the mp3 file there.
    mkdir -p "${MP3DIR}/${mp3dir}"
    cd "${MP3DIR}/${mp3dir}" || return 1
    echo "downloading the file ${mp3reallink} ..."
    wget "${mp3reallink}"
    # Change the current dir back to the HTML dir.
    cd "${HTMLDIR}" || exit 1
}
# ======================= main start ============================
# Deal with the first page and download all mp3 files linked from it.
echo "main download thread start..."
cd "${HTMLDIR}" || exit 1
echo "current dir is ${HTMLDIR}"
echo "download file ${WEB_SITE}"
wget "${WEB_SITE}"
filename=${ORIG_FILE}${suffix}
echo "current filename is ${filename}"
# Change its encoding from UTF-8 to GB18030 before using grep.
echo "convert ${filename} to encoding gb18030 file ${tmpfile}"
iconv -f utf8 -t gb18030 "${filename}" -o "${tmpfile}"
echo "grep key words like target=_blank..."
grep 'target="_blank" alt=' "${tmpfile}" |
while read -r position_line
do
    # Follow the link to the second-level page and save the mp3 it points to.
    echo "start to download the mp3 file from position_line"
    echo "${position_line}"
    download_mp3        # call function download_mp3()
done

# Deal with all the following pages (123_1.shtml, 123_2.shtml, ...).
while true
do
    get_next_website
    cd "${HTMLDIR}" || exit 1
    echo "current dir is ${HTMLDIR}"
    echo "download file ${WEB_SITE}"
    # Stop at the first page that cannot be downloaded
    # (wget exits non-zero, e.g. on HTTP 404).
    if ! wget "${WEB_SITE}"
    then
        echo "exit..."
        exit 0
    fi
    filename=${ORIG_FILE}${seperator}${loop}${suffix}
    echo "current filename is ${filename}"
    # Change its encoding from UTF-8 to GB18030 before using grep.
    echo "convert ${filename} to encoding gb18030 file ${tmpfile}"
    iconv -f utf8 -t gb18030 "${filename}" -o "${tmpfile}"
    echo "grep key words like target=_blank..."
    grep 'target="_blank" alt=' "${tmpfile}" |
    while read -r position_line
    do
        # Follow the link to the second-level page and save the mp3 it points to.
        echo "start to download the mp3 file from position_line"
        echo "${position_line}"
        download_mp3
    done
done
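The paging logic of the script above (first `123.shtml`, then `123_1.shtml`, `123_2.shtml`, ... until a download fails) can be pulled out into a small function so that the stop condition can be tried without touching the network. A minimal sketch: `walk_pages` and its pluggable fetch command are hypothetical, and it assumes the site answers a missing page with an HTTP error, which makes `wget` exit non-zero.

```bash
#!/bin/bash
# Hypothetical sketch: fetch <base>.shtml, <base>_1.shtml, <base>_2.shtml, ...
# and stop at the first page the fetch command fails on.
# $1 = base URL without suffix; $2 = fetch command (default: wget -q).
# Prints the number of pages fetched successfully.
walk_pages() {
    local base=$1 n=0 url
    local fetch="${2:-wget -q}"
    url="${base}.shtml"
    # $fetch is left unquoted on purpose so "wget -q" splits into command + option.
    while $fetch "$url"; do
        n=$((n + 1))
        url="${base}_${n}.shtml"
    done
    echo "$n"
}
```

If the site instead returns an ordinary page for out-of-range indices, this loop never ends, so a maximum page count would be a sensible extra guard.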
