A script to download all songs of a certain kind from certain websites

baymoon

Article link: https://blog.csdn.net/baymoon/article/details/1425786


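The day before yesterday a friend told me she needed to batch-download some Flash sound effects. I said that was easy: just use FlashGet's batch download. Since using a computer was inconvenient for her, I naturally volunteered to take care of it. When I looked at the site, though, FlashGet really could not do the job: every download link sits at one specific spot on a second-level page, so FlashGet's batch mode cannot reach them. I immediately realized this was harder than I had imagined. I quickly posted the question in a chat group to ask how this kind of problem is usually solved, and basically nobody had a good answer. Someone suggested programming. I nearly fainted: in a sense, if I were going to write code, would I need to ask?
So I decided to write a script on Linux to do the job myself. I did not expect this small decision to eat all my free time for two days: yesterday I worked until about 2:30 at night, and today I only just finished. Yesterday I wrote the script; today I finished debugging it and downloaded everything. It is not much data, only a little over 80 MB after RAR compression, but the original consists of several thousand files. If you had to download them by hand, heh heh...
This little episode showed me how thin my shell scripting knowledge is. From now on I need to work harder and study all kinds of related things, whether or not they are urgent right now, with the aim of broadening my horizons rather than getting a better-paid job.
Below is the script I wrote. I am posting it here as an example for anyone writing an automatic downloader, so that even if you rewrite it, it should take very little effort.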
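
The core trick in the script is pulling a link target out of an HTML line by splitting the line on the double-quote character. Here is a minimal sketch of that one step, written with `awk -F` instead of `split`. The helper name `quoted_field` and the sample line are made up for illustration; the real site's markup is not shown in this post.

```shell
#!/bin/bash
# Print the n-th field of a line after splitting it on double quotes.
# With typical markup, field 2 is the value of the first quoted attribute.
# (quoted_field is a hypothetical helper, not part of the original script.)
quoted_field() {
    printf '%s\n' "$1" | awk -F'"' -v n="$2" '{ print $n }'
}

# Hypothetical sample line, assumed to resemble a link on an index page.
sample='<li><a href="song_01.shtml" target="_blank" alt="demo">demo</a></li>'

quoted_field "${sample}" 2   # the href value
quoted_field "${sample}" 4   # the target value
```

Which field number holds which value depends entirely on the page's markup, so the numbers used in the script (2 and 7) have to be checked against the real HTML.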

#!/bin/bash
#
# lsosa.BIT
# 2006 12.1
# Linux shell script for downloading thousands of files from a specified web site;
# for backing up.
#
# download the mp3 files of the specified web site;
WEB_SITE=http://www.unkown.com/123.shtml
ORIG_WEB_SITE=http://www.unkown.com/mp3/123
ORIG_DIR=http://www.unkown.com/mp3/
ORIG_FILE=123
# echo $1
# WEB_SITE=$1

topdir=$(pwd)
HTMLDIR="${topdir}/html_tmp"
MP3DIR="${topdir}/mp3"
mkdir -p "${HTMLDIR}"
mkdir -p "${MP3DIR}"

loop=0
seperator="_"
# WEB_SITE=""
suffix=".shtml"
tmpfile="tmp.shtml"
# build the URL of the next index page from the counter "loop"
get_next_website(){
    echo "get the next web site"
    # loop++
    loop=$((loop + 1))
    # gives <ORIG_WEB_SITE>_1.shtml, <ORIG_WEB_SITE>_2.shtml, ...
    WEB_SITE=${ORIG_WEB_SITE}${seperator}${loop}${suffix}
    echo "${WEB_SITE}"
}
# analyze the string like this:
#
# then make the corresponding directory, enter it, and download the mp3 file into it;
position_line=""
download_mp3(){
    mp3file=""
    tmpdir=""
    mp3dir=""
    mp3filefulllink=""
    mp3reallink=""
    # display the line which includes the file name and the name of the dir
    echo "${position_line}"
    # split the line on the double-quote character:
    # field 2 is the second-level page, field 7 carries the directory name
    mp3file=$(echo "${position_line}" | awk '{split($0, myarray, "\""); print myarray[2]}')
    tmpdir=$(echo "${position_line}" | awk '{split($0, myarray, "\""); print myarray[7]}')
    # drop the first four characters to get the directory name
    mp3dir=$(echo "${tmpdir}" | awk '{print substr($1,5)}')
    echo "mp3dir is ${mp3dir}"
    mp3filefulllink=${ORIG_DIR}${mp3file}
    # download the second-level page
    wget "${mp3filefulllink}"
    # change its encoding from utf8 to gb18030 for using grep
    echo "convert ${mp3file} to encoding gb18030 file ${tmpfile}"
    iconv -f UTF-8 -t GB18030 -o "${tmpfile}" "${mp3file}"
    # get the line which includes the mp3 file, like: <li><a href=
    # ("镜像" is the word "mirror" as it appears on the page)
    mp3line=$(grep "mp3" "${tmpfile}" | grep "镜像")
    echo "mp3line is ${mp3line}"
    # take the link from the first matching line only
    mp3reallink=$(echo "${mp3line}" | awk 'NR == 1 {split($0, myarray, "\""); print myarray[2]}')
    echo "The real link address is :"
    echo "${mp3reallink}"
    # change the dir to $(pwd)/mp3, then into the per-category directory
    cd "${MP3DIR}"
    mkdir -p "${mp3dir}"
    cd "${mp3dir}"
    echo "downloading the file ${mp3reallink} ..."
    # download...
    wget "${mp3reallink}"
    # change current dir back to the html dir;
    cd "${HTMLDIR}"
}
# ======================= main start ============================
# deal with the first web site and download all mp3 files from it;
# cd $(pwd)/html
echo "main download thread start..."
cd "${HTMLDIR}"
echo "current dir is ${HTMLDIR}"
#
echo "download file ${WEB_SITE}"
wget "${WEB_SITE}"
filename=${ORIG_FILE}${suffix}
echo "current filename is ${filename}"
# change its encoding from utf8 to gb18030 for using grep
echo "convert ${filename} to encoding gb18030 file ${tmpfile}"
iconv -f UTF-8 -t GB18030 -o "${tmpfile}" "${filename}"
echo "grep key words like target=_blank..."
grep 'target="_blank" alt=' "${tmpfile}" |
while read -r position_line
do
    # get the second-level page and save it;
    echo "start to download the mp3 file from position_line"
    echo "${position_line}"
    download_mp3        # call function - download_mp3()
done

# deal with all other files;
while true
do
    get_next_website
    cd "${HTMLDIR}"
    echo "current dir is ${HTMLDIR}"
    echo "download file ${WEB_SITE}"
    # stop when the next index page cannot be fetched (no more pages);
    # testing $? after the grep | while pipeline would only see the loop's status
    if ! wget "${WEB_SITE}"
    then
        echo "exit..."
        exit 1
    fi
    filename=${ORIG_FILE}${seperator}${loop}${suffix}
    echo "current filename is ${filename}"
    # change its encoding from utf8 to gb18030 for using grep
    echo "convert ${filename} to encoding gb18030 file ${tmpfile}"
    iconv -f UTF-8 -t GB18030 -o "${tmpfile}" "${filename}"
    echo "grep key words like target=_blank..."
    grep 'target="_blank" alt=' "${tmpfile}" |
    while read -r position_line
    do
        # get the second-level page and save it;
        echo "start to download the mp3 file from position_line"
        echo "${position_line}"
        download_mp3
    done
done

num=0
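
One thing worth knowing about a `grep ... | while read` pipeline like the one in this script: after it finishes, `$?` holds only the status of the last stage (the `while` loop), so it does not tell you whether `grep` found anything. Here is a small sketch of how bash's `PIPESTATUS` array exposes the status of every stage. The function name `scan_page` and the sample inputs are made up for illustration.

```shell
#!/bin/bash
# Run a three-stage pipeline over some text and report per-stage exit codes.
# (scan_page is a hypothetical helper, not part of the original script.)
scan_page() {
    printf '%s\n' "$1" | grep 'target="_blank"' | while read -r line
    do
        :   # a real script would process ${line} here
    done
    # PIPESTATUS must be copied right away; the next command overwrites it.
    statuses=("${PIPESTATUS[@]}")
    echo "grep=${statuses[1]} loop=${statuses[2]}"
}

scan_page 'no links on this page'
scan_page '<a href="x.shtml" target="_blank" alt="x">x</a>'
```

When `grep` matches nothing it exits with 1, yet the loop still reports 0 because its body never ran. A stop condition therefore has to look at `grep` (or at `wget`) directly rather than at `$?` after the pipeline.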