Downloading mp3 files on a daily schedule and moving yesterday's old files

luotuo512

Tags: url, download, regular expressions, command, import, exe

Article link: https://blog.csdn.net/luotuo512/article/details/4517229

Goal: find the day's VOA Special English mp3 files on a web page and download them.
Key points:
First get today's date, because it is part of the mp3 URL. An example URL is http://njtelecom.unsv.com/archives/voanews/specialenglish/2009/09/03/0041/%7B1d33143e-ccc8-41a1-8e02-d9d891f88b0b%7D/special200909030041.mp3. However, this exact URL does not appear in the page source, so the links are matched with a regular expression instead. The pattern is
url='http://njtelecom.unsv.com/archives/voanews/specialenglish/'+time.strftime('%Y/%m/%d',time.localtime())+r'/\d{4}/\{.{30,50}\}/\w*'+time.strftime('%Y%m%d',time.localtime())+r'\d{4}.mp3'.
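
To see what this pattern accepts, here is a small sketch that builds it for a fixed date and tries it on the example URL above. It is written for Python 3 (the script in this post is Python 2), and `BASE` and `build_pattern` are names made up for this illustration. It assumes the page source carries the braces literally rather than as %7B and %7D, which would be one explanation for why the percent-encoded URL cannot be found as-is.

```python
import re
import time
from urllib.parse import unquote

BASE = 'http://njtelecom.unsv.com/archives/voanews/specialenglish/'

def build_pattern(day):
    """Compile the post's URL pattern for one day (a time.struct_time)."""
    return re.compile(
        BASE
        + time.strftime('%Y/%m/%d', day)   # dated directory, e.g. 2009/09/03
        + r'/\d{4}/\{.{30,50}\}/\w*'       # 4-digit id, {GUID}, file-name prefix
        + time.strftime('%Y%m%d', day)     # date repeated inside the file name
        + r'\d{4}.mp3'                     # 4-digit id again, then the extension
    )

day = time.strptime('2009-09-03', '%Y-%m-%d')
pattern = build_pattern(day)

encoded = (BASE + '2009/09/03/0041/'
           '%7B1d33143e-ccc8-41a1-8e02-d9d891f88b0b%7D/special200909030041.mp3')
literal = unquote(encoded)  # turns %7B and %7D back into { and }

print(pattern.findall(encoded))  # no match: the pattern expects literal braces
print(pattern.findall(literal))  # one match: the whole URL
```

Because the pattern contains no capture groups, `findall` returns the full matched URLs, which is what the download step needs.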
Code:

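# Goal: find the day's VOA Special English mp3 files on a web page and download them.
# Analysis:
# First get today's date, because it is part of the mp3 URL.
# Note: this script targets Python 2 (urllib2, print statements).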
import urllib2,time,re
import os
from os import listdir
from os.path import isdir

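def test(url):
    # The last 23 characters of the URL are the file name,
    # e.g. special200909030041.mp3
    filename = url[-23:]
    print 'Begin download %s' % filename
    data = urllib2.urlopen(url)
    f = open('e:/xdj/voa/new/%s' % filename, 'wb')
    f.write(data.read())
    f.close()
    print 'Download OK!'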
def downloadnew():
    n = 0
    now = time.localtime()  # read the clock once so both date strings agree
    # Regex for today's links:
    # .../YYYY/MM/DD/<4 digits>/{<GUID>}/<prefix>YYYYMMDD<4 digits>.mp3
    url = ('http://njtelecom.unsv.com/archives/voanews/specialenglish/'
           + time.strftime('%Y/%m/%d', now)
           + r'/\d{4}/\{.{30,50}\}/\w*'
           + time.strftime('%Y%m%d', now)
           + r'\d{4}.mp3')
    sock = urllib2.urlopen("http://www.unsv.com/learning-english/")
    source = sock.read()
    sock.close()
    namepattern = re.compile(url)
    link = namepattern.findall(source)
    link = list(set(link))  # drop duplicate links
    for i in link:
        test(i)
        n = n + 1
    print 'downloaded', n, 'new files in total'

def moveoldfile():
    k = 0
    source = 'e:\\xdj\\voa\\new'
    target_dir = 'e:\\xdj\\voa'
    filelist = listdir(source)
    print 'move old files:\n', filelist
    for name in filelist:
        # Quote both paths so names containing spaces survive the shell
        srcFilename = '"' + source + '\\' + name + '"'
        desFilename = '"' + target_dir + '"'
        # Windows "move /Y" overwrites an existing target without prompting
        move_command = "move /Y %s %s" % (srcFilename, desFilename)
        if os.system(move_command) == 0:
            k = k + 1
        else:
            print 'Fail to move', srcFilename, 'to', desFilename
    print 'total move', k, 'files'

if __name__ == '__main__':
    # Earlier version of the pattern, left commented out:
    # url='http://njtelecom.unsv.com/archives/voanews/specialenglish/'+time.strftime('%Y/%m/%d',time.localtime())+'/\d{4}/\{[0-9a-zA-Z-]{38}\}7D/special'+time.strftime('%Y%m%d',time.localtime())+'\d{4}.mp3'
    moveoldfile()
    downloadnew()
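
moveoldfile() shells out to the Windows move command, so it only works where that command exists. As a possible alternative, the same step can be sketched with the standard library alone. This is a Python 3 sketch, not the script's actual method: `move_old_files` is a hypothetical name, and skipping sub-directories is an assumption of this example (the original passes every directory entry to move).

```python
import os
import shutil

def move_old_files(source, target_dir):
    """Move every regular file in source into target_dir; return how many moved."""
    moved = 0
    for name in os.listdir(source):
        src = os.path.join(source, name)
        if not os.path.isfile(src):
            continue  # assumption of this sketch: leave sub-directories alone
        dst = os.path.join(target_dir, name)
        if os.path.exists(dst):
            os.remove(dst)  # mimic "move /Y": replace an existing target file
        shutil.move(src, dst)
        moved += 1
    return moved
```

With the directories used in this post, the call would be `move_old_files('e:/xdj/voa/new', 'e:/xdj/voa')`. Removing an existing target first keeps the behaviour the same on Windows and on other systems, since a plain rename onto an existing file fails on Windows.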

Open question: how to get the size of a file so that progress can be shown while it downloads.
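
One common approach, sketched here for Python 3, is to read the Content-Length response header and copy the body in chunks, reporting after each chunk. The helper names (`content_length`, `copy_with_progress`, `print_progress`) are invented for this example. It also assumes the server sends Content-Length at all; some servers omit it, in which case only the running byte count can be shown. In the Python 2 script above, the header should be available as `urllib2.urlopen(url).info().getheader('Content-Length')`.

```python
import io

def content_length(headers):
    """Return the Content-Length header as an int, or None when it is absent."""
    value = headers.get('Content-Length')
    return int(value) if value is not None else None

def copy_with_progress(response, out_file, total_size, report, chunk_size=8192):
    """Copy response to out_file in chunks; call report(done, total) per chunk."""
    done = 0
    while True:
        chunk = response.read(chunk_size)
        if not chunk:
            break  # an empty read means the body is finished
        out_file.write(chunk)
        done += len(chunk)
        report(done, total_size)
    return done

def print_progress(done, total):
    """Print bytes so far, with a percentage when the total is known."""
    if total:
        print('%d / %d bytes (%.0f%%)' % (done, total, 100.0 * done / total))
    else:
        print('%d bytes' % done)

# Usage with a real download (not executed here):
#   import urllib.request
#   with urllib.request.urlopen(url) as resp, open(path, 'wb') as f:
#       copy_with_progress(resp, f, content_length(resp.headers), print_progress)
```

Reading in chunks instead of a single `read()` is what makes progress reporting possible, and it also avoids holding the whole mp3 in memory.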
References:
1. Setting up scheduled tasks on Windows XP: http://www.docin.com/p-18820441.html
2. Using py2exe: http://www.jb51.net/article/9296.htm
3. Moving files: http://www.blogjava.net/daning/archive/2008/01/11/113764.html
4. String formatting: http://www.tsnc.edu.cn/default/tsnc_wgrj/doc/pythonhtml/html/native_data_types/formatting_strings.html
5. Regular expression reference: http://www.regexlab.com/zh/regref.htm
6. Working with dates: http://blog.csdn.net/suiyunonghen/archive/2009/03/18/3999986.aspx, http://blog.alexa-pro.cn/?p=214 and http://python.kgblog.net/2009/08/19/python-date-time.html
7. Downloading files in Python: http://topic.csdn.net/u/20090707/15/f48d1118-fa3a-4b36-bb16-58af47b353ca.html