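- Goal: find the current day's VOA Special English MP3s on a web page and download them.
- Key points:
First get today's date, because the MP3 URL contains it. An example URL is http://njtelecom.unsv.com/archives/voanews/specialenglish/2009/09/03/0041/%7B1d33143e-ccc8-41a1-8e02-d9d891f88b0b%7D/special200909030041.mp3. However, the URL does not appear in the page in exactly this form, so a regular expression is used to match the form that does appear.
- Code: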
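# Goal: find the current day's VOA Special English MP3s on a web page and download them.
# Analysis:
# First get today's date, because the MP3 URL contains it.
import os
import re
import time
import urllib2
from os import listdir


def test(url):
    # Download one MP3 into the "new" folder.
    # The last 23 characters of the URL are the file name,
    # e.g. special200909030041.mp3.
    filename = url[-23:]
    print 'Begin download %s' % filename
    data = urllib2.urlopen(url)
    f = open('e:/xdj/voa/new/%s' % filename, 'wb')
    f.write(data.read())
    f.close()
    print 'Download OK!'

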
def downloadnew():
    # Build a pattern for today's MP3 URLs, scan the index page,
    # and download every distinct match.
    n = 0
    url = ('http://njtelecom.unsv.com/archives/voanews/specialenglish/'
           + time.strftime('%Y/%m/%d', time.localtime())
           + r'/\d{4}/\{.{30,50}\}/\w*'
           + time.strftime('%Y%m%d', time.localtime())
           + r'\d{4}\.mp3')
    sock = urllib2.urlopen("http://www.unsv.com/learning-english/")
    source = sock.read()
    sock.close()
    namepattern = re.compile(url)
    link = namepattern.findall(source)
    link = list(set(link))  # drop duplicate links
    for i in link:
        test(i)
        n = n + 1
    print 'totally download ', n, ' new files'


def moveoldfile():
    # Move everything left in the "new" folder up into the archive folder.
    # Relies on the Windows "move" shell command.
    k = 0
    source = 'e:\\xdj\\voa\\new'
    target_dir = 'e:\\xdj\\voa'
    filelist = listdir(source)
    print 'move old files:\n', filelist
    for name in filelist:
        # Quote both paths so that names containing spaces survive the shell.
        srcFilename = '"' + source + '\\' + name + '"'
        # desFilename = target_dir + '\\' + now + '_' + name
        desFilename = '"' + target_dir + '"'
        copy_command = "move /Y %s %s" % (srcFilename, desFilename)
        # print copy_command
        if os.system(copy_command) == 0:
            k = k + 1
            # print 'Successful backup to copy from', srcFilename, 'to', desFilename
        else:
            print 'Fail to copy', srcFilename, 'to', desFilename
    print 'total move', k, 'files'


if __name__ == '__main__':
    # Earlier attempt at the pattern, kept for reference:
    # url='http://njtelecom.unsv.com/archives/voanews/specialenglish/'+time.strftime('%Y/%m/%d',time.localtime())+'/\d{4}/\{[0-9a-zA-Z-]{38}\}7D/special'+time.strftime('%Y%m%d',time.localtime())+'\d{4}.mp3'
    moveoldfile()
    downloadnew()
- Open issue: how to get the file size so that progress can be shown during the download.
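For the open question of showing download progress, one option is to read the `Content-Length` response header, which servers often send for static files, and to copy the body in chunks instead of with a single `read()`. The sketch below is written for Python 3 (`urllib.request` rather than the `urllib2` used in the script), and the names `copy_with_progress` and `download` are hypothetical helpers, not part of the original code. If the server omits the header, only the running byte count can be shown.

```python
import io
import urllib.request


def copy_with_progress(src, dst, total_size=None, chunk_size=8192, report=print):
    """Copy src to dst in chunks, reporting progress after each chunk.

    src and dst are binary file-like objects. Returns the number of bytes copied.
    """
    done = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        done += len(chunk)
        if total_size:
            percent = 100.0 * done / total_size
            report('%d / %d bytes (%.1f%%)' % (done, total_size, percent))
        else:
            # No Content-Length header: only the running byte count is known.
            report('%d bytes' % done)
    return done


def download(url, path):
    """Download url to path, showing a percentage when the size is known."""
    with urllib.request.urlopen(url) as response:
        length = response.headers.get('Content-Length')
        total_size = int(length) if length is not None else None
        with open(path, 'wb') as out:
            return copy_with_progress(response, out, total_size)
```

Keeping the copy loop separate from the network call means it can be tried on any file-like object, for example an `io.BytesIO`, without touching the network.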
- References:
- Windows XP scheduled task setup: http://www.docin.com/p-18820441.html
- Using py2exe: http://www.jb51.net/article/9296.htm
- Moving files: http://www.blogjava.net/daning/archive/2008/01/11/113764.html
- String formatting: http://www.tsnc.edu.cn/default/tsnc_wgrj/doc/pythonhtml/html/native_data_types/formatting_strings.html
- Regular expression reference: http://www.regexlab.com/zh/regref.htm
- Working with dates: http://blog.csdn.net/suiyunonghen/archive/2009/03/18/3999986.aspx, http://blog.alexa-pro.cn/?p=214 and http://python.kgblog.net/2009/08/19/python-date-time.html
- Downloading files in Python: http://topic.csdn.net/u/20090707/15/f48d1118-fa3a-4b36-bb16-58af47b353ca.html
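To make the URL pattern from the analysis concrete, here is a minimal Python 3 sketch that builds the pattern for a given day and extracts matching links from page text. The helper names `build_pattern` and `find_links` are hypothetical, and accepting both literal braces and the encoded `%7B`/`%7D` form is an assumption made here, since the page markup itself is not shown.

```python
import re
import time

# Base URL taken from the example in the post.
PREFIX = 'http://njtelecom.unsv.com/archives/voanews/specialenglish/'


def build_pattern(day=None):
    """Compile a regex for MP3 URLs published on `day` (a time.struct_time)."""
    if day is None:
        day = time.localtime()
    return re.compile(
        re.escape(PREFIX)
        + time.strftime('%Y/%m/%d', day)        # date as path segments
        + r'/\d{4}/'                             # four-digit path segment
        + r'(?:\{|%7B).{30,50}?(?:\}|%7D)/'      # braced ID (assumption: either form)
        + r'\w*' + time.strftime('%Y%m%d', day)  # file name prefix plus date
        + r'\d{4}\.mp3'                          # four digits and the extension
    )


def find_links(page_source, day=None):
    """Return the unique MP3 URLs for `day` found in page_source, sorted."""
    return sorted(set(build_pattern(day).findall(page_source)))
```

Calling `find_links(page_source)` without a date uses the local date at the time of the call, which matches what the script does with `time.localtime()`.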
Download MP3s on a daily schedule and move yesterday's old files