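The code is written in Python 3.
The core of it is assembling the URL and the HEADERS; I worked out this URL myself by observing how the China Weather Network site (中国天气网, weather.com.cn) behaves.
All forecast information can be scraped from China Weather Network; the script below targets the real-time conditions.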
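#!/usr/bin/python3
import requests
import re
import time

# Build the value for the "_" query parameter from the current time:
# drop the decimal point, then right-pad with zeros to at least 13 digits.
s = str(time.time())
s = re.sub(r'\.', '', s)
s = s.ljust(13, '0')
# 101010100 is the station code used in this example (Beijing, per the f_city cookie below).
url = 'http://d1.weather.com.cn/sk_2d/101010100.html?_=' + s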
headers_str = '''Host: d1.weather.com.cn
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
Upgrade-Insecure-Requests: 1
Cache-Control: max-age=0
Referer: http://d1.weather.com.cn/sk_2d/101010100.html?_=1485338200590676
Cookie: vjuids=-2a0fa51f.15929686632.0.1f0ebc7551e7e; vjlast=1482458425.1485337372.23; Hm_lvt_080dabacb001ad3dc8b9b9049b36d43b=1485337372; f_city=%E5%8C%97%E4%BA%AC%7C101010100%7C; Hm_lpvt_080dabacb001ad3dc8b9b9049b36d43b=1485337417; BIGipServerd1src_pool=1874396221.20480.0000
Connection: keep-alive
'''
# Turn the raw "Name: value" lines into the dict that requests expects.
headers_list = re.findall(r'(.*?):(.*)', headers_str)
headers = {}
for item in headers_list:
    headers[item[0]] = item[1].strip()
html_text = ''
times = 20
# Retry up to 20 times on connection errors, pausing 50 ms between attempts.
while (times != 0):
    try:
        html = requests.get(url, headers = headers)
        # Decode the body as UTF-8.
        html.encoding = 'utf-8'
        # get html text or content
        html_text = html.text
        break
    except requests.ConnectionError:
        time.sleep(0.05)
        times -= 1
# html_text is still empty if every attempt failed
if (len(html_text) > 0):
    # Keep only what sits between the outermost braces after the '='.
    html_text = re.search(r'=.*?{(.*)}', html_text).group(1)
    # Collect every "key":"value" pair into a dict.
    real_time_weather_list = re.findall(r'"(.*?)":"(.*?)"', html_text)
    real_time_weather_map = {}
    for item in real_time_weather_list:
        real_time_weather_map[item[0]] = item[1]
    print('place: %s' % real_time_weather_map['cityname'])
    print('time: %s' % real_time_weather_map['time'])
    print('temp: %s oC' % real_time_weather_map['temp'])
    print('SD: %s' % real_time_weather_map['SD'])
    print('WD: %s' % real_time_weather_map['WD'])
    print('WS: %s' % real_time_weather_map['WS'])
    print('AQI: %s' % real_time_weather_map['aqi'])
    print('limitnumber: %s' % real_time_weather_map['limitnumber'])
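The timestamp and parsing steps of the script can also be written more directly. What follows is only a sketch of an alternative, not the method used above. It assumes that the endpoint just needs a changing number after `_=`, so a millisecond timestamp is enough, and that the response body has the shape `<something> = {...}` with valid JSON between the braces. The function names and the sample body are made up for illustration.

```python
import json
import time


def cache_buster(now=None):
    """Return a time (default: now) as a millisecond-resolution digit string."""
    if now is None:
        now = time.time()
    return str(int(now * 1000))


def parse_sk_payload(body):
    """Decode the object that follows the first '=' in the body as JSON.

    Assumes the braces enclose valid JSON; returns an empty dict
    when no object can be located.
    """
    start = body.find('{', body.find('=') + 1)
    end = body.rfind('}')
    if start == -1 or end < start:
        return {}
    return json.loads(body[start:end + 1])


# Made-up sample body, for illustration only.
sample = 'var dataSK = {"cityname":"北京","temp":"-2","SD":"27%"}'
weather = parse_sk_payload(sample)
print(weather.get('cityname'), weather.get('temp'), weather.get('SD'))
```

Using `weather.get(...)` instead of indexing avoids a `KeyError` if a field such as `limitnumber` is missing from the response.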
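Get the province-level codes: http://www.weather.com.cn/data/list3/city.xml?level=1
Get the city codes (for example, Anhui is 22): http://www.weather.com.cn/data/list3/city22.xml?level=2
Get the district codes (for example, Anqing is 2206): http://www.weather.com.cn/data/list3/city2206.xml?level=3
This gives 220607 as the code for Wangjiang County, Anqing, Anhui Province.
Then prepend the China code (101, as the example shows) and request the URL: http://m.weather.com.cn/data/101220607.html
That returns the local weather.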
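The lookup steps above come down to string composition, which can be sketched as follows. The helper names are hypothetical. The `list3` URL pattern and the `101` prefix are read off the example URLs. The comma-separated `code|name` response format is an assumption about what the `list3` endpoints return, so check it against a real response before relying on it.

```python
CHINA_PREFIX = '101'  # leading digits of the example code 101220607


def list_url(parent_code='', level=1):
    """Build the list3 URL for a level (1 = provinces, 2 = cities, 3 = districts)."""
    return 'http://www.weather.com.cn/data/list3/city%s.xml?level=%d' % (
        parent_code, level)


def parse_code_list(text):
    """Parse 'code|name,code|name' (assumed format) into a {name: code} dict."""
    result = {}
    for pair in text.split(','):
        code, sep, name = pair.partition('|')
        if sep:  # skip fragments that have no '|' separator
            result[name.strip()] = code.strip()
    return result


def weather_url(area_code):
    """Prepend the China prefix to a 6-digit district code and build the data URL."""
    return 'http://m.weather.com.cn/data/%s%s.html' % (CHINA_PREFIX, area_code)


print(list_url())            # province list
print(list_url('22', 2))     # cities in Anhui
print(list_url('2206', 3))   # districts in Anqing
print(weather_url('220607'))
```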
Here are a few more useful URLs I found while exploring:
Weather Flash live data: http://flash.weather.com.cn/sk2/101220607.xml
Live-conditions Flash: http://flash.weather.com.cn/sk2/shikuang.swf?id=101220607