I'm back again, writing up the programming problems I ran into yesterday so I can look them up later.
1. Nesting dict and list (dict.setdefault() and list.append())
- list.append()
a = [1, 2, 3, 4]
b = a.copy()
c = a.copy()
d = a.copy()
b.append([5, 6, 7])  # the whole list is appended as one nested element
c.append(8)          # a single value is appended as-is
print('b=', b)
print('c=', c)
d.append(9, 10)      # raises TypeError: append() accepts exactly one argument
print('d=', d)       # never reached
# Output:
b= [1, 2, 3, 4, [5, 6, 7]]
c= [1, 2, 3, 4, 8]
Traceback (most recent call last):
  File "d:/test.py", line 9, in <module>
    d.append(9, 10)
TypeError: append() takes exactly one argument (2 given)
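append() takes exactly one argument. To add several values in one call, wrap them in a list; that list is then stored as a single nested element, as with b above. Use extend() if the values should be added individually.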
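To make the difference concrete, here is a minimal sketch comparing append() with extend() and list concatenation. The variable names are only for illustration.

```python
a = [1, 2, 3, 4]

nested = a.copy()
nested.append([5, 6, 7])   # one new element, which is itself a list

flat = a.copy()
flat.extend([5, 6, 7])     # three new elements, added one by one

joined = a + [5, 6, 7]     # builds a new list and leaves a untouched

print(nested)  # [1, 2, 3, 4, [5, 6, 7]]
print(flat)    # [1, 2, 3, 4, 5, 6, 7]
print(joined)  # [1, 2, 3, 4, 5, 6, 7]
```

If the goal is a flat list, extend() is usually what you want; append() with a list is the right choice when each entry should stay grouped.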
- dict.setdefault()
# dict.setdefault(k, d) returns the same value as dict.get(k, d), and also sets dict[k] = d if k is not in the dict
d = {1:4, '1': [1, 2]}
a = d.setdefault(1, 4)
print(a) # 4
b = d.get(1, 4)
print(b) # 4
d.setdefault(2, []).append([2, 3])
print(d) # {1: 4, '1': [1, 2], 2: [[2, 3]]}
d.setdefault('1', []).append([2, 3])
print(d) # {1: 4, '1': [1, 2, [2, 3]], 2: [[2, 3]]}
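If the key already exists in the dict, the value is appended to the list stored under that key. If the key does not exist, it is first created with an empty list as its value, and the new value is then appended to that list.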
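A common use of this pattern is grouping values by key. The sketch below uses made-up sample data and also shows collections.defaultdict from the standard library, which gives the same result without calling setdefault() each time.

```python
from collections import defaultdict

# Made-up (key, value) pairs for illustration.
pairs = [('fruit', 'apple'), ('veg', 'carrot'), ('fruit', 'pear')]

# Variant 1: dict.setdefault() creates the list on first use.
grouped = {}
for key, value in pairs:
    grouped.setdefault(key, []).append(value)

# Variant 2: defaultdict(list) creates the list automatically.
grouped_dd = defaultdict(list)
for key, value in pairs:
    grouped_dd[key].append(value)

print(grouped)           # {'fruit': ['apple', 'pear'], 'veg': ['carrot']}
print(dict(grouped_dd))  # {'fruit': ['apple', 'pear'], 'veg': ['carrot']}
```

One design note: setdefault() builds the default value (here an empty list) on every call, even when the key already exists, so defaultdict can be slightly cheaper in tight loops.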
2. How to convert Chinese numerals to Arabic numerals
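def transform(num_han):
    """Convert a Chinese numeral string (units up to 十) to an int, e.g. '二十三' -> 23."""
    tmp_dict = {'零': 0, '一': 1, '二': 2, '两': 2, '三': 3, '四': 4, '五': 5,
                '六': 6, '七': 7, '八': 8, '九': 9, '十': 10}
    if '十' in num_han:
        total = 0
        r = 1  # current unit: ones, tens, hundreds, thousands...
        # Scan from the last (least significant) character to the first.
        for i in range(len(num_han) - 1, -1, -1):
            val = tmp_dict[num_han[i]]  # KeyError on an unsupported character
            if val >= 10 and i == 0:
                # A leading unit, such as the 十 in '十五', counts as 1 * 10.
                if val > r:
                    r = val
                    total = total + val
                else:
                    r = r * val
            elif val >= 10:
                # A unit character sets the multiplier for the digit to its left.
                if val > r:
                    r = val
                else:
                    r = r * val
            else:
                total = total + r * val
    else:
        # No unit character: read the digits positionally, e.g. '二三' -> 23.
        # int() replaces eval(): it is safer and accepts leading zeros such as '05'.
        total = int(''.join(str(tmp_dict[s]) for s in num_han))
    return total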
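The transform() function above only knows units up to 十. As a rough sketch of how the idea might extend to 百 and 千, here is a left-to-right variant. It is a different approach from the code above, and han_to_int, DIGITS and UNITS are names invented for this example. It assumes well-formed input and does not handle 万 or colloquial short forms.

```python
DIGITS = {'零': 0, '一': 1, '二': 2, '两': 2, '三': 3, '四': 4,
          '五': 5, '六': 6, '七': 7, '八': 8, '九': 9}
UNITS = {'十': 10, '百': 100, '千': 1000}

def han_to_int(text):
    """Convert a well-formed Chinese numeral below 10000 to an int."""
    total = 0
    digit = 0  # the digit waiting for its unit
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in UNITS:
            # A unit with no digit before it (as in '十五') counts as 1 * unit.
            total += (digit or 1) * UNITS[ch]
            digit = 0
        else:
            raise ValueError('unsupported character: ' + ch)
    return total + digit  # add a trailing ones digit, if any

print(han_to_int('十五'))        # 15
print(han_to_int('二十三'))      # 23
print(han_to_int('三百二十一'))  # 321
print(han_to_int('一百零五'))    # 105
```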
3. How to skip an exception and keep the program running
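When processing large, messy data, exceptions come up often, and an unhandled exception stops the program. How can the program skip the failing case and keep going?
The usual tools are a try-except statement, or an if-else check that avoids the error in the first place.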
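Before the real crawler code, here is a minimal sketch of both options on made-up input. parse_all and parse_all_checked are names invented for this example.

```python
def parse_all(raw_values):
    """try-except: convert what we can and record what failed."""
    parsed, failed = [], []
    for raw in raw_values:
        try:
            parsed.append(int(raw))
        except ValueError:
            failed.append(raw)  # remember the bad item, then move on
            continue
    return parsed, failed

def parse_all_checked(raw_values):
    """if-else: test the input first so no exception is raised."""
    parsed = []
    for raw in raw_values:
        if raw.isdigit():
            parsed.append(int(raw))
    return parsed

parsed, failed = parse_all(['1', '2', 'oops', '4'])
print(parsed)  # [1, 2, 4]
print(failed)  # ['oops']
print(parse_all_checked(['1', '2', 'oops', '4']))  # [1, 2, 4]
```

Catching a specific exception type (ValueError here) rather than using a bare except keeps unrelated bugs from being silently swallowed.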
from lxml import etree

# getHtml(), getForumLink() and getComment() are helpers defined elsewhere in the crawler.
def getChapter(path, bkname, menu_link):
    text = getHtml(menu_link)
    html = etree.HTML(text)
    try:
        # Take the Chinese numeral between '第' and '卷' and convert it to an int.
        num_han = html.xpath('/html/body/div[3]/div[1]/p/span[3]/text()')[0].strip(' ').split('卷')[0].split('第')[1]
        num = transform(num_han)
    except Exception:
        num = 1  # fall back to 1 if the numeral is missing or cannot be parsed
    volume_dict = {}
    for i in range(2 * num):
        block = '/html/body/div[3]/div[2]/div[2]/div[' + str(i + 1) + ']'
        try:
            volume = html.xpath(block + '/div/em/text()')[0]
            words = html.xpath(block + '/div/em[2]/cite/text()')[0]
            # int() replaces eval(): the text between '共' and '章' is a plain number.
            volume_num = int(volume.strip(' ').split('章')[0].split('共')[1])
            chapter_list = []
            if volume_num > 1:
                for j in range(volume_num):
                    item = block + '/ul/li[' + str(j + 1) + ']/a'
                    chapter = html.xpath(item + '/@title')[0]
                    chapter_name = chapter.strip(' ').split(' ')[0]
                    chapter_list.append(chapter)
                    chapter_link = html.xpath(item + '/@href')[0]
                    forum_link = getForumLink(chapter_link)
                    commend_dict = getComment(forum_link)
                    info_dict = {chapter_name: commend_dict}
                    write_to_txt(path, bkname, info_dict)
            else:
                chapter = html.xpath(block + '/ul/li/a/text()')[0]
                chapter_name = chapter.strip(' ').split(' ')[0]
                chapter_link = html.xpath(block + '/ul/li/a/@href')[0]
                chapter_list.append(chapter)
                forum_link = getForumLink(chapter_link)
                commend_dict = getComment(forum_link)
                info_dict = {chapter_name: commend_dict}
                write_to_txt(path, bkname, info_dict)
            volume_dict.setdefault(i, [volume, words, chapter_list])
        except IndexError:
            # An empty xpath() result or a failed split means the expected block
            # is missing, so stop the loop and return what was collected so far.
            break
    return volume_dict
My example is a bit complicated. Reference: "How to skip exceptions and continue execution in Python"
4. How to make the program create a folder
import os

def write_to_txt(path, bkname, info):
    book_dir = os.path.join(path, bkname)
    # Create the folder if it does not exist yet; no error if it already does.
    os.makedirs(book_dir, exist_ok=True)
    # Append one line per call to info.txt inside that folder.
    with open(os.path.join(book_dir, 'info.txt'), 'a', encoding='utf-8') as f:
        f.write(str(info) + '\n')
Reference: "How to create a folder in Python"
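For comparison, the same task can be sketched with pathlib from the standard library. append_info is a name invented for this example, and the temporary directory is only there so the snippet can run anywhere without leaving files behind.

```python
import tempfile
from pathlib import Path

def append_info(base_dir, book_name, info):
    """Create base_dir/book_name if needed and append info to info.txt."""
    book_dir = Path(base_dir) / book_name
    # parents=True also creates missing parent folders;
    # exist_ok=True makes the call a no-op if the folder already exists.
    book_dir.mkdir(parents=True, exist_ok=True)
    info_file = book_dir / 'info.txt'
    with info_file.open('a', encoding='utf-8') as f:
        f.write(str(info) + '\n')
    return info_file

with tempfile.TemporaryDirectory() as tmp:
    target = append_info(tmp, 'demo_book', {'chapter 1': 'comments'})
    append_info(tmp, 'demo_book', {'chapter 2': 'more comments'})
    lines = target.read_text(encoding='utf-8').splitlines()
    print(len(lines))  # 2
```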
5. Notes on XPath in web page source
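After writing some crawler code, I think it is worth learning a bit about HTML source. It is frustrating to copy a path and still fail to scrape the content.
There is a lot to cover, so this section is a placeholder until I have read up on it properly. A few takeaways on writing XPath for crawlers so far:
- The XPath expressions I use in crawlers are either copied paths or written by hand from a class attribute. Start with the class closest to your target; if that does not match, move up to the class one level above. If nothing works after several tries, pick an easier target to scrape. There are always more solutions than problems.
- text() selects the text of a node, while attributes are selected with @, for example @href for the link and @title for the title.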
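The points above can be sketched with lxml, the library used in the crawler code earlier. The HTML fragment and the class name "volume" are made up for this example.

```python
from lxml import etree

# Hypothetical page fragment, invented for illustration.
page = """
<html><body>
  <div class="volume">
    <ul>
      <li><a href="/chapter/1" title="Chapter 1 Hello">Chapter 1</a></li>
      <li><a href="/chapter/2" title="Chapter 2 World">Chapter 2</a></li>
    </ul>
  </div>
</body></html>
"""

html = etree.HTML(page)

# Select by class instead of a long copied path such as /html/body/div[3]/...
base = '//div[@class="volume"]//a'
texts = html.xpath(base + '/text()')   # text inside each <a>
hrefs = html.xpath(base + '/@href')    # value of the href attribute
titles = html.xpath(base + '/@title')  # value of the title attribute

print(texts)   # ['Chapter 1', 'Chapter 2']
print(hrefs)   # ['/chapter/1', '/chapter/2']
print(titles)  # ['Chapter 1 Hello', 'Chapter 2 World']

# xpath() always returns a list, so check it before indexing with [0].
missing = html.xpath('//div[@class="no-such-class"]//a/text()')
print(missing)  # []
```

A class-based expression tends to survive small layout changes better than an absolute path, because it does not depend on the exact position of every ancestor element.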
Reference: "HTML basics"