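Chemistry class was a bit boring a few days ago. I spent a while memorizing element names, then suddenly felt like practicing web scraping, so I scraped the element names while I was at it.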
`from bs4 import BeautifulSoup as bs
from urllib import request
import re

url = 'https://www.proprofs.com/flashcards/story.php?title=0-most-common-chemical-elements'
thehtml = request.urlopen(url)
html_data = thehtml.read().decode()
soup = bs(html_data, 'html.parser')

# All flashcards sit in one table; each card has a front div and a back div.
theinfo = soup.find('table', class_='table flashCardsPreviewTable')
thetarget = theinfo.find_all('div', class_='front_text card_text')
theans = theinfo.find_all('div', class_='back_text card_text')

# Pair the front text ('name') with the back text ('abre') of each card.
thelist = []
for i in range(len(theans)):
    thedict = {}
    thedict['name'] = thetarget[i].string
    thedict['abre'] = theans[i].string
    thelist.append(thedict)

# Earlier regex attempt, left disabled because it did not work yet:
# func = re.compile(r'\b\w{2}\b')
# for i in thelist:
#     string = str(i)
#     string1 = func.findall(string)
#     string1 = str(string1)
#     file_1.write(string1 + '\n')

# Raw string, so the backslashes in the Windows path are not read as escapes.
with open(r'C:\Users\YES\Desktop\hh.txt', 'a') as file_1:
    for i in thelist:
        file_1.write(str(i) + '\n')
`I really am a beginner, and the code is pretty bad :(
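Still, I'm pretty excited that I can build things of my own.
I wrote what I scraped into a text file, but it still has a lot of stray symbols in it.
Then I tried regular expressions, but... my grasp of the syntax isn't good enough...
I'll review regex later and come back to fix this.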
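
For later reference, here is a minimal sketch of how the regex cleanup might go. It assumes each saved line looks like the output of `str(dict)` with the two keys `name` and `abre`, and that the values contain no quote characters. The sample lines and the helper `clean_line` are made up for illustration; they are not copied from the real `hh.txt`.

```python
import re

# Hypothetical sample lines, in the shape that str(dict) writes to the file.
sample_lines = [
    "{'name': 'Hydrogen', 'abre': 'H'}",
    "{'name': 'Helium', 'abre': 'He'}",
]

# Capture whatever sits between the quotes after each key.
# Assumption: the values themselves contain no single quotes.
pattern = re.compile(r"'name': '([^']*)', 'abre': '([^']*)'")


def clean_line(line):
    """Return 'name<TAB>abre' for one saved line, or None if it does not match."""
    match = pattern.search(line)
    if match is None:
        return None
    name, abre = match.groups()
    return f"{name.strip()}\t{abre.strip()}"


cleaned = [clean_line(line) for line in sample_lines]
for row in cleaned:
    print(row)
```

Another option that probably avoids the symbols altogether is to skip `str(i)` in the write loop and write the two fields directly, for example `file_1.write(f"{i['name']}\t{i['abre']}\n")`, so no cleanup is needed afterwards.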