Scraping stock information with the Scrapy framework
Approach
Step 1: Create the project and the Spider template
Step 2: Write the Spider
Step 3: Write the Item and Pipelines
Creating the project
Open a command line and enter:
scrapy startproject Stocks
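This creates a folder named Stocks in the current directory, which holds the generated project files such as settings.py and pipelines.py.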
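Writing the Spider
- Write the stocks.py file
Set start_urls to the page that lists the Shanghai and Shenzhen stock codes.
Generate the URL of each individual stock: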
    # Method of the spider class in stocks.py; the file needs `import re` and `import scrapy` at the top.
    def parse(self, response):
        for href in response.css('a::attr(href)').extract():
            # Keep only links that contain a stock code: SH or SZ followed by six digits.
            codes = re.findall(r'S[HZ]\d{6}', href)
            if not codes:
                continue
            url = 'https://hq.gucheng.com/' + codes[0]
            yield scrapy.Request(url, callback=self.parse_stock,
                                 headers={'user-agent': 'Mozilla/5.0'})
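The link filtering above can be tried outside Scrapy. The following is a minimal sketch, assuming the hrefs look like the hypothetical `sample_hrefs` below; the helper name `stock_urls` is also an illustration and not part of the original spider.

```python
import re

# SH or SZ followed by six digits, the same pattern the spider uses.
STOCK_CODE = re.compile(r'S[HZ]\d{6}')
BASE_URL = 'https://hq.gucheng.com/'  # base URL taken from the article


def stock_urls(hrefs):
    """Yield one detail-page URL for every href that contains a stock code."""
    for href in hrefs:
        match = STOCK_CODE.search(href)
        if match is None:
            continue  # not a stock link, skip it
        yield BASE_URL + match.group(0)


# Hypothetical input: two stock links and one unrelated link.
sample_hrefs = [
    'https://hq.gucheng.com/SZ000078/',
    '/about.html',
    'https://hq.gucheng.com/SH000001/',
]
print(list(stock_urls(sample_hrefs)))
```

Using `search` and testing for `None` avoids relying on an exception to skip links that carry no stock code.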
Parse the page of each individual stock and extract its information.
    def parse_stock(self, response):
        infodict = {}
        stockinfo = response.css('.stock_price.clearfix')
        name = stockinfo.css('h3').extract()[0]
        # Labels are in <dt> tags and values in <dd> tags; the last four of each are dropped.
        keylist = stockinfo.css('dt').extract()[:-4]
        value = stockinfo.css('dd').extract()[:-4]
        for t in range(len(keylist)):
            # Take the text between the opening and the closing tag.
            key = re.findall(r'>(.*)<', keylist[t])[0]
            try:
                val = re.findall(r'>(.*)<', value[t])[0]
            except IndexError:
                # Missing or empty value: store a placeholder.
                val = '_'
            infodict[key] = val
        # '股票名称' means "stock name"; the slice drops the last four characters of the heading text.
        infodict.update({'股票名称': re.findall(r'>(.*)<', name)[0][:-4]})
        yield infodict
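To make the label and value pairing easier to follow, here is a minimal sketch that runs on plain strings instead of a Scrapy response. The helper names `inner_text` and `pair_fields` and the sample fragments are assumptions made for illustration; only the `>(.*)<` pattern and the `'_'` placeholder come from the spider.

```python
import re

# Text between the first '>' and the last '<' of a fragment.
TAG_TEXT = re.compile(r'>(.*)<')


def inner_text(fragment, default='_'):
    """Return the text inside an HTML fragment, or `default` if none is found."""
    match = TAG_TEXT.search(fragment)
    return match.group(1) if match else default


def pair_fields(dt_fragments, dd_fragments):
    """Map each <dt> label to the <dd> value at the same position."""
    info = {}
    for i, dt in enumerate(dt_fragments):
        key = inner_text(dt, default=None)
        if key is None:
            continue  # a label without text cannot serve as a key
        # A missing value falls back to the placeholder, as in the spider.
        dd = dd_fragments[i] if i < len(dd_fragments) else ''
        info[key] = inner_text(dd)
    return info


# Hypothetical fragments: two labels but only one value.
print(pair_fields(['<dt>最高</dt>', '<dt>最低</dt>'], ['<dd>4.05</dd>']))
```

Because `.*` is greedy, a fragment with nested tags keeps the inner markup in the captured text, so this pattern suits only flat elements like the ones shown.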
Writing the Pipelines
The pipeline processes each item returned by the spider.
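    class StocksInfoPipeline(object):
        def open_spider(self, spider):
            # Open the output file once when the spider starts.
            self.f = open('stockinfo.txt', 'w', encoding='utf-8')

        def close_spider(self, spider):
            self.f.close()

        def process_item(self, item, spider):
            # Write each item as one line, using the dict's string representation.
            try:
                line = str(dict(item)) + '\n'
                self.f.write(line)
            except (TypeError, ValueError, OSError) as exc:
                # Log the failure instead of silently discarding it.
                spider.logger.warning('Could not write item: %s', exc)
            return item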
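The pipeline above stores the Python representation of each dict, which other tools cannot read as JSON. As a possible variation, and not the article's method, the sketch below writes one JSON object per line. The class name `JsonLinesStocksPipeline`, the `path` parameter and the file name `stockinfo.jsonl` are all hypothetical.

```python
import json


class JsonLinesStocksPipeline:
    """Variation on the pipeline above that writes one JSON object per line."""

    def __init__(self, path='stockinfo.jsonl'):
        # `path` is a hypothetical parameter; the default lets Scrapy
        # instantiate the class without arguments.
        self.path = path
        self.f = None

    def open_spider(self, spider):
        self.f = open(self.path, 'w', encoding='utf-8')

    def close_spider(self, spider):
        self.f.close()

    def process_item(self, item, spider):
        # ensure_ascii=False keeps the Chinese field names readable in the file.
        self.f.write(json.dumps(dict(item), ensure_ascii=False) + '\n')
        return item
```

To try it, the class would have to be registered in ITEM_PIPELINES in place of, or next to, StocksInfoPipeline.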
Modify settings.py as follows:
    ITEM_PIPELINES = {
        'Stocks.pipelines.StocksInfoPipeline': 300,
    }
Running the program
scrapy crawl stocks
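Results
The results are saved to the text file stockinfo.txt, one dict per line. The keys are the Chinese field labels taken from the page (for example, 最高 is the day's high, 最低 the low, 今开 today's open, 昨收 the previous close, and 股票名称 the stock name):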
{'最高': '3383.18', '最低': '3325.17', '今开': '3381.01', '昨收': '3373.28', '换手率': '0.81%', '振幅': '1.72%', '成交量': '2.99亿', '成交额': '3798.49亿', '股票名称': '上证指数(SH000001)'}
{'最高': '4.05', '最低': '3.93', '今开': '4.04', '昨收': '4.03', '涨停': '4.43', '跌停': '3.63', '换手率': '0.67%', '振幅': '2.98%', '成交量': '1750.5万', '成交额': '6962.55万', '内盘': '1258.68万', '外盘': '491.82万', '量比': '1.03%', '涨跌幅': '-1.49%', '股票名称': '海王生物(SZ000078)'}
{'最高': '8.36', '最低': '8.20', '今开': '8.35', '昨收': '8.31', '涨停': '9.14', '跌停': '7.48', '换手率': '0.54%', '振幅': '1.93%', '成交量': '1111.69万', '成交额': '9179.61万', '内盘': '628.8万', '外盘': '482.89万', '量比': '1.10%', '涨跌幅': '-0.84%', '股票名称': '深圳机场(SZ000089)'}
{'最高': '6.88', '最低': '6.52', '今开': '6.83', '昨收': '6.83', '涨停': '7.51', '跌停': '6.15', '换手率': '2.74%', '振幅': '5.27%', '成交量': '3.46亿', '成交额': '23.12亿', '内盘': '2.01亿', '外盘': '1.46亿', '量比': '0.49%', '涨跌幅': '-1.61%', '股票名称': 'TCL科技(SZ000100)'}
{'最高': '3.15', '最低': '2.98', '今开': '3.10', '昨收': '3.13', '涨停': '3.44', '跌停': '2.82', '换手率': '2.81%', '振幅': '5.43%', '成交量': '2260.96万', '成交额': '6902.89万', '内盘': '1357.81万', '外盘': '903.15万', '量比': '0.64%', '涨跌幅': '-0.32%', '股票名称': '宜华健康(SZ000150)'}
{'最高': '10.02', '最低': '9.77', '今开': '9.91', '昨收': '9.90', '涨停': '10.89', '跌停': '8.91', '换手率': '0.09%', '振幅': '2.53%', '成交量': '45.47万', '成交额': '449.07万', '内盘': '30.79万', '外盘': '14.69万', '量比': '0.78%', '涨跌幅': '-1.21%', '股票名称': '广聚能源(SZ000096)'}
{'最高': '3.03', '最低': '2.92', '今开': '3.02', '昨收': '3.03', '涨停': '3.33', '跌停': '2.73', '换手率': '0.74%', '振幅': '3.63%', '成交量': '746.11万', '成交额': '2221.08万', '内盘': '527.24万', '外盘': '218.87万', '量比': '1.53%', '涨跌幅': '-2.31%', '股票名称': '华控赛格(SZ000068)'}
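Since the pipeline writes each item with `str(dict(item))`, every line of stockinfo.txt is a Python dict literal and can be read back with `ast.literal_eval`. The sketch below assumes that file layout; the helper names `load_stock_lines` and `percent_to_float` are hypothetical and not part of the original project.

```python
import ast


def load_stock_lines(lines):
    """Parse lines written by the pipeline back into dicts, skipping blank lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # literal_eval evaluates Python literals only, so it is safer than eval.
        records.append(ast.literal_eval(line))
    return records


def percent_to_float(text):
    """Convert a string such as '-1.49%' to the float -1.49."""
    return float(text.rstrip('%'))


# Example with one line in the format the pipeline produces.
sample = ["{'最高': '4.05', '涨跌幅': '-1.49%', '股票名称': '海王生物(SZ000078)'}\n"]
for record in load_stock_lines(sample):
    print(record['股票名称'], percent_to_float(record['涨跌幅']))
```

With the real file, `load_stock_lines(open('stockinfo.txt', encoding='utf-8'))` would return one dict per scraped stock.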