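Scraping Stock Trading Data with Python (2)

Crawling stock information with the Scrapy framework

Approach

step 1: Create the project and the Spider template
step 2: Write the Spider
step 3: Write the Item and Pipelines

Creating the project

Open a command line and enter: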

scrapy startproject Stocks

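This creates a folder named Stocks in the current directory, with the following layout:
[Figure: Stocks project directory listing]

Writing the Spider

The parse method scans every link on the start page, keeps those that contain a stock code, and requests the corresponding stock page.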

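import re
import scrapy

# Method of the spider class; the two imports above go at the top of the spider module.
def parse(self, response):
    for href in response.css('a::attr(href)').extract():
        # A stock code is 'SH' or 'SZ' followed by six digits, e.g. SZ000078.
        match = re.search(r'S[HZ]\d{6}', href)
        if match is None:
            continue  # not a link to a stock page
        stock = match.group(0)
        url = 'https://hq.gucheng.com/' + stock
        yield scrapy.Request(url, callback=self.parse_stock,
                             headers={'user-agent': 'Mozilla/5.0'})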
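
The link filter used in parse can be tried outside Scrapy. The following is a minimal sketch, assuming a few made-up hrefs; `extract_stock_code` and `stock_urls` are hypothetical helper names that do not appear in the original spider.

```python
import re

# Matches 'SH' or 'SZ' followed by six digits, e.g. 'SZ000078'.
STOCK_CODE = re.compile(r'S[HZ]\d{6}')


def extract_stock_code(href):
    """Return the first stock code found in href, or None if there is none."""
    match = STOCK_CODE.search(href)
    return match.group(0) if match else None


def stock_urls(hrefs, base='https://hq.gucheng.com/'):
    """Build one detail-page URL per distinct stock code, in first-seen order."""
    seen = set()
    urls = []
    for href in hrefs:
        code = extract_stock_code(href)
        if code is None or code in seen:
            continue
        seen.add(code)
        urls.append(base + code)
    return urls


if __name__ == '__main__':
    # Made-up hrefs, for illustration only.
    sample = ['/SZ000078/', '/about.html',
              'https://hq.gucheng.com/SH000001/', '/SZ000078/']
    print(stock_urls(sample))
```

Testing for a match explicitly, rather than indexing the result of `re.findall` inside a bare `except`, keeps unrelated errors from being silently skipped.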

Next, parse each individual stock's page and extract its information.

def parse_stock(self, response):
    # Requires `import re` at the top of the spider module.
    infodict = {}
    stockinfo = response.css('.stock_price.clearfix')
    name = stockinfo.css('h3').extract()[0]
    # <dt> elements hold the field names and <dd> elements the values;
    # the last four of each are skipped.
    keylist = stockinfo.css('dt').extract()[:-4]
    value = stockinfo.css('dd').extract()[:-4]
    for t in range(len(keylist)):
        key = re.findall(r'>(.*)<', keylist[t])[0]
        try:
            val = re.findall(r'>(.*)<', value[t])[0]
        except IndexError:
            val = '_'  # placeholder when no value can be extracted
        infodict[key] = val
    # Strip the tags around the <h3> text and drop its last four characters.
    infodict['股票名称'] = re.findall(r'>(.*)<', name)[0][:-4]
    yield infodict
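
The pairing of `<dt>` names with `<dd>` values can be checked without fetching a page. This is a rough sketch using only the standard library, assuming made-up HTML fragments shaped like what `.css('dt').extract()` returns; `inner_text` and `pair_fields` are hypothetical names introduced for illustration.

```python
import re

# Same pattern as in parse_stock: the text between the first '>' and the last '<'.
INNER = re.compile(r'>(.*)<')


def inner_text(fragment, default='_'):
    """Return the text between the outermost tags, or default if nothing matches."""
    match = INNER.search(fragment)
    return match.group(1) if match else default


def pair_fields(dt_fragments, dd_fragments):
    """Pair each <dt> name with the <dd> value at the same index."""
    info = {}
    for i, dt in enumerate(dt_fragments):
        # A missing <dd> becomes an empty fragment, which yields the default.
        dd = dd_fragments[i] if i < len(dd_fragments) else ''
        info[inner_text(dt)] = inner_text(dd)
    return info


if __name__ == '__main__':
    # Made-up fragments, for illustration only.
    dts = ['<dt>最高</dt>', '<dt>最低</dt>', '<dt>今开</dt>']
    dds = ['<dd>4.05</dd>', '<dd>3.93</dd>']
    print(pair_fields(dts, dds))
    # {'最高': '4.05', '最低': '3.93', '今开': '_'}
```

Note that `.` does not match a newline by default, so a fragment whose text sits on its own line would also fall back to the placeholder.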

Writing the Pipeline

The pipeline processes each item returned by the spider.

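class StocksInfoPipeline:
    def open_spider(self, spider):
        # Open the output file once, when the spider starts.
        self.f = open('stockinfo.txt', 'w', encoding='utf-8')

    def close_spider(self, spider):
        self.f.close()

    def process_item(self, item, spider):
        # Write each item as one dict literal per line.
        try:
            self.f.write(str(dict(item)) + '\n')
        except (OSError, ValueError) as exc:
            # Log the failure instead of silently discarding it.
            spider.logger.warning('Could not write item: %s', exc)
        return item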
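
If the output is meant to be read by other tools, one possible variation is to write JSON Lines instead of `str(dict)`. The sketch below is an assumption, not part of the original project: the class name `StocksJsonLinesPipeline`, the `path` parameter, and the file name `stockinfo.jsonl` are all hypothetical.

```python
import json


class StocksJsonLinesPipeline:
    """Hypothetical variant of StocksInfoPipeline: one JSON object per line."""

    def __init__(self, path='stockinfo.jsonl'):
        # The default path lets the class be created without arguments.
        self.path = path
        self.f = None

    def open_spider(self, spider):
        self.f = open(self.path, 'w', encoding='utf-8')

    def close_spider(self, spider):
        self.f.close()

    def process_item(self, item, spider):
        # ensure_ascii=False keeps the Chinese field names readable in the file.
        self.f.write(json.dumps(dict(item), ensure_ascii=False) + '\n')
        return item
```

To try it, the class would be registered in ITEM_PIPELINES in the same way as StocksInfoPipeline.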

Then modify settings.py as follows to enable the pipeline:

ITEM_PIPELINES = {
   'Stocks.pipelines.StocksInfoPipeline': 300,
}

Running the program

scrapy crawl stocks

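Results

The results are saved to the text file stockinfo.txt, one dict per line. Field names are kept as scraped: 最高 (high), 最低 (low), 今开 (open), 昨收 (previous close), 涨停 (limit up), 跌停 (limit down), 换手率 (turnover rate), 振幅 (amplitude), 成交量 (volume), 成交额 (turnover), 内盘 (sell-side volume), 外盘 (buy-side volume), 量比 (volume ratio), 涨跌幅 (price change), 股票名称 (stock name).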

{'最高': '3383.18', '最低': '3325.17', '今开': '3381.01', '昨收': '3373.28', '换手率': '0.81%', '振幅': '1.72%', '成交量': '2.99亿', '成交额': '3798.49亿', '股票名称': '上证指数(SH000001)'}
{'最高': '4.05', '最低': '3.93', '今开': '4.04', '昨收': '4.03', '涨停': '4.43', '跌停': '3.63', '换手率': '0.67%', '振幅': '2.98%', '成交量': '1750.5万', '成交额': '6962.55万', '内盘': '1258.68万', '外盘': '491.82万', '量比': '1.03%', '涨跌幅': '-1.49%', '股票名称': '海王生物(SZ000078)'}
{'最高': '8.36', '最低': '8.20', '今开': '8.35', '昨收': '8.31', '涨停': '9.14', '跌停': '7.48', '换手率': '0.54%', '振幅': '1.93%', '成交量': '1111.69万', '成交额': '9179.61万', '内盘': '628.8万', '外盘': '482.89万', '量比': '1.10%', '涨跌幅': '-0.84%', '股票名称': '深圳机场(SZ000089)'}
{'最高': '6.88', '最低': '6.52', '今开': '6.83', '昨收': '6.83', '涨停': '7.51', '跌停': '6.15', '换手率': '2.74%', '振幅': '5.27%', '成交量': '3.46亿', '成交额': '23.12亿', '内盘': '2.01亿', '外盘': '1.46亿', '量比': '0.49%', '涨跌幅': '-1.61%', '股票名称': 'TCL科技(SZ000100)'}
{'最高': '3.15', '最低': '2.98', '今开': '3.10', '昨收': '3.13', '涨停': '3.44', '跌停': '2.82', '换手率': '2.81%', '振幅': '5.43%', '成交量': '2260.96万', '成交额': '6902.89万', '内盘': '1357.81万', '外盘': '903.15万', '量比': '0.64%', '涨跌幅': '-0.32%', '股票名称': '宜华健康(SZ000150)'}
{'最高': '10.02', '最低': '9.77', '今开': '9.91', '昨收': '9.90', '涨停': '10.89', '跌停': '8.91', '换手率': '0.09%', '振幅': '2.53%', '成交量': '45.47万', '成交额': '449.07万', '内盘': '30.79万', '外盘': '14.69万', '量比': '0.78%', '涨跌幅': '-1.21%', '股票名称': '广聚能源(SZ000096)'}
{'最高': '3.03', '最低': '2.92', '今开': '3.02', '昨收': '3.03', '涨停': '3.33', '跌停': '2.73', '换手率': '0.74%', '振幅': '3.63%', '成交量': '746.11万', '成交额': '2221.08万', '内盘': '527.24万', '外盘': '218.87万', '量比': '1.53%', '涨跌幅': '-2.31%', '股票名称': '华控赛格(SZ000068)'}
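
Because the pipeline writes each item with `str(dict(item))`, every line of the file is a Python dict literal and can be loaded back with `ast.literal_eval`. The following is a minimal sketch under that assumption; `load_stock_lines` is a hypothetical helper name, and the sample line is abbreviated from the output above.

```python
import ast


def load_stock_lines(lines):
    """Parse lines written as str(dict) back into dicts, skipping blank lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # literal_eval only evaluates Python literals, unlike eval.
        records.append(ast.literal_eval(line))
    return records


if __name__ == '__main__':
    sample = ["{'最高': '4.05', '最低': '3.93', '股票名称': '海王生物(SZ000078)'}\n"]
    for record in load_stock_lines(sample):
        print(record['股票名称'], record['最高'])
```

To read the real file, an open file object can be passed directly, for example `load_stock_lines(open('stockinfo.txt', encoding='utf-8'))`.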
