mysql scrapy duplicate data: Inserting multiple Scrapy data into MySQL

# -*- coding: utf-8 -*-
import scrapy


class BepeSpider(scrapy.Spider):
    name = 'bepe'
    allowed_domains = ['bpbd.jatengprov.go.id']
    start_urls = ['https://bpbd.jatengprov.go.id/category/laporan-bencana/']
    COUNT_MAX = 100
    count = 0

    def parse(self, response):
        for quote in response.css('div.post'):
            item = {
                'judul': quote.css('h2.post-title > a::text').extract_first(),
                'teks': quote.css('div.entrytext > p::text').extract_first(),
                'tag': quote.css('div.up-bottom-border > p.postmetadata > a::text').extract(),
            }
            yield item
            self.count = self.count + 1

        # following pagination link
        next_page_url = response.css('div.alignright > a::attr(href)').extract_first()  # get the link to the next page
        # stop when there is no next page or the limit has been reached
        if next_page_url and self.count < self.COUNT_MAX:
            next_page_url = response.urljoin(next_page_url)
            yield scrapy.Request(url=next_page_url, callback=self.parse)

Is there any way to INSERT my crawled data into MySQL when each item is structured like this?

item = {
    'judul': quote.css('h2.post-title > a::text').extract_first(),
    'teks': quote.css('div.entrytext > p::text').extract_first(),
    'tag': quote.css('div.up-bottom-border > p.postmetadata > a::text').extract(),
}

I have tried the code below, but it did not insert any data:

conn = Connection()
mycursor = conn.cursor()
sql = "insert into berita(judul, isi, tag) values(%s, %s, %s)"

item = {
    'judul': quote.css('h2.post-title > a::text').extract_first(),
    'teks': quote.css('div.entrytext > p::text').extract_first(),
    'tag': quote.css('div.up-bottom-border > p.postmetadata > a::text').extract(),
}

val = (item['judul'], item['teks'], item['tag'])
mycursor.execute(sql, val)
conn.commit()

Sorry for my bad English. I hope a Python expert can help me.

Solution

After some googling and reading other Q&As, I found a way to insert a list (what I called a "multiple array" above).

I had been following a Scrapy tutorial on YouTube, and its lessons never covered writing the scraped data to MySQL. Once I was able to crawl the data, I still needed to insert it.

The first thing to know is the data type. The `tag` value comes from extract(), which returns a list of strings rather than a single value, and a MySQL driver generally cannot bind a list to a single %s placeholder. This is the code that the YouTube tutorial did not cover; it joins the list into one string:

item1 = quote.css('h2.post-title > a::text').extract_first()

item2 = quote.css('div.entrytext > p::text').extract_first()

item3 = quote.css('div.up-bottom-border > p.postmetadata > a::text').extract()

items3 = ', '.join(item3)
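The conversion above can be wrapped in a small helper so that every item is flattened the same way before it reaches the database. This is only a minimal sketch: the function name `item_to_row` and the sample values are made up for illustration, and it assumes the item is a plain dict with the `judul`, `teks` and `tag` keys used in the question.

```python
def item_to_row(item):
    """Turn a scraped item dict into a (judul, isi, tag) tuple of scalars."""
    # extract() returns a list of strings, so flatten it into one string.
    # An empty or missing list becomes an empty string.
    tags = item.get("tag") or []
    return (item.get("judul"), item.get("teks"), ", ".join(tags))


# Made-up sample item, shaped like the dict built in parse().
example = {
    "judul": "Judul contoh",
    "teks": "Isi contoh",
    "tag": ["banjir", "longsor"],
}
row = item_to_row(example)
print(row)  # ('Judul contoh', 'Isi contoh', 'banjir, longsor')
```

The resulting tuple holds only strings (or None), which is the shape a parameterized INSERT with three %s placeholders expects.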

Then the query is:

mycursor.execute("INSERT INTO berita (judul, isi, tag) VALUES (%s, %s, %s)", (item1, item2, items3))
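In a Scrapy project, an insert like this is usually placed in an item pipeline rather than in the spider, and it has to be followed by a commit for the row to be stored. The following is a rough sketch of that idea, not the author's code: the class name `MySQLPipeline`, the `connect_fn` parameter and the `MYSQL_*` setting names are assumptions introduced for illustration, and it assumes the mysql-connector-python package is installed.

```python
class MySQLPipeline:
    """Sketch of a Scrapy item pipeline that writes each item to MySQL.

    The class name, connect_fn and the MYSQL_* setting names are
    illustrative assumptions, not part of the original post.
    """

    INSERT_SQL = "INSERT INTO berita (judul, isi, tag) VALUES (%s, %s, %s)"

    def __init__(self, connect_fn):
        # connect_fn: zero-argument callable returning a DB-API connection
        self.connect_fn = connect_fn
        self.conn = None
        self.cursor = None

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook to build the pipeline from project settings.
        settings = crawler.settings

        def connect():
            # Imported lazily; assumes mysql-connector-python is installed.
            import mysql.connector
            return mysql.connector.connect(
                host=settings.get("MYSQL_HOST", "localhost"),
                user=settings.get("MYSQL_USER"),
                password=settings.get("MYSQL_PASSWORD"),
                database=settings.get("MYSQL_DATABASE"),
            )

        return cls(connect)

    def open_spider(self, spider):
        # Open one connection for the whole crawl.
        self.conn = self.connect_fn()
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        # Join the tag list into one string so it fits a single placeholder.
        tags = ", ".join(item.get("tag") or [])
        self.cursor.execute(
            self.INSERT_SQL, (item.get("judul"), item.get("teks"), tags)
        )
        self.conn.commit()  # without a commit the row is not persisted
        return item

    def close_spider(self, spider):
        self.cursor.close()
        self.conn.close()
```

To try it, the pipeline would be enabled through the `ITEM_PIPELINES` setting in settings.py, using whatever module path the class lives under in your project. Committing after every item keeps the sketch simple; committing in batches would be a reasonable alternative for larger crawls.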

Hopefully this helps.
