Data Storage
I. Storing to TXT or CSV
1. Storing data to TXT
Several modes for opening a file:
Mode | Access | If file does not exist | Write behavior |
---|---|---|---|
w | write | created | overwrite |
w+ | read + write | created | overwrite |
r | read | error | not writable |
r+ | read + write | error | overwrite from the start (no truncation) |
a | write | created | append |
a+ | read + write | created | append |
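To see the difference between overwrite ('w') and append ('a') concretely, here is a minimal sketch (the file path is made up for illustration):
with open(r'D:\mode_demo.txt', 'w') as f:   # 'w' truncates any existing content
    f.write('first\n')
with open(r'D:\mode_demo.txt', 'w') as f:   # opening with 'w' again discards 'first'
    f.write('second\n')
with open(r'D:\mode_demo.txt', 'a') as f:   # 'a' keeps 'second' and appends after it
    f.write('third\n')
# The file now contains 'second' and 'third', but not 'first'.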
Appending a string to a TXT file (the with statement closes the file automatically, so no explicit f.close() is needed):
title = "This is a test sentence."
with open(r'D:\title.txt', "a+") as f:
    f.write(title)
Formatted (tab-separated) storage:
output = '\t'.join(['name', 'title', 'age', 'gender'])
with open(r'D:\test.txt', "a+") as f:
    f.write(output)
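When appending several records this way, it helps to end each line with a newline so rows do not run together. A minimal sketch (the records are made up for illustration):
rows = [
    ['Alice', 'Engineer', '30', 'F'],
    ['Bob', 'Writer', '25', 'M'],
]
with open(r'D:\test.txt', "a+") as f:
    for row in rows:
        f.write('\t'.join(row) + '\n')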
2. Reading a file
with open(r'D:\title.txt', "r", encoding='utf-8') as f:
    result = f.read()          # the whole file as one string
print(result)

with open(r'D:\title.txt', "r", encoding='utf-8') as f:
    result = f.read().splitlines()   # a list of lines, without newline characters
print(result)
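To recover the tab-separated fields written earlier, each line can be split back on '\t'. A minimal sketch, assuming D:\test.txt holds the tab-separated rows from above:
with open(r'D:\test.txt', "r", encoding='utf-8') as f:
    for line in f.read().splitlines():
        fields = line.split('\t')
        print(fields)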
3. Storing data in CSV
import csv

with open('test.csv', 'r', encoding='utf-8') as csvfile:
    csv_reader = csv.reader(csvfile)
    for row in csv_reader:
        print(row)      # the whole row as a list of strings
        print(row[0])   # the first column of the row
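If the first row of the CSV file is a header, csv.DictReader maps every following row to a dict keyed by those column names. A sketch; it assumes test.csv has a header row, which the source does not specify:
import csv

with open('test.csv', 'r', encoding='utf-8') as csvfile:
    for row in csv.DictReader(csvfile):
        print(row)   # e.g. a dict keyed by the header's column names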
import csv

output_list = ['1', '2', '3', '4']
with open('test2.csv', 'a+', encoding='utf-8', newline='') as csvfile:
    w = csv.writer(csvfile)
    w.writerow(output_list)
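To write several rows at once, csv.writer also provides writerows. A minimal sketch with made-up data:
import csv

rows = [['1', '2', '3', '4'], ['5', '6', '7', '8']]
with open('test2.csv', 'a+', encoding='utf-8', newline='') as csvfile:
    csv.writer(csvfile).writerows(rows)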
II. Storing to a MySQL Database
import pymysql

# Open the database connection
db = pymysql.connect(host="localhost", user="root", password="password", database="scraping")
# Get a cursor for executing statements
cursor = db.cursor()
# SQL INSERT statement
sql = """INSERT INTO urls (url, content) VALUES ('www.baidu.com', 'This is content.')"""
try:
    # Execute the SQL statement
    cursor.execute(sql)
    # Commit the transaction
    db.commit()
except Exception:
    # Roll back on any error
    db.rollback()
# Close the database connection
db.close()
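Both this snippet and the next assume that the scraping database already contains a urls table. The source never shows its schema; a plausible sketch (the column types here are assumptions, not from the source):
import pymysql

db = pymysql.connect(host="localhost", user="root", password="password", database="scraping")
cursor = db.cursor()
# Hypothetical schema: the source does not show the real CREATE TABLE statement
cursor.execute("""
    CREATE TABLE IF NOT EXISTS urls (
        id INT NOT NULL AUTO_INCREMENT,
        url VARCHAR(1000) NOT NULL,
        content VARCHAR(4000) NOT NULL,
        PRIMARY KEY (id)
    )
""")
db.close()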
Storing content scraped from the web into the MySQL database:
import requests
from bs4 import BeautifulSoup
import pymysql

db = pymysql.connect(host="localhost", user="root", password="password", database="scraping")
cursor = db.cursor()

link = "http://www.santostang.com/"
headers = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'}
r = requests.get(link, headers=headers)

soup = BeautifulSoup(r.text, "lxml")
title_list = soup.find_all("h1", class_="post-title")
for eachone in title_list:
    url = eachone.a['href']
    title = eachone.a.text.strip()
    cursor.execute("INSERT INTO urls (url, content) VALUES (%s, %s)", (url, title))
db.commit()
db.close()
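To check that the rows were actually written, you can query them back. A minimal sketch:
import pymysql

db = pymysql.connect(host="localhost", user="root", password="password", database="scraping")
cursor = db.cursor()
cursor.execute("SELECT url, content FROM urls")
for url, content in cursor.fetchall():
    print(url, content)
db.close()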
III. Storing to a MongoDB Database
After the installation is complete, you can try operating MongoDB from Python to check whether it can connect to the database normally:
from pymongo import MongoClient

client = MongoClient('localhost', 27017)
db = client.blog_database
collection = db.blog
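This alone only creates the client object; pymongo connects lazily. To actually confirm the server is reachable, you can ask for its build info. A minimal sketch; server_info() raises an error if the server cannot be reached:
from pymongo import MongoClient

client = MongoClient('localhost', 27017)
print(client.server_info())   # raises ServerSelectionTimeoutError if MongoDB is not running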
Storing all the article titles scraped from the blog homepage into the MongoDB database:
import requests
import datetime
from bs4 import BeautifulSoup
from pymongo import MongoClient

client = MongoClient('localhost', 27017)
db = client.blog_database
collection = db.blog

link = "http://www.santostang.com/"
headers = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'}
r = requests.get(link, headers=headers)

soup = BeautifulSoup(r.text, "lxml")
title_list = soup.find_all("h1", class_="post-title")
for eachone in title_list:
    url = eachone.a['href']
    title = eachone.a.text.strip()
    post = {"url": url,
            "title": title,
            "date": datetime.datetime.utcnow()}
    collection.insert_one(post)
In the code above, the scraped data is first placed in the post dictionary and then added to the collection with insert_one. To inspect the results, go to the directory C:\Program Files\MongoDB\Server\4.0\bin, double-click mongo.exe, and enter:
use blog_database
db.blog.find().pretty()
This lets you query the data stored in the collection.
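The same check can also be done from Python with find(). A minimal sketch:
from pymongo import MongoClient

client = MongoClient('localhost', 27017)
collection = client.blog_database.blog
for post in collection.find():
    print(post["url"], post["title"])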