Python Data Storage (TXT, CSV, MySQL, MongoDB)

Data Storage

I. Storing to TXT or CSV

1. Writing data to TXT

Several ways to open a file:

Mode  Read/Write      If the file does not exist   Write behavior
w     write only      created                      overwrites existing content
w+    read + write    created                      overwrites existing content
r     read only       raises an error              cannot write
r+    read + write    raises an error              overwrites from the start of the file
a     write only      created                      appends to the end
a+    read + write    created                      appends to the end
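A quick sketch contrasting 'w' (overwrite) and 'a' (append); the file name demo.txt is just an example:

with open('demo.txt', 'w') as f:
    f.write('first\n')    # file is created (or truncated if it exists)
with open('demo.txt', 'w') as f:
    f.write('second\n')   # 'w' overwrites: only "second" remains
with open('demo.txt', 'a') as f:
    f.write('third\n')    # 'a' appends after the existing content
with open('demo.txt', 'r') as f:
    print(f.read())       # prints "second" then "third"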
Appending a string to a TXT file:

title = "This is a test sentence."
with open(r'D:\title.txt', "a+") as f:
    f.write(title)   # the with block closes the file automatically; no f.close() needed

Storing with a format (tab-separated fields):

output = '\t'.join(['name', 'title', 'age', 'gender'])
with open(r'D:\test.txt', "a+") as f:
    f.write(output)   # writes the four fields separated by tabs
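write appends exactly what you give it, so when storing several records you have to add the newline yourself. A minimal sketch with made-up records:

records = [['Tom', 'engineer', '25', 'male'],
           ['Amy', 'teacher', '28', 'female']]
with open(r'D:\test.txt', 'a+') as f:
    for record in records:
        f.write('\t'.join(record) + '\n')   # one tab-separated record per line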

2. Reading a file

with open(r'D:\title.txt', "r", encoding='utf-8') as f:
    result = f.read()                # the whole file as one string
    print(result)
with open(r'D:\title.txt', "r", encoding='utf-8') as f:
    result = f.read().splitlines()   # a list of lines without the trailing newlines
    print(result)
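For large files you can also iterate over the file object line by line instead of reading everything into memory at once:

with open(r'D:\title.txt', "r", encoding='utf-8') as f:
    for line in f:                 # reads one line at a time
        print(line.rstrip('\n'))   # strip the trailing newline before printing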


3. Writing data to CSV

Reading a CSV file:

import csv
with open('test.csv', 'r', encoding='utf-8') as csvfile:
    csv_reader = csv.reader(csvfile)
    for row in csv_reader:
        print(row)      # each row is a list of column values
        print(row[0])   # the first column of that row
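If the CSV's first row is a header, csv.DictReader maps every row to a dict keyed by column name (this sketch assumes test.csv has such a header row):

import csv
with open('test.csv', 'r', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)   # the first row becomes the dict keys
    for row in reader:
        print(row)                     # e.g. {'name': ..., 'title': ...}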


Writing a row to a CSV file:

import csv
output_list = ['1', '2', '3', '4']
with open('test2.csv', 'a+', encoding='utf-8', newline='') as csvfile:
    w = csv.writer(csvfile)
    w.writerow(output_list)   # newline='' prevents blank lines between rows on Windows
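To write several rows at once, writerows takes a list of row lists:

import csv
rows = [['1', '2', '3', '4'],
        ['5', '6', '7', '8']]
with open('test2.csv', 'a+', encoding='utf-8', newline='') as csvfile:
    w = csv.writer(csvfile)
    w.writerows(rows)   # one call writes every row in the list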

II. Storing to a MySQL Database

import pymysql

# Open a database connection (newer pymysql versions require keyword arguments)
db = pymysql.connect(host="localhost", user="root",
                     password="password", database="scraping")

# Get a cursor with the cursor() method
cursor = db.cursor()

# SQL insert statement
sql = """INSERT INTO urls (url, content) VALUES ('www.baidu.com', 'This is content.')"""
try:
    # Execute the SQL statement
    cursor.execute(sql)
    # Commit the transaction to the database
    db.commit()
except Exception:
    # Roll back on any error
    db.rollback()
# Close the database connection
db.close()
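The insert above assumes the scraping database already contains a urls table; a sketch of creating one (the column types and sizes here are illustrative assumptions):

import pymysql

db = pymysql.connect(host="localhost", user="root",
                     password="password", database="scraping")
cursor = db.cursor()
# Column names match the inserts above; the sizes are assumptions
cursor.execute("""
    CREATE TABLE IF NOT EXISTS urls (
        id INT NOT NULL AUTO_INCREMENT,
        url VARCHAR(1000),
        content VARCHAR(4000),
        PRIMARY KEY (id)
    )
""")
db.close()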

Storing content scraped from the web into the MySQL database:

import requests
from bs4 import BeautifulSoup
import pymysql

db = pymysql.connect(host="localhost", user="root",
                     password="password", database="scraping")
cursor = db.cursor()

link = "http://www.santostang.com/"
headers = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'}
r = requests.get(link, headers=headers)

soup = BeautifulSoup(r.text, "lxml")
title_list = soup.find_all("h1", class_="post-title")
for eachone in title_list:
    url = eachone.a['href']
    title = eachone.a.text.strip()
    # Parameterized query: pymysql escapes url and title safely
    cursor.execute("INSERT INTO urls (url, content) VALUES (%s, %s)", (url, title))

db.commit()
db.close()
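To check what was stored, you can read the rows back with a SELECT; a minimal sketch:

import pymysql

db = pymysql.connect(host="localhost", user="root",
                     password="password", database="scraping")
cursor = db.cursor()
cursor.execute("SELECT url, content FROM urls")
for url, content in cursor.fetchall():   # fetchall returns all result rows as tuples
    print(url, content)
db.close()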

III. Storing to a MongoDB Database

After the installation is complete, you can try operating MongoDB from Python to check that you can connect to the database:

from pymongo import MongoClient
client = MongoClient('localhost', 27017)   # default MongoDB port
db = client.blog_database                  # database (created lazily on first write)
collection = db.blog                       # collection within that database
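MongoClient connects lazily, so the lines above succeed even when the server is down; one way to actually verify the connection is to send a ping command:

from pymongo import MongoClient

client = MongoClient('localhost', 27017)
client.admin.command('ping')   # raises a timeout error if MongoDB is unreachable
print("MongoDB connection OK")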

Storing all the article titles scraped from the blog homepage into MongoDB:

import requests
import datetime
from bs4 import BeautifulSoup
from pymongo import MongoClient

client = MongoClient('localhost', 27017)
db = client.blog_database
collection = db.blog

link = "http://www.santostang.com/"
headers = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'}
r = requests.get(link, headers=headers)

soup = BeautifulSoup(r.text, "lxml")
title_list = soup.find_all("h1", class_="post-title")
for eachone in title_list:
    url = eachone.a['href']
    title = eachone.a.text.strip()
    post = {"url": url,
            "title": title,
            "date": datetime.datetime.utcnow()}   # record when the post was scraped
    collection.insert_one(post)

In the code above, the scraped data is first put into the post dictionary, which is then added to the collection with insert_one. To check the result, go to C:\Program Files\MongoDB\Server\4.0\bin, double-click mongo.exe, and enter:

use blog_database
db.blog.find().pretty()

This queries the documents stored in the collection.
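The same query can be run from Python with pymongo's find():

import pprint
from pymongo import MongoClient

client = MongoClient('localhost', 27017)
collection = client.blog_database.blog
for post in collection.find():   # iterate over every document in the collection
    pprint.pprint(post)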
