I. Storing data with CSV
1. Introduction to CSV
In general, CSV is a good choice for larger amounts of data: the format is simple, the files open directly in Office applications such as Excel, and data is easy to write.
2. Basic usage of CSV
- Data is written to a file with the .csv extension, but a writer object must be created first
- After creating the writer object, write the header row first and then the data; the data is usually a list of tuples or a list of dictionaries
- When reading the file back, each row is returned as a list, so individual values can be taken by index
import csv

headers = ['名字', '身高', '年龄']
students = [
    ('丸子', '180', '18'),
    ('动力', '123', '23'),
    ('小白', '176', '12'),
]
# newline='' prevents extra blank lines between rows on Windows
with open('students.csv', 'w', encoding='utf-8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(headers)
    writer.writerows(students)
students1 = [
    {'名字': '三个', '年龄': '18', '身高': '175'},
    {'名字': '五个', '年龄': '19', '身高': '165'},
    {'名字': '六个', '年龄': '20', '身高': '145'},
]
with open('students1.csv', 'w', encoding='utf-8', newline='') as f1:
    writer = csv.DictWriter(f1, headers)
    # writeheader() is required; without it the header row is never written
    writer.writeheader()
    writer.writerows(students1)
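As noted above, reading a CSV file back returns each row as a list, so values are accessed by index. A minimal, self-contained sketch (it writes a small file first, then reads it with both `csv.reader` and `csv.DictReader`):

```python
import csv

# Write a small file first so the example is self-contained
with open('students.csv', 'w', encoding='utf-8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['名字', '身高', '年龄'])
    writer.writerow(('丸子', '180', '18'))

# csv.reader yields each row as a list of strings
with open('students.csv', 'r', encoding='utf-8', newline='') as f:
    rows = list(csv.reader(f))
print(rows[0])     # the header row
print(rows[1][0])  # first data row, first column

# csv.DictReader instead maps each row to a dict keyed by the header
with open('students.csv', 'r', encoding='utf-8', newline='') as f:
    for row in csv.DictReader(f):
        print(row['名字'], row['年龄'])
```

`DictReader` is often more readable than indexing, since columns are referenced by name rather than position.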
3. A CSV storage example
from requests_html import HTMLSession
from lxml import etree
import csv

url = 'https://www.phb123.com/renwu/fuhao/shishi.html'
session = HTMLSession()
response = session.get(url=url)
data = []
headers = ['世界排名', '名字', '财富', '财富来源', '国家地区']
html = etree.HTML(response.text)
trs = html.xpath('//table[@class="rank-table"]/tbody/tr')
# Skip the first row, which is the table header
for tr in trs[1:]:
    rank = tr.xpath('./td[1]/text()')[0]
    name = tr.xpath('./td[2]/a/p/text()')[0]
    wealth = tr.xpath('./td[3]/text()')[0]
    wealth_from = tr.xpath('./td[4]/text()')[0]
    areas = tr.xpath('./td[5]/a/text()')[0]
    data.append((rank, name, wealth, wealth_from, areas))
print(data)
with open('wealther.csv', 'w', encoding='utf-8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(headers)
    writer.writerows(data)
II. Storing data with MySQL and MongoDB
1. Basic usage of MySQL
- First prepare the database's IP address, port, database name, username, and password
- With those, establish a connection and obtain a cursor; all operations are then performed through the cursor
import pymysql

host = '127.0.0.1'
port = 3306
database = 'spider'
user = 'admin'
password = 'as'
conn = pymysql.connect(host=host, port=port, database=database,
                       user=user, password=password)
cursor = conn.cursor()
cursor.execute('select * from promote')
# Fetch a single row
row = cursor.fetchone()
# Fetch all remaining rows
rows = cursor.fetchall()
# Insert a row (note the comma between the column names)
cursor.execute('insert into promote (id, name) values (12, "三个")')
conn.commit()
cursor.close()
conn.close()
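The connect → cursor → execute → commit → close pattern above is shared by Python's DB-API drivers, so it can be tried without a MySQL server. A runnable sketch using the stdlib sqlite3 module instead of pymysql (the table name mirrors the example above), including a parameterized insert, which is safer than formatting values into the SQL string yourself:

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # in-memory database, nothing to install
cursor = conn.cursor()
cursor.execute('create table promote (id integer, name text)')

# Parameterized insert: the driver escapes the values for you
cursor.execute('insert into promote (id, name) values (?, ?)', (12, '三个'))
conn.commit()

cursor.execute('select * from promote')
row = cursor.fetchone()
print(row)

cursor.close()
conn.close()
```

Note that pymysql uses `%s` as its placeholder rather than sqlite3's `?`, but the second argument to `execute()` works the same way in both.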
2. Basic usage of MongoDB
from pymongo import MongoClient

# Connect with pymongo (MongoDB's default port is 27017)
conn = MongoClient('127.0.0.1', 27017)
# Use the students_ database; it is created automatically if it does not exist
db = conn.students_
# Use the students_01 collection; it is also created automatically
my_set = db.students_01
data = [{"name": "三个"}, {"name": "四个"}, {"name": "五个"}]
# Insert multiple documents at once
my_set.insert_many(data)
# find() returns a cursor object; iterate over it to get each document
print(my_set.find())
for doc in my_set.find():
    print(doc)
- With requests_html you can call xpath on response.html directly, without building etree.HTML(...) from the response text yourself
- A crawler, including its data storage, can also be organized with classes and functions
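The last point can be sketched as a small class that separates parsing from saving. The class name, file name, and inline HTML snippet below are made up for illustration (a real run would fetch the page first, as in the example in section I.3); parsing uses lxml as above:

```python
import csv
from lxml import etree


class WealthScraper:
    """Hypothetical scraper: parse rank-table rows and save them to CSV."""

    headers = ['rank', 'name']

    def parse(self, html_text):
        # Extract (rank, name) tuples from every row after the header row
        html = etree.HTML(html_text)
        rows = []
        for tr in html.xpath('//table[@class="rank-table"]//tr')[1:]:
            rank = tr.xpath('./td[1]/text()')[0]
            name = tr.xpath('./td[2]/text()')[0]
            rows.append((rank, name))
        return rows

    def save(self, rows, path):
        # Reuse the CSV-writing pattern from section I
        with open(path, 'w', encoding='utf-8', newline='') as f:
            writer = csv.writer(f)
            writer.writerow(self.headers)
            writer.writerows(rows)


# Demo with an inline HTML snippet instead of a live request
sample = '''<table class="rank-table">
<tr><td>排名</td><td>名字</td></tr>
<tr><td>1</td><td>丸子</td></tr>
</table>'''
scraper = WealthScraper()
data = scraper.parse(sample)
print(data)
scraper.save(data, 'demo.csv')
```

Splitting fetching, parsing, and saving into separate methods makes each piece testable on its own, which is why larger crawlers are usually structured this way.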