SQLite | Comparing SQLite and Pandas, Part 1

X1AO___X1A

Categories: SQLite, SQL. Tags: database, sqlite, sql

Article link: https://blog.csdn.net/weixin_45488228/article/details/104450751

Contents

1. Comparing SQLite and Pandas
2. Summary

1. Comparing SQLite and Pandas

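1.1 Importing data

1.1.1 SQLite

With SQLite, first connect to the database file, then use a select statement to retrieve records:

Running SQL statements in Jupyter Notebook requires the ipython-sql package.
%sql and %%sql are the magics that run SQL inside a notebook; they are not needed in the SQLite command line or in SQLiteStudio.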

%load_ext sql
%sql sqlite:///DataBase/weather_stations.db

'Connected: @DataBase/weather_stations.db'

%%sql
select * from station_data
limit 0,3

 * sqlite:///DataBase/weather_stations.db
Done.

station_number	report_code	year	month	day	dew_point	station_pressure	visibility	wind_speed	temperature	snow_depth	fog	rain	hail	thunder	tornado
143080	34DDA7	2002	12	21	33.8	987.4	3.4	0.2	36	None	1	1	1	1	1
766440	39537B	1998	10	1	72.7	1014.6	5.9	6.7	83.3	None	0	0	0	0	0
176010	C3C6D5	2001	5	18	55.7	None	7.3	4.3	69.1	None	0	0	0	0	0

1.1.2 Pandas

In Python, the sqlite3 module can read the database, and the result can then be converted into a Pandas DataFrame:

import sqlite3
import pandas as pd
con = sqlite3.connect('./DataBase/weather_stations.db')
cursor = con.execute('select * from station_data')
rows = cursor.fetchall()
df = pd.DataFrame(rows, columns=[x[0] for x in cursor.description])

df.head(3)

	station_number	report_code	year	month	day	dew_point	station_pressure	visibility	wind_speed	temperature	snow_depth	fog	rain	hail	thunder	tornado
0	143080	34DDA7	2002	12	21	33.8	987.4	3.4	0.2	36.0	NaN	1	1	1	1	1
1	766440	39537B	1998	10	1	72.7	1014.6	5.9	6.7	83.3	NaN	0	0	0	0	0
2	176010	C3C6D5	2001	5	18	55.7	NaN	7.3	4.3	69.1	NaN	0	0	0	0	0
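As an alternative to building the DataFrame by hand from the cursor, Pandas can run the query itself through pd.read_sql_query. The sketch below is only an illustration: it uses an in-memory database with two made-up rows and a reduced set of columns in place of the real weather_stations.db, and the name df_demo is hypothetical.

```python
import sqlite3
import pandas as pd

# In-memory stand-in for weather_stations.db (hypothetical sample rows,
# only a few of the real columns).
con = sqlite3.connect(':memory:')
con.execute('create table station_data '
            '(station_number integer, report_code text, '
            'year integer, temperature real)')
con.executemany('insert into station_data values (?, ?, ?, ?)',
                [(143080, '34DDA7', 2002, 36.0),
                 (766440, '39537B', 1998, 83.3)])
con.commit()

# read_sql_query executes the statement and takes the column names
# from the result set, so no manual cursor.description handling is needed.
df_demo = pd.read_sql_query('select * from station_data', con)
con.close()

print(df_demo)
```

With the real file, the same call would take the connection opened with sqlite3.connect('./DataBase/weather_stations.db').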

1.2 Selecting data

1.2.1 SQLite

In SQLite, a select statement chooses the columns, and aliases, built-in functions, and string concatenation can be used to transform the data:

%%sql
select
station_number ||'_'|| report_code as number, -- concatenate the two columns
round(temperature*9/5+32, 2) as Fahrenheit -- convert Celsius to Fahrenheit
from station_data
limit 0,3

 * sqlite:///DataBase/weather_stations.db
Done.

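number	Fahrenheit
143080_34DDA7	96.0
766440_39537B	181.94
176010_C3C6D5	156.38

Note that the first row reads 96.0 rather than 96.8. The temperature in that row appears to be stored as the integer 36, so SQLite evaluates 36*9/5 with integer division (324/5 gives 64); writing 9.0/5 in the query avoids the truncation.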
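The 96.0 in the first row can be reproduced from Python with the standard sqlite3 module. This is a minimal sketch that assumes the stored temperature is the integer 36, which the displayed value suggests but the article does not state; the variable names are illustrative.

```python
import sqlite3

con = sqlite3.connect(':memory:')

# All operands are integers: 36*9 = 324, 324/5 truncates to 64, 64+32 = 96.
int_result = con.execute('select round(36*9/5+32, 2)').fetchone()[0]

# A REAL literal (9.0) switches the expression to floating-point arithmetic.
real_result = con.execute('select round(36*9.0/5+32, 2)').fetchone()[0]

con.close()
print(int_result, real_result)
```

Pandas does not show this behaviour in the comparison because its / operator always performs true division.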

1.2.2 Pandas

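Pandas has no counterpart to SQL's || operator, but string columns can be concatenated element-wise with + once the numeric column has been converted with astype(str), which gives the same result as the SQL concatenation:

pd.DataFrame({
    # convert both columns to str so that + concatenates element-wise
    'number': df['station_number'].astype(str) + '_' + df['report_code'].astype(str),
    # Celsius to Fahrenheit, rounded to two decimals
    'Fahrenheit': (df['temperature'] * 9 / 5 + 32).round(2),
}).head(3)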

	number	Fahrenheit
0	143080_34DDA7	96.80
1	766440_39537B	181.94
2	176010_C3C6D5	156.38

1.3 Filtering data

1.3.1 SQLite

SQLite filters rows with where, for example to keep the records from 2005 through 2010:

%%sql
select * from station_data
where year>=2005 and year<=2010
limit 0,3

 * sqlite:///DataBase/weather_stations.db
Done.

station_number	report_code	year	month	day	dew_point	station_pressure	visibility	wind_speed	temperature	snow_depth
125600	145150	2007	10	14	33	None	6.9	2.5	39.7	None
598550	C5C66E	2006	10	15	72.9	None	14.2	1.7	82	None
941830	229317	2007	4	19	66.5	994.9	None	4	76.3	None

Keep the records whose month is 3, 6, 9, or 12:

%%sql 
select * from station_data 
where Month in (3,6,9,12)
limit 0,3;

 * sqlite:///DataBase/weather_stations.db
Done.

station_number	report_code	year	month	day	dew_point	station_pressure	visibility	wind_speed	temperature	precipitation	snow_depth	fog	rain	hail	thunder	tornado
143080	34DDA7	2002	12	21	33.8	987.4	3.4	0.2	36	0	None	1	1	1	1	1
821930	1F8A7B	1953	6	18	72.8	1007.1	12.4	3.6	81.3	0	None	0	0	0	0	0
478070	D028D8	1981	6	27	73.4	None	7.9	3	77	1.93	None	0	0	0	0	0

1.3.2 Pandas

Keep the records from 2005 through 2010:

df[(df['year']>=2005) & (df['year']<=2010)].head(3)

	station_number	report_code	year	month	day	dew_point	station_pressure	visibility	wind_speed	temperature	snow_depth
3	125600	145150	2007	10	14	33.0	NaN	6.9	2.5	39.7	NaN
9	598550	C5C66E	2006	10	15	72.9	NaN	14.2	1.7	82.0	NaN
18	941830	229317	2007	4	19	66.5	994.9	NaN	4.0	76.3	NaN

Keep the records whose month is 3, 6, 9, or 12:

df[df['month'].isin([3, 6, 9, 12])].head(3)

	station_number	report_code	year	month	day	dew_point	station_pressure	visibility	wind_speed	temperature	precipitation	snow_depth	fog	rain	hail	thunder	tornado
0	143080	34DDA7	2002	12	21	33.8	987.4	3.4	0.2	36.0	0.00	NaN	1	1	1	1	1
5	821930	1F8A7B	1953	6	18	72.8	1007.1	12.4	3.6	81.3	0.00	NaN	0	0	0	0	0
6	478070	D028D8	1981	6	27	73.4	NaN	7.9	3.0	77.0	1.93	NaN	0	0	0	0	0
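The two where clauses above map onto vectorized Pandas methods. The following sketch runs on a small made-up DataFrame (df_demo and its values are hypothetical, not taken from weather_stations.db) and shows between, isin, and query side by side.

```python
import pandas as pd

# Hypothetical miniature of the station_data table.
df_demo = pd.DataFrame({
    'year':  [2002, 2007, 2006, 1953, 2010],
    'month': [12, 10, 10, 6, 3],
})

# where year >= 2005 and year <= 2010
# (between includes both endpoints by default)
in_range = df_demo[df_demo['year'].between(2005, 2010)]

# where month in (3, 6, 9, 12)
quarter_ends = df_demo[df_demo['month'].isin([3, 6, 9, 12])]

# query() takes an expression string that reads much like the SQL condition
same_range = df_demo.query('year >= 2005 and year <= 2010')

print(in_range)
print(quarter_ends)
```

All three return a filtered DataFrame that keeps the original row labels, which is why the Pandas outputs above show indexes such as 3, 9, and 18.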

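1.4 Aggregation and grouping

1.4.1 SQLite

In SQLite, group by together with the built-in aggregate functions performs grouped aggregation, for example counting the tornado records for each year and month: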

%%sql
select year, month,
count(*) as record_count
from station_data
where tornado == 1
group by year, month
order by year, month
limit 0,5;

 * sqlite:///DataBase/weather_stations.db
Done.

year	month	record_count
1937	7	3
1941	8	3
1942	10	3
1943	1	3
1943	4	3

1.4.2 Pandas

Pandas likewise provides a groupby method for grouping:

df[df['tornado']==1].groupby(['year','month'])['tornado'].count().head(5)

year  month
1937  7        3
1941  8        3
1942  10       3
1943  1        3
      4        3
Name: tornado, dtype: int64
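To get a flat table with a named count column, closer to the SQL result, the grouped sizes can be turned back into a DataFrame. This is a sketch on invented rows; df_demo and record_counts are illustrative names, and size() is used here in place of count() because it counts rows the way count(*) does, whereas count() skips missing values in the selected column.

```python
import pandas as pd

# Hypothetical sample rows standing in for station_data.
df_demo = pd.DataFrame({
    'year':    [1937, 1937, 1937, 1941, 1941],
    'month':   [7, 7, 8, 8, 8],
    'tornado': [1, 1, 0, 1, 1],
})

record_counts = (
    df_demo[df_demo['tornado'] == 1]      # where tornado == 1
    .groupby(['year', 'month'])           # group by year, month
    .size()                               # rows per group, like count(*)
    .reset_index(name='record_count')     # back to columns, name the count
    .sort_values(['year', 'month'])       # order by year, month
)

print(record_counts)
```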

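2. Summary

The import, selection, filtering, and aggregation examples show that SQLite is structured:
it is easy to pick up and use, and the code reads clearly at a glance. Pandas, by contrast, inherits the Pythonic style.
Some steps can be written with explicit loops, and list comprehensions keep such code short, although Pandas usually offers a vectorized method for the same task.
Overall, SQLite gets these tasks done more quickly and conveniently, while Pandas offers more freedom
but requires more familiarity with Python. Each has its strengths.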
