SQL vs Python Pandas基础语法对照

最新推荐文章于 2022-05-18 19:21:51 发布

8su

最新推荐文章于 2022-05-18 19:21:51 发布

阅读量318

点赞数 1

分类专栏： Python

本文链接：https://blog.csdn.net/qq_33291559/article/details/117172500

版权

本文旨在对比SQL（T-SQL SqlServer方言）和Python Pandas库在数据操作方面的基本语法，包括DDL（数据定义语言）如添加/删除列和索引，DML（数据操作语言）如查询、条件过滤、排序、聚合以及JOIN操作。在Pandas中，特别提到了DataFrame.loc[]的使用，它可以接受标签、列表、切片或布尔数组作为输入，并介绍了如何进行列操作和JOIN操作，以及apply函数的应用。

摘要由CSDN通过智能技术生成

目的：对于同一种数据查询/操作需求，比较SQL和Python Pandas的操作。

说明：

我的学习路径是先SQL后Python Pandas，所以本文也按照这个顺序。
SQL的方言是T-SQL(SqlServer)

# %%
import pandas as pd

# %%
wine_reviews = pd.read_csv("./data/winemag-data/winemag-data-130k-v2.csv", index_col=0)

# %%
wine_reviews.head()

DDL

ADD COLUMN
DROP COLUMN
ADD INDEX
DROP INDEX

# %%
# --  ADD NEW COLUMN 
# ALTER TABLE WINE_REVIEWS_NEW ADD price_point_ratio DECIMAL(17,4) NULL
# UPDATE WINE_REVIEWS_NEW SET price_point_ratio = price/(case when points=0 then 1 else points end) 

wine_reviews['price_point_ratio'] = wine_reviews['price']/wine_reviews['points']
wine_reviews.head()


# %%
# --  DROP COLUMN 
# ALTER TABLE WINE_REVIEWS_NEW DROP COLUMN price_point_ratio


wine_reviews = wine_reviews.drop(['price_point_ratio'], axis = 1) # drop column based on column name
# wine_reviews = wine_reviews.drop(wine_reviews.columns[13], axis = 1) # drop column based on column name


# %%
# -- ADD INDEX
# CREATE NONCLUSTERED INDEX IDX_WINE_REVIEWS_NEW_PROVINCE ON WINE_REVIEWS_NEW
# (
#   province 
# )
wine_test = wine_reviews.set_index(['province']) # set province column as index
wine_test.head()
## note: the original index column will be overwritten, so copy the original index to another column before-hand if neccessary. see above
# wine_test['province'] =wine_test.index 


# %%
# -- DROP INDEX
# DROP INDEX IDX_WINE_REVIEWS_NEW_PROVINCE ON WINE_REVIEWS_NEW

# there no method for dropping index, you simply set index to another column 
wine_test = wine_test.set_index(['country'])