Study Notes on the Python Data Science Handbook

Arahbo


Article link: https://blog.csdn.net/Arahbo/article/details/120400890

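This article is a set of study notes on the Python Data Science Handbook. It covers IPython magic commands such as %paste, %cpaste, and %lprun, as well as aggregation, hierarchical indexing, and data merging in data processing. It also touches on Pandas performance optimization with eval() and query(), data visualization techniques with Matplotlib and Seaborn, and basic applications of Scikit-Learn, including linear regression, support vector machines, and Gaussian mixture models. The notes record problems encountered in different chapters and their solutions, such as exceptions when handling time series data and version-update issues with Seaborn plots and the Scikit-Learn library.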

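Preface

Software installation notes

Miniconda can be downloaded from: Miniconda — Conda documentation. However, with Miniconda you have to install each Python package yourself, which is not beginner-friendly. Using Anaconda directly is recommended.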

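Chapter 1

1.4 IPython Magic Commands

1.4.1 Pasting Code Blocks: %paste and %cpaste

%paste and %cpaste are not available in Jupyter Notebook (they are also missing from the list of magic functions printed by %lsmagic). The error is: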

UsageError: Line magic function `%paste` not found.

In testing, both commands work in IPython.

1.7 Shell-Related Magic Commands

The temporary directory cannot be deleted here (this section is presumably meant to be run with ipython from the Anaconda Powershell Prompt):

In [20]: rm -r tmp
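
When the shell-style rm -r fails, which can happen on Windows where a Unix-style rm is not necessarily available to IPython, one portable alternative is Python's standard library. The following is a minimal sketch, assuming the directory to delete is called tmp as in the cell above; the helper name remove_dir is hypothetical and not from the book.

```python
import shutil
from pathlib import Path


def remove_dir(path):
    """Recursively delete a directory; return True if it existed."""
    target = Path(path)
    if not target.is_dir():
        # Nothing to delete (missing path, or not a directory)
        return False
    shutil.rmtree(target)
    return True
```

Calling remove_dir("tmp") from an IPython session would then play the role of rm -r tmp, regardless of the underlying shell.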

1.9 Profiling and Timing Code

1.9.3 Line-by-Line Profiling with %lprun

Installing line-profiler under Python 3.7 (on Windows) requires Visual Studio 2017 support.

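Chapter 2

2.4 Aggregations: Min, Max, and Everything In Between

2.4.3 Example: What Is the Average Height of US Presidents?

In[13]:!head -4 data/president_heights.csv

On Windows, the corresponding way to view the file contents is the type command: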

In[13]:!type data\president_heights.csv
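
Note that type prints the whole file rather than only its first four lines. A platform-independent way to mimic head -4 is to read the lines in Python. This is a minimal sketch; the helper name head is hypothetical, and the file is assumed to be UTF-8 encoded text.

```python
from itertools import islice


def head(path, n=4, encoding="utf-8"):
    """Return the first n lines of a text file, without trailing newlines."""
    with open(path, "r", encoding=encoding) as f:
        # islice stops after n lines, so large files are not read in full
        return [line.rstrip("\n") for line in islice(f, n)]
```

For example, print("\n".join(head("data/president_heights.csv"))) would show the header row and the first three records.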

Chapter 3

3.6 Hierarchical Indexing

3.6.2 Methods of MultiIndex Creation

In[17]:pd.MultiIndex(levels=[['a', 'b'], [1, 2]],
                     labels=[[0, 0, 1, 1], [0, 1, 0, 1]])

Out[17]:MultiIndex(levels=[['a', 'b'], [1, 2]],
           codes=[[0, 0, 1, 1], [0, 1, 0, 1]])

d:\Users\Administrator\Anaconda3\lib\site-packages\ipykernel_launcher.py:2: FutureWarning: the 'labels' keyword is deprecated, use 'codes' instead

In current versions, 'labels' has been replaced by 'codes'.
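
The same index can be built without the warning by passing codes instead of labels. This is a minimal sketch, assuming a pandas version recent enough to accept the codes keyword (the one named in the FutureWarning above); the variable names are illustrative only.

```python
import pandas as pd

# Each entry in `codes` is a position into the matching entry of `levels`
index = pd.MultiIndex(levels=[['a', 'b'], [1, 2]],
                      codes=[[0, 0, 1, 1], [0, 1, 0, 1]])

# Materialize the index as a list of (outer, inner) tuples for inspection
pairs = list(index)
```

Here the codes [0, 0, 1, 1] and [0, 1, 0, 1] pair 'a' with 1 and 2, then 'b' with 1 and 2.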

3.7 Combining Datasets: Concat and Append

3.7.2 Simple Concatenation with pd.concat

In current versions, axis='col' must be changed to axis='columns':

In[8]: df3 = make_df('AB', [0, 1])
       df4 = make_df('CD', [0, 1])
       print(df3); print(df4); print(pd.concat([df3, df4], axis='columns'))
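
To run the cell above on its own, the make_df helper has to be defined first. The sketch below includes a stand-in for it, written on the assumption that it builds string cells such as 'A0' from the column name and the index label; treat that definition as an approximation rather than the book's exact code.

```python
import pandas as pd


def make_df(cols, ind):
    """Build a small DataFrame whose cells combine column name and index."""
    data = {c: [str(c) + str(i) for i in ind] for c in cols}
    return pd.DataFrame(data, index=ind)


df3 = make_df('AB', [0, 1])
df4 = make_df('CD', [0, 1])

# axis='columns' places the frames side by side, aligned on the row index
combined = pd.concat([df3, df4], axis='columns')
print(combined)
```

Passing axis=1 is an equivalent spelling of axis='columns'.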

3.9 Aggregation and Grouping

3.9.1 Planets Data

Downloading the planets data through Seaborn fails:

In[2]: import seaborn as sns
       planets = sns.load_dataset('planets')

URLError: <urlopen error [Errno 11004] getaddrinfo failed>

Changing the computer's DNS server to 114.114.114.114 may fix this.

3.11 Vectorized String Operations

3.11.3 Example: Recipe Database

Build a single string that joins the JSON objects from all lines, then read all the data with pd.read_json:

In[20]: # read the entire file into a Python array
        with open('data/recipeitems-latest.json', 'r') as f:
            # Extract each line
            data = (line.strip() for line in f)
            # Reformat so each line is the element of a list
            data_json = "[{0}]".format(','.join(data))

This raises an error:

UnicodeDecodeError: 'gbk' codec can't decode byte 0xa6 in position 4058: illegal multibyte sequence

It needs to be changed to:

In[20]: # read the file contents
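
The corrected cell is cut off in the source, so the author's exact fix is not shown. The error itself arises because open() without an encoding argument falls back to the system locale, which is gbk on a Chinese-language Windows installation. A common remedy, assumed here rather than confirmed by the source, is to pass the encoding explicitly. In this sketch the helper name read_json_lines and the choice of UTF-8 are both assumptions.

```python
def read_json_lines(path, encoding="utf-8"):
    """Join one-JSON-object-per-line text into a single JSON array string."""
    with open(path, "r", encoding=encoding) as f:
        # Strip each line and skip blank ones so the joined string stays valid
        lines = (line.strip() for line in f)
        return "[{0}]".format(",".join(line for line in lines if line))
```

The returned string has the same shape as data_json in the cell above. Newer pandas versions may expect a file-like object rather than a literal string, in which case wrapping the result in io.StringIO before calling pd.read_json is one option.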
