DataCamp “Data Scientist with Python track” Chapter 8: pandas Foundations Study Notes


DeclanNYC


Tags: DataCamp, Python, data science, pandas, data processing

Article link: https://blog.csdn.net/weixin_41803041/article/details/84604879


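This post records my notes on pandas Foundations from DataCamp's 'Data Scientist with Python' course, covering a review of DataFrames, importing and exporting data, exploratory data analysis, time series operations, and time series plotting. It focuses on the parameters of pd.read_csv(), plotting PDFs and CDFs, handling time series data with .reindex() and .resample(), and smoothing data and handling missing values. It also touches on string searching, time zone conversion, and data cleaning.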


Review of pandas DataFrame

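In this section we return to the pandas package. Besides commands covered earlier (such as the slicing accessor ".loc[]"), the exercise below uses the new values attribute. Note that values takes no parentheses, because it is an attribute rather than a method. Also note how np.log10 is used: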

# Import numpy
import numpy as np

# Create array of DataFrame values: np_vals
np_vals = df.values

# Create new array of base 10 logarithm values: np_vals_log10
np_vals_log10 = np.log10(np_vals)

# Create array of new DataFrame by passing df to np.log10(): df_log10
df_log10 = np.log10(df)

# Print the type of each original and new data container
for name in ['np_vals', 'np_vals_log10', 'df', 'df_log10']:
    print(name, 'has type', type(eval(name)))
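To make the type behavior concrete, here is a minimal self-contained sketch. The DataFrame and its column names are made up for illustration, since the course's `df` is not shown in these notes. It suggests that `.values` yields a NumPy array, while passing the DataFrame itself to `np.log10` returns a DataFrame with its labels intact.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the exercise's real df is not reproduced in these notes
df = pd.DataFrame({"population": [1000.0, 10000.0], "area": [10.0, 100.0]})

# .values is an attribute (no parentheses) and returns a numpy.ndarray
np_vals = df.values

# ndarray in, ndarray out: row and column labels are gone
np_vals_log10 = np.log10(np_vals)

# DataFrame in, DataFrame out: the index and column labels are preserved
df_log10 = np.log10(df)

print(type(np_vals).__name__)
print(type(df_log10).__name__)
print(df_log10)
```

Newer pandas documentation recommends `df.to_numpy()` for the same conversion; `.values` still works and is what the exercise uses.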

Importing & exporting data

CSV files are very common in data work, but raw CSV files often have the following problems:

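Passing the argument header=None to pd.read_csv() tells pandas that the file has no header row, so the first line is read as data and the columns are labeled with integers starting from 0.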

Similarly, we can use the names keyword, "names=col_names", so that a col_names list defined beforehand is used as the column labels.
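The effect of these two keywords can be seen in a small sketch. The CSV text and the `col_names` values below are invented for illustration, and `io.StringIO` stands in for a file on disk.

```python
import io
import pandas as pd

# Hypothetical CSV text with no header row
raw = "1,2,3\n4,5,6\n"

# Default behavior: the first line is consumed as column labels
df_default = pd.read_csv(io.StringIO(raw))

# header=None: every line is data, columns are labeled 0, 1, 2
df_nohdr = pd.read_csv(io.StringIO(raw), header=None)

# names=...: supply the column labels yourself
col_names = ["a", "b", "c"]  # illustrative labels
df_named = pd.read_csv(io.StringIO(raw), header=None, names=col_names)

print(df_default.shape)
print(list(df_nohdr.columns))
print(list(df_named.columns))
```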

# Import pandas
import pandas as pd

# Read the raw file as-is: df1
df1 = pd.read_csv(file_messy)

# Print the output of df1.head()
print(df1.head())

# Read in the file with the correct parameters: df2
df2 = pd.read_csv(file_messy, delimiter=' ', header=3, comment='#')

# Print the output of df2.head()
print(df2.head())

# Save the cleaned up DataFrame to a CSV file without the index
df2.to_csv(file_clean, index=False)

# Save the cleaned up DataFrame to an excel file without the index
df2.to_excel('file_clean.xlsx', index=False)
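Since the course's messy file is not included in these notes, the following sketch uses a small invented text to show roughly what `delimiter`, `header`, and `comment` each do, and what `index=False` changes on export. The file contents and the `header=2` value are assumptions chosen to match this made-up text, not the exercise's data.

```python
import io
import pandas as pd

# Hypothetical messy file: two junk lines, a header row, a comment line, then data
messy = (
    "Hypothetical data\n"
    "Made-up values\n"
    "name Jan Feb\n"
    "# a full-line comment that should be ignored\n"
    "IBM 156.08 160.01\n"
    "MSFT 45.51 43.08\n"
)

# delimiter=' ' splits on single spaces, header=2 takes the third line as the
# column labels (earlier lines are discarded), comment='#' drops comment text
df2 = pd.read_csv(io.StringIO(messy), delimiter=" ", header=2, comment="#")
print(df2)

# index=False keeps the row index out of the exported file
out = io.StringIO()
df2.to_csv(out, index=False)
print(out.getvalue())
```

`to_excel` accepts the same `index=False` argument, but it needs an Excel writer engine such as openpyxl to be installed, so it is left out of this sketch.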

Visual exploratory data analysis

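This section covers several ways to plot data, introduces how to draw PDF and CDF plots, and shows how to save the resulting figures. For the PDF and CDF plots, pay attention to how a few of the arguments are written: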

# This formats the plots such that they appear on separate rows
fig, axes = plt.subplots(nrows=2, ncols=1)

# Plot the PDF
df.fraction.plot(ax=axes[0], kind='hist', normed=True, bins=30, ran
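Because the snippet above is cut off, here is a separate minimal sketch of PDF and CDF histograms. It is not the course's solution: the data is randomly generated, the `range=(0, 1)` value is an assumption, and `density=True` is used because recent Matplotlib releases use it in place of the older `normed` argument. The figure is saved to an in-memory buffer; passing a filename to `savefig` would write it to disk instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data standing in for the course's `fraction` column
rng = np.random.default_rng(0)
df = pd.DataFrame({"fraction": rng.uniform(0, 1, size=500)})

# Two plots stacked on separate rows
fig, axes = plt.subplots(nrows=2, ncols=1)

# PDF: density=True rescales the bars so their total area is 1
df["fraction"].plot(ax=axes[0], kind="hist", density=True,
                    bins=30, range=(0, 1))

# CDF: cumulative=True accumulates the bars, so the last one reaches 1
df["fraction"].plot(ax=axes[1], kind="hist", density=True, cumulative=True,
                    bins=30, range=(0, 1))

# Save the figure (here to memory; a path such as a PNG filename also works)
buf = io.BytesIO()
fig.savefig(buf, format="png")
```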
