pandas courses 2

Latest recommended article published on 2024-07-17 09:15:39

David_blog

Article tags: python

Article link: https://blog.csdn.net/m0_53155317/article/details/125038457

Data Types and Missing Values

Data Types

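The data type for a column in a DataFrame or a Series is known as the dtype. The dtype property gives the type of a single column, and the dtypes property of a DataFrame gives the type of every column: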

reviews.price.dtype

reviews.dtypes

Columns consisting entirely of strings do not get their own type; they are instead given the object type.

A column can be converted from one type to another with astype(), wherever such a conversion makes sense. For example, we can convert the points column from its existing int64 dtype to float64:

reviews.points.astype('float64')
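
The three calls above can be tried on a small stand-in table. This is only a sketch: the reviews DataFrame below is a hypothetical three-row substitute for the real dataset, and its values are made up.

```python
import pandas as pd

# Hypothetical stand-in for the reviews data (values are made up).
reviews = pd.DataFrame({
    "country": ["Italy", "Portugal", "US"],
    "points": pd.Series([87, 87, 86], dtype="int64"),
    "price": [15.0, 14.0, 13.0],
})

price_dtype = reviews.price.dtype      # dtype of a single column
all_dtypes = reviews.dtypes            # Series: column name -> dtype

# astype() returns a converted copy; reviews itself is left unchanged.
points_as_float = reviews.points.astype("float64")

print(price_dtype)
print(all_dtypes)
print(points_as_float.dtype)
```

Since astype() returns a new Series, the result has to be assigned back (for example to reviews["points"]) if the conversion should persist.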

Missing Values

Entries with missing values are given the value NaN, short for "Not a Number". NaN is itself a floating-point value, which is why these entries are of the float64 dtype and why an integer column that acquires a missing value is converted to float64.

Pandas provides some methods specific to missing data. To select NaN entries you can use pd.isnull() (or its companion pd.notnull()), passing the result as a boolean mask:

reviews[pd.isnull(reviews.country)]
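
As a rough illustration of the mask-based selection above, the sketch below uses a hypothetical four-row reviews table (made-up values) with two missing countries. It also shows the float64 upcast on a small integer series, assuming pandas' default dtypes.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the reviews data (values are made up).
reviews = pd.DataFrame({
    "country": ["Italy", np.nan, "US", None],
    "points": [87, 90, 86, 88],
})

# pd.isnull() returns a boolean Series; indexing with it keeps the True rows.
missing_country = reviews[pd.isnull(reviews.country)]
known_country = reviews[pd.notnull(reviews.country)]

# With default dtypes, integers mixed with NaN are stored as float64.
points_with_gap = pd.Series([87, np.nan, 86])

print(missing_country)
print(points_with_gap.dtype)
```

Both np.nan and None count as missing here, so the two selections split the rows without overlap.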

Replacing missing values is a common operation, and pandas provides a handy method for it: fillna(). fillna() supports a few different strategies for filling in such data. For example, we can simply replace each NaN with the string "Unknown":

reviews.region_2.fillna("Unknown")
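
A minimal sketch of a few filling strategies follows, on a hypothetical table with made-up values. The constant fill mirrors the call above; the mean fill and the forward fill with ffill() are added here only as illustrations of other options and are not taken from the original notes.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the reviews data (values are made up).
reviews = pd.DataFrame({
    "region_2": ["Willamette Valley", np.nan, np.nan, "Sonoma"],
    "price": [10.0, np.nan, 20.0, 30.0],
})

# Strategy 1: replace every NaN with a constant.
region_filled = reviews.region_2.fillna("Unknown")

# Strategy 2: replace NaN with a statistic of the column (here the mean).
price_mean_filled = reviews.price.fillna(reviews.price.mean())

# Strategy 3: carry the last valid value forward.
price_ffilled = reviews.price.ffill()

print(region_filled.tolist())
print(price_mean_filled.tolist())
print(price_ffilled.tolist())
```

Each call returns a new Series, so the original column still contains its NaN entries afterwards.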

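Sometimes the value we want to replace is not null. One way to change such a value throughout a column is the replace() method; for example, to change the Twitter handle @kerinokeefe to @kerino: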

reviews.taster_twitter_handle.replace("@kerinokeefe", "@kerino")

The replace() method is worth mentioning here because it is handy for replacing missing data that has been recorded as some kind of sentinel value in the dataset: things like "Unknown", "Undisclosed", "Invalid", and so on.
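
The sketch below shows both uses of replace() on small made-up Series: swapping one value for another, and turning sentinel strings back into NaN so that the missing-data tools apply to them. The handles and regions names and their contents are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical columns (values are made up).
handles = pd.Series(["@kerinokeefe", "@vossroger", "@kerinokeefe"])
regions = pd.Series(["Napa", "Unknown", "Undisclosed", "Sonoma"])

# Replace one value with another everywhere it occurs.
updated = handles.replace("@kerinokeefe", "@kerino")

# Map several sentinel strings to NaN in one call.
cleaned = regions.replace(["Unknown", "Undisclosed"], np.nan)

print(updated.tolist())
print(cleaned.isna().sum())
```

Once the sentinels are real NaN values, pd.isnull() and fillna() treat them like any other missing entry.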

Renaming and Combining

Renaming

The first function we'll introduce here is rename(), which lets you change index names and/or column names. For example, the first call below changes the points column in our dataset to score, and the second relabels the first two entries of the row index:

reviews.rename(columns={'points': 'score'})

reviews.rename(index={0: 'firstEntry', 1: 'secondEntry'})

Both the row index and the column index can have their own name attribute. The complementary rename_axis() method may be used to change these names. For example:

reviews.rename_axis("wines", axis='rows').rename_axis("fields", axis='columns')
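
To see the difference between renaming labels and naming an axis, here is a small sketch on a hypothetical two-row reviews table (made-up values).

```python
import pandas as pd

# Hypothetical stand-in for the reviews data (values are made up).
reviews = pd.DataFrame({"country": ["Italy", "Portugal"], "points": [87, 87]})

# rename() changes individual labels via a mapping of old -> new.
renamed_cols = reviews.rename(columns={"points": "score"})
renamed_rows = reviews.rename(index={0: "firstEntry", 1: "secondEntry"})

# rename_axis() names the axis itself rather than the labels on it.
named_axes = (
    reviews.rename_axis("wines", axis="rows")
           .rename_axis("fields", axis="columns")
)

print(list(renamed_cols.columns))
print(list(renamed_rows.index))
print(named_axes.index.name, named_axes.columns.name)
```

All three calls return new DataFrames; reviews keeps its original labels unless the result is assigned back.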

Combining

The simplest combining method is concat(). Given a list of elements, this function will smush those elements together along an axis. Here we stack two DataFrames that have the same columns:

canadian_youtube = pd.read_csv("../input/youtube-new/CAvideos.csv")
british_youtube = pd.read_csv("../input/youtube-new/GBvideos.csv")

pd.concat([canadian_youtube, british_youtube])
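
Because the CSV files may not be at hand, the following sketch repeats the concat() call on two tiny hypothetical DataFrames (made-up titles and view counts). The ignore_index=True variant is an addition for illustration, not something the original notes use.

```python
import pandas as pd

# Hypothetical miniature versions of the two datasets (values are made up).
canadian = pd.DataFrame({"title": ["A", "B"], "views": [100, 200]})
british = pd.DataFrame({"title": ["B", "C"], "views": [150, 300]})

# Default: rows are stacked and each frame keeps its own index labels.
stacked = pd.concat([canadian, british])

# ignore_index=True discards the old labels and renumbers from 0.
renumbered = pd.concat([canadian, british], ignore_index=True)

print(list(stacked.index))
print(list(renumbered.index))
```

The duplicated index labels in the default result can make label-based lookups return several rows, which is one reason to renumber.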

The middlemost combiner in terms of complexity is join(). join() lets you combine different DataFrame objects which have an index in common. For example, to pull down videos that happened to be trending on the same day in both Canada and the UK, we could index both datasets by title and trending date and then join them:

left = canadian_youtube.set_index(['title', 'trending_date'])
right = british_youtube.set_index(['title', 'trending_date'])

left.join(right, lsuffix='_CAN', rsuffix='_UK')

The lsuffix and rsuffix parameters are necessary here because the British and Canadian datasets use the same column names. If they didn't (because, say, we'd renamed them beforehand), we wouldn't need the suffixes.
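
The join can be sketched on two tiny hypothetical tables (made-up titles, dates and view counts). One point worth noticing is that join() performs a left join by default, so rows found only in the left table are kept with NaN on the right; the how="inner" variant, added here as an illustration, keeps only the entries present in both.

```python
import pandas as pd

# Hypothetical miniature versions of the two datasets (values are made up).
canadian = pd.DataFrame({
    "title": ["A", "B"],
    "trending_date": ["18.01.01", "18.01.01"],
    "views": [100, 200],
})
british = pd.DataFrame({
    "title": ["B", "C"],
    "trending_date": ["18.01.01", "18.01.01"],
    "views": [150, 300],
})

left = canadian.set_index(["title", "trending_date"])
right = british.set_index(["title", "trending_date"])

# Default (left) join: every Canadian row is kept; missing UK data becomes NaN.
joined = left.join(right, lsuffix="_CAN", rsuffix="_UK")

# Inner join: only (title, trending_date) pairs present in both tables.
both = left.join(right, lsuffix="_CAN", rsuffix="_UK", how="inner")

print(joined)
print(both)
```

If the suffixes are left out while both tables still share the views column, join() raises a ValueError about overlapping columns, which is the situation the suffix parameters avoid.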
