[Learning Pandas with Stack Overflow] How do I get the row count of a Pandas DataFrame

探索者v

Article link: https://blog.csdn.net/tanzuozhev/article/details/77411467

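I have recently been writing a blog series called "Learning Pandas with Stack Overflow".

Series page: http://blog.csdn.net/column/details/16726.html

The topics come from searching Stack Overflow for questions tagged pandas and sorting the results by votes:
https://stackoverflow.com/questions/tagged/pandas?sort=votes&pageSize=15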

How do I get the row count of a Pandas DataFrame

### Data preparation

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(1000,3), columns=['col1', 'col2', 'col3'])
df.iloc[::2,0] = np.nan
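
As a quick sanity check on the setup above, the following sketch confirms that the slice assignment sets `col1` to NaN in every second row. The seeded generator is an assumption added here for reproducibility; the original uses unseeded `np.random.randn`.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (the seed value is arbitrary).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((1000, 3)), columns=['col1', 'col2', 'col3'])

# Set col1 (column position 0) to NaN in every second row, starting at row 0.
df.iloc[::2, 0] = np.nan

nan_count = int(df['col1'].isna().sum())  # number of missing values in col1
first_four = df['col1'].isna().iloc[:4].tolist()

print(nan_count)   # 500
print(first_four)  # [True, False, True, False]
```

Half of the 1000 values in `col1` are missing, which is why the NaN-aware count later in the post differs from the plain row count.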

### Getting the row count

df.shape  # returns a tuple of (number of rows, number of columns)
# (1000, 3)

df['col1'].count()  # counts only the non-NaN values in col1
# 500

len(df.index)
# 1000

len(df)
# 1000
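
The difference between these calls is easier to see on a tiny example. The sketch below uses a small hand-built DataFrame named `small` (a hypothetical stand-in, not the data above) so that each result can be checked by eye; `shape[0]` simply picks the row count out of the shape tuple.

```python
import numpy as np
import pandas as pd

# Four rows; column 'a' has two missing values, column 'b' has none.
small = pd.DataFrame({'a': [1.0, np.nan, 3.0, np.nan], 'b': [1, 2, 3, 4]})

n_rows_shape = small.shape[0]         # rows from the (rows, columns) tuple
n_rows_len = len(small)               # len() of a DataFrame is its row count
n_rows_index = len(small.index)       # same value, taken from the index
non_null_a = int(small['a'].count())  # count() skips NaN, so this is smaller

print(n_rows_shape, n_rows_len, n_rows_index)  # 4 4 4
print(non_null_a)                              # 2

# An empty DataFrame has zero rows under every row-count method.
empty = pd.DataFrame()
print(len(empty), empty.shape[0])              # 0 0
```

In short, `shape[0]`, `len(df)`, and `len(df.index)` all report the number of rows, while `count()` answers a different question: how many non-missing values a column holds.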

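### Timing

Because of caching effects (note the %timeit warnings below), the measured times are not very precise, but they are still reasonably representative.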

%timeit df.shape
#The slowest run took 169.99 times longer than the fastest. This could mean that an intermediate result is being cached.
#1000000 loops, best of 3: 947 ns per loop

%timeit df['col1'].count()
#The slowest run took 50.63 times longer than the fastest. This could mean that an intermediate result is being cached.
#10000 loops, best of 3: 22.6 µs per loop

%timeit len(df.index)
#The slowest run took 14.11 times longer than the fastest. This could mean that an intermediate result is being cached.
#1000000 loops, best of 3: 490 ns per loop

%timeit len(df)
#The slowest run took 18.61 times longer than the fastest. This could mean that an intermediate result is being cached.
#1000000 loops, best of 3: 653 ns per loop
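
Outside IPython, where `%timeit` is not available, a similar comparison can be sketched with the standard-library `timeit` module. The loop and repeat counts below are arbitrary choices made for this example, and absolute numbers depend on the machine and the pandas version, so no expected timings are shown.

```python
import timeit

import numpy as np
import pandas as pd

# Same setup as in the data preparation step.
df = pd.DataFrame(np.random.randn(1000, 3), columns=['col1', 'col2', 'col3'])
df.iloc[::2, 0] = np.nan

# The four expressions being compared, wrapped as zero-argument callables.
candidates = {
    'df.shape': lambda: df.shape,
    "df['col1'].count()": lambda: df['col1'].count(),
    'len(df.index)': lambda: len(df.index),
    'len(df)': lambda: len(df),
}

loops = 1000  # calls per repeat (arbitrary)
results = {}
for label, func in candidates.items():
    # Best of 3 repeats, converted to seconds per single call.
    best = min(timeit.repeat(func, number=loops, repeat=3))
    results[label] = best / loops

for label, seconds in results.items():
    print(f'{label:<22} {seconds * 1e6:8.2f} µs per call')
```

Taking the minimum over several repeats mirrors the "best of 3" figure that `%timeit` reported in the output above.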