Measuring memory usage
info
Calling info() on a DataFrame object displays the DataFrame's memory usage (including the index).
For example, calling info() displays the memory usage of the following DataFrame:
```python
import pandas as pd
import numpy as np

# One column per dtype, each holding n random integers in [0, 100)
dtypes = [
    "int8",
    "uint8",
    "int16",
    "int32",
    "int64",
    "float64",
    "datetime64[ns]",
    "timedelta64[ns]",
    "complex128",
    "object",
    "bool",
]
n = 5000
data = {"col_" + t: np.random.randint(100, size=n).astype(t) for t in dtypes}
df = pd.DataFrame(data)
# Add a categorical copy of the object column for comparison
df["categorical"] = df["col_object"].astype("category")
df.info()
```
Output:
```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 12 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   col_int8             5000 non-null   int8
 1   col_uint8            5000 non-null   uint8
 2   col_int16            5000 non-null   int16
 3   col_int32            5000 non-null   int32
 4   col_int64            5000 non-null   int64
 5   col_float64          5000 non-null   float64
 6   col_datetime64[ns]   5000 non-null   datetime64[ns]
 7   col_timedelta64[ns]  5000 non-null   timedelta64[ns]
 8   col_complex128       5000 non-null   complex128
 9   col_object           5000 non-null   object
 10  col_bool             5000 non-null   bool
 11  categorical          5000 non-null   category
dtypes: bool(1), category(1), complex128(1), datetime64[ns](1), float64(1), int16(1), int32(1), int64(1), int8(1), object(1), timedelta64[ns](1), uint8(1)
memory usage: 327.2+ KB
```
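The + sign means the actual memory usage may be higher: for columns with dtype=object, pandas counts only the pointers to the stored Python objects (8 bytes each on a 64-bit build), not the memory used by the objects themselves.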
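A minimal sketch of when the + appears: it captures the report that info() writes through its buf parameter and compares a purely numeric DataFrame with one that also holds an object column. The helper memory_line and the toy columns a and b are hypothetical, chosen only for illustration.

```python
import io

import numpy as np
import pandas as pd


def memory_line(df: pd.DataFrame, **info_kwargs) -> str:
    """Hypothetical helper: return the 'memory usage:' line printed by df.info()."""
    buf = io.StringIO()
    df.info(buf=buf, **info_kwargs)  # write the report into buf instead of stdout
    for line in buf.getvalue().splitlines():
        if line.startswith("memory usage:"):
            return line
    raise ValueError("info() printed no memory usage line")


numeric = pd.DataFrame({"a": np.arange(1000, dtype="int64")})
# Same frame plus an object column holding Python strings
mixed = numeric.assign(b=pd.Series(["x"] * 1000, dtype=object))

print(memory_line(numeric))                     # fixed-width columns only: no "+"
print(memory_line(mixed))                       # object values not measured: "+"
print(memory_line(mixed, memory_usage="deep"))  # deep introspection measured them: no "+"
```

The + is a qualifier on the number, not part of it; it disappears once memory_usage="deep" has measured the objects.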
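Passing memory_usage='deep' enables a more accurate report that accounts for the full memory used by the contained objects. It is opt-in because this deeper introspection can be expensive.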
```python
df.info(memory_usage="deep")
```
Output:
```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 12 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   col_int8             5000 non-null   int8
 1   col_uint8            5000 non-null   uint8
 2   col_int16            5000 non-null   int16
 3   col_int32            5000 non-null   int32
 4   col_int64            5000 non-null   int64
 5   col_float64          5000 non-null   float64
 6   col_datetime64[ns]   5000 non-null   datetime64[ns]
 7   col_timedelta64[ns]  5000 non-null   timedelta64[ns]
 8   col_complex128       5000 non-null   complex128
 9   col_object           5000 non-null   object
 10  col_bool             5000 non-null   bool
 11  categorical          5000 non-null   category
dtypes: bool(1), category(1), complex128(1), datetime64[ns](1), float64(1), int16(1), int32(1), int64(1), int8(1), object(1), timedelta64[ns](1), uint8(1)
memory usage: 463.8 KB
```
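The gap between the two reports comes from the object column. The sketch below, assuming a 64-bit CPython build, measures a small object Series both ways with Series.memory_usage and then reproduces the deep figure by hand. It relies on an implementation detail that may differ between pandas versions: for object data, the deep count adds each element's __sizeof__() to the pointer bytes, and a value stored many times is counted every time it appears.

```python
import numpy as np
import pandas as pd

# 3000 references to only three distinct string objects
s = pd.Series(["a", "bb", "ccc"] * 1000, dtype=object)

shallow = s.memory_usage(index=False)          # pointers only
deep = s.memory_usage(index=False, deep=True)  # pointers + the objects they point to

pointer_bytes = len(s) * np.dtype(object).itemsize      # 8 bytes per pointer on 64-bit builds
object_bytes = sum(value.__sizeof__() for value in s)   # repeated objects counted each time

print(shallow, pointer_bytes)
print(deep, pointer_bytes + object_bytes)
```

Because repeats are not deduplicated, the deep figure can overstate the real footprint of columns with many identical values.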
memory_usage
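The memory usage of each column can be obtained by calling the memory_usage() method. It returns a Series whose index holds the column names (plus an Index entry for the index itself) and whose values are each column's memory usage in bytes. For the DataFrame above, memory_usage shows the usage of every column, and summing the result gives the total.
As with info(), pass deep=True to get the exact memory usage, including the Python objects stored in object columns: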
```python
df.memory_usage(deep=True)
```
Output:
```
Index                     128
col_int8                 5000
col_uint8                5000
col_int16               10000
col_int32               20000
col_int64               40000
col_float64             40000
col_datetime64[ns]      40000
col_timedelta64[ns]     40000
col_complex128          80000
col_object             179800
col_bool                 5000
categorical              9968
dtype: int64
```
```python
df.memory_usage(deep=True).sum()
```
Output:
```
474896
```
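For fixed-width dtypes there is nothing extra to introspect: a column costs its row count times the dtype's itemsize, which is why col_int8 above reports 5000 bytes and col_complex128 reports 80000. The sketch below checks this on a small hypothetical frame; index=False drops the Index entry from the result.

```python
import numpy as np
import pandas as pd

n = 5000
# Hypothetical frame with one zero-filled column per fixed-width dtype
df = pd.DataFrame({
    "col_int8": np.zeros(n, dtype="int8"),
    "col_int32": np.zeros(n, dtype="int32"),
    "col_float64": np.zeros(n, dtype="float64"),
    "col_complex128": np.zeros(n, dtype="complex128"),
})

usage = df.memory_usage(index=False)  # Series: column name -> bytes
for col in df.columns:
    # reported bytes vs. rows * bytes per element
    print(col, usage[col], n * df[col].dtype.itemsize)

print("total:", usage.sum())
```

For these columns deep=True returns the same numbers, since there are no Python objects behind the values.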
Relationship between data types and memory
Data type | Description |
---|---|
bool_ | Boolean (True or False) stored as a byte |
int_ | Default integer type (same as C long; normally either int64 or int32) |
intc | Identical to C int (normally int32 or int64) |
intp | Integer used for indexing (same as C ssize_t; normally either int32 or int64) |
int8 | Byte (-128 to 127) |
int16 | Integer (-32768 to 32767) |
int32 | Integer (-2147483648 to 2147483647) |
int64 | Integer (-9223372036854775808 to 9223372036854775807) |
uint8 | Unsigned integer (0 to 255) |
uint16 | Unsigned integer (0 to 65535) |
uint32 | Unsigned integer (0 to 4294967295) |
uint64 | Unsigned integer (0 to 18446744073709551615) |
float_ | Shorthand for float64. |
float16 | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa |
float32 | Single precision float: sign bit, 8 bits exponent, 23 bits mantissa |
float64 | Double precision float: sign bit, 11 bits exponent, 52 bits mantissa |
complex_ | Shorthand for complex128. |
complex64 | Complex number, represented by two 32-bit floats |
complex128 | Complex number, represented by two 64-bit floats |
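To connect the table to memory use, the sketch below reads each dtype's size in bytes from NumPy (np.dtype(...).itemsize), checks the int8 range with np.iinfo, and then uses pd.to_numeric(..., downcast="unsigned") to shrink a column of values in [0, 100) from int64 to uint8. The data are random and the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Bytes per element for fixed-size dtypes from the table
for name in ["bool", "int8", "int16", "int32", "int64", "uint8", "uint16",
             "uint32", "uint64", "float16", "float32", "float64",
             "complex64", "complex128"]:
    print(f"{name:>10}: {np.dtype(name).itemsize} byte(s)")

# Value range of int8, matching the table row "Byte (-128 to 127)"
info8 = np.iinfo(np.int8)
print(info8.min, info8.max)

rng = np.random.default_rng(0)
s = pd.Series(rng.integers(0, 100, size=5000).astype("int64"))
# Smallest unsigned integer dtype that can hold every value
small = pd.to_numeric(s, downcast="unsigned")

print(s.dtype, s.memory_usage(index=False))
print(small.dtype, small.memory_usage(index=False))
```

Downcasting is only safe when every current and future value fits the smaller range, so it suits columns whose bounds are known, like the [0, 100) values used throughout this section.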