pandas_profiling extends the pandas DataFrame with df.profile_report() for quick data analysis.
The report consists of the following sections:
- Type inference
- Essentials: type, unique values, missing values
- Quantile statistics: minimum, Q1, median, Q3, maximum, range, interquartile range
- Descriptive statistics: mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness
- Most frequent values
- Histogram
- Correlations: highlighting of highly correlated variables; Spearman, Pearson and Kendall matrices
- Missing values: matrix, count, heatmap and dendrogram of missing values
- Text analysis: learn about categories (Uppercase, Space), scripts (Latin, Cyrillic) and blocks (ASCII) of text data
- File and Image analysis: extract file sizes, creation dates and dimensions, and scan for truncated images or those containing EXIF information
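To make the quantile and descriptive statistics above concrete, here is a rough sketch that computes a few of them with plain pandas. It is only an illustration of what the numbers mean, not how pandas_profiling computes them internally; the sample values and the `summarize` helper are made up for this note.

```python
import pandas as pd

# Hypothetical sample column with one missing value, for illustration only.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, None], name="x")


def summarize(s: pd.Series) -> dict:
    """Compute a handful of the statistics listed in the report sections."""
    # pandas skips missing values (NaN) in these reductions by default.
    q1, median, q3 = s.quantile([0.25, 0.5, 0.75])
    return {
        # Essentials
        "n_unique": int(s.nunique()),
        "n_missing": int(s.isna().sum()),
        # Quantile statistics
        "min": s.min(),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": s.max(),
        "range": s.max() - s.min(),
        "iqr": q3 - q1,
        # Descriptive statistics
        "mean": s.mean(),
        "std": s.std(),
        "sum": s.sum(),
        "mad": (s - median).abs().median(),  # median absolute deviation
        "cv": s.std() / s.mean(),            # coefficient of variation
        "kurtosis": s.kurt(),
        "skewness": s.skew(),
    }


summary = summarize(values)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The profiling report shows the same kind of figures per column, together with the histograms, correlations and missing-value plots listed above.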
Examples
- Census Income (US Adult Census data relating income)
- NASA Meteorites (comprehensive set of meteorite landings)
- Titanic (the “Wonderwall” of datasets)
- NZA (open data from the Dutch Healthcare Authority)
- Stata Auto (1978 Automobile data)
- Vektis (Vektis Dutch Healthcare data)
- Colors (a simple colors dataset)
- Russian Vocabulary (demonstrates text analysis)
- Cats and Dogs (demonstrates image analysis from the file system)
- Celebrity Faces (demonstrates image analysis with EXIF information)
- Website Inaccessibility (demonstrates URL analysis)
- Orange prices and Coal prices (showcases report themes)
API
from pandas_profiling import ProfileReport
profile = ProfileReport(df, title="")
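The same report can also be reached through the df.profile_report() accessor mentioned at the top of these notes. A minimal sketch, assuming pandas_profiling is installed; the toy DataFrame, the `make_report` helper and the default title are invented for illustration.

```python
import pandas as pd

# Toy data, invented purely for illustration.
df = pd.DataFrame(
    {
        "age": [22, 38, 26, 35, None],
        "fare": [7.25, 71.28, 7.92, 53.10, 8.05],
        "embarked": ["S", "C", "S", "S", "Q"],
    }
)


def make_report(frame: pd.DataFrame, title: str = "Toy data report"):
    """Build a profile report via the DataFrame accessor."""
    # Imported inside the function so the helper can be defined even where
    # the package is missing; importing it registers df.profile_report().
    import pandas_profiling  # noqa: F401

    return frame.profile_report(title=title)
```

Calling `make_report(df).to_file("toy_report.html")` would then write the HTML report, in the same way as the `ProfileReport(df, title=...)` form.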
To avoid excessive computation on large datasets, use minimal mode:
profile = ProfileReport(large_dataset, minimal=True)
profile.to_file("output.html")
The report's appearance can also be configured; for details, see the GitHub page (Explore deeper).
Command line
pandas_profiling input_file output_file
The remaining command-line parameters can be looked up later; they are self-explanatory.
Saving the report
profile.to_file("your_report.html")
Or, as JSON:
# As a string
json_data = profile.to_json()
# As a file
profile.to_file("your_report.json")
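Once the report is available as a JSON string, it can be inspected with the standard json module. The sketch below assumes that `to_json()` returns a JSON object (a dictionary at the top level); the `report_sections` helper is hypothetical, and the actual key names depend on the installed version, so none are listed here.

```python
import json


def report_sections(profile) -> list:
    """Return the sorted top-level keys of a report's JSON output.

    `profile` is expected to be a ProfileReport, but any object with a
    to_json() method returning a JSON object string will work.
    """
    data = json.loads(profile.to_json())
    return sorted(data)
```

For example, `print(report_sections(profile))` lists which parts of the report are present in the JSON export before digging into any one of them.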
Data types
Currently recognized data types:
- Boolean
- Numerical
- Date
- Categorical
- URL
- Path
- File
- Image
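To get a feel for what type inference means here, the sketch below assigns a few of these labels using only pandas dtype checks and a simple URL pattern. This is a deliberately crude stand-in, not the library's actual inference logic; the `coarse_type` helper and the sample columns are made up, and Path, File and Image are not handled.

```python
import pandas as pd
from pandas.api import types as ptypes


def coarse_type(s: pd.Series) -> str:
    """Assign a rough type label to a column (illustrative only)."""
    # Check bool first: pandas also treats bool columns as numeric.
    if ptypes.is_bool_dtype(s):
        return "Boolean"
    if ptypes.is_numeric_dtype(s):
        return "Numerical"
    if ptypes.is_datetime64_any_dtype(s):
        return "Date"
    # Treat a column as URL if every non-missing value looks like one.
    non_null = s.dropna().astype(str)
    if len(non_null) > 0 and bool(non_null.str.match(r"^https?://").all()):
        return "URL"
    return "Categorical"


# Hypothetical sample data.
sample = pd.DataFrame(
    {
        "flag": [True, False, True],
        "price": [1.5, 2.0, 3.25],
        "when": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
        "site": ["https://a.example", "http://b.example", "https://c.example"],
        "color": ["red", "green", "red"],
    }
)

inferred = {col: coarse_type(sample[col]) for col in sample.columns}
print(inferred)
```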
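For more detail on the type system, see visions (the notes originally said "visdom", which appears to be a slip).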
PyCharm integration
Once integrated, right-click a file to generate report.html directly.
See the GitHub page for setup.