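Introduction
WeChat official account: ChallengeHub
This article introduces several leading automated EDA (exploratory data analysis) tools and shows what each one produces on a concrete example. Code: https://www.kaggle.com/andreshg/automatic-eda-libraries-comparisson/notebook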
AutoViz
Among the free Python tools for rapid EDA automation, AutoViz stands out for its speed: it runs comparatively fast, outperforming its competitors SweetViz and Pandas Profiling in this respect.
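Installation
!pip install git+https://github.com/AutoViML/AutoViz.git
!pip install xlrd
from autoviz.AutoViz_Class import AutoViz_Class
AV = AutoViz_Class()
# df is a pandas DataFrame loaded beforehand; 'target' must be one of its columns
dftc = AV.AutoViz(
    filename='',              # left empty because the data is passed in through dfte
    sep='',
    depVar='target',          # name of the target (dependent) variable
    dfte=df,                  # the DataFrame to analyze
    header=0,
    verbose=1,
    lowess=False,
    chart_format='png',
    max_rows_analyzed=300000,
    max_cols_analyzed=30
)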
Pandas Profiling
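import datetime as dt
import pandas as pd
from pandas_profiling import ProfileReport
df = pd.read_csv('/kaggle/input/titanic/train.csv')
report = ProfileReport(df)
# Start of Pandas Profiling process
start_time = dt.datetime.now()
print("Started at ", start_time)
report  # displays the report when this is the last expression in a notebook cell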
SweetViz
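!pip install sweetviz
import datetime as dt
import pandas as pd
import sweetviz as sv
# Keep only the first 2000 rows to limit the run time
df = pd.read_csv('/kaggle/input/credit-card-customers/BankChurners.csv').head(2000)
# Start of SweetViz process
start_time = dt.datetime.now()
print("Started at ", start_time)
advert_report = sv.analyze([df, 'Data'])
advert_report.show_html()
print('SweetViz finished!!')
finish_time = dt.datetime.now()
print("Finished at ", finish_time)
elapsed = finish_time - start_time
print("Elapsed time: ", elapsed)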
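The start/finish/elapsed bookkeeping above is repeated for each tool, so it can be wrapped once and reused. The following is only a minimal sketch using the standard library; the helper name `timed` and the `timings` dictionary are made up for illustration and are not part of any of the libraries discussed here.

```python
import datetime as dt
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Print start, finish, and elapsed time for the enclosed block."""
    start_time = dt.datetime.now()
    print(f"{label} started at ", start_time)
    try:
        yield
    finally:
        finish_time = dt.datetime.now()
        elapsed = finish_time - start_time
        print(f"{label} finished at ", finish_time)
        print(f"{label} elapsed time: ", elapsed)
        # Optionally collect the elapsed time so tools can be compared later
        if results is not None:
            results[label] = elapsed


timings = {}
with timed("demo", timings):
    sum(range(100_000))  # stand-in for a report-generation call
```

In a notebook, the body of the `with` block would be the call that builds a report, for example `sv.analyze([df, 'Data'])`, and `timings` would then hold one entry per tool.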
D-Tale
!pip install dtale
import dtale
dtale.show(df)
Project page: https://github.com/man-group/dtale
Dataprep
!pip install -U dataprep
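from dataprep.eda import plot, plot_correlation
# Overview of every column in the DataFrame
plot(df)
# Correlations between the numerical columns
plot_correlation(df)
# Detailed view of a single column
plot(df, "Customer_Age")
# Relationship between two columns
plot(df, "Customer_Age", "Gender")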
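Each of the five tools above is installed separately, so before running the whole notebook it can help to check which of them are actually importable in the current environment. The sketch below uses only the standard library; the module names are the top-level names taken from the import statements in the sections above, and the helper `available_eda_tools` is a made-up name for illustration.

```python
from importlib.util import find_spec

# Top-level module names as imported in the sections above
EDA_MODULES = ["autoviz", "pandas_profiling", "sweetviz", "dtale", "dataprep"]


def available_eda_tools(modules=EDA_MODULES):
    """Return {module_name: bool} telling whether each module can be found."""
    # find_spec on a top-level name looks the module up without importing it
    return {name: find_spec(name) is not None for name in modules}


status = available_eda_tools()
for name, ok in status.items():
    print(f"{name:<18}{'installed' if ok else 'missing'}")
```

Any tool reported as missing can then be installed with the corresponding `pip install` command from its section.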