Exploratory Data Analysis (EDA) is the practice of examining a dataset, largely through visualization, to identify its characteristics. It should be performed before building a model or making predictions, in order to find the patterns and insights the dataset holds. In other words, EDA shows what the data is telling us before any formal modelling or hypothesis testing.
Analyzing a dataset is a laborious, time-consuming task. According to a study, EDA takes around 40% of the effort in a machine learning project, yet it cannot be skipped.
What is Sweetviz?
Sweetviz is an open-source Python library that generates beautiful, high-density visualizations to kickstart EDA with a single line of code. The output is a fully self-contained HTML application.
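As a minimal sketch of that workflow, the helper below wraps Sweetviz's `analyze` and `show_html` calls. The function name `build_report` and the default file name are illustrative assumptions, not part of the library, and `df` is assumed to be a pandas DataFrame that has already been loaded.

```python
def build_report(df, out_path="sweetviz_report.html"):
    """Analyze one DataFrame with Sweetviz and write a standalone HTML report.

    `build_report` and the default `out_path` are hypothetical names used
    for illustration; `df` is assumed to be a pandas DataFrame.
    """
    # Imported inside the function so Sweetviz is only required
    # when a report is actually built.
    import sweetviz as sv

    # The single call that runs the analysis on the whole DataFrame.
    report = sv.analyze(df)

    # Write the self-contained HTML file; open_browser=False skips
    # launching a browser tab automatically.
    report.show_html(filepath=out_path, open_browser=False)
    return out_path
```

With a dataset at hand, something like `build_report(pd.read_csv("train.csv"))` would produce the report, where `train.csv` is a placeholder file name.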
Sweetviz is built around quickly visualizing target values and comparing datasets. Its goal is to enable quick analysis of target characteristics, training vs. testing data, and other such data characterization tasks.
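The two ideas above, comparing datasets and analyzing a target, can be sketched with Sweetviz's `compare` function. The wrapper name `compare_train_test`, the labels "Train" and "Test", and the default file name are assumptions made for illustration; `train_df` and `test_df` are assumed to be pandas DataFrames, and `target` is the name of a column present in the training data.

```python
def compare_train_test(train_df, test_df, target=None,
                       out_path="sweetviz_compare.html"):
    """Compare two DataFrames with Sweetviz, optionally against a target.

    The function name, labels, and default path are hypothetical;
    only `sv.compare` and `show_html` come from Sweetviz itself.
    """
    # Imported here so Sweetviz is only needed when a report is built.
    import sweetviz as sv

    # Each dataset is passed as [DataFrame, display name]; target_feat
    # names the column whose relationship to the other features is shown.
    report = sv.compare([train_df, "Train"], [test_df, "Test"],
                        target_feat=target)

    # Write the standalone HTML report without opening a browser.
    report.show_html(filepath=out_path, open_browser=False)
    return out_path
```

For a Titanic-style dataset, a call such as `compare_train_test(train, test, target="Survived")` would be the natural use, assuming a `Survived` column exists in `train`.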
Visit https://pypi.org/project/sweetviz/ to learn more and to read the library's documentation.
Why is Sweetviz far better than Pandas Profiling?
Sweetviz packs a powerful punch: in addition to creating insightful and beautiful visualizations with just two lines of code, it provides analysis that would take far longer to generate manually, including some that no other library provides as quickly, such as:
a) Comparison of 2 datasets (e.g. Train vs Test)
b) Visualization of the target value against all other variables (e.g. “What was the survival rate of males vs. females?”)
c) Pandas profiling is seen to give awful errors on large datasets and those containing many categorical