Easy Data Hacking & Mining & Exploring
*Note: This essay uses Python for data mining.*
There are more posts on my indie blog that I haven't bothered to copy over; head to Crazydogen’s indie blog for those.
Agenda
- Pandas
- Pandas-Profiling
- Statsmodels
- Missingno
- Wordcloud
Pandas
Pandas is a Python library for exploring, processing, and modeling data.
Here we take the MIMIC-III dataset as an example.
Basic stats
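import pandas as pd

# First load df from a file ("data.csv" is a placeholder path)
df = pd.read_csv("data.csv")
df.head()    # first five rows
df.shape     # (number of rows, number of columns)
# "a_column" is a placeholder for the name of a numeric column
df["a_column"].mean()
df["a_column"].std()
df["a_column"].max()
df["a_column"].min()
df["a_column"].quantile()  # q=0.5 by default, i.e. the median
df.describe()     # brief description of the numeric columns
df.isna().any()   # check whether each column has missing values
df.isna().sum()   # count NaN values per column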
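To see these calls on something concrete without needing the real files, here is a minimal sketch on a tiny made-up table. The column names (`age`, `heart_rate`, `gender`) and all values are my own assumptions for illustration; they are not taken from MIMIC-III.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a real table (not from MIMIC-III).
df = pd.DataFrame(
    {
        "age": [34, 51, np.nan, 78, 62],
        "heart_rate": [72, 88, 95, np.nan, 60],
        "gender": ["F", "M", "M", "F", None],
    }
)

print(df.shape)                                # (rows, columns)
print(df["age"].mean())                        # NaN values are skipped by default
print(df["age"].quantile([0.25, 0.5, 0.75]))   # several quantiles at once
print(df.describe())                           # numeric columns only by default

# Missing-value checks: one boolean / one count per column.
has_missing = df.isna().any()
missing_counts = df.isna().sum()
print(has_missing)
print(missing_counts)
```

Note that `mean()`, `std()` and friends skip NaN by default, so it is worth checking `df.isna().sum()` first to know how many rows each statistic is really based on.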
Additional tips
Charting a tabular dataset
Supported charts
DataFrame.plot([x, y], kind)
- kind :
- 'line': line plot (default)
- 'bar': vertical bar plot
- 'barh': horizontal bar plot
-