Python income analysis: analyzing US population income with Python

weixin_39603609

Data description:

Data exploration:

We explore the data in Jupyter with Python, using the following code:

# Data manipulation
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt
import missingno
import seaborn as sns
from pandas.plotting import scatter_matrix
from mpl_toolkits.mplot3d import Axes3D

# Read the data; the file has no header row, so pandas numbers the columns 0-14
df = pd.read_csv('data.csv', header=None)
df.head()

# Data information: column dtypes, non-null counts and memory usage
df.info()
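Because the file is read with `header=None`, pandas labels the columns 0 to 14. Here is a minimal sketch of attaching readable names, assuming the 15 columns appear in the order listed at the end of this post. The two rows in the in-memory CSV are made up purely for illustration and stand in for `data.csv`.

```python
import io

import pandas as pd

# Column names in file order (0-14), as listed at the end of this post
COLUMNS = ['age', 'workclass', 'fnlwgt', 'education', 'education-num',
           'marital-status', 'occupation', 'relationship', 'race', 'sex',
           'capital-gain', 'capital-loss', 'hours-per-week', 'native-country',
           'income']

# Made-up two-row sample standing in for data.csv (no header row)
sample = io.StringIO(
    "25,Private,100000,HS-grad,9,Never-married,Sales,Own-child,"
    "White,Female,0,0,40,United-States,<=50K\n"
    "45,Private,200000,Masters,14,Married-civ-spouse,Exec-managerial,Husband,"
    "White,Male,5000,0,50,United-States,>50K\n"
)

# Passing names= replaces the default integer labels 0..14
df = pd.read_csv(sample, header=None, names=COLUMNS)
print(df[['age', 'education', 'income']])
```

A frame that was already loaded with integer labels can be renamed the same way with `df.columns = COLUMNS`.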

Data cleaning:

We clean the data in Jupyter with Python, removing invalid entries and missing values, using the following code:

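# Data processing: drop samples with missing values
import numpy as np

# Replace the "?" placeholder with NaN, the pandas missing-value marker.
# The pattern also matches "?" padded with spaces (e.g. " ?"), which appears
# when the fields in the file are separated by ", ".
df.replace(r"^\s*\?\s*$", np.nan, regex=True, inplace=True)

# Drop missing-value samples outright (every row containing a NaN is removed)
df.dropna(inplace=True)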
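To see what the two cleaning steps do, here is a small sketch on a made-up frame (not the real data). The `?` placeholder becomes NaN, and every row that contains at least one NaN is then dropped. Only complete rows survive.

```python
import numpy as np
import pandas as pd

# Made-up rows; "?" marks an unknown value, as in the original file
df = pd.DataFrame({
    'age': [25, 45, 38],
    'workclass': ['Private', '?', 'Self-emp-inc'],
    'occupation': ['Sales', 'Exec-managerial', '?'],
    'income': ['<=50K', '>50K', '>50K'],
})

# Step 1: turn the placeholder into NaN so pandas recognises it as missing
df.replace('?', np.nan, inplace=True)
print(df.isna().sum())  # missing values per column

# Step 2: drop every row that has at least one NaN
rows_before = len(df)
df.dropna(inplace=True)
print(f"kept {len(df)} of {rows_before} rows")
```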

Data analysis:

We analyze the data in Jupyter with Python, using the following code:

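# Summarize all categorical (object-dtype) features:
# count, number of unique values, most frequent value (top) and its frequency
df.describe(include=['O'])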
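The following sketch uses a made-up frame to show what `describe(include=['O'])` returns. Numeric columns are left out. Each text column gets four summary rows. The text columns are cast explicitly to `object` so that the example does not depend on how a given pandas version stores strings.

```python
import pandas as pd

# Made-up sample with two text columns and one numeric column
df = pd.DataFrame({
    'sex': ['Male', 'Female', 'Male', 'Male'],
    'race': ['White', 'White', 'Black', 'White'],
    'age': [25, 38, 45, 52],
}).astype({'sex': object, 'race': object})

# include=['O'] keeps only object-dtype columns; numeric 'age' is excluded
summary = df.describe(include=['O'])
print(summary)
# Rows: count (non-null values), unique (distinct values),
# top (most frequent value), freq (how often top occurs)
```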

Building the model:

We build the model in Jupyter with Python, using the following code:

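# Display each feature on its own
import math

import matplotlib.pyplot as plt
import seaborn as sns


def plot_distribution(dataset, cols=5, width=20, height=15, hspace=0.2, wspace=0.5):
    # seaborn's white-grid style (Matplotlib's old 'seaborn-whitegrid' name was removed in newer releases)
    sns.set_style('whitegrid')
    fig = plt.figure(figsize=(width, height))
    fig.subplots_adjust(wspace=wspace, hspace=hspace)
    # One subplot per column, laid out in a grid with `cols` plots per row
    rows = math.ceil(dataset.shape[1] / cols)
    for i, column in enumerate(dataset.columns):
        ax = fig.add_subplot(rows, cols, i + 1)
        ax.set_title(column)
        # np.object was removed in NumPy 1.24; compare against the built-in object
        if dataset[column].dtype == object:
            # Categorical feature: horizontal bar chart of category counts
            g = sns.countplot(y=column, data=dataset, ax=ax)
            # Truncate long category names to 18 characters
            substrings = [s.get_text()[:18] for s in g.get_yticklabels()]
            g.set(yticklabels=substrings)
        else:
            # Numeric feature: histogram of frequencies with a KDE curve
            # (histplot replaces the deprecated sns.distplot)
            sns.histplot(dataset[column], kde=True, ax=ax)
        plt.xticks(rotation=25)


plot_distribution(df, cols=3, width=20, height=20, hspace=0.45, wspace=0.5)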
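Each bar in a categorical panel is a category count, and a histogram is a count per value bin. The sketch below computes those numbers directly with `value_counts` and `numpy.histogram`, which helps when a panel is too crowded to read. The data are made up. The choice of 4 bins is arbitrary; by default seaborn's `histplot` chooses the number of bins itself.

```python
import numpy as np
import pandas as pd

# Made-up sample: one categorical and one numeric feature
df = pd.DataFrame({
    'workclass': ['Private', 'Private', 'Self-emp-inc', 'Private', 'Local-gov'],
    'hours-per-week': [40, 40, 50, 20, 60],
})

# Heights of the countplot bars: number of rows per category
counts = df['workclass'].value_counts()
print(counts)

# Heights of the histogram bars: number of values in each of 4 equal-width bins
freq, edges = np.histogram(df['hours-per-week'], bins=4)
print(freq, edges)
```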

Columns 0-14 are, in order: 'age', 'workclass', 'fnlwgt', 'education', 'education-num', 'marital-status', 'occupation', 'relationship', 'race', 'sex', 'capital-gain', 'capital-loss', 'hours-per-week', 'native-country', 'income'.

(Figure: single-feature effects.)
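The last column is the income label. One simple way to see how a single feature relates to income is to compute the share of each income class within each category of that feature. This sketch uses `pandas.crosstab` on made-up rows. The label values `<=50K` and `>50K` are an assumption about how `income` is encoded in `data.csv`.

```python
import pandas as pd

# Made-up rows; assumes income is encoded as '<=50K' / '>50K'
df = pd.DataFrame({
    'sex': ['Male', 'Male', 'Female', 'Female', 'Male', 'Female'],
    'income': ['>50K', '<=50K', '<=50K', '<=50K', '>50K', '>50K'],
})

# normalize='index' turns each row's counts into proportions that sum to 1
share = pd.crosstab(df['sex'], df['income'], normalize='index')
print(share)
```

The same call works for any other categorical column, for example `df['education']`, once the columns carry names.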