# Regression with Categorical Variables in Python: Python Data Mining, Regression, Logistic Regression

1. Read the data.

encoding="utf-8")

data=data.dropna()

dummyColumns=()

data.shape

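2. Handle the string-typed fields. Fields whose values have no natural order are turned into dummy variables. Fields whose values are comparable (ordered) are converted from discrete labels to numbers with a one-to-one map.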

# First handle the string-typed fields whose values have no natural order

dummyColumns=['Gender','Home Ownership','Internet Connection', 'Marital Status','Movie Selector', 'Prerec Format', 'TV Signal']

for column in dummyColumns:

    data[column]=data[column].astype('category')

dummiesData=pandas.get_dummies(

data,

columns=dummyColumns,

prefix=dummyColumns,

prefix_sep=" ", # separator between the column name and the category value

drop_first=True) # drop the first level of each field so the dummy columns are not collinear

data.Gender.unique() # distinct values of Gender

dummiesData.columns # list all columns after encoding
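To make the effect of `prefix_sep=" "` and `drop_first=True` concrete, here is a minimal sketch on a made-up three-row table. The column names echo the ones used in this post, but the values are invented for illustration only.

```python
import pandas

# Toy data: column names follow the post, values are made up.
toy = pandas.DataFrame({
    "Gender": ["Male", "Female", "Female"],
    "TV Signal": ["Cable", "Analog antennae", "Cable"],
    "Age": [30, 41, 25],
})

toyDummies = pandas.get_dummies(
    toy,
    columns=["Gender", "TV Signal"],
    prefix=["Gender", "TV Signal"],
    prefix_sep=" ",   # new column name = "<field> <value>"
    drop_first=True,  # the first (sorted) level of each field is dropped
)

# "Gender Female" and "TV Signal Analog antennae" are the dropped baseline levels.
print(list(toyDummies.columns))  # ['Age', 'Gender Male', 'TV Signal Cable']
```

This is why the column lists later in the post contain names such as 'Gender Male' but no 'Gender Female': the dropped level acts as the baseline.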

# These fields are ordered, so their values can be compared

educationLevelDict={'Post-Doc': 9,'Doctorate': 8,'Master\'s Degree': 7,'Bachelor\'s Degree': 6,'Associate\'s Degree': 5,'Some College': 4,'Trade School': 3,'High School': 2,'Grade School': 1}

# Use map to convert each discrete value to its number, one-to-one

dummiesData["Education Level Map"]=dummiesData['Education Level'].map(educationLevelDict)

freqMap={'Never':0,'Rarely': 1,'Monthly': 2,'Weekly': 3,'Daily': 4}

dummiesData['PPV Freq Map']=dummiesData['PPV Freq'].map(freqMap)

dummiesData['Theater Freq Map'] = dummiesData['Theater Freq'].map(freqMap)

dummiesData['TV Movie Freq Map'] = dummiesData['TV Movie Freq'].map(freqMap)

dummiesData['Prerec Buying Freq Map'] = dummiesData['Prerec Buying Freq'].map(freqMap)

dummiesData['Prerec Renting Freq Map'] = dummiesData['Prerec Renting Freq'].map(freqMap)

dummiesData['Prerec Viewing Freq Map'] = dummiesData['Prerec Viewing Freq'].map(freqMap)
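One thing to watch with `map`: any label that is missing from the dictionary silently becomes NaN. The sketch below uses the same `freqMap` on a small made-up Series; the label 'Sometimes' is a hypothetical value added only to show the effect.

```python
import pandas

freqMap = {'Never': 0, 'Rarely': 1, 'Monthly': 2, 'Weekly': 3, 'Daily': 4}

# 'Sometimes' is a made-up label that is not a key of freqMap.
toyFreq = pandas.Series(['Daily', 'Never', 'Sometimes', 'Weekly'])
mapped = toyFreq.map(freqMap)

# Labels that produced NaN, i.e. labels the dictionary does not cover.
unmapped = toyFreq[mapped.isna()].unique()

print(mapped.tolist())
print(list(unmapped))
```

Checking `unmapped` after each mapping is a cheap way to catch spelling differences in the raw data before they turn into missing values in the model input.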

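3. Select the independent and dependent variables: start from all columns, then review them one by one and keep the ones to use.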

# Select the independent and dependent variables

dummiesData.columns

# Start from all columns, then review them one by one

dummiesSelect = [
    'Age', 'Num Bathrooms', 'Num Bedrooms', 'Num Cars', 'Num Children', 'Num TVs',
    'Education Level Map', 'PPV Freq Map', 'Theater Freq Map', 'TV Movie Freq Map',
    'Prerec Buying Freq Map', 'Prerec Renting Freq Map', 'Prerec Viewing Freq Map',
    'Gender Male',
    'Internet Connection DSL', 'Internet Connection Dial-Up',
    'Internet Connection IDSN', 'Internet Connection No Internet Connection',
    'Internet Connection Other',
    'Marital Status Married', 'Marital Status Never Married',
    'Marital Status Other', 'Marital Status Separated',
    'Movie Selector Me', 'Movie Selector Other', 'Movie Selector Spouse/Partner',
    'Prerec Format DVD', 'Prerec Format Laserdisk', 'Prerec Format Other',
    'Prerec Format VHS', 'Prerec Format Video CD',
    'TV Signal Analog antennae', 'TV Signal Cable',
    'TV Signal Digital Satellite', 'TV Signal Don\'t watch TV'
]

inputData=dummiesData[dummiesSelect] # independent variables

outputData=dummiesData[["Home Ownership Rent"]] # dependent variable
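With a hand-written list this long, a misspelled or never-created column is easy to miss, and in recent pandas versions selecting a list that contains an unknown name raises a KeyError. A small check like the following can help. `missingColumns` is a hypothetical helper written for this sketch, not a pandas function, and the toy frame is made up.

```python
import pandas

def missingColumns(frame, wanted):
    """Return the names in `wanted` that are not columns of `frame`."""
    return [name for name in wanted if name not in frame.columns]

# Toy frame standing in for dummiesData.
toy = pandas.DataFrame({"Age": [30, 41], "Gender Male": [1, 0]})
wanted = ["Age", "Gender Male", "Prerec Buying Freq Map"]

print(missingColumns(toy, wanted))  # ['Prerec Buying Freq Map']
```

Running the same check as `missingColumns(dummiesData, dummiesSelect)` before building `inputData` shows at a glance which mapped or dummy columns still need to be created.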

4. Build, train, and score the model.

# Build and train the model

from sklearn import linear_model

lrModel=linear_model.LogisticRegression()

lrModel.fit(inputData,outputData)

lrModel.score(inputData,outputData) # mean accuracy on the training data
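The score above is computed on the same rows the model was trained on, so it tends to be optimistic. A common alternative, not used in the original walkthrough, is to hold out part of the data. The sketch below does this on synthetic data generated with numpy, since the survey file is not available here; the variable names are illustrative assumptions.

```python
import numpy
from sklearn import linear_model
from sklearn.model_selection import train_test_split

# Synthetic stand-in for inputData / outputData (assumption: 3 numeric features).
rng = numpy.random.RandomState(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Keep 30% of the rows aside for evaluation.
XTrain, XTest, yTrain, yTest = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = linear_model.LogisticRegression()
model.fit(XTrain, yTrain)

trainScore = model.score(XTrain, yTrain)  # accuracy on rows seen during fitting
testScore = model.score(XTest, yTest)     # accuracy on held-out rows
print(trainScore, testScore)
```

Note that the target here is a 1-D array. Passing a one-column DataFrame such as `outputData` also works, but scikit-learn may emit a warning asking for a 1-D target.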

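5. Predict. The logistic regression model was fitted on dummy-encoded features, so new data must go through the same processing before it can be used for prediction.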

# The model was fitted on dummy-encoded features, so new data must be processed the same way before prediction

# astype("category", categories=...) is no longer accepted by pandas; CategoricalDtype carries the training categories instead

for column in dummyColumns:
    newData[column]=newData[column].astype(
        pandas.CategoricalDtype(categories=data[column].cat.categories))

newData=newData.dropna()

newData['Education Level Map'] = newData['Education Level'].map(educationLevelDict)
newData['PPV Freq Map'] = newData['PPV Freq'].map(freqMap)
newData['Theater Freq Map'] = newData['Theater Freq'].map(freqMap)
newData['TV Movie Freq Map'] = newData['TV Movie Freq'].map(freqMap)
newData['Prerec Buying Freq Map'] = newData['Prerec Buying Freq'].map(freqMap)
newData['Prerec Renting Freq Map'] = newData['Prerec Renting Freq'].map(freqMap)
newData['Prerec Viewing Freq Map'] = newData['Prerec Viewing Freq'].map(freqMap)

dummiesNewData=pandas.get_dummies(
    newData,
    columns=dummyColumns,
    prefix=dummyColumns,
    prefix_sep=" ",
    drop_first=True)

inputNewData=dummiesNewData[dummiesSelect]

# Predict on the processed new data, not on the training input

lrModel.predict(inputNewData)
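The reason for reusing the training categories is that `get_dummies` only creates columns for the levels it knows about. If the new data happens to contain fewer distinct values, the encoded frame would otherwise lack columns the model expects. A minimal sketch with made-up rows:

```python
import pandas

# Toy training and new data; values are made up.
train = pandas.DataFrame(
    {"TV Signal": ["Cable", "Analog antennae", "Digital Satellite"]})
new = pandas.DataFrame({"TV Signal": ["Cable", "Cable"]})

train["TV Signal"] = train["TV Signal"].astype("category")
trainDummies = pandas.get_dummies(
    train, columns=["TV Signal"], prefix=["TV Signal"],
    prefix_sep=" ", drop_first=True)

# Give the new data the same category levels as the training data,
# even though only "Cable" actually occurs in it.
new["TV Signal"] = new["TV Signal"].astype(
    pandas.CategoricalDtype(categories=train["TV Signal"].cat.categories))
newDummies = pandas.get_dummies(
    new, columns=["TV Signal"], prefix=["TV Signal"],
    prefix_sep=" ", drop_first=True)

# Both frames now have identical dummy columns.
print(list(newDummies.columns) == list(trainDummies.columns))  # True
```

Without the `CategoricalDtype` step, `new` would have a single level, `drop_first=True` would drop it, and no 'TV Signal ...' column would remain.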

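Main parameters of get_dummies(data,prefix=None,prefix_sep="_",dummy_na=False,columns=None,drop_first=False):

① data: the DataFrame to process

② prefix: prefix for the new column names; useful when several columns share the same discrete values

③ prefix_sep: separator between the prefix and the discrete value; the default is an underscore, which is usually fine

④ dummy_na: whether to treat NA as a discrete value of its own; by default it is not

⑤ columns: names of the columns to encode; if not specified, all columns with object, string, or category dtype are encoded

⑥ drop_first: whether to drop the first level of each field; used when modelling to avoid collinearity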
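As a quick illustration of `prefix_sep` and `dummy_na`, the sketch below encodes a made-up column that contains one missing value, first with the defaults and then with `dummy_na=True`.

```python
import pandas

# Toy column with one missing value (values are made up).
toy = pandas.DataFrame({"Gender": ["Male", None, "Female"]})

# Defaults: underscore separator, missing values get no column of their own.
defaultDummies = pandas.get_dummies(toy, columns=["Gender"])

# dummy_na=True adds one extra indicator column for missing values.
withNa = pandas.get_dummies(toy, columns=["Gender"], dummy_na=True)

print(list(defaultDummies.columns))  # ['Gender_Female', 'Gender_Male']
print(withNa.shape[1])               # 3
```

With the defaults, the row with the missing value is all zeros; with `dummy_na=True` it is flagged in the extra column instead.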
