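A program that quantifies the relational value between cities using pandas and sklearn

Cii is the number of news reports in The Times that mention city_a, and Cjj is the number that mention city_b.
Dij is the distance from city i to city j.
K is a constant.
β is the exponent applied to the distance.
The core formula of the calculation is:

$$I_{ij} = K \frac{C_{ii} C_{jj}}{D_{ij}^{\beta}}$$

Iij is the predicted number of joint reports computed by this formula.
Cij is the known, actual number of joint reports of city_a and city_b.
Finally, Cij is regressed linearly on Iij for each candidate β, and the β that gives the highest coefficient of determination (R²) is taken as the best distance exponent.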
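
To make the formula concrete, here is a minimal sketch of it as a Python function. The function name `predicted_joint_count` is hypothetical, and the distance of 1.0e6 is a round illustrative number rather than a value computed from the tables; the two counts are the Beijing and Shanghai entries of the first table.

```python
def predicted_joint_count(c_ii, c_jj, d_ij, beta, k=1.0):
    """Gravity-style prediction: I_ij = K * C_ii * C_jj / D_ij**beta."""
    return k * c_ii * c_jj / d_ij ** beta


# Counts for Beijing (19820) and Shanghai (6777) come from the first table.
# The distance 1.0e6 is an assumed, illustrative value.
for beta in (0.0, 0.5, 1.0):
    print(beta, predicted_joint_count(19820, 6777, 1.0e6, beta))
```

With β = 0 the distance has no effect and Iij is just K·Cii·Cjj; as β grows, distant pairs are penalised more strongly.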
There are three tables of related data.

The first table holds the individual report count of each city (the source of Cii and Cjj):

city newsCount
Hefei 67
Beijing 19820
Chongqing 448
Fuzhou 32
Guangzhou 842
Lanzhou 46
Nanning 72
Guiyang 23
Zhengzhou 61
Wuhan 233
Shijiazhuang 43
Haikou 13
Hong Kong 12630
Harbin 150
Changsha 105
Changchun 52
Nanjing 490
Nanchang 30
Shenyang 113
Macau 667
Hohhot 19
Yinchuan 9
Xining 34
Chengdu 496
Jinan 107
Shanghai 6777
Xi’an 87
Taiyuan 48
Tianjin 429
Taipei 676
Urumqi 200
Lhasa 373
Kunming 150
Hangzhou 368

The second table file holds the city locations:

name_eng lon lat x y
Macau 113.54913 22.19875 888534.1928 2343374.489
Beijing 116.4124971 40.18461034 953797.005 4375584.969
Chengdu 103.9340289 30.6531764 -100832.9845 3245509.941
Fuzhou 119.1743692 26.05235244 1409434.7 2831515.381
Guangzhou 113.5337844 23.34934033 876081.9661 2469283.982
Guiyang 106.7062472 26.8414195 168722.3128 2819960.427
Harbin 127.9569832 45.63718503 1765303.817 5138617.665
Haikou 110.4218295 19.83079497 578040.8177 2062220.062
Hangzhou 119.4685546 29.89911291 1376097.269 3260772.418
Hefei 117.30794 31.79322 1145431.278 3444250.643
Hohhot 111.62299 40.80772 549636.5031 4409370.972
Jinan 117.0891579 36.73882267 1057375.583 3995764.639
Kunming 102.8726119 25.38750377 -213823.3144 2659215.073
Lhasa 91.09022765 30.03667622 -1321151.873 3268457.354
Lanzhou 103.6378312 36.35910009 -120021.8798 3889209.094
Nanchang 116.0255492 28.66004226 1065577.423 3080745.072
Nanjing 118.8423612 31.92642614 1285272.778 3477899.045
Nanning 108.4633713 23.05708563 357039.6809 2405652.99
Shanghai 121.4417996 31.21068025 1537873.683 3435260.569
Shenyang 123.1388086 42.09567633 1471695.248 4671040.445
Shijiazhuang 114.439005 38.13179374 811429.5509 4127376.824
Taipei 121.520076 25.030724 1659454.132 2755761.311
Taiyuan 112.3153931 37.95938758 630685.3457 4092532.609
Tianjin 117.3266353 39.30134541 1042171.589 4285728.003
Urumqi 87.78076963 43.72814955 -1366174.253 4838577.231
Wuhan 114.3423571 30.62356103 882743.8149 3283205.895
Xi’an 108.7865133 34.10599458 343267.556 3640410.483
Xining 101.4496242 36.81745372 -310917.4332 3945756.521
Hong Kong 114.16546 22.27534 951640.2323 2357458.289
Yinchuan 106.3546836 38.22782175 116480.0669 4100424.007
Changchun 125.7672074 44.38323421 1628651.043 4964253.621
Changsha 113.1531649 28.22737627 792675.7447 3005703.088
Zhengzhou 113.4703981 34.62717798 762127.2487 3725235.755
Chongqing 107.8745734 30.05726464 273817.4672 3182029.762

The third table file holds the known joint report counts:

city_a city_b Cij
Beijing Urumqi 11
Beijing Shanghai 442
Beijing Taipei 27
Taipei Hangzhou 1
Shenyang Shanghai 7
Beijing Harbin 7
Beijing Lhasa 20
Guangzhou Shanghai 87
Beijing Shijiazhuang 3
Shanghai Hangzhou 36
Wuhan Shanghai 12
Nanning Macau 2
Guangzhou Macau 10
Chongqing Nanjing 9
Beijing Macau 15
Beijing Chengdu 18
Beijing Hangzhou 17
Chengdu Shanghai 20
Chengdu Hangzhou 6
Beijing Changsha 6
Chongqing Chengdu 16
Beijing Guangzhou 71
Shanghai Taipei 10
Guangzhou Shenyang 3
Harbin Changchun 1
Chongqing Guangzhou 8
Chongqing Shanghai 20
Chongqing Kunming 5
Guangzhou Nanjing 10
Guangzhou Chengdu 19
Guangzhou Kunming 4
Nanjing Chengdu 7
Nanjing Shanghai 38
Nanjing Kunming 6
Chengdu Kunming 5
Shanghai Kunming 9
Urumqi Kunming 2
Wuhan Shenyang 4
Beijing Shenyang 8
Zhengzhou Shijiazhuang 2
Zhengzhou Taiyuan 1
Zhengzhou Tianjin 3
Shijiazhuang Taiyuan 1
Shijiazhuang Tianjin 2
Taiyuan Tianjin 1
Macau Shanghai 16
Beijing Tianjin 46
Shenyang Macau 3
Shenyang Chengdu 7
Macau Chengdu 8
Macau Taipei 3
Shenyang Hangzhou 2
Macau Hangzhou 1
Shanghai Tianjin 25
Beijing Wuhan 9
Nanjing Tianjin 6
Nanjing Urumqi 1
Tianjin Urumqi 1
Beijing Chongqing 20
Beijing Nanjing 16
Nanjing Hangzhou 5
Chongqing Jinan 5
Guangzhou Nanning 2
Guangzhou Shijiazhuang 1
Guangzhou Haikou 1
Guangzhou Changsha 4
Guangzhou Nanchang 2
Guangzhou Jinan 2
Guangzhou Tianjin 16
Nanning Shijiazhuang 1
Nanning Haikou 1
Nanning Changsha 1
Nanning Nanjing 2
Nanning Nanchang 2
Nanning Shenyang 2
Nanning Chengdu 1
Nanning Jinan 2
Shijiazhuang Haikou 1
Shijiazhuang Changsha 3
Shijiazhuang Nanjing 1
Shijiazhuang Nanchang 1
Shijiazhuang Shenyang 2
Shijiazhuang Chengdu 2
Shijiazhuang Jinan 2
Haikou Changsha 1
Haikou Nanjing 1
Haikou Nanchang 1
Haikou Shenyang 1
Haikou Chengdu 1
Haikou Jinan 1
Changsha Nanjing 1
Changsha Nanchang 1
Changsha Shenyang 2
Changsha Chengdu 5
Changsha Jinan 3
Nanjing Nanchang 2
Nanjing Shenyang 3
Nanjing Jinan 3
Nanchang Shenyang 2
Nanchang Chengdu 1
Nanchang Jinan 2
Shenyang Jinan 3
Shenyang Tianjin 5
Chengdu Jinan 3
Chengdu Tianjin 5
Jinan Tianjin 2
Shanghai Lhasa 3
Chongqing Tianjin 6
Beijing Kunming 10
Lhasa Kunming 1
Beijing Xining 1
Xining Lhasa 5
Tianjin Kunming 3
Zhengzhou Shanghai 5
Chongqing Urumqi 1
Harbin Shanghai 6
Guangzhou Hangzhou 6
Changsha Shanghai 4
Changsha Hangzhou 3
Beijing Zhengzhou 3
Zhengzhou Changsha 3
Zhengzhou Wuhan 2
Guangzhou Harbin 3
Harbin Shenyang 2
Harbin Chengdu 2
Taiyuan Hangzhou 1
Fuzhou Shanghai 1
Chengdu Urumqi 1
Shanghai Urumqi 2
Guiyang Wuhan 2
Guiyang Nanjing 2
Guiyang Chengdu 2
Guiyang Kunming 2
Wuhan Nanjing 4
Wuhan Chengdu 5
Wuhan Kunming 3
Wuhan Changsha 6
Wuhan Tianjin 4
Chongqing Wuhan 5
Chongqing Zhengzhou 2
Wuhan Harbin 1
Wuhan Hohhot 1
Harbin Hohhot 2
Hefei Shijiazhuang 2
Hefei Shanghai 2
Shijiazhuang Shanghai 1
Chongqing Changsha 4
Guangzhou Guiyang 3
Beijing Fuzhou 2
Fuzhou Hangzhou 2
Macau Lhasa 1
Beijing Jinan 5
Hefei Beijing 3
Hefei Kunming 2
Harbin Jinan 2
Beijing Changchun 3
Chongqing Guiyang 2
Chongqing Nanchang 2
Chongqing Hangzhou 2
Guiyang Nanchang 2
Guiyang Tianjin 1
Guiyang Hangzhou 1
Nanchang Tianjin 2
Nanchang Hangzhou 1
Tianjin Hangzhou 5
Guangzhou Wuhan 4
Fuzhou Guangzhou 2
Beijing Yinchuan 1
Guangzhou Zhengzhou 1
Guangzhou Yinchuan 1
Zhengzhou Yinchuan 1
Yinchuan Shanghai 1
Shenyang Kunming 2
Changsha Kunming 2
Chengdu Lhasa 3
Hefei Chongqing 2
Lanzhou Guiyang 1
Jinan Hangzhou 1
Wuhan Hangzhou 1
Guangzhou Taipei 4
Jinan Shanghai 4
Hefei Fuzhou 1
Hefei Wuhan 1
Fuzhou Wuhan 1
Changsha Taiyuan 1
Fuzhou Chengdu 1
Fuzhou Tianjin 1
Beijing Guiyang 2
Zhengzhou Kunming 1
Harbin Tianjin 1
Beijing Lanzhou 3
Zhengzhou Chengdu 1
Harbin Changsha 2
Lanzhou Hangzhou 1
Hefei Zhengzhou 1
Hefei Changsha 3
Chongqing Macau 2
Hefei Tianjin 1
Changsha Tianjin 1
Hefei Guangzhou 1
Hefei Nanning 1
Hefei Guiyang 1
Hefei Harbin 1
Hefei Nanjing 1
Hefei Nanchang 1
Hefei Shenyang 1
Hefei Macau 1
Hefei Hohhot 1
Hefei Chengdu 1
Hefei Jinan 1
Beijing Hohhot 1
Chongqing Harbin 1
Chongqing Hohhot 1
Guangzhou Hohhot 1
Nanning Guiyang 1
Nanning Zhengzhou 1
Nanning Wuhan 1
Nanning Harbin 1
Nanning Hohhot 1
Nanning Shanghai 1
Nanning Tianjin 1
Nanning Kunming 1
Guiyang Harbin 1
Guiyang Changsha 1
Guiyang Macau 1
Guiyang Hohhot 1
Guiyang Jinan 1
Guiyang Shanghai 1
Zhengzhou Nanjing 1
Zhengzhou Nanchang 1
Zhengzhou Shenyang 1
Wuhan Nanchang 1
Wuhan Macau 1
Wuhan Jinan 1
Harbin Nanjing 1
Harbin Nanchang 1
Harbin Macau 1
Harbin Kunming 1
Changsha Hohhot 1
Nanjing Macau 1
Nanjing Hohhot 1
Nanchang Macau 1
Nanchang Hohhot 1
Nanchang Shanghai 1
Nanchang Kunming 1
Shenyang Hohhot 1
Macau Hohhot 1
Macau Jinan 1
Macau Tianjin 1
Macau Kunming 1
Hohhot Chengdu 1
Hohhot Jinan 1
Hohhot Shanghai 1
Hohhot Tianjin 1
Hohhot Kunming 1
Jinan Kunming 1
Yinchuan Taipei 2
Chengdu Taipei 1
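
As an illustration of how the three tables fit together, the sketch below attaches the counts and the distance to each city pair with pandas lookups. It uses a few rows copied from the tables above instead of the CSV files, and the helper name `build_pair_table` is hypothetical. The distance is assumed to be the straight-line distance between the projected x/y coordinates, which is what the code further down computes.

```python
import numpy as np
import pandas as pd

# A few rows copied from the three tables above.
counts = pd.DataFrame({"city": ["Beijing", "Shanghai", "Hangzhou"],
                       "newsCount": [19820, 6777, 368]})
pos = pd.DataFrame({"name_eng": ["Beijing", "Shanghai", "Hangzhou"],
                    "x": [953797.005, 1537873.683, 1376097.269],
                    "y": [4375584.969, 3435260.569, 3260772.418]})
pairs = pd.DataFrame({"city_a": ["Beijing", "Shanghai"],
                      "city_b": ["Shanghai", "Hangzhou"],
                      "Cij": [442, 36]})


def build_pair_table(pairs, counts, pos):
    """Return a copy of pairs with cii, cjj and distance columns added."""
    cnt = counts.set_index("city")["newsCount"]
    xy = pos.set_index("name_eng")[["x", "y"]]
    out = pairs.copy()
    out["cii"] = out["city_a"].map(cnt)
    out["cjj"] = out["city_b"].map(cnt)
    a = xy.loc[out["city_a"].tolist()].to_numpy()
    b = xy.loc[out["city_b"].tolist()].to_numpy()
    # Euclidean distance between the two coordinate vectors of each pair
    out["distance"] = np.linalg.norm(a - b, axis=1)
    return out


table = build_pair_table(pairs, counts, pos)
print(table)
```

Because the distances do not depend on β, a table like this only needs to be built once before trying the candidate β values.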

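Get the table files yourself:
Click here for the download address of the table files (you will need your own VPN).

The code is as follows: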

from pandas import Series,DataFrame
from sklearn import linear_model
import pandas as pd
import tkinter as tk
import numpy as np
import matplotlib.pyplot as plt
from tkinter import filedialog

# Basic parameter
K=1

# Working lists (refilled for every beta)
I=[]
D=[]
Cii=[]
Cjj=[]
R2=[]
X=[]# x values (the single feature) for the regression model


# One hidden Tk root window is enough for all three file dialogs
root = tk.Tk()
root.withdraw()

# Select the news-count file
print('select count file in csv format\n')
file_path_count = filedialog.askopenfilename()
data_count=pd.read_csv(file_path_count)

# Select the city-position file
print('select position file in csv format\n')
file_path_pos = filedialog.askopenfilename()
data_pos=pd.read_csv(file_path_pos)

# Select the file with the city pairs and their Cij
print('select Cij file in csv format\n')
file_path_comp = filedialog.askopenfilename()
data_comp=pd.read_csv(file_path_comp)


beta=np.linspace(0,1,101)# candidate beta values

Cij = np.array(data_comp.loc[:, 'Cij'])

Rmax=0
Best_Beta=0

for i in beta:
    j=0# row counter, reset after each beta has been processed
    for index,row in data_comp.iterrows():
        city_i=row['city_a']
        city_j = row['city_b']
        # Take the single matching row explicitly instead of converting a one-element Series
        pos_i=data_pos.loc[data_pos['name_eng']==city_i].iloc[0]
        pos_j=data_pos.loc[data_pos['name_eng']==city_j].iloc[0]
        vector1=np.array([float(pos_i['x']),float(pos_i['y'])])# coordinates of city_a
        vector2=np.array([float(pos_j['x']),float(pos_j['y'])])# coordinates of city_b
        D.append(np.linalg.norm(vector1-vector2))# distance from city_i to city_j
        # np.int was removed in NumPy 1.24, so the built-in int() is used
        Pi=int(data_count.loc[data_count['city']==city_i,'newsCount'].iloc[0])# Cii
        Pj=int(data_count.loc[data_count['city']==city_j,'newsCount'].iloc[0])# Cjj
        Iij=K*Pi*Pj/np.power(D[j],i)
        X.append([Iij])
        I.append(Iij)# store the computed Iij
        Cii.append(Pi)# store Cii
        Cjj.append(Pj)# store Cjj
        j=j+1
    temp={"distance":D,"cii":Cii,"cjj":Cjj,"Iij":I}# collect the lists
    mix=DataFrame(temp)# convert to a DataFrame
    line=linear_model.LinearRegression()
    line.fit(X,Cij)
    Rtemp=line.score(X,Cij)# R squared of this fit
    R2.append(Rtemp)
    if (Rtemp>Rmax):
        Rmax=Rtemp
        Best_Beta=i
        prediction = line.predict(X)
    result=pd.concat([data_comp,mix],axis=1)# concatenate the DataFrames column-wise
    name='result_beta%.2f.csv'%i# one output CSV file per beta
    result.to_csv(name, index=False)# write the CSV file
    D.clear()# empty the distance list
    Cii.clear()# empty the Cii list
    Cjj.clear()# empty the Cjj list
    I.clear()# empty the Iij list
    X.clear()# empty the regression x values
    mix.drop(mix.index, inplace=True)# empty the intermediate DataFrame
    result.drop(result.index, inplace=True)# empty the final concatenated DataFrame

print('Calculated!\n')

readfilename='result_beta%.2f.csv'%Best_Beta
rdfile=pd.read_csv(readfilename)# read the CSV file of the beta with the best R squared

print('the best Beta = %f\nthe best R^2 = %f'%(Best_Beta,Rmax))

plt.figure(1)
plt.plot(beta,R2)
plt.xlabel('β')
plt.ylabel('R Squared')

plt.figure(2)
plt.scatter(rdfile.loc[:,'Iij'],rdfile.loc[:,'Cij'],label='Cij and Iij')
plt.plot(rdfile.loc[:,'Iij'],prediction,color="black",linewidth=.5,label='Regression Graph')
plt.legend(loc='upper left')
x_max=np.max(np.array(rdfile.loc[:,'Iij']))
plt.xlabel('Iij')
plt.ylabel('Cij')
plt.text(x_max/2,0,'the best Beta = %.2f\nthe best R^2 = %f'%(Best_Beta,Rmax),horizontalalignment='left')

plt.show()
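
The search over β can also be written in a more compact form. The sketch below is not the program above but a NumPy-only variant under one assumption worth stating: for a linear regression with a single feature and an intercept, the R² on the fitted data equals the squared Pearson correlation between Iij and Cij, so `np.corrcoef` can stand in for the fit. The function names and the synthetic data (generated with a known β of 0.5) are made up for illustration.

```python
import numpy as np


def r2_for_beta(cii, cjj, dist, cij, beta, k=1.0):
    """R squared of the simple linear regression of cij on Iij(beta)."""
    iij = k * cii * cjj / np.power(dist, beta)
    r = np.corrcoef(iij, cij)[0, 1]
    return r * r


def best_beta(cii, cjj, dist, cij, betas):
    """Return the beta with the highest R squared, that R squared, and all scores."""
    scores = np.array([r2_for_beta(cii, cjj, dist, cij, b) for b in betas])
    idx = int(np.argmax(scores))
    return betas[idx], scores[idx], scores


# Synthetic data: cij is an exact linear function of Iij with beta = 0.5
rng = np.random.default_rng(0)
cii = rng.integers(10, 20000, size=200).astype(float)
cjj = rng.integers(10, 20000, size=200).astype(float)
dist = rng.uniform(1e5, 3e6, size=200)
cij = 2.0 * cii * cjj / np.sqrt(dist) + 1.0

betas = np.linspace(0, 1, 101)
beta_hat, r2_hat, scores = best_beta(cii, cjj, dist, cij, betas)
print(beta_hat, r2_hat)
```

On real data the curve of `scores` against `betas` corresponds to the first figure drawn by the program above.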
