Machine Learning in Action: Improving Dating Site Matches with the k-Nearest Neighbors Algorithm

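I. Resources

1. datingTestSet.txt, link: https://pan.baidu.com/s/1mJ-9P_54PaP4hmiwBHMduw (extraction code: wrua)

II. Environment

Anaconda

numpy and matplotlib packages installed

IDE: PyCharm

III. Implementation

1. Goal: the data in datingTestSet.txt has four columns. The first three columns are the candidate's flight distance per year, the share of time spent playing games, and the amount of ice cream consumed per week. The last column is how much I like that person. I want to use these three features to judge whether a candidate is a good match for me.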
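
The goal above is a classification problem, and the title names k-nearest neighbors as the method. As a minimal sketch only (not the author's implementation), the idea can be written as follows. The function name `knn_classify`, the min-max scaling step, and the toy rows are all assumptions made for illustration; none of the numbers come from datingTestSet.txt.

```python
import numpy as np
from collections import Counter

def knn_classify(query, samples, labels, k=3):
    """Return the majority label among the k samples closest to query."""
    samples = np.asarray(samples, dtype=float)
    # Assumption: scale every feature to [0, 1] so that the large
    # flight-distance values do not dominate the Euclidean distance
    mins = samples.min(axis=0)
    ranges = samples.max(axis=0) - mins
    ranges[ranges == 0] = 1.0                  # guard against constant columns
    scaled = (samples - mins) / ranges
    q = (np.asarray(query, dtype=float) - mins) / ranges
    distances = np.sqrt(((scaled - q) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]        # indices of the k closest samples
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical rows: (flight distance, game time %, ice cream), label 1 or 3
toy_samples = [[40000, 8.0, 1.0], [42000, 9.0, 0.9], [41000, 7.5, 1.1],
               [1000, 1.0, 0.5], [1500, 0.5, 0.4], [1200, 1.2, 0.6]]
toy_labels = [3, 3, 3, 1, 1, 1]
print(knn_classify([39000, 8.5, 1.0], toy_samples, toy_labels, k=3))
```

With real data, `samples` and `labels` would be the matrix and label list produced by the file-parsing step in this article.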

2. Read the data from the text file and convert it into a sample matrix and a class-label vector.

from numpy import *
import matplotlib
import matplotlib.pyplot as plt

def file2matrix(filename):
    with open(filename) as fr:            # read the file once and close it afterwards
        lines = fr.readlines()
    numberOfLines = len(lines)            # number of samples in the file
    returnMat = zeros((numberOfLines,3))  # feature matrix (ndarray), one row per sample
    classLabelVector = []                 # class labels, one per sample
    for index, line in enumerate(lines):
        line = line.strip()               # strip leading and trailing whitespace
        listFromLine = line.split('\t')   # split the tab-separated fields
        returnMat[index,:] = listFromLine[0:3]
        # the first three fields are the features, stored in row `index`
        classLabelVector.append(int(listFromLine[-1]))
        # the last field is the class label and must be an integer
    return returnMat,classLabelVector
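
As an alternative sketch, the same parsing can be done with `numpy.loadtxt`, assuming the file is tab-separated and every field, including the label, is numeric. The function name `load_dating_file` and the two sample rows are made up for illustration and are not taken from the data file.

```python
import os
import tempfile
import numpy as np

def load_dating_file(filename):
    """Load three feature columns and an integer label column in one call."""
    data = np.loadtxt(filename, delimiter='\t', ndmin=2)
    features = data[:, :3]                      # first three columns
    labels = data[:, -1].astype(int).tolist()   # last column as Python ints
    return features, labels

# Write two hypothetical rows to a temporary file, then parse them back
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("1000\t2.5\t0.5\t1\n20000\t7.0\t1.2\t3\n")
    path = tmp.name
features, labels = load_dating_file(path)
os.remove(path)
print(features.shape, labels)
```

This trades the explicit loop for a single library call; it would fail on a file whose last column holds text labels, just as `int(...)` does in `file2matrix`.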

3. Use matplotlib to create scatter plots for analyzing the data

fig = plt.figure()                        # create a figure
# file2matrix needs integer labels in the last column of this file
datingDataMat, datingLabels = file2matrix('datingTestSet2.txt')
sizes = 5*array(datingLabels)             # marker area and color value both follow the label

ax1 = fig.add_subplot(2,2,1)              # 2 rows, 2 columns, position 1
ax1.set_title("play game & ice cream cost",fontsize = 10)
ax1.set_xlabel('play game/time %')
ax1.set_ylabel('ice cream cost/week')
ax1.scatter(datingDataMat[:,1],datingDataMat[:,2],s=sizes,c=sizes,cmap='rainbow')  # column 2 vs column 3

ax2 = fig.add_subplot(2,2,2)              # position 2 (top right)
ax2.set_title("fly distance & play game",fontsize = 10)
ax2.set_xlabel('fly distance/year')
ax2.set_ylabel('play game/time %')
ax2.scatter(datingDataMat[:,0],datingDataMat[:,1],s=sizes,c=sizes,cmap='rainbow')  # column 1 vs column 2

ax3 = fig.add_subplot(2,2,3)              # position 3 (bottom left)
ax3.set_title("fly distance & ice cream cost",fontsize = 10)
ax3.set_xlabel('fly distance/year')
ax3.set_ylabel('ice cream cost/week')
ax3.scatter(datingDataMat[:,0],datingDataMat[:,2],s=sizes,c=sizes,cmap='rainbow')  # column 1 vs column 3

plt.tight_layout()    # adjust subplot spacing so titles and labels do not overlap
plt.show()

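Tips:

1. A few things to know about the scatter() function:

x and y are the horizontal and vertical coordinates and must be array-like. In the code, datingDataMat[:,1] and datingDataMat[:,2] select every row of the second and third columns of datingDataMat.

s is the area of each marker. The code passes 5*array(datingLabels); since datingLabels contains three distinct values, the markers come in three sizes.

c is the marker color. Note that when c is a sequence of numbers with the same length as x and y, its values are mapped to colors through the current colormap.

cmap selects the colormap used for that mapping. Many colormaps are available (the code for listing them is given further down), so you can pick whichever one makes the scatter plot easiest on your eyes.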
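
To make the roles of s, c, and cmap concrete, here is a small sketch with made-up points and labels (the arrays below are assumptions for illustration, not the dating data). It passes s and c by keyword and reads back what matplotlib stored.

```python
import matplotlib
matplotlib.use("Agg")   # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical points with three label values, as in datingLabels
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.0, 4.0, 3.0])
labels = np.array([1, 2, 3, 1])

fig, ax = plt.subplots()
# s sets the marker area per point; c supplies numbers that cmap turns into colors
points = ax.scatter(x, y, s=5 * labels, c=5 * labels, cmap='rainbow')

sizes = points.get_sizes()         # marker areas, one per point
color_values = points.get_array()  # the numbers that get colormapped
print(sizes, color_values)
```

For interactive use, drop the `matplotlib.use("Agg")` line and call `plt.show()` at the end.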

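2. ax2 = fig.add_subplot(R, C, L) returns an Axes instance. The three arguments are (number of subplot rows, number of subplot columns, subplot position).

Positions are numbered from left to right, then top to bottom. In a 2x2 grid they are 1, 2, 3, 4, so (2,2,2) is the top-right subplot.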
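
The numbering rule can be checked with a short sketch. The helper `grid_cell` is a hypothetical name introduced here; it only restates the left-to-right, top-to-bottom arithmetic and is not part of matplotlib.

```python
import matplotlib
matplotlib.use("Agg")   # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

def grid_cell(position, ncols):
    """Map a 1-based subplot position to a 0-based (row, col) pair."""
    return (position - 1) // ncols, (position - 1) % ncols

fig = plt.figure()
axes = {pos: fig.add_subplot(2, 2, pos) for pos in (1, 2, 3, 4)}
for pos, ax in axes.items():
    ax.set_title("position %d -> (row, col) %s" % (pos, grid_cell(pos, 2)))

# Bounding boxes in figure coordinates: x grows to the right, y grows upward
top_left = axes[1].get_position()
top_right = axes[2].get_position()
bottom_left = axes[3].get_position()
print(top_right.x0 > top_left.x0, top_left.y0 > bottom_left.y0)
```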

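3. plt.tight_layout() automatically adjusts the subplot layout so that the axes, titles, and labels of different subplots do not overlap. For details see: https://matplotlib.org/users/tight_layout_guide.html

The official documentation describes the scatter function as follows: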

    def scatter(self, x, y, s=None, c=None, marker=None, cmap=None, norm=None,
                vmin=None, vmax=None, alpha=None, linewidths=None,
                verts=None, edgecolors=None,
                **kwargs):
        """
        A scatter plot of *y* vs *x* with varying marker size and/or color.

        Parameters
        ----------
        x, y : array_like, shape (n, )
            The data positions.

        s : scalar or array_like, shape (n, ), optional
            The marker size in points**2.
            Default is ``rcParams['lines.markersize'] ** 2``.

        c : color, sequence, or sequence of color, optional, default: 'b'
            The marker color. Possible values:

            - A single color format string.
            - A sequence of color specifications of length n.
            - A sequence of n numbers to be mapped to colors using *cmap* and
              *norm*.
            - A 2-D array in which the rows are RGB or RGBA.

            Note that *c* should not be a single numeric RGB or RGBA sequence
            because that is indistinguishable from an array of values to be
            colormapped. If you want to specify the same RGB or RGBA value for
            all points, use a 2-D array with a single row.

        marker : `~matplotlib.markers.MarkerStyle`, optional, default: 'o'
            The marker style. *marker* can be either an instance of the class
            or the text shorthand for a particular marker.
            See `~matplotlib.markers` for more information marker styles.

        cmap : `~matplotlib.colors.Colormap`, optional, default: None
            A `.Colormap` instance or registered colormap name. *cmap* is only
            used if *c* is an array of floats. If ``None``, defaults to rc
            ``image.cmap``.


        Notes
        -----

        * The `.plot` function will be faster for scatterplots where markers
          don't vary in size or color.

        * Any or all of *x*, *y*, *s*, and *c* may be masked arrays, in which
          case all masks will be combined and only unmasked points will be
          plotted.

        * Fundamentally, scatter works with 1-D arrays; *x*, *y*, *s*, and *c*
          may be input as 2-D arrays, but within scatter they will be
          flattened. The exception is *c*, which will be flattened only if its
          size matches the size of *x* and *y*.

        """

Code for viewing the available colormaps:

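import numpy as np
import matplotlib.pyplot as plt

# plt.colormaps() lists the registered colormap names; it replaces
# matplotlib.cm.cmap_d, which recent matplotlib releases no longer provide
names = plt.colormaps()
cmap1 = sorted(n for n in names if not n.endswith('_r'))   # normal colormaps
cmap2 = sorted(n for n in names if n.endswith('_r'))       # reversed colormaps

nrows = max(len(cmap1),len(cmap2))
gradient = np.linspace(0,1,256)
gradient = np.vstack((gradient,gradient))   # two identical rows give a visible strip

fig,axes = plt.subplots(figsize=(8,16),nrows=nrows,ncols=2)
fig.subplots_adjust(top=0.98,bottom=0.01,left=0.13,right=0.99,wspace=0.4)

def plot(ax,cmap):
    # draw the gradient with the given colormap and write its name to the left
    ax.imshow(gradient,aspect='auto',cmap=cmap)
    pos = list(ax.get_position().bounds)
    x_text = pos[0] - 0.01
    y_text = pos[1] + pos[3]/2
    fig.text(x_text,y_text,cmap,va='center',ha='right',fontsize=10)
    ax.set_axis_off()

for row,ax in enumerate(axes):
    for col,cmaps in enumerate((cmap1,cmap2)):
        if row < len(cmaps):
            plot(ax[col],cmaps[row])
        else:
            ax[col].set_axis_off()   # the two lists may differ in length

plt.show()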

 

 
