Machine Learning in Action (Scala Implementation), Part 2: The k-Nearest Neighbors Algorithm

Algorithm Steps

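1. Compute the distance between the test point Xt and every point in the training set.
2. Sort the training points by increasing distance.
3. Take the k points with the smallest distances.
4. Count how often each label occurs among those k points.
5. Return the most frequent label as the predicted class of Xt.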
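The five steps map almost line for line onto code. The following is a minimal Scala sketch, not the post's own implementation: it assumes points are `Vector[Double]` of equal length, uses Euclidean distance, and requires 1 <= k <= number of training points. The names `KnnSketch`, `euclidean`, and `knnClassify` are hypothetical.

```scala
// Hypothetical sketch of the five k-NN steps; names are illustrative only.
object KnnSketch {
  // Euclidean distance between two points of equal dimension
  def euclidean(a: Vector[Double], b: Vector[Double]): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  // Generic over the label type L; assumes 1 <= k <= data.size
  def knnClassify[L](xt: Vector[Double], data: Vector[Vector[Double]],
                     labels: Vector[L], k: Int): L = {
    // Step 1: distance from every training point to Xt
    val distances = data.map(p => euclidean(p, xt))
    // Step 2: training indices ordered by increasing distance
    val order = distances.indices.sortBy(i => distances(i))
    // Step 3: labels of the k closest points
    val nearest = order.take(k).map(i => labels(i))
    // Step 4: how often each label occurs among those k points
    val counts = nearest.groupBy(identity).map { case (l, ls) => (l, ls.size) }
    // Step 5: the most frequent label (ties resolved by map iteration order)
    counts.maxBy(_._2)._1
  }
}
```

On the four-point toy data used below, `knnClassify(Vector(0.0, 0.0), data, labels, 3)` picks the two B points and one A point and returns 'B'.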

Implementation

Python

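# -*- coding: utf-8 -*-
'''
Created on March 18, 2017

@author: soso
'''
from numpy import *
import operator

def createDataSet():
    # Toy training set: four 2-D points and their class labels
    group = array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels

def classify0(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]
    # tile(A, reps): builds an array by repeating A along each dimension
    # - A: any array-like input
    # - reps: number of repetitions of A along each axis
    # Here inX is repeated once per training row so it can be subtracted
    # from every training point at once.
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2
    # With axis=1, sum() adds up the entries of each row
    sqDistances = sqDiffMat.sum(axis=1)
    # Euclidean distance: square root of the summed squares
    distance = sqDistances ** 0.5
    # argsort() returns the indices that would sort the array in ascending order
    sortedDistIndicies = distance.argsort()
    # Count the labels of the k nearest points
    classCount = {}
    for i in range(k):
        votelabel = labels[sortedDistIndicies[i]]
        classCount[votelabel] = classCount.get(votelabel, 0) + 1
    # Sort (label, count) pairs by count, highest first
    # (items() works in both Python 2 and 3; iteritems() is Python 2 only)
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]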
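To make the NumPy steps in classify0 concrete, here is a small Scala trace of its intermediate arrays for the query [0, 0] on the createDataSet() data. `Classify0Trace` is a hypothetical name; the field names mirror the Python variables, and the distances are the square roots of sqDistances (Euclidean distance).

```scala
// Hypothetical trace of classify0's intermediate values for inX = [0, 0].
object Classify0Trace {
  val group = Array(Array(1.0, 1.1), Array(1.0, 1.0), Array(0.0, 0.0), Array(0.0, 0.1))
  val labels = Array('A', 'A', 'B', 'B')
  val inX = Array(0.0, 0.0)

  // diffMat: inX minus each training row, as tile(inX, ...) - dataSet computes
  val diffMat: Array[Array[Double]] =
    group.map(row => inX.zip(row).map { case (a, b) => a - b })
  // sqDistances: row-wise sum of squared differences
  val sqDistances: Array[Double] = diffMat.map(_.map(d => d * d).sum)
  // distances: element-wise square root
  val distances: Array[Double] = sqDistances.map(d => math.sqrt(d))
  // sortedDistIndicies: indices ordered by increasing distance, like argsort()
  val sortedDistIndicies: Array[Int] =
    distances.indices.sortBy(i => distances(i)).toArray
  // labels of the k = 3 nearest points
  val topK: Array[Char] = sortedDistIndicies.take(3).map(i => labels(i))
}
```

The distances come out as roughly 1.487, 1.414, 0.0 and 0.1, so the sorted indices are [2, 3, 1, 0], the three nearest labels are B, B, A, and the vote returns B. Taking the square root does not change this ordering; it only makes the values true Euclidean distances.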

Scala

import scala.collection.mutable

object kNN {

  // Toy training set: four 2-D points
  def getGroup(): Array[Array[Double]] =
    Array(Array(1.0, 1.1), Array(1.0, 1.0), Array(0.0, 0.0), Array(0.0, 0.1))

  // Class label of each training point
  def getLabels(): Array[Char] = Array('A', 'A', 'B', 'B')

  def classify0(inX: Array[Double], dataSet: Array[Array[Double]], labels: Array[Char], k: Int): Char = {
    // Squared Euclidean distance to every training point, in any dimension.
    // The square root is skipped because it does not change the ordering.
    val sortedDistIndices = dataSet.map { x =>
      x.zip(inX).map { case (a, b) => (a - b) * (a - b) }.sum
    }.zipWithIndex.sortBy(_._1).map(_._2)
    // Count the labels of the k nearest points
    val classCount = mutable.Map.empty[Char, Int]
    for (i <- 0 until k) {
      val voteLabel = labels(sortedDistIndices(i))
      classCount(voteLabel) = classCount.getOrElse(voteLabel, 0) + 1
    }
    // Return the most frequent label
    classCount.maxBy(_._2)._1
  }

  def main(args: Array[String]): Unit = {
    println(classify0(Array(0.0, 0.0), getGroup(), getLabels(), 3))
  }
}
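With k = 4 on the toy data, the vote is a 2:2 tie between A and B, and the Scala classify0 then returns whichever label the mutable `Map` happens to yield first, which depends on hashing rather than on distance. One possible policy, sketched below under the assumption that labels are passed in order of increasing distance, is to let the tied label owning the closest neighbour win. `TieBreakVote` and `vote` are hypothetical names, not part of the original code.

```scala
// Hypothetical tie-breaking vote; not part of the original post.
object TieBreakVote {
  // Majority vote over labels listed in order of increasing distance.
  // Among labels tied for the highest count, the one that appears first
  // (i.e. owns the closest neighbour) wins, so the result never depends on
  // hash-map iteration order. Assumes nearestLabels is non-empty.
  def vote[L](nearestLabels: Seq[L]): L = {
    val counts = nearestLabels.groupBy(identity).map { case (l, ls) => (l, ls.size) }
    val best = counts.values.max
    nearestLabels.find(l => counts(l) == best).get
  }
}
```

Feeding it the labels of the first k sorted indices would replace the `Map`-based count at the end of classify0; for the query (0, 0) with k = 4 it returns 'B', because the nearest point is a B.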


Author: 小爷毛毛 (卓寿杰)

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值