Common Techniques for Solving Objective Functions

Flyingzhan

Article link: https://blog.csdn.net/Flyingzhan/article/details/84978518

How to Solve Objective Functions

Introduction

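When writing code, especially machine learning algorithms, you will often need to express your objective function in matrix form. On paper we usually write objectives as sums ($\sum$) evaluated term by term, but in code the input is typically the whole dataset at once, for example a NumPy array, so the objective has to be rewritten in matrix form. This post briefly introduces some commonly used methods and functions (NumPy and TensorFlow); I hope you find it useful.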

In addition, this post covers how to solve some common objective functions written in matrix form.
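As a minimal illustration of the sum-to-matrix rewrite (the least-squares objective and the names `X`, `y`, `w` are chosen here for the example, not taken from the original post), the per-sample sum $\sum_{i=1}^{n}(y_i - w^{T}x_i)^2$ equals the matrix form $\|y - Xw\|_2^2$, where row i of X is $x_i^{T}$:

```python
import numpy as np

def loss_loop(X, y, w):
    # Objective written as an explicit sum over samples: sum_i (y_i - w^T x_i)^2
    total = 0.0
    for x_i, y_i in zip(X, y):
        total += (y_i - x_i @ w) ** 2
    return total

def loss_matrix(X, y, w):
    # Same objective in matrix form: ||y - Xw||_2^2, one call over the whole dataset
    r = y - X @ w
    return float(r @ r)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # 100 samples, 5 features
y = rng.normal(size=100)
w = rng.normal(size=5)
print(np.isclose(loss_loop(X, y, w), loss_matrix(X, y, w)))  # True
```

The matrix version avoids the Python loop entirely, which is what makes it practical when the input is a full NumPy array.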

Common Functions

1: numpy.linalg.norm(x, ord=None, axis=None, keepdims=False)

'''
   Parameters:	
            x : array_like Input array. If axis is None, x must be 1-D or 2-D.

            ord : {non-zero int, inf, -inf, ‘fro’, ‘nuc’}, optional Order of 
                  the norm (see table under Notes). inf means numpy’s inf object.

            axis : {int, 2-tuple of ints, None}, optional  If axis is an integer, it             
                   specifies the axis of x along which to compute the vector norms. 
                   If axis is a 2-tuple, it specifies the axes that hold 2-D matrices, 
                   and the matrix norms of these matrices are computed. If axis is None 
                   then either a vector norm (when x is 1-D) or a matrix norm (when x 
                   is 2-D) is returned.

            keepdims : bool, optional If this is set to True, the axes which are normed 
                       over are left in the result as dimensions with size one. With this 
                       option the result will broadcast correctly against the original x.


  Returns:	
            n : float or ndarray
'''

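numpy.linalg.norm computes vector or matrix norms. The docstring above is excerpted from the NumPy documentation at https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.linalg.norm.html. The comments alone do not make every parameter easy to understand, so here is a brief explanation: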

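Here x is the input, usually a multidimensional ndarray; ord selects the type of norm; axis determines how vectors are taken from x. axis=1 treats each row as a vector and returns the norm of every row; axis=0 treats each column as a vector and returns the norm of every column; axis=None computes a single norm of the whole array (a matrix norm for 2-D input, a vector norm for 1-D input). keepdims controls whether the reduced axes are kept as size-one dimensions, so that the result still broadcasts against x.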

The following example, using x = [[0, 3, 4], [1, 6, 4]], illustrates these parameters:

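np.linalg.norm(x) uses the defaults ord=None, axis=None, keepdims=False, which for a 2-D array computes the Frobenius norm (the 2-norm of all entries taken together, not the spectral matrix 2-norm): sqrt(0^2+3^2+4^2+1^2+6^2+4^2) = sqrt(78) ≈ 8.83, which matches the output.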

With axis=1, each row is treated as a vector, so the 2-norms of [0, 3, 4] and [1, 6, 4] are computed: 5 and 7.28.

With axis=0, each column is treated as a vector, giving three vectors [0, 1], [3, 6], [4, 4] and three corresponding norms: 1, 6.71, and 5.66.
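The example's original output is not reproduced in the text, so the sketch below rebuilds it from the values quoted above. It also adds an `ord=2` call (not in the original) to show that the default for a 2-D array is the Frobenius norm rather than the spectral norm:

```python
import numpy as np

x = np.array([[0, 3, 4],
              [1, 6, 4]])

print(np.linalg.norm(x))                # Frobenius norm: sqrt(78) ≈ 8.83
print(np.linalg.norm(x, axis=1))        # row norms: 5 and sqrt(53) ≈ 7.28
print(np.linalg.norm(x, axis=0))        # column norms: 1, sqrt(45) ≈ 6.71, sqrt(32) ≈ 5.66
print(np.linalg.norm(x, axis=1, keepdims=True).shape)  # (2, 1): reduced axis kept with size 1
print(np.linalg.norm(x, ord=2))         # spectral norm (largest singular value) ≈ 8.70
```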

1.1 Euclidean Distance

The function above can be used to compute various distance functions, such as the Euclidean distance. Python code:

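import numpy as np

def calc_l2_norm(feature1, others):
    # feature1: (n,), others: (m, n); broadcasting subtracts feature1 from each row
    diffs = feature1 - others
    dists = np.linalg.norm(diffs, axis=1)  # one Euclidean distance per row
    return dists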

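Here feature1 is a single n-dimensional vector and others holds one or more n-dimensional vectors as the rows of an (m, n) array (pass a single vector with shape (1, n), since axis=1 requires a 2-D difference). The function returns the Euclidean distance from feature1 to every row of others.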
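calc_l2_norm compares one query against many vectors. For two whole sets, a common matrix-form trick (not from the original post) expands $\|a-b\|^2 = \|a\|^2 + \|b\|^2 - 2a^{T}b$, so all pairwise distances come from a single matrix product. A sketch, with the function name `pairwise_l2` chosen here for illustration:

```python
import numpy as np

def pairwise_l2(A, B):
    # A: (m, n), B: (p, n); returns the (m, p) matrix of Euclidean distances
    sq_a = np.sum(A ** 2, axis=1)[:, None]   # (m, 1) squared row norms of A
    sq_b = np.sum(B ** 2, axis=1)[None, :]   # (1, p) squared row norms of B
    sq = sq_a + sq_b - 2.0 * A @ B.T         # ||a||^2 + ||b||^2 - 2 a.b
    return np.sqrt(np.maximum(sq, 0.0))      # clip tiny negatives from round-off

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 3))
B = rng.normal(size=(6, 3))
# Reference: broadcast the differences, then take the norm along the last axis
ref = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
print(np.allclose(pairwise_l2(A, B), ref))  # True
```

The clip guards against small negative values from floating-point cancellation when two points nearly coincide.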

1.2 Hamming Distance

import numpy as np

def calc_hammingDist(request, retrieval_all):
    # request: (K,) code, retrieval_all: (m, K) codes, entries in {-1, +1}
    K = retrieval_all.shape[1]
    distH = 0.5 * (K - np.dot(request, retrieval_all.transpose()))
    return distH

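For codes of length K with entries in {-1, +1}, the Hamming distance can be expressed through the inner product: each matching position contributes +1 and each differing position -1 to $b_i^{T}b_j$, so $b_i^{T}b_j = K - 2\,d_H(b_i, b_j)$, i.e. $d_H(b_i, b_j) = \frac{1}{2}(K - b_i^{T}b_j)$. This is the formula the code above computes.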
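A quick check of the inner-product formula against a direct count of differing bits. The bit patterns are made up for illustration, and {0, 1} bits are mapped to {-1, +1} first, since the formula assumes that encoding:

```python
import numpy as np

def hamming_by_count(a, b):
    # Direct definition: number of positions where the codes differ
    return int(np.sum(a != b))

def hamming_by_inner_product(a, b):
    # Inner-product form, valid for codes with entries in {-1, +1}
    K = a.shape[0]
    return 0.5 * (K - a @ b)

bits_a = np.array([0, 1, 1, 0, 1, 0, 0, 1])
bits_b = np.array([1, 1, 0, 0, 1, 1, 0, 1])
# Map {0, 1} bits to {-1, +1} before using the inner-product formula
pm_a, pm_b = 2 * bits_a - 1, 2 * bits_b - 1
print(hamming_by_count(bits_a, bits_b), hamming_by_inner_product(pm_a, pm_b))  # 3 3.0
```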

1.3 Cosine Distance

import numpy as np

def calc_cosineDist(request, retrieval_all):
    # Cosine similarity between request (n,) and each row of retrieval_all (m, n)
    numerator = np.dot(request, retrieval_all.transpose())
    denominator = np.linalg.norm(request) * np.linalg.norm(retrieval_all, axis=1)
    return numerator/denominator

The function above does not handle the 0/0 case: when a vector is all zeros, the result is nan, so some extra handling is needed:


import numpy as np

def calc_cosineDist(request, retrieval_all):
    numerator = np.dot(request, retrieval_all.transpose())
    denominator = np.linalg.norm(request) * np.linalg.norm(retrieval_all, axis=1)
    # A zero denominator means a zero vector, so the numerator is 0 too (0/0): set those to 1
    result = np.ones_like(denominator, dtype=float)
    nonzero = denominator != 0
    result[nonzero] = numerator[nonzero] / denominator[nonzero]
    return result

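The extra handling sets the result to 1 wherever the numerator and denominator are both 0. Checking the denominator is enough: a zero denominator means at least one of the two vectors has zero length, i.e. is the zero vector, so the numerator is necessarily 0 as well.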
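The same 0/0 handling can also be written without index bookkeeping, using the `out=` and `where=` arguments of `np.divide`. A sketch (the function name `cosine_similarity_safe` is hypothetical), keeping the post's convention that 0/0 becomes 1:

```python
import numpy as np

def cosine_similarity_safe(request, retrieval_all):
    # request: (n,), retrieval_all: (m, n); returns (m,) cosine similarities
    numerator = retrieval_all @ request
    denominator = np.linalg.norm(request) * np.linalg.norm(retrieval_all, axis=1)
    # Start from 1 (the value the post assigns to 0/0) and divide only where denominator != 0
    out = np.ones_like(denominator, dtype=float)
    np.divide(numerator, denominator, out=out, where=denominator != 0)
    return out

q = np.array([1.0, 0.0])
db = np.array([[2.0, 0.0],    # same direction -> 1
               [0.0, 3.0],    # orthogonal -> 0
               [0.0, 0.0]])   # zero vector -> 0/0, set to 1
print(cosine_similarity_safe(q, db))  # [1. 0. 1.]
```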

2: tf.reduce_sum()

tf.reduce_sum(
    input_tensor,
    axis=None,
    keepdims=None,
    name=None,
    reduction_indices=None,
    keep_dims=None
)
'''
    Args:
        input_tensor: The tensor to reduce. Should have numeric type.
        axis: The dimensions to reduce. If `None` (the default),
              reduces all dimensions. Must be in the range
              '[-rank(input_tensor), rank(input_tensor))'.
        keepdims: If true, retains reduced dimensions with length 1.
        name: A name for the operation (optional).
        reduction_indices: The old (deprecated) name for axis.
        keep_dims: Deprecated alias for `keepdims`.
    Returns:
        The reduced tensor, of the same dtype as the input_tensor.

    Example:
        x = tf.constant([[1, 1, 1], [1, 1, 1]])
        tf.reduce_sum(x)  # 6
        tf.reduce_sum(x, 0)  # [2, 2, 2]
        tf.reduce_sum(x, 1)  # [3, 3]
        tf.reduce_sum(x, 1, keepdims=True)  # [[3], [3]]
        tf.reduce_sum(x, [0, 1])  # 6
'''
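NumPy's `np.sum` follows the same `axis`/`keepdims` semantics as `tf.reduce_sum`, so the docstring example can be reproduced without TensorFlow (note that NumPy expects a tuple, not a list, for several axes). The second half sketches a typical use in an objective: reduce squared errors over features (axis=1), then over samples. The `y`/`y_hat` data are made up for illustration:

```python
import numpy as np

# Same calls as the tf.reduce_sum docstring example, with np.sum in place of tf.reduce_sum
x = np.array([[1, 1, 1], [1, 1, 1]])
total = np.sum(x)                                  # 6
col_sums = np.sum(x, axis=0)                       # [2, 2, 2]
row_sums = np.sum(x, axis=1)                       # [3, 3]
row_sums_kept = np.sum(x, axis=1, keepdims=True)   # [[3], [3]], shape (2, 1)
both_axes = np.sum(x, axis=(0, 1))                 # 6

# Objective sum_i ||y_i - y_hat_i||^2: reduce over features (axis=1), then over samples
y = np.array([[1.0, 2.0], [3.0, 4.0]])
y_hat = np.array([[1.0, 0.0], [0.0, 4.0]])
per_sample = np.sum((y - y_hat) ** 2, axis=1)      # [4.0, 9.0]
loss = np.sum(per_sample)                          # 13.0
print(total, both_axes, per_sample, loss)
```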

3: Matrix Trace

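The sum of element-wise products of two matrices (their Frobenius inner product) can be written as the trace of a matrix product, $\sum_{i,j} A_{ij}B_{ij} = \mathrm{tr}(A^{T}B)$; note that the first matrix is transposed.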

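Since the squared Frobenius norm of a matrix equals the sum of its squared entries, it can also be written as a trace: $\|A\|_F^{2} = \sum_{i,j} A_{ij}^{2} = \mathrm{tr}(A^{T}A)$. This mirrors the vector case, where the squared 2-norm of x is $x^{T}x$. So whenever an expression involves element-wise products like these, it can be converted to trace form.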
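A numerical check of the three identities above on random matrices (the shapes are arbitrary choices for the sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 3))
B = rng.normal(size=(4, 3))
x = rng.normal(size=3)

# Sum of element-wise products equals tr(A^T B)
elementwise = np.sum(A * B)
trace_form = np.trace(A.T @ B)

# Squared Frobenius norm equals tr(A^T A)
fro_sq = np.linalg.norm(A, 'fro') ** 2
trace_fro = np.trace(A.T @ A)

# Squared vector 2-norm equals x^T x
vec_sq = np.linalg.norm(x) ** 2

print(np.isclose(elementwise, trace_form), np.isclose(fro_sq, trace_fro), np.isclose(vec_sq, x @ x))
```

In code, `np.sum(A * B)` is the cheaper way to evaluate the quantity, since forming `A.T @ B` costs more; the trace form is mainly useful for derivations such as gradients.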

4: Matrix Inverse

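Inverting an N×N matrix takes roughly $O(N^{3})$ time, but for one special class of matrices, orthogonal matrices, the inverse is simply the transpose: $Q^{-1} = Q^{T}$.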

Some other important matrix properties:

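Eigenvectors of a real symmetric matrix that belong to distinct eigenvalues are orthogonal, and an orthonormal set of eigenvectors can always be chosen;

A real symmetric matrix is not positive semidefinite in general (e.g. diag(1, -1)), but matrices of the form $X^{T}X$, such as Gram or covariance matrices, are always symmetric positive semidefinite.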
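A short NumPy check of these properties, using random matrices generated for the sketch:

```python
import numpy as np

rng = np.random.default_rng(3)

# Orthogonal matrix from a QR decomposition: its inverse is its transpose
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
print(np.allclose(np.linalg.inv(Q), Q.T))           # True

# Real symmetric matrix: eigh returns real eigenvalues and orthonormal eigenvectors
S = rng.normal(size=(4, 4))
S = (S + S.T) / 2
eigvals, eigvecs = np.linalg.eigh(S)
print(np.allclose(eigvecs.T @ eigvecs, np.eye(4)))  # True

# Symmetric does not imply PSD: diag(1, -1) has eigenvalues -1 and 1
print(np.linalg.eigvalsh(np.diag([1.0, -1.0])))

# X^T X is always positive semidefinite (eigenvalues >= 0 up to round-off)
X = rng.normal(size=(10, 4))
print(np.all(np.linalg.eigvalsh(X.T @ X) >= -1e-10))  # True
```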

5: Frobenius Norm

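Objective functions often contain Frobenius norms. To take the gradient of such an objective, first rewrite the norm in trace form, Tr(·), and then differentiate using the trace derivative rules.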
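As a worked example (the least-squares objective $f(W) = \|XW - Y\|_F^2$ is chosen here for illustration): $f(W) = \mathrm{tr}\big((XW - Y)^{T}(XW - Y)\big)$, and applying the trace derivative rules gives $\nabla_W f = 2X^{T}(XW - Y)$. The sketch below checks this gradient against a central finite-difference estimate:

```python
import numpy as np

def f(X, Y, W):
    # f(W) = ||XW - Y||_F^2 = tr((XW - Y)^T (XW - Y))
    R = X @ W - Y
    return np.trace(R.T @ R)

def grad_f(X, Y, W):
    # Analytic gradient obtained from the trace form: 2 X^T (XW - Y)
    return 2.0 * X.T @ (X @ W - Y)

def numeric_grad(X, Y, W, eps=1e-6):
    # Central finite differences, one entry of W at a time
    G = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        E = np.zeros_like(W)
        E[idx] = eps
        G[idx] = (f(X, Y, W + E) - f(X, Y, W - E)) / (2 * eps)
    return G

rng = np.random.default_rng(4)
X = rng.normal(size=(6, 3))
Y = rng.normal(size=(6, 2))
W = rng.normal(size=(3, 2))
print(np.allclose(grad_f(X, Y, W), numeric_grad(X, Y, W), atol=1e-5))  # True
```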

References:

https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.linalg.norm.html

https://blog.csdn.net/hqh131360239/article/details/79061535

https://www.cnblogs.com/devilmaycry812839668/p/9352814.html

https://blog.csdn.net/txwh0820/article/details/46392293

6: One-Hot Vectors

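One-hot encoding is simple: it is a c-dimensional vector with exactly one entry equal to 1 and all others equal to 0. For example, to classify MNIST handwritten digits we can give each image a one-hot label; if the digit is 2, its one-hot encoding is [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]. Now let $y \in \{0,1\}^{c}$ be one-hot and $X \in \mathbb{R}^{n \times c}$; then $Xy = X_{:,k}$, where k is the index with $y_{k} = 1$. In other words, if the k-th entry of a c-dimensional column vector y is 1, multiplying a matrix X by y gives the k-th column of X.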
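A NumPy sketch of both points: building one-hot labels by indexing an identity matrix (one common idiom, not from the original post) and checking that $Xy$ picks out column k. The labels and X are made-up data:

```python
import numpy as np

c = 10                       # number of classes, e.g. MNIST digits 0-9
labels = np.array([2, 7, 0])
Y = np.eye(c)[labels]        # one-hot rows: row i has a 1 at position labels[i]
print(Y[0])                  # [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]

# X y selects the k-th column of X when y is one-hot with y_k = 1
n = 4
X = np.arange(n * c, dtype=float).reshape(n, c)
k = 2
y = np.eye(c)[k]             # one-hot vector with y_k = 1
print(np.array_equal(X @ y, X[:, k]))  # True
```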

7: Orthogonal Procrustes Problem

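For a description of the problem, see https://en.wikipedia.org/wiki/Orthogonal_Procrustes_problem. In short, the goal is to find an orthogonal matrix that transforms a matrix A so that it aligns as closely as possible with another matrix B.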

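In its general form, the problem is $R = \arg\min_{\Omega} \|\Omega A - B\|_F$ subject to $\Omega^{T}\Omega = I$. As a practical use of the problem, consider the following objective from hash-code learning: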

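where $V \in \mathbb{R}^{n \times d}$ is a real-valued matrix and $B \in \{0,1\}^{n \times k}$ is the binary hash-code matrix. We want an orthogonal matrix R that makes B and V agree as closely as possible. The solution is: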

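compute $M = BA^{T}$, take its SVD $M = U\Sigma V^{T}$, and set $R = UV^{T}$ (here U and V are the singular-vector matrices of M, not the data matrix V above).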
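A minimal NumPy sketch of this SVD solution, using the general-form convention $\min_{\Omega}\|\Omega A - B\|_F$. The test data are synthetic: B is built as an exact orthogonal transform of A, so the hidden matrix should be recovered:

```python
import numpy as np

def orthogonal_procrustes(A, B):
    # Solve min_R ||R A - B||_F subject to R^T R = I
    M = B @ A.T
    U, _, Vt = np.linalg.svd(M)   # M = U Sigma V^T (NumPy returns V^T directly)
    return U @ Vt

rng = np.random.default_rng(5)
d, n = 3, 50
Q_true, _ = np.linalg.qr(rng.normal(size=(d, d)))  # hidden orthogonal matrix
A = rng.normal(size=(d, n))
B = Q_true @ A                                    # B is an exact orthogonal transform of A

R = orthogonal_procrustes(A, B)
print(np.allclose(R, Q_true), np.allclose(R.T @ R, np.eye(d)))  # True True
```

SciPy also ships `scipy.linalg.orthogonal_procrustes`, but it uses the convention $\|AR - B\|_F$ (R on the right), so the inputs need to be transposed to match the formulation here.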

References:

https://en.wikipedia.org/wiki/Orthogonal_Procrustes_problem

8: Difference Between tf.matmul and tf.multiply

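Many algorithms need matrix multiplication, and if you use TensorFlow as your deep learning framework you will rely on tf.matmul and tf.multiply. The main difference is that tf.multiply performs element-wise multiplication (with broadcasting), while tf.matmul performs matrix multiplication. See the code below (TensorFlow 1.x API):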

import tensorflow as tf
import numpy as np

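# matmul: (2, 3) x (3, 1) matrix product (TensorFlow 1.x: tf.placeholder / tf.Session)
inputs = tf.placeholder(tf.int32,[2,3])
inputs2 = tf.placeholder(tf.int32, [3,1])
result = tf.matmul(inputs, inputs2)
temp = np.array([[1,2,3],[4,5,6]], dtype=np.int32)
temp2 = np.ones((3,1), dtype=np.int32)  # match the int32 placeholder dtype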


with tf.Session() as sess:
    print(sess.run(result, feed_dict={inputs:temp, inputs2:temp2}))

Result: [[6], [15]] (each row of temp summed).

import tensorflow as tf
import numpy as np

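# multiply: shapes (2, 3) and (3, 1) are not broadcast-compatible, so this fails
inputs = tf.placeholder(tf.int32,[2,3])
inputs2 = tf.placeholder(tf.int32, [3,1])
result = tf.multiply(inputs, inputs2)
temp = np.array([[1,2,3],[4,5,6]], dtype=np.int32)
temp2 = np.ones((3,1), dtype=np.int32)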


with tf.Session() as sess:
    print(sess.run(result, feed_dict={inputs:temp, inputs2:temp2}))

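Result: an error, because tf.multiply works element-wise and the shapes (2, 3) and (3, 1) cannot be broadcast together.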

After making the shapes consistent:

import tensorflow as tf
import numpy as np

# multiply: element-wise product of two (2, 3) tensors
inputs = tf.placeholder(tf.int32,[2,3])
inputs2 = tf.placeholder(tf.int32, [2,3])
result = tf.multiply(inputs, inputs2)
temp = np.array([[1,2,3],[4,5,6]], dtype=np.int32)
temp2 = np.ones((2,3), dtype=np.int32)


with tf.Session() as sess:
    print(sess.run(result, feed_dict={inputs:temp, inputs2:temp2}))

Result: [[1, 2, 3], [4, 5, 6]] (each element multiplied by 1).
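NumPy draws the same distinction (`@`/`np.matmul` versus `*`/`np.multiply`) and, like tf.multiply, broadcasts element-wise operands, so the shape rules can be checked without TensorFlow. A minimal sketch:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])       # shape (2, 3)

# Matrix product: (2, 3) @ (3, 1) -> (2, 1)
print(a @ np.ones((3, 1), dtype=int))      # column of row sums: 6 and 15

# Element-wise product with equal shapes: (2, 3) * (2, 3) -> (2, 3)
print(a * np.ones((2, 3), dtype=int))      # values unchanged

# Element-wise product broadcasts a (1, 3) row or a (2, 1) column
print(a * np.array([[10, 100, 1000]]))     # each column scaled
print(a * np.array([[1], [-1]]))           # second row negated

# (2, 3) * (3, 1) is not broadcast-compatible, mirroring the tf.multiply failure above
try:
    a * np.ones((3, 1), dtype=int)
except ValueError as err:
    print("ValueError:", err)
```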
