Algorithm Notes

1. Euclidean Distance

Euclidean distance is the most commonly used definition of distance: it is the true distance between two points in m-dimensional space.

In two and three dimensions, the Euclidean distance is simply the straight-line distance between two points.

The two-dimensional formula:

d = \sqrt{(x_1-x_2)^2 + (y_1-y_2)^2}

The three-dimensional formula:

d = \sqrt{(x_1-x_2)^2 + (y_1-y_2)^2 + (z_1-z_2)^2}

Generalizing to n-dimensional space, the Euclidean distance formula is:

d = \sqrt{\sum_{i=1}^{n} (x_{i1}-x_{i2})^2}

where x_{i1} is the i-th coordinate of the first point and x_{i2} is the i-th coordinate of the second point.
An n-dimensional Euclidean space is a set of points, each of which can be written as (x(1), x(2), ..., x(n)), where each x(i) (i = 1, 2, ..., n) is a real number called the i-th coordinate of x. The distance d(x, y) between two points x and y = (y(1), y(2), ..., y(n)) is defined by the formula above.
The Euclidean metric
In Euclidean space, the distance between points x = (x_1, ..., x_n) and y = (y_1, ..., y_n) is defined as
d(x,y):=\sqrt{(x_1-y_1)^2 + (x_2-y_2)^2 + \cdots + (x_n-y_n)^2} = \sqrt{\sum_{i=1}^n (x_i-y_i)^2}
The natural length of a vector \vec{x}, i.e. the distance from that point to the origin, is

\|\vec{x}\|_2 = \sqrt{|x_1|^2 + \cdots + |x_n|^2}.

This is a plain scalar. Under the Euclidean metric, the shortest path between two points is a straight line.
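A minimal Python sketch of the n-dimensional formula above (the function name and sample points are just for illustration):

import math

def euclidean_distance(p, q):
    # d = sqrt(sum_i (p_i - q_i)^2), valid in any dimension
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean_distance((0, 0), (3, 4)))        # 5.0 (2D)
print(euclidean_distance((1, 2, 3), (4, 6, 3)))  # 5.0 (3D)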





2. Lagrange Interpolation



http://zh.wikipedia.org/wiki/%E6%8B%89%E6%A0%BC%E6%9C%97%E6%97%A5%E6%8F%92%E5%80%BC%E6%B3%95
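The link above covers the details; for quick reference, a minimal Python sketch of the Lagrange polynomial L(x) = \sum_i y_i \prod_{j \ne i} (x - x_j)/(x_i - x_j), with illustrative names and data:

def lagrange_interpolate(xs, ys, x):
    # L(x) = sum_i y_i * prod_{j != i} (x - x_j) / (x_i - x_j)
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# The polynomial passes exactly through the sample points; here it reconstructs y = x^2.
print(lagrange_interpolate([0, 1, 2], [0, 1, 4], 3))  # 9.0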




3. Depth of Field


http://en.wikipedia.org/wiki/Depth_of_field




4. Doppler Effect (Sound)


http://zh.wikipedia.org/wiki/%E5%A4%9A%E6%99%AE%E5%8B%92%E6%95%88%E5%BA%94


Formula

The relationship between the frequency seen by the observer and the frequency emitted by the source is: f' = \left( \frac{v \pm v_o}{v \mp v_s} \right) f

  • f' is the observed frequency;
  • f is the original frequency emitted by the source in the medium;
  • v is the speed of the wave in the medium;
  • v_o is the speed of the observer relative to the medium; the sign in front is + when the observer moves toward the source, and - when moving away;
  • v_s is the speed of the source relative to the medium; the sign in front is - when the source moves toward the observer, and + when moving away.
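A minimal Python sketch of this formula, using signed speeds (positive means moving toward the other party); the function name and numbers are illustrative:

def doppler_frequency(f, v, v_o, v_s):
    # f' = (v + v_o) / (v - v_s) * f, with v_o > 0 when the observer
    # approaches the source, and v_s > 0 when the source approaches the observer.
    return (v + v_o) / (v - v_s) * f

# A 440 Hz source approaching a stationary observer at 20 m/s in air (~343 m/s):
print(doppler_frequency(440.0, 343.0, 0.0, 20.0))  # ≈ 467 Hz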




5. Topology (Topological Structures)



http://baike.baidu.com/view/41881.htm?fr=aladdin



6. Dot Product (Scalar Product)


The dot product of two vectors a = [a1, a2, …, an] and b = [b1, b2, …, bn] is defined as

a·b = a1b1 + a2b2 + … + anbn

Using matrix multiplication and treating (column) vectors as n×1 matrices, the dot product can also be written as:
a·b = a^T * b, where a^T denotes the transpose of the matrix a.
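A minimal Python sketch of the definition above (names are illustrative):

def dot(a, b):
    # a · b = a1*b1 + a2*b2 + ... + an*bn
    assert len(a) == len(b)
    return sum(x * y for x, y in zip(a, b))

print(dot([1, 2, 3], [4, -5, 6]))  # 12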
http://baike.baidu.com/view/2744555.htm?fr=aladdin


7. Cross Product (Vector Product)


The cross product of two vectors a and b is written a × b (sometimes a ∧ b, to avoid confusion with the letter x). The magnitude of the vector product can be defined as:
|a × b| = |a| |b| sin θ, where θ is the angle between the two vectors (0° ≤ θ ≤ 180°), measured in the plane defined by them.
This definition leaves an ambiguity: there are two unit vectors perpendicular to both a and b; if c satisfies the perpendicularity condition, then so does -c.
A simple way to pick the direction that satisfies the right-hand rule is: in a right-handed coordinate system, curl the four fingers of the right hand from a toward b through an angle of no more than 180°; the extended thumb then points in the direction of c.
The magnitude |c| = |a × b| = |a| |b| sin<a, b>
equals the area of the parallelogram spanned by a and b with angle θ between them.
The direction of c is perpendicular to the plane determined by a and b, oriented by the right-hand rule turning from a to b.
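A minimal Python sketch for 3D vectors (names are illustrative):

def cross(a, b):
    # c = a × b, perpendicular to both a and b by the right-hand rule
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

print(cross((1, 0, 0), (0, 1, 0)))  # (0, 0, 1): x × y = z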


http://baike.baidu.com/item/%E5%90%91%E9%87%8F%E7%A7%AF?from_id=2812058&type=syn&fromtitle=%E5%8F%89%E7%A7%AF&fr=aladdin


8. Hash Algorithms

Commonly used string hash functions include BKDRHash, APHash, DJBHash, JSHash, RSHash, SDBMHash, PJWHash, ELFHash, and so on. The table below is a small benchmark of these hash functions.

Hash function  Data 1  Data 2  Data 3  Data 4  Score 1  Score 2  Score 3  Score 4  Average
BKDRHash       2       0       4774    481     96.55    100      90.95    82.05    92.64
APHash         2       3       4754    493     96.55    88.46    100      51.28    86.28
DJBHash        2       2       4975    474     96.55    92.31    0        100      83.43
JSHash         1       4       4761    506     100      84.62    96.83    17.95    81.94
RSHash         1       0       4861    505     100      100      51.58    20.51    75.96
SDBMHash       3       2       4849    504     93.1     92.31    57.01    23.08    72.41
PJWHash        30      26      4878    513     0        0        43.89    0        21.95
ELFHash        30      26      4878    513     0        0        43.89    0        21.95
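BKDRHash, the best scorer above, is a simple multiply-and-add hash; a minimal Python sketch (131 is the customary seed, from the 31/131/1313/... family):

def bkdr_hash(s, seed=131):
    # h = h * seed + ord(ch), truncated to a positive 32-bit value
    h = 0
    for ch in s:
        h = (h * seed + ord(ch)) & 0xFFFFFFFF
    return h & 0x7FFFFFFF

print(bkdr_hash("hello"))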



http://baike.baidu.com/view/273836.htm?fr=aladdin

ELFHash

http://blog.csdn.net/yinxusen/article/details/6317466


 

BKDRHash

http://www.360doc.com/content/14/0610/10/14505022_385328710.shtml


9. Hash Tables

http://baike.baidu.com/view/329976.htm?fr=aladdin

http://blog.chinaunix.net/uid-24951403-id-2212565.html


Hash tables in Unity3D

http://www.tuicool.com/articles/ABzUFjf


10. Least Squares

http://baike.baidu.com/link?url=M6K_E5mDVXU9Yn7COxW2fs-V5viOTgnpuMDiTahj_oFX0bmFbqss0OFjBkMvyEvMFAZnqR1VWBG8bf5jpZAfoa


11. Poisson Disc

float3 SiGrowablePoissonDisc13FilterRGB
    (sampler tSource, float2 texCoord, float2 pixelSize, float discRadius)
{
    float3 cOut;
    // 12 precomputed Poisson-disc tap offsets inside the unit disc
    float2 poisson[12] = {
        float2(-0.326212f, -0.40581f),
        float2(-0.840144f, -0.07358f),
        float2(-0.695914f,  0.457137f),
        float2(-0.203345f,  0.620716f),
        float2( 0.96234f,  -0.194983f),
        float2( 0.473434f, -0.480026f),
        float2( 0.519456f,  0.767022f),
        float2( 0.185461f, -0.893124f),
        float2( 0.507431f,  0.064425f),
        float2( 0.89642f,   0.412458f),
        float2(-0.32194f,  -0.932615f),
        float2(-0.791559f, -0.59771f)
    };

    // Center tap
    cOut = tex2D(tSource, texCoord);

    for (int tap = 0; tap < 12; tap++)
    {
        // discRadius scales the disc, "growing" the filter footprint
        float2 coord = texCoord.xy + (pixelSize * poisson[tap] * discRadius);
        // Sample pixel
        cOut += tex2D(tSource, coord);
    }

    // Average the center tap plus the 12 disc taps
    return (cOut / 13.0f);
}



12. Gaussian Function

http://en.wikipedia.org/wiki/Gaussian_function

From Wikipedia:

In mathematics, a Gaussian function, often simply referred to as a Gaussian, is a function of the form:

f\left(x\right) = a \exp{\left(- { \frac{(x-b)^2 }{ 2 c^2} } \right)}

for arbitrary real constants a, b and c. It is named after the mathematician Carl Friedrich Gauss.

The graph of a Gaussian is a characteristic symmetric "bell curve" shape. The parameter a is the height of the curve's peak, b is the position of the center of the peak and c (the standard deviation, sometimes called the Gaussian RMS width) controls the width of the "bell".

Gaussian functions are widely used in statistics where they describe the normal distributions, in signal processing where they serve to define Gaussian filters, in image processing where two-dimensional Gaussians are used for Gaussian blurs, and in mathematics where they are used to solve heat equations and diffusion equations and to define the Weierstrass transform.

Properties

Gaussian functions arise by composing the exponential function with a concave quadratic function. The Gaussian functions are thus those functions whose logarithm is a concave quadratic function.

The parameter c is related to the full width at half maximum (FWHM) of the peak according to

\mathrm{FWHM} = 2 \sqrt{2 \ln 2}\ c \approx 2.35482 c. [1]

Alternatively, the parameter c can be interpreted by saying that the two inflection points of the function occur at x = b − c and x = b + c.

The full width at tenth of maximum (FWTM) for a Gaussian could be of interest and is

\mathrm{FWTM} = 2 \sqrt{2 \ln 10}\ c \approx 4.29193 c. [2]

Gaussian functions are analytic, and their limit as x → ∞ is 0 (assuming no constant offset term d is added to the form above, i.e. d = 0).

Gaussian functions are among those functions that are elementary but lack elementary antiderivatives; the integral of the Gaussian function is the error function. Nonetheless their improper integrals over the whole real line can be evaluated exactly, using the Gaussian integral

\int_{-\infty}^\infty e^{-x^2}\,dx=\sqrt{\pi}

and one obtains

\int_{-\infty}^\infty a e^{- { (x-b)^2 \over 2 c^2 } }\,dx=ac\cdot\sqrt{2\pi}.

This integral is 1 if and only if a = \tfrac{1}{c\sqrt{2\pi}}, and in this case the Gaussian is the probability density function of a normally distributed random variable with expected value μ = b and variance σ² = c²:

 g(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{ -\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2 }.

These Gaussians are plotted in the accompanying figure.

Gaussian functions centered at zero minimize the Fourier uncertainty principle.

The product of two Gaussian functions is a Gaussian, and the convolution of two Gaussian functions is also a Gaussian, with variance being the sum of the original variances: c^2 = c_{1}^2 + c_{2}^2. The product of two Gaussian probability density functions, though, is not in general a Gaussian PDF.

Taking the Fourier transform (unitary, ordinary frequency convention) of a Gaussian function with parameters a = 1, b = 0 and c yields another Gaussian function, with parameters \sqrt{2\pi}ac, b = 0 and \frac{1}{2 \pi c}.[3] So in particular the Gaussian functions with b = 0 and c = \frac{1}{\sqrt{2 \pi}} are kept fixed by the Fourier transform (they are eigenfunctions of the Fourier transform with eigenvalue 1). A physical realization is that of the diffraction pattern: for example, a photographic slide whose transmissivity has a Gaussian variation is also a Gaussian function.

The fact that the Gaussian function is an eigenfunction of the continuous Fourier transform allows us to derive the following interesting identity from the Poisson summation formula:

\sum_{k\in\mathbb{Z}}\exp\left(-\pi\cdot\left(\frac{k}{c}\right)^2\right) = c\cdot\sum_{k\in\mathbb{Z}}\exp(-\pi\cdot(kc)^2).


Integral of a Gaussian function

The integral of an arbitrary Gaussian function is

\int_{-\infty}^{\infty} a\,e^{-\left( x-b \right)^2/c^2}\,dx=a \, \left\vert c \right\vert \, \sqrt{\pi}.

An alternative form is

\int_{-\infty}^{\infty}k\,e^{-f x^2 + g x + h}\,dx=\int_{-\infty}^{\infty}k\,e^{-f \left( x-g/\left( 2f \right)\right)^2 +g^2/\left( 4f \right) + h}\,dx=k\,\sqrt{\frac{\pi}{f}}\,\exp\left(\frac{g^2}{4f} + h\right),

where f must be strictly positive for the integral to converge.

Proof

The integral

\int_{-\infty}^{\infty} ae^{-(x-b)^2/c^2}\,dx

for some real constants a, b, c > 0 can be calculated by putting it into the form of a Gaussian integral. First, the constant a can simply be factored out of the integral. Next, the variable of integration is changed from x to y = x − b:

a\int_{-\infty}^\infty e^{-y^2/c^2}\,dy,

and then to z = y/|c|:

a |c| \int_{-\infty}^\infty e^{-z^2}\,dz.

Then, using the Gaussian integral identity

\int_{-\infty}^\infty e^{-z^2}\,dz = \sqrt{\pi},

we have

\int_{-\infty}^{\infty} ae^{-(x-b)^2/c^2}\,dx=a |c| \sqrt{\pi}.

Two-dimensional Gaussian function

[Figure: Gaussian curve with a 2-dimensional domain]

In two dimensions, the power to which e is raised in the Gaussian function is any negative-definite quadratic form. Consequently, the level sets of the Gaussian will always be ellipses.

A particular example of a two-dimensional Gaussian function is

f(x,y) = A \exp\left(- \left(\frac{(x-x_o)^2}{2\sigma_x^2} + \frac{(y-y_o)^2}{2\sigma_y^2} \right)\right).

Here the coefficient A is the amplitude, (xo, yo) is the center and σx, σy are the x and y spreads of the blob. The figure on the right was created using A = 1, xo = 0, yo = 0, σx = σy = 1.

The volume under the Gaussian function is given by

V = \int_{-\infty}^\infty \int_{-\infty}^\infty f(x,y)\,dx dy=2 \pi A \sigma_x \sigma_y.

In general, a two-dimensional elliptical Gaussian function is expressed as

f(x,y) = A \exp\left(- \left(a(x - x_o)^2 + 2b(x-x_o)(y-y_o) + c(y-y_o)^2 \right)\right)

where the matrix

\left[\begin{matrix} a & b \\ b & c \end{matrix}\right]

is positive-definite.

Using this formulation, the figure on the right can be created using A = 1, (xo, yo) = (0, 0), a = c = 1/2, b = 0.

Meaning of parameters for the general equation

For the general form of the equation the coefficient A is the height of the peak and (xo, yo) is the center of the blob.

If we set

a = \frac{\cos^2\theta}{2\sigma_x^2} + \frac{\sin^2\theta}{2\sigma_y^2}


b = -\frac{\sin2\theta}{4\sigma_x^2} + \frac{\sin2\theta}{4\sigma_y^2}


c = \frac{\sin^2\theta}{2\sigma_x^2} + \frac{\cos^2\theta}{2\sigma_y^2}

then we rotate the blob by a clockwise angle \theta (for counterclockwise rotation invert the signs in the b coefficient). This can be seen in the following examples:

[Figures: example blobs for θ = 0, θ = π/6, and θ = π/3]

Using the following Octave code one can easily see the effect of changing the parameters

A = 1;
x0 = 0; y0 = 0;

sigma_x = 1;
sigma_y = 2;

[X, Y] = meshgrid(-5:.1:5, -5:.1:5);

for theta = 0:pi/100:pi
    a = cos(theta)^2/2/sigma_x^2 + sin(theta)^2/2/sigma_y^2;
    b = -sin(2*theta)/4/sigma_x^2 + sin(2*theta)/4/sigma_y^2;
    c = sin(theta)^2/2/sigma_x^2 + cos(theta)^2/2/sigma_y^2;

    Z = A*exp( - (a*(X-x0).^2 + 2*b*(X-x0).*(Y-y0) + c*(Y-y0).^2));

    % Plot inside the loop so every rotation angle is shown, not just the last one
    surf(X, Y, Z); shading interp; view(-36, 36); drawnow
end

Such functions are often used in image processing and in computational models of visual system function—see the articles on scale space and affine shape adaptation.

Also see multivariate normal distribution.

Multi-dimensional Gaussian function

In an n-dimensional space a Gaussian function can be defined as

f(x) = \exp(-x^TAx) \;,

where x=\{x_1,\dots,x_n\} is a column of n coordinates, A is a positive-definite n\times n matrix, and {}^T denotes transposition.

The integral of this Gaussian function over the whole n-dimensional space is given as

\int_{\mathbb{R}^n} \exp(-x^TAx) \, dx = \sqrt{\frac{\pi^n}{\det{A}}} \;.

It can be easily calculated by diagonalizing the matrix A and changing the integration variables to the eigenvectors of A.

More generally a shifted Gaussian function is defined as

f(x) = \exp(-x^TAx+s^Tx) \;,

where s=\{s_1,\dots,s_n\} is the shift vector and the matrix A can be assumed to be symmetric, A^T=A, and positive-definite. The following integrals with this function can be calculated with the same technique,

\int_{\mathbb{R}^n} e^{-x^T A x+v^Tx} \, dx = \sqrt{\frac{\pi^n}{\det{A}}} \exp(\frac{1}{4}v^T A^{-1}v)\equiv \mathcal{M}\;.
\int_{\mathbb{R}^n} e^{- x^T A x + v^T x}  \left( a^T x \right) \, dx = (a^T u) \cdot\mathcal{M}\;,\; {\rm where}\;u = \frac{1}{2} A^{- 1} v \;.
\int_{\mathbb{R}^n} e^{- x^T A x + v^T x}  \left( x^T D x \right) \, dx = \left( u^T D u +\frac{1}{2} {\rm tr} (D A^{- 1}) \right) \cdot \mathcal{M}\;.
\begin{align}& \int_{\mathbb{R}^n} e^{- x^T A' x + s'^T x} \left( -\frac{\partial}{\partial x} \Lambda \frac{\partial}{\partial x} \right) e^{-x^T A x + s^T x} \, dx = \\& = \left( 2 {\rm tr} (A' \Lambda A B^{- 1}) + 4 u^T A' \Lambda A u - 2 u^T(A' \Lambda s + A \Lambda s') + s'^T \Lambda s \right) \cdot \mathcal{M}\;,\\ & {\rm where} \;u = \frac{1}{2} B^{- 1} v, v = s + s', B = A + A' \;.\end{align}

Gaussian profile estimation

A number of fields such as stellar photometry, Gaussian beam characterization, and emission/absorption line spectroscopy work with sampled Gaussian functions and need to accurately estimate the height, position, and width parameters of the function. These are a, b, and c for a 1D Gaussian function, and A, (x_0,y_0), and (\sigma_x,\sigma_y) for a 2D Gaussian function. The most common method for estimating the profile parameters is to take the logarithm of the data and fit a parabola to the resulting data set.[4] While this provides a simple least squares fitting procedure, the resulting algorithm is biased by excessively weighting small data values, and this can produce large errors in the profile estimate. One can partially compensate for this through weighted least squares estimation, in which the small data values are given small weights, but this too can be biased by allowing the tail of the Gaussian to dominate the fit. In order to remove the bias, one can instead use an iterative procedure in which the weights are updated at each iteration (see Iteratively reweighted least squares).[4]
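A minimal Python sketch of the basic log-parabola fit described above (unweighted, so it carries the bias just discussed; all names and data are illustrative):

import numpy as np

# ln f(x) = ln a - (x-b)^2/(2 c^2) is a parabola in x, so fitting a
# quadratic to ln(y) recovers the 1D profile parameters a, b, c.
a_true, b_true, c_true = 2.0, 1.0, 0.5
x = np.linspace(-1.0, 3.0, 81)
y = a_true * np.exp(-(x - b_true)**2 / (2 * c_true**2)) + 1e-9  # keep y > 0 for the log

p2, p1, p0 = np.polyfit(x, np.log(y), 2)        # ln y ≈ p2*x^2 + p1*x + p0
c_est = np.sqrt(-1.0 / (2.0 * p2))              # p2 = -1/(2 c^2)
b_est = p1 * c_est**2                           # p1 = b / c^2
a_est = np.exp(p0 + b_est**2 / (2 * c_est**2))  # p0 = ln a - b^2/(2 c^2)
print(a_est, b_est, c_est)                      # ≈ (2.0, 1.0, 0.5)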

Once one has an algorithm for estimating the Gaussian function parameters, it is also important to know how accurate those estimates are. While an estimation algorithm can provide numerical estimates for the variance of each parameter (i.e. the variance of the estimated height, position, and width of the function), one can use Cramér–Rao bound theory to obtain an analytical expression for the lower bound on the parameter variances, given some assumptions about the data.[5][6]

  1. The noise in the measured profile is either i.i.d. Gaussian, or the noise is Poisson-distributed.
  2. The spacing between each sampling (i.e. the distance between pixels measuring the data) is uniform.
  3. The peak is "well-sampled", so that less than 10% of the area or volume under the peak (area if a 1D Gaussian, volume if a 2D Gaussian) lies outside the measurement region.
  4. The width of the peak is much larger than the distance between sample locations (i.e. the detector pixels must be at least 5 times smaller than the Gaussian FWHM).

When these assumptions are satisfied, the following covariance matrix K applies for the 1D profile parameters a, b, and c under i.i.d. Gaussian noise and under Poisson noise:[5]

 \mathbf{K}_{\text{Gauss}} = \frac{\sigma^2}{\sqrt{\pi} \delta_x Q^2} \begin{pmatrix} \frac{3}{2c} &0 &\frac{-1}{a} \\ 0 &\frac{2c}{a^2} &0 \\ \frac{-1}{a} &0 &\frac{2c}{a^2} \end{pmatrix} \ , \qquad \mathbf{K}_{\text{Poiss}} = \frac{1}{\sqrt{2 \pi}} \begin{pmatrix} \frac{3a}{2c} &0 &-\frac{1}{2} \\ 0 &\frac{c}{a} &0 \\ -\frac{1}{2} &0 &\frac{c}{2a} \end{pmatrix} \ ,

where \delta_x is the width of the pixels used to sample the function, Q is the quantum efficiency of the detector, and \sigma indicates the standard deviation of the measurement noise. Thus, the individual variances for the parameters are, in the Gaussian noise case,

\begin{align} \text{var} (a) &= \frac{3 \sigma^2}{2 \sqrt{\pi} \, \delta_x Q^2 c} \\ \text{var} (b) &= \frac{2 \sigma^2 c}{\delta_x \sqrt{\pi} \, Q^2 a^2} \\ \text{var} (c) &= \frac{2 \sigma^2 c}{\delta_x \sqrt{\pi} \, Q^2 a^2} \end{align}

and in the Poisson noise case,

\begin{align} \text{var} (a) &= \frac{3a}{2 \sqrt{2 \pi} \, c} \\ \text{var} (b) &= \frac{c}{\sqrt{2 \pi} \, a} \\ \text{var} (c) &= \frac{c}{2 \sqrt{2 \pi} \, a}. \end{align}

For the 2D profile parameters giving the amplitude A, position (x_0,y_0), and width (\sigma_x,\sigma_y) of the profile, the following covariance matrices apply:[6]

 \mathbf{K}_{\text{Gauss}} = \frac{\sigma^2}{\pi \delta_x \delta_y Q^2} \begin{pmatrix} \frac{2}{\sigma_x \sigma_y} &0 &0 &\frac{-1}{A \sigma_y} &\frac{-1}{A \sigma_x} \\ 0      &\frac{2 \sigma_x}{A^2 \sigma_y} &0 &0 &0 \\ 0 &0 &\frac{2 \sigma_y}{A^2 \sigma_x} &0 &0 \\ \frac{-1}{A \sigma_y} &0 &0 &\frac{2 \sigma_x}{A^2 \sigma_y} &0 \\      \frac{-1}{A \sigma_x} &0 &0 &0 &\frac{2 \sigma_y}{A^2 \sigma_x} \end{pmatrix} \ ,
 \qquad \mathbf{K}_{\text{Poiss}} = \frac{1}{2 \pi} \begin{pmatrix} \frac{3A}{\sigma_x \sigma_y} &0 &0 &\frac{-1}{\sigma_y} &\frac{-1}{\sigma_x} \\ 0      &\frac{\sigma_x}{A \sigma_y} &0 &0 &0 \\ 0 &0 &\frac{\sigma_y}{A \sigma_x} &0 &0 \\ \frac{-1}{\sigma_y} &0 &0 &\frac{2 \sigma_x}{3A \sigma_y} &\frac{1}{3A} \\      \frac{-1}{\sigma_x} &0 &0 &\frac{1}{3A} &\frac{2 \sigma_y}{3A \sigma_x} \end{pmatrix} \ .

where the individual parameter variances are given by the diagonal elements of the covariance matrix.

Discrete Gaussian

[Figure: the discrete Gaussian kernel (black, dashed) compared with the sampled Gaussian kernel (red, solid) for scales t = 0.5, 1, 2, 4]

One may ask for a discrete analog to the Gaussian; this is necessary in discrete applications, particularly digital signal processing. A simple answer is to sample the continuous Gaussian, yielding the sampled Gaussian kernel. However, this discrete function does not have the discrete analogs of the properties of the continuous function, and can lead to undesired effects, as described in the article scale space implementation.

An alternative approach is to use the discrete Gaussian kernel:[7]

T(n, t) = e^{-t} I_n(t)\,

where I_n(t) denotes the modified Bessel functions of integer order.

This is the discrete analog of the continuous Gaussian in that it is the solution to the discrete diffusion equation (discrete space, continuous time), just as the continuous Gaussian is the solution to the continuous diffusion equation.[8]
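A minimal Python sketch of T(n, t) using SciPy's exponentially scaled Bessel function (the function name and values are illustrative):

import numpy as np
from scipy.special import ive  # ive(n, t) = iv(n, t) * exp(-|t|) = exp(-t) * I_n(t) for t > 0

def discrete_gaussian_kernel(t, radius):
    # T(n, t) = exp(-t) * I_n(t), evaluated for n in [-radius, radius]
    n = np.arange(-radius, radius + 1)
    return ive(n, t)

k = discrete_gaussian_kernel(1.0, 4)
print(k, k.sum())  # the full (infinite) kernel sums to 1; a truncated one comes close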


13. A Random Number Algorithm

float scale = 0.5;
float magic = 3571.0;
// Seed from the pixel position, then scramble twice with frac(dot(...))
float2 random = (1.0 / 4320.0) * position + float2(0.25, 0.0);
random = frac(dot(random * random, magic));
random = frac(dot(random * random, magic));
// Remap from [0, 1) to [-scale, scale)
return -scale + 2.0 * scale * random;


14. A Checkerboard Algorithm

In Unity:

// Basic checkerboard: positionMod.x/y alternate between 0 and 1 per cell
float scale = 0.25;
float2 positionMod = float2(uint2(i.uv_MainTex * 10) & 1);
return (-scale + 2.0 * scale * positionMod.x) *
       (-1.0 + 2.0 * positionMod.y);

// Variant with a per-frame offset; the _frameCountMod parameter controls the checkerboard
float scale = 0.25;
float2 positionMod = float2(uint2(i.uv_MainTex * 10) & 1);
return (-scale + 2.0 * scale * positionMod.x) *
       (-1.0 + 2.0 * positionMod.y) +
       0.5 * scale * (-1.0 + 2.0 * _frameCountMod);

Another variant, keyed off the screen-space position instead of UVs:

float scale = 0.25;
float2 positionMod = float2(uint2(sv_position) & 1);
return (-scale + 2.0 * scale * positionMod.x) *
       (-1.0 + 2.0 * positionMod.y);



15. kNN (K-Nearest Neighbor) Classification

http://blog.csdn.net/xlm289348/article/details/8876353

The kNN rule is mainly used for recognizing unknown objects, i.e. deciding which class an unknown object belongs to. The idea is to judge, based on Euclidean distance, which class of known objects the unknown object's features are closest to.

The k-nearest-neighbor (kNN) classification algorithm is a theoretically mature method and one of the simplest machine learning algorithms. The idea: if the majority of the k most similar samples (i.e. the nearest neighbors in feature space) of a sample belong to some class, then the sample belongs to that class as well. In kNN, the selected neighbors are all objects that have already been correctly classified. The method decides the class of the sample to be classified based only on the class of the nearest sample or samples. Although kNN also relies on the limit theorem in principle, the class decision involves only a small number of neighboring samples. Since kNN depends mainly on the limited set of nearby samples, rather than on discriminating class regions, it is better suited than other methods for sample sets whose class regions overlap or cross heavily.

kNN can be used not only for classification but also for regression: find the k nearest neighbors of a sample and assign it the average of those neighbors' attribute values. A more useful variant weights each neighbor's contribution by its distance to the sample, e.g. weights inversely proportional to distance, so that nearer neighbors contribute more.

One main shortcoming for classification is class imbalance: when one class has a very large sample size and the others are small, the K neighbors of a new input sample may be dominated by the large class, even though the algorithm only considers the "nearest" neighbors; raw counts alone do not guarantee a sensible result. Weighting neighbors by distance (smaller distance, larger weight) is one improvement. Another drawback is the computational cost: for every sample to be classified, the distances to all known samples must be computed to find its K nearest neighbors. A common remedy is to prune the known sample set in advance, removing samples that contribute little to classification. The algorithm is better suited to automatic classification of classes with large sample sizes; classes with small sample sizes are more easily misclassified with this approach.

K-NN is arguably the most direct method for classifying unknown data; the following description makes clear what it does.

In short: you have a set of data whose classes are already known. When a new data point arrives, compute its distance to every point in the training data, take the K training points closest to it, look at which classes those K points belong to, and assign the new point to the majority class.

 

Algorithm steps (a Python sketch follows the list):

Step 1 --- Initialize the distance to the maximum value.

Step 2 --- Compute the distance dist between the unknown sample and each training sample.

Step 3 --- Find maxdist, the largest distance among the current K nearest samples.

Step 4 --- If dist is smaller than maxdist, add this training sample to the K-nearest-neighbor set.

Step 5 --- Repeat steps 2, 3, 4 until the distances from the unknown sample to all training samples have been computed.

Step 6 --- Count the occurrences of each class label among the K nearest samples.

Step 7 --- Pick the most frequent class label as the label of the unknown sample.
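A minimal Python sketch of the steps above (brute force, majority vote; names and data are illustrative):

import math
from collections import Counter

def knn_classify(unknown, samples, labels, k):
    # Steps 1-5: distance from the unknown sample to every training sample
    dists = [(math.dist(unknown, s), lbl) for s, lbl in zip(samples, labels)]
    nearest = sorted(dists)[:k]            # the K nearest samples
    # Steps 6-7: majority vote over the K nearest labels
    votes = Counter(lbl for _, lbl in nearest)
    return votes.most_common(1)[0][0]

samples = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels  = ["A", "A", "A", "B", "B", "B"]
print(knn_classify((4.5, 5.0), samples, labels, k=3))  # "B"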


16. Koch Curve

The Koch curve is a snowflake-like geometric curve, hence also called the snowflake curve; it is a special case of the de Rham curve.

It first appeared in Helge von Koch's 1904 paper "On a Continuous Curve Without Tangents, Constructible from Elementary Geometry" (French original: Sur une courbe continue sans tangente, obtenue par une construction géométrique élémentaire).

Given a segment AB, the Koch curve can be generated by the following steps:
1. Divide the segment into three equal parts (AC, CD, DB).
2. Using CD as the base, draw an equilateral triangle CMD pointing outward (inward works too).
3. Remove the segment CD.
4. Repeat steps 1-3 on each of AC, CM, MD, DB.

The Koch snowflake is formed from Koch curves constructed on the three sides of an equilateral triangle. The area of the Koch snowflake is 2√3·s²/5, where s is the side length of the original triangle. Each Koch curve has infinite length; it is continuous everywhere but differentiable nowhere.
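A minimal Python sketch of one subdivision step on a polyline (names are illustrative):

import numpy as np

def koch_step(points):
    # Replace every segment PQ with the four Koch segments P-C, C-M, M-D, D-Q
    out = [np.asarray(points[0], float)]
    for p, q in zip(points[:-1], points[1:]):
        p, q = np.asarray(p, float), np.asarray(q, float)
        c = p + (q - p) / 3           # first trisection point
        d = p + 2 * (q - p) / 3       # second trisection point
        ang = np.pi / 3               # 60° rotation gives the triangle apex M
        rot = np.array([[np.cos(ang), -np.sin(ang)],
                        [np.sin(ang),  np.cos(ang)]])
        m = c + rot @ (d - c)
        out += [c, m, d, q]
    return out

curve = [(0.0, 0.0), (1.0, 0.0)]
for _ in range(4):
    curve = koch_step(curve)
print(len(curve))  # 4^4 + 1 = 257 points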



17. Error Function

In mathematics, the error function (also called the Gauss error function) is a non-elementary function (i.e., not an elementary function) that is widely used in probability theory, statistics, partial differential equations, and semiconductor physics.

Definition

The error function of the variable x is defined as:

\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt

and it satisfies erf(∞) = 1 and erf(−x) = −erf(x).
The complementary error function erfc(x) is defined as:

\mathrm{erfc}(x) = 1 - \mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_x^\infty e^{-t^2}\,dt

Derivative and integral

The derivative of the error function is:

\frac{d}{dx}\,\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} e^{-x^2}

The iterated integrals of the complementary error function are defined by:

i^n \mathrm{erfc}(z) = \int_z^\infty i^{n-1} \mathrm{erfc}(\zeta)\,d\zeta

Series expansion

The series expansion of the error function is:

\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{n!\,(2n+1)}
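A quick Python check of the series against the standard library (the term count is chosen arbitrarily):

import math

def erf_series(x, terms=30):
    # erf(x) = 2/sqrt(pi) * sum_{n>=0} (-1)^n x^(2n+1) / (n! (2n+1))
    return 2.0 / math.sqrt(math.pi) * sum(
        (-1) ** n * x ** (2 * n + 1) / (math.factorial(n) * (2 * n + 1))
        for n in range(terms))

print(math.erf(1.0), erf_series(1.0))  # both ≈ 0.8427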





 


