Data Mining Notes

Chapter Two

Data dispersion characteristics
Center

Mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i$ (sample mean), $\mu = \frac{\sum x}{N}$ (population mean)
Weighted mean: $\bar{x} = \frac{\sum_{i=1}^n w_i x_i}{\sum_{i=1}^n w_i}$

Median (for grouped data): $median = L_1 + \left(\frac{n/2 - (\sum freq)_l}{freq_{median}}\right) \times width$

Mode (empirical relation): $mean - mode = 3 \times (mean - median)$
mean > median: positively skewed
mean < median: negatively skewed

Quartiles: $Q_1$ (25th percentile), $Q_3$ (75th percentile)
Inter-quartile range: $IQR = Q_3 - Q_1$
Five-number summary: min, $Q_1$, median, $Q_3$, max
Boxplot: the ends of the box are the quartiles; the median is marked; whiskers extend to the smallest and largest non-outlier values, and outliers are plotted individually
Outlier: usually, a value more than $1.5 \times IQR$ above $Q_3$ or below $Q_1$

Variance:
unbiased (sample) estimate: $s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 = \frac{1}{n-1}\left[\sum_{i=1}^n x_i^2 - \frac{1}{n}\left(\sum_{i=1}^n x_i\right)^2\right]$
biased (population) estimate: $\sigma^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \mu)^2 = \frac{1}{n} \sum_{i=1}^n x_i^2 - \mu^2$
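
A minimal sketch (NumPy assumed; the sample values are invented) of these dispersion measures, including the IQR-based outlier rule and the biased vs. unbiased variance:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 30.0])  # toy data; 30 is an outlier

mean, median = x.mean(), np.median(x)
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1

# Outlier rule: more than 1.5 * IQR above Q3 or below Q1
outliers = x[(x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)]

var_biased = x.var(ddof=0)    # divide by n     (population / biased)
var_unbiased = x.var(ddof=1)  # divide by n - 1 (sample / unbiased)

print(mean, median, iqr, outliers, var_biased, var_unbiased)
```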

Pixel-Oriented Visualization Techniques
Similarity and Dissimilarity
Contingency table for binary attributes (rows: object $i$, columns: object $j$):

|     | 1     | 0     | sum   |
|-----|-------|-------|-------|
| 1   | q     | r     | q + r |
| 0   | s     | t     | s + t |
| sum | q + s | r + t | p     |

Distance measure for symmetric binary variables: $d(i, j) = \frac{r + s}{q + r + s + t}$
Distance measure for asymmetric binary variables: $d(i, j) = \frac{r + s}{q + r + s}$

Here, "asymmetric" means the two states are not equally important (their costs differ); in some data sets one outcome is the absolute majority, so the number of negative matches $t$ is dropped from the denominator.

Jaccard coefficient (similarity measure for asymmetric binary variables): $sim_{Jaccard}(i, j) = \frac{q}{q + r + s}$
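
A small sketch (assuming two 0/1 vectors as NumPy arrays; the vectors are made up) that counts $q, r, s, t$ and evaluates the three measures above:

```python
import numpy as np

def binary_dissimilarity(i, j):
    """Return (symmetric distance, asymmetric distance, Jaccard similarity)
    for two 0/1 vectors, using the q, r, s, t counts."""
    i, j = np.asarray(i), np.asarray(j)
    q = np.sum((i == 1) & (j == 1))  # both 1
    r = np.sum((i == 1) & (j == 0))  # 1 in i, 0 in j
    s = np.sum((i == 0) & (j == 1))  # 0 in i, 1 in j
    t = np.sum((i == 0) & (j == 0))  # both 0
    d_sym = (r + s) / (q + r + s + t)
    d_asym = (r + s) / (q + r + s)   # negative matches t ignored
    jaccard = q / (q + r + s)
    return d_sym, d_asym, jaccard

print(binary_dissimilarity([1, 0, 1, 1, 0], [1, 1, 0, 1, 0]))
```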

Minkowski distance ($L_h$ norm): $d(i, j) = \sqrt[h]{|x_{i1} - x_{j1}|^h + |x_{i2} - x_{j2}|^h + \cdots + |x_{ip} - x_{jp}|^h}$

Properties:

  1. $d(i, j) > 0$ if $i \neq j$, and $d(i, i) = 0$ (positive definiteness)
  2. $d(i, j) = d(j, i)$ (symmetry)
  3. $d(i, j) \leqslant d(i, k) + d(k, j)$ (triangle inequality)

A distance that satisfies these properties is a metric.

$h = 1$: Manhattan distance $d(i, j) = |x_{i1} - x_{j1}| + |x_{i2} - x_{j2}| + \cdots + |x_{ip} - x_{jp}|$
$h = 2$: Euclidean distance $d(i, j) = \sqrt{|x_{i1} - x_{j1}|^2 + |x_{i2} - x_{j2}|^2 + \cdots + |x_{ip} - x_{jp}|^2}$
$h \rightarrow \infty$: supremum distance $d(i, j) = \lim_{h \rightarrow \infty} \left(\sum_{f=1}^p |x_{if} - x_{jf}|^h\right)^{\frac{1}{h}} = \max_{f} |x_{if} - x_{jf}|$
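
A quick sketch (NumPy assumed; the vectors are arbitrary examples) of the three special cases of the Minkowski distance:

```python
import numpy as np

def minkowski(xi, xj, h):
    """L_h (Minkowski) distance; h = np.inf gives the supremum distance."""
    diff = np.abs(np.asarray(xi, float) - np.asarray(xj, float))
    if np.isinf(h):
        return diff.max()
    return (diff ** h).sum() ** (1.0 / h)

a, b = [1.0, 2.0, 3.0], [4.0, 0.0, 3.0]
print(minkowski(a, b, 1))       # Manhattan: 3 + 2 + 0 = 5
print(minkowski(a, b, 2))       # Euclidean: sqrt(9 + 4) ~ 3.606
print(minkowski(a, b, np.inf))  # supremum: max(3, 2, 0) = 3
```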

Ordinal variables: map the rank $r_{if} \in \{1, \dots, M_f\}$ to $z_{if} = \frac{r_{if} - 1}{M_f - 1}$

Dissimilarity for attributes of mixed types: $d(i, j) = \frac{\sum_{f=1}^p \delta_{ij}^{(f)} d_{ij}^{(f)}}{\sum_{f=1}^p \delta_{ij}^{(f)}}$

Cosine similarity $\cos(d_1, d_2) = \frac{d_1 \cdot d_2}{\|d_1\| \cdot \|d_2\|}$ is used to evaluate the similarity of sentences (document vectors).
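
A minimal sketch (NumPy assumed; the term-count vectors are invented) of the cosine similarity between two document vectors:

```python
import numpy as np

def cosine_similarity(d1, d2):
    d1, d2 = np.asarray(d1, float), np.asarray(d2, float)
    return d1 @ d2 / (np.linalg.norm(d1) * np.linalg.norm(d2))

# term-frequency vectors of two short "sentences"
print(cosine_similarity([3, 2, 0, 5], [1, 0, 0, 2]))
```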

Chapter Three

Data Preprocessing

Data cleaning, data integration, data reduction, data transformation and data discretization.

$\chi^2$ (chi-square) test

$\chi^2 = \sum \frac{(Observed - Expected)^2}{Expected}$
The larger the $\chi^2$ value, the more likely the variables are related.
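
A tiny worked sketch (the 2×2 counts are invented) of the $\chi^2$ statistic for a contingency table; the expected counts come from the row and column totals:

```python
import numpy as np

observed = np.array([[250, 200],   # toy 2x2 contingency counts
                     [ 50, 1000]])

row = observed.sum(axis=1, keepdims=True)
col = observed.sum(axis=0, keepdims=True)
expected = row @ col / observed.sum()

chi2 = ((observed - expected) ** 2 / expected).sum()
print(chi2)  # larger chi2 => stronger evidence the two attributes are related
```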

Correlation coefficient (Pearson's product-moment coefficient)

$r_{A,B} = \frac{\sum_{i=1}^n (a_i - \bar{A})(b_i - \bar{B})}{(n-1)\,\sigma_A \sigma_B} = \frac{\sum_{i=1}^n a_i b_i - n\bar{A}\bar{B}}{(n-1)\,\sigma_A \sigma_B}$

$r_{A,B} > 0$ means A and B are positively correlated.

Let ${a_k}' = (a_k - mean(A)) / std(A)$ and ${b_k}' = (b_k - mean(B)) / std(B)$;
then $correlation(A, B) = A' \cdot B'$.

Covariance

$Cov(A, B) = E\big((A - \bar{A})(B - \bar{B})\big) = \frac{\sum_{i=1}^n (a_i - \bar{A})(b_i - \bar{B})}{n}$
$r_{A, B} = \frac{Cov(A, B)}{\sigma_A \sigma_B}$
$Cov(A, B) = E\big((A - \bar{A})(B - \bar{B})\big) = E(A \cdot B) - \bar{A}\bar{B}$
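
A short sketch (NumPy; toy columns A and B) that checks the covariance and correlation identities above:

```python
import numpy as np

A = np.array([6.0, 5.0, 4.0, 3.0, 2.0])
B = np.array([20.0, 10.0, 14.0, 5.0, 5.0])

cov = ((A - A.mean()) * (B - B.mean())).mean()         # Cov(A, B)
assert np.isclose(cov, (A * B).mean() - A.mean() * B.mean())  # = E(AB) - mean(A)mean(B)

r = cov / (A.std() * B.std())                          # Pearson correlation
print(cov, r, np.corrcoef(A, B)[0, 1])                 # r matches np.corrcoef
```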

Data reduction

Unsupervised:

  1. Latent Semantic Indexing (LSI): truncated SVD
  2. Principal Component Analysis (PCA)
  3. Independent Component Analysis (ICA)
  4. Canonical Correlation Analysis (CCA)

Supervised:

  1. Linear Discriminant Analysis (LDA)

Semi-supervised:

  1. Semi-supervised Discriminant Analysis (SDA)

Linear:

  1. Latent Semantic Indexing (LSI): truncated SVD
  2. Principal Component Analysis (PCA)
  3. Linear Discriminant Analysis (LDA)
  4. Canonical Correlation Analysis (CCA)

Nonlinear:

  1. Nonlinear feature reduction using kernels
  2. Manifold learning

Dimensionality reduction (Feature reduction):

  1. Feature extraction
  2. Feature selection

Selection: choose the best subset of size d from the available p features.
Extraction: given p features (set X), extract d new features (set Z) by linear or non-linear combinations of all p features.

PCA

Given $\{x_1, \dots, x_n\} \subset \mathbb{R}^p$, the target is to find the direction $a$ that maximizes $var(z)$, where $z = a^T x$.

$$
\begin{aligned}
var(z) &= E\big((z - \bar{z})^2\big) \\
&= \frac{1}{n} \sum_{i=1}^n (a^T x_i - a^T \bar{x})^2 \\
&= \frac{1}{n} \sum_{i=1}^n a^T (x_i - \bar{x})(x_i - \bar{x})^T a \\
&= a^T S a, \qquad S = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x})^T
\end{aligned}
$$

which means $\max_a a^T S a$, s.t. $a^T a = 1$.
We use the Lagrange multiplier method to solve this problem.

$$
\begin{aligned}
L &= a^T S a - \lambda (a^T a - 1) \\
\frac{\partial L}{\partial a} &= 2Sa - 2\lambda a = 0
\end{aligned}
$$

So $\lambda$ and $a$ are an eigenvalue–eigenvector pair of $S$. Then $var(z) = a^T S a = \lambda a^T a = \lambda$, so the eigenvalues are taken in decreasing order (the largest eigenvalue gives the first principal component).

Next, if we want a second principal component, solve $\max_{a_2} a_2^T S a_2$, s.t. $a_2^T a_2 = 1$ and $cov(z^{(2)}, z^{(1)}) = 0$.
Since $cov(z^{(2)}, z^{(1)}) = a_2^T S a_1 = \lambda_1 a_2^T a_1$, the constraint forces $a_2 \perp a_1$; maximizing under these constraints again gives $S a_2 = \lambda a_2$, where $\lambda$ is now the second-largest eigenvalue.
Dimension reduction: $X \in \mathbb{R}^{p \times n} \rightarrow A^T X \in \mathbb{R}^{d \times n}$
Reconstruction of the original data: $A^T X \in \mathbb{R}^{d \times n} \rightarrow \bar{X} = A(A^T X) \in \mathbb{R}^{p \times n}$

Main theoretical result:
The matrix $A$ consisting of the first $d$ eigenvectors of the covariance matrix $S$ solves the following optimization problem:
$$\min_{A \in \mathbb{R}^{p \times d}} \|X - AA^T X\|_F^2, \quad s.t.\ A^T A = I_d$$
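
A compact sketch (NumPy; random data, with `d` chosen arbitrarily) of PCA by eigendecomposition of the covariance matrix, using the same "columns are samples" convention as the notes, including the reconstruction $A A^T X$:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, d = 5, 200, 2
X = rng.normal(size=(p, n))                  # columns are samples

Xc = X - X.mean(axis=1, keepdims=True)       # center each feature
S = Xc @ Xc.T / n                            # covariance matrix (p x p)

eigvals, eigvecs = np.linalg.eigh(S)         # ascending eigenvalues
order = np.argsort(eigvals)[::-1]
A = eigvecs[:, order[:d]]                    # first d eigenvectors (p x d)

Z = A.T @ Xc                                 # reduced representation (d x n)
X_hat = A @ Z                                # reconstruction A A^T Xc

# The squared Frobenius reconstruction error equals n times the sum of the discarded eigenvalues
print(np.linalg.norm(Xc - X_hat, 'fro') ** 2, n * eigvals[order[d:]].sum())
```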

LDA(Linear Discriminant Analysis)

Find a transformation $a$ such that $a^T X_1$ and $a^T X_2$ are maximally separated and each class is minimally dispersed (maximum separation, minimum within-class scatter).

$\max\ (a^T(\bar{x}_1 - \bar{x}_2))^2$, $\min\ var(z_1)$, $\min\ var(z_2)$
target: $\max\ J = \frac{(a^T(\bar{x}_1 - \bar{x}_2))^2}{var(z_1) + var(z_2)}$

Suppose there are two classes $w_1, w_2$.
$z = a^T x$
$\tilde{\mu}_i = \frac{1}{n_i} \sum_{z \in w_i} z$
$\mu_i = \frac{1}{n_i} \sum_{x \in w_i} x, \quad \tilde{\mu}_i = a^T \mu_i$
$|\tilde{\mu}_1 - \tilde{\mu}_2| = |a^T(\mu_1 - \mu_2)|$
$\tilde{s}_i^2 = \sum_{z \in w_i} (z - \tilde{\mu}_i)^2$
$J(a) = \frac{(\tilde{\mu}_1 - \tilde{\mu}_2)^2}{\tilde{s}_1^2 + \tilde{s}_2^2}$

$\tilde{s}_i^2 = \sum_{z \in w_i} (z - \tilde{\mu}_i)^2 = \sum_{x \in w_i} (a^T x - a^T \mu_i)^2 = \sum_{x \in w_i} (a^T x - a^T \mu_i)(a^T x - a^T \mu_i)^T = \sum_{x \in w_i} a^T (x - \mu_i)(x - \mu_i)^T a = a^T S_i a$

within-class scatter matrix: $S_W = S_1 + S_2$, so $\tilde{s}_1^2 + \tilde{s}_2^2 = a^T S_W a$

$(\tilde{\mu}_1 - \tilde{\mu}_2)^2 = (a^T\mu_1 - a^T\mu_2)^2 = a^T(\mu_1 - \mu_2)(\mu_1 - \mu_2)^T a = a^T S_B a$

between-class scatter matrix: $S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T$

$J(a) = \frac{a^T S_B a}{a^T S_W a}$
Maximizing $J$ leads to the generalized eigenvalue problem $S_B a = \lambda S_W a$,
i.e. $S_W^{-1} S_B a = \lambda a$.
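
A brief sketch (NumPy; two synthetic Gaussian classes) of the two-class Fisher LDA direction. For two classes the solution is proportional to $S_W^{-1}(\mu_1 - \mu_2)$, which maximizes $J(a)$:

```python
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal([0, 0], 1.0, size=(100, 2))   # class w1
X2 = rng.normal([3, 2], 1.0, size=(100, 2))   # class w2

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
S1 = (X1 - mu1).T @ (X1 - mu1)                # per-class scatter matrices
S2 = (X2 - mu2).T @ (X2 - mu2)
SW = S1 + S2                                  # within-class scatter
SB = np.outer(mu1 - mu2, mu1 - mu2)           # between-class scatter

a = np.linalg.solve(SW, mu1 - mu2)            # optimal direction a = SW^{-1}(mu1 - mu2)
J = (a @ SB @ a) / (a @ SW @ a)               # Fisher criterion J(a)
print(a, J)
```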

Chapter Four

FP mining

itemset: A set of one or more items
k-itemset: $X = \{x_1, \dots, x_k\}$
(absolute) support, or support count, of X: the frequency or number of occurrences of itemset $X$;
(relative) support, s: the fraction of transactions that contain $X$ (i.e., the probability that a transaction contains $X$).
An itemset $X$ is frequent if $X$'s support is no less than a minsup threshold.

Find all the rules $X \rightarrow Y$ with minimum support and confidence:
support, s: probability that a transaction contains $X \cup Y$;
confidence, c: conditional probability that a transaction containing $X$ also contains $Y$.

closed patterns and max-patterns
An itemset $X$ is closed if $X$ is frequent and there exists no super-pattern $Y \supset X$ with the same support as $X$;
An itemset $X$ is a max-pattern if $X$ is frequent and there exists no frequent super-pattern $Y \supset X$.

So every max-pattern is also a closed pattern: any proper super-pattern of a max-pattern is infrequent, so it cannot have the same support.

Apriori

An important property: **any subset of a frequent itemset must be frequent**.
Apriori pruning principle: if an itemset is infrequent, its supersets should not be generated/tested!

Method:

  1. Initially, scan DB once to get frequent 1-itemset;
  2. Generate length (k+1) candidate itemsets from length k frequent itemsets;
  3. Test the candidates against DB;
  4. Terminate when no frequent or candidate set can be generated;


Pseudo-code:
C_k: candidate itemsets of size k
L_k: frequent itemsets of size k

L_1 = {frequent 1-itemsets};
for (k = 1; L_k != ∅; k++) do begin
    C_{k+1} = candidates generated from L_k;
    for each transaction t in database do
        increment the count of all candidates in C_{k+1} that are contained in t;
    L_{k+1} = candidates in C_{k+1} with support >= min_support;
end;
return ∪_k L_k;
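
A runnable sketch of the same loop in Python (assuming transactions are given as sets; the data and `min_support` value are made up), generating candidate $(k+1)$-itemsets from the frequent $k$-itemsets and pruning by the Apriori principle:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    items = {i for t in transactions for i in t}
    # L1: frequent 1-itemsets
    Lk = {frozenset([i]) for i in items
          if sum(i in t for t in transactions) >= min_support}
    frequent = {s: sum(s <= t for t in transactions) for s in Lk}
    k = 1
    while Lk:
        # candidate generation: join Lk with itself, keep (k+1)-sets whose
        # k-subsets are all frequent (Apriori pruning principle)
        Ck1 = {a | b for a in Lk for b in Lk if len(a | b) == k + 1}
        Ck1 = {c for c in Ck1
               if all(frozenset(s) in Lk for s in combinations(c, k))}
        # support counting against the database
        counts = {c: sum(c <= t for t in transactions) for c in Ck1}
        Lk = {c for c, n in counts.items() if n >= min_support}
        frequent.update({c: counts[c] for c in Lk})
        k += 1
    return frequent

transactions = [{'A', 'B', 'C'}, {'A', 'C'}, {'A', 'D'}, {'B', 'E', 'F'}, {'A', 'B', 'C'}]
print(apriori(transactions, min_support=2))
```

This brute-force candidate join is only a teaching sketch; it still illustrates the multiple database scans and large candidate sets listed as the major computational challenges below.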

Major computational challenges:

  1. Multiple scans of transaction database
  2. Huge number of candidates
  3. Tedious workload of support counting for candidates

Improving Apriori: general ideas:

  1. Reduce passes of transaction database scans
  2. Shrink number of candidates
  3. Facilitate support counting of candidates

FP-growth

A linked reference [step 3, page 28] states: "Recursively mine conditional FP-trees and grow the frequent patterns obtained so far. If the conditional FP-tree contains a single path, simply enumerate all the patterns."

Mining sequential patterns

sequential patterns:

GSP

Chapter Five

Decision Tree
Bayes Classification Methods
Support Vector Machines

Decision Tree

It can be motivated from a probabilistic viewpoint: we can compute the probability of each output for a given input. If we assume the attributes are conditionally independent, then $P(X|C) = \prod_i P(X_i|C)$, so $\log P(X|C) = \sum_i \log P(X_i|C)$; this is why a log-based cost function is natural. It can also be understood via the thermodynamic concept of entropy.

$H(Y) = -\sum_{i=1}^m p_i \log(p_i)$, where $p_i = P(Y = y_i)$
$H(Y|X) = \sum_x p(x) H(Y|X = x)$

$Info(D) = -\sum_{i=1}^m p_i \log_2(p_i)$
$Info_A(D) = \sum_{j=1}^v \frac{|D_j|}{|D|} \times Info(D_j)$
$Gain(A) = Info(D) - Info_A(D)$
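
A small sketch (NumPy; the labels and the attribute split are toy data) computing $Info(D)$, $Info_A(D)$ and $Gain(A)$:

```python
import numpy as np

def entropy(labels):
    """Info(D) = -sum p_i log2 p_i over the class distribution of labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def info_gain(labels, attribute):
    """Gain(A) = Info(D) - sum_j |D_j|/|D| * Info(D_j), splitting on attribute A."""
    labels, attribute = np.asarray(labels), np.asarray(attribute)
    info_D = entropy(labels)
    info_A = sum((attribute == v).mean() * entropy(labels[attribute == v])
                 for v in np.unique(attribute))
    return info_D - info_A

y   = ['yes', 'yes', 'no', 'no', 'yes', 'no']          # class labels
age = ['young', 'young', 'old', 'old', 'old', 'young']  # candidate splitting attribute
print(info_gain(y, age))
```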

Bayes Classification Methods

First, by the law of total probability, $P(B) = \sum_{i=1}^M P(B|A_i)P(A_i)$, and by Bayes' theorem, $P(H|X) = \frac{P(X|H)P(H)}{P(X)}$.
Assuming all attributes are conditionally independent given the class, $P(X|C_i) = \prod_{k=1}^n P(x_k|C_i)$.

Naïve Bayesian prediction requires each conditional probability to be non-zero; otherwise the predicted probability will be zero.
Use the Laplacian correction (see the sketch after this list):

  1. Add 1 to each count
  2. The "corrected" probability estimates are close to their "uncorrected" counterparts
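
A minimal sketch (categorical features only; the training table and query are invented) of naïve Bayes with the Laplacian (add-1) correction:

```python
import numpy as np

def nb_predict(X, y, query, laplace=1):
    """Naive Bayes for categorical features with Laplace (add-1) smoothing."""
    X, y = np.asarray(X, dtype=object), np.asarray(y, dtype=object)
    classes = np.unique(y)
    scores = {}
    for c in classes:
        Xc = X[y == c]
        # log P(C) + sum_k log P(x_k | C), each estimate smoothed by `laplace`
        log_p = np.log(len(Xc) / len(X))
        for k, value in enumerate(query):
            n_values = len(np.unique(X[:, k]))   # distinct values of feature k
            count = np.sum(Xc[:, k] == value)
            log_p += np.log((count + laplace) / (len(Xc) + laplace * n_values))
        scores[c] = log_p
    return max(scores, key=scores.get), scores

X = [['sunny', 'hot'], ['sunny', 'mild'], ['rain', 'mild'], ['rain', 'hot']]
y = ['no', 'no', 'yes', 'no']
print(nb_predict(X, y, ['rain', 'hot']))
```
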
Support Vector Machines

SVM principles - 1
SVM principles - 2

Model Evaluation and Selection

Confusion Matrix:

| Actual class \ Predicted class | $C_1$               | $\neg C_1$          |
|--------------------------------|---------------------|---------------------|
| $C_1$                          | True Positive (TP)  | False Negative (FN) |
| $\neg C_1$                     | False Positive (FP) | True Negative (TN)  |

Accuracy: $\frac{TP + TN}{ALL}$
Error rate: $\frac{FP + FN}{ALL}$
Sensitivity: $\frac{TP}{P}$
Specificity: $\frac{TN}{N}$

Precision: $\frac{TP}{TP + FP}$
Recall: $\frac{TP}{TP + FN}$
F measure: $\frac{2 \times Precision \times Recall}{Precision + Recall}$
F-beta measure: $\frac{(1 + \beta^2) \times Precision \times Recall}{\beta^2 \times Precision + Recall}$
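
A tiny sketch (the TP/FN/FP/TN counts are invented) evaluating the measures above from a confusion matrix:

```python
TP, FN, FP, TN = 90, 10, 30, 870    # invented counts

P, N, ALL = TP + FN, FP + TN, TP + FN + FP + TN
accuracy    = (TP + TN) / ALL
error_rate  = (FP + FN) / ALL
sensitivity = TP / P
specificity = TN / N
precision   = TP / (TP + FP)
recall      = TP / (TP + FN)
f1          = 2 * precision * recall / (precision + recall)
beta = 2.0                          # beta > 1 weights recall more than precision
f_beta = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
print(accuracy, error_rate, precision, recall, f1, f_beta)
```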

Holdout method
Cross-validation
Bootstrap
Estimating Confidence Intervals

t-test

ROC curves

Chapter Six

K-means
K-medoids

K-medoids: instead of the mean, use an actual data object as the cluster representative, e.g. the point closest to the K-means center.
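
A bare-bones sketch (NumPy; synthetic data, k chosen arbitrarily) of the K-means loop, plus the "closest actual point" idea for picking medoid-like representatives:

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]   # random initial centers
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)                        # assign each point to its nearest center
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return centers, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.5, size=(50, 2)) for m in ([0, 0], [4, 4], [0, 4])])
centers, labels = kmeans(X, k=3)

# Medoid-like representatives: the actual data point closest to each center
medoids = X[[np.linalg.norm(X - c, axis=1).argmin() for c in centers]]
print(centers, medoids, sep='\n')
```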

