Homework 1.6
Homework 2.6
- $\mathbf{A}=\{3,5\}$, $2^{\mathbf{A}}=\{\emptyset,\{3\},\{5\},\{3,5\}\}$
- $2^{\emptyset}=\{\emptyset\}$
- Predicate (set-builder) notation: $\mathbf{A}=\{x\in\mathbf{N}\mid 5\leq x \leq 9\}$
  Shorthand: $\mathbf{A}=[5..9]$
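The power sets and the set-builder definition above can be checked mechanically. The following is a minimal Python sketch; the helper name `power_set` and the finite stand-in range for $\mathbf{N}$ are assumptions made for illustration, not part of the original exercise.

```python
from itertools import chain, combinations


def power_set(s):
    """Return the power set of s as a set of frozensets."""
    items = list(s)
    # All subsets of size 0, 1, ..., len(items).
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )
    return {frozenset(subset) for subset in subsets}


A = {3, 5}
print(power_set(A))        # four subsets, including the empty set
print(power_set(set()))    # exactly one subset: the empty set itself

# Set-builder notation maps onto a set comprehension.
# N is infinite, so a finite range stands in for it here (assumption).
B = {x for x in range(100) if 5 <= x <= 9}
print(sorted(B))
```

Subsets are stored as `frozenset` because ordinary Python sets are not hashable and therefore cannot be elements of another set.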
Homework 3.6
- $\left[\begin{array}{ll} 1 & 1 \\ 2 & 2 \\ 3 & 3 \end{array}\right] \cdot \left[\begin{array}{llll} 1 & 2 & 3 & 1 \\ 4 & 5 & 6 & 1 \end{array}\right]=\left[\begin{array}{cccc} 5 & 7 & 9 & 2 \\ 10 & 14 & 18 & 4 \\ 15 & 21 & 27 & 6 \end{array}\right]$
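The product above can be reproduced with a few lines of plain Python. This is only a sketch: the function name `matmul` and the list-of-rows representation are choices made here for illustration.

```python
def matmul(A, B):
    """Multiply an m x k matrix by a k x n matrix, both given as lists of rows."""
    k = len(B)
    if any(len(row) != k for row in A):
        raise ValueError("inner dimensions do not match")
    n = len(B[0])
    # Entry (i, j) is the dot product of row i of A and column j of B.
    return [
        [sum(a_row[t] * B[t][j] for t in range(k)) for j in range(n)]
        for a_row in A
    ]


left = [[1, 1], [2, 2], [3, 3]]
right = [[1, 2, 3, 1], [4, 5, 6, 1]]
product = matmul(left, right)
for row in product:
    print(row)
```

A 3×2 matrix times a 2×4 matrix yields a 3×4 result, which is why the right-hand side above has four columns.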
Homework 4.6
- $\mathbf{R}=\{(a,b)\in\mathbf{A}\times\mathbf{A}\mid a \bmod 2 = b \bmod 2\}$.
  Enumerated (the listed pairs correspond to $\mathbf{A}=\{1,2,5,8,9\}$): $\mathbf{R}=\{(1,1),(1,5),(1,9),(5,1),(9,1),(5,5),(5,9),(9,5),(9,9),(2,2),(2,8),(8,2),(8,8)\}$
- $\mathbf{R}_1=\{(1,1),(2,2),(1,2)\}$, $\mathbf{R}_2=\{(5,5),(1,2),(2,1)\}$
  $\mathbf{R}_1\circ\mathbf{R}_2=\{(1,2),(2,1),(1,1)\}$ (apply $\mathbf{R}_1$ first, then $\mathbf{R}_2$)
  $\mathbf{R}_2^{+}=\bigcup_{i=1}^{|\mathbf{A}|} \mathbf{R}_2^{i}=\{(1,1),(2,1),(2,2),(1,2),(5,5)\}$
  $\mathbf{R}_2^{*}=\mathbf{R}_2^{+} \cup \mathbf{A}^{0}=\{(1,1),(2,1),(2,2),(1,2),(5,5)\}$ (assuming $\mathbf{A}=\{1,2,5\}$ here, so that $\mathbf{A}^{0}=\{(1,1),(2,2),(5,5)\}$ adds no new pair)
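The relation, the composition, and the closures can be recomputed with sets of pairs. The sketch below rests on some assumptions: composition is read left to right (the convention that reproduces the composition result above), the base set for the first relation is taken to be $\{1,2,5,8,9\}$ as implied by the listed pairs, and the base set for the closure is taken to be $\{1,2,5\}$. The closure is computed for $\mathbf{R}_2$, which is the relation that yields the five-pair set listed above. The function names are illustrative.

```python
def compose(r1, r2):
    """Left-to-right composition: (x, y) such that (x, z) in r1 and (z, y) in r2."""
    return {(x, y) for (x, z1) in r1 for (z2, y) in r2 if z1 == z2}


def positive_closure(r):
    """Union of R^1, R^2, ... computed until a power adds no new pair."""
    closure = set(r)
    power = set(r)
    while True:
        power = compose(power, r)
        if power <= closure:
            return closure
        closure |= power


def reflexive_transitive_closure(r, base):
    """R^* = R^+ united with the identity relation on the base set."""
    return positive_closure(r) | {(a, a) for a in base}


A = {1, 2, 5, 8, 9}  # assumed base set, inferred from the listed pairs
R = {(a, b) for a in A for b in A if a % 2 == b % 2}

R1 = {(1, 1), (2, 2), (1, 2)}
R2 = {(5, 5), (1, 2), (2, 1)}

print(sorted(R))
print(sorted(compose(R1, R2)))
print(sorted(positive_closure(R2)))
print(sorted(reflexive_transitive_closure(R2, {1, 2, 5})))
```

The loop in `positive_closure` can stop as soon as one power contributes nothing new, because every later power is then contained in the accumulated union as well.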
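Homework 5.5
- A function is a special kind of relation: a mapping from inputs to outputs.
  $f:\mathbb{R}\to\mathbb{R}$ is a univariate function; its input-output pairs $(x, f(x))$ form a binary relation.
  $f:\mathbb{R}^m\to\mathbb{R}$ is a multivariate function; its inputs together with its output form an $(m+1)$-ary relation.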
Homework 6.5
- $\mathbf{A}=\left[\begin{array}{lll} 1 & 2 & 1\\ 2 & 3 & 2\\ 1 & 2 & 1 \end{array}\right]$
  $l_0$ norm (number of nonzero entries): $\Vert\mathbf{A}\Vert_0=\vert\{(i,j)\mid a_{ij} \neq 0\}\vert=9$
  $l_1$ norm: $\Vert\mathbf{A}\Vert_1=\sum_{i,j}\vert a_{ij}\vert=15$
  $l_2$ norm: $\Vert\mathbf{A}\Vert_2=\sqrt{\sum_{i, j} a_{i j}^{2}}=\sqrt{29}$
  $l_\infty$ norm: $\Vert\mathbf{A}\Vert_\infty=\max_{i,j}\vert a_{ij}\vert=3$
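The four values can be verified directly from the entrywise definitions used above. This is a small sketch in plain Python; note that these are entrywise norms, which differ from the induced (operator) matrix norms that some libraries compute by default.

```python
import math

A = [[1, 2, 1], [2, 3, 2], [1, 2, 1]]
entries = [a for row in A for a in row]  # flatten the matrix

l0 = sum(1 for a in entries if a != 0)       # count of nonzero entries
l1 = sum(abs(a) for a in entries)            # sum of absolute values
l2 = math.sqrt(sum(a * a for a in entries))  # square root of sum of squares
linf = max(abs(a) for a in entries)          # largest absolute value

print(l0, l1, l2, linf)
```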
Homework 7.3
- $\min \sum_{(i, j) \in \Omega}\left(f\left(\mathbf{x}_{i}, \mathbf{t}_{j}\right)-r_{i j}\right)^{2}$
  $(i, j) \in \Omega$: the (user, item) index pairs that belong to the data set
  $f\left(\mathbf{x}_{i}, \mathbf{t}_{j}\right)$: the model's predicted rating of the $i$-th user for the $j$-th item
  $r_{i j}$: the true rating of the $i$-th user for the $j$-th item
  Overall, the objective drives the model's predicted ratings toward the true ratings.
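To make the objective concrete, the sketch below evaluates the sum of squared errors over a tiny set of observed ratings. Everything specific here is hypothetical: the inner-product model standing in for $f$, the feature vectors, and the ratings are invented for illustration, since the original does not specify the form of $f$.

```python
def squared_error(predict, users, items, ratings):
    """Sum of (f(x_i, t_j) - r_ij)^2 over the observed pairs (i, j)."""
    return sum(
        (predict(users[i], items[j]) - r) ** 2
        for (i, j), r in ratings.items()
    )


def dot_product_model(x, t):
    # Hypothetical f: inner product of a user vector and an item vector.
    return sum(a * b for a, b in zip(x, t))


users = {0: [1.0, 0.0], 1: [0.5, 0.5]}   # x_i, made-up user features
items = {0: [4.0, 2.0], 1: [1.0, 3.0]}   # t_j, made-up item features
ratings = {(0, 0): 5.0, (1, 1): 2.0}     # r_ij; the keys play the role of Omega

loss = squared_error(dot_product_model, users, items, ratings)
print(loss)
```

Only pairs in `ratings` contribute to the loss, which mirrors the restriction of the sum to $(i,j)\in\Omega$: unobserved user-item pairs are ignored.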
Homework 8.3
- $\sum_{i \bmod 2=0} x_{i}$
- $\mathbf{X}=\left[\begin{array}{lll} 1 & 2 & 1\\ 2 & 3 & 2\\ 1 & 2 & 1 \end{array}\right]$
  Sum of the lower-triangular entries of $\mathbf{X}$ (diagonal included): $\sum_{j \leq i} x_{i j}=\sum_{i=1}^{n} \sum_{j=1}^{i} x_{i j}=1+2+3+1+2+1=10$
  Product of all entries of $\mathbf{X}$: $\prod_{i=1}^{n}\prod_{j=1}^{n}x_{ij}=1\cdot2\cdot1\cdot2\cdot3\cdot2\cdot1\cdot2\cdot1=48$
  Given $D=\left\{(x, y) \mid 1 \leq x^{2}+y^{2} \leq 4\right\}$, compute $\iint_{D} \frac{\sin \left(\pi \sqrt{x^{2}+y^{2}}\right)}{\sqrt{x^{2}+y^{2}}} \,\mathrm{d}x\,\mathrm{d}y$. In polar coordinates:
  $$\iint_{D} \frac{\sin \left(\pi \sqrt{x^{2}+y^{2}}\right)}{\sqrt{x^{2}+y^{2}}} \,\mathrm{d}x\,\mathrm{d}y=\int_{0}^{2 \pi} \mathrm{d}\theta \int_{1}^{2} \frac{\sin \pi r}{r}\, r \,\mathrm{d}r=2\pi\left[-\frac{\cos \pi r}{\pi}\right]_{1}^{2}=-4$$
- $\int_{0}^{1} x^{2} \,\mathrm{d}x=\left.\frac{1}{3}x^3\right|_{0}^{1}=\frac{1}{3}$
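The sums, the product, and both integrals above can be sanity-checked numerically. In this sketch the vector `x` is a made-up example, its indices are taken to be 0-based (an assumption; the original does not say), and the integrals are approximated with a simple midpoint rule rather than evaluated exactly.

```python
import math

# Conditional sum over even indices (hypothetical data, 0-based indices).
x = [4, 7, 1, 8, 5]
even_index_sum = sum(v for i, v in enumerate(x) if i % 2 == 0)

X = [[1, 2, 1], [2, 3, 2], [1, 2, 1]]
n = len(X)

# Lower triangle: column index j runs up to and including row index i.
lower_sum = sum(X[i][j] for i in range(n) for j in range(i + 1))

# Product over every entry of the matrix.
product = math.prod(v for row in X for v in row)


def midpoint(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))


# Integral of x^2 over [0, 1]; the exact value is 1/3.
simple = midpoint(lambda t: t * t, 0.0, 1.0)

# After the polar substitution the integrand reduces to sin(pi * r),
# and the theta integral contributes a factor of 2 * pi.
polar = 2 * math.pi * midpoint(lambda r: math.sin(math.pi * r), 1.0, 2.0)

print(even_index_sum, lower_sum, product, simple, polar)
```

The numerical values agree with the closed-form results only up to discretization error, so they should be compared with a tolerance rather than for exact equality.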
Derivation 9.2
Vector derivatives: $\frac{\mathrm{d} \mathbf{X}^{\mathrm{T}}}{\mathrm{d} \mathbf{X}}=\boldsymbol{I}$, $\frac{\mathrm{d} \mathbf{X}}{\mathrm{d} \mathbf{X}^{\mathrm{T}}}=\boldsymbol{I}$, $\frac{\mathrm{d} \mathbf{X}^{\mathrm{T}} A}{\mathrm{d} \mathbf{X}^{\mathrm{T}}}=A$, $\frac{\mathrm{d} A \mathbf{X}}{\mathrm{d} \mathbf{X}}=A^{\mathrm{T}}$, $\frac{\mathrm{d} A \mathbf{X}}{\mathrm{d} \mathbf{X}^{\mathrm{T}}}=A$, $\frac{\mathrm{d} \mathbf{X} A}{\mathrm{d} \mathbf{X}}=A^{\mathrm{T}}$
$$\begin{aligned}\|\mathbf{X} \mathbf{w}-\mathbf{Y}\|_{2}^{2} &=(\mathbf{X} \mathbf{w}-\mathbf{Y})^{\mathrm{T}}(\mathbf{X} \mathbf{w}-\mathbf{Y}) \\ &=\left(\mathbf{w}^{\mathrm{T}} \mathbf{X}^{\mathrm{T}}-\mathbf{Y}^{\mathrm{T}}\right)(\mathbf{X} \mathbf{w}-\mathbf{Y}) \\ &=\mathbf{w}^{\mathrm{T}} \mathbf{X}^{\mathrm{T}} \mathbf{X} \mathbf{w}-\mathbf{w}^{\mathrm{T}} \mathbf{X}^{\mathrm{T}} \mathbf{Y}-\mathbf{Y}^{\mathrm{T}} \mathbf{X} \mathbf{w}+\mathbf{Y}^{\mathrm{T}} \mathbf{Y} \end{aligned}$$
Let $f(\mathbf{w})=\mathbf{w}^{\mathrm{T}} \mathbf{X}^{\mathrm{T}} \mathbf{X} \mathbf{w}-\mathbf{w}^{\mathrm{T}} \mathbf{X}^{\mathrm{T}} \mathbf{Y}-\mathbf{Y}^{\mathrm{T}} \mathbf{X} \mathbf{w}+\mathbf{Y}^{\mathrm{T}} \mathbf{Y}$. Then:
$$\begin{aligned}\frac{\mathrm{d} f}{\mathrm{d} \mathbf{w}} &=\mathbf{X}^{\mathrm{T}} \mathbf{X} \mathbf{w}+\left(\mathbf{w}^{\mathrm{T}} \mathbf{X}^{\mathrm{T}} \mathbf{X}\right)^{\mathrm{T}}-\mathbf{X}^{\mathrm{T}} \mathbf{Y}-\mathbf{X}^{\mathrm{T}} \mathbf{Y} \\ &=\mathbf{X}^{\mathrm{T}} \mathbf{X} \mathbf{w}+\mathbf{X}^{\mathrm{T}} \mathbf{X} \mathbf{w}-\mathbf{X}^{\mathrm{T}} \mathbf{Y}-\mathbf{X}^{\mathrm{T}} \mathbf{Y} \\ &=2 \mathbf{X}^{\mathrm{T}} \mathbf{X} \mathbf{w}-2 \mathbf{X}^{\mathrm{T}} \mathbf{Y}\end{aligned}$$
Hence the derivative of the original expression is $\frac{\mathrm{d} \|\mathbf{X} \mathbf{w}-\mathbf{Y}\|_{2}^{2} }{\mathrm{d} \mathbf{w}}=2\mathbf{X}^{\mathrm{T}} \mathbf{X} \mathbf{w}-2\mathbf{X}^{\mathrm{T}} \mathbf{Y}$
Setting $\frac{\mathrm{d} \|\mathbf{X} \mathbf{w}-\mathbf{Y}\|_{2}^{2} }{\mathrm{d} \mathbf{w}}=0$ gives $\mathbf{X}^{\mathrm{T}} \mathbf{X} \mathbf{w}=\mathbf{X}^{\mathrm{T}} \mathbf{Y}$, and therefore, when $\mathbf{X}^{\mathrm{T}} \mathbf{X}$ is invertible, $\mathbf{w}=\left(\mathbf{X}^{\mathrm{T}} \mathbf{X}\right)^{-1} \mathbf{X}^{\mathrm{T}} \mathbf{Y}$
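The closed-form solution can be checked on a small example. The sketch below uses plain Python and a two-column design matrix so that $\mathbf{X}^{\mathrm{T}}\mathbf{X}$ is 2×2 and can be inverted by hand; the data (generated from $y = 1 + 2x$) and all helper names are hypothetical. It also evaluates the gradient $2\mathbf{X}^{\mathrm{T}}\mathbf{X}\mathbf{w}-2\mathbf{X}^{\mathrm{T}}\mathbf{Y}$ at the solution, which should be numerically zero.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("X^T X is singular; the closed form does not apply")
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical data from y = 1 + 2 x; the first column is the bias term.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
Y = [[1.0], [3.0], [5.0], [7.0]]

Xt = transpose(X)
XtX = matmul(Xt, X)
XtY = matmul(Xt, Y)

# w = (X^T X)^{-1} X^T Y
w = matmul(inverse_2x2(XtX), XtY)

# Gradient 2 X^T X w - 2 X^T Y, evaluated at the solution.
gradient = [2 * g[0] - 2 * t[0] for g, t in zip(matmul(XtX, w), XtY)]

print(w, gradient)
```

Explicit inversion is used here only because it mirrors the formula; for larger problems a linear solver or a least-squares routine is usually the more numerically stable choice.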
Homework 9.3
Homework 10.6
- The output is a probability: $P(y=1 \mid \mathbf{x} ; \mathbf{w})=\frac{1}{1+e^{-\mathbf{x} \mathbf{w}}}$
- Overall objective: $\underset{\mathbf{w}}{\arg \max }\, L(\mathbf{w})=\prod_{i=1}^{n} P\left(y_{i} \mid \mathbf{x}_{i} ; \mathbf{w}\right)$
- Taking the logarithm turns the product into a sum without changing the maximizer (the logarithm is monotonic), which simplifies the computation:
  $$\begin{aligned} \log L(\mathbf{w}) &=\sum_{i=1}^{n} \log P\left(y_{i} \mid \mathbf{x}_{i} ; \mathbf{w}\right) \\ &=\sum_{i=1}^{n} \left[y_{i} \log P\left(y_{i}=1 \mid \mathbf{x}_{i} ; \mathbf{w}\right)+\left(1-y_{i}\right) \log \left(1-P\left(y_{i}=1 \mid \mathbf{x}_{i} ; \mathbf{w}\right)\right)\right] \\ &=\sum_{i=1}^{n} \left[y_{i} \log \frac{P\left(y_{i}=1 \mid \mathbf{x}_{i} ; \mathbf{w}\right)}{1-P\left(y_{i}=1 \mid \mathbf{x}_{i} ; \mathbf{w}\right)}+\log \left(1-P\left(y_{i}=1 \mid \mathbf{x}_{i} ; \mathbf{w}\right)\right)\right] \\ &=\sum_{i=1}^{n} \left[y_{i} \mathbf{x}_{i} \mathbf{w}-\log \left(1+e^{\mathbf{x}_{i} \mathbf{w}}\right)\right] \end{aligned}$$
- Numerical iteration along the gradient. Because $\log L$ is maximized, the step is gradient ascent (equivalently, gradient descent on $-\log L$): $\mathbf{w}^{t+1}=\mathbf{w}^{t}+\alpha \frac{\partial \log L(\mathbf{w})}{\partial \mathbf{w}}$
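The log-likelihood in its final form, $\sum_i \left[y_i \mathbf{x}_i\mathbf{w}-\log(1+e^{\mathbf{x}_i\mathbf{w}})\right]$, has gradient $\sum_i \left(y_i-P(y_i=1\mid\mathbf{x}_i;\mathbf{w})\right)\mathbf{x}_i$. The sketch below iterates on that gradient for a toy data set. Since the goal is to maximize $\log L$, it steps along the gradient (gradient ascent), which is the same as gradient descent on $-\log L$. The data, the step size `alpha`, and the iteration count are all hypothetical choices made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(w, xs, ys):
    """Sum over samples of y * (x . w) - log(1 + exp(x . w))."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = sum(a * b for a, b in zip(x, w))
        total += y * z - math.log(1.0 + math.exp(z))
    return total


def gradient(w, xs, ys):
    """Gradient of the log-likelihood: sum of (y - sigmoid(x . w)) * x."""
    grad = [0.0] * len(w)
    for x, y in zip(xs, ys):
        z = sum(a * b for a, b in zip(x, w))
        error = y - sigmoid(z)
        for k, a in enumerate(x):
            grad[k] += error * a
    return grad


# Hypothetical, non-separable toy data; the first feature is a constant bias.
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
ys = [0, 1, 0, 1]

w = [0.0, 0.0]
alpha = 0.1  # assumed step size
start = log_likelihood(w, xs, ys)
for _ in range(200):
    g = gradient(w, xs, ys)
    # Ascent step: move in the direction that increases log L.
    w = [wk + alpha * gk for wk, gk in zip(w, g)]
end = log_likelihood(w, xs, ys)

print(start, end, w)
```

The toy labels are deliberately not linearly separable, so the maximizer is finite and the iteration settles instead of letting the weights grow without bound.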