  • Blog (17)

[Original] Overview of MNAR Matrix Completion Under the Nuclear Norm Assumption

Why use the propensity score. Setup: define three matrices: a signal matrix $S \in \mathbb{R}^{m \times n}$, a noise matrix $W \in \mathbb{R}^{m \times n}$, and a probability matrix $P \in [0, 1]^{m \times n}$...

2020-03-26 22:55:38 302

[Original] Norms for Matrices and Vectors

Vector norms. P-norm: $\|V\|_p = \left(\sum_{i=1}^{n} |V_i|^p\right)^{\frac{1}{p}}$. Frobenius norm: $\|V\|_F = \left(\sum_{i=1}^{n} |V_i|^2\right)^{\frac{1}{2}}$...

2020-03-25 05:44:46 167
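The P-norm above is easy to check numerically. A minimal NumPy sketch (the helper `p_norm` and the sample vector are my own illustration, not from the post):

```python
import numpy as np

# P-norm: ||v||_p = (sum_i |v_i|^p)^(1/p); p = 2 is the Euclidean norm,
# which for a vector coincides with the Frobenius norm.
def p_norm(v, p):
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

v = np.array([3.0, -4.0])
print(p_norm(v, 1))  # 7.0
print(p_norm(v, 2))  # 5.0
assert np.isclose(p_norm(v, 2), np.linalg.norm(v))  # matches NumPy's 2-norm
```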

[Original] SVD vs PCA vs 1bitMC

SVD vs PCA vs 1bitMC: eigendecomposition, PCA, SVD. Eigendecomposition: for any real symmetric square $d \times d$ matrix $A$, we can find its eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d$...

2020-03-25 04:51:38 223
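The connection between the two decompositions can be sketched numerically: the eigenvalues of the symmetric matrix $X^T X$ are the squared singular values of $X$ (the random matrix here is my own example, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))

# SVD of X relates to the eigendecomposition of the symmetric matrix X^T X:
# the eigenvalues of X^T X are the squared singular values of X.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
eigvals = np.linalg.eigvalsh(X.T @ X)[::-1]  # sort descending to match s

assert np.allclose(eigvals, s ** 2)
```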

[Original] Knowledge gap: why Poisson/logistic regression can be derived as linear regression.

2019-05-07 13:08:16 196

[Reposted] EM and Variational Inference Derivation

https://chrischoy.github.io/research/Expectation-Maximization-and-Variational-Inference/

2019-05-04 23:49:10 166

[Original] Gibbs Is a Special Case of Metropolis

Although they appear quite different, Gibbs sampling is a special case of the Metropolis-Hastings algorithm. Specifically, Gibbs sampling uses a proposal drawn from the full conditional distribution, which...

2019-04-29 21:23:24 114
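The claim can be verified on a small discrete example: when the proposal is the full conditional, the Metropolis-Hastings acceptance ratio is identically 1, so every Gibbs proposal is accepted. A sketch (the random joint table is my own illustration):

```python
import numpy as np

# Joint distribution of a discrete pair (X, Y) as a table p[x, y].
rng = np.random.default_rng(1)
p = rng.random((3, 4))
p /= p.sum()

# Full conditional p(x | y), used as the Gibbs proposal (columns sum to 1).
cond_x = p / p.sum(axis=0, keepdims=True)

# MH acceptance ratio for a move x -> x' at fixed y:
#   r = p(x', y) q(x | x', y) / (p(x, y) q(x' | x, y)),  with q(. | ., y) = p(. | y)
for y in range(4):
    for x in range(3):
        for x_new in range(3):
            r = (p[x_new, y] * cond_x[x, y]) / (p[x, y] * cond_x[x_new, y])
            assert np.isclose(r, 1.0)  # Gibbs proposals are always accepted
```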

[Original] Metropolis Method Condition Derivation

Metropolis-Hastings involves designing a Markov process by constructing transition probabilities; the process has a unique stationary distribution $\pi(x)$ if it fulfills two conditions: existence o...

2019-04-29 12:42:41 190
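One of those conditions, detailed balance, can be checked numerically for a small Metropolis chain. A sketch under my own toy setup (3-state target, uniform symmetric proposal):

```python
import numpy as np

# Target distribution pi on 3 states and a symmetric proposal q(j | i).
pi = np.array([0.2, 0.3, 0.5])
n = len(pi)
Q = np.full((n, n), 1.0 / n)

# Metropolis transition matrix: move i -> j accepted w.p. min(1, pi_j / pi_i).
P = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            P[i, j] = Q[i, j] * min(1.0, pi[j] / pi[i])
    P[i, i] = 1.0 - P[i].sum()  # stay put on rejection

# Detailed balance: pi_i P_ij = pi_j P_ji, which implies pi is stationary.
assert np.allclose(pi[:, None] * P, (pi[:, None] * P).T)
assert np.allclose(pi @ P, pi)
```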

[Original] Why does a Markov matrix have an eigenvalue equal to 1, and all eigenvalues less than or equal to 1?

The intuition is that both the arbitrary probability vector $\vec{v}$ and the matrix $P$ preserve the probability property: the sum of the entries of $\vec{v}$ is 1, and the sum of each row of $P$ is 1. So that su...

2019-04-29 00:16:11 259
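Both facts are easy to confirm on a random row-stochastic matrix (my own example, assuming NumPy): the all-ones vector gives the eigenvalue 1, and no eigenvalue exceeds 1 in modulus.

```python
import numpy as np

rng = np.random.default_rng(2)
P = rng.random((4, 4))
P /= P.sum(axis=1, keepdims=True)  # row-stochastic: each row sums to 1

eigvals = np.linalg.eigvals(P)

# 1 is always an eigenvalue: P @ ones = ones because each row sums to 1.
assert np.allclose(P @ np.ones(4), np.ones(4))
assert np.any(np.isclose(eigvals, 1.0))
# ...and no eigenvalue exceeds 1 in modulus.
assert np.all(np.abs(eigvals) <= 1.0 + 1e-9)
```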

[Original] Information Theory: Self-Information, Entropy, Relative Entropy, Cross Entropy, Conditional Entropy

Self-information: $I(x) = \log \frac{1}{P(x)}$. Entropy: $H(X) = E[I(X)] = E\left[\log \frac{1}{P(X)}\right] = \sum_{x \in X} P(x) \log \frac{1}{P(x)}$...

2019-04-13 02:17:47 210
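The entropy formula above is a one-liner to implement; a small sketch in nats (the helper `entropy` and the example distributions are mine):

```python
import numpy as np

# H(X) = sum_x P(x) * log(1 / P(x)), computed in nats (natural log).
def entropy(p):
    p = np.asarray(p, dtype=float)
    return float(np.sum(p * np.log(1.0 / p)))

# A uniform distribution maximizes entropy: a fair 4-sided die gives log 4.
assert np.isclose(entropy([0.25] * 4), np.log(4))
# A skewed distribution carries less uncertainty per draw.
assert entropy([0.9, 0.1]) < entropy([0.5, 0.5])
```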

[Original] KL Divergence: Over-Estimation vs. Under-Estimation

Reference: a very intuitive example is shown in this blog: https://wiseodd.github.io/techblog/2016/12/21/forward-reverse-kl/

2019-04-13 01:17:38 302

[Original] Ridge Linear Regression: Why the Estimator Is Invertible

Reference: https://math.stackexchange.com/questions/2447060/prove-that-the-regularization-term-in-rls-makes-the-matrix-invertible. $\hat{\theta} = (X^T X + \lambda I)^{-1} X^T y$...

2019-04-10 22:58:21 365
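The invertibility argument can be demonstrated directly: with more features than samples, $X^T X$ is singular, but $X^T X + \lambda I$ is positive definite (its smallest eigenvalue is at least $\lambda$). A sketch with my own random data:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(3, 5))  # n = 3 samples, d = 5 features
y = rng.normal(size=3)

G = X.T @ X
assert np.linalg.matrix_rank(G) < 5  # singular: rank at most n = 3

lam = 0.1
# Ridge estimator: (X^T X + lam I)^{-1} X^T y now exists.
theta = np.linalg.solve(G + lam * np.eye(5), X.T @ y)
assert theta.shape == (5,)
# G is positive semidefinite, so every eigenvalue of G + lam I is >= lam > 0.
assert np.linalg.eigvalsh(G + lam * np.eye(5)).min() >= lam - 1e-9
```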

[Original] Bias-Variance + Noise Decomposition in Linear Regression

Model: $y = F(\mathbf{x}) + v$, where $F(\mathbf{x})$ can be regarded as the oracle model, which does not change as the training data changes...

2019-03-24 04:06:22 167

[Original] The Multivariate Linear Regression Differentiation Problem That Confused Me for a Long Time

Suppose we have $(\vec{x}^{(i)}, y^{(i)})$ with sample size $N$, where $\vec{x}^{(i)} \in \mathbb{R}^D$. $\hat{y} = \sum_{j=1}^{D} \beta_j x_j$. $L($...

2019-03-22 10:48:02 203
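For the squared loss $L(\beta) = \|y - X\beta\|^2$, the matrix-form gradient is $-2X^T(y - X\beta)$, and it can be validated against a finite-difference check. A sketch with my own random data:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(10, 3))
y = rng.normal(size=10)
beta = rng.normal(size=3)

# Analytic gradient of L(beta) = ||y - X beta||^2.
grad = -2 * X.T @ (y - X @ beta)

# Central finite-difference approximation of dL/dbeta_j, component by component.
eps = 1e-6
for j in range(3):
    e = np.zeros(3)
    e[j] = eps
    num = (np.sum((y - X @ (beta + e)) ** 2)
           - np.sum((y - X @ (beta - e)) ** 2)) / (2 * eps)
    assert np.isclose(grad[j], num, atol=1e-4)
```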

[Original] Review: Model User Exposure in Recommender

Introduction: for implicit-data recommendation, the main difficulty is that the user-item matrix contains only 0-1 information, yet we must use the zeros when inferring user preference. But we do not know whether an item with value 0 was truly disliked or not clicked by the user, or was simply never recommended to the user at all. Many older methods ignore this and naively assume that every 0 means the item was...

2019-03-03 06:54:41 404

[Original] Square Loss Function in Frequentist and Bayesian View

Suppose we have $X_1, \dots, X_n \sim N(\theta, \sigma_0^2)$. Loss function: square loss, $L(\delta(\vec{x}) - \theta) = (\delta(\vec{x}) - \theta)^2$...

2019-02-21 00:28:40 219

[Original] A Question About MLE Assumptions

The validity of MLE does not rest on any assumptions. Sometimes the final form of the MLE is not closed form, so the estimator cannot be solved for exactly; in that case we can approximate it asymptotically. As $n$ tends to infinity, the MLE follows a normal distribution: $\sqrt{n}(\hat\theta - \theta) \sim N(0, I^{-1})$...

2019-02-17 07:44:53 315 1

[Original] Model Interpretability Trilogy: Summary

A summary of the three interpretability papers. Details. Paper 1: training method: first compute each word's beta value (importance score). Given a phrase of length k, its average effect under class 1 and class 2 can be computed using the beta values. Because the beta score of the same word differs across different sentences, we take the average. If for a class...

2019-02-17 07:12:24 1457
