CMU 11-785
zealscott
https://tech.zealscott.com
CMU 11-785 L23 Variational Autoencoders
EM for PCA. With complete information: if we knew $z$ for each $x$, estimating $A$ and $D$ would be simple. $x = Az + E$, $P(x \mid z) = N(Az, D)$. Given complete information $(x_1, z_1), (x_2, z_2)$, … (Original, 2021-03-08 17:18:45)
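In the "complete information" case above, with $z$ observed, estimating $A$ reduces to ordinary least squares and $D$ to the residual covariance. A minimal NumPy sketch (all names and synthetic data here are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic generative model: x = A z + e, with z low-dimensional and observed
A_true = rng.normal(size=(5, 2))            # 5-dim x, 2-dim z
Z = rng.normal(size=(1000, 2))              # "complete information": z is known
X = Z @ A_true.T + 0.01 * rng.normal(size=(1000, 5))

# With z observed, A is the least-squares solution of X ≈ Z A^T
A_hat = np.linalg.lstsq(Z, X, rcond=None)[0].T

# Residual variance per dimension estimates the diagonal noise covariance D
residual = X - Z @ A_hat.T
D_hat = np.diag(np.var(residual, axis=0))
```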
CMU 11-785 L22 Revisiting EM algorithm and generative models
Key points. EM: an iterative technique to estimate probability models for data with missing components or information, by iteratively "completing" the data and re-estimating parameters. PCA is actually a generative model for Gaussian data; data lie close… (Original, 2021-02-25 21:15:01)
CMU 11-785 L21 Boltzmann machines2
The Hopfield net as a distribution. The Helmholtz free energy of a system: at any time, the probability of finding the system in state $s$ at temperature $T$ is $P_T(s)$; each state has a potential energy $E_s$. The internal energy of t… (Original, 2021-01-21 21:18:24)
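The state distribution described above is the Boltzmann distribution, $P_T(s) \propto \exp(-E_s/T)$; a small sketch (the energies and temperatures are made-up illustrative values):

```python
import numpy as np

def boltzmann(energies, T):
    """P_T(s) ∝ exp(-E_s / T): lower-energy states are more probable."""
    p = np.exp(-np.asarray(energies, dtype=float) / T)
    return p / p.sum()

energies = [0.0, 1.0, 2.0]
p_cold = boltzmann(energies, T=0.1)    # low T: mass concentrates on the minimum-energy state
p_hot = boltzmann(energies, T=100.0)   # high T: nearly uniform over states
```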
CMU 11-785 L20 Boltzmann machines 1
Training Hopfield nets, geometric approach: $\mathbf{W} = \mathbf{Y}\mathbf{Y}^{T} - N_{p}\mathbf{I}$, $E(\mathbf{y}) = \mathbf{y}^{T}\mathbf{W}\mathbf{y}$. Since $\mathbf{y}^{T}(\mathbf{Y}\mathbf{Y}^{T} - N_{p}\mathbf{I})\mathbf{y} = \mathbf{y}^{T}\mathbf{Y}\mathbf{Y}^{T}\mathbf{y} - N N_{p}$… (Original, 2020-12-16 17:46:42)
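The Hebbian construction $\mathbf{W} = \mathbf{Y}\mathbf{Y}^T - N_p\mathbf{I}$ can be checked numerically; a sketch with random ±1 patterns (sizes and names are illustrative), where subtracting $N_p\mathbf{I}$ zeroes the diagonal and shifts the quadratic form by the constant $N N_p$:

```python
import numpy as np

rng = np.random.default_rng(1)
N, N_p = 16, 3                               # neurons, stored patterns
Y = rng.choice([-1.0, 1.0], size=(N, N_p))   # columns are the stored patterns

W = Y @ Y.T - N_p * np.eye(N)                # W = Y Y^T - N_p I (zero diagonal)

def quad_form(y, W):
    # The quadratic form y^T W y used in the energy expression above
    return float(y @ W @ y)
```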
CMU 11-785 L19 Hopfield network
Hopfield net. So far, the neural networks used for computation have all been feedforward structures; the Hopfield net is a loopy network. Each neuron is a perceptron with +1/−1 output; every neuron receives input from, and sends output to, every other neuron. (Original, 2020-11-07 17:51:01)
CMU 11-785 L18 Representation
Logistic regression. This is the perceptron with a sigmoid activation; it actually computes the probability that the input belongs to class 1. Decision boundaries may be obtained by comparing the probability to a threshold; these boundaries will be lines (hyperplanes)… (Original, 2020-11-07 17:48:11)
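The probability-then-threshold scheme above can be sketched as follows (the weights are placeholders, not trained values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, w, b):
    """P(class = 1 | x): a perceptron with a sigmoid activation."""
    return sigmoid(np.dot(w, x) + b)

def decide(x, w, b, threshold=0.5):
    # Comparing the probability to a threshold yields a linear decision boundary
    return int(predict_proba(x, w, b) >= threshold)
```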
CMU 11-785 L17 Seq2seq and attention model
Generating language / synthesis. Input: symbols as one-hot vectors; the dimensionality of the vector is the size of the vocabulary, projected down to lower-dimensional "embeddings". The hidden units are (one or more layers of) LSTM units. Output at each time: … (Original, 2020-08-06 16:39:21)
CMU 11-785 L16 Connectionist Temporal Classification
Sequence to sequence: a sequence goes in, a sequence comes out. There is no notion of "time synchrony" between input and output; the output may not even maintain the order of the symbols (e.g. from one language to another). With order synchrony, the input and output sequences happen in the… (Original, 2020-08-06 16:36:46)
CMU 11-785 L15 Divergence of RNN
Variants on recurrent nets. Architectures: how to train recurrent networks of different architectures. Synchrony: the target output is time-synchronous with the input, or order-synchronous but not time-synchronous. One to one: no rec… (Original, 2020-05-30 21:34:56)
CMU 11-785 L14 Stability analysis and LSTMs
Stability. Will this necessarily be "bounded input, bounded output"? Guaranteed if the output and hidden activations are bounded; but will it saturate? Analyzing the recursion: it is sufficient to analyze the behavior of the hidden layer, since it carries the relevant… (Original, 2020-05-25 19:50:18)
CMU 11-785 L13 Recurrent Networks
Modelling series. In many situations one must consider a series of inputs to produce an output; the outputs too may be a series. Finite response model: can use a convolutional neural net applied to series data (slide), also called a time-delay neural network. (Original, 2020-05-20 23:35:11)
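A finite response model over a series amounts to a 1-D convolution: the output at each time depends on only a fixed window of recent inputs. A tiny sketch (the kernel values are illustrative):

```python
import numpy as np

# Finite response: y[t] depends only on the last K inputs through fixed weights h
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
h = np.array([0.5, 0.5])                 # K = 2: a simple moving average
y = np.convolve(x, h, mode="valid")      # each output averages two adjacent inputs
```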
CMU 11-785 L12 Back propagation through a CNN
Convolution. Each position in $z$ consists of the convolution result over the previous map. Ways of shrinking the maps: a stride greater than 1; downsampling (not necessary), typically performed with strides > 1; pooling, e.g. max-pooling (note: keep track of the locations… (Original, 2020-05-19 19:37:00)
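The note about tracking locations refers to remembering where each window's maximum came from, so the gradient can be routed back through the pool. A minimal 1-D sketch (function name and sizes are illustrative):

```python
import numpy as np

def maxpool_1d(x, size):
    """Max over non-overlapping windows; also keep the argmax locations."""
    x = np.asarray(x, dtype=float)
    windows = x[: len(x) // size * size].reshape(-1, size)
    pooled = windows.max(axis=1)
    locations = windows.argmax(axis=1)   # kept so gradients can be routed back in backprop
    return pooled, locations

pooled, loc = maxpool_1d([1.0, 3.0, 2.0, 5.0, 4.0, 0.0], size=2)
```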
CMU 11-785 L10 CNN architecture
Architecture. A convolutional neural network comprises "convolutional" and "downsampling" layers. Convolutional layers comprise neurons that scan their input for patterns; downsampling layers perform max operations on groups of outputs from the convolutio… (Original, 2020-05-19 19:28:29)
CMU 11-785 L09 Cascade-Correlation and Deep Learning
Cascade-Correlation algorithm. Start with direct I/O connections only, no hidden units. Train the output-layer weights using BP or Quickprop. If the error is now acceptable, quit; else, create one new hidden unit offline: create a pool of candidate units, each ge… (Original, 2020-05-19 19:23:48)
CMU 11-785 L08 Motivation of CNN
Motivation. Find a word in a signal, or find an item in a picture: the need for shift invariance. The location of a pattern is not important, so we can scan for the pattern with the same MLP everywhere; just one giant… (Original, 2020-05-07 22:24:15)
CMU 11-785 L06 Optimization
Problems. Decaying learning rates provide a good compromise between escaping poor local minima and convergence. Many of the convergence issues arise because we force the same learning rate on all parameters… (Original, 2020-05-03 15:01:26)
CMU 11-785 L07 Optimizers and regularizers
Optimizers. Momentum and Nesterov's method improve convergence by normalizing the mean (first moment) of the derivatives. Considering the second moments: RMSProp / Adagrad / AdaDelta / ADAM. Simple… (Original, 2020-05-03 15:01:45)
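The second-moment idea can be sketched with a bare-bones RMSProp step: each parameter's step is scaled by a running estimate of its squared gradient. A minimal sketch (the hyperparameters and the toy objective are illustrative choices, not from the lecture):

```python
import numpy as np

def rmsprop_step(w, grad, cache, lr=0.01, decay=0.9, eps=1e-8):
    """One RMSProp update: divide each step by the root of the running second moment."""
    cache = decay * cache + (1 - decay) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

# Minimize f(w) = ||w||^2 / 2, whose gradient is simply w
w = np.array([5.0, -3.0])
cache = np.zeros_like(w)
for _ in range(1000):
    w, cache = rmsprop_step(w, w, cache)
```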
CMU 11-785 L05 Convergence
Backpropagation. The divergence function minimized is only a proxy for classification error (like softmax); minimizing the divergence may not minimize the classification error. It does not separate the points even… (Original, 2020-04-23 23:24:45)
CMU 11-785 L03.5 A brief note on derivatives
What is a derivative? The derivative of a function at a point tells us how much a minute increment to the argument of the function will increment the value of the function. To be clear, what we want… (Original, 2020-04-21 20:02:06)
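That definition suggests a direct finite-difference check: nudge the argument and measure how much the value moves. A small sketch (the test function and step size are illustrative):

```python
def derivative(f, x, h=1e-6):
    """Approximate f'(x): how much a minute increment to x increments f(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

slope = derivative(lambda x: x ** 2, 3.0)   # f'(x) = 2x, so roughly 6 at x = 3
```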
CMU 11-785 L04 Backpropagation
Problem setup. Input-output pairs; representing the output as a one-hot vector: $y_{i} = \frac{\exp(z_{i})}{\sum_{j} \exp(z_{j})}$… (Original, 2020-04-21 19:56:29)
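The output formula $y_i = \exp(z_i)/\sum_j \exp(z_j)$ is the softmax; in code it is usually written with a max-subtraction for numerical stability (a standard trick, not something stated in the snippet):

```python
import numpy as np

def softmax(z):
    """y_i = exp(z_i) / sum_j exp(z_j); subtracting max(z) avoids overflow."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

y = softmax(np.array([1.0, 2.0, 3.0]))
```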
CMU 11-785 L03 Learning the network
Preliminary. The bias can also be viewed as the weight of another input component that is always set to 1: $z = \sum_{i} w_{i} x_{i}$. What we learn: the …parameters… of the network… (Original, 2020-03-16 16:08:36)
CMU 11-785 L02 What can a network represent
Preliminary. Perceptron: a threshold unit that "fires" if the weighted sum of its inputs exceeds a threshold. Soft perceptron: uses a sigmoid function instead of a threshold at the output. Activation: the function… (Original, 2020-03-04 11:01:50)
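The two units described above can be sketched side by side (the AND-gate weights and threshold below are illustrative):

```python
import numpy as np

def perceptron(x, w, threshold):
    """Threshold unit: 'fires' (outputs 1) iff the weighted input sum exceeds the threshold."""
    return 1 if np.dot(w, x) > threshold else 0

def soft_perceptron(x, w, threshold):
    """Soft perceptron: a sigmoid of the same affine function instead of a hard threshold."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) - threshold)))
```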