Machine Learning
hebastast
Santander unhappy customer
import pandas as pd
import numpy as np
import warnings #drop warnings generated by
warnings.filterwarnings('ignore')
import seaborn as sns
%matplotlib inline
import matplotlib.pyplot as plt
sns.set(s…
Original · 2016-08-29 19:46:42 · 1176 views · 0 comments
neural network - recognize handwritten digits
"""
network.py
~~~~~~~~~~
A module to implement the stochastic gradient descent learning
algorithm for a feedforward neural network. Gradients are calculated
using backpropagation. Note that I have f…
Original · 2016-09-25 22:27:52 · 471 views · 0 comments
Facial_keypoints_deeplearning_cnn
import os
import numpy as np
import pandas as pd
from sklearn.utils import shuffle
FTRAIN='./input/training.csv'
FTEST='./input/test.csv'
df_train=pd.read_csv(FTRAIN)
df_train['Image']=df_train['Image'].…
Original · 2016-10-12 14:04:21 · 974 views · 0 comments
theano_scan_demo_compute_Jacobian_matrix
import numpy as np
import theano.tensor as T
import theano
floatX='float32'
V=T.vector('V')
A=T.matrix('A')
y=T.tanh(T.dot(V,A))
results,updates=theano.scan(lambda i:T.grad(y[i],V),sequences=[T.arange(y.…
Original · 2016-10-06 20:15:26 · 328 views · 0 comments
Principal Component Analysis
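The goal of PCA is to find a change of basis under which the variances of the components are sorted from largest to smallest and the covariance between different components is 0 (the components are linearly uncorrelated). In the covariance matrix, the diagonal entries are the variances of the components and the off-diagonal entries are the covariances. Our goal is therefore to find an orthonormal basis such that, after the change of basis, the variances are in decreasing order and the covariance between any two different components is 0. The covariance matrix of the original data and the covariance matrix after the change of basis are related as follows. Let the original data be X, with covariance matrix C, and let Y = PX, where P is the transformation matrix and Y is the transformed…
Original · 2017-02-08 20:41:42 · 410 views · 0 comments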
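A minimal NumPy sketch of the change of basis described above. It assumes that X stores one variable per row and one sample per column (so that Y = PX is well defined), that each row is centered first, and that the covariance is normalized by the number of samples. The name `pca_basis` and the toy data are hypothetical and do not come from the post. Under these assumptions the covariance of Y equals P C Pᵀ, and taking the rows of P to be the eigenvectors of C makes that matrix diagonal.

```python
import numpy as np

def pca_basis(X):
    """Return (P, variances, Xc) for data X of shape (n_variables, n_samples)."""
    # Center every variable (row) so that C below is a covariance matrix.
    Xc = X - X.mean(axis=1, keepdims=True)
    # Covariance matrix of the original data, normalized by the sample count.
    C = Xc @ Xc.T / Xc.shape[1]
    # C is symmetric, so eigh returns real eigenvalues and orthonormal eigenvectors.
    eigvals, eigvecs = np.linalg.eigh(C)
    # eigh sorts eigenvalues in ascending order; reverse to get decreasing variance.
    order = np.argsort(eigvals)[::-1]
    # Rows of P are the eigenvectors, i.e. the new orthonormal basis.
    P = eigvecs[:, order].T
    return P, eigvals[order], Xc

# Toy data (assumed): three variables, the second correlated with the first.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 500))
X[1] += 2.0 * X[0]

P, variances, Xc = pca_basis(X)
Y = P @ Xc                      # data expressed in the new basis
D = Y @ Y.T / Y.shape[1]        # covariance matrix after the change of basis
```

Here `D` should be diagonal up to floating-point error, with the entries of `variances` on its diagonal in decreasing order.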
Local Response Normalization (LRN)
This concept was introduced in AlexNet. The local response normalization algorithm was inspired by real neurons; as the authors said, it “bears some resemblance to the local contrast…
Repost · 2017-03-27 14:20:26 · 1236 views · 0 comments
word2vec
Part 1: The Model
The skip-gram neural network model is actually surprisingly simple in its most basic form; I think it’s all the little tweaks and enhancements that start to clutter the explanatio…
Repost · 2017-03-27 14:22:31 · 772 views · 0 comments
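Feature discretization, feature crossing, and discretization of continuous features
1. Feature engineering for internet advertising
The blog post "A Survey of Internet Advertising: The Click-Through-Rate System" (《互联网广告综述之点击率系统》) discusses click-through-rate (CTR) systems for internet advertising. As it shows, the logistic regression model used there is fairly simple and practical. There are several ways to train it, but they all share the same objective. The training result has a considerable influence on performance, but the training method itself is not decisive: what is trained is the weight of each feature, and small differences in the weights do not cause large changes in CTR. Once the training method is fixed, what plays the decisive role in CTR prediction is the selection…
Original · 2017-04-28 09:55:15 · 888 views · 0 comments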
DigitRecongnizer_CNN_DeepLearning
import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from lasagne.layers import Conv2DLayer
from lasagne.layers import MaxPool2DLayer
fr…
Original · 2016-10-10 22:16:05 · 600 views · 0 comments
backpropagation
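1. A simple understanding of gradients
For f(x,y)=xy, the partial derivatives of f(x,y) with respect to x and y are easy to obtain: ∂f/∂x = y and ∂f/∂y = x. The partial derivative with respect to each variable tells you how sensitive the whole function is to that single variable.
2. The chain rule
f(x,y,z)=(x+y)z can be decomposed into q=x+y and f=qz. By the chain rule, ∂f/∂x = (∂f/∂q)(∂q/∂x), and the same holds for the partial derivative with respect to y.
# set some inputs
x = -2; y = 5; z = -…
Original · 2016-09-08 10:33:36 · 457 views · 0 comments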
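The decomposition into q=x+y and f=qz can be traced by hand in a few lines of plain Python. This is only an illustrative sketch: x and y follow the excerpt, while the value of z is cut off there, so the z used below is an assumption, as are the variable names for the gradients.

```python
# Inputs: x and y follow the excerpt; z is an assumed value (truncated in the excerpt).
x, y, z = -2.0, 5.0, -4.0

# Forward pass: evaluate the two intermediate expressions.
q = x + y          # q = x + y
f = q * z          # f = q * z

# Backward pass: local derivatives of f = q * z.
dfdz = q           # df/dz = q
dfdq = z           # df/dq = z

# Chain rule through q = x + y, where dq/dx = dq/dy = 1.
dfdx = dfdq * 1.0  # df/dx = df/dq * dq/dx
dfdy = dfdq * 1.0  # df/dy = df/dq * dq/dy

print(f, dfdx, dfdy, dfdz)
```

Because q depends on x and y in the same way, both receive the same gradient, which is simply the value of z.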
digit_recongnition
# Standard scientific Python imports
%matplotlib inline
import matplotlib.pyplot as plt
# Import datasets, classifiers and performance metrics
from sklearn import datasets, svm, metrics
# The digits data…
Original · 2016-09-01 16:23:15 · 786 views · 0 comments
titanic prediction
# Imports
# pandas
import pandas as pd
from pandas import Series,DataFrame
# numpy, matplotlib, seaborn
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
…
Original · 2016-08-25 21:44:59 · 1532 views · 0 comments
softmax_linear_classifier
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
N = 100 # number of points per class
D = 2 # dimensionality
K = 3 # number of classes
X = np.zeros((N*K,D)) # data matrix (each row…
Original · 2016-09-18 22:43:34 · 1693 views · 0 comments
neural_network
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
N = 100 # number of points per class
D = 2 # dimensionality
K = 3 # number of classes
X = np.zeros((N*K,D)) # data matrix (each row…
Original · 2016-09-18 22:44:18 · 522 views · 0 comments
iris_visualization
import pandas as pd
import warnings #ignore the warnings generated by seaborn
warnings.filterwarnings('ignore')
import seaborn as sns
%matplotlib inline
import matplotlib.pyplot as plt
sns.set(s…
Original · 2016-08-26 19:36:37 · 1038 views · 0 comments
DigitRecognizer
from sklearn.ensemble import RandomForestClassifier
import numpy as np
import pandas as pd
dataset=pd.read_csv('input/train.csv')
test=pd.read_csv('input/test.csv')
dataset.describe()
…
Original · 2016-09-06 19:01:53 · 628 views · 0 comments
deeplearning_cnn_theano
#### Libraries
# Standard library
import gzip
import pickle
# Third-party libraries
import numpy as np
import theano
import theano.tensor as T
from theano.tensor.nnet import conv
from theano.tensor.nne…
Original · 2016-10-08 20:07:25 · 851 views · 1 comment
logistic_regression
import numpy
import theano
import theano.tensor as T
rng = numpy.random
N = 400 # training sample size
feats = 784 # number of input varia…
Original · 2016-09-07 11:33:04 · 1107 views · 0 comments
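Implementing the k-means algorithm on Hadoop: a MapReduce implementation
To implement k-means as a MapReduce program, our approach might be as follows:
1. Use a global variable to hold the centroids from the previous iteration.
2. In map, compute the distance between each centroid and the sample, find the centroid closest to the sample, and emit that centroid as the key with the sample as the value.
3. In reduce, the input key is a centroid and the values are the samples assigned to it; recompute the cluster center and put it into a global variable t.
4. In main, compare the previous centroids with the current ones to see whether…
Repost · 2017-04-24 10:36:29 · 925 views · 0 comments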
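The steps above can be sketched in plain Python that only imitates the map and reduce phases; it does not use the Hadoop API. All function names are hypothetical. The key emitted by map is the index of the nearest centroid rather than the centroid itself, and, because step 4 is cut off in the excerpt, the stopping rule (stop once no centroid moves by more than `tol`) is an assumption.

```python
import math
from collections import defaultdict

def nearest(centroids, sample):
    """Index of the centroid closest to the sample (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(centroids[i], sample))

def map_phase(centroids, samples):
    """Emit (key, value) pairs: key = nearest centroid index, value = sample."""
    for sample in samples:
        yield nearest(centroids, sample), sample

def reduce_phase(pairs, centroids):
    """Group samples by key and recompute each cluster center as the mean."""
    groups = defaultdict(list)
    for key, sample in pairs:
        groups[key].append(sample)
    updated = list(centroids)  # a centroid with no samples keeps its old position
    for key, members in groups.items():
        updated[key] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return updated

def kmeans(samples, centroids, tol=1e-9, max_iter=100):
    """Driver loop: repeat map + reduce until the centroids stop moving."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(centroids, samples), centroids)
        shift = max(math.dist(old, new) for old, new in zip(centroids, updated))
        centroids = updated
        if shift <= tol:  # assumed convergence test
            break
    return centroids

# Toy data (assumed): two well-separated groups of 2-D points.
samples = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
final = kmeans(samples, [(0.0, 0.0), (10.0, 10.0)])
print(final)
```

In a real Hadoop job the centroids would have to be shared with every mapper between iterations, which is the role the global variable plays in the description.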