Program = Algorithm + Data + Structure + Model + Tools + OS
- OS
UNIX:
Shell:
GPU: Parallel Cuda
Simulation Demo not sample:
Network: TCP/IP, HTTPS, sockets, sessions, SSL
- Tools
NoSQL: distributed, real-time, massive concurrent throughput
SQL: MySQL, Oracle
Mini SQL: Excel
Graph SQL: Neo4j, OrientDB
Statistical programming languages: Python, Scala, R, or MATLAB/Octave
Python:
Third-party tools: OpenCV, OpenGL, OpenCL, OpenMP
Third-party ML tools: Caffe, MXNet, TensorFlow, Torch
- Model:
Math Model:
Design Model:
UML:
- Structure:
Basic: Stack, Heap, Tree, Map
Advanced: Thread, JVM
- Data
BI: Power BI, FineBI, Tableau
SQL
NOSQL
GraphSQL: Neo4j, JanusGraph, GCN
- Algorithm:
- Classic Algorithms: Ranking
(Relevance Ranking Models):
(Boolean Model), (Vector Space Model), (Latent Semantic Analysis), BM25, LMIR
(Importance Ranking Models)
PageRank, HITS, HillTop, TrustRank
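The importance ranking models above can be illustrated with a minimal PageRank power iteration; the toy graph, damping factor, and iteration count here are made up for the sketch.

```python
# Minimal PageRank power iteration on a toy link graph (illustrative only).

def pagerank(links, damping=0.85, iters=50):
    """links: dict mapping node -> list of outbound neighbours."""
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v, outs in links.items():
            if not outs:                      # dangling node: spread evenly
                for u in nodes:
                    new[u] += damping * rank[v] / n
            else:
                for u in outs:
                    new[u] += damping * rank[v] / len(outs)
        rank = new
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```

Since each iteration redistributes all the mass, the ranks stay a probability distribution; here "c" outranks "b" because it receives links from both "a" and "b".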
Learning to Rank (LTR)
Information Retrieval (IR), Natural Language Processing (NLP), and Data Mining (DM)
1) WTA (Winner Takes All): for a given query q, WTA(q) = 1 if the first document in the returned list is relevant, and 0 otherwise.
2) MRR (Mean Reciprocal Rank): for a given query q, if the position of the first relevant document is R(q), then MRR(q) = 1/R(q).
3) MAP (Mean Average Precision): for each truly relevant document d, take its position P(d) in the model's ranking, compute the precision over the documents up to that position, and average these precisions.
4) NDCG (Normalized Discounted Cumulative Gain): a metric that jointly considers the model's ranking and the true ordering; the most commonly used ranking metric (see Wikipedia).
5) RC (Rank Correlation)
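The listwise metrics above can be sketched in a few lines, assuming each query's result is represented as a list of 0/1 relevance labels in ranked order (toy data, not a real IR run):

```python
# WTA, MRR (single-query reciprocal rank), and average precision
# over a ranked list of binary relevance labels.

def wta(rels):
    # 1 if the top-ranked result is relevant, else 0.
    return 1.0 if rels and rels[0] == 1 else 0.0

def mrr(rels):
    # Reciprocal rank of the first relevant document.
    for i, r in enumerate(rels, 1):
        if r == 1:
            return 1.0 / i
    return 0.0

def average_precision(rels):
    # Mean of precision@k taken at each relevant position.
    hits, total = 0, 0.0
    for i, r in enumerate(rels, 1):
        if r == 1:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0

ranked = [0, 1, 0, 1]        # relevance of the results at ranks 1..4
```

For `[0, 1, 0, 1]`: WTA is 0, MRR is 1/2, and AP averages precision at ranks 2 and 4.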
-
- The "Big 3": Classification, Clustering, Regression:
NB: Naive Bayes
Classification Algorithm    | Accuracy | F1-Score
Logistic Regression         | 84.60%   | 0.6337
Naive Bayes                 | 80.11%   | 0.6005
Stochastic Gradient Descent | 82.20%   | 0.5780
K-Nearest Neighbours        | 83.56%   | 0.5924
Decision Tree               | 84.23%   | 0.6308
Random Forest               | 84.33%   | 0.6275
Support Vector Machine      | 84.09%   | 0.6145
Regression:
k-nearest neighbors
Linear Regression (LASSO, Ridge, and Elastic-Net) (regularized): L0, L1, L2 norms against overfitting
L1-regularized Logistic Regression (L1 norm)
L2-regularized Logistic Regression (L2 norm)
Regression Tree:
Decision Tree: Regressor, SVR, Bayes
Logistic Regression:
Random forests: (classification tree)
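A minimal sketch of the L2-regularized logistic regression entry above, trained with batch gradient descent; the toy data, learning rate, and regularization strength are made up for illustration:

```python
import math

# L2-regularized logistic regression via batch gradient descent.
# Objective: average log-loss + (lam/2) * ||w||^2.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, lam=0.1, lr=0.5, epochs=200):
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        gw = [lam * wj for wj in w]          # gradient of the L2 penalty
        gb = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            for j, xj in enumerate(xi):
                gw[j] += (p - yi) * xj / n   # averaged log-loss gradient
            gb += (p - yi) / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

# Linearly separable toy data: label is 1 when x0 > x1.
X = [[2, 1], [3, 0], [1, 2], [0, 3]]
y = [1, 1, 0, 0]
w, b = train(X, y)
```

The L2 penalty keeps the weights finite even on separable data, which is exactly the overfitting control the list above refers to.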
(CV) Classification
Detection:signal detection
Recognition
-
- Kernel Methods:
[SVM] rankings, clusters, or classifications
Logistic
Softmax
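The softmax entry above generalizes the logistic function to multiple classes; a numerically stable version shifts by the maximum score before exponentiating:

```python
import math

# Numerically stable softmax: subtracting the max score leaves the
# result unchanged but avoids overflow in exp().

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
```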
-
- Ensemble: Voting, Averaging, Random Forest,
Bagging, Blending, Boosting, Stacking
Bagging+ Decision Tree= Random Forest (RF)
AdaBoost + Decision Tree = Boosting Tree
Gradient Boosting + Decision Tree = GBDT
Voting:
Averaging:
Random Forest (RF): an alternative to Bagging (m = p)
Bagging: bootstrap sampling; classification -> voting, regression -> averaging.
Blending:
Boosting:
Bootstrap:
AdaBoost: (target recognition, face detection)
GBDT: MART (Multiple Additive Regression Tree),
GBRT (Gradient Boosting Regression Tree)
Loss Function:
XGBoost:
Stacking:
5-Fold Stacking
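The Voting entry above can be sketched with three stand-in "models" (plain threshold functions here; real ones would be trained classifiers) and a hard majority vote:

```python
from collections import Counter

# Hard-voting ensemble: each model predicts a class label and the
# majority label wins. The three models are illustrative stand-ins.

def model_a(x): return 1 if x > 0 else 0
def model_b(x): return 1 if x > 1 else 0
def model_c(x): return 1 if x > -1 else 0

def vote(models, x):
    preds = [m(x) for m in models]
    return Counter(preds).most_common(1)[0][0]

pred = vote([model_a, model_b, model_c], 0.5)
```

At x = 0.5 the votes are [1, 0, 1], so the ensemble predicts 1 even though one model disagrees; averaging predicted probabilities instead gives the soft-voting / Averaging variant.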
-
- Dimensionality reduction :
LDA: Linear Discriminant Analysis [Supervised]
Fisher Linear Discriminant FLD
PCA: Principal component analysis [ Unsupervised ]
SVD: Singular value decomposition
FA: Factor Analysis
ICA: Independent Component Analysis
LPP: An alternative to PCA
LLE: Locally linear embedding
t-SNE: t-distributed Stochastic Neighbor Embedding
LEP:
UV:
Missing Values Ratio
Low Variance Filter
High Correlation Filter
Random Forests
Backward Feature Elimination
Forward Feature Construction
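The Low Variance Filter listed above is the simplest of these reduction techniques; a sketch on toy data (the threshold is arbitrary):

```python
# Low Variance Filter: drop feature columns whose variance falls
# below a threshold, since near-constant features carry little signal.

def variance(col):
    m = sum(col) / len(col)
    return sum((v - m) ** 2 for v in col) / len(col)

def low_variance_filter(rows, threshold=0.01):
    cols = list(zip(*rows))
    keep = [j for j, c in enumerate(cols) if variance(c) > threshold]
    return [[row[j] for j in keep] for row in rows], keep

data = [[1.0, 5.0], [1.0, 3.0], [1.0, 4.0]]   # column 0 is constant
filtered, kept = low_variance_filter(data)
```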
-
- Expectation Maximization (EM):
(HMM, GMM, LDA, MLE) non-gradient optimization
EM & GMM:
EM & K-means:
· k-means is the special case of Gaussian mixture clustering in which the component variances are equal and each sample is assigned to exactly one component. The relationship between k-means and EM:
· k-means alternates two steps: fix the centres, then assign each sample to its nearest centre --> the E-step and M-step.
· The E-step assigns each point to the class whose centre is nearest (hard assignment), which can be seen as an approximation of the EM algorithm's E-step (soft assignment).
· The M-step updates each class centre, which maximizes the likelihood under the assumption that each class is a unit-variance Gaussian.
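The hard-EM view above can be sketched directly; toy 1-D points and fixed initial centres are made up for illustration:

```python
# K-means as hard EM: the E-step assigns each point to its nearest
# centre (hard assignment), the M-step recomputes each centre as the
# mean of its assigned points.

def kmeans(points, centres, iters=10):
    for _ in range(iters):
        clusters = [[] for _ in centres]
        for p in points:                                   # E-step
            nearest = min(range(len(centres)),
                          key=lambda i: abs(p - centres[i]))
            clusters[nearest].append(p)
        centres = [sum(c) / len(c) if c else centres[i]    # M-step
                   for i, c in enumerate(clusters)]
    return centres

centres = kmeans([1.0, 1.2, 0.8, 5.0, 5.2, 4.8], [0.0, 6.0])
```

With these points the centres settle at the two cluster means, 1.0 and 5.0.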
-
- Nearest Neighbors:
K-means: (Clustering)
Affinity Propagation: (Clustering)
Hierarchical / Agglomerative: (Clustering)
DBSCAN: (Clustering)
KNN:
PageRank:
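A minimal k-nearest-neighbours classifier for the KNN entry above, on toy 2-D points with k = 3:

```python
import math
from collections import Counter

# KNN: predict the majority label among the k training points
# nearest to the query (Euclidean distance).

def knn_predict(train, query, k=3):
    """train: list of ((x, y), label) pairs."""
    by_dist = sorted(train, key=lambda t: math.dist(t[0], query))
    labels = [label for _, label in by_dist[:k]]
    return Counter(labels).most_common(1)[0][0]

train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
label = knn_predict(train, (1, 1))
```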
-
- Correlation:
Apriori:Data mining
Affinity Propagation
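One level of the Apriori idea above, sketched on made-up transactions: count single items, keep the frequent ones, and build candidate pairs only from frequent items (the support threshold is arbitrary):

```python
from itertools import combinations
from collections import Counter

# Apriori pruning for pairs: a pair can only be frequent if both of
# its items are frequent, so infrequent items are dropped first.

def frequent_pairs(transactions, min_support=2):
    item_counts = Counter(i for t in transactions for i in set(t))
    frequent = {i for i, c in item_counts.items() if c >= min_support}
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t) & frequent)
        for pair in combinations(items, 2):
            pair_counts[pair] += 1
    return {p: c for p, c in pair_counts.items() if c >= min_support}

tx = [["milk", "bread"], ["milk", "bread", "eggs"], ["bread", "eggs"]]
pairs = frequent_pairs(tx)
```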
-
- Neural networks(NN) -> 9
- Math:
Linear algebra
symmetric matrix
Orthogonal matrix
Probability and statistics
Numerical optimization
Multivariable Calculus
Ordinary Least Squares Regression
Stepwise Regression
Multivariate Adaptive Regression Splines
- Tuning: performance index
Tuning of Bugs:
Tuning of Concurrent
Tuning of Online:
Tuning of ML: PCA
SGD,Adagrad,Adadelta,Adam,Adamax,Nadam
         | Predicted 1         | Predicted 0
Actual 1 | True Positive (TP)  | False Negative (FN)
Actual 0 | False Positive (FP) | True Negative (TN)
-
- Classification Performance index
Statistics: Precision (P), Recall (R), F1
Accuracy: (acc)
Error rate: (1 - acc )
Precision = correctly extracted items / all extracted items
Recall = correctly extracted items / all relevant items in the sample
F-measure: F1 = 2PR / (P + R) ——> 2/F1 = 1/P + 1/R
GooSeeker, Specificity, ROC, AUC
ROC (Receiver Operating Characteristic)
AUC
PSI (Population Stability Index) = sum((actual% - expected%) * ln(actual% / expected%))
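The statistics above follow directly from the TP/FN/FP/TN counts of the confusion matrix; the counts here are toy numbers:

```python
# Precision, recall, F1, and accuracy from confusion-matrix counts.

def classification_metrics(tp, fn, fp, tn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return precision, recall, f1, accuracy

p, r, f1, acc = classification_metrics(tp=8, fn=2, fp=4, tn=6)
```

With these counts: P = 8/12, R = 8/10, and F1 = 2PR/(P+R) = 8/11, matching the harmonic-mean identity 2/F1 = 1/P + 1/R.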
-
- GD:
SGD: Stochastic Gradient Descent
SAG: Stochastic Average Gradient
MBGD: Mini-Batch Gradient Descent
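A mini-batch SGD sketch fitting a one-parameter linear model y = w*x by squared error; the data (true slope 3), batch size, and learning rate are illustrative:

```python
import random

# Mini-batch SGD: shuffle each epoch, then step on the averaged
# gradient of the squared error over each small batch.

random.seed(0)
data = [(x, 3.0 * x) for x in [0.5, 1.0, 1.5, 2.0, 2.5]]

w = 0.0
for epoch in range(200):
    random.shuffle(data)
    for i in range(0, len(data), 2):          # batches of size 2
        batch = data[i:i + 2]
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= 0.05 * grad
```

Batch size 1 gives plain SGD and batch size len(data) gives full-batch gradient descent; MBGD sits in between.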
-
- Entropy:
Conditional entropy
Information gain
Information gain ratio
Gini index
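The split criteria above, computed on a toy label list:

```python
import math
from collections import Counter

# Entropy, Gini index, and information gain of a candidate split.

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(labels).values())

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, splits):
    # Parent entropy minus the size-weighted entropy of the children.
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in splits)

parent = ["y", "y", "n", "n"]
gain = information_gain(parent, [["y", "y"], ["n", "n"]])
```

A 50/50 parent has entropy 1 bit and Gini 0.5; a perfectly separating split yields the maximal gain of 1 bit. Gain ratio divides this gain by the split's own entropy.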
-
- Regression Performance index
SSE, MSE, RMSE, MAE, R^2 (R-squared)
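The regression indices above, computed for a toy set of predictions:

```python
import math

# SSE, MSE, RMSE, MAE, and R^2 for paired true/predicted values.

def regression_metrics(y_true, y_pred):
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(e ** 2 for e in errors)           # sum of squared errors
    mse = sse / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    mean = sum(y_true) / n
    sst = sum((t - mean) ** 2 for t in y_true)  # total variance around the mean
    r2 = 1.0 - sse / sst
    return sse, mse, rmse, mae, r2

sse, mse, rmse, mae, r2 = regression_metrics([1, 2, 3, 4],
                                             [1.1, 1.9, 3.2, 3.8])
```

R^2 compares the model's squared error against always predicting the mean, so 1 is a perfect fit and 0 is no better than the mean.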
- ANN DNN
AlexNet, GoogLeNet, Fast/Faster R-CNN, SSD, YOLO, SegNet
- DL ML (deep learning)
• Supervised learning
• Unsupervised learning
• Reinforcement learning
• Multi-task learning
• Cross-validation
- Digital Signal:
Signal Processing: Short Time Fourier Transform, Moving Median Filtering,
Singular Value Decomposition
Text: NLP
NUM: observing data -> finding features -> designing algorithms -> validating algorithms -> cleaning data -> engineering -> monitoring the online effect -> back to observing data
Image:
Video: Optical Flow Field, Edge Extraction, Feature Point Extraction, SVM, AdaBoost, Neural Network
- Terminal
Oral Textbook
- Open framework
Face Recognize:
Recommendation: DeepFM, Wide & Deep, DIN
Search
Ads
User portrait
DBpedia, Freebase, YAGO, OpenKG