Machine Learning From Scratch
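About
This project contains Python implementations of some of the fundamental machine learning models and algorithms.
The aim is not to produce the most optimized or computationally efficient algorithms, but to present their inner workings in a transparent and accessible way.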
Installation
$ git clone https://github.com/eriklindernoren/ML-From-Scratch
$ cd ML-From-Scratch
$ python setup.py install
Examples
Polynomial Regression
$ python mlfromscratch/examples/polynomial_regression.py
Figure: Training progress of a regularized polynomial regression model fitting temperature data measured in Linköping, Sweden, in 2016.
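As a rough illustration of what a regularized polynomial regression fit involves, here is a minimal sketch that assumes an L2 (ridge) penalty and plain gradient descent. The function names, hyperparameters, and synthetic data are illustrative assumptions; they are not the project's API and not the Linköping temperature data.

```python
import numpy as np

def polynomial_features(x, degree):
    """Stack the columns x, x^2, ..., x^degree."""
    return np.column_stack([x ** d for d in range(1, degree + 1)])

def fit_polynomial_ridge(x, y, degree=2, reg=0.001, lr=0.1, n_iter=5000):
    """Minimize 0.5 * mean squared error + 0.5 * reg * ||w||^2 by gradient descent.

    The bias b is not penalized (an assumption of this sketch).
    """
    X = polynomial_features(x, degree)
    w = np.zeros(degree)
    b = 0.0
    for _ in range(n_iter):
        err = X @ w + b - y
        w -= lr * (X.T @ err / len(y) + reg * w)
        b -= lr * err.mean()
    return w, b

# Synthetic data (hypothetical): a noisy quadratic on [-1, 1]
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
y = 1.0 + 2.0 * x - 1.5 * x ** 2 + rng.normal(scale=0.05, size=x.shape)

w, b = fit_polynomial_ridge(x, y)
mse = np.mean((polynomial_features(x, 2) @ w + b - y) ** 2)
print("weights:", w, "bias:", b, "mse:", mse)
```

Increasing `reg` shrinks the polynomial weights toward zero, which is the usual way to trade a little bias for a smoother fit.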
Classification With CNN
$ python mlfromscratch/examples/convolutional_neural_network.py
+---------+
| ConvNet |
+---------+
Input Shape: (1, 8, 8)
+----------------------+------------+--------------+
| Layer Type           | Parameters | Output Shape |
+----------------------+------------+--------------+
| Conv2D               | 160        | (16, 8, 8)   |
| Activation (ReLU)    | 0          | (16, 8, 8)   |
| Dropout              | 0          | (16, 8, 8)   |
| BatchNormalization   | 2048       | (16, 8, 8)   |
| Conv2D               | 4640       | (32, 8, 8)   |
| Activation (ReLU)    | 0          | (32, 8, 8)   |
| Dropout              | 0          | (32, 8, 8)   |
| BatchNormalization   | 4096       | (32, 8, 8)   |
| Flatten              | 0          | (2048,)      |
| Dense                | 524544     | (256,)       |
| Activation (ReLU)    | 0          | (256,)       |
| Dropout              | 0          | (256,)       |
| BatchNormalization   | 512        | (256,)       |
| Dense                | 2570       | (10,)        |
| Activation (Softmax) | 0          | (10,)        |
+----------------------+------------+--------------+
Total Parameters: 538570
Training: 100% [------------------------------------------------------------------------] Time: 0:01:55
Accuracy: 0.987465181058
Figure: Classification of the digit dataset using CNN.
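The parameter counts printed in the ConvNet summary can be reproduced by hand. The sketch below assumes 3x3 filters for both Conv2D layers (the filter size is not printed, but it is the size consistent with the counts) and that BatchNormalization keeps one scale and one shift value per input element. The helper names are hypothetical.

```python
from math import prod

def conv2d_params(in_channels, n_filters, kernel=(3, 3)):
    # One weight per (filter, input channel, kernel cell) plus one bias per filter
    return n_filters * in_channels * kernel[0] * kernel[1] + n_filters

def dense_params(n_in, n_out):
    # Weight matrix plus one bias per output unit
    return n_in * n_out + n_out

def batchnorm_params(input_shape):
    # One scale and one shift per input element (assumed from the printed counts)
    return 2 * prod(input_shape)

counts = [
    conv2d_params(1, 16),          # first Conv2D
    batchnorm_params((16, 8, 8)),  # BatchNormalization after first Conv2D
    conv2d_params(16, 32),         # second Conv2D
    batchnorm_params((32, 8, 8)),  # BatchNormalization after second Conv2D
    dense_params(2048, 256),       # Dense after Flatten
    batchnorm_params((256,)),      # BatchNormalization after Dense
    dense_params(256, 10),         # output Dense
]
total = sum(counts)
print(counts, total)
```

Activation, Dropout, and Flatten layers contribute no trainable parameters, which is why they show 0 in the summary.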
Density-Based Clustering
$ python mlfromscratch/examples/dbscan.py
Figure: Clustering of the moons dataset using DBSCAN.
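For readers unfamiliar with density-based clustering, the following is a minimal DBSCAN sketch, assuming Euclidean distance and a brute-force neighbor search. The function name, the `eps` and `min_samples` parameter names, and the toy data are assumptions for illustration, not the project's implementation or the moons dataset.

```python
import numpy as np

def dbscan(X, eps=0.5, min_samples=5):
    """Return one cluster label per point; -1 marks noise."""
    n = len(X)
    # Brute-force pairwise Euclidean distances (O(n^2) memory, fine for small data)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Each neighborhood includes the point itself
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        if len(neighbors[i]) < min_samples:
            continue  # not a core point; stays noise unless a cluster reaches it
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border or core point joins this cluster
            if visited[j]:
                continue
            visited[j] = True
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # only core points expand the cluster
        cluster += 1
    return labels

# Toy data (hypothetical): two tight blobs and one far-away outlier
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.1, size=(30, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.1, size=(30, 2))
outlier = np.array([[2.5, -3.0]])
X = np.vstack([blob_a, blob_b, outlier])
labels = dbscan(X, eps=1.0, min_samples=5)
print(labels)
```

Unlike K-Means, this approach does not need the number of clusters in advance, and points in sparse regions are left unassigned as noise.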
Generating Handwritten Digits
$ python mlfromscratch/unsupervised_learning/generative_adversarial_network.py
+-----------+
| Generator |
+-----------+
Input Shape: (100,)
+------------------------+------------+--------------+
| Layer Type             | Parameters | Output Shape |
+------------------------+------------+--------------+
| Dense                  | 25856      | (256,)       |
| Activation (LeakyReLU) | 0          | (256,)       |
| BatchNormalization     | 512        | (256,)       |
| Dense                  | 131584     | (512,)       |
| Activation (LeakyReLU) | 0          | (512,)       |
| BatchNormalization     | 1024       | (512,)       |
| Dense                  | 525312     | (1024,)      |
| Activation (LeakyReLU) | 0          | (1024,)      |
| BatchNormalization     | 2048       | (1024,)      |
| Dense                  | 803600     | (784,)       |
| Activation (TanH)      | 0          | (784,)       |
+------------------------+------------+--------------+
Total Parameters: 1489936
+---------------+
| Discriminator |
+---------------+
Input Shape: (784,)
+------------------------+------------+--------------+
| Layer Type             | Parameters | Output Shape |
+------------------------+------------+--------------+
| Dense                  | 401920     | (512,)       |
| Activation (LeakyReLU) | 0          | (512,)       |
| Dropout                | 0          | (512,)       |
| Dense                  | 131328     | (256,)       |
| Activation (LeakyReLU) | 0          | (256,)       |
| Dropout                | 0          | (256,)       |
| Dense                  | 514        | (2,)         |
| Activation (Softmax)   | 0          | (2,)         |
+------------------------+------------+--------------+
Total Parameters: 533762
Figure: Training progress of a Generative Adversarial Network generating handwritten digits.
Deep Reinforcement Learning
$ python mlfromscratch/examples/deep_q_network.py
+----------------+
| Deep Q-Network |
+----------------+
Input Shape: (4,)
+-------------------+------------+--------------+
| Layer Type        | Parameters | Output Shape |
+-------------------+------------+--------------+
| Dense             | 320        | (64,)        |
| Activation (ReLU) | 0          | (64,)        |
| Dense             | 130        | (2,)         |
+-------------------+------------+--------------+
Total Parameters: 450
Figure: Deep Q-Network solution to the CartPole-v1 environment in OpenAI Gym.
Image Reconstruction With RBM
$ python mlfromscratch/examples/restricted_boltzmann_machine.py
Figure: How the network improves during training at reconstructing the digit 2 from the MNIST dataset.
Evolved Neural Network
$ python mlfromscratch/examples/neuroevolution.py
+---------------+
| Model Summary |
+---------------+
Input Shape: (64,)
+----------------------+------------+--------------+
| Layer Type           | Parameters | Output Shape |
+----------------------+------------+--------------+
| Dense                | 1040       | (16,)        |
| Activation (ReLU)    | 0          | (16,)        |
| Dense                | 170        | (10,)        |
| Activation (Softmax) | 0          | (10,)        |
+----------------------+------------+--------------+
Total Parameters: 1210
Population Size: 100
Generations: 3000
Mutation Rate: 0.01
[0 Best Individual - Fitness: 3.08301, Accuracy: 10.5%]
[1 Best Individual - Fitness: 3.08746, Accuracy: 12.0%]
...
[2999 Best Individual - Fitness: 94.08513, Accuracy: 98.5%]
Test set accuracy: 96.7%
Figure: Classification of the digit dataset by a neural network evolved with an evolutionary algorithm.
Genetic Algorithm
$ python mlfromscratch/examples/genetic_algorithm.py
+--------+
|   GA   |
+--------+
Description: Implementation of a Genetic Algorithm which aims to produce
the user specified target string. This implementation calculates each
candidate's fitness based on the alphabetical distance between the candidate
and the target. A candidate is selected as a parent with probabilities proportional
to the candidate's fitness. Reproduction is implemented as a single-point
crossover between pairs of parents. Mutation is done by randomly assigning
new characters with uniform probability.
Parameters
----------
Target String: 'Genetic Algorithm'
Population Size: 100
Mutation Rate: 0.05
[0 Closest Candidate: 'CJqlJguPlqzvpoJmb', Fitness: 0.00]
[1 Closest Candidate: 'MCxZxdr nlfiwwGEk', Fitness: 0.01]
[2 Closest Candidate: 'MCxZxdm nlfiwwGcx', Fitness: 0.01]
[3 Closest Candidate: 'SmdsAklMHn kBIwKn', Fitness: 0.01]
[4 Closest Candidate: ' lotneaJOasWfu Z', Fitness: 0.01]
...
[292 Closest Candidate: 'GeneticaAlgorithm', Fitness: 1.00]
[293 Closest Candidate: 'GeneticaAlgorithm', Fitness: 1.00]
[294 Answer: 'Genetic Algorithm']
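The description printed above (fitness from alphabetical distance, fitness-proportional parent selection, single-point crossover, uniform random mutation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: the alphabet ordering (space first, then ASCII letters), the fitness form `1 / (loss + 1e-6)`, and all function names are assumed. Under these assumptions a candidate one alphabet step away from the target scores about 1.00, which is consistent with the log above.

```python
import random
import string

# Assumed alphabet: space first, then a-z and A-Z
LETTERS = " " + string.ascii_letters

def fitness(candidate, target):
    # Loss is the summed alphabetical distance between corresponding characters
    loss = sum(abs(LETTERS.index(c) - LETTERS.index(t))
               for c, t in zip(candidate, target))
    return 1.0 / (loss + 1e-6)

def mutate(individual, mutation_rate, rng):
    # Each character is replaced by a uniformly random one with probability mutation_rate
    return "".join(rng.choice(LETTERS) if rng.random() < mutation_rate else ch
                   for ch in individual)

def crossover(parent1, parent2, rng):
    # Single-point crossover: swap the tails after a random cut point
    point = rng.randrange(len(parent1))
    return parent1[:point] + parent2[point:], parent2[:point] + parent1[point:]

def run(target, population_size=100, mutation_rate=0.05,
        max_generations=1000, seed=0):
    """Return (generation, best candidate). population_size must be even."""
    rng = random.Random(seed)
    population = ["".join(rng.choice(LETTERS) for _ in target)
                  for _ in range(population_size)]
    best = population[0]
    for generation in range(max_generations):
        scores = [fitness(ind, target) for ind in population]
        best = population[scores.index(max(scores))]
        if best == target:
            return generation, best
        # Parents are drawn with probability proportional to fitness
        parents = rng.choices(population, weights=scores, k=population_size)
        population = []
        for i in range(0, population_size, 2):
            child1, child2 = crossover(parents[i], parents[i + 1], rng)
            population.append(mutate(child1, mutation_rate, rng))
            population.append(mutate(child2, mutation_rate, rng))
    return max_generations, best

generation, best = run("Hello")
print(generation, repr(best))
```

Because selection, crossover, and mutation are all random, the number of generations needed depends on the seed and on the target length.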
Association Analysis
$ python mlfromscratch/examples/apriori.py
+-------------+
|   Apriori   |
+-------------+
Minimum Support: 0.25
Minimum Confidence: 0.8
Transactions:
[1, 2, 3, 4]
[1, 2, 4]
[1, 2]
[2, 3, 4]
[2, 3]
[3, 4]
[2, 4]
Frequent Itemsets:
[1, 2, 3, 4, [1, 2], [1, 4], [2, 3], [2, 4], [3, 4], [1, 2, 4], [2, 3, 4]]
Rules:
1 -> 2 (support: 0.43, confidence: 1.0)
4 -> 2 (support: 0.57, confidence: 0.8)
[1, 4] -> 2 (support: 0.29, confidence: 1.0)
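The support and confidence values in the rules above follow directly from the transaction list. The sketch below recomputes them; the function names are illustrative and not taken from the project.

```python
transactions = [
    [1, 2, 3, 4],
    [1, 2, 4],
    [1, 2],
    [2, 3, 4],
    [2, 3],
    [3, 4],
    [2, 4],
]

def support(itemset, transactions):
    # Fraction of transactions that contain every item in the itemset
    items = set(itemset)
    return sum(items <= set(t) for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    # support(antecedent and consequent together) / support(antecedent)
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

for antecedent, consequent in [([1], [2]), ([4], [2]), ([1, 4], [2])]:
    s = support(antecedent + consequent, transactions)
    c = confidence(antecedent, consequent, transactions)
    print(f"{antecedent} -> {consequent} (support: {s:.2f}, confidence: {c:.1f})")
```

For example, items 2 and 4 appear together in 4 of the 7 transactions (support 0.57), and item 4 appears in 5 of them, so the rule 4 -> 2 has confidence 4/5 = 0.8.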
Implementations
Supervised Learning
- Adaboost
- Bayesian Regression
- Decision Tree
- Elastic Net
- Gradient Boosting
- K Nearest Neighbors
- Lasso Regression
- Linear Discriminant Analysis
- Linear Regression
- Logistic Regression
- Multi-class Linear Discriminant Analysis
- Multilayer Perceptron
- Naive Bayes
- Neuroevolution
- Particle Swarm Optimization of Neural Network
- Perceptron
- Polynomial Regression
- Random Forest
- Ridge Regression
- Support Vector Machine
- XGBoost
Unsupervised Learning
- Apriori
- Autoencoder
- DBSCAN
- FP-Growth
- Gaussian Mixture Model
- Generative Adversarial Network
- Genetic Algorithm
- K-Means
- Partitioning Around Medoids
- Principal Component Analysis
- Restricted Boltzmann Machine
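Reinforcement Learning
Deep Learning
- Neural Network
  - Layers
    - Activation Layer
    - Average Pooling Layer
    - Batch Normalization Layer
    - Constant Padding Layer
    - Convolutional Layer
    - Dropout Layer
    - Flatten Layer
    - Fully-Connected (Dense) Layer
    - Fully-Connected RNN Layer
    - Max Pooling Layer
    - Reshape Layer
    - Up Sampling Layer
    - Zero Padding Layer
  - Model Types
    - Convolutional Neural Network
    - Multilayer Perceptron
    - Recurrent Neural Network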
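Contact
If there is an implementation you would like to see here, or if you are just feeling social, feel free to email me or connect with me on LinkedIn.
If you have read this far: what does Machine Learning From Scratch have to do with Scratch, the programming language for children?
Nothing at all.
"From scratch" simply means starting from zero.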