weixin_1822045735 - CSDN Blog

Original: Graph Neural Network (GNN)

Graph Neural Network (GNN)

2023-03-01 20:46:03 704

Original: Kalman Filter and Unscented Kalman Filter (Unscented KF)

Kalman Filter and Unscented Kalman Filter (Unscented KF)

2021-12-14 22:54:28 1663

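Original: DBSCAN: Density-Based Clustering with Noise

DBSCAN views clusters as high-density regions set apart from low-density regions. Because of this rather general view, DBSCAN can find clusters of any shape, unlike K-Means, which minimizes the within-cluster sum of squares and therefore works well for convex shapes. DBSCAN also determines the number of clusters automatically. Its central component is the concept of a core sample: a point that has at least minPts other data points within a distance of eps. minPts: the minimum number of data points needed to form a dense region. eps: a distance measure that specifies the neighborhood of any data point. If the distance between two data points is less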

2021-10-31 16:50:01 797
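
The core-sample definition in the DBSCAN excerpt above can be illustrated with a minimal sketch. It follows the excerpt's wording (at least minPts other points within eps). The function name `core_samples`, the sample points, and the brute-force distance scan are assumptions made for illustration, not code from the post.

```python
import math


def core_samples(points, eps, min_pts):
    """Return indices of points that have at least min_pts OTHER points within eps."""
    core = []
    for i, p in enumerate(points):
        # Count neighbours of p, excluding p itself (as in the excerpt's wording).
        neighbours = sum(
            1
            for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= eps
        )
        if neighbours >= min_pts:
            core.append(i)
    return core


# Hypothetical data: a tight square of four points plus one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
core = core_samples(points, eps=0.2, min_pts=3)
print(core)
```

In this toy data the four points of the square are core samples and the isolated point is not. Note that some libraries count the point itself toward the threshold, so the same parameter value can behave differently there.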

Original: Random Sample Consensus (RANSAC)

Random sample consensus (RANSAC) is an iterative method to estimate parameters of a mathematical model from a set of observed data that contains outliers, when outliers are to be accorded no influence on the values of the estimates. Therefore, it also can

2021-09-05 17:15:55 1767

Original: Grid-based Clustering

The grid-based technique is used for a multi-dimensional dataset. In this technique, we create a grid structure, and the comparison is performed on grids (also known as cells). The grid-based technique is fast and has low computational complexity. --wiki

2021-08-03 23:08:56 2620 4

Original: Density-based Clustering -- Kernel Density Estimation

In density-based clustering, clusters are defined as areas of higher density than the remainder of the data set. Objects in sparse areas, which are required to separate clusters, are usually considered to be noise and border points. --wiki

2021-06-29 09:54:03 3494

Original: Distribution-based Clustering -- Gaussian Mixture Model (GMM)

The clustering model most closely related to statistics is based on distribution models. Clusters can then easily be defined as objects belonging most likely to the same distribution. A convenient property of this approach is that this closely resembles th

2021-05-31 13:50:32 3008

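Original: Centroid-based Clustering -- k-means

In centroid-based clustering, each cluster is represented by a central vector, which is not necessarily a member of the data set. When the number of clusters is fixed to k, k-means clustering gives a formal definition as an optimization problem: find the cluster centers and assign each object to the nearest cluster center so that the squared distances to the cluster centers are minimized. The optimization problem itself is NP-hard (non-deterministic polynomial-time hardness). Lloyd's algorithm is an approximation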

2021-05-06 18:38:36 9594 1
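
The k-means excerpt above names Lloyd's algorithm, which alternates between assigning each point to its nearest center and moving each center to the mean of its assigned points. The following is a minimal sketch under stated assumptions: the function name `lloyd`, the fixed iteration count, the starting centers, and the toy data are all illustrative and not taken from the post.

```python
import math


def lloyd(points, centers, iterations=10):
    """Run a fixed number of Lloyd iterations; return (centers, clusters)."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)), key=lambda k: math.dist(p, centers[k])
            )
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        # An empty cluster keeps its previous center.
        centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members
            else centers[k]
            for k, members in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical data: two well-separated groups of two points each.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (9.0, 10.0)]
centers, clusters = lloyd(points, centers=[(0.0, 0.0), (10.0, 10.0)])
print(centers)
```

Because the result depends on the starting centers, practical implementations usually run several random initializations and keep the best one; this sketch skips that for brevity.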

Original: Clustering Algorithms: Hierarchical Clustering

Most of the earlier posts in this series dealt with supervised learning (apart from the PCA post); the next few posts cover clustering algorithms in unsupervised learning. 1. First, a look at cluster analysis. Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are mor

2021-04-11 12:22:35 5224

Original: Recurrent Neural Network & LSTM & GRU

2021-02-20 22:12:59 124

Original: Reading and Writing MongoDB and Redis Clusters with Python

Reading and writing data to MongoDB. MongoDB is a cross-platform document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with optional schemas. MongoDB is developed by MongoDB Inc. and licensed under the Server Side Public L

2021-01-19 14:04:27 171

Original: Ensemble Learning: Stacking

Stacking (sometimes called stacked generalization) involves training a learning algorithm to combine the predictions of several other learning algorithms. First, all of the other algorithms are trained using the available data, then a combiner algorithm i.

2020-12-29 22:00:28 793

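Original: Sequential Ensembles: Boosting -- Gradient Boosting

Like AdaBoost, covered in the previous post, Gradient Boosting works by adding predictors to an ensemble one after another. However, instead of adjusting the sample weights at every iteration as AdaBoost does, Gradient Boosting fits each new predictor to the residuals left by the previous predictor. In other words, fitting is done on the residuals, and whatever residual remains afterwards can be fitted by yet another predictor. The steps are as follows. Step 1: fit a DecisionTreeRegressor to the training set; Step 2: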

2020-12-08 15:07:57 293
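
The residual-fitting idea in the Gradient Boosting excerpt above can be sketched without any library. The excerpt uses a DecisionTreeRegressor; here a hand-written one-split regression stump stands in for it, which is an assumption made to keep the example self-contained. The names `fit_stump`, `gradient_boost`, and `training_sse`, as well as the data, are hypothetical.

```python
def fit_stump(xs, ys):
    """Fit a one-split regression stump on 1-D inputs; return a predict function.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean


def gradient_boost(xs, ys, n_stages=3):
    """Fit each new stump to the residuals left by the stumps before it."""
    stages = []
    residuals = list(ys)
    for _ in range(n_stages):
        stump = fit_stump(xs, residuals)
        stages.append(stump)
        # What the ensemble still gets wrong becomes the next target.
        residuals = [r - stump(x) for x, r in zip(xs, residuals)]
    # The ensemble prediction is the sum of all stage predictions.
    return lambda x: sum(stage(x) for stage in stages)


def training_sse(model, xs, ys):
    """Sum of squared errors of the model on the training data."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical training data.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 3.1, 3.0, 6.2, 5.9]
one_stage = gradient_boost(xs, ys, n_stages=1)
three_stages = gradient_boost(xs, ys, n_stages=3)
print(training_sse(one_stage, xs, ys), training_sse(three_stages, xs, ys))
```

On this data the training error shrinks as stages are added. The sketch omits a learning rate, which practical gradient boosting implementations typically apply to each stage to reduce overfitting.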

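Original: Sequential Ensembles (Boosting -- AdaBoost): Adaptive Boosting

A sequential ensemble is a technique that generates base learners one after another, for example adaptive boosting. Generating the base learners in sequence exploits the dependence between them. The technique combines several weak base learners: by assigning more weight to the examples that earlier learners got wrong, it learns from those mistakes to make better predictions later. Training this way yields a strong learner, which considerably improves the model's predictive performance. Boosting comes in many forms, including Gradient boosting, Adaptive Boosting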

2020-11-16 13:53:51 373

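Original: Parallel Ensembles & Bagging -- Random Forest

1. Parallel ensemble learning. Bagging: short for bootstrap aggregating (sampling with replacement). When samples are drawn to train the individual classifiers or predictors inside the ensemble, they are drawn with replacement. In the figure below, suppose the full sample is 9 balls of different colors: 4 red, 3 green, and 2 blue. We want to draw two subsets from this sample, named bag1 and bag2. Suppose bag1 draws the 4 red balls and the 2 blue balls; if the balls are not put back, bag2 can only draw from the remaining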

2020-10-22 13:07:52 237
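
The sampling-with-replacement step described in the bagging excerpt above can be sketched with the standard library. The ball counts come from the excerpt; the function name `bootstrap_bag`, the bag size of 6, and the fixed seed are assumptions for illustration.

```python
import random


def bootstrap_bag(population, size, rng):
    """Draw `size` items with replacement, so one item may appear several times."""
    return rng.choices(population, k=size)


# 9 balls as in the excerpt: 4 red, 3 green, 2 blue.
balls = ["red"] * 4 + ["green"] * 3 + ["blue"] * 2
rng = random.Random(0)  # fixed seed so the draw is repeatable

# Every bag is drawn from the full population, because drawn balls are put back.
bag1 = bootstrap_bag(balls, 6, rng)
bag2 = bootstrap_bag(balls, 6, rng)
print(bag1)
print(bag2)
```

Since each draw leaves the population unchanged, bag2 is not limited to whatever bag1 left behind, which is the point the excerpt makes about putting the balls back.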

Original: Ensemble Learning (Parallel, Sequential, Stacking)

In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. --wiki Ensemble models basically

2020-10-03 23:01:01 626

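Original: Decision Tree Classification (ID3; C4.5; CART)

1. Basic procedure. Decision trees are a common class of machine learning methods. Taking binary classification as an example, we want to learn a model from a given training set that can classify new examples. This classification task can be seen as a "decision" or "judgment" process that answers the question "does the current sample belong to this class?". As the name suggests, a decision tree makes decisions based on a tree structure, which is a very natural mechanism for people facing a decision. For example, to decide "is this a good melon?", we usually make a series of judgments or "sub-decisions": first we look at "what color is it?"; if it is "green", we then look at "what shape is its stem?"; if it is "curled", we then ask "when you tap it, what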

2020-09-12 09:47:24 670

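Original: PCA: Extracting Principal Components and Filtering Noise

In the face recognition post before this one, we used PCA to extract the principal components of the face data and reduce the dimensionality of the computation before classifying faces with an SVM. So how exactly does PCA do the extraction? This post takes a look. PCA is a method to project data in a higher dimensional space into lower dimensional space by maximizing the variance of each dimension --wiki PCA is mostly used as a tool i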

2020-08-26 13:15:07 2841

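Original: Support Vector Machine (SVM) II: Face Recognition

Face recognition is essentially the process of matching face photos to names, that is, sorting a jumble of images and labeling each with the right name. The better the model, the better the classification; the more familiar a person is, the easier it is to spot them in a crowd, and the highest degree of familiarity is probably recognizing someone even after they have turned to ash. 1. Use the LFW database as the training data set
from sklearn.datasets import fetch_lfw_people
faces=fetch_lfw_people(min_faces_per_person=60)
print(faces.target_names)
print(faces.imag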

2020-08-12 10:11:51 1196

Original: Support Vector Machine (SVM) I: Binary Classification and Projection to Higher Dimensions

1. Support vector machine (SVM). A support vector machine is a supervised machine learning model that uses classification algorithms for two-group classification problems. After giving an SVM model sets of labeled training d

2020-07-27 19:19:56 1684

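Original: Time Series Analysis for Forecasting II: SARIMA Hands-on Guide

The previous post covered the theory; this one walks through the code to make forecasts and compare the results. 1. Import the packages and data (reply "wind" in the backend to download the data)
import matplotlib.pyplot as plt
import numpy as np
from dateutil.relativedelta import relativedelta
import datetime
import time
import pandas as pd
import statsmodels.api as sm
from statsmodels.tsa.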

2020-07-08 22:16:55 3904 7

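Original: Time Series Analysis for Forecasting I

A time series is the sequence formed by arranging the values of one statistical indicator of some phenomenon at different times in chronological order. A time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. --wiki Time series analysis: Time series an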

2020-06-29 13:01:29 1115

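Original: Deploying Code to the Cloud and Installing MySQL

This post discusses how to deploy the code written earlier to a cloud service. First, a brief introduction to cloud computing: cloud computing is a form of distributed computing in which a huge data-processing program is split, via the network "cloud", into countless small programs; a system made up of multiple servers then processes and analyzes these small programs and returns the results to the user. --Baidu Baike. Types of cloud deployment (in plain terms): Public cloud: exposed to the public and usable by anyone, typically used to build cloud services for the public, such as Amazon Web Services(A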

2020-06-14 21:00:17 1471

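Original: Few-shot Learning: Siamese Neural Network

In the neural networks of earlier posts, training required thousands upon thousands of samples, and collecting and labeling that data is often costly. For example, to capture the signal of a certain device model for a period after it is switched on, we would have to switch that device on thousands of times to collect enough electrical signals for training, which would undoubtedly damage the equipment. Learning more features from fewer samples has therefore become one of the goals of machine learning. The often-mentioned one-shot learning and few-shot learning both refer to learning a model from one or a few samples that is then able to classify. "one shot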

2020-05-13 12:13:53 3747

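Original: 1-D Convolutional Neural Network Applied to Electrical Signal Classification

A 1-D convolutional neural network can be used to analyze one-dimensional data; the setting here is identifying and classifying household appliances. The simple plots below were drawn in Excel: the horizontal axis is the number of seconds since the appliance was switched on, and the vertical axis is the active power in that second. From top to bottom: Air Conditioner, Refrigerator, Stove. The three plots show that each appliance operates in its own characteristic way, which produces distinct characteristic peaks and plateaus. The data used next totals...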

2020-04-24 21:43:10 4985
