https://pypi.org/project/cuml/
sudo apt install libopenblas-base libomp-dev
# cuda 9.2
pip install cuml-cuda92
# cuda 10.0
pip install cuml-cuda100

import cudf
from cuml import DBSCAN

# Create and populate a GPU DataFrame
gdf_float = cudf.DataFrame()
gdf_float['0'] = [1.0, 2.0, 5.0]
gdf_float['1'] = [4.0, 2.0, 1.0]
gdf_float['2'] = [4.0, 2.0, 1.0]

# Set up and fit clusters
dbscan_float = DBSCAN(eps=1.0, min_samples=1)
dbscan_float.fit(gdf_float)
print(dbscan_float.labels_)
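As a sanity check on the cuML example above, the sketch below computes the pairwise Euclidean distances for the three rows in plain Python, with no GPU required. It assumes DBSCAN uses Euclidean distance and counts a point as its own neighbor, so with min_samples=1 every row is a core point. No pair lies within eps=1.0, so under these assumptions each row should land in its own cluster. The variable names are illustrative and not part of cuML.

```python
import math
from itertools import combinations

# Rows of the GPU DataFrame above: columns '0', '1', '2' read row-wise.
rows = [
    (1.0, 4.0, 4.0),
    (2.0, 2.0, 2.0),
    (5.0, 1.0, 1.0),
]
eps = 1.0

# Pairwise Euclidean distances between rows.
distances = {
    (i, j): math.dist(rows[i], rows[j])
    for i, j in combinations(range(len(rows)), 2)
}

# Pairs close enough to be eps-neighbors.
neighbor_pairs = [pair for pair, d in distances.items() if d <= eps]

for pair, d in sorted(distances.items()):
    print(pair, round(d, 3))
print("eps-neighbor pairs:", neighbor_pairs)
```

Raising eps above 3.0 would make rows 0 and 1 neighbors, which is a quick way to see how the parameter changes the labels.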
https://pypi.org/project/ImageAlgoKD/
# to use the OpenCL backend
pip install pyopencl
# to use the CUDA backend
pip install pycuda
pip install ImageAlgoKD

import numpy as np
from ImageAlgoKD import *

# Declare an instance of ImageAlgoKD with your algorithm parameters,
# then give it the input data points.
ia = ImageAlgoKD(MAXDISTANCE=20, KERNEL_R=1.0)
ia.setInputsPoints(Points(np.genfromtxt("../data/basic.csv", delimiter=',')))

# Run the clustering over the input data points.
ia.run("numpy")  # or ia.run("opencl") / ia.run("cuda") to run in parallel

# The clustering result can then be accessed via ia.points.clusterID
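Since the backend string passed to ia.run depends on which optional package is installed, a small helper can pick one automatically. The sketch below is a hypothetical convenience function, not part of ImageAlgoKD. It only checks whether pycuda or pyopencl is importable, which does not guarantee that a working device is present.

```python
import importlib.util


def pick_backend(has_module=None):
    """Return "cuda", "opencl", or "numpy" based on importable packages.

    `has_module` is a hypothetical hook (module name -> bool) that makes
    the function easy to test without installing anything.
    """
    if has_module is None:
        def has_module(name):
            return importlib.util.find_spec(name) is not None
    if has_module("pycuda"):
        return "cuda"
    if has_module("pyopencl"):
        return "opencl"
    return "numpy"


backend = pick_backend()
print(backend)
# With ImageAlgoKD installed, this would be used as: ia.run(backend)
```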
https://github.com/a0165897/dbscan-cuda
https://github.com/Labmem009/DBSCAN_CUDA
Design and optimization of DBSCAN Algorithm based on CUDA
https://github.com/Maghoumi/cudbscan
http://m.kokojia.com/article/39577.html
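DBSCAN with Rapids on GPU
Now, let's speed things up with Rapids!
First, we convert the data to a pandas.DataFrame and use it to create a cudf.DataFrame. A pandas.DataFrame converts seamlessly to a cudf.DataFrame, with no change to the data format.
import pandas as pd
import cudf
# X is the feature array used for the CPU run in the linked article
X_df = pd.DataFrame({'fea%d'%i: X[:, i] for i in range(X.shape[1])})
X_gpu = cudf.DataFrame.from_pandas(X_df)
Then we import and initialize a special, GPU-accelerated version of DBSCAN from cuML. The cuML version of DBSCAN follows the same function format as Scikit-Learn's: same parameters, same style, same functions.
from cuml import DBSCAN as cumlDBSCAN
db_gpu = cumlDBSCAN(eps=0.6, min_samples=2)
Finally, we run the GPU DBSCAN's fit_predict function while measuring the run time.
%%time
y_db_gpu = db_gpu.fit_predict(X_gpu)
The GPU version ran in 4.22 seconds, a speedup of almost 2x. Because the algorithm is the same, the resulting plot is also identical to the CPU version's.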
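The %%time magic only works inside a Jupyter/IPython cell. In a plain script, a rough equivalent can be sketched with time.perf_counter. The `timed` helper below is a hypothetical name introduced for illustration, and it uses a stand-in workload so the sketch runs without a GPU.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload so the sketch runs anywhere.
result, seconds = timed(sorted, [3, 1, 2])
print(result, f"{seconds:.6f} s")

# With cuML available, the GPU call would become:
# y_db_gpu, seconds = timed(db_gpu.fit_predict, X_gpu)
```

The first GPU call may include one-time initialization cost, so timing a second run can give a more representative number.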
https://github.com/SubjectNoi/DBSCAN_CUDA
https://github.com/recke-a/immunodbscan A CUDA-based program that uses DBSCAN for fast clonotyping of B- and T-cell immune repertoires
https://github.com/ghoulsblade/CudaDBClustering/tree/master/src
https://github.com/h2oai/h2o4gpu/issues/239
1. CUDA-DClust: http://www.dbs.ifi.lmu.de/Publikationen/Boehm/CIKM_09.pdf. It uses a collision matrix approach to name clusters. It also has an index data structure on the GPU that helps reduce the computational complexity of eps-neighborhood detection.
2. G-DBSCAN: https://pdfs.semanticscholar.org/31df/abb8d1085ac468b60a83d32af2a558407c95.pdf. A simpler implementation. It generates a proximity graph from the eps-neighborhood information and then performs a BFS traversal to name the clusters, thus exposing more parallelism than the previous approach.
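The two-step G-DBSCAN idea (build a proximity graph, then name clusters by BFS) can be sketched sequentially in plain Python. This is a minimal CPU illustration under assumed conventions (Euclidean distance, a point counts toward its own min_samples, -1 marks noise), not the paper's GPU implementation. The function name is hypothetical.

```python
import math
from collections import deque


def gdbscan_sketch(points, eps, min_samples):
    """Label points via a proximity graph plus BFS; -1 marks noise."""
    n = len(points)
    # Step 1: proximity graph. Each adjacency list holds the other points
    # within eps (the part a GPU would build in parallel, one point per thread).
    adjacency = [
        [j for j in range(n)
         if j != i and math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # A point is core if its neighborhood, counting itself, is large enough.
    is_core = [len(adj) + 1 >= min_samples for adj in adjacency]

    labels = [-1] * n
    cluster = 0
    # Step 2: a BFS from each unlabeled core point names one cluster.
    for start in range(n):
        if not is_core[start] or labels[start] != -1:
            continue
        labels[start] = cluster
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in adjacency[node]:
                if labels[nb] == -1:
                    labels[nb] = cluster
                    # Border points get a label but do not expand further.
                    if is_core[nb]:
                        queue.append(nb)
        cluster += 1
    return labels


points = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.0),
          (5.0, 5.0), (5.0, 5.5), (10.0, 10.0)]
labels = gdbscan_sketch(points, eps=1.0, min_samples=2)
print(labels)  # [0, 0, 0, 1, 1, -1]
```

Separating graph construction from traversal is what makes the approach GPU-friendly: the O(n^2) neighbor test is independent per point, and each BFS level can process its frontier in parallel.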
https://cameleonx.com/blog/portfolio-items/g-dbscan-of-superpixels/
https://cameleonx.com/blog/portfolio-items
https://github.com/ca1773130n/SLIC-DBSCAN-CUDA
https://github.com/ca1773130n/SLIC-DBSCAN-CUDA/tree/9840066a02cc6403bd234f390946bb3c79e71166
#define CAM_WIDTH 480
#define CAM_HEIGHT 360
#define NUM_SPIXELS 1200
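To get a feel for these constants, the arithmetic below works out the superpixel size they imply. It assumes the standard SLIC grid initialization, where the step is S = sqrt(N / K) for N pixels and K superpixels. Whether the repository uses exactly this formula has not been checked here.

```python
import math

# Values copied from the #define lines above.
CAM_WIDTH = 480
CAM_HEIGHT = 360
NUM_SPIXELS = 1200

# Average number of pixels covered by one superpixel.
pixels_per_superpixel = (CAM_WIDTH * CAM_HEIGHT) / NUM_SPIXELS
# Side length of the initial square grid cell (assumed SLIC convention).
grid_step = math.sqrt(pixels_per_superpixel)

print(pixels_per_superpixel)  # 144.0
print(grid_step)  # 12.0
print(CAM_WIDTH / grid_step, CAM_HEIGHT / grid_step)  # 40.0 30.0
```

Under that assumption the image splits into a 40 x 30 grid of 12 x 12 pixel cells, which matches the 1200 superpixels exactly.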
http://github.strcpy.cn/rapidsai/cuml/issues/1238
https://towardsdatascience.com/heres-how-you-can-accelerate-your-data-science-on-gpu-4ecf99db3430
https://www.chainnews.com/articles/904054826642.htm
https://rapids.ai/community.html
https://devblogs.nvidia.com/gpu-accelerated-analytics-rapids/
https://ibmsoe.github.io/snap-ml-doc/v1.5.0/
https://ibmsoe.github.io/snap-ml-doc/dbscandoc.html
(K-DBSCAN) (V-DBSCAN)
https://github.com/plasavall/kdbscan_vdbscan
https://github.com/pmarcol/dbscan_tf
https://github.com/jklen/PytExamples
https://stackoverflow.com/questions/49934606/how-to-implement-dbscan-clustering-in-tensorflow
https://github.com/karthikv2k/gpu_dbscan
https://github.com/kwende/DBScanGPU
https://github.com/magicfisk/gpu-dbscan-toy
https://github.com/jrciii/cuda-gpu-dbscan