1. Problem description
While fitting my data with KMeans, I got the following error.
Note: df_1h3 is an ordinary pandas DataFrame.
Full traceback:
runfile('E:/09-code/02-wind_profile/wind_profile_cluster.py', wdir='E:/09-code/02-wind_profile')
2022-06-24 09:42:43,296 - INFO - wind_profile_cluster.py - <module>:73 - start calculate SSE.
Traceback (most recent call last):
File "D:\Anaconda\lib\site-packages\IPython\core\interactiveshell.py", line 3418, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-2-24f780d27a62>", line 1, in <module>
runfile('E:/09-code/02-wind_profile/wind_profile_cluster.py', wdir='E:/09-code/02-wind_profile')
File "D:\pycharm\PyCharm Community Edition 2020.2\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "D:\pycharm\PyCharm Community Edition 2020.2\plugins\python-ce\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "E:/09-code/02-wind_profile/wind_profile_cluster.py", line 77, in <module>
kmModel.fit(df_1h3)
File "D:\Anaconda\lib\site-packages\sklearn\cluster\_kmeans.py", line 1068, in fit
labels, inertia, centers, n_iter_ = kmeans_single(
File "D:\Anaconda\lib\site-packages\sklearn\cluster\_kmeans.py", line 568, in _kmeans_single_lloyd
with threadpool_limits(limits=1, user_api="blas"):
File "D:\Anaconda\lib\site-packages\threadpoolctl.py", line 171, in __init__
self._original_info = self._set_threadpool_limits()
File "D:\Anaconda\lib\site-packages\threadpoolctl.py", line 268, in _set_threadpool_limits
modules = _ThreadpoolInfo(prefixes=self._prefixes,
File "D:\Anaconda\lib\site-packages\threadpoolctl.py", line 340, in __init__
self._load_modules()
File "D:\Anaconda\lib\site-packages\threadpoolctl.py", line 373, in _load_modules
self._find_modules_with_enum_process_module_ex()
File "D:\Anaconda\lib\site-packages\threadpoolctl.py", line 485, in _find_modules_with_enum_process_module_ex
self._make_module_from_path(filepath)
File "D:\Anaconda\lib\site-packages\threadpoolctl.py", line 515, in _make_module_from_path
module = module_class(filepath, prefix, user_api, internal_api)
File "D:\Anaconda\lib\site-packages\threadpoolctl.py", line 606, in __init__
self.version = self.get_version()
File "D:\Anaconda\lib\site-packages\threadpoolctl.py", line 646, in get_version
config = get_config().split()
AttributeError: 'NoneType' object has no attribute 'split'
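The last frames show where it breaks. Inside threadpool_limits(limits=1, user_api="blas"), threadpoolctl runs config = get_config().split() while reading the version of a BLAS library, and get_config() returned None. The DataFrame itself is not an argument of that call. The sketch below only reproduces this failure pattern in plain Python; read_version, read_version_safe and fake_get_config are hypothetical names, not threadpoolctl's actual code.

```python
def read_version(get_config):
    """Hypothetical: mimics the failing line, calling .split() on whatever the getter returns."""
    return get_config().split()  # AttributeError if get_config() returns None


def read_version_safe(get_config):
    """Hypothetical defensive variant: treat a missing config string as 'unknown' (None)."""
    config = get_config()
    if config is None:
        return None
    return config.split()


def fake_get_config():
    # Hypothetical stand-in for a library that reports no config string.
    return None


try:
    read_version(fake_get_config)
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'split'

print(read_version_safe(fake_get_config))  # None
```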
2. Attempted fix
- Some people suggest adding data = data.encode(). I tried it and got AttributeError: 'DataFrame' object has no attribute 'encode'. My problem is clearly not the same as theirs, so this attempt went nowhere.
3. Solution
- With n_clusters=1, KMeans raises the error:
>>> model = KMeans(n_clusters=1)
>>> model.fit(df_1h3)
Traceback (most recent call last):
File "D:\Anaconda\lib\site-packages\IPython\core\interactiveshell.py", line 3418, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-13-88f018a4fbd5>", line 1, in <module>
model.fit(df_1h3)
……
File "D:\Anaconda\lib\site-packages\threadpoolctl.py", line 606, in __init__
self.version = self.get_version()
File "D:\Anaconda\lib\site-packages\threadpoolctl.py", line 646, in get_version
config = get_config().split()
AttributeError: 'NoneType' object has no attribute 'split'
- With n_clusters=3, there is no error:
>>> model = KMeans(n_clusters=3)
>>> model.fit(df_1h3)
Out[7]: KMeans(n_clusters=3)
>>> model.labels_
Out[8]: array([1, 2, 1, ..., 0, 0, 0])
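So in this environment KMeans failed with n_clusters=1 but worked with n_clusters=3, and the workaround is to use n_clusters greater than 1. That is sensible anyway: with n_clusters=1 every sample ends up in the same single cluster, which defeats the purpose of clustering.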
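To make the n_clusters=1 case concrete: KMeans minimizes the SSE, the sum of squared distances from each sample to its cluster centroid (what scikit-learn exposes as inertia_). With a single cluster the best centroid is simply the mean of all samples, so the SSE is just the total spread of the data and no grouping is found. The pure-Python sketch below does not run k-means; it only evaluates the SSE objective for two hand-picked labelings of four made-up 2-D points. The helper names are illustrative, not part of scikit-learn.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(coord) / n for coord in zip(*points)]


def sse(points, labels):
    """Sum of squared distances from each point to the mean of its own cluster."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    centers = {label: centroid(members) for label, members in clusters.items()}
    return sum(squared_distance(p, centers[label]) for p, label in zip(points, labels))


# Made-up data: two obvious groups of two points each.
points = [[1.0, 2.0], [2.0, 2.0], [8.0, 9.0], [9.0, 8.0]]

print(sse(points, [0, 0, 0, 0]))  # one cluster, centroid at the overall mean -> 92.75
print(sse(points, [0, 0, 1, 1]))  # two clusters, one centroid per group -> 1.5
```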
4. Other methods
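Note: the method above only works in some cases. If it does not help, try installing different versions of numpy and scikit-learn (run these in a shell, not in the Python prompt; the PyPI package is scikit-learn, not sklearn):
pip install numpy==1.22.4
pip install scikit-learn==1.1.1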
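Before and after switching versions, it helps to confirm what is actually installed for the interpreter that runs the script (Anaconda and PyCharm may point at different environments). A minimal sketch using the standard library's importlib.metadata (Python 3.8+); the distribution name is scikit-learn even though the import name is sklearn. Since the traceback ends inside threadpoolctl, its version is listed too. The function name installed_versions is hypothetical, and the snippet does not decide which version combinations work together.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("numpy", "scikit-learn", "threadpoolctl")):
    """Return {distribution name: installed version string, or None if not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```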