A List of Frequently Asked Questions about Weka

linglingbaby

1. Distance computation in clustering

Hi...

If some of my variables are categorical and some are numeric, then
to do cluster analysis I should use Gower's distance, am I right? Are there other options?

If I use Weka Explorer, can I choose Gower's distance? How? I couldn't find Gower's distance in the menu, or maybe I'm not looking in the right place.

Thank you so much for all your help.

Best,
S.T.

Reply:

> if some of my variables are categorical and some are numeric,
> to do cluster analysis, I should use Gower's distance... am I right?

Why not use Euclidean distance? It works as well.

> Are there other options?

Currently available distance functions in the developer version:
Chebyshev, Edit, Euclidean, Manhattan, Minkowski

> If I use Weka Explorer, can I choose Gower's distance?

Gower's distance is not part of Weka, but feel free to implement it
and contribute it.

Cheers, Peter
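
For readers who want to follow the suggestion to implement Gower's distance themselves, here is a minimal sketch of the idea in plain Python. It is not Weka code and does not use Weka's distance-function interface; the function names `gower_distance` and `numeric_ranges_of` are made up for illustration, and missing values and attribute weights are ignored. Numeric attributes contribute their absolute difference divided by the attribute's range, categorical attributes contribute 0 on a match and 1 on a mismatch, and the result is the average over all attributes.

```python
def numeric_ranges_of(rows, numeric_indices):
    """Return {attribute index: max - min} for the numeric attributes."""
    return {
        i: max(row[i] for row in rows) - min(row[i] for row in rows)
        for i in numeric_indices
    }


def gower_distance(a, b, numeric_ranges):
    """Gower distance between two records of equal length.

    Attributes whose index is in numeric_ranges are treated as numeric,
    all other attributes are treated as categorical.
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in numeric_ranges:
            value_range = numeric_ranges[i]
            # A constant attribute cannot separate records, so it adds 0.
            total += abs(x - y) / value_range if value_range > 0 else 0.0
        else:
            total += 0.0 if x == y else 1.0
    return total / len(a)


# Hypothetical data: (numeric, categorical, numeric)
rows = [(1.0, "red", 10.0), (3.0, "blue", 20.0), (5.0, "red", 30.0)]
ranges = numeric_ranges_of(rows, [0, 2])
print(gower_distance(rows[0], rows[1], ranges))
```

Because every attribute contributes a value between 0 and 1, the distance itself always lies between 0 and 1, which is what makes it convenient for mixed data.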

2. Clustering getting stuck in local optima on large data sets

Folks,
While running K-Means (the SimpleKMeans implementation) on a large subset, I
found that the initial "random" centroids chosen from the space
are vital. Sometimes (20% of the time) things get stuck in local optima.
What are the ways to make K-Means escape these local optima? Some of my
thoughts were:

- Run GA-based K-Means and allow reproduction and crossover to get
variety and search for the global optimum
- Use some metaheuristics

If someone has used this and is aware of WEKA tools to do it, let me know.
Cheers
Uday

> Can someone reply to this, please?

People will reply if they have an answer to a question. You can't
expect more from a mailing list run by volunteers and users.

> Because my data has combination of
> Categorical and Continuous, the centroids 20% of times are stuck in local
> optima, is there some way around this? Has anyone seen this or solved it?

There is no "meta-framework" available to deal with such problems.
Apart from varying the seed value for randomization there is nothing
you can do.

You can always implement your own meta-clusterer that determines via
some magic statistic whether the base-clusterer is stuck in a local
minimum and does something about it. But it might involve some more work
than just that...

Cheers, Peter
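
The advice to vary the seed can be sketched as a simple restart loop: run k-means once per seed and keep the run with the lowest within-cluster sum of squared errors. The code below is a self-contained illustration in plain Python for numeric points only; it is not SimpleKMeans, and the names `kmeans` and `best_of_restarts` are assumptions made for this example.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed, max_iter=100):
    """Plain k-means on a list of numeric tuples; returns (centroids, assignment, sse)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # the seed decides the starting centroids
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # nothing moved, so the run has converged
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    sse = sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignment))
    return centroids, assignment, sse


def best_of_restarts(points, k, seeds):
    """Run k-means once per seed and keep the run with the lowest SSE."""
    return min((kmeans(points, k, s) for s in seeds), key=lambda run: run[2])


# Hypothetical data: two well-separated groups of points
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignment, sse = best_of_restarts(points, 2, range(10))
print(assignment, sse)
```

Restarts do not guarantee the global optimum; they only make it less likely that the reported result comes from an unlucky initialization.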

3. Prefix-based trees

> I'm searching for a prefix-based tree. I'm using an implementation of a
> CPT (Compact Patricia Tree) from my university, but it's very greedy for
> memory.
>
> I could not find a prefix-based tree in the weka-wiki, so just to be
> sure: Am I blind or isn't one implemented in the weka-package?

Not sure what you want to use the tree for, but maybe the
weka.core.Trie class (for strings) would be helpful?

Cheers, Peter
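
For context, a prefix tree (trie) for strings stores one node per character, so words with a common prefix share the nodes along that prefix. The following is a generic sketch in plain Python; it does not mirror the API of weka.core.Trie, and the class and method names are invented for this example.

```python
class PrefixTree:
    """A minimal trie: each node maps one character to a child node."""

    def __init__(self):
        self.children = {}
        self.is_word = False

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, PrefixTree())
        node.is_word = True

    def _find(self, prefix):
        """Return the node reached by following prefix, or None."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._find(word)
        return node is not None and node.is_word

    def words_with_prefix(self, prefix):
        """Return all stored words that start with prefix, sorted."""
        start = self._find(prefix)
        if start is None:
            return []
        found, stack = [], [(start, prefix)]
        while stack:
            node, text = stack.pop()
            if node.is_word:
                found.append(text)
            for ch, child in node.children.items():
                stack.append((child, text + ch))
        return sorted(found)


tree = PrefixTree()
for w in ["tree", "trie", "trim", "weka"]:
    tree.insert(w)
print(tree.words_with_prefix("tri"))
```

A plain trie like this uses one node per character; compact (Patricia) variants merge chains of single-child nodes to reduce the node count.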

4. Text classification and multi-label classification

> Thanks for the information. But I have a simple question. Do the Weka
> ensemble or boosting algorithms support the following things automatically:
> (1) Multi-class with multiple labels

Just to clarify: Weka allows you only to have a single class
attribute. See also FAQ "Does WEKA support multi-label
classification?".

If you develop an ensemble classifier, then it depends a lot on the
base classifier(s) what data can be processed (= their capabilities).
The MultipleClassifiersCombiner superclass, for instance, returns as
capabilities only the capabilities that *all* of the base classifiers
share (see "getCapabilities()" method).
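
The rule described above, that a combiner only reports what all of its base classifiers share, amounts to a set intersection. The sketch below models it with plain Python sets; the capability labels are illustrative strings and the function name `combined_capabilities` is an assumption for this example, not Weka's Capabilities class.

```python
def combined_capabilities(base_capabilities):
    """Return the capabilities shared by every base classifier.

    base_capabilities: one collection of capability labels per classifier.
    """
    sets = [set(caps) for caps in base_capabilities]
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical base classifiers with illustrative capability labels
tree_like = {"NOMINAL_ATTRIBUTES", "NUMERIC_ATTRIBUTES", "NOMINAL_CLASS", "MISSING_VALUES"}
regression_like = {"NUMERIC_ATTRIBUTES", "NOMINAL_CLASS", "NUMERIC_CLASS"}
print(sorted(combined_capabilities([tree_like, regression_like])))
```

In this toy case the ensemble would accept only numeric attributes with a nominal class, even though each base classifier on its own handles more.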

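Qu Wei works on multi-label classification, so you can ask him if you have questions. For how to handle it in Weka, see the answer in the FAQ. The FAQ address is in my bookmarked links.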

> (2) feature vectors to represent documents

See FAQ "How do I perform text classification?".

Link to the FAQs available from the Weka homepage.
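
To make "feature vectors to represent documents" concrete, here is a minimal bag-of-words sketch in plain Python: build a vocabulary from the documents, then represent each document as a vector of word counts. In Weka this kind of conversion is usually done with the StringToWordVector filter; the code below is not that filter, and its function names and the simple lowercase ASCII tokenizer are assumptions for illustration (text without spaces between words, such as Chinese, would need a real word segmenter).

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(documents):
    """Sorted list of all distinct tokens in the documents."""
    return sorted({token for doc in documents for token in tokenize(doc)})


def to_count_vector(text, vocabulary):
    """Word-count vector of text, with one position per vocabulary entry."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


# Hypothetical documents
documents = ["Weka clusters data", "Weka classifies text data"]
vocabulary = build_vocabulary(documents)
for doc in documents:
    print(to_count_vector(doc, vocabulary))
```

Words outside the vocabulary are simply ignored by `to_count_vector`, so unseen documents map to vectors of the same length as the training documents.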

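A note on doing text classification in Weka: I used to use WVTool together with Weka for text classification, and only now discovered that Weka already supports text classification itself. Anyone interested can take a closer look. Because Weka's support for Chinese is poor and its tokenizer is very simple, there is still a lot of work you have to do yourself.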

Cheers, Peter
--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/ Ph. +64 (7) 858-5174

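All email content quoted in this post belongs to the original senders. It is only reposted here, and copyright remains with the senders.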
