1. On distance computation in clustering
Q:
Hi...
If some of my variables are categorical and some are numeric,
to do cluster analysis, I should use Gower's distance, am I right? Are there other options?
If I use the Weka Explorer, can I choose Gower's distance? How? I couldn't find Gower's distance in the menu, or maybe I am not looking in the right place.
Thank you so much for all your help
Best.
S.T.
Reply:
> If some of my variables are categorical and some are numeric,
> to do cluster analysis, I should use Gower's distance, am I right?
Why not use Euclidean distance? It works as well.
> Are there other options?
Currently available distance functions in the developer version:
Chebyshev, Edit, Euclidean, Manhattan, Minkowski
> If I use the Weka Explorer, can I choose Gower's distance?
Gower's distance is not part of Weka, but feel free to implement it
and contribute it.
Cheers, Peter
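Since the reply says Gower's distance is not part of Weka, here is a minimal sketch of the idea in plain Python, only to show how numeric and categorical attributes can be combined in one measure. It is not Weka code: the function names, the sample records, and the choice to treat every non-numeric index as categorical are my own assumptions, and missing values and attribute weights are ignored.

```python
def numeric_ranges(records, numeric_indices):
    """Range (max - min) of each numeric attribute over the data set."""
    ranges = {}
    for i in numeric_indices:
        values = [rec[i] for rec in records]
        ranges[i] = max(values) - min(values)
    return ranges


def gower_distance(a, b, ranges):
    """Gower's distance between two records with mixed attribute types.

    Numeric attributes contribute |x - y| / range, categorical attributes
    contribute 0 on a match and 1 on a mismatch; the result is the mean
    over all attributes, so it always lies in [0, 1].
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in ranges:
            # Numeric attribute: range-normalised absolute difference.
            total += abs(x - y) / ranges[i] if ranges[i] > 0 else 0.0
        else:
            # Categorical attribute: simple mismatch indicator.
            total += 0.0 if x == y else 1.0
    return total / len(a)


# Hypothetical records: (age, colour, income).
records = [
    (25, "red", 50000.0),
    (35, "blue", 60000.0),
    (45, "red", 90000.0),
]
ranges = numeric_ranges(records, [0, 2])
d01 = gower_distance(records[0], records[1], ranges)
d02 = gower_distance(records[0], records[2], ranges)
print(d01, d02)
```

Dividing each numeric difference by the attribute's range keeps every attribute on the same 0 to 1 scale, so a single large-valued attribute such as income does not dominate the distance.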
2. On clustering getting stuck in local optima with large data sets
Q:
Folks
While running K-Means (the SimpleKMeans implementation) on a large subset, I
found that the initial choice of "random" centroids from the space
is vital. Sometimes (20% of the time) things get stuck in local optima.
What are the ways to make K-Means come out of these local optima? Some of my
thoughts were
- Run GA based K-Means and allow reproductions and crossovers to get
variety and search global optima
- Use some meta heuristics
If someone has used these and is aware of WEKA tools to do this, let me know
Cheers
Uday
Reply:
> Can someone reply to this, please?
People will reply if they have an answer to a question. You can't
expect more from a mailing list run by volunteers and users.
> Because my da
> Categorical and Continuous, the centroids 20% of times are stuck in local
> optima, is there some way around this? Has anyone seen this or solved it?
There is no "meta-framework" available to deal with such problems.
Apart from varying the seed value for randomization there is nothing
you can do.
You can always implement your own meta-clusterer that determines via
some magic statistic whether the base-clusterer is stuck in a local
minimum and does something about it. But it might involve some more work
than just that...
Cheers, Peter
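The "vary the seed" advice in the reply can be sketched as a simple restart loop: run k-means once per seed and keep the run with the lowest within-cluster sum of squared errors. The code below is plain Python written only for illustration; it is not SimpleKMeans, and the function names, the synthetic data, and the choice of ten seeds are my own assumptions.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed, max_iter=100):
    """One k-means run; initial centroids are k points sampled with the seed."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    sse = sum(sq_dist(p, centroids[a]) for p, a in zip(points, assignment))
    return sse, centroids, assignment


def best_of_restarts(points, k, seeds):
    """Run k-means once per seed and return the run with the lowest error."""
    return min((kmeans(points, k, s) for s in seeds), key=lambda run: run[0])


# Synthetic data (an assumption for the demo): three well-separated blobs.
data_rng = random.Random(0)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [
    (cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
    for cx, cy in centers
    for _ in range(30)
]
best_sse, best_centroids, best_assignment = best_of_restarts(points, 3, range(10))
print(best_sse)
```

Restarts do not guarantee the global optimum; they only make it less likely that the final result is one of the bad local optima, at the cost of running the clusterer several times.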
3. On prefix-based trees
> I'm searching for a Prefix Based Tree. I'm using an implementation of a
> CPT (Compact Patricia Tree) of my university, but it's very greedy for
> memory.
>
> I could not find a prefix based tree in the weka-wiki, so just to be
> sure: Am I blind or isn't on
Not sure what you want to use the tree for, but maybe the
weka.core.Trie class (for strings) would be helpful?
Cheers, Peter
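For readers who have not met the data structure, below is a minimal uncompressed trie for strings in plain Python, showing insertion, membership, and prefix lookup. It is only an illustration of the idea: it is neither weka.core.Trie nor the compact Patricia tree from the question, and the class and method names are my own.

```python
class Trie:
    """Minimal uncompressed trie: one node per character."""

    def __init__(self):
        self.children = {}    # character -> child node
        self.is_word = False  # True if a stored string ends at this node

    def add(self, word):
        node = self
        for ch in word:
            if ch not in node.children:
                node.children[ch] = Trie()
            node = node.children[ch]
        node.is_word = True

    def _find(self, prefix):
        """Return the node reached by following prefix, or None."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def __contains__(self, word):
        node = self._find(word)
        return node is not None and node.is_word

    def with_prefix(self, prefix):
        """All stored strings that start with prefix, in sorted order."""
        start = self._find(prefix)
        if start is None:
            return []
        results, stack = [], [(start, prefix)]
        while stack:
            node, text = stack.pop()
            if node.is_word:
                results.append(text)
            for ch, child in node.children.items():
                stack.append((child, text + ch))
        return sorted(results)


trie = Trie()
for word in ["weka", "week", "tree", "trie"]:
    trie.add(word)
print(trie.with_prefix("we"))   # ['week', 'weka']
print("wek" in trie)            # False: only a prefix, not a stored word
```

A Patricia tree stores the same strings but merges chains of single-child nodes into one edge, which is where its compactness comes from.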
4. On text classification and multi-label classification
> Thanks for the information. But I have a simple question. Do the Weka
> ensemble or boosting algorithms support the following things automatically:
> (1) Multi-class with multiple labels
Just to clarify: Weka allows you on
attribute. See also FAQ "Does WEKA support multi-label
classification?".
If you develop an ensemble classifier, then it depends a lot on the
base classifier(s) what da
The MultipleClassifiersCombiner superclass, for instance, returns as
capabilities on
share (see "getCapabilities()" method).
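Qu Wei works on multi-label classification, so you can ask him if you have questions. As for how to handle it in Weka, see the answer in the FAQ. The FAQ address is in my bookmarked links.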
> (2) feature vectors to represent documents
See FAQ "How do I perform text classification?".
Link to the FAQs available from the Weka homepage.
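This explains how to do text classification in Weka. I used to use WVTool together with Weka for text classification, and only now discovered that Weka already supports it. If you are interested, it is worth a careful look. Because Weka's support for Chinese is poor and its tokenizer is very simple, there is still a lot of work you have to do yourself.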
Cheers, Peter
--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/ Ph. +64 (7) 858-5174
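To make "feature vectors to represent documents" from section 4 concrete, here is a small bag-of-words sketch in plain Python: each document becomes a vector of term counts over a shared vocabulary. It is not how Weka does it internally; the function names, the sample documents, and the regular-expression tokenizer are my own assumptions, and that tokenizer only handles ASCII words, so Chinese text would need a real word segmenter in its place.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(documents):
    """Map every distinct token to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def to_vector(text, vocabulary):
    """Term-count vector for one document; unknown tokens are ignored."""
    counts = Counter(tokenize(text))
    return [counts.get(tok, 0) for tok in vocabulary]


# Hypothetical documents, for illustration only.
documents = [
    "Weka supports text classification",
    "Text classification needs feature vectors",
]
vocabulary = build_vocabulary(documents)
vectors = [to_vector(doc, vocabulary) for doc in documents]
print(list(vocabulary))
print(vectors)
```

Once documents are in this form, any classifier that accepts numeric attributes can be trained on them, with the document category as the single class attribute.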
The email content quoted in this post belongs to its senders. It is only reposted here, and copyright remains with the senders.