A List of Frequently Asked Questions About Weka

 

1. Distance calculation in clustering

Q:

Hi...

If some of my variables are categorical and some are numeric,
should I use Gower's distance for cluster analysis? Are there other options?

If I use the Weka Explorer, can I choose Gower's distance? How? I couldn't find Gower's distance in the menu, or maybe I'm not looking in the right place.

Thank you so much for all your help

Best.
S.T.

Reply:

> If some of my variables are categorical and some are numeric,
> should I use Gower's distance for cluster analysis?

Why not use Euclidean distance? It works as well.

> Are there other options?

Currently available distance functions in the developer version:
Chebyshev, Edit, Euclidean, Manhattan, Minkowski


> If I use the Weka Explorer, can I choose Gower's distance?

Gower's distance is not part of Weka, but feel free to implement it
and contribute it.

Cheers, Peter
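
Since the reply says Gower's distance is not part of Weka, the idea behind it can be sketched outside Weka. The following is a minimal Python sketch, not Weka code: the function name `gower_distance`, the `numeric_ranges` argument, and the sample rows are all made up for illustration. It assumes equal weights for all attributes and no missing values. Numeric attributes contribute their absolute difference divided by the attribute's range, categorical attributes contribute 0 on a match and 1 otherwise, and the result is the average over all attributes.

```python
def gower_distance(a, b, numeric_ranges):
    """Gower dissimilarity between two records of mixed type.

    numeric_ranges maps the index of each numeric attribute to its
    (min, max) over the dataset; every other index is treated as
    categorical. Equal weights, no missing values (assumptions).
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in numeric_ranges:
            lo, hi = numeric_ranges[i]
            span = hi - lo
            # Range-normalised absolute difference, which lies in [0, 1].
            total += abs(x - y) / span if span > 0 else 0.0
        else:
            # Simple matching for categorical values.
            total += 0.0 if x == y else 1.0
    return total / len(a)


# Hypothetical records: (age, colour, height).
rows = [
    (25, "red", 1.70),
    (35, "blue", 1.80),
    (45, "red", 1.90),
]
ranges = {0: (25, 45), 2: (1.70, 1.90)}

d01 = gower_distance(rows[0], rows[1], ranges)
d02 = gower_distance(rows[0], rows[2], ranges)
```

Because every attribute contributes a value between 0 and 1, no single numeric attribute with a large scale can dominate the categorical ones.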

2. Clustering getting stuck in local optima on large datasets

Q:

Folks,
While running K-Means (the SimpleKMeans implementation) on a large subset, I
found that the initial "random" centroids chosen from the space
are vital. Sometimes (20% of the time) things get stuck in local optima.
What are the ways to get K-Means out of these local optima? Some of my
thoughts were:

- Run GA-based K-Means and allow reproduction and crossover to get
variety and search for the global optimum
- Use some metaheuristics

If someone has used these and is aware of WEKA tools to do this, let me know.
Cheers
Uday

Reply:

> Can someone reply to this, please?

People will reply if they have an answer to a question. You can't
expect more from a mailing list run by volunteers and users.

> Because my data has a combination of
> categorical and continuous attributes, the centroids are stuck in local
> optima 20% of the time. Is there some way around this? Has anyone seen this or solved it?

There is no "meta-framework" available to deal with such problems.
Apart from varying the seed value for randomization there is nothing
you can do.

You can always implement your own meta-clusterer that determines via
some magic statistic whether the base clusterer is stuck in a local
minimum and does something about it. But it might involve some more work
than just that...

Cheers, Peter
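
Varying the seed, as the reply suggests, can be automated: run k-means several times with different seeds and keep the run with the lowest within-cluster sum of squared errors. The following is a minimal Python sketch of that idea, not SimpleKMeans itself. The names `kmeans`, `sq_dist`, and `best_of_restarts` and the sample points are made up for illustration, and it handles numeric attributes only. Restarts make a bad local optimum less likely but do not guarantee the global one.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed, iterations=100):
    """Plain Lloyd's k-means; returns (centroids, sse)."""
    rng = random.Random(seed)
    # Initial centroids are k distinct data points chosen by the seed.
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members; keep it if empty.
        updated = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse


def best_of_restarts(points, k, seeds):
    """Run k-means once per seed and keep the run with the lowest SSE."""
    runs = [(kmeans(points, k, s), s) for s in seeds]
    (centroids, sse), seed = min(runs, key=lambda r: r[0][1])
    return centroids, sse, seed


# Hypothetical data: three well-separated groups of three points.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
    (20.0, 0.0), (20.0, 1.0), (21.0, 0.0),
]
centroids, sse, seed = best_of_restarts(points, 3, range(10))
```

A wrapper like this is the simplest form of the "meta-clusterer" mentioned in the reply: the statistic is the SSE, and the reaction to a poor run is to discard it.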

3. Prefix-based trees

> I'm searching for a prefix-based tree. I'm using an implementation of a
> CPT (Compact Patricia Tree) from my university, but it's very greedy for
> memory.
>
> I could not find a prefix-based tree in the weka-wiki, so just to be
> sure: Am I blind, or isn't one implemented in the weka package?

Not sure what you want to use the tree for, but maybe the
weka.core.Trie class (for strings) would be helpful?

Cheers, Peter
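
For readers unfamiliar with the data structure, here is a minimal Python sketch of a trie for strings. It does not mirror the API of weka.core.Trie; the class and method names are made up for illustration. It is a plain, uncompressed trie with one node per character, so unlike a Patricia tree it does not merge single-child chains and will generally use more nodes.

```python
class Trie:
    """Uncompressed prefix tree: one node per character."""

    def __init__(self):
        self.children = {}
        self.terminal = False  # True if a stored word ends at this node.

    def add(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.terminal = True

    def contains(self, word):
        node = self._find(word)
        return node is not None and node.terminal

    def with_prefix(self, prefix):
        """Return all stored words starting with prefix, sorted."""
        node = self._find(prefix)
        if node is None:
            return []
        found = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.terminal:
                found.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(found)

    def _find(self, text):
        """Walk down the tree along text; None if the path does not exist."""
        node = self
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node


trie = Trie()
for word in ["weka", "week", "tree"]:
    trie.add(word)
matches = trie.with_prefix("we")
```

Words that share a prefix share the nodes for that prefix, which is where the memory saving over storing each string separately comes from.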

4. Text classification and multi-label classification

> Thanks for the information. But I have a simple question. Do the Weka
> ensemble or boosting algorithms support the following things automatically:
> (1) Multi-class with multiple labels

Just to clarify: Weka allows you only to have a single class
attribute. See also FAQ "Does WEKA support multi-label classification?".

If you develop an ensemble classifier, then it depends a lot on the
base classifier(s) what data can be processed (= their capabilities).
The MultipleClassifiersCombiner superclass, for instance, returns as
capabilities only the capabilities that *all* of the base classifiers
share (see "getCapabilities()" method).
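
The rule described here amounts to a set intersection over the capabilities of the base classifiers. A minimal Python sketch of that logic follows. It is not Weka's implementation: the function name `shared_capabilities` and the capability labels are hypothetical and only stand in for whatever each base classifier reports.

```python
def shared_capabilities(base_capabilities):
    """Return only the capabilities that every base classifier supports."""
    sets = [set(caps) for caps in base_capabilities]
    return set.intersection(*sets) if sets else set()


# Hypothetical capability labels for three base classifiers.
base = [
    {"nominal attributes", "numeric attributes", "missing values", "nominal class"},
    {"nominal attributes", "numeric attributes", "nominal class"},
    {"numeric attributes", "nominal class", "numeric class"},
]
combined = shared_capabilities(base)
```

The practical consequence is that a single restrictive base classifier narrows what the whole ensemble can process.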

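Qu Wei works on multi-label classification, so you can ask him if you have questions. As for how to handle it in Weka, see the answer in the FAQ. The FAQ address is among my bookmarked links.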


> (2) feature vectors to represent documents

See FAQ "How do I perform text classification?".

Link to the FAQs available from the Weka homepage.

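This FAQ explains how to do text classification in Weka. I used to combine WVTool with Weka for text classification, and only now discovered that Weka already supports it. Those interested can take a closer look. Because Weka's support for Chinese is poor and its tokenizer is very simple, there is still a lot of work you have to do yourself.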
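
To make "feature vectors to represent documents" concrete, here is a minimal bag-of-words sketch in Python. It is not Weka's text pipeline; the names `tokenize` and `to_vectors` and the sample documents are made up for illustration. The tokenizer simply splits lowercase text on anything that is not an ASCII letter or digit, so it would discard Chinese text entirely. That is an example of the extra work mentioned above: a proper word segmenter would have to replace `tokenize` for Chinese documents.

```python
import re
from collections import Counter


def tokenize(text):
    """Split lowercase text on runs of non-alphanumeric ASCII characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def to_vectors(documents):
    """Turn documents into term-count vectors over a shared vocabulary."""
    tokenized = [tokenize(d) for d in documents]
    vocabulary = sorted({t for tokens in tokenized for t in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


# Hypothetical documents.
docs = ["Weka clusters data", "Weka classifies text data"]
vocabulary, vectors = to_vectors(docs)
```

Each document becomes one row of counts, with one column per vocabulary term, which is the shape a classifier expecting fixed-length attribute vectors needs.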

Cheers, Peter
--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/ Ph. +64 (7) 858-5174

All email content quoted in this post belongs to its senders; it is only reposted here, and copyright remains with the senders.
