Classification Abstracts with Corresponding English Text, Part 1

 

A New Method for Designing Hierarchical Support Vector Machine Multi-class Classifiers


Abstract: Designing the hierarchical structure is a key issue in hierarchical support vector machine multi-class classification. Inter-class separability is an important basis for designing the hierarchical structure. A new method based on linear support vector machines is proposed to measure inter-class separability. Furthermore, a method is presented that designs a hierarchical support vector machine multi-class classifier based on inter-class separability. Experimental results indicate that the new method speeds up hierarchical support vector machine multi-class classifiers and yields higher precision.
Key words: Support Vector Machines; Multi-class Classification; Hierarchical Structure; Inter-class Separability

 

Feature subset selection in large dimensionality domains




Abstract


Searching for an optimal feature subset from a high dimensional feature space is known to be an NP-complete problem. We present a hybrid algorithm, SAGA, for this task. SAGA combines the ability to avoid being trapped in a local minimum of simulated annealing with the very high rate of convergence of the crossover operator of genetic algorithms, the strong local search ability of greedy algorithms and the high computational efficiency of generalized regression neural networks. We compare the performance over time of SAGA and well-known algorithms on synthetic and real datasets. The results show that SAGA outperforms existing algorithms.


 


 


 


Feature selection with a measure of deviations from Poisson in text categorization


 




Fujiyoshida-City, Yamanashi 403-0005, Japan



Available online 28 August 2008.

 





Abstract


To improve the performance of automatic text classification, it is desirable to reduce a high dimensionality of the feature space. In this paper, we propose a new measure for selecting features, which estimates term importance based on how largely the probability distribution of each term deviates from the standard Poisson distribution. In information retrieval literatures, the deviation from Poisson has been used as a measure for weighting keywords and this motivates us to adopt the deviation from Poisson as a measure for feature selection in text classification tasks. The proposed measure is constructed so as to have the same computational complexity with other standard measures used for feature selection. To test the effectiveness of our method, we conducted evaluation experiments on Reuters-21578 corpus with support vector machine and k-NN classifiers. In the experiments, we performed binary classifications to determine whether each of the test documents belongs to a certain target category or not. For the target category, each of the top 10 categories of Reuters-21578 was used because of enough numbers of training and test documents. Four measures were used for feature selection: information gain (IG), χ²-statistic, Gini index and the proposed measure in this work. Both the proposed measure and Gini index proved to be better than IG and χ²-statistic in terms of macro-averaged and micro-averaged values of F1, especially at higher vocabulary reduction levels.



Keywords: Text categorization; Feature selection; Poisson distribution; Support vector machine; k-NN classifier



Article Outline



1. Introduction
2. Poisson distribution in information retrieval
3. Application of deviation from Poisson to feature selection
4. Experimental setup

4.1. Data collection
4.2. Feature selection
4.3. Document representation
4.4. Classifiers
4.5. Performance measure
5. Results and discussion

5.1. [classifier name lost in extraction] performance
5.2. [classifier name lost in extraction] performance
5.3. Scalability
6. Conclusion
Acknowledgements
References
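
The abstract above does not state the formula of the proposed measure, so the following is only a minimal sketch of the general idea of "deviation from Poisson", not the authors' measure. It assumes a simple illustrative score: the fraction of documents expected to contain a term if its occurrences followed a Poisson distribution, minus the fraction actually observed. The function name `poisson_deviation` and the toy counts are hypothetical.

```python
import math

def poisson_deviation(term_counts):
    """Illustrative (assumed) deviation score for one term.

    term_counts: number of occurrences of the term in each document.
    Returns expected minus observed document fraction, where the expected
    fraction assumes a Poisson distribution with rate
    lambda = total occurrences / number of documents.
    """
    n_docs = len(term_counts)
    lam = sum(term_counts) / n_docs
    # Under Poisson, P(count >= 1) = 1 - exp(-lambda).
    expected_fraction = 1.0 - math.exp(-lam)
    observed_fraction = sum(1 for c in term_counts if c > 0) / n_docs
    return expected_fraction - observed_fraction

# Same total frequency, different distribution over documents.
bursty = [6, 0, 0, 0, 0, 0]   # concentrated in one document
spread = [1, 1, 1, 1, 1, 1]   # spread evenly over all documents

print(poisson_deviation(bursty) > poisson_deviation(spread))
```

In this sketch a term concentrated in a few documents scores higher than one spread evenly, which matches the intuition that content-bearing terms deviate from Poisson behaviour.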

 


Feature selection for text classification with Naïve Bayes (2008)




Abstract


As an important preprocessing technology in text classification, feature selection can improve the scalability, efficiency and accuracy of a text classifier. In general, a good feature selection method should consider domain and algorithm characteristics. As the Naïve Bayesian classifier is very simple and efficient and highly sensitive to feature selection, so the research of feature selection specially for it is significant. This paper presents two feature evaluation metrics for the Naïve Bayesian classifier applied on multi-class text datasets: Multi-class Odds Ratio (MOR), and Class Discriminating Measure (CDM). Experiments of text classification with Naïve Bayesian classifiers were carried out on two multi-class texts collections. As the results indicate, CDM and MOR gain obviously better selecting effect than other feature selection approaches.



Keywords: Text classification; Feature selection; Text preprocessing; Naïve Bayes



Article Outline



1. Introduction
2. Feature evaluation metrics for Naïve Bayes classifiers

2.1. The MOR metric
2.2. The CDM metric
3. Naïve Bayesian classifiers used on text data
4. Experiments

4.1. Data collections and performance setting
4.2. Experimental results and analyses
5. Conclusion
Acknowledgements
References

 


Text feature selection using ant colony optimization


 




Abstract


Feature selection and feature extraction are the most important steps in classification systems. Feature selection is commonly used to reduce dimensionality of datasets with tens or hundreds of thousands of features which would be impossible to process further. One of the problems in which feature selection is essential is text categorization. A major problem of text categorization is the high dimensionality of the feature space; therefore, feature selection is the most important step in text categorization. At present there are many methods to deal with text feature selection. To improve the performance of text categorization, we present a novel feature selection algorithm that is based on ant colony optimization. Ant colony optimization algorithm is inspired by observation on real ants in their search for the shortest paths to food sources. Proposed algorithm is easily implemented and because of use of a simple classifier in that, its computational complexity is very low. The performance of proposed algorithm is compared to the performance of genetic algorithm, information gain and CHI on the task of feature selection in Reuters-21578 dataset. Simulation results on Reuters-21578 dataset show the superiority of the proposed algorithm.



Keywords: Feature selection; Ant colony optimization; Genetic algorithm; Text categorization



Article Outline




1. Introduction
2. Feature selection approaches
3. Ant colony optimization (ACO)

3.1. Ant colony optimization for feature selection

3.1.1. Graph representation
3.1.2. Heuristic desirability
3.1.3. Pheromone update rule
3.1.4. Solution construction
4. Proposed feature selection algorithm
5. Genetic algorithm (GA)

5.1. Genetic algorithm for feature selection
6. Statistical approaches

6.1. Information gain (IG)
6.2. χ² Statistic (CHI)
7. Experimental results

7.1. Dataset
7.2. Feature extraction
7.3. Performance measure
7.4. Results
8. Conclusion
Acknowledgements
References

 


 


A novel ACO–GA hybrid algorithm for feature selection in protein function prediction



Abstract


Protein function prediction is an important problem in functional genomics. Typically, protein sequences are represented by feature vectors. A major problem of protein datasets that increase the complexity of classification models is their large number of features. Feature selection (FS) techniques are used to deal with this high dimensional space of features. In this paper, we propose a novel feature selection algorithm that combines genetic algorithms (GA) and ant colony optimization (ACO) for faster and better search capability. The hybrid algorithm makes use of advantages of both ACO and GA methods. Proposed algorithm is easily implemented and because of use of a simple classifier in that, its computational complexity is very low. The performance of proposed algorithm is compared to the performance of two prominent population-based algorithms, ACO and genetic algorithms. Experimentation is carried out using two challenging biological datasets, involving the hierarchical functional classification of GPCRs and enzymes. The criteria used for comparison are maximizing predictive accuracy, and finding the smallest subset of features. The results of experiments indicate the superiority of proposed algorithm.


 



Article Outline




1. Introduction
2. Protein function prediction
3. Feature selection approaches
4. Ant colony optimization

4.1. Ant colony optimization for feature selection

4.1.1. Graph representation
4.1.2. Heuristic desirability
4.1.3. Pheromone update rule
5. Genetic algorithm (GA)

5.1. Genetic algorithm for feature selection
6. Proposed ACO–GA algorithm
7. Experimental results

7.1. Datasets
7.2. Experimental methodology
7.3. Results
7.4. Discussion
8. Conclusion and future research
References

 


Optimal feature selection for support vector machines




Abstract


Selecting relevant features for support vector machine (SVM) classifiers is important for a variety of reasons such as generalization performance, computational efficiency, and feature interpretability. Traditional SVM approaches to feature selection typically extract features and learn SVM parameters independently. Independently performing these two steps might result in a loss of information related to the classification process. This paper proposes a convex energy-based framework to jointly perform feature selection and SVM parameter learning for linear and non-linear kernels. Experiments on various databases show significant reduction of features used while maintaining classification performance.



Keywords: Support vector machine; Feature selection; Feature extraction



Article Outline



1. Introduction
2. Previous work

2.1. Support vector machines
2.2. Feature construction in SVM
3. SVMs and parameterized kernels
4. Learning feature weights
5. Feature weighting in feature space
6. Connection to L1-SVMs and sparsity
7. Experiments

7.1. Handwritten digit recognition
7.2. Pose classification
7.3. Eye detection
7.4. Experiments on other datasets
7.5. Software packages and training time
8. Conclusion
Acknowledgements
Appendix A. Proof of Theorem 1
Appendix B. Theorem 2
References
Vitae




Document Classification Algorithm Based on NPE and PSO
Ziqiang Wang; Xia Sun



With many potential applications in document management and Web searching, document classification has recently gained more attention. To efficiently resolve this problem, an efficient document classification algorithm based on neighborhood preserving embedding (NPE) and particle swarm optimization (PSO) is proposed in this paper. The document features are first extracted by the NPE algorithm, then the PSO classifier is used to classify the documents into semantically different classes. Experimental results show that the proposed algorithm achieves much better performance than other related classification algorithms.



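Web Page Classification Based on Particle Swarm Optimization



Particle swarm optimization (PSO) has been applied in many fields because it is efficient, easy to understand, and easy to implement. Web page classification is one of the key technologies in Web information retrieval research. To represent a Web page, the page is decomposed into different parts, and the SVM algorithm is then applied iteratively to construct classifiers. Because PSO is an iteration-based optimization tool, it is used to optimally combine the Web page classifiers produced iteratively during training into a final classifier, which also strengthens the adaptability of the classifier. Experimental results show that optimally combining the iteratively produced classifiers, partitioning the Web page structure, and computing feature weights by finding and exploiting the regularities latent in the page collection greatly improve the accuracy and F-measure of Web page classification, so the method is effective, robust, and practical.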


A distributed PSO-SVM hybrid system with feature selection and parameter optimization


This study proposed a novel PSO-SVM model that hybridized the particle swarm optimization (PSO) and support vector machines (SVM) to improve the classification accuracy with a small and appropriate feature subset. This optimization mechanism combined the discrete PSO with the continuous-valued PSO to simultaneously optimize the input feature subset selection and the SVM kernel parameter setting. The hybrid PSO-SVM data mining system was implemented via a distributed architecture using the web service technology to reduce the computational time. In a heterogeneous computing environment, the PSO optimization was performed on the application server and the SVM model was trained on the client (agent) computer. The experimental results showed the proposed approach can correctly select the discriminating input features and also achieve high classification accuracy.
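
The abstract does not give the update equations, so the following is a rough sketch of how a discrete and a continuous PSO can share one particle, not the authors' implementation. It assumes the widely used binary PSO rule (a sigmoid of the velocity gives the probability that a bit is 1) for the feature-mask dimensions and the ordinary position update for the kernel-parameter dimensions. The function name `update_particle` and the values of `inertia`, `c1`, `c2` and `v_max` are hypothetical.

```python
import math
import random

def update_particle(position, velocity, personal_best, global_best,
                    n_bits, rng, inertia=0.7, c1=1.5, c2=1.5, v_max=4.0):
    """One PSO step for a mixed particle.

    The first n_bits dimensions are a 0/1 feature mask (binary PSO);
    the remaining dimensions are continuous values such as SVM kernel
    parameters (standard PSO). All coefficients are illustrative.
    """
    new_position, new_velocity = [], []
    for d, (x, v, p, g) in enumerate(
            zip(position, velocity, personal_best, global_best)):
        # Standard velocity update, clamped to [-v_max, v_max].
        v = inertia * v + c1 * rng.random() * (p - x) + c2 * rng.random() * (g - x)
        v = max(-v_max, min(v_max, v))
        if d < n_bits:
            # Binary dimension: sigmoid(v) is the probability of a 1 bit.
            prob_one = 1.0 / (1.0 + math.exp(-v))
            x = 1 if rng.random() < prob_one else 0
        else:
            # Continuous dimension: move along the velocity.
            x = x + v
        new_position.append(x)
        new_velocity.append(v)
    return new_position, new_velocity

rng = random.Random(0)
# Three feature bits followed by two continuous parameters.
pos, vel = update_particle([1, 0, 1, 0.5, 2.0], [0.0] * 5,
                           [1, 1, 0, 1.0, 1.0], [1, 1, 1, 2.0, 0.1],
                           n_bits=3, rng=rng)
print(pos, vel)
```

Keeping both kinds of dimension in one particle lets a single fitness evaluation (for example, the accuracy of an SVM trained with that mask and those parameters) drive both searches at once.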


 


 


Dimensionality Reduction using GA-PSO 2006

The feature selection process can be considered a problem of global combinatorial optimization in machine learning, which reduces the number of features, removes irrelevant, noisy and redundant data, and results in acceptable classification accuracy. In this paper, we propose a combination of genetic algorithms (GAs) and particle swarm optimization (PSO) for feature selection. The K-nearest neighbor (K-NN) method with leave-one-out cross-validation (LOOCV) serves as an evaluator for the GAs and the PSO. The proposed method is applied to five classification problems taken from the literature. Experimental results show that our method simplifies features effectively and obtains a higher classification accuracy compared to other feature selection methods.
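
As a minimal sketch of the evaluator described above, the function below scores a candidate feature subset by the leave-one-out cross-validation accuracy of a k-NN classifier restricted to the selected features. It is an illustration under stated assumptions, not the paper's code: the name `loocv_knn_accuracy`, the squared Euclidean distance, the majority vote and the toy data are all hypothetical choices.

```python
def loocv_knn_accuracy(samples, labels, mask, k=1):
    """Leave-one-out accuracy of k-NN using only features where mask is truthy."""
    selected = [i for i, keep in enumerate(mask) if keep]
    if not selected:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for held_out, (x, y) in enumerate(zip(samples, labels)):
        # Distance from the held-out sample to every other sample.
        dists = []
        for j, (other, other_label) in enumerate(zip(samples, labels)):
            if j == held_out:
                continue
            d = sum((x[i] - other[i]) ** 2 for i in selected)
            dists.append((d, other_label))
        dists.sort(key=lambda pair: pair[0])
        # Majority vote among the k nearest neighbours.
        votes = {}
        for _, label in dists[:k]:
            votes[label] = votes.get(label, 0) + 1
        if max(votes, key=votes.get) == y:
            correct += 1
    return correct / len(samples)

# Toy data: feature 0 separates the classes, feature 1 is noise.
samples = [[0.0, 5.0], [0.1, -3.0], [0.2, 9.0],
           [1.0, -4.0], [1.1, 8.0], [1.2, 4.0]]
labels = [0, 0, 0, 1, 1, 1]

print(loocv_knn_accuracy(samples, labels, [1, 0]))
print(loocv_knn_accuracy(samples, labels, [0, 1]))
```

A GA or PSO search would call such a function once per candidate mask and prefer masks with higher returned accuracy.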

 

A novel hybrid ACO-GA algorithm for text feature selection 2009

Abstract:

In our previous work we have proposed an ant colony optimization (ACO) algorithm for feature selection. In this paper, we hybridize the algorithm with a genetic algorithm (GA) to obtain excellent features of two algorithms by synthesizing them. Proposed algorithm is applied to a challenging feature selection problem. This is a data mining problem involving the categorization of text documents. We report the extensive comparison between our proposed algorithm and three existing algorithms - ACO-based, information gain (IG) and CHI algorithms proposed in the literature. Proposed algorithm is easily implemented and because of use of a simple classifier in that, its computational complexity is very low. Experimentations are carried out on Reuters-21578 dataset. Simulation results on Reuters-21578 dataset show the superiority of the proposed algorithm.

 



Feature Selection for the Stored-grain Insects Based on PSO and SVM



ABSTRACT



Feature subset selection is a key preprocessing step in detecting stored-grain insects with image recognition technology. Drawing on the global optimization ability of particle swarm optimization (PSO) and the superior classification performance of support vector machines (SVM), this study proposed a method based on PSO and SVM to improve classification accuracy with an appropriate feature subset. A single-objective fitness function was designed to evaluate a feature subset by combining the accuracy of the v-fold cross-validation training model with the number of selected features. The method was applied to feature subset selection for nine species of stored-grain insects that cause serious spoilage in grain depots, such as Tenebroides mauritanicus (L.) and Rhizopertha dominica Fabricius. An optimal feature subset consisting of seven features was selected from the 17 morphological features, such as area and perimeter. Compared with the genetic algorithm (GA), the method in this study can decrease the size of the feature subset and improve the classification accuracy. Using the feature subset selected by PSO and SVM, the ninety image samples of the stored-grain insects were classified by an SVM classifier whose two parameters had been optimized, and the classification accuracy was over 95.5%. The experiment showed that the method was practical and feasible.
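
The abstract names the two ingredients of the fitness function (cross-validation accuracy and the number of selected features) but not how they are combined. The sketch below assumes a simple weighted sum that rewards accuracy and penalises larger subsets. The name `subset_fitness` and the weight `accuracy_weight=0.8` are hypothetical, not values from the study.

```python
def subset_fitness(cv_accuracy, mask, accuracy_weight=0.8):
    """Single-objective fitness of a feature subset (illustrative form).

    cv_accuracy: v-fold cross-validation accuracy in [0, 1], computed elsewhere.
    mask: 0/1 list marking which features are selected.
    accuracy_weight: assumed trade-off between accuracy and subset size.
    """
    n_total = len(mask)
    n_selected = sum(1 for bit in mask if bit)
    if n_selected == 0:
        return 0.0  # an empty subset is treated as worthless
    # Fraction of features left out: higher means a more compact subset.
    compactness = 1.0 - n_selected / n_total
    return accuracy_weight * cv_accuracy + (1.0 - accuracy_weight) * compactness

# With equal accuracy, 7 of 17 features scores higher than all 17.
seven_of_17 = [1] * 7 + [0] * 10
all_17 = [1] * 17
print(subset_fitness(0.95, seven_of_17) > subset_fitness(0.95, all_17))
```

Folding both goals into one number lets a plain PSO maximise it directly, at the cost of having to choose the weight in advance.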



Improved simplified PSO KNN classification algorithm



An efficient algorithm, SPSOKNN, is proposed to reduce the computational complexity of the KNN text classification algorithm. It is based on particle swarm optimization, which searches randomly within the training document set. While searching for the k nearest neighbors of a test sample, document vectors that cannot be among the k closest vectors are discarded quickly. The influence of particle velocity is also removed from the PSO evolutionary process, so the k closest vectors of a test sample can be found more rapidly. Validation experiments show that, when finding the same k nearest neighbors, the SPSOKNN algorithm achieves higher classification accuracy than the KNN algorithm.



A PSO-Based Web Document Classification Algorithm



Abstract
Due to the exponential growth of documents in the Internet and the emergent need to organize them, the automatic document classification has received an ever-increased attention in the recent years. The particle swarm optimization (PSO) algorithm, new to the document classification community, is a robust stochastic evolutionary algorithm based on the movement and intelligence of swarms. In this paper, a PSO-based algorithm for document classification is presented. Comparison between our method and other conventional document classification algorithms is conducted on Reuter and TREC corpora. The experimental results indicate that our proposed algorithm yields much better performance than other conventional algorithms.



SVM based adaptive learning method for text classification from positive and unlabeled documents


