Abstract
Searching for an optimal feature subset in a high-dimensional feature space is known to be an NP-complete problem. We present a hybrid algorithm, SAGA, for this task. SAGA combines the ability of simulated annealing to avoid becoming trapped in a local minimum, the fast convergence of the genetic-algorithm crossover operator, the strong local search ability of greedy algorithms, and the computational efficiency of generalized regression neural networks. We compare the performance over time of SAGA and well-known algorithms on synthetic and real datasets. The results show that SAGA outperforms existing algorithms.
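For a concrete picture of how such a hybrid can be wired together, the following is a minimal, purely illustrative Python sketch: a population of feature subsets is perturbed with simulated-annealing acceptance and recombined with a GA-style crossover. The fitness function, cooling schedule and population mechanics are assumptions for illustration; SAGA itself evaluates subsets with a generalized regression neural network and also includes a greedy local-search stage not shown here.

```python
# Illustrative SA + GA-crossover feature-subset search (not SAGA itself).
import math
import random

def fitness(subset, X, y):
    # Placeholder wrapper score: SAGA uses a generalized regression neural
    # network here; any cheap estimator of subset quality can stand in.
    if not subset:
        return 0.0
    cols = sorted(subset)
    score = sum(abs(sum(x[j] * yi for x, yi in zip(X, y))) for j in cols)
    return score / len(cols)  # per-feature average discourages padding with weak features

def mutate(subset, n_features):
    s = set(subset)
    s.symmetric_difference_update({random.randrange(n_features)})  # flip one feature in/out
    return s

def crossover(a, b):
    union = list(a | b)
    k = max(1, (len(a) + len(b)) // 2)
    return set(random.sample(union, min(k, len(union))))

def saga_like_search(X, y, n_features, pop_size=10, iters=200, t0=1.0, cooling=0.99):
    pop = [set(random.sample(range(n_features), max(1, n_features // 4)))
           for _ in range(pop_size)]
    temp = t0
    best = max(pop, key=lambda s: fitness(s, X, y))
    for _ in range(iters):
        # Simulated-annealing step: accept worse neighbours with prob exp(dE/T).
        for i, s in enumerate(pop):
            cand = mutate(s, n_features)
            d = fitness(cand, X, y) - fitness(s, X, y)
            if d > 0 or random.random() < math.exp(d / max(temp, 1e-9)):
                pop[i] = cand
        # GA step: crossover of the two best subsets replaces the worst one.
        pop.sort(key=lambda s: fitness(s, X, y))
        pop[0] = crossover(pop[-1], pop[-2])
        best = max([best] + pop, key=lambda s: fitness(s, X, y))
        temp *= cooling
    return best
```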
Feature selection with a measure of deviations from Poisson in text categorization
Available online 28 August 2008.
Abstract
To improve the performance of automatic text classification, it is desirable to reduce the high dimensionality of the feature space. In this paper, we propose a new measure for selecting features, which estimates term importance based on how far the probability distribution of each term deviates from the standard Poisson distribution. In the information retrieval literature, the deviation from Poisson has been used as a measure for weighting keywords, and this motivates us to adopt it as a measure for feature selection in text classification tasks. The proposed measure is constructed to have the same computational complexity as other standard measures used for feature selection. To test the effectiveness of our method, we conducted evaluation experiments on the Reuters-21578 corpus with support vector machine and k-NN classifiers. In the experiments, we performed binary classifications to determine whether each test document belongs to a given target category. Each of the top 10 categories of Reuters-21578 was used as a target category because they provide sufficient numbers of training and test documents. Four measures were used for feature selection: information gain (IG), the χ2-statistic, the Gini index, and the proposed measure. Both the proposed measure and the Gini index proved to be better than IG and the χ2-statistic in terms of macro-averaged and micro-averaged F1, especially at higher vocabulary reduction levels.
Keywords: Text categorization; Feature selection; Poisson distribution; Support vector machine; k-NN classifier
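One common formulation of the deviation-from-Poisson idea, sketched below under the assumption that it is what the measure builds on, compares a term's observed document frequency with the document frequency a Poisson (random occurrence) model would predict from the same collection frequency; content-bearing terms cluster in far fewer documents than the model expects. The paper's exact scoring function may differ in detail.

```python
# Hedged sketch of a deviation-from-Poisson term score.
import math

def poisson_deviation_scores(term_doc_counts, n_docs):
    """term_doc_counts: {term: list of per-document occurrence counts, length n_docs}."""
    scores = {}
    for term, counts in term_doc_counts.items():
        cf = sum(counts)                       # collection frequency
        df = sum(1 for c in counts if c > 0)   # observed document frequency
        lam = cf / n_docs                      # Poisson rate per document
        expected_df = n_docs * (1.0 - math.exp(-lam))
        # Content-bearing terms occur in fewer documents than the Poisson
        # expectation; a larger gap yields a higher score.
        scores[term] = (expected_df - df) / max(expected_df, 1e-12)
    return scores

# Toy usage: "nasa" clusters in 2 of 6 documents, "the" spreads over all of them.
docs = {
    "nasa": [5, 4, 0, 0, 0, 0],
    "the":  [2, 1, 2, 1, 1, 2],
}
print(poisson_deviation_scores(docs, n_docs=6))
```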
Article Outline
1. Introduction
2. Poisson distribution in information retrieval
3. Application of deviation from Poisson to feature selection
4. Experimental setup
4.1. Data collection
4.2. Feature selection
4.3. Document representation
4.4. Classifiers
4.5. Performance measure
5. Results and discussion
5.1. SVM performance
5.2. k-NN performance
5.3. Scalability
6. Conclusion
Acknowledgements
References
Abstract
As an important preprocessing technique in text classification, feature selection can improve the scalability, efficiency and accuracy of a text classifier. In general, a good feature selection method should take both domain and algorithm characteristics into account. Because the Naïve Bayesian classifier is simple, efficient and highly sensitive to feature selection, research on feature selection designed specifically for it is significant. This paper presents two feature evaluation metrics for the Naïve Bayesian classifier applied to multi-class text datasets: Multi-class Odds Ratio (MOR) and Class Discriminating Measure (CDM). Text classification experiments with Naïve Bayesian classifiers were carried out on two multi-class text collections. The results indicate that CDM and MOR select features noticeably better than other feature selection approaches.
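A hedged sketch of how such metrics can be computed is shown below, assuming MOR sums an absolute log-odds-ratio term over all classes and CDM sums the absolute log-ratio of class-conditional term probabilities, with probabilities estimated from document frequencies under Laplace smoothing. The estimation details are illustrative assumptions, not necessarily the paper's exact procedure.

```python
# Sketch of MOR and CDM as per-class sums of log-odds-style terms.
import math

def mor_cdm(df_per_class, docs_per_class):
    """df_per_class[c] = documents of class c containing the term;
       docs_per_class[c] = total documents of class c."""
    classes = list(docs_per_class)
    n_total = sum(docs_per_class.values())
    df_total = sum(df_per_class.get(c, 0) for c in classes)
    mor, cdm = 0.0, 0.0
    for c in classes:
        # P(w | c) and P(w | not-c), Laplace-smoothed document-frequency estimates.
        p = (df_per_class.get(c, 0) + 1) / (docs_per_class[c] + 2)
        q = (df_total - df_per_class.get(c, 0) + 1) / (n_total - docs_per_class[c] + 2)
        cdm += abs(math.log(p / q))
        mor += abs(math.log((p * (1 - q)) / ((1 - p) * q)))
    return mor, cdm

# Toy usage: a term concentrated in the "sports" class scores highly on both.
print(mor_cdm({"sports": 40, "politics": 2}, {"sports": 100, "politics": 100}))
```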
Article Outline
1. Introduction
2. Feature evaluation metrics for Naïve Bayes classifiers
2.1. The MOR metric
2.2. The CDM metric
3. Naïve Bayesian classifiers used on text data
4. Experiments
4.1. Data collections and performance setting
4.2. Experimental results and analyses
5. Conclusion
Acknowledgements
References
Text feature selection using ant colony optimization
Abstract
Feature selection and feature extraction are the most important steps in classification systems. Feature selection is commonly used to reduce the dimensionality of datasets with tens or hundreds of thousands of features, which would otherwise be impossible to process further. Text categorization is one problem in which feature selection is essential: a major difficulty of text categorization is the high dimensionality of the feature space, so feature selection is its most important step. Many methods currently exist for text feature selection. To improve the performance of text categorization, we present a novel feature selection algorithm based on ant colony optimization. Ant colony optimization is inspired by the observation of real ants searching for the shortest paths to food sources. The proposed algorithm is easy to implement and, because it uses a simple classifier, its computational complexity is very low. Its performance is compared with that of a genetic algorithm, information gain and CHI on the task of feature selection on the Reuters-21578 dataset. Simulation results on Reuters-21578 show the superiority of the proposed algorithm.
Keywords: Feature selection; Ant colony optimization; Genetic algorithm; Text categorization
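The sketch below illustrates the general ACO feature selection recipe the outline describes (graph of features, heuristic desirability, pheromone update, solution construction). The `evaluate` callable stands in for the paper's simple classifier (e.g., returning held-out accuracy of a candidate subset), `heuristic` for a per-feature desirability such as information gain, and all parameter values are assumptions rather than the authors' settings.

```python
# Illustrative ACO-style feature selection: ants build subsets node by node,
# biased by pheromone and heuristic desirability; pheromone is reinforced on
# the best subset of each iteration.
import random

def aco_feature_selection(n_features, evaluate, heuristic,
                          n_ants=10, subset_size=20, n_iters=30,
                          alpha=1.0, beta=1.0, rho=0.1):
    """evaluate(subset) -> score in [0, 1]; heuristic[j] > 0 for every feature."""
    tau = [1.0] * n_features                      # pheromone on each feature node
    best_subset, best_score = set(), float("-inf")
    for _ in range(n_iters):
        trials = []
        for _ in range(n_ants):
            chosen = set()
            while len(chosen) < min(subset_size, n_features):
                # Transition rule: pheromone^alpha * desirability^beta;
                # already-chosen features get zero probability.
                weights = [0.0 if j in chosen
                           else (tau[j] ** alpha) * (heuristic[j] ** beta)
                           for j in range(n_features)]
                chosen.add(random.choices(range(n_features), weights=weights)[0])
            trials.append((evaluate(chosen), chosen))
        tau = [(1.0 - rho) * t for t in tau]      # evaporation
        iter_score, iter_best = max(trials, key=lambda t: t[0])
        for j in iter_best:                       # reinforce the best ant's subset
            tau[j] += iter_score
        if iter_score > best_score:
            best_score, best_subset = iter_score, iter_best
    return best_subset, best_score
```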
Article Outline
1. Introduction
2. Feature selection approaches
3. Ant colony optimization (ACO)
3.1. Ant colony optimization for feature selection
3.1.1. Graph representation
3.1.2. Heuristic desirability
3.1.3. Pheromone update rule
3.1.4. Solution construction
4. Proposed feature selection algorithm
5. Genetic algorithm (GA)
5.1. Genetic algorithm for feature selection
6. Statistical approaches
6.1. Information gain (IG)
6.2. χ2-statistic (CHI)
7. Experimental results
7.1. Dataset
7.2. Feature extraction
7.3. Performance measure
7.4. Results
8. Conclusion
Acknowledgements
References
A novel ACO–GA hybrid algorithm for feature selection in protein function prediction
Abstract
Protein function prediction is an important problem in functional genomics. Typically, protein sequences are represented by feature vectors. A major problem with protein datasets, one that increases the complexity of classification models, is their large number of features. Feature selection (FS) techniques are used to deal with this high-dimensional feature space. In this paper, we propose a novel feature selection algorithm that combines genetic algorithms (GA) and ant colony optimization (ACO) for faster and better search capability. The hybrid algorithm exploits the advantages of both methods. The proposed algorithm is easy to implement and, because it uses a simple classifier, its computational complexity is very low. Its performance is compared with that of two prominent population-based algorithms, ACO and genetic algorithms. Experiments are carried out on two challenging biological datasets involving the hierarchical functional classification of GPCRs and enzymes. The comparison criteria are maximizing predictive accuracy and finding the smallest subset of features. The experimental results indicate the superiority of the proposed algorithm.
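One plausible, purely hypothetical way to couple the two metaheuristics is sketched below: ant-constructed subsets are refined by GA crossover and mutation before the next pheromone update. The `construct_ant_subset` and `evaluate` callables are placeholders, and the paper's actual hybrid may interleave ACO and GA quite differently.

```python
# Hypothetical ACO + GA coupling: GA variation applied to ant-built subsets.
import random

def hybrid_generation(construct_ant_subset, evaluate, n_features,
                      n_ants=10, p_mut=0.05):
    colony = [construct_ant_subset() for _ in range(n_ants)]   # sets of feature indices
    colony.sort(key=evaluate, reverse=True)
    children = []
    for a, b in zip(colony[0::2], colony[1::2]):
        # Uniform-style crossover on the union of two parents.
        union = list(a | b)
        child = {f for f in union if random.random() < 0.5}
        # Bit-flip style mutation over the whole feature set.
        for f in range(n_features):
            if random.random() < p_mut:
                child.symmetric_difference_update({f})
        children.append(child or set(random.sample(range(n_features), 1)))
    # Keep the best subsets; these would feed the next pheromone update.
    return sorted(colony + children, key=evaluate, reverse=True)[:n_ants]
```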
Article Outline
1. Introduction
2. Protein function prediction
3. Feature selection approaches
4. Ant colony optimization
4.1. Ant colony optimization for feature selection
4.1.1. Graph representation
4.1.2. Heuristic desirability
4.1.3. Pheromone update rule
5. Genetic algorithm (GA)
5.1. Genetic algorithm for feature selection
6. Proposed ACO–GA algorithm
7. Experimental results
7.1. Datasets
7.2. Experimental methodology
7.3. Results
7.4. Discussion
8. Conclusion and future research
References
Optimal feature selection for support vector machines
Abstract
Selecting relevant features for support vector machine (SVM) classifiers is important for a variety of reasons such as generalization performance, computational efficiency, and feature interpretability. Traditional SVM approaches to feature selection typically extract features and learn SVM parameters independently. Independently performing these two steps might result in a loss of information related to the classification process. This paper proposes a convex energy-based framework to jointly perform feature selection and SVM parameter learning for linear and non-linear kernels. Experiments on various databases show significant reduction of features used while maintaining classification performance.
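The outline's section on the connection to L1-SVMs and sparsity can be illustrated, in a much simpler setting than the paper's joint framework, with an off-the-shelf L1-penalised linear SVM: the coefficients driven to zero act as implicit feature selection. This is not the authors' method, only a related baseline sketch using scikit-learn.

```python
# L1-penalised linear SVM as a simple illustration of sparsity-driven
# feature selection (not the paper's joint energy-based framework).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic data: only 5 of 50 features are informative.
X, y = make_classification(n_samples=400, n_features=50, n_informative=5,
                           n_redundant=0, random_state=0)

clf = LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=5000).fit(X, y)
selected = np.flatnonzero(clf.coef_[0])   # indices of non-zero weights
print(f"kept {selected.size} of {X.shape[1]} features:", selected)
```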
Article Outline
1. Introduction
2. Previous work
2.1. Support vector machines
2.2. Feature construction in SVM
3. SVMs and parameterized kernels
4. Learning feature weights
5. Feature weighting in feature space
6. Connection to L1-SVMs and sparsity
7. Experiments
7.1. Handwritten digit recognition
7.2. Pose classification
7.3. Eye detection
7.4. Experiments on other datasets
7.5. Software packages and training time
8. Conclusion
Acknowledgements
Appendix A. Proof of Theorem 1
Appendix B. Theorem 2
References
Vitae