Abstract:
Choosing the number and positions of split points rationally is central to the discretization of continuous variables. To improve the efficiency of unsupervised discretization, an entropy-based algorithm for discretizing continuous variables is proposed. The algorithm exploits the information content (entropy) of a continuous variable and partitions the variable using only its own values, minimizing both the loss of entropy and the number of partitions. By seeking the best balance between information loss and a small number of partitions, it obtains an optimal discretization result. Experiments show that the approach is effective.
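The trade-off between entropy loss and partition count can be illustrated with a minimal sketch. It is not the algorithm of the paper, whose details are not given here. It assumes a greedy bottom-up merge of adjacent bins, and the names `entropy`, `merge_bins`, and the stopping parameter `max_bins` are hypothetical, introduced only for this illustration.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy, in bits, of the distribution given by bin counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def merge_bins(values, max_bins):
    """Greedy unsupervised discretization (illustrative assumption only).

    Start with one bin per distinct value, then repeatedly merge the
    adjacent pair of bins whose merge loses the least entropy, until
    at most `max_bins` bins remain. Returns the split points and the
    entropy of the final partition.
    """
    freq = Counter(values)
    keys = sorted(freq)
    bins = [[k] for k in keys]          # values covered by each bin
    counts = [freq[k] for k in keys]    # number of samples in each bin

    while len(bins) > max_bins:
        current = entropy(counts)
        best_i, best_loss = None, None
        for i in range(len(counts) - 1):
            merged = counts[:i] + [counts[i] + counts[i + 1]] + counts[i + 2:]
            loss = current - entropy(merged)
            if best_loss is None or loss < best_loss:
                best_i, best_loss = i, loss
        # Merge the cheapest adjacent pair in place.
        bins[best_i:best_i + 2] = [bins[best_i] + bins[best_i + 1]]
        counts[best_i:best_i + 2] = [counts[best_i] + counts[best_i + 1]]

    # Place each split point midway between neighbouring bins.
    cuts = [(bins[i][-1] + bins[i + 1][0]) / 2 for i in range(len(bins) - 1)]
    return cuts, entropy(counts)


if __name__ == "__main__":
    cuts, h = merge_bins([1, 1, 2, 2, 10, 10, 11, 11], max_bins=2)
    print(cuts, h)
```

A fixed `max_bins` is the simplest stopping rule. A criterion that weighs the accumulated entropy loss against the number of remaining partitions would be closer in spirit to the balance described above.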