Using TermExtractorRelated
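A few days ago a senior labmate gave me TermExtractorRelated, a phrase-extraction tool, so that I could use it for my own phrase-extraction work. Here is how to use it.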
Download link for the tool package: https://download.csdn.net/download/Fitz1318/12911242
- Run the application
C:\Users\qingbaobao\Desktop\TermExtractorRelated\TermExtractorRelated\termextractor_monitor\termextracttools
It initializes the phrase-extraction model and then listens on local socket port 6060, as shown in the figure below.
- Write a Python program that sends your own text to the tool and extracts keywords from it:
```python
import re
import socket

# Host = '192.168.6.213'  # use this if the tool runs on another machine
Host = '127.0.0.1'
Port = 6060

# Connect once to the TermExtractorRelated service started in the previous step
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((Host, Port))


def te_buff_gen(qs: str, max_size: int, mode: str = 'PhraseOnly') -> bytes:
    """Wrap the text in the tool's request format; num is the maximum number of terms returned."""
    return bytes('<cmd>\nnum=%d\ntype=%s</cmd>%s <end>\n\n' % (max_size, mode, qs), 'utf-8')


def term_extract(qs: str, max_size: int = 10, mode: str = 'PhraseOnly') -> list:
    """Return a list of (term, score) tuples extracted from qs."""
    # '<' and '>' would clash with the <cmd>/<end> markers, so replace them with spaces
    rqs = re.sub(r'[<>]', ' ', qs)
    # Replace non-ASCII characters (curly quotes, dashes, etc.) with spaces
    rqs = re.sub(r'[^\x00-\x7F]', ' ', rqs)
    rqs = rqs.strip()
    resp = []
    s.sendall(te_buff_gen(rqs, max_size, mode))
    # A single recv() assumes the whole reply fits in 8192 bytes and arrives in one read
    result = s.recv(8192).decode('utf-8')
    for line in result.split('\n'):
        fields = line.split('\t')
        if len(fields) < 2:  # skip blank or malformed lines instead of crashing
            continue
        resp.append((fields[0], float(fields[1])))
    return resp


text1 = """Proof theory of many-valued logic—linear optimization—logic design: connections and interactions In this paper proof theory of many-valued logic is connected with areas outside of logic, namely, linear optimization and computer aided logic design. By stating these nos explicitly, I want to encourage interaction between the mentioned disciplines. Once familiar with the others’ terminology, I believe that the respective communities can greatly benefit from each other. """
text2 = """An approach to the script discrimination in the Slavic documents The paper deals with the problem of the script discrimination in old Slavic printed documents. Therefore, an algorithm for script classification and identification is proposed. It creates coded text from initial document. Then, the coded text is subjected to statistical analysis. As a result, the texture feature extraction is carried out. Obtained texture features are used as criteria for script classification and identification. The proposed method is tested on the samples of old Slavic printed documents written in Glagolitic, Cyrillic and Latin script. """
text3 = """A differential inclusion approach for modeling and analysis of dynamical systems under uncertainty In this paper we deal with the application of differential inclusions to modeling nonlinear dynamical systems under uncertainty in parameters. In this case, differential inclusions seem to be better suited to modeling practical situations under uncertainty and imprecision than formulations by means of fuzzy differential equations. We develop a practical algorithm to approximate the reachable sets of a class of nonlinear differential inclusion, which eludes the computational problems of a previous set-valued version of the Heun’s method. Our algorithm is based on a complete discretization (time and state space) of the differential inclusion and it suits hardware features, handling the memory used by the method in a controlled fashion during all iterations. As a case of study, we formulate a differential inclusion to model an epidemic outbreak of dengue fever under Cuban conditions. The model takes into account interaction of human and mosquito populations as well as vertical transmission in the mosquito population. It is studied from the theoretical point of view to apply the Practical Algorithm. Also, we estimate the temporal evolution of the different human and mosquito populations given by the model in the Dengue 3 epidemic in Havana 2001, through the computation of the reachable sets using the Practical Algorithm. """
text4 = """A study of large-scale data clustering based on fuzzy clustering Large-scale data are any data that cannot be loaded into the main memory of the ordinary. This is not the objective definition of large-scale data, but it is easy to understand what the large-scale data is. We first introduce some present algorithms to clustering large-scale data, some data stream clustering algorithms based on FCM algorithms are also introduced. In this paper, we propose a new structure to cluster large-scale data and two new data stream clustering algorithms based on the structure are propose in Sects. 3 and 4. In our method, we load the objects in the dataset one by one. We set a threshold of the membership, if the membership of one object and a cluster center is bigger than the threshold, the object is assigned to the cluster and the location of nearest cluster center will be updated, else the object is put into the temporary matrix; we call it pool. When the pool is full, we cluster the data in the pool and update the location of cluster centers. The two algorithms are based on the data stream structure. The difference of the two algorithms is the how the objects in the data are weighed. We test our algorithms on handwritten digits images dataset and several large-scale UCI datasets and make a comparison with some presented algorithms. The experiments proved that our algorithm is more suitable to cluster large-scale datasets. """
text5 = """Hybridization of magnetic charge system search and particle swarm optimization for efficient data clustering using neighborhood search strategy Clustering is a popular data analysis technique, which is applied for partitioning of datasets. The aim of clustering is to arrange the data items into clusters based on the values of their attributes. Magnetic charge system search (MCSS) algorithm is a new meta-heuristic optimization algorithm inspired by the electromagnetic theory. It has been proved better than other meta-heuristics. This paper presents a new hybrid meta-heuristic algorithm by combining both MCSS and particle swarm optimization (PSO) algorithms, which is called MCSS–PSO, for partitional clustering problem. Moreover, a neighborhood search strategy is also incorporated in this algorithm to generate more promising solutions. The performance of the proposed MCSS–PSO algorithm is tested on several benchmark datasets and its performance is compared with already existing clustering algorithms such as K-means, PSO, genetic algorithm, ant colony optimization, charge system search, chaotic charge system search algorithm, and some PSO variants. From the experimental results, it can be seen that performance of the proposed algorithm is better than the other algorithms being compared and it can be effectively used for partitional clustering problem. """

# Values accepted in the type field (per the original note): WordPhrase, PhraseOnly, WordOnly
print(term_extract(text1, 100))
s.close()
```
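If the Windows tool is not running (for example, when you only want to check the client's parsing logic), a small stub server can stand in for it. This is a sketch under stated assumptions, not the tool's documented protocol: the request format is copied from `te_buff_gen` above, the response format (one `phrase<TAB>score` pair per line) is inferred from how `term_extract` parses the reply, and the stub's "terms" and scores are fake. `StubHandler`, `start_stub`, `parse_response` and `query` are hypothetical names, not part of TermExtractorRelated.

```python
import re
import socket
import socketserver
import threading

# Assumed wire format (inferred from the client code, not from tool documentation):
#   request : "<cmd>\nnum=N\ntype=MODE</cmd>TEXT <end>\n\n"
#   response: one "phrase<TAB>score" pair per line


class StubHandler(socketserver.BaseRequestHandler):
    """Hypothetical stand-in for TermExtractorRelated; its output is fake."""

    def handle(self):
        buf = b""
        while b"<end>" not in buf:  # read until the request terminator arrives
            chunk = self.request.recv(4096)
            if not chunk:
                return
            buf += chunk
        data = buf.decode("utf-8")
        num = int(re.search(r"num=(\d+)", data).group(1))
        body = data.split("</cmd>", 1)[1].replace("<end>", "")
        # Fake "terms": the first `num` distinct lowercase words, with made-up scores
        words = list(dict.fromkeys(re.findall(r"[a-z]+", body.lower())))[:num]
        reply = "\n".join(f"{w}\t{1.0 / (i + 1):.4f}" for i, w in enumerate(words))
        self.request.sendall(reply.encode("utf-8"))


def start_stub() -> socketserver.TCPServer:
    """Start the stub on a free local port in a background thread."""
    server = socketserver.TCPServer(("127.0.0.1", 0), StubHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def parse_response(raw: str) -> list:
    """Convert 'phrase\\tscore' lines into (phrase, score) tuples, skipping bad lines."""
    pairs = []
    for line in raw.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        try:
            pairs.append((fields[0], float(fields[1])))
        except ValueError:
            continue
    return pairs


def query(host: str, port: int, text: str, max_size: int = 10,
          mode: str = "PhraseOnly") -> list:
    """Send one request and parse the reply (same framing as te_buff_gen)."""
    msg = "<cmd>\nnum=%d\ntype=%s</cmd>%s <end>\n\n" % (max_size, mode, text)
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(msg.encode("utf-8"))
        chunks = []
        while True:  # the stub closes the connection after replying, so read to EOF
            chunk = conn.recv(8192)
            if not chunk:
                break
            chunks.append(chunk)
    return parse_response(b"".join(chunks).decode("utf-8"))


if __name__ == "__main__":
    server = start_stub()
    host, port = server.server_address
    print(query(host, port, "Fuzzy clustering of large-scale data", max_size=3))
    server.shutdown()
    server.server_close()
```

Reading until EOF works here only because the stub closes each connection after replying. The real tool may keep the connection open (the original client reuses one socket for every request), in which case this loop would block until the 5-second timeout, so keep the original single `recv` when talking to the real service.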
The output is as follows: