04-23.eri-test Language Detection Using ColdFusion / Java

Latest recommended article published 2024-09-11 09:07:21

cunbang3337

Tags: java python

In the past, I've used character ranges in an attempt to identify the language of text. While this seemed to work for Russian, Chinese, Japanese, Turkish, Greek, Hebrew, Korean & Arabic, it was pretty useless for Western European languages written in the Latin alphabet, like French, German & Spanish.
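To illustrate why character ranges stop working for Latin-alphabet text, here is a minimal Java sketch (not the author's original code; the class and method names are hypothetical). It works at the level of whole Unicode scripts rather than hand-picked ranges, using the JDK's `Character.UnicodeScript` to count which script most letters belong to.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper, not the author's original code: guesses the writing
// system of a string by counting which Unicode script each letter belongs to.
public class ScriptRangeDetector {

    /** Returns the script used by most letters in the text, or null if there are no letters. */
    static Character.UnicodeScript dominantScript(String text) {
        Map<Character.UnicodeScript, Integer> counts = new EnumMap<>(Character.UnicodeScript.class);
        // Walk code points (not chars) so supplementary characters are classified correctly;
        // punctuation, digits and combining marks are skipped because they are not letters.
        text.codePoints()
            .filter(Character::isLetter)
            .forEach(cp -> counts.merge(Character.UnicodeScript.of(cp), 1, Integer::sum));

        Character.UnicodeScript best = null;
        int bestCount = 0;
        for (Map.Entry<Character.UnicodeScript, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] samples = {
            "Quel est votre nom?",   // French
            "Wie heißen Sie?",       // German
            "¿Cuál es tu nombre?",   // Spanish
            "Πως σε λένε?",          // Greek
            "Как Вас зовут?",        // Russian
            "คุณชื่ออะไร?"            // Thai
        };
        for (String sample : samples) {
            System.out.println(dominantScript(sample) + "\t" + sample);
        }
    }
}
```

The French, German and Spanish samples all come back as `LATIN`, while the Greek, Russian and Thai samples map to `GREEK`, `CYRILLIC` and `THAI`. A script tells you the alphabet, not the language: Cyrillic alone cannot separate Russian from Belarusian, and Latin cannot separate French from Spanish.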

Today, a question was asked on Stack Overflow:

"How can I identify the sentence having other languages like Spanish?"

Someone recommended the polyglot and pycld2 Python libraries, and this started my quest for a Java solution. I found Apache OpenNLP, but it seemed like overkill since I only needed language detection. Lingua looked promising, but the library was 30MB and integration didn't seem very easy. The Lingua page referenced the Optimaize Language Detector Java library, and the tag cloud at the top of the page listed "language-detection". I followed that tag, filtered the language to "java" and got 23 public repositories. The kju2 language-detector library, a fork of Optimaize, seemed more ColdFusion-friendly in terms of integration & usage, and its pre-compiled JAR file is only 1.2MB (versus 131MB for Lingua).

Installation

Copy the JAR file to your Java class path and restart the ColdFusion server.
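To confirm that the JAR actually loads and to check which class name it exposes, a tiny standalone Java check could help before wiring it into ColdFusion. This is a sketch: the default class name `io.github.kju2.languagedetector.LanguageDetector` is an assumption inferred from the Maven groupId, so list the JAR's contents (for example with `jar tf`) to find the real one.

```java
// Sketch: check whether a class from the language-detector JAR is visible on the class path.
public class ClasspathCheck {

    /** Returns true if the named class can be found by this class's loader (without initializing it). */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // ASSUMPTION: this default class name is a guess based on the Maven groupId
        // io.github.kju2.languagedetector; pass the real name as the first argument.
        String className = args.length > 0
                ? args[0]
                : "io.github.kju2.languagedetector.LanguageDetector";
        System.out.println(className + (isOnClasspath(className) ? " is" : " is NOT") + " on the class path");
    }
}
```

On Linux or macOS this might be run as `java -cp language-detector-1.0.5.jar:. ClasspathCheck` (the JAR file name is assumed; Windows uses `;` as the separator). Note that this checks a standalone JVM, not ColdFusion's own class path.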

Usage

Instantiate the languageDetector.cfc component:

    var languageDetector = new languageDetector();

languageDetector.detect(text)

Returns a string naming the detected language.

    languageDetector.detect("Quel est votre nom?");     // CATALAN (French?)
    languageDetector.detect("Wie heißen Sie?");         // GERMAN
    languageDetector.detect("¿Cuál es tu nombre?");     // SPANISH
    languageDetector.detect("Πως σε λένε?");            // GREEK
    languageDetector.detect("آپ کا نام کیا ہے؟");       // URDU
    languageDetector.detect("Как Вас зовут?");          // BELARUSIAN (Russian)
    languageDetector.detect("คุณชื่ออะไร?");             // THAI
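The examples suggest that detect() returns the language as an uppercase English name such as GERMAN. If a caller needs a two-letter ISO 639 code instead (for example for an HTML `lang` attribute), a hypothetical Java helper like the one below could translate the name using only the JDK's locale data. It is a sketch under that assumption: any name the library spells differently from the JDK's English display name would come back empty.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical post-processing helper: turns an English language name such as
// "GERMAN" into a two-letter ISO 639 code using the JDK's own locale data.
public class LanguageNames {

    static Optional<String> toIsoCode(String languageName) {
        for (String code : Locale.getISOLanguages()) {
            // English display name of the language, e.g. "de" -> "German"
            String englishName = Locale.forLanguageTag(code).getDisplayLanguage(Locale.ENGLISH);
            if (englishName.equalsIgnoreCase(languageName)) {
                return Optional.of(code);
            }
        }
        return Optional.empty(); // no two-letter JDK language has that English name
    }

    public static void main(String[] args) {
        String[] detected = {"CATALAN", "GERMAN", "SPANISH", "GREEK", "URDU", "BELARUSIAN", "THAI"};
        for (String name : detected) {
            System.out.println(name + " -> " + toIsoCode(name).orElse("(unknown)"));
        }
    }
}
```

The lookup is a linear scan over the roughly 190 two-letter codes, which is cheap enough for occasional calls; a real wrapper might cache the results in a map.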

Source

Download it from GitHub:

JamoCA / cf-language-detector
ColdFusion wrapper for kju2-forked "Language Detection Library for Java"


Installation

Install the JAR file to your existing Java class path and restart the ColdFusion server. The JAR can be obtained in any of these ways:

- Download the source from https://github.com/kju2/language-detector and build the JAR manually.
- Download the pre-compiled JAR from MvnRepository: https://mvnrepository.com/artifact/io.github.kju2.languagedetector/language-detector/1.0.5
- Use the included JAR file (v1.0.5).
