iOS multi-language: default language
Some time ago I presented a talk at CocoaHeads SP on how to use NLP in an iOS app. A lot has changed since then, so I thought it would be nice to post something about it.
Natural Language Processing
The idea of processing human language with computer programs has been around for a long time. The tools, methods and approaches change rapidly, and there is a myriad of algorithms and techniques: some have been in use for decades, while others were created only a few years ago.
Some of the common tasks in NLP are tokenization, lemmatization, part-of-speech tagging, word embeddings and text classification. There are obviously many other NLP tasks, but it would be impossible to cover all of them in one post, so I decided to focus on these.
For each task I will give a brief explanation, sometimes with a few use cases for an app, and then show how to implement it on iOS.
Tokenization
A text is usually represented in a program as a string. Tokenization deals with how to split that string into units such as paragraphs, sentences and words. At first glance this can look like a trivial question.
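On Apple platforms, the NaturalLanguage framework provides tokenization through `NLTokenizer`. You configure it with the unit you want to split on (`.word`, `.sentence`, `.paragraph` or `.document`). Below is a minimal sketch, assuming iOS 12 / macOS 10.14 or later, where the framework is available. The helper name `tokens(of:unit:)` and the sample text are hypothetical and used only for illustration.

```swift
import NaturalLanguage

// Hypothetical helper: splits `text` into substrings at the requested unit
// (.word, .sentence, .paragraph or .document) using NLTokenizer.
func tokens(of text: String, unit: NLTokenUnit) -> [String] {
    let tokenizer = NLTokenizer(unit: unit)
    tokenizer.string = text
    // tokens(for:) returns the range of every token found inside the given range.
    return tokenizer.tokens(for: text.startIndex..<text.endIndex)
        .map { String(text[$0]) }
}

let sample = "Tokenization looks trivial. It usually is not."
print(tokens(of: sample, unit: .sentence))  // one entry per sentence
print(tokens(of: sample, unit: .word))      // one entry per word
```

Changing only the `unit` argument switches between sentence-level and word-level splitting. The same tokenizer logic serves every granularity.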
At first you might be tempted to split the string at every period to get sentences and at every blank space to get words. That could work for a few very short texts, but for longer texts the approach is unlikely to be enough. For instance, when tokenizing a sentence like “Mr. Hegarty lives in New York.”, you’d want “Mr. Hegarty” to stay together in the same sentence, even though it contains a period.
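To see the problem concretely, here is a minimal sketch of that naive approach in plain Swift. The function names `naiveSentences` and `naiveWords` are hypothetical and used only for illustration. The period in "Mr." is treated as a sentence boundary. Splitting on spaces also leaves the final period attached to "York." and breaks "New York" into two tokens.

```swift
import Foundation

// Naive sentence splitter: treats every period as a sentence boundary.
func naiveSentences(_ text: String) -> [String] {
    text.split(separator: ".")
        .map { $0.trimmingCharacters(in: .whitespaces) }
        .filter { !$0.isEmpty }
}

// Naive word splitter: treats every space as a word boundary.
func naiveWords(_ text: String) -> [String] {
    text.split(separator: " ").map(String.init)
}

let example = "Mr. Hegarty lives in New York."
print(naiveSentences(example))  // ["Mr", "Hegarty lives in New York"]
print(naiveWords(example))      // ["Mr.", "Hegarty", "lives", "in", "New", "York."]
```

Both results are wrong for this input. That is why a language-aware tokenizer, rather than a character split, is worth using.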