Natural Language Processing in iOS


Some time ago I presented a talk at CocoaHeads SP on how to use NLP in an iOS app. A lot has changed since then, so I thought it would be nice to post something about it.


Natural Language Processing

The idea of processing human language with computer programs has been around for a while. The tools, methods and approaches change rapidly, and there is a myriad of algorithms and techniques: some have been in use for decades, while others were created just a few years ago.

Some of the common tasks in NLP are tokenization, lemmatization, part-of-speech tagging, word embeddings and text classification. There are, obviously, many other tasks in NLP, but it would be impossible to cover all of them in one post, so I decided to focus on these.

For each of the tasks I will give a brief explanation, maybe with some use cases for an app, and then I will show how to implement it in iOS.


Tokenization

A text is usually represented in a program as a string. Tokenization handles the question (which might look trivial from a simplistic perspective) of how to split that string into units (paragraphs, sentences, words, etc.).


At first you might be tempted to split the string by breaking it at every period for sentences and at every blank space for words, for instance. That could work for a few very short texts, but if you’re handling a larger text, chances are that approach will not suffice. When splitting a text into sentences, you’d want to keep “Mr. Hegarty” together in the same sentence when tokenizing something like: Mr. Hegarty lives in New York. Breaking at every period would wrongly end a sentence right after “Mr.”

Tokenization is the technique used to decompose a text into units (“tokens”) that can be used later in the processing.


In order to tokenize a text in iOS, you'll need to instantiate an NLTokenizer and call the enumerateTokens method.
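A minimal sketch of such a function, assuming the NaturalLanguage framework (iOS 12 or later). The function name `tokens(in:unit:)` is hypothetical and only used for illustration:

```swift
import NaturalLanguage

/// Splits `text` into tokens of the given unit (.word, .sentence, .paragraph...).
/// `tokens(in:unit:)` is an illustrative name, not part of the framework.
func tokens(in text: String, unit: NLTokenUnit = .word) -> [String] {
    let tokenizer = NLTokenizer(unit: unit)
    tokenizer.string = text

    var result: [String] = []
    // The closure is called once per token with the token's range in `text`.
    tokenizer.enumerateTokens(in: text.startIndex..<text.endIndex) { range, _ in
        result.append(String(text[range]))
        return true // keep enumerating
    }
    return result
}

let words = tokens(in: "Mr. Hegarty lives in New York.")
let sentences = tokens(in: "Mr. Hegarty lives in New York. He likes it.", unit: .sentence)
```

Passing `.sentence` instead of `.word` leaves the sentence splitting to the tokenizer, which is where a naive split at every period tends to go wrong.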


Example function retrieving tokens from a passed string

Lemmatization

As a former linguistics student, I’m tempted to spend longer than I should discussing what lemmatization really is, but for the sake of simplicity I’ll be brief for now (at the risk of disappointing my linguist friends).

The idea behind lemmatizing a word is turning both loved and loving into the same lemma, love, or turning is and were into be. This can be useful for a number of use cases. Say you want your users to be able to search through a database of photo descriptions, and they want to find pictures with rain, for instance. They might type “raining” in the text field, but I bet they would also like to get results like “… it rained all day…” or “… the rain didn’t stop for a minute…”, even though neither of those sentences contains the exact word “raining”.

In order to get a word’s lemma in iOS, you'll need to use the NLTagger class. Initialize an NLTagger object with .lemma as one of its tag schemes, set its string property to the text you want to lemmatize, and call enumerateTags.
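A minimal sketch of those steps, assuming the NaturalLanguage framework (iOS 12 or later). The function name `lemmas(in:)` and the choice to fall back to the original word when no lemma is found are assumptions made for illustration:

```swift
import NaturalLanguage

/// Returns one lemma per word in `text`.
/// `lemmas(in:)` is an illustrative name, not part of the framework.
func lemmas(in text: String) -> [String] {
    let tagger = NLTagger(tagSchemes: [.lemma])
    tagger.string = text

    var result: [String] = []
    let options: NLTagger.Options = [.omitWhitespace, .omitPunctuation]
    tagger.enumerateTags(in: text.startIndex..<text.endIndex,
                         unit: .word,
                         scheme: .lemma,
                         options: options) { tag, range in
        // The tag is nil when the tagger has no lemma for this token
        // (this can happen with very short texts); in that case this
        // sketch keeps the word as written.
        result.append(tag?.rawValue ?? String(text[range]))
        return true // keep enumerating
    }
    return result
}

let photoLemmas = lemmas(in: "It rained all day and it is still raining.")
```

For the photo search described earlier, one option would be to lemmatize both the query and the stored descriptions and compare lemmas rather than raw words.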


Example function that retrieves the lemmas for a passed string

Part-of-speech tagging

A part-of-speech is the syntactic class to which a word belongs. A word can be a verb, a noun, a preposition and so on.


Determining the POS of a word in a text is far from being a trivial task. In languages like English, where almost every word can get “verbalized”, things could get really complicated.


To get the POS tags for the tokens in a text, we can use an approach very similar to the one used for lemmatization. The only difference is that instead of using the .lemma scheme, we should use .lexicalClass.
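A minimal sketch under the same assumptions as before (NaturalLanguage framework, iOS 12 or later). The function name `lexicalClasses(in:)` and the tuple labels are hypothetical:

```swift
import NaturalLanguage

/// Pairs each word in `text` with its lexical class (for example "Noun" or "Verb").
/// `lexicalClasses(in:)` is an illustrative name, not part of the framework.
func lexicalClasses(in text: String) -> [(token: String, tag: String)] {
    let tagger = NLTagger(tagSchemes: [.lexicalClass])
    tagger.string = text

    var result: [(token: String, tag: String)] = []
    let options: NLTagger.Options = [.omitWhitespace, .omitPunctuation]
    tagger.enumerateTags(in: text.startIndex..<text.endIndex,
                         unit: .word,
                         scheme: .lexicalClass,
                         options: options) { tag, range in
        // Tokens the tagger cannot classify are skipped in this sketch.
        if let tag = tag {
            result.append((token: String(text[range]), tag: tag.rawValue))
        }
        return true // keep enumerating
    }
    return result
}

for (token, tag) in lexicalClasses(in: "Mr. Hegarty lives in New York.") {
    print("\(token): \(tag)")
}
```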


Example function that retrieves the lexical classes (or POS tags) for a given string

Word embeddings

Representing words has always been a challenge. With the advancement of GPUs about a decade ago, and the consequent revival of neural networks, it became necessary to represent words numerically, and it would be even nicer if the word representation could somehow encode the similarities and differences between words.

Word embeddings provide just that. They are a representation of words as n-dimensional vectors (where n usually ranges from 50 to 300). That representation is suitable as an input to Machine Learning models, like neural nets. Also, with that representation we end up getting some interesting consequences. For example, we can calculate the Euclidean distance between words (everybody remembers Pythagoras' Theorem, right?), and that distance is related to the semantic similarity between words. So you can imagine the vector for the word "dog" being closer to the vector for the word "cat" than to that of the word "space".
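To make the distance idea concrete, here is a small sketch in plain Swift. The three-dimensional vectors are made-up toy values chosen for illustration, not real embeddings, and `euclideanDistance` is a hypothetical helper:

```swift
/// Euclidean distance between two vectors of the same length.
/// Returns nil when the lengths differ.
func euclideanDistance(_ a: [Double], _ b: [Double]) -> Double? {
    guard a.count == b.count else { return nil }
    var sumOfSquares = 0.0
    for (x, y) in zip(a, b) {
        sumOfSquares += (x - y) * (x - y)
    }
    return sumOfSquares.squareRoot()
}

// Toy vectors, invented for this example only.
let dog = [0.9, 0.8, 0.1]
let cat = [0.8, 0.9, 0.2]
let space = [0.1, 0.2, 0.9]

let dogToCat = euclideanDistance(dog, cat)     // about 0.17
let dogToSpace = euclideanDistance(dog, space) // about 1.28
```

With these toy values "dog" ends up much closer to "cat" than to "space". Note that, as far as I know, the distances reported by Apple's built-in embeddings use cosine distance rather than Euclidean distance, so the numbers are not directly comparable.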


A vector representation can be useful for an iOS app in a number of ways. I would like to mention two use cases for word embeddings in an app: firstly, as an input for a Core ML model; and, secondly, in order to make some features somewhat "smarter".


To use a text as an input for a Core ML model, you'd have to encode it as a matrix where each row is the vector for a word in the text. The steps would basically be: (1) tokenize your text string, as mentioned above; (2) get the vector for each token; (3) pass the list of vectors to your Core ML model as its input.
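Steps (1) and (2) could be sketched as follows, assuming the NLEmbedding class that this post introduces a little further on (iOS 13 or later). The function name `embeddingMatrix(for:language:)`, the lowercasing, and the decision to skip words without a vector are all assumptions. Step (3) depends on the input your particular model expects, so it is left out:

```swift
import NaturalLanguage

/// Builds a matrix with one row (vector) per word of `text`.
/// `embeddingMatrix(for:language:)` is an illustrative name.
func embeddingMatrix(for text: String, language: NLLanguage = .english) -> [[Double]] {
    // The built-in embedding may be unavailable for a language or device.
    guard let embedding = NLEmbedding.wordEmbedding(for: language) else { return [] }

    // Step 1: tokenize the text into words.
    let tokenizer = NLTokenizer(unit: .word)
    tokenizer.string = text

    var rows: [[Double]] = []
    tokenizer.enumerateTokens(in: text.startIndex..<text.endIndex) { range, _ in
        // Step 2: look up the vector for each token.
        // Lowercasing is an assumption about the vocabulary; words
        // without a vector are skipped in this sketch.
        if let vector = embedding.vector(for: text[range].lowercased()) {
            rows.append(vector)
        }
        return true // keep enumerating
    }
    return rows
}

let matrix = embeddingMatrix(for: "The dog chased the cat")
```

A real model usually expects a fixed number of rows, so padding or truncating the matrix would typically be needed before handing it over.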


When Apple announced Core ML at WWDC 2017, I remember that one of my unanswered questions in the lab was how to easily use word embeddings to preprocess the text for a model. Back then, if you wanted to use word embeddings, you’d have to do it “manually”, loading the vectors from disk at run time.

A lot has changed since then, and now getting a vector representation for a word is easier than adding a gradient background to a button!


All you need to do is instantiate an NLEmbedding using the wordEmbedding(for:) factory method, then call vector(for:), passing the word you need as a string.
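A minimal sketch of that call sequence, assuming iOS 13 or later. The wrapper name `wordVector(for:language:)` is hypothetical, and lowercasing the word is an assumption about the built-in vocabulary:

```swift
import NaturalLanguage

/// Returns the embedding vector for `word`, or nil when the embedding
/// is unavailable or the word is not in its vocabulary.
/// `wordVector(for:language:)` is an illustrative name.
func wordVector(for word: String, language: NLLanguage = .english) -> [Double]? {
    // wordEmbedding(for:) returns nil when no embedding exists for the language.
    guard let embedding = NLEmbedding.wordEmbedding(for: language) else { return nil }
    return embedding.vector(for: word.lowercased())
}

if let vector = wordVector(for: "dog") {
    print("dog has \(vector.count) dimensions")
}
```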


Getting the word vector

Another interesting use case for word embeddings is allowing your app to be "smarter". Take the same example of an app where the user can search for a picture based on its description. Say your user wants to find a picture with a house in it, so they go ahead and type "house" in the search bar. It's quite possible that they would also like to get a picture whose description mentions a "mansion" or a "building". How could you implement that?

One of the cool features of word embeddings is getting the "close neighbors" of a given word. So, if your user searches for a word w, you could implement your search so that it returns the results where that word appears, and also the results where the closest neighbors of that word appear.

Getting the neighbors for a given word in iOS is a piece of cake:
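A minimal sketch, assuming iOS 13 or later. The function names `nearestNeighbors(of:count:language:)` and `search(_:in:)` are hypothetical, and the plain substring matching in the search is a deliberate simplification:

```swift
import Foundation
import NaturalLanguage

/// Returns up to `count` words closest to `word` in the built-in embedding.
/// `nearestNeighbors(of:count:language:)` is an illustrative name.
func nearestNeighbors(of word: String, count: Int = 5,
                      language: NLLanguage = .english) -> [String] {
    guard let embedding = NLEmbedding.wordEmbedding(for: language) else { return [] }
    // neighbors(for:maximumCount:) returns (word, distance) pairs.
    return embedding.neighbors(for: word.lowercased(), maximumCount: count)
        .map { $0.0 }
}

/// Returns the descriptions that mention the query or one of its neighbors.
/// Substring matching is a crude stand-in for a real search index.
func search(_ query: String, in descriptions: [String]) -> [String] {
    let terms = [query.lowercased()] + nearestNeighbors(of: query)
    return descriptions.filter { description in
        let lowered = description.lowercased()
        return terms.contains { lowered.contains($0) }
    }
}

let results = search("house", in: ["A house by the lake", "Two cats on a sofa"])
```

Which neighbors come back depends on the embedding shipped with the system, so it is worth inspecting them before widening a search this way.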


Getting the 5 nearest neighbors of a given word

Text classification

Lastly, I'd like to mention text classification. The idea in text classification is basically, given a text, to determine which class it belongs to (for example, news article, or sports text, or even positive/negative sentiment).

One of the ways to achieve text classification is using Core ML, as I mentioned above. Depending on the model you're going to use, you may need word embeddings as a preprocessing step. But what I would like to mention here is one of the most widely used types of text classification: sentiment analysis.

The NaturalLanguage framework in iOS has a simple high level API to determine whether a text is positive or negative.


In order to classify a text, you'll use the NLTagger introduced above.
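A minimal sketch, assuming the .sentimentScore tag scheme (iOS 13 or later). The function name `sentimentScore(for:)` is hypothetical. The score comes back as a string tag, which this sketch parses into a number between -1 (negative) and 1 (positive):

```swift
import NaturalLanguage

/// Returns a sentiment score for `text`, or nil when none is available.
/// `sentimentScore(for:)` is an illustrative name.
func sentimentScore(for text: String) -> Double? {
    guard !text.isEmpty else { return nil }

    let tagger = NLTagger(tagSchemes: [.sentimentScore])
    tagger.string = text

    // Ask for a single tag covering the paragraph at the start of the text.
    let (tag, _) = tagger.tag(at: text.startIndex,
                              unit: .paragraph,
                              scheme: .sentimentScore)
    // The tag's raw value is the score written as text, for example "0.8".
    return tag.flatMap { Double($0.rawValue) }
}

if let score = sentimentScore(for: "I really loved this app!") {
    print(score > 0 ? "positive" : "negative or neutral")
}
```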


Example function that retrieves the sentiment for a given text

That's not all!

There are still a lot of things you can do at the intersection of iOS and NLP. Obviously, I did not intend to be exhaustive in this post, but you can look up things like language detection, named entity recognition, document analysis and many other techniques that are easy to use and can have a very positive impact on your apps and, mainly, on the lives of your users.

I hope you liked it.


If you have any comments, questions, suggestions, etc. leave a comment! I'll be glad to answer!


Translated from: https://medium.com/cocoaacademymag/natural-language-processing-in-ios-2455a3f541a5
