
Generative AI Rewrites the Rules on Coding Assistance for Qt Developers


October 18, 2023 by Peter Schneider

In 2023, the way software is developed has changed. Forever. There is no turning back.


Traditionally, software developers asked their team colleagues or searched on Internet resources, such as Stack Overflow, when they needed help to create new code or learn a new programming language.


Already in the past, Integrated Development Environments (IDEs) could provide auto-complete functionality for code and code snippets such as elements and properties. However, coding assistants powered by Generative AI have taken auto-completion of code to a whole new level, creating entire functions, test cases, and code documentation. Furthermore, coding assistants such as GitHub Copilot create new code on request through natural language prompts entered as comments in the code.


Illustration: Code generation in Qt Creator through GitHub Copilot integration
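
To make this concrete, here is a rough sketch of what such a prompt-driven completion can look like. It is a hypothetical example rather than a verbatim Copilot output: the developer writes only the comment and the function signature, and the assistant proposes the body. The actual suggestion will vary with model version and surrounding context.

    // Hypothetical example of a natural-language prompt entered as a comment.
    // The developer types the comment and signature; the assistant fills in the body.
    #include <QFile>
    #include <QJsonDocument>
    #include <QJsonObject>
    #include <QString>

    // Read a JSON configuration file and return the value of the given key as a string.
    QString readConfigValue(const QString &filePath, const QString &key)
    {
        QFile file(filePath);
        if (!file.open(QIODevice::ReadOnly | QIODevice::Text))
            return {};

        const QJsonDocument doc = QJsonDocument::fromJson(file.readAll());
        if (!doc.isObject())
            return {};

        return doc.object().value(key).toString();
    }

The time saved is exactly the boilerplate part: opening the file, parsing the JSON, and handling the error cases.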


These game-changing capabilities not only reduce the time developers spend on boilerplate and repetitive code, but also change how they get help. Peer-to-peer support services such as Stack Overflow can feel the change.

With the release of GitHub Copilot and ChatGPT, developers are less often looking for content on Stack Overflow in 2023. In August 2023, Stack Overflow wrote: “…in April of this year, we saw an above average traffic decrease (~14%), which we can likely attribute to developers trying GPT-4 after it was released in March… This year, overall, we're seeing an average of ~5% less traffic compared to 2022… The surge in generative AI, like the rise of any other disruptive technology, should cause us to reflect, challenge, and question how we measure success.”

Nevertheless, coding assistants will evolve significantly in the coming years. Let me highlight three things that will influence coding assistants in the future.


Regional Regulations for Generative AI are forming


Which suggestions coding assistants are allowed to make will change due to regulatory activities. While there doesn't seem to be a global regulatory framework for Generative AI on the horizon, we are likely to end up with regional legal frameworks, much like those that exist for data privacy, such as the European General Data Protection Regulation (GDPR), the US Privacy Act, or China's Personal Information Protection Law (PIPL).

Illustration: Regulatory status of AI

Source: State of AI 2023 report by Othmane Sebbouh, Corina Gurau and Alex Chalmers


Once the regulations on Generative AI are decided in major economic areas such as Europe and the US, coding assistants are likely to adopt them through more transparency about which training data has been used and tighter guardrails on which code is being suggested.

Industry-specific Large Language Models are emerging


I believe that we will see industry-specific LLMs emerge during 2024 that power code generation for professional developers in the enterprise segment. Why? Because industry-specific LLMs can focus on doing specific things very well instead of being good at everything.


The best-known coding assistant, GitHub Copilot, is powered by the OpenAI Codex LLM. The OpenAI Codex is a descendant of OpenAI's GPT-3 model, optimized for code generation use cases. The GPT-3 LLM was released in 2020, meaning that the code suggestions are based on training data that is three years old or more. While wine might improve over time, three years is a very long time in software development. Because my job is to make software development for cross-platform applications easier, I am quite worried about the relevance of the training data used for the OpenAI Codex LLM. When the OpenAI Codex LLM was trained, the Qt 6.2 LTS release - not to mention the Qt 6.5 LTS release - didn't exist. This is obvious whenever developers prompt it to generate code for modules in the Qt 6 release series. I guess - without having any insider knowledge - that OpenAI will publish a newer version of the OpenAI Codex during 2024, but will it know the Qt 6.8 LTS release? I doubt it.
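
As a hypothetical illustration of why the age of training data matters: an assistant trained largely on pre-Qt 6 code may still propose APIs that were removed in Qt 6, for example QRegExp, instead of the QRegularExpression class that a Qt 6 application should use.

    #include <QRegularExpression>
    #include <QString>

    // What an assistant trained on older code might still suggest (Qt 5 style,
    // only available in Qt 6 through the Qt5Compat module):
    //
    //     QRegExp rx("\\d+");
    //     bool ok = rx.exactMatch(input);
    //
    // What a Qt 6 developer actually needs:
    bool isNumber(const QString &input)
    {
        static const QRegularExpression re(QStringLiteral("^\\d+$"));
        return re.match(input).hasMatch();
    }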


Large Language Models are costly to train due to the massive data sets involved. Therefore, training occurs less frequently than innovations in UI and application frameworks. There are ways to complement the knowledge of LLMs, such as Retrieval-Augmented Generation (RAG), which taps into fresh information resources. Still, these don't change the fundamental issue: the model's deep learning is only supplemented with fresh knowledge from dedicated resources, not replaced by it. The relevance of training data is one reason I believe smaller, industry-specific LLMs will emerge. Industry-specific LLMs require less training data than general-purpose LLMs and are therefore easier and cheaper to keep up to date.
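
As a generic sketch of how Retrieval-Augmented Generation supplements, rather than replaces, a model's knowledge: the prompt is enriched with freshly retrieved reference material at request time, while the model's weights stay unchanged. The helper functions retrieveTopK and generateCode below are hypothetical stand-ins for a documentation index and an LLM endpoint, not any real API.

    #include <QString>
    #include <QStringList>

    // Hypothetical placeholder: a real system would query a vector index built
    // from fresh documentation, e.g. the latest framework release notes.
    static QStringList retrieveTopK(const QString &query, int k)
    {
        QStringList snippets;
        for (int i = 0; i < k; ++i)
            snippets << QStringLiteral("<snippet %1 relevant to \"%2\">").arg(i + 1).arg(query);
        return snippets;
    }

    // Hypothetical placeholder: a real system would call the actual LLM here.
    static QString generateCode(const QString &prompt)
    {
        return QStringLiteral("// completion for: ") + prompt.left(60);
    }

    // RAG in a nutshell: the model itself is unchanged, but its prompt is
    // enriched with up-to-date reference material before generation.
    QString assistWithRag(const QString &developerPrompt)
    {
        const QStringList context = retrieveTopK(developerPrompt, 3);
        return generateCode(QStringLiteral("Use this current documentation:\n")
                            + context.join(QStringLiteral("\n---\n"))
                            + QStringLiteral("\n\nTask: ") + developerPrompt);
    }

The retrieved snippets only enrich the prompt; the knowledge baked into the model during training stays as old as its last training run.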


Transparency will take priority in selecting LLMs for code generation in enterprises


Enterprises can only use code when they know where it is from and what licenses are attached to it. Even when coding assistants make a point that code suggestions are created from scratch, it is still possible for a suggestion to contain the same code as the training data. GitHub writes in its FAQs: “GitHub Copilot generates new code in a probabilistic way, and the probability that they produce the same code as a snippet that occurred in training is low.” Is a low probability enough for commercial enterprises? A single copyright infringement might mean software vendors must stop distributing their products. It gets even worse if we consider a digital hardware product without over-the-air software update capabilities. Such a product would need to be recalled in a worst-case scenario. Hence, businesses demand to know where code suggestions come from and whether they infringe on other people's IPRs. Transparency on the origin of training data is one way to solve this issue. While OpenAI does not clearly state what code it has used to train the model, it becomes evident from simply trying it out that it must have scanned Qt framework repositories on GitHub extensively. But whether the code suggestions include “copies” of code with permissive or non-permissive licenses remains a mystery to the developer.


Coding assistants might offer ways to remove suggestions that match public code before presenting them to the developer, or to use only an LLM based on permissively licensed training data. However, that way a lot of knowledge about what good code looks like is lost. A coding assistant that has seen little professionally developed code for embedded devices, or no code from the Qt 6 series framework repositories, is great for developers new to C++, but senior Qt application developers might expect more. Fortunately, GitHub is working on a feature that provides references for suggestions that “resemble public code” (in GitHub repositories) to create more transparency.

Code generation LLMs, such as the open-source StarCoder LLM, are more transparent. One can check whether repositories from a particular GitHub user have been used to train the model. For example, the StarCoder LLM has used only three Qt framework repositories with MIT licenses but excluded all other libraries.


Due to regulatory pressure or company risk management, knowing whether the originator of the training code has given permission for it to be used to train a Large Language Model, and under which conditions, will play a bigger role in the future. Therefore, enterprises might consciously choose smaller but transparent language models as a foundation for code generation.

The Qt Creator IDE has a ready-made plug-in for GitHub Copilot supporting a variety of coding assistant use cases. If you want to know more about using GitHub Copilot with Qt Creator, check for more information here.

PSC: No generative AI was used in writing this blog post...
