Large sequence models for software development activities

Here is a good discussion of an LLM-based systems engineering approach to software engineering automation:

Some Misconceptions and Thoughts on LLM-Driven Software Development (大模型驱动软件开发的一些误区和思考, CSDN blog)

It mentions training LLMs not only on the results (working code) but also on the development process itself. This concept was a bit vague to me, but fortunately the speaker pointed specifically to Google's DIDACT and kindly provided the link:

https://research.google/blog/large-sequence-models-for-software-development-activities/

It's quite impressive, and this state-intent-action approach (with a cheap form of history) could address the long-term memory required for developing large projects.

Software isn’t created in one dramatic step. It improves bit by bit, one little step at a time — editing, running unit tests, fixing build errors, addressing code reviews, editing some more, appeasing linters, and fixing more errors — until finally it becomes good enough to merge into a code repository. Software engineering isn’t an isolated process, but a dialogue among human developers, code reviewers, bug reporters, software architects and tools, such as compilers, unit tests, linters and static analyzers.

Today we describe DIDACT (Dynamic Integrated Developer ACTivity), which is a methodology for training large machine learning (ML) models for software development. The novelty of DIDACT is that it uses the process of software development as the source of training data for the model, rather than just the polished end state of that process, the finished code. By exposing the model to the contexts that developers see as they work, paired with the actions they take in response, the model learns about the dynamics of software development and is more aligned with how developers spend their time. We leverage instrumentation of Google's software development to scale up the quantity and diversity of developer-activity data beyond previous works. Results are extremely promising along two dimensions: usefulness to professional software developers, and as a potential basis for imbuing ML models with general software development skills.

DIDACT is a multi-task model trained on development activities that include editing, debugging, repair, and code review.

We built and deployed internally three DIDACT tools, Comment Resolution (which we recently announced), Build Repair, and Tip Prediction, each integrated at different stages of the development workflow. All three of these tools received enthusiastic feedback from thousands of internal developers. We see this as the ultimate test of usefulness: do professional developers, who are often experts on the code base and who have carefully honed workflows, leverage the tools to improve their productivity?

Perhaps most excitingly, we demonstrate how DIDACT is a first step towards a general-purpose developer-assistance agent. We show that the trained model can be used in a variety of surprising ways, via prompting with prefixes of developer activities, and by chaining together multiple predictions to roll out longer activity trajectories. We believe DIDACT paves a promising path towards developing agents that can generally assist across the software development process.

A treasure trove of data about the software engineering process

Google’s software engineering toolchains store every operation related to code as a log of interactions among tools and developers, and have done so for decades. In principle, one could use this record to replay in detail the key episodes in the “software engineering video” of how Google’s codebase came to be, step-by-step — one code edit, compilation, comment, variable rename, etc., at a time.

Google code lives in a monorepo, a single repository of code for all tools and systems. A software developer typically experiments with code changes in a local copy-on-write workspace managed by a system called Clients in the Cloud (CitC). When the developer is ready to package a set of code changes together for a specific purpose (e.g., fixing a bug), they create a changelist (CL) in Critique, Google’s code-review system. As with other types of code-review systems, the developer engages in a dialog with a peer reviewer about functionality and style. The developer edits their CL to address reviewer comments as the dialog progresses. Eventually, the reviewer declares “LGTM!” (“looks good to me”), and the CL is merged into the code repository.

Of course, in addition to a dialog with the code reviewer, the developer also maintains a “dialog” of sorts with a plethora of other software engineering tools, such as the compiler, the testing framework, linters, static analyzers, fuzzers, etc.

gif: https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhdDAeePURbX0syZUQcLifMLS9uMw-ufpwg8jUXQoQMyHa0UNstR_vA7mqan_fjqzlXLuSlwVZItU-dqZne3Bk4Adsb2DLTi__wX0vmfk_JnEjRC_tmE72e7woRNCsZO6znn3OCQsZLZvZrlU55byBsm3oi5ubHBvDizqvz3N83je01r7XtZ6cuvEZZA/w640-h597/image4.gif

An illustration of the intricate web of activities involved in developing software: small actions by the developer, interactions with a code reviewer, and invocations of tools such as compilers.

A multi-task model for software engineering

DIDACT utilizes interactions among engineers and tools to power ML models that assist Google developers, by suggesting or enhancing actions developers take — in context — while pursuing their software-engineering tasks. To do that, we have defined a number of tasks about individual developer activities: repairing a broken build, predicting a code-review comment, addressing a code-review comment, renaming a variable, editing a file, etc. We use a common formalism for each activity: it takes some State (a code file), some Intent (annotations specific to the activity, such as code-review comments or compiler errors), and produces an Action (the operation taken to address the task). This Action is like a mini programming language, and can be extended for newly added activities. It covers things like editing, adding comments, renaming variables, marking up code with errors, etc. We call this language DevScript.
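As a concrete (hypothetical) illustration of the state-intent-action formalism, the sketch below models one training example as plain Python data. The field names follow the blog's terminology, but the record layout and the DevScript-like action string are my own assumptions, not Google's actual schema.

```python
from dataclasses import dataclass

@dataclass
class DevActivity:
    """One developer activity, in the state-intent-action formalism.

    state:  the code as the developer currently sees it
    intent: task-specific annotations (reviewer comment, compiler error, ...)
    action: the operation taken in response, expressed compactly in a
            DevScript-like mini-language rather than as the full edited file
    """
    task: str     # e.g. "comment_resolution", "build_repair", "edit_prediction"
    state: str
    intent: str
    action: str

# Hypothetical example: resolving a code-review comment.
example = DevActivity(
    task="comment_resolution",
    state="def fetch(url):\n    return urllib.request.urlopen(url).read()\n",
    intent='reviewer: "please add a timeout to this request"',
    action="EDIT line 2: return urllib.request.urlopen(url, timeout=10).read()",
)

# Training pairs the (task, state, intent) context with the action the
# developer actually took, so the model learns the process, not just the result.
print(example.task, "->", example.action)
```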

gif: https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgsd4dFYSNVxcg20eqgJqdr3u9S2hHsBlqOelnix5XZd5zIhrZIemDLaF3CcaMk09Y_9kyDkhAuAq2-STjKDOP1Tb9ShsE91FpNgnLRqOxIzCKrTj2_WaAJUlRxikaLmQK2iv7YfpnQtQeaUeIXNCRE9efYju6KxGBEu4zwDA0vQ6qla6dNvpResnch2w/s16000/DIDACT.gif
The DIDACT model is prompted with a task, code snippets, and annotations related to that task, and produces development actions, e.g., edits or comments.

This state-intent-action formalism enables us to capture many different tasks in a general way. What’s more, DevScript is a concise way to express complex actions, without the need to output the whole state (the original code) as it would be after the action takes place; this makes the model more efficient and more interpretable. For example, a rename might touch a file in dozens of places, but a model can predict a single rename action; a sketch of this idea follows below.
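To see why a compact action is cheaper than re-emitting the whole state, here is a small sketch (my own illustration, not DevScript itself) in which a single rename action is expanded deterministically by tooling, so the model only has to predict one short command rather than the full edited file:

```python
import re

def apply_rename(source: str, old: str, new: str) -> str:
    """Expand a single 'RENAME old -> new' action into every affected site.

    The model predicts one short action; a deterministic tool applies it,
    so the model never has to regenerate the whole file.
    """
    return re.sub(rf"\b{re.escape(old)}\b", new, source)

source = """
def total(items):
    s = 0
    for it in items:
        s += it.price
    return s

print(total(cart))
print(total(wishlist))
"""

# One predicted action ("RENAME total -> total_price") touches three sites.
print(apply_rename(source, "total", "total_price"))
```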

An ML peer programmer

DIDACT does a good job on individual assistive tasks. For example, below we show DIDACT doing code clean-up after functionality is mostly done. It looks at the code along with some final comments by the code reviewer (marked with “human” in the animation), and predicts edits to address those comments (rendered as a diff).

gif: https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgNMPCHlZwRpo4EKM-HZt7qVII342P6kFgSK18H2sYqftvqmrEC-SZp9FsmgvpZtafgCid3WF905_ooJrA8vEFNs7M4B9Ox5pObqzvCltaAUXouIkiVpLWKRs-VEY0Dhpg11RbH7iJfhdinkNegpK0rOhTPR7rdzleQJzpIErVIZBkPi3WjisexB0Uklw/w640-h543/DIDACT2.gif
Given an initial snippet of code and the comments that a code reviewer attached to that snippet, the Pre-Submit Cleanup task of DIDACT produces edits (insertions and deletions of text) that address those comments.
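A minimal sketch of what a comment-resolution ("pre-submit cleanup") style prompt and prediction could look like. The prompt layout, markers, and edit format below are assumptions for illustration; the blog does not publish DIDACT's actual serialization.

```python
def build_cleanup_prompt(code: str, review_comments: list[str]) -> str:
    """Serialize state (code) and intent (reviewer comments) into one prompt.

    The format is hypothetical; the point is that the model sees the same
    context a developer would see when addressing review feedback.
    """
    comment_block = "\n".join(f"# reviewer: {c}" for c in review_comments)
    return (
        "[TASK] presubmit_cleanup\n"
        f"[STATE]\n{code}\n"
        f"[INTENT]\n{comment_block}\n"
        "[ACTION]\n"
    )

code = "def mean(xs):\n    return sum(xs) / len(xs)\n"
comments = ["guard against an empty list", "add a docstring"]

print(build_cleanup_prompt(code, comments))

# A model completion would then be a compact edit, e.g. (made-up output):
# INSERT after line 1:     '''Return the arithmetic mean of xs.'''
# INSERT after line 1:     if not xs: raise ValueError("xs is empty")
```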

The multimodal nature of DIDACT also gives rise to some surprising capabilities, reminiscent of behaviors emerging with scale. One such capability is history augmentation, which can be enabled via prompting. Knowing what the developer did recently enables the model to make a better guess about what the developer should do next.

gif: https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjgwd-Dvh_esu_26f2rO6DKwMpGyDF0wKHzasGuD3RhXNJiJR5cM613KLSuabNp5hvKV9dF2vzHd7XkUOKD4d7l1cOyV2ovYwxWLECmudUKDk5bmVNOg5eImsqUJbkUkPWk7yOiCogafub7eHb9B3cytjGInIICcMoWFrTlsgqfN3B9IvHFPbWbwfIkZQ/s16000/CitCMonster.gif
An illustration of history-augmented code completion in action.

A powerful task exemplifying this capability is history-augmented code completion. In the figure above, the developer adds a new function parameter (1), and moves the cursor into the documentation (2). Conditioned on the history of developer edits and the cursor position, the model completes the line (3) by correctly predicting the docstring entry for the new parameter.
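The sketch below shows one way history augmentation could be wired up at inference time: recent developer edits are serialized as a prefix of the prompt, so the completion is conditioned on what just changed. The history format and markers are assumptions, not DIDACT's actual encoding.

```python
from dataclasses import dataclass

@dataclass
class Edit:
    file: str
    before: str
    after: str

def build_history_prompt(history: list[Edit], state: str, cursor_line: int) -> str:
    """Prefix the prompt with recent edits so the model can infer intent.

    Here the developer just added a parameter; with that edit in the history,
    completing the docstring entry for it is the obvious next step.
    """
    parts = ["[HISTORY]"]
    for e in history:
        parts.append(f"edited {e.file}: '{e.before}' -> '{e.after}'")
    parts += ["[STATE]", state, f"[CURSOR] line {cursor_line}", "[ACTION]"]
    return "\n".join(parts)

history = [Edit("metrics.py", "def rate(hits):", "def rate(hits, window_s):")]
state = (
    "def rate(hits, window_s):\n"
    '    """Compute the hit rate.\n'
    "    Args:\n"
    "        hits: number of observed hits.\n"
    "        \n"   # cursor here: the docstring entry for window_s is missing
    '    """\n'
    "    return hits / window_s\n"
)
print(build_history_prompt(history, state, cursor_line=5))
```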

gif: https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj_ao8R6Rv9t0sz_O27BPC0Wy3dpbllP101ovPL1yklPN4SqJu8B94EZyhKumBZLsbmuiLdNVlA92wmOgnBQ-nglk2FcNRc4B2J2hFsR3x0Zy2XWcDGylowaxI2hDJH2cAyIzAEcZqCeg8PsLWCNkKzhr-jnMPr4_aGjjZguCogRalV2582jqshece5bA/s16000/DIDACT4.gif
An illustration of edit prediction, over multiple chained iterations.

In an even more powerful history-augmented task, edit prediction, the model can choose where to edit next in a fashion that is historically consistent. If the developer deletes a function parameter (1), the model can use history to correctly predict an update to the docstring (2) that removes the deleted parameter (without the human developer manually placing the cursor there) and to update a statement in the function (3) in a syntactically (and — arguably — semantically) correct way. With history, the model can unambiguously decide how to continue the “editing video” correctly. Without history, the model wouldn’t know whether the missing function parameter is intentional (because the developer is in the process of a longer edit to remove it) or accidental (in which case the model should re-add it to fix the problem).

The model can go even further. For example, we started with a blank file and asked the model to successively predict what edits would come next until it had written a full code file. The astonishing part is that the model developed code in a step-by-step way that would seem natural to a developer: It started by first creating a fully working skeleton with imports, flags, and a basic main function. It then incrementally added new functionality, like reading from a file and writing results, and added functionality to filter out some lines based on a user-provided regular expression, which required changes across the file, like adding new flags.
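A rollout like the one described above can be sketched as a simple loop: feed the model the current state plus the action history, apply the predicted action, append it to the history, and repeat. The `predict_action` stub below stands in for the trained model (an assumption here, returning a scripted toy sequence); everything else is ordinary glue code.

```python
def predict_action(state: str, history: list[str]) -> str:
    """Stand-in for the trained model; returns a DevScript-like action.

    This toy version scripts a fixed sequence to show the shape of the loop;
    a real model would be conditioned on the state and the action history.
    """
    scripted = [
        "APPEND: import argparse",
        "APPEND: def main():",
        "APPEND:     pass",
        "STOP",
    ]
    return scripted[len(history)] if len(history) < len(scripted) else "STOP"

def apply_action(state: str, action: str) -> str:
    """Apply a compact action to the current file contents."""
    if action.startswith("APPEND: "):
        return state + action[len("APPEND: "):] + "\n"
    return state

def rollout(max_steps: int = 10) -> str:
    state, history = "", []
    for _ in range(max_steps):
        action = predict_action(state, history)
        if action == "STOP":
            break
        state = apply_action(state, action)
        history.append(action)   # chaining: each prediction conditions the next
    return state

print(rollout())
```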

Conclusion

DIDACT turns Google's software development process into training demonstrations for ML developer assistants, and uses those demonstrations to train models that construct code in a step-by-step fashion, interactively with tools and code reviewers. These innovations are already powering tools enjoyed by Google developers every day. The DIDACT approach complements the great strides taken by large language models at Google and elsewhere, towards technologies that ease toil, improve productivity, and enhance the quality of work of software engineers.

Acknowledgements

This work is the result of a multi-year collaboration among Google Research, Google Core Systems and Experiences, and DeepMind. We would like to acknowledge our colleagues Jacob Austin, Pascal Lamblin, Pierre-Antoine Manzagol, and Daniel Zheng, who join us as the key drivers of this project. This work could not have happened without the significant and sustained contributions of our partners at Alphabet (Peter Choy, Henryk Michalewski, Subhodeep Moitra, Malgorzata Salawa, Vaibhav Tulsyan, and Manushree Vijayvergiya), as well as the many people who collected data, identified tasks, built products, strategized, evangelized, and helped us execute on the many facets of this agenda (Ankur Agarwal, Paige Bailey, Marc Brockschmidt, Rodrigo Damazio Bovendorp, Satish Chandra, Savinee Dancs, Denis Davydenko, Matt Frazier, Alexander Frömmgen, Nimesh Ghelani, Chris Gorgolewski, Chenjie Gu, Vincent Hellendoorn, Franjo Ivančić, Marko Ivanković, Emily Johnston, Luka Kalinovcic, Lera Kharatyan, Jessica Ko, Markus Kusano, Kathy Nix, Christian Perez, Sara Qu, Marc Rasi, Marcus Revaj, Ballie Sandhu, Michael Sloan, Tom Small, Gabriela Surita, Maxim Tabachnyk, Stephanie Tang, David Tattersall, Sara Toth, Kevin Villela, Sara Wiltberger, and Donald Duo Zhao) and our extremely supportive leadership (Martín Abadi, Joelle Barral, Jeff Dean, Madhura Dudhgaonkar, Douglas Eck, Zoubin Ghahramani, Hugo Larochelle, Chandu Thekkath, and Niranjan Tulpule). Thank you!
