AI Safety Newsletter: The Next Generation of Compute Scale, Ranking Models by Susceptibility to Jailbreaking, and Machine Ethics

The Next Generation of Compute Scale


AI development is on the cusp of a dramatic expansion in compute scale. Recent developments on multiple fronts, from chip manufacturing to power infrastructure, point to a future where AI models may dwarf today's largest systems. In this story, we examine key developments and their implications for the future of AI compute.

xAI and Tesla are building massive AI clusters. Elon Musk's xAI has brought its Memphis supercluster, "Colossus," online. According to Musk, the cluster has 100k Nvidia H100s, making it the largest supercomputer in the world. Moreover, xAI plans to add 50k H200s in the next few months. For comparison, Meta's Llama 3 was trained on 16k H100s.

Meanwhile, Tesla's "Gigafactory Texas" is expanding to house an AI supercluster. Tesla's Gigafactory supercomputer is expected to initially draw 130 MW, with potential growth to 500 MW. One megawatt is roughly enough to power 1,000 homes in the US, so this level of power consumption begins to match that of a large city.
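
To make the city comparison concrete, here is a back-of-the-envelope sketch using the 1,000-homes-per-megawatt rule of thumb quoted above (the figure is a rough US average, and the draw levels are the reported ones, not measurements):

```python
# Back-of-the-envelope: how many US homes could the reported power
# draw of Tesla's Gigafactory supercomputer supply?
HOMES_PER_MW = 1_000  # rough rule of thumb for US households per megawatt

for draw_mw in (130, 500):  # reported initial draw and potential growth
    print(f"{draw_mw} MW ~ {draw_mw * HOMES_PER_MW:,} homes")

# 130 MW ~ 130,000 homes; 500 MW ~ 500,000 homes,
# i.e. on the order of a large US city.
```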

OpenAI plans a global AI infrastructure push. OpenAI CEO Sam Altman is reportedly concerned that xAI will have more access to computing power than OpenAI. OpenAI currently uses Microsoft's compute resources, but recent reports indicate that OpenAI plans its own infrastructure buildout.

According to Bloomberg, Sam Altman is spearheading a massive buildout of AI infrastructure, beginning with projects in several US states. The initiative aims to form a global investor coalition to fund the physical infrastructure necessary for rapid AI development.

The scope of these projects is broad, encompassing the construction of data centers, the expansion of energy capacity, and the growth of semiconductor manufacturing capabilities. Potential investors include entities from Canada, Korea, Japan, and the United Arab Emirates.

This infrastructure push comes alongside a new OpenAI funding round, reportedly including Apple and Nvidia, that could push the company's valuation beyond $100 billion.

These developments at OpenAI and xAI are not surprising; rather, they are representative of a broader trend toward ever-larger compute scale. For example, North Dakota was reportedly approached by two separate companies about developing $125 billion compute clusters in the state.

TSMC starts production in Arizona, and Intel considers splitting out its foundry business. TSMC began trial chip production at its Arizona facility, and its yields are reportedly on par with those of its facilities in Taiwan. The success puts the US on track to meet its targets for domestic semiconductor production, and TSMC on track to receive $6.6 billion in grants and up to $5 billion in loans from the US government under the CHIPS and Science Act.


TSMC's Arizona facility during its construction. Photo source.

The picture is more complicated for Intel. Intel's foundry business is supposed to receive approximately $8.5 billion under the CHIPS and Science Act, but it is already spending billions to qualify: in the second quarter, it reported a loss of $2.8 billion.

The chipmaker has reportedly had difficulty receiving funds from the CHIPS Act, and it now faces a strategic crossroads. A US-based chip foundry is a national strategic priority, and investors might look to Intel as a hedge against the geopolitical uncertainty of relying on TSMC, given China's claims to Taiwan. However, Intel's foundry investments are dragging down its otherwise profitable microprocessor business.

In response, Intel is reportedly considering splitting out its foundry business. The move might return the company to profitability, while at the same time setting up a possible domestic competitor to TSMC.

Ranking Models by Susceptibility to Jailbreaking


On September 7th, the "AI safety and security" company Gray Swan kicked off a competition to jailbreak LLMs. The competition includes models from Anthropic, OpenAI, Google, Meta, Microsoft, Alibaba, Mistral, Cohere, and Gray Swan AI.

As of this writing, the competition is ongoing. It is set to end once every model has been jailbroken (successfully prompted to give a specified harmful output) by at least one person. Every model has been jailbroken except for Gray Swan's, which have so far resisted over ten thousand manual jailbreaking attempts. The competition's model leaderboard lists how the rest of the models compare.

This is good evidence that the problem of making LLMs robust to malicious use is more tractable than previously thought. In particular, it highlights the safety techniques employed by Gray Swan, including "circuit breaking" and other representation engineering techniques.
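
To give a sense of the mechanism, here is a toy sketch of a representation-level "circuit breaker." This is only an illustration of the general idea, not Gray Swan's actual method: real circuit breakers are trained to reroute internal representations, whereas this sketch hand-codes a projection, and the hidden state, "harmful direction," and threshold below are all invented for the example.

```python
import numpy as np

def circuit_breaker(hidden_state: np.ndarray,
                    harmful_direction: np.ndarray,
                    threshold: float = 0.5) -> np.ndarray:
    """If the hidden state aligns with a known 'harmful' direction,
    project that component out so downstream layers cannot act on it."""
    unit = harmful_direction / np.linalg.norm(harmful_direction)
    cosine = float(hidden_state @ unit) / np.linalg.norm(hidden_state)
    if cosine > threshold:
        # "Break the circuit": remove the harmful component.
        hidden_state = hidden_state - (hidden_state @ unit) * unit
    return hidden_state

# Toy usage with a 4-dimensional hidden state.
rng = np.random.default_rng(0)
harmful = rng.normal(size=4)                      # hypothetical direction
state = 2.0 * harmful + 0.1 * rng.normal(size=4)  # strongly aligned state
print(circuit_breaker(state, harmful))            # aligned part removed
```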

However, there are important limitations to what we can infer from this competition. First, competitors are allowed only one prompt at a time to jailbreak a model; extended, multi-prompt conversations would likely jailbreak some models that resist single-prompt attacks. Second, competitors do not have access to the models' weights. Open-weight models are subject to much stronger forms of adversarial attack, such as fine-tuning.

Machine Ethics


Lawful AI. One proposal for guiding AI behavior is to ensure that AI agents adhere to existing law. Law has several advantages: it is arguably legitimately formed (at least in democracies), time-tested, and comprehensive in scope.

However, law also has several disadvantages. It is often written without AIs in mind: much of criminal law, for example, requires mental states and intent, which do not necessarily apply to AIs. The act implementing the bioweapons convention discusses "knowingly" aiding terrorists; if an AI gives bioweapon instructions to a terrorist, it is not necessarily doing so knowingly, and neither are its developers, so no one gets penalized. Law is also intentionally silent on many important issues, and so provides a limited set of guardrails.

Fair AI. Beneficial AIs should also ideally prioritize fairness. Unfair bias can enter the behavior of AI systems in many ways, for example through flawed training data. Bias in AIs is hazardous because it can generate feedback loops: AI systems trained on flawed data could make biased decisions that are then fed into future models.
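
A toy simulation illustrates the feedback-loop concern. All the numbers are invented for illustration: each generation of model is "trained" on data that mixes the original distribution with the previous model's slightly exaggerated decisions, and the bias compounds.

```python
# Toy bias feedback loop: each model generation trains on data that
# mixes the original distribution with the previous model's output.
BASE_BIAS = 0.05      # bias in the original data (arbitrary units)
AMPLIFICATION = 1.5   # hypothetical: the model exaggerates bias it learns
MIX = 0.8             # hypothetical: share of training data from model output

data_bias = BASE_BIAS
for generation in range(1, 6):
    model_bias = min(1.0, AMPLIFICATION * data_bias)
    data_bias = (1 - MIX) * BASE_BIAS + MIX * model_bias
    print(f"generation {generation}: model bias = {model_bias:.3f}")

# Because AMPLIFICATION * MIX > 1, the bias compounds each generation
# (0.075 -> 0.105 -> 0.141 -> ...) instead of staying near 0.05.
```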

Improving the fairness of AI systems involves combining technical approaches, like adversarial testing, with sociotechnical solutions, like participatory design, in which all stakeholders are involved in a system's development.

Economically beneficial AI. Another proposal is that AI behavior should be guided by market forces, since capitalism incentivizes AIs that increase economic growth (think e/acc). However, while economic growth is a worthy goal, it has limitations such as market failures.

Moral uncertainty. AIs should be able to make decisions under moral uncertainty, that is, in situations where there are conflicting moral considerations. There are several potential solutions to moral uncertainty.

First, an AI could use a single "favored theory" at the expense of all others, but while simple, this could lead to single-mindedness and overconfidence. Alternatively, an AI could maximize the product of an option's desirability and the likelihood that its corresponding theory is true; while this approach is more balanced, ranking theories by credence is inherently subjective. Finally, an AI could use a "moral parliament," in which hypothetical delegates from different theories debate and come to a compromise.
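
The second approach, weighting each option's desirability by the credence assigned to each theory, is often described as maximizing expected choiceworthiness. A minimal sketch, with theories, credences, and scores invented purely for illustration:

```python
# Expected choiceworthiness: weight each theory's score for an option
# by the credence assigned to that theory, then pick the best option.
credences = {"utilitarianism": 0.5, "deontology": 0.3, "virtue_ethics": 0.2}

# Hypothetical desirability of each option under each theory (0-10 scale).
scores = {
    "option_a": {"utilitarianism": 9, "deontology": 2, "virtue_ethics": 5},
    "option_b": {"utilitarianism": 6, "deontology": 7, "virtue_ethics": 6},
}

def expected_choiceworthiness(option: str) -> float:
    return sum(credences[t] * scores[option][t] for t in credences)

for option in scores:
    print(option, round(expected_choiceworthiness(option), 2))
print("chosen:", max(scores, key=expected_choiceworthiness))
# option_a scores 6.1, option_b scores 6.3: the balanced option wins
# even though one theory strongly prefers option_a.
```

The subjectivity the text mentions shows up directly in the `credences` weights: a different choice of weights can flip which option wins.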

Government

  • The California Legislature passed SB 1047. The bill is headed to Governor Newsom's desk.

  • The Bureau of Industry and Security announced mandatory reporting requirements for AI developers.

  • The US, EU, and UK signed the first legally binding international treaty on the use of AI.

  • The Beijing Institute of AI Safety and Governance launched.

  • OpenAI and Anthropic agreed to provide the US AI Safety Institute (AISI) early access to new models.

Technology

  • OpenAI has reportedly demonstrated "Strawberry" to national security officials and is using the breakthrough to help train its next flagship system, "Orion." Strawberry is reportedly set for release within the next two weeks.

  • Ilya Sutskever's three-month-old AI company, Safe Superintelligence (SSI), has raised $1 billion in cash at a reported $5 billion valuation.

  • Sakana AI raised $100 million in a Series A funding round.

  • Amazon CEO Andy Jassy claims that the company's AI software assistant has saved 4,500 developer-years of work.

  • Bloomberg reported on the effects AI is having on the Philippines' outsourcing industry.

  • AI developer Magic trained models to reason over contexts of up to 100 million tokens.

  • To raise awareness of advances in AI forecasting technology and increase its rate of adoption, CAIS released a demo of a forecasting bot.

Opinion

  • Some experts also argue that SB 1047 could enhance EU AI regulation.

  • A long list of academics signed a letter in support of SB 1047, as did over 100 employees of frontier AI labs. Other groups, such as the labor union SAG-AFTRA, also endorsed the bill.

Original article: https://www.lesswrong.com/posts/L4ZG6Tce75sDNxWiq/ai-safety-newsletter-41-the-next-generation-of-compute-scale

WeChat official account: 曲奇自然语言处理
