AI Feynman 2.0: Learning Regression Equations from Data

This article introduces AI Feynman 2.0, a tool that learns regression equations from data. The system applies machine-learning techniques to parse complex data patterns and extract mathematical expressions, making the underlying behavior of the data easier to understand and predict.


A New AI Library from Max Tegmark's Lab at MIT

Table of Contents

1. Introduction
2. Code
3. Their Example
4. Our Own Easy Example
5. Symbolic Regression on Noisy Data

1. A New Symbolic Regression Library

I recently saw a LinkedIn post from MIT professor Max Tegmark about a new ML library his lab released, and I decided to try it out. The paper is AI Feynman 2.0: Pareto-optimal symbolic regression exploiting graph modularity, submitted June 18th, 2020. The first author is Silviu-Marian Udrescu, who was generous enough to hop on a call with me and explain the backstory of this new machine learning library. The library, called AI Feynman 2.0, fits regression formulas to data. More specifically, it fits formulas to data at different levels of complexity (measured in bits). The user selects the operators the solver may use (things like exponentiation, cos, and arctan) from predefined operator sets, and the solver does the rest.

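As I understand it from the project's README, the solver consumes a plain whitespace-separated text file in which each row lists the input variables followed by the target value in the last column. Here is a minimal sketch of preparing such a dataset; the commented-out `run_aifeynman` call and the bundled `14ops.txt` operator file are the package's names as I recall them, so treat the exact signature as an assumption rather than gospel:

```python
# Sketch: build a dataset in the format AI Feynman expects (assumption:
# rows of "x0 x1 ... y", whitespace-separated, target in the last column).
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.uniform(1, 5, size=200)
x1 = rng.uniform(1, 5, size=200)
y = x0 * np.cos(x1)  # the "hidden" formula we would like recovered

np.savetxt("mystery.txt", np.column_stack([x0, x1, y]))

# Hypothetical invocation, following the library's README:
# import aifeynman
# aifeynman.run_aifeynman("./", "mystery.txt", 60, "14ops.txt",
#                         polyfit_deg=3, NN_epochs=500)
```

The operator-set file (e.g. `14ops.txt`) is how you tell the solver which building blocks, such as cos or exponentiation, it is allowed to compose.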

Symbolic regression strings together user-specified mathematical functions to build an equation for the output “y” that best fits the provided dataset. That dataset takes the form of sample points (or observations) for each input variable x0, x1, and so forth, along with the corresponding “y”. Since we don’t want to overfit the data, we need to limit the allowed complexity of the equation, or at least have the ability to solve under a complexity constraint. Unlike a neural network, learning one formula with just a few short expressions in it gives you a highly interpretable model, and can lead to insights that you might not get from a neural network model with millions of weights and biases.

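To make the idea concrete, here is a toy illustration of my own (not the AI Feynman algorithm, which is far more sophisticated): brute-force every expression in a tiny search space built from a user-chosen operator set, and keep the one with the lowest mean squared error on the observations:

```python
# Toy symbolic regression: score every candidate of the form
# binary(unary(x0), unary(x1)) and keep the best fit.
import itertools
import math

# Sample observations of the "hidden" rule y = x0 + cos(x1)
xs = [(0.5 * i, 0.3 * i) for i in range(1, 20)]
ys = [x0 + math.cos(x1) for x0, x1 in xs]

unary = {"cos": math.cos, "exp": math.exp, "id": lambda v: v}
binary = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

best = (float("inf"), None)
for (fn, f), (gn, g), (on, op) in itertools.product(
        unary.items(), unary.items(), binary.items()):
    mse = sum((op(f(x0), g(x1)) - y) ** 2
              for (x0, x1), y in zip(xs, ys)) / len(ys)
    if mse < best[0]:
        best = (mse, f"{on}({fn}(x0), {gn}(x1))")

print(best)  # the exact candidate +(id(x0), cos(x1)) achieves zero error
```

The fixed expression shape here stands in for the complexity constraint: a real solver searches over expression trees and trades accuracy against description length (the "bits" mentioned above) along a Pareto frontier.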

Why is this interesting? Well, science tends to generate lots of observations (data) that scientists want to generalize into underlying rules. These rules are equations that “fit” the observations. Unlike a “usual” machine learning model, equations of the form y = f(x) are very clear, and they can omit variables in the data that are not needed. In the practicing machine learning engineer’s toolbox, regression trees are the closest concept I can think of that implements this idea of learning an interpretable model connecting observations to a prediction. Having a new way to try and fit a regression model to data is an exciting prospect.
