A Grand Unified Theory of Artificial Intelligence? | BAAI Conference Forum on the Mathematical Foundations of AI

Artificial intelligence today faces major theoretical challenges in computability, interpretability, generalization, and stability. How to move beyond the traditional AI modeling paradigm built on computational experiments and cognitive neuroscience, and to establish a new generation of AI methodology grounded in mathematical and statistical theory as its first principles, remains largely uncharted territory and presents a rare opportunity for mathematicians.

Forum: Mathematical Foundations of AI

Agenda


Speakers


Forum Chair - Pingwen Zhang (张平文)

Pingwen Zhang is a professor, a member of the Chinese Academy of Sciences, a fellow of The World Academy of Sciences (TWAS), Vice President of Peking University, Director of the Center for Computational Science and Engineering at Peking University, and Director of the National Engineering Laboratory for Big Data Analysis and Applications. His research interests include multiscale modeling and computation of complex fluids, moving mesh methods and their applications, and multiscale analysis and computation.

Speaker - Weijie Su (苏炜杰)

Weijie Su is an Assistant Professor at the Wharton School of the University of Pennsylvania. He received his bachelor's degree from the School of Mathematical Sciences at Peking University in 2011 and his Ph.D. in Statistics from Stanford University in 2016. He is a recipient of the 2016 Theodore Anderson Dissertation Award in Theoretical Statistics, a 2019 CAREER Award from the U.S. National Science Foundation, and a 2020 Sloan Research Fellowship. His research interests include the theoretical foundations of machine learning (deep learning in particular), private data analysis, high-dimensional statistics, and optimization theory.

Talk title: Local Elasticity: A Phenomenological Approach Toward Understanding Deep Learning

Abstract: Motivated by the iterative nature of training neural networks, we ask: If the weights of a neural network are updated using the induced gradient on an image of a tiger, how does this update impact the prediction of the neural network at another image (say, an image of another tiger, a cat, or a plane)? To address this question, I will introduce a phenomenon termed local elasticity. Roughly speaking, our experiments show that modern deep neural networks are locally elastic in the sense that the change in prediction is likely to be most significant at another tiger and least significant at a plane, at late stages of the training process. I will illustrate some implications of local elasticity by relating it to the neural tangent kernel and improving on the generalization bound for uniform stability. Moreover, I will introduce a phenomenological model for simulating neural networks, which suggests that local elasticity may result from feature sharing between semantically related images and the hierarchical representations of high-level features. Finally, I will offer a local-elasticity-focused agenda for future research toward a theoretical foundation for deep learning.
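To make the setup concrete, below is a minimal PyTorch sketch (not the speaker's code) of the kind of probe described above: take one SGD step on a single "update" example and measure how much the network's predictions move at a few "probe" examples. The small MLP, the random stand-in images, and the L2 change metric are illustrative assumptions.

```python
# A minimal sketch of probing "local elasticity": take one SGD step on a single
# example and measure how much predictions move at other examples.
import torch
import torch.nn.functional as F

def prediction_change(model, loss_fn, x_update, y_update, x_probe, lr=0.01):
    """Take one gradient step on (x_update, y_update) and return the change
    in the model's output at each probe input in x_probe."""
    with torch.no_grad():
        before = model(x_probe).clone()

    # Gradient induced by the single update example.
    model.zero_grad()
    loss = loss_fn(model(x_update), y_update)
    loss.backward()

    # One plain SGD step.
    with torch.no_grad():
        for p in model.parameters():
            p -= lr * p.grad

    with torch.no_grad():
        after = model(x_probe)
    # L2 change in predictions, one number per probe example.
    return (after - before).norm(dim=-1)

# Usage: a tiny MLP on random "images"; in the local-elasticity experiments one
# would compare the change at a semantically similar probe vs. a dissimilar one.
torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(3 * 32 * 32, 128),
                            torch.nn.ReLU(), torch.nn.Linear(128, 10))
x_tiger = torch.randn(1, 3, 32, 32)    # stand-in for "an image of a tiger"
y_tiger = torch.tensor([0])
x_probes = torch.randn(3, 3, 32, 32)   # stand-ins for another tiger, a cat, a plane
print(prediction_change(model, F.cross_entropy, x_tiger, y_tiger, x_probes))
```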


Speaker - Song Mei (梅松)

Song Mei has been an Assistant Professor in the Department of Statistics at UC Berkeley since 2020. He received his bachelor's degree in mathematics from Peking University in 2014 and his Ph.D. from the Institute for Computational and Mathematical Engineering at Stanford University in 2020. His research applies ideas from statistical physics to theoretical machine learning and high-dimensional probability and statistics; in particular, he proposed the mean-field theory for the optimization of two-layer neural networks and developed generalization theory for random features methods and neural tangent kernel models.

Talk title: The efficiency of kernel methods on structured datasets

Abstract: Inspired by the proposal of tangent kernels of neural networks (NNs), a recent line of research aims to design kernels with better generalization performance on standard datasets. Indeed, a few recent works showed that certain kernel machines perform as well as NNs on certain datasets, despite the separations between them implied by theoretical results in specific cases. Furthermore, it was shown that the induced kernels of convolutional neural networks perform much better than previous handcrafted kernels. These empirical results pose a theoretical challenge to understanding the performance gaps between kernel machines and NNs in different scenarios. In this talk, we show that data structures play an essential role in inducing these performance gaps. We consider a few natural data structures and study their effects on the performance of these learning methods. Based on a fine-grained high-dimensional asymptotics framework for analyzing random features models and kernel machines, we show the following: 1) If the feature vectors are nearly isotropic, kernel methods suffer from the curse of dimensionality, while NNs can overcome it by learning the best low-dimensional representation; 2) If the feature vectors display the same low-dimensional structure as the target function (the spiked covariates model), this curse of dimensionality becomes milder, and the performance gap between kernel methods and NNs becomes smaller; 3) On datasets that display some invariance structure (e.g., image datasets), there is a quantitative performance gain from using invariant kernels (e.g., convolutional kernels) over inner product kernels. Beyond explaining the performance gaps, these theoretical results can further provide some intuition toward designing kernel methods with better performance.
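As a toy illustration of point 3), the numpy sketch below (my own, not from the talk) compares kernel ridge regression with a plain inner-product kernel against a cyclic-shift-averaged (invariant) kernel on synthetic signals whose labels depend only on shift-invariant features. The data model, the kernel choices, and the problem sizes are all assumptions made for the example.

```python
# Toy comparison: inner-product kernel vs. cyclic-shift-invariant kernel
# on a shift-invariant regression target, via kernel ridge regression (KRR).
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 300, 200, 16

def target(X):
    # Shift-invariant target: depends only on the power spectrum of each "signal".
    return (np.abs(np.fft.rfft(X, axis=1)) ** 2).mean(axis=1)

def ip_kernel(A, B):
    # Plain inner-product kernel k(x, y) = exp(<x, y> / d).
    return np.exp(A @ B.T / d)

def invariant_kernel(A, B):
    # Average the base kernel over all cyclic shifts of B -> shift-invariant kernel.
    K = np.zeros((len(A), len(B)))
    for s in range(d):
        K += ip_kernel(A, np.roll(B, s, axis=1))
    return K / d

def krr_test_error(kernel, Xtr, ytr, Xte, yte, lam=1e-3):
    # KRR: alpha = (K + lam I)^{-1} y,  f(x) = k(x, Xtr) alpha.
    K = kernel(Xtr, Xtr)
    alpha = np.linalg.solve(K + lam * np.eye(len(Xtr)), ytr)
    pred = kernel(Xte, Xtr) @ alpha
    return np.mean((pred - yte) ** 2)

Xtr, Xte = rng.standard_normal((n_train, d)), rng.standard_normal((n_test, d))
ytr, yte = target(Xtr), target(Xte)
print("inner-product kernel MSE: ", krr_test_error(ip_kernel, Xtr, ytr, Xte, yte))
print("shift-invariant kernel MSE:", krr_test_error(invariant_kernel, Xtr, ytr, Xte, yte))
```

The invariant kernel is simply the base kernel averaged over the cyclic-shift group acting on one argument, which is the standard way to build a group-invariant kernel from an invariant base kernel.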

Speaker - Katya Scheinberg

Katya Scheinberg is a professor at Cornell University. Before joining Cornell, she was the Harvey E. Wagner Endowed Chair Professor in the Department of Industrial and Systems Engineering at Lehigh University. She did her undergraduate studies at Moscow State University and received her Ph.D. from Columbia University. Prior to joining Lehigh in 2010, she spent more than ten years as a researcher at the IBM T.J. Watson Research Center. Her main research area is the development of practical algorithms, and their theoretical analysis, for various problems in continuous optimization, such as convex optimization, derivative-free optimization, machine learning, and quadratic programming. In 2015 she received the Lagrange Prize jointly with Andy Conn and Luis Vicente, and in 2019 she was awarded the Farkas Prize by the INFORMS Optimization Society.

Talk title: Stochastic adaptive optimization and martingales

Abstract: We will present a general framework which models standard methods such as line search and trust region methods in a stochastic setting via analysis of stochastic processes and their stopping times. We will show how this framework models some variants of stochastic line search and how analyzing the stopping time gives us a high-probability bound (and a bound in expectation) on the complexity of the line search. This framework provides strong convergence analysis under weaker conditions than alternative approaches in the literature.
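For concreteness, here is a minimal numpy sketch of a stochastic backtracking (Armijo-type) line search of the general kind such a framework covers: function and gradient values are estimated from a fresh mini-batch at each iteration, and the step size grows after a successful step and shrinks otherwise. The least-squares problem, batch size, and constants are illustrative assumptions, not the specific method analyzed in the talk.

```python
# A stochastic backtracking (Armijo) line search on a random least-squares problem.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 20
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)   # noisy linear data

def batch_loss_grad(x, idx):
    # Mini-batch estimates of the loss 0.5 * mean(residual^2) and its gradient.
    r = A[idx] @ x - b[idx]
    return 0.5 * np.mean(r ** 2), A[idx].T @ r / len(idx)

def stochastic_line_search(x, steps=200, alpha=1.0, batch=64,
                           c1=1e-4, shrink=0.5, grow=2.0, alpha_max=10.0):
    for _ in range(steps):
        idx = rng.integers(0, n, size=batch)          # fresh sample each iteration
        f, g = batch_loss_grad(x, idx)
        trial = x - alpha * g
        f_trial, _ = batch_loss_grad(trial, idx)      # same sample for the test
        if f_trial <= f - c1 * alpha * np.dot(g, g):  # stochastic Armijo condition
            x, alpha = trial, min(grow * alpha, alpha_max)   # successful step
        else:
            alpha *= shrink                           # unsuccessful: shrink step
    return x

x_hat = stochastic_line_search(np.zeros(d))
print("final full-batch loss:", 0.5 * np.mean((A @ x_hat - b) ** 2))
```

The analysis in the talk treats the step-size process and the iteration at which a target accuracy is reached as a stochastic process with a stopping time, which is what yields the high-probability and in-expectation complexity bounds.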


Speaker - Chao Ma (马超)

Chao Ma received his bachelor's degree from the School of Mathematical Sciences at Peking University in 2016 and then pursued a Ph.D. in applied mathematics at Princeton University, which he completed in 2020 under the supervision of Professor Weinan E. Since then he has been a Szegő Assistant Professor at Stanford University. His main research interest is the mathematical foundations of neural network models, in particular their optimization and generalization behavior. He also works on applications of deep learning to problems in scientific computing.

Talk title: The Sobolev regularization effect from the linear stability of SGD

Abstract: We exploit the multiplicative relation between the input data and the weight matrix in the input layer of a neural network, and use it to link the gradient of the model function with respect to the parameters to its gradient with respect to the data. Using this link, we show that the flatness of a minimum regularizes the Sobolev seminorm of the model function (with respect to the data), which theoretically explains why flat minima tend to generalize well. We then generalize the notion of flatness to higher-order moments, over the training data, of the gradients of the loss at a minimum, and use the theory of linear stability at minima to show that stochastic gradient descent (SGD) controls these moments; combined with the multiplicative relation above, this yields a Sobolev-seminorm regularization effect of SGD on the model function. Finally, based on this Sobolev regularization effect and suitable assumptions on the data distribution, we derive a set of generalization error bounds for SGD.
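The first step of this argument, the multiplicative relation between the input data and the first-layer weight matrix, can be checked numerically. The PyTorch sketch below (an illustration, not the paper's code) verifies that for a network whose first layer computes z = Wx, the gradient of the output with respect to x can be recovered from the gradient with respect to W via grad_x f = Wᵀ (grad_W f) x / ‖x‖². The two-layer tanh network is an assumption made only for this example.

```python
# Check the input-layer multiplicative relation: grad_x f = W^T (grad_W f) x / ||x||^2.
import torch

torch.manual_seed(0)
d_in, d_hidden = 8, 16
W = torch.randn(d_hidden, d_in, requires_grad=True)   # first-layer weights
V = torch.randn(1, d_hidden)                          # fixed output layer
x = torch.randn(d_in, requires_grad=True)

f = (V @ torch.tanh(W @ x)).sum()                     # scalar model output
grad_W, grad_x = torch.autograd.grad(f, [W, x])

# Since z = W x, we have grad_W f = (df/dz) x^T and grad_x f = W^T (df/dz),
# so grad_x f can be recovered using only the first-layer weight gradient.
grad_x_from_W = W.T @ (grad_W @ x) / x.dot(x)
print(torch.allclose(grad_x, grad_x_from_W.detach(), atol=1e-6))   # expected: True
```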


The 2021 BAAI Conference brings together an outstanding lineup of guests. Confirmed speakers include Turing Award laureates Yoshua Bengio and David Patterson, along with world-class experts from across the field of artificial intelligence. The conference will be held in a hybrid online + onsite format; online registration is already open, and onsite registration and the official website will be announced soon.

Insider talks on artificial intelligence that you won't want to miss; join us and see for yourself!

Scan the QR code to join the "Mathematical Foundations of AI" forum discussion group and take part in the conversation.

