Overflow and Underflow
- underflow
- occurs when numbers near zero are rounded to zero
- Overflow
- occurs when numbers with large magnitude are approximated as $\infty$ or $-\infty$
- Softmax function
- must be stabilized against underflow and overflow
- used to predict the probabilities associated with a multinoulli distribution
- $\text{softmax}(\vec{x})_i=\frac{\exp(x_i)}{\sum_{j=1}^n\exp(x_j)}$
- evaluating $\text{softmax}(\vec{z})$ instead, where $\vec{z}=\vec{x}-\max_i x_i$, resolves the difficulty of the result being undefined: the numerator cannot overflow and the denominator is at least $1$
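A minimal NumPy sketch of this max-shift trick (the function name `stable_softmax` is my own):

```python
import numpy as np

def stable_softmax(x):
    # Shift by the maximum so the largest exponent is exp(0) = 1:
    # the numerator cannot overflow and the denominator is at least 1.
    z = x - np.max(x)
    exp_z = np.exp(z)
    return exp_z / np.sum(exp_z)

# A naive softmax overflows for inputs like these; the shifted version does not.
x = np.array([1000.0, 1001.0, 1002.0])
print(stable_softmax(x))  # approx. [0.090, 0.245, 0.665]
```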
Poor conditioning
- conditioning refers to how rapidly a function changes with respect to small changes in its inputs
- For $f(\vec{x})=\mathbf{A}^{-1}\vec{x}$ with $\mathbf{A}\in\mathbb{R}^{n\times n}$, the condition number is $\max\limits_{i,j}\left|\frac{\lambda_i}{\lambda_j}\right|$
- namely, the ratio of the magnitudes of the largest and smallest eigenvalues
- the larger this number, the more sensitive matrix inversion is to error in the input
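A short sketch, assuming NumPy, that computes this eigenvalue ratio for a symmetric example and compares it with NumPy's built-in condition number:

```python
import numpy as np

# Eigenvalue-ratio condition number for a (symmetric) matrix; a large value
# means computing A^{-1} x is very sensitive to small errors in x.
A = np.array([[1.0, 0.0],
              [0.0, 1e-6]])
eigvals = np.linalg.eigvals(A)
cond = np.max(np.abs(eigvals)) / np.min(np.abs(eigvals))
print(cond)               # 1e6 -> poorly conditioned
print(np.linalg.cond(A))  # NumPy's built-in version (based on singular values)
```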
Gradient-Based Optimization
- objective function or criterion: the function we want to minimize or maximize (for minimization problems, it is also called the cost function, loss function, or error function)
- $\vec{x}^*=\arg\min f(\vec{x})$
- gradient descent:
- $f(x+\epsilon)\approx f(x)+\epsilon f'(x)$
- critical points or stationary points: $f'(x)=0$
- saddle points: critical points that are neither maxima nor minima
- local minimum: a point where $f(x)$ is lower than at all neighboring points
- local maximum: a point where $f(x)$ is higher than at all neighboring points
- global minimum: a point that obtains the absolute lowest value of $f(x)$
- the partial derivative $\frac{\partial}{\partial x_i}f(\vec{x})$ measures how $f$ changes as only the variable $x_i$ increases at point $\vec{x}$
- the gradient of $f$ is denoted $\nabla_{\vec{x}}f(\vec{x})$; it is the vector containing all of the partial derivatives with respect to the $x_i$
- the directional derivative in direction $\vec{u}$ is the slope of the function in direction $\vec{u}$
- the derivative of $f(\vec{x}+\alpha\vec{u})$ with respect to $\alpha$, evaluated at $\alpha=0$
- namely, $\frac{\partial}{\partial\alpha}f(\vec{x}+\alpha\vec{u})$ evaluates to $\vec{u}^T\nabla_{\vec{x}}f(\vec{x})$ when $\alpha=0$ (numerical check in the sketch below)
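A tiny check, assuming NumPy and a toy function $f(x)=x_1^2+3x_1x_2$ of my own choosing, that $\vec{u}^T\nabla_{\vec{x}}f(\vec{x})$ matches a finite-difference estimate of $\frac{\partial}{\partial\alpha}f(\vec{x}+\alpha\vec{u})$ at $\alpha=0$:

```python
import numpy as np

# f(x) = x1^2 + 3*x1*x2 and its gradient, for checking the identity above.
f = lambda x: x[0] ** 2 + 3 * x[0] * x[1]
grad_f = lambda x: np.array([2 * x[0] + 3 * x[1], 3 * x[0]])

x = np.array([1.0, 2.0])
u = np.array([3.0, 4.0]) / 5.0            # unit-norm direction
alpha = 1e-6
fd = (f(x + alpha * u) - f(x)) / alpha    # finite-difference slope along u
print(fd, u @ grad_f(x))                  # both approx. 7.2
```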
- To minimize $f$, we solve:
- $\min\limits_{\vec{u},\vec{u}^T\vec{u}=1}\vec{u}^T\nabla_{\vec{x}}f(\vec{x})=\min\limits_{\vec{u},\vec{u}^T\vec{u}=1}\|\vec{u}\|_2\|\nabla_{\vec{x}}f(\vec{x})\|_2\cos\theta$, where $\theta$ is the angle between $\vec{u}$ and the gradient
- decrease $f$ by moving in the direction of the negative gradient
- steepest descent or gradient descent
- $\vec{x}'=\vec{x}-\epsilon\nabla_{\vec{x}}f(\vec{x})$, where $\epsilon$ is the learning rate
- choose $\epsilon$:
- set $\epsilon$ to a small constant
- line search: evaluate $f(\vec{x}-\epsilon\nabla_{\vec{x}}f(\vec{x}))$ for several values of $\epsilon$ and choose the one that results in the smallest objective function value (see the sketch below)
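A minimal gradient descent sketch, assuming NumPy and a quadratic objective $f(\vec{x})=\frac{1}{2}\vec{x}^T\mathbf{A}\vec{x}-\vec{b}^T\vec{x}$ of my own choosing, using the crude grid-based line search just described:

```python
import numpy as np

# Gradient descent on f(x) = 0.5*x^T A x - b^T x, whose gradient is A x - b.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b

x = np.zeros(2)
for _ in range(100):
    g = grad(x)
    # Line search: try a few step sizes and keep the one with the lowest f.
    eps = min([1e-3, 1e-2, 1e-1, 0.3], key=lambda e: f(x - e * g))
    x = x - eps * g

print(x, np.linalg.solve(A, b))  # both close to the minimizer A^{-1} b
```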
Beyond the Gradient: Jacobian and Hessian Matrices
- Jacobian matrix
- if we have a function $f:\mathbb{R}^m\rightarrow\mathbb{R}^n$, then the Jacobian matrix $J\in\mathbb{R}^{n\times m}$ of $f$ is defined such that $J_{i,j}=\frac{\partial}{\partial x_j}f(\vec{x})_i$ (shape example below)
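A small shape check, assuming NumPy and a toy map $f:\mathbb{R}^2\rightarrow\mathbb{R}^3$ of my own choosing, to make the $n\times m$ convention concrete:

```python
import numpy as np

# f(x) = (x1*x2, x1^2, sin(x2)) maps R^2 -> R^3, so its Jacobian is 3 x 2,
# with J[i, j] = d f_i / d x_j.
def jacobian(x):
    x1, x2 = x
    return np.array([[x2,     x1],
                     [2 * x1, 0.0],
                     [0.0,    np.cos(x2)]])

print(jacobian(np.array([1.0, np.pi])).shape)  # (3, 2), i.e. n x m
```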
- second derivative: a derivative of a derivative; it can be regarded as measuring curvature
- Hessian matrix $H(f)(\vec{x})$
- $H(f)(\vec{x})_{i,j}=\frac{\partial^2}{\partial x_i\partial x_j}f(\vec{x})$
- the Hessian is the Jacobian of the gradient
- $H_{i,j}=H_{j,i}$
- Because the Hessian matrix is real and symmetric, we can decompose it into a set of real eigenvalues and an orthogonal basis of eigenvectors
- second-order Taylor series approximation:
- $f(\vec{x})\approx f(\vec{x}^{(0)})+(\vec{x}-\vec{x}^{(0)})^T\vec{g}+\frac{1}{2}(\vec{x}-\vec{x}^{(0)})^TH(\vec{x}-\vec{x}^{(0)})$, where $\vec{g}$ is the gradient and $H$ is the Hessian at $\vec{x}^{(0)}$
- Substituting the new point $\vec{x}^{(0)}-\epsilon\vec{g}$ gives $f(\vec{x}^{(0)}-\epsilon\vec{g})\approx f(\vec{x}^{(0)})-\epsilon\vec{g}^T\vec{g}+\frac{1}{2}\epsilon^2\vec{g}^TH\vec{g}$
- the original value of the function, $f(\vec{x}^{(0)})$
- the expected improvement due to the slope of the function, $-\epsilon\vec{g}^T\vec{g}$
- the correction we must apply to account for the curvature of the function, $\frac{1}{2}\epsilon^2\vec{g}^TH\vec{g}$
- when $\vec{g}^TH\vec{g}$ is positive, minimizing this approximation over $\epsilon$ gives the optimal step size $\epsilon^*=\frac{\vec{g}^T\vec{g}}{\vec{g}^TH\vec{g}}$ (computed in the sketch below)
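A short sketch, assuming NumPy and reusing the quadratic $f(\vec{x})=\frac{1}{2}\vec{x}^T\mathbf{A}\vec{x}-\vec{b}^T\vec{x}$ from the earlier example (so the Hessian is simply $\mathbf{A}$), that computes $\epsilon^*$:

```python
import numpy as np

# Optimal step size eps* = (g^T g) / (g^T H g) along the negative gradient,
# for f(x) = 0.5*x^T A x - b^T x, whose Hessian is H = A everywhere.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x

x0 = np.zeros(2)
g = A @ x0 - b                      # gradient at x0
eps_star = (g @ g) / (g @ A @ g)    # valid here since g^T H g > 0
print(f(x0), f(x0 - eps_star * g))  # 0.0 vs. approx. -0.286
```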
- critical point: $f'(x)=0$
- if $f''(x)>0$, then $f'(x-\epsilon)<0$ and $f'(x+\epsilon)>0$ for small enough $\epsilon$
- local minimum: $f'(x)=0$ and $f''(x)>0$
- local maximum: $f'(x)=0$ and $f''(x)<0$
- Newton’s method
- based on using a second-order Taylor series
- $f(\vec{x})\approx f(\vec{x}^{(0)})+(\vec{x}-\vec{x}^{(0)})^T\nabla_{\vec{x}}f(\vec{x}^{(0)})+\frac{1}{2}(\vec{x}-\vec{x}^{(0)})^TH(f)(\vec{x}^{(0)})(\vec{x}-\vec{x}^{(0)})$
- Solving for the critical point gives $\vec{x}^*=\vec{x}^{(0)}-H(f)(\vec{x}^{(0)})^{-1}\nabla_{\vec{x}}f(\vec{x}^{(0)})$
- Newton’s method is only appropriate when the nearby critical point is a minimum (one-step sketch below)
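A minimal one-step sketch, assuming NumPy and a convex quadratic of my own choosing, for which a single Newton step jumps directly to the minimizer:

```python
import numpy as np

# One Newton step x* = x0 - H^{-1} grad f(x0); for the convex quadratic
# f(x) = 0.5*x^T A x - b^T x it lands exactly on the minimizer A^{-1} b.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b
hessian = lambda x: A               # the Hessian of a quadratic is constant

x0 = np.array([5.0, -7.0])
x_star = x0 - np.linalg.solve(hessian(x0), grad(x0))
print(x_star, np.linalg.solve(A, b))  # both [0.2, 0.4]
```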
- Lipschitz continuity (of the function or its derivatives)
- A Lipschitz continuous function is a function $f$ whose rate of change is bounded by a Lipschitz constant $\mathcal{L}$: $\forall\vec{x},\forall\vec{y},\ |f(\vec{x})-f(\vec{y})|\leq\mathcal{L}\|\vec{x}-\vec{y}\|_2$
- weak constraint
- Convex optimization
- strong constraint
- all of their local minima are necessarily global minima
Constrained Optimization
- find the maximal or minimal value of $f(\vec{x})$ for values of $\vec{x}$ in some set $\mathbb{S}$
- methods:
- modify gradient descent to take the constraint into account
- design a different, unconstrained optimization problem whose solution can be converted into a solution to the original, constrained optimization problem
- Karush–Kuhn–Tucker (KKT) approach (need more information)
- generalized Lagrangian or generalized Lagrange function
- $\mathbb{S}=\{\vec{x}\mid\forall i,g^{(i)}(\vec{x})=0\text{ and }\forall j,h^{(j)}(\vec{x})\leq0\}$
- equality constraints $g^{(i)}$
- inequality constraints $h^{(j)}$
- KKT multipliers: $\lambda_i$ and $\alpha_j$ for each constraint
- $L(\vec{x},\vec{\lambda},\vec{\alpha})=f(\vec{x})+\sum\limits_i\lambda_ig^{(i)}(\vec{x})+\sum\limits_j\alpha_jh^{(j)}(\vec{x})$
- $\min\limits_{\vec{x}}\max\limits_{\vec{\lambda}}\max\limits_{\vec{\alpha},\vec{\alpha}\geq0}L(\vec{x},\vec{\lambda},\vec{\alpha})\Leftrightarrow\min\limits_{\vec{x}\in\mathbb{S}}f(\vec{x})$
- Karush-Kuhn-Tucker (KKT) conditions:
- The gradient of the generalized Lagrangian is zero
- All constraints on both $\vec{x}$ and the KKT multipliers are satisfied
- The inequality constraints exhibit “complementary slackness”: $\vec{\alpha}\odot h(\vec{x})=0$
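A small worked KKT check, assuming NumPy and a toy problem of my own choosing: minimize $f(\vec{x})=x_1^2+x_2^2$ subject to the single inequality constraint $h(\vec{x})=1-x_1-x_2\leq0$; the candidate solution $\vec{x}=(0.5,0.5)$ with multiplier $\alpha=1$ satisfies all three conditions:

```python
import numpy as np

# KKT check for: minimize x1^2 + x2^2  s.t.  h(x) = 1 - x1 - x2 <= 0
x = np.array([0.5, 0.5])
alpha = 1.0

grad_f = 2 * x                    # gradient of the objective
grad_h = np.array([-1.0, -1.0])   # gradient of the inequality constraint
h = 1.0 - x[0] - x[1]

# 1) Stationarity of the generalized Lagrangian: grad f + alpha * grad h = 0
print(grad_f + alpha * grad_h)    # [0. 0.]
# 2) Feasibility of x and nonnegativity of the multiplier
print(h <= 0, alpha >= 0)         # True True
# 3) Complementary slackness: alpha * h = 0
print(alpha * h)                  # 0.0
```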