KKT with examples and Python code

Karush-Kuhn-Tucker (KKT) conditions

What are the Karush-Kuhn-Tucker (KKT) conditions?

The method of Lagrange multipliers is used to solve optimization problems constrained by one or more equalities. When the constraints also include inequalities, we need to extend the method to the KKT conditions.

The new problem can be formulated as:

x* = argmin_x f(x)

subject to h_i(x) = 0, ∀ i = 1, …, m

subject to g_i(x) ≤ 0, ∀ i = 1, …, n

In words: find the solution that minimizes f(x), as long as all equalities h_i(x) = 0 and all inequalities g_i(x) ≤ 0 hold. Any equality or inequality constraint can be brought into this form by moving all terms to the left-hand side of the equation. The inequality constraints are added to the method of Lagrange multipliers in a similar way to the equalities: put the cost function and the constraints into a single minimization problem, multiplying each equality constraint by a factor λ_i and each inequality constraint by a factor μ_i (the KKT multipliers). In our example, we have m equalities and n inequalities, so the expression for the optimization problem becomes:

x* = argmin_x L(x, λ, μ) = argmin_x [ f(x) + ∑_{i=1}^{m} λ_i h_i(x) + ∑_{i=1}^{n} μ_i g_i(x) ],

where L(x, λ, μ) is the Lagrangian, which also depends on λ and μ, the vectors of equality and inequality multipliers.

 

As usual, we find the extremum of the function by finding the roots of the gradient of the Lagrangian with respect to x. However, the constraints make x depend on λ and μ. Furthermore, the number of unknowns equals the number of elements in x (say k) plus the number of multipliers (m + n), while the gradient with respect to x only yields k equations. As we have seen before, we can differentiate the Lagrangian with respect to each Lagrange multiplier λ_i to get m more equations, which restrict the set of solutions to those meeting the equality constraints. An example of this procedure is sketched below.
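As a concrete sketch of this bookkeeping, the snippet below uses sympy to solve a small equality-constrained problem (a made-up illustration, not from the text): minimize f(x, y) = x² + y² subject to h(x, y) = x + y − 1 = 0. It builds the Lagrangian and finds the roots of its gradient with respect to x, y and λ:

import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)

f = x**2 + y**2      # cost function
h = x + y - 1        # equality constraint h(x, y) = 0

# Lagrangian: cost plus the constraint scaled by its multiplier
L = f + lam * h

# k equations from the gradient w.r.t. x and y, plus m more from the
# derivative w.r.t. each multiplier (these enforce h(x, y) = 0)
equations = [sp.diff(L, v) for v in (x, y, lam)]

print(sp.solve(equations, [x, y, lam], dict=True))
# solution: x = y = 1/2, lam = -1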

The new challenge is to come up with n more equations from the inequality constraints. To do so, consider what an inequality constraint means. If the extremum of the original function lies where g_i(x*) < 0, this constraint plays no role in changing the extremum compared with the unconstrained problem, so its coefficient μ_i can be set to zero. If, on the other hand, the new solution is at the border of the constraint, then g_i(x*) = 0. The next graphical representation helps to understand this concept.

Fig. 1 - Graphical explanation for the KKT conditions.

In both situations, the equation:

μ_i g_i(x) = 0

is necessary for the solution of our new problem. Therefore, we get n equations from the inequality constraints. The constraint terms are always zero in the set of possible solutions, so they do not affect the value of the cost function. The coefficients λ_i can have any value. However, the coefficients μ_i are limited to nonnegative values. To see why, and with the aid of Fig. 2, imagine x* lies on the border g_i(x) = 0, so that μ_i can be different from zero.

 

Fig. 2 - Graphical explanation for the sign of the μ_i.

 

x* = argmin_x f(x) + μ_i g_i(x)

0 = ∇f(x) + μ_i ∇g_i(x)

μ_i = −∇f(x) / ∇g_i(x)    (1)

At such a point x*, the gradients of f(x) and of g_i(x), both with respect to x, point in opposite directions. Therefore, according to (1), μ_i must be positive.
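A quick numerical illustration of both cases (again a hypothetical one-dimensional problem, not from the text): minimize f(x) = (x − 2)² once with the inactive constraint x − 3 ≤ 0 and once with the active constraint x − 1 ≤ 0, computing μ_i from equation (1) in the active case:

import numpy as np

df = lambda x: 2 * (x - 2)   # gradient of f(x) = (x - 2)**2

# Case 1: g(x) = x - 3 <= 0 is inactive at the unconstrained optimum x* = 2,
# since g(2) = -1 < 0. Complementary slackness then forces mu = 0.
x_star, mu = 2.0, 0.0
print(mu * (x_star - 3))     # 0.0, so mu * g(x*) = 0 holds

# Case 2: g(x) = x - 1 <= 0 is active and the optimum moves to the border
# x* = 1, where g(1) = 0. From (1), with grad g = 1:
x_star = 1.0
mu = -df(x_star) / 1.0
print(mu)                    # 2.0 > 0: the gradients point in opposite directions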

 

The KKT conditions

We are now ready to enumerate the KKT conditions (a numerical check is sketched after the list):

  • Stationarity

    ∇_x f(x) + ∑_{i=1}^{m} λ_i ∇_x h_i(x) − ∑_{i=1}^{n} μ_i ∇_x g_i(x) = 0 (maximization)

    ∇_x f(x) + ∑_{i=1}^{m} λ_i ∇_x h_i(x) + ∑_{i=1}^{n} μ_i ∇_x g_i(x) = 0 (minimization)

  • Equality constraints

    ∇_λ f(x) + ∑_{i=1}^{m} ∇_λ [λ_i h_i(x)] + ∑_{i=1}^{n} μ_i ∇_λ g_i(x) = 0, which reduces to h_i(x) = 0, ∀ i = 1, …, m

  • Inequality constraints, a.k.a. complementary slackness

    μ_i g_i(x) = 0, ∀ i = 1, …, n

    μ_i ≥ 0, ∀ i = 1, …, n
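These conditions translate directly into a numerical check. The helper below is a hypothetical sketch (gradients supplied by hand, single inequality, minimization form) for the toy problem min x² + y² subject to 1 − x − y ≤ 0, whose solution is x = y = 1/2 with μ = 1:

import numpy as np

def kkt_check(x, mu, grad_f, grad_g, g, tol=1e-8):
    # stationarity: grad f + mu * grad g = 0 (minimization form)
    stationarity = np.allclose(grad_f(x) + mu * grad_g(x), 0, atol=tol)
    feasible = g(x) <= tol                  # inequality constraint holds
    slackness = abs(mu * g(x)) <= tol       # complementary slackness
    return stationarity and feasible and slackness and mu >= 0

grad_f = lambda x: 2 * x                    # f(x, y) = x**2 + y**2
g = lambda x: 1 - x[0] - x[1]               # g(x, y) = 1 - x - y <= 0
grad_g = lambda x: np.array([-1.0, -1.0])

print(kkt_check(np.array([0.5, 0.5]), 1.0, grad_f, grad_g, g))  # True
print(kkt_check(np.array([0.3, 0.3]), 1.0, grad_f, grad_g, g))  # False: infeasible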

 

An example

Suppose we are trying to maximize the transmission rate of a multi-carrier communication system with N channels. Each carrier/channel i can carry a signal power p_i ≥ 0 under noise n_i > 0. The total power must be less than or equal to P. The transmission rate of each carrier is proportional to:

log_2(1 + p_i / n_i)

Given this information, and noting that maximizing ln(x) also maximizes log_2(x) (they differ only by a constant factor), the problem is:

max ∑_{i=1}^{N} ln(1 + p_i / n_i)

subject to ∑_{i=1}^{N} p_i ≤ P

subject to p_i ≥ 0, ∀ i = 1, …, N
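Before deriving the closed form, the problem can be checked numerically. The sketch below hands it to scipy's SLSQP solver; N = 4, P = 10 and the noise levels are assumed values chosen for illustration, not part of the original problem:

import numpy as np
from scipy.optimize import minimize

noise = np.array([1.0, 2.0, 3.0, 4.0])   # assumed noise levels n_i
P = 10.0                                  # assumed total power budget
N = len(noise)

# scipy minimizes, so negate the rate in order to maximize it
def neg_rate(p):
    return -np.sum(np.log(1 + p / noise))

cons = [{'type': 'ineq', 'fun': lambda p: P - np.sum(p)}]  # sum(p_i) <= P
bnds = [(0, None)] * N                                     # p_i >= 0

res = minimize(neg_rate, x0=np.full(N, P / N), method='SLSQP',
               bounds=bnds, constraints=cons)
print(res.x)       # optimal power allocation, close to [4, 3, 2, 1]
print(-res.fun)    # achieved total rate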

Changing p_i ≥ 0 to −p_i ≤ 0 and noting that this is a maximization problem, the Lagrangian is then:

L(p, μ) = ∑_{i=1}^{N} ln(1 + p_i / n_i) − μ_0 (∑_{i=1}^{N} p_i − P) − ∑_{i=1}^{N} μ_i (−p_i)

L(p, μ) = ∑_{i=1}^{N} ln(1 + p_i / n_i) + μ_0 (P − ∑_{i=1}^{N} p_i) + ∑_{i=1}^{N} μ_i p_i

Taking the stationarity condition, we get:

∂L/∂p_i = 1/(p_i + n_i) − μ_0 + μ_i = 0

p_i + n_i = 1/(μ_0 − μ_i)

Since n_i > 0 and p_i ≥ 0, the left-hand side is positive, so μ_0 > μ_i, which (μ_i being nonnegative) also means that μ_0 > 0. From the complementary slackness conditions:

μ_0 (P − ∑_{i=1}^{N} p_i) = 0

μ_i p_i = 0

μ_0, μ_i ≥ 0

and since μ0>0, we know that

P − ∑_{i=1}^{N} p_i = 0

P = ∑_{i=1}^{N} p_i

which means that the p_i cannot be zero (none of them, since they all play the same role in the optimization problem), forcing μ_i = 0, ∀ i = 1, …, N, by complementary slackness. Then

p_i + n_i = 1/(μ_0 − μ_i) = 1/μ_0

p_i = 1/μ_0 − n_i

The final equations to solve the problem are:

p_i = 1/μ_0 − n_i, ∀ i = 1, …, N

∑_{i=1}^{N} p_i = P

which are easily solvable.
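For instance, substituting p_i = 1/μ_0 − n_i into the power budget gives N/μ_0 − ∑ n_i = P, i.e. 1/μ_0 = (P + ∑_{i=1}^{N} n_i)/N. A minimal sketch, reusing the assumed noise levels and budget from above:

import numpy as np

noise = np.array([1.0, 2.0, 3.0, 4.0])   # same assumed n_i as above
P = 10.0
N = len(noise)

# From sum_i (1/mu_0 - n_i) = P:  1/mu_0 = (P + sum(n_i)) / N
water_level = (P + noise.sum()) / N       # this is 1/mu_0
p = water_level - noise

print(water_level)   # 5.0
print(p)             # [4. 3. 2. 1.], matching the SLSQP result above

This picture explains why the solution is known as water-filling: power is poured into the channels until a common level 1/μ_0 is reached. Note that the simple closed form relies on every p_i coming out positive; for noise levels above the water level, the corresponding μ_i would no longer be zero and those channels would get no power.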

 

Sufficiency and regularity

The KKT conditions are necessary to find an optimum, but not sufficient in general. One class of problems for which they are also sufficient is the one where f(x) and the g_i(x) are continuously differentiable and convex, and the h_i(x) are linear.

Furthermore, the conditions are only guaranteed to hold at an optimum x if x is regular (i.e., satisfies a constraint qualification). There are several ways of determining regularity, some of which are listed on the Wikipedia page, while a more extensive explanation can be found in the book Nonlinear Programming by Bertsekas.

 

The table below summarizes the status of the KKT conditions depending on these two factors: whether the problem meets the sufficiency conditions, and whether x is regular.

Sufficient conditions?   x is regular? No   x is regular? Yes
No                       Do not hold        Necessary
Yes                      Do not hold        Sufficient

 

 

Exercise

The following Python code implements a small constrained optimization problem (an inequality, an equality, and variable bounds) and solves it with scipy's SLSQP method:

import numpy as np
from scipy.optimize import minimize

# Minimize x1*x4*(x1 + x2 + x3) + x3
# subject to x1*x2*x3*x4 >= 25 (inequality)
#            x1^2 + x2^2 + x3^2 + x4^2 = 40 (equality)
#            1 <= xi <= 5 (bounds)

def objective(x):
    return x[0]*x[3]*(x[0] + x[1] + x[2]) + x[2]

def constraint1(x):
    # SLSQP inequality constraints are of the form fun(x) >= 0
    return x[0]*x[1]*x[2]*x[3] - 25.0

def constraint2(x):
    # equality constraint: 40 - sum(xi^2) = 0
    return 40.0 - np.sum(x**2)

# initial guess
x0 = np.array([1.0, 5.0, 5.0, 1.0])

# show initial objective
print('Initial Objective: ' + str(objective(x0)))

# optimize
b = (1.0, 5.0)
bnds = (b, b, b, b)
con1 = {'type': 'ineq', 'fun': constraint1}
con2 = {'type': 'eq', 'fun': constraint2}
cons = [con1, con2]
solution = minimize(objective, x0, method='SLSQP',
                    bounds=bnds, constraints=cons)
x = solution.x

# show final objective
print('Final Objective: ' + str(objective(x)))

# print solution
print('Solution')
print('x1 = ' + str(x[0]))
print('x2 = ' + str(x[1]))
print('x3 = ' + str(x[2]))
print('x4 = ' + str(x[3]))
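This exercise is the well-known problem 71 from the Hock-Schittkowski test collection; with these bounds the solver should converge to roughly x ≈ (1.00, 4.74, 3.82, 1.38), with a final objective near 17.01.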

 
