Mathematics Basics - Multivariate Calculus (Taylor Series)

Taylor Series

We discussed in the Jacobian chapter some challenges with applying the Jacobian in practice. One of these challenges arises when there is no clearly defined analytical function, or when that function is too complicated to differentiate. To tackle this, we have to resort to numerical methods and come up with an approximation to the actual function. The Taylor series is one such method. It is used to approximate a complex function by a series of simpler functions. In the univariate case, the Taylor series takes a polynomial form like the one below.

$$
\begin{aligned}
g_0(x)&=a\\
g_1(x)&=a+bx\\
g_2(x)&=a+bx+cx^2\\
g_3(x)&=a+bx+cx^2+dx^3\\
&\;\;\vdots
\end{aligned}
$$

We call $g_0(x)$ the zeroth order approximation, $g_1(x)$ the first order approximation, $g_2(x)$ the second order approximation, $g_3(x)$ the third order approximation, and so on. The order is dictated by the highest power of $x$ in the polynomial expression. So a Taylor series is just a series of polynomial expressions with increasing powers of $x$. You will soon see that as the order increases, the resulting polynomial becomes more and more accurate at approximating the original function.

Next, we will use an example to demonstrate how to find the coefficients $a$, $b$, $c$, $d$, etc. used in the Taylor series. We take the function $f(x)=e^x$ and build a series of polynomial functions to approximate it.

Before we start, let's first note an interesting property of the function $f(x)=e^x$: no matter how many times we differentiate $f(x)$, we always get the same result, $e^x$. Therefore,

$$f(x)=f^{(1)}(x)=f^{(2)}(x)=\cdots=f^{(n)}(x)=e^x$$

We use the superscript $(n)$ to denote differentiating the function $n$ times with respect to $x$.

A Taylor series approximates the function $f$ around one particular point. Here we choose our point of interest to be $x=0$. Substituting this into the derivatives, we get

$$f(0)=f^{(1)}(0)=f^{(2)}(0)=\cdots=f^{(n)}(0)=e^0=1$$

With these preparation steps in place, we can start with the zeroth order approximation $g_0(x)=a$. Since the zeroth order approximation is just a constant, the best approximation we can obtain at $x=0$ is $a=f(0)$. So

$$g_0(x)=f(0)=1$$
Zeroth Order Approximation for f(x)=e^x

This does not look like a very good approximation. Let’s try the first order approximation.

For the first order approximation, we are trying to find a line $g_1(x)=a+bx$ that satisfies the following equations.

$$
\begin{aligned}
g_1(x)&=f(x)\\
g_1^{(1)}(x)&=f^{(1)}(x)
\end{aligned}
$$

We know the first order derivative of $g_1(x)$ is

$$g_1^{(1)}(x)=\frac{d}{dx}(a+bx)=b$$

Substituting the expressions for $g_1(x)$ and $g_1^{(1)}(x)$ into the simultaneous equations,

$$
\begin{aligned}
a+bx&=f(x)\\
b&=f^{(1)}(x)
\end{aligned}
$$

At the point $x=0$,

$$
\begin{aligned}
a+b(0)&=f(0)\\
b&=f^{(1)}(0)
\end{aligned}
$$

Therefore, we obtain the solution for $a$ and $b$, and the first order approximation function becomes

$$
\begin{aligned}
g_1(x)&=f(0)+f^{(1)}(0)x\\
&=e^0+e^0x\\
&=1+x
\end{aligned}
$$

We can plot this line together with the original function $f$.
First Order Approximation for f(x)=e^x

Our approximation has improved a little, but this is still not good enough. Let’s move on to the second order approximation.

For the second order approximation, we are solving for a quadratic function $g_2(x)=a+bx+cx^2$. This function $g_2(x)$ must satisfy the following equations.

$$
\begin{aligned}
g_2(x)&=f(x)\\
g_2^{(1)}(x)&=f^{(1)}(x)\\
g_2^{(2)}(x)&=f^{(2)}(x)
\end{aligned}
$$

We can calculate the first and second order derivatives of $g_2(x)$ as

$$
\begin{aligned}
g_2^{(1)}(x)&=\frac{d}{dx}(a+bx+cx^2)=b+2cx\\
g_2^{(2)}(x)&=\frac{d}{dx}(b+2cx)=2c
\end{aligned}
$$

Substituting these back into our previous simultaneous equations,

$$
\begin{aligned}
a+bx+cx^2&=f(x)\\
b+2cx&=f^{(1)}(x)\\
2c&=f^{(2)}(x)
\end{aligned}
$$

At the point $x=0$,

$$
\begin{aligned}
a+b(0)+c(0)^2&=f(0)\\
b+2c(0)&=f^{(1)}(0)\\
2c&=f^{(2)}(0)
\end{aligned}
$$

The solution for our coefficients $a$, $b$ and $c$ is simply

$$
\begin{aligned}
a&=f(0)\\
b&=f^{(1)}(0)\\
c&=\frac{1}{2}f^{(2)}(0)
\end{aligned}
$$

So our second order approximation can be evaluated as,

$$
\begin{aligned}
g_2(x)&=f(0)+f^{(1)}(0)x+\frac{1}{2}f^{(2)}(0)x^2\\
&=e^0+e^0x+\frac{1}{2}e^0x^2\\
&=1+x+\frac{1}{2}x^2
\end{aligned}
$$

We plot this second order approximation function together with the original function $f$.
Second Order Approximation for f(x)=e^x

This looks better now. Can we be more accurate? Let's solve the third order approximation, which is a cubic function $g_3(x)=a+bx+cx^2+dx^3$. Again, it must satisfy the following equations.

$$
\begin{aligned}
g_3(x)&=f(x)\\
g_3^{(1)}(x)&=f^{(1)}(x)\\
g_3^{(2)}(x)&=f^{(2)}(x)\\
g_3^{(3)}(x)&=f^{(3)}(x)
\end{aligned}
$$

The derivatives of $g_3(x)$ are given by

$$
\begin{aligned}
g_3^{(1)}(x)&=\frac{d}{dx}(a+bx+cx^2+dx^3)=b+2cx+3dx^2\\
g_3^{(2)}(x)&=\frac{d}{dx}(b+2cx+3dx^2)=2c+6dx\\
g_3^{(3)}(x)&=\frac{d}{dx}(2c+6dx)=6d
\end{aligned}
$$

Substitute these back and evaluate at the point $x=0$.

$$
\begin{aligned}
a+b(0)+c(0)^2+d(0)^3&=f(0)\\
b+2c(0)+3d(0)^2&=f^{(1)}(0)\\
2c+6d(0)&=f^{(2)}(0)\\
6d&=f^{(3)}(0)
\end{aligned}
$$

The coefficients for our approximation function $g_3(x)$ are

$$
\begin{aligned}
a&=f(0)\\
b&=f^{(1)}(0)\\
c&=\frac{1}{2}f^{(2)}(0)\\
d&=\frac{1}{6}f^{(3)}(0)
\end{aligned}
$$

Therefore, the third order approximation function is

$$
\begin{aligned}
g_3(x)&=f(0)+f^{(1)}(0)x+\frac{1}{2}f^{(2)}(0)x^2+\frac{1}{6}f^{(3)}(0)x^3\\
&=e^0+e^0x+\frac{1}{2}e^0x^2+\frac{1}{6}e^0x^3\\
&=1+x+\frac{1}{2}x^2+\frac{1}{6}x^3
\end{aligned}
$$

Let’s plot the third order approximation function in the chart.
Third Order Approximation for f(x)=e^x
It looks like we are getting better and better at approximating the function $f(x)=e^x$. Before we get carried away with higher order approximations, let's pause for a second and look at the approximation functions we have obtained so far.

$$
\begin{aligned}
g_0(x)&=f(0)\\
g_1(x)&=f(0)+f^{(1)}(0)x\\
g_2(x)&=f(0)+f^{(1)}(0)x+\frac{1}{2}f^{(2)}(0)x^2\\
g_3(x)&=f(0)+f^{(1)}(0)x+\frac{1}{2}f^{(2)}(0)x^2+\frac{1}{6}f^{(3)}(0)x^3
\end{aligned}
$$

A nice pattern is revealed: the coefficient of the $x^n$ term is the $n^{th}$ derivative at $0$ divided by $n!$ (note that $\frac{1}{2}=\frac{1}{2!}$ and $\frac{1}{6}=\frac{1}{3!}$). Instead of explicitly writing out the coefficients for higher order approximation functions, we can derive a general expression for the $n^{th}$ order approximation of function $f$ at the point $x=0$ as

$$g_n(x)=\sum_{k=0}^{n}\frac{f^{(k)}(0)}{k!}x^k$$

Note that as $n$ goes to infinity, the approximation function $g_n(x)$ becomes exactly the same as the original function $f(x)$.

$$f(x)=\sum_{n=0}^{\infty}\frac{f^{(n)}(0)}{n!}x^n$$

With this general expression, we can easily tell the fourth order approximation is

$$
\begin{aligned}
g_4(x)&=f(0)+f^{(1)}(0)x+\frac{1}{2}f^{(2)}(0)x^2+\frac{1}{6}f^{(3)}(0)x^3+\frac{1}{24}f^{(4)}(0)x^4\\
&=e^0+e^0x+\frac{1}{2}e^0x^2+\frac{1}{6}e^0x^3+\frac{1}{24}e^0x^4\\
&=1+x+\frac{1}{2}x^2+\frac{1}{6}x^3+\frac{1}{24}x^4
\end{aligned}
$$

And the fifth order approximation is

$$
\begin{aligned}
g_5(x)&=f(0)+f^{(1)}(0)x+\frac{1}{2}f^{(2)}(0)x^2+\frac{1}{6}f^{(3)}(0)x^3+\frac{1}{24}f^{(4)}(0)x^4+\frac{1}{120}f^{(5)}(0)x^5\\
&=e^0+e^0x+\frac{1}{2}e^0x^2+\frac{1}{6}e^0x^3+\frac{1}{24}e^0x^4+\frac{1}{120}e^0x^5\\
&=1+x+\frac{1}{2}x^2+\frac{1}{6}x^3+\frac{1}{24}x^4+\frac{1}{120}x^5
\end{aligned}
$$
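Since every derivative of $e^x$ at $0$ equals $1$, substituting $f^{(n)}(0)=1$ into the general expression collapses the full series into the familiar power series for the exponential function:

$$e^x=\sum_{n=0}^{\infty}\frac{x^n}{n!}=1+x+\frac{x^2}{2!}+\frac{x^3}{3!}+\cdots$$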

This is a very powerful formula. It means that as long as we can keep differentiating a function at the point $x=0$, we can use these derivative terms to reconstruct the function everywhere else. Moreover, the polynomial we derive from this formula can approximate the actual function to an arbitrary degree of accuracy. This way of approximating a function at the point $x=0$ is named after the mathematician Colin Maclaurin, so it is also called the Maclaurin series.
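To make this concrete, here is a minimal Python sketch (the helper name `maclaurin_exp` is just an illustrative choice, not something from the text) that evaluates the truncated Maclaurin series of $e^x$ at increasing orders and compares it against `math.exp`:

```python
import math

def maclaurin_exp(x, order):
    """Order-n Maclaurin approximation of e^x: the sum of x^k / k! for k = 0..order."""
    return sum(x**k / math.factorial(k) for k in range(order + 1))

x = 1.0
for order in range(6):
    approx = maclaurin_exp(x, order)
    # The error shrinks quickly as more terms are kept.
    print(f"order {order}: g_{order}(1) = {approx:.6f}, error = {abs(math.exp(x) - approx):.6f}")
```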

You might be wondering what is so special about the point $x=0$ that allows us to find an approximation function so easily. It turns out there is nothing special about it. The Maclaurin series is just a special case of the more general Taylor series, which can be evaluated at any point $x=p$. The general form of the Taylor series is given by

$$f(x)=\sum_{n=0}^{\infty}\frac{f^{(n)}(p)}{n!}(x-p)^n$$

According to the Taylor series formula, we can rewrite our zeroth to fifth order approximation functions as

$$
\begin{aligned}
g_0(x)&=f(p)\\
g_1(x)&=f(p)+f^{(1)}(p)(x-p)\\
g_2(x)&=f(p)+f^{(1)}(p)(x-p)+\frac{1}{2}f^{(2)}(p)(x-p)^2\\
g_3(x)&=f(p)+f^{(1)}(p)(x-p)+\frac{1}{2}f^{(2)}(p)(x-p)^2+\frac{1}{6}f^{(3)}(p)(x-p)^3\\
g_4(x)&=f(p)+f^{(1)}(p)(x-p)+\frac{1}{2}f^{(2)}(p)(x-p)^2+\frac{1}{6}f^{(3)}(p)(x-p)^3+\frac{1}{24}f^{(4)}(p)(x-p)^4\\
g_5(x)&=f(p)+f^{(1)}(p)(x-p)+\frac{1}{2}f^{(2)}(p)(x-p)^2+\frac{1}{6}f^{(3)}(p)(x-p)^3+\frac{1}{24}f^{(4)}(p)(x-p)^4+\frac{1}{120}f^{(5)}(p)(x-p)^5
\end{aligned}
$$

Compared to the Maclaurin series, the Taylor series can be applied at any point $x=p$, as long as the function $f$ is differentiable at that point. We change from differentiating at $x=0$ to differentiating at $x=p$. Consequently, the $x$ term changes from $x^n$ to $(x-p)^n$. It is easy to see that the Maclaurin series is just the Taylor series with $p=0$.
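As a hedged sketch of how this generalization might look in code, the helper below (its name `taylor_approx` and argument layout are illustrative assumptions) evaluates a truncated Taylor polynomial about an arbitrary point $p$, given pre-computed derivative values at that point:

```python
import math

def taylor_approx(derivs_at_p, p, x):
    """Evaluate sum_k f^(k)(p) / k! * (x - p)^k,
    where derivs_at_p = [f(p), f'(p), f''(p), ...]."""
    return sum(d / math.factorial(k) * (x - p)**k
               for k, d in enumerate(derivs_at_p))

# Example: expand e^x about p = 1, where every derivative equals e.
derivs = [math.e] * 6                    # f(1) = f'(1) = ... = e
print(taylor_approx(derivs, 1.0, 1.3))   # ~3.66928, close to
print(math.exp(1.3))                     # 3.66929...
```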

Of course, we are not building a Taylor series just to approximate a simple function like $f(x)=e^x$. The Taylor series is most useful when we have a complex function that cannot be differentiated easily. For such a function, we can find its derivatives at one particular point and derive a polynomial expression to re-express it. This is much simpler than carrying out the standard differentiation process at every point of the function.

In addition, a Taylor series approximation is most accurate around the point it is built at. The accuracy drops as we move away from this point. Luckily, in real-world applications we seldom need to obtain the derivatives of a function at every point. Instead, we only need to focus on a specific range of input values and understand how they influence the final output. This is exactly what the Taylor series can be used for.

More Taylor Series Examples

Let's look at another commonly used function, $f(x)=\cos(x)$, and find its approximations.
f(x)=cos(x)

We first observe that there is a cyclic pattern in the derivatives of $f(x)$. The derivatives cycle between cosine and sine, and between positive and negative. Moreover, the same term reappears after every fourth differentiation.

$$
\begin{aligned}
f(x)&=\cos(x)\\
f^{(1)}(x)&=-\sin(x)\\
f^{(2)}(x)&=-\cos(x)\\
f^{(3)}(x)&=\sin(x)\\
f^{(4)}(x)&=\cos(x)\\
&\;\;\vdots
\end{aligned}
$$

We can substitute the value $x=0$ and see what these derivatives evaluate to.

$$
\begin{aligned}
f(0)&=\cos(0)=1\\
f^{(1)}(0)&=-\sin(0)=0\\
f^{(2)}(0)&=-\cos(0)=-1\\
f^{(3)}(0)&=\sin(0)=0\\
f^{(4)}(0)&=\cos(0)=1\\
&\;\;\vdots
\end{aligned}
$$

In this case, the derivatives are $0$ whenever we differentiate the function an odd number of times ($f^{(1)}(0),f^{(3)}(0),\cdots,f^{(2n+1)}(0)$). On the other hand, the derivatives flip between $1$ and $-1$ whenever we differentiate the function an even number of times ($f^{(2)}(0),f^{(4)}(0),\cdots,f^{(2n)}(0)$). Based on this observation, we can derive the Maclaurin series for approximating the function $f(x)=\cos(x)$ at the point $x=0$ as

$$
\begin{aligned}
f(x)&=\sum_{n=0}^{\infty}\frac{f^{(n)}(0)}{n!}x^n\\
&=f(0)+f^{(1)}(0)x+\frac{1}{2}f^{(2)}(0)x^2+\frac{1}{6}f^{(3)}(0)x^3+\frac{1}{24}f^{(4)}(0)x^4+\cdots\\
&=1+(0)x+\frac{1}{2}(-1)x^2+\frac{1}{6}(0)x^3+\frac{1}{24}(1)x^4+\cdots\\
&=\sum_{n=0}^{\infty}\frac{(-1)^n}{(2n)!}x^{2n}
\end{aligned}
$$

Let's plot the fourth order approximation function. It shows a very close match to the original function around the region $x=0$.

$$g_4(x)=1-\frac{x^2}{2}+\frac{x^4}{24}$$
Fourth Order Approximation for f(x)=cos(x)
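As a quick numerical sanity check (a minimal sketch; `g4` below simply encodes the polynomial above), we can compare this fourth order approximation against `math.cos` and watch the error grow as we move away from $x=0$:

```python
import math

def g4(x):
    """Fourth order Maclaurin approximation of cos(x): 1 - x^2/2 + x^4/24."""
    return 1 - x**2 / 2 + x**4 / 24

for x in [0.0, 0.5, 1.0, 2.0, 3.0]:
    # Excellent near x = 0, progressively worse further out.
    print(f"x = {x}: g4 = {g4(x):.5f}, cos = {math.cos(x):.5f}, "
          f"error = {abs(math.cos(x) - g4(x)):.5f}")
```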

We call the function $f(x)=\cos(x)$ a well-behaved function because it has two properties.

  1. It is continuous everywhere. You can traverse along the curve of this function and not find any sudden break.
  2. It can be differentiated infinitely many times. The same cyclic pattern repeats over and over as you take higher and higher order derivatives.

These two properties are very important prerequisites for successfully deriving a Maclaurin series, or a Taylor series in general. We can use another function, $f(x)=\frac{1}{x}$, to demonstrate why these two properties matter so much.
f(x)=1/x

Based on the plot, we can see that $f(x)=\frac{1}{x}$ is not a continuous function: the curve breaks at $x=0$. Therefore, even though we can differentiate this function infinitely many times, as shown below, it is not a well-behaved function.

$$
\begin{aligned}
f(x)&=\frac{1}{x}\\
f^{(1)}(x)&=-\frac{1}{x^2}\\
f^{(2)}(x)&=\frac{2}{x^3}\\
f^{(3)}(x)&=-\frac{6}{x^4}\\
f^{(4)}(x)&=\frac{24}{x^5}\\
&\;\;\vdots
\end{aligned}
$$

It is quite clear that we cannot use the Maclaurin series for approximation, because all the derivatives of the function $f$ are undefined at $x=0$. Nevertheless, we can use a Taylor series to approximate this function at a different point. For example, at $x=1$, we can evaluate the derivatives of $f(x)$ as

$$
\begin{aligned}
f(1)&=\frac{1}{1}=1\\
f^{(1)}(1)&=-\frac{1}{1^2}=-1\\
f^{(2)}(1)&=\frac{2}{1^3}=2\\
f^{(3)}(1)&=-\frac{6}{1^4}=-6\\
f^{(4)}(1)&=\frac{24}{1^5}=24\\
&\;\;\vdots
\end{aligned}
$$

Notice that the derivatives evaluate to a positive number whenever we differentiate the function $f$ an even number of times ($f^{(2)}(1),f^{(4)}(1),\cdots,f^{(2n)}(1)$). Conversely, they evaluate to a negative number whenever we differentiate the function $f$ an odd number of times ($f^{(1)}(1),f^{(3)}(1),\cdots,f^{(2n+1)}(1)$).

Substituting these values into our Taylor series formula, we obtain the following approximation function for $f(x)=\frac{1}{x}$.

$$
\begin{aligned}
f(x)&=\sum_{n=0}^{\infty}\frac{f^{(n)}(1)}{n!}(x-1)^n\\
&=f(1)+f^{(1)}(1)(x-1)+\frac{1}{2}f^{(2)}(1)(x-1)^2+\frac{1}{6}f^{(3)}(1)(x-1)^3+\frac{1}{24}f^{(4)}(1)(x-1)^4+\cdots\\
&=1+(-1)(x-1)+\frac{1}{2}(2)(x-1)^2+\frac{1}{6}(-6)(x-1)^3+\frac{1}{24}(24)(x-1)^4+\cdots\\
&=1-(x-1)+(x-1)^2-(x-1)^3+(x-1)^4-\cdots\\
&=\sum_{n=0}^{\infty}(-1)^n(x-1)^n
\end{aligned}
$$

How good is our approximation this time? Let's plot the fourth order approximation function below.

$$g_4(x)=1-(x-1)+(x-1)^2-(x-1)^3+(x-1)^4$$
Fourth Order Approximation for f(x)=1/x

There is some degree of closeness between the two curves, especially around the region $x=1$. There are also some interesting features to observe here. Firstly, the region of the original function where $x$ is less than $0$ is not covered at all by this approximation function. Secondly, the approximation function ignores the asymptote at $x=0$ and goes straight across the y-axis. Lastly, the approximation function does a poor job for larger values of $x$, and this does not improve much with higher order approximations: the “tail” flails wildly up and down as higher power terms are added (verify this for yourself).
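To see that flailing tail numerically, here is a minimal sketch (the helper name `g_n` is an illustrative choice) that evaluates the order-$n$ Taylor polynomial of $\frac{1}{x}$ about $x=1$: the partial sums settle down for a point close to $1$ but swing ever more wildly for a point such as $x=2.5$.

```python
def g_n(x, order):
    """Order-n Taylor polynomial of 1/x about x = 1: sum of (-1)^k * (x - 1)^k."""
    return sum((-1)**k * (x - 1)**k for k in range(order + 1))

for x in [1.2, 2.5]:
    print(f"x = {x} (true value {1 / x:.4f}):")
    for order in [2, 5, 10, 20]:
        # Converges when |x - 1| < 1, oscillates and blows up otherwise.
        print(f"  order {order:2d}: {g_n(x, order):.4f}")
```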

Therefore, we can see that a Taylor series does not approximate well if a function is not well-behaved, i.e. continuous everywhere and infinitely differentiable. Luckily, most of the functions we see in the real world fulfill these two requirements. Even for functions that are discontinuous, like $f(x)=\frac{1}{x}$, we can try to approximate them by deriving piecewise Taylor series expressions. For example, we can have one approximation at $x=1$ and another at $x=-1$ to cover both the positive and negative $x$ values of $f(x)=\frac{1}{x}$. As for functions that can only be differentiated a finite number of times, we are simply approximating them with a level of accuracy determined by the highest order derivative available. We will have to ensure this accuracy is acceptable for the input range we are interested in.

Linearization

We have learned that a Taylor series can be expanded into an infinite number of terms (as long as the function can be differentiated). However, in practice we often evaluate only the first few terms and ignore the remaining higher order terms. This is called the truncated form of the Taylor series. We will see where this truncated form is used.

Recall that a Taylor series is given by

$$f(x)=\sum_{n=0}^{\infty}\frac{f^{(n)}(p)}{n!}(x-p)^n$$

This is evaluated about the point $x=p$. At some distance $\Delta p$ away from this point, the same Taylor series gives us the value of the function as

$$f(p+\Delta p)=\sum_{n=0}^{\infty}\frac{f^{(n)}(p)}{n!}(\Delta p)^n$$

By convention, we rewrite all the $p$'s as $x$'s, because they play exactly the same role. So the expression becomes

$$
\begin{aligned}
f(x+\Delta x)&=\sum_{n=0}^{\infty}\frac{f^{(n)}(x)}{n!}(\Delta x)^n\\
&=f(x)+f^{(1)}(x)(\Delta x)+\frac{1}{2}f^{(2)}(x)(\Delta x)^2+\frac{1}{6}f^{(3)}(x)(\Delta x)^3+\frac{1}{24}f^{(4)}(x)(\Delta x)^4+\cdots
\end{aligned}
$$

Let's rearrange this expression to isolate the first order derivative of the function $f$.

$$f^{(1)}(x)=\frac{f(x+\Delta x)-f(x)}{\Delta x}-\frac{1}{2}f^{(2)}(x)(\Delta x)-\frac{1}{6}f^{(3)}(x)(\Delta x)^2-\frac{1}{24}f^{(4)}(x)(\Delta x)^3-\cdots$$

Look at the first term of this expression, $\frac{f(x+\Delta x)-f(x)}{\Delta x}$. It looks very similar to the rise-over-run expression we derived at the beginning to estimate the gradient of a function at a point. Except now we have some additional terms that tell us how good our estimate of the gradient is. We can lump these additional terms together into an error term and denote it by $O(\Delta x)$.

$$O(\Delta x)=-\frac{1}{2}f^{(2)}(x)(\Delta x)-\frac{1}{6}f^{(3)}(x)(\Delta x)^2-\frac{1}{24}f^{(4)}(x)(\Delta x)^3-\cdots$$

If we are careful in choosing $\Delta x$ when estimating the gradient, $\Delta x$ will be a very small number. Then $(\Delta x)^2$, $(\Delta x)^3$ and all the higher order terms will be even smaller. Therefore, the first term we omit in the Taylor series expansion, $\frac{1}{2}f^{(2)}(x)(\Delta x)$, is a good indicator of the size of the error incurred by the estimate. This estimate is said to be first order accurate, since its first omitted term is proportional to $\Delta x$. The gradient of a function can then be expressed as

$$f^{(1)}(x)=\frac{f(x+\Delta x)-f(x)}{\Delta x}+O(\Delta x)$$

As a result, we have a linear approximation of the function $f$ at some distance $\Delta x$ away from the approximation point, and we know how to measure the approximation error in terms of $\Delta x$.

$$f(x+\Delta x)\approx f(x)+f^{(1)}(x)(\Delta x)$$

This process of taking a function and ignoring all the terms above first order in $\Delta x$ is called linearization, and it is often used by computers to evaluate the derivative of a function numerically rather than analytically.
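As a rough illustration (a sketch, using $f(x)=e^x$ because we know its exact derivative), the forward difference $\frac{f(x+\Delta x)-f(x)}{\Delta x}$ estimates $f^{(1)}(x)$ with an error that shrinks roughly in proportion to $\Delta x$, i.e. it is first order accurate:

```python
import math

def forward_difference(f, x, dx):
    """First order accurate estimate of f'(x): (f(x + dx) - f(x)) / dx."""
    return (f(x + dx) - f(x)) / dx

x = 1.0
exact = math.exp(x)  # the derivative of e^x is e^x
for dx in [0.1, 0.01, 0.001]:
    estimate = forward_difference(math.exp, x, dx)
    # Shrinking dx by 10x shrinks the error by roughly 10x: the O(dx) behaviour above.
    print(f"dx = {dx}: estimate = {estimate:.6f}, error = {abs(estimate - exact):.6f}")
```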

One practical use of linearization is when we know the value of a function at one point and want to estimate its value at another point nearby. A simple example is estimating $\sqrt{4.01}$ without the help of a calculator.

We can first define our square-root function $f(x)$ as

$$f(x)=\sqrt{x}$$

The linearized approximation of $f(x)$ would be

$$
\begin{aligned}
f(x+\Delta x)&\approx f(x)+\frac{d}{dx}\sqrt{x}\,(\Delta x)\\
&\approx \sqrt{x}+\frac{1}{2\sqrt{x}}(\Delta x)
\end{aligned}
$$

We already know that $\sqrt{4}=2$. So we can linearize the function $f(x)$ at $x=4$ and calculate the value at a distance $\Delta x=0.01$ away from this approximation point.

$$
\begin{aligned}
\sqrt{4.01}&=f(4+0.01)\\
&\approx \sqrt{4}+\frac{1}{2\sqrt{4}}(0.01)\\
&\approx 2+\frac{1}{4}\times 0.01\\
&\approx 2.0025
\end{aligned}
$$

This result is very close to the actual value of $\sqrt{4.01}=2.002498\ldots$. It shows that our linearization method can generate a good enough approximation if we have no better way to evaluate the expression exactly.
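The same calculation, written out as a minimal code sketch (the helper name `linearized_sqrt` is an illustrative assumption):

```python
import math

def linearized_sqrt(a, dx):
    """Linear approximation of sqrt(a + dx) about a known point a:
    sqrt(a) + dx / (2 * sqrt(a))."""
    return math.sqrt(a) + dx / (2 * math.sqrt(a))

print(linearized_sqrt(4, 0.01))   # 2.0025
print(math.sqrt(4.01))            # 2.0024984...
```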

In reality, there are some functions which are computationally too expensive to evaluate at every single point. If we know the value of such a function at one point, we can use the linearization method to approximate its value at another point close by. This kind of operation is much cheaper for a computer to perform.

We have gone through quite a few heavy topics regarding the Taylor series and its applications. The Taylor series is an important tool for approximating complex functions closely. It enables computers to evaluate functions efficiently at different points. We will see it used a lot in advanced machine learning algorithms.


(Inspired by Mathematics for Machine Learning lecture series from Imperial College London)
