Content:
Conditions:
1, $f\in C[a,b]$. That is a very weak condition.
Conclusions:
1, $\exists\{P_n\}\rightrightarrows f$, where the $P_n$ are polynomials with rational coefficients.
Examples:
1, The interval we work on matters.
$\frac 1x$ cannot be uniformly approximated by polynomials on $(0,1]$: it is unbounded there, while every polynomial is bounded on $(0,1]$.
We now give three proofs, each built on a different main idea.
Broken-Line Revamp
By Lebesgue, who uses a very, very straightforward method to prove this theorem.
Essence:
1, Since $f$ is uniformly continuous, the broken lines constructed by connecting the points of the graph of $f$ over the end-points of a partition converge uniformly to $f$ as the partition gets finer.
2, A broken line is actually a combination of shifted copies of $|x|$.
3, $|x|$ can be uniformly approximated by polynomials.
Idea:
We just need to revamp a broken line into a polynomial, up to a uniformly small error.
Tricks:
1, $u_0(x):\equiv0,\ u_{n+1}(x):=u_n(x)+\frac{x^2-u_n^2(x)}2$. Then $u_n(x)\rightrightarrows|x|$ on $[-1,1]$ as $n\to+\infty$. Thus Essence 3 is utilised.
2, Remember $\text{ReLU}$? Now we define $\text{ReLU}$ at $x_0$:
$$\text{ReLU}(x_0):=\frac{x-x_0+|x-x_0|}2.$$
According to Trick 1 we know this can be uniformly approximated by polynomials.
3, We now focus on the bend at $c$ on the broken line.
Look at $\text{ReLU}(c)$, which has no effect on $[a,c]$ but changes the slope on $(c,b]$.
Thus we can write down the exact expression of this broken line, which only consists of polynomials and terms like $|x-x_0|$.
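The three tricks above can be sketched numerically. This is only an illustration under assumptions: the helper names `abs_approx`, `relu` and `broken_line` are made up here, and the broken line is written as $f(a)+s_0(x-a)+\sum_k(s_k-s_{k-1})\,\text{ReLU}(c_k)$, where $s_k$ is the slope on the $k$-th piece and $c_k$ are the interior nodes.

```python
import math

def abs_approx(x, n):
    """n steps of u_{k+1} = u_k + (x^2 - u_k^2) / 2 with u_0 = 0.
    The result is a polynomial in x; the iteration is meant for |x| <= 1."""
    u = 0.0
    for _ in range(n):
        u = u + (x * x - u * u) / 2.0
    return u

def relu(x, x0):
    """ReLU at x0, written with the absolute value as in Trick 2."""
    return (x - x0 + abs(x - x0)) / 2.0

def broken_line(f, nodes):
    """Piecewise-linear interpolant of f at the sorted nodes, expressed as
    f(a) + s_0 (x - a) + sum_k (s_k - s_{k-1}) * relu(x, c_k)."""
    slopes = [(f(q) - f(p)) / (q - p) for p, q in zip(nodes, nodes[1:])]
    a = nodes[0]

    def g(x):
        value = f(a) + slopes[0] * (x - a)
        # Each interior node c only changes the slope to the right of c.
        for c, s_prev, s_next in zip(nodes[1:-1], slopes, slopes[1:]):
            value += (s_next - s_prev) * relu(x, c)
        return value

    return g

# Hypothetical usage: compare u_n with |x|, and the broken line with sin.
for n in (10, 100, 1000):
    print(n, abs(abs_approx(0.3, n) - 0.3))
g = broken_line(math.sin, [0.0, 0.5, 1.0, 1.5, 2.0])
print(g(1.0), math.sin(1.0))
```

Replacing each `abs(...)` inside `relu` by a rescaled `abs_approx` would turn `g` into a genuine polynomial, which is the step the proof relies on.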
Kernel Perturbation
By Landau, who shows us how to use a kernel to fine-tune a function.
Essence:
1, Use a kernel to perturb $f$ into a polynomial, in such a way that the value at each point is still dominated by the values of $f$ near that point.
Tricks:
1, We do a scaling and subtract a linear function so that $f(0)=f(1)=0$, and we set $f\equiv0$ for $x\notin[0,1]$.
2, $Q_n(t):=c_n(1-t^2)^n$, which is only significant near $0$ when $n$ is big enough.
We want its integral over $[-1,1]$ to be $1$, thus $c_n:=\left(\int_{-1}^1(1-x^2)^n\,\text dx\right)^{-1}\leqslant\sqrt n$.
$\implies Q_n(t)\leqslant\sqrt n(1-\delta^2)^n$ for $\delta\leqslant|t|\leqslant1$.
3, $P_n(x)=\int_{-1}^1f(x+t)Q_n(t)\,\text dt,\ x\in[0,1]$.
According to the property of $Q_n$, $\{P_n\}\rightrightarrows f$.
Since $f$ vanishes outside $[0,1]$, $P_n(x)=\int_{-x}^{1-x}f(x+t)Q_n(t)\,\text dt=\int_0^1f(m)Q_n(m-x)\,\text dm$.
Since $m$ disappears after integrating and $Q_n(m-x)$ is a polynomial in $x$, $P_n$ is a polynomial in $x$.
4, $|P_n(x)-f(x)|\leqslant\int_{-1}^1|f(x+t)-f(x)|Q_n(t)\,\text dt$, because $\int_{-1}^1Q_n=1$.
Then we use uniform continuity: choose $\delta$ such that $|f(x+t)-f(x)|<\frac\varepsilon2$ whenever $|t|<\delta$.
Split the integral into $\int_{-1}^{-\delta}+\int_{\delta}^1+\int_{-\delta}^{\delta}$.
With $M:=\sup|f|$, the two outer parts are at most $4M\sqrt n(1-\delta^2)^n$ in total and the middle part is at most $\frac\varepsilon2$, so the whole is below $\varepsilon$ once $n$ is big enough.
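The construction above can be checked numerically. The sketch below is not Landau's proof, only an illustration: the names `kernel_constant`, `landau_poly`, `sup_error` and the test function `bump` are assumptions made here, the integral $\int_0^1f(m)Q_n(m-x)\,\text dm$ is approximated with the midpoint rule, and $c_n$ is computed from the closed form $\int_{-1}^1(1-t^2)^n\,\text dt=2\prod_{k=1}^n\frac{2k}{2k+1}$.

```python
import math

def kernel_constant(n):
    """c_n = 1 / integral of (1 - t^2)^n over [-1, 1].
    The integral equals 2 * prod_{k=1..n} 2k / (2k + 1)."""
    integral = 2.0
    for k in range(1, n + 1):
        integral *= 2.0 * k / (2.0 * k + 1.0)
    return 1.0 / integral

def landau_poly(f, n, x, steps=2000):
    """P_n(x) = integral over [0, 1] of f(m) Q_n(m - x) dm (midpoint rule).
    Assumes f(0) = f(1) = 0 and 0 <= x <= 1."""
    c_n = kernel_constant(n)
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        m = (k + 0.5) * h
        total += f(m) * c_n * (1.0 - (m - x) ** 2) ** n
    return total * h

def sup_error(f, n, grid=21):
    """Largest |P_n(x) - f(x)| over an evenly spaced grid in [0, 1]."""
    xs = [k / (grid - 1) for k in range(grid)]
    return max(abs(landau_poly(f, n, x) - f(x)) for x in xs)

def bump(x):
    """Test function with bump(0) = bump(1) = 0, as required by Trick 1."""
    return math.sin(math.pi * x)

for n in (10, 50, 200):
    # Second column checks c_n <= sqrt(n); third column is the grid error.
    print(n, kernel_constant(n) <= math.sqrt(n), sup_error(bump, n))
```

The error shrinks as $n$ grows, but not quickly: the kernel only narrows at a rate of roughly $1/\sqrt n$.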
Distribution is also a Kernel!
By Bernstein, who lets us know that the binomial distribution has non-trivial uses.
Essence:
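1, The binomial distribution with parameters $n$ and $x$ concentrates its mass around its mean, so $\frac in$ is close to $x$ for the terms that carry most of the weight.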
Tricks:
1, Do a scaling; let $f\in C[0,1]$.
2, Construct a kernel with the binomial distribution: $P_{n,i}(x):=C_n^ix^i(1-x)^{n-i}$. Naturally, $\sum_{i=0}^nP_{n,i}=1$.
3, $B_n(x):=\sum_{i=0}^nP_{n,i}(x)f(\frac in)$.
Actually this expression has the form of an expectation.
4, $|B_n(x)-f(x)|=\left|\sum_{k=0}^nP_{n,k}(x)\left(f(\frac kn)-f(x)\right)\right|\leqslant\cdots$.
The proof is like that of Landau: split the sum according to whether $|\frac kn-x|<\delta$ or not.
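The definition in Trick 3 is explicit enough to evaluate directly. The following is a minimal sketch, assuming Python 3.8+ for `math.comb`; the names `bernstein`, `sup_error` and the test function `kink` are chosen here for illustration only.

```python
import math

def bernstein(f, n, x):
    """B_n(x) = sum_i C(n, i) x^i (1 - x)^(n - i) f(i / n) for x in [0, 1]."""
    return sum(
        math.comb(n, i) * x ** i * (1 - x) ** (n - i) * f(i / n)
        for i in range(n + 1)
    )

def sup_error(f, n, grid=201):
    """Largest |B_n(x) - f(x)| over an evenly spaced grid in [0, 1]."""
    xs = [k / (grid - 1) for k in range(grid)]
    return max(abs(bernstein(f, n, x) - f(x)) for x in xs)

def kink(x):
    """A continuous but non-smooth test function on [0, 1]."""
    return abs(x - 0.5)

for n in (10, 40, 160):
    print(n, sup_error(kink, n))
```

Multiplying $n$ by $4$ only roughly halves the error for this test function, which gives a feel for how slowly the method converges.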
Properties:
1, $B_n$ agrees with $f$ exactly at the endpoints.
2, $B_n$ has an explicit expression.
3, It converges very slowly.
4, It is not a good approximation, and sometimes behaves oddly.
Consider $x^2$, which is itself a polynomial, yet $B_n$ does not reproduce it for any $n$.
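The $x^2$ example can be worked out by hand using the expectation form of $B_n$. In the derivation below, $S_n$ is notation introduced here (not in the notes above) for a binomial random variable with parameters $n$ and $x$, so that $B_n(x)=E\left[f\left(\frac{S_n}n\right)\right]$.

```latex
% Take f(t) = t^2 and S_n ~ B(n, x), with E[S_n] = nx and Var(S_n) = nx(1-x).
\begin{aligned}
B_n(x) &= \sum_{i=0}^n P_{n,i}(x)\left(\frac in\right)^2
        = \frac{E[S_n^2]}{n^2}
        = \frac{\operatorname{Var}(S_n) + (E[S_n])^2}{n^2} \\
       &= \frac{nx(1-x) + n^2x^2}{n^2}
        = x^2 + \frac{x(1-x)}n, \\
\sup_{x\in[0,1]} \left|B_n(x) - x^2\right| &= \frac1{4n}.
\end{aligned}
```

So even for a quadratic the error only decays like $\frac1n$, and it vanishes exactly at the endpoints $x=0$ and $x=1$.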
Actually, consider the set $E_n(f):=\{P(x)\mid\deg P\leqslant n,\ |f-P|<\varepsilon,\ \forall x\}$.
We want $\varepsilon$ to be as small as possible: which polynomial is the best?
In fact, Tschebyscheff’s method gives the best one. It is no accident that we already meet the term "Tschebyscheff’s best approximation" in high school.