Möbius Inversion
This is an especially important method when dealing with functions that are difficult to sum directly; it often reduces the time complexity of an algorithm from at least quadratic to linear (or even lower).
In this blog entry, I will focus on the theoretical proof of the results behind Möbius inversion (and a similar use of Euler's totient function), which serves to deepen my own understanding of the topic. Of course, it would be heartening if anyone finds this article helpful in any way.
Prerequisite Knowledge
Lemmas
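- The Integer Division Lemma
$$\forall x,y,z \in \mathbb{Z}^+,\quad \left\lfloor \frac{x}{yz}\right\rfloor=\left\lfloor \frac{\lfloor\frac{x}{y}\rfloor}{z}\right\rfloor$$
Proof: Write $x = a\,yz + b\,y + c$ with integers $a \ge 0$, $0 \le b < z$ and $0 \le c < y$. Then $0 \le by + c \le (z-1)y + (y-1) < yz$, so the left-hand side equals $a$. Also $\lfloor \frac{x}{y} \rfloor = az + b$ with $0 \le b < z$, so the right-hand side equals $a$ as well.
- The Harmonic Lemma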
$$\sum^{n}_{i=1}\left\lfloor\frac {n}{i}\right\rfloor \approx n\ln(n)$$
Proof: when $n$ is large (which is usually the case for large test cases),
$$\sum^{n}_{i=1}\left\lfloor\frac {n}{i}\right\rfloor \approx \int^{n}_{1}\frac{n}{x}\,dx=n\ln(n)$$
since the "discrete" step of 1 is so insignificant compared to $n$ that the summation is almost continuous. Clearly this is not a rigorous result, yet it works well enough for estimating time complexity.
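Both lemmas can be checked numerically. The following is a minimal sketch, not a proof; the function names `nested_floor_holds` and `harmonic_floor_sum` are made up for this illustration.

```python
import math

def nested_floor_holds(x, y, z):
    """Check floor(x / (y*z)) == floor(floor(x / y) / z) for positive integers."""
    return x // (y * z) == (x // y) // z

def harmonic_floor_sum(n):
    """Compute sum_{i=1}^{n} floor(n / i) directly in O(n)."""
    return sum(n // i for i in range(1, n + 1))

# Exhaustive check of the integer division lemma on a small range.
lemma_ok = all(nested_floor_holds(x, y, z)
               for x in range(1, 60)
               for y in range(1, 12)
               for z in range(1, 12))
print("integer division lemma holds on the tested range:", lemma_ok)

# Compare the floor sum with n * ln(n) for a few sizes.
for n in (10**3, 10**5):
    exact = harmonic_floor_sum(n)
    print(n, exact, exact / (n * math.log(n)))
```

The printed ratio should move closer to 1 as $n$ grows, which is all the approximation claims.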
Multiplicative Functions
A multiplicative function is an arithmetic function $f(n)$ of a positive integer $n$ with the property that $f(1) = 1$ and $f(a)f(b)=f(ab)$ whenever $a$ and $b$ are coprime.
The two functions that we will use a lot later, namely the Möbius function $\mu$ and Euler's totient function $\varphi$, are both multiplicative, hence they can be precomputed in O(n) time using a linear sieve. I will not prove the multiplicative properties of the two functions here (the proofs can easily be found on Wikipedia). The definitions of the two functions, however, are essential for understanding the subsequent proofs.
- $\mu(n)$: for square-free $n$, the parity of the number of prime factors of $n$ ($-1$ for odd, $+1$ for even); $0$ if $n$ is not square-free
- $\varphi(n)$: the number of positive integers not bigger than $n$ that are coprime to $n$
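The O(n) precomputation mentioned above can be sketched as follows. This is one common way to write a linear sieve, assuming 1-based tables; the name `linear_sieve` and the return layout are choices made for this example.

```python
def linear_sieve(n):
    """Return (primes, mu, phi) with mu[k] and phi[k] valid for 1 <= k <= n."""
    mu = [0] * (n + 1)
    phi = [0] * (n + 1)
    is_composite = [False] * (n + 1)
    primes = []
    if n >= 1:
        mu[1] = phi[1] = 1
    for i in range(2, n + 1):
        if not is_composite[i]:
            primes.append(i)
            mu[i] = -1       # a prime has exactly one prime factor
            phi[i] = i - 1   # 1 .. i-1 are all coprime to a prime i
        for p in primes:
            if i * p > n:
                break
            is_composite[i * p] = True
            if i % p == 0:
                # p already divides i, so i * p is divisible by p^2
                mu[i * p] = 0
                phi[i * p] = phi[i] * p
                break
            # p and i are coprime, so multiplicativity applies
            mu[i * p] = -mu[i]
            phi[i * p] = phi[i] * (p - 1)
    return primes, mu, phi

primes, mu, phi = linear_sieve(30)
print(mu[1:11])
print(phi[1:11])
```

Each composite number is marked exactly once, by its smallest prime factor, which is where the linear running time comes from.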
Dirichlet Convolution
If f,g are two arithmetic functions mapping from the positive integers to the complex numbers, the Dirichlet convolution f ∗ g is a new arithmetic function h defined by:
$$h(n)=\sum_{d\mid n}f(d)\,g\!\left(\frac{n}{d}\right)=\sum_{ab=n}f(a)\,g(b)$$
which can be more simply expressed as
$$h=f*g$$
This operation is commutative, associative, and distributive over addition.
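To make the definition concrete, here is a small sketch that computes $f*g$ for all arguments up to $n$ at once. The list-based representation (index 0 unused) and the sample functions `ones` and `ident` are assumptions of this example, not part of the definition.

```python
def dirichlet_convolution(f, g, n):
    """Return h with h[k] = sum over d | k of f[d] * g[k // d], for 1 <= k <= n.

    f and g are lists indexed from 1; index 0 is unused.
    """
    h = [0] * (n + 1)
    for d in range(1, n + 1):
        # d contributes to every multiple k = d * q with k <= n
        for q in range(1, n // d + 1):
            h[d * q] += f[d] * g[q]
    return h

n = 12
ones = [0] + [1] * n          # the function k -> 1
ident = list(range(n + 1))    # the function k -> k

num_divisors = dirichlet_convolution(ones, ones, n)    # number of divisors of k
sum_divisors = dirichlet_convolution(ident, ones, n)   # sum of divisors of k
print(num_divisors[1:])
print(sum_divisors[1:])
```

The double loop performs $\sum_{d=1}^{n}\lfloor \frac{n}{d}\rfloor$ steps, so by the Harmonic Lemma it runs in roughly $n\ln(n)$ time.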
Before we continue, let me introduce a few more important functions:
$$\varepsilon(x)= \begin{cases} 1 & x=1\\ 0 & \text{otherwise} \end{cases}$$
This is known as the unit function. A very important property of $\varepsilon$ is that
$$f*\varepsilon=f$$
for any arithmetic function $f$. (From the expression for the Dirichlet convolution we can see that only the term $d=n$, namely $f(n)$, contributes to the sum, since that is the only case in which $\varepsilon(\frac{n}{d})=\varepsilon(1)\ne 0$.)
Another function is the identity function $id$: as the name suggests, it always returns its argument, that is, $id(x)=x$ for all $x$ in the domain. We will also write $1$ for the constant function with $1(x)=1$ for all $x$.
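The property $f*\varepsilon=f$ is easy to confirm numerically. The sketch below assumes functions are stored as 1-based lists; `dirichlet_convolution`, `eps`, `ident` and `squares` are names chosen for this illustration.

```python
def dirichlet_convolution(f, g, n):
    """h[k] = sum over d | k of f[d] * g[k // d], for 1 <= k <= n (index 0 unused)."""
    h = [0] * (n + 1)
    for d in range(1, n + 1):
        for q in range(1, n // d + 1):
            h[d * q] += f[d] * g[q]
    return h

n = 20
eps = [0] * (n + 1)
eps[1] = 1                                # unit function: 1 at x = 1, 0 elsewhere
ident = list(range(n + 1))                # id(x) = x
squares = [k * k for k in range(n + 1)]   # an arbitrary extra test function

# Convolving with eps should leave both functions unchanged.
ident_unchanged = dirichlet_convolution(ident, eps, n)[1:] == ident[1:]
squares_unchanged = dirichlet_convolution(squares, eps, n)[1:] == squares[1:]
print(ident_unchanged, squares_unchanged)
```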
Möbius function $\mu$
Let us first review the definition of $\mu$:
- $\mu(n)$: for square-free $n$, the parity of the number of prime factors of $n$ ($-1$ for odd, $+1$ for even); $0$ if $n$ is not square-free
Specifically, we have
$$\mu(x)= \begin{cases} 1 & x=1\\ 0 & x \text{ is divisible by the square of some prime}\\ (-1)^k & x \text{ is a product of } k \text{ distinct primes} \end{cases}$$
One of the most interesting (and important) properties of $\mu$ is that
$$\sum_{d\mid n}\mu(d)= \begin{cases} 1 & n=1\\ 0 & \text{otherwise} \end{cases}$$
To prove this, let $n=\prod p_i^{q_i}$ and consider the contribution of each divisor of $n$ to the sum $\sum_{d\mid n}\mu(d)$. Only those divisors with at most one copy of each of $n$'s prime factors contribute, and each of them contributes $(-1)^{\text{number of prime factors}}$ to the sum. Suppose $n$ has $k$ distinct prime factors. For each possible contribution $(-1)^c$, there are $\binom{k}{c}$ different divisors of $n$ that contribute this amount to the final sum.
Therefore,
$$\begin{aligned} \sum_{d\mid n}\mu(d)&=\sum_{c=0}^{k} \binom{k}{c}(-1)^c\\ &=\sum_{c=0}^{k} \binom{k}{c}(-1)^c1^{k-c}\\ &=(-1+1)^k\\ &= \begin{cases} 1 & k=0\text{, of which the only case is } n=1\\ 0 & \text{otherwise} \end{cases} \end{aligned}$$
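The identity just proved can also be checked by brute force. In this sketch `mobius` evaluates $\mu$ straight from its definition by trial division (not the sieve), and `divisor_sum_of_mu` is a helper name invented for the example.

```python
def mobius(n):
    """Möbius function computed from the definition, by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0       # p^2 divides the original argument
            result = -result   # one more distinct prime factor
        p += 1
    if n > 1:
        result = -result       # a single prime factor is left over
    return result

def divisor_sum_of_mu(n):
    """Sum of mu(d) over all positive divisors d of n."""
    return sum(mobius(d) for d in range(1, n + 1) if n % d == 0)

print([divisor_sum_of_mu(n) for n in range(1, 13)])
```

For example, $n=12$ has the square-free divisors 1, 2, 3 and 6, contributing $1-1-1+1=0$, while 4 and 12 contribute nothing.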
Recalling the definition of $\varepsilon$, we thus have
$$\varepsilon=\mu*1$$
This is the basis for Möbius inversion. Now we can convert the summation of certain tricky arithmetic functions into a summation involving the Möbius function. I shall demonstrate with one example: suppose we want to count the ordered pairs $(i,j)$ of coprime positive integers with $1 \le i,j \le n$. In mathematical form, this is equivalent to computing
$$\sum_{i=1}^n \sum_{j=1}^n[\gcd(i,j)=1]$$
where $\gcd$ denotes the greatest common divisor.
Notice that we have used the boolean value of a statement here (1 if it is true, 0 if it is false), which is equivalent to applying the unit function $\varepsilon$ to $\gcd(i,j)$.
Thus, we have
$$\begin{aligned} &\sum_{i=1}^n \sum_{j=1}^n[\gcd(i,j)=1]\\ =&\sum_{i=1}^n \sum_{j=1}^n \varepsilon(\gcd(i,j))\\ =&\sum_{i=1}^n \sum_{j=1}^n \sum_{d\mid \gcd(i,j)}\mu(d) \end{aligned}$$
Now we consider each $d$ first, before the corresponding $i$ and $j$. For each $d$, the condition $d \mid \gcd(i,j)$ means that $i$ and $j$ must be of the form $i=x\cdot d$, $j=y\cdot d$, so we can enumerate $x$ and $y$ in place of $i$ and $j$.
Thus,
$$\begin{aligned} &\sum_{i=1}^n \sum_{j=1}^n \sum_{d\mid \gcd(i,j)}\mu(d)\\ =&\sum_{d=1}^n \mu(d)\sum_{x=1}^{\lfloor \frac{n}{d}\rfloor} \sum_{y=1}^{\lfloor \frac{n}{d}\rfloor}1\\ =&\sum_{d=1}^n \mu(d){\left\lfloor \frac{n}{d}\right\rfloor}^2 \end{aligned}$$
We can precompute the prefix sums of $\mu$ in O(n) time using a linear sieve, since $\mu$ is a multiplicative function. Then all $d$ sharing the same value of $\lfloor \frac{n}{d}\rfloor$ can be handled together; since $\lfloor \frac{n}{d}\rfloor$ takes only $O(\sqrt{n})$ distinct values, this improves the time complexity of each query to $O(\sqrt{n})$.
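The whole procedure can be sketched as below: a linear sieve builds prefix sums of $\mu$, and each query walks over blocks of $d$ with equal $\lfloor \frac{n}{d}\rfloor$. The function names are invented for this example, and the brute-force counter is included only as a cross-check.

```python
from math import gcd

def mobius_prefix_sums(limit):
    """prefix[k] = mu(1) + ... + mu(k) for 0 <= k <= limit, via a linear sieve."""
    mu = [0] * (limit + 1)
    if limit >= 1:
        mu[1] = 1
    is_composite = [False] * (limit + 1)
    primes = []
    for i in range(2, limit + 1):
        if not is_composite[i]:
            primes.append(i)
            mu[i] = -1
        for p in primes:
            if i * p > limit:
                break
            is_composite[i * p] = True
            if i % p == 0:
                break              # i * p has a squared factor, mu stays 0
            mu[i * p] = -mu[i]
    prefix = [0] * (limit + 1)
    for k in range(1, limit + 1):
        prefix[k] = prefix[k - 1] + mu[k]
    return prefix

def coprime_pairs(n, prefix):
    """Count ordered pairs (i, j), 1 <= i, j <= n, with gcd(i, j) = 1.

    Requires len(prefix) > n. Runs in O(sqrt(n)) per call.
    """
    total = 0
    d = 1
    while d <= n:
        q = n // d
        last = n // q              # largest d' with n // d' == q
        total += (prefix[last] - prefix[d - 1]) * q * q
        d = last + 1
    return total

def coprime_pairs_bruteforce(n):
    """Quadratic reference implementation, for checking small cases only."""
    return sum(1 for i in range(1, n + 1)
                 for j in range(1, n + 1) if gcd(i, j) == 1)

prefix = mobius_prefix_sums(1000)
print(coprime_pairs(10, prefix), coprime_pairs_bruteforce(10))
```

For $n=10$ the formula gives $100-25-9-4+1-1+1=63$, which matches the brute-force count.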
Euler’s totient function $\varphi$
Recall the definition of $\varphi$:
- $\varphi(n)$: the number of positive integers not bigger than $n$ that are coprime to $n$
Similar to $\mu$, we have
$$\varphi*1=id$$
To prove this, for any positive integer $n$, consider the $n$ fractions $\frac{i}{n}$ with $i$ ranging from 1 to $n$. Each such fraction has a unique fully simplified form $\frac{p}{d}$, where $\gcd(d,p)=1$, $1 \le p \le d$, and $d$ is a divisor of $n$. For every divisor $d$ of $n$, there are exactly $\varphi(d)$ such fractions. Hence, if we sum up the number of such fractions over every positive divisor of $n$, we get back the original $n$ fractions.
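As a small worked instance of the fraction argument (the choice $n=6$ is only for illustration), the six fractions reduce and group by denominator as follows:

```latex
\begin{aligned}
&\tfrac{1}{6},\ \tfrac{2}{6},\ \tfrac{3}{6},\ \tfrac{4}{6},\ \tfrac{5}{6},\ \tfrac{6}{6}
 \;\longrightarrow\;
 \tfrac{1}{6},\ \tfrac{1}{3},\ \tfrac{1}{2},\ \tfrac{2}{3},\ \tfrac{5}{6},\ \tfrac{1}{1}\\
&d=1:\ \tfrac{1}{1} \quad (\varphi(1)=1)\\
&d=2:\ \tfrac{1}{2} \quad (\varphi(2)=1)\\
&d=3:\ \tfrac{1}{3},\ \tfrac{2}{3} \quad (\varphi(3)=2)\\
&d=6:\ \tfrac{1}{6},\ \tfrac{5}{6} \quad (\varphi(6)=2)\\
&\varphi(1)+\varphi(2)+\varphi(3)+\varphi(6)=1+1+2+2=6
\end{aligned}
```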
Thus,
$$\sum _{d\mid n}\varphi(d)=n,$$
hence $\varphi*1=id$.
This is more useful than Möbius inversion in certain scenarios, for example, when we want to compute the sum of the greatest common divisors of all distinct pairs of integers from 1 to $n$.
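One way to read the gcd example: since $\gcd(i,j)=\sum_{d\mid \gcd(i,j)}\varphi(d)$, the same regrouping as in the coprime-pair count gives $\sum_{i=1}^n\sum_{j=1}^n\gcd(i,j)=\sum_{d=1}^n\varphi(d)\lfloor \frac{n}{d}\rfloor^2$. The sketch below assumes ordered pairs $(i,j)$, including $i=j$; the sum over unordered pairs with $i<j$ then follows by subtracting the diagonal $\frac{n(n+1)}{2}$ and halving. Function names are invented for this example.

```python
from math import gcd

def totients(n):
    """phi[k] for 0 <= k <= n, using the sieve update phi[m] -= phi[m] // p."""
    phi = list(range(n + 1))
    for p in range(2, n + 1):
        if phi[p] == p:                  # still untouched, so p is prime
            for m in range(p, n + 1, p):
                phi[m] -= phi[m] // p
    return phi

def gcd_sum(n):
    """Sum of gcd(i, j) over ordered pairs 1 <= i, j <= n, using phi * 1 = id."""
    phi = totients(n)
    return sum(phi[d] * (n // d) ** 2 for d in range(1, n + 1))

def gcd_sum_bruteforce(n):
    """Quadratic reference implementation, for checking small cases only."""
    return sum(gcd(i, j) for i in range(1, n + 1) for j in range(1, n + 1))

print(gcd_sum(3), gcd_sum_bruteforce(3))
```

This version is linear per query after the sieve; grouping equal values of $\lfloor \frac{n}{d}\rfloor$ over prefix sums of $\varphi$ would work the same way as for $\mu$.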
Notice that if we take the Dirichlet convolution of both sides of $\varphi*1=id$ with $\mu$, we have
$$\begin{aligned} \varphi *1*\mu&=id*\mu \\ \varphi*\varepsilon&=id*\mu\\ \varphi&=id*\mu \end{aligned}$$
This facilitates the conversion between the two.
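Written out, $\varphi=id*\mu$ says $\varphi(n)=\sum_{d\mid n}\mu(d)\frac{n}{d}$. A brute-force sketch that compares this with the counting definition of $\varphi$ is given below; all three function names are made up for the illustration.

```python
from math import gcd

def mobius(n):
    """Möbius function computed from the definition, by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0       # squared prime factor
            result = -result
        p += 1
    if n > 1:
        result = -result       # remaining prime factor
    return result

def phi_by_definition(n):
    """Count the integers in 1..n that are coprime to n."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def phi_via_mobius(n):
    """Evaluate (id * mu)(n) = sum over d | n of mu(d) * (n / d)."""
    return sum(mobius(d) * (n // d) for d in range(1, n + 1) if n % d == 0)

print([phi_via_mobius(n) for n in range(1, 13)])
print([phi_by_definition(n) for n in range(1, 13)])
```

For $n=12$ the divisor sum is $12-6-4+2=4$, in agreement with $\varphi(12)=4$.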