which are referred to as "failure" and "success".
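The Bernoulli distribution has a single parameter $\mu \in [0, 1]$, which defines the probability of observing a success, $x = 1$, so that $p(x = 1 \mid \mu) = \mu$ and $p(x = 0 \mid \mu) = 1 - \mu$. The distribution can therefore be written as
$$\mathrm{Bern}(x \mid \mu) = \mu^{x} (1 - \mu)^{1 - x}$$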
and we will sometimes use the equivalent notation
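Now suppose we have a data set $\mathcal{D} = \{x_1, \ldots, x_N\}$ of observed values of $x$. We can construct the likelihood function, which is a function of the parameter $\mu$, on the assumption that the observations are drawn independently from $p(x \mid \mu)$, so that
$$p(\mathcal{D} \mid \mu) = \prod_{n=1}^{N} p(x_n \mid \mu) = \prod_{n=1}^{N} \mu^{x_n} (1 - \mu)^{1 - x_n}.$$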
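In a frequentist setting, we can estimate a value for the parameter $\mu$ by maximizing the likelihood function, or equivalently by maximizing the logarithm of the likelihood. In the case of the Bernoulli distribution, the log likelihood function for the $N$ observations $x_1, \ldots, x_N$ in the data set $\mathcal{D}$ is given by
$$\ln p(\mathcal{D} \mid \mu) = \sum_{n=1}^{N} \left\{ x_n \ln \mu + (1 - x_n) \ln (1 - \mu) \right\}.$$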
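Setting the derivative of the log likelihood with respect to $\mu$ equal to zero, we obtain the maximum likelihood estimator
$$\mu_{\mathrm{ML}} = \frac{1}{N} \sum_{n=1}^{N} x_n,$$
which is the sample mean of the observations, that is, the fraction of observations for which $x = 1$.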
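
As a numerical illustration, the following is a minimal Python sketch of the Bernoulli likelihood, its logarithm, and the maximum likelihood estimate. The function names, the variable name `mu` for the success probability, and the small data set are assumptions made for this example and do not come from the text. The sketch computes the closed-form estimate (the fraction of successes) and compares it with a brute-force search over a grid of candidate parameter values.

```python
import math


def bernoulli_pmf(x, mu):
    """Probability of outcome x in {0, 1} when the success probability is mu."""
    return mu ** x * (1.0 - mu) ** (1 - x)


def likelihood(data, mu):
    """Product of the per-observation probabilities (independent draws)."""
    result = 1.0
    for x in data:
        result *= bernoulli_pmf(x, mu)
    return result


def log_likelihood(data, mu):
    """Sum of x*ln(mu) + (1 - x)*ln(1 - mu); requires 0 < mu < 1."""
    return sum(x * math.log(mu) + (1 - x) * math.log(1.0 - mu) for x in data)


def maximum_likelihood_estimate(data):
    """Closed-form maximizer of the log likelihood: the sample mean."""
    return sum(data) / len(data)


# Hypothetical data set: 3 successes in 8 trials.
data = [1, 0, 0, 1, 0, 1, 0, 0]

mu_ml = maximum_likelihood_estimate(data)

# Brute-force check: evaluate the log likelihood on a grid strictly
# inside (0, 1) and keep the candidate with the largest value.
grid = [i / 1000 for i in range(1, 1000)]
mu_grid = max(grid, key=lambda mu: log_likelihood(data, mu))

print(mu_ml, mu_grid)
```

For this data set both the closed-form estimate and the grid search give 0.375. Working with the log likelihood rather than the raw product is the usual choice in practice, because a product of many probabilities underflows quickly as the number of observations grows.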