"Exposure" and offset
Poisson regression may also be appropriate for rate data, where the rate is a count of events divided by some measure of exposure for the unit of observation. For example, biologists may count the number of tree species in a forest: events would be tree observations, exposure would be unit area, and rate would be the number of species per unit area. Demographers may model death rates in geographic areas as the count of deaths divided by person-years. More generally, event rates can be calculated as events per unit time, which allows the observation window to vary for each unit. In these examples, exposure is respectively unit area, person-years and unit time. In Poisson regression this is handled as an offset: the logarithm of the exposure variable enters on the right-hand side of the equation, with its coefficient constrained to 1.
$$\log(\operatorname{E}(Y \mid x)) = \log(\text{exposure}) + \theta' x$$
which implies
$$\log(\operatorname{E}(Y \mid x)) - \log(\text{exposure}) = \log\left(\frac{\operatorname{E}(Y \mid x)}{\text{exposure}}\right) = \theta' x$$
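Exponentiating both sides makes the role of the offset explicit: the expected count is the exposure multiplied by a rate that depends only on the covariates. The numbers in the illustration that follows are assumed for the sake of the example and do not come from any data set.

$$\operatorname{E}(Y \mid x) = \text{exposure} \cdot e^{\theta' x}$$

Suppose, for instance, that $\theta' x = \log(0.02)$ for some unit, so that the modelled rate is 0.02 deaths per person-year. With 500 person-years of exposure the expected count is

$$\operatorname{E}(Y \mid x) = 500 \times 0.02 = 10,$$

and doubling the exposure to 1000 person-years doubles the expected count to 20 while leaving the rate unchanged.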
In R, an offset can be specified for a GLM with the offset() function:
glm(y ~ offset(log(exposure)) + x, family = poisson(link = "log"))
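The call above can be tried on simulated data. The following is a minimal sketch rather than a reference implementation: the sample size, coefficient values, and variable names (`dat`, `fit_offset`, `fit_arg`, `fit_free`) are assumptions made for illustration. It fits the offset model in two equivalent ways and, for comparison, a model in which the coefficient of log(exposure) is estimated rather than fixed at 1.

```r
# Simulated data: all names and values here are illustrative assumptions.
set.seed(42)
n <- 2000
x <- rnorm(n)
exposure <- runif(n, min = 0.5, max = 5)  # e.g. person-years observed per unit
b0 <- -1                                  # assumed true intercept
b1 <- 0.5                                 # assumed true slope

# Counts are Poisson with mean exposure * exp(b0 + b1 * x).
y <- rpois(n, lambda = exposure * exp(b0 + b1 * x))
dat <- data.frame(y = y, x = x, exposure = exposure)

# Offset as a formula term: the coefficient of log(exposure) is fixed at 1.
fit_offset <- glm(y ~ x + offset(log(exposure)),
                  family = poisson(link = "log"), data = dat)

# The same model, with the offset passed through glm()'s offset argument.
fit_arg <- glm(y ~ x, offset = log(exposure),
               family = poisson(link = "log"), data = dat)

# For comparison: let the coefficient of log(exposure) be estimated freely.
fit_free <- glm(y ~ x + log(exposure),
                family = poisson(link = "log"), data = dat)

coef(fit_offset)  # intercept and slope only; no coefficient for the offset
coef(fit_free)    # includes an estimated coefficient for log(exposure)

# Predicted rate per unit of exposure at x = 0 and x = 1.
predict(fit_offset, newdata = data.frame(x = c(0, 1), exposure = 1),
        type = "response")
```

If the rate model is appropriate, the freely estimated coefficient of log(exposure) in `fit_free` should be close to 1, which gives an informal check on the offset assumption. Predictions here use the formula version because an offset term in the formula is carried through to `predict()` with new data.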