While other answers already fulfilled the question (it's a 3 years old question after all), I'm just gonna add some info, and probably fixed a bit of misunderstanding.
Em, while originally meant as the term for a single 'M' character's width in typography, in digital medium it was shifted to a unit relative to the point size of the typeface (font-size or textSize), in other words it's uses the height of the text, not the width of a single 'M'.
In Android, that means when you specify the ems of a TextView, it uses the said TextView's textSize as the base, excluding the added padding for accents/diacritics. When you set a 16sp TextView's ems to 4, it means its width will be 64sp wide, thus explained @stefan 's comment about why a 10 ems wide EditText is able to fit 17 'M'.