Problem description
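While processing strings scraped from web pages, I found them littered with garbled sequences such as **&#712;**, the undecoded HTML character reference for ˈ. For example: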
"Mother Mary Teresa Bojaxhiu[6] (born Anjezë Gonxhe Bojaxhiu, Albanian: [aˈɲɛzə ˈɡɔndʒɛ bɔjaˈdʒiu]; 26 August 1910 – 5 September 1997), honoured in the Catholic Church as Saint Teresa of Calcutta,[7] was an Albanian-Indian[4] Roman Catholic nun and missionary."
Cleaning target
"Mother Mary Teresa Bojaxhiu[6] (born Anjezë Gonxhe Bojaxhiu, Albanian: [aˈɲɛzə ˈɡɔndʒɛ bɔjaˈdʒiu]; 26 August 1910 – 5 September 1997), honoured in the Catholic Church as Saint Teresa of Calcutta,[7] was an Albanian-Indian[4] Roman Catholic nun and missionary."