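A crawler I wrote earlier ended up storing some duplicate rows, for various reasons, and they needed to be deleted. It turns out MySQL has no built-in deduplication command, so you have to write the query yourself.
After reading through a number of blog posts, I found that it can be written like this.
Delete the duplicate rows in people, keeping only the row with the smallest id in each group of duplicates.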
delete from people
where peopleId in (select peopleId from people group by peopleId having count(peopleId) > 1)
and rowid not in (select min(rowid) from people group by peopleId having count(peopleId) > 1)
However, when I ran this query, adapted to my own news table, MySQL rejected it. The error message was:
You can't specify target table 'news' for update in FROM clause
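After looking it up, I found the cause: MySQL does not let a DELETE select directly from the table it is deleting from inside a subquery. The subquery result has to be wrapped in an intermediate (derived) table with an alias and selected from again.
The revised query: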
delete from news
where newsurl in (select newsurl from (select newsurl from news group by newsurl having count(newsurl) > 1) a)
and newsid not in (select newsid from (select min(newsid) as newsid from news group by newsurl having count(newsurl) > 1) b)
This ran successfully.
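To sanity-check the logic on a small example, here is a minimal sketch in Python. It runs the derived-table query against an in-memory SQLite database, which is only a stand-in for MySQL so that the example is self-contained (SQLite does not raise the MySQL error in the first place). The sample URLs and the helper name dedupe_demo are made up for illustration; the table and column names follow the query in this post.

```python
import sqlite3

# Hypothetical sample data: three URLs, two of them stored more than once.
SAMPLE_ROWS = [
    (1, "http://example.com/a"),
    (2, "http://example.com/b"),
    (3, "http://example.com/a"),
    (4, "http://example.com/c"),
    (5, "http://example.com/b"),
    (6, "http://example.com/a"),
]

# The derived-table form of the delete: each subquery on news is wrapped
# in an aliased intermediate table (a, b) before being used in the WHERE.
DEDUPE_SQL = """
DELETE FROM news
WHERE newsurl IN (
    SELECT newsurl FROM (
        SELECT newsurl FROM news
        GROUP BY newsurl HAVING COUNT(newsurl) > 1
    ) a
)
AND newsid NOT IN (
    SELECT newsid FROM (
        SELECT MIN(newsid) AS newsid FROM news
        GROUP BY newsurl HAVING COUNT(newsurl) > 1
    ) b
)
"""


def dedupe_demo():
    """Load the sample rows, run the delete, and return the surviving rows."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not MySQL
    conn.execute("CREATE TABLE news (newsid INTEGER PRIMARY KEY, newsurl TEXT)")
    conn.executemany("INSERT INTO news (newsid, newsurl) VALUES (?, ?)", SAMPLE_ROWS)
    conn.execute(DEDUPE_SQL)
    rows = conn.execute("SELECT newsid, newsurl FROM news ORDER BY newsid").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in dedupe_demo():
        print(row)
```

Each URL should survive exactly once, with the smallest newsid in its group; rows whose URL was never duplicated are left untouched. Before running a delete like this on real data, it is worth running the two inner selects on their own to see which rows would be affected.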