mysql常用单词集,确定最常用的单词集php mysql

I'm trying to figure out how to go about determining the most used words on a mysql dataset.

Not sure how to go about this or if there's a simpler approach. Read a couple posts where some suggests an algorithm.

Example:

From 24,500 records, find out the top 10 used words.

解决方案

Right, this runs like a dog and is limited to working with a single delimiter, but hopefully will give you an idea.

SELECT aWord, COUNT(*) AS WordOccuranceCount

FROM (SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(concat(SomeColumn, ' '), ' ', aCnt), ' ', -1) AS aWord

FROM SomeTable

CROSS JOIN (

SELECT a.i+b.i*10+c.i*100 + 1 AS aCnt

FROM integers a, integers b, integers c) Sub1

WHERE (LENGTH(SomeColumn) + 1 - LENGTH(REPLACE(SomeColumn, ' ', ''))) >= aCnt) Sub2

WHERE Sub2.aWord != ''

GROUP BY aWord

ORDER BY WordOccuranceCount DESC

LIMIT 10

This relies on having a table called integers with a single column called i with 10 rows with the values 0 to 9. It copes with up to ~1000 words but can easily be altered to cope with more (but will slow down even more).

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值