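This post implements a banned-word (sensitive-word) matching feature.
Name checks, bullet-comment (danmaku) checks and similar features all rely on this kind of matching: the input is compared against a banned-word list, and if even one word matches, the input is rejected. So how is it actually implemented?
The first step is loading the banned-word list. In a real project the list would normally live in a database; for convenience, here it is kept in a local resource file.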
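As a rough illustration of the resource-file format: the configuration code in this post reads a property named spring.sensitive and splits it on "/", so an entry such as spring.sensitive=Hello/Good/Goods is presumably what the file contains. The sketch below shows that parsing step in isolation. The class WordListParser, the trimming and the blank-entry filtering are assumptions added for the example, not part of the original code.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper (not in the original post): turns the raw property
// value, e.g. "Hello/Good/Goods", into a set of words.
class WordListParser {

    static Set<String> parse(String raw) {
        if (raw == null) {
            return new LinkedHashSet<>();
        }
        return Arrays.stream(raw.split("/"))
                .map(String::trim)            // tolerate spaces around a word
                .filter(s -> !s.isEmpty())    // skip empty entries such as "a//b"
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        // Duplicates and blank entries are dropped.
        System.out.println(parse("Hello/Good/Goods/ /Good")); // [Hello, Good, Goods]
    }
}
```

A set is used so that a word repeated in the file is only loaded once.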
Next, the list is loaded automatically when the project starts:
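import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.annotation.PostConstruct; // jakarta.annotation.PostConstruct on Spring Boot 3+
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Configuration;

@Configuration
public class SensitiveWordsConfig {

    // Slash-separated word list read from the resource file
    @Value("${spring.sensitive}")
    String sensitive;

    // @Bean is meant for methods that return a bean, so this void
    // initializer uses @PostConstruct; it runs once after injection.
    @PostConstruct
    public void sensitiveWordsUtils() {
        Set<String> sensitiveMap = new HashSet<>(Arrays.asList(sensitive.split("/")));
        SensitiveWordsUtils.initKeyWord(sensitiveMap);
        // Print the resulting structure for inspection
        System.out.println(SensitiveWordsUtils.sensitiveWordsMap);
    }
}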
Finally, here is the loading method initKeyWord. By default, all English letters are converted to lowercase:
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import org.apache.commons.lang3.StringUtils; // assumed source of isNotBlank / isBlank

public class SensitiveWordsUtils {

    // Root of the nested map. Each key is either a Character (the next
    // letter of a word) or the String "isEnd".
    public static Map<Object, Object> sensitiveWordsMap = new HashMap<>();
    // Match type that stops at the shortest matching word
    public static int minMatchTYpe = 1;

    public static Map<Object, Object> initKeyWord(Set<String> keySets) {
        try {
            addSensitiveWordToHashMap(keySets);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return sensitiveWordsMap;
    }

    @SuppressWarnings("unchecked")
    public static void addSensitiveWordToHashMap(Set<String> keyWordSet) {
        // Build the structure only once
        if (!sensitiveWordsMap.isEmpty()) {
            return;
        }
        for (String key : keyWordSet) {
            if (StringUtils.isBlank(key)) {
                continue;
            }
            Map<Object, Object> resultMap = sensitiveWordsMap;
            for (int i = 0; i < key.length(); i++) {
                char keyChar = key.charAt(i);
                // Store ASCII letters in lowercase so matching ignores case
                if (keyChar >= 'A' && keyChar <= 'Z') {
                    keyChar = Character.toLowerCase(keyChar);
                }
                Map<Object, Object> wordMap = (Map<Object, Object>) resultMap.get(keyChar);
                if (wordMap == null) {
                    // First time this character appears at this position
                    wordMap = new HashMap<>();
                    wordMap.put("isEnd", "0");
                    resultMap.put(keyChar, wordMap);
                }
                resultMap = wordMap;
                // Mark the last character of a word
                if (i == key.length() - 1) {
                    resultMap.put("isEnd", "1");
                }
            }
        }
    }
}
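A quick note on the structure that gets built: it is a nested Map, described loosely as Map<String, Map>, in which each key is one character of a word and each value is the map for the characters that can follow it. Printing it makes this much clearer: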
{p={q={isEnd=0, o={s={isEnd=1}, isEnd=0}}, isEnd=0}, a={s={d={q={w={g={h={isEnd=1}, isEnd=0}, isEnd=0}, isEnd=1}, j={i={q={isEnd=1}, isEnd=0}, isEnd=0}, isEnd=0}, u={d={q={isEnd=0, o={isEnd=1}}, isEnd=0}, isEnd=0}, h={u={isEnd=0, o={z={x={d={q={isEnd=1}, isEnd=0}, isEnd=0}, isEnd=0}, isEnd=0}}, isEnd=0}, isEnd=0}, isEnd=0}, q={a={i={a={isEnd=0, m={g={isEnd=1}, isEnd=0}}, isEnd=0}, isEnd=0}, isEnd=0}, b={isEnd=0, m={c={isEnd=0, n={i={a={isEnd=1}, isEnd=0}, isEnd=0}}, isEnd=0}}, s={d={g={j={i={isEnd=0, o={e={isEnd=1}, isEnd=0}}, isEnd=0}, isEnd=0}, isEnd=0}, h={i={t={g={isEnd=1}, isEnd=1}, isEnd=0}, isEnd=0}, isEnd=0}, c={j={v={i={p={z={q={isEnd=1}, isEnd=0}, isEnd=0}, isEnd=0}, isEnd=0}, isEnd=0}, isEnd=0, o={d={isEnd=0, n={e={q={isEnd=1}, isEnd=0}, isEnd=0}}, isEnd=0}}, g={isEnd=0, o={isEnd=0, o={d={isEnd=1}, isEnd=0}}}, h={e={l={l={isEnd=0, o={isEnd=1}}, isEnd=0}, isEnd=0}, u={i={h={s={a={isEnd=1}, isEnd=0}, isEnd=0}, isEnd=0}, isEnd=0}, x={x={i={isEnd=0, o={f={w={q={isEnd=1}, isEnd=0}, isEnd=0}, isEnd=0}}, isEnd=0}, isEnd=0}, j={a={h={q={e={r={isEnd=1}, isEnd=0}, isEnd=0}, isEnd=0}, isEnd=0}, u={i={q={isEnd=1}, isEnd=0}, isEnd=0}, i={q={p={4={5={5={isEnd=1}, isEnd=0}, isEnd=0}, isEnd=0}, isEnd=0}, isEnd=0}, isEnd=0}, isEnd=0}, x={c={b={i={q={isEnd=1}, isEnd=0}, isEnd=0}, isEnd=0}, isEnd=0}, j={i={s={h={g={u={isEnd=1}, isEnd=0}, isEnd=0}, isEnd=0}, isEnd=0}, k={isEnd=0, o={w={isEnd=1}, isEnd=0}}, isEnd=0}, z={c={q={w={isEnd=1}, isEnd=0}, isEnd=0}, isEnd=0}, n={isEnd=0, m={z={h={a={isEnd=1}, isEnd=0}, isEnd=0}, isEnd=0}}}
To keep the demonstration simple, reduce the word list to just Hello, Good and Goods:
{g={isEnd=0, o={isEnd=0, o={d={s={isEnd=1}, isEnd=1}, isEnd=0}}}, h={e={l={l={isEnd=0, o={isEnd=1}}, isEnd=0}, isEnd=0}, isEnd=0}}
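Here map.get('g') = {isEnd=0, o}. isEnd=0 means no banned word ends at this character, so matching has to continue; isEnd=1 means a complete banned word ends here. Going one level deeper, map.get('g').get('o') = {isEnd=0, o}, and so on down to d, where isEnd=1 because Good is a complete word even though Goods continues from it. That should give a rough picture of the basic structure.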
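To make the lookups described above concrete, here is a minimal, self-contained sketch that builds the same nested-map shape for Hello, Good and Goods and then walks it one character at a time. The class TrieShapeDemo and its build and step helpers are hypothetical names introduced for illustration; they mirror the layout shown in the printed output rather than reproduce the post's utility class.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: same nested-map layout as the printed output.
class TrieShapeDemo {

    // Keys are either a Character (next letter) or the String "isEnd".
    static Map<Object, Object> build(String... words) {
        Map<Object, Object> root = new HashMap<>();
        for (String word : words) {
            if (word.isEmpty()) {
                continue;
            }
            Map<Object, Object> node = root;
            for (char raw : word.toCharArray()) {
                char c = Character.toLowerCase(raw);
                @SuppressWarnings("unchecked")
                Map<Object, Object> child = (Map<Object, Object>) node.get(c);
                if (child == null) {
                    child = new HashMap<>();
                    child.put("isEnd", "0");
                    node.put(c, child);
                }
                node = child;
            }
            node.put("isEnd", "1"); // a complete word ends at this node
        }
        return root;
    }

    // Follows one character; returns null when no word continues with it.
    @SuppressWarnings("unchecked")
    static Map<Object, Object> step(Map<Object, Object> node, char c) {
        return (Map<Object, Object>) node.get(c);
    }

    public static void main(String[] args) {
        Map<Object, Object> root = build("Hello", "Good", "Goods");
        Map<Object, Object> g = step(root, 'g');
        Map<Object, Object> good = step(step(step(g, 'o'), 'o'), 'd');
        System.out.println(g.get("isEnd"));        // 0: "g" alone is not a word
        System.out.println(good.get("isEnd"));     // 1: "good" is a complete word
        System.out.println(good.containsKey('s')); // true: "goods" continues from it
    }
}
```

Note that the node for d carries isEnd=1 and still has a child for s, which is how Good and Goods share one path.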
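With the matching rule roughly worked out, one more point: since the goal is to match, special characters, whitespace and the like have to be stripped from the input first, and only then does the matching run.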
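The effect of stripping symbols before matching can be sketched as follows. The example compares two patterns: a plain \W, which in Java is ASCII-only unless a Unicode flag is set, and a Unicode-aware character class. The class SymbolStripDemo and both method names are hypothetical and exist only for this comparison.

```java
// Illustrative sketch: what different "strip symbols" patterns remove.
class SymbolStripDemo {

    // ASCII-only: keeps just a-z, A-Z, 0-9 and underscore.
    static String stripAscii(String txt) {
        return txt.replaceAll("\\W", "");
    }

    // Unicode-aware: keeps letters and digits of any script, plus underscore.
    static String stripUnicode(String txt) {
        return txt.replaceAll("[^\\p{L}\\p{N}_]", "");
    }

    public static void main(String[] args) {
        // Symbols and spaces inserted to dodge the filter are removed.
        System.out.println(stripAscii("h-e l.l*o"));   // hello

        // "\u4f60\u597d" is the two-character Chinese greeting "ni hao".
        String mixed = "\u4f60\u597d, hello!";
        System.out.println(stripAscii(mixed));         // hello (Chinese text is lost)
        System.out.println(stripUnicode(mixed));       // the two Chinese characters, then hello
    }
}
```

For a word list that contains Chinese words, the ASCII-only pattern would delete the very text that needs checking, so the Unicode-aware form is the safer choice there.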
// Returns every banned word found in txt.
public static Set<String> getAllSensitiveWord(@NonNull String txt, int matchType) {
    Set<String> sensitiveWordList = new HashSet<>();
    txt = replaceAllSymbol(txt);
    if (StringUtils.isNotBlank(txt)) {
        for (int i = 0; i < txt.length(); i++) {
            // Length of the banned word starting at i, or 0 if there is none
            int length = CheckSensitiveWord(txt, i, matchType);
            if (length > 0) {
                sensitiveWordList.add(txt.substring(i, i + length));
                // Uncomment to jump past the match instead of also
                // reporting words that overlap with it:
                // i = i + length - 1;
            }
        }
    }
    return sensitiveWordList;
}

// Returns only the first banned word found in txt (empty set if none).
public static Set<String> getSensitiveWord(@NonNull String txt, int matchType) {
    Set<String> sensitiveWordList = new HashSet<>();
    txt = replaceAllSymbol(txt);
    if (StringUtils.isNotBlank(txt)) {
        for (int i = 0; i < txt.length(); i++) {
            int length = CheckSensitiveWord(txt, i, matchType);
            if (length > 0) {
                sensitiveWordList.add(txt.substring(i, i + length));
                // One hit is enough to reject the input
                break;
            }
        }
    }
    return sensitiveWordList;
}
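// Returns the length of the banned word that starts at beginIndex,
// or 0 if no banned word starts there.
public static int CheckSensitiveWord(String txt, int beginIndex, int matchType) {
    // Length of the last complete word matched so far
    int matchFlag = 0;
    Map<?, ?> nowMap = sensitiveWordsMap;
    for (int i = beginIndex; i < txt.length(); i++) {
        char word = txt.charAt(i);
        // Words are stored in lowercase, so lowercase the input as well
        if (word >= 'A' && word <= 'Z') {
            word = Character.toLowerCase(word);
        }
        nowMap = (Map<?, ?>) nowMap.get(word);
        if (nowMap == null) {
            // No banned word continues with this character
            break;
        }
        if ("1".equals(nowMap.get("isEnd"))) {
            // Record the length only at a word end, so characters that
            // merely start a longer word are never counted as a match.
            matchFlag = i - beginIndex + 1;
            // Shortest-match mode stops at the first complete word
            if (minMatchTYpe == matchType) {
                break;
            }
        }
    }
    return matchFlag;
}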
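// Strip special characters: keep only letters, digits and underscores.
// \p{L} and \p{N} cover every script; a bare \W is ASCII-only in Java
// and would also delete Chinese text.
public static String replaceAllSymbol(String txt) {
    return txt.replaceAll("[^\\p{L}\\p{N}_]", "");
}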
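The difference between the two match types can be sketched with a small, self-contained example. It uses a typed node class instead of the nested raw maps, and it records the match length only when a complete word ends, so a trailing partial match is never reported. The names MatchModeDemo, Node and matchLength, and the value 2 for the longest-match mode, are assumptions made for illustration; the post itself only defines 1 as the shortest-match type.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of shortest-match versus longest-match lookups.
class MatchModeDemo {
    static final int MIN_MATCH = 1; // stop at the first complete word
    static final int MAX_MATCH = 2; // assumed value: keep going for the longest word

    static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        boolean end; // true when a complete word ends at this node
    }

    static Node build(String... words) {
        Node root = new Node();
        for (String w : words) {
            Node n = root;
            for (char c : w.toCharArray()) {
                n = n.next.computeIfAbsent(Character.toLowerCase(c), k -> new Node());
            }
            if (n != root) {
                n.end = true;
            }
        }
        return root;
    }

    // Length of the banned word starting at begin, or 0 if none starts there.
    static int matchLength(Node root, String txt, int begin, int matchType) {
        int matched = 0;
        Node n = root;
        for (int i = begin; i < txt.length(); i++) {
            n = n.next.get(Character.toLowerCase(txt.charAt(i)));
            if (n == null) {
                break;
            }
            if (n.end) {
                matched = i - begin + 1; // remember the last complete word
                if (matchType == MIN_MATCH) {
                    break;
                }
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        Node root = build("Good", "Goods");
        System.out.println(matchLength(root, "goods", 0, MIN_MATCH)); // 4
        System.out.println(matchLength(root, "goods", 0, MAX_MATCH)); // 5
        System.out.println(matchLength(root, "goodx", 0, MAX_MATCH)); // 4
    }
}
```

Shortest match is enough when the only question is whether the input should be rejected; longest match is more useful when the matched text is going to be displayed or masked.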