1. Original segmentation result
The dictionary contains entries with special characters:
words.add("志向(心理学)");
words.add("芳香-L-氨基酸脱羧酶类");
words.add("atp柠檬酸(pro-S)裂合酶");
Test string: 志向(心理学)芳香-L-氨基酸脱羧酶类,atp柠檬酸(pro-S)裂合酶
Result:
[志向, 心理学, 芳香, l-, 氨基酸, 脱羧, 酶类, atp, 柠檬酸, pro-s, 裂合酶]
None of the three dictionary entries comes back as a single token.
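One plausible reading of this result is that the parentheses are classified as "useless" characters before dictionary matching starts, so an entry such as 志向(心理学) can never be matched as a whole. The following is only a minimal sketch of that idea, not the IK Analyzer source: the class name CharTypeSketch and the helper splitOnUseless are hypothetical, the constant names are borrowed from CharacterUtil, and the classification rules are an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT the IK Analyzer implementation.
class CharTypeSketch {
    // Constant names and values mirror those declared in CharacterUtil.
    static final int CHAR_USELESS = 0;
    static final int CHAR_ARABIC = 0x00000001;
    static final int CHAR_ENGLISH = 0x00000002;
    static final int CHAR_CHINESE = 0x00000004;
    static final int CHAR_OTHER_CJK = 0x00000008;

    // Assumed classification: digits, ASCII letters, CJK ideographs,
    // other CJK scripts; everything else (including parentheses) is useless.
    static int identifyCharType(char c) {
        if (c >= '0' && c <= '9') return CHAR_ARABIC;
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) return CHAR_ENGLISH;
        Character.UnicodeBlock block = Character.UnicodeBlock.of(c);
        if (block == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS) return CHAR_CHINESE;
        if (block == Character.UnicodeBlock.HIRAGANA
                || block == Character.UnicodeBlock.KATAKANA
                || block == Character.UnicodeBlock.HANGUL_SYLLABLES) return CHAR_OTHER_CJK;
        return CHAR_USELESS;
    }

    // Cuts the text into runs of usable characters, dropping useless ones.
    static List<String> splitOnUseless(String text) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (identifyCharType(c) == CHAR_USELESS) {
                if (current.length() > 0) {
                    runs.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) runs.add(current.toString());
        return runs;
    }
}
```

Under these assumptions, CharTypeSketch.splitOnUseless("志向(心理学)") returns the two runs 志向 and 心理学, which matches the first two tokens of the result above. The sketch is cruder than the real analyzer: it does not reproduce tokens such as l- and pro-s, where the hyphen is evidently kept.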
2. Modify org.wltea.analyzer.core.CharacterUtil in IKAnalyzer
/**
* IK 中文分词 版本 5.0
* IK Analyzer release 5.0
*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* 源代码由林良益(linliangyi2005@gmail.com)提供
* 版权声明 2012,乌龙茶工作室
* provided by Linliangyi and copyright 2012 by Oolong studio
*
* Character set recognition utility class
*/
package org.wltea.analyzer.core;
/**
*
* Character set recognition utility class
*/
class CharacterUtil {
    public static final int CHAR_USELESS = 0;
    public static final int CHAR_ARABIC = 0X00000001;
    public static final int CHAR_ENGLISH = 0X00000002;
    public static final int CHAR_CHINESE = 0X00000004;
    public static final int CHAR_OTHER_CJK = 0X00000008;
// Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS : 4E00-9FBF:CJK 统一