Extracting the Content for Multiple Keywords from a String

菜鸟非鱼

Last modified: 2024-03-18 16:08:14

Tags: java

First published: 2024-03-18 16:07:47

Article link: https://blog.csdn.net/yimengxianren/article/details/136812529

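This article shows how to programmatically locate keywords and their content in a string that contains keywords and corresponding values: collecting the index of every keyword occurrence, slicing out the value that follows each keyword, and converting the result to a Map while handling repeated keywords.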

I. Scenario

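Suppose a string contains keywords, each followed by its value, for example: A123C456B678E566

The known keywords are A, B, C, and D.

The task is to extract the content that belongs to each keyword. The rule is that the text between one keyword and the next keyword is the first keyword's value; for example, the text between A and C is the value of A. The order in which keywords appear, and how many times each appears, is not fixed. With the keyword list as given, E is not a keyword, so in this example it is simply part of B's value (678E566).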

II. Approach

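First match each keyword against the string to find the index of every occurrence. Once the occurrences are sorted by index, the text from one index up to the next index consists of a keyword followed by its content: the matched keyword is the key, and the rest is the value. The last keyword's value runs to the end of the string.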
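The approach can be sketched end to end in a few lines. This is a minimal illustration rather than the code used in the sections that follow: the class name `KeywordSliceSketch`, the method `extract`, and the "key=value" output format are assumptions made for the example, and it uses a `TreeMap` instead of a DTO list to keep the positions sorted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class KeywordSliceSketch {

    /** Returns "key=value" pairs in the order the keywords appear in content. */
    public static List<String> extract(String content, List<String> keys) {
        // start index of each keyword occurrence -> keyword, kept sorted by index
        // (if two keywords start at the same index, the later one overwrites the earlier)
        TreeMap<Integer, String> positions = new TreeMap<>();
        for (String key : keys) {
            if (key.isEmpty()) {
                continue; // an empty keyword would match at every position
            }
            for (int i = content.indexOf(key); i >= 0; i = content.indexOf(key, i + key.length())) {
                positions.put(i, key);
            }
        }
        List<String> pairs = new ArrayList<>();
        for (Map.Entry<Integer, String> entry : positions.entrySet()) {
            int begin = entry.getKey() + entry.getValue().length();
            Integer next = positions.higherKey(entry.getKey());
            int end = (next == null) ? content.length() : next;
            // begin > end can only happen when keywords overlap; use an empty value then
            String value = begin <= end ? content.substring(begin, end) : "";
            pairs.add(entry.getValue() + "=" + value);
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(extract("A123C456B678E566", List.of("A", "B", "C", "D")));
        // prints [A=123, C=456, B=678E566]
    }
}
```

D never occurs in the sample string, so it produces no entry, and E is not a keyword, so it stays inside B's value.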

III. Code example

1. Get the index of every keyword occurrence in the string

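import java.util.ArrayList;
import java.util.List;

/**
 * Get the index of each keyword in the string.
 *  Scans the string for every keyword and records each hit in a list.
 *  If the same keyword occurs several times, it is recorded once per occurrence.
 *
 * @param content the string to search
 * @param keyList the list of keywords
 * @return one MailDetailDto (key and index, value still null) per occurrence
 */
public static List<MailDetailDto> getIndexByContent(String content, List<String> keyList) {
    List<MailDetailDto> indexDtoList = new ArrayList<>();
    if (content == null || keyList == null) {
        return indexDtoList;
    }
    for (String key : keyList) {
        // An empty keyword would match at every position, so skip it
        if (key == null || key.isEmpty()) {
            continue;
        }
        // Use >= 0 so that a keyword at the very start of the string (index 0) is found
        int index = content.indexOf(key);
        while (index >= 0) {
            indexDtoList.add(new MailDetailDto(key, index, null));
            // Continue searching after the end of the current match
            index = content.indexOf(key, index + key.length());
        }
    }
    return indexDtoList;
}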

The MailDetailDto class

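import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
public class MailDetailDto {

    /**
     * The keyword.
     */
    private String key;

    /**
     * Index of the keyword in the string.
     */
    private int index;

    /**
     * The value that belongs to the keyword.
     */
    private String value;

}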

2. Slice out the value for each keyword

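// Assumption: StringUtils is org.apache.commons.lang3.StringUtils and log is an
// SLF4J logger (for example from Lombok's @Slf4j); the original post does not show either.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Parse the string content.
 *
 * @param mailDetailDtoList objects to fill in (each already holds key and index)
 * @param content           the target string
 * @return the same objects sorted by index, with value filled in
 */
public static List<MailDetailDto> buildMailDetailDto(List<MailDetailDto> mailDetailDtoList, String content) {
    if (StringUtils.isBlank(content)) {
        log.error("content is empty");
        return new ArrayList<>();
    }
    // Sort by position in the string
    List<MailDetailDto> collect = mailDetailDtoList.stream()
            .sorted(Comparator.comparingInt(MailDetailDto::getIndex))
            .collect(Collectors.toList());
    for (int i = 0; i < collect.size(); i++) {
        MailDetailDto indexDto = collect.get(i);
        // The value starts right after the keyword itself
        int indexBegin = indexDto.getIndex() + indexDto.getKey().length();

        // The value ends where the next keyword starts, or at the end of the string
        int indexEnd;
        if ((i + 1) == collect.size()) {
            indexEnd = content.length();
        } else {
            indexEnd = collect.get(i + 1).getIndex();
        }

        // Guard against one keyword containing another, e.g. the keyword "GPII STATUS"
        // contains the keyword "STATUS": indexBegin can then exceed indexEnd,
        // so the value is left empty instead of calling substring with a bad range.
        String value = "";
        if (indexBegin <= indexEnd) {
            value = content.substring(indexBegin, indexEnd);
        }
        // Remove &nbsp;
        value = value.replace("&nbsp;", "");
        // Remove leading and trailing whitespace (a blank value becomes "")
        indexDto.setValue(value.trim());
    }
    return collect;
}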
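With the guard above, an overlapping pair such as "GPII STATUS" and "STATUS" does not throw, but the longer keyword ends up with an empty value and the text is attributed to the shorter one. One possible refinement, which is not part of the original post, is to drop matches that start inside an earlier, longer match before slicing. The sketch below assumes a hypothetical `Match` class standing in for `MailDetailDto` so that it runs without Lombok; `OverlapFilterSketch` and `dropOverlaps` are likewise illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

public class OverlapFilterSketch {

    /** Hypothetical stand-in for MailDetailDto: a keyword and where it starts. */
    public static final class Match {
        public final String key;
        public final int index;

        public Match(String key, int index) {
            this.key = key;
            this.index = index;
        }

        @Override
        public String toString() {
            return key + "@" + index;
        }
    }

    /** Keeps the leftmost, longest match and drops any match that starts inside it. */
    public static List<Match> dropOverlaps(List<Match> matches) {
        List<Match> sorted = new ArrayList<>(matches);
        // Earlier start first; for the same start, the longer keyword first
        sorted.sort((a, b) -> a.index != b.index
                ? Integer.compare(a.index, b.index)
                : Integer.compare(b.key.length(), a.key.length()));
        List<Match> kept = new ArrayList<>();
        int coveredUntil = 0; // end (exclusive) of the last kept keyword
        for (Match m : sorted) {
            if (m.index >= coveredUntil) {
                kept.add(m);
                coveredUntil = m.index + m.key.length();
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // In "GPII STATUS ok", "GPII STATUS" starts at 0 and "STATUS" at 5
        List<Match> matches = List.of(new Match("STATUS", 5), new Match("GPII STATUS", 0));
        System.out.println(dropOverlaps(matches)); // prints [GPII STATUS@0]
    }
}
```

Whether the longer or the shorter keyword should win depends on the data; preferring the leftmost, longest match is only one reasonable choice.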

3. Convert the extracted values to a map for easy lookup; if a key is repeated, keep the first non-empty value

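// Convert the list to a map. For a repeated key, keep the earlier value if it is non-empty; otherwise take the later one.
// Note: Collectors.toMap throws a NullPointerException on a null value, so run this on the
// list returned by buildMailDetailDto, where every value has been set.
Map<String, String> map = mailDetailDtoList.stream().collect(Collectors.toMap(
        MailDetailDto::getKey,
        MailDetailDto::getValue,
        (String value1, String value2) -> StringUtils.isNotEmpty(value1) ? value1 : value2));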
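To see the merge rule in isolation, the following self-contained sketch applies the same idea to plain key/value pairs. It is an illustration under stated assumptions: `MergeDuplicatesSketch` and its `toMap` helper are hypothetical names, `Map.Entry` stands in for `MailDetailDto`, `String.isEmpty()` replaces `StringUtils.isNotEmpty`, and a `LinkedHashMap` is supplied so that keys keep their first-seen order (the snippet above does not specify a map type).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MergeDuplicatesSketch {

    /** Collapses (key, value) pairs; for a repeated key keeps the first non-empty value. */
    public static Map<String, String> toMap(List<Map.Entry<String, String>> pairs) {
        return pairs.stream().collect(Collectors.toMap(
                Map.Entry::getKey,
                Map.Entry::getValue,
                (String first, String second) -> first.isEmpty() ? second : first,
                LinkedHashMap::new)); // keep keys in first-seen order
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> pairs = List.of(
                Map.entry("A", ""), Map.entry("B", "678"),
                Map.entry("A", "123"), Map.entry("A", "999"));
        System.out.println(toMap(pairs)); // prints {A=123, B=678}
    }
}
```

The first A is empty, so the second occurrence (123) replaces it; the third occurrence (999) is then ignored because a non-empty value is already present.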