Finding the most frequent fixed-length substring in a string and its occurrence count (Java implementation)

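Today I came across this problem:

Find the two-character combination (bigram) that occurs most frequently in the text below.

(Example: in the string "1252336528952", the two-character combination "52" occurs 3 times, which is the highest frequency.)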

oneofthecentralresultsofairesearchinthe1970swasthattoachievegoodperformanceaisystemsmusthav
elargeamountsofknowledgeknowledgeispowertheslogangoeshumansclearlyusevastamountsofknowledge
andifaiistoachieveitslongtermgoalsaisystemsmustalsousevastamountssincehandcodinglargeamount
sofknowledgeintoasystemisslowtediousanderrorpronemachinelearningtechniqueshavebeendeveloped
toautomaticallyacquireknowledgeoftenintheformofifthenrulesproductionsunfortunatelythishasof
tenledtoautilityproblemminton1988bthelearninghascausedanoverallslowdowninthesystemforexampl
einmanysystemslearnedrulesareusedtoreducethenumberofbasicstepsthesystemtakesinordertosolvep
roblemsbypruningthesystemssearchspaceforinstancebutinordertodetermineateachstepwhichrulesar
eapplicablethesystemmustmatchthemagainstitscurrentsituationusingcurrenttechniquesthematcher
slowsdownasmoreandmorerulesareacquiredsoeachsteptakeslongerandlongerthisectcanoutweighthere
ductioninthenumberofstepstakensothatthenetresultisaslowdownthishasbeenobservedinseveralrece
ntsystemsminton1988aetzioni1990tambeetal1990cohen1990ofcoursetheproblemofslowdownfromincrea
singmatchcostisnotrestrictedtosystemsinwhichthepurposeofrulesistoreducethenumberofproblemso
lvingstepsasystemacquiringnewrulesforanypurposecanslowdowniftherulessignicantlyincreasethem
atchcostandintuitivelyoneexpectsthatthemoreproductionsthereareinasystemthehigherthetotalmat
chcostwillbethethesisofthisresearchisthatwecansolvethisprobleminabroadclassofsystemsbyimpro
vingthematchalgorithmtheyuseinessenceouraimistoenablethescalingupofthenumberofrulesinproduc
tionsystemsweadvancethestateoftheartinproductionmatchalgorithmsdevelopinganimprovedmatchalg
orithmwhoseperformancescaleswellonasignicantlybroaderclassofsystemsthanexistingalgorithmsfu
rthermorewedemonstratethatbyusingthisimprovedmatchalgorithmwecanreduceoravoidtheutilityprob
leminalargeclassofmachinelearningsystems
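
As a quick sanity check of the example in the problem statement, here is a minimal sketch that counts how often "52" occurs in "1252336528952". The class and method names (`BigramExample`, `countOccurrences`) are made up for illustration. The sketch assumes that overlapping occurrences should be counted, so "aa" would count twice in "aaa".

```java
final class BigramExample {

    // Counts occurrences of target in text, including overlapping ones.
    static int countOccurrences(String text, String target) {
        if (target.isEmpty()) {
            return 0; // an empty target has no meaningful occurrence count
        }
        int count = 0;
        // Restart the search one character after each match start
        for (int i = text.indexOf(target); i != -1; i = text.indexOf(target, i + 1)) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "52" starts at indices 2, 7 and 11
        System.out.println(countOccurrences("1252336528952", "52")); // prints 3
    }
}
```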

I find this problem quite interesting. I had written a C implementation before; this time I solve it in Java. Here is the code:

package com.company;

import java.io.*;
import java.util.*;
import java.util.List;

/**
 * @projectName test
 * @title Test3
 * @package com.company
 * @description Finds the most frequent substring in a string
 * @author IT_CREAT
 * @date 2020/5/24 22:32
 * @version 1.0.0
 */
public class Test3 {

    /**
     * The string to test
     */
    public static String testTtr = "oneofthecentralresultsofairesearchinthe1970swasthattoachievegoodper" +
            "formanceaisystemsmusthavelargeamountsofknowledgeknowledgeispowertheslogangoeshumansclearly" +
            "usevastamountsofknowledgeandifaiistoachieveitslongtermgoalsaisystemsmustalsousevastamounts" +
            "sincehandcodinglargeamountsofknowledgeintoasystemisslowtediousanderrorpronemachinelearning" +
            "techniqueshavebeendevelopedtoautomaticallyacquireknowledgeoftenintheformofifthenrulesprodu" +
            "ctionsunfortunatelythishasoftenledtoautilityproblemminton1988bthelearninghascausedanoveral" +
            "lslowdowninthesystemforexampleinmanysystemslearnedrulesareusedtoreducethenumberofbasicstep" +
            "sthesystemtakesinordertosolveproblemsbypruningthesystemssearchspaceforinstancebutinorderto" +
            "determineateachstepwhichrulesareapplicablethesystemmustmatchthemagainstitscurrentsituation" +
            "usingcurrenttechniquesthematcherslowsdownasmoreandmorerulesareacquiredsoeachsteptakeslonge" +
            "randlongerthisectcanoutweighthereductioninthenumberofstepstakensothatthenetresultisaslowdo" +
            "wnthishasbeenobservedinseveralrecentsystemsminton1988aetzioni1990tambeetal1990cohen1990ofc" +
            "oursetheproblemofslowdownfromincreasingmatchcostisnotrestrictedtosystemsinwhichthepurposeo" +
            "frulesistoreducethenumberofproblemsolvingstepsasystemacquiringnewrulesforanypurposecanslow" +
            "downiftherulessignicantlyincreasethematchcostandintuitivelyoneexpectsthatthemoreproduction" +
            "sthereareinasystemthehigherthetotalmatchcostwillbethethesisofthisresearchisthatwecansolvet" +
            "hisprobleminabroadclassofsystemsbyimprovingthematchalgorithmtheyuseinessenceouraimistoenab" +
            "lethescalingupofthenumberofrulesinproductionsystemsweadvancethestateoftheartinproductionma" +
            "tchalgorithmsdevelopinganimprovedmatchalgorithmwhoseperformancescaleswellonasignicantlybro" +
            "aderclassofsystemsthanexistingalgorithmsfurthermorewedemonstratethatbyusingthisimprovedmat" +
            "chalgorithmwecanreduceoravoidtheutilityprobleminalargeclassofmachinelearningsystems";

    /**
     * Keys of the returned map
     */
    public enum ReturnKey {
        COUNT, SUBSTRINGS
    }

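    /**
     * Finds the most frequent substrings of a given length in a text file.
     *
     * @param chainNumber number of consecutive characters that make up one substring, i.e. the substring length
     * @param filePath    path of the file to read
     * @return the most frequent substrings and their occurrence count
     */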
    public static Map<ReturnKey, Object> searchMostSubstringsByFile(int chainNumber, String filePath) {
        List<String> mostSubstrings = new ArrayList<>();
        Map<ReturnKey, Object> returnMap = new LinkedHashMap<>(2);
        returnMap.put(ReturnKey.COUNT, 0);
        returnMap.put(ReturnKey.SUBSTRINGS, mostSubstrings);
        if (strIsEmpty(filePath)) {
            return returnMap;
        }
        File file = new File(filePath);
        if (file.exists()) {
            // try-with-resources closes the reader automatically, even if reading fails
            try (FileReader fileReader = new FileReader(file)) {
                char[] readChar = new char[1024];
                StringBuilder waitParsingStr = new StringBuilder();
                int readLength;
                // Read the whole file into memory, 1024 characters at a time
                while ((readLength = fileReader.read(readChar)) != -1) {
                    waitParsingStr.append(readChar, 0, readLength);
                }
                return searchMostSubstrings(chainNumber, waitParsingStr.toString());
            } catch (IOException e) {
                System.out.println(e.getMessage());
            }
        }
        return returnMap;
    }

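    /**
     * Finds the most frequent substrings of a given length in a string.
     *
     * @param chainNumber    number of consecutive characters that make up one substring, i.e. the substring length
     * @param waitParsingStr the string to parse
     * @return the most frequent substrings and their occurrence count
     */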
    public static Map<ReturnKey, Object> searchMostSubstrings(int chainNumber, String waitParsingStr) {
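        // The most frequent substrings found; this list is returned to the caller
        List<String> mostSubstrings = new ArrayList<>();
        Map<ReturnKey, Object> returnMap = new LinkedHashMap<>(2);
        returnMap.put(ReturnKey.COUNT, 0);
        returnMap.put(ReturnKey.SUBSTRINGS, mostSubstrings);
        // Check for null/empty input before calling length(), and reject a non-positive
        // chainNumber, for which no valid substring exists
        if (strIsEmpty(waitParsingStr) || chainNumber <= 0) {
            return returnMap;
        }
        // Length of the string to parse
        int waitParsingStrSize = waitParsingStr.length();
        System.out.println("Length of string to parse : " + waitParsingStrSize + " , string to parse : " + waitParsingStr);
        if (chainNumber > waitParsingStrSize) {
            return returnMap;
        }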
        // Highest occurrence count seen so far
        int mostSubstringCount = 0;
        // Set of all distinct substrings already examined
        Set<String> substrings = new HashSet<>();
        // Walk the string, starting a candidate substring at every character
        for (int i = 0; i < waitParsingStrSize; i++) {
            // Only take the substring if its last index is still inside the string
            if (i + (chainNumber - 1) < waitParsingStrSize) {
                String substr = waitParsingStr.substring(i, i + chainNumber);
                // Skip substrings that have already been counted
                if (substrings.contains(substr)) {
                    continue;
                }
                substrings.add(substr);
                // Count how often this substring occurs in the string
                int substrCount = countStr(waitParsingStr, substr);
                // A new maximum: discard the previous winners and keep only this substring
                if (substrCount > mostSubstringCount) {
                    mostSubstrings.clear();
                    mostSubstrings.add(substr);
                } else if (substrCount == mostSubstringCount) {
                    // A tie with the current maximum: add this substring to the winners
                    mostSubstrings.add(substr);
                }
                // Update the running maximum
                mostSubstringCount = Math.max(substrCount, mostSubstringCount);
            }
        }
        returnMap.put(ReturnKey.COUNT, mostSubstringCount);
        return returnMap;
    }

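    /**
     * Counts the occurrences of sToFind in str. Overlapping occurrences are counted,
     * so "aa" occurs twice in "aaa".
     *
     * @param str     the original string
     * @param sToFind the string to search for
     * @return the number of times sToFind occurs in str
     */
    private static int countStr(String str, String sToFind) {
        // An empty search string would match at every position and never terminate
        if (sToFind.isEmpty()) {
            return 0;
        }
        int num = 0;
        // Continue one character after each match start so overlapping matches are not skipped
        int index = str.indexOf(sToFind);
        while (index != -1) {
            num++;
            index = str.indexOf(sToFind, index + 1);
        }
        return num;
    }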

    /**
     * Checks whether a string is null or empty.
     *
     * @param str the string to check
     * @return true if the string is null or empty, false otherwise
     */
    private static boolean strIsEmpty(String str) {
        return str == null || str.isEmpty();
    }

    public static void main(String[] args) {
        // Run 1: analyse the in-memory test string
        Map<ReturnKey, Object> returnKeyObjectMap1 = searchMostSubstrings(2, testTtr);
        System.out.println("Highest occurrence count of a substring : " + returnKeyObjectMap1.get(ReturnKey.COUNT));
        System.out.println("Most frequent substrings : " + returnKeyObjectMap1.get(ReturnKey.SUBSTRINGS));
        // Run 2: analyse the same text read from a file
        Map<ReturnKey, Object> returnKeyObjectMap2 = searchMostSubstringsByFile(2, "C:\\Users\\Administrator\\Desktop\\test\\src\\com\\company\\test.txt");
        System.out.println("Highest occurrence count of a substring : " + returnKeyObjectMap2.get(ReturnKey.COUNT));
        System.out.println("Most frequent substrings : " + returnKeyObjectMap2.get(ReturnKey.SUBSTRINGS));
    }

}

The output looks like this:

Length of string to parse : 1860 , string to parse : oneofthecentralresultsofairesearchinthe1970swasthattoachievegoodperformanceaisystemsmusthavelargeamountsofknowledgeknowledgeispowertheslogangoeshumansclearlyusevastamountsofknowledgeandifaiistoachieveitslongtermgoalsaisystemsmustalsousevastamountssincehandcodinglargeamountsofknowledgeintoasystemisslowtediousanderrorpronemachinelearningtechniqueshavebeendevelopedtoautomaticallyacquireknowledgeoftenintheformofifthenrulesproductionsunfortunatelythishasoftenledtoautilityproblemminton1988bthelearninghascausedanoverallslowdowninthesystemforexampleinmanysystemslearnedrulesareusedtoreducethenumberofbasicstepsthesystemtakesinordertosolveproblemsbypruningthesystemssearchspaceforinstancebutinordertodetermineateachstepwhichrulesareapplicablethesystemmustmatchthemagainstitscurrentsituationusingcurrenttechniquesthematcherslowsdownasmoreandmorerulesareacquiredsoeachsteptakeslongerandlongerthisectcanoutweighthereductioninthenumberofstepstakensothatthenetresultisaslowdownthishasbeenobservedinseveralrecentsystemsminton1988aetzioni1990tambeetal1990cohen1990ofcoursetheproblemofslowdownfromincreasingmatchcostisnotrestrictedtosystemsinwhichthepurposeofrulesistoreducethenumberofproblemsolvingstepsasystemacquiringnewrulesforanypurposecanslowdowniftherulessignicantlyincreasethematchcostandintuitivelyoneexpectsthatthemoreproductionsthereareinasystemthehigherthetotalmatchcostwillbethethesisofthisresearchisthatwecansolvethisprobleminabroadclassofsystemsbyimprovingthematchalgorithmtheyuseinessenceouraimistoenablethescalingupofthenumberofrulesinproductionsystemsweadvancethestateoftheartinproductionmatchalgorithmsdevelopinganimprovedmatchalgorithmwhoseperformancescaleswellonasignicantlybroaderclassofsystemsthanexistingalgorithmsfurthermorewedemonstratethatbyusingthisimprovedmatchalgorithmwecanreduceoravoidtheutilityprobleminalargeclassofmachinelearningsystems
Highest occurrence count of a substring : 53
Most frequent substrings : [th]
Length of string to parse : 1860 , string to parse : oneofthecentralresultsofairesearchinthe1970swasthattoachievegoodperformanceaisystemsmusthavelargeamountsofknowledgeknowledgeispowertheslogangoeshumansclearlyusevastamountsofknowledgeandifaiistoachieveitslongtermgoalsaisystemsmustalsousevastamountssincehandcodinglargeamountsofknowledgeintoasystemisslowtediousanderrorpronemachinelearningtechniqueshavebeendevelopedtoautomaticallyacquireknowledgeoftenintheformofifthenrulesproductionsunfortunatelythishasoftenledtoautilityproblemminton1988bthelearninghascausedanoverallslowdowninthesystemforexampleinmanysystemslearnedrulesareusedtoreducethenumberofbasicstepsthesystemtakesinordertosolveproblemsbypruningthesystemssearchspaceforinstancebutinordertodetermineateachstepwhichrulesareapplicablethesystemmustmatchthemagainstitscurrentsituationusingcurrenttechniquesthematcherslowsdownasmoreandmorerulesareacquiredsoeachsteptakeslongerandlongerthisectcanoutweighthereductioninthenumberofstepstakensothatthenetresultisaslowdownthishasbeenobservedinseveralrecentsystemsminton1988aetzioni1990tambeetal1990cohen1990ofcoursetheproblemofslowdownfromincreasingmatchcostisnotrestrictedtosystemsinwhichthepurposeofrulesistoreducethenumberofproblemsolvingstepsasystemacquiringnewrulesforanypurposecanslowdowniftherulessignicantlyincreasethematchcostandintuitivelyoneexpectsthatthemoreproductionsthereareinasystemthehigherthetotalmatchcostwillbethethesisofthisresearchisthatwecansolvethisprobleminabroadclassofsystemsbyimprovingthematchalgorithmtheyuseinessenceouraimistoenablethescalingupofthenumberofrulesinproductionsystemsweadvancethestateoftheartinproductionmatchalgorithmsdevelopinganimprovedmatchalgorithmwhoseperformancescaleswellonasignicantlybroaderclassofsystemsthanexistingalgorithmsfurthermorewedemonstratethatbyusingthisimprovedmatchalgorithmwecanreduceoravoidtheutilityprobleminalargeclassofmachinelearningsystems
Highest occurrence count of a substring : 53
Most frequent substrings : [th]

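The code takes the substring length as a parameter, so it works not only for the 2 characters given in the problem but also for longer or shorter combinations. It can also find the most frequent character combination in a text file.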
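
The solution above rescans the whole string once for every distinct substring. As an alternative, the following minimal sketch counts every window of length `n` in a single pass with a hash map. This is a different technique from the code in this post, not the author's method, and the names `NGramCounter`, `countWindows` and `mostFrequent` are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class NGramCounter {

    // Counts every length-n window of text; keys keep first-appearance order.
    static Map<String, Integer> countWindows(String text, int n) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        if (text == null || n <= 0 || n > text.length()) {
            return counts; // no valid window exists
        }
        for (int i = 0; i + n <= text.length(); i++) {
            // merge inserts 1 for a new key, otherwise adds 1 to the stored count
            counts.merge(text.substring(i, i + n), 1, Integer::sum);
        }
        return counts;
    }

    // Returns every length-n substring that reaches the highest count.
    static List<String> mostFrequent(String text, int n) {
        Map<String, Integer> counts = countWindows(text, n);
        int best = 0;
        for (int c : counts.values()) {
            best = Math.max(best, c);
        }
        List<String> winners = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() == best) {
                winners.add(e.getKey());
            }
        }
        return winners;
    }

    public static void main(String[] args) {
        System.out.println(mostFrequent("1252336528952", 2)); // prints [52]
    }
}
```

Because each window is visited exactly once, overlapping occurrences are counted naturally, and ties are all reported in the order in which they first appear.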
