Coursera Algorithm Ⅱ week5 Burrows Wheeler

Bokk

于 2022-09-26 07:39:53 发布

阅读量154

点赞数

分类专栏： Coursera Algorithm

本文链接：https://blog.csdn.net/Joshmo/article/details/127046397

版权

Coursera Algorithm 专栏收录该内容

4 篇文章 0 订阅

订阅专栏

全部代码 https://github.com/Joshmomel/Princeton_Algorithms/tree/main/burrows

Context

这次assignment主要是实现Burrows Wheeler的数据压缩算法, 具体来说有三个步骤
在这里插入图片描述

Implementation

CircularSuffixArray

这个CircularSuffixArray是为Burrows Wheeler做准备的, 目标就是把index[i]计算出来
在这里插入图片描述
注意我们只需要保存一个string的pointer就行, 不需要把所有index对应的string都创建出来, 我一开始是创建一个CircularSuffix的class, 不过由于performance没过后来直接就把CircularSuffix抛弃了

private final Integer[] indexList;

public CircularSuffixArray(String s) {
    if (s == null) throw new IllegalArgumentException("arg can not be null");

    indexList = new Integer[s.length()];
    for (int i = 0; i < s.length(); i++) {
      indexList[i] = i;
    }
    Comparator<Integer> cmp = (a1, a2) -> {
      for (int i = 0; i < s.length(); i++) {
        int i1 = (a1 + i) % s.length();
        int i2 = (a2 + i) % s.length();
        if (s.charAt(i1) == s.charAt(i2))
          continue;
        else
          return Character.compare(s.charAt(i1), s.charAt(i2));
      }
      return 0;
    };

    Arrays.sort(indexList, cmp);
  }

Burrows–Wheeler transform

Transform

Transform就是把 t 以及最初始的string的位置3计算出来
在这里插入图片描述
也就是

3
ARD!RCAAAABB

我们是知道第一个char的index[i], 这样反推最后一个char就是circularSuffixArray.index(i) - 1, 具体是

s.charAt((circularSuffixArray.index(i) - 1 + stringLength) % stringLength)

Inverse-Transform

inverse transform是这次assignment比较难得部分, 不过F&Q说了ou should find the key-indexed counting algorithm from the string sorting lecture to be especially useful, 发现确实很像

计算每个char出现的次数
计算每个char在next[i]

全部代码 https://github.com/Joshmomel/Princeton_Algorithms/tree/main/burrows

Context

这次assignment主要是实现Burrows Wheeler的数据压缩算法, 具体来说有三个步骤
在这里插入图片描述

Implementation

CircularSuffixArray

private final Integer[] indexList;

public CircularSuffixArray(String s) {
    if (s == null) throw new IllegalArgumentException("arg can not be null");

    indexList = new Integer[s.length()];
    for (int i = 0; i < s.length(); i++) {
      indexList[i] = i;
    }
    Comparator<Integer> cmp = (a1, a2) -> {
      for (int i = 0; i < s.length(); i++) {
        int i1 = (a1 + i) % s.length();
        int i2 = (a2 + i) % s.length();
        if (s.charAt(i1) == s.charAt(i2))
          continue;
        else
          return Character.compare(s.charAt(i1), s.charAt(i2));
      }
      return 0;
    };

    Arrays.sort(indexList, cmp);
  }

Burrows–Wheeler transform

Transform

Transform就是把 t 以及最初始的string的位置3计算出来
在这里插入图片描述
也就是

3
ARD!RCAAAABB

我们是知道第一个char的index[i], 这样反推最后一个char就是circularSuffixArray.index(i) - 1, 具体是

s.charAt((circularSuffixArray.index(i) - 1 + stringLength) % stringLength)

Inverse-Transform

inverse transform是这次assignment比较难得部分, 不过F&Q说了ou should find the key-indexed counting algorithm from the string sorting lecture to be especially useful, 发现确实很像

计算每个char出现的次数
计算每个char在next[i]中出现的位置 couont[]
把next[i]计算出来中出现的位置 couont[]
把next[i]计算出来

  public static void inverseTransform() {
    int first = BinaryStdIn.readInt();
    String s = BinaryStdIn.readString();
    String sortedString = sortString(s);

    int N = s.length();
    int R = 256;
    int[] cs = new int[R + 1];
    for (int i = 0; i < N; i++) {
      cs[s.charAt(i) + 1]++;
    }

    for (int r = 0; r < R; r++) {
      cs[r + 1] += cs[r];
    }

    int[] next = new int[N];
    for (int i = 0; i < N; i++) {
      next[cs[s.charAt(i)]++] = i;
    }

    int p = first;
    for (int i = 0; i < N; i++) {
      BinaryStdOut.write(sortedString.charAt(p));
      p = next[p];
    }


    BinaryStdOut.close();
  }

Move-to-front

encoding

Initialize the sequence by making the ith character in the sequence equal to the ith extended ASCII character. Now, read each 8-bit character c from standard input, one at a time; output the 8-bit index in the sequence where c appears; and move c to the front

读取8-bit character c
写下c对应的index
把c放在第一位

  public static void encode() {
    LinkedList<Character> sequence = new LinkedList<>();
    for (int i = 0; i < 256; i++) {
      sequence.add((char) i);
    }

    while (!BinaryStdIn.isEmpty()) {
     //读取8-bit character c
      char in = BinaryStdIn.readChar(8);
      int i = sequence.indexOf((in));
      // 写下c对应的index
      BinaryStdOut.write(i, 8);
      //把c放在第一位
      Character c = sequence.remove(i);
      sequence.addFirst(c);
    }

    BinaryStdOut.close();
  }

decoding

Initialize an ordered sequence of 256 characters, where extended ASCII character i appears ith in the sequence. Now, read each 8-bit character i (but treat it as an integer between 0 and 255) from standard input one at a time; write the ith character in the sequence; and move that character to the front. Check that the decoder recovers any encoded message

读取8-bit character i , 获取对应0-255对应的数字
写下来i
把i放到第一位

  public static void decode() {
    LinkedList<Character> sequence = new LinkedList<>();
    for (int i = 0; i < 256; i++) {
      sequence.add((char) i);
    }

    while (!BinaryStdIn.isEmpty()) {
      char in = BinaryStdIn.readChar(8);
      Character inChar = sequence.get((int) in);
      BinaryStdOut.write(inChar, 8);
      Character c = sequence.remove(in);
      sequence.addFirst(c);
    }

    BinaryStdOut.close();
  }