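6. Sets and Maps

I. Set Basics and a BST-Based Set Implementation

A set stores each element at most once. Every operation in the Set interface below (add, remove, contains, getSize, isEmpty) maps directly onto the binary search tree (BST) built in the previous post.

Implementation

Create a new project named Set and copy in the BST.java completed in the previous post: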

.
├── Set.iml
└── src
    ├── BST.java
    ├── BSTSet.java
    ├── Main.java
    └── Set.java

The interface, Set.java:

public interface Set<E> {

    void add(E e);
    void remove(E e);
    boolean contains(E e);
    int getSize();
    boolean isEmpty();
}

BST.java is the binary search tree completed in the previous post.
BSTSet.java

public class BSTSet<E extends Comparable<E>> implements Set<E> {
    private BST<E> bst;

    public BSTSet(){
        bst = new BST<>();
    }

    @Override
    public int getSize(){
        return bst.size();
    }

    @Override
    public boolean isEmpty(){
        return bst.isEmpty();
    }

    @Override
    public void add(E e){
        bst.add(e);
    }

    @Override
    public boolean contains(E e){
        return bst.contains(e);
    }

    @Override
    public void remove(E e){
        bst.remove(e);
    }
}

Add FileOperation.java, a class that reads a file and splits it into words:

import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.Scanner;
import java.util.Locale;
import java.io.File;
import java.io.BufferedInputStream;
import java.io.IOException;

// File-related helper operations
public class FileOperation {

    // Reads the file named filename and appends every word it contains to words
    public static boolean readFile(String filename, ArrayList<String> words){

        if (filename == null || words == null){
            System.out.println("filename is null or words is null");
            return false;
        }

        // Open the file
        Scanner scanner;

        try {
            File file = new File(filename);
            if(file.exists()){
                FileInputStream fis = new FileInputStream(file);
                scanner = new Scanner(new BufferedInputStream(fis), "UTF-8");
                scanner.useLocale(Locale.ENGLISH);
            }
            else
                return false;
        }
        catch(IOException ioe){
            System.out.println("Cannot open " + filename);
            return false;
        }

        // Simple tokenization
        // This approach is crude and ignores many special cases of text processing
        // It is used here for demonstration only
        if (scanner.hasNextLine()) {

            String contents = scanner.useDelimiter("\\A").next();

            int start = firstCharacterIndex(contents, 0);
            for (int i = start + 1; i <= contents.length(); )
                if (i == contents.length() || !Character.isLetter(contents.charAt(i))) {
                    String word = contents.substring(start, i).toLowerCase();
                    words.add(word);
                    start = firstCharacterIndex(contents, i);
                    i = start + 1;
                } else
                    i++;
        }

        scanner.close();   // release the underlying file handle
        return true;
    }

    // Returns the index of the first letter in s at or after start (s.length() if there is none)
    private static int firstCharacterIndex(String s, int start){

        for( int i = start ; i < s.length() ; i ++ )
            if( Character.isLetter(s.charAt(i)) )
                return i;
        return s.length();
    }
}
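
To see what this tokenizer produces without needing a file, an equivalent loop can be run on an in-memory string. The following is only a sketch: the class TokenizeSketch and the method splitWords are hypothetical names that are not part of the project. It shows one consequence of splitting on every non-letter: "It's" becomes the two words "it" and "s".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the project: splits a string into
// maximal runs of letters, as FileOperation.readFile does for a file.
class TokenizeSketch {

    static List<String> splitWords(String contents) {
        List<String> words = new ArrayList<>();
        int i = 0;
        while (i < contents.length()) {
            // skip anything that is not a letter
            if (!Character.isLetter(contents.charAt(i))) {
                i++;
                continue;
            }
            // consume one maximal run of letters
            int start = i;
            while (i < contents.length() && Character.isLetter(contents.charAt(i)))
                i++;
            words.add(contents.substring(start, i).toLowerCase());
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [it, s, a, truth, universally, acknowledged]
        System.out.println(splitWords("It's a truth, universally acknowledged."));
    }
}
```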

Get a plain-text copy of Pride and Prejudice, save it as pride-and-prejudice.txt in the project directory, and run a test.
Main.java

import java.util.ArrayList;

public class Main {
    public static void main(String[] args){
        System.out.println("Pride and Prejudice");

        ArrayList<String> words1 = new ArrayList<>();
        FileOperation.readFile("pride-and-prejudice.txt", words1);
        System.out.println("Total words: " + words1.size());

        BSTSet<String> set1 = new BSTSet<>();
        for(String word: words1){
            set1.add(word);
        }
        System.out.println("Total different words: "+set1.getSize());
    }
}

Output:

Pride and Prejudice
Total words: 125901
Total different words: 6530

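II. Linked-List-Based Set Implementation

Add the LinkedList.java we implemented earlier to the project. java.util.LinkedList could be used instead, but its method names differ (size() and remove(Object) rather than getSize() and removeElement()), so the code below would need small changes.
Create LinkedListSet.java: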

public class LinkedListSet<E> implements Set<E> {
    private LinkedList<E> list;

    public LinkedListSet(){
        list = new LinkedList<>();
    }

    @Override
    public int getSize(){
        return list.getSize();
    }

    @Override
    public boolean isEmpty(){
        return list.isEmpty();
    }

    @Override
    public boolean contains(E e){
        return list.contains(e);
    }

    @Override
    public void add(E e){
        if(!list.contains(e))
            list.addFirst(e);
    }

    @Override
    public void remove(E e){
        list.removeElement(e);
    }
}
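
For readers who prefer java.util.LinkedList over the hand-written list, the adaptation might look like the sketch below. JdkLinkedListSet is a hypothetical name, and to keep the block self-contained it does not implement the project's Set interface. Only the calls that differ from our own LinkedList are commented.

```java
import java.util.LinkedList;

// Sketch only: the same set idea backed by java.util.LinkedList.
class JdkLinkedListSet<E> {
    private final LinkedList<E> list = new LinkedList<>();

    public int getSize() {
        return list.size();            // size() instead of getSize()
    }

    public boolean isEmpty() {
        return list.isEmpty();
    }

    public boolean contains(E e) {
        return list.contains(e);
    }

    public void add(E e) {
        // O(n) scan for a duplicate, then O(1) insertion at the head
        if (!list.contains(e))
            list.addFirst(e);
    }

    public void remove(E e) {
        // remove(Object) deletes the first equal element; the cast makes it
        // explicit that the index-based remove(int) overload is not intended
        list.remove((Object) e);
    }

    public static void main(String[] args) {
        JdkLinkedListSet<String> set = new JdkLinkedListSet<>();
        for (String w : new String[]{"pride", "and", "prejudice", "and", "pride"})
            set.add(w);
        System.out.println(set.getSize());   // prints 3
    }
}
```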

Test it in Main.java:

import java.util.ArrayList;

public class Main {
    public static void main(String[] args){
        System.out.println("Pride and Prejudice");

        ArrayList<String> words1 = new ArrayList<>();
        FileOperation.readFile("pride-and-prejudice.txt", words1);
        System.out.println("Total words: " + words1.size());

        LinkedListSet<String> set2 = new LinkedListSet<>();
        for(String word: words1){
            set2.add(word);
        }
        System.out.println("Total different words: "+set2.getSize());
    }
}

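The output is correct, but the run is much slower than with the BST-based set, because LinkedListSet.add scans the whole list on every call: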

Pride and Prejudice
Total words: 125901
Total different words: 6530


III. Complexity Analysis of the Set Classes

  • Comparing the performance of BSTSet and LinkedListSet

Main.java

import java.util.ArrayList;

public class Main {

    private static double testSet(Set<String> set, String filename) {
        long startTime = System.nanoTime();

        System.out.println(filename);

        ArrayList<String> words1 = new ArrayList<>();
        if (FileOperation.readFile(filename, words1)) {
            System.out.println("Total words: " + words1.size());

            for (String word : words1) {
                set.add(word);
            }
            System.out.println("Total different words: " + set.getSize());
        }


        long endTime = System.nanoTime();

        return (endTime - startTime) / 1000000000.0;
    }


    public static void main(String[] args) {
        String filename = "pride-and-prejudice.txt";

        BSTSet<String> bstset = new BSTSet<>();
        double time1 = testSet(bstset, filename);
        System.out.println("BST Set: " + time1 + " s");

        LinkedListSet<String> linkedlistset = new LinkedListSet<>();
        double time2 = testSet(linkedlistset, filename);
        System.out.println("LinkedList Set: " + time2 + " s");
    }

}

Output; BSTSet is clearly faster:

pride-and-prejudice.txt
Total words: 125901
Total different words: 6530
BST Set: 0.34745494 s
pride-and-prejudice.txt
Total words: 125901
Total different words: 6530
LinkedList Set: 2.789588019 s

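  • Complexity analysis

1. Time complexity of LinkedListSet (n is the number of elements):

  add       O(n)   (the list is scanned first to rule out a duplicate)
  contains  O(n)
  remove    O(n)

2. Time complexity of BSTSet (h is the height of the tree):

  add       O(h)
  contains  O(h)
  remove    O(h)

Summary

  operation   LinkedListSet   BSTSet   BSTSet (average)   BSTSet (worst)
  add         O(n)            O(h)     O(log n)           O(n)
  contains    O(n)            O(h)     O(log n)           O(n)
  remove      O(n)            O(h)     O(log n)           O(n)

3. Limitations of BSTSet

  • A BST has a best case and a worst case
    • Best case: a full binary search tree, where h is about log2(n).
    • Worst case: every node has only a left child or only a right child, so the tree is equivalent to a linked list and h equals n.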
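
The best and worst cases above can be observed directly by measuring tree height. The following is a minimal sketch, not the BST.java from the previous post: HeightDemo and its methods are hypothetical names, keys are plain ints, and the seed 42 is arbitrary. Inserting 1000 keys in sorted order produces a tree of height 1000 (effectively a linked list), while the same keys in shuffled order produce a far shallower tree.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Random;

// Insert-only BST used solely to measure height (hypothetical, for illustration).
class HeightDemo {
    private static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    private Node root;

    // Iterative insert, so a degenerate tree cannot overflow the call stack
    void add(int key) {
        if (root == null) { root = new Node(key); return; }
        Node cur = root;
        while (true) {
            if (key < cur.key) {
                if (cur.left == null) { cur.left = new Node(key); return; }
                cur = cur.left;
            } else if (key > cur.key) {
                if (cur.right == null) { cur.right = new Node(key); return; }
                cur = cur.right;
            } else {
                return;   // duplicate key: a set ignores it
            }
        }
    }

    // Height = number of levels, counted with a level-order traversal
    int height() {
        int h = 0;
        Deque<Node> level = new ArrayDeque<>();
        if (root != null) level.add(root);
        while (!level.isEmpty()) {
            h++;
            for (int n = level.size(); n > 0; n--) {
                Node cur = level.poll();
                if (cur.left != null) level.add(cur.left);
                if (cur.right != null) level.add(cur.right);
            }
        }
        return h;
    }

    static int heightAfterInserting(List<Integer> keys) {
        HeightDemo tree = new HeightDemo();
        for (int k : keys) tree.add(k);
        return tree.height();
    }

    public static void main(String[] args) {
        List<Integer> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) keys.add(i);
        System.out.println("sorted input:   height " + heightAfterInserting(keys));
        Collections.shuffle(keys, new Random(42));
        System.out.println("shuffled input: height " + heightAfterInserting(keys));
    }
}
```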


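4. Conclusion

On average a BSTSet operation costs O(log n), but in the worst case it costs O(n), no better than LinkedListSet.

To stay in the best case at all times we can use a balanced binary tree, which a later post will cover.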


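IV. Set Problems on LeetCode

Problem 804:

International Morse Code defines a standard encoding in which each letter maps to a string of dots and dashes, for example: "a" maps to ".-", "b" maps to "-...", "c" maps to "-.-.", and so on.

For convenience, the Morse codes for all 26 English letters are listed below:

[".-","-...","-.-.","-..",".","..-.","--.","....","..",".---","-.-",".-..","--","-.","---",".--.","--.-",".-.","...","-","..-","...-",".--","-..-","-.--","--.."]

Given a list of words, each word can be written as the concatenation of the Morse codes of its letters. For example, "cab" can be written as "-.-..--..." (the concatenation of "-.-.", ".-" and "-..."). We call this concatenation the translation of a word.

Return the number of distinct translations among all the words in the list.

Example:
Input: words = ["gin", "zen", "gig", "msg"]
Output: 2
Explanation:
The translation of each word is:
"gin" -> "--...-."
"zen" -> "--...-."
"gig" -> "--...--."
"msg" -> "--...--."

There are 2 distinct translations: "--...-." and "--...--.".


Note:

The length of words is at most 100.
Each words[i] has a length in the range [1, 12].
Each words[i] consists only of lowercase letters.

Solution: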

import java.util.TreeSet;    // Like our own BSTSet, but backed by a red-black tree and more full-featured



class Solution {
    public int uniqueMorseRepresentations(String[] words) {
        String[] codes = {".-","-...","-.-.","-..",".","..-.","--.","....","..",".---","-.-",".-..","--","-.","---",".--.","--.-",".-.","...","-","..-","...-",".--","-..-","-.--","--.."};

        TreeSet<String> set = new TreeSet<>();
        for(String word: words){

            StringBuilder res = new StringBuilder();
            for(int i=0; i< word.length(); i++){
                res.append(codes[word.charAt(i) - 'a']);
            }

            set.add(res.toString());
        }

        return set.size();
    }
}
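
Since the problem only asks for a count and never uses the ordering of the stored strings, java.util.HashSet would presumably serve equally well here. The sketch below is a variant under that assumption, checked against the example input from the problem statement; MorseSketch is a hypothetical class name.

```java
import java.util.HashSet;
import java.util.Set;

// Variant of the solution above using HashSet (illustrative sketch).
class MorseSketch {
    private static final String[] CODES = {".-","-...","-.-.","-..",".","..-.","--.","....","..",".---","-.-",".-..","--","-.","---",".--.","--.-",".-.","...","-","..-","...-",".--","-..-","-.--","--.."};

    static int uniqueMorseRepresentations(String[] words) {
        Set<String> seen = new HashSet<>();
        for (String word : words) {
            StringBuilder sb = new StringBuilder();
            for (char c : word.toCharArray())
                sb.append(CODES[c - 'a']);   // 'a' maps to index 0, 'z' to index 25
            seen.add(sb.toString());         // duplicates are silently ignored
        }
        return seen.size();
    }

    public static void main(String[] args) {
        String[] words = {"gin", "zen", "gig", "msg"};
        System.out.println(uniqueMorseRepresentations(words));   // prints 2
    }
}
```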

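V. Map Basics

The Map concept

  • A data structure that stores (key, value) pairs
  • Values are looked up by key
  • Easy to implement with either a linked list or a binary search tree
  • The counterpart in Python is dict.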
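
As a quick illustration of the (key, value) idea, here is a small sketch using the standard library's java.util.TreeMap rather than an implementation from this post. MapBasicsSketch and the sample keys are made up for the example. Note that java.util.Map is a different type from any Map interface defined in your own project.

```java
import java.util.Map;
import java.util.TreeMap;

class MapBasicsSketch {

    static Map<String, Integer> build() {
        Map<String, Integer> stock = new TreeMap<>();
        stock.put("apple", 3);   // store a (key, value) pair
        stock.put("pear", 5);
        stock.put("apple", 4);   // same key again: the value is replaced, no second entry
        return stock;
    }

    public static void main(String[] args) {
        Map<String, Integer> stock = build();
        System.out.println(stock.get("apple"));          // prints 4
        System.out.println(stock.containsKey("plum"));   // prints false
        System.out.println(stock.size());                // prints 2
    }
}
```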


Interface design

Map.java

public interface Map<K, V> {
    void add(K key, V value);
    V remove(K key);
    boolean contains(K key);
    V get(K key);
    void set(K key, V newValue);
    int getSize();
    boolean isEmpty();
}


VI. Linked-List-Based Map Implementation

LinkedListMap.java

import java.util.ArrayList;

public class LinkedListMap<K, V> implements Map<K, V> {
   private class Node {
       public K key;
       public V value;
       public Node next;

       public Node(K key, V value, Node next) {
           this.key = key;
           this.value = value;
           this.next = next;
       }

       public Node(K key) {
           this(key, null, null);
       }

       public Node() {
           this(null, null, null);
       }

       @Override
       public String toString() {
           return key + " : " + value;   // concatenation tolerates a null key or value
       }
   }


   private Node dummyHead;
   private int size;

   public LinkedListMap() {
       dummyHead = new Node();
       size = 0;
   }

   @Override
   public int getSize() {
       return size;
   }

   @Override
   public boolean isEmpty() {
       return size == 0;
   }

   private Node getNode(K key) {
       Node cur = dummyHead.next;
       while (cur != null) {
           if (cur.key.equals(key))
               return cur;
           cur = cur.next;
       }
       return null;
   }

   @Override
   public boolean contains(K key) {
       return getNode(key) != null;
   }

   @Override
   public V get(K key) {
       Node node = getNode(key);
       return node == null ? null : node.value;
   }

   @Override
   public void add(K key, V value) {
       Node node = getNode(key);
       if (node == null) {
           dummyHead.next = new Node(key, value, dummyHead.next);
           size++;
       } else {
           node.value = value;
       }
   }

   @Override
   public void set(K key, V value) {
       Node node = getNode(key);
       if (node == null) {
           throw new IllegalArgumentException(key + " doesn't exist!");
       }

       node.value = value;
   }

   @Override
   public V remove(K key) {
       Node prev = dummyHead;
       while (prev.next != null) {
           if (prev.next.key.equals(key)) {
               break;
           }
           prev = prev.next;
       }

       if (prev.next != null) {
           Node delNode = prev.next;
           prev.next = delNode.next;
           delNode.next = null;
           size--;
           return delNode.value;
       }

       return null;   // no entry with this key
   }


   // Test
   public static void main(String[] args) {
       System.out.println("Pride and Prejudice");

       ArrayList<String> words = new ArrayList<>();
       if (FileOperation.readFile("pride-and-prejudice.txt", words)) {
           System.out.println("Total words: " + words.size());

           LinkedListMap<String, Integer> map = new LinkedListMap<>();
           for (String word : words) {
               if (map.contains(word))
                   map.set(word, map.get(word) + 1);
               else
                   map.add(word, 1);
           }

           System.out.println("Total different words: " + map.getSize());
           System.out.println("Frequency of PRIDE: " + map.get("pride"));
           System.out.println("Frequency of PREJUDICE: " + map.get("prejudice"));
       }
   }
}

Remember to put FileOperation.java and pride-and-prejudice.txt in the project. Running the test gives:

Pride and Prejudice
Total words: 125901
Total different words: 6530
Frequency of PRIDE: 53
Frequency of PREJUDICE: 11

VII. BST-Based Map Implementation

BSTMap.java

import java.util.ArrayList;

public class BSTMap<K extends Comparable<K>, V> implements Map<K, V> {
    private class Node {
        public K key;
        public V value;
        public Node left, right;

        public Node(K key, V value) {
            this.key = key;
            this.value = value;
            this.left = null;
            this.right = null;
        }
    }

    private Node root;
    private int size;

    public BSTMap() {
        root = null;
        size = 0;
    }

    @Override
    public int getSize() {
        return size;
    }

    @Override
    public boolean isEmpty() {
        return size == 0;
    }


    @Override
    // Adds the entry (key, value) to the BST
    public void add(K key, V value) {

        root = add(root, key, value);
    }

    // Inserts (key, value) into the BST rooted at node (recursive)
    // Returns the root of the BST after the insertion
    private Node add(Node node, K key, V value) {
        if (node == null) {
            size++;
            return new Node(key, value);
        }
        if (key.compareTo(node.key) < 0) {
            node.left = add(node.left, key, value);
        } else if (key.compareTo(node.key) > 0) {
            node.right = add(node.right, key, value);
        } else {// key.compareTo(node.key)==0
            node.value = value;
        }

        return node;
    }

    // Returns the node holding key in the BST rooted at node, or null if absent
    private Node getNode(Node node, K key) {
        if (node == null) {
            return null;
        }

        if (key.compareTo(node.key) == 0) {
            return node;
        } else if (key.compareTo(node.key) < 0) {
            return getNode(node.left, key);
        } else {
            return getNode(node.right, key);
        }

    }

    @Override
    public boolean contains(K key) {
        return getNode(root, key) != null;
    }

    @Override
    public V get(K key) {
        Node node = getNode(root, key);
        return node == null ? null : node.value;
    }

    @Override
    public void set(K key, V newValue) {
        Node node = getNode(root, key);
        if (node == null) {
            throw new IllegalArgumentException(key + " doesn't exist!");
        }

        node.value = newValue;
    }


    // Removes the node whose key is key from the BST and returns its value
    @Override
    public V remove(K key) {

        Node node = getNode(root, key);
        if(node != null){
            root = remove(root, key);
            return node.value;
        }

        return null;
    }

    // Removes the node whose key is key from the BST rooted at node (recursive)
    // Returns the root of the BST after the removal
    private Node remove(Node node, K key) {
        if (node == null) {
            return null;
        }
        if (key.compareTo(node.key) < 0) {
            node.left = remove(node.left, key);
            return node;   // without this return, control would fall through to the successor logic
        } else if (key.compareTo(node.key) > 0) {
            node.right = remove(node.right, key);
            return node;
        } else {   // key equals node.key, so node is the one to remove
            if (node.left == null) { // the node to remove has no left subtree
                Node rightNode = node.right;
                node.right = null;
                size--;
                return rightNode;
            }

            if (node.right == null) {  // the node to remove has no right subtree
                Node leftNode = node.left;
                node.left = null;
                size--;
                return leftNode;
            }

            // The node to remove has both subtrees
            // Find the smallest node larger than it, i.e. the minimum of its right subtree,
            // and put that node in the removed node's place
            Node successor = minimum(node.right);
            successor.right = removeMin(node.right);   // removeMin already performs size--
            successor.left = node.left;

            node.left = node.right = null;
            return successor;
        }
    }


    // Returns the smallest key in the BST
    public K minimum() {
        if (size == 0) {
            throw new IllegalArgumentException("BST is empty!");
        }

        return minimum(root).key;
    }

    // Returns the node holding the smallest key in the BST rooted at node
    private Node minimum(Node node) {
        if (node.left == null) {
            return node;
        }
        return minimum(node.left);
    }


    // Returns the largest key in the BST
    public K maximum() {
        if (size == 0) {
            throw new IllegalArgumentException("BST is empty!");
        }

        return maximum(root).key;
    }

    // Returns the node holding the largest key in the BST rooted at node
    private Node maximum(Node node) {
        if (node.right == null) {
            return node;
        }
        return maximum(node.right);
    }
    // Removes the node holding the smallest key and returns that key
    public K removeMin() {
        K ret = minimum();
        root = removeMin(root);
        return ret;
    }

    // Removes the smallest node of the BST rooted at node and returns the new root
    private Node removeMin(Node node) {
        if (node.left == null) {
            Node rightNode = node.right;
            node.right = null;
            size--;
            return rightNode;
        }

        node.left = removeMin(node.left);
        return node;
    }

    // Removes the node holding the largest key and returns that key
    public K removeMax() {
        K ret = maximum();
        root = removeMax(root);
        return ret;
    }

    // Removes the largest node of the BST rooted at node and returns the new root
    private Node removeMax(Node node) {
        if (node.right == null) {
            Node leftNode = node.left;
            node.left = null;
            size--;
            return leftNode;
        }

        node.right = removeMax(node.right);
        return node;
    }



    // Test
    public static void main(String[] args) {
        System.out.println("Pride and Prejudice");

        ArrayList<String> words = new ArrayList<>();
        if (FileOperation.readFile("pride-and-prejudice.txt", words)) {
            System.out.println("Total words: " + words.size());

            BSTMap<String, Integer> map = new BSTMap<>();
            for (String word : words) {
                if (map.contains(word))
                    map.set(word, map.get(word) + 1);
                else
                    map.add(word, 1);
            }

            System.out.println("Total different words: " + map.getSize());
            System.out.println("Frequency of PRIDE: " + map.get("pride"));
            System.out.println("Frequency of PREJUDICE: " + map.get("prejudice"));
        }
    }

}

Output (noticeably faster than LinkedListMap):

Pride and Prejudice
Total words: 125901
Total different words: 6530
Frequency of PRIDE: 53
Frequency of PREJUDICE: 11
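
The word-count test never calls remove, so removal deserves a check of its own, especially for a node with two children. The sketch below is a compact stand-in rather than BSTMap itself: RemoveSketch, detachMin and keys are hypothetical names, keys are ints, and size is decremented once in remove instead of inside a removeMin helper. It removes a leaf that is smaller than the root, then the root itself, and prints the remaining keys in order.

```java
import java.util.ArrayList;
import java.util.List;

// Compact stand-in for a BST map (illustration only, int keys).
class RemoveSketch {
    private static final class Node {
        int key; String value; Node left, right;
        Node(int key, String value) { this.key = key; this.value = value; }
    }

    private Node root;
    private int size;

    int getSize() { return size; }

    void add(int key, String value) { root = add(root, key, value); }

    private Node add(Node node, int key, String value) {
        if (node == null) { size++; return new Node(key, value); }
        if (key < node.key) node.left = add(node.left, key, value);
        else if (key > node.key) node.right = add(node.right, key, value);
        else node.value = value;
        return node;
    }

    void remove(int key) { root = remove(root, key); }

    private Node remove(Node node, int key) {
        if (node == null) return null;                       // key not present
        if (key < node.key) { node.left = remove(node.left, key); return node; }
        if (key > node.key) { node.right = remove(node.right, key); return node; }
        size--;                                              // node itself is removed
        if (node.left == null) return node.right;
        if (node.right == null) return node.left;
        // Two children: the successor is the minimum of the right subtree
        Node successor = node.right;
        while (successor.left != null) successor = successor.left;
        successor.right = detachMin(node.right);
        successor.left = node.left;
        return successor;
    }

    // Unlinks the minimum node of the subtree and returns the subtree's new root
    private Node detachMin(Node node) {
        if (node.left == null) return node.right;
        node.left = detachMin(node.left);
        return node;
    }

    // Keys in ascending order (in-order traversal)
    List<Integer> keys() {
        List<Integer> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private void collect(Node node, List<Integer> out) {
        if (node == null) return;
        collect(node.left, out);
        out.add(node.key);
        collect(node.right, out);
    }

    public static void main(String[] args) {
        RemoveSketch map = new RemoveSketch();
        for (int k : new int[]{5, 3, 8, 2, 4, 7, 9})
            map.add(k, "v" + k);
        map.remove(2);   // a leaf smaller than the root
        map.remove(5);   // the root, which has two children
        System.out.println(map.keys() + " size=" + map.getSize());   // prints [3, 4, 7, 8, 9] size=5
    }
}
```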

VIII. Complexity Analysis of Maps

Test code, Main.java:

import java.util.ArrayList;

public class Main {

    private static double testMap(Map<String, Integer> map, String filename){
        long startTime = System.nanoTime();

        System.out.println(filename);
        ArrayList<String> words = new ArrayList<>();
        if(FileOperation.readFile(filename, words)){
            System.out.println("Total words: " + words.size());

            for(String word: words){
                if(map.contains(word)){
                    map.set(word, map.get(word) + 1);
                }
                else{
                    map.add(word, 1);
                }
            }


            System.out.println("Total different words: " + map.getSize());
            System.out.println("Frequency of PRIDE: " + map.get("pride"));
            System.out.println("Frequency of PREJUDICE: "+map.get("prejudice"));
        }
        long endTime = System.nanoTime();

        return (endTime - startTime) / 1000000000.0;   // divide by a double to keep fractional seconds
    }

    public static void main(String[] args) {
        String filename = "pride-and-prejudice.txt";

        BSTMap<String, Integer> bstMap = new BSTMap<>();
        double time1 = testMap(bstMap, filename);
        System.out.println("BST Map: " + time1 + " s");

        System.out.println();

        LinkedListMap<String, Integer> linkedListMap = new LinkedListMap<>();
        double time2 = testMap(linkedListMap, filename);
        System.out.println("LinkedList Map: " + time2 + " s");
    }
}

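Test results; BSTMap is far faster. The times are truncated to whole seconds because the run that produced them divided the elapsed nanoseconds by the integer 1000000000: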

pride-and-prejudice.txt
Total words: 125901
Total different words: 6530
Frequency of PRIDE: 53
Frequency of PREJUDICE: 11
BST Map: 0.0 s

pride-and-prejudice.txt
Total words: 125901
Total different words: 6530
Frequency of PRIDE: 53
Frequency of PREJUDICE: 11
LinkedList Map: 13.0 s

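The time complexities match those of the corresponding set implementations: O(n) per operation for LinkedListMap, and O(h) for BSTMap, which is O(log n) on average and O(n) in the worst case.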



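IX. More Set and Map Problems on LeetCode

Two problems give a better feel for when to use a set and when to use a map.

Problem 349: Intersection of Two Arrays

Given two arrays, write a function to compute their intersection.

Example:


 Given nums1 = [1, 2, 2, 1], nums2 = [2, 2], return [2].

Note:

Each element in the result must be unique.
The result may be in any order.

Solution: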

import java.util.ArrayList;
import java.util.TreeSet;


class Solution {
    public int[] intersection(int[] nums1, int[] nums2) {
        TreeSet<Integer> set = new TreeSet<>();

        for(int num: nums1){
            set.add(num);
        }

        ArrayList<Integer> list = new ArrayList<>();
        for(int num: nums2){
            if(set.contains(num)){
                list.add(num);
                set.remove(num);  // once reported, the element leaves the set, so it appears in the result only once
            }
        }

        int[] res = new int[list.size()];
        for(int i = 0; i < list.size(); i++){
            res[i] = list.get(i);
        }

        return res;
    }
}
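
The same result can also be expressed with the standard library's retainAll, which keeps only the elements of one set that are also present in another. This is an alternative sketch, not the solution above; IntersectionSketch is a hypothetical class name.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

class IntersectionSketch {

    static int[] intersection(int[] nums1, int[] nums2) {
        Set<Integer> a = new TreeSet<>();
        for (int n : nums1) a.add(n);
        Set<Integer> b = new TreeSet<>();
        for (int n : nums2) b.add(n);

        a.retainAll(b);   // a now holds only the values present in both arrays

        int[] res = new int[a.size()];
        int i = 0;
        for (int n : a) res[i++] = n;
        return res;
    }

    public static void main(String[] args) {
        int[] res = intersection(new int[]{1, 2, 2, 1}, new int[]{2, 2});
        System.out.println(Arrays.toString(res));   // prints [2]
    }
}
```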

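Problem 350: Intersection of Two Arrays II

Given two arrays, write a method to compute their intersection.

Example:
Given nums1 = [1, 2, 2, 1], nums2 = [2, 2], return [2, 2].

Note:

   Each element in the result should appear as many times as it appears in both arrays.
   The result may be in any order.
Follow-up:

What if the given arrays are already sorted? How would you optimize your algorithm?
What if nums1 is much smaller than nums2? Which approach is better?
What if the elements of nums2 are stored on disk and memory is limited, so that you cannot load all of them at once?

A map works here: the key is the number and the value is how many times it occurs.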

import java.util.TreeMap;
import java.util.ArrayList;

public class Solution {
    public int[] intersect(int[] nums1, int[] nums2) {
        TreeMap<Integer, Integer> map = new TreeMap<>();
        for (int num : nums1) {
            if (!map.containsKey(num)) {
                map.put(num, 1);
            } else {
                map.put(num, map.get(num) + 1);
            }
        }

        ArrayList<Integer> list = new ArrayList<>();
        for(int num: nums2){
            if(map.containsKey(num)){
                list.add(num);
                map.put(num, map.get(num) - 1);
                if(map.get(num) == 0){
                    map.remove(num);
                }
            }
        }
        
        int[] res = new int[list.size()];
        for(int i = 0; i < list.size(); i++){
            res[i] = list.get(i);
        }
        
        return res;
    }
}
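
For the first follow-up question (both arrays already sorted), one possible approach is a two-pointer merge that needs no map at all. The sketch below is offered under that assumption and is not part of the original solution; SortedIntersectSketch and intersectSorted are hypothetical names. It runs in O(m + n) time on sorted input.

```java
import java.util.Arrays;

class SortedIntersectSketch {

    // Assumes both arrays are sorted in ascending order
    static int[] intersectSorted(int[] nums1, int[] nums2) {
        int[] buf = new int[Math.min(nums1.length, nums2.length)];
        int i = 0, j = 0, k = 0;
        while (i < nums1.length && j < nums2.length) {
            if (nums1[i] < nums2[j]) {
                i++;                      // nums1[i] cannot match anything further right in nums2
            } else if (nums1[i] > nums2[j]) {
                j++;
            } else {
                buf[k++] = nums1[i];      // one matched pair contributes one output element
                i++;
                j++;
            }
        }
        return Arrays.copyOf(buf, k);
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 2, 1};
        int[] b = {2, 2};
        Arrays.sort(a);   // the example input is unsorted, so sort it first
        Arrays.sort(b);
        System.out.println(Arrays.toString(intersectSorted(a, b)));   // prints [2, 2]
    }
}
```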
