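I. Set basics and a BST-based set implementation
Every operation of the set interface can be implemented directly on top of the binary search tree (BST) from the previous post.
Implementation
Create a new project named Set and copy in the BST.java completed in the previous post: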
.
├── Set.iml
└── src
    ├── BST.java
    ├── BSTSet.java
    ├── Main.java
    └── Set.java
The interface, Set.java:
public interface Set<E> {
void add(E e);
void remove(E e);
boolean contains(E e);
int getSize();
boolean isEmpty();
}
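To make the contract concrete before implementing it: a set never stores the same element twice, so adding a duplicate leaves its size unchanged. The sketch below uses java.util.TreeSet from the standard library as a stand-in (an assumption for illustration; it is not the BSTSet built in this post), and the class name SetContractDemo is hypothetical.

```java
import java.util.TreeSet;

// Hypothetical demo class; java.util.TreeSet stands in for the Set interface above.
class SetContractDemo {
    // Adds every word to a set and returns how many distinct words there were.
    static int countDistinct(String[] words) {
        TreeSet<String> set = new TreeSet<>();
        for (String word : words) {
            set.add(word); // adding an element that is already present changes nothing
        }
        return set.size();
    }

    public static void main(String[] args) {
        String[] words = {"pride", "and", "prejudice", "and", "pride"};
        System.out.println(countDistinct(words)); // prints 3
    }
}
```

This is the same pattern the tests later in the post follow: push every word of a text into the set, then read its size.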
BST.java is the binary search tree completed in the previous post.
BSTSet.java
public class BSTSet<E extends Comparable<E>> implements Set<E> {
private BST<E> bst;
public BSTSet(){
bst = new BST<>();
}
@Override
public int getSize(){
return bst.size();
}
@Override
public boolean isEmpty(){
return bst.isEmpty();
}
@Override
public void add(E e){
bst.add(e);
}
@Override
public boolean contains(E e){
return bst.contains(e);
}
@Override
public void remove(E e){
bst.remove(e);
}
}
Add FileOperation.java, a class that reads a file and splits it into words:
import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.Scanner;
import java.util.Locale;
import java.io.File;
import java.io.BufferedInputStream;
import java.io.IOException;
// File-related operations
public class FileOperation {
// Read the file named filename and put every word it contains into words
public static boolean readFile(String filename, ArrayList<String> words){
if (filename == null || words == null){
System.out.println("filename is null or words is null");
return false;
}
// Open the file
Scanner scanner;
try {
File file = new File(filename);
if(file.exists()){
FileInputStream fis = new FileInputStream(file);
scanner = new Scanner(new BufferedInputStream(fis), "UTF-8");
scanner.useLocale(Locale.ENGLISH);
}
else
return false;
}
catch(IOException ioe){
System.out.println("Cannot open " + filename);
return false;
}
// Simple tokenization
// This approach is fairly crude and ignores many of the special cases of real text processing
// It is only meant for this demo
if (scanner.hasNextLine()) {
String contents = scanner.useDelimiter("\\A").next();
int start = firstCharacterIndex(contents, 0);
for (int i = start + 1; i <= contents.length(); )
if (i == contents.length() || !Character.isLetter(contents.charAt(i))) {
String word = contents.substring(start, i).toLowerCase();
words.add(word);
start = firstCharacterIndex(contents, i);
i = start + 1;
} else
i++;
}
return true;
}
// Find the index of the first letter in s at or after position start
private static int firstCharacterIndex(String s, int start){
for( int i = start ; i < s.length() ; i ++ )
if( Character.isLetter(s.charAt(i)) )
return i;
return s.length();
}
}
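To see what this tokenizer actually produces, the splitting rule can be tried on a short in-memory string. The sketch below re-implements the same loop without any file handling; the names TokenizeDemo, tokenize, and firstLetterIndex are hypothetical, and Locale.ROOT is passed to toLowerCase (unlike the original) only to keep the output independent of the machine's locale.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class: the same splitting rule as readFile, applied to a string.
class TokenizeDemo {
    // Index of the first letter in s at or after start, or s.length() if there is none.
    static int firstLetterIndex(String s, int start) {
        for (int i = start; i < s.length(); i++) {
            if (Character.isLetter(s.charAt(i))) {
                return i;
            }
        }
        return s.length();
    }

    // Splits contents into lowercase runs of letters; every other character is a separator.
    static List<String> tokenize(String contents) {
        List<String> words = new ArrayList<>();
        int start = firstLetterIndex(contents, 0);
        for (int i = start + 1; i <= contents.length(); ) {
            if (i == contents.length() || !Character.isLetter(contents.charAt(i))) {
                words.add(contents.substring(start, i).toLowerCase(Locale.ROOT));
                start = firstLetterIndex(contents, i);
                i = start + 1;
            } else {
                i++;
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // The apostrophe is not a letter, so "It's" is split into two words.
        System.out.println(tokenize("Hello, world! It's me."));
        // prints [hello, world, it, s, me]
    }
}
```

Because contractions and hyphenated words are split this way, the word counts reported below depend on this particular rule.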
Find an e-book copy of Pride and Prejudice, pride-and-prejudice.txt, place it in the project directory, and run a test.
Main.java
import java.util.ArrayList;
public class Main {
public static void main(String[] args){
System.out.println("Pride and Prejudice");
ArrayList<String> words1 = new ArrayList<>();
FileOperation.readFile("pride-and-prejudice.txt", words1);
System.out.println("Total words: " + words1.size());
BSTSet<String> set1 = new BSTSet<>();
for(String word: words1){
set1.add(word);
}
System.out.println("Total different words: "+set1.getSize());
}
}
Output:
Pride and Prejudice
Total words: 125901
Total different words: 6530
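II. A linked-list-based set implementation
Add the LinkedList.java we implemented earlier to the project. Alternatively, import java.util.LinkedList where it is used; in that case getSize() and removeElement(e) become size() and remove(e).
Create LinkedListSet.java: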
public class LinkedListSet<E> implements Set<E> {
private LinkedList<E> list;
public LinkedListSet(){
list = new LinkedList<>();
}
@Override
public int getSize(){
return list.getSize();
}
@Override
public boolean isEmpty(){
return list.isEmpty();
}
@Override
public boolean contains(E e){
return list.contains(e);
}
@Override
public void add(E e){
if(!list.contains(e))
list.addFirst(e);
}
@Override
public void remove(E e){
list.removeElement(e);
}
}
Test it in Main.java:
import java.util.ArrayList;
public class Main {
public static void main(String[] args){
System.out.println("Pride and Prejudice");
ArrayList<String> words1 = new ArrayList<>();
FileOperation.readFile("pride-and-prejudice.txt", words1);
System.out.println("Total words: " + words1.size());
LinkedListSet<String> set2 = new LinkedListSet<>();
for(String word: words1){
set2.add(word);
}
System.out.println("Total different words: "+set2.getSize());
}
}
The output is correct, but the program runs much slower than with the BST-based Set:
Pride and Prejudice
Total words: 125901
Total different words: 6530
III. Complexity analysis of the set classes
Main.java
import java.util.ArrayList;
public class Main {
private static double testSet(Set<String> set, String filename) {
long startTime = System.nanoTime();
System.out.println(filename);
ArrayList<String> words1 = new ArrayList<>();
if (FileOperation.readFile(filename, words1)) {
System.out.println("Total words: " + words1.size());
for (String word : words1) {
set.add(word);
}
System.out.println("Total different words: " + set.getSize());
}
long endTime = System.nanoTime();
return (endTime - startTime) / 1000000000.0;
}
public static void main(String[] args) {
String filename = "pride-and-prejudice.txt";
BSTSet<String> bstset = new BSTSet<>();
double time1 = testSet(bstset, filename);
System.out.println("BST Set: " + time1 + " s");
LinkedListSet<String> linkedlistset = new LinkedListSet<>();
double time2 = testSet(linkedlistset, filename);
System.out.println("LinkedList Set: " + time2 + " s");
}
}
Output; BSTSet is clearly faster:
pride-and-prejudice.txt
Total words: 125901
Total different words: 6530
BST Set: 0.34745494 s
pride-and-prejudice.txt
Total words: 125901
Total different words: 6530
LinkedList Set: 2.789588019 s
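1. Time complexity of LinkedListSet:
add, contains, and remove are all O(n). Even add has to scan the whole list first to check for a duplicate.
2. Time complexity of BSTSet:
add, contains, and remove are all O(h), where h is the height of the tree.
Summary
For n elements, h is about log n when the tree is well balanced and n when it is not. The BSTSet operations are therefore O(log n) on average and O(n) in the worst case, while the LinkedListSet operations are always O(n).
3. Limitations of BSTSet
- A BST has a best case and a worst case
- Best case: a full binary search tree
- Worst case: every node has only a left child or only a right child, so the tree degenerates into a linked list.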
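The gap between the two cases is easy to observe by inserting the same keys in sorted and in shuffled order and comparing the resulting tree heights. The following is a minimal sketch under stated assumptions: BstHeightDemo is a hypothetical, stripped-down integer BST written only for this measurement, not the BST.java used in this post, and the seed 42 is arbitrary.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical demo class: a bare integer BST, just enough to measure its height.
class BstHeightDemo {
    private static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    private Node root;

    // Iterative insert; duplicates are ignored, as in a set.
    void add(int key) {
        if (root == null) {
            root = new Node(key);
            return;
        }
        Node cur = root;
        while (true) {
            if (key < cur.key) {
                if (cur.left == null) { cur.left = new Node(key); return; }
                cur = cur.left;
            } else if (key > cur.key) {
                if (cur.right == null) { cur.right = new Node(key); return; }
                cur = cur.right;
            } else {
                return; // key already present
            }
        }
    }

    // Number of nodes on the longest root-to-leaf path (0 for an empty tree).
    int height() { return height(root); }

    private static int height(Node node) {
        if (node == null) return 0;
        return 1 + Math.max(height(node.left), height(node.right));
    }

    // Builds a tree from the keys in the given order and returns its height.
    static int heightAfterInserting(List<Integer> keys) {
        BstHeightDemo tree = new BstHeightDemo();
        for (int key : keys) tree.add(key);
        return tree.height();
    }

    public static void main(String[] args) {
        List<Integer> keys = new ArrayList<>();
        for (int i = 1; i <= 1000; i++) keys.add(i);
        // Sorted input: every node has only a right child, so the height equals the node count.
        System.out.println("sorted:   " + heightAfterInserting(keys));
        Collections.shuffle(keys, new Random(42));
        // Shuffled input: the height is usually a small multiple of log2(1000), which is about 10.
        System.out.println("shuffled: " + heightAfterInserting(keys));
    }
}
```

With sorted input, each operation walks the whole chain, which is exactly the linked-list behavior described above.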
4. Conclusion
To stay in the best case at all times we can use a balanced binary tree, which a later post will cover.
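IV. Set problems on LeetCode
Problem 804:
International Morse Code defines a standard encoding in which each letter maps to a string of dots and dashes, for example: "a" maps to ".-", "b" maps to "-...", "c" maps to "-.-.", and so on.
For convenience, the full table for the 26 letters of the English alphabet is given below: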
[".-","-...","-.-.","-..",".","..-.","--.","....","..",".---","-.-",".-..","--","-.","---",".--.","--.-",".-.","...","-","..-","...-",".--","-..-","-.--","--.."]
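Given a list of words, each word can be written as the concatenation of the Morse code of each of its letters. For example, "cab" can be written as "-.-..--..." (the concatenation "-.-." + ".-" + "-..."). We call this concatenation the translation of a word.
Return the number of different translations among all the words in the list.
Example:
Input: words = ["gin", "zen", "gig", "msg"]
Output: 2
Explanation:
The translation of each word is: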
"gin" -> "--...-."
"zen" -> "--...-."
"gig" -> "--...--."
"msg" -> "--...--."
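There are 2 different translations, "--...-." and "--...--.".
Note:
The length of words is at most 100.
Each words[i] has a length in the range [1, 12].
Each words[i] consists of lowercase letters only.
Solution:
import java.util.TreeSet; // similar to our own BSTSet, but backed by a red-black tree and more full-featured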
class Solution {
public int uniqueMorseRepresentations(String[] words) {
String[] codes = {".-","-...","-.-.","-..",".","..-.","--.","....","..",".---","-.-",".-..","--","-.","---",".--.","--.-",".-.","...","-","..-","...-",".--","-..-","-.--","--.."};
TreeSet<String> set = new TreeSet<>();
for(String word: words){
StringBuilder res = new StringBuilder();
for(int i=0; i< word.length(); i++){
res.append(codes[word.charAt(i) - 'a']);
}
set.add(res.toString());
}
return set.size();
}
}
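Because the problem only asks how many distinct translations exist, the ordering that TreeSet maintains is not needed here. As a hedged alternative, the same idea can be sketched with java.util.HashSet; the class name MorseDemo is hypothetical, and the sample input is the one from the problem statement.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class: problem 804 with a hash-based set instead of a tree-based one.
class MorseDemo {
    private static final String[] CODES = {
        ".-", "-...", "-.-.", "-..", ".", "..-.", "--.", "....", "..", ".---",
        "-.-", ".-..", "--", "-.", "---", ".--.", "--.-", ".-.", "...", "-",
        "..-", "...-", ".--", "-..-", "-.--", "--.."
    };

    static int uniqueMorseRepresentations(String[] words) {
        Set<String> seen = new HashSet<>();
        for (String word : words) {
            StringBuilder sb = new StringBuilder();
            for (char c : word.toCharArray()) {
                sb.append(CODES[c - 'a']); // assumes lowercase a-z, as the problem guarantees
            }
            seen.add(sb.toString()); // duplicates are dropped by the set
        }
        return seen.size();
    }

    public static void main(String[] args) {
        String[] words = {"gin", "zen", "gig", "msg"};
        System.out.println(uniqueMorseRepresentations(words)); // prints 2
    }
}
```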
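V. Map basics
The Map concept
- A data structure that stores (key, value) pairs
- Looks up a value by its key
- Straightforward to implement with a linked list or a binary search tree
- The counterpart in Python is dict.
Interface design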
Map.java
public interface Map<K, V> {
void add(K key, V value);
V remove(K key);
boolean contains(K key);
V get(K key);
void set(K key, V newValue);
int getSize();
boolean isEmpty();
}
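As a rough orientation, these operations correspond to methods the standard library maps already offer: add and set both correspond to put, contains to containsKey, get to get, remove to remove, and getSize to size. One difference worth noting is that set in the interface above is meant for existing keys only, whereas put accepts any key. The sketch below counts word frequencies with java.util.TreeMap as a stand-in (an assumption for illustration, not one of the implementations in this post); MapContractDemo is a hypothetical name.

```java
import java.util.TreeMap;

// Hypothetical demo class; java.util.TreeMap stands in for the Map interface above.
class MapContractDemo {
    // Returns a map from each word to the number of times it occurs.
    static TreeMap<String, Integer> countWords(String[] words) {
        TreeMap<String, Integer> freq = new TreeMap<>();
        for (String word : words) {
            if (freq.containsKey(word)) {
                freq.put(word, freq.get(word) + 1); // plays the role of set(key, newValue)
            } else {
                freq.put(word, 1);                  // plays the role of add(key, value)
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        String[] words = {"pride", "and", "prejudice", "and", "pride", "and"};
        TreeMap<String, Integer> freq = countWords(words);
        System.out.println(freq.get("and"));     // prints 3
        System.out.println(freq.get("missing")); // prints null, there is no entry for this key
        System.out.println(freq.size());         // prints 3
    }
}
```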
VI. A linked-list-based map implementation
LinkedListMap.java
import java.util.ArrayList;
public class LinkedListMap<K, V> implements Map<K, V> {
private class Node {
public K key;
public V value;
public Node next;
public Node(K key, V value, Node next) {
this.key = key;
this.value = value;
this.next = next;
}
public Node(K key) {
this(key, null, null);
}
public Node() {
this(null, null, null);
}
@Override
public String toString() {
return key.toString() + " : " + value.toString();
}
}
private Node dummyHead;
private int size;
public LinkedListMap() {
dummyHead = new Node();
size = 0;
}
@Override
public int getSize() {
return size;
}
@Override
public boolean isEmpty() {
return size == 0;
}
private Node getNode(K key) {
Node cur = dummyHead.next;
while (cur != null) {
if (cur.key.equals(key))
return cur;
cur = cur.next;
}
return null;
}
@Override
public boolean contains(K key) {
return getNode(key) != null;
}
@Override
public V get(K key) {
Node node = getNode(key);
return node == null ? null : node.value;
}
@Override
public void add(K key, V value) {
Node node = getNode(key);
if (node == null) {
dummyHead.next = new Node(key, value, dummyHead.next);
size++;
} else {
node.value = value;
}
}
@Override
public void set(K key, V value) {
Node node = getNode(key);
if (node == null) {
throw new IllegalArgumentException(key + " doesn't exist!");
}
node.value = value;
}
@Override
public V remove(K key) {
Node prev = dummyHead;
while (prev.next != null) {
if (prev.next.key.equals(key)) {
break;
}
prev = prev.next;
}
if (prev.next != null) {
Node delNode = prev.next;
prev.next = delNode.next;
delNode.next = null;
size--;
return delNode.value;
}
return null; // no node with this key exists
}
// Test
public static void main(String[] args) {
System.out.println("Pride and Prejudice");
ArrayList<String> words = new ArrayList<>();
if (FileOperation.readFile("pride-and-prejudice.txt", words)) {
System.out.println("Total words: " + words.size());
LinkedListMap<String, Integer> map = new LinkedListMap<>();
for (String word : words) {
if (map.contains(word))
map.set(word, map.get(word) + 1);
else
map.add(word, 1);
}
System.out.println("Total different words: " + map.getSize());
System.out.println("Frequency of PRIDE: " + map.get("pride"));
System.out.println("Frequency of PREJUDICE: " + map.get("prejudice"));
}
}
}
Remember to put FileOperation.java and pride-and-prejudice.txt in the project. Running the test gives:
Pride and Prejudice
Total words: 125901
Total different words: 6530
Frequency of PRIDE: 53
Frequency of PREJUDICE: 11
VII. A BST-based map implementation
BSTMap.java
import java.util.ArrayList;
public class BSTMap<K extends Comparable<K>, V> implements Map<K, V> {
private class Node {
public K key;
public V value;
public Node left, right;
public Node(K key, V value) {
this.key = key;
this.value = value;
this.left = null;
this.right = null;
}
}
private Node root;
private int size;
public BSTMap() {
root = null;
size = 0;
}
@Override
public int getSize() {
return size;
}
@Override
public boolean isEmpty() {
return size == 0;
}
@Override
// Add a new entry (key, value) to the binary search tree
public void add(K key, V value) {
root = add(root, key, value);
}
// Insert the entry (key, value) into the BST rooted at node (recursive)
// Returns the root of the BST after the insertion
private Node add(Node node, K key, V value) {
if (node == null) {
size++;
return new Node(key, value);
}
if (key.compareTo(node.key) < 0) {
node.left = add(node.left, key, value);
} else if (key.compareTo(node.key) > 0) {
node.right = add(node.right, key, value);
} else {// key.compareTo(node.key)==0
node.value = value;
}
return node;
}
// Return the node holding key in the BST rooted at node, or null if there is none
private Node getNode(Node node, K key) {
if (node == null) {
return null;
}
if (key.compareTo(node.key) == 0) {
return node;
} else if (key.compareTo(node.key) < 0) {
return getNode(node.left, key);
} else {
return getNode(node.right, key);
}
}
@Override
public boolean contains(K key) {
return getNode(root, key) != null;
}
@Override
public V get(K key) {
Node node = getNode(root, key);
return node == null ? null : node.value;
}
@Override
public void set(K key, V newValue) {
Node node = getNode(root, key);
if (node == null) {
throw new IllegalArgumentException(key + " doesn't exist!");
}
node.value = newValue;
}
// Remove the node whose key is key from the binary search tree and return its value
@Override
public V remove(K key) {
Node node = getNode(root, key);
if(node != null){
root = remove(root, key);
return node.value;
}
return null;
}
// Remove the node whose key is key from the BST rooted at node (recursive)
// Returns the root of the BST after the removal
private Node remove(Node node, K key) {
if (node == null) {
return null;
}
if (key.compareTo(node.key) < 0) {
node.left = remove(node.left, key);
return node; // return here so the successor logic below runs only for the node being removed
} else if (key.compareTo(node.key) > 0) {
node.right = remove(node.right, key);
return node;
} else { // key equals node.key, so node is the one to remove
if (node.left == null) { // the node to remove has no left subtree
Node rightNode = node.right;
node.right = null;
size--;
return rightNode;
}
if (node.right == null) { // the node to remove has no right subtree
Node leftNode = node.left;
node.left = null;
size--;
return leftNode;
}
// Both subtrees are non-empty:
// find the smallest node larger than the one being removed, i.e. the minimum of its right subtree,
// and put that node in the removed node's place
Node successor = minimum(node.right);
successor.right = removeMin(node.right); // removeMin already performs size--
successor.left = node.left;
node.left = node.right = null;
return successor;
}
}
// Find the smallest key in the binary search tree
public K minimum() {
if (size == 0) {
throw new IllegalArgumentException("BST is empty!");
}
return minimum(root).key;
}
// Return the node holding the smallest key in the BST rooted at node
private Node minimum(Node node) {
if (node.left == null) {
return node;
}
return minimum(node.left);
}
// Find the largest key in the binary search tree
public K maximum() {
if (size == 0) {
throw new IllegalArgumentException("BST is empty!");
}
return maximum(root).key;
}
// Return the node holding the largest key in the BST rooted at node
private Node maximum(Node node) {
if (node.right == null) {
return node;
}
return maximum(node.right);
}
// Remove the node holding the smallest key and return that key
public K removeMin() {
K ret = minimum();
root = removeMin(root);
return ret;
}
// Remove the smallest node from the BST rooted at node; returns the root after the removal
private Node removeMin(Node node) {
if (node.left == null) {
Node rightNode = node.right;
node.right = null;
size--;
return rightNode;
}
node.left = removeMin(node.left);
return node;
}
// Remove the node holding the largest key and return that key
public K removeMax() {
K ret = maximum();
root = removeMax(root);
return ret;
}
// Remove the largest node from the BST rooted at node; returns the root after the removal
private Node removeMax(Node node) {
if (node.right == null) {
Node leftNode = node.left;
node.left = null;
size--;
return leftNode;
}
node.right = removeMax(node.right);
return node;
}
// Test
public static void main(String[] args) {
System.out.println("Pride and Prejudice");
ArrayList<String> words = new ArrayList<>();
if (FileOperation.readFile("pride-and-prejudice.txt", words)) {
System.out.println("Total words: " + words.size());
BSTMap<String, Integer> map = new BSTMap<>();
for (String word : words) {
if (map.contains(word))
map.set(word, map.get(word) + 1);
else
map.add(word, 1);
}
System.out.println("Total different words: " + map.getSize());
System.out.println("Frequency of PRIDE: " + map.get("pride"));
System.out.println("Frequency of PREJUDICE: " + map.get("prejudice"));
}
}
}
Output (noticeably faster than LinkedListMap):
Pride and Prejudice
Total words: 125901
Total different words: 6530
Frequency of PRIDE: 53
Frequency of PREJUDICE: 11
VIII. Complexity analysis of the maps
Test code, Main.java:
import java.util.ArrayList;
public class Main {
private static double testMap(Map<String, Integer> map, String filename){
long startTime = System.nanoTime();
System.out.println(filename);
ArrayList<String> words = new ArrayList<>();
if(FileOperation.readFile(filename, words)){
System.out.println("Total words: " + words.size());
for(String word: words){
if(map.contains(word)){
map.set(word, map.get(word) + 1);
}
else{
map.add(word, 1);
}
}
System.out.println("Total different words: " + map.getSize());
System.out.println("Frequency of PRIDE: " + map.get("pride"));
System.out.println("Frequency of PREJUDICE: "+map.get("prejudice"));
}
long endTime = System.nanoTime();
return (endTime - startTime) / 1000000000.0; // divide by a double so fractional seconds are kept
}
public static void main(String[] args) {
String filename = "pride-and-prejudice.txt";
BSTMap<String, Integer> bstMap = new BSTMap<>();
double time1 = testMap(bstMap, filename);
System.out.println("BST Map: " + time1 + " s");
System.out.println();
LinkedListMap<String, Integer> linkedListMap = new LinkedListMap<>();
double time2 = testMap(linkedListMap, filename);
System.out.println("LinkedList Map: " + time2 + " s");
}
}
Test results; BSTMap is far faster (the times shown are truncated to whole seconds):
pride-and-prejudice.txt
Total words: 125901
Total different words: 6530
Frequency of PRIDE: 53
Frequency of PREJUDICE: 11
BST Map: 0.0 s
pride-and-prejudice.txt
Total words: 125901
Total different words: 6530
Frequency of PRIDE: 53
Frequency of PREJUDICE: 11
LinkedList Map: 13.0 s
The time complexities are the same as for the sets.
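IX. More set and map problems on LeetCode
These two problems give a better feel for when to use a set and when to use a map.
Problem 349: Intersection of Two Arrays
Given two arrays, write a function to compute their intersection.
Example:
Given nums1 = [1, 2, 2, 1], nums2 = [2, 2], return [2].
Note:
Each element in the result must be unique.
The result can be in any order.
Solution: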
import java.util.ArrayList;
import java.util.TreeSet;
class Solution {
public int[] intersection(int[] nums1, int[] nums2) {
TreeSet<Integer> set = new TreeSet<>();
for(int num: nums1){
set.add(num);
}
ArrayList<Integer> list = new ArrayList<>();
for(int num: nums2){
if(set.contains(num)){
list.add(num);
set.remove(num); // removing it from the set guarantees each common element is reported only once
}
}
int[] res = new int[list.size()];
for(int i = 0; i < list.size(); i++){
res[i] = list.get(i);
}
return res;
}
}
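An alternative way to express the same intersection is to build a set from each array and let the standard library do the filtering with retainAll. This is a sketch rather than a replacement for the solution above; IntersectionDemo is a hypothetical name, and HashSet is used because the result order does not matter.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class: problem 349 via Set.retainAll.
class IntersectionDemo {
    static int[] intersection(int[] nums1, int[] nums2) {
        Set<Integer> first = new HashSet<>();
        for (int num : nums1) first.add(num);
        Set<Integer> second = new HashSet<>();
        for (int num : nums2) second.add(num);

        first.retainAll(second); // keep only the elements that also occur in second

        int[] res = new int[first.size()];
        int i = 0;
        for (int num : first) res[i++] = num;
        return res;
    }

    public static void main(String[] args) {
        int[] result = intersection(new int[]{1, 2, 2, 1}, new int[]{2, 2});
        System.out.println(Arrays.toString(result)); // prints [2]
    }
}
```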
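Problem 350: Intersection of Two Arrays II
Given two arrays, write a function to compute their intersection.
Example:
Given nums1 = [1, 2, 2, 1], nums2 = [2, 2], return [2, 2].
Note:
Each element in the result should appear as many times as it appears in both arrays.
The result can be in any order.
Follow-up:
What if the given arrays are already sorted? How would you optimize your algorithm?
What if nums1 is much smaller than nums2? Which approach is better?
What if the elements of nums2 are stored on disk and memory is so limited that you cannot load them all at once?
A map works here, with the number as the key and its occurrence count as the value: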
import java.util.TreeMap;
import java.util.ArrayList;
public class Solution {
public int[] intersect(int[] nums1, int[] nums2) {
TreeMap<Integer, Integer> map = new TreeMap<>();
for (int num : nums1) {
if (!map.containsKey(num)) {
map.put(num, 1);
} else {
map.put(num, map.get(num) + 1);
}
}
ArrayList<Integer> list = new ArrayList<>();
for(int num: nums2){
if(map.containsKey(num)){
list.add(num);
map.put(num, map.get(num) - 1);
if(map.get(num) == 0){
map.remove(num);
}
}
}
int[] res = new int[list.size()];
for(int i = 0; i < list.size(); i++){
res[i] = list.get(i);
}
return res;
}
}
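For the first follow-up question (the arrays are already sorted), one common approach is to walk both arrays with two indices and no map at all. The following is a minimal sketch under that assumption; SortedIntersectDemo and intersectSorted are hypothetical names, and the sample arrays are sorted first because the problem's example input is not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class: problem 350 for inputs that are already sorted.
class SortedIntersectDemo {
    // Assumes both arrays are sorted in ascending order.
    static int[] intersectSorted(int[] a, int[] b) {
        List<Integer> list = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;                 // a[i] cannot occur in the rest of b
            } else if (a[i] > b[j]) {
                j++;                 // b[j] cannot occur in the rest of a
            } else {
                list.add(a[i]);      // common element: consume one copy from each array
                i++;
                j++;
            }
        }
        int[] res = new int[list.size()];
        for (int k = 0; k < res.length; k++) res[k] = list.get(k);
        return res;
    }

    public static void main(String[] args) {
        int[] nums1 = {1, 2, 2, 1};
        int[] nums2 = {2, 2};
        Arrays.sort(nums1); // the sample input is not sorted, so sort it first
        Arrays.sort(nums2);
        System.out.println(Arrays.toString(intersectSorted(nums1, nums2))); // prints [2, 2]
    }
}
```

Each step advances at least one index, so the walk takes time linear in the combined length of the two arrays and needs no extra structure beyond the result list.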