【Lintcode】892. Alien Dictionary

Article link: https://blog.csdn.net/qq_46105170/article/details/106153734
Problem link:

https://www.lintcode.com/problem/alien-dictionary/description

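Given an array of strings that is sorted under some unknown ordering of the letters, recover that ordering and return it as a string. If no such ordering exists, return the empty string. If several valid orderings exist, return the one that is smallest in lexicographical order (here meaning the ordinary order of the English alphabet).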

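At its core this is a topological sort. A few special cases have to be handled first. If the array contains a single string, return its distinct characters in sorted order. If, for two adjacent strings, the later one is a proper prefix of the earlier one (for example "abc" followed by "ab"), no valid ordering exists, because when all compared positions match, the shorter string must come first. With the special cases out of the way, build a graph from the order of the strings: for each pair of adjacent strings, find the first position where they differ; if the characters there are $a$ (in the earlier string) and $b$ (in the later one), add a directed edge from $a$ to $b$. Finally, topologically sort the graph.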
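To make the edge rule concrete, here is a minimal sketch that only extracts the edges from adjacent words. The class name `AdjacentPairEdges`, the method `extractEdges`, and the `"a->b"` string encoding of an edge are hypothetical names chosen for illustration; the sample word list is likewise just an example input.

```java
import java.util.ArrayList;
import java.util.List;

class AdjacentPairEdges {
    // Returns the edges "a->b" derived from adjacent words, or null if some word
    // is immediately followed by one of its own proper prefixes (no valid order).
    static List<String> extractEdges(String[] words) {
        List<String> edges = new ArrayList<>();
        for (int i = 0; i + 1 < words.length; i++) {
            String w1 = words[i], w2 = words[i + 1];
            int len = Math.min(w1.length(), w2.length());
            int idx = 0;
            // Skip the common prefix of the two words
            while (idx < len && w1.charAt(idx) == w2.charAt(idx)) {
                idx++;
            }
            if (idx < len) {
                // First differing position: w1's character precedes w2's
                edges.add(w1.charAt(idx) + "->" + w2.charAt(idx));
            } else if (w2.length() < w1.length()) {
                // w2 is a proper prefix of w1 but is listed after it
                return null;
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        // prints [t->f, w->e, r->t, e->r]
        System.out.println(extractEdges(new String[]{"wrt", "wrf", "er", "ett", "rftt"}));
        // prints null
        System.out.println(extractEdges(new String[]{"abc", "ab"}));
    }
}
```

Only the first differing position of each adjacent pair yields a constraint; later positions say nothing about the ordering.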

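However, since the problem asks for the lexicographically smallest of all topological orders, DFS does not help here, because DFS cannot process the vertices layer by layer. BFS can: the BFS approach to topological sorting computes the in-degree of every vertex and uses a queue. Here the queue has to be a priority queue, so that among the vertices whose in-degree is $0$ the lexicographically smallest is always dequeued first. The code is as follows: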

import java.util.*;

public class Solution {
    /**
     * @param words: a list of words
     * @return: a string which is correct order
     */
    public String alienOrder(String[] words) {
        // Write your code here
        if (words == null || words.length == 0) {
            return "";
        }
        
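        // A single word imposes no constraints: return its distinct characters in sorted order
        if (words.length == 1) {
            TreeSet<Character> chars = new TreeSet<>();
            for (char ch : words[0].toCharArray()) {
                chars.add(ch);
            }
            StringBuilder single = new StringBuilder();
            for (char ch : chars) {
                single.append(ch);
            }
            return single.toString();
        }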
        
        // Build the graph from words
        Map<Character, Set<Character>> graph = buildGraph(words);
        // A null graph means no valid ordering exists; return the empty string
        if (graph == null) {
            return "";
        }
        
        // Compute the in-degree of every vertex
        Map<Character, Integer> indegrees = getIndegrees(graph);
        
        // Create a priority queue and enqueue every vertex whose in-degree is 0
        PriorityQueue<Character> pq = new PriorityQueue<>();
        for (Map.Entry<Character, Integer> entry : indegrees.entrySet()) {
            if (entry.getValue() == 0) {
                pq.offer(entry.getKey());
            }
        }
        
        StringBuilder sb = new StringBuilder();
        while (!pq.isEmpty()) {
            char cur = pq.poll();
            sb.append(cur);
            for (char next : graph.get(cur)) {
                indegrees.put(next, indegrees.get(next) - 1);
                if (indegrees.get(next) == 0) {
                    pq.offer(next);
                }
            }
        }
        // If the result is shorter than the number of characters, the graph has a cycle; return the empty string
        if (sb.length() != graph.size()) {
            return "";
        }
        
        return sb.toString();
    }
    
    private Map<Character, Integer> getIndegrees(Map<Character, Set<Character>> graph) {
        Map<Character, Integer> indegrees = new HashMap<>();
        for (Map.Entry<Character, Set<Character>> entry : graph.entrySet()) {
            indegrees.putIfAbsent(entry.getKey(), 0);
            for (char ch : entry.getValue()) {
                indegrees.put(ch, indegrees.getOrDefault(ch, 0) + 1);
            }
        }
        
        return indegrees;
    }
    
    private Map<Character, Set<Character>> buildGraph(String[] words) {
        Map<Character, Set<Character>> graph = new HashMap<>();
        // Add every character (i.e. every vertex) to the graph first
        for (int i = 0; i < words.length; i++) {
            for (int j = 0; j < words[i].length(); j++) {
                graph.putIfAbsent(words[i].charAt(j), new HashSet<>());
            }
        }
        
        for (int i = 0; i < words.length - 1; i++) {
            String w1 = words[i], w2 = words[i + 1];
            int idx = 0;
            while (idx < w1.length() && idx < w2.length()) {
                char c1 = w1.charAt(idx), c2 = w2.charAt(idx);
                if (c1 != c2) {
                    graph.get(c1).add(c2);
                    break;
                }
                idx++;
            }
            
            // This covers the case where all compared positions match but the later
            // word is shorter; no valid ordering exists, so return null instead of a graph
            if (idx == w2.length() && w2.length() < w1.length()) {
                return null;
            }
        }
        
        return graph;
    }
}

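The topological sort takes $O(V\log V+E)$ time, where $V$ is the number of distinct characters and $E$ the number of edges; building the graph additionally takes time linear in the total length of the input strings. Space is $O (V + E)$ .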

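Proof of correctness:
That the sequence produced by the algorithm is a topological order is clear. That it is the lexicographically smallest one follows by induction on the number of vertices. Every topological order must begin with a vertex of in-degree $0$, and the first vertex dequeued is the smallest such vertex, so it is the first character of the smallest topological order. Decrementing the in-degree of each of its neighbors amounts to deleting this vertex and all of its outgoing edges from the graph, which leaves a smaller instance of the same problem. By the induction hypothesis the algorithm produces the lexicographically smallest topological order of the remaining graph, so by induction the whole output is the smallest.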
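The greedy step in this argument can be checked on a tiny hand-built graph. The following is a minimal sketch, not the solution above: the class `SmallestTopoOrderDemo`, the method `smallestOrder`, and the three-vertex graph are hypothetical and chosen only for illustration. The graph has vertices a, b, c and a single edge c->a, so its topological orders are "bca", "cab" and "cba".

```java
import java.util.*;

class SmallestTopoOrderDemo {
    // Kahn's algorithm with a min-heap; returns "" if the graph contains a cycle.
    static String smallestOrder(Map<Character, List<Character>> graph) {
        Map<Character, Integer> indegree = new HashMap<>();
        for (Map.Entry<Character, List<Character>> e : graph.entrySet()) {
            indegree.putIfAbsent(e.getKey(), 0);
            for (char next : e.getValue()) {
                indegree.merge(next, 1, Integer::sum);
            }
        }
        // Start from all vertices that have no incoming edge
        PriorityQueue<Character> pq = new PriorityQueue<>();
        for (Map.Entry<Character, Integer> e : indegree.entrySet()) {
            if (e.getValue() == 0) {
                pq.offer(e.getKey());
            }
        }
        StringBuilder sb = new StringBuilder();
        while (!pq.isEmpty()) {
            char cur = pq.poll(); // smallest available vertex
            sb.append(cur);
            // "Delete" cur by lowering the in-degree of each of its neighbors
            for (char next : graph.getOrDefault(cur, Collections.emptyList())) {
                if (indegree.merge(next, -1, Integer::sum) == 0) {
                    pq.offer(next);
                }
            }
        }
        return sb.length() == indegree.size() ? sb.toString() : "";
    }

    public static void main(String[] args) {
        Map<Character, List<Character>> graph = new HashMap<>();
        graph.put('a', new ArrayList<>());
        graph.put('b', new ArrayList<>());
        graph.put('c', new ArrayList<>(Arrays.asList('a')));
        // prints bca
        System.out.println(smallestOrder(graph));
    }
}
```

At the start both b and c have in-degree 0, and the heap yields b first; a only becomes available after c has been removed, which gives "bca", the smallest of the three orders.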