Balanced Search Trees
2-3 search trees
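So let's first look at 2-3 search trees, which guarantee that the tree stays balanced and keep its height on the order of lg N. The 2 and 3 here refer to the number of children a node has.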
search
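Search works the same way as in a binary search tree. Some nodes now hold two keys, but that changes little: compare against both keys and follow the left, middle, or right link accordingly.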
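To make the search rule concrete, here is a minimal sketch in Java. The `Node` layout (two `int` keys plus a `three` flag) and the hand-built sample tree are assumptions for illustration only, not a full 2-3 tree implementation.

```java
// Sketch of search in a 2-3 tree; the node layout is a hypothetical one.
final class TwoThreeSearch {
    static final class Node {
        final int key1;           // smaller (or only) key
        final int key2;           // larger key, meaningful only in a 3-node
        final boolean three;      // true for a 3-node, false for a 2-node
        Node left, middle, right; // a 2-node uses only left and right

        Node(int key) { this.key1 = key; this.key2 = key; this.three = false; }
        Node(int key1, int key2) { this.key1 = key1; this.key2 = key2; this.three = true; }
    }

    static boolean contains(Node x, int key) {
        while (x != null) {
            if (key == x.key1 || (x.three && key == x.key2)) return true;
            if (key < x.key1) x = x.left;                   // smaller than every key in x
            else if (x.three && key < x.key2) x = x.middle; // between the two keys
            else x = x.right;                               // larger than every key in x
        }
        return false;
    }

    // Hand-built tree: root [10 | 20] with children [5], [15], [25 | 30].
    static Node sample() {
        Node root = new Node(10, 20);
        root.left = new Node(5);
        root.middle = new Node(15);
        root.right = new Node(25, 30);
        return root;
    }
}
```

Searching the sample tree for 15 follows the middle link of the root, while searching for 12 ends at a null link below [15].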
insert
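Insertion is the key operation, because it explains why the tree stays balanced. The cases are illustrated below:
Inserting into a 2-node is simple: the new key goes directly into the node and turns it into a 3-node (this case is not shown above). Inserting into a 3-node is more involved: the node temporarily becomes a 4-node, then the middle of its three keys moves up to the parent, which hands the problem over to the parent. It is not hard to see that the height of the tree grows by one only when every node on the insertion path is a 3-node (the splits propagate all the way to the root, which is case one in the figure above).
Moreover, once the position is found, these operations are local transformations that change only a constant number of links; no data is moved around, so they are fast. In the worst case every node is a 2-node and the height is lg N; in the best case every node is a 3-node and the height is log_3 N ≈ 0.631 lg N. Either way the height is logarithmic, which guarantees logarithmic performance for search and insert.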
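The constant 0.631 is just a change of logarithm base. A short derivation, where h denotes the height of a 2-3 tree holding N keys (the symbol h is introduced here for illustration):

```latex
\log_3 N = \frac{\lg N}{\lg 3} \approx \frac{\lg N}{1.585} \approx 0.631\,\lg N,
\qquad\text{so}\qquad
\lfloor \log_3 N \rfloor \;\le\; h \;\le\; \lfloor \lg N \rfloor .
```

For N = 10^6, for instance, this puts the height between 12 and 19.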
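When it comes to implementation, though, doing it directly is complicated: there are several node types, nodes must be converted from one type to another, and there are many cases to handle. All of this takes a lot of code, and the overhead it adds could make the algorithm slower than a standard binary search tree. We want to keep the tree balanced while keeping the required code as small as possible. That is where red-black trees come in.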
red-black BSTs
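A red-black tree is still essentially a binary tree. The key idea is to add a little information to a standard binary search tree in order to represent 3-nodes:
The two keys of a 3-node are joined by a left-leaning red link, with the larger key as the root. Every 2-3 tree corresponds to a unique (left-leaning) red-black tree:
Under this representation, no node has two red links connected to it, and every path from the root to a null link has the same number of black links (perfect black balance). Note also that red links lean left. These properties have to be maintained as the tree is built.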
private static final boolean RED = true;
private static final boolean BLACK = false;

private class Node {
    Key key;
    Value val;
    Node left, right;
    boolean color; // color of the link from the parent

    Node(Key key, Value val, boolean color) {
        this.key = key;
        this.val = val;
        this.color = color;
    }
}

private boolean isRed(Node x) {
    if (x == null) return false; // null links are black
    return x.color == RED;
}
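The three properties listed above can be checked mechanically. The following is a sketch of such a checker; the class name, the `int` keys, and the convention of returning -1 for a violation are assumptions made for this example.

```java
// Sketch: verify the left-leaning red-black invariants on a hand-built tree.
final class RedBlackInvariants {
    static final boolean RED = true, BLACK = false;

    static final class Node {
        final int key;
        final boolean color; // color of the link from the parent
        Node left, right;

        Node(int key, boolean color) { this.key = key; this.color = color; }
    }

    static boolean isRed(Node x) { return x != null && x.color == RED; }

    // Number of black links on every path from the link into x down to a null link,
    // or -1 if some invariant is violated in the subtree rooted at x.
    static int blackHeight(Node x) {
        if (x == null) return 0;
        if (isRed(x.right)) return -1;            // red links must lean left
        if (isRed(x) && isRed(x.left)) return -1; // no node touches two red links
        int l = blackHeight(x.left);
        int r = blackHeight(x.right);
        if (l < 0 || r < 0 || l != r) return -1;  // perfect black balance
        return l + (isRed(x) ? 0 : 1);
    }

    static boolean isValid(Node root) {
        return !isRed(root) && blackHeight(root) >= 0;
    }
}
```

A lone 3-node (black 20 with a red left child 10) passes the check, while the same root with a red right child fails it.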
Get the value associated with a key:
public Value get(Key key) {
    Node x = root;
    while (x != null) {
        int cmp = key.compareTo(x.key);
        if (cmp < 0) x = x.left;
        else if (cmp > 0) x = x.right;
        else return x.val;
    }
    return null;
}
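Other order-based operations that only compare keys and do not modify the tree structure can likewise reuse the standard binary search tree code unchanged. The operation that needs attention is insertion.
Recall insertion in a 2-3 tree: the key is always inserted directly, and the tree is adjusted afterwards if a property has been violated. A 2-node simply becomes a 3-node, while a 3-node temporarily becomes a 4-node and is then fixed up. A red-black tree works the same way: every newly inserted node is attached with a red link, and the tree is adjusted if a property is violated, for example when a red link leans right or a node has two red links.
So let's look at the operations used to restore these properties.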
left rotation:
A left rotation, as the name suggests, reorients a right-leaning red link so that it leans left.
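As a sketch, the rotation can be written as follows. This is the usual textbook form with simplified `int` keys; the class name is made up for the example.

```java
// Sketch of a left rotation on a right-leaning red link.
final class RotateLeftDemo {
    static final boolean RED = true, BLACK = false;

    static final class Node {
        int key;
        boolean color; // color of the link from the parent
        Node left, right;

        Node(int key, boolean color) { this.key = key; this.color = color; }
    }

    // Precondition: h.right is red. Returns the new root of the subtree.
    static Node rotateLeft(Node h) {
        Node x = h.right;  // x moves up to become the subtree root
        h.right = x.left;  // keys between h and x now hang under h
        x.left = h;
        x.color = h.color; // x inherits the color of the link into h
        h.color = RED;     // the link between x and h stays red, now leaning left
        return x;
    }
}
```

Only three links and two colors change, so the operation takes constant time and preserves both symmetric order and black balance.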
right rotation:
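A right rotation is the reverse of a left rotation. It produces an intermediate state: sometimes a right rotation is needed first before the node can be processed further, as shown below.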
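The mirror-image operation, again as a sketch with simplified `int` keys and a made-up class name:

```java
// Sketch of a right rotation on a left-leaning red link.
final class RotateRightDemo {
    static final boolean RED = true, BLACK = false;

    static final class Node {
        int key;
        boolean color; // color of the link from the parent
        Node left, right;

        Node(int key, boolean color) { this.key = key; this.color = color; }
    }

    // Precondition: h.left is red. Returns the new root of the subtree.
    static Node rotateRight(Node h) {
        Node x = h.left;   // x moves up to become the subtree root
        h.left = x.right;  // keys between x and h now hang under h
        x.right = h;
        x.color = h.color; // x inherits the color of the link into h
        h.color = RED;     // the link between x and h stays red, now leaning right
        return x;
    }
}
```

Applied to two red left links in a row (30, 20, 10), it leaves 20 on top with red links to both 10 and 30, which is exactly the temporary 4-node that a color flip then resolves.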
color flip
A color flip does not change any links at all; it only changes colors.
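A sketch of the flip, under the same simplifications as before (`int` keys, hypothetical class name). It corresponds to splitting a temporary 4-node and passing its middle key up to the parent.

```java
// Sketch of a color flip on a node whose two children are both red.
final class FlipColorsDemo {
    static final boolean RED = true, BLACK = false;

    static final class Node {
        int key;
        boolean color; // color of the link from the parent
        Node left, right;

        Node(int key, boolean color) { this.key = key; this.color = color; }
    }

    // Precondition: h is black and both of its children are red.
    static void flipColors(Node h) {
        h.color = RED;         // the middle key joins the parent's node
        h.left.color = BLACK;  // the two outer keys become separate 2-nodes
        h.right.color = BLACK;
    }
}
```

Because the link into `h` turns red, the violation may reappear one level up, which is why the fix-ups are applied again on the way back up the insertion path.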
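Insertion into a red-black tree seems to involve many cases, but they can all be summarized by the figure below (the leftmost case needs a right rotation first):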
private Node put(Node h, Key key, Value val) {
    // insert at the bottom (and color the new link red)
    if (h == null) return new Node(key, val, RED);
    int cmp = key.compareTo(h.key);
    if (cmp < 0) h.left = put(h.left, key, val);
    else if (cmp > 0) h.right = put(h.right, key, val);
    else h.val = val;
    if (isRed(h.right) && !isRed(h.left)) h = rotateLeft(h); // lean left
    if (isRed(h.left) && isRed(h.left.left)) h = rotateRight(h); // balance 4-node
    if (isRed(h.left) && isRed(h.right)) flipColors(h); // split 4-node
    return h;
}
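Putting the pieces together, here is a self-contained sketch that can actually be run. It simplifies the symbol table to a set of `int` keys, and it adds a public `put` wrapper that resets the root to black; that wrapper is not shown in the notes above and is assumed here, as are the class name and the `height` helper.

```java
// Sketch: a minimal left-leaning red-black tree over int keys.
final class LlrbDemo {
    private static final boolean RED = true, BLACK = false;

    private static final class Node {
        int key;
        Node left, right;
        boolean color; // color of the link from the parent

        Node(int key, boolean color) { this.key = key; this.color = color; }
    }

    private Node root;

    private static boolean isRed(Node x) { return x != null && x.color == RED; }

    private static Node rotateLeft(Node h) {
        Node x = h.right;
        h.right = x.left;
        x.left = h;
        x.color = h.color;
        h.color = RED;
        return x;
    }

    private static Node rotateRight(Node h) {
        Node x = h.left;
        h.left = x.right;
        x.right = h;
        x.color = h.color;
        h.color = RED;
        return x;
    }

    private static void flipColors(Node h) {
        h.color = RED;
        h.left.color = BLACK;
        h.right.color = BLACK;
    }

    void put(int key) {
        root = put(root, key);
        root.color = BLACK; // assumed wrapper step: the root link is always black
    }

    private static Node put(Node h, int key) {
        if (h == null) return new Node(key, RED);
        if (key < h.key) h.left = put(h.left, key);
        else if (key > h.key) h.right = put(h.right, key);
        if (isRed(h.right) && !isRed(h.left)) h = rotateLeft(h);
        if (isRed(h.left) && isRed(h.left.left)) h = rotateRight(h);
        if (isRed(h.left) && isRed(h.right)) flipColors(h);
        return h;
    }

    boolean contains(int key) {
        Node x = root;
        while (x != null) {
            if (key < x.key) x = x.left;
            else if (key > x.key) x = x.right;
            else return true;
        }
        return false;
    }

    // Height in links: -1 for an empty tree, 0 for a single node.
    int height() { return height(root); }

    private static int height(Node x) {
        return x == null ? -1 : 1 + Math.max(height(x.left), height(x.right));
    }
}
```

Inserting keys in ascending order, which degrades a plain BST into a linked list, is a quick way to see the balancing at work: the height should stay within about 2 lg N.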
B-trees
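B-trees generalize 2-3 trees (and balanced trees in general) by allowing many keys per node, and they are a classic example of balanced search trees in practical use. The data to be stored is typically very large, and locating the page that holds a piece of data takes far longer than reading the data from that page, so we want to reach the right page with as few page accesses as possible. A B-tree node can hold a great many keys, enough to fill an entire page:
Insertion has to keep the tree balanced: a node whose key count reaches M must be split, with the adjustment propagating upward. For example: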
https://www.cnblogs.com/mingyueanyao/p/10322643.html
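The split step can be sketched in isolation. The example below assumes an even M and follows one possible convention, in which the smallest key of the right half is copied up to the parent as the separator; other B-tree variants move the middle key up instead. The class and field names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split a node that has reached M keys into two half-full nodes.
final class BTreeSplitSketch {
    // Result of a split: the two halves plus the separator key for the parent.
    static final class Split {
        final List<Integer> left;
        final List<Integer> right;
        final int separator;

        Split(List<Integer> left, List<Integer> right, int separator) {
            this.left = left;
            this.right = right;
            this.separator = separator;
        }
    }

    // keys must be sorted and contain exactly m entries; m is assumed to be even.
    static Split split(List<Integer> keys, int m) {
        if (keys.size() != m) throw new IllegalArgumentException("node is not full");
        List<Integer> left = new ArrayList<>(keys.subList(0, m / 2));
        List<Integer> right = new ArrayList<>(keys.subList(m / 2, m));
        return new Split(left, right, right.get(0)); // separator is copied up to the parent
    }
}
```

If inserting the separator makes the parent reach M keys as well, the same split is applied to the parent, all the way up to the root if necessary; this is the only way the tree grows taller.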
Interview Questions: Balanced Search Trees
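1. A BST must satisfy left child L < parent P < right child R. So adopt the following convention: if L > P, then L is a red node; if R < P, then R is a red node.
Therefore, in a left-leaning red-black tree, a node red is marked as red by replacing its value: red.value = 2 * parent - red.value.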
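A small worked example of the encoding may help. The helper names below are hypothetical, and the sketch assumes distinct `int` keys with no arithmetic overflow; `parentValue` always means the parent's true (decoded) value.

```java
// Sketch: store the color of a left child in its value instead of a color bit.
final class ColorByValue {
    // Mark a child as red by reflecting its true value across the parent's value.
    static int encodeRed(int parentValue, int trueValue) {
        return 2 * parentValue - trueValue;
    }

    // A left child whose stored value exceeds the parent's must have been reflected.
    static boolean isRedLeftChild(int parentValue, int storedValue) {
        return storedValue > parentValue;
    }

    // Recover the true value of a left child; the reflection is its own inverse.
    static int decodeLeftChild(int parentValue, int storedValue) {
        return isRedLeftChild(parentValue, storedValue)
                ? 2 * parentValue - storedValue
                : storedValue;
    }
}
```

With a parent of 10 and a left child whose true value is 7, the red child is stored as 2 * 10 - 7 = 13. Since 13 > 10 breaks the BST order, it is recognized as red, and decoding gives 2 * 10 - 13 = 7 again. A black left child 7 is stored unchanged.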
public class LLRB {
private static final int LEFT = 1;
private static final int RIGHT = 2;
private Node root;
public boolean search(int target) {
Node x = root;
int preValue = 0; // true value of x's parent; only read once flag is LEFT
int flag = 0; // which link led to x; 0 at the root
while (x != null) {
int trueValue = x.val;
if (flag == LEFT && trueValue > preValue) {
trueValue = 2 * preValue - trueValue; // red left child: recover its true value
}
if (target < trueValue) {
x = x.left;
flag = LEFT;
} else if (target > trueValue) {
x = x.right;
flag = RIGHT;
} else {
return true;
}
preValue = trueValue; // the node just left becomes the parent
}
return false;
}
public void insert(int val) {
root = insert(root, val, 0, 0);
root.val = -1 * root.val;
}
private Node insert(Node x, int val, int preValue, int flag) {
if (x == null) {
return new Node(2 * preValue - val);
}
int trueValue = x.val;
if (flag == LEFT && trueValue > preValue) {
trueValue = 2 * preValue - trueValue; // red left child: recover its true value
}
if (val < trueValue) {
x.left = insert(x.left, val, trueValue, LEFT);
} else if (val > trueValue) {
x.right = insert(x.right, val, trueValue, RIGHT);
}
if ((x.right != null && x.right.val > trueValue) && (x.left == null || x.left.val < trueValue)) {//isred(x.right) && !isred(x.left)
x = rotateLeft(x, trueValue, trueValue != x.val);
}
if (x.left != null && x.left.left != null && x.left.val > trueValue
&& x.left.left.val > (2 * (2 * trueValue - x.left.val) - x.left.left.val)) {//(isRed(h.left) && isRed(h.left.left))
x = rotateRight(x, trueValue, trueValue != x.val);
}
if ((x.left != null && x.left.val > trueValue) && (x.right != null && x.right.val < trueValue)) {
flipColors(x, trueValue);
}
return x;
}
private Node rotateLeft(Node x, int preValue, boolean isRed) {
Node y = x.right;
if (isRed) {
x.val = 2 * preValue - x.val;
}
y.val = 2 * x.val - y.val;
x.right = y.left;
y.left = x;
x.val = 2 * y.val - x.val;
if (isRed) {
y.val = 2 * preValue - y.val;
}
return y;
}
private Node rotateRight(Node x, int preValue, boolean isRed) {
Node y = x.left;
if (isRed) {
x.val = 2 * preValue - x.val;
}
y.val = 2 * x.val - y.val;
x.left = y.right;
y.right = x;
x.val = 2 * y.val - x.val;
if (isRed) {
y.val = 2 * preValue - y.val;
}
return y;
}
private void flipColors(Node x, int preValue) {
x.left.val = 2 * x.val - x.left.val;
x.right.val = 2 * x.val - x.right.val;
x.val = 2 * preValue - x.val;
}
private class Node {
Node left, right;
int val;
Node(int val) {
this.val = val;
}
}
}
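2. Given a document of n words and a query of m words, find the shortest interval in the document that contains all m query words, appearing in the same order as in the query.
Solution: for each distinct word, build a sorted list of the positions at which it occurs in the document. Take the position lists of the m query words and pick one position from each list so that the chosen positions form an increasing sequence and the difference between the last and the first is as small as possible.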
import java.util.*;
public class Document {
private String[] document;
private String[] query;
public void search() {
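// endpoints of the shortest interval found so far
// initializing first to -1 makes it easy to tell whether a valid interval was found
int first = -1, last = document.length - 1;
// build an increasing list of positions for each distinct word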
Map<String, Queue<Integer>> map = new HashMap<>();
for (int i = 0; i < document.length; i++) {
if (!map.containsKey(document[i])) {
Queue<Integer> q = new LinkedList<>();
q.add(i);
map.put(document[i], q);
} else {
map.get(document[i]).add(i);
}
}
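// pick out the position lists of the m query words
Queue<Integer>[] q = new Queue[query.length];
for (int i = 0; i < query.length; i++) {
q[i] = map.get(query[i]);
if (q[i] == null) { // a query word that never occurs in the document
System.out.println("Not Found");
return;
}
}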
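// search for intervals that satisfy the condition
OUTER:
for (int i : q[0]) {
int left = i, right = i; // endpoints of the current interval
for (int j = 1; j < q.length; j++) {
// take the smallest usable position in each list, dequeuing every position that is not greater than right
while (!q[j].isEmpty() && q[j].peek() <= right) {
q[j].poll();
}
if (q[j].isEmpty()) {
break OUTER; // once every position of some word has been dequeued, the outer loop can stop
} else {
right = q[j].peek(); // update the right endpoint of the current interval
}
}
// update the endpoints of the shortest interval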
if (right - left < last - first) {
first = left;
last = right;
}
}
if (first != -1) {
System.out.println(last - first);
} else {
System.out.println("Not Found");
}
}
}
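For experimenting with the idea, here is a self-contained variant that takes the document and query as arguments and returns the interval instead of printing it. It is a sketch rather than a drop-in replacement: it swaps the queues for array lists with one cursor per query slot (so a word that appears twice in the query does not share state), and the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: shortest interval of the document containing the query words in order.
final class OrderedWindow {
    // Returns {first, last} (inclusive indices), or null if no such interval exists.
    static int[] shortest(String[] document, String[] query) {
        if (query.length == 0) return null;

        // sorted list of positions for each distinct word
        Map<String, List<Integer>> positions = new HashMap<>();
        for (int i = 0; i < document.length; i++)
            positions.computeIfAbsent(document[i], k -> new ArrayList<>()).add(i);

        List<List<Integer>> lists = new ArrayList<>();
        for (String word : query) {
            List<Integer> list = positions.get(word);
            if (list == null) return null; // a query word is missing from the document
            lists.add(list);
        }

        int[] next = new int[query.length]; // per query slot: first position not yet discarded
        int[] best = null;
        for (int start : lists.get(0)) {
            int right = start;
            boolean complete = true;
            for (int j = 1; j < lists.size() && complete; j++) {
                List<Integer> list = lists.get(j);
                // discard positions that cannot follow the current right endpoint
                while (next[j] < list.size() && list.get(next[j]) <= right) next[j]++;
                if (next[j] == list.size()) complete = false;
                else right = list.get(next[j]);
            }
            if (!complete) break; // later start positions cannot succeed either
            if (best == null || right - start < best[1] - best[0])
                best = new int[] {start, right};
        }
        return best;
    }
}
```

For the document `a b c a c b` and the query `a c`, the candidates are [0, 2] and [3, 4], and the method returns the shorter one, {3, 4}. Each cursor only moves forward, so after the position lists are built the scan is linear in their total length.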
Programming Assignment 5: Kd-Trees
Kd-Trees
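Trees can be used to solve geometric search problems.
This assignment implements range search and nearest-neighbor search.
Range search: given a query rectangle, find all points that lie inside it.
Nearest-neighbor search: given a query point, find the point in the set that is closest to it.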
PointSET.java
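PointSET is a brute-force implementation: range search and nearest-neighbor search both scan every point, so they take time linear in the number of points.
It is not well suited to searching large sets.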
import edu.princeton.cs.algs4.Point2D;
import edu.princeton.cs.algs4.Queue;
import edu.princeton.cs.algs4.RectHV;
import edu.princeton.cs.algs4.SET;
import edu.princeton.cs.algs4.StdDraw;
import edu.princeton.cs.algs4.StdOut;
import edu.princeton.cs.algs4.StdRandom;
import edu.princeton.cs.algs4.Stopwatch;
public class PointSET
{
private SET<Point2D> points; // set of points
// construct an empty set of points
public PointSET()
{
points = new SET<Point2D>();
}
public boolean isEmpty()
{
return points.isEmpty();
}
public int size()
{
return points.size();
}
public void insert(Point2D p)
{
if (p == null) throw new NullPointerException("Null point");
points.add(p);
}
public boolean contains(Point2D p)
{
if (p == null) throw new NullPointerException("Null point");
return points.contains(p);
}
public void draw()
{
StdDraw.setPenColor(StdDraw.BLACK);
StdDraw.setPenRadius(0.01);
for (Point2D point : points)
point.draw();
}
// return all points that are inside the rectangle
public Iterable<Point2D> range(RectHV rect)
{
if (rect == null) throw new NullPointerException("Null rectangle");
Queue<Point2D> pointsInRect = new Queue<Point2D>();
for (Point2D point : points)
if (rect.contains(point)) pointsInRect.enqueue(point);
return pointsInRect;
}
// return the nearest neighbor of p in the set; null if the set is empty
public Point2D nearest(Point2D p)
{
if (p == null) throw new NullPointerException("Null point");
if (points.isEmpty()) return null;
Point2D nearestPoint = points.min();
double minDist = Double.POSITIVE_INFINITY;
for (Point2D point : points)
{
double dist = p.distanceSquaredTo(point);
if (minDist > dist)
{
nearestPoint = point;
minDist = dist;
}
}
return nearestPoint;
}
/**
* Unit tests the {@code PointSET} data type.
*
* @param args the command-line arguments
*/
public static void main(String[] args)
{
double timeOfInsert = 0.0;
double timeOfNearest = 0.0;
double timeOfRange = 0.0;
PointSET brute = new PointSET();
Stopwatch timer;
Point2D p;
for (int i = 0; i < 1000000; i++)
{
p = new Point2D(StdRandom.uniform(0.0, 1.0),
StdRandom.uniform(0.0, 1.0));
timer = new Stopwatch();
brute.insert(p);
timeOfInsert += timer.elapsedTime();
}
StdOut.print("time cost of insert(random point)(1M times) : ");
StdOut.println(timeOfInsert);
for (int i = 0; i < 100; i++)
{
p = new Point2D(StdRandom.uniform(0.0, 1.0),
StdRandom.uniform(0.0, 1.0));
timer = new Stopwatch();
brute.nearest(p);
timeOfNearest += timer.elapsedTime();
}
StdOut.print("time cost of nearest(random point)(100 times) : ");
StdOut.println(timeOfNearest);
for (int i = 0; i < 100; i++)
{
double xmin = StdRandom.uniform(0.0, 1.0);
double ymin = StdRandom.uniform(0.0, 1.0);
double xmax = StdRandom.uniform(0.0, 1.0);
double ymax = StdRandom.uniform(0.0, 1.0);
RectHV rect;
if (xmin > xmax)
{
double swap = xmin;
xmin = xmax;
xmax = swap;
}
if (ymin > ymax)
{
double swap = ymin;
ymin = ymax;
ymax = swap;
}
rect = new RectHV(xmin, ymin, xmax, ymax);
timer = new Stopwatch();
brute.range(rect);
timeOfRange += timer.elapsedTime();
}
StdOut.print("time cost of range(random rectangle)(100 times): ");
StdOut.println(timeOfRange);
}
}
KdTree.java
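KdTree is implemented with a 2d-tree. Operations typically take logarithmic time, although the tree is not rebalanced, so the worst case is still linear.
Methods such as insert() are best implemented with a private helper function, which can take extra parameters and keeps the code organized.
Nearest-neighbor search has one tricky point: after searching the left/bottom (or right/top) rectangle for the nearest point, how do we decide whether the subtree on the other side needs to be searched as well?
The approach used here: after searching one side and obtaining the nearest point found so far, compute the distance from the query point to the rectangle on the other side and compare it with the distance to that nearest point. If the distance to the rectangle is smaller, the other side must be searched too; otherwise it can be skipped.
The distance from the query point to any point inside the other rectangle is never smaller than its distance to the rectangle itself. So if the distance from the query point to the current nearest point is no greater than its distance to the other rectangle, the other side cannot contain a closer point, and its subtree need not be searched. This pruning is the biggest improvement over the brute-force method.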
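The pruning test can be illustrated without the algs4 classes. The sketch below computes the squared distance from a point to an axis-aligned rectangle by hand; the class and method names are made up for this example (in the listing that follows, the library's RectHV.distanceSquaredTo plays this role).

```java
// Sketch: decide whether the subtree on the other side of a splitting line is worth searching.
final class PruneSketch {
    // Squared distance from (px, py) to the rectangle [xmin, xmax] x [ymin, ymax];
    // 0 if the point lies inside the rectangle or on its boundary.
    static double distanceSquaredToRect(double px, double py,
                                        double xmin, double ymin,
                                        double xmax, double ymax) {
        double dx = 0.0, dy = 0.0;
        if (px < xmin) dx = xmin - px;
        else if (px > xmax) dx = px - xmax;
        if (py < ymin) dy = ymin - py;
        else if (py > ymax) dy = py - ymax;
        return dx * dx + dy * dy;
    }

    // Search the other side only if its rectangle is strictly closer
    // than the best point found so far.
    static boolean shouldSearchOtherSide(double bestDistSquared, double rectDistSquared) {
        return rectDistSquared < bestDistSquared;
    }
}
```

For a query point (0.25, 0.5) and the right half [0.5, 1] x [0, 1] of the unit square, the squared distance to the rectangle is 0.25² = 0.0625. If the best point found so far is at squared distance 0.04, the right half is skipped; if it is at 0.25, the right half still has to be searched.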
import edu.princeton.cs.algs4.Point2D;
import edu.princeton.cs.algs4.Queue;
import edu.princeton.cs.algs4.RectHV;
import edu.princeton.cs.algs4.StdDraw;
import edu.princeton.cs.algs4.StdOut;
import edu.princeton.cs.algs4.StdRandom;
import edu.princeton.cs.algs4.Stopwatch;
/**
* The {@code KdTree} class represents a set of points
* in the unit square. It supports efficient
* <em>range search</em> (find all of the points contained
* in a query rectangle) and <em>nearest neighbor search</em>
* (find a closest point to a query point) by using a 2d-tree.
*
* @author zhangyu
* @date 2017.4.3
*/
public class KdTree
{
private Node root;
private int size;
private static class Node
{
private Point2D p; // the point
private RectHV rect; // the axis-aligned rectangle corresponding to this node
private Node lb; // the left/bottom subtree
private Node rt; // the right/top subtree
private boolean isEvenLevel; // is the node at even level
public Node(Point2D p, RectHV rect, boolean isEvenLevel)
{
this.p = p;
this.rect = rect;
this.isEvenLevel = isEvenLevel;
}
}
/**
* Initializes an empty 2d-tree.
*/
public KdTree() { }
/**
* Returns true if the 2d-tree is empty.
*
* @return true if the 2d-tree is empty;
* false otherwise
*/
public boolean isEmpty()
{
return size == 0;
}
/**
* Returns the number of nodes in the 2d-tree.
*
* @return the number of nodes in the 2d-tree
*/
public int size()
{
return size;
}
/**
* Inserts point into the 2d-tree.
*
* @param p the point
* @throws NullPointerException if the point is null
*/
public void insert(Point2D p)
{
if (p == null) throw new NullPointerException("Null point");
root = insert(root, null, p, 0);
}
private Node insert(Node x, Node parent, Point2D p, int direction)
{
if (x == null)
{
// if 2d-tree is null, then insert Node with a unit rectangle
if (size++ == 0) return new Node(p, new RectHV(0, 0, 1, 1), true);
RectHV rectOfX = parent.rect; // rectangle of Node x
if (direction < 0) // go left sub-tree
{
if (parent.isEvenLevel) // left sub-rectangle
rectOfX = new RectHV(parent.rect.xmin(), parent.rect.ymin(),
parent.p.x(), parent.rect.ymax());
else // bottom sub-rectangle
rectOfX = new RectHV(parent.rect.xmin(), parent.rect.ymin(),
parent.rect.xmax(), parent.p.y());
}
else if (direction > 0) // go right sub-tree
{
if (parent.isEvenLevel) // right sub-rectangle
rectOfX = new RectHV(parent.p.x(), parent.rect.ymin(),
parent.rect.xmax(), parent.rect.ymax());
else // top sub-rectangle
rectOfX = new RectHV(parent.rect.xmin(), parent.p.y(),
parent.rect.xmax(), parent.rect.ymax());
}
return new Node(p, rectOfX, !parent.isEvenLevel);
}
int cmp = compare(p, x.p, x.isEvenLevel);
if (cmp < 0) x.lb = insert(x.lb, x, p, cmp);
else if (cmp > 0) x.rt = insert(x.rt, x, p, cmp);
return x;
}
private int compare(Point2D p, Point2D q, boolean isEvenLevel)
{
if (p == null || q == null) throw new NullPointerException("Null point");
if (p.equals(q)) return 0;
if (isEvenLevel) return p.x() < q.x() ? -1 : 1;
else return p.y() < q.y() ? -1 : 1;
}
/**
* Does the 2d-tree contain point p?
*
* @param p the point
* @return true if the 2d-tree contains p;
* false otherwise
* @throws NullPointerException if the point is null
*/
public boolean contains(Point2D p)
{
if (p == null) throw new NullPointerException("Null point");
return contains(root, p);
}
private boolean contains(Node x, Point2D p)
{
if (x == null) return false;
int cmp = compare(p, x.p, x.isEvenLevel);
if (cmp < 0) return contains(x.lb, p);
else if (cmp > 0) return contains(x.rt, p);
else return true;
}
/**
* Draws all points to standard draw.
*/
public void draw()
{
draw(root);
}
private void draw(Node x)
{
if (x == null) return;
draw(x.lb);
draw(x.rt);
StdDraw.setPenColor(StdDraw.BLACK);
StdDraw.setPenRadius(0.01);
x.p.draw();
StdDraw.setPenRadius();
// draw the splitting line segment
if (x.isEvenLevel)
{
StdDraw.setPenColor(StdDraw.RED);
StdDraw.line(x.p.x(), x.rect.ymin(), x.p.x(), x.rect.ymax());
}
else
{
StdDraw.setPenColor(StdDraw.BLUE);
StdDraw.line(x.rect.xmin(), x.p.y(), x.rect.xmax(), x.p.y());
}
}
/**
* Returns all points that are inside the rectangle as an {@code Iterable}.
*
* @param rect the rectangle
* @return all points inside the rectangle
*/
public Iterable<Point2D> range(RectHV rect)
{
if (rect == null) throw new NullPointerException("Null rectangle");
Queue<Point2D> pointQueue = new Queue<Point2D>();
range(root, pointQueue, rect);
return pointQueue;
}
private void range(Node x, Queue<Point2D> pointQueue, RectHV rect)
{
if (x == null) return;
if (rect.contains(x.p)) pointQueue.enqueue(x.p);
// if the left sub-rectangle intersects rect, then search the left-tree
if (x.lb != null && rect.intersects(x.lb.rect)) range(x.lb, pointQueue, rect);
if (x.rt != null && rect.intersects(x.rt.rect)) range(x.rt, pointQueue, rect);
}
/**
* Returns a nearest neighbor in the 2d-tree to point p;
* null if the 2d-tree is empty.
*
* @param p the point
* @return a nearest neighbor in the 2d-tree to p
*/
public Point2D nearest(Point2D p)
{
if (p == null) throw new NullPointerException("Null point");
if (root == null) return null;
return nearest(root, root.p, p);
}
private Point2D nearest(Node x, Point2D nearest, Point2D p)
{
if (x == null) return nearest;
int cmp = compare(p, x.p, x.isEvenLevel);
if (p.distanceSquaredTo(x.p) < p.distanceSquaredTo(nearest)) nearest = x.p;
if (cmp < 0)
{
nearest = nearest(x.lb, nearest, p);
// compare the current nearest to the possible nearest in the other side
if (x.rt != null)
if (nearest.distanceSquaredTo(p) > x.rt.rect.distanceSquaredTo(p))
nearest = nearest(x.rt, nearest, p);
}
else if (cmp > 0)
{
nearest = nearest(x.rt, nearest, p);
if (x.lb != null)
if (nearest.distanceSquaredTo(p) > x.lb.rect.distanceSquaredTo(p))
nearest = nearest(x.lb, nearest, p);
}
return nearest;
}
/**
* Unit tests the {@code KdTree} data type.
*
* @param args the command-line arguments
*/
public static void main(String[] args)
{
double timeOfInsert = 0.0;
double timeOfNearest = 0.0;
double timeOfRange = 0.0;
KdTree kdtree = new KdTree();
Stopwatch timer;
Point2D p;
for (int i = 0; i < 1000000; i++)
{
p = new Point2D(StdRandom.uniform(0.0, 1.0),
StdRandom.uniform(0.0, 1.0));
timer = new Stopwatch();
kdtree.insert(p);
timeOfInsert += timer.elapsedTime();
}
StdOut.print("time cost of insert(random point)(1M times) : ");
StdOut.println(timeOfInsert);
for (int i = 0; i < 100; i++)
{
p = new Point2D(StdRandom.uniform(0.0, 1.0),
StdRandom.uniform(0.0, 1.0));
timer = new Stopwatch();
kdtree.nearest(p);
timeOfNearest += timer.elapsedTime();
}
StdOut.print("time cost of nearest(random point)(100 times) : ");
StdOut.println(timeOfNearest);
for (int i = 0; i < 100; i++)
{
double xmin = StdRandom.uniform(0.0, 1.0);
double ymin = StdRandom.uniform(0.0, 1.0);
double xmax = StdRandom.uniform(0.0, 1.0);
double ymax = StdRandom.uniform(0.0, 1.0);
RectHV rect;
if (xmin > xmax)
{
double swap = xmin;
xmin = xmax;
xmax = swap;
}
if (ymin > ymax)
{
double swap = ymin;
ymin = ymax;
ymax = swap;
}
rect = new RectHV(xmin, ymin, xmax, ymax);
timer = new Stopwatch();
kdtree.range(rect);
timeOfRange += timer.elapsedTime();
}
StdOut.print("time cost of range(random rectangle)(100 times): ");
StdOut.println(timeOfRange);
}
}