4 Values whose Sum is 0(二分)

119 篇文章 1 订阅
113 篇文章 1 订阅

题目链接:4 Values whose Sum is 0

啊啊啊,这一题真是亏了,应该二分交一发的,只怪那回lower_bound和upper_bound写出阴影了,导致这回都不敢交了(要好好学习一下了)。

题目大意:给你n组数,每组给出四个数,从每一列选出一个数,使得四个数的和为0,问有多少种组合方法。

思路:前两组任意组合,记录所有的情况存入sum1数组中,后两组也任意组合,记录所有的情况存入sum2数组中。将两组数都从小到大排列。遍历sum1中的数,用lower_bound和upper_bound在sum2数组中查找 “-sum1[i]”,求一下和即可。

这回我一定要记住了,不要在记混了,还要在学习一下,手写的查找上限和下限。

代码:

#include<stdio.h>
#include<string.h>
#include<algorithm>
using namespace std;
#define mem(a,b) memset(a,b,sizeof(a))
const int maxn=4009;
int a[maxn],b[maxn],c[maxn],d[maxn];

int sum1[maxn*maxn],sum2[maxn*maxn];

int main()
{
    int n;
    while(~scanf("%d",&n))
    {
        for(int i=0; i<n; i++)
            scanf("%d%d%d%d",&a[i],&b[i],&c[i],&d[i]);

        for(int i=0; i<n; i++)//前两个的组合
            for(int j=0; j<n; j++)
                sum1[i*n+j]=a[i]+b[j];

        for(int i=0; i<n; i++)//后两个的组合
            for(int j=0; j<n; j++)
                sum2[i*n+j]=c[i]+d[j];

        sort(sum1,sum1+n*n);
        sort(sum2,sum2+n*n);

        int ans=0;
        int f;
        for(int i=0; i<n*n; i++)
        {
            if(i-1>=0&&sum1[i]==sum1[i-1])//重复的不用再查了
                ans+=f;
            else
            {
                int s=lower_bound(sum2,sum2+n*n,-sum1[i])-sum2;//查下限
                int e=upper_bound(sum2,sum2+n*n,-sum1[i])-sum2;//查上限
                ans+=e-s;
                f=e-s;
            }
        }
        printf("%d\n",ans);
    }
    return 0;
}
  • 1
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
KGB Archiver console version ?005-2006 Tomasz Pawlak, [email protected], mod by Slawek ([email protected]) based on PAQ6 by Matt Mahoney PAQ6v2 - File archiver and compressor. (C) 2004, Matt Mahoney, [email protected] This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License version 2 as published by the Free Software Foundation at http://www.gnu.org/licenses/gpl.txt or (at your option) any later version. This program is distributed without any warranty. USAGE To compress: PAQ6 -3 archive file file... (1 or more file names), or or (MSDOS): dir/b | PAQ6 -3 archive (read file names from input) or (UNIX): ls | PAQ6 -3 archive To decompress: PAQ6 archive (no option) To list contents: more < archive Compression: The files listed are compressed and stored in the archive, which is created. The archive must not already exist. File names may specify a path, which is stored. If there are no file names on the command line, then PAQ6 prompts for them, reading until the first blank line or end of file. The -3 is optional, and is used to trade off compression vs. speed and memory. Valid options are -0 to -9. Higher numbers compress better but run slower and use more memory. -3 is the default, and gives a reasonable tradeoff. Recommended options are: -0 to -2 for fast (2X over -3) but poor compression, uses 2-6 MB memory -3 for reasonably fast and good compression, uses 18 MB (default) -4 better compression but 3.5X slower, uses 64 MB -5 slightly better compression, 6X slower than -3, uses 154 MB -6 about like -5, uses 202 MB memory -7 to -9 use 404 MB, 808 MB, 1616 MB, about the same speed as -5 Decompression: No file names are specified. The archive must exist. If a path is stored, the file is extracted to the appropriate directory, which must exist. PAQ6 does not create directories. If the file to be extracted already exists, it is not replaced; rather it is compared with the archived file, and the offset of the first difference is reported. The decompressor requires as much memory as was used to compress. There is no option. It is not possible to add, remove, or update files in an existing archive. If you want to do this, extract the files, delete the archive, and create a new archive with just the files you want. TO COMPILE gxx -O PAQ6.cpp DJGPP 2.95.2 bcc32 -O2 PAQ6.cpp Borland 5.5.1 sc -o PAQ6.cpp Digital Mars 8.35n g++ -O produces the fastest executable among free compilers, followed by Borland and Mars. However Intel 8 will produce the fastest and smallest Windows executable overall, followed by Microsoft VC++ .net 7.1 /O2 /G7 PAQ6 DESCRIPTION 1. OVERVIEW A PAQ6 archive has a header, listing the names and lengths of the files it contains in human-readable format, followed by the compressed data. The first line of the header is "PAQ6 -m" where -m is the memory option. The data is compressed as if all the files were concatenated into one long string. PAQ6 uses predictive arithmetic coding. The string, y, is compressed by representing it as a base 256 number, x, such that: P(s < y) <= x < P(s <= y) (1) where s is chosen randomly from the probability distribution P, and x has the minimum number of digits (bytes) needed to satisfy (1). Such coding is within 1 byte of the Shannon limit, log 1/P(y), so compression depends almost entirely on the goodness of the model, P, i.e. how well it estimates the probability distribution of strings that might be input to the compressor. Coding and decoding are illustrated in Fig. 1. An encoder, given P and y, outputs x. A decoder, given P and x, outputs y. Note that given P in equation (1), that you can find either x from y or y from x. Note also that both computations can be done incrementally. As the leading characters of y are known, the range of possible x narrows, so the leading digits can be output as they become known. For decompression, as the digits of x are read, the set of possible y satisfying (1) is restricted to an increasingly narrow lexicographical range containing y. All of the strings in this range will share a growing prefix. Each time the prefix grows, we can output a character. y +--------------------------+ Uncomp- | V ressed | +---------+ p +----------+ x Compressed Data --+--->| Model |----->| Encoder |----+ Data +---------+ +----------+ | | +----------+ V y +---------+ p +----------+ y Uncompressed +--->| Model |----->| Decoder |----+---> Data | +---------+ +----------+ | | | +-------------------------------------+ Fig. 1. Predictive arithmetic compression and decompression Note that the model, which estimates P, is identical for compression and decompression. Modeling can be expressed incrementally by the chain rule: P(y) = P(y_1) P(y_2|y_1) P(y_3|y_1 y_2) ... P(y_n|y_1 y_2 ... y_n-1) (2) where y_i means the i'th character of the string y. The output of the model is a distribution over the next character, y_i, given the context of characters seen so far, y_1 ... y_i-1. To simplify coding, PAQ6 uses a binary string alphabet. Thus, the output of a model is an estimate of P(y_i = 1 | context) (henceforth p), where y_i is the i'th bit, and the context is the previous i - 1 bits of uncompressed data. 2. PAQ6 MODEL The PAQ6 model consists of a weighted mix of independent submodels which make predictions based on different contexts. The submodels are weighted adaptively to favor those making the best predictions. The output of two independent mixers (which use sets of weights selected by different contexts) are averaged. This estimate is then adjusted by secondary symbol estimation (SSE), which maps the probability to a new probability based on previous experience and the current context. This final estimate is then fed to the encoder as illustrated in Fig. 2. Uncompressed input -----+--------------------+-------------+-------------+ | | | | V V | | +---------+ n0, n1 +----------+ | | | Model 1 |--------->| Mixer 1 |\ p | | +---------+ \ / | | \ V V \ / +----------+ \ +-----+ +------------+ +---------+ \ / \| | p | | Comp- | Model 2 | \/ + | SSE |--->| Arithmetic |--> ressed +---------+ /\ | | | Encoder | output ... / \ /| | | | / \ +----------+ / +-----+ +------------+ +---------+ / \ | Mixer 2 | / | Model N |--------->| |/ p +---------+ +----------+ Fig. 2. PAQ6 Model details for compression. The model is identical for decompression, but the encoder is replaced with a decoder. In Sections 2-6, the description applies to the default memory option (-5, or MEM = 5). For smaller values of MEM, some components are omitted and the number of contexts is less. 3. MIXER The mixers compute a probability by a weighted summation of the N models. Each model outputs two numbers, n0 and n1 represeting the relative probability of a 0 or 1, respectively. These are combined using weighted summations to estimate the probability p that the next bit will be a 1: SUM_i=1..N w_i n1_i (3) p = -------------------, n_i = n0_i + n1_i SUM_i=1..N w_i n_i The weights w_i are adjusted after each bit of uncompressed data becomes known in order to reduce the cost (code length) of that bit. The cost of a 1 bit is -log(p), and the cost of a 0 is -log(1-p). We find the gradient of the weight space by taking the partial derivatives of the cost with respect to w_i, then adjusting w_i in the direction of the gradient to reduce the cost. This adjustment is: w_i := w_i + e[ny_i/(SUM_j (w_j+wo) ny_j) - n_i/(SUM_j (w_j+wo) n_j)] where e and wo are small constants, and ny_i means n0_i if the actual bit is a 0, or n1_i if the bit is a 1. The weight offset wo prevents the gradient from going to infinity as the weights go to 0. e is set to around .004, trading off between faster adaptation (larger e) and less noise for better compression of stationary data (smaller e). There are two mixers, whose outputs are averaged together before being input to the SSE stage. Each mixer maintains a set of weights which is selected by a context. Mixer 1 maintains 16 weight vectors, selected by the 3 high order bits of the previous byte and on whether the data is text or binary. Mixer 2 maintains 16 weight vectors, selected by the 2 high order bits of each of the previous 2 bytes. To distinguish text from binary data, we use the heuristic that space characters are more common in text than NUL bytes, while NULs are more common in binary data. We compare the position of the 4th from last space with the position of the 4th from last 0 byte. 4. CONTEXT MODELS Individual submodels output a prediction in the form of two numbers, n0 and n1, representing relative probabilities of 0 and 1. Generally this is done by storing a pair of counters (c0,c1) in a hash table indexed by context. When a 0 or 1 is encountered in a context, the appropriate count is increased by 1. Also, in order to favor newer data over old, the opposite count is decreased by the following heuristic: If the count > 25 then replace with sqrt(count) + 6 (rounding down) Else if the count > 1 then replace with count / 2 (rounding down) The outputs are derived from the counts in a way that favors highly predictive contexts, i.e. those where one count is large and the other is small. For the case of c1 >= c0 the following heuristic is used. If c0 = 0 then n0 = 0, n1 = 4 c0 Else n0 = 1, n1 = c1 / c0 For the case of c1 < c0 we use the same heuristic swapping 0 and 1. In the following example, we encounter a long string of zeros followed by a string of ones and show the model output. Note how n0 and n1 predict the relative outcome of 0 and 1 respectively, favoring the most recent data, with weight n = n0 + n1 Input c0 c1 n0 n1 ----- -- -- -- -- 0000000000 10 0 40 0 00000000001 5 1 5 1 000000000011 2 2 1 1 0000000000111 1 3 1 3 00000000001111 1 4 1 4 Table 1. Example of counter state (c0,c1) and outputs (n0,n1) In order to represent (c0,c1) as an 8-bit state, counts are restricted to the values 0-40, 44, 48, 56, 64, 96, 128, 160, 192, 224, or 255. Large counts are incremented probabilistically. For example, if c0 = 40 and a 0 is encountered, then c0 is set to 44 with probability 1/4. Decreases in counter values are deterministic, and are rounded down to the nearest representable state. Counters are stored in a hash table indexed by contexts starting on byte boundaries and ending on nibble (4-bit) boundaries. Each hash element contains 15 counter states, representing the 15 possible values for the 0-3 remaining bits of the context after the last nibble boundary. Hash collisions are detected by storing an 8-bit checksum of the context. Each bucket contains 4 elements in a move-to-front queue. When a new element is to be inserted, the priority of the two least recently accessed elements are compared by using n (n0+n1) of the initial counter as the priority, and the lower priority element is discarded. Hash buckets are aligned on 64 byte addresses to minimize cache misses. 5. RUN LENGTH MODELS A second type of model is used to efficiently represent runs of up to 255 identical bytes within a context. For example, given the sequence "abc...abc...abc..." then a run length model would map "ab" -> ("c", 3) using a hash table indexed by "ab". If a new value is seen, e.g. "abd", then the state is updated to the new character and a count of 1, i.e. "ab" -> ("d", 1). A run length context is accessed 8 times, once for each bit. If the bits seen so far are consistent with the modeled character, then the output of a run length model is (n0,n1) = (0,n) if the next bit is a 1, or (n,0) if the next bit is a 0, where n is the count (1 to 255). If the bits seen so far are not consistent with the predicted byte, then the output is (0,0). These counts are added to the counter state counts to produce the model output. Run lengths are stored in a hash table without collision detection, so an element occupies 2 bytes. Generally, most models store one run length for every 8 counter pairs, so 20% of the memory is allocated to them. Run lengths are used only for memory option (-MEM) of 5 or higher. 6. SUBMODEL DETAILS Submodels differ mainly in their contexts. These are as follows: a. DefaultModel. (n0,n1) = (1,1) regardless of context. b. CharModel (N-gram model). A context consists of the last 0 to N whole bytes, plus the 0 to 7 bits of the partially read current byte. The maximum N depends on the -MEM option as shown in the table below. The order 0 and 1 contexts use a counter state lookup table rather than a hash table. Order Counters Run lengths ----- -------- ----------- 0 2^8 1 2^16 2 2^(MEM+15) 2^(MEM+12), MEM >= 5 3 2^(MEM+17) 2^(MEM+14), MEM >= 5 4 2^(MEM+18) 2^(MEM+15), MEM >= 5 5 2^(MEM+18), MEM >= 1 2^(MEM+15), MEM >= 5 6 2^(MEM+18), MEM >= 3 2^(MEM+15), MEM >= 5 7 2^(MEM+18), MEM >= 3 2^(MEM+15), MEM >= 5 8 2^20, MEM = 5 2^17, MEM = 5 2^(MEM+14), MEM >= 6 2^(MEM+14), MEM >= 6 9 2^20, MEM = 5 2^17, MEM = 5 2^(MEM+14), MEM >= 6 2^(MEM+14), MEM >= 6 Table 2. Number of modeled contexts of length 0-9 c. MatchModel (long context). A context is the last n whole bytes (plus extra bits) where n >=8. Up to 4 matching contexts are found by indexing into a rotating input buffer whose size depends on MEM. The index is a hash table of 32-bit pointers with 1/4 as many elements as the buffer (and therefore occupying an equal amount of memory). The table is indexed by a hashes of 8 byte contexts. No collision detection is used. In order to detect very long matches at a long distance (for example, versions of a file compressed together), 1/16 of the pointers (chosen randomly) are indexed by a hash of a 32 byte context. For each match found, the output is (n0,n1) = (w,0) or (0,w) (depending on the next bit) with a weight of w = length^2 / 4 (maximum 511), depending on the length of the context in bytes. The four outputs are added together. d. RecordModel. This models data with fixed length records, such as tables. The model attempts to find the record length by searching for characters that repeat in the pattern x..x..x..x where the interval between 4 successive occurrences is identical and at least 2. Because of uncertainty in this method, the two most recent values (which must be different) are used. The following 5 contexts are modeled; 1. The two bytes above the current bit for each repeat length. 2. The byte above and the previous byte (to the left) for each repeat length. 3. The byte above and the current position modulo the repeat length, for the longer of the two lengths only. e. SparseModel. This models contexts with gaps. It considers the following contexts, where x denotes the bytes considered and ? denotes the bit being predicted (plus preceding bits, which are included in the context). x.x? (first and third byte back) x..x? x...x? x....x? xx.? x.x.? xx..? c ... c?, gap length c ... xc?, gap length Table 3. Sparse model contexts The last two examples model variable gap lengths between the last byte and its previous occurrence. The length of the gap (up to 255) is part of the context. e. AnalogModel. This is intended to model 16-bit audio (mono or stereo), 24-bit color images, 8-bit data (such as grayscale images). Contexts drop the low order bits, and include the position within the file modulo 2, 3, or 4. There are 8 models, combined into 4 by addition before mixing. An x represents those bits which are part of the context. 16 bit audio: xxxxxx.. ........ xxxxxx.. ........ ? (position mod 2) xxxx.... ........ xxxxxx.. ........ ? (position mod 2) xxxxxx.. ........ ........ ........ xxxxxx.. ........ xxxxxx.. ........ ? (position mod 4 for stereo audio) 24 bit color: xxxx.... ........ ........ xxxxxxxx ........ ........ ? (position mod 3) xxxxxx.. xxxx.... xxxx.... ? (position mod 3) 8 bit data: xxx..... xxxxx... xxxxxxx. ? CCITT images (1 bit per pixel, 216 bytes wide, e.g. calgary/pic) xxxxxxxx (skip 215 bytes...) xxxxxxxx (skip 215 bytes...) ? Table 4. Analog models. f. WordModel. This is intended to model text files. There are 3 contexts: 1. The current word 2. The previous and current words 3. The second to last and current words (skipping a word) A word is defined in two different ways, resulting in a total of 6 different contexts: 1. Any sequence of characters with ASCII code > 32 (not white space). Upper case characters are converted to lower case. 2. Any sequence of A-Z and a-z (case sensitive). g. ExeModel. This models 32-bit Intel .exe and .dll files by changing relative 32-bit CALL addresses to absolute. These instructions have the form (in hex) "E8 xx yy zz 00" or "E8 xx yy zz FF" where the 32-bit operand is stored least significant byte first. These are converted to absolute addresses by adding the position of the E8 byte, and then stored in a 256 element table indexed by the low order byte (xx) along with an 8-bit count. If another E8 xx ... 00/FF with the same value of xx is encountered, then the old value is replaced and the count set back to 1. During modeling, when "E8 xx" is encountered, the bytes yy, zz, and 00/FF are predicted by adjusting xx to absolute address, then looking up the address in the table indexed by xx. If the context matches the table entry up to the current bit, then the next bit from the table is predicted with weight n for yy, 4n for zz, and 16n for 00/FF, where n is the count. 7. SSE The purpose of the SSE stage is to further adjust the probability output from the mixers to agree with actual experience. Ideally this should not be necessary, but in reality this can improve compression. For example, when "compressing" random data, the output probability should be 0.5 regardless of what the models say. SSE will learn this by mapping all input probabilities to 0.5. | Output __ | p / | / | __/ | / | / | | | / |/ Input p +------------- Fig. 3. Example of an SSE mapping. SSE maps the probability p back to p using a piecewise linear function with 32 segments. Each vertex is represented by a pair of 8-bit counters (n0, n1) except that now the counters use a stationary model. When the input is p and a 0 or 1 is observed, then the corresponding count (n0 or n1) of the two vertices on either side of p are incremented. When a count exceeds the maximum of 255, both counts are halved. The output probability is a linear interpolation of n1/n between the vertices on either side. The vertices are scaled to be longer in the middle of the graph and short near the ends. The intial counts are set so that p maps to itself. SSE is context sensitive. There are 2048 separately maintained SSE functions, selected by the 0-7 bits of the current (partial) byte and the 2 high order bits of the previous byte, and on whether the data is text or binary, using the same heuristic as for selecting the mixer context. The final output to the encoder is a weighted average of the SSE input and output, with the output receiving 3/4 of the weight: p := (3 SSE(p) + p) / 4. (4) 8. MEMORY USAGE The -m option (MEM = 0 through 9) controls model and memory usage. Smaller numbers compress faster and use less memory, while higher numbers compress better. For MEM < 5, only one mixer is used. For MEM < 4, bit counts are stored in nonstationary counters, but no run length is stored (decreasing memory by 20%). For MEM < 1, SSE is not used. For MEM < 5, the record, sparse, and analog models are not used. For MEM < 4, the word model is not used. The order of the char model ranges from 4 to 9 depending on MEM for MEM as shown in Table 6. Run Memory used by........................ Total MEM Mixers Len Order Char Match Record Sparse Analog Word SSE Memory (MB) --- ------ --- ----- ---- ----- ------ ------ ------ ---- --- ----------- 0 1 no 4 .5 1 1.5 1 1 no 5 1 2 .12 3 2 1 no 5 2 4 .12 6 3 1 no 7 10 8 .12 18 4 1 no 7 20 16 6 6 1 15 .12 64 5 2 yes 9 66 32 13 11 2 30 .12 154 6 2 yes 9 112 32 13 11 4 30 .12 202 7 2 yes 9 224 64 25 22 9 60 .12 404 8 2 yes 9 448 128 50 45 18 120 .12 808 9 2 yes 9 992 256 100 90 36 240 .12 1616 Table 5. Memory usage depending on MEM (-0 to -9 option). 9. EXPERIMENTAL RESULTS Results on the Calgary corpos are shown below for some top data compressors as of Dec. 30, 2003. Options are set for maximum compression. When possible, the files are all compressed into a single archive. Run times are on a 705 MHz Duron with 256 MB memory, and include 3 seconds to run WRT when applicable. PAQ6 was compiled with DJGPP (g++) 2.95.2 -O. Original size Options 3,141,622 Time Author ------------- ------- --------- ---- ------ gzip 1.2.4 -9 1,017,624 2 Jean Loup Gailly epm r9 c 668,115 49 Serge Osnach rkc a -M80m -td+ 661,102 91 Malcolm Taylor slim 20 a 659,213 159 Serge Voskoboynikov compressia 1.0 beta 650,398 66 Yaakov Gringeler durilca v.03a (as in README) 647,028 30 Dmitry Shkarin PAQ5 661,811 361 Matt Mahoney WRT11 + PAQ5 638,635 258 Przemyslaw Skibinski + PAQ6 -0 858,954 52 -1 750,031 66 -2 725,798 76 -3 709,806 97 -4 655,694 354 -5 648,951 625 -6 648,892 636 WRT11 + PAQ6 -6 626,395 446 WRT20 + PAQ6 -6 617,734 439 Table 6. Compressed size of the Calgary corpus. WRT11 is a word reducing transform written by Przemyslaw Skibinski. It uses an external English dictionary to replace words with 1-3 byte symbols to improve compression. rkc, compressia, and durilca use a similar approach. WRT20 is a newer version of WRT11. 10. ACKNOWLEDGMENTS Thanks to Serge Osnach for introducing me to SSE (in PAQ1SSE/PAQ2) and the sparse models (PAQ3N). Also, credit to Eugene Shelwein, Dmitry Shkarin for suggestions on using multiple character SSE contexts. Credit to Eugene, Serge, and Jason Schmidt for developing faster and smaller executables of previous versions. Credit to Werner Bergmans and Berto Destasio for testing and evaluating them, including modifications that improve compression at the cost of more memory. Credit to Alexander Ratushnyak who found a bug in PAQ4 decompression, and also in PAQ6 decompression for very small files (both fixed). Thanks to Berto for writing PAQ5-EMILCONT-DEUTERIUM from which this program is derived (which he derived from PAQ5). His improvements to PAQ5 include a new Counter state table and additional contexts for CharModel and SparseModel. I refined the state table by adding more representable states and modified the return counts to give greater weight when there is a large difference between the two counts. I expect there will be better versions in the future. If you make any changes, please change the name of the program (e.g. PAQ7), including the string in the archive header by redefining PROGNAME below. This will prevent any confusion about versions or archive compatibility. Also, give yourself credit in the help message.
FastReport.v4.15 for.Delphi.BCB.Full.Source企业版含ClientServer中文修正版支持Delphi 4-XE5 and C++Builder 6-XE5. D2010以上版本(D14_D19)安装必读 delphi2010以上版本(D14_D19)使用者安装时,请将res\frccD14_D19.exe更名名为frcc.exe frccD14_D19.exe是专门的delphi2010以上版本(D14_D19)编码器。其他低delphi版本,请使用frcc.exe FastReport® VCL is an add-on component that allows your application to generate reports quickly and efficiently. FastReport® provides all the tools necessary for developing reports, including a visual report designer, a reporting core, and a preview window. It can be used in Embarcadero (ex Borland and CodeGear) Delphi 4-XE5 and C++Builder 6-XE5. version 4.15 --------------- + Added Embarcadero RAD Studio XE5 support + Added Internal components for FireDac database engine + fixed bug with images in PDF export for OSX viewers + Added ability to set font charset to default in Style Editor - fixed duplex problem when printing several copies of the report - fixed problem with PNG images - fixed problem with TfrxPictureView transparent version 4.14 --------------- + Added Embarcadero RAD Studio XE4 support - [Lazarus] fixed bug with text output - [Lazarus] fixed bug with some visual controls in designer - [Lazarus] improved interface of the report preview and designer - [Lazarus] fixed bug with boolean propertyes in script code and expressions - fixed bug with endless loop in TfrxRichView - fixed bug with Unicode in TfrxMemoView appeared in previous release - improved MAPI interface in TfrxExportMail export - fixed some problems with allpication styles XE2/XE3 - improved compatibility with Fast Report FMX version 4.13 --------------- + Added Lazarus Beta support starts from Fast Report Professionnal edition. Current version allows preview, print and design report template under Windows and Linux platform (qt). + Added Embarcadero RAD Studio XE3 support - fixed compatibility with Fast Report FMX installed in the same IDE. This version can co exist with Fast Report FMX version at the same time. + published "Quality" property of TfrxPDFExport object + published "UseMAPI" property of TfrxExportMail object + published "PictureType" property to ODF export - fixed bug with expressions in RichEdit - fixed bug in multi-column reports - fixed exception in the report designer - fixed bug with URLs in Open Document Text and Open Document Spreadsheet exports - fixed format string in XLS OLE export - fixed format string in XLS BIFF8 export - fixed output of the check boxes on the highlighted lines in PDF export - fixed bug with PDF anchors - fixed bug when using two or more macroses in memo version 4.12 --------------- + added support of Embarcadero Rad Studio EX2 (x32/x64) + added export of Excel formulas in the BIFF export + added export of external URLs in the PDF export + added converter from Rave Reports ConverterRR2FR.pas + added Cross.KeepRowsTogether property + optimised merging cells in the BIFF export + added property DataOnly to exports + pictures format in all exports switched to PNG + improved number formats processing in the BIFF export + added property DataOnly to exports + added property TfrxODFExport.SingleSheet + added property TfrxSimpleTextExport.DeleteEmptyColumns + added property TfrxBIFFExport.DeleteEmptyRows + added progress bar to the BIFF export - fixed bug with frame for some barcode types - fixed wrong metafiles size in the EMF export - fixed processing of negative numbers in the OLE export - fixed bug in handling exceptions in the OLE export - fixed bug in creation of the progress bar (applicable to many exports) - fixed bug in the ODF export in strings processing - fixed bug in the OLE export in numbers formatting - fixed bug in the PDF export in rotating texts 90, 180 and 270 degrees - fixed bug in the ODF export in processing of headers and footers - fixed bug in the Text export in computing object bounds - fixed bug in the ODF export in UTF8 encoding - fixed hiding gridlines around nonempty cells in the BIFF export - fixed images bluring when exporting - fixed word wrapping in the Excel XML export version 4.11 --------------- + added BIFF8 XLS export filter + added to ODF export the Language property + [enterprise] added "scripts" folder for additional units ("uses" directive in report script) + [enterprise] added logs for scheduler (add info in scheduler.log) + [enterprise] added property "Reports" - "Scripts" in server configuration - set the path for "uses" directive in report script + [enterprise] added property "Http" - "MaxSessions" in server configuration - set the limit of maximum session threads, set 0 for unlimit + [enterprise] added property "Reports" - "MaxReports" in server configuration - set the limit of maximum report threads, set 0 for unlimit + [enterprise] added property "Logs" - "SchedulerLog" in server configuration - set the scheduler log file name + [enterprise] added property "Scheduler" - "Active" in server configuration - enable of scheduler + [enterprise] added property "Scheduler" - "Debug" in server configuration - enable writing of debug info in scheduler log + [enterprise] added property "Scheduler" - "StudioPath" in server configuration - set the path to FastReport Studio, leave blank for default - [enterprise] fixed bug with MIME types in http header (content-type) - [enterprise] fixed bug with default configuration (with missed config.xml) - [enterprise] fixed bug with error pages - fixed bug in XML export with the ShowProgress property - fixed bug in RTF export with font size in empty cells - fixed bug in ODF export with UTF8 encoding of the Creator field - fixed bug in XML export with processing special characters in strings - fixed bug in ODF export with properties table:number-columns-spanned, table:number-rows-spanned - fixed bug in ODF export with the background clNone color - fixed bug in ODF export with a style of table:covered-table-cell - fixed bug in ODF export with table:covered-table-cell duplicates - fixed bug in ODF export with excessive text:p inside table:covered-table-cell - fixed bug in ODF export with language styles - fixed bug in ODF export with spaces and tab symbols - fixed bug in ODF export with styles of number cells - fixed bug in ODF export with the background picture - fixed bug in ODF export with charspacing - fixed bug in ODF export with number formatting - fixed bug in ODF export with table-row tag - fixed bug in XLS(OLE) export with numbers formatting - fixed bug in RTF export with processing RTF fields - fixed bug with processing special symbols in HTML Export - fixed bug with UTF8 encoding in ODF export - fixed bug in PDF export with underlined, struck-out and rotated texts version 4.10 --------------- + added support of Embarcadero Rad Studio XE (Delphi EX/C++Builder EX) + added support of TeeChart 2010 packages (new series type aren't support in this release) + added a property TruncateLongTexts to the XLS OLE export that allows to disable truncating texts longer than a specified limit + added option EmbedProt which allows to disable embedding fonts into an encrypted PDF file + added TfrxDateEditControl.WeekNumbers property - fixed bug in XML and PDF exports with Korean charmap - fixed bug in the XLS XML export about striked-out texts - fixed bug about exporting an empty page via the XLS OLE export - fixed bug in the PDF export about coloring the background of pages - fixed bug in embedded designer when using break point in script - fixed bug with lost of focus in font size combo-box in designer - fixed bug with truncate of font size combo-box in Windows Vista/7 in designer (lost of vertical scroll bar) - fixed bug when lost file name in inherited report - fixed bug in multi-page report with EndlessHeight/EndlessWidth - fixed bug wit TfrxHeader.ReprintOnNewpage and KeepTogether - fixed bug in multi-column report with child bands - improved split mechanism (added TfrxStretcheable.HasNextDataPart for complicated data like RTF tables) - improved crosstab speed when using repeat band with crosstab object version 4.9 --------------- + added outline to PDF export + added anchors to PDF export - fixed bug with embedded TTC fonts in PDF export + added an ability to create multiimage TIFF files + added export headers/footers in ODF export + added ability to print/export transparent pictures (properties TfrxPictureView.Transparent and TfrxPictureView.TransparentColor) (PDF export isn't supported) + added new "split to sheet" modes for TfrxXMLExport + added support of /PAGE tag in TfrxRichView, engine automatically break report pages when find /PAGE tag + added ability to hide Null values in TfrxChartView (TfrxChartView.IgnoreNulls = True) + added ability to set any custom page order for printing (i.e. 3,2,1,5,4 ) + [enterprise] added variables "AUTHLOGIN" and "AUTHGROUP" inside the any report + [enterprise] now any report file can be matched with any (one and more) group, these reports are accessible only in matched groups + [enterprise] now you can set-up cache delays for each report file (reports.xml) + [enterprise] added new properties editor for reports in Configuration utility (see Reports tab) + [enterprise] added property "Xml" - "SplitType" in server configuration - allow to select split on pages type between none/pages/printonprev/rowscount + [enterprise] added property "Xml" - "SplitRowsCount" in server configuration - sets the count of rows for "rowscount" split type + [enterprise] added property "Xml" - "Extension" in server configuration - allow select between ".xml" and ".xls" extension for output file + [enterprise] added property "Html" - "URLTarget" in server configuration - allow select the target attribute for report URLs + [enterprise] added property "ReportsFile" - path to file with reports to groups associations and cache delays + [enterprise] added property "ReportsListRenewTimeout" in server configuration + [enterprise] added property "ConfigRenewTimeout" in server configuration + [enterprise] added property "MimeType" for each output format in server configuration + [enterprise] added property "BrowserPrint" in server configuration - allow printing by browser, added new template nav_print_browser.html + [enterprise] added dynamic file name generation of resulting formats (report_name_date_time) * [enterprise] SERVER_REPORTS_LIST and SERVER_REPORTS_HTML variables (list of available reports) depend from user group (for internal authentification) + added drawing shapes in PDF export (not bitmap) + added rotated text in PDF export (not bitmap) + added EngineOptions.IgnoreDevByZero property allow to ignore division by zero exception in expressions + added properties TfrxDBLookupComboBox.DropDownWidth, TfrxDBLookupComboBox.DropDownRows + added event TfrxCustomExportFilter.OnBeginExport + added ability to decrease font size in barcode object + added ability to inseret FNC1 to "code 128" barcode + added event TfrxPreview.OnMouseDown + added support of new unicode-PDF export in D4-D6 and BCB4-BCB6 * improved AddFrom method - anchor coping - fixed bug with WordWrap in PDF export - fixed bug with underlines in PDF export - fixed bug with rounded rectangles in PDF export - fixed CSV export to fit to the RFC 4180 specification - fixed bug with strikeout text in PDF export - fixed bug with incorrect export of TfrxRichView object in RTF format (wrong line spacing) - [enterprise] added critical section in TfrxServerLog.Write - fixed bug with setting up of the Protection Flags in the PDF export dialog window - fixed bug in PDF export (file structure) - fixed bug with pictures in Open Office Writer (odt) export - [enterprise] fixed bug with TfrxReportServer component in Delphi 2010 - fixed minor errors in Embarcedero RAD Studio 2010 - fixed bug with endless loop with using vertical bands together with page header and header with ReprintOnNewPage - fixed bug when using "Keeping" and Cross tables (incorrect cross transfer) - fixed bug with [CopyName#] macros when use "Join small pages" print mode - fixed bug when try to split page with endless height to several pages (NewPage, StartNewPage) - fixed bug with empty line TfrxRichView when adding text via expression - fixed bug when Footer prints even if main band is invisible (FooterAfterEach = True) - fixed resetting of Page variable in double-pass report with TfrxCrossView - fixed bug with loosing of aligning when split TfrxRichView - fixed buzz in reports with TfrxRichView when using RTF 4.1 version 4.8 --------------- + added support of Embarcadero Rad Studio 2010 (Delphi/C++Builder) + added TfrxDBDataset.BCDToCurrency property + added TfrxReportOptions.HiddenPassword property to set password silently from code + added TfrxADOConnection.OnAfterDisconnect event + added TfrxDesigner.MemoParentFont property + added new TfrxDesignerRestriction: drDontEditReportScript and drDontEditInternalDatasets + adedd checksum calculating for 2 5 interleaved barcode + added TfrxGroupHeader.ShowChildIfDrillDown property + added TfrxMailExport.OnSendMail event + added RTF 4.1 support for TfrxRichText object + [enterprise] added Windows Authentification mode + added confirmation reading for TfrxMailExport + added TimeOut field to TfrxMailExport form + added ability to use keeping(KeepTogether/KeepChild/KeepHeader) in multi-column report + added ability to split big bands(biggest than page height) by default * [enterprise] improved CGI for IIS/Apache server * changed PDF export (D7 and upper): added full unicode support, improved performance, decreased memory requirements old PDF export engine saved in file frxExportPDF_old.pas - changed inheritance mechanism, correct inherits of linked objects (fixups) - fixed bug with Mirror Mrgins in RTF, HTML, XLS, XML, OpenOffice exports - fixed bug when cross tab cut the text in corner, when corner height greater than column height - [fs] improved script compilation - improved WatchForm TListBox changet to TCheckListBox - improved AddFrom method - copy outline - Improved functional of vertical bands, shows memos placed on H-band which doesn't across VBand, also calculate expression inside it and call events (like in FR2) - Improved unsorted mode in crosstab(join same columns correctly) - Improved converter from Report Builder - Improved TfrxDesigner.OnInsertObject, should call when drag&drop field from data tree - improved DrillDownd mechanism, should work correct with master-detail-subtetail nesting - fixed bug with DownThenAcross in Cross Tab - fixed several bugs under CodeGear RAD Studio (Delphi/C++Builder) 2009 - fixed bug with emf in ODT export - fixed bug with outline when build several composite reports in double pass mode - fixed bug when group doesn't fit on the whole page - fixed "Page" and "Line" variables inside vertical bands - fixed bug with using KeepHeader in some cases - fixed bug with displacement of subreport when use PrintOnParent property in some cases - fixed small memory leak in subreports - fixed problem with PageFooter and ReportSymmary when use PrintOnPreviousPage property - fixed bug when designer shows commented functions in object inspector - fixed bug when designer place function in commented text block - fixed bug when Engine try to split non-stretcheable view and gone to endless loop - fixed bug with HTML tags in memo when use shot text and WordWrap - [enterprise] fixed bug with variables lost on refresh/export - fixed bug whih PDF,ODT export in Delphi4 and CBuilder4 - fixed bug with some codepage which use two bytes for special symbols (Japanese ans Chinese codepages) - fixed bug when engine delete first space from text in split Memo - fixed bug in multi-column page when band overlap stretched PageHeader - fixed bug with using ReprintOnNewPage version 4.7 --------------- + CodeGear RAD Studio (Delphi/C++Builder) 2009 support + [enterprise] enchanced error description in logs + added properties TfrxHTMLExport.HTMLDocumentBegin: TStrings, TfrxHTMLExport.HTMLDocumentBody: TStrings, TfrxHTMLExport.HTMLDocumentEnd: TStrings + improved RTF export (with line spacing, vertical gap etc) + added support of Enhanced Metafile (EMF) images in Rich Text (RTF), Open Office (ODS), Excel (XLS) exports + added OnAfterScriptCompile event + added onLoadRecentFile Event + added C++ Builder demos + added hot-key Ctrl + mouseWheel - Change scale in designer + added TfrxMemoView.AnsiText property - fixed bug in RTF export with EMF pictures in OpenOffice Writer - fixed some multi-thread isuues in engine, PDF, ODF exports - [enterprise] fixed integrated template of report navigator - [enterprise] fixed bug with export in Internet Explorer browser - fixed bug with font size of dot-matix reports in Excel and XML exports - fixed bug in e-mail export with many addresses - fixed bug in XLS export (with fast export unchecked and image object is null) - [enterprise] fixed bug in TfrxReportServer.OnGetVariables event - fixed bug in Calcl function - fixed memory leak in Cross editor - fixed progress bar and find dialog bug in DualView - fixed bug in PostNET and ean13 barcodes - fixed bug with TruncOutboundText in Dot Matrix report - fixed bugs with break points in syntaxis memo - improved BeforeConnect event in ADO - fixed bug in inhehited report with internal dataset - fixed bug in TfrxPanelControl with background color(Delphi 2005 and above) version 4.6 --------------- + added & , < , > to XML reader + added <nowrap> tag, the text concluded in tag is not broken by WordWrap, it move entirely + added ability to move band without objects (Alt + Move) + added ability to output pages in the preview from right to left ("many pages" mode), for RTL languages(PreviewOptions.RTLPreview) + added ability to storing picture cache in "temp" file (PreviewOptions.PictureCacheInFile) + added EngineOptions.UseGlobalDataSetList (added for multi-thread applications) - set it to False if you don't want use Global DataSet list(use Report.EnabledDataSet.Add() to add dataset in local list) + added new property Hint for all printed objects, hints at the dialog objects now shows in StatusBar + added new property TfrxDBLookupComboBox.AutoOpenDataSet (automatically opens the attached dataset after onActivate event) + added new property TfrxReportPage.PageCount like TfrxDataBand.RowCount + added new property WordWrap for dialog buttons (Delphi 7 and above). + added sort by name to data tree + added TfrxDesigner.TemplatesExt property + added TfrxStyles class in script rtti + changes in the Chart editor: ability to change the name of the series, ability to move created series, other small changes + [enterprise] added configurations values refresh in run-time + [enterprise] added new demo \Demos\ClientServer\ISAPI + [enterprise] added output to server printers from user browser (see config.xml "AllowPrint", set to "no" by default), note: experimental feature + [enterprise] added reports list refresh in run-time + [enterprise] added templates feature + [enterprise] improved speed and stability + [fs] added TfsScript.IncludePath property + [fs] added TfsScript.UseClassLateBinding property + [fs] fixed type casting from variant(string) to integer/float - changes in report inherit: FR get relative path from current loaded report(old reports based on application path works too) - corrected module for converting reports from Report Builder - fixed bug in CrossTab when set charset different from DEFAULT_CHARSET - fixed bug in RTF export with some TfrxRichView objects - fixed bug when print on landscape orientation with custom paper size - fixed bug when use network path for parent report - fixed bug with Band.Allowslit = True and ColumnFooter - fixed bug with drawing subreport on stretched band - fixed bug with embedded fonts in PDF export - fixed bug with long ReportTitle + Header + MaterData.KeepHeader = true - fixed bug with minimizing of Modal designer in BDS2005 and above - fixed bug with paths in HTML export - fixed bug with RTL in PDF export - fixed bug with SubReport in multi column page - fixed bug with Subreport.PrintOnParent = true in inherited report - fixed bug with SYMBOL_CHARSET in PDF export - fixed bug with the addition of datasets by inheritance report - fixed bug with width calculation when use HTML tags in memo - fixed compatibility with WideStrings module in BDS2006/2007 - fixed flicking in preview when use OnClickObject event - fixed free space calculation when use PrintOnPreviousPage - fixed preview bug with winXP themes and in last update - fixed subreports inherit - Thumbnail and Outline shows at right side for RTL languages - [fs] fixed bug with late binding version 4.5 --------------- + added ConverterRB2FR.pas unit for converting reports from Report Builder to Fast Report + added ConverterQR2FR.pas unit for converting reports from QuickReport to FastReport + added support of multiple attachments in e-mail export (html with images as example) + added support of unicode (UTF-8) in e-mail export + added ability to change templates path in designer + added OnReportPrint script event + added PNG support in all version (start from Basic) + added TfrxDMPMemoView.TruncOutboundText property - truncate outbound text in matrix report when WordWrap=false + added new frames styles fsAltDot and fsSquare + added new event OnPreviewDblClick in all TfrxView components + added ability to call dialogs event after report run when set DestroyForms = false + added ability to change AllowExpressions and HideZeros properties in cross Cells (default=false) + added IgnoreDupParams property to DB components + added auto open dataset in TfrxDBLookupComboBox + added new property TfrxADOQuery.LockType + added define DB_CAT (frx.inc) for grouping DB components + added TfrxPictureView.HightQuality property(draw picture in preview with hight quality, but slow down drawing procedure) + [FRViewer] added comandline options "/print filename" and "/silent_print filename" + added unicode input support in RichEditor + added new define HOOK_WNDPROC_FOR_UNICODE (frx.inc) - set hook on GetMessage function for unicode input support in D4-D7/BCB4-BCB6 + added ability chose path to FIB packages in "Recompile Wizard" + added new function TfrxPreview.GetTopPosition, return a position on current preview page + added new hot-keys to Code Editor - Ctrl+Del delete the word before cursor, Ctrl+BackSpace delete the word after cursor(as in Delhi IDE) + added "MDI Designer" example - all language resources moved to UTF8, XML - fixed bug with html tags [sup] and [sub] - fixed width calculation in TfrxMemoView when use HTML tags - fixed bug with suppressRepeated in Vertical bands - fixed bug when designer not restore scrollbars position after undo/redo - fixed visual bug in toolbars when use Windows Vista + XPManifest + Delphi 2006 - fixed bug in CalcHeight when use negative LineSpace - fixed bug in frx2xto30 when import query/table components, added import for TfrDBLookupControl component - fixed bug with Cross and TfrxHeader.ReprintOnNewPage = true - fixed converting from unicode in TfrxMemoView when use non default charset - [fs] fixed bug with "in" operator - fixed bug with aggregate function SUM - fixed bug when use unicode string with [TotalPages#] in TfrxMemoView - fixed bug with TSQLTimeStampField field type - fixed designer dock-panels("Object Inspector", "Report Tree", "Data Tree") when use designer as MDI or use several non-modal designer windows - fixed bug with hide/show dock-panels("Object Inspector", "Report Tree", "Data Tree"), now it restore size after hiding - fixed bug in XML/XLS export - wrong encode numbers in memo after CR/LF - fiexd bug in RTF export - fixed bug with undo/redo commands in previewPages designer - fixed bug with SuppressRepeated when use KeepTogether in group - fixed bug with SuppressRepeated on new page all events fired twice(use Engine.SecondScriptcall to determinate it) version 4.4 --------------- + added support for CodeGear RAD Studio 2007 + improved speed of PDF, HTML, RTF, XML, ODS, ODT exports + added TfrxReportPage.BackPictureVisible, BackPicturePrintable properties + added rtti for the TfrxCrossView.CellFunctions property + added properties TfrxPDFExport.Keywords, TfrxPDFExport.Producer, TfrxPDFExport.HideToolbar, TfrxPDFExport.HideMenubar, TfrxPDFExport.HideWindowUI, TfrxPDFExport.FitWindow, TfrxPDFExport.CenterWindow, TfrxPDFExport.PrintScaling + added ability recompile frxFIB packages in "recompile wizard" + added ability to set color property for all teechart series which support it + added, setting frame style for each frame line in style editor + added TfrxPreview.Locked property and TfrxPreview.DblClick event + added 'invalid password' exception when load report without crypt + added new parameter to InheritFromTemplate (by default = imDefault) imDefault - show Error dialog, imDelete - delete duplicates, imRename - rename duplicates + added property TfrxRTFExport.AutoSize (default is "False") for set vertical autosize in table cells * redesigned dialog window of PDF export * improved WYSIWYG in PDF export - fixed bug, the PageFooter band overlap the ReportSummary band when use EndlessHeight - fixed bug with lage paper height in preview - fixed bug with outline and encryption in PDF export - fixed bug with solid arrows in PDF export - fixed bug when print TfrxHeader on a new page if ReprintOnNewPage = true and KeepFooter = True - fixed bug when used AllowSplit and TfrxGroupHeader.KeepTogether - fixed page numbers when print dotMatrix report without dialog - fixed bug with EndlessHeight in multi-columns report - fixed font dialog in rich editor - [fs] fixed bug when create TWideStrings in script code - fixed bug with dialog form when set TfrxButtonControl.Default property to True - fixed twice duplicate name error in PreviewPages designer when copy - past object - fixed bug with Preview.Clear and ZmWholePage mode - fixed bug with using "outline" together "embedded fonts" options in PDF export - fixed multi-thread bug in PDF export - fixed bug with solid fill of transparent rectangle shape in PDF export - fixed bug with export OEM_CODEPAGE in RTF, Excel exports - fixed bug with vertical size of single page in RTF export - fixed bug with vertical arrows in PDF export - fixed memory leak with inherited reports version 4.3 --------------- + added support for C++Builder 2007 + added encryption in PDF export + added TeeChart Pro 8 support + added support of OEM code page in PDF export + added TfrxReport.CaseSensitiveExpressions property + added "OverwritePrompt" property in all export components + improved RTF export (WYSIWYG) + added support of thai and vietnamese charsets in PDF export + added support of arrows in PDF export * at inheritance of the report the script from the report of an ancestor is added to the current report (as comments) * some changes in PDF export core - fixed bug with number formats in Open Document Spreadsheet export - fixed bug when input text in number property(Object Inspector) and close Designer(without apply changes) - fixed bug in TfrxDBDataset with reCurrent - fixed bug with memory leak in export of empty outline in PDF format - line# fix (bug with subreports) - fixed bug with edit prepared report with rich object - fixed bug with shadows in PDF export - fixed bug with arrows in designer - fixed bug with margins in HTML, RTF, XLS, XML exports - fixed bug with arrows in exports - fixed bug with printers enumeration in designer (list index of bound) - fixed papersize bug in inherited reports version 4.2 --------------- + added support for CodeGear Delphi 2007 + added export of html tags in RTF format + improved split of the rich object + improved split of the memo object + added TfrxReportPage.ResetPageNumbers property + added support of underlines property in PDF export * export of the memos formatted as fkNumeric to float in ODS export - fixed bug keeptogether with aggregates - fixed bug with double-line draw in RTF export - fix multi-thread problem in PDF export - fixed bug with the shading of the paragraph in RTF export when external rich-text was inserted - fixed bug with unicode in xml/xls export - fixed bug in the crop of page in BMP, TIFF, Jpeg, Gif - "scale" printmode fixed - group & userdataset bugfix - fixed cross-tab pagination error - fixed bug with round brackets in PDF export - fixed bug with gray to black colors in RTF export - fixed outline with page.endlessheight - fixed SuppressRepeated & new page - fixed bug with long time export in text format - fixed bug with page range and outline in PDF export - fixed undo in code window - fixed error when call DesignReport twice - fixed unicode in the cross object - fixed designreportinpanel with dialog forms - fixed paste of DMPCommand object - fixed bug with the export of null images - fixed code completion bug - fixed column footer & report summary problem version 4.1 --------------- + added ability to show designer inside panel (TfrxReport.DesignReportInPanel method). See new demo Demos\EmbedDesigner + added TeeChart7 Std support + [server] added "User" parameter in TfrxReportServer.OnGetReport, TfrxReportServer.OnGetVariables and TfrxReportServer.OnAfterBuildReport events + added Cross.KeepTogether property + added TfrxReport.PreviewOptions.PagesInCache property - barcode fix (export w/o preview bug) - fixed bug in preview (AV with zoommode = zmWholePage) - fixed bug with outline + drilldown - fixed datasets in inherited report - [install] fixed bug with library path set up in BDS/Turbo C++ Builder installation - fixed pagefooter position if page.EndlessWidth is true - fixed shift bug - fixed design-time inheritance (folder issues) - fixed chm help file path - fixed embedded fonts in PDF - fixed preview buttons - fixed bug with syntax highlight - fixed bug with print scale mode - fixed bug with control.Hint - fixed edit preview page - fixed memory leak in cross-tab version 4.0 initial release --------------------- Report Designer: - new XP-style interface - the "Data" tab with all report datasets - ability to draw diagrams in the "Data" tab - code completion (Ctrl+Space) - breakpoints - watches - report templates - local guidelines (appears when you move or resize an object) - ability to work in non-modal mode, mdi child mode Report Preview: - thumbnails Print: - split a big page to several small pages - print several small pages on one big - print a page on a specified sheet (with scale) - duplex handling from print dialogue - print copy name on each printed copy (for example, "First copy", "Second copy") Report Core: - "endless page" mode - images handling, increased speed - the "Reset page numbers" mode for groups - reports crypting (Rijndael algorithm) - report inheritance (both file-based and dfm-based) - drill-down groups - frxGlobalVariables object - "cross-tab" object enhancements: - improved cells appearance - cross elements visible in the designer - fill corner (ShowCorner property) - side-by-side crosstabs (NextCross property) - join cells with the same value (JoinEqualCells property) - join the same string values in a cell (AllowDuplicates property) - ability to put an external object inside cross-tab - AddWidth, AddHeight properties to increase width&height of the cell - AutoSize property, ability to resize cells manually - line object can have arrows - added TfrxPictureView.FileLink property (can contain variable or a file name) - separate settings for each frame line (properties Frame.LeftLine, TopLine, RightLine, BottomLine can be set in the object inspector) - PNG images support (uncomment {$DEFINE PNG} in the frx.inc file) - Open Document Format for Office Applications (OASIS) exports, spreadsheet (ods) and text (odt) Enterprise components: - Users/Groups security support (see a demo application Demos\ClientServer\UserManager) - Templates support - Dynamically refresh of configuration, users/groups D2010以上版本(D14_D19)安装必读 delphi2010以上版本(D14_D19)使用者安装时,请将res\frccD14_D19.exe更名名为frcc.exe frccD14_D19.exe是专门的delphi2010以上版本(D14_D19)编码器。其他低delphi版本,请使用frcc.exe
Contents Overview 1 Lesson 1: Index Concepts 3 Lesson 2: Concepts – Statistics 29 Lesson 3: Concepts – Query Optimization 37 Lesson 4: Information Collection and Analysis 61 Lesson 5: Formulating and Implementing Resolution 75 Module 6: Troubleshooting Query Performance Overview At the end of this module, you will be able to:  Describe the different types of indexes and how indexes can be used to improve performance.  Describe what statistics are used for and how they can help in optimizing query performance.  Describe how queries are optimized.  Analyze the information collected from various tools.  Formulate resolution to query performance problems. Lesson 1: Index Concepts Indexes are the most useful tool for improving query performance. Without a useful index, Microsoft® SQL Server™ must search every row on every page in table to find the rows to return. With a multitable query, SQL Server must sometimes search a table multiple times so each page is scanned much more than once. Having useful indexes speeds up finding individual rows in a table, as well as finding the matching rows needed to join two tables. What You Will Learn After completing this lesson, you will be able to:  Understand the structure of SQL Server indexes.  Describe how SQL Server uses indexes to find rows.  Describe how fillfactor can impact the performance of data retrieval and insertion.  Describe the different types of fragmentation that can occur within an index. Recommended Reading  Chapter 8: “Indexes”, Inside SQL Server 2000 by Kalen Delaney  Chapter 11: “Batches, Stored Procedures and Functions”, Inside SQL Server 2000 by Kalen Delaney Finding Rows without Indexes With No Indexes, A Table Must Be Scanned SQL Server keeps track of which pages belong to a table or index by using IAM pages. If there is no clustered index, there is a sysindexes row for the table with an indid value of 0, and that row will keep track of the address of the first IAM for the table. The IAM is a giant bitmap, and every 1 bit indicates that the corresponding extent belongs to the table. The IAM allows SQL Server to do efficient prefetching of the table’s extents, but every row still must be examined. General Index Structure All SQL Server Indexes Are Organized As B-Trees Indexes in SQL Server store their information using standard B-trees. A B-tree provides fast access to data by searching on a key value of the index. B-trees cluster records with similar keys. The B stands for balanced, and balancing the tree is a core feature of a B-tree’s usefulness. The trees are managed, and branches are grafted as necessary, so that navigating down the tree to find a value and locate a specific record takes only a few page accesses. Because the trees are balanced, finding any record requires about the same amount of resources, and retrieval speed is consistent because the index has the same depth throughout. Clustered and Nonclustered Indexes Both Index Types Have Many Common Features An index consists of a tree with a root from which the navigation begins, possible intermediate index levels, and bottom-level leaf pages. You use the index to find the correct leaf page. The number of levels in an index will vary depending on the number of rows in the table and the size of the key column or columns for the index. If you create an index using a large key, fewer entries will fit on a page, so more pages (and possibly more levels) will be needed for the index. On a qualified select, update, or delete, the correct leaf page will be the lowest page of the tree in which one or more rows with the specified key or keys reside. A qualified operation is one that affects only specific rows that satisfy the conditions of a WHERE clause, as opposed to accessing the whole table. An index can have multiple node levels An index page above the leaf is called a node page. Each index row in node pages contains an index key (or set of keys for a composite index) and a pointer to a page at the next level for which the first key value is the same as the key value in the current index row. Leaf Level contains all key values In any index, whether clustered or nonclustered, the leaf level contains every key value, in key sequence. In SQL Server 2000, the sequence can be either ascending or descending. The sysindexes table contains all sizing, location and distribution information Any information about size of indexes or tables is stored in sysindexes. The only source of any storage location information is the sysindexes table, which keeps track of the address of the root page for every index, and the first IAM page for the index or table. There is also a column for the first page of the table, but this is not guaranteed to be reliable. SQL Server can find all pages belonging to an index or table by examining the IAM pages. Sysindexes contains a pointer to the first IAM page, and each IAM page contains a pointer to the next one. The Difference between Clustered and Nonclustered Indexes The main difference between the two types of indexes is how much information is stored at the leaf. The leaf levels of both types of indexes contain all the key values in order, but they also contain other information. Clustered Indexes The Leaf Level of a Clustered Index Is the Data The leaf level of a clustered index contains the data pages, not just the index keys. Another way to say this is that the data itself is part of the clustered index. A clustered index keeps the data in a table ordered around the key. The data pages in the table are kept in a doubly linked list called the page chain. The order of pages in the page chain, and the order of rows on the data pages, is the order of the index key or keys. Deciding which key to cluster on is an important performance consideration. When the index is traversed to the leaf level, the data itself has been retrieved, not simply pointed to. Uniqueness Is Maintained In Key Values In SQL Server 2000, all clustered indexes are unique. If you build a clustered index without specifying the unique keyword, SQL Server forces uniqueness by adding a uniqueifier to the rows when necessary. This uniqueifier is a 4-byte value added as an additional sort key to only the rows that have duplicates of their primary sort key. You can see this extra value if you use DBCC PAGE to look at the actual index rows the section on indexes internal. . Finding Rows in a Clustered Index The Leaf Level of a Clustered Index Contains the Data A clustered index is like a telephone directory in which all of the rows for customers with the same last name are clustered together in the same part of the book. Just as the organization of a telephone directory makes it easy for a person to search, SQL Server quickly searches a table with a clustered index. Because a clustered index determines the sequence in which rows are stored in a table, there can only be one clustered index for a table at a time. Performance Considerations Keeping your clustered key value small increases the number of index rows that can be placed on an index page and decreases the number of levels that must be traversed. This minimizes I/O. As we’ll see, the clustered key is duplicated in every nonclustered index row, so keeping your clustered key small will allow you to have more index fit per page in all your indexes. Note The query corresponding to the slide is: SELECT lastname, firstname FROM member WHERE lastname = ‘Ota’ Nonclustered Indexes The Leaf Level of a Nonclustered Index Contains a Bookmark A nonclustered index is like the index of a textbook. The data is stored in one place and the index is stored in another. Pointers indicate the storage location of the indexed items in the underlying table. In a nonclustered index, the leaf level contains each index key, plus a bookmark that tells SQL Server where to find the data row corresponding to the key in the index. A bookmark can take one of two forms:  If the table has a clustered index, the bookmark is the clustered index key for the corresponding data row. This clustered key can be multiple column if the clustered index is composite, or is defined to be non-unique.  If the table is a heap (in other words, it has no clustered index), the bookmark is a RID, which is an actual row locator in the form File#:Page#:Slot#. Finding Rows with a NC Index on a Heap Nonclustered Indexes Are Very Efficient When Searching For A Single Row After the nonclustered key at the leaf level of the index is found, only one more page access is needed to find the data row. Searching for a single row using a nonclustered index is almost as efficient as searching for a single row in a clustered index. However, if we are searching for multiple rows, such as duplicate values, or keys in a range, anything more than a small number of rows will make the nonclustered index search very inefficient. Note The query corresponding to the slide is: SELECT lastname, firstname FROM member WHERE lastname BETWEEN ‘Master’ AND ‘Rudd’ Finding Rows with a NC Index on a Clustered Table A Clustered Key Is Used as the Bookmark for All Nonclustered Indexes If the table has a clustered index, all columns of the clustered key will be duplicated in the nonclustered index leaf rows, unless there is overlap between the clustered and nonclustered key. For example, if the clustered index is on (lastname, firstname) and a nonclustered index is on firstname, the firstname value will not be duplicated in the nonclustered index leaf rows. Note The query corresponding to the slide is: SELECT lastname, firstname, phone FROM member WHERE firstname = ‘Mike’ Covering Indexes A Covering Index Provides the Fastest Data Access A covering index contains ALL the fields accessed in the query. Normally, only the columns in the WHERE clause are helpful in determining useful indexes, but for a covering index, all columns must be included. If all columns needed for the query are in the index, SQL Server never needs to access the data pages. If even one column in the query is not part of the index, the data rows must be accessed. The leaf level of an index is the only level that contains every key value, or set of key values. For a clustered index, the leaf level is the data itself, so in reality, a clustered index ALWAYS covers any query. Nevertheless, for most of our optimization discussions, we only consider nonclustered indexes. Scanning the leaf level of a nonclustered index is almost always faster than scanning a clustered index, so covering indexes are particular valuable when we need ALL the key values of a particular nonclustered index. Example: Select an aggregate value of a column with a clustered index. Suppose we have a nonclustered index on price, this query is covered: SELECT avg(price) from titles Since the clustered key is included in every nonclustered index row, the clustered key can be included in the covering. Suppose you have a nonclustered index on price and a clustered index on title_id; then this query is covered: SELECT title_id, price FROM titles WHERE price between 10 and 20 Performance Considerations In general, you do want to keep your indexes narrow. However, if you have a critical query that just is not giving you satisfactory performance no matter what you do, you should consider creating an index to cover it, or adding one or two extra columns to an existing index, so that the query will be covered. The leaf level of a nonclustered index is like a ‘mini’ clustered index, so you can have most of the benefits of clustering, even if there already is another clustered index on the table. The tradeoff to adding more, wider indexes for covering queries are the added disk space, and more overhead for updating those columns that are now part of the index. Bug In general, SQL Server will detect when a query is covered, and detect the possible covering indexes. However, in some cases, you must force SQL Server to use a covering index by including a WHERE clause, even if the WHERE clause will return ALL the rows in the table. This is SHILOH bug #352079 Steps to reproduce 1. Make copy of orders table from Northwind: USE Northwind CREATE TABLE [NewOrders] ( [OrderID] [int] NOT NULL , [CustomerID] [nchar] (5) NULL , [EmployeeID] [int] NULL , [OrderDate] [datetime] NULL , [RequiredDate] [datetime] NULL , [ShippedDate] [datetime] NULL , [ShipVia] [int] NULL , [Freight] [money] NULL , [ShipName] [nvarchar] (40) NULL, [ShipAddress] [nvarchar] (60) , [ShipCity] [nvarchar] (15) NULL, [ShipRegion] [nvarchar] (15) NULL, [ShipPostalCode] [nvarchar] (10) NULL, [ShipCountry] [nvarchar] (15) NULL ) INSERT into NewOrders SELECT * FROM Orders 2. Build nc index on OrderDate: create index dateindex on neworders(orderdate) 3. Test Query by looking at query plan: select orderdate from NewOrders The index is being scanned, as expected. 4. Build an index on orderId: create index orderid_index on neworders(orderID) 5. Test Query by looking at query plan: select orderdate from NewOrders Now the TABLE is being scanned, instead of the original index! Index Intersection Multiple Indexes Can Be Used On A Single Table In versions prior to SQL Server 7, only one index could be used for any table to process any single query. The only exception was a query involving an OR. In current SQL Server versions, multiple nonclustered indexes can each be accessed, retrieving a set of keys with bookmarks, and then the result sets can be joined on the common bookmarks. The optimizer weighs the cost of performing the unindexed join on the intermediate result sets, with the cost of only using one index, and then scanning the entire result set from that single index. Fillfactor and Performance Creating an Index with a Low Fillfactor Delays Page Splits when Inserting DBCC SHOWCONTIG will show you a low value for “Avg. Page Density” when a low fillfactor has been specified. This is good for inserts and updates, because it will delay the need to split pages to make room for new rows. It can be bad for scans, because fewer rows will be on each page, and more pages must be read to access the same amount of data. However, this cost will be minimal if the scan density value is good. Index Reorganization DBCC SHOWCONTIG Provides Lots of Information Here’s some sample output from running a basic DBCC SHOWCONTIG on the order details table in the Northwind database: DBCC SHOWCONTIG scanning 'Order Details' table... Table: 'Order Details' (325576198); index ID: 1, database ID:6 TABLE level scan performed. - Pages Scanned................................: 9 - Extents Scanned..............................: 6 - Extent Switches..............................: 5 - Avg. Pages per Extent........................: 1.5 - Scan Density [Best Count:Actual Count].......: 33.33% [2:6] - Logical Scan Fragmentation ..................: 0.00% - Extent Scan Fragmentation ...................: 16.67% - Avg. Bytes Free per Page.....................: 673.2 - Avg. Page Density (full).....................: 91.68% By default, DBCC SHOWCONTIG scans the page chain at the leaf level of the specified index and keeps track of the following values:  Average number of bytes free on each page (Avg. Bytes Free per Page)  Number of pages accessed (Pages scanned)  Number of extents accessed (Extents scanned)  Number of times a page had a lower page number than the previous page in the scan (This value for Out of order pages is not displayed, but is used for additional computations.)  Number of times a page in the scan was on a different extent than the previous page in the scan (Extent switches) SQL Server also keeps track of all the extents that have been accessed, and then it determines how many gaps are in the used extents. An extent is identified by the page number of its first page. So, if extents 8, 16, 24, 32, and 40 make up an index, there are no gaps. If the extents are 8, 16, 24, and 40, there is one gap. The value in DBCC SHOWCONTIG’s output called Extent Scan Fragmentation is computed by dividing the number of gaps by the number of extents, so in this example the Extent Scan Fragmentation is ¼, or 25 percent. A table using extents 8, 24, 40, and 56 has three gaps, and its Extent Scan Fragmentation is ¾, or 75 percent. The maximum number of gaps is the number of extents - 1, so Extent Scan Fragmentation can never be 100 percent. The value in DBCC SHOWCONTIG’s output called Logical Scan Fragmentation is computed by dividing the number of Out of order pages by the number of pages in the table. This value is meaningless in a heap. You can use either the Extent Scan Fragmentation value or the Logical Scan Fragmentation value to determine the general level of fragmentation in a table. The lower the value, the less fragmentation there is. Alternatively, you can use the value called Scan Density, which is computed by dividing the optimum number of extent switches by the actual number of extent switches. A high value means that there is little fragmentation. Scan Density is not valid if the table spans multiple files; therefore, it is less useful than the other values. SQL Server 2000 allows online defragmentation You can choose from several methods for removing fragmentation from an index. You could rebuild the index and have SQL Server allocate all new contiguous pages for you. To rebuild the index, you can use a simple DROP INDEX and CREATE INDEX combination, but in many cases using these commands is less than optimal. In particular, if the index is supporting a constraint, you cannot use the DROP INDEX command. Alternatively, you can use DBCC DBREINDEX, which can rebuild all the indexes on a table in one operation, or you can use the drop_existing clause along with CREATE INDEX. The drawback of these methods is that the table is unavailable while SQL Server is rebuilding the index. When you are rebuilding only nonclustered indexes, SQL Server takes a shared lock on the table, which means that users cannot make modifications, but other processes can SELECT from the table. Of course, those SELECT queries cannot take advantage of the index you are rebuilding, so they might not perform as well as they would otherwise. If you are rebuilding a clustered index, SQL Server takes an exclusive lock and does not allow access to the table, so your data is temporarily unavailable. SQL Server 2000 lets you defragment an index without completely rebuilding it. DBCC INDEXDEFRAG reorders the leaf-level pages into physical order as well as logical order, but using only the pages that are already allocated to the leaf level. This command does an in-place ordering, which is similar to a sorting technique called bubble sort (you might be familiar with this technique if you've studied and compared various sorting algorithms). In-place ordering can reduce logical fragmentation to 2 percent or less, making an ordered scan through the leaf level much faster. DBCC INDEXDEFRAG also compacts the pages of an index, based on the original fillfactor. The pages will not always end up with the original fillfactor, but SQL Server uses that value as a goal. The defragmentation process attempts to leave at least enough space for one average-size row on each page. In addition, if SQL Server cannot obtain a lock on a page during the compaction phase of DBCC INDEXDEFRAG, it skips the page and does not return to it. Any empty pages created as a result of compaction are removed. The algorithm SQL Server 2000 uses for DBCC INDEXDEFRAG finds the next physical page in a file belonging to the index's leaf level and the next logical page in the leaf level to swap it with. To find the next physical page, the algorithm scans the IAM pages belonging to that index. In a database spanning multiple files, in which a table or index has pages on more than one file, SQL Server handles pages on different files separately. SQL Server finds the next logical page by scanning the index's leaf level. After each page move, SQL Server drops all locks and saves the last key on the last page it moved. The next iteration of the algorithm uses the last key to find the next logical page. This process lets other users update the table and index while DBCC INDEXDEFRAG is running. Let us look at an example in which an index's leaf level consists of the following pages in the following logical order: 47 22 83 32 12 90 64 The first key is on page 47, and the last key is on page 64. SQL Server would have to scan the pages in this order to retrieve the data in sorted order. As its first step, DBCC INDEXDEFRAG would find the first physical page, 12, and the first logical page, 47. It would then swap the pages, using a temporary buffer as a holding area. After the first swap, the leaf level would look like this: 12 22 83 32 47 90 64 The next physical page is 22, which is also the next logical page, so no work would be necessary. DBCC INDEXDEFRAG would then swap the next physical page, 32, with the next logical page, 83: 12 22 32 83 47 90 64 After the next swap of 47 with 83, the leaf level would look like this: 12 22 32 47 83 90 64 Then, the defragmentation process would swap 64 with 83: 12 22 32 47 64 90 83 and 83 with 90: 12 22 32 47 64 83 90 At the end of the DBCC INDEXDEFRAG operation, the pages in the table or index are not contiguous, but their logical order matches their physical order. Now, if the pages were accessed from disk in sorted order, the head would need to move in only one direction. Keep in mind that DBCC INDEXDEFRAG uses only pages that are already part of the index's leaf level; it allocates no new pages. In addition, defragmenting a large table can take quite a while, and you will get a report every 5 minutes about the estimated percentage completed. However, except for the locks on the pages being switched, this command needs no additional locks. All the table's other pages and indexes are fully available for your applications to use during the defragmentation process. If you must completely rebuild an index because you want a new fillfactor, or if simple defragmentation is not enough because you want to remove all fragmentation from your indexes, another SQL Server 2000 improvement makes index rebuilding less of an imposition on the rest of the system. SQL Server 2000 lets you create an index in parallel—that is, using multiple processors—which drastically reduces the time necessary to perform the rebuild. The algorithm SQL Server 2000 uses, allows near-linear scaling with the number of processors you use for the rebuild, so four processors will take only one-fourth the time that one processor requires to rebuild an index. System availability increases because the length of time that a table is unavailable decreases. Note that only the SQL Server 2000 Enterprise Edition supports parallel index creation. Indexes on Views and Computed Columns Building an Index Gives the Data Physical Existence Normally, views are only logical and the rows comprising the view’s data are not generated until the view is accessed. The values for computed columns are typically not stored anywhere in the database; only the definition for the computation is stored and the computation is redone every time a computed column is accessed. The first index on a view must be a clustered index, so that the leaf level can hold all the actual rows that make up the view. Once that clustered index has been build, and the view’s data is now physical, additional (nonclustered) indexes can be built. An index on a computed column can be nonclustered, because all we need to store is the index key values. Common Prerequisites for Indexed Views and Indexes on Computed Columns In order for SQL Server to create use these special indexes, you must have the seven SET options correctly specified: ARITHABORT, CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER, ANSI_NULLS, ANSI_PADDING, ANSI_WARNING must be all ON NUMERIC_ROUNDABORT must be OFF Only deterministic expressions can be used in the definition of Indexed Views or indexes on Computed Columns. See the BOL for the list of deterministic functions and expressions. Property functions are available to check if a column or view meets the requirements and is indexable. SELECT OBJECTPROPERTY (Object_id, ‘IsIndexable’) SELECT COLUMNPROPERTY (Object_id, column_name , ‘IsIndexable’ ) Schema Binding Guarantees That Object Definition Won’t Change A view can only be indexed if it has been built with schema binding. The SQL Server Optimizer Determines If the Indexed View Can Be Used The query must request a subset of the data contained in the view. The ability of the optimizer to use the indexed view even if the view is not directly referenced is available only in SQL Server 2000 Enterprise Edition. In Standard edition, you can create indexed views, and you can select directly from them, but the optimizer will not choose to use them if they are not directly referenced. Examples of Indexed Views: The best candidates for improvement by indexed views are queries performing aggregations and joins. We will explain how the useful indexed views may be created for these two major groups of queries. The considerations are valid also for queries and indexed views using both joins and aggregations. -- Example: USE Northwind -- Identify 5 products with overall biggest discount total. -- This may be expressed for example by two different queries: -- Q1. select TOP 5 ProductID, SUM(UnitPrice*Quantity)- SUM(UnitPrice*Quantity*(1.00-Discount)) Rebate from [order details] group by ProductID order by Rebate desc --Q2. select TOP 5 ProductID, SUM(UnitPrice*Quantity*Discount) Rebate from [order details] group by ProductID order by Rebate desc --The following indexed view will be used to execute Q1. create view Vdiscount1 with schemabinding as select SUM(UnitPrice*Quantity) SumPrice, SUM(UnitPrice*Quantity*(1.00-Discount)) SumDiscountPrice, COUNT_BIG(*) Count, ProductID from dbo.[order details] group By ProductID create unique clustered index VDiscountInd on Vdiscount1 (ProductID) However, it will not be used by the Q2 because the indexed view does not contain the SUM(UnitPrice*Quantity*Discount) aggregate. We can construct another indexed view create view Vdiscount2 with schemabinding as select SUM(UnitPrice*Quantity) SumPrice, SUM(UnitPrice*Quantity*(1.00-Discount)) SumDiscountPrice, SUM(UnitPrice*Quantity*Discount) SumDiscoutPrice2, COUNT_BIG(*) Count, ProductID from dbo.[order details] group By ProductID create unique clustered index VDiscountInd on Vdiscount2 (ProductID) This view may be used by both Q1 and Q2. Observe that the indexed view Vdiscount2 will have the same number of rows and only one more column compared to Vdiscount1, and it may be used by more queries. In general, try to design indexed views that may be used by more queries. The following query asking for the order with the largest total discount -- Q3. select TOP 3 OrderID, SUM(UnitPrice*Quantity*Discount) OrderRebate from dbo.[order details] group By OrderID Q3 can use neither of the Vdiscount views because the column OrderID is not included in the view definition. To address this variation of the discount analysis query we may create a different indexed view, similar to the query itself. An attempt to generalize the previous indexed view Vdiscount2 so that all three queries Q1, Q2, and Q3 can take advantage of a single indexed view would require a view with both OrderID and ProductID as grouping columns. Because the OrderID, ProductID combination is unique in the original order details table the resulting view would have as many rows as the original table and we would see no savings in using such view compared to using the original table. Consider the size of the resulting indexed view. In the case of pure aggregation, the indexed view may provide no significant performance gains if its size is close to the size of the original table. Complex aggregates (STDEV, VARIANCE, AVG) cannot participate in the index view definition. However, SQL Server may use an indexed view to execute a query containing AVG aggregate. Query containing STDEV or VARIANCE cannot use indexed view to pre-compute these values. The next example shows a query producing the average price for a particular product -- Q4. select ProductName, od.ProductID, AVG(od.UnitPrice*(1.00-Discount)) AvgPrice, SUM(od.Quantity) Units from [order details] od, Products p where od.ProductID=p.ProductID group by ProductName, od.ProductID This is an example of indexed view that will be considered by the SQL Server to answer the Q4 create view v3 with schemabinding as select od.ProductID, SUM(od.UnitPrice*(1.00-Discount)) Price, COUNT_BIG(*) Count, SUM(od.Quantity) Units from dbo.[order details] od group by od.ProductID go create UNIQUE CLUSTERED index iv3 on v3 (ProductID) go Observe that the view definition does not contain the table Products. The indexed view does not need to contain all tables used in the query that uses the indexed view. In addition, the following query (same as above Q4 only with one additional search condition) will use the same indexed view. Observe that the added predicate references only columns from tables not present in the v3 view definition. -- Q5. select ProductName, od.ProductID, AVG(od.UnitPrice*(1.00-Discount)) AvgPrice, SUM(od.Quantity) Units from [order details] od, Products p where od.ProductID=p.ProductID and p.ProductName like '%tofu%' group by ProductName, od.ProductID The following query cannot use the indexed view because the added search condition od.UnitPrice>10 contains a column from the table in the view definition and the column is neither grouping column nor the predicate appears in the view definition. -- Q6. select ProductName, od.ProductID, AVG(od.UnitPrice*(1.00-Discount)) AvgPrice, SUM(od.Quantity) Units from [order details] od, Products p where od.ProductID=p.ProductID and od.UnitPrice>10 group by ProductName, od.ProductID To contrast the Q6 case, the following query will use the indexed view v3 since the added predicate is on the grouping column of the view v3. -- Q7. select ProductName, od.ProductID, AVG(od.UnitPrice*(1.00-Discount)) AvgPrice, SUM(od.Quantity) Units from [order details] od, Products p where od.ProductID=p.ProductID and od.ProductID in (1,2,13,41) group by ProductName, od.ProductID -- The previous query Q6 will use the following indexed view V4: create view V4 with schemabinding as select ProductName, od.ProductID, SUM(od.UnitPrice*(1.00-Discount)) AvgPrice, SUM(od.Quantity) Units, COUNT_BIG(*) Count from dbo.[order details] od, dbo.Products p where od.ProductID=p.ProductID and od.UnitPrice>10 group by ProductName, od.ProductID create unique clustered index VDiscountInd on V4 (ProductName, ProductID) The same index on the view V4 will be used also for a query where a join to the table Orders is added, for example -- Q8. select ProductName, od.ProductID, AVG(od.UnitPrice*(1.00-Discount)) AvgPrice, SUM(od.Quantity) Units from dbo.[order details] od, dbo.Products p, dbo.Orders o where od.ProductID=p.ProductID and o.OrderID=od.OrderID and od.UnitPrice>10 group by ProductName, od.ProductID We will show several modifications of the query Q8 and explain why such modifications cannot use the above view V4. -- Q8a. select ProductName, od.ProductID, AVG(od.UnitPrice*(1.00-Discount)) AvgPrice, SUM(od.Quantity) Units from dbo.[order details] od, dbo.Products p, dbo.Orders o where od.ProductID=p.ProductID and o.OrderID=od.OrderID and od.UnitPrice>25 group by ProductName, od.ProductID 8a cannot use the indexed view because of the where clause mismatch. Observe that table Orders does not participate in the indexed view V4 definition. In spite of that, adding a predicate on this table will disallow using the indexed view because the added predicate may eliminate additional rows participating in the aggregates as it is shown in Q8b. -- Q8b. select ProductName, od.ProductID, AVG(od.UnitPrice*(1.00-Discount)) AvgPrice, SUM(od.Quantity) Units from dbo.[order details] od, dbo.Products p, dbo.Orders o where od.ProductID=p.ProductID and o.OrderID=od.OrderID and od.UnitPrice>10 and o.OrderDate>'01/01/1998' group by ProductName, od.ProductID Locking and Indexes In General, You Should Let SQL Server Control the Locking within Indexes The stored procedure sp_indexoption lets you manually control the unit of locking within an index. It also lets you disallow page locks or row locks within an index. Since these options are available only for indexes, there is no way to control the locking within the data pages of a heap. (But remember that if a table has a clustered index, the data pages are part of the index and are affected by the sp_indexoption setting.) The index options are set for each table or index individually. Two options, Allow Rowlocks and AllowPageLocks, are both set to TRUE initially for every table and index. If both of these options are set to FALSE for a table, only full table locks are allowed. As described in Module 4, SQL Server determines at runtime whether to initially lock rows, pages, or the entire table. The locking of rows (or keys) is heavily favored. The type of locking chosen is based on the number of rows and pages to be scanned, the number of rows on a page, the isolation level in effect, the update activity going on, the number of users on the system needing memory for their own purposes, and so on. SAP databases frequently use sp_indexoption to reduce deadlocks Setting vs. Querying In SQL Server 2000, the procedure sp_indexoption should only be used for setting an index option. To query an option, use the INDEXPROPERTY function. Lesson 2: Concepts – Statistics Statistics are the most important tool that the SQL Server query optimizer has to determine the ideal execution plan for a query. Statistics that are out of date or nonexistent seriously jeopardize query performance. SQL Server 2000 computes and stores statistics in a completely different format that all earlier versions of SQL Server. One of the improvements is an increased ability to determine which values are out of the normal range in terms of the number of occurrences. The new statistics maintenance routines are particularly good at determining when a key value has a very unusual skew of data. What You Will Learn After completing this lesson, you will be able to:  Define terms related to statistics collected by SQL Server.  Describe how statistics are maintained by SQL Server.  Discuss the autostats feature of SQL Server.  Describe how statistics are used in query optimization. Recommended Reading  Statistics Used by the Query Optimizer in Microsoft SQL Server 2000 http://msdn.microsoft.com/library/techart/statquery.htm Definitions Cardinality The cardinality means how many unique values exist in the data. Density For each index and set of column statistics, SQL Server keeps track of details about the uniqueness (or density) of the data values encountered, which provides a measure of how selective the index is. A unique index, of course, has the lowest density —by definition, each index entry can point to only one row. A unique index has a density value of 1/number of rows in the table. Density values range from 0 through 1. Highly selective indexes have density values of 0.10 or lower. For example, a unique index on a table with 8345 rows has a density of 0.00012 (1/8345). If a nonunique nonclustered index has a density of 0.2165 on the same table, each index key can be expected to point to about 1807 rows (0.2165 × 8345). This is probably not selective enough to be more efficient than just scanning the table, so this index is probably not useful. Because driving the query from a nonclustered index means that the pages must be retrieved in index order, an estimated 1807 data page accesses (or logical reads) are needed if there is no clustered index on the table and the leaf level of the index contains the actual RID of the desired data row. The only time a data page doesn’t need to be reaccessed is when the occasional coincidence occurs in which two adjacent index entries happen to point to the same data page. In general, you can think of density as the average number of duplicates. We can also talk about the term ‘join density’, which applies to the average number of duplicates in the foreign key column. This would answer the question: in this one-to-many relationship, how many is ‘many’? Selectivity In general selectivity applies to a particular data value referenced in a WHERE clause. High selectivity means that only a small percentage of the rows satisfy the WHERE clause filter, and a low selectivity means that many rows will satisfy the filter. For example, in an employees table, the column employee_id is probably very selective, and the column gender is probably not very selective at all. Statistics Statistics are a histogram consisting of an even sampling of values for a column or for an index key (or the first column of the key for a composite index) based on the current data. The histogram is stored in the statblob field of the sysindexes table, which is of type image. (Remember that image data is actually stored in structures separate from the data row itself. The data row merely contains a pointer to the image data. For simplicity’s sake, we’ll talk about the index statistics as being stored in the image field called statblob.) To fully estimate the usefulness of an index, the optimizer also needs to know the number of pages in the table or index; this information is stored in the dpages column of sysindexes. During the second phase of query optimization, index selection, the query optimizer determines whether an index exists for a columns in your WHERE clause, assesses the index’s usefulness by determining the selectivity of the clause (that is, how many rows will be returned), and estimates the cost of finding the qualifying rows. Statistics for a single column index consist of one histogram and one density value. The multicolumn statistics for one set of columns in a composite index consist of one histogram for the first column in the index and density values for each prefix combination of columns (including the first column alone). The fact that density information is kept for all columns helps the optimizer decide how useful the index is for joins. Suppose, for example, that an index is composed of three key fields. The density on the first column might be 0.50, which is not too useful. However, as you look at more key columns in the index, the number of rows pointed to is fewer than (or in the worst case, the same as) the first column, so the density value goes down. If you are looking at both the first and second columns, the density might be 0.25, which is somewhat better. Moreover, if you examine three columns, the density might be 0.03, which is highly selective. It does not make sense to refer to the density of only the second column. The lead column density is always needed. Statistics Maintenance Statistics Information Tracks the Distribution of Key Values SQL Server statistics is basically a histogram that contains up to 200 values of a given key column. In addition to the histogram, the statblob field contains the following information:  The time of the last statistics collection  The number of rows used to produce the histogram and density information  The average key length  Densities for other combinations of columns In the statblob column, up to 200 sample values are stored; the range of key values between each sample value is called a step. The sample value is the endpoint of the range. Three values are stored along with each step: a value called EQ_ROWS, which is the number of rows that have a value equal to that sample value; a value called RANGE_ROWS, which specifies how many other values are inside the range (between two adjacent sample values); and the number of distinct values, or RANGE_DENSITY of the range. DBCC SHOW_STATISTICS The DBCC SHOW_STATISTICS output shows us the first two of these three values, but not the range density. The RANGE_DENSITY is instead used to compute two additional values:  DISTINCT_RANGE_ROWS—the number of distinct rows inside this range (not counting the RANGE_HI_KEY value itself. This is computed as 1/RANGE_DENSITY.  AVG_RANGE_ROWS—the average number of rows per distinct value, computed as RANGE_DENSITY * RANGE_ROWS. In addition to statistics on indexes, SQL Server can also keep track of statistics on columns with no indexes. Knowing the density, or the likelihood of a particular value occurring, can help the optimizer determine an optimum processing strategy, even if SQL Server can’t use an index to actually locate the values. Statistics on Columns Column statistics can be useful for two main purposes  When the SQL Server optimizer is determining the optimal join order, it frequently is best to have the smaller input processed first. By ‘input’ we mean table after all filters in the WHERE clause have been applied. Even if there is no useful index on a column in the WHERE clause, statistics could tell us that only a few rows will quality, and those the resulting input will be very small.  The SQL Server query optimizer can use column statistics on non-initial columns in a composite nonclustered index to determine if scanning the leaf level to obtain the bookmarks will be an efficient processing strategy. For example, in the member table in the credit database, the first name column is almost unique. Suppose we have a nonclustered index on (lastname, firstname), and we issue this query: select * from member where firstname = 'MPRO' In this case, statistics on the firstname column would indicate very few rows satisfying this condition, so the optimizer will choose to scan the nonclustered index, since it is smaller than the clustered index (the table). The small number of bookmarks will then be followed to retrieve the actual data. Manually Updating Statistics You can also manually force statistics to be updated in one of two ways. You can run the UPDATE STATISTICS command on a table or on one specific index or column statistics, or you can also execute the procedure sp_updatestats, which runs UPDATE STATISTICS against all user-defined tables in the current database. You can create statistics on unindexed columns using the CREATE STATISTICS command or by executing sp_createstats, which creates single-column statistics for all eligible columns for all user tables in the current database. This includes all columns except computed columns and columns of the ntext, text, or image datatypes, and columns that already have statistics or are the first column of an index. Autostats By Default SQL Server Will Update Statistics on Any Index or Column as Needed Every database is created with the database options auto create statistics and auto update statistics set to true, but you can turn either one off. You can also turn off automatic updating of statistics for a specific table in one of two ways:  UPDATE STATISTICS In addition to updating the statistics, the option WITH NORECOMPUTE indicates that the statistics should not be automatically recomputed in the future. Running UPDATE STATISTICS again without the WITH NORECOMPUTE option enables automatic updates.  sp_autostats This procedure sets or unsets a flag for a table to indicate that statistics should or should not be updated automatically. You can also use this procedure with only the table name to find out whether the table is set to automatically have its index statistics updated. ' However, setting the database option auto update statistics to FALSE overrides any individual table settings. In other words, no automatic updating of statistics takes place. This is not a recommended practice unless thorough testing has shown you that you do not need the automatic updates or that the performance overhead is more than you can afford. Trace Flags Trace flag 205 – reports recompile due to autostats. Trace flag 8721 – writes information to the errorlog when AutoStats has been run. For more information, see the following Knowledge Base article: Q195565 “INF: How SQL Server 7.0 Autostats Work.” Statistics and Performance The Performance Penalty of NOT Having Up-To-Date Statistics Far Outweighs the Benefit of Avoiding Automatic Updating Autostats should be turned off only after thorough testing shows it to be necessary. Because autostats only forces a recompile after a certain number or percentage of rows has been changed, you do not have to make any adjustments for a read-only database. Lesson 3: Concepts – Query Optimization What You Will Learn After completing this lesson, you will be able to:  Describe the phases of query optimization.  Discuss how SQL Server estimates the selectivity of indexes and column and how this estimate is used in query optimization. Recommended Reading  Chapter 15: “The Query Processor”, Inside SQL Server 2000 by Kalen Delaney  Chapter 16: “Query Tuning”, Inside SQL Server 2000 by Kalen Delaney  Whitepaper about SQL Server Query Processor Architecture by Hal Berenson and Kalen Delaney http://msdn.microsoft.com/library/backgrnd/html/sqlquerproc.htm Phases of Query Optimization Query Optimization Involves several phases Trivial Plan Optimization Optimization itself goes through several steps. The first step is something called Trivial Plan Optimization. The whole idea of trivial plan optimization is that cost based optimization is a bit expensive to run. The optimizer can try a great many possible variations trying to find the cheapest plan. If SQL Server knows that there is only one really viable plan for a query, it could avoid a lot of work. A prime example is a query that consists of an INSERT with a VALUES clause. There is only one possible plan. Another example is a SELECT where all the columns are in a unique covering index, and that index is the only one that is useable. There is no other index that has that set of columns in it. These two examples are cases where SQL Server should just generate the plan and not try to find something better. The trivial plan optimizer finds the really obvious plans, which are typically very inexpensive. In fact, all the plans that get through the autoparameterization template result in plans that the trivial plan optimizer can find. Between those two mechanisms, the plans that are simple tend to be weeded out earlier in the process and do not pay a lot of the compilation cost. This is a good thing, because the number of potential plans in 7.0 went up astronomically as SQL Server added hash joins, merge joins and index intersections, to its list of processing techniques. Simplification and Statistics Loading If a plan is not found by the trivial plan optimizer, SQL Server can perform some simplifications, usually thought of as syntactic transformations of the query itself, looking for commutative properties and operations that can be rearranged. SQL Server can do constant folding, and other operations that do not require looking at the cost or analyzing what indexes are, but that can result in a more efficient query. SQL Server then loads up the metadata including the statistics information on the indexes, and then the optimizer goes through a series of phases of cost based optimization. Cost Based Optimization Phases The cost based optimizer is designed as a set of transformation rules that try various permutations of indexes and join strategies. Because of the number of potential plans in SQL Server 7.0 and SQL Server 2000, if the optimizer just ran through all the combinations and produced a plan, the optimization process would take a very long time to run. Therefore, optimization is broken up into phases. Each phase is a set of rules. After each phase is run, the cost of any resulting plan is examined, and if SQL Server determines that the plan is cheap enough, that plan is kept and executed. If the plan is not cheap enough, the optimizer runs the next phase, which is another set of rules. In the vast majority of cases, a good plan will be found in the preliminary phases. Typically, if the plan that a query would have had in SQL Server 6.5 is also the optimal plan in SQL Server 7.0 and SQL Server 2000, the plan will tend to be found either by the trivial plan optimizer or by the first phase of the cost based optimizer. The rules were intentionally organized to try to make that be true. The plan will probably consist of using a single index and using nested loops. However, every once in a while, because of lack of statistical information, or some other nuance, the optimizer will have to proceed with the later phases of optimization. Sometimes this is because there is a real possibility that the optimizer could find a better plan. When a plan is found, it becomes the optimizer’s output, and then SQL Server goes through all the caching mechanisms that we have already discussed in Module 5. Full Optimization At some point, the optimizer determines that it has gone through enough preliminary phases, and it reverts to a phase called full optimization. If the optimizer goes through all the preliminary phases, and still has not found a cheap plan, it examines the cost for the plan that it has so far. If the cost is above the threshold, the optimizer goes into a phase called full optimization. This threshold is configurable, as the configuration option ‘cost threshold for parallelism’. The full optimization phase assumes that this plan should be run this in parallel. If the machine is very busy, the plan will end up running it in serial, but the optimizer has a goal to produce a good parallel. If the cost is below the threshold (or a single processor machine), the full optimization phase just uses a brute force method to find a serial plan. Selectivity Estimation Selectivity Is One of The Most Important Pieces of Information One of the most import things the optimizer needs to know is the number of rows from any table that will meet all the conditions in the query. If there are no restrictions on a table, and all the rows will be needed, the optimizer can determine the number of rows from the sysindexes table. This number is not absolutely guaranteed to be accurate, but it is the number the optimizer uses. If there is a filter on the table in a WHERE clause, the optimizer needs statistics information. Indexes automatically maintain statistics, and the optimizer will use these values to determine the usefulness of the index. If there is no index on the column involved in the filter, then column statistics can be used or generated. Optimizing Search Arguments In General, the Filters in the WHERE Clause Determine Which Indexes Will Be Useful If an indexed column is referenced in a Search Argument (SARG), the optimizer will analyze the cost of using that index. A SARG has the form:  column <operator> value  value <operator> column  Operator must be one of =, >, >= <, <= The value can be a constant, an operation, or a variable. Some functions also will be treated as SARGs. These queries have SARGs, and a nonclustered index on firstname will be used in most cases: select * from member where firstname < 'AKKG' select * from member where firstname = substring('HAAKGALSFJA', 2,5) select * from member where firstname = 'AA' + 'KG' declare @name char(4) set @name = 'AKKG' select * from member where firstname < @name Not all functions can be used in SARGs. select * from charge where charge_amt < 2*2 select * from charge where charge_amt < sqrt(16) Compare these queries to ones using = instead of <. With =, the optimizer can use the density information to come up with a good row estimate, even if it’s not going to actually perform the function’s calculations. A filter with a variable is usually a SARG The issue is, can the optimizer come up with useful costing information? A filter with a variable is not a SARG if the variable is of a different datatype, and the column must be converted to the variable’s datatype For more information, see the following Knowledge Base article: Q198625 Enter Title of KB Article Here Use credit go CREATE TABLE [member2] ( [member_no] [smallint] NOT NULL , [lastname] [shortstring] NOT NULL , [firstname] [shortstring] NOT NULL , [middleinitial] [letter] NULL , [street] [shortstring] NOT NULL , [city] [shortstring] NOT NULL , [state_prov] [statecode] NOT NULL , [country] [countrycode] NOT NULL , [mail_code] [mailcode] NOT NULL ) GO insert into member2 select member_no, lastname, firstname, middleinitial, street, city, state_prov, country, mail_code from member alter table member2 add constraint pk_member2 primary key clustered (lastname, member_no, firstname, country) declare @id int set @id = 47 update member2 set city = city + ' City', state_prov = state_prov + ' State' where lastname = 'Barr' and member_no = @id and firstname = 'URQYJBFVRRPWKVW' and country = 'USA' These queries don’t have SARGs, and a table scan will be done: select * from member where substring(lastname, 1,2) = ‘BA’ Some non-SARGs can be converted select * from member where lastname like ‘ba%’ In some cases, you can rewrite your query to turn a non-SARG into a SARG; for example, you can rewrite the substring query above and the LIKE query that follows it. Join Order and Types of Joins Join Order and Strategy Is Determined By the Optimizer The execution plan output will display the join order from top to bottom; i.e. the table listed on top is the first one accessed in a join. You can override the optimizer’s join order decision in two ways:  OPTION (FORCE ORDER) applies to one query  SET FORCEPLAN ON applies to entire session, until set OFF If either of these options is used, the join order is determined by the order the tables are listed in the query’s FROM clause, and no optimizer on JOIN ORDER is done. Forcing the JOIN order may force a particular join strategy. For example, in most outer join operations, the outer table is processed first, and a nested loops join is done. However, if you force the inner table to be accessed first, a merge join will need to be done. Compare the query plan for this query with and without the FORCE ORDER hint: select * from titles right join publishers on titles.pub_id = publishers.pub_id -- OPTION (FORCE ORDER) Nested Loop Join A nested iteration is when the query optimizer constructs a set of nested loops, and the result set grows as it progresses through the rows. The query optimizer performs the following steps. 1. Finds a row from the first table. 2. Uses that row to scan the next table. 3. Uses the result of the previous table to scan the next table. Evaluating Join Combinations The query optimizer automatically evaluates at least four or more possible join combinations, even if those combinations are not specified in the join predicate. You do not have to add redundant clauses. The query optimizer balances the cost and uses statistics to determine the number of join combinations that it evaluates. Evaluating every possible join combination is inefficient and costly. Evaluating Cost of Query Performance When the query optimizer performs a nested join, you should be aware that certain costs are incurred. Nested loop joins are far superior to both merge joins and hash joins when executing small transactions, such as those affecting only a small set of rows. The query optimizer:  Uses nested loop joins if the outer input is quite small and the inner input is indexed and quite large.  Uses the smaller input as the outer table.  Requires that a useful index exist on the join predicate for the inner table.  Always uses a nested loop join strategy if the join operation uses an operator other than an equality operator. Merge Joins The columns of the join conditions are used as inputs to process a merge join. SQL Server performs the following steps when using a merge join strategy: 1. Gets the first input values from each input set. 2. Compares input values. 3. Performs a merge algorithm. • If the input values are equal, the rows are returned. • If the input values are not equal, the lower value is discarded, and the next input value from that input is used for the next comparison. 4. Repeats the process until all of the rows from one of the input sets have been processed. 5. Evaluates any remaining search conditions in the query and returns only rows that qualify. Note Only one pass per input is done. The merge join operation ends after all of the input values of one input have been evaluated. The remaining values from the other input are not processed. Requires That Joined Columns Are Sorted If you execute a query with join operations, and the joined columns are in sorted order, the query optimizer processes the query by using a merge join strategy. A merge join is very efficient because the columns are already sorted, and it requires fewer page I/O. Evaluates Sorted Values For the query optimizer to use the merge join, the inputs must be sorted. The query optimizer evaluates sorted values in the following order: 1. Uses an existing index tree (most typical). The query optimizer can use the index tree from a clustered index or a covered nonclustered index. 2. Leverages sort operations that the GROUP BY, ORDER BY, and CUBE clauses use. The sorting operation only has to be performed once. 3. Performs its own sort operation in which a SORT operator is displayed when graphically viewing the execution plan. The query optimizer does this very rarely. Performance Considerations Consider the following facts about the query optimizer's use of the merge join:  SQL Server performs a merge join for all types of join operations (except cross join or full join operations), including UNION operations.  A merge join operation may be a one-to-one, one-to-many, or many-to-many operation. If the merge join is a many-to-many operation, SQL Server uses a temporary table to store the rows. If duplicate values from each input exist, one of the inputs rewinds to the start of the duplicates as each duplicate value from the other input is processed.  Query performance for a merge join is very fast, but the cost can be high if the query optimizer must perform its own sort operation. If the data volume is large and the desired data can be obtained presorted from existing Balanced-Tree (B-Tree) indexes, merge join is often the fastest join algorithm.  A merge join is typically used if the two join inputs have a large amount of data and are sorted on their join columns (for example, if the join inputs were obtained by scanning sorted indexes).  Merge join operations can only be performed with an equality operator in the join predicate. Hashing is a strategy for dividing data into equal sets of a manageable size based on a given property or characteristic. The grouped data can then be used to determine whether a particular data item matches an existing value. Note Duplicate data or ranges of data are not useful for hash joins because the data is not organized together or in order. When a Hash Join Is Used The query optimizer uses a hash join option when it estimates that it is more efficient than processing queries by using a nested loop or merge join. It typically uses a hash join when an index does not exist or when existing indexes are not useful. Assigns a Build and Probe Input The query optimizer assigns a build and probe input. If the query optimizer incorrectly assigns the build and probe input (this may occur because of imprecise density estimates), it reverses them dynamically. The ability to change input roles dynamically is called role reversal. Build input consists of the column values from a table with the lowest number of rows. Build input creates a hash table in memory to store these values. The hash bucket is a storage place in the hash table in which each row of the build input is inserted. Rows from one of the join tables are placed into the hash bucket where the hash key value of the row matches the hash key value of the bucket. Hash buckets are stored as a linked list and only contain the columns that are needed for the query. A hash table contains hash buckets. The hash table is created from the build input. Probe input consists of the column values from the table with the most rows. Probe input is what the build input checks to find a match in the hash buckets. Note The query optimizer uses column or index statistics to help determine which input is the smaller of the two. Processing a Hash Join The following list is a simplified description of how the query optimizer processes a hash join. It is not intended to be comprehensive because the algorithm is very complex. SQL Server: 1. Reads the probe input. Each probe input is processed one row at a time. 2. Performs the hash algorithm against each probe input and generates a hash key value. 3. Finds the hash bucket that matches the hash key value. 4. Accesses the hash bucket and looks for the matching row. 5. Returns the row if a match is found. Performance Considerations Consider the following facts about the hash joins that the query optimizer uses:  Similar to merge joins, a hash join is very efficient, because it uses hash buckets, which are like a dynamic index but with less overhead for combining rows.  Hash joins can be performed for all types of join operations (except cross join operations), including UNION and DIFFERENCE operations.  A hash operator can remove duplicates and group data, such as SUM (salary) GROUP BY department. The query optimizer uses only one input for both the build and probe roles.  If join inputs are large and are of similar size, the performance of a hash join operation is similar to a merge join with prior sorting. However, if the size of the join inputs is significantly different, the performance of a hash join is often much faster.  Hash joins can process large, unsorted, non-indexed inputs efficiently. Hash joins are useful in complex queries because the intermediate results: • Are not indexed (unless explicitly saved to disk and then indexed). • Are often not sorted for the next operation in the execution plan.  The query optimizer can identify incorrect estimates and make corrections dynamically to process the query more efficiently.  A hash join reduces the need for database denormalization. Denormalization is typically used to achieve better performance by reducing join operations despite redundancy, such as inconsistent updates. Hash joins give you the option to vertically partition your data as part of your physical database design. Vertical partitioning represents groups of columns from a single table in separate files or indexes. Subquery Performance Joins Are Not Inherently Better Than Subqueries Here is an example showing three different ways to update a table, using a second table for lookup purposes. The first uses a JOIN with the update, the second uses a regular introduced with IN, and the third uses a correlated subquery. All three yield nearly identical performance. Note Note that performance comparisons cannot just be made based on I/Os. With HASHING and MERGING techniques, the number of reads may be the same for two queries, yet one may take a lot longer and use more memory resources. Also, always be sure to monitor statistics time. Suppose you want to add a 5 percent discount to order items in the Order Details table for which the supplier is Exotic Liquids, whose supplierid is 1. -- JOIN solution BEGIN TRAN UPDATE OD SET discount = discount + 0.05 FROM [Order Details] AS OD JOIN Products AS P ON OD.productid = P.productid WHERE supplierid = 1 ROLLBACK TRAN -- Regular subquery solution BEGIN TRAN UPDATE [Order Details] SET discount = discount + 0.05 WHERE productid IN (SELECT productid FROM Products WHERE supplierid = 1) ROLLBACK TRAN -- Correlated Subquery Solution BEGIN TRAN UPDATE [Order Details] SET discount = discount + 0.05 WHERE EXISTS(SELECT supplierid FROM Products WHERE [Order Details].productid = Products.productid AND supplierid = 1) ROLLBACK TRAN Internally, Your Join May Be Rewritten SQL Server’s query processor had many different ways of resolving your JOIN expressions. Subqueries may be converted to a JOIN with an implied distinct, which may result in a logical operator of SEMI JOIN. Compare the plans of the first two queries: USE credit select member_no from member where member_no in (select member_no from charge) select distinct m.member_no from member m join charge c on m.member_no = c.member_no The second query uses a HASH MATCH as the final step to remove the duplicates. The first query only had to do a semi join. For these queries, although the I/O values are the same, the first query (with the subquery) runs much faster (almost twice as fast). Another similar looking join is

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值