Performance Testing
Performance testing is commonly conducted to accomplish the following:
Assess production readiness
Evaluate against criteria
Compare performance characteristics of multiple systems or system configurations
Find the source of performance problems
Support system tuning
Find throughput levels
Core Performance Testing Activities
Identify the test environment: physical environment, production environment, tools and resources
Identify performance acceptance criteria: response time, throughput and resource utilization goals
Plan and design tests: key criteria, variability among users, test data and metrics
Configure the test environment: prepare tools and resources
Implement the test design: develop the tests in accordance with the design
Execute the test: run, monitor and validate the tests
Analyze results, report and retest: consolidate and share results, reprioritize the remaining tests and re-execute them as needed
Performance, Load and Stress
Performance Testing:
determines or validates the speed, scalability and/or stability characteristics of the system or application
concerned with achieving response time, throughput and resource utilization goals
Load Testing:
determining or validating performance characteristics of the system when subjected to workloads and load volumes anticipated during production operations.
Stress Testing:
determining or validating performance characteristics of the system when subjected to workloads and load volumes in excess of normal
special tests to see what happens with insufficient resources.
Mathematical Principles
Presentation of performance data requires an understanding of many mathematical and statistical concepts:
Averages
Percentiles
Percentiles are only applicable on their own when used to represent data that is uniformly or normally distributed with an acceptable number of outliers.
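As an illustration, a percentile can be computed with the nearest-rank method. This is only a sketch: the notes do not say which percentile definition to use, so nearest-rank is an assumption, and the function name and sample response times are hypothetical.

```python
import math


def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile of values using the nearest-rank method.

    Assumption: nearest-rank is one of several common percentile
    definitions; other tools may interpolate and give slightly
    different answers.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(values)
    # 1-based rank of the smallest value that covers pct percent of the data.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical response times in milliseconds, including one extreme value.
response_times_ms = [120, 135, 150, 160, 170, 180, 200, 220, 250, 900]
p90 = percentile_nearest_rank(response_times_ms, 90)
p50 = percentile_nearest_rank(response_times_ms, 50)
```

In this made-up sample the 90th percentile ignores the single 900 ms measurement, which is why a percentile is usually reported alongside the maximum rather than instead of it.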
Medians
A median is simply the middle value in a data set when sequenced from lowest to highest.
If there is an even number of data points and the two center values are not the same, then either average the two center values or choose the value closer to the average of the entire data set.
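The two options for an even number of data points can be sketched as follows. The function name and the average_center flag are hypothetical and exist only to show both choices side by side.

```python
def median_with_tiebreak(values, average_center=True):
    """Return the median, handling an even number of data points.

    average_center=True averages the two center values.
    average_center=False picks the center value closer to the mean
    of the whole data set.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("values must not be empty")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: there is a single middle value.
        return ordered[mid]
    low, high = ordered[mid - 1], ordered[mid]
    if low == high:
        return low
    if average_center:
        return (low + high) / 2
    mean = sum(ordered) / n
    # Choose whichever center value lies closer to the overall mean.
    return low if abs(low - mean) <= abs(high - mean) else high
```

For the data set 1, 2, 4, 100 the center values are 2 and 4. Averaging gives 3, while choosing the value closer to the mean (26.75) gives 4.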
Mode
The mode is a single value that occurs most often in a data set.
Standard Deviations
The standard deviation measures how widely a set of measurements is spread around the mean. For normally distributed data, approximately 68 percent of all measurements in the data set fall within one standard deviation of the mean.
The smaller the standard deviation, the more consistent the data.
Data with a standard deviation greater than half of its mean should be treated as suspect. If the data is accurate, the phenomenon the data represents is not displaying a normal distribution pattern.
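The "greater than half of its mean" check can be sketched with the Python standard library. The notes do not say whether the population or the sample standard deviation is meant, so the use of the population form (statistics.pstdev) is an assumption, as is the function name.

```python
import statistics


def is_suspect(values):
    """Return True when the standard deviation exceeds half of the mean.

    Assumptions: measurements are positive (such as response times) and
    the population standard deviation is the intended measure.
    """
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return stdev > mean / 2


# Hypothetical measurements: one consistent run and one erratic run.
steady_run = [100, 102, 98, 101, 99]
erratic_run = [10, 10, 10, 200]
```

Here steady_run has a standard deviation of about 1.4 against a mean of 100, so it passes, while erratic_run has a standard deviation of about 82 against a mean of 57.5 and would be flagged for a closer look.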
Uniform Distributions
Uniform distributions represent a collection of data that is roughly equivalent to a set of random numbers evenly spaced between upper and lower bounds.
Uniform distributions are frequently used when modeling user delays, but are not common in response time results data. Uniformly distributed response time data may be an indication of suspect results.
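A minimal sketch of modeling user delays with a uniform distribution follows. The function name, the bounds of 2 and 8 seconds, and the seed are hypothetical; a real load tool would normally provide its own think-time settings.

```python
import random


def think_times(count, low_s, high_s, seed=None):
    """Return count user delays in seconds, drawn uniformly between the bounds."""
    rng = random.Random(seed)  # a seeded generator makes the run repeatable
    return [rng.uniform(low_s, high_s) for _ in range(count)]


# Hypothetical model: each simulated user pauses between 2 and 8 seconds.
delays = think_times(1000, 2.0, 8.0, seed=42)
average_delay = sum(delays) / len(delays)
```

With enough samples the average delay settles near the midpoint of the bounds, which is 5 seconds in this example.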
Normal Distributions
Normal distributions are data sets whose member data are weighted towards the center (or median value).
Most measurements of human variance result in data sets that are normally distributed. End-user response times for Web applications are also frequently normally distributed.
Statistical Significance
In statistics, a result is called statistically significant if it is unlikely to have occurred by chance.
Showing statistical significance can require more time and effort than what a commercially driven software project can warrant.
A rule of thumb is that if a result set is statistically similar to 80% of all other data sets, using the following criteria, then the result set is statistically significant.
Criteria for Statistical Significance
If more than 20 percent of the test-execution results appear not to be similar to the others, something is generally wrong with the test environment, the application, or the test itself.
If a 90th percentile value for any test execution is greater than the maximum or less than the minimum value for any of the other executions, that data set is probably not statistically similar.
If measurements from a test are noticeably higher or lower, when charted side-by-side, than the results of the other test executions, it is probably not statistically similar.
If one data set for a particular item in a test is noticeably higher or lower, but the results for the data sets of the remaining items appear similar, the test itself is probably statistically similar.
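The 90th percentile criterion above can be sketched as a pairwise comparison between test executions. This is one possible reading of the rule, not a definitive implementation: the function names are hypothetical, the percentile uses the nearest-rank method as an assumption, and deciding which execution in a flagged pair is the odd one out remains a judgment call.

```python
import math


def p90(values):
    """Return the 90th percentile using the nearest-rank method (an assumption)."""
    ordered = sorted(values)
    return ordered[math.ceil(90 * len(ordered) / 100) - 1]


def p90_outside_range(run, other):
    """True if run's 90th percentile is above other's maximum or below its minimum."""
    value = p90(run)
    return value > max(other) or value < min(other)


def dissimilar_pairs(runs):
    """Return (i, j) index pairs where run i's 90th percentile falls outside run j's range."""
    return [
        (i, j)
        for i in range(len(runs))
        for j in range(len(runs))
        if i != j and p90_outside_range(runs[i], runs[j])
    ]


# Hypothetical response times (ms) from four executions of the same test.
runs = [
    list(range(100, 200, 10)),
    list(range(105, 205, 10)),
    list(range(95, 195, 10)),
    list(range(300, 400, 10)),  # noticeably slower execution
]
pairs = dissimilar_pairs(runs)
```

In this made-up data every flagged pair involves the fourth execution, which suggests it is the one that is not statistically similar to the rest.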
Outliers
Any measurement that falls more than three standard deviations from the mean, a range that covers roughly 99 percent of all collected measurements (about 99.7 percent for normally distributed data), is considered an outlier.
The problem with this definition is that it assumes that the collected measurements are both statistically significant and distributed normally, which is not at all automatic when evaluating performance test data.
In practice for commercially driven software development, it is generally acceptable to say that values representing less than 1 percent of all measurements for a particular item that are at least three standard deviations off the mean are candidates for omission in results analysis if identical values are not found in previous or subsequent tests.
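The practical rule above can be sketched as a small filter. The function name and the other_runs parameter are hypothetical, and the population standard deviation is an assumption because the notes do not specify which form to use.

```python
import statistics


def omission_candidates(values, other_runs=()):
    """Return measurements that may be omitted from results analysis.

    A value is a candidate when it is at least three standard deviations
    off the mean, such values make up less than 1 percent of all
    measurements, and the same value does not appear in other_runs
    (previous or subsequent tests).
    """
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # assumption: population form
    if stdev == 0:
        return []
    far = [v for v in values if abs(v - mean) >= 3 * stdev]
    if len(far) >= 0.01 * len(values):
        # Too many extreme values to dismiss them as outliers.
        return []
    seen_elsewhere = {v for run in other_runs for v in run}
    return [v for v in far if v not in seen_elsewhere]


# Hypothetical data: 200 measurements of 100 ms and one of 5000 ms.
measurements = [100] * 200 + [5000]
candidates = omission_candidates(measurements)
```

If the same 5000 ms value also shows up in another execution, passing that execution in other_runs keeps the value in the analysis, since a repeated extreme is more likely a real behavior than noise.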