ISAPI_Rewrite 1.3 Documentation
Introduction
ISAPI_Rewrite is a powerful regular expressions-based URL manipulation engine. It acts mostly like Apache's mod_Rewrite, but it is designed especially for Microsoft Internet Information Server. If you ever wanted to change your web site's URL scheme, this product is for you!
Some key benefits of ISAPI_Rewrite:
- Speed
ISAPI_Rewrite is an extremely fast and highly scalable solution. It is written in pure C/C++ using only the Win32 API and ISAPI, and it uses an intelligent configuration cache. All work is done in a single stage, with no recursive requests or other operations that may take a long time.
- Security
ISAPI_Rewrite is designed for operation in a shared environment. It can serve as many sites as you have. ISPs and hosting providers can safely permit their users to configure ISAPI_Rewrite and be sure that any configuration change will affect only that user's own environment. ISAPI_Rewrite can even solve many security problems: for example, it can block access to certain folders or file extensions, or enforce more complex rules.
- Power
The flexibility and power of ISAPI_Rewrite come from its regular expression nature. With regular expressions you don't need to write thousands of string checks. URLs can be compared and replaced with a few string patterns. As a result, ISAPI_Rewrite can do many things that cannot be done with other solutions available for IIS. See the Examples section for more information.
What's new
ISAPI_Rewrite version 1.3 build 16:
- Introduced some modifications to the Regex++ regular expressions engine to overcome a problem with "pathological" rules requiring exponential time for processing. The time to process a single rule is now limited to half a second. If a rule fails to complete in this time, processing stops and ISAPI_Rewrite sends "500 Internal Server Error" to the client to indicate a configuration error.
- Added a new N (Next) flag to the RewriteRule and RewriteHeader directives. It makes it possible to organize loops while processing rules.
- Added the RepeatLimit directive to limit the number of possible loops.
- Added the F (Forbidden) flag to the RewriteRule and RewriteHeader directives. It sends a 403 Forbidden response to the client when the rule matches.
- Added the O (nOrmalize) flag to the RewriteRule, RewriteHeader and RewriteCond directives. It indicates that the checked string should first be normalized (i.e. URL encoding, illegal characters, etc. removed).
- Added the ability to check server variables with the RewriteCond directive, using %ServerVariable instead of a header name.
- Improved error logging for configuration parsing. Error messages now contain line numbers.
ISAPI_Rewrite version 1.2 build 14 (Full version only):
- Fixed a problem introduced in the Full version 1.1 build 11: the CacheClockRate configuration directive was parsed incorrectly, and after the first cache cleanup the inetinfo.exe process began to consume 99% of CPU time.
ISAPI_Rewrite version 1.2 build 13:
- Added new flag U (Unmangle Log). Now ISAPI_Rewrite can log URL as it was originally requested.
ISAPI_Rewrite version 1.1 build 11:
- Fixed a problem with the truncation of the last character of a configuration file.
- Fixed several shortcomings with documentation and default configuration files.
- Included additional optimisation for the Internet Information Server 6.0.
- ISAPI_Rewrite now adds a custom header with the original URL to the client request, so the original URL can be retrieved in a server script.
- The new RewriteHeader directive allows rewriting not only the URL part of the client request, but any other HTTP header, or even the method and version information.
Main concept
ISAPI_Rewrite provides a rule-based rewriting engine to rewrite requested URLs on the fly. It supports a virtually unlimited number of rules and an unlimited number of conditions attached to each rule, providing a flexible and powerful URL manipulation mechanism (in practice, the configuration file size is capped at 2 MB to limit parsing overhead). URL manipulations can depend on tests of HTTP headers, the Request-URI, method and version.
This program operates on the Request-URI (including the query string) and HTTP headers as described in RFC 2068, both in per-server (global) context and in per-virtual-site context. The result of an operation is either a rewrite or a redirection.
Rewriting causes the server to continue request processing with the new URI as if it had been the originally requested URI. The new URI can include a query string section (following a question mark) and may point to any file, script call, program invocation, etc.
Redirection causes the server to send an immediate response to the client with a redirect instruction (HTTP response code 302 with a Location header), providing the resulting URI as the new location. You can use absolute links (as required by RFC 2068) in the redirect instruction to redirect the request to a different host, port and protocol. A redirect instruction always causes the rewriting engine to stop the processing sequence.
Rules are processed in the order in which they appear in the configuration file. ISAPI_Rewrite processes per-server (global) rules first and then processes the individual virtual site's rules, if any are specified. There are no recursive requests or subsequent rollbacks in the processing order, so you will never get into an infinite loop.
The rewriting engine loops through the ruleset rule by rule (RewriteRule directives). A rule is applied only if its pattern matches the URI and all corresponding conditions (RewriteCond directives) match their test strings. ISAPI_Rewrite uses a match algorithm: an expression matches only if it matches the whole input string. If a rule is applied, ISAPI_Rewrite continues to loop through the ruleset with the new URI until the last rule has been processed.
ISAPI_Rewrite saves original path info + query string before any manipulation on the URL in HTTP header named X-Rewrite-URL. Then it can be retrieved in ASP with Request.ServerVariables("HTTP_X_REWRITE_URL").
Whenever you use parentheses in Pattern or in one of the CondPatterns, back-references are created internally. They can be used within the format string (using $N syntax) or within the other patterns (using /N syntax). The references are global for the entire RewriteRule directive and its corresponding RewriteCond directives. Sub-matches are numbered from top to bottom and from left to right, beginning with the first RewriteCond directive (if there is one) corresponding to the RewriteRule directive.
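The numbering of sub-matches across a condition and its rule can be sketched in Python. This is only an illustration under stated assumptions: Python's re module stands in for the Regex++ engine, and apply_rule is a hypothetical helper, not part of ISAPI_Rewrite. It mirrors a rule pair of the form "RewriteCond Host: (.+)" followed by "RewriteRule (.*) /$1$2", where $1 comes from the condition and $2 from the rule.

```python
import re

def apply_rule(cond_pattern, cond_value, rule_pattern, url, fmt):
    """Hypothetical helper: number the condition's groups first, then the rule's."""
    cond = re.fullmatch(cond_pattern, cond_value)   # whole-string match
    rule = re.fullmatch(rule_pattern, url)
    if cond is None or rule is None:
        return None                                 # rule is not applied
    groups = list(cond.groups()) + list(rule.groups())
    # Expand each $N (N = 1..9) to the N-th sub-match in that combined order.
    return re.sub(r"\$([1-9])", lambda m: groups[int(m.group(1)) - 1] or "", fmt)

rewritten = apply_rule(r"(.+)", "www.example.com", r"(.*)", "/index.html", "/$1$2")
print(rewritten)
```

Here the host name is sub-match 1 and the requested path is sub-match 2, so the format string places the request under a directory named after the host.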
To simplify rules and strengthen server security it is strongly recommended that you disable parent paths in the IIS settings.
Lite version limitation
The Lite and Full versions of ISAPI_Rewrite are the same, except that the Lite version does not support per-virtual-site configuration: only global rules are processed.
Special notes for the IIS6
These special notes concern new features of Internet Information Server 6.0, built into Windows .NET Server, and the limitations those features impose on ISAPI_Rewrite functionality.
The main difference between IIS6 and its predecessors is a new default process model called Worker Process Isolation (WPI) mode. Although IIS6 can also operate in IIS5-compatibility mode (which has no effect on ISAPI_Rewrite's functionality), its main advantages can be achieved only in WPI mode.
In WPI mode, virtual web sites or even individual web applications run inside application pools, and each application pool is served by one or more isolated worker processes (w3wp.exe). This resembles High isolation mode in IIS5, with one significant difference: filters no longer run inside the inetinfo.exe process. They run inside the worker processes, like ordinary applications.
This means that there can be multiple instances of a single filter (one instance for each worker process). In itself this is not a problem for ISAPI_Rewrite. But consider the case where a virtual site belongs to one application pool while a child web application belongs to another. There will be two different instances of the filter: one will process requests for the site while the other will process requests for the child web application. The difficulty is that IIS determines the target application pool before the first invocation of the filter, and rewriting a URL from one pool to another is prohibited. So there is no easy way to redirect a request from one pool to another except by sending an HTTP redirect response to the client browser.
Configuration
Configuration file format
There are two types of configuration files: global (per-server) and individual (per-virtual-site). The global configuration file should be named httpd.ini and should be located in the ISAPI_Rewrite installation directory. A shortcut to this file is provided in the Start menu. Individual configuration files should also be named httpd.ini and can be placed in the physical root directories of virtual sites. Both file types have the same format: a standard Windows INI file divided into sections. The only section allowed in this version of ISAPI_Rewrite is [ISAPI_Rewrite]. All directives should be placed in this section, each on a separate line. Any text outside this section is ignored.
httpd.ini file example:
[ISAPI_Rewrite]
# This is a comment
# 300 = 5 minutes
CacheClockRate 300
RepeatLimit 20
# Protect httpd.ini and httpd.parse.errors files
# from accessing through HTTP
RewriteRule ^/httpd(/.ini|/.parse/.errors).* . [F,I]
# Some custom rules
RewriteCond Host: (.+)
RewriteRule (.*) /$1$2 [I]
When ISAPI_Rewrite parses a configuration file, it creates an error log file named httpd.parse.errors in the same directory as the parsed file.
CacheClockRate directive
Syntax: CacheClockRate Interval
This directive can appear only in the global configuration context. If it is found in a per-virtual-site context, it is ignored and an error message is written to the httpd.parse.errors file.
ISAPI_Rewrite caches every configuration file the first time it is loaded. With this directive you can specify the period of inactivity after which a particular site's configuration is purged from the cache. By setting this parameter high enough you can force ISAPI_Rewrite never to recycle its cache. Remember that any change to a configuration file updates the cache on the next request, regardless of this interval.
- Interval
Specifies time of inactivity (in seconds) when particular configuration will be purged from cache. The default value is 3600 (1 hour).
RepeatLimit directive
Syntax: RepeatLimit Limit
This directive can appear in both global and per-virtual-site configuration files. In the global configuration file it changes the global limit for all sites. In a per-virtual-site configuration file it changes the limit for that site only, and this limit cannot exceed the global limit.
ISAPI_Rewrite allows loops while processing rules (see the description of the N flag of the RewriteRule and RewriteHeader directives). This directive limits the maximum number of loops. It can be set to zero or one to disable looping.
- Limit
Specifies a maximum number of allowed loops. The default value is 32.
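The interaction between a looping rule and the repeat limit can be sketched as follows. This is a simplified model, not the actual engine: rewrite_with_next is a hypothetical function, Python's re module stands in for Regex++, a single rule is assumed to carry the N flag, and the exact way restarts are counted against the limit is an assumption.

```python
import re

def rewrite_with_next(url, pattern, repl, repeat_limit=32):
    """Apply one N-flagged rule, restarting after each match up to repeat_limit times."""
    restarts = 0
    while True:
        m = re.fullmatch(pattern, url)
        if m is None:
            return url              # the rule no longer matches: done
        url = m.expand(repl)        # apply the rewrite
        if restarts >= repeat_limit:
            return url              # limit reached: treated as if N were absent
        restarts += 1               # otherwise restart rule checking

# Replace every underscore with a dash, one underscore per pass.
looped = rewrite_with_next("/a_b_c", r"(.*)_(.*)", r"\1-\2")
single = rewrite_with_next("/a_b_c", r"(.*)_(.*)", r"\1-\2", repeat_limit=0)
print(looped, single)
```

With the default limit the rule runs until no underscore is left. With a limit of zero in this model, only the first rewrite is applied.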
RewriteCond directive
Syntax: RewriteCond TestVerb CondPattern [Flags]
The RewriteCond directive defines a rule condition. Precede a RewriteRule directive with one or more RewriteCond directives. The rewriting rule that follows is applied only if its pattern matches the current state of the URI and these additional conditions are met as well.
- TestVerb
Specifies verb that will be matched against regular expression.
TestVerb=(URL | METHOD | VERSION | HTTPHeaderName: | %ServerVariable) where:
- URL - returns Request-URI of client request as described in RFC 2068 (HTTP 1.1);
- METHOD - returns HTTP method of client request (OPTIONS, GET, HEAD, POST, PUT, DELETE or TRACE);
- VERSION - returns HTTP version;
- HTTPHeaderName - returns the value of the specified HTTP header. HTTPHeaderName can be any valid HTTP header name. Header names should include the trailing colon ":". If the specified header does not exist in the client's request, TestVerb is treated as an empty string.
HTTPHeaderName =
Accept: Accept-Charset: Accept-Encoding: Accept-Language: Authorization: Cookie: From: Host: If-Modified-Since: If-Match: If-None-Match: If-Range: If-Unmodified-Since: Max-Forwards: Proxy-Authorization: Range: Referer: User-Agent: Any-Custom-Header:
For more information about HTTP headers and their values refer to RFC 2068.
- ServerVariable - returns the value of the specified server variable, for example SERVER_PORT. A complete list of server variables can be found in the IIS documentation. The variable name should be prefixed with a % sign.
- CondPattern
The regular expression to match TestVerb.
- [Flags]
Flags is a comma-separated list of the following flags:
- O (nOrmalize)
Normalizes the string before processing. Normalization includes removal of URL encoding, illegal characters, etc. This flag is useful with URLs and URL-encoded headers.
RewriteRule directive
Syntax: RewriteRule Pattern FormatString [Flags]
The RewriteRule directive is the real rewriting workhorse. The directive can occur more than once. Each directive defines one single rewriting rule. The definition order of these rules is important, because this order is used when applying the rules at run-time.
- Pattern
Specifies regular expression that will be matched against Request-URI. See regular expression syntax section for more info.
- FormatString
Specifies format string that will generate new URI. See format string syntax section for more info.
- [Flags]
Flags is a comma-separated list of the following flags:
- I (ignore case)
Indicates that characters are matched regardless of case. This flag affects RewriteRule directive and all corresponding RewriteCond directives.
- F (Forbidden)
Stops the rewriting process and sends 403 Forbidden response to a client. Note that FormatString is useless in this case and could be set to any non-empty string.
- L (last rule)
Stop the rewriting process here and don't apply any more rewriting rules. Use this flag to prevent the currently rewritten URI from being rewritten further by following rules.
- N (Next iteration)
Forces rewriting engine to modify rule's target and restart rule checking from the beginning (all modifications are saved). Number of restarts is limited by the value specified in the RepeatLimit directive. If this number is exceeded N flag will be simply ignored.
- R (explicit redirect)
Force server to send immediate response to client with redirect instruction, providing result URI as a new location. Redirect rule is always the last rule.
- U (Unmangle Log)
Log the URL as it was originally requested and not as the URL was rewritten.
- O (nOrmalize)
Normalizes the string before processing. Normalization includes removal of URL encoding, illegal characters, etc. This flag is useful with URLs and URL-encoded headers.
RewriteHeader directive
Syntax: RewriteHeader HeaderName Pattern FormatString [Flags]
The RewriteHeader directive is a more general variant of the RewriteRule directive. It is designed to rewrite not only the URL part of the client request, but any HTTP header. This directive can be used to rewrite, create or delete any HTTP header, or even to change the method of the client request.
- HeaderName
Specifies the HTTP header that will be rewritten. Possible values are the same as for the TestVerb parameter of the RewriteCond directive. Thus, the RewriteRule directive is a synonym for RewriteHeader URL Pattern FormatString [Flags].
- Pattern
Specifies regular expression that will be matched against specified header. See regular expression syntax section for more information.
- FormatString
Specifies format string that will generate new header value. See format string syntax section for more information.
- [Flags]
Flags is a comma-separated list of the following flags:
- I (ignore case)
Indicates that characters are matched regardless of case. This flag affects RewriteHeader directive and all corresponding RewriteCond directives.
- F (Forbidden)
Stops the rewriting process and sends 403 Forbidden response to a client. Note that FormatString is useless in this case and could be set to any non-empty string.
- L (last rule)
Stop the rewriting process here and don't apply any more rewriting rules.
- N (Next iteration)
Forces rewriting engine to modify rule's target and restart rule checking from the beginning (all modifications are saved). Number of restarts is limited by the value specified in the RepeatLimit directive. If this number is exceeded N flag will be simply ignored.
- R (explicit redirect)
Force server to send immediate response to client with redirect instruction, providing new URI as a new location. Redirect rule is always the last rule.
- U (Unmangle Log)
Log the URL as it was originally requested and not as the URL was rewritten.
- O (nOrmalize)
Normalizes the string before processing. Normalization includes removal of URL encoding, illegal characters, etc. This flag is useful with URLs and URL-encoded headers.
To remove a header, the format string should generate an empty string. For example, this rule will remove user agent information from the client request:
RewriteHeader User-Agent: .* $0
And this rule will add an Old-URL header to the request, providing the Request-URL as the header value:
RewriteCond URL (.*)
This last example will direct all WebDAV requests to the /webdav.asp script by changing the request method:
RewriteCond METHOD OPTIONS
Regular expression syntax
This section covers the regular expression syntax used by ISAPI_Rewrite.
Special note about "pathological" regular expressions
ISAPI_Rewrite uses a very powerful regular expressions engine, Regex++, written by Dr. John Maddock. But like any real thing it is not ideal: there exist some "pathological" expressions which may require exponential time for matching. These all involve nested repetition operators; for example, attempting to match the expression "(a*a)*b" against N letter a's requires time proportional to 2^N. These expressions can (almost) always be rewritten in such a way as to avoid the problem; for example, "(a*a)*b" could be rewritten as "a*b", which requires only time linearly proportional to N to solve. In the general case, non-nested repeat expressions require time proportional to N^2; however, if the clauses are mutually exclusive then they can be matched in linear time. This is the case with "a*b": for each character the matcher will either match an "a" or a "b" or fail, whereas with "a*a" the matcher can't tell which branch to take (the first "a" or the second) and so has to try both.
In version 1.3 of ISAPI_Rewrite we introduced some modifications to the Regex++ regular expressions engine to overcome the problem of "pathological" rules requiring exponential time for processing. The time to process a single rule is now limited to half a second. If a rule fails to complete in this time, processing stops and ISAPI_Rewrite sends "500 Internal Server Error" to the client to indicate a configuration error. The failed rule is also disabled to prevent further performance losses. This solution is a hack of Regex++ and should be considered temporary. We have contacted Dr. Maddock, and he told us that he will try to detect such "pathological" expressions in a future version of Regex++, so there will be no need for hacks at all.
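The effect of nested repeats can be observed with any backtracking matcher. The sketch below uses Python's re module as a stand-in (an assumption: it is a different engine from Regex++, so absolute timings do not carry over) and the hypothetical helper time_no_match. Both expressions fail against a string of a's with no "b", but the nested form has to explore far more ways of splitting the input before giving up.

```python
import re
import time

def time_no_match(pattern, n):
    """Match pattern against n letter a's; return (result, elapsed seconds)."""
    text = "a" * n                        # no trailing "b", so the match must fail
    start = time.perf_counter()
    result = re.fullmatch(pattern, text)
    return result, time.perf_counter() - start

# N is kept small on purpose: the nested form grows roughly as 2^N.
slow_result, slow_time = time_no_match(r"(a*a)*b", 16)
fast_result, fast_time = time_no_match(r"a*b", 16)
print(f"nested: {slow_time:.6f}s  flat: {fast_time:.6f}s")
```

Increasing N by one should roughly double the time of the nested form on a backtracking engine, while the flat form stays effectively constant.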
Literals
All characters are literals except: ".", "*", "?", "+", "(", ")", "{", "}", "[", "]", "^" and "$". These characters are literals when preceded by a "/". A literal is a character that matches itself.
Wildcard
The dot character "." matches any single character except null character and newline character.
Repeats
A repeat is an expression that is repeated an arbitrary number of times. An expression followed by "*" can be repeated any number of times including zero. An expression followed by "+" can be repeated any number of times, but at least once. An expression followed by "?" may be repeated zero or one times only. When it is necessary to specify the minimum and maximum number of repeats explicitly, the bounds operator "{}" may be used, thus "a{2}" is the letter "a" repeated exactly twice, "a{2,4}" represents the letter "a" repeated between 2 and 4 times, and "a{2,}" represents the letter "a" repeated at least twice with no upper limit. Note that there must be no white-space inside the {}, and there is no upper limit on the values of the lower and upper bounds. All repeat expressions refer to the shortest possible previous sub-expression: a single character; a character set, or a sub-expression grouped with "()" for example.
Examples:
- "ba*" will match all of "b", "ba", "baaa" etc.
- "ba+" will match "ba" or "baaaa" for example but not "b".
- "ba?" will match "b" or "ba".
- "ba{2,4}" will match "baa", "baaa" and "baaaa".
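The repeat examples above can be checked with a short Python sketch. Python's re module is used as a stand-in for Regex++ (an assumption; the repeat operators shown behave the same way in both), and whole is a hypothetical helper that requires the entire string to match.

```python
import re

def whole(pattern, text):
    """True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

results = {
    "ba* on b": whole(r"ba*", "b"),
    "ba* on baaa": whole(r"ba*", "baaa"),
    "ba+ on b": whole(r"ba+", "b"),            # "+" needs at least one "a"
    "ba? on baa": whole(r"ba?", "baa"),        # "?" allows at most one "a"
    "ba{2,4} on ba": whole(r"ba{2,4}", "ba"),  # fewer than 2 repeats
    "ba{2,4} on baaaa": whole(r"ba{2,4}", "baaaa"),
}
for name, matched in results.items():
    print(name, matched)
```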
Non-greedy repeats
Non-greedy repeats are possible by appending a '?' after the repeat; a non-greedy repeat is one which will match the shortest possible string.
For example to match html tag pairs one could use something like:
"</s*tagname[^>]*>(.*?)</s*/tagname/s*>"
In this case $1 will contain the text between the tag pairs, and will be the shortest possible matching string.
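The difference between greedy and non-greedy repeats is easiest to see on input that contains two tag pairs. A minimal sketch, assuming Python's re module as a stand-in for Regex++ (Python writes the escape character as a backslash) and using "b" as the tag name for illustration:

```python
import re

html = "<b>one</b> and <b>two</b>"

# Greedy: (.*) runs to the last closing tag.
greedy = re.search(r"<\s*b[^>]*>(.*)<\s*/b\s*>", html).group(1)

# Non-greedy: (.*?) stops at the first closing tag.
lazy = re.search(r"<\s*b[^>]*>(.*?)<\s*/b\s*>", html).group(1)

print(repr(greedy))
print(repr(lazy))
```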
Parenthesis
Parentheses serve two purposes, to group items together into a sub-expression, and to mark what generated the match. For example the expression "(ab)*" would match all of the string "ababab". All sub matches marked by parenthesis can be back referenced using /N or $N syntax. It is permissible for sub-expressions to match null strings. Sub-expressions are indexed from left to right starting from 1, sub-expression 0 is the whole expression.
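Marked sub-expressions can be inspected directly in Python, used here as a stand-in for Regex++ (an assumption). The sketch also counts the groups of a pattern written with the non-marking form (?:...), which groups without creating a sub-expression.

```python
import re

m = re.fullmatch(r"(ab)*", "ababab")
whole_match = m.group(0)      # sub-expression 0 is the whole expression
last_repeat = m.group(1)      # in Python, a repeated group keeps its last iteration

marking_count = re.compile(r"(ab)*").groups        # number of marked groups
non_marking_count = re.compile(r"(?:abc)*").groups

print(whole_match, last_repeat, marking_count, non_marking_count)
```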
Non-Marking Parenthesis
Sometimes you need to group sub-expressions with parenthesis, but don't want the parenthesis to spit out another marked sub-expression, in this case a non-marking parenthesis (?:expression) can be used. For example the following expression creates no sub-expressions:
"(?:abc)*"
Alternatives
Alternatives occur when the expression can match either one sub-expression or another, each alternative is separated by a "|". Each alternative is the largest possible previous sub-expression; this is the opposite behaviour from repetition operators.
Examples:
- "a(b|c)" could match "ab" or "ac".
- "abc|def" could match "abc" or "def".
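Because each alternative extends as far as possible, "abc|def" is not the same as "ab(c|d)ef". A small check, assuming Python's re module as a stand-in for Regex++ and a hypothetical whole helper:

```python
import re

def whole(pattern, text):
    """True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# "|" splits the whole expression into "abc" and "def".
top_level = [whole(r"abc|def", s) for s in ("abc", "def", "abdef")]

# Parentheses limit the alternative to a single position.
grouped = [whole(r"ab(c|d)ef", s) for s in ("abcef", "abdef", "abc")]

print(top_level, grouped)
```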
Sets
A set is a set of characters that can match any single character that is a member of the set. Sets are delimited by "[" and "]" and can contain literals, character ranges, character classes, collating elements and equivalence classes. Set declarations that start with "^" contain the complement of the elements that follow.
Examples:
Character literals:
- "[abc]" will match either of "a", "b", or "c".
- "[^abc]" will match any character other than "a", "b", or "c".
Character ranges:
- "[a-z]" will match any character in the range "a" to "z".
- "[^A-Z]" will match any character other than those in the range "A" to "Z".
Character classes
Character classes are denoted using the syntax "[:classname:]" within a set declaration, for example "[[:space:]]" is the set of all whitespace characters. The available character classes are:
alnum | Any alpha numeric character. |
alpha | Any alphabetical character a-z and A-Z. Other characters may also be included depending upon the locale. |
blank | Any blank character, either a space or a tab. |
cntrl | Any control character. |
digit | Any digit 0-9. |
graph | Any graphical character. |
lower | Any lower case character a-z. Other characters may also be included depending upon the locale. |
print | Any printable character. |
punct | Any punctuation character. |
space | Any whitespace character. |
upper | Any upper case character A-Z. Other characters may also be included depending upon the locale. |
xdigit | Any hexadecimal digit character, 0-9, a-f and A-F. |
word | Any word character - all alphanumeric characters plus the underscore. |
unicode | Any character whose code is greater than 255, this applies to the wide character traits classes only. |
There are some shortcuts that can be used in place of the character classes:
- /w in place of [:word:]
- /s in place of [:space:]
- /d in place of [:digit:]
- /l in place of [:lower:]
- /u in place of [:upper:]
Collating elements
Collating elements take the general form [.tagname.] inside a set declaration, where tagname is either a single character, or a name of a collating element, for example [[.a.]] is equivalent to [a], and [[.comma.]] is equivalent to [,]. ISAPI_Rewrite supports all the standard POSIX collating element names, and in addition the following digraphs: "ae", "ch", "ll", "ss", "nj", "dz", "lj", each in lower, upper and title case variations. Multi-character collating elements can result in the set matching more than one character, for example [[.ae.]] would match two characters, but note that [^[.ae.]] would only match one character.
Equivalence classes
Equivalence classes take the general form [=tagname=] inside a set declaration, where tagname is either a single character, or a name of a collating element, and matches any character that is a member of the same primary equivalence class as the collating element [.tagname.]. An equivalence class is a set of characters that collate the same, a primary equivalence class is a set of characters whose primary sort key are all the same (for example strings are typically collated by character, then by accent, and then by case; the primary sort key then relates to the character, the secondary to the accentation, and the tertiary to the case). If there is no equivalence class corresponding to tagname, then [=tagname=] is exactly the same as [.tagname.].
To include a literal "-" in a set declaration then: make it the first character after the opening "[" or "[^", the endpoint of a range, a collating element, or precede it with an escape character as in "[/-]". To include a literal "[" or "]" or "^" in a set then make them the endpoint of a range, a collating element, or precede with an escape character.
Line anchors
An anchor is something that matches the null string at the start or end of a line: "^" matches the null string at the start of a line, "$" matches the null string at the end of a line.
Back references
A back reference is a reference to a previous sub-expression that has already been matched. The reference is to what the sub-expression matched, not to the expression itself. A back reference consists of the escape character "/" followed by a digit "1" to "9": "/1" refers to the first sub-expression, "/2" to the second, etc. For example, the expression "(.*)/1" matches any string that is repeated about its mid-point, for example "abcabc" or "xyzxyz". A back reference to a sub-expression that did not participate in any match matches the null string. In ISAPI_Rewrite all back references are global for the entire RewriteRule and its corresponding RewriteCond directives. Sub-matches are numbered top to bottom and left to right, beginning from the first RewriteCond directive of the corresponding RewriteRule directive, if there is one.
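The "repeated about its mid-point" example can be tried in Python, used here as a stand-in for Regex++ (an assumption). Python writes the back reference as a backslash followed by the digit. Note that the null-string behaviour for non-participating sub-expressions described above is specific to this engine and is not exercised here.

```python
import re

def first_half(text):
    """Return the repeated half if text is some string written twice, else None."""
    m = re.fullmatch(r"(.*)\1", text)
    return m.group(1) if m else None

halves = [first_half(s) for s in ("abcabc", "xyzxyz", "abcabd")]
print(halves)
```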
Forward Lookahead Asserts
There are two forms of these; one for positive forward lookahead asserts, and one for negative lookahead asserts:
- "(?=abc)" matches zero characters only if they are followed by the expression "abc".
- "(?!abc)" matches zero characters only if they are not followed by the expression "abc".
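A lookahead assert is handy for excluding a prefix without consuming it. The sketch below assumes Python's re module as a stand-in for Regex++; the /admin path and the is_public helper are hypothetical names chosen for illustration.

```python
import re

def is_public(url):
    """True for any path that does not start with /admin."""
    # (?!admin) matches zero characters, and only when "admin" does not follow.
    return re.fullmatch(r"/(?!admin).*", url) is not None

checks = [is_public(u) for u in ("/public/index.html", "/admin/users", "/")]
print(checks)
```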
Word operators
The following operators are provided for compatibility with the GNU regular expression library.
- "/w" matches any single character that is a member of the "word" character class, this is identical to the expression "[[:word:]]".
- "/W" matches any single character that is not a member of the "word" character class, this is identical to the expression "[^[:word:]]".
- "/<" matches the null string at the start of a word.
- "/>" matches the null string at the end of the word.
- "/b" matches the null string at either the start or the end of a word.
- "/B" matches a null string within a word.
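The word-boundary operators can be illustrated with Python's re module, again as a stand-in for Regex++ (an assumption). Python supports the boundary and non-boundary operators, written with a backslash, but has no equivalent of the separate start-of-word and end-of-word operators, so only the first two are shown.

```python
import re

text = "cat concat cats cat."

# Boundary on both sides: only the stand-alone word "cat" qualifies.
standalone = [m.start() for m in re.finditer(r"\bcat\b", text)]

# Non-boundary before "cat": only the occurrence inside "concat" qualifies.
inside = [m.start() for m in re.finditer(r"\Bcat", text)]

print(standalone, inside)
```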
Escape operator
The escape character "/" has several meanings.
- The escape operator may introduce an operator for example: back references, or a word operator.
- The escape operator may make the following character normal, for example "/*" represents a literal "*" rather than the repeat operator.
Single character escape sequences:
The following escape sequences are aliases for single characters:
Escape sequence | Character code | Meaning |
---|---|---|
/a | 0x07 | Bell character. |
/t | 0x09 | Tab character. |
/v | 0x0B | Vertical tab. |
/e | 0x1B | ASCII Escape character. |
/0dd | 0dd | An octal character code, where dd is one or more octal digits. |
/xXX | 0xXX | A hexadecimal character code, where XX is one or more hexadecimal digits. |
/x{XX} | 0xXX | A hexadecimal character code, where XX is one or more hexadecimal digits, optionally a unicode character. |
/cZ | z-@ | An ASCII escape sequence control-Z, where Z is any ASCII character greater than or equal to the character code for '@'. |
Miscellaneous escape sequences:
The following are provided mostly for perl compatibility, but note that there are some differences in the meanings of /l /L /u and /U:
Escape sequence | Meaning |
---|---|
/w | Equivalent to [[:word:]]. |
/W | Equivalent to [^[:word:]]. |
/s | Equivalent to [[:space:]]. |
/S | Equivalent to [^[:space:]]. |
/d | Equivalent to [[:digit:]]. |
/D | Equivalent to [^[:digit:]]. |
/l | Equivalent to [[:lower:]]. |
/L | Equivalent to [^[:lower:]]. |
/u | Equivalent to [[:upper:]]. |
/U | Equivalent to [^[:upper:]]. |
/C | Any single character, equivalent to '.'. |
/X | Match any Unicode combining character sequence, for example "a/x 0301" (a letter a with an acute). |
/Q | The begin quote operator, everything that follows is treated as a literal character until a /E end quote operator is found. |
/E | The end quote operator, terminates a sequence begun with /Q. |
What gets matched?
The regular expression will match the first possible matching string; if more than one string starting at a given location can match, then it matches the longest possible string. In cases where there are multiple possible matches all starting at the same location, and all of the same length, the match chosen is the one with the longest first sub-expression; if that is the same for two or more matches, then the second sub-expression is examined, and so on. Note that ISAPI_Rewrite uses the MATCH algorithm: the result is a match only if the expression matches the whole input sequence. For example:
- RewriteCond URL ^/somedir/.* #will match any request to somedir directory and subdirectories, while
- RewriteCond URL ^/somedir/ #will match only request to the root of the somedir.
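Whole-input matching is the behaviour of Python's re.fullmatch, while re.search finds a match anywhere in the input. The sketch below contrasts the two on the patterns above. Python is only a stand-in for Regex++ here, and the request paths are made up for illustration.

```python
import re

url_sub = "/somedir/sub/page.asp"
url_root = "/somedir/"

# MATCH semantics: the pattern must cover the whole input.
with_tail = re.fullmatch(r"^/somedir/.*", url_sub) is not None
no_tail_sub = re.fullmatch(r"^/somedir/", url_sub) is not None
no_tail_root = re.fullmatch(r"^/somedir/", url_root) is not None

# Search semantics, for contrast: a prefix match is enough.
search_sub = re.search(r"^/somedir/", url_sub) is not None

print(with_tail, no_tail_sub, no_tail_root, search_sub)
```

This is why patterns meant to cover a whole directory tree need a trailing ".*".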
Format string syntax
In format strings, all characters are treated as literals except: "(", ")", "$", "/", "?", ":".
To use any of these as literals you must prefix them with the escape character /
The following special sequences are recognized:
Grouping:
Use the parenthesis characters ( and ) to group sub-expressions within the format string, use /( and /) to represent literal '(' and ')'.
Sub-expression expansions:
The following Perl-like expressions expand to a particular matched sub-expression:
Expression | Meaning |
---|---|
$` | Expands to all the text from the end of the previous match to the start of the current match; if there was no previous match in the current operation, then everything from the start of the input string to the start of the match. |
$' | Expands to all the text from the end of the match to the end of the input string. |
$& | Expands to all of the current match. |
$0 | Expands to all of the current match. |
$N | Expands to the text that matched sub-expression N. |
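As a rough illustration of these expansions, the sketch below maps each one onto the match object of Python's `re` module. This is an analogy, not the Regex++ implementation, and the sample text and pattern are invented. It uses a search rather than a whole-input match so that the text before and after the match is non-empty; under whole-input matching both would presumably be empty.

```python
import re

text = "GET /docs/page.htm HTTP"          # invented sample input
m = re.search(r"/(\w+)/(\w+)\.htm", text)

before = text[:m.start()]   # counterpart of $` (no previous match here)
after = text[m.end():]      # counterpart of $'
whole = m.group(0)          # counterpart of $& and $0
first = m.group(1)          # counterpart of $1
second = m.group(2)         # counterpart of $2

print(repr(before), repr(whole), repr(after), first, second)
```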
Conditional expressions:
Conditional expressions allow two different format strings to be selected dependent upon whether a sub-expression participated in the match or not:
?Ntrue_expression:false_expression
Executes true_expression if sub-expression N participated in the match, otherwise executes false_expression.
Example: suppose we search for "(while)|(for)" then the format string "?1WHILE:FOR" would output what matched, but in upper case.
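The while/for example can be imitated in Python. Python's replacement templates have no `?N` conditional, so this sketch uses a replacement function to test whether sub-expression 1 participated in the match. It shows the idea only; the function name `expand` and the sample text are assumptions for this example.

```python
import re

def expand(m):
    # ?1WHILE:FOR -> "WHILE" if sub-expression 1 participated, else "FOR"
    return "WHILE" if m.group(1) is not None else "FOR"

source = "while (x) for (y)"              # invented sample input
result = re.sub(r"(while)|(for)", expand, source)
print(result)  # WHILE (x) FOR (y)
```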
Escape sequences:
The following escape sequences are also allowed:
Escape sequence | Meaning |
---|---|
\a | The bell character. |
\f | The form feed character. |
\n | The newline character. |
\r | The carriage return character. |
\t | The tab character. |
\v | A vertical tab character. |
\x | A hexadecimal character - for example \x0D. |
\x{} | A possible unicode hexadecimal character - for example \x{1A0} |
\cx | The ASCII escape character x, for example \c@ is equivalent to escape-@. |
\e | The ASCII escape character. |
\dd | An octal character constant, for example \10. |
Examples
Emulating host-header-based virtual sites on a single site
For example, suppose you have registered two domains, www.site1.com and www.site2.com. You can serve two different sites from a single physical site. Add the following rules to your httpd.ini file:
    [ISAPI_Rewrite]
    RewriteCond Host: (?:www\.)?site1\.com
    RewriteRule (.*) /site1$1
    RewriteCond Host: (?:www\.)?site2\.com
    RewriteRule (.*) /site2$1
Now just place your sites in /site1 and /site2 directories.
Or you can use more general rules:
    [ISAPI_Rewrite]
    RewriteCond Host: (?:www\.)?(.+)
    RewriteRule (.*) /$1$2
The directory names for sites should be like /somesite1.com, /somesite2.info, etc.
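To see what the general rule produces, here is a small emulation in Python. It is a sketch under assumptions: Python's `re` stands in for Regex++, `rewrite_by_host` is a made-up helper, and the numbering of sub-matches ($1 from the condition, $2 from the rule) is inferred from the rule itself.

```python
import re

COND = re.compile(r"(?:www\.)?(.+)")   # applied to the Host header
RULE = re.compile(r"(.*)")             # applied to the requested URL

def rewrite_by_host(host, url):
    cond = COND.fullmatch(host)
    rule = RULE.fullmatch(url)
    if cond is None or rule is None:
        return url                     # rule not applied, URL unchanged
    # $1 is the condition's group, $2 is the rule's first group.
    return "/" + cond.group(1) + rule.group(1)

print(rewrite_by_host("www.somesite1.com", "/index.htm"))  # /somesite1.com/index.htm
print(rewrite_by_host("somesite2.info", "/a/b.asp"))       # /somesite2.info/a/b.asp
```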
Using loops (Next flag) to convert request parameters
Suppose you wish to access physical URLs like http://www.myhost.com/foo.asp?a=A&b=B&c=C using requests like http://www.myhost.com/a/A/b/B/c/C/foo.asp and the number of parameters may vary from one request to another.
There are at least two possible solutions. You could simply add a separate rule for each possible number of parameters, or you could use the technique demonstrated by the following example.
    [ISAPI_Rewrite]
    RewriteRule /([^/]*)/([^/]*)(.*)foo.asp(.+)? $3foo.asp(?4$4&:\?)$1=$2 [N,I]
This rule extracts one parameter from the request URL, appends it to the end of the request string and restarts rule processing from the beginning. It therefore loops until all parameters have been moved to the right place (or until the RepeatLimit is exceeded).
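The loop can be traced step by step with a short emulation. This is a sketch, not the filter's actual code: Python's `re` replaces Regex++, `re.IGNORECASE` plays the role of the I flag, the `for` loop plays the role of the N flag, and `REPEAT_LIMIT = 32` is an assumed value chosen for the illustration. The helper names are hypothetical.

```python
import re

# Pattern taken from the rule above.
RULE = re.compile(r"/([^/]*)/([^/]*)(.*)foo.asp(.+)?", re.IGNORECASE)
REPEAT_LIMIT = 32  # assumed value, for this sketch only

def apply_once(url):
    """Apply the rule once; return None when it no longer matches."""
    m = RULE.fullmatch(url)
    if m is None:
        return None
    # Conditional part: existing query string plus "&", or a literal "?".
    prefix = (m.group(4) + "&") if m.group(4) is not None else "?"
    return f"{m.group(3)}foo.asp{prefix}{m.group(1)}={m.group(2)}"

def rewrite(url):
    """Return every intermediate URL produced by restarting the rule."""
    steps = [url]
    for _ in range(REPEAT_LIMIT):
        nxt = apply_once(steps[-1])
        if nxt is None:
            break
        steps.append(nxt)
    return steps

for step in rewrite("/a/A/b/B/c/C/foo.asp"):
    print(step)
# /a/A/b/B/c/C/foo.asp
# /b/B/c/C/foo.asp?a=A
# /c/C/foo.asp?a=A&b=B
# /foo.asp?a=A&b=B&c=C
```

Each pass moves exactly one name/value pair, so a request with n parameters needs n passes, which is why the repeat limit matters for long URLs.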
Moving sites from UNIX to IIS
These rules change URLs from /~username to /username and from /file.html to /file.htm. They can be useful if you have just moved your site from UNIX to IIS and keep getting hits to the old pages from search engines and other external pages.
    [ISAPI_Rewrite]
    #redirecting to update old links
    RewriteRule (.*)\.html $1.htm
    RewriteRule /~(.*) http\://myserver/$1 [R]
Moving site location
Many webmasters have asked for a solution to the following problem: they want to redirect all requests for one web server to another web server. Such problems usually arise when you need to set up a newer web server that will replace the old one over time. The solution is to use ISAPI_Rewrite on the old web server:
    [ISAPI_Rewrite]
    #redirecting to update old links
    RewriteRule (.+) http\://newwebserver$1 [R]
Browser-dependent content
It is sometimes necessary to provide browser-dependent content, at least for important top-level pages; i.e. one has to provide a full-featured version for Internet Explorer, a minimum-featured version for the Lynx browsers and an average-featured version for all others.
We have to act on the HTTP header "User-Agent". The sample code does the following: if the HTTP header "User-Agent" contains "MSIE", the target foo.htm is rewritten to foo.IE.htm. If the browser is "Lynx" or "Mozilla" of version 1 or 2, the URL becomes foo.20.htm. Other browsers receive the page foo.32.htm. All this is done by the following ruleset:
    [ISAPI_Rewrite]
    RewriteCond User-Agent: .*MSIE.*
    RewriteRule foo\.htm foo.IE.htm [L]
    RewriteCond User-Agent: (?:Lynx|Mozilla/[12]).*
    RewriteRule foo\.htm foo.20.htm [L]
    RewriteRule foo\.htm foo.32.htm [L]
Dynamically generated robots.txt
robots.txt is a file that search engines use to discover URLs that should or should not be indexed. But creating this file for large sites with a lot of dynamic content is a very complex task. Have you ever dreamed of a dynamically generated robots.txt? Let's write a robots.asp script:
    <%@ Language=JScript EnableSessionState=False%>
    <%
    //The script must return plain text
    Response.ContentType="text/plain";

    /*
    Place generation code here
    */
    %>
Now serve it as robots.txt using a single rule:
    [ISAPI_Rewrite]
    RewriteRule /robots\.txt /robots.asp
Server side XML processing
Suppose the content of the site is stored in XML files. There is a /XMLProcess.asp file that processes the XML files on the server and returns HTML to the end user. URLs to the documents have the form:
http://www.mysite.com/XMLProcess.asp?xml=/somedir/somedoc.xml
But many popular search engines will not index such documents because the URLs contain a question mark (the document is dynamically generated). ISAPI_Rewrite can completely eliminate this problem.
    [ISAPI_Rewrite]
    RewriteRule /doc(.*)\.htm /XMLProcess.asp\?xml=$1.xml
Documents can now be accessed with URLs like http://www.mysite.com/doc/somedir/somedoc.htm. Search engines will never know that there is physically no somedoc.htm file and that the content is dynamically generated.
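The mapping from the public URL to the physical one can be checked with a few lines of Python. This is a sketch only: Python's `re` stands in for the rewriting engine, and the function name `to_physical` is invented for the example.

```python
import re

RULE = re.compile(r"/doc(.*)\.htm")    # pattern part of the rule

def to_physical(url):
    m = RULE.fullmatch(url)
    if m is None:
        return url                     # other requests pass through unchanged
    # Format part of the rule: /XMLProcess.asp?xml=$1.xml
    return "/XMLProcess.asp?xml=" + m.group(1) + ".xml"

print(to_physical("/doc/somedir/somedoc.htm"))  # /XMLProcess.asp?xml=/somedir/somedoc.xml
print(to_physical("/images/logo.gif"))          # /images/logo.gif
```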
Using conditional expressions
Sometimes you need to apply a rule when some pattern does not match. Unlike Apache's mod_Rewrite, ISAPI_Rewrite does not support non-matching patterns, because it is not obvious what the rewriting engine should do when a non-matching pattern generates sub-matches. Instead of non-matching patterns you can use conditional expressions in format strings.
For example, suppose you need to move all users who are not using Internet Explorer to another location:
    [ISAPI_Rewrite]
    #if user agent is Internet Explorer leave URI untouched
    #else precede it with /nonie
    RewriteCond User-Agent: .*(MSIE)?.*
    RewriteRule (.+) ?1$2:/nonie$2
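The decision made by this rule can be sketched in Python, with one caveat. Python's backtracking engine lets the first greedy `.*` swallow the whole header, so the optional group would never participate. The selection rule described under "What gets matched?" prefers the longest first sub-expression instead. The sketch therefore uses an explicit alternation to get the intended effect. The helper name and the sample User-Agent strings are assumptions.

```python
import re

# Alternation stands in for the optional group: the first branch is tried
# first, so group 1 is set whenever "MSIE" occurs in the header.
COND = re.compile(r".*(MSIE).*|.*")
RULE = re.compile(r"(.+)")

def rewrite_for_agent(user_agent, url):
    cond = COND.fullmatch(user_agent)
    rule = RULE.fullmatch(url)
    if cond is None or rule is None:
        return url
    # Conditional format: keep the URL for MSIE, else prefix it with /nonie.
    if cond.group(1) is not None:
        return rule.group(1)
    return "/nonie" + rule.group(1)

print(rewrite_for_agent("Mozilla/4.0 (compatible; MSIE 6.0)", "/index.htm"))  # /index.htm
print(rewrite_for_agent("Lynx/2.8.4", "/index.htm"))                          # /nonie/index.htm
```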
Proxy throughput
We are planning to implement proxy throughput abilities in the next version of ISAPI_Rewrite. But since rewriting URLs can lead to script or program invocations, you can implement your own proxy mechanism using ASP or CGI/ISAPI applications. Create a proxy.asp file in the root of the site and write the following ASP code:
    <%@ Language=JScript EnableSessionState=False%>
    <%
    //we are using MSXML3.0 to travel via HTTP
    var httpReq;
    if(Request.QueryString.Count)
    {
        httpReq=Server.CreateObject("Msxml2.ServerXMLHTTP");
        httpReq.setTimeouts(5000,5000,15000,15000);
        httpReq.open("GET",""+Request.QueryString, false);
        httpReq.send();
        Response.Status=httpReq.status;
        Response.ContentType=httpReq.getResponseHeader("Content-Type");
        Response.BinaryWrite(httpReq.responseBody);
    }
    %>
Now /proxy.asp can be used as a proxy by passing the requested URL in the query string. Usage example:
    [ISAPI_Rewrite]
    #throughput all content of /images folder to another server
    RewriteRule /images(.+) /proxy.asp\?http\://myimagearchive.net/image$1
Blocking inline-images
Assume we have some pages with inlined GIF graphics under http://www.quux-corp.de/. These graphics are nice, so others embed them directly in their own pages via hyperlinks. We don't like this practice because it adds useless traffic to our server.
While we cannot protect the images from inclusion 100%, we can at least restrict the cases where the browser sends an HTTP Referer header.
    [ISAPI_Rewrite]
    RewriteCond Referer: .+
    RewriteCond Referer: (http://www\.quux-corp\.de/)?.*
    RewriteRule (.*\.gif) ?1$2:/404.asp [I]
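The effect of these rules on a few requests can be emulated as follows. This is a sketch under stated assumptions: Python's `re` replaces Regex++, the two conditions are assumed to be combined with a logical AND, the I flag is applied to the rule pattern only, and an absent Referer header is represented by an empty string. The function name and the example referers are invented.

```python
import re

HAS_REFERER = re.compile(r".+")
OWN_SITE = re.compile(r"(http://www\.quux-corp\.de/)?.*")
GIF_RULE = re.compile(r"(.*\.gif)", re.IGNORECASE)

def rewrite_image(referer, url):
    rule = GIF_RULE.fullmatch(url)
    if rule is None:
        return url                         # not a GIF request
    if HAS_REFERER.fullmatch(referer) is None:
        return url                         # no Referer sent: rule not applied
    own = OWN_SITE.fullmatch(referer)
    if own is None:
        return url
    # Conditional format: keep the image for our own pages, else /404.asp.
    return rule.group(1) if own.group(1) is not None else "/404.asp"

print(rewrite_image("http://www.quux-corp.de/page.htm", "/img/logo.gif"))  # /img/logo.gif
print(rewrite_image("http://other.example/page.htm", "/img/logo.gif"))     # /404.asp
print(rewrite_image("", "/img/logo.gif"))                                  # /img/logo.gif
```

The last call shows the limitation noted above: a request without a Referer header still receives the image.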
The ISAPI_Rewrite filter uses the Regex++ library. This document contains part of the Regex++ library documentation.
Regex++ (Version Boost 1.28.0)
Copyright (c) 1998-2002, Dr John Maddock