Search Engine Safe (SES) URLs

A question comes up regularly on the CF-Talk list about how to do Search Engine Safe (SES) URLs. This article examines the issues, shows some code that provides SES functionality, explains the code, and answers common questions. The code assumes IIS 5 (or better) and CFMX 6.1. Minor alterations are needed for other versions of ColdFusion or other web servers.

The basic idea of SES URLs is to create a URL that is passing variables (variable=value pairs) but does not look like it is. Why? Because search engines don't always pick up links to pages where variables are passed. Before I go into this, let me define some terms:

Static Page - A page called from a URL that does not contain variables

Dynamic Page - A page called from a URL that contains variables


The general rule that I was told (and have seen) is that a search engine will index a static page without problems. It will also index a dynamic page as long as the link to that dynamic page comes from a static page (or what looks like a static page). It will not index a dynamic page that is linked from another dynamic page. This varies somewhat by search engine, but it is a very good rule to live by.

This means that if you have a large number of pages that are only accessible by passing variables, you may be out of luck. The solution is to hide the variable/value pairs and make the link LOOK like a static one.

The first issue with SES URLs is the question mark (?). This separates the URL from the variables and is the primary marker of a dynamic page. The second issue, which does not occur with all search engines, is the variable/value separator which is an equals sign (=). The third issue is the linker between multiple variable/value pairs, the ampersand (&).
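To make the three characters concrete, here is a minimal sketch of the substitution used in the rest of this article: a slash takes the place of the question mark and the ampersand, and a colon takes the place of the equals sign. The helper name MakeSES, the template /products.cfm and the variable names are invented for illustration and are not part of the original code.

```cfml
<cfscript>
// Hypothetical helper: turns a query string into the slash-and-colon form.
function MakeSES(template, queryString)
{
    // "=" becomes ":" and "&" becomes "/"
    var pairs = Replace(Replace(queryString, "=", ":", "all"), "&", "/", "all");
    // The "?" is replaced by a single slash after the template name
    return template & "/" & pairs;
}

// MakeSES("/products.cfm", "category=5&item=42")
// returns /products.cfm/category:5/item:42
</cfscript>
```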

Our goal is to turn a URL that looks like this:

into one that looks like this:

A search engine will see the first URL as a dynamic one. It will see the second as a static one. In order to use this, a function needs to be written on the server side to convert the URL into something that is usable. The function below will take the static looking URL and extract variable/value pairs from it. As we are using ColdFusion, we're writing the function as a ColdFusion User Defined Function (UDF), but the concept can be translated into just about any language.
function SES()
{
    var UrlVars=ReReplaceNoCase(trim(cgi.Path_Info), '.+\.cfm/? *', '');
    var loopcount=ListLen(UrlVars, '/');
    var potential=""; var i=0; // i is var'd so the loop counter stays local to the function
    if (cgi.script_name EQ cgi.path_info)
        return 0;
    for(i=1; i LTE loopcount; i=i+1)
    {
        potential=trim(listgetat(UrlVars, i, '/'));
        if (REFindNoCase('^[a-z][a-z0-9_]*:.*', potential))
            SetVariable('Url.'&listfirst(potential, ':'), listrest(potential, ':')); // listrest() gives '' when nothing follows the colon
    }
    return 1;
}

03: This line is the basic cleaner to remove the template being called, the domain name, etc. and leave us with any potential variable/value combinations. It assumes a slash (/) as the character between the URL and the variable value pairs (which is common).

04: Assuming a Slash (/) is also used to separate between multiple variable/value pairs, get the number of such pairs.

06: If the cgi.script_name is the same as the cgi.path_info, then drop out of the function. There are no variable pairs here. Note that we do this after setting the variables, as the use of var in a CFSCRIPT based UDF requires the vars to be above any other code. Also note that we are scoping the CGI variables. This is not really needed, but is done to save the lookup time needed by CF to see if there's a path_info variable in variables, form, url, etc. before getting to the CGI scope. The savings are almost nil, but it is good practice.

07: We always return something from a UDF, even if that something will never be used. In general, a 0 is passed back as a failure code and a 1 is passed back as a success.

08: Loop over each variable/value pair to turn them into URL scope variables.

10: Get the potential variable/value pair. We call this potential as the next line of code will check if it has the necessary format to actually be a variable with a value.

11: Check if the potential variable starts with a letter followed by 0 or more letters, numbers or underscores (_). These are the only characters that are allowed in a variable (yes, in CFMX a variable can start with a dollar sign, but we're ignoring that). Note also that we're using a colon (:) as the separator between the variable and the value. To be legal, a colon has to exist. On the other hand, we allow for a variable to be passed without any value.

12: If the potential pair is legal, then separate them into the variable portion, which will have the url scope placed before it and the value. Note that I'm using the SetVariable() function for this operation. I personally feel that this is the proper way of setting a dynamic variable. The code will work just as well if done using the quoted pound sign method.

14: Again, we return something at the end of the function just to be 'proper'.
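The note on line 12 mentions two ways of setting a dynamic variable. Here is a short sketch of both, with a hard-coded name and value standing in for the parsed pair (assumed for illustration). The third form, structure notation, is an addition here and relies on CFMX letting the URL scope be treated as a structure.

```cfml
<cfscript>
// Assumed sample pair, for illustration only
varName = "item";
varValue = "42";

// 1. SetVariable(), as used in the function
SetVariable("Url." & varName, varValue);

// 2. The quoted pound sign method
"Url.#varName#" = varValue;

// 3. Structure notation on the URL scope
Url[varName] = varValue;
</cfscript>
```

All three leave Url.item holding 42, so the choice is largely one of style.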

To make use of this function, just call it on any page you expect to have SES URLs on. Works like a charm and is simple to put into effect. Of course, this is the simple, standard method. If you look at a URL such as this http://www.houseoffusion.com/lists.cfm/link=m:4:33409:167612, you can see that I'm doing some more interesting things here. This serves to make the URL shorter, hides some of what's going on and requires more exact code. But that's for another time. :)
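A minimal sketch of what calling the function on a page might look like. The page name, the include file name and the variable names are assumptions for illustration.

```cfml
<!--- products.cfm, a hypothetical page requested as /products.cfm/category:5/item:42 --->
<!--- ses_udf.cfm is an assumed file name holding the SES() function --->
<cfinclude template="ses_udf.cfm">
<cfset SES()>

<!--- Defaults cover a request that arrives without SES variables --->
<cfparam name="Url.category" default="0">
<cfparam name="Url.item" default="0">

<cfoutput>
Category #HTMLEditFormat(Url.category)#, item #HTMLEditFormat(Url.item)#
</cfoutput>
```

Once SES() has run, the rest of the page reads Url.category and Url.item exactly as it would for a normal query string.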

Before I end, let me take a few questions.

1. Why are you using a colon (:) to separate between variables and values? Other systems use slashes (/) to separate everything.
There are two reasons to use a colon (:). The first is style. There should be a clear separation between variables and values. This is both for debugging purposes and to keep a clean style. The second reason is a more rational one. My system allows for a variable with a NULL value. Using slashes only does not.

2. This doesn't seem to work on my system! Why?
I've heard people say that in order for this to work on IIS, there's a special setting that must be turned off. In your IIS admin, go to Home Directory, Configuration and edit the extension you wish SES to work with (.cfm in our case). At the bottom of the edit screen is a checkbox that says "check that file exists". Make sure this is in the off position. I personally have never had a problem with this, but I'm trusting in the errors and solutions of others.

3. When using SES URLs, my relative paths are messed up. How do I fix this?
Because we're hiding the variables behind slashes, any relative link will be looking to start from a directory that basically does not exist. For this reason, when using SES URLs, you have to make all references to web content absolute. This means that ../images/logo.gif has to be /images/logo.gif. An annoyance, but a price most are willing to pay for the results.
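To see why the relative form breaks, consider a page requested through a hypothetical SES URL (the page and variable names are assumed for illustration). The browser treats everything after the template name as directories:

```cfml
<!--- Page requested as /products.cfm/category:5/item:42 --->

<!--- Relative: resolved against /products.cfm/category:5/,
      so the browser requests /products.cfm/images/logo.gif --->
<img src="../images/logo.gif">

<!--- Absolute: always requests /images/logo.gif --->
<img src="/images/logo.gif">
```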

4. Passing dates in a mm/dd/yy format blows up. Is there a fix?
Because we're using a slash (/) as a delimiter, a date passed in that format is seen as just more variable/value pairs (and unbalanced at that). To fix this, you have to translate the date you're sending on the URL into something else. Dashes (-) are a good alternative. Remember not to use colons (:) as they're used by the code for variable/value separators.
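A minimal sketch of the dash approach, assuming a hypothetical /events.cfm page and a variable named day:

```cfml
<cfscript>
// Sending side: format the date with dashes before placing it on the URL.
link = "/events.cfm/day:" & DateFormat(Now(), "mm-dd-yy");

// Receiving side, after SES() has run: turn the dashes back into slashes
// if the rest of the code expects the mm/dd/yy form.
if (IsDefined("Url.day"))
    eventDate = Replace(Url.day, "-", "/", "all");
</cfscript>
```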

I'm sure there are more questions and problems and I will address them as they come up. I strongly feel that the use of this code or other code of the sort is a must to get your dynamic site indexed by the major search engines. House of Fusion is heavily indexed by just about every search engine due to this and I don't mind coding a small fix to make sure it works properly.
