Tracking Visitors with ASP.NET

Reposted from: http://www.15seconds.com/issue/021119.htm
Introduction

If you have a Web site or Web application that makes money, you are probably very interested in who visits your site, how much time they spend, what they look at, and if they come back. Wayne Plourde looks at some of the ways you can track visitors/users on your site with IIS and ASP.NET without spending a fortune on costly log analysis applications.

First, I will begin with a discussion of IIS Web log files and where they fall short in providing a complete picture of your visitor's activity. Next, I will show you how to build a tracking class to store info about users while they visit. Then, I will explain some options in delivering that data to you. While doing so, I will take advantage of some of the new features of ASP.NET, particularly some new Session and Application context events.

These techniques will be suitable for low to medium traffic Web sites. Hopefully they will give you a quick leg up on understanding traffic patterns on your site before you are forced to purchase a costly Web log analysis program. For a startup or hobby Web site, this may be a prohibitive expense. Of course, if your Web site is on the scale of Amazon.com, the complexity of your customer activity is in a whole different league and the following techniques may be inadequate. In the article, I will anticipate that you have a basic understanding of the Session and Application context and the Request and Response objects, especially how they interact with cookies. In addition, you should have some familiarity with configuring ASP.NET applications with the web.config file.

Finally, we will discuss privacy -- what you need to know about the information you collect and the rights of the visitors you collected it from.

download source code

IIS Web Logs

In case you didn't know, IIS (Internet Information Server), as with most Web servers, keeps a very thorough record of every request made against it. Each time a page or an image is requested a new record is added to the log file. We know that tracking visitors and customers on our Web sites is important, but what kind of information can we learn from them? People often talk about how many hits they get on their site; however, this isn't very instructive without some additional information to accompany it.

Who is visiting the site?

Web servers can determine the IP address of each visitor. With this info, we can look up the "owner" of the IP (usually a service provider) using a WHOIS utility and also run a reverse DNS lookup, which can provide a domain name associated with the IP, making it possible to determine the type of organization or even the country from the domain extension, such as .uk (United Kingdom) or .nz (New Zealand).
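To illustrate the idea, here is a minimal Python sketch (not part of the article's VB.Net app) that performs the reverse DNS lookup and pulls the domain extension out of the resolved name; the host name in the test is invented.

```python
import socket

def reverse_dns(ip):
    """Reverse-DNS an IP address; returns the host name or None on failure."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None

def domain_extension(hostname):
    """Extract the top-level domain ("nz", "uk", ...) from a host name."""
    parts = hostname.rstrip(".").rsplit(".", 1)
    return parts[-1].lower() if len(parts) == 2 else None

# A resolved name such as "cache.example.co.nz" yields "nz",
# hinting that the visitor connected from New Zealand.
```

The WHOIS step is a separate network query against a registry such as ARIN, so it is left out of the sketch.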

By aggregating the activity on a single IP address, we can approximate the number of unique visitors on the site. I say "approximate" because many Internet users do not have a fixed IP address. Their IP is dynamically allocated by their service provider when they connect. The time a unique visitor spends on the site is called a session. ASP also helps us with this by providing a Session context while the visitor is continually active for a predetermined amount of time. Later, we will discuss how cookies can be used to keep track of visitors across sessions.

Another important piece of data about your visitor is what tool they used to get to your site. Web servers receive a user agent string identifying the application making the request - most often a browser. Sometimes, visitors can be automated tools known as spiders, crawlers or robots. These tools are generally used by search engines to crawl through your site looking for content. If search engines bring customers to your site, then you will want to know how often crawlers, like Googlebot, visit. This knowledge is critical to understanding how fresh your content is on the search engines.
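As a rough illustration, a handful of substrings is enough to separate the common crawlers from browsers. This Python sketch uses an illustrative, far-from-complete signature list; a production filter would maintain a much longer one.

```python
# Illustrative crawler signatures; real user-agent lists are far longer.
KNOWN_BOTS = ("googlebot", "slurp", "crawler", "spider", "robot")

def is_robot(user_agent):
    """Classify a request as crawler vs. browser from its user-agent string."""
    ua = (user_agent or "").lower()
    return any(sig in ua for sig in KNOWN_BOTS)
```

A request with no user agent at all (`None`) is treated as a browser here; you could just as reasonably flag it as suspicious.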

What did they look at?

Of course, we not only want to know which pages visitors viewed but also the order in which they were visited. This can tell you where the user entered the site and where they left. You can also begin to see patterns. For instance, if relatively few visitors are clicking on your product page link from your home page, then it is a pretty good indication that your link is either not prominent enough or your marketing pitch needs to be stronger.

How long did they stay?

It is very useful to know the entire length of the visit in addition to the length of time between page requests. This can tell you if visitors are plowing through the site or spending some time studying the content. It is very important to understand that the HTTP protocol does not maintain a connection to the server between requests, so we must rely on the timestamps of the requests. Unfortunately, because HTTP access does not maintain a connection once the page content is retrieved, it is very difficult to know the amount of time the user looked at the last visited page.

Where did they come from?

When a visitor clicks on a link on another site that links to your site, most browsers pass this information along to the server. This link is known as a Referrer URL. Referrers can be links on other sites, links for search engines or links from ads you have placed. Once you learn to read these, the information can be quite useful. For instance, here is a typical Referrer URL:

http://www.google.com/search?q=widgets&hl=en&lr=&ie=UTF-8&start=10&sa=N

From this we can learn several things. First, we know that the visitor came from Google. Second, their search term was "widgets." Third, the start index for the Google search page was 10. Since the default page size on Google is 10 records, we know our link was on the second page of the search results.
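That reading can be automated. Here is a hedged Python sketch of the same dissection; in the article's VB.Net app the equivalent fields would come from Request.UrlReferrer.

```python
from urllib.parse import urlsplit, parse_qs

def describe_google_referrer(referrer):
    """Pull the host, search term, and result page out of a Google referrer URL."""
    parts = urlsplit(referrer)
    qs = parse_qs(parts.query)
    term = qs.get("q", [""])[0]
    start = int(qs.get("start", ["0"])[0])
    page = start // 10 + 1          # Google's default page size is 10 records
    return parts.hostname, term, page
```

Feeding it the example above returns the host `www.google.com`, the term `widgets`, and page 2 of the results.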

By the way, many webmasters report on various forums that over 50% of their traffic is generated by Google, so it pays to understand both the behind-the-scenes and front-end aspects of Google's operations. Submit your site now for free:

http://www.google.com/addurl.html

It can take up to two months to get listed, but if you follow our tracking hints below, you will know exactly when Googlebot crawls your site.

Did they visit before?

This is one piece of data that our log file cannot tell us, since the IP address of the visitor can change between sessions. However, with cookies, we can keep track of visitors across sessions to count how many times they visit the site. In addition, we can track the original URL they requested on their first visit and the original referrer. This can be vital to marketing feedback since many visitors will not make an impulse purchase on their first visit. They will most likely make a bookmark, shop around a bit, then return when they are ready to make the purchase. If we check the referrer when they return from a bookmark, it will be blank. So, knowing how they originally found your site can be crucial feedback to your marketing strategy.

Where Are the Log Files?

The default location for Web server log files is under the system32 directory, typically:

C:\WINNT\system32\LogFiles\W3SVC1

You can set the location for the log files in the Web site properties dialog of the Internet Service Manager. However, they must be on the same machine.

Why are there so many log files?

The default setting for IIS Web Logs is to create a new file every day. The files are named in the following manner: exYYMMDD.log. So you might see a log file like ex020611.log for June 11, 2002.

You can set the time period for the log files in the Web site properties dialog of the Internet Service Manager. Your choices are hourly, daily, weekly or monthly. In addition, you can partition the log files by file size.

What's in the Log Files?

Log files contain much of the "who, what, where, when, and how" information we discussed earlier as being essential to our tracking process. There are two primary formats for the log file - the W3C Extended log format and the Microsoft IIS log format. The main difference between the two is that the W3C format uses spaces to delimit data items in each record and the Microsoft format uses a comma delimiter (which is frankly a little easier to read). The W3C format is the default setting and is most widely supported by log file analysis programs.


Figure 1 - Screen shot of a typical log file.

Here are a few of the key data items in the log file.

Date (date) - The date on which the activity occurred.
Time (time) - The time the activity occurred.
Client IP Address (c-ip) - The IP address of the client that accessed your server.
URI Stem (cs-uri-stem) - The resource accessed; for example, Default.htm.
URI Query (cs-uri-query) - The query, if any, the client was trying to perform.
HTTP Status (sc-status) - The status of the action, in HTTP terms; 200 is OK, 404 is file not found.
User Agent (cs(User-Agent)) - The browser used on the client.
Referrer (cs(Referer)) - The previous site visited by the user; this site provided a link to the current site.
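A sketch of reading one such record, in Python rather than the article's VB.Net: W3C Extended records are space-delimited and named by a "#Fields:" directive at the top of the file (IIS encodes embedded spaces, e.g. in the user agent, as "+"). The sample line below is invented.

```python
# Field list matching the properties in the table above; a real parser
# would read it from the file's "#Fields:" directive.
FIELDS = ["date", "time", "c-ip", "cs-uri-stem", "cs-uri-query",
          "sc-status", "cs(User-Agent)", "cs(Referer)"]

def parse_record(line, fields=FIELDS):
    """Return one W3C Extended log record as a dict keyed by field name."""
    return dict(zip(fields, line.split(" ")))

record = parse_record(
    "2002-06-11 10:15:32 192.168.0.10 /Default.aspx - 200 "
    "Mozilla/4.0+(compatible;+MSIE+6.0) http://www.google.com/search?q=widgets")
```

After parsing, `record["c-ip"]` holds the visitor's address and `record["sc-status"]` the HTTP status.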

Make sure that these are activated in your Web server log properties, particularly if you are running IIS on a Windows 2000 workstation. It appears that this platform only includes a smaller subset of the data items by default. For a full list of log file properties, click on the following link:

http://www.microsoft.com/windows2000/en/server/iis/htm/core/iiintlg.htm?id=172

For more information on IIS log file configuration visit the Windows 2000 documentation online at

http://www.microsoft.com/windows2000/en/server/iis/htm/core/iilogsa.htm?id=108

Analyzing Log Files

While log files are thorough, they are not very user friendly. For instance, you may not really care that the image, bg02xyz.gif, was accessed 370 times in conjunction with other page requests. Or you may find that the pertinent data that you are looking for is spread throughout the file. This is often the case if you have a lot of simultaneous visitors on your site - their activity will be mixed together in the log, making it impossible to see session activity at a glance. This gets even nastier when your data is spread across multiple files. Also, unless you are an expert at reading user agent strings, you may have difficulty distinguishing real visitors (viable customers) from robots. Log files are just plain noisy and cluttered. For this reason, there are a number of tools available on the market for analyzing your log files. While some are expensive, there are some free ones that offer a fairly robust feature set.
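To make the "mixed together" problem concrete, here is one way a script might regroup log records into per-visitor sessions. This is a Python sketch with an assumed 20-minute inactivity timeout - the same approximation any analyzer has to make, since dynamic IPs blur visitor identity.

```python
from datetime import datetime, timedelta

def sessionize(records, timeout=timedelta(minutes=20)):
    """Group (ip, timestamp, url) tuples into sessions.

    A new session starts when an IP is seen for the first time or after
    more than `timeout` of inactivity.
    """
    sessions = []
    last_seen = {}  # ip -> (last timestamp, current session's page list)
    for ip, when, url in sorted(records, key=lambda r: r[1]):
        prev = last_seen.get(ip)
        if prev is None or when - prev[0] > timeout:
            pages = []                      # start a fresh session
            sessions.append((ip, pages))
        else:
            pages = prev[1]                 # continue the existing session
        pages.append(url)
        last_seen[ip] = (when, pages)
    return sessions
```

Two requests from one IP five minutes apart land in one session; a third request two hours later opens a second session for the same IP.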

Web Log analysis products

I don't want to necessarily endorse any of these products; however, here are a few that are worth investigating:

Webtrends - http://www.netiq.com/webtrends/default.asp
Sawmill - http://www.sawmill.net/
Analog (Free) - http://www.analog.cx/

Where Analysis Programs Fall Short

Most analysis tools assume that you are getting mass quantities of visitors, so the data is aggregated and averaged. This means you usually won't be able to view activity within individual sessions. If you are just starting an eCommerce site, and you are not receiving much traffic, you may want to see exactly what path each visitor took through your site.

In addition, you want to know if visitors return. Log files cannot provide this information, so analysis programs will not be a help either.

Besides, running reports can be a lengthy process. Do you really want to take the time every day to get some straightforward answers on the activity of your site? There must be an easier way.

What Can ASP.NET Do for Tracking?

If you have control over the code on your ASP.NET site, there is an easier way.

The Session Context

The Session context nicely encapsulates the total time a visitor is on the site. As we mentioned earlier, the nature of the HTTP protocol is that connections are not maintained. Once all the content for a page request is transferred, the connection is terminated. ASP circumvents this problem by maintaining a session through the use of cookies. When a visitor first enters a site, they are given a unique id in a cookie. When they return within a fixed amount of time, the cookie is returned, making it possible for the server to identify them as the same visitor. Of course, this technique is useless if a visitor's browser has cookie support turned off. My experience, however, is that this is very rare. Fortunately, ASP.NET provides a workaround by supporting cookieless sessions.

Keep in mind that robots generally do not return cookies, so when they visit your site, each request will initiate a new session.

The Session context not only holds values across page requests, it also fires events on Session Start and Session End. These will be very useful in setting up our tracking object and handling notification and reporting.

The Request Object

The Request object gives us programmatic access to many of the same data items that are stored in the log file such as the requested URL, the Referrer, the UserHostAddress (IP), and the UserAgent. In addition, the object provides a Browser Capabilities object, which can give you very detailed information about what is implied by the user agent string.

Cookies

Cookies are a small collection of name and value pairs that can be set by a server to be stored on the visitor's computer by the browser. Cookies can be set on the server with the Response object and retrieved later by the Request object. As we mentioned, ASP has always used cookies to facilitate the management of a visitor's session. You can use cookies to manage tracking data across sessions. For our tracking purposes, we will store three pieces of data in cookies: the number of visits, the original URL and referrer requested by the visitor on their first visit.

Also, cookies are a great way to keep track of the activities of registered users. You can store other identifying information about a visitor or user and associate this information with your tracking data. The less anonymous your visitors are, the better able you will be to target particular demographics and eventually convert more visitors to customers. However, if you are storing sensitive data in cookies, you should look at protecting the data with some form of encryption. Cookie data is passed in the clear over non-SSL connections.

Adding Comments to the IIS Log

A simple way to provide more information to the IIS log files is to append it yourself with the Response.AppendToLog method. By using Session and Application events, you can place keywords that you can later use when searching the files. You might even be able to train some of the analysis programs to understand your keywords and hopefully provide more meaningful stats.


There are some restrictions, however. Since the data you provide is appended to the URI Query portion of the log file, you are limited to 80 characters. In addition, you cannot use commas since they are a delimiter for some of the log file formats. Also, if you anticipate that there will be other querystring elements you may want to prepend your string with an ampersand.
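Those restrictions are easy to enforce in a small helper. Here is a Python sketch of the sanitizing logic; the sample app would apply the same rules in VB.Net before calling Response.AppendToLog.

```python
def make_log_comment(text, limit=80):
    """Prepare a keyword comment for the IIS log's URI Query field:
    replace commas (they collide with comma-delimited log formats),
    encode spaces, prepend an ampersand so the comment cannot merge
    with a real querystring element, and enforce the 80-character cap."""
    cleaned = "&" + text.replace(",", ";").replace(" ", "+")
    return cleaned[:limit]
```

So `make_log_comment("session=start,robot")` produces `&session=start;robot`, safe to append regardless of the log format in use.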

Creating a Session Tracker Class in ASP.NET

Now we can finally get to some code. I have created a simple ASP.NET Web application using VB.Net. The application contains six ASP.NET Web forms that hold a menu user control to simplify navigation between the pages. Remember, our purpose is to track activity.


Fig 2 - The Home Page

The key element of the example application is the SessionTracker class. The class is designed to assemble all the necessary tracking data on its own and provide that data for reporting through a number of read-only properties:

VisitCount - Number of times the visitor has visited the site
OriginalReferrer - The Referrer from the visitor's first visit
OriginalURL - The requested URL from the visitor's first visit
SessionReferrer - The Referrer for the current visitor session
SessionURL - The requested URL for the current visitor session
SessionUserHostAddress - The IP address of the visitor
SessionUserAgent - The browser or other application
Browser - A reference to the HttpBrowserCapabilities object, which provides additional information inferred from the UserAgent
Pages - An ArrayList of the names and timestamps of the ASP.NET Web forms viewed during the session
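For readers who want the shape of the class without the source download, here is a Python sketch that models the same data. The names mirror the article's VB.Net properties, but the HttpContext and cookie plumbing is deliberately omitted.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Tuple

@dataclass
class SessionTracker:
    """Data-only sketch of the article's SessionTracker class."""
    visit_count: int = 1
    original_referrer: str = ""
    original_url: str = ""
    session_referrer: str = ""
    session_url: str = ""
    session_user_host_address: str = ""
    session_user_agent: str = ""
    pages: List[Tuple[str, datetime]] = field(default_factory=list)

    def add_page(self, page_name, when=None):
        """Record one page view with its timestamp (the addPage method)."""
        self.pages.append((page_name, when or datetime.now()))
```

In the real app one instance of this object lives in the Session context and accumulates pages for the life of the visit.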

Initializing the class


When the New constructor method is called as the class is created, we grab an instance of the current HTTPContext with the HTTPContext.Current static method call. This allows us to get access to the Request and Response objects for acquiring the request info and cookies. A reference to the HTTPContext is held as a member variable. Three helper functions are used to deal with the cookie data: incrementVisitCount, setOriginalReferrer, and setOriginalURL. We also set a default expiration time to be used with all our cookies.

Storing the SessionTracker in the Session context

Being able to create a class like the SessionTracker and store it in the Session context is one of the great advantages of ASP.NET. Of course, in ASP 3.0 we could have easily created a similar COM object; however, if it was developed with VB, we could not have saved it in the Session. In case you were not aware of the problems in saving Single Threaded Apartment (STA) components in a session or application context, read the following.

http://support.microsoft.com/default.aspx?scid=KB;en-us;q243543

Tracking Each Page View

Our next goal is to track every page that our visitors hit. This can be done easily by utilizing one of the Application event handlers in the Global.asax.

Application_PreRequestHandlerExecute

There are two events that are called on the application context before a page request begins: Application_PreRequestHandlerExecute and Application_BeginRequest. Unfortunately, the session context is not available when the Application_BeginRequest is called (I suspect this is a minor bug). For this reason we use the PreRequestHandlerExecute event. In the event, we extract the tracker from the session and pass the current URL through the addPage method.


Remember, only requests for ASPX files will initiate this event.

AddPage method

In the SessionTracker's addPage method, we receive the name of the page, then create an instance of the small SessionTrackerPage class, which has two public members: PageName and Time.

Of course, we could have accessed the current request URL string from the HTTPContext in the class; however, this approach means you can manipulate the URL as you see fit. For instance, you can crop it down to just the file name, or replace the file name by using a Map collection to reference a common name for the page.

Sending Notifications

Now that we have this data neatly collected, it would be nice if we had some way to see it.

Sending e-mails to notify webmaster of activity

E-mail notifications can give you a real-time sense of the activity patterns on the site. They can also make you a little nervous when you haven't received an e-mail in a while.

The MailUtil class

To simplify the e-mail notification, I've encapsulated the functionality into a MailUtil class. Since the object does not require any state, the class has two public shared methods: SendSessionStartAlert and SendSessionEndAlert. These functions should be called from the Session_OnStart and the Session_OnEnd events.


Using configuration settings

In order to keep the MailUtil class very flexible, I used the appSettings section of the web.config file to set a number of parameters for the e-mail process. This allows you to quickly change the behavior of the notifications without having to change code.


Be sure to change the e-mail address once you set up the sample application.

SendSessionStartAlert method

The SessionTracker object is passed to the SendSessionStartAlert method. From there we assemble the mail message.


Note that I use HTML for the BodyFormat. The body of the e-mail is assembled as HTML by the createTrackerMessageBody method. See the MailUtil.vb file in the downloadable source code.
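The assembly step looks roughly like this Python analog, using the standard library's email module in place of System.Web.Mail's MailMessage. The addresses, subject wording, and dict keys are placeholders, not the article's actual code.

```python
from email.message import EmailMessage

def create_session_start_alert(tracker_info, to_addr="webmaster@example.com"):
    """Assemble a session-start notice from tracker data (a sketch of
    SendSessionStartAlert; sending is a separate smtplib.SMTP step,
    the counterpart of SmtpMail.SmtpServer in System.Web.Mail)."""
    msg = EmailMessage()
    msg["Subject"] = "Session started: %s" % tracker_info["ip"]
    msg["From"] = "tracker@example.com"
    msg["To"] = to_addr
    msg.set_content(
        "IP: %(ip)s\nUser agent: %(user_agent)s\nReferrer: %(referrer)s"
        % tracker_info)
    return msg
```

Keeping message assembly separate from sending, as here, makes the formatting easy to test without a mail server.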

One of the interesting pieces of information provided in the e-mail is the pair of WHOIS and NSLOOKUP links in the output. You might say this is a poor man's way of researching visitors by automating links to the following public Web services:

http://arin.net/
http://ripe.net/ripencc/pub-services/db/whois/whois.html

ARIN - American Registry for Internet Numbers; manages the Internet numbering resources for North and South America, the Caribbean, and sub-Saharan Africa.
RIPE - Réseaux IP Européens; a large WHOIS database for Europe and Asia.
NSLookup (Reverse DNS) - http://www.zoneedit.com/lookup.html

SendSessionEndAlert method

The Session end e-mail is very similar to the Session start e-mail, except at the session end we can provide a listing of the pages visited. Fig 3 is an example of a session end notification.


Fig 3 - Session Email Notification

System.Web.Mail

The core of our e-mail functionality is from the .Net Framework's System.Web.Mail namespace. We use both the MailMessage object and the SmtpMail class, whose static SmtpServer property identifies the mail server.

On some machines, you may have a problem getting the e-mail to send properly. If this is the case, make sure that the SMTP service is running. You can find the SMTP configuration in the Internet Service Manager. If it is running, then you need to make sure that the SMTP Server is configured properly to allow e-mail to be sent from the local machine. Under the "Access" tab, select Relay Restrictions. Then either add settings to allow localhost or 127.0.0.1 or check the "Allow all computers which successfully authenticate to relay, regardless of the list above" option.


Fig4 - SMTP configuration

If this is not possible, you can always specify an SMTP server on another machine, if one is available.

Creating Your Own Log File

If all this e-mail stuff makes you nervous, you can always create your own log file. There are just a few considerations you should keep in mind if you decide to do so.

  • Make sure your app settings allow you to write to the file system.
  • Make sure you serialize writing to the file. In a multi-user environment, each thread must wait its turn before writing. You can accomplish this by calling Application.Lock directly before the write and Application.Unlock directly after the write.
  • Although the IIS logging function is highly optimized to have little effect on the performance of the site, you still may want to turn it off. There is no need to use resources to write to two files.
  • Optimize your log file access by storing the file stream object in the application context. This way the file can remain open for faster writes.
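The serialization point above can be sketched as follows; in this Python illustration a threading.Lock stands in for the Application.Lock/Unlock calls the article describes.

```python
import threading

_log_lock = threading.Lock()   # stands in for Application.Lock/Unlock

def append_log_line(path, line):
    """Serialize writes so concurrent sessions cannot interleave records."""
    with _log_lock:
        with open(path, "a") as f:
            f.write(line + "\n")
```

Without the lock, two threads appending at once can interleave partial lines; with it, every record lands intact.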

Putting the data in a high-end database may be a better idea since it will remove some of the contention issues and allow you to create dynamic queries on the data. But whether you use a file or a database, remember that most hosting services have a limit on the amount of disk space or database space you can use. Exceeding your limits can compromise your site.

Excluding Items from Notifications and Logging

Once everything is set up, you can decide on what data to filter where. For instance, you may want to create some filter logic to only send e-mails when the visitor is a particular robot you are expecting. Everything that is not a robot can be put in the custom log file. You can also set up exclusion filters. For instance, you may want to exclude logging for traffic that comes from yourself. Or, if you use a site-monitoring tool to ensure your site's operation, you can eliminate that as well.
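A filter like that reduces to a small predicate. In this Python sketch the excluded IP and the monitor's user-agent substring are placeholders - substitute your own workstation address and monitoring tool.

```python
# Placeholder exclusions -- replace with your own address and monitor UA.
EXCLUDED_IPS = {"192.168.0.5"}                       # e.g. your workstation
EXCLUDED_AGENT_SUBSTRINGS = ("SiteUptimeMonitor",)   # hypothetical monitor

def should_track(ip, user_agent):
    """Return False for traffic that should be kept out of logs/alerts."""
    if ip in EXCLUDED_IPS:
        return False
    return not any(s in (user_agent or "") for s in EXCLUDED_AGENT_SUBSTRINGS)
```

Call it once at the top of the tracking path so notifications and the custom log share one exclusion policy.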

Changing Your Session Timeout for Testing

The sessionState element of the web.config file provides easy access to the session timeout without having to write code.
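A minimal fragment, assuming ASP.NET 1.x defaults; only the timeout attribute (in minutes) matters for this tip, and the other attributes shown are the defaults.

```xml
<configuration>
  <system.web>
    <!-- timeout is in minutes; set to 1 while testing session tracking -->
    <sessionState mode="InProc" cookieless="false" timeout="1" />
  </system.web>
</configuration>
```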


I recommend setting the timeout to 1 minute while testing the implementation of your tracking system. Be sure to change it back to your designated default before sending your site to its production host.

Establishing Privacy Policies

When you start collecting data about visitors it is important to protect your interests by disclosing to your visitors what data you are collecting and what you plan to do with the data. This is called a privacy policy. You simply need to create a page with your statement and link to it from your homepage or any forms that collect data on the site. Here is a link to JupiterMedia's (parent of 15seconds.com) privacy policy:

http://www.internet.com/corporate/privacy/privacypolicy.html

Privacy policies are especially important if your site is geared towards children. In this case, it falls under the jurisdiction of the Children's Online Privacy Protection Act, which has very strict guidelines about what you can and can't do with information you collect on your site. Here is an excellent article that gives an overview of the subject:

http://html.about.com/library/weekly/aa043001a.htm

Controlling privacy in Internet Explorer 6

Browsers are now stepping in and providing tools to alert you to privacy issues that you might not normally be aware of. Here is an overview of the many privacy features in Internet Explorer 6.

http://www.microsoft.com/windows/ie/evaluation/overview/privacy.asp

Platform for Privacy Preferences

The Platform for Privacy Preferences or P3P has established a standard for creating both natural language and XML based privacy documents. The following article from MSDN describes the process for deploying a privacy policy on your site.

http://msdn.microsoft.com/library/default.asp?url=/workshop/security/privacy/overview/createprivacypolicy.asp

P3P files

There are also a number of free P3P editors; for example:

http://www.alphaworks.ibm.com/tech/p3peditor

I have created both a privacy.htm and a privacy.xml file using the alphaWorks editor and have included them in the sample Web app. Once you have created the files, just link the HTML page to your home page and target the XML file in the head of each of your content pages as follows:
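The head-of-page linkage might look like the following sketch; the file name matches the privacy.xml created above, and the rel value is the one defined by the P3P specification.

```html
<head>
  <title>Home</title>
  <!-- points P3P-aware browsers (e.g. IE6) at the policy reference file -->
  <link rel="P3Pv1" href="privacy.xml" />
</head>
```

The P3P spec also recognizes a well-known location, /w3c/p3p.xml, which browsers check when no link element is present.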


Deploying the Sample Project

There are just a few things you need to do.

  • Create a new virtual directory in the Internet Services Manager
  • Make sure the application has been created under the new Virtual Directory's properties dialog.
  • Expand the zip file into the directory
  • Open the solution file in Visual Studio .Net to build and run the app.

Conclusion

The more you know about your potential customers the better able you will be to convert them into paying customers. Now there is no excuse not to know who is coming to your site.

About the Author

Wayne Plourde is a consulting Software Architect who began his career as a building architect twenty years ago. In 1995, he succumbed to the call of the World Wide Web, and since then has been designing sophisticated Web-based and client-server applications for corporations around the country. Wayne holds both MCSD and SCJP certifications and is one step away from a .NET MCAD certification. You can contact Wayne at wayne@plourdenet.com or visit his Web site at http://www.plourdenet.com.

Reposted at: https://www.cnblogs.com/jerryhong/articles/1093193.html
