Most of the time, you are not calculating Pi to the nth decimal or training a neural network. You are hitting a database with queries and updates, and you are hitting it hard. Correct use of the data access technologies, plus knowledge of your backend database, is crucial to maintaining high performance and meeting sustained scalability requirements. Here are some items you might want to take into account in your business layers.
Use the Latest Version of the Microsoft Data Access Components
Windows 2000 ships with Microsoft Data Access Components (MDAC) 2.5. This MDAC version includes improved functionality and other features, such as XML integration without requiring you to create a file, URL resource support, and hierarchical namespace navigation.
MDAC 2.5 is also available for Windows NT 4.0 and Windows 9x computers. See the link below for more information and downloads.
MDAC includes both the Microsoft ODBC and OLE DB components. Use the latest versions of these components as shipped in the MDAC package. Do not mix and match ODBC and OLE DB components to create the "flavor of the week." The MDAC solution is tested and deployed as a discrete package of DLLs. The key word here is tested. Other configurations have a high probability of working only sporadically.
Get the latest MDAC version and information from www.microsoft.com/data/.
Close and Set Your ADO Objects to Nothing As Soon As Possible
When you use ADO objects, be sure to explicitly call the Close method and set the recordsets and connections you used to Nothing as soon as you are done with them. One of the most common coding errors in ADO is forgetting to close the object once you're done working with it. While explicitly closing the object is not mandatory, doing so can be the difference between a working and a failing application.
Setting ADO objects to Nothing clears the error information stored in Err. This can cause problems if you clean up your ADO objects in an error handler. Therefore, if you expect this information to be preserved, store the Number, Source, and Description values in variables before cleaning up the ADO objects.
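If you need the error details to survive the cleanup, a minimal sketch of an error handler looks like this (the object and variable names are illustrative):

```vb
' Inside an error handler: preserve the error details first, because
' cleaning up ADO objects can reset the Err object.
Dim lNumber As Long, sSource As String, sDescription As String
lNumber = Err.Number
sSource = Err.Source
sDescription = Err.Description

' Now it is safe to clean up (oMyRecordset is an illustrative name)
If Not oMyRecordset Is Nothing Then
    If oMyRecordset.State <> 0 Then oMyRecordset.Close
    Set oMyRecordset = Nothing
End If

' Re-raise with the preserved information
Err.Raise lNumber, sSource, sDescription
```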
Note If you are going to return a disconnected recordset, do not close it. Closing the recordset destroys its internal cache, so all you need to do is set its ActiveConnection to Nothing.
To confirm that your Connection or Recordset objects are closed correctly, you can implement the following code:
If Not oMyRecordset Is Nothing Then
    If oMyRecordset.State <> 0 Then oMyRecordset.Close
    Set oMyRecordset = Nothing
End If
As mentioned before, you shouldn't call the Close method on a recordset that you will return to the client.
Avoid Reusing ADO Connection Objects to Execute Many Commands
Avoid reusing open connection objects over multiple database commands. That is, open, use, and close a connection whenever needed.
If you have wrapped ADO code in a data-access object, this will not be a problem, since these wrappers perform in a stateless fashion. However, if you have ADO code in your objects, and especially if you have ADO connections stored at class level, please review the Knowledge Base article below. The main problem lies in keeping open database cursors while other things are being executed. If you are retrieving recordsets to send to the client, disconnect them from the RDBMS.
Implementing a way to reuse ADO Connection objects greatly increases the maintenance effort and sometimes leads to design mistakes that could be avoided by good encapsulation. The ADO-OLE DB layer implements its own internal connection pooling. This connection pooling makes the following flow the preferred way to access data on the server:
- Create ADO connection
- Open connection
- Use it
- Close connection
- Set ADO connection object to Nothing
This will be easy to develop, easy to maintain, as well as fast and scalable.
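The flow above can be sketched as follows (the connection string and query are placeholders):

```vb
Dim oConn As ADODB.Connection
Dim oRs As ADODB.Recordset

' Create and open the connection
Set oConn = New ADODB.Connection
oConn.Open "Provider=SQLOLEDB;Data Source=MyServer;" & _
           "Initial Catalog=MyDatabase;User ID=MyUser;Password=MyPassword"

' Use it: a client-side, batch-optimistic recordset can be disconnected
Set oRs = New ADODB.Recordset
oRs.CursorLocation = adUseClient
oRs.Open "SELECT * FROM authors", oConn, adOpenStatic, adLockBatchOptimistic
Set oRs.ActiveConnection = Nothing    ' disconnect before returning it

' Close the connection and release the object; OLE DB connection pooling
' makes the next Open on the same string cheap
oConn.Close
Set oConn = Nothing
```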
Use OLE DB Providers Instead of ODBC Drivers
When using OLE DB through ADO, you can choose to use a native OLE DB provider, or choose a special OLE DB provider (MSDASQL—the "OLE DB Provider for ODBC Drivers") that will transform and then forward all calls to an ODBC driver.
Use an OLE DB provider that speaks natively to the database whenever one is available. You will get better performance characteristics, more functionality, forward compatibility, and the benefit of a vibrant third-party OLE DB solutions market.
You can tell if you are using ODBC if any of the below is true:
- You are configuring a System, User, or File DSN in the ODBC snap-in for an application to use.
- You have "MSDASQL" or "DSN" in your connection string.
- You do not indicate a provider in your connection string.
If you are using ODBC, see if a native provider accesses the same database with the same functionality support. Key areas to look for are Connection Pooling, Distributed Transaction support, and so on.
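For comparison, here are the two styles of connection string (server, database, and credential values are placeholders); the first goes through the ODBC layer, the second talks to SQL Server through its native OLE DB provider:

```vb
Dim sOdbcStyle As String
Dim sNativeStyle As String

' Through ODBC: naming a DSN means MSDASQL is used implicitly
sOdbcStyle = "DSN=MyDsn;UID=MyUser;PWD=MyPassword"

' Native OLE DB provider for SQL Server
sNativeStyle = "Provider=SQLOLEDB;Data Source=MyServer;" & _
               "Initial Catalog=MyDatabase;User ID=MyUser;Password=MyPassword"
```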
Use Universal Data Link (UDL) Files to Persist Connection Strings
You can use a UDL file to store database connection information, much in the same way you use ODBC Data Source Names (DSNs). A UDL file stores OLE DB connection information, such as Provider, username, password, and other options. You can open your ADO connections with the information stored in this UDL file, thus letting your administrator change it as required and avoiding the need to open the registry or to use ODBC.
Make sure your package identity has read access to the file and that the right Administrators have read-write access to it (to actually edit it). In the case of a Web site, make sure your DLL and your UDL files are not in a virtual directory, which would make them directly accessible and downloadable through the Web.
To create a UDL file, create an empty file and name it with a UDL extension. You enter all the information by double-clicking the file and entering the information in the property sheets for it, as shown below.
To open an ADO connection with the UDL, use the following syntax:
oConnection.Open "File Name=C:\SecureStuff\MyDataLink.UDL"
You may instruct an administrator to place the DLL and UDL in the same secure directory, in which case you do not need to hardcode the full UDL file path (you can build it from App.Path at run time). As seen in Figure 6, you will need to set Allow saving password to store the full connection string. You will get a warning indicating that you are storing your password in cleartext in a file. If you set the right NTFS permissions on the file, this will not be a problem.
Do not place a UDL file in a file share so that other servers can open it. If you have many servers, keep a copy of the UDL file on each one.
Figure 6. Setting Allow saving password
You might be concerned about performance; after all, this is hitting the disk, right? Actually, when you use this technique the file system caches the file in memory, and stress tests show that the performance degradation is negligible, if measurable at all, compared to accessing System or File DSNs.
Use Stored Procedures Instead of Dynamic SQL Strings
Using stored procedures has many benefits including:
- Adds an abstraction level over the database schema by isolating the conceptual data command from the command's implementation. Stored procedures also make the application code less coupled, allowing for easier testing.
- Provides for better code distribution. Packing up parameters and consuming result sets involves less client-side logic than building, sending, and parsing a SQL string in addition to doing all of the above.
- Is more efficient at run time, since stored procedure execution plans are precompiled. Dynamic SQL strings must be parameterized for their query plans to be cached optimally.
- Allows for more runtime flexibility. SQL strings compiled in your DLLs are harder to change once deployed than a stored procedure.
- Allows for better-engineered code, since stored procedure parameters have declared directions and types.
- Allows for better security configuration. Having stored procedures restricted to different application/package identities, or to different connection strings in different server applications, is usually easier to maintain.
- Provides an easier growth path: if the data access strategy ever needs to change, consistent use of stored procedures makes the transition easier.
Adding a stored procedure is easy: look up the CREATE PROCEDURE statement in T-SQL and follow the guidelines. If you choose to use stored procedures in the implementation of your data access objects, be consistent for maintenance reasons: use them either everywhere or not at all. This will aid troubleshooting and later feature development, without anyone having to second-guess a mix of dynamic SQL and stored procedures.
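As a sketch, a dynamic SQL string can be replaced by a stored procedure call like this (the procedure, table, and parameter names are hypothetical, and oConn is assumed to be an open ADODB.Connection):

```vb
' One-time setup, in T-SQL, sent through the connection:
oConn.Execute "CREATE PROCEDURE GetAuthorsByState @State char(2) AS " & _
              "SELECT au_id, au_lname FROM authors WHERE state = @State"

' Calling the procedure instead of building a dynamic SQL string:
Dim oCmd As ADODB.Command
Dim oRs As ADODB.Recordset
Set oCmd = New ADODB.Command
Set oCmd.ActiveConnection = oConn
oCmd.CommandType = adCmdStoredProc
oCmd.CommandText = "GetAuthorsByState"
oCmd.Parameters.Append oCmd.CreateParameter("@State", adChar, adParamInput, 2, "CA")
Set oRs = oCmd.Execute
```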
Use OUTPUT Parameters on Stored Procedures for Single-Row Return Values
When you return a single item of data, or a single row, from a database lookup, do so using OUTPUT parameters rather than a result set. OUTPUT parameters bypass any cursor/result set building on the server and allow for more efficient data transmission between the components and the database.
However, depending on your architecture, you may not wish to have different code paths for single-row and multi-row queries. You must also take into consideration how dynamic your database schema is. With more strongly typed stored procedures you are coupling them more to your database. Flexibility, as usual, challenges performance.
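A sketch of retrieving a single value through an OUTPUT parameter (the procedure and its parameters are hypothetical, and oConn is assumed to be an open ADODB.Connection):

```vb
' Hypothetical procedure:
' CREATE PROCEDURE GetAuthorName @AuthorID int, @Name varchar(40) OUTPUT ...
Dim oCmd As ADODB.Command
Dim sName As String

Set oCmd = New ADODB.Command
Set oCmd.ActiveConnection = oConn
oCmd.CommandType = adCmdStoredProc
oCmd.CommandText = "GetAuthorName"
oCmd.Parameters.Append oCmd.CreateParameter("@AuthorID", adInteger, adParamInput, , 42)
oCmd.Parameters.Append oCmd.CreateParameter("@Name", adVarChar, adParamOutput, 40)

oCmd.Execute , , adExecuteNoRecords   ' no result set is built on the server
sName = oCmd.Parameters("@Name").Value
```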
Do Not Pass ADODB.Connection Objects Across Apartments
There are several considerations regarding cross-apartment/process marshalling of database connections; the recommendation is not to do it. You will see unexpected behaviors imposed by the limitations of marshalling and of the database drivers.
If you are doing this, it is probable that you are attempting to manage a pool of connections and have implemented your own connection dispenser. In most cases, you are working too hard to provide pooling functionality yourself. Usually, even the most stringent resource managers allow for different connections on different activities that happen on the same apartment. If your ODBC driver or OLE DB provider allows for pooling (which it usually does), close connections and open them as required, and let the OLE DB/ODBC layer provide the infrastructure work.
If the reason you are using cross-apartment/process marshalling is to provide for the proper connection string, then store the string in SPM as discussed earlier and recreate the connection in each method call as we have suggested.
Remember the ACID Rules
Remember the rules of transactional operations, usually described as the ACID properties: Atomicity, Consistency, Isolation, and Durability. In application design, one might forget that the Atomicity and Consistency properties of transactions require all operations to be "serializable." Serialization imposes locking mechanisms on all the appropriate resources. If you would like a transaction to have a larger span than the locks it requires, then you probably want more than one transaction.
Entire books have been written on this subject. Just keep in mind that these concepts are tightly coupled with each other: Transactions, ACID properties, and Locks.
Principles of Transaction Processing by Philip A. Bernstein and Eric Newcomer (ISBN: 1558604154)
Transaction Processing: Concepts and Techniques by Jim Gray and Andreas Reuter (ISBN: 1558601902)
An Introduction to Database Systems by C. J. Date (ISBN: 020154329X)
These books are a solid foundation for understanding databases and transactions, but they are definitely oriented toward a more academic audience and are harder to follow.
Use SetComplete and SetAbort to Vote on Transactions
When you call either of these methods, you are placing your vote on the current transaction, and telling COM+/MTS that you are ready to be deactivated (destroyed or set to Nothing in VB talk). All methods should place their vote on the current transaction. This allows for a better encapsulation of logic and therefore increases any opportunities for reusability or integration.
Voting is a good habit while developing your code: it forces you to consider the error cases and to think about resource management. Once any participating method call has voted to abort the transaction, the transaction is doomed, and when the method exits, the object becomes available for reuse. There is no reason to continue processing the transaction after calling SetAbort.
Keep in mind this will deactivate your objects after the method call is completed and therefore will require a stateless implementation. See above for more details on the differences between stateful/statelessness.
Just before returning from your method, call GetObjectContext.SetComplete. In your error handler, call GetObjectContext.SetAbort before calling Err.Raise. This will ensure that you are placing the appropriate vote regardless of the caller. Of course, more complex voting schemes can be implemented, but the one just described is a good start.
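The voting pattern just described looks like this in a VB method (the method name, signature, and body are illustrative):

```vb
Public Sub UpdateBalance(ByVal lAccount As Long, ByVal cAmount As Currency)
    On Error GoTo ErrHandler

    ' ... open connection, execute the database work, close connection ...

    GetObjectContext.SetComplete    ' happy path: vote to commit and deactivate
    Exit Sub

ErrHandler:
    GetObjectContext.SetAbort       ' doom the transaction before re-raising
    Err.Raise Err.Number, Err.Source, Err.Description
End Sub
```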
Understand the Implications of Using Command.Parameters.Refresh
The ADO Command object can be used to execute parameterized stored procedures. There are two ways to let the Command object know about the parameters and their data types, direction, and other information.
The first method, Parameters.Refresh, requires a round trip to the database engine to retrieve this information from the stored procedure's declaration. Note that this method requires the Command's ActiveConnection to be set to an open Connection object or a valid connection string.
The second method is specifying the parameter information yourself. In this case, you have to know the exact names, types, and directions of the parameters being added and call Parameters.Append repeatedly to populate the collection.
When invoking the Parameters.Refresh method, you are actually executing a query against the database server for the parameter information. Take into account network traffic, run-time adaptability of your component, and proximity of the database (measured in communication cost with the DB), along with the benefits of having accurate parameter information before deciding to use the .Refresh method or not.
This decision is a compromise between run-time flexibility and performance. Adding hard-coded values yourself will always be faster, and any other method (retrieving these values from the SPM, for example) will probably be slower than issuing the .Refresh method. However, if you wrap your MDAC code in a data-access helper class, you will find that the .Refresh method allows for added run-time flexibility, resulting in less frequent recompilation and redeployment of components.
Just knowing that this facility is available to you should be an important factor in your implementation. The .Refresh method can help you achieve a good degree of encapsulation without being too tightly coupled to the backend.
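The two approaches, side by side (the stored procedure and parameter names are hypothetical, and oConn is assumed to be an open ADODB.Connection):

```vb
Dim oCmd As ADODB.Command
Set oCmd = New ADODB.Command
Set oCmd.ActiveConnection = oConn
oCmd.CommandType = adCmdStoredProc
oCmd.CommandText = "GetAuthorsByState"

' Option 1: one extra round trip to the server retrieves the
' parameter names, types, and directions automatically.
oCmd.Parameters.Refresh

' Option 2: specify the parameter information yourself -- faster,
' but coupled to the procedure's exact signature.
' oCmd.Parameters.Append oCmd.CreateParameter("@State", adChar, adParamInput, 2)
```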
Don't Use Data Environments on Server-side Components
Visual Basic 6.0 Data Environments greatly assist using ADO to perform operations on databases. However, for components intended to run under MTS or COM+, use straight ADO code, which is more stable under concurrent environments. If you have wrapped ADO in a helper class or function, development is even faster and easier than with Data Environments.
The Data Environment wraps ADO with its own policies regarding connection and object reuse. Some of the implementations the Data Environment uses were driven by client-side technology decisions. Some of these design decisions were implemented in such a way as to not be fully reliable when used as a server-side facility. If you have used the Data Environment in your distributed solutions on the server, you can expect some instability at some point in your application life cycle.
Rather than relying on the abstraction level provided by Data Environments, build a wrapper for your data access code that allows you direct manipulation of the data access components without any intermediate handlers. Pure ADO code is MTS friendly and meets the base requirements for functioning well in a multi-threaded environment.
Understand the Impact of Changing a Transaction's Isolation Level
All SQL Server transactions initiated from transactional COM+ objects are promoted to the Serializable isolation level (SQL Server's default on a nontransactional connection is Read Committed). Serializable is the highest isolation level, and it guarantees consistency in all scenarios.
However, this means that you will see more locks being held when your database operations are executing from transactional COM+ objects. These locks are good and important. They are in place to keep your data operations consistent. However, if consistency falls behind performance in your goals, you may evaluate tuning the isolation level. To put it another way, only resort to changing isolation levels when the cost of inconsistency is lower than the cost of low performance.
There are ways to change isolation levels as a performance optimization, but you must understand the implications of doing this, which may include incorrect data being sent to the client, unexpected calculation results, and more.
To determine whether a piece of data is eligible for a less restrictive isolation level, you must assess the full impact of exposing the application logic to inconsistent results. For some applications and usage scenarios, a certain level of error or miscalculation is an acceptable trade-off against accurate, but late, data.
Different isolation levels protect you against different hazards that may occur when successive read or write operations take place on the same piece of data from two transactions. For example:
- Write-After-Read (protected when using Serializable level)
- Read-After-Write (protected under Cursor Stability and Serializable levels)
- Write-After-Write (protected under Browse, Cursor Stability, and Serializable levels)
As a rule of thumb, try assessing the precision required by each command, as opposed to deciding it application-wide. If precision is not an issue, then your database access methods could be eligible for a lower isolation level and better scalability. However, you should document your decision well and explain this in your metrics.
You can change the isolation level on a command in two ways.
- Set the isolation level for the transaction. All commands executed against the connection will share it. On SQL Server, run the following line after opening a database connection:
  oConn.Execute "SET TRANSACTION ISOLATION LEVEL x"

  Where "x" stands for the desired isolation level.
- Specify a hint in the actual statement or stored procedure. See SQL Server Books Online for the WITH keyword and the valid isolation level values.
Other RDBMSs and resource managers have different implementations of isolation across transactions, connections, and commands.
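Both techniques can be issued from ADO (the isolation level and table name are illustrative, and oConn is assumed to be an open ADODB.Connection):

```vb
Dim oRs As ADODB.Recordset

' Connection-wide: applies to subsequent commands on this connection
oConn.Execute "SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED"

' Statement-level: a table hint overrides the isolation level for one query
Set oRs = oConn.Execute("SELECT COUNT(*) FROM Orders WITH (NOLOCK)")
```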
The links under the Remember the ACID Rules section will be useful as well.