优化JDBC性能的三大技巧

最新推荐文章于 2024-08-07 07:15:00 发布

zhaofupobaoyang

最新推荐文章于 2024-08-07 07:15:00 发布

阅读量493

点赞数

分类专栏：笔记文章标签： jdbc 优化 transactions performance application database

笔记专栏收录该内容

13 篇文章 0 订阅

订阅专栏

介绍

开发一个注重性能的JDBC应用程序不是一件容易的事. 当你的代码运行很慢的时候
JDBC驱动程序并不会抛出异常告诉你.

本系列的性能提示将为改善JDBC应用程序的性能介绍一些基本的指导原则，这其
中的原则已经被许多现有的JDBC应用程序编译运行并验证过。这些指导原则包括:

正确的使用数据库MetaData方法
只获取需要的数据
选用最佳性能的功能
管理连接和更新

以下这些一般性原则可以帮助你解决一些公共的JDBC系统的性能问题.

管理连接和更新

本节的指导方针将帮助你管理连接与更新，以改善你的JDBC程序的性能。

管理连接

对于程序性能来说，连接管理很重要. 通过一次连接并执行多次Statement可以
优化你的程序，而不要使用多次连接来执行。避免在建立一个初始连接后连
到一个数据源

虽然在连接期间获取JDBC驱动的信息是一个好的习惯, 但一次性取得这些信
息比两次或多次取得会更有效. 例如, 有些应用程序建立一个连接，然后调用
另一个独立模块中的方法去取得JDBC驱动信息. 被设计为独立部分的程序应
该传入一个已创建的连接给它而不应该再自己建立一个新的连接.

另一个坏习惯就是你的程序中执行SQL语句时数次的打开和关闭连接. Connection
对象应该和多个Statement对象关联. 一个Statement对象,定义了SQL语句在内存中
的信息, 能管理多个SQL语句.
你可以借助连接池技术显著的改善性能, 对于那些跨网络和互联网的连接特别
需要这种技术. 连接池技术可以使连接可以重用. 它关闭连接时并没有关闭与数
据库的物理连接. 当程序请求一个连接时, 会自动使用一个活动的连接, 从而避
免建立新连接的网络I/O操作.

连接池技术的另一方面, JDBC 3.0也对Statement池作了规定. 跟连接池技术类似,
Statement池缓存了PreparedStatement对象，可以在没有程序干预的情况下重用缓
存的PreparedStatement对象. 例如, 程序可以会像下面的语句一样建立一个
PreparedStatement对象:

select name, address, dept, salary from personnel
where empid = ? or name like ? or address = ?"

当PreparedStatement对象被创建时, SQL查询会解析并校验语义，并生成一个
optimization plan. 在一些数据库系统中，preparedStatement的创建过程是极其影
响性能的，例如DB2. 一但prepared statement一关闭，JDBC 3.0兼容的驱动程
序会将这个prepared statement放入本地缓存而不是丢弃它. 如果稍后程序试图
使用同一SQL查询建立prepared statement, 这种情况在许多应用程序中都可能
发生, 这时驱动程序能简单从本地缓存获取相关的statement而不是透过网络连
到服务器让数据库创建.

Connection和Statement在执行之前应该作一些相应处理. 花些时间想一想Connection
的管理，以改善程序的性能和可能维护性.

管理事务的提交

由于磁盘I/O和隐含的网络I/O操作，提交事务是很慢的过程. 应该使用
WSConnection.setAutoCommit(false)的方式将Autocommit关闭.

一次提交动作究竟做了什么? 数据库服务器必须将每个数据页中包含了被
更新和新增的数据保存到磁盘. 这是一连续的写文件的操作, 说是这么说, 它却是一个磁盘I/O操作. 缺省情况下, 当连接到数据源时，Autocommit是打开的, Autocommit模式通常会削弱性能，因为每次操作会自动提交而引起大量磁盘I/O操作.

此外, 许多数据库并没有提供Autocommit模式. 对于此种类型的数据库, JDBC驱
动必须明白,对于每次操作，都要使用一个COMMIT语句和一个
BEGIN TRANSACTION句来提交.

虽然使用事务有助于提升性能, 但不要太过分依赖这个技巧. 不用事务可以减少
因防止其它用户存取某行而长时间死锁在该行上面.

选择合适的Transaction模型

许多系统支持分布式的事务处理; 这意味着, 事务跨越了多个连接. 分布式事务处
理至少比正常的慢四次，因为所有模块之间的调用引起的通讯会引发日志和网
络I/O操作(JDBC驱动, 事务监控以及数据库系统). 如非必要，应避免使用分布式
事务处理, 而尽可能的使用本地的事务处理方式. 应该注意到一些典型的Java应用
服务器已提供了一个缺省的分布式事务处理机制.

要得到最好的系统性能, 设计出来的应用程序应只使用单一的数据库连接.

使用updateXXX方法

虽然updateXXX方法没有适用于所有的应用类型, 开发者也应该尝试去使用updateXXX
和delete方法. 使用ResultSet对象的updateXXX可以让开发者在不用建立复杂的SQL语
句的情况下去更新数据. 开发者只需要简单的提供字段名就行了. 之后，如果要移动
光标位置, 要先调用updateRow()方法应用更新.

在下面的代码片断中, 使用了getInt()方法来接收ResultSet对象中的Age字段的值, 并且
使用updateInt()方法将年龄值更新为25. 最后调用updateRow()方法将更新应用到数据库.

int n = rs.getInt("Age"); // resultset中可能包含n个Age的值
 ... 
rs.updateInt("Age", 25); rs.updateRow();

除了使应用程序更容易维护之外, 使用updateXXX也可以改善程序的性能. 因为通
过Select语句之后数据库服务器已经定位在那一行上, 而不用再次定位要更新的数据
而消耗性能, 服务器通常有一个内部指针指向被用到的行(例如， ROWID).

使用getBestRowIdentifier()

使用getBestRowIdentifier()可以为要更新的数据确定Where子句中的最佳的字段标识符
. 通过隐含字段常常能最快的访问到字段, 这些字段标识符只可以通过getBestRowIdentifier()得到.

有些程序无法被设计成使用updateXXX方法. 它们可能会借助getPrimaryKeys()和调
用getIndexInfo()找出唯一性索引字段，按一定的规则生成Where子句. 这些方法会导
致相当复杂的查询.

看一下下面的例子:

ResultSet WSrs = WSs.executeQuery (
"SELECT first_name, last_name, ssn, 
address, city, state, zip FROM emp"); 
// 获取数据
 ... 
WSs.executeUpdate (
"UPDATE EMP SET ADDRESS = ? 
WHERE first_name = ? and last_name = ? 
and ssn = ? and address = ? and city = ? 
and state = ? and zip = ?"); 
// 相当复杂的查询

应用程序应该调用getBestRowIdentifier()取得最佳的字段集合(也许是一个隐含
字段) 它们标明了特定的记录. 许多数据库支持特殊字段，它们并没有显式的在
表中被用户所定义，是表的隐藏字段(例如, ROWID和TID). 这些隐含字段通常
提供能最快的存取数据, 因为它们指向了记录的精确位置. 因为隐含字段不是表
结构定义的一部分, 它们不可以从getColumns得到. 要确定隐含字段是否存在, 请调
用getBestRowIdentifier().

再来看一下前一个例子:

... ResultSet WSrowid = 
getBestRowIdentifier() 
(... "emp", ...);
 ...
 WSs.executeUpdate (
"UPDATE EMP SET ADDRESS = ? WHERE ROWID = ?"; 
// 最快的存取数据!

如果你的数据源并没有包含隐含字段, 调用getBestRowIdentifier()会得到它的
唯一性索引字段(如果这个索引存在的话). 所以, 你的程序不必通过getIndexInfo
来查找唯一性索引.

Introduction

Developing performance-oriented JDBC applications is not easy. JDBC drivers do not throw exceptions to tell you when your code is running too slow.
This series of articles presents some general guidelines for improving JDBC application performance that have been compiled by examining the JDBC implementations of numerous shipping JDBC applications. These guidelines include:

Using Database MetaData Methods Appropriately
Retrieving only Required Data
Selecting Functions that Optimize Performance
Managing Connections and Updates
Designing and Running Benchmarks for Performance

In the last article, we discussed selecting functions that optimize performance. These general rules about managing connections and updates should help you solve some additional common JDBC system performance problems.

Managing Connections and Updates

The guidelines in this section will help you to manage connections and updates to improve system performance for your JDBC applications.

Managing Connections

Connection management is important to application performance. Optimize your application by connecting once and using multiple statement objects, instead of performing multiple connections. Avoid connecting to a data source after establishing an initial connection.

Although gathering driver information at connect time is a good practice, it is often more efficient to gather it in one step rather than two steps. For example, some applications establish a connection and then call a method in a separate component that reattaches and gathers information about the driver. Applications that are designed as separate entities should pass the established connection object to the data collection routine instead of establishing a second connection.

Another bad practice is to connect and disconnect several times throughout your application to perform SQL statements. Connection objects can have multiple statement objects associated with them. Statement objects, which are defined to be memory storage for information about SQL statements, can manage multiple SQL statements.
You can improve performance significantly with connection pooling, especially for applications that connect over a network or through the World Wide Web. Connection pooling lets you reuse connections. Closing connections does not close the physical connection to the database. When an application requests a connection, an active connection is reused, thus avoiding the network I/O needed to create a new connection.

In addition to connection pooling tuning options, JDBC 3.0 also specifies semantics for providing a statement pool. Similar to connection pooling, a statement pool caches PreparedStatement objects so that they can be re-used from a cache without application intervention. For example, an application may create a PreparedStatement object similar to the following SQL statement:

select name, address, dept, salary from personnel
where empid = ? or name like ? or address = ?"

When the PreparedStatement object is created, the SQL query is parsed for semantic validation and a query optimization plan is produced. The process of creating a prepared statement is extremely expensive in terms of performance with some database systems such as DB2. Once the prepared statement is closed, a JDBC 3.0-compliant driver places the prepared statement into a local cache instead of discarding it. If the application later attempts to create a prepared statement with the same SQL query, a common occurrence in many applications, the driver can simply retrieve the associated statement from the local cache instead of performing a network roundtrip to the server and an expensive database validation.

Connection and statement handling should be addressed before implementation. Spending time and thoughtfully handling connection management improves application performance and maintainability.

Managing Commits in Transactions

Committing transactions is slow due to the result of disk I/O and potentially network I/O. Always turn Autocommit off by using the WSConnection.setAutoCommit(false) setting.

What does a commit actually involve? The database server must flush back to disk every data page that contains updated or new data. This is usually a sequential write to a journal file, but nonetheless, is a disk I/O. By default, Autocommit is on when connecting to a data source, and Autocommit mode usually impairs performance because of the significant amount of disk I/O needed to commit every operation.

Furthermore, most database servers do not provide an Autocommit mode natively. For this type of server, the JDBC driver must explicitly issue a COMMIT statement and a BEGIN TRANSACTION for every operation sent to the server. In addition to the large amount of disk input/output required to support Autocommit mode, a performance penalty is paid for up to three network requests for every statement issued by an application.

Although using transactions can help application performance, do not take this tip too far. Leaving transactions active can reduce throughput by holding locks on rows for long times, preventing other users from accessing the rows. Commit transactions in intervals that allow maximum concurrency.

Choosing the Right Transaction Model

Many systems support distributed transactions; that is, transactions that span multiple connections. Distributed transactions are at least four times slower than normal transactions due to the logging and network I/O necessary to communicate between all the components involved in the distributed transaction (the JDBC driver, the transaction monitor, and the database system). Unless distributed transactions are required, avoid using them. Instead, use local transactions when possible. It should be noted that many Java application servers typically provide a default transaction behavior that utilizes distributed transactions.

For the best system performance, design the application to run under a single Connection object.

Using updateXXX Methods

Although programmatic updates do not apply to all types of applications, developers should attempt to use programmatic updates and deletes. Using the updateXXX () methods of the ResultSet object allows the developer to update data without building a complex SQL statement. Instead, the developer simply supplies the column in the result set that is to be updated and the data that is to be changed. Then, before moving the cursor from the row in the result set, the updateRow() method must be called to update the database as well.

In the following code fragment, the value of the Age column of the ResultSet object rs is retrieved using the method getInt(), and the method updateInt() is used to update the column with an int value of 25. The method updateRow() is called to update the row in the database that contains the modified value.

int n = rs.getInt("Age"); 
// n contains value of Age column in the resultset rs
...
rs.updateInt("Age", 25); 
rs.updateRow();

In addition to making the application more easily maintainable, programmatic updates usually result in improved performance. Because the database server is already positioned on the row for the Select statement in process, performance-expensive operations to locate the row to be changed are not needed. If the row must be located, the server usually has an internal pointer to the row available (for example, ROWID).

Using getBestRowIdentifier()

Use getBestRowIdentifier() to determine the optimal set of columns to use in the Where clause for updating data. Pseudo-columns often provide the fastest access to the data, and these columns can only be determined by using getBestRowIdentifier().

Some applications cannot be designed to take advantage of positional updates and deletes. Some applications might formulate the Where clause by using all searchable result columns, by calling getPrimaryKeys(), or by calling getIndexInfo() to find columns that might be part of a unique index. These methods usually work, but might result in fairly complex queries.

Consider the following example:

ResultSet WSrs = WSs.executeQuery 
   ("SELECT first_name, last_name, ssn, address, city, state, zip 
   FROM emp");
// fetch data
...
WSs.executeUpdate ("UPDATE EMP SET ADDRESS = ?
   WHERE first_name = ? and last_name = ? and ssn = ? 
   and address = ? and city = ? and state = ? 
   and zip = ?");
// fairly complex query

Applications should call getBestRowIdentifier() to retrieve the optimal set of columns (possibly a pseudo-column) that identifies a specific record. Many databases support special columns that are not explicitly defined by the user in the table definition but are "hidden" columns of every table (for example, ROWID and TID). These pseudo-columns generally provide the fastest access to the data because they typically are pointers to the exact location of the record. Because pseudo-columns are not part of the explicit table definition, they are not returned from getColumns. To determine if pseudo-columns exist, call getBestRowIdentifier().

Consider the previous example again:

...
ResultSet WSrowid = getBestRowIdentifier() 
   (... "emp", ...);
...
WSs.executeUpdate ("UPDATE EMP SET ADDRESS = ?
  WHERE ROWID = ?";
// fastest access to the data!

If your data source does not contain special pseudo-columns, then the result set of getBestRowIdentifier() consists of the columns of the most optimal unique index on the specified table (if a unique index exists). Therefore, your application does not need to call getIndexInfo to find the smallest unique index.

zhaofupobaoyang

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
优化JDBC性能的三大技巧

介绍开发一个注重性能的JDBC应用程序不是一件容易的事. 当你的代码运行很慢的时候JDBC驱动程序并不会抛出异常告诉你. 本系列的性能提示将为改善JDBC应用程序的性能介绍一些基本的指导原则，这其中的原则已经被许多现有的JDBC应用程序编译运行并验证过。这些指导原则包括: 正确的使用数据库MetaData方法只获取需要的数据选用最佳性能的功
复制链接

扫一扫

专栏目录