JDBC大数据查询

最新推荐文章于 2022-11-07 20:16:33 发布

weixin_33892359

最新推荐文章于 2022-11-07 20:16:33 发布

阅读量943

点赞数

文章标签：大数据数据库 java

原文链接：https://my.oschina.net/liuyuanyuangogo/blog/330196

版权

2019独角兽企业重金招聘Python工程师标准>>>

对于所有的数据库而言，标准JAVA API中消耗jvm资源最少的结果集是：

Statement stmt =conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(100);//该接口仅仅是java的一个hint，至于是否实现了该方法。要根据数据库供应商的不同处理而有所不同

我们在大数据查询时，除了使用上述resultset类型，还可以使用其他辅助功能来降低大JVM内存消耗，这在不同数据库中有不同的处理方式，详情如下:

PostgreSQL9.3 JDBC 接口

http://jdbc.postgresql.org/documentation/93/index.html

官方JDBC文档：PG查询基于游标的结果集，并使用setFetchSize()来开启和关闭游标：

http://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor

By default the driver collects all the results for the query at once. This can be inconvenient for large data sets so the JDBC driver provides a means of basing a ResultSet on a database cursor and only fetching a small number of rows.

A small number of rows are cached on the client side of the connection and when exhausted the next block of rows is retrieved by repositioning the cursor.

Note

Cursor based ResultSets cannot be used in all situations. There a number of restrictions which will make the driver silently fall back to fetching the wholeResultSet at once.

The connection to the server must be using the V3 protocol. This is the default for (and is only supported by) server versions 7.4 and later.
The Connection must not be in autocommit mode. The backend closes cursors at the end of transactions, so in autocommit mode the backend will have closed the cursor before anything can be fetched from it. *The Statement must be created with a ResultSet type of ResultSet.TYPE_FORWARD_ONLY. This is the default, so no code will need to be rewritten to take advantage of this, but it also means that you cannot scroll backwards or otherwise jump around in the ResultSet.
The query given must be a single statement, not multiple statements strung together with semicolons.

Example 5.2. Setting fetch size to turn cursors on and off.

Changing code to cursor mode is as simple as setting the fetch size of the Statement to the appropriate size. Setting the fetch size back to 0 will cause all rows to be cached (the default behaviour).

// make sure autocommit is off
conn.setAutoCommit(false);
Statement st = conn.createStatement();
 
// Turn use of the cursor on.
st.setFetchSize(50);
ResultSet rs = st.executeQuery("SELECT * FROM mytable");
while(rs.next())
{
   System.out.print("a row was returned.");
}
rs.close();
 
// Turn the cursor off.
st.setFetchSize(0);
rs = st.executeQuery("SELECT * FROM mytable");
while(rs.next())
{
   System.out.print("many rows were returned.");
}
rs.close();
 
// Close the statement.
st.close();

官方邮件列表：关于PG JDBC 中setFetchSize()方法仅能用于非自动提交模式的讨论。

http://postgresql.1045698.n5.nabble.com/setFetchSize-td4935215.html

Oracle JDBC开发者手册

oracle9i：http://docs.oracle.com/cd/B10501_01/java.920/a96654/resltset.htm#1020507

oracle10g：http://docs.oracle.com/cd/B19306_01/java.102/b14355/resltset.htm#g1036317

JAVA Fetch Size

http://docs.oracle.com/cd/B10501_01/java.920/a96654/resltset.htm#1020507

默认情况下。oracle jdbc执行完一个查询后，会从数据库游标一次返回10行到结果集。10行是默认的oralce row-prefetch值。可以通过修改row-prefetch值来修改一次从数据库游标返回的数据行数(参考Oracle Row Prefetching)。

JDBC 2.0 也允许声明查询结果的一次数据库往返抓取的数据行数，这个行数被称为fetch size.在oracle jdbc中，row-prefetch值被用作statement对象的默认fetch size。设置fetch size，能够覆盖row-prefetch的设置，从而对后续使用该statement对象的执行查询起作用。

feth size也用于resultset。当statement对象执行一个查询，该statement的fetch size 会传给该查询生成的resultset对象。因此，也可以设置resultset的fetch size来覆盖statement传过来的fetch size(注意：resulset生成之后对statement设置的fetch size，对resultset不起作用)。

resultset的fetch size要么是显式的声明，要么默认等同于statement传来的 fetch size，来决定后续的resultset一次往返从数据库返回的数据行数。这包括仍然需要完成初始查询的任何往返，以及重复抓取数据到resultset。（数据可以通过显式或者隐式的方式重新获取，要更新scroll-sensitive或者scroll-insensitive/updatable类型的resultset，请参考 "Refetching Rows".)

以下方法可以用于所有的 Statement, PreparedStatement, CallableStatement, 和 ResultSet 对象，来设置和获取 fetch size:

void setFetchSize(int rows) throws SQLException

int getFetchSize() throws SQLException

要设置fetch size，请在执行查询之前，对statement对象调用setFetchSize()。如果fetch size为N，那么每次往返数据库能返回N行数据。

执行完查询后，可以对resultset对象调用setFetchSize()方法，来覆盖statement传过来的fetch size.这将会影响接下来的所有数据库往返，来为初始查询获取更多行，并且影响所有后续的数据行再次抓取(请参考"Refetching Rows".)。

Oracle Row Prefetching

http://docs.oracle.com/cd/B10501_01/java.920/a96654/oraperf.htm#1002425

Oracle JDBC drivers include extensions that allow you to set the number of rows to prefetch into the client while a result set is being populated during a query. This feature reduces the number of round trips to the server.

Note:

With JDBC 2.0, the ability to preset the fetch size has become standard functionality. For information about the standard implementation of this feature, see "Fetch Size".

Setting the Oracle Prefetch Value

Standard JDBC receives the result set one row at a time, and each row requires a round trip to the database. The row-prefetching feature associates an integer row-prefetch setting with a given statement object. JDBC fetches that number of rows at a time from the database during the query. That is, JDBC will fetch N rows that match the query criteria and bring them all back to the client at once, where N is the prefetch setting. Then, once your next() calls have run through those N rows, JDBC will go back to fetch the next N rows that match the criteria.

You can set the number of rows to prefetch for a particular Oracle statement (any type of statement). You can also reset the default number of rows that will be prefetched for all statements in your connection. The default number of rows to prefetch to the client is 10.

Set the number of rows to prefetch for a particular statement as follows:

Cast your statement object to an OracleStatement, OraclePreparedStatement, or OracleCallableStatement object, as applicable, if it is not already one of these.
Use the setRowPrefetch() method of the statement object to specify the number of rows to prefetch, passing in the number as an integer. If you want to check the current prefetch number, use thegetRowPrefetch() method of the Statement object, which returns an integer.

Set the default number of rows to prefetch for all statements in a connection, as follows:

Cast your Connection object to an OracleConnection object.
Use the setDefaultRowPrefetch() method of your OracleConnection object to set the default number of rows to prefetch, passing in an integer that specifies the desired default. If you want to check the current setting of the default, then use the getDefaultRowPrefetch() method of the OracleConnection object. This method returns an integer.
Equivalently, instead of calling setDefaultRowPrefetch(), you can set the defaultRowPrefetch Java property if you use a Java Properties object in establishing the connection. See "Specifying a Database URL and Properties Object".

Notes:

Do not mix the JDBC 2.0 fetch size API and the Oracle row-prefetching API in your application. You can use one or the other, but not both.
Be aware that setting the Oracle row-prefetch value can affect not only queries, but also: 1) explicitly refetching rows in a result set through the result set refreshRow() method available with JDBC 2.0 (relevant for scroll-sensitive/read-only, scroll-sensitive/updatable, and scroll-insensitive/updatable result sets); and 2) the "window" size of a scroll-sensitive result set, affecting how often automatic refetches are performed. The Oracle row-prefetch value will be overridden, however, by any setting of the fetch size. See "Fetch Size" for more information.

Example: Row Prefetching

The following example illustrates the row-prefetching feature. It assumes you have imported the oracle.jdbc.* interfaces.

Connection conn =  DriverManager.getConnection("jdbc:oracle:oci:","scott","tiger");//Set the default row-prefetch setting for this connection
((OracleConnection)conn).setDefaultRowPrefetch(7);
/* The following statement gets the default row-prefetch value for  the connection, that is, 7. 
*/
Statement stmt = conn.createStatement();
 /* Subsequent statements look the same, regardless of the row  prefetch value. Only execution time changes. 
*/
ResultSet rset = stmt.executeQuery("SELECT ename FROM emp"); System.out.println( rset.next () ); 
while( rset.next () )
System.out.println( rset.getString (1) ); 
//Override the default row-prefetch setting for this statement ( (OracleStatement)stmt ).setRowPrefetch (2); 
ResultSet rset = stmt.executeQuery("SELECT ename FROM emp"); System.out.println( rset.next () ); 
while( rset.next() )
System.out.println( rset.getString (1) ); 
stmt.close();

Oracle Row-Prefetching Limitations

There is no maximum prefetch setting, but empirical evidence suggests that 10 is effective. Oracle does not recommend exceeding this value in most situations. If you do not set the default row-prefetch value for a connection, 10 is the default.

A statement object receives the default row-prefetch setting from the associated connection at the time the statement object is created. Subsequent changes to the connection's default row-prefetch setting have no effect on the statement's row-prefetch setting.

If a column of a result set is of datatype LONG or LONG RAW (that is, the streaming types), JDBC changes the statement's row-prefetch setting to 1, even if you never actually read a value of either of those types.

If you use the form of the DriverManager class getConnection() method that takes a Properties object as an argument, then you can set the connection's default row-prefetch value that way. See"Specifying a Database URL and Properties Object".

Java标准setFetchSize() 与 oracle的API .setDefaultRowPrefetch() 对比

http://docs.oracle.com/cd/B10501_01/java.920/a96654/resltset.htm#1020507

Using the JDBC 2.0 fetch size is fundamentally similar to using the Oracle row-prefetch value, except that with the row-prefetch value you do not have the flexibility of distinct values in the statement object and result set object. The row prefetch value would be used everywhere.

Furthermore, JDBC 2.0 fetch size usage is portable and can be used with other JDBC drivers. Oracle row-prefetch usage is vendor-specific.

See "Oracle Row Prefetching" for a general discussion of this Oracle feature.

Note:

Do not mix the JDBC 2.0 fetch size API and the Oracle row prefetching API in your application. You can use one or the other, but not both.

MS SQL Server

http://technet.microsoft.com/en-us/library/aa342344%28SQL.90%29.aspx

当我们必须处理一个大的结果集时，首选的是java setFetchSize（）接口，但是目前以下代码对sqlserver是无效的，因为sqlserver并没有对java标准提出的setFetchSize提议作出响应。

Statement stmt = con.createStatement(ResultSet.TYPE_FORWARD_ONLY,ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(100);//该接口仅仅是java的一个hint，sqlserver并没有对此作出响应

结果集的一次载入行数与sql server jdbc驱动的 selectMethod有关。selectMethod分为direct和cursor。当使用direct时，驱动会一次性载入所有结果集到jvm内存中，因此造成了out of memory；而使用cursor时，会在服务器端创建一个cursor，因此不会占据客户端的大量内存。因此默认情况下sqlserver会一次载入所有结果，因此响应速度快，内存消耗多。

要解决out of memory的问题，方法有两个：

1.修改SQLServer jdbc的URL为selectMethod=cursor类型的：

jdbc.url=jdbc:sqlserver://127.0.0.1;instanceName=ProductDB;databaseName=product_index;selectMethod=cursor

这种方式会对整个应用程序起作用，可能引起其他普通情况的读取性能下降。

2.使用SQLServerResultSet.TYPE_SS_SERVER_CURSOR_FORWARD_ONLY类型的Statement：

Statement stmt = con.createStatement(SQLServerResultSet.TYPE_SS_SERVER_CURSOR_FORWARD_ONLY,ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(100);

这种方式会造成SQL Server的API侵入，但该设置仅对当前查询起作用，不失为一种更好的办法。

关于sqlserver的游标类型，请参考http://technet.microsoft.com/en-us/library/ms378405(v=sql.110).aspx.

3.另外，可以使用setMaxRows的方法来查询表的指定前几行，而不是把整个表的数据行全部都查询出来。

Statement/PreparedStatement setMaxRows(N)等价于SELECT TOP N SQL 语法，其中：

N<=表的总行数时，返回的结果集为表的前N行。

N>表的总行数时，返回的结果集为表的总行数，不报错。

String sql ="select id,name,data from dbo.blobtable ";          
stmt = conn.prepareStatement(sql);
stmt.setMaxRows(2);
rs = stmt.executeQuery();
//或者
String sql ="select id,name,data from dbo.blobtable ";
stmt = conn.createStatement();
stmt.setMaxRows(2);
rs = stmt.executeQuery(sql);
 
//等价于：
 
String sql ="select top 2 id,name,data from dbo.blobtable ";          
stmt = conn.prepareStatement(sql);
rs = stmt.executeQuery();
//或者
String sql ="select top 2 id,name,data from dbo.blobtable ";
stmt = conn.createStatement();
rs = stmt.executeQuery(sql);

这种方式并不适用于所有情况，需要酌情选择。

4.除了以上方式，新特性---自适应缓冲（Adaptive Buffering）可以用来降低大数据查询的资源消耗和响应时间。

http://technet.microsoft.com/en-us/library/bb879937(v=sql.110).aspx

自适应缓冲的使用方法：

Starting with the JDBC driver version 2.0, the default behavior of the driver is "adaptive". In other words, in order to get the adaptive buffering behavior, your application does not have to request the adaptive behavior explicitly. In the version 1.2 release, however, the buffering mode was "full" by default and the application had to request the adaptive buffering mode explicitly.

There are three ways that an application can request that statement execution should use adaptive buffering:

The application can set the connection property responseBuffering to "adaptive". For more information on setting the connection properties, see Setting the Connection Properties.

url property：

responseBuffering

type：

String

["full"|"adaptive"]

default：

adaptive

description：

If this property is set to "adaptive", the minimum possible data is buffered when necessary. The default mode is "adaptive" for the Microsoft SQL Server JDBC Driver version 2.0 or later.

When this property is set to "full", the entire result set is read from the server when a statement is executed. The default mode was "full" for the Microsoft SQL Server 2005 JDBC Driver version 1.2 and provides a backward compatibility with the Microsoft SQL Server 2005 JDBC Driver versions 1.0, 1.1, and 1.2.

Note Note:

After upgrading the JDBC driver from version 1.2 to 2.0, the default buffering behavior will be "adaptive." If your application has never set the "responseBuffering" property and you want to keep the version 1.2 default behavior in your application, you must set the responseBufferring propery to "full" either in the connection properties or by using the setResponseBuffering method of theSQLServerStatement object.

The application can use the setResponseBuffering method of the SQLServerDataSource object to set the response buffering mode for all connections created through that SQLServerDataSource object.
The application can use the setResponseBuffering method of the SQLServerStatement class to set the response buffering mode for a particular statement object.

When using the JDBC Driver version 1.2, applications needed to cast the statement object to a SQLServerStatement class to use the setResponseBuffering method. The code examples in the Reading Large Data Sample and Reading Large Data with Stored Procedures Sample demonstrate this old usage.

However, with the JDBC driver version 2.0, applications can use the isWrapperFor method and the unwrap method to access the vendor-specific functionality without any assumption about the implementation class hierarchy. For example code, see the Updating Large Data Sample topic.

自适应缓冲使用指南

Developers should follow these important guidelines to minimize memory usage by the application:

Avoid using the connection string property selectMethod=cursor to allow the application to process a very large result set. Starting with the Microsoft SQL Server 2005 JDBC Driver version 1.2, the adaptive buffering feature allows applications to process very large forward-only, read-only result sets without using a server cursor. Note that when you set selectMethod=cursor, all forward-only, read-only result sets produced by that connection are impacted. In other words, if your application routinely processes short result sets with a few rows, creating, reading, and closing a server cursor for each result set will use more resources on both client-side and server-side than is the case where the selectMethod is not set to cursor.
Read large text or binary values as streams by using the getAsciiStream, the getBinaryStream, or the getCharacterStream methods instead of the getBlobor the getClob methods. Starting with the version 1.2 release, the SQLServerCallableStatement class provides new get<Type>Stream methods for this purpose.
Ensure that columns with potentially large values are placed last in the list of columns in a SELECT statement and that the get<Type>Stream methods of theSQLServerResultSet are used to access the columns in the order they are selected.
Ensure that OUT parameters with potentially large values are declared last in the list of parameters in the SQL used to create the SQLServerCallableStatement. In addition, ensure that the get<Type>Stream methods of the SQLServerCallableStatement are used to access the OUT parameters in the order they are declared.
Avoid executing more than one statement on the same connection simultaneously. Executing another statement before processing the results of the previous statement may cause the unprocessed results to be buffered into the application memory.
There are some cases where using selectMethod=cursor instead of responseBuffering=adaptive would be more beneficial, such as:
In both cases, you need to consider the overhead of creating, reading, and closing the server cursors.
- If your application processes a forward-only, read-only result set slowly, such as reading each row after some user input, using selectMethod=cursorinstead of responseBuffering=adaptive might help reduce resource usage by SQL Server.
- If your application processes two or more forward-only, read-only result sets at the same time on the same connection, using selectMethod=cursorinstead of responseBuffering=adaptive might help reduce the memory required by the driver while processing these result sets.

In addition, the following list provides some recommendations for scrollable and forward-only updatable result sets:

For scrollable result sets, when fetching a block of rows the driver always reads into memory the number of rows indicated by the getFetchSize method of theSQLServerResultSet object, even when the adaptive buffering is enabled. If scrolling causes an OutOfMemoryError, you can reduce the number of rows fetched by calling the setFetchSize method of the SQLServerResultSet object to set the fetch size to a smaller number of rows, even down to 1 row, if necessary. If this does not prevent an OutOfMemoryError, avoid including very large columns in scrollable result sets.
For forward-only updatable result sets, when fetching a block of rows the driver normally reads into memory the number of rows indicated by thegetFetchSize method of the SQLServerResultSet object, even when the adaptive buffering is enabled on the connection. If calling the next method of theSQLServerResultSet object results in an OutOfMemoryError, you can reduce the number of rows fetched by calling the setFetchSize method of theSQLServerResultSet object to set the fetch size to a smaller number of rows, even down to 1 row, if necessary. You can also force the driver not to buffer any rows by calling the setResponseBuffering method of the SQLServerStatement object with "adaptive" parameter before executing the statement. Because the result set is not scrollable, if the application accesses a large column value by using one of the get<Type>Stream methods, the driver discards the value as soon as the application reads it just as it does for the forward-only read-only result sets.

【博客】What is adaptive response buffering and why should I use it?

http://blogs.msdn.com/b/jdbcteam/archive/2007/05/02/what-is-adaptive-response-buffering-and-why-should-i-use-it.aspx

MySQL

http://dev.mysql.com/doc/refman/5.0/es/connector-j-reference-implementation-notes.html

ResultSet
By default, ResultSets are completely retrieved and stored in memory. In most cases this is the most efficient way to operate, and due to the design of the MySQL network protocol is easier to implement. If you are working with ResultSets that have a large number of rows or large values, and can not allocate heap space in your JVM for the memory required, you can tell the driver to stream the results back one row at a time.
To enable this functionality, you need to create a Statement instance in the following manner:
```
stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
              java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);
```
The combination of a forward-only, read-only result set, with a fetch size of Integer.MIN_VALUE serves as a signal to the driver to stream result sets row-by-row. After this any result sets created with the statement will be retrieved row-by-row.
There are some caveats with this approach. You will have to read all of the rows in the result set (or close it) before you can issue any other queries on the connection, or an exception will be thrown.
The earliest the locks these statements hold can be released (whether they be MyISAM table-level locks or row-level locks in some other storage engine such as InnoDB) is when the statement completes.
If the statement is within scope of a transaction, then locks are released when the transaction completes (which implies that the statement needs to complete first). As with most other databases, statements are not complete until all the results pending on the statement are read or the active result set for the statement is closed.
Therefore, if using streaming results, you should process them as quickly as possible if you want to maintain concurrent access to the tables referenced by the statement producing the result set.
Statement
When using versions of the JDBC driver earlier than 3.2.1, and connected to server versions earlier than 5.0.3, the setFetchSize() method has no effect, other than to toggle result set streaming as described above.

通过以上介绍可知，fetch size的设置影响到jvm内存消耗和查询的响应时间，设置fetch size时要权衡两者。一般jdbc默认的fetch size是10，这个值是比较小的，可以防查询时out of momory的问题，但是程序的响应时间会变长。字段多的表或者行数大的表需要一个小的fetch size值来降低查询结果集的jvm内存消耗；但是，字段少或者行数据小的表需要一个大的fetch size值来降低查询结果集的响应时间。

其他资料：http://webmoli.com/2009/02/01/jdbc-performance-tuning-with-optimal-fetch-size/

转载于:https://my.oschina.net/liuyuanyuangogo/blog/330196