Tuning Lazy Fetching (batch size)

sony315

Hibernate: Tuning Lazy Fetching

At 12:10 AM on Aug 8, 2005, R.J. Lorimer wrote:

Last time I talked briefly about lazy associations in Hibernate, and how they could be applied to minimize unnecessary database requests. We learned how Hibernate introduces and manages lazy associations, and how you can develop to ensure that the details of these lazy connections don't trip you up. Today I want to expand on those ideas, and learn how we can optimize the lazy fetching model. You should read this previous article: Hibernate: Understanding Lazy Fetching, otherwise today's tip won't make a dollar's worth of sense.

As I mentioned last time, when you know ahead of time that the data will be needed, the best bet is usually a join. Joining across tables (or really, in Hibernate's case, joining across objects) can dramatically improve the performance of database selects. By joining, you can perform one select, as opposed to n+1, to get data from two tables. For those who aren't familiar, the n+1 selects come from the one select on the base table, plus one for each joining record in the next table. If you have ever put any timing on JDBC code, you have probably learned that the bulk of the time in database access is often not related to the amount of data, but rather to the whole sequence of preparing and sending each select statement, and to the database processing each select individually 'in a vacuum'. Even though a join brings back the same amount of data (assuming all columns are selected), the database only has to parse a single select statement, and can (potentially) optimize based on knowing that you want to select from multiple tables.

While database joins are probably the optimal solution for performance whenever possible (and believe me, Hibernate can do joins, quite well I might add), it isn't always worth the complexity that may be required in your database code; the performance impact may be minimal given the context that you are working with, so you choose lazy fetching. However, it can still be beneficial to find ways to tune lazy fetching to kind of get the best of both worlds.

Say you have this database structure for a veterinarian's office:

*----------------*                 *-----------------*
|      pet       |                 |      owner      |
|----------------| *             1 |-----------------|
| - id           |-----------------| - id            |
| - name         |                 | - name          |
| - owner_id     |                 *-----------------*
*----------------*

Simple (overly simple perhaps), but it works for today's discussion.

Let's say you had a page that was to show all of the pets along with their owners' information. While it is true that this case is *begging* for a join, remember that we are trying to see how far we can get without rippling the knowledge required for joins throughout our application. With lazy fetching in its default form, Hibernate will run n+1 selects to give us all of the pets and owners, where n is the number of pets. So, assuming we have 3 pets and 3 owners:

*--------------------------*
| id |   name   | owner_id |
|----+----------|----------|
| 1  | Snoopy   |     2    |
| 2  | Garfield |     3    |
| 3  | Satchel  |     1    |
*--------------------------*

*---------------*
| id |   name   |
|---------------|
| 2  |   Rick   |
| 3  |   Matt   |
| 1  |   R.J.   |
*---------------*

The selects that would be eventually fired by Hibernate would look like this:

-- get all of the pets first
select * from pet 

-- get the owner for each pet returned
select * from owner where id=2
select * from owner where id=3
select * from owner where id=1

In fact, I have run this solution locally using this test class:

package org.javalobby.tnt.hibernate.lazy;
 
import java.util.List;
import org.hibernate.*;
import com.javalobby.tnt.hibernate.*;
 
public class LazyTest {
	public static void main(String[] args) {
		Session s = HibernateSupport.currentSession();
		try {
			Query q = s.createQuery("from Pet");
			List<Pet> l = q.list();
			for(Pet p : l) {
				System.out.println("Pet: " + p.getName());
				System.out.println("Owner: " + p.getOwner().getName());
			}
		}
		finally {
			HibernateSupport.closeSession(s);
		}
	}
}

...and here is the output against some silly local test data (interleaved with my own log statements so you can see the order and timing of the SQL execution):

Hibernate: select pet0_.id as id, pet0_.name as name0_, pet0_.owner_id as owner3_0_ from Pet pet0_
Pet: Snoopy
Hibernate: select owner0_.id as id0_, owner0_.name as name1_0_ from Owner owner0_ where owner0_.id=?
Owner: Rick
Pet: Garfield
Hibernate: select owner0_.id as id0_, owner0_.name as name1_0_ from Owner owner0_ where owner0_.id=?
Owner: Matt
Pet: Satchel
Hibernate: select owner0_.id as id0_, owner0_.name as name1_0_ from Owner owner0_ where owner0_.id=?
Owner: R.J.

This is 4 (3+1, n=3) select statements. This is certainly not optimal. The biggest problem is that this application isn't going to scale. Before you know it, you'll have fifty registered pets, and you're executing fifty-one select statements, taking up a very noticeable amount of time. Wouldn't it be nice if we could do something more like this:

-- get all of the pets first
select * from pet

-- get all owners in a single select
select * from owner where id in (2, 3, 1)

Now we only have two selects, and the second one can scale much better than linearly. This is great; but how can we achieve this through Hibernate? Cases like this are often the scenarios that people attack O/R mappers over, saying they aren't smart enough and flexible enough to meet the performance demands. It turns out Hibernate provides all kinds of options in this case.
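To make the difference in statement counts concrete, here is a small sketch. It is a toy in-memory model and does not use Hibernate at all: the class SelectCountDemo, its method names, and the assumption that every pet has a distinct owner are all made up for illustration. Each stand-in method simply bumps a counter where a real select would be sent.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model, not Hibernate: counts the selects each access pattern would issue. */
final class SelectCountDemo {
    private static int selects;

    /** Stand-in for "select * from pet": returns each pet's owner_id. */
    private static List<Integer> selectAllPets(int petCount) {
        selects++;
        List<Integer> ownerIds = new ArrayList<>();
        for (int id = 1; id <= petCount; id++) {
            ownerIds.add(id); // assumption: every pet has a distinct owner
        }
        return ownerIds;
    }

    /** Stand-in for "select * from owner where id=?". */
    private static String selectOwnerById(int id) {
        selects++;
        return "owner-" + id;
    }

    /** Stand-in for "select * from owner where id in (...)". */
    private static List<String> selectOwnersByIds(List<Integer> ids) {
        selects++;
        List<String> owners = new ArrayList<>();
        for (int id : ids) {
            owners.add("owner-" + id);
        }
        return owners;
    }

    /** One select for the pets, then one more per pet for its owner. */
    static int oneSelectPerOwner(int petCount) {
        selects = 0;
        for (int ownerId : selectAllPets(petCount)) {
            selectOwnerById(ownerId);
        }
        return selects;
    }

    /** One select for the pets, then a single in-clause select for all owners. */
    static int singleInClause(int petCount) {
        selects = 0;
        selectOwnersByIds(selectAllPets(petCount));
        return selects;
    }

    public static void main(String[] args) {
        System.out.println("one select per owner: " + oneSelectPerOwner(50)); // 51
        System.out.println("single in-clause:     " + singleInClause(50));    // 2
    }
}
```

With fifty pets the first pattern issues fifty-one statements and the second issues two, which is the gap the rest of this tip is about closing.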

Batching Selects

The way to tell Hibernate to use the latter solution is to tell it that a certain class is batch-able. You do this by adding the batch-size attribute to either a.) the entity definition for the association being fetched (e.g. the definition for the Owner class) or b.) the collection definition on a class with a collection mapping. Here is the mapping declaration for the example above:

<?xml version="1.0"?>
<!DOCTYPE hibernate-mapping PUBLIC
      "-//Hibernate/Hibernate Mapping DTD 3.0//EN"
          "http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd">

<hibernate-mapping package="com.javalobby.tnt.hibernate">
	
	<class name="Pet">
	 <id name="id"><generator class="native"/></id>
	 <property name="name"/>
	 <many-to-one name="owner" column="owner_id" class="Owner"/>
	</class>
	
	<class name="Owner" batch-size="50">
	 <id name="id"><generator class="native"/></id>
	 <property name="name"/>
	</class>
</hibernate-mapping>

Note the batch size, which I have manually set to fifty. The batch size is the maximum number of sub-elements that will be loaded at one time (the number of parameters in the 'in' clause of the SQL). If you set this number to 10, for instance, and you had 34 records to load the association for, it would load ten, ten, ten, and then four: four batched selects, or 5 select statements in total once you count the original query.
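The arithmetic above can be sketched in a few lines. This takes the description at face value (fill each batch to the limit, then load the remainder); the class BatchMath and its method are hypothetical, and a given Hibernate version may choose its batch sizes differently, so treat it as an illustration of the counting rather than of Hibernate's internals.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative arithmetic only; not part of Hibernate. */
final class BatchMath {
    /** Sizes of the successive batches needed to load `records` rows, at most `batchSize` at a time. */
    static List<Integer> batchSizes(int records, int batchSize) {
        if (records < 0 || batchSize < 1) {
            throw new IllegalArgumentException("records must be >= 0 and batchSize >= 1");
        }
        List<Integer> sizes = new ArrayList<>();
        for (int remaining = records; remaining > 0; remaining -= batchSize) {
            sizes.add(Math.min(batchSize, remaining));
        }
        return sizes;
    }

    public static void main(String[] args) {
        List<Integer> sizes = batchSizes(34, 10);
        System.out.println(sizes);            // [10, 10, 10, 4]
        System.out.println(sizes.size() + 1); // 5 selects, counting the initial query
    }
}
```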

Here is the finished SQL emitted by Hibernate (sprinkled with my log statements so you can see when they were triggered again):

Hibernate: select pet0_.id as id, pet0_.name as name0_, pet0_.owner_id as owner3_0_ from Pet pet0_
Pet: Snoopy
Hibernate: select owner0_.id as id0_, owner0_.name as name1_0_ from Owner owner0_ where owner0_.id in (?, ?, ?)
Owner: Rick
Pet: Garfield
Owner: Matt
Pet: Satchel
Owner: R.J.
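The output above shows a single owner select firing on the first getOwner().getName() call, with the remaining owners printing without further SQL. One way to picture why is the following toy model. It is not Hibernate code: BatchingSession, registerLazyOwner and ownerName are invented names, and the real session bookkeeping is certainly more involved. The idea it sketches is that reading the pets only records which owners are referenced, and the first real use of any owner fetches it together with other still-pending owners, up to the batch size.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of batched lazy loading; the class and method names are made up. */
final class BatchingSession {
    private final Map<Integer, String> ownerTable;               // stand-in for the owner table
    private final Map<Integer, String> loaded = new HashMap<>(); // owners fetched so far
    private final Set<Integer> pending = new LinkedHashSet<>();  // referenced but not yet fetched
    private final int batchSize;
    final List<String> sqlLog = new ArrayList<>();

    BatchingSession(Map<Integer, String> ownerTable, int batchSize) {
        this.ownerTable = ownerTable;
        this.batchSize = batchSize;
    }

    /** Reading a pet row only records its owner_id; nothing is fetched yet. */
    void registerLazyOwner(int ownerId) {
        if (!loaded.containsKey(ownerId)) {
            pending.add(ownerId);
        }
    }

    /** First real use of an owner fetches it together with other pending owners. */
    String ownerName(int ownerId) {
        if (!loaded.containsKey(ownerId)) {
            List<Integer> batch = new ArrayList<>();
            batch.add(ownerId);
            for (int id : pending) {
                if (batch.size() >= batchSize) {
                    break;
                }
                if (id != ownerId) {
                    batch.add(id);
                }
            }
            sqlLog.add("select * from owner where id in " + batch);
            for (int id : batch) {
                loaded.put(id, ownerTable.get(id));
            }
            pending.removeAll(batch);
        }
        return loaded.get(ownerId);
    }

    /** Replays the article's three pets and returns how many owner selects were logged. */
    static int ownerSelects(int batchSize) {
        Map<Integer, String> owners = new HashMap<>();
        owners.put(1, "R.J.");
        owners.put(2, "Rick");
        owners.put(3, "Matt");
        int[] petOwnerIds = {2, 3, 1}; // Snoopy, Garfield, Satchel
        BatchingSession session = new BatchingSession(owners, batchSize);
        for (int id : petOwnerIds) {
            session.registerLazyOwner(id);
        }
        for (int id : petOwnerIds) {
            session.ownerName(id);
        }
        return session.sqlLog.size();
    }

    public static void main(String[] args) {
        System.out.println(ownerSelects(50)); // 1
        System.out.println(ownerSelects(2));  // 2
        System.out.println(ownerSelects(1));  // 3
    }
}
```

A batch size of 1 in this model degenerates back to one select per owner, which is the unbatched behaviour shown earlier.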

Let's say now that this example gets turned on its head, and we want to look at owners rather than pets. Owners (as our diagram above implies) are allowed to have multiple pets. We want to be able to select all owners, and then iterate over each of their pets. Let's see what Hibernate does in this scenario. Here is our new class:

package org.javalobby.tnt.hibernate.lazy;
 
import java.util.List;
import org.hibernate.*;
import com.javalobby.tnt.hibernate.*;
 
public class LazyTest {
	public static void main(String[] args) {
		Session s = HibernateSupport.currentSession();
		try {
			Query q = s.createQuery("from Owner");
			List<Owner> l = q.list();
			for(Owner owner : l) {
				System.out.println("Owner: " + owner.getName());
				for(Pet pet : owner.getPets()) {
					System.out.println("\tPet: " + pet.getName());
				}
			}
		}
		finally {
			HibernateSupport.closeSession(s);
		}
	}
}

Here is our new mapping declaration:

<?xml version="1.0"?>
<!DOCTYPE hibernate-mapping PUBLIC
      "-//Hibernate/Hibernate Mapping DTD 3.0//EN"
          "http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd">

<hibernate-mapping package="com.javalobby.tnt.hibernate">
	
	<class name="Pet">
	 <id name="id"><generator class="native"/></id>
	 <property name="name"/>
	 <many-to-one name="owner" column="owner_id" class="Owner"/>
	</class>
	
	<class name="Owner" batch-size="50">
	 <id name="id"><generator class="native"/></id>
	 <property name="name"/>
	 
	 <set name="pets">
	 	<key column="owner_id" />
	 	<one-to-many class="Pet"/>
	 </set>
	</class>
</hibernate-mapping>

... here is some additional data just to exercise the one-to-many relationship:

*--------------------------*
| id |   name   | owner_id |
|----+----------|----------|
| 1  | Snoopy   |     2    |
| 2  | Garfield |     3    |
| 3  | Satchel  |     1    |
| 4  | Bucky    |     1    |
| 5  | Odie     |     3    |
*--------------------------*

*---------------*
| id |   name   |
|---------------|
| 2  |   Rick   |
| 3  |   Matt   |
| 1  |   R.J.   |
*---------------*

... and here is the output:

Hibernate: select owner0_.id as id, owner0_.name as name1_ from Owner owner0_
Owner: R.J.
Hibernate: 
	select 
		pets0_.owner_id as owner3___, 
		pets0_.id as id__, 
		pets0_.id as id0_, 
		pets0_.name as name0_0_, 
		pets0_.owner_id as owner3_0_0_ 
	from Pet pets0_ 
	where pets0_.owner_id=?
	Pet: Satchel
	Pet: Bucky
Owner: Rick
Hibernate: 
	select 
		pets0_.owner_id as owner3___, 
		pets0_.id as id__, 
		pets0_.id as id0_, 
		pets0_.name as name0_0_, 
		pets0_.owner_id as owner3_0_0_ 
	from Pet pets0_ 
	where pets0_.owner_id=?
	Pet: Snoopy
Owner: Matt
Hibernate: 
	select 
		pets0_.owner_id as owner3___, 
		pets0_.id as id__, 
		pets0_.id as id0_, 
		pets0_.name as name0_0_, 
		pets0_.owner_id as owner3_0_0_ 
	from Pet pets0_ 
	where pets0_.owner_id=?
	Pet: Garfield
	Pet: Odie

As we can see, we are back to a slow linear situation - it is running a select for each owner it gets back; that's really not optimal. Thankfully, collections can be batched as well - here is our new mapping declaration:

<?xml version="1.0"?>
<!DOCTYPE hibernate-mapping PUBLIC
      "-//Hibernate/Hibernate Mapping DTD 3.0//EN"
          "http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd">

<hibernate-mapping package="com.javalobby.tnt.hibernate">
	
	<class name="Pet">
	 <id name="id"><generator class="native"/></id>
	 <property name="name"/>
	 <many-to-one name="owner" column="owner_id" class="Owner"/>
	</class>
	
	<class name="Owner" batch-size="50">
	 <id name="id"><generator class="native"/></id>
	 <property name="name"/>
	 <set name="pets" batch-size="50">
	 	<key column="owner_id" />
	 	<one-to-many class="Pet"/>
	 </set>
	</class>
</hibernate-mapping>

... and here is our new output:

Hibernate: select owner0_.id as id, owner0_.name as name1_ from Owner owner0_
Owner: R.J.
Hibernate: select pets0_.owner_id as owner3___, pets0_.id as id__, pets0_.id as id0_, pets0_.name as name0_0_, pets0_.owner_id as owner3_0_0_ from Pet pets0_ where pets0_.owner_id in (?, ?, ?)
	Pet: Bucky
	Pet: Satchel
Owner: Rick
	Pet: Snoopy
Owner: Matt
	Pet: Garfield
	Pet: Odie

Much better! Keep in mind that the 'batch-size' parameter has *no* bearing on how many elements inside the collection are loaded. Instead, it defines how many collections should be loaded in a single select. No matter what setting you provide, it will always retrieve 'Bucky and Satchel' in a single select statement as defined above, because they are part of the same collection. I repeat - batch size in collections defines *how many collections* will be retrieved at once.
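The distinction between "collections per select" and "elements per collection" can be sketched with the sample data from this section. Again this is a toy model with hypothetical names (CollectionBatchDemo, collectionBatches, petsFor), and the simple left-to-right grouping of owners is an assumption; Hibernate decides for itself which pending collections go into a batch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model: batch-size limits owners per select, never pets per owner. */
final class CollectionBatchDemo {
    /** Groups owner ids into batches; each batch stands for one collection select. */
    static List<List<Integer>> collectionBatches(List<Integer> ownerIds, int batchSize) {
        List<List<Integer>> batches = new ArrayList<>();
        for (int i = 0; i < ownerIds.size(); i += batchSize) {
            int end = Math.min(i + batchSize, ownerIds.size());
            batches.add(new ArrayList<>(ownerIds.subList(i, end)));
        }
        return batches;
    }

    /** Rows one select brings back: every pet of every owner in the batch. */
    static List<String> petsFor(List<Integer> ownerBatch, Map<String, Integer> petToOwner) {
        List<String> pets = new ArrayList<>();
        for (Map.Entry<String, Integer> row : petToOwner.entrySet()) {
            if (ownerBatch.contains(row.getValue())) {
                pets.add(row.getKey());
            }
        }
        return pets;
    }

    /** The pet table from this section: pet name -> owner_id. */
    static Map<String, Integer> sampleData() {
        Map<String, Integer> petToOwner = new LinkedHashMap<>();
        petToOwner.put("Snoopy", 2);
        petToOwner.put("Garfield", 3);
        petToOwner.put("Satchel", 1);
        petToOwner.put("Bucky", 1);
        petToOwner.put("Odie", 3);
        return petToOwner;
    }

    public static void main(String[] args) {
        List<Integer> owners = List.of(1, 2, 3);
        // With batch size 2, three collections need two selects.
        System.out.println(collectionBatches(owners, 2)); // [[1, 2], [3]]
        // Even with batch size 1, both of R.J.'s pets arrive in the same select.
        System.out.println(petsFor(List.of(1), sampleData())); // [Satchel, Bucky]
    }
}
```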

Subselect Selection

The last form of fetching I want to cover is subselect fetching. Subselect fetching is very similar to batch size controlled fetching, which I just described, but takes the 'numerical complications' out of the equation. Subselect fetching is actually a different type of fetching strategy that is applied to collection style associations. Unlike join style fetching, however, subselect fetching is still compatible with lazy associations. The difference is that subselect fetching just gets "the whole shootin' match" as a co-worker of mine would say, rather than just a batch. In other words, it uses subselect execution to pass the ID set of the main entity set into the select off of the association table:

select * from owner
select * from pet where owner_id in (select id from owner)

This is very similar to the previous examples, but all of the burden is now put on the database; and the batch size is effectively infinity.

Here is the new mapping declaration:

<?xml version="1.0"?>
<!DOCTYPE hibernate-mapping PUBLIC
      "-//Hibernate/Hibernate Mapping DTD 3.0//EN"
          "http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd">

<hibernate-mapping package="com.javalobby.tnt.hibernate">
	
	<class name="Pet">
	 <id name="id"><generator class="native"/></id>
	 <property name="name"/>
	 <many-to-one name="owner" column="owner_id" class="Owner"/>
	</class>
	
	<class name="Owner" batch-size="50">
	 <id name="id"><generator class="native"/></id>
	 <property name="name"/>
	 <set name="pets" fetch="subselect">
	 	<key column="owner_id" />
	 	<one-to-many class="Pet"/>
	 </set>
	</class>
</hibernate-mapping>

... and here is the output:

Hibernate: select owner0_.id as id, owner0_.name as name1_ from Owner owner0_
Owner: R.J.
Hibernate: 
	select 
		pets0_.owner_id as owner3_1_, 
		pets0_.id as id1_,
		pets0_.id as id0_,
		pets0_.name as name0_0_,
		pets0_.owner_id as owner3_0_0_ 
	from Pet 
		pets0_ 
	where 
		pets0_.owner_id 
	in 
		(select owner0_.id from Owner owner0_)
		
	Pet: Satchel
	Pet: Bucky
Owner: Rick
	Pet: Snoopy
Owner: Matt
	Pet: Garfield
	Pet: Odie

Not too shabby! As you can see, even without explicitly using joins, Hibernate is able to optimize our query set quite well. Note, however, that subselect fetching is only available when processing a collection style association, and not for single-point associations.
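As a rough way to compare the three collection-loading options covered here, the sketch below counts statements for the owners-and-pets page under each one. The formulas are a simplification based on the descriptions in this tip (every owner's collection gets touched, batches are filled to the limit), and FetchStrategyCost with its method names is hypothetical; it says nothing about how long each statement takes.

```java
/** Back-of-the-envelope statement counts; a simplification, not a benchmark. */
final class FetchStrategyCost {
    /** Default lazy loading: the owner query plus one collection select per owner. */
    static int lazySelects(int owners) {
        return 1 + owners;
    }

    /** batch-size: the owner query plus one select per group of `batchSize` collections. */
    static int batchedSelects(int owners, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        return 1 + (owners + batchSize - 1) / batchSize; // ceiling division
    }

    /** fetch="subselect": the owner query plus a single select for all collections. */
    static int subselectSelects(int owners) {
        return owners == 0 ? 1 : 2;
    }

    public static void main(String[] args) {
        for (int owners : new int[] {3, 50, 500}) {
            System.out.printf("%d owners: lazy=%d batch(50)=%d subselect=%d%n",
                    owners, lazySelects(owners), batchedSelects(owners, 50), subselectSelects(owners));
        }
    }
}
```

In this model the batched count grows with the number of owners, only fifty times more slowly than plain lazy loading, while the subselect count stays at two.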

Lazy fetching, while usually not as performant as joins, can be optimized quite well, and potentially allows for more reusability and expressiveness in your application code.