N+1 select problem

最新推荐文章于 2022-07-18 20:55:56 发布

柏油路留不下脚印

最新推荐文章于 2022-07-18 20:55:56 发布

阅读量996

点赞数

分类专栏： JPA-hibernate-SpringDataJPA database 文章标签：性能

本文链接：https://blog.csdn.net/wwq518/article/details/51407643

版权

JPA-hibernate-SpringDataJPA 同时被 2 个专栏收录

9 篇文章 0 订阅

订阅专栏

database

7 篇文章 0 订阅

订阅专栏

N+1 is one of the most famous question among java developers. It will give some performance issue if you where working on a large project. So it’s better to resolve it using some below techniques.

What is n+1 problem ?
Ans: I am taking one example to let you understand that what n+1 problem is.
Suppose you have a number of students in a college and every students have some number of books. So one to many relation is between student and books. Now suppose that you have to iterate through the collection of student and display all the books name he have. So the query will looks like this:

"select * from Students"
"select * from Books where studentId=?"

Here you have 1 select statement for the student and if you have n number of students you have to fire n more query to select the books. So at the last you have to put n+1 select statement in order to perform this operation.

Now the next question is how to solve it ?
Using join fetching(it will join the parent and children and fetch all the information in a single statement) we can be able to solve n+1 problem. Now our next query will look like this:

"from Students s join fetch s.Books b"

The N+1 query problem is a common performance antipattern. It looks like this:

$cats = load_cats();
foreach ($cats as $cat) {
  $cats_hats = load_hats_for_cat($cat);
  // ...
}

Assuming load_cats() has an implementation that boils down to:

SELECT * FROM cat WHERE ...

and load_hats_for_cat($cat) has an implementation something like this:

SELECT * FROM hat WHERE catID = ...

you will issue “N+1” queries when the code executes, where N is the number of cats:

SELECT * FROM cat WHERE ...
SELECT * FROM hat WHERE catID = 1
SELECT * FROM hat WHERE catID = 2
SELECT * FROM hat WHERE catID = 3
SELECT * FROM hat WHERE catID = 4
SELECT * FROM hat WHERE catID = 5
...

The problem with this is that each query has quite a bit of overhead. It is much faster to issue 1 query which returns 100 results than to issue 100 queries which each return 1 result. This is particularly true if your database is on a different machine which is, say, 1-2ms away on the network. In this case, issuing 100 queries serially has a minimum cost of 100-200ms, even if they can be satisfied instantly by MySQL. This is far higher than the entire server-side generation cost for most Phabricator pages should be.

Batching Queries
Fix the N+1 query problem by batching queries. Load all your data before iterating through it (this is oversimplified and omits error checking):

$cats = load_cats();
$hats = load_all_hats_for_these_cats($cats);
foreach ($cats as $cat) {
  $cats_hats = $hats[$cat->getID()];
}

That is, issue these queries:

SELECT * FROM cat WHERE ...
SELECT * FROM hat WHERE catID IN (1, 2, 3, 4, 5, ...)

In this case, the total number of queries issued is always 2, no matter how many objects there are. You’ve removed the “N” part from the page’s query plan, and are no longer paying the overhead of issuing hundreds of extra queries. This will perform much better (although, as with all performance changes, you should verify this claim by measuring it).

Hibernate:
在Session的缓存（一级）中存放的是相互关联的对象图。
以Customer对象和Order对象为例，默认情况下，当Hibernate从数据库中加载Customer对象时，会同时加载所有关联的Order对象（默认下，立即检索机制）。假定ORDERS表的CUSTOMER_ID外键允许为null。

以下Session的find()方法用于到数据库中检索所有的Customer对象：
List customerLists=session.find(“from Customer as c”);
运行以上find()方法时，Hibernate将先查询CUSTOMERS表中所有的记录，然后根据每条记录的ID，到ORDERS表中查询有参照关系的记录，Hibernate将依次执行以下select语句：
select * from CUSTOMERS;
select * from ORDERS where CUSTOMER_ID=1;
select * from ORDERS where CUSTOMER_ID=2;
select * from ORDERS where CUSTOMER_ID=3;
select * from ORDERS where CUSTOMER_ID=4;
通过以上5条select语句，Hibernate最后加载了4个Customer对象和5个Order对象，在内存中形成了一幅关联的对象图。

以Customer和Order为例，将“N+1”分解开，“N”表示N个客户，“N次select查询”表示根据N个客户来查找对应的Order所执行的select查询；“1次select查询”表示查找客户所执行的select查询。

Hibernate在检索与Customer关联的Order对象时，使用了默认的立即检索策略。这种检索策略存在两大不足：
（1）select语句的数目太多，需要频繁的访问数据库，会影响检索性能。如果需要查询n个Customer对象，那么必须执行n+1次select查询语句。这就是经典的n+1次select查询问题。这种检索策略没有利用SQL的连接查询功能。
例如以上5条select语句完全可以通过以下1条select语句来完成：
select * from CUSTOMERS left outer join ORDERS on CUSTOMERS.ID=ORDERS.CUSTOMER_ID
以上select语句使用了SQL的左外连接查询功能，能够在一条select语句中查询出CUSTOMERS表的所有记录，以及匹配的ORDERS表的记录。

（2）在应用逻辑只需要访问Customer对象，而不需要访问Order对象的场合，加载Order对象完全是多余的操作，这些多余的Order对象白白浪费了许多内存空间。

对于第一点不足可以使用SQL的连接查询功能；
对于第二点不足可以使用hibernate中的延迟加载策略。

为了解决以上问题，Hibernate提供了其他两种检索策略：延迟检索策略和迫切左外连接检索策略。延迟检索策略能避免多余加载应用程序不需要访问的关联对象，迫切左外连接检索策略则充分利用了SQL的外连接查询功能，能够减少select语句的数目。