I try to understand how to correct write big queries with a lot of JOIN clause. Are this queries has the same performance?
/*Sql 1*/
SELECT G.ID, T1.QUANTITY, T2.QUANTITY
FROM GOODS G
LEFT JOIN
/*First subquery*/
(SELECT SUM(QUANTITY) QUANTITY, GOOD_ID
FROM MY_TABLE_1
GROUP BY GOOD_ID
) T1 ON G.ID = T1.GOOD_ID
LEFT JOIN
/*Second subquery*/
(SELECT SUM(QUANTITY) QUANTITY, GOOD_ID
GROUP BY GOOD_ID
FROM MY_TABLE_2
) T2 ON G.ID = T2.GOOD_ID
/*and so on....Next same subqueries*/
WHERE G.ID IN (1, 2, 3, 4);
In the next query same WHERE clause set in all joining subqueries and outside LEFT JOIN clause replace on INNER CLAUSE. Is it a good solution? Or oracle optimizer automatic make such things?
/*Sql 2*/
SELECT G.ID, T1.QUANTITY, T2.QUANTITY
FROM GOODS G
INNER JOIN
/*First subquery*/
(SELECT SUM(QUANTITY) QUANTITY, GOOD_ID
FROM GOODS G
LEFT JOIN MY_TABLE_1 M ON M.GOOD_ID = G.ID
WHERE G.ID IN (1, 2, 3, 4);
GROUP BY GOOD_ID
) T1 ON G.ID = T1.GOOD_ID
INNER JOIN
/*Second subquery*/
(SELECT SUM(QUANTITY) QUANTITY, GOOD_ID
FROM GOODS G
LEFT JOIN MY_TABLE_2 M ON M.GOOD_ID = G.ID
WHERE G.ID IN (1, 2, 3, 4);
GROUP BY GOOD_ID
) T2 ON G.ID = T2.GOOD_ID
/*and so on....Next same subqueries*/
WHERE G.ID IN (1, 2, 3, 4);
Query performance will depend on a lot of things including:
- The relative size of the tables involved
- The presence or absence of indices covering join columns and filter criteria
- The currency of the table statistics in the database
- The way that the query is written.
It is absolutely untrue that OUTER joins are faster than INNER joins (although there might be some queries for which that is true). So it is not a great use of your time to try to convert queries to use more OUTER joins based on that theory.
There are entire books written on performance optimization, but some general rules include
-
In your query, do table joins and WHERE criteria in most-selective to least-selective order (i.e. if you have a join that will reduce the size of the return set by 80%, put it up near the top of your query). The optimizer should theoretically be able to re-order the join these criteria based on table statistics, but I have found that giving it that extra help sometimes makes the difference.
-
Learn to read SQL execution plans. They will tell you exactly how the database went about fulfilling your request, and you can learn whether it missed and index (or if the database is missing an index that might be useful for performance), when it was able to use ranges, direct index hits, or full table scans, and the like.
-
It is sometimes useful, with very long-running queries, to divert some preliminary results into a temporary table, index that table appropriately, and then do the rest of your querying against that table. That is particularly true when you need to join or filter by some computed criteria that would force a table scan of a large table when you can find some strategy to pre-filter part of your result by more direct criteria.
- You write "It is absolutely untrue that OUTER joins are faster than INNER joins" and then "..to try to convert queries to use more OUTER joins.." . So what is good practice - use LEFT OUTER JOIN or try replace it on INNER JOIN ? Jul 26, 2013 at 6:20
- I agree that inner joins are much better. An outer join means that Oracle has to make rows on every row. "JOIN" without any term means "INNER JOIN". An OUTER JOIN is only to be used if you need it, as in you want a blank value if there is no matching data. Otherwise use INNER JOIN Jul 26, 2013 at 6:21
- The use of OUTER and INNER joins should be dictated almost entirely by the relationship between the tables in question, and the kind of data you're trying to extract from that relationship. I generally have a small bias towards INNER joins, since they will probably result in less intermediate IO, but you'll have to your your experience and your understanding of the data to guide you.
For best results, cut down your first table as fast as possible so that oracle doesn't create lots of extra rows to throw away. Then add that filter inside the subqueries, not after they complete. I'd try:
SELECT G.ID, T1.QUANTITY, T2.QUANTITY
FROM GOODS G
JOIN DUAL ON G.ID IN (1, 2, 3, 4)
LEFT JOIN (SELECT SUM(QUANTITY) AS QUANTITY, GOOD_ID
FROM MY_TABLE_1 T
GROUP BY GOOD_ID
) T1 ON T1.GOOD_ID = G.ID
LEFT JOIN (SELECT SUM(QUANTITY) AS QUANTITY, GOOD_ID
FROM MY_TABLE_2 T
GROUP BY GOOD_ID
) T2 ON T2.GOOD_ID = G.ID
这里原本最后的where条件应该会起作用的,不需要join dual 多此一举
1,22099 silver badges1212 bronze badges
- I don't understand for what use JOIN DUAL ON G.ID IN (1, 2, 3, 4) and condition " WHERE T.GOOD_ID = G.ID" in subqueries give error Jul 26, 2013 at 6:13
- JOIN DUAL ON G.ID IN (1, 2, 3, 4) is the same as the WHERE clause you have at the end. Doing it this way, if you have 1000's of other rows in GOODS then Oracle does not need to make row joins to all of GOODS' rows that will just be thrown away. I'm not sure why there is an error on the where clauses in the subqueries - you could remove that. The main point was to do the join dual first to only get the four IDs before continuing Jul 26, 2013 at 6:19
- In your query the last two JOIN must be LEFT JOIN Jul 26, 2013 at 6:25
- You mean that per your requirements, they need to be LEFT JOIN in order to show results even if there is no row in T1 or T2? If that is what you need, then yes, but just know that every OUTER JOIN makes your query slower. "LEFT JOIN" = "LEFT OUTER JOIN"
Optimizer takes care of most of the things. You should do following Try to join on primary key If not possible then provide indexes for columns involved in join If possible---hash JOIN 不一定适用
Filter results using where clause to reduce rows in source table before join