代写BA,DS和A&F专业的R和python作业请私信
文章目录
- 1. We'll take the CASE
- 1.1 We'l take the CASE (video)
- 1.2 Basic CASE statements
- 1.3 CASE statements comparing column values
- 1.4 CASE statements comparing two column values part 2
- 1.5 In CASE things get more complex (video)
- 1.6 In CASE of rivalry
- 1.7 Filtering your CASE statement
- 1.8 CASE WHEN with aggregate functions (video)
- 1.9 COUNT using CASE WHEN
- 1.10 COUNT and CASE WHEN with multiple conditions
- 1.11 Calculating percent with CASE and AVG
- 2. Short and Simple Subqueries
- 2.1 WHERE are the Subqueries? (video)
- 2.2 Filtering using scalar subqueries
- 2.3 Filtering using a subquery with a list
- 2.4 Filtering with more complex subquery conditions
- 2.5 Subqueries in FROM (video)
- 2.6 Joining Subqueries in FROM
- 2.7 Building on Subqueries in FROM
- 2.8 Subqueries in SELECT (video)
- 2.9 Add a subquery to the SELECT clause
- 2.10 Subqueries in Select for Calculations
- 2.11 Subqueries everywhere! And best practices! (video)
- 2.12 ALL the Subqueries EVERYWHERE
- 2.13 Add a subquery in FROM
- 2.14 Add a subquery in SELECT
- 3. Correlated Queries, Nested Queries, and Common Table Expressions
- 3.1 Correlated Subqueries (video)
- 3.2 Basic Correlated Subqueries
- 3.3 Correlated subquery with multiple conditions
- 3.4 Nested subqueries (video)
- 3.5 Nested simple subqueries
- 3.6 Nest a subquery in FROM
- 3.7 Common Table Expressions (video)
- 3.8 Clean up with CTEs
- 3.9 Organizing with CTEs
- 3.10 CTEs with nested subqueries
- 3.11 Deciding on techniques to use (video)
- 3.12 Get team names with a subquery
- 3.13 Get team names with correlated subqueries
- 3.14 Get team names with CTEs
- 3.15 Which technique to use?
- 4. Window Functions
- 4.1 It's OVER (video)
- 4.2 The match is OVER
- 4.3 What's OVER here?
- 4.4 Flip OVER your results
- 4.5 OVER with a PARTITION (video)
- 4.6 PARTITION BY a column
- 4.7 PARTITION BY multiple columns
- 4.8 Sliding Windows (video)
- 4.9 Slide to the left
- 4.10 Slide to the right
- 4.11 Bringing it all together (video)
- 4.12 Setting up the home team CTE
- 4.13 Setting up the away team CTE
- 4.14 Putting the CTEs together
- 4.15 Add a window function
1. We’ll take the CASE
1.1 We’l take the CASE (video)
1.2 Basic CASE statements
What is your favorite team?
The European Soccer Database contains data about 12,800 matches from 11 countries played between 2011-2015! Throughout this course, you will be shown filtered versions of the tables in this database in order to better explore their contents.
In this exercise, you will identify matches played between FC Schalke 04 and FC Bayern Munich. There are 2 teams identified in each match in the hometeam_id
and awayteam_id
columns, available to you in the filtered matches_germany
table. ID can join to the team_api_id
column in the teams_germany
table, but you cannot perform a join on both at the same time.
However, you can perform this operation using a CASE
statement once you’ve identified the team_api_id
associated with each team!
Instruction 1
- Select the team’s long name and API id from the
teams_germany
table. - Filter the query for FC Schalke 04 and FC Bayern Munich using
IN
, giving you theteam_api_IDs
needed for the next step.
SELECT
-- Select the team long name and team API id
team_long_name,
team_api_id
FROM teams_germany
-- Only include FC Schalke 04 and FC Bayern Munich
WHERE
team_long_name IN ('FC Schalke 04',
'FC Bayern Munich');
Instruction 2
- Create a
CASE
statement that identifies whether a match in Germany includedFC Bayern Munich
,FC Schalke 04
, or neither as the home team. - Group the query by the
CASE
statement alias,home_team
.
-- Identify the home team as Bayern Munich, Schalke 04, or neither
SELECT
CASE WHEN hometeam_id = 9789 THEN 'FC Schalke 04'
WHen hometeam_id = 9823 THEN 'FC Bayern Munich'
ELSE 'Other' END AS home_team,
COUNT(id) AS total_matches
FROM matches_germany
-- Group by the CASE statement alias
GROUP BY home_team;
1.3 CASE statements comparing column values
Barcelona is considered one of the strongest teams in Spain’s soccer league.
In this exercise, you will be creating a list of matches in the 2011/2012 season where Barcelona was the home team. You will do this using a CASE statement that compares the values of two columns to create a new group – wins, losses, and ties.
In 3 steps, you will build a query that identifies a match’s winner, identifies the identity of the opponent, and finally filters for Barcelona as the home team. Completing a query in this order will allow you to watch your results take shape with each new piece of information.
The matches_spain
table currently contains Barcelona’s matches from the 2011/2012 season, and has two key columns, hometeam_id
and awayteam_id
, that can be joined with the teams_spain
table. However, you can only join teams_spain
to one column at a time.
Instruction 1
- Select the
date
of the match and create aCASE
statement to identify matches as home wins, home losses, or ties.
SELECT
-- Select the date of the match
date,
-- Identify home wins, losses, or ties
CASE WHEN home_goal > away_goal THEN 'Home win!'
WHEN home_goal < away_goal THEN 'Home loss :('
ELSE 'Tie' END AS outcome
FROM matches_spain;
Instruction 2:
- Left join the
teams_spain
tableteam_api_id
column to thematches_spain
tableawayteam_id
. This allows us to retrieve the away team’s identity. - Select
team_long_name
fromteams_spain
asopponent
and complete theCASE
statement from Step 1.
SELECT
m.date,
--Select the team long name column and call it 'opponent'
t.team_long_name AS opponent,
-- Complete the CASE statement with an alias
CASE WHEN m.home_goal > away_goal THEN 'Home win!'
WHEN m.home_goal < away_goal THEN 'Home loss :('
ELSE 'Tie' END AS outcome
FROM matches_spain AS m
-- Left join teams_spain onto matches_spain
LEFT JOIN teams_spain AS t
ON m.awayteam_id = t.team_api_id;
Instruction 3
- Complete the same
CASE
statement as the previous steps. - Filter for matches where the home team is FC Barcelona (id =
8634
).
SELECT
m.date,
t.team_long_name AS opponent,
-- Complete the CASE statement with an alias
CASE WHEN m.home_goal > away_goal THEN 'Barcelona win!'
WHEN m.home_goal < away_goal THEN 'Barcelona loss :('
ELSE 'Tie' END AS outcome
FROM matches_spain AS m
LEFT JOIN teams_spain AS t
ON m.awayteam_id = t.team_api_id
-- Filter for Barcelona as the home team
WHERE m.hometeam_id = 8634;
1.4 CASE statements comparing two column values part 2
Instruction
- Complete the
CASE
statement to identify Barcelona’s away team games (id = 8634)
as wins, losses, or ties. - Left join the
teams_spain
tableteam_api_id
column on thematches_spain
tablehometeam_id
column. This retrieves the identity of the home team opponent. - Filter the query to only include matches where Barcelona was the away team.
-- Select matches where Barcelona was the away team
SELECT
m.date,
t.team_long_name AS opponent,
CASE WHEN m.home_goal < m.away_goal ThEN'Barcelona win!'
WHEN m.home_goal > m.away_goal THEN'Barcelona loss :('
ELSE 'Tie' END AS outcome
FROM
matches_spain AS m
-- Join teams_spain to matches_spain
LEFT JOIN
teams_spain AS t
ON
m.hometeam_id = t.team_api_id
WHERE
m.awayteam_id = 8634;
1.5 In CASE things get more complex (video)
1.6 In CASE of rivalry
Barcelona and Real Madrid have been rival teams for more than 80 years. Matches between these two teams are given the name El Clásico (The Classic). In this exercise, you will query a list of matches played between these two rivals.
You will notice in Step 2 that when you have multiple logical conditions in a CASE statement, you may quickly end up with a large number of WHEN clauses to logically test every outcome you are interested in. It’s important to make sure you don’t accidentally exclude key information in your ELSE clause.
In this exercise, you will retrieve information about matches played between Barcelona (id = 8634
) and Real Madrid (id = 8633
). Note that the query you are provided with already identifies the Clásico matches using a filter in the WHERE
clause.
Instruction 1
- Complete the first
CASE
statement, identifying Barcelona or Real Madrid as the home team using thehometeam_id
column. - Complete the second
CASE
statement in the same way, usingawayteam_id
.
SELECT
date,
-- Identify the home team as Barcelona or Real Madrid
CASE WHEN hometeam_id = 8634 THEN 'FC Barcelona'
ELSE 'Real Madrid CF' END AS home,
-- Identify the away team as Barcelona or Real Madrid
CASE WHEN awayteam_id = 8634 THEN 'FC Barcelona'
ELSE 'Real Madrid CF' END AS away
FROM
matches_spain
WHERE
(awayteam_id = 8634 OR hometeam_id = 8634) AND
(awayteam_id = 8633 OR hometeam_id = 8633);
Instruction 2
- Construct the final
CASE
statement identifying who won each match. Note there are 3 possible outcomes, but 5 conditions that you need to identify. - Fill in the logical operators to identify Barcelona or Real Madrid as the winner.
SELECT
date,
CASE WHEN hometeam_id = 8634 THEN 'FC Barcelona'
ELSE 'Real Madrid CF' END as home,
CASE WHEN awayteam_id = 8634 THEN 'FC Barcelona'
ELSE 'Real Madrid CF' END as away,
-- Identify all possible match outcomes
CASE WHEN home_goal > away_goal AND
hometeam_id = 8634
THEN 'Barcelona win!'
WHEN home_goal > away_goal AND
hometeam_id = 8633
THEN 'Real Madrid win!'
WHEN home_goal < away_goal AND
awayteam_id = 8634
THEN 'Barcelona win!'
WHEN home_goal < away_goal AND
awayteam_id = 8633
THEN 'Real Madrid win!'
ELSE 'Tie!' END AS outcome
FROM
matches_spain
WHERE (awayteam_id = 8634 OR hometeam_id = 8634) AND
(awayteam_id = 8633 OR hometeam_id = 8633);
1.7 Filtering your CASE statement
Let’s generate a list of matches won by Italy’s Bologna team! There are quite a few additional teams in the two tables, so a key part of generating a usable query will be using your CASE
statement as a filter in the WHERE
clause.
CASE
statements allow you to categorize data that you’re interested in – and exclude data you’re not interested in. In order to do this, you can use a CASE statement as a filter in the WHERE statement to remove output you don’t want to see.
Here is how you might set that up:
SELECT *
FROM table
WHERE
CASE WHEN a > 5 THEN 'Keep'
WHEN a <= 5 THEN 'Exclude' END = 'Keep';
In essence, you can use the CASE statement as a filtering column like any other column in your database. The only difference is that you don’t alias the statement in WHERE.
Instruction 1
- Identify
Bologna
’s team ID listed in theteams_italy
table by selecting theteam_long_name
andteam_api_id
.
-- Select team_long_name and team_api_id from team
SELECT
team_long_name,
team_api_id
FROM teams_italy
-- Filter for team name
WHERE team_long_name = 'Bologna';
Instruction 2
- Select the
season
anddate
that a match was played. - Complete the
CASE
statement so that only Bologna’s home and away wins are identified.
-- Select the season and date columns
SELECT
season,
date,
-- Identify when Bologna won a match
CASE WHEN hometeam_id = 9857 AND
home_goal > away_goal THEN 'Bologna Win'
WHEN awayteam_id = 9857 AND
away_goal > home_goal THEN 'Bologna Win'
END AS outcome
FROM matches_italy;
Instruction 3
- Select the
home_goal
andaway_goal
for each match. - Use the
CASE
statement in theWHERE
clause to filter allNULL
values generated by the statement in the previous step.
-- Select the season, date, home_goal, and away_goal columns
SELECT
season,
date,
home_goal,
away_goal
FROM
matches_italy
WHERE
-- Exclude games not won by Bologna
CASE WHEN hometeam_id = 9857 AND
home_goal > away_goal THEN 'Bologna Win'
WHEN awayteam_id = 9857 AND
away_goal > home_goal THEN 'Bologna Win'
END IS NOT NULL;
1.8 CASE WHEN with aggregate functions (video)
1.9 COUNT using CASE WHEN
Instruction 1
- Create a
CASE
statement that identifies theid
of matches played in the 2012/2013 season. Specify that you wantELSE
values to beNULL
. - Wrap the
CASE
statement in aCOUNT
function and group the query by thecountry
alias.
SELECT
c.name AS country,
-- Count games from the 2012/2013 season
COUNT(CASE WHEN m.season = '2012/2013'
THEN m.id
ELSE NULL END) AS matches_2012_2013
FROM country AS c
LEFT JOIN match AS m
ON c.id = m.country_id
-- Group by country name alias
GROUP BY country;
Instruction 2
- Create 3
CASE WHEN
statements counting the matches played in each country across the 3 seasons. END
yourCASE
statement without anELSE
clause.
SELECT
c.name AS country,
-- Count matches in each of the 3 seasons
COUNT(CASE WHEN m.season = '2012/2013'
THEN m.id END) AS matches_2012_2013,
COUNT(CASE WHEN m.season = '2013/2014'
THEN m.id END) AS matches_2013_2014,
COUNT(CASE WHEN m.season = '2014/2015'
THEN m.id END) AS matches_2014_2015
FROM country AS c
LEFT JOIN match AS m
ON c.id = m.country_id
-- Group by country name alias
GROUP BY country;
1.10 COUNT and CASE WHEN with multiple conditions
Instruction
- Create 3
CASE
statements to “count” matches in the'2012/2013'
,'2013/2014'
, and'2014/2015'
seasons, respectively. - Have each
CASE
statement return a1
for every match you want to include, and a0
for every match to exclude. - Wrap the
CASE
statement in aSUM
to return the total matches played in each season. - Group the query by the country name alias.
SELECT
c.name AS country,
-- Sum the total records in each season where the home team won
SUM(CASE WHEN m.season = '2012/2013' AND m.home_goal > m.away_goal
THEN 1 ELSE 0 END) AS matches_2012_2013,
SUM(CASE WHEN m.season = '2013/2014' AND m.home_goal > m.away_goal
THEN 1 ELSE 0 END) AS matches_2013_2014,
SUM(CASE WHEN m.season = '2014/2015' AND m.home_goal > m.away_goal
THEN 1 ELSE 0 END) AS matches_2014_2015
FROM country AS c
LEFT JOIN match AS m
ON c.id = m.country_id
-- Group by country name alias
GROUP BY country;
1.11 Calculating percent with CASE and AVG
Instruction
- Create 3
CASE
statements toCOUNT
the total number of home team wins, away team wins, and ties. This will allow you to examine the total number of records. You will convert this to anAVG
in the next step.
SELECT
c.name AS country,
-- Sum the home wins, away wins, and ties in each country
COUNT(CASE WHEN m.home_goal > m.away_goal
THEN m.id END) AS home_wins,
COUNT(CASE WHEN m.home_goal < m.away_goal
THEN m.id END) AS away_wins,
COUNT(CASE WHEN m.home_goal = m.away_goal
THEN m.id END) AS ties
FROM country AS c
LEFT JOIN matches AS m
ON c.id = m.country_id
GROUP BY country;
Instruction 2
- Calculate the percentage of matches tied using a
CASE
statement insideAVG
. - Fill in the logical operators for each statement. Alias your columns as
ties_2013_2014
andties_2014_2015
, respectively.
SELECT
c.name AS country,
-- Calculate the percentage of tied games in each season
AVG(CASE WHEN m.season='2013/2014' AND
m.home_goal = m.away_goal THEN 1
WHEN m.season='2013/2014' AND
m.home_goal != m.away_goal THEN 0 END) AS ties_2013_2014,
AVG(CASE WHEN m.season='2014/2015' AND
m.home_goal = m.away_goal THEN 1
WHEN m.season='2014/2015' AND
m.home_goal != m.away_goal THEN 0 END) AS ties_2014_2015
FROM country AS c
LEFT JOIN matches AS m
ON c.id = m.country_id
GROUP BY country;
Instruction 3
- The previous “ties” columns returned values with 14 decimal points, which is not easy to interpret. Use the
ROUND
function to round to 2 decimal points.
SELECT
c.name AS country,
-- Round the percentage of tied games to 2 decimal points
ROUND(AVG(CASE WHEN m.season='2013/2014' AND
m.home_goal = m.away_goal THEN 1
WHEN m.season='2013/2014' AND
m.home_goal != m.away_goal THEN 0
END),2) AS pct_ties_2013_2014,
ROUND(AVG(CASE WHEN m.season='2014/2015' AND
m.home_goal = m.away_goal THEN 1
WHEN m.season='2014/2015' AND
m.home_goal != m.away_goal THEN 0
END),2) AS pct_ties_2014_2015
FROM country AS c
LEFT JOIN matches AS m
ON c.id = m.country_id
GROUP BY country;
2. Short and Simple Subqueries
2.1 WHERE are the Subqueries? (video)
2.2 Filtering using scalar subqueries
Instruction 1:
- Calculate triple the average home + away goals scored across all matches. This will become your subquery in the next step. Note that this column does not have an alias, so it will be called
?column?
in your results.
-- Select the average of home + away goals, multiplied by 3
SELECT
3 * AVG(home_goal + away_goal)
FROM matches_2013_2014;
Instruction 2:
SELECT
-- Select the date, home goals, and away goals scored
date,
home_goal,
away_goal
FROM matches_2013_2014
-- Filter for matches where total goals exceeds 3x the average
WHERE (home_goal + away_goal) >
(SELECT 3 * AVG(home_goal + away_goal)
FROM matches_2013_2014);
2.3 Filtering using a subquery with a list
Instruction:
- Create a subquery in the
WHERE
clause that retrieves all uniquehometeam_ID
values from thematch
table. - Select the
team_long_name
andteam_short_name
from theteam
table. Exclude all values from the subquery in the main query.
SELECT
-- Select the team long and short names
team_long_name,
team_short_name
FROM team
-- Exclude all values from the subquery
WHERE team_api_id NOT IN
(SELECT DISTINCT hometeam_ID FROM match);
2.4 Filtering with more complex subquery conditions
Instruction:
- Create a subquery in
WHERE
clause that retrieves allhometeam_ID
values from match with ahome_goal
score greater than or equal to 8. - Select the
team_long_name
andteam_short_name
from theteam
table. Include all values from the subquery in the main query.
SELECT
-- Select the team long and short names
team_long_name,
team_short_name
FROM team
-- Filter for teams with 8 or more home goals
WHERE team_api_id IN
(SELECT hometeam_ID
FROM match
WHERE home_goal >= 8);
2.5 Subqueries in FROM (video)
2.6 Joining Subqueries in FROM
- Create the subquery to be used in the next step, which selects the country ID and match ID (
id
) from thematch
table. - Filter the query for matches with greater than or equal to 10 goals.
Instruction 1:
SELECT
-- Select the country ID and match ID
country_id,
id
FROM match
-- Filter for matches with 10 or more goals in total
WHERE (home_goal + away_goal) >= 10;
Instruction 2:
SELECT
-- Select country name and the count match IDs
c.name AS country_name,
COUNT(sub.id) AS matches
FROM country AS c
-- Inner join the subquery onto country
-- Select the country id and match id columns
INNER JOIN (SELECT country_id, id
FROM match
-- Filter the subquery by matches with 10+ goals
WHERE (home_goal + away_goal) >= 10) AS sub
ON c.id = sub.country_id
GROUP BY country_name;
2.7 Building on Subqueries in FROM
Instruction :
SELECT
-- Select country, date, home, and away goals from the subquery
country,
date,
home_goal,
away_goal
FROM
-- Select country name, date, and total goals in the subquery
(SELECT c.name AS country,
m.date,
m.home_goal,
m.away_goal,
(m.home_goal + m.away_goal) AS total_goals
FROM match AS m
LEFT JOIN country AS c
ON m.country_id = c.id) AS subq
-- Filter by total goals scored in the main query
WHERE total_goals >= 10;
2.8 Subqueries in SELECT (video)
2.9 Add a subquery to the SELECT clause
Instruction:
- In the subquery, select the average total goals by adding
home_goal
andaway_goal
. - Filter the results so that only the average of goals in the 2013/2014 season is calculated.
- In the main query, select the average total goals by adding
home_goal
andaway_goal
. This calculates the average goals for each league. - Filter the results in the main query the same way you filtered the subquery. Group the query by the league name.
SELECT
l.name AS league,
-- Select and round the league's total goals
ROUND(AVG(m.home_goal + m.away_goal), 2) AS avg_goals,
-- Select & round the average total goals for the season
(SELECT round(AVG(home_goal + away_goal), 2)
FROM match
WHERE season = '2013/2014') AS overall_avg
FROM league AS l
LEFT JOIN match AS m
ON l.country_id = m.country_id
-- Filter for the 2013/2014 season
WHERE season = '2013/2014'
GROUP BY league;
2.10 Subqueries in Select for Calculations
Instruction:
- Select the average goals scored in a match for each league in the main query.
- Select the average goals scored in a match overall for the 2013/2014 season in the subquery.
- Subtract the subquery from the average number of goals calculated for each league.
- Filter the main query so that only games from the 2013/2014 season are included.
SELECT
-- Select the league name and average goals scored
l.name AS league,
ROUND(AVG(m.home_goal + m.away_goal),2) AS avg_goals,
-- Subtract the overall average from the league average
ROUND(AVG(m.home_goal + m.away_goal) -
(SELECT AVG(home_goal + away_goal)
FROM match
WHERE season = '2013/2014'),2) AS diff
FROM league AS l
LEFT JOIN match AS m
ON l.country_id = m.country_id
-- Only include 2013/2014 results
WHERE m.season = '2013/2014'
GROUP BY l.name;
2.11 Subqueries everywhere! And best practices! (video)
2.12 ALL the Subqueries EVERYWHERE
Instruction:
- Extract the average number of home and away team goals in two
SELECT
subqueries. - Calculate the average home and away goals for the specific stage in the main query.
- Filter both subqueries and the main query so that only data from the 2012/2013 season is included.
- Group the query by the
m.stage
column.
SELECT
-- Select the stage and average goals for each stage
m.stage,
ROUND(AVG(m.home_goal + m.away_goal),2) AS avg_goals,
-- Select the average overall goals for the 2012/2013 season
ROUND((SELECT AVG(home_goal + away_goal)
FROM match
WHERE season = '2012/2013'),2) AS overall
FROM match AS m
-- Filter for the 2012/2013 season
WHERE m.season = '2012/2013'
-- Group by stage
GROUP BY stage;
2.13 Add a subquery in FROM
Instruction:
- Calculate the average home goals and average away goals from the match table for each stage in the
FROM
clause subquery. - Add a subquery to the
WHERE
clause that calculates the overall average home goals. - Filter the main query for stages where the average home goals is higher than the overall average.
- Select the
stage
andavg_goals
columns from thes
subquery into the main query.
SELECT
-- Select the stage and average goals from the subquery
s.stage,
ROUND(s.avg_goals,2) AS avg_goals
FROM
-- Select the stage and average goals in 2012/2013
(SELECT
stage,
AVG(home_goal + away_goal) AS avg_goals
FROM match
WHERE season = '2012/2013'
GROUP BY stage) AS s
WHERE
-- Filter the main query using the subquery
s.avg_goals > (SELECT AVG(home_goal + away_goal)
FROM match WHERE season = '2012/2013');
2.14 Add a subquery in SELECT
Instruction:
- Create a subquery in
SELECT
that yields the average goals scored in the 2012/2013 season. Name the new columnoverall_avg
. - Create a subquery in
FROM
that calculates the average goals scored in each stage during the 2012/2013 season. - Filter the main query for stages where the average goals exceeds the overall average in 2012/2013.
SELECT
-- Select the stage and average goals from s
s.stage,
ROUND(s.avg_goals,2) AS avg_goal,
-- Select the overall average for 2012/2013
(SELECT AVG(home_goal + away_goal) FROM match WHERE season = '2012/2013') AS overall_avg
FROM
-- Select the stage and average goals in 2012/2013 from match
(SELECT
stage,
AVG(home_goal + away_goal) AS avg_goals
FROM match
WHERE season = '2012/2013'
GROUP BY stage) AS s
WHERE
-- Filter the main query using the subquery
s.avg_goals > (SELECT AVG(home_goal + away_goal)
FROM match WHERE season = '2012/2013');
3. Correlated Queries, Nested Queries, and Common Table Expressions
3.1 Correlated Subqueries (video)
3.2 Basic Correlated Subqueries
Instruction:
- Select the
country_id
,date
,home_goal
, andaway_goal
columns in the main query. - Complete the
AVG
value in the subquery. - Complete the subquery column references, so that
country_id
is matched in the main and subquery.
SELECT
-- Select country ID, date, home, and away goals from match
main.country_id,
date,
main.home_goal,
main.away_goal
FROM match AS main
WHERE
-- Filter the main query by the subquery
(home_goal + away_goal) >
(SELECT AVG((sub.home_goal + sub.away_goal) * 3)
FROM match AS sub
-- Join the main query to the subquery in WHERE
WHERE main.country_id = sub.country_id);
3.3 Correlated subquery with multiple conditions
- Select the
country_id
,date
,home_goal
, andaway_goal
columns in the main query. - Complete the subquery: Select the matches with the highest number of total goals.
- Match the subquery to the main query using
country_id
andseason
. - Fill in the correct logical operator so that total goals equals the max goals recorded in the subquery.
Instruction 1:
SELECT
-- Select country ID, date, home, and away goals from match
main.country_id,
main.date,
main.home_goal,
main.away_goal
FROM match AS main
WHERE
-- Filter for matches with the highest number of goals scored
(home_goal + away_goal) =
(SELECT MAX(sub.home_goal + sub.away_goal)
FROM match AS sub
WHERE main.country_id = sub.country_id
AND main.season = sub.season);
3.4 Nested subqueries (video)
3.5 Nested simple subqueries
Instruction 1:
- Complete the main query to select the season and the max total goals in a match for each season. Name this
max_goals
. - Complete the first simple subquery to select the max total goals in a match across all seasons. Name this
overall_max_goals
. - Complete the nested subquery to select the maximum total goals in a match played in July across all seasons.
- Select the maximum total goals in the outer subquery. Name this entire subquery
july_max_goals
.
SELECT
-- Select the season and max goals scored in a match
season,
MAX(home_goal + away_goal) AS max_goals,
-- Select the overall max goals scored in a match
(SELECT MAX(home_goal + away_goal) FROM match) AS overall_max_goals,
-- Select the max number of goals scored in any match in July
(SELECT MAX(home_goal + away_goal)
FROM match
WHERE id IN (
SELECT id FROM match WHERE EXTRACT(MONTH FROM date) = 07)) AS july_max_goals
FROM match
GROUP BY season;
3.6 Nest a subquery in FROM
Instruction 1:
- Generate a list of matches where at least one team scored 5 or more goals.
-- Select matches where a team scored 5+ goals
SELECT
country_id,
season,
id
FROM match
WHERE home_goal > 5 OR away_goal > 5;
Instruction 2:
- Turn the query from the previous step into a subquery in the
FROM
statement. COUNT
the matchid
s generated in the previous step, and group the query bycountry_id
andseason
.
-- Count match ids
SELECT
country_id,
season,
count(id) AS matches
-- Set up and alias the subquery
FROM (
SELECT
country_id,
season,
id
FROM match
WHERE home_goal >= 5 OR away_goal >= 5) AS subquery
-- Group by country_id and season
GROUP BY country_id, season;
Instruction 3:
- Finally, declare the same query from step 2 as a subquery in
FROM
with the aliasouter_s
. Left join it to thecountry
table using the outer query’scountry_id
column. - Calculate an
AVG
of high scoring matches per season in the main query.
SELECT
c.name AS country,
-- Calculate the average matches per season
AVG(id) AS avg_seasonal_high_scores
FROM country AS c
-- Left join outer_s to country
LEFT JOIN (
SELECT country_id, season,
COUNT(id) AS matches
FROM (
SELECT country_id, season, id
FROM match
WHERE home_goal >= 5 OR away_goal >= 5) AS inner_s
-- Close parentheses and alias the subquery
GROUP BY country_id, season ) AS outer_s
ON c.id = outer_s.country_id
GROUP BY country;
3.7 Common Table Expressions (video)
3.8 Clean up with CTEs
Instruction:
- Complete the syntax to declare your CTE.
- Select the
country_id
and matchid
from thematch
table in your CTE. - Left join the CTE to the league table using
country_id
.
-- Set up your CTE
WITH match_list AS (
SELECT
country_id,
id
FROM match
WHERE (home_goal + away_goal) >= 10)
-- Select league and count of matches from the CTE
SELECT
l.name AS league,
COUNT(match_list.id) AS matches
FROM league AS l
-- Join the CTE to the league table
LEFT JOIN match_list ON l.id = match_list.country_id
GROUP BY l.name;
3.9 Organizing with CTEs
Instruction:
- Declare your CTE, where you create a list of all matches with the league name.
- Select the league, date, home, and away goals from the CTE.
- Filter the main query for matches with 10 or more goals.
-- Set up your CTE
WITH match_list AS (
-- Select the league, date, home, and away goals
SELECT
l.name AS league,
m.date,
m.home_goal,
m.away_goal,
(m.home_goal + m.away_goal) AS total_goals
FROM match AS m
LEFT JOIN league as l ON m.country_id = l.id)
-- Select the league, date, home, and away goals from the CTE
SELECT league, date, home_goal, away_goal
FROM match_list
-- Filter by total goals
WHERE total_goals >= 10;
3.10 CTEs with nested subqueries
Instruction 1:
- Declare a CTE that calculates the total goals from matches in August of the 2013/2014 season.
- Left join the CTE onto the league table using
country_id
from thematch_list
CTE. - Filter the list on the inner subquery to only select matches in August of the 2013/2014 season.
-- Set up your CTE
WITH match_list AS (
SELECT
country_id,
(home_goal + away_goal) AS goals
FROM match
-- Create a list of match IDs to filter data in the CTE
WHERE id IN (
SELECT id
FROM match
WHERE season = '2013/2014' AND EXTRACT(MONTH FROM date) = 8))
-- Select the league name and average of goals in the CTE
SELECT
l.name,
AVG(match_list.goals)
FROM league AS l
-- Join the CTE onto the league table
LEFT JOIN match_list ON l.id = match_list.country_id
GROUP BY l.name;
3.11 Deciding on techniques to use (video)
3.12 Get team names with a subquery
Instruction 1:
- Create a query that left joins
team
tomatch
in order to get the identity of the home team. This becomes the subquery in the next step.
SELECT
m.id,
t.team_long_name AS hometeam
-- Left join team to match
FROM match AS m
LEFT JOIN team as t
ON m.hometeam_id = team_api_id;
Instruction 2:
- Add a second subquery to the
FROM
statement to get the away team name, changing only thehometeam_id
. Left join both subqueries to thematch
table on theid
column.
SELECT
m.date,
-- Get the home and away team names
hometeam,
awayteam,
m.home_goal,
m.away_goal
FROM match AS m
-- Join the home subquery to the match table
LEFT JOIN (
SELECT match.id, team.team_long_name AS hometeam
FROM match
LEFT JOIN team
ON match.hometeam_id = team.team_api_id) AS home
ON home.id = m.id
-- Join the away subquery to the match table
LEFT JOIN (
SELECT match.id, team.team_long_name AS awayteam
FROM match
LEFT JOIN team
-- Get the away team ID in the subquery
ON match.awayteam_id = team.team_api_id) AS away
ON away.id = m.id;
3.13 Get team names with correlated subqueries
Instruction 1:
- Using a correlated subquery in the
SELECT
statement, match theteam_api_id
column fromteam
to thehometeam_id
frommatch
.
SELECT
m.date,
(SELECT team_long_name
FROM team AS t
-- Connect the team to the match table
WHERE t.team_api_id = m.hometeam_id) AS hometeam
FROM match AS m;
Instruction 2:
SELECT
m.date,
(SELECT team_long_name
FROM team AS t
WHERE t.team_api_id = m.hometeam_id) AS hometeam,
-- Connect the team to the match table
(SELECT team_long_name
FROM team AS t
WHERE t.team_api_id = m.awayteam_id) AS awayteam,
-- Select home and away goals
home_goal,
away_goal
FROM match AS m;
3.14 Get team names with CTEs
Instruction 1:
- Select
id
frommatch
andteam_long_name
fromteam
. Join these two tables together onhometeam_id
inmatch
andteam_api_id
inteam
.
SELECT
-- Select match id and team long name
m.id,
t.team_long_name AS hometeam
FROM match AS m
-- Join team to match using team_api_id and hometeam_id
LEFT JOIN team AS t
ON t.team_api_id = m.hometeam_id;
Instruction 2:
- Declare the query from the previous step as a common table expression.
SELECT
everything from the CTE into the main query. Your results will not change at this step!
-- Declare the home CTE
WITH home AS (
SELECT m.id, t.team_long_name AS hometeam
FROM match AS m
LEFT JOIN team AS t
ON m.hometeam_id = t.team_api_id)
-- Select everything from home
SELECT *
FROM home;
Instruction 3:
- Let’s declare the second CTE,
away
. Join it to the first CTE on theid
column. - The
date
,home_goal
, andaway_goal
columns have been added to the CTEs.SELECT
them into the main query.
WITH home AS (
SELECT m.id, m.date,
t.team_long_name AS hometeam, m.home_goal
FROM match AS m
LEFT JOIN team AS t
ON m.hometeam_id = t.team_api_id),
-- Declare and set up the away CTE
away AS (
SELECT m.id, m.date,
t.team_long_name AS awayteam, m.away_goal
FROM match AS m
LEFT JOIN team AS t
ON m.awayteam_id = t.team_api_id)
-- Select date, home_goal, and away_goal
SELECT
home.date,
home.hometeam,
away.awayteam,
home.home_goal,
away.away_goal
-- Join away and home on the id column
FROM home
INNER JOIN away
ON home.id = away.id;
3.15 Which technique to use?
4. Window Functions
4.1 It’s OVER (video)
4.2 The match is OVER
Instruction 1:
- Select the match ID, country name, season, home, and away goals from the
match
andcountry
tables. - Complete the query that calculates the average number of goals scored overall and then includes the aggregate value in each row using a window function.
SELECT
-- Select the id, country name, season, home, and away goals
m.id,
c.name AS country,
m.season,
m.home_goal,
m.away_goal,
-- Use a window to include the aggregate average in each row
AVG(m.home_goal + m.away_goal) OVER() AS overall_avg
FROM match AS m
LEFT JOIN country AS c ON m.country_id = c.id;
4.3 What’s OVER here?
Instruction:
- Select the league name and average total goals scored from
league
andmatch
. - Complete the window function so it calculates the rank of average goals scored across all leagues in the database.
- Order the rank by the average total of home and away goals scored.
SELECT
-- Select the league name and average goals scored
l.name AS league,
AVG(m.home_goal + m.away_goal) AS avg_goals,
-- Rank each league according to the average goals
RANK() OVER(ORDER BY AVG(m.home_goal + m.away_goal)) AS league_rank
FROM league AS l
LEFT JOIN match AS m
ON l.id = m.country_id
WHERE m.season = '2011/2012'
GROUP BY l.name
-- Order the query by the rank you created
ORDER BY league_rank;
4.4 Flip OVER your results
Instruction 1:
- Complete the same parts of the query as the previous exercise.
- Complete the window function to rank each league from highest to lowest average goals scored.
- Order the main query by the rank you just created.
SELECT
-- Select the league name and average goals scored
l.name AS league,
AVG(m.home_goal + m.away_goal) AS avg_goals,
-- Rank leagues in descending order by average goals
RANK () OVER(ORDER BY AVG(m.home_goal + m.away_goal) DESC) AS league_rank
FROM league AS l
LEFT JOIN match AS m
ON l.id = m.country_id
WHERE m.season = '2011/2012'
GROUP BY l.name
-- Order the query by the rank you created
ORDER BY league_rank;
4.5 OVER with a PARTITION (video)
4.6 PARTITION BY a column
Instruction 1:
- Complete the two window functions that calculate the home and away goal averages. Partition the window functions by season to calculate separate averages for each season.
- Filter the query to only include matches played by Legia Warszawa,
id = 8673
.
SELECT
date,
season,
home_goal,
away_goal,
CASE WHEN hometeam_id = 8673 THEN 'home'
ELSE 'away' END AS warsaw_location,
-- Calculate the average goals scored partitioned by season
AVG(home_goal) OVER (PARTITION BY season) AS season_homeavg,
AVG(away_goal) OVER (PARTITION BY season) AS season_awayavg
FROM match
-- Filter the data set for Legia Warszawa matches only
WHERE
hometeam_id = 8673
OR awayteam_id = 8673
ORDER BY (home_goal + away_goal) DESC;
4.7 PARTITION BY multiple columns
- Construct two window functions partitioning the average of home and away goals by season and month.
- Filter the data set by Legia Warszawa’s team ID (
8673
) so that the window calculation excludes all teams who did not play against them.
Instruction:
SELECT
date,
season,
home_goal,
away_goal,
CASE WHEN hometeam_id = 8673 THEN 'home'
ELSE 'away' END AS warsaw_location,
-- Calculate average goals partitioned by season and month
AVG(home_goal) OVER(PARTITION BY season,
EXTRACT(month FROM date)) AS season_mo_home,
AVG(away_goal) OVER(PARTITION BY season,
EXTRACT(month FROM date)) AS season_mo_away
FROM match
WHERE
hometeam_id = 8673
OR awayteam_id = 8673
ORDER BY (home_goal + away_goal) DESC;
4.8 Sliding Windows (video)
4.9 Slide to the left
- Complete the window function by:
- Assessing the running total of home goals scored by FC Utrecht.
- Assessing the running average of home goals scored.
- Ordering both the running average and running total by
date
.
Instruction:
SELECT
date,
home_goal,
away_goal,
-- Create a running total and running average of home goals
SUM(home_goal) OVER(ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total,
AVG(home_goal) OVER(ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_avg
FROM match
WHERE
hometeam_id = 9908
AND season = '2011/2012';
4.10 Slide to the right
Instruction 1:
- Complete the window function by:
- Assessing the running total of home goals scored by FC Utrecht.
- Assessing the running average of home goals scored.
- Ordering both the running average and running total by date, descending.
SELECT
-- Select the date, home goal, and away goals
date,
home_goal,
away_goal,
-- Create a running total and running average of home goals
SUM(home_goal) OVER(ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total,
AVG(home_goal) OVER(ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_avg
FROM match
WHERE
awayteam_id = 9908
AND season = '2011/2012';
4.11 Bringing it all together (video)
4.12 Setting up the home team CTE
Instruction:
- Create a
CASE
statement that identifies each match as a win, lose, or tie for Manchester United. - Fill out the logical operators for each
WHEN
clause in theCASE
statement (equals, greater than, less than). - Join the tables on home team ID from
match
, andteam_api_id
fromteam
. - Filter the query to only include games from the 2014/2015 season where Manchester United was the home team.
SELECT
m.id,
t.team_long_name,
-- Identify matches as home/away wins or ties
CASE WHEN m.home_goal > m.away_goal THEN 'MU Win'
WHEN m.home_goal < m.away_goal THEN 'MU Loss'
ELSE 'Tie' END AS outcome
FROM match AS m
-- Left join team on the home team ID and team API id
LEFT JOIN team AS t
ON m.hometeam_id = t.team_api_id
WHERE
-- Filter for 2014/2015 and Manchester United as the home team
season = '2014/2015'
AND t.team_long_name = 'Manchester United';
4.13 Setting up the away team CTE
Instruction:
- Complete the
CASE
statement syntax. - Fill out the logical operators identifying each match as a win, loss, or tie for Manchester United.
- Join the table on
awayteam_id
, andteam_api_id
.
SELECT
m.id,
t.team_long_name,
-- Identify matches as home/away wins or ties
CASE WHEN m.home_goal > m.away_goal THEN 'MU Loss'
WHEN m.home_goal < m.away_goal THEN 'MU Win'
ELSE 'Tie' END AS outcome
-- Join team table to the match table
FROM match AS m
LEFT JOIN team AS t
ON m.awayteam_id = t.team_api_id
WHERE
-- Filter for 2014/2015 and Manchester United as the away team
season = '2014/2015'
AND t.team_long_name = 'Manchester United';
4.14 Putting the CTEs together
Instruction:
- Declare the
home
andaway
CTEs before your main query. - Join your CTEs to the match table using a
LEFT JOIN
. - Select the relevant data from the CTEs into the main query.
- Select the date from
match
, team names from the CTEs, and home/ away goals frommatch
in the main query.
-- Set up the home team CTE
WITH home AS (
SELECT m.id, t.team_long_name,
CASE WHEN m.home_goal > m.away_goal THEN 'MU Win'
WHEN m.home_goal < m.away_goal THEN 'MU Loss'
ELSE 'Tie' END AS outcome
FROM match AS m
LEFT JOIN team AS t ON m.hometeam_id = t.team_api_id),
-- Set up the away team CTE
away AS (
SELECT m.id, t.team_long_name,
CASE WHEN m.home_goal > m.away_goal THEN 'MU Win'
WHEN m.home_goal < m.away_goal THEN 'MU Loss'
ELSE 'Tie' END AS outcome
FROM match AS m
LEFT JOIN team AS t ON m.awayteam_id = t.team_api_id)
-- Select team names, the date and goals
SELECT DISTINCT
m.date,
home.team_long_name AS home_team,
away.team_long_name AS away_team,
m.home_goal,
m.away_goal
-- Join the CTEs onto the match table
FROM match AS m
LEFT JOIN home ON m.id = home.id
LEFT JOIN away ON m.id = away.id
WHERE m.season = '2014/2015'
AND (home.team_long_name = 'Manchester United'
OR away.team_long_name = 'Manchester United');
4.15 Add a window function
Instruction 1:
- Set up the CTEs so that the home and away teams each have a name, ID, and score associated with them.
- Select the date, home team name, away team name, home goal, and away goals scored in the main query.
- Rank the matches and order by the difference in scores in descending order.
-- Set up the home team CTE
WITH home AS (
SELECT m.id, t.team_long_name,
CASE WHEN m.home_goal > m.away_goal THEN 'MU Win'
WHEN m.home_goal < m.away_goal THEN 'MU Loss'
ELSE 'Tie' END AS outcome
FROM match AS m
LEFT JOIN team AS t ON m.hometeam_id = t.team_api_id),
-- Set up the away team CTE
away AS (
SELECT m.id, t.team_long_name,
CASE WHEN m.home_goal > m.away_goal THEN 'MU Loss'
WHEN m.home_goal < m.away_goal THEN 'MU Win'
ELSE 'Tie' END AS outcome
FROM match AS m
LEFT JOIN team AS t ON m.awayteam_id = t.team_api_id)
-- Select columns and and rank the matches by date
SELECT DISTINCT
m.date,
home.team_long_name AS home_team,
away.team_long_name AS away_team,
m.home_goal, m.away_goal,
RANK() OVER(ORDER BY ABS(home_goal - away_goal) DESC) as match_rank
-- Join the CTEs onto the match table
FROM match AS m
LEFT JOIN home ON m.id = home.id
LEFT JOIN away ON m.id = away.id
WHERE m.season = '2014/2015'
AND ((home.team_long_name = 'Manchester United' AND home.outcome = 'MU Loss')
OR (away.team_long_name = 'Manchester United' AND away.outcome = 'MU Loss'));