1. Introduction to Joins
1.1 Introduction to INNER JOIN (video)
1.2 INNER JOIN
PostgreSQL was mentioned in the slides but you’ll find that these joins and the material here applies to different forms of SQL as well.
Recall from the video the basic syntax for an INNER JOIN, here including all columns in both tables:
SELECT *
FROM left_table
INNER JOIN right_table
ON left_table.id = right_table.id;
You’ll start off with a SELECT
statement and then build up to an inner join with the cities
and countries
tables.
Instruction 1:
Begin by selecting all columns from the cities
table.
SELECT *
FROM cities;
Instruction 2:
- Inner join the
cities
table on the left to thecountries
table on the right, keeping all of the fields in both tables. - You should match the tables on the
country_code
field incities
and thecode
field incountries
. - Do not alias your tables here or in the next step. Using
cities
andcountries
is fine for now.
SELECT *
FROM cities
-- 1. Inner join to countries
INNER JOIN countries
-- 2. Match on the country codes
ON cities.country_code = countries.code;
Instruction 3:
Modify the SELECT
statement to keep only the name of the city, the name of the country, and the name of the region the country resides in.
Recall from our Intro to SQL course that you can alias fields using AS
. Alias the name of the city AS city
and the name of the country AS country
.
-- 1. Select name fields (with alias) and region
SELECT cities.name AS city, countries.name as country, region
FROM cities
INNER JOIN countries
ON cities.country_code = countries.code;
1.3 INNER JOIN (2)
Instead of writing the full table name, you can use table aliasing as a shortcut. For tables you also use AS
to add the alias immediately after the table name with a space. Check out the aliasing of cities
and countries
below.
SELECT c1.name AS city, c2.name AS country
FROM cities AS c1
INNER JOIN countries AS c2
ON c1.country_code = c2.code;
Notice that to select a field in your query that appears in multiple tables, you’ll need to identify which table/table alias you’re referring to by using a .
in your SELECT
statement.
You’ll now explore a way to get data from both the countries
and economies
tables to examine the inflation rate for both 2010 and 2015.
Sometimes it’s easier to write SQL code out of order: you write the SELECT
statement after you’ve done the JOIN
.
Instruction
- Join the tables
countries
(left) andeconomies
(right) aliasingcountries AS c
andeconomies AS e
. - Specify the field to match the tables
ON
. - From this join,
SELECT
:c.code
, aliased ascountry_code
.name
,year
, andinflation_rate
, not aliased.
-- 3. Select fields with aliases
SELECT c.code AS country_code, c.name, e.year, e.inflation_rate
FROM countries AS c
-- 1. Join to economies (alias e)
INNER JOIN economies AS e
-- 2. Match on code
ON c.code= e.code;
1.4 INNER JOIN (3)
The ability to combine multiple joins in a single query is a powerful feature of SQL, e.g:
SELECT *
FROM left_table
INNER JOIN right_table
ON left_table.id = right_table.id
INNER JOIN another_table
ON left_table.id = another_table.id;
As you can see here it becomes tedious to continually write long table names in joins. This is when it becomes useful to alias each table using the first letter of its name (e.g. countries AS c
)! It is standard practice to alias in this way and, if you choose to alias tables or are asked to specifically for an exercise in this course, you should follow this protocol.
Now, for each country, you want to get the country name, its region, and the fertility rate and unemployment rate for both 2010 and 2015.
Note that results should work throughout this course with or without table aliasing unless specified differently.
Instruction 1:
- Inner join
countries
(left) andpopulations
(right) on thecode
andcountry_code
fields respectively. - Alias
countries AS c
andpopulations AS p
. - Select
code
,name
, andregion
fromcountries
and also selectyear
andfertility_rate
frompopulations
(5 fields in total).
-- 4. Select fields
SELECT c.code, name, region, year, fertility_rate
-- 1. From countries (alias as c)
FROM countries AS c
-- 2. Join with populations (as p)
INNER JOIN populations AS p
-- 3. Match on country code
ON c.code = p.country_code;
Instruction 2:
- Add an additional inner join with
economies
to your previous query by joining oncode
. - Include the
unemployment_rate
column that became available through joining witheconomies
. - Note that
year
appears in bothpopulations
andeconomies
, so you have to explicitly usee.year
instead ofyear
as you did before.
-- 6. Select fields
SELECT c.code, name, region, e.year, fertility_rate, unemployment_rate
-- 1. From countries (alias as c)
FROM countries AS c
-- 2. Join to populations (as p)
INNER JOIN populations AS p
-- 3. Match on country code
ON c.code = p.country_code
-- 4. Join to economies (as e)
INNER JOIN economies AS e
-- 5. Match on country code
ON c.code = e.code;
Instruction 3:
- Scroll down the query result and take a look at the results for Albania from your previous query. Does something seem off to you?
- The trouble with doing your last join on
c.code = e.code
and not also includingyear
is that e.g. the 2010 value forfertility_rate
is also paired with the 2015 value forunemployment_rate
. - Fix your previous query: in your last
ON
clause, useAND
to add an additional joining condition. In addition to joining oncode
inc
ande
, also join onyear
ine
andp
.
-- 6. Select fields
SELECT c.code, name, region, e.year, fertility_rate, unemployment_rate
-- 1. From countries (alias as c)
FROM countries AS c
-- 2. Join to populations (as p)
INNER JOIN populations AS p
-- 3. Match on country code
ON c.code = p.country_code
-- 4. Join to economies (as e)
INNER JOIN economies AS e
-- 5. Match on country code and year
ON c.code = e.code AND p.year = e.year;
1.5 INNER JOIN via USING (video)
1.6 Review INNER JOIN using ON
Why does the following code result in an error?
SELECT c.name AS country, l.name AS language
FROM countries AS c
INNER JOIN languages AS l;
INNER JOIN
requires a specification of the key field (or fields) in each table.
1.7 INNER JOIN with USING
When joining tables with a common field name, e.g.
SELECT *
FROM countries
INNER JOIN economies
ON countries.code = economies.code
You can use USING
as a shortcut:
SELECT *
FROM countries
INNER JOIN economies
USING(code)
You’ll now explore how this can be done with the countries
and languages
tables.
Instruction:
- Inner join
countries
on the left andlanguages
on the right withUSING(code).
- Select the fields corresponding to:
- country name AS country,
- continent name,
- language name AS language, and
- whether or not the language is official.
Remember to alias your tables using the first letter of their names.
-- 4. Select fields
SELECT c.name AS country, continent, l.name AS language, official
-- 1. From countries (alias as c)
FROM countries as c
-- 2. Join to languages (as l)
INNER JOIN languages as l
-- 3. Match using code
USING(code)
1.8 Self-ish Joins, just in CASE (video)
1.9 Self-join
In this exercise, you’ll use the populations
table to perform a self-join to calculate the percentage increase in population from 2010 to 2015 for each country code!
Since you’ll be joining the populations
table to itself, you can alias populations
as p1
and also populations
as p2
. This is good practice whenever you are aliasing and your tables have the same first letter. Note that you are required to alias the tables with self-joins.
Instruction 1:
- Join
populations
with itselfON
country_code
. - Select the
country_code
fromp1
and thesize
field from bothp1
andp2
. SQL won’t allow same-named fields, so aliasp1.size
assize2010
andp2.size
assize2015
.
-- 4. Select fields with aliases
SELECT p1.country_code,
p1.size AS size2010,
p2.size AS size2015
-- 1. From populations (alias as p1)
FROM populations AS p1
-- 2. Join to itself (alias as p2)
INNER JOIN populations AS p2
-- 3. Match on country code
ON p1.country_code = p2.country_code;
Instruction 2:
Notice from the result that for each country_code
you have four entries laying out all combinations of 2010 and 2015.
Extend the ON
in your query to include only those records where the p1.year
(2010) matches with p2.year - 5
(2015 - 5 = 2010). This will omit the three entries per country_code
that you aren’t interested in.
-- 5. Select fields with aliases
SELECT p1.country_code,
p1.size AS size2010,
p2.size AS size2015
-- 1. From populations (alias as p1)
FROM populations AS p1
-- 2. Join to itself (alias as p2)
INNER JOIN populations AS p2
-- 3. Match on country code
ON p1.country_code = p2.country_code
-- 4. and year (with calculation)
AND p1.year = p2.year - 5;
Instruction 3:
As you just saw, you can also use SQL to calculate values like p2.year - 5
for you. With two fields like size2010
and size2015
, you may want to determine the percentage increase from one field to the next:
With two numeric fields A and B, the percentage growth from A to B can be calculated as (B−A)/A∗100.0.
Add a new field to SELECT
, aliased as growth_perc
, that calculates the percentage population growth from 2010 to 2015 for each country, using p2.size
and p1.size
.
SELECT p1.country_code,
p1.size AS size2010,
p2.size AS size2015,
-- 1. calculate growth_perc