Back-end Engineering Articles

I write and talk about backend stuff like Ruby, Ruby On Rails, Databases, Testing, Architecture / Infrastructure / System Design, Cloud, DevOps, Backgroud Jobs, and more...

Twitter:
@daniel_moralesp

2019-11-22

Introduction to SQL - Joins

Now that we know how to set and work with multiple tables, it is time to join all of these tables in some manner, just to have more sense about the data stored right there or because we want to search for a particular data.

To do so, we have to know another concept called: Joins. 

Database Design

Note: we'll continue working with tables created in the last blog post, so please read it and execute it to have the proper context about the current topic.

In that blog post, we had these tables.


We tried to figure out how a record looks like following the Foreign Keys. For instance, if we followed the green path that belongs to Order with id 1, we have this:


We must first look at the table Order because it contains the foreign key, and we can follow the right path. So let's follow the green-way:

* Order with id 1 belongs to a customer with id 2, Ana.
* Order with id 1 belongs to a subscription with id 1, The Time Magazine.
* This means that Ana is the owner of the Order with id 1, and she bought a subscription to Time Magazine, and the Order was made at 01-10-2021


Let's follow the other path. Let's call it the red path.


In this case
* Order with id 2 belongs to a customer with id 1, Justin.
* Order with id 2 belongs to a subscription with id 2, The Economist.
* This means that Justin is the owner of the Order with id 2, and he bought a subscription to The Economist, and the Order was made at 02-02-2022

Current Data

Now, let's check what's inside our current tables.

Note: if you followed the last blog post, you have these tables created in PostgreSQL

First, let's explore the current data in Orders and Customers.

Orders

newsletters=# SELECT * FROM orders;
 order_id | customer_id | subscription_id |    purchase_date    
----------+-------------+-----------------+---------------------
        1 |           3 |               2 | 2021-05-14 14:51:53
        2 |           2 |               4 | 2021-05-15 17:51:53
        3 |           3 |               4 | 2021-05-18 13:51:53
        4 |           9 |               9 | 2021-05-21 17:51:53
        5 |           7 |               5 | 2021-06-02 12:51:53
        6 |           8 |               2 | 2021-06-14 13:51:53
        7 |           5 |               8 | 2021-06-14 17:51:53
        8 |           9 |               5 | 2021-06-16 19:51:53
        9 |           4 |               4 | 2021-06-17 20:51:53
       10 |           1 |               7 | 2021-07-18 22:51:53
       11 |           5 |               4 | 2021-07-20 17:51:53
       12 |           3 |               2 | 2021-09-21 17:51:53
       13 |           3 |               5 | 2021-12-01 18:51:53
       14 |           6 |               5 | 2021-12-14 16:51:53
       15 |           1 |               2 | 2021-12-15 15:51:53
       16 |           1 |               2 | 2021-12-16 14:51:53
       17 |           3 |               6 | 2021-12-17 13:51:53
       18 |           3 |               8 | 2021-12-18 12:51:53
       19 |           4 |               9 | 2021-12-19 11:51:53
       20 |           1 |               7 | 2021-12-19 10:51:53
(20 rows)




Customers

newsletters=# SELECT * FROM customers;
 customer_id |   customer_name   |      address      
-------------+-------------------+-------------------
           1 | Allie Rahaim      | 123 Broadway
           2 | Jacquline Diddle  | 456 Park Ave.
           3 | Lizabeth Letsche  | 789 Main St.
           4 | Jessia Butman     | 1 Columbus Ave.
           5 | Inocencia Goyco   | 12 Amsterdam Ave.
           6 | Bethann Schraub   | 29 Monticello
           7 | Janay Priolo      | 81 Harrisburg
           8 | Ophelia Sturdnant | 31 Deerfield Ave.
           9 | Eryn Vilar        | 56 Morton St.
          10 | Jina Farraj       | 100 Bryan Ave.
(10 rows)



Subscriptions

newsletters=# SELECT * FROM subscriptions;
 subscription_id |    description    | price_per_month | subscription_length 
-----------------+-------------------+-----------------+---------------------
               1 | Politics Magazine |              10 |                  12
               2 | Politics Magazine |              11 |                   6
               3 | Politics Magazine |              12 |                   3
               4 | Fashion Magazine  |              15 |                  12
               5 | Fashion Magazine  |              17 |                   6
               6 | Fashion Magazine  |              19 |                   3
               7 | Sports Magazine   |              11 |                  12
               8 | Sports Magazine   |              12 |                   6
               9 | Sports Magazine   |              13 |                   3
(9 rows)


Let's follow the first Order.

 order_id | customer_id | subscription_id |    purchase_date    
----------+-------------+-----------------+---------------------
        1 |           3 |               2 | 2021-05-14 14:51:53

This means that order 1 belongs to the customer with id 3, Lizabeth Letsche, and lives in 789 Main St. Lizabeth has subscribed to subscription id 2 or Politics Magazine. 

We should go deeper in our data understanding, so let's do it. But, first, just pay attention to orders tables.

 order_id | customer_id | subscription_id |    purchase_date    
----------+-------------+-----------------+---------------------
        1 |           3 |               2 | 2021-05-14 14:51:53
        2 |           2 |               4 | 2021-05-15 17:51:53
        3 |           3 |               4 | 2021-05-18 13:51:53
        4 |           9 |               9 | 2021-05-21 17:51:53
        5 |           7 |               5 | 2021-06-02 12:51:53
        6 |           8 |               2 | 2021-06-14 13:51:53
        7 |           5 |               8 | 2021-06-14 17:51:53
        8 |           9 |               5 | 2021-06-16 19:51:53
        9 |           4 |               4 | 2021-06-17 20:51:53
       10 |           1 |               7 | 2021-07-18 22:51:53
       11 |           5 |               4 | 2021-07-20 17:51:53
       12 |           3 |               2 | 2021-09-21 17:51:53
       13 |           3 |               5 | 2021-12-01 18:51:53
       14 |           6 |               5 | 2021-12-14 16:51:53
       15 |           1 |               2 | 2021-12-15 15:51:53
       16 |           1 |               2 | 2021-12-16 14:51:53
       17 |           3 |               6 | 2021-12-17 13:51:53
       18 |           3 |               8 | 2021-12-18 12:51:53
       19 |           4 |               9 | 2021-12-19 11:51:53
       20 |           1 |               7 | 2021-12-19 10:51:53


We see that the customer with id 3 has made 6 orders, and it seems that he is likely our best customer (but we don't know the customer's name). On the other hand, the subscription id 2 has 5 orders and seems like our best-selling subscription (but we don't know the newsletter name).

The critical thing to consider here is: If we only look at the orders table, we cannot tell what is really going on in each Order, but if we refer to the other tables, we can get a complete picture. We call these "joining tables."

INNER JOIN

In SQL, we combine tables with a statement called JOIN. If we want to connect orders with customers, we can do something like this:

newsletters=# SELECT * FROM orders
newsletters-# JOIN customers
newsletters-# ON orders.customer_id = customers.customer_id;


 order_id | customer_id | subscription_id |    purchase_date    | customer_id |   customer_name   |      address      
----------+-------------+-----------------+---------------------+-------------+-------------------+-------------------
        1 |           3 |               2 | 2021-05-14 14:51:53 |           3 | Lizabeth Letsche  | 789 Main St.
        2 |           2 |               4 | 2021-05-15 17:51:53 |           2 | Jacquline Diddle  | 456 Park Ave.
        3 |           3 |               4 | 2021-05-18 13:51:53 |           3 | Lizabeth Letsche  | 789 Main St.
        4 |           9 |               9 | 2021-05-21 17:51:53 |           9 | Eryn Vilar        | 56 Morton St.
        5 |           7 |               5 | 2021-06-02 12:51:53 |           7 | Janay Priolo      | 81 Harrisburg
        6 |           8 |               2 | 2021-06-14 13:51:53 |           8 | Ophelia Sturdnant | 31 Deerfield Ave.
        7 |           5 |               8 | 2021-06-14 17:51:53 |           5 | Inocencia Goyco   | 12 Amsterdam Ave.
        8 |           9 |               5 | 2021-06-16 19:51:53 |           9 | Eryn Vilar        | 56 Morton St.
        9 |           4 |               4 | 2021-06-17 20:51:53 |           4 | Jessia Butman     | 1 Columbus Ave.
       10 |           1 |               7 | 2021-07-18 22:51:53 |           1 | Allie Rahaim      | 123 Broadway
       11 |           5 |               4 | 2021-07-20 17:51:53 |           5 | Inocencia Goyco   | 12 Amsterdam Ave.
       12 |           3 |               2 | 2021-09-21 17:51:53 |           3 | Lizabeth Letsche  | 789 Main St.
       13 |           3 |               5 | 2021-12-01 18:51:53 |           3 | Lizabeth Letsche  | 789 Main St.
       14 |           6 |               5 | 2021-12-14 16:51:53 |           6 | Bethann Schraub   | 29 Monticello
       15 |           1 |               2 | 2021-12-15 15:51:53 |           1 | Allie Rahaim      | 123 Broadway
       16 |           1 |               2 | 2021-12-16 14:51:53 |           1 | Allie Rahaim      | 123 Broadway
       17 |           3 |               6 | 2021-12-17 13:51:53 |           3 | Lizabeth Letsche  | 789 Main St.
       18 |           3 |               8 | 2021-12-18 12:51:53 |           3 | Lizabeth Letsche  | 789 Main St.
       19 |           4 |               9 | 2021-12-19 11:51:53 |           4 | Jessia Butman     | 1 Columbus Ave.
       20 |           1 |               7 | 2021-12-19 10:51:53 |           1 | Allie Rahaim      | 123 Broadway
(20 rows)


Now we have a better picture of the data. If the customer with id 3 is our best customer, we can identify easier that the customer is: Lizabeth Letsche. Now let's dissect the statement created. 

newsletters=# SELECT * FROM orders
newsletters-# JOIN customers
newsletters-# ON orders.customer_id = customers.customer_id;

    • * The first line selects all the combined tables' columns (with the command SELECT *). If we only want some of them, we must specify them. Then it sets the first table on which we want to make the query (orders in our case)
    • * The second line uses JOIN to say that you want to combine the information from orders with customers.
    • * The third line tells how to combine the two tables. We want to match the customer_id of orders with the customer_id of customers

  • Because names may be repeated in multiple tables, we use the table_name.column_name syntax to ensure an unambiguous query. We can use it in ON or in SELECT this syntax. Instead of selecting all columns (*), we will choose order_id from orders and customer_name from customers.


newsletters=# SELECT orders.order_id, customers.customer_name
newsletters-# FROM orders
newsletters-# JOIN customers
newsletters-# ON orders.customer_id = customers.customer_id;
 order_id |   customer_name   
----------+-------------------
        1 | Lizabeth Letsche
        2 | Jacquline Diddle
        3 | Lizabeth Letsche
        4 | Eryn Vilar
        5 | Janay Priolo
        6 | Ophelia Sturdnant
        7 | Inocencia Goyco
        8 | Eryn Vilar
        9 | Jessia Butman
       10 | Allie Rahaim
       11 | Inocencia Goyco
       12 | Lizabeth Letsche
       13 | Lizabeth Letsche
       14 | Bethann Schraub
       15 | Allie Rahaim
       16 | Allie Rahaim
       17 | Lizabeth Letsche
       18 | Lizabeth Letsche
       19 | Jessia Butman
       20 | Allie Rahaim
(20 rows)


With this query, it is evident who made each Order. 

Let's do something similar but with subscriptions.

newsletters=# SELECT *
newsletters-# FROM orders
newsletters-# JOIN subscriptions
newsletters-# ON orders.subscription_id = subscriptions.subscription_id;
 order_id | customer_id | subscription_id |    purchase_date    | subscription_id |    description    | price_per_month | subscription_length 
----------+-------------+-----------------+---------------------+-----------------+-------------------+-----------------+---------------------
        1 |           3 |               2 | 2021-05-14 14:51:53 |               2 | Politics Magazine |              11 |                   6
        2 |           2 |               4 | 2021-05-15 17:51:53 |               4 | Fashion Magazine  |              15 |                  12
        3 |           3 |               4 | 2021-05-18 13:51:53 |               4 | Fashion Magazine  |              15 |                  12
        4 |           9 |               9 | 2021-05-21 17:51:53 |               9 | Sports Magazine   |              13 |                   3
        5 |           7 |               5 | 2021-06-02 12:51:53 |               5 | Fashion Magazine  |              17 |                   6
        6 |           8 |               2 | 2021-06-14 13:51:53 |               2 | Politics Magazine |              11 |                   6
        7 |           5 |               8 | 2021-06-14 17:51:53 |               8 | Sports Magazine   |              12 |                   6
        8 |           9 |               5 | 2021-06-16 19:51:53 |               5 | Fashion Magazine  |              17 |                   6
        9 |           4 |               4 | 2021-06-17 20:51:53 |               4 | Fashion Magazine  |              15 |                  12
       10 |           1 |               7 | 2021-07-18 22:51:53 |               7 | Sports Magazine   |              11 |                  12
       11 |           5 |               4 | 2021-07-20 17:51:53 |               4 | Fashion Magazine  |              15 |                  12
       12 |           3 |               2 | 2021-09-21 17:51:53 |               2 | Politics Magazine |              11 |                   6
       13 |           3 |               5 | 2021-12-01 18:51:53 |               5 | Fashion Magazine  |              17 |                   6
       14 |           6 |               5 | 2021-12-14 16:51:53 |               5 | Fashion Magazine  |              17 |                   6
       15 |           1 |               2 | 2021-12-15 15:51:53 |               2 | Politics Magazine |              11 |                   6
       16 |           1 |               2 | 2021-12-16 14:51:53 |               2 | Politics Magazine |              11 |                   6
       17 |           3 |               6 | 2021-12-17 13:51:53 |               6 | Fashion Magazine  |              19 |                   3
       18 |           3 |               8 | 2021-12-18 12:51:53 |               8 | Sports Magazine   |              12 |                   6
       19 |           4 |               9 | 2021-12-19 11:51:53 |               9 | Sports Magazine   |              13 |                   3
       20 |           1 |               7 | 2021-12-19 10:51:53 |               7 | Sports Magazine   |              11 |                  12
(20 rows)





Now we have a clear map of the two combined or joined tables. 

So far, we have been using the keyword JOIN to do this operation. However, we could call it INNER JOIN and have the same result. In SQL, Inner Joins are the most common operation, but they have some limits. For instance: when we run a simple JOIN, our result only includes rows that match our ON condition.

If we analyze our customers and orders table in detail, we can see that the customer with id 10 that belongs to Jina Farraj doesn't have any Orders. So the question is: Was she included on the Inner Join table? No, because she doesn't have any Order, he will be excluded from the joining table until she makes an order. 

Note: let's keep an eye on customer 10, Jina Farraj, and with that, we can understand the different Joins

LEFT JOINS

What if we combine two tables and keep some rows that do not match? We can do that with a LEFT JOIN. A left join will keep all the rows of the first table, no matter if they don't match the second table. Let's do an example.


newsletters=# SELECT *
newsletters-# FROM customers
newsletters-# LEFT JOIN orders
newsletters-# ON customers.customer_id = orders.customer_id;
 customer_id |   customer_name   |      address      | order_id | customer_id | subscription_id |    purchase_date    
-------------+-------------------+-------------------+----------+-------------+-----------------+---------------------
           3 | Lizabeth Letsche  | 789 Main St.      |        1 |           3 |               2 | 2021-05-14 14:51:53
           2 | Jacquline Diddle  | 456 Park Ave.     |        2 |           2 |               4 | 2021-05-15 17:51:53
           3 | Lizabeth Letsche  | 789 Main St.      |        3 |           3 |               4 | 2021-05-18 13:51:53
           9 | Eryn Vilar        | 56 Morton St.     |        4 |           9 |               9 | 2021-05-21 17:51:53
           7 | Janay Priolo      | 81 Harrisburg     |        5 |           7 |               5 | 2021-06-02 12:51:53
           8 | Ophelia Sturdnant | 31 Deerfield Ave. |        6 |           8 |               2 | 2021-06-14 13:51:53
           5 | Inocencia Goyco   | 12 Amsterdam Ave. |        7 |           5 |               8 | 2021-06-14 17:51:53
           9 | Eryn Vilar        | 56 Morton St.     |        8 |           9 |               5 | 2021-06-16 19:51:53
           4 | Jessia Butman     | 1 Columbus Ave.   |        9 |           4 |               4 | 2021-06-17 20:51:53
           1 | Allie Rahaim      | 123 Broadway      |       10 |           1 |               7 | 2021-07-18 22:51:53
           5 | Inocencia Goyco   | 12 Amsterdam Ave. |       11 |           5 |               4 | 2021-07-20 17:51:53
           3 | Lizabeth Letsche  | 789 Main St.      |       12 |           3 |               2 | 2021-09-21 17:51:53
           3 | Lizabeth Letsche  | 789 Main St.      |       13 |           3 |               5 | 2021-12-01 18:51:53
           6 | Bethann Schraub   | 29 Monticello     |       14 |           6 |               5 | 2021-12-14 16:51:53
           1 | Allie Rahaim      | 123 Broadway      |       15 |           1 |               2 | 2021-12-15 15:51:53
           1 | Allie Rahaim      | 123 Broadway      |       16 |           1 |               2 | 2021-12-16 14:51:53
           3 | Lizabeth Letsche  | 789 Main St.      |       17 |           3 |               6 | 2021-12-17 13:51:53
           3 | Lizabeth Letsche  | 789 Main St.      |       18 |           3 |               8 | 2021-12-18 12:51:53
           4 | Jessia Butman     | 1 Columbus Ave.   |       19 |           4 |               9 | 2021-12-19 11:51:53
           1 | Allie Rahaim      | 123 Broadway      |       20 |           1 |               7 | 2021-12-19 10:51:53
          10 | Jina Farraj       | 100 Bryan Ave.    |          |             |                 | 
(21 rows)


Can you see the difference?

The customer with ID 10 now appears in the result, but the data related to the Order is empty or null. That is because the LEFT JOIN keeps all the rows of the first table, no matter if they don't match the second table. So now, let's explain the query executed.

newsletters=# SELECT *
newsletters-# FROM customers
newsletters-# LEFT JOIN orders
newsletters-# ON customers.customer_id = orders.customer_id;

    • * The first line selects all columns of both tables.
    • * The second line selects customers (the left table).
    • * The third line runs a LEFT JOIN on table orders (the right table).
    • * The fourth line tells SQL how to run the join (looking for the matching values in column customer_id)

  • When should we use INNER JOIN instead of LEFT JOIN?
    • * It depends on how we want to select the combined data.
    • * Generally, we use INNER JOIN to select only the rows that match the ON condition.
    • * We use LEFT JOIN when we want every row of the first table, whether it matches the second one.

  • RIGHT JOIN

  • It works very similarly to LEFT JOIN. The main difference is the position of the tables. So, for example, if we replicate the last exercise to see the customer with id 10, we can do something like this:



newsletters=# SELECT *
newsletters-# FROM orders
newsletters-# RIGHT JOIN customers
newsletters-# ON orders.customer_id = customers.customer_id;
 order_id | customer_id | subscription_id |    purchase_date    | customer_id |   customer_name   |      address      
----------+-------------+-----------------+---------------------+-------------+-------------------+-------------------
        1 |           3 |               2 | 2021-05-14 14:51:53 |           3 | Lizabeth Letsche  | 789 Main St.
        2 |           2 |               4 | 2021-05-15 17:51:53 |           2 | Jacquline Diddle  | 456 Park Ave.
        3 |           3 |               4 | 2021-05-18 13:51:53 |           3 | Lizabeth Letsche  | 789 Main St.
        4 |           9 |               9 | 2021-05-21 17:51:53 |           9 | Eryn Vilar        | 56 Morton St.
        5 |           7 |               5 | 2021-06-02 12:51:53 |           7 | Janay Priolo      | 81 Harrisburg
        6 |           8 |               2 | 2021-06-14 13:51:53 |           8 | Ophelia Sturdnant | 31 Deerfield Ave.
        7 |           5 |               8 | 2021-06-14 17:51:53 |           5 | Inocencia Goyco   | 12 Amsterdam Ave.
        8 |           9 |               5 | 2021-06-16 19:51:53 |           9 | Eryn Vilar        | 56 Morton St.
        9 |           4 |               4 | 2021-06-17 20:51:53 |           4 | Jessia Butman     | 1 Columbus Ave.
       10 |           1 |               7 | 2021-07-18 22:51:53 |           1 | Allie Rahaim      | 123 Broadway
       11 |           5 |               4 | 2021-07-20 17:51:53 |           5 | Inocencia Goyco   | 12 Amsterdam Ave.
       12 |           3 |               2 | 2021-09-21 17:51:53 |           3 | Lizabeth Letsche  | 789 Main St.
       13 |           3 |               5 | 2021-12-01 18:51:53 |           3 | Lizabeth Letsche  | 789 Main St.
       14 |           6 |               5 | 2021-12-14 16:51:53 |           6 | Bethann Schraub   | 29 Monticello
       15 |           1 |               2 | 2021-12-15 15:51:53 |           1 | Allie Rahaim      | 123 Broadway
       16 |           1 |               2 | 2021-12-16 14:51:53 |           1 | Allie Rahaim      | 123 Broadway
       17 |           3 |               6 | 2021-12-17 13:51:53 |           3 | Lizabeth Letsche  | 789 Main St.
       18 |           3 |               8 | 2021-12-18 12:51:53 |           3 | Lizabeth Letsche  | 789 Main St.
       19 |           4 |               9 | 2021-12-19 11:51:53 |           4 | Jessia Butman     | 1 Columbus Ave.
       20 |           1 |               7 | 2021-12-19 10:51:53 |           1 | Allie Rahaim      | 123 Broadway
          |             |                 |                     |          10 | Jina Farraj       | 100 Bryan Ave.
(21 rows)




If you pay attention to the last row, the empty values are in the first column (that belongs to the orders), and then we have the customer info. That is because the RIGHT JOIN keeps all the rows of the second table, no matter if they don't match the first table.

With this, we have the most common usage of Join Tables, and you can test the different results with different data by yourself.

I hope you learned a lot.

Thanks for reading
Daniel Morales