Back-end Engineering Articles

I write and talk about backend stuff like Ruby, Ruby On Rails, Databases, Testing, Architecture / Infrastructure / System Design, Cloud, DevOps, Backgroud Jobs, and more...



Introduction to SQL - Aggregate Functions And Grouping

We've been advancing a lot in SQL over the last blog posts. Now we'll be studying important concepts in relational databases like Aggregate Functions and Grouping.

Note: To follow this tutorial, we'll continue with databases created in Introduction to SQL, Constraints, and Queries. So I encourage you to follow these posts to have the proper context. 

Aggregate Functions

We have learned how to write database queries now; let's know how to do SQL calculations. When we run calculations on multiple rows of a table, we call them "Aggregates." In database management, an aggregate function is a function where the values of multiple rows are grouped together as input on specific criteria to form a single value of more significant meaning

Let's first go into the PostgreSQL console.

sudo su - postgres
[sudo] password for daniel: 
postgres@daniel-MS-7A15:~$ psql
psql (13.3 (Ubuntu 13.3-1.pgdg16.04+1), server 9.6.22)
Type "help" for help.

postgres=# \c marvel_tshirts
psql (13.3 (Ubuntu 13.3-1.pgdg16.04+1), server 9.6.22)
You are now connected to database "marvel_tshirts" as user "postgres".

Then let's inspect what we have in the user's table. 

marvel_tshirts=# SELECT * FROM users;
 id |  name  | age 
  1 | Justin |  22
  2 | Ana    |  21
  4 | Viktor |  39
(3 rows)

Now that we know what we have, let's go ahead with Aggregate Functions.


  • * COUNT() is the fastest way to calculate how many rows are in a table.
  • * COUNT() is a function that takes the name of a column as an argument and counts the number of non-empty values in that column.

marvel_tshirts=# SELECT COUNT(*)
marvel_tshirts-# FROM users;
(1 row)

This means that we have 3 users

  • * COUNT() includes duplicate values of a column
  • * To run a query with non-duplicate values

marvel_tshirts=# SELECT COUNT (DISTINCT name)
marvel_tshirts-# FROM users;
(1 row)

We don't have any repeated names in the user's table. However, if you have any, the COUNT combined with DISTINCT will avoid duplicate records. 

This means that we have 3 distinct users.


  • SUM() is a function that takes the column name as an argument and returns the sum of all the values in that column.

marvel_tshirts=# SELECT SUM (age)
marvel_tshirts-# FROM users;
(1 row)

This means that we sum up of the ages of the 3 users are 82

MAX() and MIN()

  • * The MAX() and MIN() functions return the highest or lowest value in the column, respectively.
  • * MAX() takes the column's name as an argument and returns the highest value in that column.
  • * MIN() works the same, but oppositely; it returns the lower value

marvel_tshirts=# SELECT MAX (age)
marvel_tshirts-# FROM users;
(1 row)

marvel_tshirts=# SELECT MIN (age)
marvel_tshirts-# FROM users;
(1 row)

39 is the max-age of the users, and 21 and the youngest user. 


It stands for average. SQL uses the AVG() function to quickly calculate the average value of a particular column. The AVG() function takes the name of a column as an argument and returns the average value of the given column.

marvel_tshirts=# SELECT AVG(age)
marvel_tshirts-# FROM users;
(1 row)

The average age of our users is 27.33. However, we see a large number… How can we round it?


  • * By default, SQL tries to be as accurate as possible without rounding. But we can make the results easier to read using the ROUND() function
  • * ROUND() takes two arguments inside the parenthesis: the column name and an integer.
  • * The integer functions as the number of decimal places we want in the result.

In the last exercise, we can use it as follows:

marvel_tshirts=# SELECT ROUND(AVG(age), 0)
marvel_tshirts-# FROM users;
(1 row)

marvel_tshirts=# SELECT ROUND(AVG(age), 1)
FROM users;
(1 row)

marvel_tshirts=# SELECT ROUND(AVG(age), 2)
FROM users;
(1 row)

marvel_tshirts=# SELECT ROUND(AVG(age), 3)
FROM users;
(1 row)

As we can see, the first argument is the result of the AVG() operation, which is 27.3333333… and then we pass a second argument which is the number of decimal numbers we want. So if we put zero, we'll have an integer, and if we pass 1 or more, we'll have a float number given the number of decimals. 

That's the primary aggregate function. There are more, but for now, I think it is enough to have the right amount of tools at hand. Now we'll see another kind of essential operation: Group By.


At some point, we will need to compute aggregate functions for data with specific characteristics. For example, "find the number of customers in each country." 

  • * GROUP BY is a clause in SQL used with aggregate functions. It is used in collaboration with the SELECT clause to bring identical data into groups.
  • * The GROUP BY clause goes after any WHERE, but before ORDER BY or LIMIT. 

Previously we've created a Friends table with the following columns and rows.

marvel_tshirts=# SELECT * FROM friends;
 id |   name   |  birthday  |        email        
  2 | Emiliano | 2006-09-26 |
  3 | Maria    | 1986-09-02 | [email protected]
  4 | Maria    | 1986-09-02 | 
  5 | Maria    | 1986-09-02 | 
(4 rows)

Now we execute the following group by

marvel_tshirts=# SELECT name, MIN(birthday)
marvel_tshirts-# FROM friends
marvel_tshirts-# GROUP BY name
marvel_tshirts-# ORDER BY name;
   name   |    min     
 Emiliano | 2006-09-26
 Maria    | 1986-09-02
(2 rows)

It's a bit weird our current data, but what it means is:
  • * We want to group by each name, taking into account the MIN value of the birthday.

Let's do another example. Let's check this code.

Let's suppose we have a table called Movies. And inside it, we have the year of release and imdb_rating of each of them. Imdb_rating is a Rating given by users from the IMDB webpage. 

This code will group the movies by year, taking into account the average rating.

Year | Rating
2005 | 6.75
2006 | 6.71
2007 | 6.78
2008 | 7.00
2009 | 6.67
2010 | 6.92

This result is meaningful because we can see that the best year measured by the imdb_rating was 2008, which means the movies released that year was so good. After all, they were rated very well on average. At the same time, 2009 was the worst compared to the others. Ok, I think you got it. Group By is beneficial, and we'll be using it a lot later. 

I think we have enough for this time. I hope you learned a lot.

Thanks for reading
Daniel Morales