Anatomy of a SQL statement

SQL is a declarative, English-like domain language for querying, analyzing, and manipulating data. SQL originated from relational databases but has since seen wide adoption outside of them. Because SQL statements can almost be read as English sentences and its declarative nature, that is, users declare what results they want and not how to get to these results (the latter is the approach of imperative programming languages, such as C, Java, Python, etc), SQL is generally seen as one of the best high-level languages for analyzing data due to its easy-to-learn syntax.

SQL has different language elements that can, at the very top level, be divided between queries and data manipulation. SQL queries use the SELECT statement, while SQL used for data manipulation uses the INSERT, UPDATE, DELETE, and MERGE statements. The latter group of statements for data manipulation is referred to as Data Manipulation Language or DML.

SQL queries
- Using SQL Queries
  - SELECT and FROM
    - SELECT without FROM
  - JOIN
  - WHERE
  - GROUP BY
  - HAVING
  - ORDER BY
  - OFFSET
  - FETCH
    - OFFSET & FETCH
Data Manipulation Language (DML)
- Using DML
  - INSERT INTO
  - Update
  - DELETE
  - MERGE
Conclusion

SQL queries

SQL queries are probably the most common operations used in SQL, as they allow users to retrieve and analyze data from one or more tables. SQL query statements include the following elements:

SELECT
FROM
JOIN
WHERE
GROUP BY
HAVING
ORDER BY
OFFSET
FETCH

The SELECT statement includes a couple of elements, but only the first two are required: SELECT and FROM. However, some databases, including Oracle Database and MySQL, make the FROM clause optional if the SELECT refers only to self-contained expressions, such as SELECT 1; SELECT sysdate; and SELECT my_function();. In these cases, the data is not derived from a table, hence FROM is not required.

Optional components are illustrated by [ ] around them.

SELECT <expressions>
  FROM <table or sub-query>
    [ JOIN <to other table> ON <join condition> ]
  [ WHERE <predicates> ]
  [ GROUP BY <expressions>
    [ HAVING <predicates> ] ]
  [ ORDER BY <expressions> ]
  [ OFFSET ]
  [ FETCH ]

A common misconception is that these components execute in the same order as they appear in the query. This is not the case, as the SELECT component is processed after the HAVING clause. The following lists the order that which the clauses are processed and their purpose:

order	clause	purpose
1	`FROM`	Indicates which table(s) to retrieve data from. The `FROM` clause determines the working set of data that is being retrieved. This usually refers to a table, but can also include a subquery (another `SELECT` query that acts as the input source for the current query).
2	`JOIN`	Specifies the rules for joining multiple tables. The `JOIN` clause is part of the `FROM` clause and combines the data from multiple tables into one dataset. It is one of the fundamental operators of the relational model to combine different relations into one set. The JOIN clause allows join conditions that ensure that only rows that logically belong together are joined (rows with matching primary key –> foreign key relationships). Multiple `JOIN` clauses can be specified to join multiple tables into the data set. Because the `JOIN` clause is part of the `FROM` clause, it cannot be specified without a preceding `FROM` statement in the query.
3	`WHERE`	Filters the rows returned by the query. The `WHERE` clause filters the data set based on the provided predicates or filter conditions and discards all rows that do not match them. It narrows down the results, for example, to retrieve all `countries` from the continent of `Europe` instead of all countries in the world.
4	`GROUP BY`	Aggregates (or groups) rows with common values in the specified column(s) into one row. The resulting row is sometimes referred to as a grouping set. Because rows with common values are aggregated into one row, there will only be as many rows as there are unique values. For values of columns not specified in `GROUP BY`, aggregate functions in the `SELECT` clause are required to aggregate these values per group.
5	`HAVING`	Filters rows resulting from the `GROUP BY` clause. It is therefore a part of the `GROUP BY` and cannot be used without a preceding `GROUP BY` statement in the query.
6	`SELECT`	Defines the list of column(s) and expression(s) to appear in the query result output. The `SELECT` clause computes any expressions and defines the list of columns to be returned, or projected, as the query result.
7	`ORDER BY`	Identifies which column(s) to use to sort the resulting data and in which direction to sort them (ascending or descending). If the `ORDER BY` is omitted, the order of the rows returned by a SQL query is undefined.
8	`OFFSET`	Specifies the number of rows to skip in the result set before returning the data.
9	`FETCH`	Specifies the number of rows to return from the result.

Using SQL Queries

Now that you’re familiar with what the various SQL query clauses mean, you can start using them. You can follow along with these exercises using the data model in my GitHub repository.

SELECT and FROM

In its simplest form, a SQL query comprises of a SELECT and FROM clause:

SQL> SELECT *
  2*  FROM regions;

REGION_ID    NAME
____________ ________________
AF           Africa
AN           Antarctica
AS           Asia
EU           Europe
NA           North America
OC           Oceania
SA           South America

7 rows selected.

This query selects all rows and all columns (as indicated by the * after the SELECT which is short for “all columns”) from a table called regions. If we wanted to return a given list of columns, these can be called out specifically, for example:

SQL> SELECT name
  2*  FROM regions;

NAME
________________
Africa
Antarctica
Asia
Europe
North America
Oceania
South America

7 rows selected.

SELECT without FROM

The SELECT statement is also able to compute expressions, for example, 1+2. Technically, neither the constant 1 nor the constant 2 are coming from any table, but the ISO SQL standard nevertheless requires a FROM clause. Many databases have hence provided “dummy” tables to enable such queries, for example, the dual table in Oracle Database.

SQL> SELECT 1+2
  2*  FROM dual;

   1+2
______
     3

However, many databases, including Oracle Database, have also loosened this restriction from the SQL standard and allow for queries to omit the FROM clause in such cases:

SQL> SELECT 1+2;

   1+2
______
     3

JOIN

The relational model is all about normalizing data, that is, putting independent data into separate tables and defining relationships between these tables. To recombine such normalized data, joins are provided to join these tables back together.

The following example has two tables: the previously queried regions table, and the new countries table. To write a query that joins both tables into one result, you use the JOIN clause. Without a JOIN clause, if you specify the two tables in the FROM clause, every row from the table regions will be multiplied by every row from the table countries. This is often referred to as the cross product and it’s a common mistake that SQL beginners make. For example:

SQL> SELECT r.name, c.name
  2*  FROM regions r, countries c;

NAME             NAME
________________ ___________________________________
Africa           Kosovo
Africa           Yemen
Africa           South Africa
Africa           Zambia
Africa           Zimbabwe
Africa           Andorra
Africa           United Arab Emirates
Africa           Afghanistan
Africa           Antigua and Barbuda
Africa           Albania
Africa           Armenia
Africa           Angola
Africa           Argentina
Africa           Austria
Africa           Australia
...
...
...
South America    Uzbekistan
South America    Vatican City
South America    Saint Vincent and the Grenadines
South America    Venezuela
South America    Vietnam
South America    Vanuatu
South America    Samoa

1,372 rows selected.

The output from this query is obviously incorrect. There are neither 1,372 countries, nor is Austria located in Africa. What we really want is to join all rows from the countries table with the regions table where the region_id is the same. This is generally referred to as the join condition and can be specified in the ON clause as part of the JOIN clause:

SQL> SELECT r.name, c.name
  2   FROM regions r
  3    JOIN countries c
  4*    ON (r.region_id=c.region_id);

NAME             NAME
________________ ___________________________________
Africa           South Africa
Africa           Zambia
Africa           Zimbabwe
Africa           Angola
Africa           Burkina Faso
Africa           Burundi
Africa           Benin
...
...
...
South America    Ecuador
South America    Guyana
South America    Peru
South America    Paraguay
South America    Suriname
South America    Uruguay
South America    Venezuela

196 rows selected.

That’s more like it!

There is one more thing to note: The above queries specified SELECT r.name, c.name and put the letters r and c next to the table names. These are table aliases and are required to tell the database which table column you want. If the statement just said SELECT name, name, it would be unclear whether the query refers to the regions table column name or the countries table column name.

WHERE

The WHERE clause filters the rows produced by the FROM clause. Until now, you have always gotten all the rows that were in the tables. If you want to return only all countries in South America, this is where the WHERE clause comes in. It can be used to match all rows with the regions.name column value 'South America':

SQL> SELECT r.name, c.name
  2   FROM regions r
  3    JOIN countries c
  4     ON (r.region_id=c.region_id)
  5*  WHERE r.name = 'South America';

NAME             NAME
________________ ____________
South America    Argentina
South America    Bolivia
South America    Brazil
South America    Chile
South America    Colombia
South America    Ecuador
South America    Guyana
South America    Peru
South America    Paraguay
South America    Suriname
South America    Uruguay
South America    Venezuela

12 rows selected.

GROUP BY

The GROUP BY clause is often a puzzle for SQL beginners. The clause is used to aggregate multiple rows into a group, essentially combining multiple rows into one row. When would such a thing ever be useful? Well, for example, the countries table contains a column called population, but the regions table does not:

SQL> SELECT r.name, c.name, c.population
  2   FROM regions r
  3    JOIN countries c
  4     ON (r.region_id=c.region_id)
  5*  WHERE r.name = 'South America';

NAME             NAME            POPULATION
________________ ____________ _____________
South America    Argentina         44694000
South America    Bolivia           11306000
South America    Brazil           208847000
South America    Chile             17925000
South America    Colombia          48169000
South America    Ecuador           16291000
South America    Guyana              741000
South America    Peru              31331000
South America    Paraguay           7026000
South America    Suriname            598000
South America    Uruguay            3369000
South America    Venezuela         31689000

12 rows selected.

A common business question might be: “What is the total population of each region?” Given that the regions table does not have a column with that information, the answer can be provided only by calculating the sum of the population column for each country per region. So, you need a mechanism that takes the 196 rows of the countries table and puts them into seven groups or buckets based on their region (because there are seven rows in the regions table). However, the query cannot just put the 196 rows into seven rows; it needs to calculate the sum of the population per region based on the populations of the countries belonging to the region.

This can be done by applying the SUM() aggregate function over the population column:

SQL> SELECT r.name, SUM(c.population)
  2   FROM regions r
  3    JOIN countries c
  4     ON (r.region_id=c.region_id)
  5*  GROUP BY r.name;

NAME                SUM(C.POPULATION)
________________ ____________________
Africa                     1263685000
Asia                       4439011000
Europe                      748985000
North America               575767000
Oceania                      37556000
South America               421986000

6 rows selected.

This query shows something else interesting. Although there are seven regions in the regions table, the query produced six rows. This is because there is a region 'Antarctica', but there is no country with that region_id in the countries table. Hence, the JOIN clause filters that region out (because there is no matching region_id in the countries table as specified by the ON clause).

A GROUP BY clause does not require any JOIN clause; you can create groups in just a single table. For example, “How many countries start with the same letter?” can also be answered via a GROUP BY. To do this, create as many groups per the unique first letter value for all rows by using the SUBSTR() function, and then count the rows that fall into that group or category:

SQL> SELECT SUBSTR(name,1,1), COUNT(*)
  2   FROM countries
  3*  GROUP BY SUBSTR(name,1,1);

SUBSTR(NAME,1,1)       COUNT(*)
___________________ ___________
K                             6
Y                             1
S                            26
Z                             2
A                            11
U                             7
B                            17
C                            17
D                             5
G                            11
E                             8
F                             3
M                            18
H                             3
I                             8
J                             3
N                            11
L                             9
O                             1
P                             9
Q                             1
R                             3
T                            12
V                             4

24 rows selected.

HAVING

The HAVING clause filters rows resulting from the GROUP BY clause based on the predicate(s) provided. For example, if you want to return only the regions that have a population of more than 500 million people, this cannot be specified in the WHERE clause because the WHERE clause is processed before the GROUP BY clause. Hence, the WHERE clause has no notion of the population of a region. This is where the HAVING clause comes in. From a logical perspective, it behaves the same as the WHERE clause, but it filters at a different processing stage:

SQL> SELECT r.name, SUM(c.population)
  2   FROM regions r
  3    JOIN countries c
  4     ON (r.region_id=c.region_id)
  5   GROUP BY r.name
  6*   HAVING SUM(c.population) > (500 * 1000 * 1000);

NAME                SUM(C.POPULATION)
________________ ____________________
Africa                     1263685000
Asia                       4439011000
Europe                      748985000
North America               575767000

ORDER BY

The ORDER BY clause sorts the resulting data. So far, the undefined sorting of the rows has worked out, except when it came to “countries per first letter.” The ORDER BY clause can be used to return the rows in alphabetical order:

SQL> SELECT SUBSTR(name,1,1), COUNT(*)
  2   FROM countries
  3   GROUP BY SUBSTR(name,1,1)
  4*  ORDER BY SUBSTR(name,1,1);

SUBSTR(NAME,1,1)       COUNT(*)
___________________ ___________
A                            11
B                            17
C                            17
D                             5
E                             8
F                             3
G                            11
H                             3
I                             8
J                             3
K                             6
L                             9
M                            18
N                            11
O                             1
P                             9
Q                             1
R                             3
S                            26
T                            12
U                             7
V                             4
Y                             1
Z                             2

24 rows selected.

By default, rows are sorted in ascending order, but you can reverse it with the DESC (DESCENDING) keyword:

SQL> SELECT SUBSTR(name,1,1), COUNT(*)
  2   FROM countries
  3   GROUP BY SUBSTR(name,1,1)
  4*  ORDER BY SUBSTR(name,1,1) DESC;

SUBSTR(NAME,1,1)       COUNT(*)
___________________ ___________
Z                             2
Y                             1
V                             4
U                             7
T                            12
S                            26
R                             3
Q                             1
P                             9
O                             1
N                            11
M                            18
L                             9
K                             6
J                             3
I                             8
H                             3
G                            11
F                             3
E                             8
D                             5
C                            17
B                            17
A                            11

24 rows selected.

OFFSET

The OFFSET clause specifies the number of rows to skip before starting to return data. This clause is shorthand for what would otherwise require analytic queries or subqueries. For example, asking “Give me all countries in South America ranked by square kilometers except the first three” can be answered with:

SQL> SELECT c.name, c.area_sq_km
  2   FROM countries c
  3    JOIN regions r
  4     ON (c.region_id=r.region_id)
  5   WHERE r.name = 'South America'
  6   ORDER BY area_sq_km DESC
  7*  OFFSET 3 ROWS;

NAME            AREA_SQ_KM
____________ _____________
Colombia           1138910
Bolivia            1098581
Venezuela           912050
Chile               756102
Paraguay            406752
Ecuador             283561
Guyana              214969
Uruguay             176215
Suriname            163820

9 rows selected.

FETCH

The FETCH clause specifies the number of rows to return from the result. Some databases call this the LIMIT clause. Like the OFFSET clause, this is also a shorthand and can be used to answer business questions like “What are the top three countries in terms of population?” This can be answered with:

SQL> SELECT name, population
  2   FROM countries
  3   ORDER BY population DESC
  4*  FETCH FIRST 3 ROWS ONLY;

NAME                POPULATION
________________ _____________
China               1384689000
India               1296834000
United States        329256000

You may wonder what would happen if two rows tie in the third position; will both rows be returned? Or just the first? For these cases, the FETCH clause provides the ONLY and WITH TIES keywords. The above just used ONLY because it is unlikely that two countries have the same population.

However, when ranking countries by letter, there is much more room for overlap. For example, in the countries per first letter example, when ranked by number of countries, it is clear that some letters have the same number:

SQL> SELECT SUBSTR(name,1,1), COUNT(*)
  2   FROM countries
  3   GROUP BY SUBSTR(name,1,1)
  4*  ORDER BY COUNT(*) DESC;

SUBSTR(NAME,1,1)       COUNT(*)
___________________ ___________
S                            26
M                            18
B                            17
C                            17
T                            12
A                            11
N                            11
G                            11
L                             9
P                             9
I                             8
E                             8
U                             7
K                             6
D                             5
V                             4
J                             3
H                             3
F                             3
R                             3
Z                             2
Q                             1
Y                             1
O                             1

24 rows selected.

If you run the same FETCH clause on this query, the letter C would be omitted from the results, although it has exactly the same number of countries as the letter B:

SQL> SELECT SUBSTR(name,1,1), COUNT(*)
  2   FROM countries
  3   GROUP BY SUBSTR(name,1,1)
  4   ORDER BY COUNT(*) DESC
  5*  FETCH FIRST 3 ROWS ONLY;

SUBSTR(NAME,1,1)       COUNT(*)
___________________ ___________
S                            26
M                            18
B                            17

This is where the WITH TIES keyword comes in, as it will include ties in the results:

SQL> SELECT SUBSTR(name,1,1), COUNT(*)
  2   FROM countries
  3   GROUP BY SUBSTR(name,1,1)
  4   ORDER BY COUNT(*) DESC
  5*  FETCH FIRST 3 ROWS WITH TIES;

SUBSTR(NAME,1,1)       COUNT(*)
___________________ ___________
S                            26
M                            18
B                            17
C                            17

OFFSET & FETCH

Combining the OFFSET and FETCH clauses allow another nice shorthand that would otherwise require an analytical query or subquery. Consider the question “What is the second smallest country on the planet in terms of square kilometers?” This can be answered by combining OFFSET to return results starting from the second row, and FETCH to fetch just the second row

SQL> SELECT name, area_sq_km
  2   FROM countries
  3   ORDER BY area_sq_km
  4   OFFSET 1 ROW
  5*  FETCH FIRST 1 ROW ONLY;

NAME         AREA_SQ_KM
_________ _____________
Monaco                2

Data Manipulation Language (DML)

The Data Manipulation Language is a set of SQL statements used to add, update, and delete data. SQL used for data manipulation uses INSERT, UPDATE, DELETE and MERGE statements.

clause	purpose
`INSERT`	Inserts data in a table by adding one or more rows to a table.
`UPDATE`	Updates one or more rows in a table.
`DELETE`	Deletes one or more rows from a table.
`MERGE`	Can be used to add (insert) new rows, update existing rows or delete data in a table, depending on whether the specified condition matches. It is a convenient way to execute one operation, where you would otherwise have to execute multiple `INSERT` or `UPDATE` statements.

Using DML

Now that you’re familiar with what the various DML statements mean, you can start using them. You can follow along with these exercises using the data model in my GitHub repository

INSERT INTO

The INSERT INTO statement adds rows to a table. It can be used by either defining one or more rows using the VALUES clause or by inserting the result of a subquery. Take a look at the VALUES clause first:

SQL> -- Creates an empty copy of countries called my_tab
SQL> CREATE TABLE my_tab AS SELECT * FROM countries WHERE rownum=0;

Table MY_TAB created.

SQL> INSERT INTO my_tab (country_id, country_code, name, population, region_id)
  2*  VALUES (1, 'GV', 'State of Gerald', 1, 'AN');

1 row inserted.

SQL> SELECT * FROM my_tab;

COUNTRY_ID    COUNTRY_CODE    NAME               OFFICIAL_NAME       POPULATION    AREA_SQ_KM    LATITUDE    LONGITUDE TIMEZONE    REGION_ID
_____________ _______________ __________________ ________________ _____________ _____________ ___________ ____________ ___________ ____________
1             GV              State of Gerald                                 1                                                    AN

The VALUES clause allows multiple rows to be defined by separating them with a comma (,):

SQL> INSERT INTO my_tab (country_id, country_code, name, population, region_id)
  2   VALUES (2, 'VX', 'Venzi Country', 1, 'AN'),
  3*         (3, 'XX', 'Gerald Island', 1, 'AN');

2 rows inserted.

SQL> SELECT * FROM my_tab;

COUNTRY_ID    COUNTRY_CODE    NAME               OFFICIAL_NAME       POPULATION    AREA_SQ_KM    LATITUDE    LONGITUDE TIMEZONE    REGION_ID
_____________ _______________ __________________ ________________ _____________ _____________ ___________ ____________ ___________ ____________
1             GV              State of Gerald                                 1                                                    AN
2             VX              Venzi Country                                   1                                                    AN
3             XX              Gerald Island                                   1                                                    AN

To use a SQL query as input for the INSERT statement, just replace VALUES with SELECT. The columns of your table and the SELECT list must match:

SQL> INSERT INTO my_tab SELECT * FROM countries;

196 rows inserted.

SQL> SELECT *
  2   FROM my_tab
  3*  FETCH FIRST 5 ROWS ONLY;

COUNTRY_ID    COUNTRY_CODE    NAME                                OFFICIAL_NAME                          POPULATION    AREA_SQ_KM    LATITUDE    LONGITUDE TIMEZONE              REGION_ID
_____________ _______________ ___________________________________ ___________________________________ _____________ _____________ ___________ ____________ _____________________ ____________
VAT           VA              Vatican City                        Vatican City State                           1000          0.44    41.90225      12.4533 Europe/Vatican        EU
VCT           VC              Saint Vincent and the Grenadines                                               102000           389    13.08333        -61.2 America/St_Vincent    NA
VEN           VE              Venezuela                           Bolivarian Republic of Venezuela         31689000        912050           8          -66 America/Caracas       SA
VNM           VN              Vietnam                             Socialist Republic of Vietnam            97040000        331210    16.16667    107.83333 Asia/Ho_Chi_Minh      AS
VUT           VU              Vanuatu                             Republic of Vanuatu                        288000         12189         -16          167 Pacific/Efate         OC

Update

The UPDATE statement updates entries in a table. It has a SET clause that sets columns to a given value and a WHERE clause to specify which rows to update. You almost always want a WHERE clause for your UPDATE statement; otherwise, the UPDATE statement will update all rows in the table.

SQL> UPDATE my_tab
  2   SET population = 2
  3*  WHERE country_code = 'GV';

1 row updated.

SQL> SELECT *
  2   FROM my_tab
  3*  WHERE country_code = 'GV';

COUNTRY_ID    COUNTRY_CODE    NAME               OFFICIAL_NAME       POPULATION    AREA_SQ_KM    LATITUDE    LONGITUDE TIMEZONE    REGION_ID
_____________ _______________ __________________ ________________ _____________ _____________ ___________ ____________ ___________ ____________
1             GV              State of Gerald                                 2                                                    AN

The UPDATE statement can also join other tables to update rows based on a WHERE clause condition outside of the table that is being updated. For example, say you want to adjust the population of all countries in South America by 10% more (an expression formulated as population*1.1). You can filter the rows to update based on a filter via the regions table for the countries that have the appropriate region_id for South America:

SQL> UPDATE countries c
  2   SET c.population = c.population*1.1
  3   FROM regions r
  4    WHERE c.region_id=r.region_id
  5*   AND r.name = 'South America';

12 rows updated.

DELETE

The DELETE statement deletes rows in a table and works very similarly to the UPDATE statement. As with UPDATE, with the DELETE statement you almost always want a WHERE clause; otherwise, you will delete all rows in a table.

SQL> DELETE FROM my_tab
  2*  WHERE country_code = 'GV';

1 rows deleted.

Also like the UPDATE statement, you can apply the same filter based on other tables’ column values:

SQL> DELETE FROM my_tab c
  2   FROM regions r
  3    WHERE r.region_id=c.region_id
  4*   AND r.name = 'Antarctica';

2 rows deleted.

MERGE

The MERGE statement is more sophisticated than the INSERT, UPDATE and DELETE statements. The MERGE statement allows you to conditionally insert or update (and even delete some) rows with one execution. This is most helpful when you want to load data into tables with existing rows and, for example, do not want to manually check whether a given row already exists. If it does, you would need to issue an UPDATE statement or an INSERT statement otherwise. Instead, you can write one statement with a matching condition that will do the INSERT or UPDATE automatically for you.

Imagine every night you get a file with updated data from all the countries in the world. Some countries may have reported new population numbers, and very occasionally, a new country is formed. Instead of running a bunch of UPDATE statements and rerunning the corresponding INSERT statement only when an UPDATE statement returns 0 rows updated, you can do both with one MERGE statement.

First, load all the data into an empty staging table (in this example, my_tab), and from there run the MERGE statement to merge the data into the target table (in this example, the countries table):

SQL> MERGE INTO countries c
  2   USING my_tab m
  3   ON (c.country_id=m.country_id)
  4   WHEN NOT MATCHED THEN
  5    INSERT VALUES (m.country_id, m.country_code, m.name, m.official_name, m.population, m.area_sq_km, m.latitude, m.longitude, m.timezone, m.region_id)
  6   WHEN MATCHED THEN
  7*   UPDATE SET c.population=m.population;

196 rows merged.

The statement above merges data into the countries table based on matching country_id (primary key) values. If the countries table includes a row with the same country_id value as the my_tab table, then the statement just updates the population column (as seen within the WHEN MATCHED THEN UPDATE clause). If the MERGE statement doesn’t find a corresponding row with the same country_id values in the countries table, then it inserts the row with all the fields into the countries table.

The MERGE statement also provides some flexibility. Say that you just want to update the countries table but never insert into it. You can just omit the WHEN NOT MATCHED INSERT clause:

SQL> MERGE INTO countries c
  2   USING my_tab m
  3   ON (c.country_id=m.country_id)
  4   WHEN MATCHED THEN
  5*   UPDATE SET c.population=m.population;

196 rows merged.

Conclusion

SQL is a powerful, widely adopted, declarative language for data processing and data manipulation. Understanding the core components of SQL and how it operates is the first step to unleashing its power on your data.

The data model used in this blog post can be found at: https://github.com/gvenzl/sample-data/tree/main/countries-cities-currencies

This blog post was originally published in two parts on The New Stack: How to Write SQL Queries and How to Use Data Manipulation Language (DML) in SQL