📄 Table of Contents
▬▬▬▬▬▬▬▬▬▬▬▬▬▬ ✦ ✦ ✦ ▬▬▬▬▬▬▬▬▬▬▬▬▬▬
SQL: Understanding Joins
SQL helps us handle data with relational databases. When it comes to having different tables in a database, we need to find a way to connect the tables’ data, and this is where joins come in to save the day!
There a couple of different types of joins:
- (Inner) join
- Left (OUTER) join
- Right (OUTER) join
- Full (OUTER) join
Note: In some databases
RIGHT JOIN is called
RIGHT OUTER JOIN. and so on.
See the image below to give you a visualization of those different joins:
Let’s explore the relationship between these two tables using different types of JOINS available:
◉ Inner Join
An inner join will allow us to return data records that have a matching value in both tables. In mathematical terms, an inner join is the intersection of the two tables.
We can create an INNER JOIN between the two tables via the shared ArtistId value, see the syntax of using inner join to connect the tables data here:
◉ Left (OUTER) Join
A left join keyword will return all data records from the left table and the data records that meet a condition from the right table.
◉ Right (OUTER) Join
The right join is actually quite similar to the left join! With the right join, we get the flip side of results.
The RIGHT JOIN will return all rows from the right table and all matched rows from the left table.
◉ Full (OUTER) Join
A full join will go ahead and return all the rows from the joined tables whether they are matched or not.
NOTE:- SQLite does NOT support the use of a full join sometimes referred to as a full outer join
Bonus: Cross Join
Not mentioned earlier, but something worth knowing in your backpocket is the cross join!
A cross join creates a product of rows from the two tables, without a join condition. Syntax for a cross join:
You can see there are some duplicate data results, a use case for a cross join would be to create an initial data set that would allow for further queries or analysis.
Time/Date functions in SQL
◉ SQL Server
ROW_NUMBER() is a window function that assigns a sequential integer to each row within the partition of a result set. The row number starts with 1 for the first row in each partition.
The following shows the syntax of the
row_number() OVER (partition by dts.aDate , nomligne ORDER BY entrydate desc) as rownumber
Using SQL Server
ROW_NUMBER() over partitions example
The following example uses the
ROW_NUMBER() function to assign a sequential integer to each customer. It resets the number when the city changes:
ROW_NUMBER() OVER (
PARTITION BY city
ORDER BY first_name
city;Code language: SQL (Structured Query Language) (sql)
The following picture shows the partial output:
In this example, we used the
PARTITION BY clause to divide the customers into partitions by city. The row number was reinitialized when the city changed.
use Case of using ROW_NUMBER()
One of the excellent use cases of using ROW_NUMBER() is to delete all duplicated rows except unique row
◉ CAST Function
CAST function converts the selected data type into another. Pretty straight forward. It changes one type into your preferred type. The syntax is below.
CAST(expression AS datatype)
Below is an example of how it can be applied to date and time.
CAST(NOW() AS TIMESTAMP),
CAST(NOW() AS DATE),
CAST(NOW() AS TIME),
In this query, we expect to have 6 outputs. If you are unfamiliar with
CURRENT_TIME they are SQL functions that retrieve the current time or date. Below are all the outputs from the query in order. (note — you will get different numbers since the functions call the exact time or date)
- 2020–03–28 23:18:20.261879+00:00 # returns timestamp plus timezone
- 2020–03–28 23:18:20.261879 # returns only timestamp (sans timezone)
We can see how the
CAST function works with time, and the last two outputs using
CURRENT_TIME are just there for you to see the comparison of results.
More Examples — Not with timestamps
SQL also allows
CAST() functions to be used with types that are not timestamped.
CAST(1.34 AS INT),
CAST(1 AS BOOLEAN),
CAST(2.65 AS DEC(3,0))
The results from this query are,
- 1 → Since an integer can’t have decimals, it will round to the nearest integer value
- true → 1 as a boolean is true, and 0 is false
- 3 → using the
DEC()we can also do the reverse of our first integer CAST.
Next, we can look at pulling a specific format out of a timestamp. The goal is to extract a portion out of a timestamp. For example, if we want just the month from the date 12/10/2018, we would get December (12).
Let’s take a look at
EXTRACT(part FROM date)
We state the type of extraction we want as part and then the source to be extracted date.
EXTRACTis an import tool in time series data analysis. It helps you isolate groups in your timestamps to aggregate data based on precise timings.
Let’s assume we run
NOW() and our timestamp is 2020–03–29 00:27:51.677318+00:00, we can use
EXTRACTto get the following.
EXTRACT(MINUTE FROM NOW()) AS MINUTE,
EXTRACT(HOUR FROM NOW()) AS HOUR,
EXTRACT(DAY FROM NOW()) AS DAY,
EXTRACT(WEEK FROM NOW()) AS WEEK,
EXTRACT(MONTH FROM NOW()) AS MONTH,
EXTRACT(YEAR FROM NOW()) AS YEAR,
EXTRACT(DOW FROM NOW()) AS DAY_OF_WEEK,
EXTRACT(DOY FROM NOW()) AS DAY_OF_YEAR,
EXTRACT(QUARTER FROM NOW()) AS QUARTER,
EXTRACT(TIMEZONE FROM NOW()) AS TIMEZONE
We see that we can go into great detail in how we want to extract information out of our timestamps.
Note — DOW — The day of the week is Sunday(0) to Saturday(6).
Truncate — to shorten by or as if by cutting off
The purpose of truncating a date in SQL is to return an interval at an absolute precision. The precision values are a subset of the field identifiers that can be used with the
DATE_TRUNC will return an interval or timestamp rather than a number.
The syntax for
time_column is the database column that contains the timestamp you'd like to round, and
‘[interval]’ dictates your desired precision level.
Let’s assume our
NOW() returns the same 2020–03–29 00:27:51.677318+00:00, we can use
date_part to get the following.
CAST(DATE_TRUNC('DAY', NOW()) AS DATE) AS DAY,
CAST(DATE_TRUNC('WEEK', NOW()) AS DATE) AS WEEK,
CAST(DATE_TRUNC('MONTH', NOW()) AS DATE) AS MONTH,
CAST(DATE_TRUNC('YEAR', NOW()) AS DATE) AS YEAR
Think of using
DATE_TRUNC as getting the interval of where it is currently, and each level of an interval is how the date gets trimmed. Note — we did not need to use
CAST in this example. I used it to ensure the format was clean for analysis.
We can use
DATE_TRUNC on the rental car scenario from earlier, and try to find which day in the year regardless of time is more popular for rentals.
CAST(DATE_TRUNC('DAY', RENTAL_DATE) AS DATE) AS RENTAL_DAY,
COUNT(*) AS RENTALS
ORDER BY RENTALS DESC
generate_series in Postgres
generate_series as the name implies allows you to generate a set of data starting at some point, ending at another point, and optionally set the incrementing value.
generate_series works on two datatypes:
Let’s get started with the most basic example:
FROM generate_series(1, 5);
generate_series pretty straight-forward, but what interesting ways can it be used?
Generating fake data
By putting our
generate_series inside a CTE we can easily now generate a set of numbers and then perform some operation against each value. If we want to generate some fake number we can use
random() which generates a random number between 0.0 and 1.0.
WITH numbers AS (
FROM generate_series(1, 5)
)SELECT generate_series * random()
FROM numbers; ?column?
SQL aggregate function like
SUM have something in common: they all aggregate across the entire table. But what if you want to aggregate only part of a table? For example, you might want to count the number of entries for each year.
In situations like this, you’d need to use the
GROUP BY clause.
GROUP BY allows you to separate data into groups, which can be aggregated independently of one another.
ORDER BY clause allows you to reorder your results based on the data in one or more columns.
Ascending order, and it’s SQL’s default. If you order a numerical column in ascending order, it will start with smaller (or most negative) numbers, with each successive row having a higher numerical value than the previous.