#006: Good Habits to Help You Write Cleaner SQL
Jun 19, 2022Even in 2022, SQL is absolutely still one of the most important languages for data engineers.
Yet despite its long lifespan many developers continue to write lazy and poorly written code.
In this edition I want to share with you a few habits to help you avoid being “that developer”
And instead write cleaner SQL code.
Use CTEs or Temp Tables
A common issue I see is trying to stuff all logic into one big SELECT statement.
While this will technically work and give you a result...
There are a few reasons why you should consider using CTEs/Temp Tables.
Logically group concepts
It’s hard to explain a column’s logic if it is mixed in with tons of other joins and conditions.
It becomes hard to read and you may even impact other data in unexpected ways.
Breaking out logic into CTEs/Temp Tables helps isolate logic (easier to define).
And gives your query a nice overall flow (easier to follow).
Easier debugging
You will also have a hard time debugging a single SELECT without impacting the final result.
Breaking up the logic makes this process much easier, and less risky.
It will be very clear what is (and is not) going on in a particular CTE/Temp Table.
You'll be able to identify issues much faster.
And limit the potential impact of any new changes.
Better performance
This last point is primarily for traditional row-based databases, such as SQL Server.
Keeping it all in one SELECT forces the query engine to do a lot of work for each row.
Even if it’s not necessary.
With temp tables, you can isolate the work and only have it process what it needs.
Suggestion:
Create your first temp table to act as a “base data set”.
Use this to identify only the ID's of records needed for the final result set.
Then, all following temp table build on top of this base set.
Rather than re-joining with the entire source table each time.
I’ve seen this strategy dramatically improve query times on SQL Server.
Consistent Spacing
Extra or inconsistent spacing won’t technically impact the result itself.
However, being mindful of spacing in your queries still has a lot of benefits.
Fold/Unfold code blocks
Most code editors are built-in with great features to help improve your workflow.
One of which is the ability to fold/unfold code blocks.
I use all the time to help me focus on only the snippets of code I care about.
(*Hint - this is also made easier by implementing the CTE/Temp Table approach)
If your spacing is consistent, you help the editors allow folding in the right spots.
If your spacing/indentations are all over the place...
It won’t turn out the way you anticipated and defeat the purpose of this great feature.
Easier to read
Imagine reading a book where
each line started with a different indentation.
Or if certain words had more spaces
in between than others.
It would be very frustrating and difficult to read.
The same is true when reading SQL code.
Keeping spaces consistent will make it easier to read and work with.
Looks more professional
This should come as no surprise...
But consistency and organization simply looks more professional.
This will be noticed immediately once somebody opens your query.
The first impression will show you are diligent with your craft.
And therefore have certain abilities (or potential) as a developer.
Consistent Case & Style
Consistent case and style will separate average from great looking SQL queries.
For example always using upper or lower case for column names.
Or using snake_case vs camelCase.
The best way to handle this is to create a team style guide.
Here is a great example of one from the folks at dbt labs.
This is helpful for a few reasons.
Easier on-boarding
We all come from different backgrounds and have our own ways of developing.
When joining a new team there is always phase of getting used to the new workflow.
A clear style guide helps new members contribute right away and not feel left out.
Easier reviews
Consistency drives efficiency.
If the style is consistent, it will help all developers more easily review each other’s code.
Which means it can get moved to production faster.
Suggestion:
A nice way to handle this without feeling annoying is to implement a Linter.
These are programs you can add to your project to make sure your code follows certain conditions before it can be merged.
It’s much less offensive when these tedious changes are called out by a computer.
Rather than a human co-worker.
A great example of a helpful linter is SQL Fluff.
Looks better
Similar to spacing, having consistency just plain looks better.
An analogy I like to think of is having a clean room.
If you walk into your room and it's clean and everything is in order...
It has an immediate calming effect.
Otherwise, the initial feeling is that of organized chaos.
Which has the opposite effect.
That's all for this edition. One (or more) tips to help you level up your skills as a data engineer.
If you found this helpful, the best way to say thanks is to share it with somebody else.
Thank you for reading and I'll see you next time.
- Mike
New to data engineering? Check out my FREE Starter Guide PDFs.
The Starter Guide for Modern Data
Build Modern Data Architectures With More Structure, Faster.
Show more impact with modern tools like dbt, Snowflake & GitHub by following a simple foundational design.
You'll also get other helpful content from me. Unsubscribe anytime.