My confession and some tips to help you if you don’t like SQL either
Dear Data-Traveller, please note that this is a LinkedIn-Remix.
I already posted this content on LinkedIn in April 2022, but I want to make sure it doesn’t get lost in the social network abyss.
For accessibility, and as our own content backup, we repost the original text here.
Have a look, leave a like if you like it, and join the conversation in the comments if this sparks a thought!
Screenshot with Comments:
Plain Text:
Confession: I don’t like SQL.
When I started working in data, I tried to ignore it for some time. I knew the basics from university, but I never felt the urge to go deeper.
Instead, I went into Python and Pandas, and I still love them.
But at some point, it was clear that there was no way around SQL. Too many things in today’s setups are more manageable when you can write queries.
And if you and your company are venturing out to develop a modern data stack (good luck with that), advanced SQL skills are necessary.
Why? Because then it’s not only about getting the right tables, it’s about creating queries that:
– are maintainable – 500-line queries with three subqueries are hard to change
– are performant – no, the cloud warehouse can’t work magic on every query
– are built for the job – learn about materialization (views, tables, incremental loads) and when to use which
– are readable – you will move on at some point, so be nice to your successor
– are partitioned if needed – yes, cloud warehouses can cost a lot of money if you ask them to scan your 300 GB of raw event data every time
And you can learn all of this even when you don’t like SQL.
Here are some tips for when you don’t like SQL:
– use CTEs (WITH statement blocks) – coming from Python, this helped me to build a more suitable structure
– learn window functions – they look weird but are super helpful – at least look at RANK and ROW_NUMBER when you are dealing with multiple rows for the same thing (for example, duplicates from data ingestion)
– avoid subqueries where you can – I rarely use them; totally personal taste, but they creep me out
– embrace left joins – they are the friendly and predictable joins: they keep every row of the left table and add no new rows, as long as the joined data is unique on the matching criteria (so make sure it is)
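To make the tips above a bit more concrete, here is a small sketch that combines three of them: CTEs for structure, ROW_NUMBER to pick one row per key, and a left join onto the deduplicated result. It runs the SQL through Python's built-in sqlite3 module so you can try it without a warehouse. The table names (orders, customers_raw) and the data are made up for illustration, and it assumes your Python ships SQLite 3.25 or newer, which is when window functions arrived. Your warehouse dialect may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical example data: customer 10 was ingested twice, customer 30 never.
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
CREATE TABLE customers_raw (customer_id INTEGER, email TEXT, loaded_at TEXT);
INSERT INTO orders VALUES (1, 10), (2, 10), (3, 20), (4, 30);
INSERT INTO customers_raw VALUES
    (10, 'old@example.com', '2022-01-01'),
    (10, 'new@example.com', '2022-03-01'),
    (20, 'b@example.com',   '2022-02-01');
""")

# Joining straight onto the raw table duplicates orders 1 and 2,
# because customer 10 is not unique on the join key.
NAIVE_QUERY = """
SELECT o.order_id, c.email
FROM orders AS o
LEFT JOIN customers_raw AS c ON c.customer_id = o.customer_id
"""

# Each CTE is one named step, a bit like intermediate DataFrames in Pandas.
CLEAN_QUERY = """
WITH ranked AS (
    -- number the rows per customer, newest load first
    SELECT
        customer_id,
        email,
        ROW_NUMBER() OVER (
            PARTITION BY customer_id
            ORDER BY loaded_at DESC
        ) AS rn
    FROM customers_raw
),
latest_customers AS (
    -- keep only the newest row, so customer_id is unique
    SELECT customer_id, email
    FROM ranked
    WHERE rn = 1
)
SELECT o.order_id, o.customer_id, c.email
FROM orders AS o
LEFT JOIN latest_customers AS c ON c.customer_id = o.customer_id
ORDER BY o.order_id
"""

naive_rows = conn.execute(NAIVE_QUERY).fetchall()
clean_rows = conn.execute(CLEAN_QUERY).fetchall()
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

print("orders:", order_count)
print("rows after naive left join:", len(naive_rows))
print("rows after deduplicated left join:", len(clean_rows))
for row in clean_rows:
    print(row)
```

The thing to compare is the row counts: the naive join returns more rows than there are orders, while the deduplicated one returns exactly one row per order, with an empty email for the customer that was never ingested. A quick count check like this is a cheap habit after every left join.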
Today my work is 30-40% SQL – and it’s OK. I still wish for easier ways to refactor, decouple, and, most importantly, debug things.