Feel free to share your way of checking the data transformations in your SQL setup for errors

Dear Data-Traveller, please note that this is a LinkedIn-Remix.

I posted this content already on LinkedIn in September 2022, but I want to make sure it doesn’t get lost in the social network abyss.

For your accessibility-experience and also for our own content backup, we repost the original text here.

Have a look, leave a like if you like it, and join the conversation in the comments if this sparks a thought!

Link to Post

Screenshot with Comments:

Screenshot of Timo Dechau's LinkedIn post with text about testing SQL use cases

Plain Text:

I am looking for some ideas for a specific SQL test case.

There are some metrics that business owners can immediately see in their dashboards if modeling is broken.

One is marketing spent or cost. A good marketing person would check from time to time the costs on a monthly basis in their ad platform and compare them to the cost in their campaign reporting dashboard. And if they don’t match, they will have questions for me.

Especially in bigger models (and depending on the structure), the costs can go through different transformations, especially if you are fancy enough to apply some attribution (which most of us are). But in the end, the sum should be the same.

But how to test this? Especially not only input and output but across the transformations (so I can easily spot when something is starting to be off).

My first approach for this looks like this:
– I create one table/view where I calculate a sum of the costs for the different tables based on various filter criteria (because, for fun, sometimes names are different across tables) – I do this in different CTEs for the sake of readability and quick extension
– I then finally combine it in one with plenty of union alls. And use a window function to compare a sum to the previous one
– when the sum of all comparisons is 0, the test can pass – this can be implemented with tools like Metaplane or dbt for constant monitoring

This works quite well for me, and the structure even calls for putting it into a dbt package.

How do you test these use cases?