The challenges and importance of finding and choosing the right data model – and fully understanding it

Dear Data-Traveller, please note that this is a LinkedIn remix.

I already posted this content on LinkedIn in June 2022, but I want to make sure it doesn’t get lost in the social network abyss.

For accessibility, and as our own content backup, we repost the original text here.

Have a look, leave a like if you like it, and join the conversation in the comments if this sparks a thought!

Plain Text:

OK, I get the SQL-writing part. But how do we structure data in the data warehouse? Where do I start? How do I refactor? Where is the … boilerplate for this?

Data Modeling appreciation week.

These were my initial thoughts when starting to build data warehouses. And these are still my thoughts today (on a different level). I read some books (some of the classics). They gave me context but not really an answer.

For quite some time, I thought I was alone with that. When I asked these questions in the early dbt Slack community, no one could give me a good answer. The usual reply was: "Well, you have staging, and then marts or models for use cases."

And especially the "models for use cases" part drove, and still drives, me crazy. That way, you pile up random models that do similar things in slightly different ways, and you create a swamp.

The term "swamp" is one of the best I have heard for describing a data setup, and I didn’t coin it. Chad Sanderson did.

He showed me that I am not alone. He is brighter and has more experience with these kinds of setups, yet he has the same questions.

On LinkedIn, he has been writing for quite some time about the current state of data modeling and the (modern) data stack in general. About the challenges, the limitations, the trapdoors.

And now he even has a Substack, which I highly recommend subscribing to.

Creating a data model for your business is complex, and there are no easy answers. And there is no tutorial out there. You can read about the different approaches, understand their pros and cons, and start to experiment with them in your setups. The best approach is to have a plan for what you want to achieve, then test, refine, and test again.

But investing in your data model is probably the best idea.

Start with Chad’s latest post about the role and importance of a data model. It’s a great start to understanding the current state, the challenges, and the opportunity to invest in a data model.

Mentioned Substack: https://dataproducts.substack.com/p/the-death-of-data-modeling-pt-1