What they really mean and why that’s important
Dear Data-Traveller, please note that this is a LinkedIn remix.
I already posted this content on LinkedIn in April 2022, but I want to make sure it doesn’t get lost in the social network abyss.
For better accessibility, and as a backup of our own content, we repost the original text here.
Have a look, leave a like if you like it, and join the conversation in the comments if this sparks a thought!
In our little data world, we name things too much from a marketing perspective, and there is serious overselling going on.
Let’s look at some examples:
dbt is not a data modeling tool, although I see this notion quite often. It is first and foremost a SQL orchestration and testing tool. Of course, I can use it to build and manage a data model, but that requires me to do the thinking, not dbt.
Snowflake and BigQuery are not data warehouses. Great people like Rogier Werschkull and Chad Sanderson remind us of that. They are analytical databases in the cloud. Of course, you can build a data warehouse with them, but this requires you to come up with a concept and an architecture.
Fivetran and Airbyte are not ELT tools. They extract and load for you, and you are in charge of the transformation. They are basically supermarkets with self-checkout: a great idea, but you have to do more of the work yourself.
Segment and RudderStack are not really CDPs (Arpit Choudhury has written a great piece about this). They are customer data infrastructure: the collection and identity-stitching layer.
Reverse ETL is just ETL.
Why is this important?
Because these labels often create expectations that the tools themselves can’t fulfill.
When I set up Snowflake and think I now have a data warehouse, I create huge expectations in my organization that I can’t fulfill.
The same goes for dbt: “OK, we need a data model, let’s use dbt for this.” And then you add one SQL file after the next and call it a model.
Tools are tools, nothing more.