My own take on building a high-quality data stack that is simple and fast enough to be viable, and fun to work with

Dear Data-Traveller, please note that this is a LinkedIn remix.

I already posted this content on LinkedIn in May 2022, but I want to make sure it doesn't get lost in the social network abyss.

For better accessibility, and as a backup of our own content, we repost the original text here.

Have a look, leave a like if you like it, and join the conversation in the comments if this sparks a thought!



The original post as plain text:

Last year – oh no, it was already four months ago.

I wrote about my weekend side project: creating a hipster data stack with many cool tools, just a playground to try new things.

So, what happened?

First of all, the playground got serious.

I need some data about the content I am creating at deepskydata. I also need some data about my funnels. It's not play anymore.

And it took too much time. Sure, it's a side project. But even when I dedicate more time, building all the parts and connecting them takes a while.

I need a fast track.

Interestingly, most of my clients need this fast track too. They are often startups or established companies that want to grow a service or a product (quickly). They need to find out fast where they are stuck, which segments work for them, and which ones are too expensive.

The setups I create for them are not built on the Modern Data Stack. Building the data model would work (although it would take more time), but most of them don't have a data team, so extending such a setup later becomes impossible.

Instead, we implement something different:

> Data schema

We introduce a hierarchy for the data we collect. Business core events are our backbone.
These events are designed carefully, tracked from reliable sources, monitored, and match the operational data 100% (no more missing transactions). We can do this with Avo or Segment Protocols (I recommend Avo here).
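What does "match the operational data 100%" mean in practice? A minimal sketch of a reconciliation check, assuming you can export the order IDs from your operational database and the tracked core events as plain dicts. The event name `Order Completed` and the `order_id` property are hypothetical examples, and this is not how Avo or Segment Protocols work internally; it only shows the comparison itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReconciliationReport:
    """Result of comparing tracked core events with operational records."""
    missing_in_tracking: frozenset[str]  # orders in the DB that were never tracked
    unknown_in_tracking: frozenset[str]  # tracked orders the DB does not know about
    match_rate: float  # share of operational orders that were tracked


def reconcile(operational_ids: set[str], tracked_events: list[dict]) -> ReconciliationReport:
    # Hypothetical event shape: {"event": "Order Completed", "properties": {"order_id": ...}}
    tracked_ids = {
        e["properties"]["order_id"]
        for e in tracked_events
        if e.get("event") == "Order Completed"
    }
    missing = frozenset(operational_ids - tracked_ids)
    unknown = frozenset(tracked_ids - operational_ids)
    matched = len(operational_ids) - len(missing)
    # An empty operational set counts as a perfect match (nothing to miss).
    rate = matched / len(operational_ids) if operational_ids else 1.0
    return ReconciliationReport(missing, unknown, rate)


if __name__ == "__main__":
    orders_in_db = {"o-1", "o-2", "o-3", "o-4"}
    events = [
        {"event": "Order Completed", "properties": {"order_id": "o-1"}},
        {"event": "Order Completed", "properties": {"order_id": "o-2"}},
        {"event": "Order Completed", "properties": {"order_id": "o-3"}},
        {"event": "Page Viewed", "properties": {}},
    ]
    report = reconcile(orders_in_db, events)
    print(sorted(report.missing_in_tracking), report.match_rate)  # ['o-4'] 0.75
```

Run on a schedule, a check like this turns "no more missing transactions" into something you can monitor and alert on instead of just hoping for it.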

> Data collection

We use one collection layer for all event data, whether it comes from the frontend, the backend, or SaaS tools. We can do this with RudderStack, Snowplow (a special case), Segment, or Jitsu.
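Each of these tools has its own event spec. As a tool-agnostic sketch of the "one layer" idea, assuming nothing about any specific product: every source gets a small adapter that maps its payload into the same envelope, so everything downstream only ever sees one shape. The `Event` fields, the adapter functions, and the webhook payload shape are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Literal

Source = Literal["frontend", "backend", "saas"]


@dataclass(frozen=True)
class Event:
    """One envelope for every event, regardless of where it was produced."""
    name: str
    user_id: str
    source: Source
    timestamp: datetime
    properties: dict[str, Any] = field(default_factory=dict)


def from_backend(name: str, user_id: str, **props: Any) -> Event:
    # Backend events are stamped server-side, in UTC.
    return Event(name, user_id, "backend", datetime.now(timezone.utc), dict(props))


def from_saas_webhook(payload: dict[str, Any]) -> Event:
    # Hypothetical webhook shape from a SaaS tool (e.g. a billing system):
    # {"type": "invoice.paid", "customer": "u-42", "created": 1651363200, "amount": 99}
    envelope_keys = {"type", "customer", "created"}
    return Event(
        name=payload["type"],
        user_id=payload["customer"],
        source="saas",
        timestamp=datetime.fromtimestamp(payload["created"], tz=timezone.utc),
        # Everything that is not part of the envelope becomes a property.
        properties={k: v for k, v in payload.items() if k not in envelope_keys},
    )


if __name__ == "__main__":
    paid = from_saas_webhook(
        {"type": "invoice.paid", "customer": "u-42", "created": 1651363200, "amount": 99}
    )
    print(paid.source, paid.name, paid.properties)  # saas invoice.paid {'amount': 99}
```

A frontend adapter would follow the same pattern. The point of the design is that adding a new source means writing one more adapter, not touching the reports.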

> Data activation

For most companies, there are three core functions where data can help immediately:

– Visualize the customer journey funnel (also in cohorts, to see improvements over time). This tells you where you need to focus.

– Show whether growth experiments (the work on marketing, sales, or product features) change the funnel (i.e. the business outcome).

– Segmentation, segmentation, segmentation: find over- and under-performers within these reports (this helps with optimization).
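Product analytics tools compute these reports for you. Purely to illustrate what a segmented funnel counts, here is a minimal sketch assuming a strict ordered funnel with no conversion time window, events as `(user_id, event_name, timestamp)` tuples, and a lookup from user to segment (for example, acquisition channel). The step names and sample data are hypothetical.

```python
from collections import defaultdict

# Hypothetical funnel; the step names are illustrative only.
FUNNEL = ["Signed Up", "Trial Started", "Order Completed"]


def funnel_by_segment(events, segment_of, steps=FUNNEL):
    """Count users per segment who completed each funnel step in order.

    events: iterable of (user_id, event_name, timestamp) tuples
    segment_of: maps user_id -> segment label (e.g. acquisition channel)
    """
    by_user = defaultdict(list)
    for user_id, name, ts in events:
        by_user[user_id].append((ts, name))

    counts = defaultdict(lambda: [0] * len(steps))
    for user_id, user_events in by_user.items():
        reached = 0  # index of the next step this user still needs
        for _, name in sorted(user_events):  # chronological order
            if reached < len(steps) and name == steps[reached]:
                reached += 1
        # A user who reached step k is counted in steps 0..k-1.
        for i in range(reached):
            counts[segment_of(user_id)][i] += 1
    return dict(counts)


def conversion_rates(step_counts):
    """Share of users at each step who made it to the next one."""
    return [
        round(nxt / cur, 2) if cur else 0.0
        for cur, nxt in zip(step_counts, step_counts[1:])
    ]


if __name__ == "__main__":
    events = [
        ("u1", "Signed Up", 1), ("u1", "Trial Started", 2), ("u1", "Order Completed", 3),
        ("u2", "Signed Up", 1), ("u2", "Trial Started", 5),
        ("u3", "Signed Up", 2),
        ("u4", "Signed Up", 1), ("u4", "Order Completed", 4),  # skipped the trial
    ]
    channel = {"u1": "ads", "u2": "ads", "u3": "organic", "u4": "organic"}.get
    result = funnel_by_segment(events, channel)
    print(result)  # {'ads': [2, 2, 1], 'organic': [2, 0, 0]}
    print(conversion_rates(result["ads"]))  # [1.0, 0.5]
```

Comparing the per-segment step counts side by side is what surfaces the over- and under-performers: in this toy data, "organic" users sign up but never start a trial.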

To keep things simple, we don't create a SQL data model; we use product analytics tools for that instead: Amplitude, Mixpanel, Heap, or PostHog (which I am testing extensively with the help of Restack).

No dbt, no Airflow. For now.

What about the beloved data warehouse? We don’t build one yet.

We still use the tools behind it, like Snowflake and BigQuery, but differently.

We use them to create Data Add-ons, Extensions, or Apps (whatever you want to call them).

Some examples:

> Enrich with backend data

> Control and enrich the event data before it enters

> Marketing cost attribution

Do you work with a similar approach, or one that is similar to some extent? Or do you think the Modern Data Stack is far easier to maintain than I described? Let me know!

Because I have spent so much time thinking about the Simple Data Stack, I created a new Substack for it, where I will post my findings:

So head over and subscribe if you want to learn more.