A few ways to do it and why it is really helpful

Dear Data-Traveller, please note that this is a LinkedIn remix.

I already posted this content on LinkedIn in June 2022, but I want to make sure it doesn’t get lost in the social network abyss.

For accessibility, and as our own content backup, we repost the original text here.

Have a look, leave a like if you like it, and join the conversation in the comments if this sparks a thought!

Plain Text:

Context is the most beloved thing in a data setup.

The more you can get, the better (yes, please deliver it in a structured way).

In most of the setups I work on, we enrich the collected data (behavioral, ad, CRM, and business events) with some kind of metadata that we collect from stakeholders.

Some examples:
– Marketing reports: channel mapping, campaign correction, campaign goal, campaign type, internal costs

– Content: author, focus topic, SEO keyword

– Business: plan data (super useful)
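To make the idea of enrichment concrete, here is a minimal Python sketch of a channel mapping applied to collected events. The mapping values, the field names (source, medium, sessions) and the function name are assumptions made up for illustration; in a real setup this would more likely be a join in the warehouse.

```python
# Hypothetical mapping maintained by the marketing team:
# (source, medium) -> reporting channel.
CHANNEL_MAPPING = {
    ("google", "cpc"): "Paid Search",
    ("newsletter", "email"): "Email",
}


def enrich_with_channel(events, mapping, default="Unmapped"):
    """Return copies of the events with a 'channel' field taken from the mapping."""
    enriched = []
    for event in events:
        # Lowercase the key so "Google" and "google" map to the same channel.
        key = (event["source"].lower(), event["medium"].lower())
        enriched.append({**event, "channel": mapping.get(key, default)})
    return enriched


# Illustrative events, not real data.
events = [
    {"source": "Google", "medium": "cpc", "sessions": 120},
    {"source": "tiktok", "medium": "social", "sessions": 15},
]
enriched_events = enrich_with_channel(events, CHANNEL_MAPPING)
```

Rows without a mapping get an explicit "Unmapped" label instead of being dropped, which makes gaps in the context data visible in the report.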

So how do you handle this context data?

It needs an easy interface, so business teams can add and update data easily.

My usual go-to approach here is Google Sheets. Most of my setups use BigQuery, the integration is really easy, and you can use the sheet data directly in a query.

Super easy to use but also super easy to break. So make sure to explain to the teams how they can edit these sheets (no, you can’t just add a new column in between).
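One way to catch a broken sheet early is to check its header before the data is used. The following is only a sketch: it assumes the sheet is available as CSV text, and the expected column names are hypothetical.

```python
import csv
import io

# Hypothetical contract for a channel-mapping sheet: exact column names, in order.
EXPECTED_COLUMNS = ["source", "medium", "channel"]


def check_header(csv_text, expected=EXPECTED_COLUMNS):
    """Return a list of readable problems; an empty list means the header is fine."""
    reader = csv.reader(io.StringIO(csv_text))
    # An empty input yields an empty header, so every column is reported missing.
    header = [name.strip() for name in next(reader, [])]
    problems = []
    missing = [c for c in expected if c not in header]
    extra = [c for c in header if c not in expected]
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        problems.append(f"unexpected columns: {extra}")
    if not missing and not extra and header != expected:
        problems.append(f"columns do not match the expected order: {header}")
    return problems


good = "source,medium,channel\ngoogle,cpc,Paid Search\n"
# Someone added a "notes" column in between.
broken = "source,notes,medium,channel\ngoogle,,cpc,Paid Search\n"
good_problems = check_header(good)
broken_problems = check_header(broken)
```

Here check_header(good) returns an empty list, while the sheet with the extra column is reported as having an unexpected "notes" column. A check like this could run before each load, so the team gets a clear message instead of a silently wrong report.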

If you are working with dbt to orchestrate your SQL, you can use its seed feature. You add CSV files to your project, and dbt loads them into your warehouse when you run dbt seed (or dbt build). Problem: CSV files are not really user-friendly. Of course, you can keep an Excel sheet as the original, generate a CSV from it after every change, and add that to your repo.
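The spreadsheet-to-CSV step can be scripted so nobody has to export by hand. Below is a rough sketch that assumes the rows have already been read from the spreadsheet as lists of cell values (for example with a library such as openpyxl); the function name and the sample columns are made up for illustration.

```python
import csv
import io


def rows_to_seed_csv(rows):
    """Turn spreadsheet rows (header row first) into CSV text for a seed file."""
    header, *data = rows
    # Normalize header names to snake_case so they work as column names.
    columns = [str(name).strip().lower().replace(" ", "_") for name in header]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for row in data:
        # Skip fully empty rows, which spreadsheets often contain at the end.
        if all(cell is None or str(cell).strip() == "" for cell in row):
            continue
        writer.writerow(["" if cell is None else str(cell).strip() for cell in row])
    return buffer.getvalue()


# Illustrative rows, not real data.
rows = [
    ["Campaign", "Campaign Goal", "Internal Costs"],
    ["spring_sale", "Awareness ", 1200],
    [None, None, None],
]
seed_csv = rows_to_seed_csv(rows)
```

The resulting text can be written to a CSV file in the seed directory of your dbt project and committed to the repo, so the spreadsheet stays the friendly interface and the CSV is only a generated artifact.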

How are you handling context data in your setup? What are some best practices you have learned?