Weekend Project – Building an event-driven data stack
Dear Data-Traveller, please note that this is a LinkedIn-Remix.
I originally posted this content on LinkedIn in December 2021, but I want to make sure it doesn't get lost in the social-network abyss.
For accessibility, and also as our own content backup, we repost the original text here.
Have a look, leave a like if you like it, and join the conversation in the comments if this sparks a thought!
Original Post:
Weekend Project – Building an event-driven data stack
New Episode
The first data has arrived!
I got my two major event-data pipelines ready today:
1. Streamprocessor
This is a wonderful product by Robert Sahlin – still in alpha – so it took a bit to set up, but Robert helped me a lot.
Now the schema-based event-streaming pipeline works. The only use case at the moment: when a contact is created in HubSpot, I see it almost immediately as an event with context data in BigQuery – within just 1-2 seconds.
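To make this concrete, here is a rough sketch of what such a contact-created event might look like once it lands as a row in BigQuery. The field names are my own illustration, not Streamprocessor's actual schema:

```python
import json
from datetime import datetime, timezone

# Hypothetical contact-created event as it might land in BigQuery.
# Field names are illustrative, not Streamprocessor's real schema.
event = {
    "event_name": "contact.creation",
    "event_timestamp": datetime.now(timezone.utc).isoformat(),
    "source": "hubspot",
    "properties": {
        "email": "jane@example.com",
        "lifecycle_stage": "lead",
    },
}

# A schema-based pipeline serializes and validates this before loading.
row = json.dumps(event)
print(json.loads(row)["event_name"])  # -> contact.creation
```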
2. Jitsu
Initially I had planned to use Rudderstack here, but I came across Jitsu a few weeks ago, and I prefer its architecture for running it in your own infrastructure.
I could easily set it up on Google Cloud using Cloud Run and Memorystore. Jitsu also offers data streaming into BigQuery, so its events arrive at basically the same time as the Streamprocessor data.
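For a rough idea of the Cloud Run plus Memorystore setup, the sketch below uses real `gcloud` commands, but the instance names, region, and the Redis environment variable are my assumptions – check Jitsu's own deployment docs for the exact configuration:

```shell
# Sketch only: names, region, and env var are illustrative assumptions.

# 1. A Memorystore (Redis) instance for Jitsu's meta storage
gcloud redis instances create jitsu-redis \
    --size=1 \
    --region=europe-west1

# 2. Deploy the Jitsu server container to Cloud Run
#    (reaching Memorystore from Cloud Run also requires a VPC connector;
#    the Redis env var name below is an assumption, not verified)
gcloud run deploy jitsu \
    --image=jitsucom/server:latest \
    --region=europe-west1 \
    --set-env-vars=REDIS_URL=redis://10.0.0.3:6379
```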
Why two systems?
Mostly to try them out. And there are significant differences: Streamprocessor uses Dataflow, while Jitsu manages everything in a Docker container (which can be scaled out to multiple instances).
SP is schema-based, Jitsu schemaless. Through Dataflow, SP enables real streaming applications (and there are interesting things in the pipeline, e.g. PII tokenization).
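The schema-based vs. schemaless difference can be sketched in a few lines. This is my own illustration of the two ingestion styles, not either tool's actual code: a schema-based pipeline rejects events that don't match the declared fields, while a schemaless sink accepts whatever arrives and leaves structure to downstream consumers.

```python
# Illustrative contrast between schema-based and schemaless ingestion --
# my own sketch, not Streamprocessor's or Jitsu's actual implementation.

SCHEMA = {"event_name": str, "user_id": str, "value": int}

def ingest_schema_based(event: dict) -> dict:
    """Validate against a declared schema; reject anything that deviates."""
    for field, field_type in SCHEMA.items():
        if field not in event:
            raise ValueError(f"missing field: {field}")
        if not isinstance(event[field], field_type):
            raise TypeError(f"wrong type for field: {field}")
    return event  # safe to load into a typed BigQuery table

def ingest_schemaless(event: dict) -> dict:
    """Accept anything; structure is inferred or handled downstream."""
    return event  # e.g. stored as raw JSON, columns mapped later

good = {"event_name": "signup", "user_id": "u1", "value": 1}
odd = {"event_name": "signup", "extra": "surprise"}

print(ingest_schemaless(odd))     # accepted as-is
print(ingest_schema_based(good))  # passes validation
```

The trade-off in one sentence: the schema-based path gives you typed, trustworthy tables at the cost of upfront schema work, while the schemaless path gets data flowing immediately and defers the cleanup.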
Tomorrow I might post a quick video of how the data flows.
If you are interested in seeing how I set up Jitsu on GCP, let me know in the comments. SP is still alpha and invite-only – when that changes, I will do a video for it as well.