Zemata's data needs for now are uncomplicated and can be met with a simple Lambda Architecture.

The Business unit's data needs are more urgent: analysts need to generate daily reports on user conversion rates, ad success rates, and so on. The Data Science unit's needs are less urgent but broader in scope.

The Data

Site visitor data (location, click ID, web URL, device type, etc.) from the frontend services will be collected for both the Business and Data Science units. Adequate provisions will also have to be made for other data types, such as video, images, and text.

<aside> 👉🏾 We will use fake-web-events, a Python package that generates fake web events in a neat JSON format, to simulate site visitor data.

</aside>
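To make the shape of this data concrete, here is a minimal sketch of the kind of JSON event we expect to work with. The field names and values below are illustrative assumptions, not the exact schema emitted by the fake-web-events package.

```python
import json
import random
import uuid
from datetime import datetime, timezone

def fake_web_event() -> dict:
    """Generate one illustrative site-visitor event.

    Field names are assumptions for this sketch, not the exact
    schema produced by the fake-web-events package.
    """
    return {
        "event_id": str(uuid.uuid4()),
        "event_timestamp": datetime.now(timezone.utc).isoformat(),
        "page_url": random.choice(["/home", "/pricing", "/signup"]),
        "device_type": random.choice(["mobile", "desktop", "tablet"]),
        "geo_country": random.choice(["NG", "US", "GB"]),
    }

# Each event serializes cleanly to JSON, which is the format we will
# eventually push through the ingestion pipeline.
print(json.dumps(fake_web_event(), indent=2))
```

Every event is a flat dict, so downstream consumers can treat each message as one self-contained JSON record.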

Ingesting The Data

Cloud Pub/Sub is Google's asynchronous messaging service, useful for load balancing (distributing a large queue of tasks), event notification, data ingestion, streaming, and more. Pub/Sub is our data ingestion hero: our frontend and backend services can publish event data to a topic and have various services subscribe to it.

Creating a Pub/Sub topic is easily done with one command:

gcloud pubsub topics create zematadata

A topic looks like this:

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/3d172eb7-5f87-47e0-8cf3-b683e375aa80/Annotation_2021-04-21_133941.png

Publishing The Data

With the right permissions, applications can publish messages to Pub/Sub topics. Client libraries are the preferred way for our apps to communicate with cloud resources, and Google offers them in various languages, including Python, which happens to be my fave.
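A publish call with the Python client library might look like the sketch below. Pub/Sub messages are raw bytes, so we JSON-encode each event first; `project_id` is a placeholder you would replace with your own GCP project, and the snippet assumes the `google-cloud-pubsub` package is installed and application-default credentials are configured.

```python
import json

def make_payload(event: dict) -> bytes:
    """Pub/Sub message bodies are raw bytes; serialize the event as UTF-8 JSON."""
    return json.dumps(event).encode("utf-8")

def publish_event(event: dict, project_id: str, topic_id: str = "zematadata") -> str:
    """Publish a single event to the topic created earlier.

    Requires the google-cloud-pubsub package and application-default
    credentials; project_id is a placeholder for your GCP project.
    """
    from google.cloud import pubsub_v1  # deferred so make_payload stays usable offline

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)
    future = publisher.publish(topic_path, data=make_payload(event))
    return future.result()  # blocks until the server returns a message ID
```

`publish` is asynchronous and returns a future; calling `result()` waits for the server-assigned message ID, which is a simple way to confirm delivery in a script.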