Week 8 - Data Automation

As mentioned in other sections of this bootcamp, Week 1 - Infrastructure covers how to set up your project with Docker and Terraform, and how to use LocalStack to run AWS infrastructure locally for testing purposes.

Other sections, such as Warehouse, Batching, Streaming, and Orchestration with Airflow, are all containerized and run in Docker containers. Furthermore, you will be able to run those containers with Kubernetes (K8s) and deploy them to cloud services such as AWS, GCP, or Azure.

Back in the day, DevOps was a big buzzword, and it was all about automating everything. Nowadays, DevOps is a set of practices that help you automate your processes and reduce time to market. It has also become the starting point for other automation-focused disciplines such as DevSecOps, DataOps, LLMOps, MLOps, FinOps, BIOps, etc.

BUT, no matter which Ops it is, remember that it is not a magic wand. It helps you automate the process, reduce the risk of human error, and ensure the quality of delivery.

Which xOps areas does Data Automation cover?

Tip

Start by building a slim dbt CI/CD process with GitHub.
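To make the tip concrete, here is a minimal sketch of the command a slim CI job might run. It assumes the production `manifest.json` has already been downloaded into a directory (called `prod-artifacts` here, a hypothetical name) and that a dbt target named `ci` exists in your profile (also an assumption). The helper `slim_ci_command` is illustrative only, not part of dbt.

```python
import shlex


def slim_ci_command(state_dir: str, target: str = "ci") -> list[str]:
    """Build a dbt invocation that only touches models changed in this branch.

    state_dir: directory holding the production manifest.json (hypothetical path).
    target:    dbt target used for CI runs (assumed to exist in profiles.yml).
    """
    return [
        "dbt", "build",
        "--select", "state:modified+",  # changed models plus everything downstream
        "--defer",                      # resolve unchanged parents from the prod state
        "--state", state_dir,           # where the comparison manifest lives
        "--target", target,
    ]


if __name__ == "__main__":
    cmd = slim_ci_command("prod-artifacts")
    # Print the command as a shell-safe string, e.g. for a GitHub Actions step.
    print(shlex.join(cmd))
```

In a GitHub workflow, a step on each pull request could run the printed command after fetching the production artifacts, so that only modified models and their descendants are built and tested.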

  • Idempotency with the Data Lakehouse, where you can replay, re-process, or re-run a data pipeline without introducing data drift or backfilling overhead.
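One common way to get this kind of idempotency is to overwrite a whole partition per run instead of appending to it. The sketch below illustrates the idea with plain JSON files on local disk standing in for lakehouse storage; the `run_date=` folder layout, the `id` key, and the function names `write_partition` and `read_table` are all assumptions made for illustration, not a specific lakehouse API.

```python
import json
import tempfile
from pathlib import Path


def write_partition(root: Path, run_date: str, rows: list[dict]) -> Path:
    """Overwrite the partition for run_date so a rerun leaves the same state."""
    partition_dir = root / f"run_date={run_date}"
    partition_dir.mkdir(parents=True, exist_ok=True)
    target = partition_dir / "part-0000.json"
    tmp = partition_dir / "part-0000.json.tmp"
    # Sort rows by key so the same input always yields the same file content.
    ordered = sorted(rows, key=lambda r: r["id"])
    tmp.write_text(json.dumps(ordered, sort_keys=True))
    # Replace the old file in one step: a rerun swaps data, it never appends.
    tmp.replace(target)
    return target


def read_table(root: Path) -> list[dict]:
    """Read every partition back in a stable order."""
    rows: list[dict] = []
    for part in sorted(root.glob("run_date=*/part-*.json")):
        rows.extend(json.loads(part.read_text()))
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        batch = [{"id": 2, "amount": 20}, {"id": 1, "amount": 10}]
        write_partition(root, "2024-01-01", batch)
        first = read_table(root)
        # Re-running the same day must not duplicate or change any rows.
        write_partition(root, "2024-01-01", batch)
        assert read_table(root) == first
        print(f"{len(first)} rows after two runs")
```

Because each run fully replaces its own partition, replaying or backfilling a day is just calling the same job again with that date. Table formats used in lakehouses offer their own overwrite or merge operations for this; the file-based version here only demonstrates the principle.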

I will not cover the CI/CD process for deploying data pipelines or how to manage data warehouses, because that is a very broad topic with many tools and terminologies of its own, such as Infrastructure as Code (IaC) with standard tools like Terraform or the cloud SDKs.

You will see that the concepts and foundations have not changed, but the technologies are evolving every minute. xOps is not a one-size-fits-all solution; you will need to adapt it to your needs and business requirements.