Week 8 - Data Automation
As with other sections I have mentioned in this camp, Week 1 - Infrastructure covers how to set up your project with Docker and Terraform, as well as how to use LocalStack to run AWS infrastructure locally (for testing purposes).
Other sections, such as Warehouse, Batching, Streaming, and Orchestration with Airflow, are containerized and run in Docker containers. You will furthermore be able to run those containers with Kubernetes (K8s) and deploy them to cloud services like AWS, GCP, or Azure.
Back in the day, DevOps was a big buzzword, and it was all about automating everything. Nowadays, DevOps is a set of practices that help you automate your processes and reduce time to market. It has become a starting point for other "Ops" disciplines used to automate processes, such as DevSecOps, DataOps, LLMOps, MLOps, FinOps, BIOps, etc.
BUT, no matter which Ops it is, remember that it is not a magic wand. It helps you automate the process, reduce the risk of human error, and ensure the quality of delivery.
Which xOps areas for Data Automation are covered?
- Data Quality with data quality tools (dbt is a good example), for instance Analytics Engineering with dbt
- Idempotency with the Data Lakehouse, where you can replay, re-process, and re-run a data pipeline without introducing data drift or backfilling overhead
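To make the first bullet concrete: dbt ships generic tests such as `not_null` and `unique` that you declare on model columns. As a minimal sketch (not dbt itself), the same checks can be expressed in plain Python; the `orders` table and its columns here are hypothetical examples.

```python
# Sketch of two classic data quality checks, in the spirit of dbt's
# built-in not_null and unique generic tests. Rows are plain dicts;
# the "orders" data below is made up for illustration.

def check_not_null(rows, column):
    """Return the rows where `column` is missing (empty list = test passes)."""
    return [r for r in rows if r.get(column) is None]

def check_unique(rows, column):
    """Return duplicated values of `column` (empty list = test passes)."""
    seen, dupes = set(), []
    for r in rows:
        value = r.get(column)
        if value in seen:
            dupes.append(value)
        seen.add(value)
    return dupes

orders = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": None},   # violates not_null(amount)
    {"order_id": 2, "amount": 5.0},    # violates unique(order_id)
]

print(check_not_null(orders, "amount"))   # the row with the missing amount
print(check_unique(orders, "order_id"))   # the duplicated key
```

In dbt you would declare the same intent in a model's YAML (`tests: [not_null, unique]`) and let `dbt test` generate the SQL for you; the point is that a quality gate is just a query whose result set must be empty.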
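The second bullet, idempotency, can be sketched in a few lines: if each run overwrites the whole partition for its run date instead of appending, then replaying or backfilling the same date leaves the warehouse in the same state. The in-memory `warehouse` dict below is a stand-in for a partitioned lakehouse table, purely for illustration.

```python
# Sketch of an idempotent load pattern: write-by-partition-overwrite.
# Re-running the load for the same partition date produces the same
# state, so replays and backfills do not duplicate rows.

warehouse = {}  # partition_date -> list of rows (stand-in for a real table)

def load_partition(partition_date, rows):
    """Replace the entire partition for this date; never append."""
    warehouse[partition_date] = list(rows)

day_one = [{"id": 1}, {"id": 2}]
load_partition("2024-01-01", day_one)
load_partition("2024-01-01", day_one)  # a replay / backfill re-run

print(len(warehouse["2024-01-01"]))  # still 2 rows, not 4
```

Lakehouse table formats give you the production-grade version of this pattern (partition overwrite, MERGE by key, time travel), but the design choice is the same: make each run a pure function of its inputs and its partition.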
I will not cover the CI/CD process for deploying data pipelines or managing data warehouses, because it is a very broad topic, with its own tools and terminology such as Infrastructure as Code (IaC) using standard tools like Terraform or a cloud SDK.
You will see that the concepts and foundations do not change, but the technologies evolve and change by the minute. xOps is not a one-size-fits-all solution; you will need to adapt it to your needs and business requirements.