Week 1 - Infrastructure and Configuration
- Overview of Architecture
- Technologies
- Open-source data tools (Postgres, Iceberg, MinIO, Kafka, Spark, dbt, Airflow, Deequ, DuckDB…)
- Docker
- Terraform
- AWS Cloud
Watch this video: https://youtu.be/S-RfuZGlLUQ
I put the sample code here: GitHub repository
Docker + Postgres
Practice this Code
What will you learn this week?
- Introduction to Docker
- Why do we need Docker
- Getting started with Docker: 10 minutes with Docker
- Building a Docker image and running a Docker container
- Ingesting NY Taxi Data to Postgres
- Running Postgres locally with Docker
- Using pgcli to connect to the database
- Exploring the NY Taxi dataset
- Ingesting the data into the database
- Try deploying your own Python data processing in Docker
- Note: if you have problems with pgcli, check this video for an alternative way to connect to your database
- Connecting pgAdmin and Postgres
- Using the pgAdmin tool with the connection information provided in the coding section
- Docker networks
- Putting the ingestion script into Docker
- Converting the Jupyter notebook to a Python script
- Parametrizing the script with argparse
- Dockerizing the ingestion script
- Running Postgres and pgAdmin with Docker-Compose
- Why do we need Docker-compose
- Docker-compose YAML file
- Running multiple containers with docker-compose up
- SQL refresher
- Adding the Zones table
- Inner joins
- Basic data quality checks
- Left, Right and Outer joins
- Group by
- Optional: if you have problems with Docker networking, check Port Mapping and Networks in Docker
- Docker networks
- Port forwarding to the host environment
- Communicating between containers in the network
- .dockerignore file
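
As an illustration of the "Parametrizing the script with argparse" step listed above, here is a minimal sketch of how an ingestion script might accept its connection settings on the command line. The option names (--user, --password, --host, --port, --db, --table-name, --url) and the helper functions are assumptions made for this example, not necessarily the ones used in the course repository. The sketch only parses the arguments and builds a Postgres connection URL; the actual download and load are left out.

```python
import argparse
from urllib.parse import quote_plus


def build_parser() -> argparse.ArgumentParser:
    """Command-line options for a hypothetical ingestion script."""
    parser = argparse.ArgumentParser(description="Ingest CSV data into Postgres")
    parser.add_argument("--user", required=True, help="Postgres user name")
    parser.add_argument("--password", required=True, help="Postgres password")
    parser.add_argument("--host", default="localhost", help="Postgres host")
    parser.add_argument("--port", type=int, default=5432, help="Postgres port")
    parser.add_argument("--db", required=True, help="database name")
    parser.add_argument("--table-name", required=True, help="target table")
    parser.add_argument("--url", required=True, help="URL of the CSV file")
    return parser


def connection_url(args: argparse.Namespace) -> str:
    """Build a postgresql:// URL; the password is URL-encoded."""
    password = quote_plus(args.password)
    return f"postgresql://{args.user}:{password}@{args.host}:{args.port}/{args.db}"


def main(argv=None) -> str:
    """Parse argv (defaults to sys.argv) and return the connection URL."""
    args = build_parser().parse_args(argv)
    # A real script would download args.url here and write it to args.table_name.
    print(f"Would load {args.url} into table {args.table_name}")
    return connection_url(args)


# Example invocation with made-up values, passed as an explicit argument list.
demo_url = main([
    "--user", "root",
    "--password", "p@ss",
    "--db", "ny_taxi",
    "--table-name", "yellow_taxi_trips",
    "--url", "https://example.com/trips.csv",
])
print(demo_url)
```

Keeping the settings as command-line options means the same script can run on the host or inside a container, where the values would typically be passed after the image name in docker run.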
AWS + Terraform
Practice this Code
- Introduction to AWS (Amazon Web Services)
- Introduction to Terraform Concepts & AWS Pre-Requisites
- Creating AWS Infrastructure with Terraform
Environment setup
For the course you’ll need:
- Python 3 (e.g. installed with Anaconda)
- Docker with docker-compose
- Terraform
Check out my dotfiles for setting up as a Data Engineer
What makes you better?
- Know and practice how Docker and Docker Compose work
- Know and practice how to create/start/pull a Docker image
- Know and practice how to create Docker containers using Docker Compose
- Know and practice what IaC and Terraform are
- Know and practice how to create a service with Terraform
- Know and practice how to write functions to extract/load/transform data
- Know and practice Data Warehouse basics (e.g. Snowflake, or only Postgres)
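
To make the extract/load/transform practice point concrete, here is a small sketch that also touches the SQL refresher topics (inner join, left join as a data quality check, group by). It uses Python's built-in sqlite3 as a stand-in for Postgres so that it runs without any setup. The table names, column names, and sample rows are invented for this example and are not the real NY Taxi schema.

```python
import csv
import io
import sqlite3

# Tiny made-up sample standing in for the trips and zones data.
TRIPS_CSV = """pickup_zone_id,total_amount
1,12.5
1,7.5
2,20.0
99,5.0
"""
ZONES = [(1, "Zone A"), (2, "Zone B")]


def extract(text):
    """Extract: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cast the string fields to (int, float) tuples."""
    return [(int(r["pickup_zone_id"]), float(r["total_amount"])) for r in rows]


def load(conn, trips, zones):
    """Load: create both tables and insert the rows."""
    conn.execute("CREATE TABLE trips (pickup_zone_id INTEGER, total_amount REAL)")
    conn.execute("CREATE TABLE zones (zone_id INTEGER PRIMARY KEY, zone_name TEXT)")
    conn.executemany("INSERT INTO trips VALUES (?, ?)", trips)
    conn.executemany("INSERT INTO zones VALUES (?, ?)", zones)
    conn.commit()


def revenue_by_zone(conn):
    """Inner join plus GROUP BY: total amount per zone name."""
    return conn.execute(
        """
        SELECT z.zone_name, SUM(t.total_amount)
        FROM trips AS t
        JOIN zones AS z ON t.pickup_zone_id = z.zone_id
        GROUP BY z.zone_name
        ORDER BY z.zone_name
        """
    ).fetchall()


def orphan_trips(conn):
    """Data quality check: trips whose zone id has no match in zones."""
    return conn.execute(
        """
        SELECT t.pickup_zone_id, t.total_amount
        FROM trips AS t
        LEFT JOIN zones AS z ON t.pickup_zone_id = z.zone_id
        WHERE z.zone_id IS NULL
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(TRIPS_CSV)), ZONES)
print(revenue_by_zone(conn))
print(orphan_trips(conn))
```

The inner join silently drops the trip with zone id 99, which is why the left join check is worth running alongside it: it surfaces rows that would otherwise disappear from the aggregated result.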