Hands On

Curated List of Data Engineering Projects

What We Want To Do

  • Build data pipelines
  • Analyze a video streaming pipeline
  • Practice coding and big data frameworks
  • Get hands-on with cloud providers
  • Set up quickly and learn to adapt to each project
  • Wrap up the knowledge and start a new project

Setup Local Environment

As data engineers and developers, we’ve spent hours and hours configuring terminals and services for different development tools.

Let’s check out this repo Setup Data Dev Env for local environment setup.

This video shows how I configured my base-model MacBook M1 (8 GB RAM) for data engineering projects, running 11 containers.

A stronger machine helps as the data and the project scale up, but with a minimalist mindset, let’s try with what you have on hand.
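If you are not sure whether your laptop can handle the full stack, a small check like the one below helps you see where the memory goes and trim containers you don't need. This is a minimal sketch, assuming Docker is installed and the containers are already running; it only wraps the `docker stats` CLI.

```python
# Sketch: print the memory footprint of each running container,
# useful when squeezing a full data stack into 8 GB of RAM.
# Assumes Docker is installed and containers are already up.
import subprocess


def container_memory_usage() -> list[str]:
    """Return one 'name: mem usage / limit' line per running container."""
    result = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{.Name}}: {{.MemUsage}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip().splitlines()


if __name__ == "__main__":
    for line in container_memory_usage():
        print(line)
```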

Before starting the data engineering practice projects, review the fundamentals of data engineering; check Section 1 and Section 2 to spot the knowledge you are missing.

(Image: MonitoringInfra)

Big Data Framework

  • Target: Get familiar with big data frameworks, process data with Spark, and set up the basic components of a data pipeline using Docker containers

  • Stack: Apache Spark, Apache Airflow, Python

  • Implementation: Check out my repo Streaming data pipeline (a minimal Spark sketch follows below)
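To give a feel for the Spark side, here is a minimal sketch of a batch job that reads raw CSV events and writes a daily aggregate as Parquet. The file paths and column names are hypothetical, and the orchestration (e.g. an Airflow DAG submitting the job) lives in the repo, not here.

```python
# Minimal PySpark batch job: read raw CSV events, aggregate, write Parquet.
# Paths and column names (event_time, user_id) are placeholders for this sketch.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


def main() -> None:
    spark = (
        SparkSession.builder
        .appName("daily-event-aggregation")
        .getOrCreate()
    )

    # Read the raw events landed by the ingestion step.
    events = spark.read.option("header", True).csv("data/raw/events.csv")

    # Count distinct active users per day.
    daily_counts = (
        events
        .withColumn("event_date", F.to_date("event_time"))
        .groupBy("event_date")
        .agg(F.countDistinct("user_id").alias("active_users"))
    )

    daily_counts.write.mode("overwrite").parquet("data/processed/daily_active_users")
    spark.stop()


if __name__ == "__main__":
    main()
```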

(Image: AirflowSparkDemo)

Kappa Data Pipeline (Real-time) using AWS

Data Modeling and Analytic Engineering

  • Target:

    • Transform the data loaded in the DWH into analytical views
    • Understand the four aspects of the analysis
  • Stack: Airflow, dbt Cloud, Snowflake

  • Implementation: Build the data model, pipeline, data quality checks, and dashboard for booking data. Check this repo: Booking Project (a sketch of the view-building step follows below)
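To illustrate the "raw table to analytical view" step, here is a minimal sketch using the snowflake-connector-python package. In the Booking project this layer is modeled with dbt instead; the table, view, warehouse, and credential names below are hypothetical.

```python
# Sketch: build an analytical view on top of raw data already loaded in Snowflake.
# In the actual project this transformation is a dbt model; names are placeholders.
import os

import snowflake.connector

ANALYTICAL_VIEW_SQL = """
CREATE OR REPLACE VIEW analytics.daily_bookings AS
SELECT
    booking_date,
    COUNT(*)            AS total_bookings,
    SUM(booking_amount) AS total_revenue
FROM raw.bookings
GROUP BY booking_date
"""


def build_view() -> None:
    conn = snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        warehouse="TRANSFORM_WH",
        database="BOOKING_DB",
    )
    try:
        conn.cursor().execute(ANALYTICAL_VIEW_SQL)
    finally:
        conn.close()


if __name__ == "__main__":
    build_view()
```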

(Image: dbtBuildCloud)

Data pipeline with Open Source Mage AI and Clickhouse

Tip

You will learn more and have more fun with OSS; choose it over Snowflake and Databricks, whose costs can be extremely high.

Motivation for using Open Source Software (OSS):

  • Save money by relying on OSS
  • Run a small business that processes data with streaming and batching,…
  • Practice data engineering skills: build pipelines, orchestration, and a warehouse with micro-partitioning (see the sketch below)
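As a taste of the ClickHouse side, here is a minimal sketch that creates a partitioned table and loads a small batch using the clickhouse-connect package. In the project this load would run as a Mage pipeline block; the host, table, and column names are placeholders.

```python
# Sketch: create a partitioned ClickHouse table and insert a small batch
# with the clickhouse-connect package. Names and values are placeholders.
from datetime import date

import clickhouse_connect


def load_batch() -> None:
    client = clickhouse_connect.get_client(host="localhost", port=8123)

    # PARTITION BY gives behavior loosely comparable to warehouse micro-partitions.
    client.command("""
        CREATE TABLE IF NOT EXISTS events (
            event_date Date,
            user_id    UInt64,
            amount     Float64
        )
        ENGINE = MergeTree()
        PARTITION BY toYYYYMM(event_date)
        ORDER BY (event_date, user_id)
    """)

    rows = [
        (date(2024, 1, 1), 1, 9.99),
        (date(2024, 1, 1), 2, 4.50),
    ]
    client.insert("events", rows, column_names=["event_date", "user_id", "amount"])

    print(client.query("SELECT count() FROM events").result_rows)


if __name__ == "__main__":
    load_batch()
```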

(Image: MageAndClickhouse)

AWS Ingestion Pipeline

  • Target: Design and develop a data pipeline architecture using AWS data services; this is a typical AWS-based data pipeline.

  • Stack: AWS Clouds, AWS Data Services

  • Implementation: Check out this blog post: AWS Data Pipeline (an ingestion sketch follows below)
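The sketch below shows one common ingestion step with boto3: land a raw file in S3, then kick off a Glue job to process it. The bucket, key, job name, and job arguments are hypothetical; credentials come from the usual AWS environment or profile configuration.

```python
# Sketch: ingest a local file into S3 and trigger a Glue job to process it.
# Bucket, key, and job names are placeholders for this example.
import boto3

RAW_BUCKET = "my-raw-data-bucket"
GLUE_JOB_NAME = "process-raw-events"


def ingest(local_path: str, object_key: str) -> str:
    # 1. Land the raw file in the S3 landing zone.
    s3 = boto3.client("s3")
    s3.upload_file(local_path, RAW_BUCKET, object_key)

    # 2. Start the Glue job that transforms the landed file.
    glue = boto3.client("glue")
    response = glue.start_job_run(
        JobName=GLUE_JOB_NAME,
        Arguments={"--input_path": f"s3://{RAW_BUCKET}/{object_key}"},
    )
    return response["JobRunId"]


if __name__ == "__main__":
    run_id = ingest("data/events.csv", "landing/events.csv")
    print(f"Started Glue job run: {run_id}")
```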

(Image: aws-data-pipeline-design)

Azure Data Pipeline in 1 hour

  • Target: Understand the current data pipeline concepts, identify the areas with possible improvements, and adopt the changes needed to resolve the problems.

  • Stack: Azure Data Services, Power BI

  • Implementation: Check out this blog post: Azure Data Pipeline (a sketch of the first hop follows below)
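Here is a minimal sketch of the first hop of such a pipeline: upload a raw file to Blob Storage, then trigger a Data Factory pipeline that loads and transforms it. The resource group, factory, pipeline, container names, and environment variables are all hypothetical.

```python
# Sketch: land a raw file in Azure Blob Storage and trigger an ADF pipeline run.
# Resource names, env vars, and pipeline parameters are placeholders.
import os

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.storage.blob import BlobServiceClient


def run_pipeline(local_path: str) -> str:
    # 1. Upload the raw file to the 'raw' container.
    blob_service = BlobServiceClient.from_connection_string(
        os.environ["AZURE_STORAGE_CONN_STR"]
    )
    container = blob_service.get_container_client("raw")
    with open(local_path, "rb") as data:
        container.upload_blob(name="events.csv", data=data, overwrite=True)

    # 2. Trigger the Data Factory pipeline that processes the file.
    adf = DataFactoryManagementClient(
        DefaultAzureCredential(), os.environ["AZURE_SUBSCRIPTION_ID"]
    )
    run = adf.pipelines.create_run(
        resource_group_name="rg-data",
        factory_name="adf-demo",
        pipeline_name="pl_ingest_events",
        parameters={"inputFile": "events.csv"},
    )
    return run.run_id


if __name__ == "__main__":
    print("Pipeline run id:", run_pipeline("data/events.csv"))
```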

Note

Start from the basics and understand the fundamentals of the services/tools you're using.
This makes it much easier to understand them and to switch to other services later.

Or you can check the Recent Note section to get the list of projects and thoughts.


Design ETL Pipeline for Interview Assessment

  • Check out this free documentation: drive docs
  • If you want the Word version in an editable format, check the sponsor & store

How to do everything

Keep to these rules:

  • Start small
  • Stay disciplined
  • Be consistent
  • Never stop