ETL-Data-warehouse-Pipeline-with-PostgresDB-Mage-Bigquery-and-Lockerstudio

This project builds an end-to-end ETL data pipeline: a Postgres database serves as the staging area, where data is modelled into fact and dimension tables; Mage, an open-source modern data engineering tool for transforming and integrating data, orchestrates the pipeline; BigQuery acts as the data warehouse; and Looker Studio provides the reporting layer.
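As a hedged illustration of the fact/dimension modelling step, the sketch below (plain Python, with hypothetical column names, not the repo's actual schema) splits raw sales records into a product dimension with surrogate keys and a fact table that references them:

```python
# Sketch: split raw records into a dimension table and a fact table.
# Column names (product, price, qty) are hypothetical, not from the repo.

raw_sales = [
    {"product": "umbrella", "price": 9.5, "qty": 2},
    {"product": "boots", "price": 40.0, "qty": 1},
    {"product": "umbrella", "price": 9.5, "qty": 5},
]

def build_star_schema(rows):
    dim_product = {}   # product name -> surrogate key
    fact_sales = []
    for row in rows:
        # assign a surrogate key the first time a product appears
        key = dim_product.setdefault(row["product"], len(dim_product) + 1)
        fact_sales.append({"product_id": key,
                           "price": row["price"],
                           "qty": row["qty"]})
    dim_rows = [{"product_id": k, "product": name}
                for name, k in dim_product.items()]
    return dim_rows, fact_sales

dim_rows, fact_rows = build_star_schema(raw_sales)
```

In the real pipeline these two tables would then be loaded into BigQuery by a Mage export block.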

Building-weather-data-pipeline-orchestration-with-apache-airflow

This project involves creating and monitoring ETL pipelines with Apache Airflow. Weather data is used for the project and is stored in a Postgres database provisioned with a docker-compose file.
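A minimal, hedged sketch of the kind of transform callable such a DAG might run as a task (the payload shape mirrors a typical weather API response; the field names are assumptions, not the repo's actual schema):

```python
# Sketch of a transform function an Airflow PythonOperator could call.
# Field names mimic a typical weather API payload; they are assumptions.

def transform_weather(payload: dict) -> dict:
    """Flatten a raw weather payload into one row for Postgres."""
    return {
        "city": payload["name"],
        # temperature typically arrives in Kelvin; convert to Celsius
        "temp_c": round(payload["main"]["temp"] - 273.15, 2),
        "humidity": payload["main"]["humidity"],
        "description": payload["weather"][0]["description"],
    }

row = transform_weather({
    "name": "Lagos",
    "main": {"temp": 300.15, "humidity": 74},
    "weather": [{"description": "scattered clouds"}],
})
```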

Building-an-end-to-end-data-warehouse-streaming-pipeiline-with-Apache-kafka-spark-and-mongoDB

This project builds an end-to-end data streaming pipeline that ingests data from APIs through Apache Kafka, processes it with Spark, stores it in MongoDB as a document store, and visualises it on a Streamlit dashboard, all packaged in a Docker environment with each service running in its own container.
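As a hedged sketch of the Kafka leg of such a pipeline, the JSON (de)serializer callables that a producer and consumer would be configured with can be written with the standard library alone; the event fields below are illustrative:

```python
import json

# Serializer/deserializer callables of the kind a Kafka client accepts
# as value_serializer / value_deserializer. Event fields are made up.

def serialize(event: dict) -> bytes:
    """Encode an event dict to the bytes Kafka carries on the wire."""
    return json.dumps(event).encode("utf-8")

def deserialize(raw: bytes) -> dict:
    """Decode wire bytes back into an event dict on the consumer side."""
    return json.loads(raw.decode("utf-8"))

event = {"user_id": 42, "action": "page_view"}
round_tripped = deserialize(serialize(event))
```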

Streaming-data-with-Apache-Kafka-on-postgres-using-debezium-connector

This project demonstrates using Apache Kafka and Python scripts to stream database changes in real time from a Postgres relational database via the Debezium Postgres connector.
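Debezium wraps each row change in an envelope with `op`, `before`, and `after` fields; below is a hedged sketch of unpacking one such event on the consumer side (the envelope layout is standard Debezium, but the table columns are hypothetical):

```python
# Sketch: unpack a Debezium change-event envelope.
# The op/before/after layout is standard Debezium; the table
# columns (id, email) are hypothetical.

OPS = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot"}

def describe_change(envelope: dict) -> str:
    op = OPS.get(envelope["op"], "unknown")
    # deletes carry the row state in "before"; everything else in "after"
    state = envelope["after"] if envelope["after"] is not None else envelope["before"]
    return f"{op}: {state}"

change = {
    "op": "u",
    "before": {"id": 1, "email": "old@example.com"},
    "after": {"id": 1, "email": "new@example.com"},
}
summary = describe_change(change)
```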

Data-manipulation-with-Apache-Spark

This project uses Apache Spark for a range of data manipulation tasks with the Spark Python API (PySpark), Spark SQL, RDDs, and Spark DataFrames: performing cleaning, filtering, joins, aggregation, grouping, and partitioning; extracting data, applying transformations and actions, and loading the results into a Postgres database.
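The filter/group/aggregate chain described above can be sketched in plain Python, mirroring the logic PySpark expresses with `filter`, `groupBy`, and `agg`; the column names here are illustrative, not the project's schema:

```python
from collections import defaultdict

# Plain-Python mirror of a PySpark filter -> groupBy -> agg chain.
# Columns (dept, salary) are illustrative, not the project's schema.

rows = [
    {"dept": "eng", "salary": 100},
    {"dept": "eng", "salary": 120},
    {"dept": "ops", "salary": 90},
    {"dept": "ops", "salary": 40},   # filtered out below
]

# filter(salary > 50), then groupBy(dept).agg(avg(salary))
kept = [r for r in rows if r["salary"] > 50]
by_dept = defaultdict(list)
for r in kept:
    by_dept[r["dept"]].append(r["salary"])
avg_salary = {dept: sum(s) / len(s) for dept, s in by_dept.items()}
```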

Data-manipulation-with-Apache-Spark-on-Linux-Machine

This project involves high-volume data manipulation with Apache Spark on a Linux machine.

Training-a-car-model-using-SparkML-libraries

This is an Apache Spark project that involves training a model on a car dataset in CSV format, with weight, horsepower, and miles per gallon (mpg) as the input columns. PySpark machine learning libraries such as VectorAssembler, Correlation, LinearRegression, and Normalizer were used on these input columns.
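As a hedged illustration of the underlying regression idea (not the project's actual SparkML code), a one-feature ordinary least squares fit of mpg against weight can be written in closed form; the data points below are made up:

```python
# Closed-form simple linear regression: mpg ~ weight.
# This mirrors what a one-feature LinearRegression fit computes;
# the data points are made up for illustration.

def fit_line(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

weights = [1.5, 2.0, 2.5, 3.0]   # car weight in tons (made up)
mpgs = [40.0, 35.0, 30.0, 25.0]  # miles per gallon (made up)
slope, intercept = fit_line(weights, mpgs)
```

On these made-up points mpg falls by 10 for each extra ton, so the fit recovers a slope of -10.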

Building-an-etl-ingestion-data-pipeline-with-apache-airflow

This project involves building an ETL pipeline for a dataset: transforming the data with scripts, then storing and querying it in a Postgres database (via pgcli and pgAdmin). The ingestion process is dockerized, and Apache Airflow is used to monitor the data workflow into AWS cloud storage.
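A hedged sketch of the store-and-query step, using the standard library's sqlite3 as a stand-in for Postgres (the trips table and its columns are illustrative):

```python
import sqlite3

# Stage rows and query them back. sqlite3 stands in for Postgres here;
# the trips table and its columns are illustrative.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (pickup TEXT, fare REAL)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("2024-01-01", 12.5), ("2024-01-01", 7.0), ("2024-01-02", 20.0)],
)

# The same SQL one would run interactively from pgcli or pgAdmin
daily = conn.execute(
    "SELECT pickup, SUM(fare) FROM trips GROUP BY pickup ORDER BY pickup"
).fetchall()
conn.close()
```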

Streaming-time-series-data-with-influxdb-and-grafana-dashboard

This is a project for streaming time-based data, such as measurements and events from IoT devices, machines, and sensors, into a database. The time-series data is stored in InfluxDB and visualized on Grafana dashboards.
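Points reach InfluxDB in its line protocol (`measurement,tags fields timestamp`); below is a hedged sketch of formatting one sensor reading, with made-up tag and field names:

```python
# Format a reading as an InfluxDB line-protocol string:
#   measurement,tag=...,tag=... field=...,field=... timestamp
# Tag and field names are made up for illustration.

def to_line_protocol(measurement, tags, fields, ts_ns):
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {ts_ns}"

line = to_line_protocol(
    "temperature",
    {"sensor": "s1", "room": "lab"},
    {"value": 21.7},
    1700000000000000000,  # nanosecond epoch timestamp
)
```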

Data-pipeline-on-AWS

This project involves building a data engineering platform and pipeline on AWS.

CERTIFICATIONS

IBM Data Engineering Professional Certificate (Coursera)

IBM Relational Database Administration (Coursera)

IBM Data Warehouse Professional Certificate (Coursera)

IBM NoSQL Professional Certificate (Coursera)

Learn Data Engineering

AWS Cloud Practitioner Essentials Certificate (IBM/Coursera)
