About
Highly motivated early-career Data Engineer with a strong foundation in building and optimizing scalable ETL pipelines and data warehouses on AWS. Proven ability to leverage technologies like Apache Spark, Python, and SQL to process multi-format data, automate workflows, and deliver actionable insights. Eager to apply certified cloud and big data expertise to drive data-driven solutions in a dynamic environment.
Work
Mactores
Data Engineer Intern
Mumbai, Maharashtra, India
Summary
• Gaining exposure to Apache Spark, AWS, and Databricks through internship training.
• Earned the Databricks Certified Data Engineer Associate and AWS Certified Cloud Practitioner certifications, demonstrating skills in data ingestion, transformation, and cloud workflows.
• Built and tested ETL pipelines in personal projects using Spark, Airflow, and key AWS services.
• Explored and implemented pipeline design strategies such as incremental loading, partitioning, and orchestration to improve efficiency and scalability (see the orchestration sketch after this list).
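For illustration, a minimal orchestration sketch of the kind of daily incremental pipeline described above, assuming Airflow 2.x; the DAG id, task names, and job script paths are hypothetical placeholders, not actual internship code.

```python
# Minimal Airflow DAG sketch: daily orchestration of an incremental ETL run.
# DAG id, task names, and script paths below are illustrative placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-eng",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="incremental_etl_daily",
    default_args=default_args,
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # one incremental load per day
    catchup=False,
) as dag:
    # Extract only rows changed since the last successful run (placeholder script).
    extract_incremental = BashOperator(
        task_id="extract_incremental",
        bash_command="python /opt/jobs/extract_incremental.py --date {{ ds }}",
    )

    # Transform and upsert into the warehouse via a Spark job (placeholder script).
    transform_and_upsert = BashOperator(
        task_id="transform_and_upsert",
        bash_command="spark-submit /opt/jobs/transform_upsert.py --date {{ ds }}",
    )

    extract_incremental >> transform_and_upsert
```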
Education
Vasantdada Patil Pratishthan's College of Engineering & Visual Arts, Mumbai University
B.Tech
Electronics & Telecommunication Engineering (EXTC)
Grade: 7.85 CGPA
Skills
Programming Languages
Python, SQL, PySpark.
Databases
MySQL, MongoDB.
Big Data & ETL Frameworks
Apache Spark, Hadoop, Apache Airflow, Databricks.
Developer Tools & Concepts
Git, Docker, VS Code, PyCharm, IntelliJ, Eclipse, SFTP Server, DBeaver, YAML, Linux (Ubuntu).
Cloud Platforms & Services
AWS, Databricks.
Projects
Automated Incremental ETL Pipeline
Summary
Designed and automated an ETL pipeline using Apache Spark, MySQL, S3, and Airflow, incorporating full-load and incremental strategies with upsert functionality. Optimized performance through advanced partitioning and mirroring, reducing processing time by 40% for large-scale datasets and enabling faster data retrieval and insights.
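For illustration only, a minimal sketch of the full-load and incremental-upsert pattern this project describes, assuming Delta Lake tables on S3 for the upsert step; the table name, key columns, paths, and credentials are hypothetical and not taken from the project itself.

```python
# Minimal PySpark sketch of the full-load / incremental-upsert pattern described above.
# Assumes Delta Lake on S3 for the upsert step; table names, key columns (order_id,
# updated_at, order_date), paths, and credentials are hypothetical placeholders.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("incremental_etl_sketch").getOrCreate()

SOURCE_URL = "jdbc:mysql://example-host:3306/sales"    # placeholder MySQL source
TARGET_PATH = "s3a://example-bucket/warehouse/orders"  # placeholder Delta table on S3


def full_load(df):
    """Initial full load: write the whole table, partitioned by date for faster pruning."""
    (df.write.format("delta")
       .mode("overwrite")
       .partitionBy("order_date")
       .save(TARGET_PATH))


def extract_incremental(last_watermark: str):
    """Read only rows updated since the last successful run (incremental strategy)."""
    query = f"(SELECT * FROM orders WHERE updated_at > '{last_watermark}') AS src"
    return (spark.read.format("jdbc")
            .option("url", SOURCE_URL)
            .option("dbtable", query)
            .option("user", "etl_user")          # placeholder credentials
            .option("password", "etl_password")
            .load())


def upsert(changes_df):
    """Merge changed rows into the Delta table: update existing keys, insert new ones."""
    target = DeltaTable.forPath(spark, TARGET_PATH)
    (target.alias("t")
           .merge(changes_df.alias("s"), "t.order_id = s.order_id")
           .whenMatchedUpdateAll()
           .whenNotMatchedInsertAll()
           .execute())
```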