
Swapnil M.
About Me
Swapnil is a Senior Data Engineer with 8+ years of IT experience, proficient in both on-premises and cloud environments, with extensive knowledge of Big Data technologies.
Led a successful migration and optimization of a data warehouse infrastructure to Databricks, significantly improving data accessibility, performance, and security.
Analyzed existing infrastructure and identified bottlenecks for optimization.
Designed and implemented a migration strategy from on-premises data sources to Databricks.
Developed ETL pipelines to ensure seamless data integration into Databricks.
Optimized data models to enhance performance and reduce costs.
Integrated Databricks with Jenkins for CI/CD automation.
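The ETL pipelines above ran on Spark/Databricks; as a minimal, dependency-free sketch of the kind of cleaning step they perform before load (field names and validation rules here are hypothetical, not from the actual project):

```python
from datetime import datetime

# Hypothetical sketch of a row-level cleaning step, as might run
# inside an ETL pipeline before loading into Databricks.
# Field names and rules are illustrative only.

def clean_record(raw: dict):
    """Normalize one raw record; return None if it fails validation."""
    try:
        return {
            "customer_id": int(raw["customer_id"]),
            "event_date": datetime.strptime(raw["event_date"], "%Y-%m-%d").date().isoformat(),
            "amount": round(float(raw.get("amount") or 0), 2),
            "region": raw.get("region", "unknown").strip().lower(),
        }
    except (KeyError, ValueError):
        return None  # in a real pipeline, route to a rejects table instead

def clean_batch(rows):
    """Drop invalid rows, keeping only cleaned records."""
    return [r for r in (clean_record(row) for row in rows) if r is not None]
```

In the production pipeline the same logic would be expressed as Spark DataFrame transformations so it scales across the cluster; the sketch only shows the shape of the per-record rules.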
Developed an end-to-end architecture to clean, process, and aggregate data, generating KPI tables for business analytics and Tableau reporting.
Designed the project architecture and data flow for KPI generation.
Processed large datasets by cleaning and aggregating data as per business requirements.
Automated processes using Jenkins, managed Oozie jobs, and handled deployments.
Worked directly with stakeholders for project enhancements and requirements gathering.
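The clean-aggregate-KPI flow above can be sketched in plain Python (the production version ran on Spark with results feeding Tableau; the metric and field names here are hypothetical):

```python
from collections import defaultdict

# Hypothetical sketch of the aggregation step that produces a KPI
# table (e.g. order count, revenue, and average order value per region)
# from already-cleaned rows.
def build_kpi_table(rows):
    totals = defaultdict(lambda: {"orders": 0, "revenue": 0.0})
    for row in rows:
        bucket = totals[row["region"]]
        bucket["orders"] += 1
        bucket["revenue"] += row["amount"]
    # Emit one KPI record per region, deriving average order value.
    return [
        {"region": region,
         "orders": agg["orders"],
         "revenue": round(agg["revenue"], 2),
         "avg_order_value": round(agg["revenue"] / agg["orders"], 2)}
        for region, agg in sorted(totals.items())
    ]
```

The output table maps directly onto a reporting layer: one row per dimension value, pre-computed measures, ready for a BI tool to consume without further joins.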
Migrated multiple AWS-hosted applications into a centralized account and built a modern data analytics lake that consolidates data from various sources.
Analyzed the existing architecture and migrated applications to a centralized AWS account.
Implemented and updated Scala and Python code for data migration, and enforced data permissions through AWS Lake Formation.
Created and optimized AWS Glue jobs, Spark jobs on EMR, and automated infrastructure management via CloudFormation.
Built a data lake for storing and processing credit reporting data, enabling large-scale analytics, and migrated existing credit scoring modules from legacy systems to Hadoop.
Developed Apache Spark jobs to load and process data into Hive external tables.
Automated code releases and scheduled jobs using GoCD pipelines.
Implemented regression testing scripts and unit tests using ScalaTest (FlatSpec).
Streamlined data ingestion processes from multiple data sources.
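The multi-source ingestion step could be sketched as follows; this is a pure-Python illustration with hypothetical formats and field names (the real jobs used Spark writing to Hive external tables):

```python
import csv
import io
import json

# Hypothetical sketch: normalize records arriving as CSV or JSON lines
# into one common schema before loading downstream.

def parse_source(payload: str, fmt: str):
    """Yield dicts with a common schema from a raw text payload."""
    if fmt == "csv":
        for row in csv.DictReader(io.StringIO(payload)):
            yield {"account_id": row["account_id"], "score": int(row["score"])}
    elif fmt == "jsonl":
        for line in payload.splitlines():
            if line.strip():
                rec = json.loads(line)
                yield {"account_id": rec["account_id"], "score": int(rec["score"])}
    else:
        raise ValueError(f"unsupported source format: {fmt}")

def ingest(sources):
    """Merge all (payload, format) sources into one normalized list."""
    out = []
    for payload, fmt in sources:
        out.extend(parse_source(payload, fmt))
    return out
```

Converging every source to one schema at the ingestion boundary is what lets the downstream Spark/Hive layer stay format-agnostic.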