Senior MLOps & Data Infrastructure Engineer with 5+ years of experience architecting end-to-end production environments for AI/ML workloads. Expert in Model Deployment and Monitoring using AWS SageMaker, including managing real-time endpoints, batch inference, and model versioning via SageMaker Model Registry. Highly proficient in Infrastructure as Code (Terraform & CloudFormation) for automating secure, scalable cloud resources. Proven track record in building high-throughput ETL pipelines with AWS Glue and Airflow to deliver ML-ready datasets while ensuring operational excellence through CloudWatch performance monitoring.
Architected and managed AWS SageMaker deployment lifecycles, utilizing Model Registry for version control and deploying real-time endpoints for backend integration.
Implemented proactive model monitoring with Amazon CloudWatch, tracking latency, throughput, and CPU utilization to meet high-availability SLAs.
Developed Terraform configurations and CloudFormation templates to automate the provisioning of SageMaker endpoints and supporting data infrastructure, reducing configuration errors.
Designed scalable ETL pipelines using AWS Glue and Airflow to transform unstructured data into curated, ML-ready datasets.
Automated batch jobs and data validation layers using AWS Batch and Python for large-scale analytical processing.
Developed Terraform scripts for provisioning AWS infrastructure components.
Collaborated with data scientists to deliver curated datasets for AI/ML use cases.
Automated ML Data Platform: Managed a unified analytical ecosystem by tuning Redshift SQL for large queries and developing ELT workflows with dbt. Leveraged Terraform for full-stack infrastructure automation and deployed AWS Lambda for real-time data ingestion and system alerting. Integrated SageMaker endpoints with web interfaces via Django to provide real-time AI-driven business insights.