Sumit S.

About Me

Lead DevOps Engineer with 8+ years of experience designing and operating cloud-native, distributed systems on AWS, including 5+ years managing production-grade Kubernetes deployments at scale. Brings expertise in Terraform, Ansible, CI/CD, and Docker, with strong programming skills in Python, Bash, and Golang. Proven track record of building internal platforms, improving system reliability to 99.9%+ uptime, and enabling multi-team environments through automation and observability. Passionate about Platform Engineering, with a focus on Kubernetes-based systems and AI-driven DevOps; experienced in end-to-end MLOps pipeline orchestration, model deployment, and ML observability.

AI, ML & LLM

Backend

Database

DevOps

Workflow

Git, GitHub Actions, GitOps

Other

Work history

Useful BI
Lead DevOps Engineer
2024 - 2026 (2 years)
Remote
  • Designed and built cloud-based platform components on AWS that let 15+ teams securely provision isolated environments, databases, and IAM resources through Python-driven automation

  • Designed and deployed end-to-end ML training pipelines on Kubernetes using Argo Workflows, reducing model release cycle time by 40%

  • Led Kubernetes platform operations for production-grade deployments across multiple microservices, sustaining 99.9%+ availability and zero-downtime releases through GitOps

EPAM Systems
Senior DevOps Engineer
2022 - 2024 (2 years)
Remote
  • Designed and deployed large-scale distributed systems on AWS using Kubernetes and Terraform, supporting enterprise workloads across multiple environments

  • Provisioned and managed GPU-enabled Kubernetes node pools on AWS using Terraform, supporting distributed model training workloads

  • Architected a centralized log and metric aggregation pipeline using CloudWatch Metric Streams, Kinesis Data Firehose, and Splunk, handling millions of events per day

Opstree Solutions
DevOps Specialist
2021 - 2022 (1 year)
Remote
  • Developed Ansible-based configuration management for building golden AMIs and keeping infrastructure configuration consistent across environments

  • Implemented platform observability using Prometheus and Grafana across EC2, ECS, and EKS clusters

  • Developed automated remediation services using Python and AWS Lambda to respond to security threats and misconfigurations

DXC Technology
DevOps Engineer
2018 - 2020 (2 years)
Remote
  • Built and operated containerized microservices using Docker and ECS, supporting scalable backend workloads

  • Built hardened Docker base images for Python and Java services, integrating vulnerability scanning into CI/CD pipelines and reducing vulnerabilities by 70%

Education

Bachelor of Technology (B.Tech), Computer Science and Engineering (CSE)
GGSIPU
2013 - 2017 (4 years)

Certifications

AWS Certified Cloud Practitioner – Foundational
HashiCorp Certified: Terraform Associate
AWS Certified Solutions Architect – Associate