Andrey is an experienced engineer with 18+ years of industry expertise in delivering IT approaches, infrastructure and solutions for the cloud, containerization, Linux and DevOps. He has the technical wherewithal to manage large DevOps infrastructure, having provided 100% uptime for 300+ VMs in AWS, handled large sharded databases (20+ servers and 30+ TB of data), and run an ELK stack ingesting over 1M entries per hour.
Managed 7 on-premises Kubernetes clusters on CoreOS (up to 200+ nodes), maintaining uptime and supporting, configuring and monitoring the k8s infrastructure.
Created a log management system based on Fluentd and Elasticsearch with 4 ES clusters (up to 25 nodes per cluster). Configured advanced log parsing and deployed monitoring tools for logging.
Configured a monitoring system based on Prometheus, Grafana, New Relic and PagerDuty. Managed AWS resources (EC2, VPC, S3, AWS ES, Route 53, Athena, etc.) and ES via Terraform, Ansible and Python.
Maintained the operational efficiency of the client's SaaS infrastructure, keeping 100% uptime across release deployments for 4 consecutive weeks.
Optimized hardware infrastructure for efficiency, reducing monthly operational costs by $200K.
Implemented a log management solution that ingested 1 million records per hour, and migrated a large sharded database (30+ TB) on the infrastructure.