Naga is a Senior Data Engineer with 10+ years of experience designing, developing, and optimizing end-to-end data pipelines, including Data Science algorithms, in production at scale with operational reliability. He has extensive experience with Python, Shell Scripting, Big Data, Hadoop, Hive, Spark, Kafka, and Airflow, as well as cloud environments including AWS and Azure. Naga's expertise includes assessing organizational data needs, architecting the right approach, and, where necessary, migrating traditional warehouse platforms to Big Data.
Created a configuration-based, self-service data orchestration platform including Airflow scheduling, ingestion, and key data pipelines in Kafka, Spark, and Hive.
Supported AI/ML squads in enhancing and enriching data services by embedding feature extraction for Machine Learning into the pipeline and building models.
Migrated legacy Talend ETL jobs to AWS Lambda orchestrated by AWS-managed Airflow.
Developed frameworks for Lambda to integrate with Salesforce, along with an efficient data ingestion process and custom operators in Airflow.
Designed and implemented a data ingestion and analytics pipeline for customer trials.
Implemented an AWS S3 and Redshift-based data service for use by Data Science teams.
Implemented various Pandas optimizations to speed up data science and data engineering code execution.
Redeveloped Data Science algorithms in production at scale to meet performance and operational stability needs.
Reimplemented core data science algorithms to match Scala performance, enabling the business to maintain a single version of Python code across trials and production.
Implemented and enhanced a data ingestion framework in Big Data using Spark (PySpark), Solr, Hive, and Kafka (real-time data and stats streaming).
Designed and developed a tool to recommend the right plan to customers based on product and usage behavior, increasing customer satisfaction scores and revenue (~$5M annually).
Created and developed a tool to help consultants quickly process business segment customer requests related to cost center hierarchy allocation.
Reduced defects in pre-production test environments by 50% through quality code and automated unit testing.
Optimized the Oracle PL/SQL code, reducing it from 40K to 25K lines to improve maintainability and performance and to speed up delivery of future enhancements.
Improved performance and reduced processing time of a critical invoicing module from 3 hours to 2 hours 15 minutes.
Developed a tool to significantly reduce the effort of test data creation (from 30 minutes to 5 minutes per customer in the test environment).
Led a team of 3 developers for a year, delivering all requests on schedule and within budget.
ONZO is a global analytics company specializing in Smart Meter consumption data analysis.
The ATLAS platform analyzes consumer energy consumption to generate personalized customer experiences and help Energy Companies address business challenges.
ONZO has developed a data ingestion and analytics pipeline, including data science algorithms and infrastructure, for customer trials and production.