Naman is a passionate data engineer with 7 years of experience in the analysis, design, development, implementation, maintenance, and support of data systems, including strategic methods for deploying big data technologies. Hands-on experience designing, implementing, and productionizing an enterprise-grade machine learning solution on big data for two Fortune 500 financial institutions. Highly skilled in Databricks, Azure, ADF, Spark, and Scala.
Delivers data warehouse and ETL solutions as part of an agile team, applying advanced machine learning techniques to improve performance and processes.
Helps build and improve infrastructure, application development, and performance, and ensures tight security, including data encryption, security groups, and environment scanning.
Ensures high-quality deliverables and implements DevOps and security best practices in fast-paced environments.
Architected and built a terabyte-scale Data-as-a-Service platform using Snowflake, DBT, DBT Cloud, GitLab, Periscope (Sisense), Fivetran, and Stitch.
Led the data migration exercise to lift a large collection of the client's ANSI SQL logic running on AWS Redshift via Periscope into DBT SQL logic running on Snowflake via DBT Cloud.
Evangelized data engineering within the client organization by conducting learning sessions on DBT Cloud, enabling SQL analysts to raise PRs against core production logic.
Led a team of five engineers to productionize an end-to-end platform that ingests data from 200+ scrapers, then parses, normalizes, and publishes it for downstream consumers.
Built a modern, enterprise-grade Delta Lake for online retail data using Databricks, Azure, ADF, Spark, and Scala (see the upsert sketch below).
Built the platform on open-source software, hence no vendor lock-in and no black boxes in the architecture, and with technologies designed to scale.
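For illustration, a minimal sketch of the kind of Delta Lake upsert job such a platform runs; the paths, table layout, and merge key (order_id) are hypothetical, and it assumes Spark 3.x with the open-source delta-spark library:

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.SparkSession

object RetailDeltaUpsert {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("retail-delta-upsert")
      // These two configs enable Delta Lake on vanilla Spark;
      // on Databricks they come preconfigured.
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      .getOrCreate()

    // Hypothetical daily batch of online-retail orders landed by ADF.
    val updates = spark.read.parquet("/mnt/landing/orders/2020-01-01")
    val orders  = DeltaTable.forPath(spark, "/mnt/delta/orders")

    // Upsert: update existing orders in place, insert new ones atomically.
    orders.as("t")
      .merge(updates.as("u"), "t.order_id = u.order_id")
      .whenMatched().updateAll()
      .whenNotMatched().insertAll()
      .execute()
  }
}
```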
Led the design, implementation, and productionization of our enterprise-grade ML-on-big-data solution for two clients, both Fortune 500 financial institutions.
Significantly reduced the size of our deployments by combining over 40 drivers into a single Spark app; the shared dependencies and common functionality increased code cohesion and ease of maintenance.
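A rough sketch of that consolidation pattern, with invented job names: a single entry point dispatches to registered jobs that share one SparkSession and one dependency set, rather than 40+ separate driver mains.

```scala
import org.apache.spark.sql.SparkSession

object UnifiedDriver {
  // Each former standalone driver becomes a registered function.
  private val jobs: Map[String, SparkSession => Unit] = Map(
    "ingest-accounts" -> (spark => spark.read.parquet("/data/accounts").show()),
    "score-features"  -> (spark => spark.read.parquet("/data/features").show())
    // ...the remaining jobs are registered the same way
  )

  def main(args: Array[String]): Unit = {
    val jobName = args.headOption.getOrElse(
      sys.error(s"Usage: UnifiedDriver <job>; known: ${jobs.keys.mkString(", ")}"))
    val spark = SparkSession.builder().appName(s"unified-$jobName").getOrCreate()
    try jobs.getOrElse(jobName, sys.error(s"Unknown job: $jobName"))(spark)
    finally spark.stop()
  }
}
```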
Advocated for and won consensus on replicating the target client's cluster specifications and mocking client data internally, significantly increasing the stability of our deployments.
Designed and implemented a Spark app in Scala to build Apache Solr indices from Hive tables. The app was designed to roll back on any failure and reduced downtime for downstream consumers from ~3 hours to ~10 seconds.
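One common way to achieve a ~10-second cutover with rollback is to index into a fresh staging collection and then atomically repoint a Solr alias; a sketch under that assumption, with hypothetical ZooKeeper hosts, Hive table, configset, and alias names, using the Lucidworks spark-solr connector and SolrJ:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.solr.client.solrj.impl.CloudSolrClient
import org.apache.solr.client.solrj.request.CollectionAdminRequest

object SolrIndexBuilder {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("solr-index-builder")
      .enableHiveSupport().getOrCreate()

    val zkHosts = "zk1:2181,zk2:2181"                      // hypothetical ensemble
    val staging = s"products_${System.currentTimeMillis}"  // fresh collection per run

    val client = new CloudSolrClient.Builder(
      java.util.Arrays.asList("zk1:2181", "zk2:2181"),
      java.util.Optional.empty[String]()).build()
    try {
      // Create the staging collection (hypothetical configset and sizing).
      CollectionAdminRequest.createCollection(staging, "productsConfig", 2, 2)
        .process(client)

      // Index the Hive table into it
      // (assumes the spark-solr connector is on the classpath).
      spark.table("warehouse.products")
        .write.format("solr")
        .options(Map("zkhost" -> zkHosts, "collection" -> staging))
        .save()

      // Atomically repoint the serving alias. If anything above fails,
      // the alias still targets the previous collection: instant rollback.
      CollectionAdminRequest.createAlias("products", staging).process(client)
    } finally client.close()
  }
}
```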
Productionized the Spark app to ingest more than 100 GB of data as a daily batch job, partitioning and storing it as Parquet in HDFS with corresponding Hive partitions at the query layer. The app replaced a legacy Oracle solution and ran in ~10% of the previous time.
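A condensed sketch of that daily batch pattern with hypothetical paths and names, assuming the partitioned Hive table warehouse.transactions already exists:

```scala
import org.apache.spark.sql.SparkSession

object DailyBatchIngest {
  def main(args: Array[String]): Unit = {
    val runDate = args(0) // e.g. "2019-11-30"
    val spark = SparkSession.builder().appName(s"daily-ingest-$runDate")
      .enableHiveSupport().getOrCreate()

    // Land the day's extract as Parquet under a date-partitioned HDFS layout.
    spark.read.parquet(s"hdfs:///landing/transactions/$runDate")
      .write.mode("overwrite")
      .parquet(s"hdfs:///warehouse/transactions/ds=$runDate")

    // Register the partition so Hive sees the new data at the query layer.
    spark.sql(
      s"""ALTER TABLE warehouse.transactions
         |ADD IF NOT EXISTS PARTITION (ds='$runDate')
         |LOCATION 'hdfs:///warehouse/transactions/ds=$runDate'""".stripMargin)
  }
}
```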
Implemented a Spark Structured Streaming app to ingest data from Kafka and upsert it into Kudu tables in a Kerberized cluster.
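A minimal sketch of that pipeline using the kudu-spark connector; the brokers, topic, schema, and table name are placeholders, and the Kerberos specifics (JAAS config, keytab) are reduced to a single option. Structured Streaming has no native Kudu sink, so each micro-batch is upserted via KuduContext:

```scala
import org.apache.kudu.spark.kudu.KuduContext
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types._

object KafkaToKudu {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-to-kudu").getOrCreate()

    // Hypothetical event schema; real topics and columns are client-specific.
    val schema = new StructType()
      .add("id", LongType).add("amount", DoubleType).add("ts", TimestampType)

    val events = spark.readStream.format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("kafka.security.protocol", "SASL_PLAINTEXT") // Kerberos via GSSAPI
      .option("subscribe", "payments")
      .load()
      .select(from_json(col("value").cast("string"), schema).as("e"))
      .select("e.*")

    val kudu = new KuduContext("kudu-master1:7051", spark.sparkContext)

    events.writeStream
      .foreachBatch { (batch: DataFrame, _: Long) =>
        kudu.upsertRows(batch, "impala::default.payments") // idempotent upsert
      }
      .option("checkpointLocation", "hdfs:///checkpoints/kafka-to-kudu")
      .start()
      .awaitTermination()
  }
}
```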
Completed the Galvanize data science program, a Python-based curriculum that introduces best practices in machine learning, statistical analysis, natural language processing, and data visualization.
Created a two-week capstone project, a Mortgage Market Tri Analysis, for optimizing capital allocation for mortgage market loans.
Used ACF and PACF plots to determine the appropriate number of AR and MA lags; the resulting SARIMAX model forecast well on the 30% held-out test data.
Acted as product manager for the design and implementation of the CA Agile Central (Rally) framework, a SaaS platform for agile development, across 7 individual work streams, increasing the efficiency and quality of the client's software development process and earning strong appreciation from the client.
Established processes across the engagement that reduced build downtime by over 50%, enabling two builds per week instead of one.
Analyzed, managed, and actively tracked completion metrics, defect status, and time and expenses; created forecasts and staffing models used to plan future releases; managed onboarding of new resources; and produced client deliverables for executive leadership.
Automated validation of Oracle source tables against HDFS using Spark in Scala.
Implemented a dynamic validation interface that lets users choose between high-level and data-level validation, reducing manual debugging effort by over 99%.
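A simplified sketch of those two modes with hypothetical connection details: high-level validation compares row counts (cheap, catches gross failures), while data-level validation computes a two-way set difference (expensive, but exact):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SourceValidator {
  // High-level: row counts only.
  def highLevel(oracle: DataFrame, hdfs: DataFrame): Boolean =
    oracle.count() == hdfs.count()

  // Data-level: rows present on one side but not the other, in both directions.
  def dataLevel(oracle: DataFrame, hdfs: DataFrame): Boolean =
    oracle.exceptAll(hdfs).isEmpty && hdfs.exceptAll(oracle).isEmpty

  def main(args: Array[String]): Unit = {
    val Array(mode, table) = args.take(2) // e.g. "data" "accounts"
    val spark = SparkSession.builder().appName(s"validate-$table").getOrCreate()

    val oracle = spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL") // hypothetical DSN
      .option("dbtable", table)
      .option("user", sys.env("ORACLE_USER"))
      .option("password", sys.env("ORACLE_PASSWORD"))
      .load()
    val hdfs = spark.read.parquet(s"hdfs:///warehouse/$table")

    val ok = if (mode == "high") highLevel(oracle, hdfs) else dataLevel(oracle, hdfs)
    println(s"$table ${mode}-level validation: ${if (ok) "PASSED" else "FAILED"}")
  }
}
```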
Education
Galvanize - San Francisco (SoMa)
Data Science program focusing on Python, Statistics, and Machine Learning
Bachelor of Science (B.S.), Computer Science & Engineering, 2014

Certifications
Cloudera Spark and Hadoop Developer