Naman J.

Senior Data Engineer

India

About Me

Naman is a passionate data engineer with 7 years of experience in the analysis, design, development, implementation, maintenance, and support of data systems, and in developing strategic methods for deploying big data technologies. He has hands-on experience in the design, implementation, and productionalization of an enterprise-grade machine-learning-on-big-data solution for two Fortune 500 financial institutions. He is highly skilled in Databricks, Azure, ADF, Spark, and Scala.

Work history

UpStack
Senior Data Engineer
2020 - Present (4 years)
Remote
  • Delivers data warehouse and ETL solutions as part of an agile team, applying advanced machine learning techniques to improve performance and processes.

  • Helps build and improve infrastructure, application, and performance development, and ensures tight security, including data encryption, security groups, and environment scanning.

  • Ensures high-quality deliverables and implements DevOps and security best practices in fast-paced environments.

GeoLocation & Satellite Data Management co.
Independent Data Solutions Architect
2021 - 2022 (1 year)
Remote
  • Architected and built a terabyte-scale Data-as-a-Service platform using Snowflake, DBT, DBT Cloud, GitLab, Periscope (Sisense), Fivetran, and Stitch.

  • Led the data migration exercise to lift a large collection of the client's ANSI SQL logic running on AWS Redshift via Periscope into DBT SQL models running on Snowflake via DBT Cloud.

  • Evangelized data engineering within the client organization by conducting learning sessions on DBT Cloud, thereby enabling SQL analysts to raise PRs on core production logic.

E-commerce product
Senior Data Engineer
2019 - 2021 (2 years)
United States of America
  • Led a team of five engineers to productionalize an end-to-end platform that ingests data from 200+ scrapers, parses it, normalizes it, and produces it for downstream consumers.

  • Built a modern enterprise-grade Delta Lake for online retail data using Databricks, Azure, ADF, Spark, and Scala.

  • Built the platform on open-source software (hence no vendor lock-in and no black boxes in the architecture) and with technologies designed to scale.

Stealth Start-up
Senior Data Engineer
2019 - 2020 (1 year)
Singapore
  • Led the design, implementation, and productionalization of an enterprise-grade machine-learning-on-big-data solution for two clients, both Fortune 500 financial institutions.

  • Significantly reduced the size of our deployments by combining over 40 drivers into a single Spark app; the shared dependencies and common functionality increased code cohesion and ease of maintenance (see the sketch after this list).

  • Advocated for and won consensus for replicating the target client cluster specifications and mocking client data internally, hence significantly increasing the stability of our deployments.
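Consolidating 40+ drivers into one Spark app typically hinges on a single entry point that dispatches to per-job logic. Below is a minimal sketch of that pattern, assuming a simple name-based job registry; the job names and bodies are hypothetical, not taken from the actual codebase.

```scala
// Sketch: many formerly separate Spark drivers become job objects behind
// one main, so all jobs share one build, one dependency set, and one
// SparkSession bootstrap — which is what shrinks the deployment artifact.
import org.apache.spark.sql.SparkSession

trait SparkJob {
  def name: String
  def run(spark: SparkSession, args: Array[String]): Unit
}

// Hypothetical example jobs; the real app combined 40+ of these.
object DailyIngestJob extends SparkJob {
  val name = "daily-ingest"
  def run(spark: SparkSession, args: Array[String]): Unit = {
    // ...job-specific logic...
  }
}

object FeatureBuildJob extends SparkJob {
  val name = "feature-build"
  def run(spark: SparkSession, args: Array[String]): Unit = {
    // ...job-specific logic...
  }
}

object Main {
  private val jobs: Map[String, SparkJob] =
    Seq(DailyIngestJob, FeatureBuildJob).map(j => j.name -> j).toMap

  def main(args: Array[String]): Unit = {
    val jobName = args.headOption.getOrElse(
      sys.error(s"Usage: spark-submit app.jar <job> [args]; jobs: ${jobs.keys.mkString(", ")}"))
    val spark = SparkSession.builder().appName(jobName).getOrCreate()
    try jobs(jobName).run(spark, args.drop(1))
    finally spark.stop()
  }
}
```

Each job is then launched as `spark-submit app.jar daily-ingest ...`, with the job name selecting the code path at runtime.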

phData
Senior Data Engineer
2017 - 2019 (2 years)
United States of America
  • Designed and implemented a Spark app in Scala to build Apache Solr indices from Hive tables. The app was designed to roll back on any failure, and reduced the downtime for downstream consumers from ~3 hrs to ~10 seconds (see the first sketch after this list).

  • Productionalized the Spark app to ingest more than 100 GB of data as a daily batch job, partitioning and storing it as Parquet in HDFS with corresponding Hive partitions at the query layer. The app replaced a legacy Oracle solution and ran in ~10% of the previous time (second sketch below).

  • Implemented a Spark Structured Streaming app to ingest data from Kafka and upsert it into Kudu tables in a Kerberized cluster (third sketch below).
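The ~10-second downtime figure strongly suggests a rebuild-then-swap design. Here is a minimal sketch of one plausible version: index into a fresh collection via the Lucidworks spark-solr connector, then atomically repoint a read alias. The ZooKeeper quorum, table, collection, and alias names are hypothetical, and the sketch assumes the target collection has been created beforehand via the Collections API.

```scala
// Sketch: rebuild a Solr index into a fresh collection, then atomically
// repoint the read alias. If anything fails before the swap, the alias
// still points at the old collection, so readers are never affected.
import org.apache.spark.sql.SparkSession

object SolrIndexBuild {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("solr-index-build").getOrCreate()

    val zkHost = "zk1:2181,zk2:2181/solr"                       // hypothetical
    val newCollection = s"products_${System.currentTimeMillis}" // fresh target,
    // assumed pre-created via the Collections API (CREATE action)

    // Index the Hive table through the spark-solr connector.
    spark.table("warehouse.products")                           // hypothetical table
      .write
      .format("solr")
      .option("zkhost", zkHost)
      .option("collection", newCollection)
      .save()

    // Atomic alias swap: readers query the "products" alias, so the only
    // visible "downtime" is the seconds the swap takes, not the hours of
    // indexing. CREATEALIAS overwrites an existing alias of the same name.
    val swap = "http://solr-host:8983/solr/admin/collections" +
      s"?action=CREATEALIAS&name=products&collections=$newCollection"
    scala.io.Source.fromURL(swap).mkString // simple GET; real code checks the response

    spark.stop()
  }
}
```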
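For the daily batch job, a plausible minimal shape is below: read the day's landing data, write it as a Parquet partition in HDFS, and register the matching Hive partition so the query layer sees it immediately. All paths and table names are hypothetical.

```scala
// Sketch: daily batch ingest, partitioned Parquet in HDFS, with a matching
// Hive partition registered at the query layer.
import org.apache.spark.sql.{SaveMode, SparkSession}

object DailyBatchIngest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-batch-ingest")
      .enableHiveSupport()
      .getOrCreate()
    val dt = args(0) // e.g. "2018-06-01"

    spark.read.parquet(s"/landing/events/dt=$dt")  // hypothetical landing zone
      .write
      .mode(SaveMode.Overwrite)                    // idempotent daily re-runs
      .parquet(s"/warehouse/events/dt=$dt")

    // Register the new partition so Hive queries see it immediately.
    spark.sql(s"ALTER TABLE warehouse.events ADD IF NOT EXISTS PARTITION (dt='$dt') " +
      s"LOCATION '/warehouse/events/dt=$dt'")

    spark.stop()
  }
}
```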
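For the streaming pipeline, here is a minimal sketch using the kudu-spark integration; the brokers, topic, schema, and table name are hypothetical, and the Kerberos setup (principal/keytab supplied at spark-submit time) is omitted.

```scala
// Sketch: stream from Kafka and upsert each micro-batch into Kudu by
// primary key via the KuduContext.
import org.apache.kudu.spark.kudu.KuduContext
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object KafkaToKudu {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-to-kudu").getOrCreate()
    val kuduMasters = "kudu-master-1:7051,kudu-master-2:7051" // hypothetical
    val kuduContext = new KuduContext(kuduMasters, spark.sparkContext)

    // Expected JSON payload shape (hypothetical).
    val schema = StructType(Seq(
      StructField("id", LongType),
      StructField("updated_at", TimestampType),
      StructField("value", StringType)))

    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092") // hypothetical
      .option("subscribe", "events")                     // hypothetical topic
      .load()
      .select(from_json(col("value").cast("string"), schema).as("e"))
      .select("e.*")

    // foreachBatch exposes a plain DataFrame per micro-batch, which the
    // KuduContext upserts by primary key.
    events.writeStream
      .foreachBatch { (batch: DataFrame, _: Long) =>
        kuduContext.upsertRows(batch, "impala::default.events") // hypothetical table
      }
      .option("checkpointLocation", "/tmp/checkpoints/kafka-to-kudu")
      .start()
      .awaitTermination()
  }
}
```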

Galvanize
Data Scientist Fellow
2016 - 2017 (1 year)
United States of America
  • Completed the Galvanize data science program, a Python-based curriculum introducing best practices in machine learning, statistical analysis, natural language processing, and data visualization.

  • Created a two-week capstone project, a Mortgage Market Tri Analysis, for optimizing capital allocation for mortgage market loans.

  • Used ACF and PACF plots to determine the appropriate number of AR and MA lags; the resulting SARIMAX model forecast well on the 30% held-out data.

EY
Technology Analyst
2015 - 2016 (1 year)
United States of America
  • Acted as product manager for the design and implementation of the CA Agile Central (Rally) framework (a SaaS platform for agile development) across 7 individual work streams, increasing the efficiency and quality of the client's software development process and earning strong client appreciation.

  • Established processes across the engagement that reduced build downtime by over 50%, enabling two builds per week instead of one.

  • Analyzed, managed, and actively tracked completion metrics, defect status, and time and expenses; created forecasts and staffing models used to plan future releases, managed onboarding of new resources, and created client deliverables for executive leadership.

Portfolio

Data Engineer - E-commerce product content management

The project involved e-commerce product content management for online retail. I built a modern enterprise-grade Delta Lake using Databricks, Azure, ADF, Spark, and Scala, and led a team of 5 to productionalize an end-to-end platform that ingests data from 200+ scrapers, parses it, normalizes it, and produces it for downstream consumers. The platform eventually processed more than 2 TB of data every day, reliably, across both batch and streaming (see the sketch below).
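As a rough illustration of the batch-plus-streaming Delta Lake described above, here is a minimal sketch; the schema, paths, brokers, and topic are hypothetical, and the real platform on Databricks/ADF would be considerably more involved.

```scala
// Sketch: land normalized scraper records in one Delta table from both a
// batch job and a streaming job sharing the same schema.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object DeltaLanding {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("delta-landing").getOrCreate()

    // Shared record shape for both paths (hypothetical fields).
    val schema = StructType(Seq(
      StructField("sku", StringType),
      StructField("title", StringType),
      StructField("scraped_at", TimestampType)))

    // Batch path: daily normalized files appended to the Delta table.
    spark.read.schema(schema).json("/normalized/products/daily") // hypothetical path
      .write.format("delta").mode("append").save("/delta/products")

    // Streaming path: Delta's transaction log provides ACID appends, so
    // the streaming writer can safely share the table with the batch job.
    spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092") // hypothetical
      .option("subscribe", "normalized-products")        // hypothetical topic
      .load()
      .select(from_json(col("value").cast("string"), schema).as("r"))
      .select("r.*")
      .writeStream
      .format("delta")
      .option("checkpointLocation", "/delta/_checkpoints/products")
      .start("/delta/products")
      .awaitTermination()
  }
}
```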

Data Engineer - Stealth mode AI startup (Series A $20MM)

I managed the architecture and implementation of a distributed machine learning platform, productionizing 20+ machine learning models via Spark MLlib. I built products and tooling to reduce time to market (TTM) for machine learning projects, cutting the startup's TTM from the design phase to production by 50%. I productionalized 8 Scala Spark applications to transform the ETL layer feeding the downstream machine learning models, using Spark SQL for ETL and Spark Structured Streaming and Spark MLlib for analytics (see the sketch below). Technologies used in the project: Scala, Spark SQL, Spark MLlib, Machine Learning, Spark Structured Streaming, Linux, Bash.
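To make the MLlib workflow concrete, here is a minimal sketch of training and persisting one pipeline model of the kind described; the feature table, columns, and model path are hypothetical. Persisting the fitted pipeline is one common way to shorten TTM, since serving apps reload it rather than retrain.

```scala
// Sketch: train a Spark MLlib pipeline (assembler + logistic regression),
// persist it, and reload it for scoring in a downstream app.
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object TrainModel {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("train-model").getOrCreate()

    // Hypothetical feature table produced by the ETL layer.
    val training = spark.table("features.customer_daily")

    val assembler = new VectorAssembler()
      .setInputCols(Array("f1", "f2", "f3")) // hypothetical feature columns
      .setOutputCol("features")

    val lr = new LogisticRegression()
      .setLabelCol("label")
      .setFeaturesCol("features")

    val model = new Pipeline().setStages(Array(assembler, lr)).fit(training)

    // Persist the fitted pipeline so serving/streaming apps can reload it
    // without retraining.
    model.write.overwrite().save("/models/customer_daily/lr")

    // Downstream scoring is then a load-and-transform:
    val scored = PipelineModel.load("/models/customer_daily/lr")
      .transform(spark.table("features.customer_daily"))
    scored.select("prediction").show(5)

    spark.stop()
  }
}
```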

Data Engineer - Dow Chemical (Fortune 62)

Productionalized 5 Scala Spark apps for ETL and wrote multiple Bash scripts to automate these jobs. Architected and productionalized a Scala Spark app for validating the Oracle source tables against their ingested counterparts in HDFS. The user could dynamically choose to conduct either a high-level validation or a data-level validation; in case of a discrepancy, the app reported the exact columns and exact rows that mismatched between source and destination. It thereby reduced the engineer's manual debugging workload by over 99%, to just running the app and reading the human-readable output file (see the sketch below). Delivered the entire ETL and validation project ahead of schedule. Technologies used in the project: Scala, Spark SQL, Oracle Database, Linux, Bash.
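A minimal sketch of the two validation levels is below, assuming JDBC access to Oracle and a Parquet copy in HDFS; the connection details, table names, and output paths are hypothetical. The high-level check compares row counts, while the data-level check diffs the two sides and writes the mismatched rows out for inspection.

```scala
// Sketch: validate an Oracle source table against its HDFS copy.
// "high" level compares row counts; "data" level diffs actual rows.
import org.apache.spark.sql.{DataFrame, SparkSession}

object ValidateIngest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("validate-ingest").getOrCreate()
    val level = args(0) // "high" or "data"

    // Hypothetical connection details and table names.
    val source: DataFrame = spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCL")
      .option("dbtable", "SALES.ORDERS")
      .option("user", sys.env("ORACLE_USER"))
      .option("password", sys.env("ORACLE_PASSWORD"))
      .load()
    val target = spark.read.parquet("/warehouse/sales/orders")

    level match {
      case "high" =>
        val (s, t) = (source.count(), target.count())
        println(if (s == t) s"OK: $s rows on both sides"
                else s"MISMATCH: source=$s target=$t")
      case "data" =>
        // Rows present on one side but not the other (schemas must match).
        val missingInTarget = source.exceptAll(target)
        val extraInTarget   = target.exceptAll(source)
        // Human-readable output for the engineer to inspect.
        missingInTarget.write.mode("overwrite").csv("/tmp/validation/missing_in_target")
        extraInTarget.write.mode("overwrite").csv("/tmp/validation/extra_in_target")
      case other =>
        sys.error(s"Unknown validation level: $other (expected 'high' or 'data')")
    }
    spark.stop()
  }
}
```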

Education

Cloudera Spark and Hadoop Developer
Certifications
Data Science program focusing on Python, Statistics, and Machine Learning
Galvanize - San Francisco, SoMa
Bachelor of Science (B.S.), Computer Science & Engineering 2014
The Ohio State University