Naman is a passionate data engineer with 7 years of experience in the analysis, design, development, implementation, maintenance, and support of data systems, including strategic methods for deploying big data technologies. Hands-on experience designing, implementing, and productionizing an enterprise-grade machine learning solution on big data for two Fortune 500 financial institutions. Highly skilled in Databricks, Azure, ADF, Spark, and Scala.
Delivers data warehouse and ETL solutions as part of an agile team, applying advanced machine learning techniques to improve performance and processes.
Helps build and improve infrastructure, application development, and performance, and ensures tight security, including data encryption, security groups, and environment scanning.
Ensures high-quality deliverables and implements DevOps and security best practices in fast-paced environments.
Architected and built a terabyte-scale Data-as-a-Service platform using Snowflake, DBT, DBT Cloud, GitLab, Periscope (Sisense), Fivetran, and Stitch.
Led the data migration exercise to lift a large collection of the client's ANSI SQL logic running on AWS Redshift via Periscope into DBT SQL logic running on Snowflake via DBT Cloud.
Evangelized data engineering within the client organization by conducting learning sessions on DBT Cloud, enabling SQL analysts to raise PRs against core production logic.
Led a team of five engineers to productionize an end-to-end platform that ingests data from 200+ scrapers, then parses, normalizes, and publishes it for downstream consumers.
Built a modern, enterprise-grade Delta Lake for online retail data using Databricks, Azure, ADF, Spark, and Scala (see the upsert sketch below).
Built the platform on open-source software, hence no vendor lock-in and no black boxes in the architecture, and with technologies designed to scale.
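For illustration, a minimal sketch of the kind of Delta Lake upsert job such a platform runs; the paths, table layout, and merge key (order_id) are hypothetical, and it assumes Spark 3.x with the open-source delta-spark library:

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.SparkSession

object RetailDeltaUpsert {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("retail-delta-upsert")
      // These two configs enable Delta Lake on vanilla Spark;
      // on Databricks they come preconfigured.
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      .getOrCreate()

    // Hypothetical daily batch of online-retail orders landed by ADF.
    val updates = spark.read.parquet("/mnt/landing/orders/2020-01-01")
    val orders  = DeltaTable.forPath(spark, "/mnt/delta/orders")

    // Upsert: update existing orders in place, insert new ones atomically.
    orders.as("t")
      .merge(updates.as("u"), "t.order_id = u.order_id")
      .whenMatched().updateAll()
      .whenNotMatched().insertAll()
      .execute()
  }
}
```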
Led the design, implementation, and productionization of our enterprise-grade ML-on-big-data solution for two clients, both Fortune 500 financial institutions.
Significantly reduced the size of our deployments by combining over 40 drivers into a single Spark app; the shared dependencies and common functionality increased code cohesion and ease of maintenance.
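A rough sketch of that consolidation pattern, with invented job names: a single entry point dispatches to registered jobs that share one SparkSession and one dependency set, rather than 40+ separate driver mains.

```scala
import org.apache.spark.sql.SparkSession

object UnifiedDriver {
  // Each former standalone driver becomes a registered function.
  private val jobs: Map[String, SparkSession => Unit] = Map(
    "ingest-accounts" -> (spark => spark.read.parquet("/data/accounts").show()),
    "score-features"  -> (spark => spark.read.parquet("/data/features").show())
    // ...the remaining jobs are registered the same way
  )

  def main(args: Array[String]): Unit = {
    val jobName = args.headOption.getOrElse(
      sys.error(s"Usage: UnifiedDriver <job>; known: ${jobs.keys.mkString(", ")}"))
    val spark = SparkSession.builder().appName(s"unified-$jobName").getOrCreate()
    try jobs.getOrElse(jobName, sys.error(s"Unknown job: $jobName"))(spark)
    finally spark.stop()
  }
}
```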
Advocated for and won consensus on replicating the target client's cluster specifications and mocking client data internally, significantly increasing the stability of our deployments.
Designed and implemented a Spark app in Scala to build Apache Solr indices from Hive tables. The app was designed to roll back on any failure and reduced downtime for downstream consumers from ~3 hours to ~10 seconds.
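One common way to achieve a ~10-second cutover with rollback is to index into a fresh staging collection and then atomically repoint a Solr alias; a sketch under that assumption, with hypothetical ZooKeeper hosts, Hive table, configset, and alias names, using the Lucidworks spark-solr connector and SolrJ:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.solr.client.solrj.impl.CloudSolrClient
import org.apache.solr.client.solrj.request.CollectionAdminRequest

object SolrIndexBuilder {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("solr-index-builder")
      .enableHiveSupport().getOrCreate()

    val zkHosts = "zk1:2181,zk2:2181"                      // hypothetical ensemble
    val staging = s"products_${System.currentTimeMillis}"  // fresh collection per run

    val client = new CloudSolrClient.Builder(
      java.util.Arrays.asList("zk1:2181", "zk2:2181"),
      java.util.Optional.empty[String]()).build()
    try {
      // Create the staging collection (hypothetical configset and sizing).
      CollectionAdminRequest.createCollection(staging, "productsConfig", 2, 2)
        .process(client)

      // Index the Hive table into it
      // (assumes the spark-solr connector is on the classpath).
      spark.table("warehouse.products")
        .write.format("solr")
        .options(Map("zkhost" -> zkHosts, "collection" -> staging))
        .save()

      // Atomically repoint the serving alias. If anything above fails,
      // the alias still targets the previous collection: instant rollback.
      CollectionAdminRequest.createAlias("products", staging).process(client)
    } finally client.close()
  }
}
```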
Productionized the Spark app to ingest more than 100 GB of data as a daily batch job, partitioning and storing it as Parquet in HDFS with corresponding Hive partitions at the query layer. The app replaced a legacy Oracle solution and ran in ~10% of the previous time.
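A condensed sketch of that daily batch pattern with hypothetical paths and names, assuming the partitioned Hive table warehouse.transactions already exists:

```scala
import org.apache.spark.sql.SparkSession

object DailyBatchIngest {
  def main(args: Array[String]): Unit = {
    val runDate = args(0) // e.g. "2019-11-30"
    val spark = SparkSession.builder().appName(s"daily-ingest-$runDate")
      .enableHiveSupport().getOrCreate()

    // Land the day's extract as Parquet under a date-partitioned HDFS layout.
    spark.read.parquet(s"hdfs:///landing/transactions/$runDate")
      .write.mode("overwrite")
      .parquet(s"hdfs:///warehouse/transactions/ds=$runDate")

    // Register the partition so Hive sees the new data at the query layer.
    spark.sql(
      s"""ALTER TABLE warehouse.transactions
         |ADD IF NOT EXISTS PARTITION (ds='$runDate')
         |LOCATION 'hdfs:///warehouse/transactions/ds=$runDate'""".stripMargin)
  }
}
```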
Implemented a Spark Structured Streaming app to ingest data from Kafka and upsert it into Kudu tables in a Kerberized cluster.
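A minimal sketch of that pipeline using the kudu-spark connector; the brokers, topic, schema, and table name are placeholders, and the Kerberos specifics (JAAS config, keytab) are reduced to a single option. Structured Streaming has no native Kudu sink, so each micro-batch is upserted via KuduContext:

```scala
import org.apache.kudu.spark.kudu.KuduContext
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types._

object KafkaToKudu {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-to-kudu").getOrCreate()

    // Hypothetical event schema; real topics and columns are client-specific.
    val schema = new StructType()
      .add("id", LongType).add("amount", DoubleType).add("ts", TimestampType)

    val events = spark.readStream.format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("kafka.security.protocol", "SASL_PLAINTEXT") // Kerberos via GSSAPI
      .option("subscribe", "payments")
      .load()
      .select(from_json(col("value").cast("string"), schema).as("e"))
      .select("e.*")

    val kudu = new KuduContext("kudu-master1:7051", spark.sparkContext)

    events.writeStream
      .foreachBatch { (batch: DataFrame, _: Long) =>
        kudu.upsertRows(batch, "impala::default.payments") // idempotent upsert
      }
      .option("checkpointLocation", "hdfs:///checkpoints/kafka-to-kudu")
      .start()
      .awaitTermination()
  }
}
```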
Completed the Galvanize data science program, a Python-based curriculum that introduces best practices in machine learning, statistical analysis, natural language processing, and data visualization.
Created a two-week capstone project, a Mortgage Market Tri Analysis, for optimizing capital allocation for mortgage market loans.
Used ACF and PACF plots to determine the appropriate number of AR and MA lags; the resulting SARIMAX model forecast well on the 30% held-out test data.
Acted as product manager for the design and implementation of the CA Agile Central (Rally) framework, a SaaS platform for agile development, across 7 individual work streams, increasing the efficiency and quality of the client's software development process and earning strong appreciation from the client.
Established processes across the engagement that reduced build downtime by over 50%, enabling two builds per week instead of one.
Analyzed, managed, and actively tracked completion metrics, defect status, and time and expenses; created forecasts and staffing models used to plan future releases; managed onboarding of new resources; and produced client deliverables for executive leadership.
Automated validation of Oracle source tables against HDFS using Spark in Scala.
Implemented a dynamic validation interface that lets users choose between high-level and data-level validation, reducing manual debugging effort by over 99%.
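A simplified sketch of those two modes with hypothetical connection details: high-level validation compares row counts (cheap, catches gross failures), while data-level validation computes a two-way set difference (expensive, but exact):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SourceValidator {
  // High-level: row counts only.
  def highLevel(oracle: DataFrame, hdfs: DataFrame): Boolean =
    oracle.count() == hdfs.count()

  // Data-level: rows present on one side but not the other, in both directions.
  def dataLevel(oracle: DataFrame, hdfs: DataFrame): Boolean =
    oracle.exceptAll(hdfs).isEmpty && hdfs.exceptAll(oracle).isEmpty

  def main(args: Array[String]): Unit = {
    val Array(mode, table) = args.take(2) // e.g. "data" "accounts"
    val spark = SparkSession.builder().appName(s"validate-$table").getOrCreate()

    val oracle = spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL") // hypothetical DSN
      .option("dbtable", table)
      .option("user", sys.env("ORACLE_USER"))
      .option("password", sys.env("ORACLE_PASSWORD"))
      .load()
    val hdfs = spark.read.parquet(s"hdfs:///warehouse/$table")

    val ok = if (mode == "high") highLevel(oracle, hdfs) else dataLevel(oracle, hdfs)
    println(s"$table ${mode}-level validation: ${if (ok) "PASSED" else "FAILED"}")
  }
}
```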
Education
Galvanize - San Francisco (SoMa)
Data Science program focusing on Python, Statistics, and Machine Learning
Bachelor of Science (B.S.), Computer Science & Engineering, 2014

Certifications
Cloudera Spark and Hadoop Developer