Rohit G.

About Me

Experienced Data Engineer with expertise in building scalable data platforms, ML pipelines, and cloud solutions. R. G. specializes in Databricks, Azure, AWS, and MLOps with a proven track record of architecting enterprise data systems, optimizing complex workflows, and delivering significant business value. Skilled in PySpark, Python, and modern DevOps practices with multiple awards for innovation and technical excellence.

AI, ML & LLM

Azure OpenAI, NLP, Dataiku, Azure AI Search, Apache Airflow

Backend

Python, Flask, REST APIs

Work history

British Petroleum (BP)
Data Engineer II (Data & AI)
2023 - 2026 (3 years)
Remote
  • Architected NZS platform centralizing carbon metrics across 4 global entities and built dual-path ingestion via Azure Logic Apps and Azure Data Factory with Unity Catalog Federation

  • Engineered PySpark pipelines to automate Delta-based YoY variance analysis and implemented Palantir Foundry dashboards with ontology merging and RBAC for secure data operations

  • Built RAG chatbot using Azure OpenAI, migrated 250+ tables to Unity Catalog, and optimized ingestion pipelines for cost reduction and failure recovery

Databricks, Azure Data Factory, Azure DevOps, Palantir Foundry, PySpark, Delta Lake, Unity Catalog, Azure Logic Apps, Azure OpenAI, Bazel

ZS Associates
Senior Engineer (Data, MLOps & AI)
2020 - 2023 (3 years)
Remote
  • Optimized Spark jobs on Kubernetes clusters and automated migration of 30+ projects from EMR to EKS using Dataiku APIs with CI/CD pipeline automation

  • Integrated Kubeflow and built Flask APIs for ML lifecycle monitoring, orchestrated ML workflows using tree algorithms, and deployed Lambda- and Kubernetes-based pipelines

  • Built end-to-end ETL pipelines with Apache Airflow and Databricks, ingested millions of records into relational and graph databases, and maintained data infrastructure

Dataiku, Kubernetes, EKS, Spark, AWS EMR, SonarQube, Docker, AWS Lambda, Step Functions, AWS CodeBuild, Kubeflow, Flask, NLP, Flair, AWS ECS, Selenium, Stardog, Apache Airflow, Databricks, GitLab CI/CD, Python, Locust

ITC Infotech
Deep Learning Intern (Data, ML & AI)
2019 - 2019
Remote
  • Applied OpenCV DNN for face detection and calculated embeddings using FaceNet and TensorFlow

  • Clustered similar images using Chinese Whispers algorithm for real-time facial clustering

  • Implemented Siamese Networks for computer vision tasks with Euclidean distance calculations

Education

B.Tech. in Computer Science Honors
Jaypee University of Information Technology
2016 - 2020 (4 years)

Machine Learning
Coursera

Knowledge Engineering Specialist
PoolParty

Core Designer, Advanced Designer, Developer
Dataiku

Cloud Practitioner Essentials
AWS

Fundamentals of Big Data and Delta Lake
Databricks