Rajagopal B.

About Me

Rajagopal is a Data Engineer with deep expertise in Python, relational databases, and the design of scalable data pipelines. His work spans data modeling and the transformation of large, complex datasets (both structured and unstructured) using distributed systems such as Apache Spark and Flink. He is adept at query optimization and at building efficient data solutions in cloud environments like AWS, and is a collaborative problem-solver focused on feature development and data integrity within fast-paced, AI-driven teams.

Skills

Apache Flink, Pinot, Kafka, Spark, PySpark, Scala, Power BI, Tableau, BigQuery, Databricks, Golang, Looker, Presto, Hive, Hadoop, GraphX, Elasticsearch, SmartGWT, MapReduce, Dataproc, Pandas, NumPy

Work history

LinkedIn
Staff Data Engineer
2024 - Present (1 year)
Bangalore, India
  • Spearheaded the setup of core processes and best practices for the Trust Data Science team, including data governance, reliability improvements, and foundational dataset development.

  • Developed a real-time metrics dashboard by building a Flink-based streaming service to process live event data and generate platform health metrics, significantly improving incident response.

  • Led an org-wide initiative to overhaul data logging practices, enabling self-serve access to key metrics and reducing dependency on the DS team.

  • Developed a centralized Supertable for the Trust Review Operations team as part of a data democratization initiative, saving 200+ man-hours annually.
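
The streaming work above centers on windowed aggregation over live event data. As a rough illustration only (the actual Flink job is not shown here), this is a pure-Python sketch of the tumbling-window counting such a job performs; the event types are hypothetical:

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # hypothetical window size

def tumbling_window_counts(events):
    """Aggregate (timestamp, event_type) pairs into per-window counts,
    mimicking what a Flink tumbling-window job would compute."""
    windows = defaultdict(lambda: defaultdict(int))
    for ts, event_type in events:
        window_start = (ts // WINDOW_SECONDS) * WINDOW_SECONDS
        windows[window_start][event_type] += 1
    return {w: dict(counts) for w, counts in sorted(windows.items())}

events = [(5, "login"), (12, "error"), (61, "login"), (65, "login")]
print(tumbling_window_counts(events))
# → {0: {'login': 1, 'error': 1}, 60: {'login': 2}}
```

In a real Flink pipeline the same grouping is expressed with `keyBy` plus a tumbling event-time window; the sketch only shows the shape of the computation.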

Apache Flink, Pinot, Kafka, Spark, PySpark, Trino, Airflow, Java, Python, Scala, Power BI, Tableau, GCP, BigQuery, Databricks, Data Engineering, Data Governance, Datasets, Dashboard Development
Uber
Data Engineer
2019 - 2024 (5 years)
Bangalore, India
  • Built and maintained a central warehouse that powers foundational datasets, ensuring high data quality and freshness within defined SLAs for 50+ Tier1/Tier2 datasets.

  • Designed a scalable analytics query gateway service for the Customer Obsession domain at Uber, enabling flexible metric querying across multiple data sources and reducing onboarding time.

  • Led a data governance project to enhance data management practices by establishing clear ownership for datasets and implementing Time to Live (TTL) policies.

  • Built anomaly detection systems that proactively detected business issues, saving ~$500K.

  • Built a data validation utility to compare datasets, which saved ~50 man-hours monthly.

  • Implemented least-privilege access controls for all datasets, preventing accidental data loss that would take ~1 month to recover from.

  • Worked closely with ML teams to build features for ML models such as R&A issuance, a program with an annual budget of $1 billion.

  • Designed GDPR-compliant ETL pipelines that adhere to security best practices.

  • Owned 100+ Tier 1/Tier 2 datasets, responsible for their SLAs and performance optimization.
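
A dataset-comparison utility like the one mentioned above typically diffs two tables on a key column, reporting missing and mismatched rows. A minimal pure-Python sketch, with hypothetical column names (the production tool would run this logic over Spark/Hive tables):

```python
def compare_datasets(left, right, key):
    """Diff two datasets (lists of dicts) on a key column, reporting rows
    missing from either side and rows present in both with differing values."""
    left_by_key = {row[key]: row for row in left}
    right_by_key = {row[key]: row for row in right}
    return {
        "missing_in_right": sorted(left_by_key.keys() - right_by_key.keys()),
        "missing_in_left": sorted(right_by_key.keys() - left_by_key.keys()),
        "mismatched": sorted(
            k for k in left_by_key.keys() & right_by_key.keys()
            if left_by_key[k] != right_by_key[k]
        ),
    }

# hypothetical rows keyed on "id"
left = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}, {"id": 3, "amount": 30}]
right = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}, {"id": 4, "amount": 40}]
print(compare_datasets(left, right, "id"))
# → {'missing_in_right': [3], 'missing_in_left': [4], 'mismatched': [2]}
```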

Apache Flink, Apache Kafka, Pinot, Apache Spark, Golang, PySpark, Java, Python, Scala, Looker, GCP, BigQuery, Presto, Airflow, Kubernetes, Big Data, AWS, SQL, Hive, Spark Streaming, Apache Hudi, Data Engineering, ETL Pipelines, Data Governance, Data Management, Datasets, Data Validation, Performance Optimization
Walmart
Data Engineer | Big Data Intern
2016 - 2019 (3 years)
Remote
  • Developed a real-time audience size estimation system that reduced processing time from 2 hours to 5 minutes with a small error margin.

  • Co-developed an ETL pipeline in Apache Spark for rule-based campaigns, improving efficiency at larger scales.

  • Co-developed an engineering pipeline to use Machine Learning models for campaigns, impacting revenue by $10M.

  • Developed an application that analyzes log messages and alerts when an application goes down.

  • Improved the existing algorithm's runtime by 3x and reduced its disk usage by 20x.

  • Worked in the Walmart Ad Exchange (WMX) team, building data pipelines for audience targeting and campaign measurement.

  • Co-implemented a propensity-segment pipeline, an ML-based audience targeter worth millions of dollars to Walmart, using data sketches to estimate audience size in real time.

  • Migrated data between clusters and updated pipeline logic and code whenever upstream data sources or infrastructure changed.
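
The real-time audience size estimates above rely on data sketches. For illustration only, here is a toy HyperLogLog-style sketch in pure Python; it omits the small- and large-range corrections a production sketch (e.g. in a library or in Spark) would apply:

```python
import hashlib

def _hash(value):
    """Deterministic 160-bit hash of a value."""
    return int(hashlib.sha1(str(value).encode()).hexdigest(), 16)

def estimate_cardinality(values, bucket_bits=10):
    """HyperLogLog-style distinct-count estimate: each bucket tracks the
    maximum 'rank' (trailing-zero run length + 1) of hashes routed to it."""
    m = 1 << bucket_bits
    buckets = [0] * m
    for v in values:
        h = _hash(v)
        bucket = h & (m - 1)          # low bits pick the bucket
        rest = h >> bucket_bits       # remaining bits determine the rank
        rank = 1
        while rest & 1 == 0 and rank < 64:
            rank += 1
            rest >>= 1
        buckets[bucket] = max(buckets[bucket], rank)
    alpha = 0.7213 / (1 + 1.079 / m)  # standard HLL bias constant
    return int(alpha * m * m / sum(2.0 ** -b for b in buckets))

# 10,000 distinct values: the estimate lands within a few percent
print(estimate_cardinality(range(10000)))
```

The appeal of sketches for audience sizing is that they use fixed memory (here 1024 buckets) regardless of how many members the audience has, and two sketches can be merged with a bucket-wise max.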

USC Information Sciences Institute
Graduate Data Scientist
2015 - 2016 (1 year)
Remote
  • Implemented clustering and graph-algorithm techniques to identify human traffickers, reporting findings to DARPA.

  • Worked on DARPA-funded projects to detect online fraudulent activities in various domains like human trafficking, selling illegal weapons, and counterfeit electronics.

  • Gathered information from different websites, performed entity linking across them, and visualized the results.

Fiorano
Software Developer
2013 - 2014 (1 year)
Hyderabad, India
  • Developed the front end of a web application to manage APIs.

  • Built the client side of an API management web application using SmartGWT and Java, enabling API service providers to create proxy projects with customized quotas, data output formats, and added security features, and to modify user permissions.

Java, SmartGWT, Object-oriented Programming (OOP), Web Applications, APIs, API Management
Flipkart
Intern
2013 - 2013
Bangalore, India
  • Developed a sales dashboard showing which products were selling fastest, with filters for category, location, and time.

  • Developed a VBScript module that sends topline management a daily email update on sales.

  • Developed an inventory planning dashboard that estimates required inventory from past sales and the current pace of selling.

VBScript, Dashboard Development

Showcase

Clustering, Graph Analytics on Human Trafficking

This implementation was part of a human trafficking project. Grouped together ads that are syntactically similar, since such ads are most likely posted by the same operator. Modeled the data as a graph using Apache Spark GraphX and used PageRank and connected components to identify organized groups and major operators. The work produced strong results and was featured in Forbes. Tech stack: Spark, GraphX, Scala, Clustering. Link (pdf): http://iswc2015.semanticweb.org/sites/iswc2015.semanticweb.org/files/93670175.pdf
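
The two graph techniques named here can be sketched compactly. This is plain Python for illustration, not the GraphX implementation the project actually used:

```python
def connected_components(edges):
    """Union-find over undirected edges; returns node groups as frozensets."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return [frozenset(g) for g in groups.values()]

def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on a directed edge list."""
    nodes = {n for e in edges for n in e}
    out = {n: [] for n in nodes}
    for a, b in edges:
        out[a].append(b)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            if out[n]:
                share = damping * rank[n] / len(out[n])
                for m in out[n]:
                    new[m] += share
            else:  # dangling node: spread its rank evenly
                for m in nodes:
                    new[m] += damping * rank[n] / len(nodes)
        rank = new
    return rank
```

In the project, connected components surface the clusters of ads operated together, while PageRank ranks the most central ("big") operators within the graph.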

ClickThrough Prediction Rate

This was a Kaggle project. Predicted whether an ad would be clicked, using a feature set and comparing classifiers such as Decision Tree, Random Forest, and SVM. Used PCA for feature selection, plotted each classifier's results as ROC curves, and found Random Forest to be the best classifier. Tech stack: Python, Machine Learning.
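
The ROC comparison used to pick the best classifier reduces to a rank statistic: AUC is the probability that a random positive scores above a random negative. A small self-contained sketch, with made-up labels and scores (a real project would use a library routine such as scikit-learn's):

```python
def roc_auc(labels, scores):
    """AUC via the pairwise rank formulation: fraction of (positive, negative)
    pairs where the positive outscores the negative (ties count half)."""
    pairs = 0
    wins = 0.0
    for li, si in zip(labels, scores):
        for lj, sj in zip(labels, scores):
            if li == 1 and lj == 0:
                pairs += 1
                if si > sj:
                    wins += 1
                elif si == sj:
                    wins += 0.5
    return wins / pairs

labels = [1, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.3, 0.6, 0.4]  # hypothetical classifier outputs
print(roc_auc(labels, scores))  # → 0.8333...
```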

Recommendation System

Using the Audioscrobbler dataset, recommended artists to users with a latent factor model, achieving a mean per-user AUC of 0.96 (1.0 is best). Also implemented user-user and item-item collaborative filtering on an IMDB movie dataset. Tech stack: Python, Spark MLlib, Scala.
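
The user-user collaborative filtering mentioned here can be sketched in a few lines. The ratings matrix below is hypothetical, not the IMDB data, and the production version ran on Spark MLlib:

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse rating dicts."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)

def recommend(target, ratings, top_n=2):
    """User-user CF: score items the target hasn't rated by
    similarity-weighted ratings from other users."""
    scores = {}
    for other, their in ratings.items():
        if other == target:
            continue
        sim = cosine(ratings[target], their)
        for item, r in their.items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + sim * r
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# hypothetical user -> {movie: rating} matrix
ratings = {
    "alice": {"matrix": 5, "inception": 4},
    "bob": {"matrix": 5, "inception": 5, "memento": 4},
    "carol": {"titanic": 5, "notebook": 4},
}
print(recommend("alice", ratings))  # → ['memento', 'titanic']
```

Item-item CF is the mirror image: compute similarities between item rating vectors instead of user vectors, which scales better when users far outnumber items.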

Education

MSc Data Informatics
University of Southern California
2015 - 2016 (1 year)
BE Electrical and Electronics Engineering
BITS Pilani - India
2009 - 2013 (4 years)