Rajagopal B.

About Me

A highly skilled Data Engineer with advanced expertise in Python programming, relational databases, and the design of scalable data pipelines. Proven experience in data modeling and in transforming large, complex datasets (both structured and unstructured) using distributed processing frameworks such as Apache Spark and Apache Flink. Adept at query optimization and at building efficient data solutions in cloud environments such as AWS. A collaborative problem-solver focused on contributing to feature development and ensuring data integrity within fast-paced, AI-driven teams.

Skills

Apache Flink, Pinot, Kafka, Spark, PySpark, Scala, Power BI, Tableau, BigQuery, Databricks, Golang, Looker, Presto, Hive, Hadoop, GraphX, Elasticsearch, SmartGWT, MapReduce, Dataproc, Pandas, NumPy

Work history

LinkedIn, Bangalore
Trust Data Science, Staff Software Engineer
2024 - 2025 (1 year)
Remote
  • Spearheaded the setup of core processes and best practices for the Trust Data Science team, including data governance, reliability improvements, and foundational dataset development.

  • Developed a real-time Metrics Dashboard by building a Flink-based streaming service to process live event data and generate platform health metrics, significantly improving incident response.

  • Led an org-wide initiative to overhaul data logging practices, enabling self-serve access to key metrics and reducing dependency on the DS team.

Apache Flink, Pinot, Kafka, Spark, PySpark, Trino, Airflow, Java, Python, Scala, Power BI, Tableau, GCP, BigQuery, Databricks
Uber, Bangalore
Customer Obsession team, Software Engineer
2019 - 2024 (5 years)
Remote
  • Built and maintained a central data warehouse that powers foundational datasets, ensuring high data quality and freshness within defined SLAs for 50+ Tier 1/Tier 2 datasets.

  • Designed a scalable analytics query gateway service for the Customer Obsession domain at Uber, enabling flexible metric querying across multiple data sources and reducing onboarding time.

  • Led a data governance project to enhance data management practices by establishing clear ownership for datasets and implementing Time to Live (TTL) policies.

Walmart Labs, Sunnyvale
Advertising team, Software Engineer
2017 - 2019 (2 years)
Remote
  • Developed real-time audience size estimation, reducing processing time from 2 hours to 5 minutes with a small error margin.

  • Co-developed an Apache Spark ETL pipeline for rule-based campaigns, improving efficiency at larger scale.

  • Co-developed the engineering pipeline for applying a machine learning model to campaigns, with a revenue impact of $10 million.

Walmart Labs, Sunnyvale
Platform team, Big Data Intern
2016 - 2016
Remote
  • Developed an application that analyzes log messages and raises alerts when an application goes down.

  • Improved the performance of the existing algorithm by 3x and reduced its disk space usage by 20x.

Information Sciences Institute, Marina del Rey
Data Scientist
2015 - 2016 (1 year)
Remote
  • Implemented clustering and graph-algorithm techniques to identify human traffickers, and reported the findings to DARPA.

  • Found that signature-based and PageRank-based approaches performed better than the other techniques evaluated.

Fiorano Software Ltd, Hyderabad
Software Developer
2013 - 2014 (1 year)
Remote
  • Developed the front end of a web application for managing APIs.

  • Built with SmartGWT, the application lets API service providers track metrics such as response time and API traffic, and add restrictions such as per-user API call limits.

Flipkart Online Services, Bangalore
Software Intern
2013 - 2013
Remote
  • Developed a sales dashboard showing which products are selling fastest, with filters for category, location, and time period.

Education

M.S., Data Informatics
University of Southern California
2015 - 2016 (1 year)
B.E. (Hons.)
BITS-Pilani, Pilani campus
2009 - 2013 (4 years)