Shreya S.

Denton, TX, United States of America

About Me

Shreya is an IT professional specializing in database administration, data analytics, and Big Data technologies. She has extensive hands-on expertise with the Hadoop ecosystem, including Apache Spark, MapReduce, Spark Streaming, PySpark, Hive, HDFS, Kafka, Sqoop, and Oozie. Shreya designs and implements end-to-end ETL workflows using Azure Data Factory and Databricks, leveraging PySpark and Spark SQL for scalable data processing. She is also skilled in developing CI/CD scripts and managing automated deployment pipelines within Azure environments.

AI, ML & LLM

Apache Airflow, Machine Learning, Artificial Neural Networks (ANN), Naive Bayes

Other

Data Analytics, Big Data, Streaming Data, Kafka, Snowflake, PySpark, Hadoop, HDFS (Hadoop Distributed File System), Apache Spark, MapReduce, Spark Streaming, Hive, Sqoop, Oozie, ETL Pipelines, Teradata, HBase, Apache Pig, Apache Flume, Apache Cassandra, Apache Impala, Apache ZooKeeper, NumPy, Pandas, SciPy, scikit-learn, Matplotlib, Seaborn, Scala, Linux, Shell Scripting, Support Vector Machines (SVM), Decision Tree, Random Forest, K-Nearest Neighbors (KNN), Gradient Boosting, MapR, Tableau, OLAP, Netezza, Resilient Distributed Datasets (RDD)

Work history

Charter Communications
Senior Database Engineer/Administrator
2025 - Present
Remote
  • Using Azure Data Factory (ADF) to orchestrate and automate data ingestion pipelines from diverse source systems into Snowflake.

  • Developing robust ADF pipelines and using Databricks with PySpark for scalable data transformation, cleansing, and aggregation.

  • Building and managing Databricks clusters and integrating Kafka for streaming ingestion.

  • Integrated LLM-based automation within data quality and monitoring workflows, using OpenAI APIs to auto-summarize pipeline alerts and anomaly reports in Databricks (a sketch of this pattern follows this list).

  • Designed and deployed AI-driven metadata generation scripts that automatically tagged datasets and lineage in Snowflake, improving data discoverability and governance.

  • Collaborated with ML engineers to build feature-ready datasets for AI/ML pipelines, ensuring scalable ingestion from streaming and batch data sources.

  • Partnered with analytics teams to fine-tune LLM prompts for telecom-specific use cases.

  • Collaborating with data scientists and business stakeholders to design analytical data models in Snowflake that support self-service BI, machine learning, and real-time dashboards.

  • Led the development of a Kafka-Spark-Snowflake prototype to simulate real-time data ingestion and analytics for Big Data consulting use cases (see the streaming sketch after this list).

  • Migrated legacy Cosmos DB event sourcing components into a modern Snowflake-based architecture using Snowpipe for real-time ingestion and DBT for data modeling.

  • Implemented Azure Active Directory (AAD) integration for secure access control across services including Databricks, ADF, and Snowflake.

  • Supported large-scale SQL environments involving complex queries, stored procedures, triggers, and performance tuning across multiple servers and databases.

  • Supporting microservices deployment and orchestration in containerized environments using Docker and Kubernetes.

  • Migrated legacy ETL workflows to modern ETL pipelines using ADF, DBT, and Snowflake, significantly improving pipeline maintainability, scalability, and auditability.

  • Designed and implemented incremental data loading strategies and integrated Azure Key Vault for secure credential management in ADF and Databricks.

  • Using Snowflake Streams and Tasks for real-time change data capture (CDC) and developing robust data quality checks and validations using DBT tests.

  • Created parameterized and dynamic ADF pipelines and built reusable PySpark modules for complex joins, aggregations, and data enrichment operations (an example module is sketched after this list).
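
A minimal sketch of the LLM-based alert summarization described above, assuming the openai Python SDK; the model name, system prompt, and alert payload are illustrative rather than the production configuration.

    # Summarize a raw pipeline alert with an LLM (illustrative sketch).
    # Assumes the `openai` package is installed and OPENAI_API_KEY is set in the environment.
    from openai import OpenAI

    client = OpenAI()

    def summarize_alert(alert_text: str) -> str:
        """Return a short, human-readable summary of a raw pipeline alert."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative model choice
            messages=[
                {"role": "system", "content": "Summarize data-pipeline alerts in two sentences."},
                {"role": "user", "content": alert_text},
            ],
        )
        return response.choices[0].message.content

    # e.g. summarize_alert("Task copy_orders failed: Snowflake COPY rejected 1,204 rows")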
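
A sketch of the Kafka-Spark-Snowflake prototype's data path, assuming the Spark-Snowflake connector; the broker, topic, and table names are placeholders, and credentials (handled via Key Vault per the bullets above) are omitted.

    # Kafka -> Spark Structured Streaming -> Snowflake (illustrative sketch).
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("kafka-to-snowflake").getOrCreate()

    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
        .option("subscribe", "pipeline-events")            # hypothetical topic
        .load()
        .select(col("value").cast("string").alias("payload"))
    )

    # Placeholder connection options; real values would come from Azure Key Vault.
    sf_options = {
        "sfURL": "<account>.snowflakecomputing.com",
        "sfDatabase": "ANALYTICS",
        "sfSchema": "RAW",
        "sfWarehouse": "LOAD_WH",
    }

    def write_batch(batch_df, batch_id):
        # The Spark-Snowflake connector writes in batches, so streaming goes through foreachBatch.
        (batch_df.write.format("net.snowflake.spark.snowflake")
            .options(**sf_options)
            .option("dbtable", "PIPELINE_EVENTS")
            .mode("append")
            .save())

    events.writeStream.foreachBatch(write_batch).start().awaitTermination()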
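
And a minimal sketch of a reusable PySpark enrichment module of the kind mentioned in the last bullet; the function and column names are hypothetical.

    # Reusable join-and-aggregate helper (illustrative sketch).
    from pyspark.sql import DataFrame
    from pyspark.sql import functions as F

    def enrich_and_aggregate(facts: DataFrame, dims: DataFrame,
                             join_keys: list, group_cols: list, amount_col: str) -> DataFrame:
        """Left-join a fact DataFrame to a dimension DataFrame, then aggregate per group."""
        joined = facts.join(dims, on=join_keys, how="left")
        return (joined.groupBy(*group_cols)
                      .agg(F.sum(amount_col).alias("total_amount"),
                           F.count("*").alias("row_count")))

    # Usage: enrich_and_aggregate(orders_df, customers_df, ["customer_id"], ["region"], "order_total")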

Database Engineering, Database Administration (DBA), Azure Data Factory, Data Pipelines, Snowflake, PySpark, Azure Databricks, Data Transformation, Data Cleansing, Data Aggregation, Kafka, Kafka Streams, OpenAI, Large Language Models (LLMs), AI/ML, Big Data, Spark, Snowpipe, Data Build Tool (dbt), Data Modeling, Azure Cosmos DB, Azure Active Directory, MySQL, Performance Tuning, SQL Stored Procedures, SQL Triggers, Docker, Kubernetes, Microservices, ETL, ETL Pipelines, Data Loading, Azure Key Vault, CDC
Walmart
Senior Database Engineer
2024
Bentonville, United States of America
  • Developed scalable Spark applications using Scala on Google Cloud Dataproc to process batch and streaming data from multiple RDBMS and messaging sources.

  • Designed and implemented real-time data pipelines using Kafka (hosted on GKE) and Spark Structured Streaming to process event-driven datasets.

  • Integrated Google Pub/Sub with Apache Spark for ingestion of real-time messages from various streaming sources, enabling seamless data movement in GCP.

  • Installed and configured Kafka Manager for consumer lag monitoring, topic management, and partition analysis within GCP Compute Engine clusters.

  • Created end-to-end AI data pipelines for training and inference workflows using Spark MLlib, integrating with Vertex AI for model deployment.

  • Built and deployed Scala-based microservices to consume real-time data streams and perform intelligent transformations before persisting into BigQuery.

  • Developed custom Machine Learning models using MLlib to classify streaming data for anomaly detection and user behavior prediction.

  • Created user-defined functions (UDFs) in Scala for custom business logic used within Spark and SQL transformations in BigQuery (a Python analogue is sketched after this list).

  • Integrated Cloud Functions to trigger downstream AI processes based on file drops or Pub/Sub events, enhancing automation and responsiveness (a handler sketch follows this list).

  • Leveraged Google Cloud IAM for fine-grained access control and KMS for encrypting sensitive data within pipelines and ML models.

  • Built a Spark MLlib-based recommendation engine prototype, trained it on customer interaction data stored in BigQuery, and exposed it via REST APIs.

  • Implemented BigQuery ML for in-database machine learning, delivering scalable insights without data movement and integrating the results with dashboards (a training sketch follows this list).

  • Developed advanced AI models on GCP Vertex AI, integrating with Dataproc for large-scale model training and Cloud Storage for dataset versioning.

  • Designed ELT automation scripts in Scala and Python to move and transform data from Cloud SQL, GCS, and external APIs.

  • Delivered detailed technical documentation and design artifacts for AI-driven data pipelines, including data flow diagrams, transformation logic, and operational runbooks.
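
A Python analogue of the Scala UDF work described above (the original was Scala; PySpark is used here for consistency with the rest of these sketches, and the normalization rule is hypothetical).

    # A Spark UDF wrapping custom business logic (illustrative sketch).
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("udf-example").getOrCreate()

    @udf(returnType=StringType())
    def normalize_sku(raw_sku):
        """Trim, upper-case, and zero-pad a SKU code (hypothetical rule)."""
        if raw_sku is None:
            return None
        return raw_sku.strip().upper().zfill(10)

    df = spark.createDataFrame([(" ab123 ",), (None,)], ["sku"])
    df.select(normalize_sku("sku").alias("sku_clean")).show()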
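
A sketch of a Pub/Sub-triggered Cloud Function handler in the legacy background-function style; the downstream call is a placeholder.

    # Pub/Sub-triggered Cloud Function (illustrative sketch).
    import base64

    def handle_event(event, context):
        """Entry point: `event` carries the Pub/Sub message, `context` its metadata."""
        payload = base64.b64decode(event["data"]).decode("utf-8") if "data" in event else ""
        print(f"Received message {context.event_id}: {payload}")
        # A real handler would trigger the downstream AI step here, e.g. a Dataproc
        # job or a Vertex AI pipeline run (omitted in this sketch).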
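
And a minimal sketch of training an in-database model with BigQuery ML from Python, assuming the google-cloud-bigquery client; the project, dataset, and feature names are placeholders.

    # Train a BigQuery ML model without moving data out of BigQuery (illustrative sketch).
    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")  # hypothetical project

    create_model_sql = """
    CREATE OR REPLACE MODEL `my-project.analytics.churn_model`
    OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
    SELECT tenure_months, monthly_spend, support_tickets, churned
    FROM `my-project.analytics.customer_features`
    """

    client.query(create_model_sql).result()  # blocks until training completes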

Database Engineering, Database Administration (DBA), Google Cloud Dataproc, Spark, Scala, RDBMS, Google Kubernetes Engine (GKE), Kafka, Data Pipelines, GCP, Apache Spark, Google Pub/Sub, Vertex AI, AI Model Integration, MLlib, GCP BigQuery, Microservices, Machine Learning, Apache Airflow, Directed Acyclic Graphs (DAG), User-Defined Functions (UDF), Google Cloud Functions, Cloud Key Management Service (KMS), Identity & Access Management (IAM), REST APIs, Recommender Engine, Dataproc, AI Model Training, AI Modeling, Python, ELT
CGI
Database Engineer
2021 - 2023 (2 years)
Hyderabad, India
  • Designed, developed, and deployed batch and streaming pipelines using AWS services.

  • Developed data pipelines using cloud and container services such as Docker and Kubernetes, AWS Glue, and PySpark jobs on an EMR cluster (a Glue job skeleton appears after this list).

  • Designed and developed monitoring solutions using Amazon CloudWatch, AWS IAM, AWS Glue, and Amazon QuickSight.

  • Used Lambda, Glue, EMR, EC2, and EKS for data processing and developed data marts, data lakes, and data warehouses using AWS services.

  • Maintained the Hadoop cluster on AWS EMR and migrated an existing on-premises application to AWS.

  • Created, debugged, scheduled, and monitored Airflow jobs for ETL batch processing, loading results into Snowflake for analytical workloads (an example DAG follows this list).

  • Built ETL pipelines for data ingestion, transformation, and validation on AWS, working alongside data stewards to satisfy data compliance requirements.

  • Designed and developed end-to-end ETL pipelines using Informatica and Python and implemented data validation and cleansing frameworks.

  • Optimized data transformation logic and SQL scripts, improving ETL performance and reducing load times by over 25%.

  • Automated recurring data ingestion workflows using Azure Data Factory (ADF) and Airflow, integrating structured and unstructured datasets across on-prem and cloud systems.

  • Developed Spark applications using Scala and Java and implemented Apache Spark data processing to handle data from various RDBMS and streaming sources.
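
A skeleton of a Glue-style PySpark job like those described above; the catalog database, table, and S3 bucket are placeholders, and the script assumes the AWS Glue job runtime (the awsglue libraries are only available there).

    # AWS Glue PySpark job skeleton (illustrative sketch).
    import sys
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.transforms import Filter
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read from the Glue Data Catalog, drop rows with null keys, write Parquet to S3.
    dyf = glue_context.create_dynamic_frame.from_catalog(database="raw_db", table_name="orders")
    cleaned = Filter.apply(frame=dyf, f=lambda row: row["order_id"] is not None)
    glue_context.write_dynamic_frame.from_options(
        frame=cleaned,
        connection_type="s3",
        connection_options={"path": "s3://my-bucket/curated/orders/"},  # hypothetical bucket
        format="parquet",
    )
    job.commit()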
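
And a minimal Airflow DAG of the kind used to schedule the Snowflake loads; the task logic and connection details are placeholders.

    # Daily ETL DAG feeding Snowflake (illustrative sketch).
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract_and_transform(**_):
        print("extract from source, transform, stage files")  # placeholder for real ETL logic

    with DAG(
        dag_id="daily_snowflake_load",
        start_date=datetime(2022, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        etl = PythonOperator(task_id="extract_and_transform",
                             python_callable=extract_and_transform)
        # A SnowflakeOperator from the Snowflake provider would typically follow
        # to run the COPY/MERGE step.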

Cyient
Python Developer
2020 - 2021 (1 year)
Hyderabad, India
  • Built a web application using Django, Flask, Jinja, Python, WSGI, Redis, PostgreSQL, and DynamoDB.

  • Wrote Python scripts to parse XML documents and load the data into the database (a runnable sketch follows this list).

  • Developed web-based applications using Python, CSS, and HTML.

  • Developed applications with XML, JSON, XSL (PHP, Django, Python, Rails).

  • Wrote subqueries, stored procedures, triggers, cursors, and functions on MySQL and PostgreSQL databases.

  • Developed web-based applications using Python, Django, PHP, C++, XML, CSS, HTML, DHTML, JavaScript, and jQuery.

  • Worked in WAMP (Windows, Apache, MySQL, and Python/PHP) and LAMP (Linux, Apache, MySQL, and Python/PHP) architectures.

  • Developed views and templates with Python and Django's view controllers and templating language to create a user-friendly website interface (a minimal view sketch follows this list).

  • Worked with various Python IDEs, including PyCharm, PyScripter, Spyder, PyStudio, and PyDev.
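
A runnable sketch of the XML parse-and-load scripts mentioned above; the original work targeted MySQL/PostgreSQL, but sqlite3 is used here so the example is self-contained, and the document structure is hypothetical.

    # Parse an XML document and load its rows into a database (illustrative sketch).
    import sqlite3
    import xml.etree.ElementTree as ET

    XML_DOC = """
    <books>
      <book id="1"><title>Dune</title><author>Frank Herbert</author></book>
      <book id="2"><title>Hyperion</title><author>Dan Simmons</author></book>
    </books>
    """

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")

    root = ET.fromstring(XML_DOC)
    rows = [(int(b.get("id")), b.findtext("title"), b.findtext("author"))
            for b in root.iter("book")]
    conn.executemany("INSERT INTO books VALUES (?, ?, ?)", rows)
    conn.commit()

    print(conn.execute("SELECT * FROM books").fetchall())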
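
And a minimal sketch of a Django view plus URL route like those described above; the app, template path, and data are hypothetical, and a real project would keep urlpatterns in urls.py.

    # Django view and route (illustrative sketch).
    from django.shortcuts import render
    from django.urls import path

    def book_list(request):
        """Render templates/books/list.html with a context dict."""
        books = [{"title": "Dune"}, {"title": "Hyperion"}]  # placeholder for an ORM queryset
        return render(request, "books/list.html", {"books": books})

    urlpatterns = [path("books/", book_list, name="book-list")]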

Python, Django, Flask, Jinja, Web Server Gateway Interface (WSGI), Redis, PostgreSQL, DynamoDB, Web App Development, HTML, CSS, Python Scripting, XML, Document Parsing, Data Loading, JSON, XSL, PHP, Rails, SQL Stored Procedures, SQL Triggers, SQL Functions, MySQL, jQuery, JavaScript, C++, DHTML, LAMP, WAMP, PyCharm, Spyder

Education

MSc Computer Science
University of North Texas
2023 - 2024 (1 year)
B.Tech Computer Science
Gokaraju Rangaraju Institute of Engineering and Technology (GRIET) - India
2016 - 2020 (4 years)