Lalita A.

Seattle, United States of America

About Me

Lalita is a seasoned data engineering & ML/AI professional with ~17 years of experience delivering real-time analytics, big data, & ML solutions across multiple industries. Expertise in the end-to-end project lifecycle, including data infrastructure design, cleansing, transformation, modeling, optimization, QA, and deployment of ETL/data pipelines to SQL/NoSQL warehouses, Delta Lake, and cloud data lakes. Skilled in designing and managing batch and real-time pipelines ingesting data from APIs, legacy systems, IoT sensors, databases, and files, integrating with streaming frameworks. Experienced in workflow orchestration and job scheduling.

AI, ML & LLM

Backend

Database

DevOps

QA & Testing

Hypothesis Testing

Workflow

Other

Work history

Insight Global – Client Walt Disney
Lead Data Architect and AI/ML Engineer
2024 - 2026 (2 years)
Remote
  • Designed and implemented real-time, batch, and ML/LLM-driven data pipelines using AWS, Databricks, Airflow, Kafka, Kubernetes, Docker, and Spark, supporting large-scale content analytics, personalization, and cybersecurity workflows.

  • Built ETL/ELT frameworks integrating telemetry, content metadata, user activity logs, SAP financial feeds, marketing APIs, and cybersecurity datasets, storing high-velocity data in MongoDB and Cassandra for low-latency feature retrieval and streaming analytics.

  • Developed synchronous REST & asynchronous event-driven APIs to expose curated datasets, threat intelligence signals, and ML inference services for consumption by internal security, fraud, and product teams.

AWS, S3, Lambda, Step Functions, EventBridge, Kinesis, Databricks, Airflow, Kafka, Kubernetes, Docker, Spark, MongoDB, Cassandra, GraphQL, Webhooks, Delta Lake, Snowflake, Unity Catalog, AWS SageMaker, Hugging Face Transformers, PyTorch Lightning, LangChain, FastAPI, Redis, FAISS, GitLab, Power BI
Client - Vancouver Airport Authority, Amalgamated Bank
Sr. Data/MLOps Engineer
2023 - 2025 (2 years)
Remote
  • Designed and delivered end-to-end data and analytics platforms on Azure, leveraging Databricks (with Unity Catalog), Data Factory, ADLS Gen2, Cosmos DB (NoSQL), Azure SQL Warehouse, Redis Cache, and Denodo for large-scale ingestion.

  • Built an automated airport data warehouse using WhereScape, generating staging, dimensional, and fact layers with full metadata tracking, version control, and impact analysis to accelerate development, governance, and centralized API-based data access services for downstream analytics.

  • Engineered event-driven & batch ETL pipelines using EventHub, Service Bus, RabbitMQ, Airflow, and Databricks (PySpark); deployed Kafka streams to process real-time passenger flow, crowd density, & baggage telemetry into Delta Lake for operational insights.

Azure, Databricks, Unity Catalog, Data Factory, Cosmos DB, NoSQL, Redis Cache, Denodo, WhereScape, Service Bus, RabbitMQ, Airflow, PySpark, Kafka
Own corporation - Client PwC via Orion Inc, Carenet Healthcare
Senior Data/ML Engineer
2023 - 2023
Remote
  • Collaborated with product stakeholders and software engineers to build scalable, data-driven platforms and tools; created the system design and proof of concept, then implemented the platform solution on Azure for corporate global tax reporting, compliance, and analytics.

  • Developed end-to-end pipelines ingesting patient, clinical, & insurance claims data from on-prem sources, MySQL transactional systems, and APIs into AWS S3 and Redshift, orchestrated via AWS Glue and Airflow, with monitoring through Datadog and PagerDuty alerts.

  • Built resilient batch and streaming pipelines using Databricks, Python, Spark, YAML/Unix Bash, Azure Event Hubs, ADF, ADLS Gen2, SQL Warehouse, and Kubernetes for automated deployment and scaling.

AWS, S3, Redshift, AWS Glue, Airflow, Datadog, PagerDuty, Databricks, Python, Spark, YAML, Azure Event Hubs, ADF, Kubernetes, Azure Functions, API Management, Azure Data Lake, Cosmos DB, Snowflake, Kafka, Alteryx, Kubeflow
Client - 5Aces - Auto Collision & Shipping Industry
Lead Data ML Engineering Consultant
2022 - 2023 (1 year)
Remote
  • Built end-to-end cloud data solutions for the automobile and shipping industries, managing collision, repair, maintenance, invoicing, and logistics pipelines, transforming raw XML, API, and IoT data into structured formats for analytics and operational insights.

  • Ingested third-party API & event data into Azure Blob Storage and Google Cloud Storage using Event Hub listeners, Pub/Sub, & PySpark on Databricks and Dataproc, building robust ETL pipelines integrating collision repair software, insurance records, IoT sensors, & shipping logistics data.

  • Architected serverless, real-time microservices using Azure Functions, Event Grid, Logic Apps, and Google Cloud Functions to orchestrate collision repair operations and vehicle logistics workflows, enabling event-driven processing, automated notifications, and seamless data exchange with partners and customers.

Own corporation - Client - Suncor Inc
Senior Big Data engineer - Lead role
2021 - 2022 (1 year)
Remote
  • Developed end-to-end data engineering lifecycle for Suncor, designing and implementing complex data platforms across cloud systems, handling streaming, structured, and unstructured data for exploration, drilling, production, and asset management.

  • Extracted real-time oil pricing data from third-party APIs including WTI, Brent, and Crude Monitor, and integrated with operational, sensor, and financial datasets; ingested and processed data using Azure Data Factory, Event Hubs, Service Bus, Databricks (PySpark/Spark SQL), U-SQL, Synapse, Cosmos DB, and ADLS Gen2.

  • Built and deployed containerized ML and statistical predictive models using Docker, Azure Kubernetes Service (AKS), Databricks, PyTorch, and TensorFlow, exposed via REST APIs for predictive analytics on exploration optimization, customer lifetime value, and operational efficiency.

Azure Data Factory, Event Hubs, Service Bus, Databricks, PySpark, Spark SQL, U-SQL, Synapse, Cosmos DB, Docker, Azure Kubernetes Service (AKS), PyTorch, TensorFlow, REST APIs, Terraform, GitHub Actions, Azure DevOps, Azure Monitor, Splunk, PagerDuty, Microsoft Purview
Own corporation - Client - Uplight via Tek Systems
Senior Data ML engineer - Lead role
2021 - 2021
Remote
  • Worked with Engineering Pillar to build optimized cloud platforms for renewable energy using AWS S3, Kinesis, Lambda, Glue, EventBridge, EC2, SNS/SQS, Redshift, RDS, Aurora, PostgreSQL, and GCP components including BigQuery, Dataproc, Firestore, migrating systems from AWS to GCP with Terraform, CloudFormation, and CircleCI.

  • Developed batch & real-time data pipelines using Databricks, Python, Scala, SQL, Kafka, RabbitMQ & Airflow, ingesting sensor, energy consumption, geospatial & market data. Leveraged Lambda and Glue for event-driven ETL processing, automated transformations & orchestration, while EventBridge managed cross-service event routing for real-time energy monitoring.

  • Indexed high-volume telemetry, pipeline logs, and operational events into Elasticsearch / OpenSearch to enable fast search, anomaly investigation, operational dashboards, and root-cause analysis across renewable energy platforms.

AWS, S3, Kinesis, Lambda, Glue, EventBridge, EC2, Redshift, RDS, Aurora, PostgreSQL, GCP, BigQuery, Dataproc, Firestore, Terraform, CloudFormation, CircleCI, Databricks, Python, Scala, SQL, Kafka, RabbitMQ, Airflow, Elasticsearch, OpenSearch
Microsoft via HCL employer
Lead Data engineer & Consultant
2020 - 2021 (1 year)
Remote
  • Led the design and implementation of end-to-end Azure data pipelines to ingest telemetry data from Microsoft Graph, LinkedIn, and other APIs, transforming, aggregating, and storing it in Cosmos DB, Data Lake Gen2, and Synapse SQL to generate actionable insights for improving Microsoft market share.

  • Designed architecture and created proof-of-concept solutions, implementing CI/CD pipelines with Azure DevOps, Bicep, and Azure Git for automated deployment, testing, and code review.

  • Developed REST APIs on SQL Server using Azure Functions, containerized with Docker, and orchestrated via Kubernetes for scalable and reliable data exchange.

Coast Capital Savings
Sr Data ML engineer Analyst
2020 - 2020
Remote
  • Assisted in building an enterprise data warehouse: created pipeline flows from APIs, Salesforce, SAP ERP, and Oracle DB into AWS S3, processed big data with Databricks (PySpark), orchestrated ETL jobs with Airflow, loaded data into Redshift, and deployed pipelines using DevOps.

  • Packaged the microservices, data processing pipelines, and their dependencies into Docker containers and set up Kubernetes to manage the containers for scalability, reliability, and compliance.

  • Built and deployed a custom ML project on AWS SageMaker that incorporated end-to-end machine learning workflows, from feature engineering to hyperparameter tuning, training, and deploying models for banking-related problems.

Metrie
Data/ML Engineer
2019 - 2020 (1 year)
Remote
  • Built and maintained enterprise pricing, supply chain, and finance datasets on SAP HANA, streaming data through Kinesis into AWS S3, processing with Databricks (PySpark), and modeling analytical schemas and OLAP layers in Amazon Redshift to support large-scale manufacturing and distribution analytics.

  • Integrated internal ERP, production, logistics, distributor, and customer data with external market research, competitor pricing, regional demand, and B2B sales signals to deliver unified Customer 360 and product-level insights across manufacturing and sales operations.

  • Developed real-time and batch pricing pipelines using Databricks, Kafka, and AWS compute, operationalizing models via containerized services on Kubernetes and exposing predictions through REST APIs for pricing tools, sales applications, and internal decision-support systems.

Freelance
Data Scientist
2017 - 2019 (2 years)
Remote

Ad hoc projects: housing price predictive models, patient data analyses using electronic health records, medical imaging, A/B testing, independent school database creation using PostgreSQL & Python, and sensor data (IoT) anomaly detection.

Eli Lilly and Company
Data science Engineer - Lead role
2016 - 2017 (1 year)
Remote
  • Delivered big data engineering, real-time analytics, & ML solutions for healthcare datasets across operational, clinical, financial, marketing, pharmacy benefit management (PBM), & other business functions using SAS, SQL, Python, & Spark.

  • Developed event-driven applications using AWS Lambda and SNS/SQS to handle real-time data from IoT-enabled medical devices, ensuring immediate processing and alerting.

  • Designed and deployed interactive Power BI dashboards for internal stakeholders, researchers, and business teams to visualize patient trends, PBM metrics, supply chain forecasts, and operational KPIs, enabling real-time monitoring, insights-driven decisions, and efficient reporting across healthcare and pharmaceutical operations.

SAS, SQL, Python, Spark, AWS Lambda, Amazon Kinesis, RDS, DynamoDB, HIPAA, PHI, MongoDB, IQVIA, Glue, Power BI
Blue Ocean
Data Science Engineer
2015 - 2015
Remote
  • Ingested data from various source systems to the client server using SSMS, SQL, Spark, & Jupyter notebooks as a prerequisite for developing ML models.

  • Conducted data exploration, pre-processing, and analysis for resource allotment, in-store product placement, brand performance, promotion impact, consumer behavior, ROI, and inventory-level forecasting.

  • Developed Bayesian and market-mix models, analyzed the impact of celebrity endorsements, and presented actionable insights to stakeholders through storyboarded presentations for strategic business decisions.

ANZ Bank
Data scientist
2015 - 2015
Remote
  • Carried out ETL from source systems (Salesforce CRM) to Azure data storage services using SSIS, T-SQL, Spark, & Jupyter notebooks to prepare data for ML models.

  • Developed live predictive systems using deep learning and PySpark to combat online payment fraud, saving approximately $200k in costs and helping mitigate risk.

  • Analyzed data on about 100,000 applicants using an artificial neural network to create a credit risk model that estimates each borrower's probability of default.

TE Connectivity
Systems BI & ML engineer - Lead role
2013 - 2014 (1 year)
Remote
  • Designed and implemented ETL solutions using data warehouse design best practices for the Next Generation Analytics platform; deployed ETL job workflows with reliable exception handling and rollback.

  • Formulated next-generation analytics & machine learning frameworks using SAS and Python, providing a centralized platform for all data-centric activities with a full view of key metrics, from product usage to back-office transactions.

  • Developed financial dashboards processing over 1 billion transactions across 25 tables in Tableau and Power BI, optimizing performance through data cleanup, aggregation improvements, and dimensional remodeling.

IQVIA
Data science engineer
2011 - 2013 (2 years)
Remote
  • Spearheaded the delivery of big data engineering solutions and real-time analytics specifically targeting pharmacy benefit management and other healthcare datasets across operational, clinical, financial, marketing, and other business functions, leveraging SAS, SQL, and Python.

  • Collected and processed social media data using APIs and Python web scraping tools, applied NLP techniques to extract sentiment and thematic insights from social media conversations, and developed visualization reports to share insights with marketing & product teams.

  • Led HR analytics initiatives for pharmaceutical clients by designing and evolving HR data architecture to support workforce analytics, integrating data across HRIS/HCM, payroll, talent management, and compliance systems.

Symphony Marketing Solutions-Genpact
Data engineer & Analyst
2008 - 2011 (3 years)
Remote
  • Designed and maintained end-to-end data pipelines for POS, inventory, and healthcare datasets, securely transferring data from Unix servers to mainframe systems via encrypted FTP/SFTP into DB2 and COBOL workflows.

  • Built inventory and operations data warehouses, integrating POS, inventory, and patient-level healthcare data across regions, automating hourly updates, and enabling unified views of sales, inventory, customer behavior, prescription claims, patient records, and clinical trials while ensuring HIPAA/PHI compliance.

  • Collaborated with retail partners & data operations teams to address data discrepancies, ensuring alignment of POS & inventory data feeding into statistical models that improved ROI, customer targeting, and operational efficiency.

Education

Bachelor of Science: Biosciences (Botany Major)
Sathya Sai University
2004
Master of Science: Bioinformatics
Orissa University of Agriculture & Technology
2007