Shrey S.

About Me

Shrey is an AI Engineer and Machine Learning Specialist with 5+ years of experience developing production-grade ML solutions, LLM applications, and generative AI systems. He is proficient in Python, Pandas, NumPy, and SQL for building scalable data pipelines and microservices, and has hands-on experience with LangChain, LangGraph, and OpenAI-compatible APIs for deploying intelligent agents and RAG systems. He has a strong background in AWS Lambda, S3, API Gateway, and SageMaker for cloud-native and serverless deployments, and a track record in computer vision, conversational AI, voice-based systems, and MLOps, including structured logging and observability for production AI workloads.

AI, ML & LLM

PyTorch, FAISS, OpenAI API, MLOps, Llama 3, Gemini, CrewAI, LlamaIndex, LangGraph, LangChain

Backend

Database

DevOps

CloudWatch, API Gateway, AWS Lambda, EC2, Docker

Workflow

Git, GitHub Actions

Other

SageMaker, S3, Transformers, Diffusion Models, Computer Vision, scikit-learn, TensorFlow, Vector Search, Pinecone, RESTful APIs, CUDA, OpenCV, WebSocket, Microservices, Metrics, Structured Logging, NumPy, Pandas, Packaging, OOP, Prompt Engineering, Retrieval-Augmented Generation (RAG), Hugging Face Transformers, Mistral, pgvector

Work history

RAPID DATA LABS
Senior AI Engineer
2025 - 2026 (1 year)
Remote
  • Led development of an AI-driven fashion platform built on generative AI, LLM pipelines, and computer vision, achieving a 40% increase in user engagement.

  • Architected scalable Python-based ML pipelines using diffusion models for high-fidelity fashion visual generation, deployed on AWS infrastructure.

  • Designed and maintained RESTful microservices in FastAPI, integrated with AWS Lambda and S3, to handle data flow between systems.

AI, LLM, Python, Diffusion Models, FastAPI, AWS Lambda, S3
Neuramonks
AI Engineer
2021 - 2024 (3 years)
Remote
  • Engineered a PyTorch-based voice cloning system with 15 voice agents; fine-tuned Hugging Face models on 80 minutes of audio per voice, achieving 92% similarity and processing 1,000+ daily requests with <2s latency using multi-threaded audio generation pipelines.

  • Built a real-time CCTV surveillance system with YOLOv8 and multi-camera tracking using Python multiprocessing pipelines, reducing incident response time by 60%.

  • Developed YOLO-based malaria detection and blood cell classification system achieving 95% accuracy, processing 500+ daily images with automated Pandas-based data transformation pipelines.

PyTorch, Hugging Face, Python, Pandas, LLMs, FastAPI, WebSocket, Rasa, NER, AWS EC2, SageMaker