Churn Prediction for a Book Publisher
The goal was to predict which classes were about to switch from the publisher's books to online material. After feature engineering with a genetic algorithm and clustering, the best predictions came from a random forest model.
Stock Suggestions | Distributed System with PySpark
This project explores the relationship between a company's financials and its stock market performance, and tries to identify cheap buying opportunities for various risk profiles. The dataset was large, so it was stored in a distributed file system, and we used PySpark for the transformations.
Word Recommendation System for Movie and Series Reviews
This is a natural language processing project in which we used methods such as part-of-speech tagging, named-entity recognition, readability scoring, sentiment scoring, and topic modeling to build features for a regression model trained on good and bad reviews scraped from websites covering different topics. The recommendations were based on how each feature affected the score and what could be changed to improve it.
Machine Learning Model to Suggest Better Pictures for Social Media
I created a database by scraping the web for pictures, then trained a machine learning model on several characteristics of social media images and their number of likes to suggest which picture would work better. I also deployed the model as a Flask API.
Generating Insights in Stock Market Data
I created data pipelines that merged data from several sources, including multiple data APIs and a PostgreSQL database. I also performed exploratory data analysis and modeling on the resulting data to derive new insights, running JupyterLab on an AWS EC2 instance.
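As a rough illustration of the merging step, the sketch below left-joins rows from an API with rows from a database on a shared key. The field names (`ticker`, `date`, `close`, `eps`) and the function `merge_sources` are hypothetical and chosen only for this example; the actual pipeline's schema and tooling are not described here.

```python
from typing import Dict, List


def merge_sources(api_rows: List[Dict], db_rows: List[Dict]) -> List[Dict]:
    """Left-join API rows with database rows on (ticker, date).

    Hypothetical schema: both sources carry "ticker" and "date" keys.
    """
    # Index the database rows by the join key for O(1) lookups.
    db_index = {(r["ticker"], r["date"]): r for r in db_rows}
    merged = []
    for row in api_rows:
        match = db_index.get((row["ticker"], row["date"]), {})
        # On a field-name clash, the API value wins because it is unpacked last.
        merged.append({**match, **row})
    return merged


# Made-up sample rows, for illustration only.
api_rows = [
    {"ticker": "AAA", "date": "2024-01-02", "close": 10.5},
    {"ticker": "BBB", "date": "2024-01-02", "close": 20.0},
]
db_rows = [{"ticker": "AAA", "date": "2024-01-02", "eps": 1.2}]
merged = merge_sources(api_rows, db_rows)
```

Rows without a match in the database are kept with only their API fields, which makes gaps in the second source easy to spot during exploratory analysis.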
Live Tweet Sentiment Tracking
The project involved ingesting live tweet data from the Twitter API into Kafka topics. We then used Spark Streaming as a subscriber to perform sentiment analysis and feature engineering. The results were aggregated and passed on to a Shiny dashboard, and the data was then stored in a MongoDB database.
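The aggregation step can be pictured with a small pure-Python stand-in: score each tweet, then average the scores per fixed time window. This is only a sketch under stated assumptions. The word lists, the function names, and the 60-second window are made up for illustration, and it replaces neither Kafka nor Spark Streaming nor the sentiment model actually used.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Toy lexicon, assumed for illustration; a real model would replace this.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def sentiment_score(text: str) -> int:
    """Count positive words minus negative words (no punctuation handling)."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def aggregate_by_window(
    tweets: Iterable[Tuple[int, str]], window_seconds: int = 60
) -> Dict[int, float]:
    """Average sentiment per tumbling window, keyed by window start time."""
    totals = defaultdict(lambda: [0, 0])  # window start -> [score sum, count]
    for timestamp, text in tweets:
        window_start = timestamp - timestamp % window_seconds
        totals[window_start][0] += sentiment_score(text)
        totals[window_start][1] += 1
    return {start: s / n for start, (s, n) in sorted(totals.items())}


# (timestamp in seconds, tweet text); made-up sample data.
sample = [
    (0, "I love this great phone"),
    (30, "awful battery"),
    (65, "good camera"),
]
windows = aggregate_by_window(sample)
```

Each window's average is the kind of compact record a dashboard can plot directly, while the raw scored tweets can be written to the document store separately.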
Cancer Prediction Using VOC Data
This project builds on research suggesting that volatile organic compounds (VOCs) released by humans have predictive power for cancer. We used a VOC database with labeled cancer data. The model is deployed as a Flask API that predicts the kind of cancer from the VOC content.
Time Series Forecasting
Built an ensemble model that combined outputs from deep learning time series models such as N-HiTS and N-BEATS with a traditional linear regression; the ensemble outperformed all the existing forecasts. Also built a Twitter scraper to collect tweets about the products and their associated sentiments.
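One common way to combine forecasts like this is stacking: a linear regression is fitted on the base models' forecasts so that its weighted sum tracks the actual values. The sketch below assumes that setup, which may differ from the ensemble actually built. The numbers are made up, the two forecast columns merely stand in for N-HiTS and N-BEATS outputs, and `fit_stacker` and `predict` are hypothetical helper names.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_stacker(
    base_forecasts: Sequence[Sequence[float]], actuals: Sequence[float]
) -> List[float]:
    """Least-squares fit of actual ~ intercept + sum(w_i * forecast_i)."""
    rows = [[1.0] + list(f) for f in base_forecasts]
    k = len(rows[0])
    # Normal equations: (X^T X) coefs = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, actuals)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs: Sequence[float], forecasts: Sequence[float]) -> float:
    """Apply the fitted intercept and weights to one row of base forecasts."""
    return coefs[0] + sum(w * f for w, f in zip(coefs[1:], forecasts))


# Made-up validation data: each row holds two base-model forecasts.
base = [(10.0, 14.0), (20.0, 22.0), (30.0, 26.0), (40.0, 44.0)]
actual = [12.0, 21.0, 28.0, 42.0]
coefs = fit_stacker(base, actual)
combined = predict(coefs, (50.0, 54.0))
```

In practice the weights would be fitted on a held-out validation period, so that the combiner is not rewarded for base models that merely overfit the training data.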
End-to-end NLP Model Deployment
Trained BERT-based models for the given use case. Built APIs so that external modules could interact with them. Dockerized the whole application. Connected it with AWS and GCP services such as Lambda and container registry to achieve autoscaling of the API.