IBM
IBM US leases several facilities across the country to run its operations. The objective of this project was to improve facility utilization and reduce facility operations and lease costs while satisfying many business constraints. I developed an integer programming algorithm in Python to solve this problem. The business constraints made the problem interesting and unique. I parameterized the optimization period (how far into the future to look) so that the algorithm could provide multiple solutions. The client especially appreciated this feature.
Technologies: Python, Plotly, Linear Programming, PuLP
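The idea of parameterizing the optimization period can be illustrated with a toy binary program. This is only a minimal sketch under stated assumptions: the facility names, costs, seat counts, and demand forecast are hypothetical, and plain enumeration of the 0/1 decisions stands in for the PuLP solver so that the example has no dependencies.

```python
from itertools import product

# Hypothetical data: monthly lease cost and seat capacity per facility.
facilities = {
    "A": {"cost": 50, "seats": 100},
    "B": {"cost": 30, "seats": 60},
    "C": {"cost": 45, "seats": 80},
}
# Hypothetical forecast of seats needed in each future month.
demand_by_month = [120, 130, 150, 170, 200]


def cheapest_plan(facilities, demand_by_month, horizon_months):
    """Choose which facilities to keep (binary decisions) at minimum lease cost.

    The plan must cover the peak demand inside the look-ahead horizon.
    Every 0/1 assignment is enumerated; a real model would hand the same
    objective and constraint to an integer programming solver.
    """
    names = sorted(facilities)
    seats_needed = max(demand_by_month[:horizon_months])
    best_cost, best_open = None, None
    for keep in product([0, 1], repeat=len(names)):
        seats = sum(k * facilities[n]["seats"] for k, n in zip(keep, names))
        if seats < seats_needed:
            continue  # capacity constraint violated
        monthly = sum(k * facilities[n]["cost"] for k, n in zip(keep, names))
        cost = horizon_months * monthly
        if best_cost is None or cost < best_cost:
            best_cost = cost
            best_open = [n for k, n in zip(keep, names) if k]
    return best_cost, best_open


# One solution per optimization period.
for horizon in (2, 5):
    print(horizon, cheapest_plan(facilities, demand_by_month, horizon))
```

Running the same model for several horizons shows how a short look-ahead can justify closing a facility that a longer look-ahead would keep open.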
Newristics
Newristics is a US-based global leader in applying decision-heuristic science to marketing. Using heuristic psychology (500+ different heuristics), it rewrites each marketing message.
I automated the message-scoring process, in which a team compares each new message against the old one and rates how closely it depicts the heuristic.
The text data is preprocessed with text cleaning and normalization, and unigrams and bigrams are generated from the normalized text. I built two main models to solve this problem: XGBoost and a deep neural network with seq-to-seq learning.
For XGBoost, I created around 900 features (divided into three sections).
• NLP basic features: count/ratio of words/characters of the message, TF-IDF of unigrams/bigrams, gensim TF-IDF similarity, and so on
• Word embedding: similarity of weighted-average embedding vectors (TF-IDF as weight) from self-trained and pre-trained Word2vec/GloVe models, etc.
• Graph: degree of nodes, the intersection of neighbors, k-core/k-clique, degree of separation, etc.
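A few of the basic NLP features from the first group can be sketched in plain Python. This is a hedged illustration, not the project's actual feature code: the function names are hypothetical, whitespace tokenization is a simplification, and Jaccard overlap is used here as a simple stand-in for the TF-IDF similarity features.

```python
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def basic_features(old_msg, new_msg):
    """Count/ratio and n-gram overlap features for an old/new message pair."""
    old_tok = old_msg.lower().split()
    new_tok = new_msg.lower().split()
    feats = {
        # max(..., 1) avoids division by zero for an empty old message.
        "word_count_ratio": len(new_tok) / max(len(old_tok), 1),
        "char_count_ratio": len(new_msg) / max(len(old_msg), 1),
    }
    for n, name in ((1, "unigram"), (2, "bigram")):
        a, b = set(ngrams(old_tok, n)), set(ngrams(new_tok, n))
        union = a | b
        # Jaccard overlap: shared n-grams divided by all distinct n-grams.
        feats[name + "_jaccard"] = len(a & b) / len(union) if union else 0.0
    return feats


features = basic_features("save time today", "save time and money today")
print(features)
```

Each message pair becomes one row of such features, which a gradient-boosted model like XGBoost can consume directly.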
I used the deep learning seq-to-seq model to enhance the sequence inference neural network architecture.
Technologies: Python, LSTM, gensim, GloVe, SpaCy, NLTK, Scikit-learn, TensorFlow, Keras, Jupyter Notebook, Git, Google Cloud Platform
AbbVie, Inc.
AbbVie, Inc. is a leading pharmaceutical company that introduced a drug whose market share slipped from 65% to 49%. The company conducted a physician survey on three themes to support strategic planning. We interviewed 119 physicians about the HCV regimen attributes that drive the market, 55 physicians about patient treatment, and 60 physicians about sales rep interactions and their impressions of the messaging. I worked closely with C-level executives and the product management team to analyze the survey and produce data and reports. This helped the product and executive teams make more informed decisions, increasing market share by identifying new opportunities and target segments and by devising new ways of resolving constraints.
Technologies: Python, R, Plotly, Matplotlib, Regression, Clustering, Association Rules
Classify H&E Stained Histological Breast Cancer Images
I participated in a hackathon to classify H&E stained histological breast cancer images. We were given a small set of training data (a few hundred images). To increase the robustness of the classifier, I used strong data augmentation and extracted deep convolutional features at different scales with CNNs pre-trained on ImageNet. On this feature set, I applied a highly accurate gradient boosting algorithm. I avoided training neural networks directly on this small amount of data to prevent poor generalization.
Technologies: Python 3, Keras, NumPy, Pandas, SciPy, Scikit-learn
SKU-Level Demand Forecast for a Brewery Company
Problem: The company has a large portfolio of products distributed to retailers through wholesalers (agencies). There are thousands of unique wholesaler-SKU/product combinations. To plan its production and distribution and to help wholesalers with their planning, the company needs an accurate demand estimate at the SKU level (34 SKUs) for each wholesaler (60 wholesalers).
Data: Four years of data for 60 agencies and 34 SKUs were used for prediction.
• Price sales promotion (dollars/hectoliter): The price, sales, and promotion in dollar value per hectoliter at an agency-SKU-month level
• Historical volume (hectoliters): Sales data at an agency-SKU-month level
• Weather (degrees Celsius): The average maximum temperature at an agency-month level
• Industry soda sales (hectoliters): Industry-level soda sales
• Event calendar: Event details (sports, carnivals, and so on)
• Industry volume (hectoliters): Industry actual beer volume
• Demographics: Demographic details (yearly income in dollars)
I used deep neural network sequence-to-sequence learning for demand prediction.
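Sequence-to-sequence training on monthly data typically starts by cutting each agency-SKU series into pairs of past and future windows. The sketch below shows that step only, under stated assumptions: the function name, the window lengths, and the sample volumes are hypothetical, and the network itself is omitted.

```python
def make_windows(series, n_in, n_out):
    """Split one monthly series into (past, future) training pairs.

    Each pair holds n_in consecutive months as the encoder input and the
    following n_out months as the decoder target.
    """
    pairs = []
    for start in range(len(series) - n_in - n_out + 1):
        past = series[start:start + n_in]
        future = series[start + n_in:start + n_in + n_out]
        pairs.append((past, future))
    return pairs


# Hypothetical monthly volumes (hectoliters) for one agency-SKU pair.
volume = [10, 12, 11, 15, 14, 18]
for past, future in make_windows(volume, n_in=3, n_out=2):
    print(past, "->", future)
```

In practice, the other inputs (price, promotion, temperature, events) would be aligned by month and stacked alongside the volume in each input window.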
Satellite Imagery Feature Detection Using Deep Learning
I developed a model for satellite imagery feature detection using deep learning. The 1 km x 1 km satellite images come in both 3-band and 16-band formats. The multi-band imagery covers the multispectral (400-1040 nm) and short-wave infrared (SWIR) (1195-2365 nm) ranges.