SmartRecruit: Autonomous Resume Parsing and Grading
I developed an end-to-end automated hiring platform covering everything from candidate profiling to onboarding, significantly reducing recruitment lead time. The back end was built with Flask and Docker and deployed on AWS ECR and Fargate, with a load balancer for availability and MongoDB as the store for NoSQL and unstructured data. I built the full system pipeline: document parsing, information extraction, data normalization, candidate grading, and production deployment. I designed a reliable PDF parser on top of MuPDF that extracts key characteristics from PDF documents, including text, fonts, indentation, objects, images, and lines, and a heuristics-based probability model that uses those features to segment headings and populate the data under each heading. I trained a transformer-based spaCy NER model (BERT backbone) on a labeled corpus of education and experience entries to extract key entities such as organization, dates, roles, accreditations, and grades. Finally, I built fast textual lookup libraries on the FlashText algorithm that extract 50,000+ skills and categories, normalize 3,000+ institutions, and resolve location data.
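The heading-segmentation step can be sketched as a small feature-weighted scoring model over the characteristics the PDF parser emits. The feature names, weights, and threshold below are illustrative assumptions, not the production values:

```python
import math
from dataclasses import dataclass

@dataclass
class LineFeatures:
    """Features a MuPDF-based parser might emit for one text line (illustrative)."""
    font_size: float       # point size of this line
    body_font_size: float  # dominant point size in the document
    is_bold: bool
    indent: float          # left indent in points
    n_words: int

def heading_probability(f: LineFeatures) -> float:
    """Heuristic logistic score: larger, bolder, shorter, un-indented lines
    look more like headings. Weights are hypothetical."""
    score = 0.0
    score += 2.5 * (f.font_size / f.body_font_size - 1.0)  # relative font size
    score += 1.2 if f.is_bold else 0.0
    score += 0.8 if f.n_words <= 4 else -0.5               # headings are short
    score -= 0.02 * f.indent                               # headings sit at the margin
    return 1.0 / (1.0 + math.exp(-score))                  # squash to [0, 1]

def segment(lines: list[LineFeatures], threshold: float = 0.6) -> list[bool]:
    """Mark each line as heading (True) or body (False)."""
    return [heading_probability(f) >= threshold for f in lines]
```

Once lines are classified, body lines are attributed to the most recent heading, which yields the per-section data population described above.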
ID Card Identification and Data Extraction
I designed and developed an information extraction system that pulls personal data from insurance cards: name, date of birth, address, insurance plan, codes, copay, and phone numbers. The system automates patient registration workflows in hospitals and practices, cutting the time and errors of manual data entry. The solution was deployed as an Azure Function in the cloud; an Android application built by the Avalon Team called the back-end function to perform inference. As the sole orchestrator, designer, and implementer of the pipeline, I identified the challenges, worked out solutions, and turned the system into a production-grade product. I used OpenCV to locate rectangular card regions in raw images, with augmentation and vision routines to correct skew, reduce color variance, and make text more prominent. I also trained an end-to-end optical character recognition (OCR) pipeline with PaddleOCR, reaching 93% accuracy in text detection and recognition versus the 55% achieved by Tesseract.
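After OCR, pulling structured fields out of the recognized text can be sketched with simple patterns. The field names and regexes below are illustrative assumptions; the deployed system used more robust, layout-aware extraction:

```python
import re

# Illustrative patterns for fields commonly printed on insurance cards;
# the production rules were more extensive than these examples.
FIELD_PATTERNS = {
    "member_id": re.compile(r"\bMember\s*ID\s*:?\s*([A-Z0-9]{6,12})\b", re.I),
    "dob":       re.compile(r"\b(?:DOB|Date of Birth)\s*:?\s*(\d{2}/\d{2}/\d{4})\b", re.I),
    "copay":     re.compile(r"\bCopay\s*:?\s*\$?(\d+)\b", re.I),
    "phone":     re.compile(r"\b(\d{3}[-.]\d{3}[-.]\d{4})\b"),
}

def extract_fields(ocr_text: str) -> dict[str, str]:
    """Pull structured fields out of raw OCR text, one match per field."""
    out: dict[str, str] = {}
    for name, pattern in FIELD_PATTERNS.items():
        m = pattern.search(ocr_text)
        if m:
            out[name] = m.group(1)
    return out
```

For example, `extract_fields("Member ID: AB12345X DOB: 01/02/1990 Copay: $25")` returns the member ID, date of birth, and copay as a dictionary keyed by field name.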
Calls Transcription and Hold Detection Bot
I developed an AI-assisted dialer bot that handles operator queries and transcription and recognizes hold and pick-up events, significantly reducing call initiation and waiting times. I benchmarked open-source automatic speech recognition (ASR) frameworks, including Vosk, SPINE, OpenVINO, and NeMo, on Word Error Rate (WER) over internal insurance voice calls. I then designed and implemented a resilient message-based queuing protocol on RabbitMQ that lets the system initiate calls, transcribe them, detect hold durations, cancel calls, and handle interactive voice response (IVR) menus for up to 40 concurrent calls.
Blackwise Search Engine
Created a platform that lets black-owned businesses register, validate ownership, and list themselves in a growing search engine. The work included building admin, business, and user portals; orchestrating role management and permission protocols; and constructing the full application stack with React and Python FastAPI on MariaDB, MongoDB, and Elasticsearch storage back ends. I also built a comprehensive scraping pipeline that gathers data from sources such as Instagram, using a mix of JavaScript and Python libraries.
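The role-and-permission layer across the three portals can be sketched as a role-to-permission map with a guard function. The role and permission names below are illustrative assumptions, not the platform's actual definitions:

```python
# Hypothetical role → permission map for the admin, business, and user portals.
PERMISSIONS = {
    "admin":    {"manage_users", "verify_ownership", "edit_listing", "search"},
    "business": {"edit_listing", "search"},
    "user":     {"search"},
}

def has_permission(role: str, permission: str) -> bool:
    """Check whether a role grants a given permission."""
    return permission in PERMISSIONS.get(role, set())

def require(role: str, permission: str) -> None:
    """Raise before a protected handler runs if the role lacks the permission."""
    if not has_permission(role, permission):
        raise PermissionError(f"role {role!r} lacks {permission!r}")
```

In a FastAPI stack, a guard like `require` would typically be wired in as a dependency so that each portal route declares the permission it needs.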