Dipanshu is a versatile Python developer with more than six years of experience building core web back ends as well as Machine Learning and NLP systems. He led the creation of a full-blown PDF processing library that combines NLP, Machine Learning, cloud computing, parallel processing, and process automation, among other techniques. Dipanshu enjoys creating technology solutions that increase productivity and overall business value.
Working on multiple software solutions that address document-based information overload in the financial services industry.
Leading new projects involving NLP, Machine Learning, data mining of unstructured data from PDF documents, intelligent goal-oriented and config-based scraping, asynchronous APIs, microservices, GNU/Linux (RHEL, Ubuntu), and on-premises deployments.
Enhancing the functionality of existing software systems and building predictive models for ML-based features.
Bank Statements Hub extracts transaction information from any bank's statement PDF without relying on stored format or template information. It then analyzes the extracted information to generate a financial health model for use by credit rating agencies. Gathered user requirements and built internal Python libraries to augment the open-source packages available for mining data from PDFs. Developed a full-blown PDF processing library that is more robust and easier to extend than open-source options such as Tabula. Participated in integrating the product with a client's existing systems through an on-premises deployment.
KEngine is a self-improving application that extracts key-value pairs from any PDF document (scanned or digital), based on initial training that a user provides for that document type. Developed large parts of the back end, the self-learning mechanism, and the user-led training mechanism. Technologies used: Python, PDF processing libraries, HTML+CSS+JS, OCR tools, spaCy, NLTK.
News Analytics sources financial news from the web, tags the companies mentioned in each article, classifies the article into a news category, and then predicts whether a given user would want to see that news item. Technologies used: Python, NLTK, spaCy, TensorFlow, Selenium, BS4, Flask.