Dipanshu is a versatile Python developer with 4+ years of experience building core web backends as well as machine learning and natural language processing solutions. He led the creation of a full-blown PDF processing library using technologies such as natural language processing, machine learning, cloud computing, parallel processing, and process automation. Dipanshu enjoys creating technology solutions that increase productivity and overall business value.
Bank Statements Hub extracts transaction information from any bank’s statement PDF without relying on prior format or template information. It analyzes the extracted information to generate a financial health model for use by credit rating agencies.
My role involved understanding user requirements and building internal Python libraries to augment the open-source packages available for PDF-related data mining. I developed a full-blown PDF processing library that is more robust and easier to extend than open-source options like Tabula. I also participated in the integration with a client's existing systems through an on-premise deployment.
News Analytics sources financial news from the web, tags the companies mentioned in each article, classifies the article into a news category, and then predicts whether or not a given user would like to see that news item. Technologies used as the lead developer: Python, NLTK, spaCy, TensorFlow, Selenium, BS4, Flask.
KEngine is a self-improving application that extracts key-value pairs from any PDF document (scanned or digital) based on initial training provided by a user for that document type. I developed large parts of the backend and the self-learning mechanisms, along with the user-led training mechanism.
Technologies used: Python, PDF processing libraries, HTML+CSS+JS, OCR tools, spaCy, NLTK.
2021 - Present (2 years)
Create and develop innovative software solutions for different clients across a broad range of industries.
Participate in scrums consisting of cross-functional teams, both software and hardware.
Ensure that features are delivered efficiently and on time.
Worked on multiple software solutions focused on solving the problem of document-based information overload for the financial services industry.
Led new projects using technologies such as Natural Language Processing, Machine Learning, Data Mining (unstructured data from PDF documents), Scraping (intelligent, goal-oriented, config-based), Asynchronous APIs, Microservices, GNU/Linux (RHEL, Ubuntu), and on-premise deployments.
Enhanced the functionalities of current software systems and created predictive models for ML-based features.