I am a Senior Data Scientist with 7+ years of experience and a top competitor in leading machine learning competitions. I deploy cutting-edge Machine Learning and Deep Learning models and solutions on complex projects through data-driven initiatives. I combine proprietary data assets with statistical modelling and software development skills to validate ML approaches, optimize data systems, and deliver solutions from the ground up across their entire lifecycle.
Participated with Team Ceshine in the 3rd YouTube-8M Video Understanding Challenge, hosted by Google Research, which required participants to localize video-level labels to the precise times in each video where they appear, at an unprecedented scale. Our solution combined context-aware and context-agnostic segment classifiers and placed 7th on the leaderboard on a low budget ($150 of GCP credit plus one local GTX 1070 card for the whole competition).
Worked on Veritable, an AI-assisted news-aggregation website drawing on a variety of international sources. Its semantic similarity engine helps editors review content faster and helps readers find relevant news easily. Veritable uses the ability of machines to process large amounts of information to assist human subjective judgment: from the daily articles of various credible media outlets, 20+ relevant news items are manually selected and organized into simple web pages, helping readers quickly grasp the day's news while limiting the algorithm's control over the final selection.
Worked on an AI-powered article reader that extracts the text content from web pages, splits the article into sentences, identifies named entities in each sentence, and highlights (in yellow) the sentences that convey the article's core ideas. These sentences are selected by extractive summarization using TextRank. The tool lets readers skim an article for its main ideas and the entities it mentions, then read the paragraphs that interest them for more context and detail.
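To illustrate the TextRank step above, here is a minimal Python sketch, not the reader's actual implementation. It assumes a naive regex sentence splitter and a simplified word-overlap similarity (shared words divided by the sum of the log sentence lengths, as in the original TextRank formulation, here computed on word sets) and ranks sentences with weighted PageRank (damping 0.85). The names `split_sentences`, `tokenize`, `similarity`, `textrank`, and `summarize` are hypothetical; web-page extraction and named-entity recognition are omitted.

```python
import math
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> set[str]:
    # Lowercased word set; a real system would also drop stop words.
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def similarity(a: set[str], b: set[str]) -> float:
    # TextRank-style overlap: shared words / (log|a| + log|b|).
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    return len(a & b) / denom if denom > 0 else 0.0


def textrank(sentences: list[str], damping: float = 0.85, iterations: int = 50) -> list[float]:
    n = len(sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in sentences]
    # Symmetric weighted graph: edge weight = sentence similarity, no self-loops.
    weights = [[similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
               for i in range(n)]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    # Weighted PageRank by power iteration; isolated sentences keep only the teleport term.
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(weights[j][i] / out_sums[j] * scores[j]
                            for j in range(n) if out_sums[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(text: str, k: int = 3) -> list[str]:
    # Return the k highest-scoring sentences in their original order (the ones to highlight).
    sentences = split_sentences(text)
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(article_text, k=3)` would return the three sentences to highlight. Because the method is extractive, every output sentence is copied verbatim from the article, which is what makes in-place highlighting possible.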
Work history
UpStack
Senior Data Scientist
2020 - Present (3 years)
Remote
Create and develop innovative software solutions for clients across a broad range of industries.
Participate in scrums with cross-functional software and hardware teams.
Ensure that features are delivered efficiently and on time.
Actively contribute to open-source research projects, independent data products, and public technical notes and tutorials that help democratize AI.
Promote data science and AI/ML wherever applicable in client projects, and perform data cleansing, data quality assurance, and data enrichment.
Participated in the 3rd YouTube-8M Video Understanding Challenge and published a paper at its ICCV 2019 workshop.
Developed and implemented data pipelines that merge data from different sources within the company into a secure data warehouse.
Worked on automatic NLP merchandise classification for Baiwang, setting up the annotation procedure, data quality control measures, and experiment processes for the project.
Handled the iterative development of a real-time analytics system monitoring a national product roll-out, providing decision support and scaling the solution as needed.
Built a data science solution for predicting customer churn at a mobile phone company.
Developed a sales and inventory forecasting and monitoring system for a smart vending machine company, and built data quality solutions to ensure its stability and scalability.
Built and implemented anomaly detection models and algorithms for Yongdata's SaaS product, and maintained high-quality code and documentation.