Shih-Hsuan L.

About Me

Shih-Hsuan is an entrepreneur, data scientist, and top competitor in machine learning competitions. He specializes in analyzing data pipelines and modeling business problems to deliver data projects with business impact. He built a real-time analytics system that monitored a national product rollout and provided decision support. Shih-Hsuan excels at sales forecasting, niche image classification, short text classification, and conditional text generation, along with AI, ML, and statistics.

AI, ML & LLM

Database

Other

Data Analysis, Data Analytics, Statistical Data Analysis, Python, Pandas, Data Visualization, Natural Language Processing (NLP), Big Data, Recommendation Systems, R, Data Pipelines

Work history

Veritable Technology, Co.
Data Scientist and Founder
2018 - Present (7 years)
Remote
  • Placed seventh in the Third YouTube-8M Video Understanding Challenge and published a paper in its ICCV 2019 workshop.

  • Assisted clients who required expertise in data science, machine learning, and artificial intelligence.

  • Created open-source research projects, indie data products, and public technical notes and tutorials to help democratize AI.

Baiwang
Chief Data Scientist
2017 - 2018 (1 year)
Remote
  • Built data pipelines to merge data from different sources across the company into a data warehouse.

  • Developed an automatic NLP merchandise classification system, including setting up an annotation procedure, data quality control, and experiment processes.

  • Built a real-time analytics system that monitored a national product rollout and provided decision support.

Yongdata
Senior Data Scientist
2015 - 2016 (1 year)
Remote
  • Developed a customer churn prediction system for a mobile phone company.

  • Developed a sales and inventory monitoring and forecasting system for a smart vending machine company.

  • Implemented anomaly detection algorithms in the company's analytics SaaS product.

Soshio
Software Engineer
2013 - 2015 (2 years)
Remote
  • Maintained the back end of the company's NLP public opinion analysis product.

  • Developed data visualizations for the customer-facing dashboard.

  • Maintained the scraping system and merged it with the firehoses from commercial data providers.

Showcase

Seventh Place Solution to the Third YouTube-8M Video Understanding Challenge
  • The challenge asked participants to localize video-level labels to the precise time in the video where the label appears.

  • Video-level models were pre-trained on the YouTube-8M frame-level features dataset to create meaningful video representations.

  • The solution utilizes two segment classifiers: context-aware and context-agnostic.

Paraphrasing English Sentences
  • This project focuses on building models for automatic paraphrasing of English sentences.

  • It uses a pretrained T5 transformer model and fine-tunes it on public paraphrase datasets.

  • The resulting models can generate paraphrases that are both semantically and grammatically correct.

Self-Supervised Domain Adaptation
  • This project proposes using self-supervised learning to improve label efficiency in downstream tasks.

  • The approach leverages a pre-trained ImageNet model and applies self-supervised learning to an unlabeled dataset from a different domain.

  • Preliminary experiments demonstrate improvements with longer pre-training and larger unlabeled datasets.

Forecasting Challenges
  • Corporación Favorita Grocery Sales Forecasting: placed 20th out of 1,671 teams.

  • Recruit Restaurant Visitor Forecasting: placed 21st out of 1,248 teams.

  • Web Traffic Time Series Forecasting: placed 43rd out of 1,095 teams.

Education

Master's Degree in Applied Statistics
Australian National University
2014 - 2015 (1 year)

Bachelor of Science Degree in Computer Science and Information Engineering
National Taiwan University
2004 - 2009 (5 years)