Jaime is a versatile data scientist and developer with 4+ years of experience across all stages of the data science pipeline for a range of organizations. He is highly skilled in R and Python and has direct experience with machine learning, deep learning, NLP, and AWS. His achievements include developing several interactive RShiny dashboards and training speech synthesis models using automatic speech transcription. He is also a regular contributor to data science projects on GitHub.
Create and implement data analysis pipelines, including data access, ingestion, munging / manipulation / cleansing, analysis / modelling, testing, deployment / integration into business applications and services.
Enhance operational aspects of businesses by increasing control of the company's data.
Work in cross-functional teams to provide data-driven solutions that increase efficiency and productivity.
Worked with structured data on the development of a data science pipeline that included data cleaning, feature engineering, and model building/deployment.
Created RShiny dashboards and managed multiple Amazon Web Services (AWS) services, including S3, EC2, Lambda, API Gateway, Cognito, and RDS.
Developed an application that collected status updates and comments from Twitter and Facebook in order to perform sentiment and text analysis.
Trained a model that imitates the voice of David Attenborough. Training speech synthesis models requires a dataset of thousands of audio clips paired with their transcriptions. Extracting audio clips from recordings is easy; the difficult part is matching each clip to its transcript. I trained the Tacotron 2 and WaveGlow models on audio clips extracted from the audiobook “Life on Earth”, narrated by David Attenborough, with transcripts generated using Amazon Transcribe. Technologies used in the project: Speech Synthesis, Tacotron 2, WaveGlow, Python, Amazon Transcribe, S3, Audacity.
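As an illustration of the clip-to-transcript matching step, here is a minimal Python sketch. It is not the code used in the project. It assumes the input is a list of word-level items laid out like the `results.items` array of an Amazon Transcribe JSON output (each item has a `type`, an `alternatives` list and, for spoken words, `start_time` and `end_time` strings). The function name `segment_items` and the sample items are hypothetical.

```python
def segment_items(items):
    """Group word-level items into (start, end, text) clips, one per sentence.

    Assumes each item looks like an entry of Transcribe's results.items:
    words have type "pronunciation" plus start_time/end_time strings,
    punctuation has type "punctuation" and no timestamps.
    """
    clips = []
    words, start, end = [], None, None
    for item in items:
        content = item["alternatives"][0]["content"]
        if item["type"] == "pronunciation":
            if start is None:
                start = float(item["start_time"])  # first word opens the clip
            end = float(item["end_time"])          # last word closes it
            words.append(content)
        elif words:
            words[-1] += content                   # attach punctuation to the previous word
            if content in {".", "?", "!"}:
                clips.append((start, end, " ".join(words)))
                words, start, end = [], None, None
    if words:                                      # trailing words without final punctuation
        clips.append((start, end, " ".join(words)))
    return clips


def word(text, start, end):
    """Hypothetical helper that builds one made-up word item for the demo."""
    return {"type": "pronunciation", "start_time": str(start),
            "end_time": str(end), "alternatives": [{"content": text}]}


def punct(text):
    """Hypothetical helper that builds one made-up punctuation item."""
    return {"type": "punctuation", "alternatives": [{"content": text}]}


# Made-up sample items, not taken from the audiobook.
sample_items = [
    word("Life", 0.0, 0.4), word("began", 0.4, 0.9), punct("."),
    word("It", 1.2, 1.4), word("spread", 1.4, 2.0), punct(","),
    word("slowly", 2.0, 2.6), punct("."),
]

for clip_start, clip_end, text in segment_items(sample_items):
    print(f"{clip_start:.1f}-{clip_end:.1f}s  {text}")
```

Each resulting tuple gives the time range to cut from the recording and the text to pair with it, which is the kind of audio/transcript pair that speech synthesis training needs.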
Built a text autocomplete application using a Markov chain model. Word prediction is an assistive writing technology that suggests words as you type. Technologies used in the project: R, R Shiny, Markov Chain Model.
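The idea behind Markov chain word prediction can be sketched in a few lines. The original application was written in R; the Python below is only an illustrative first-order (bigram) version with hypothetical function names and a made-up toy corpus, not the project's implementation.

```python
from collections import Counter, defaultdict


def build_bigram_model(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    transitions = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        transitions[current_word][next_word] += 1
    return transitions


def suggest_next_words(transitions, word, k=3):
    """Return up to k of the most frequent followers of `word`."""
    followers = transitions.get(word.lower())
    if not followers:
        return []  # word never seen in training, so nothing to suggest
    return [candidate for candidate, _ in followers.most_common(k)]


# Made-up toy corpus for the demo.
corpus = "the cat sat on the mat and the cat ate the fish"
model = build_bigram_model(corpus)
print(suggest_next_words(model, "the"))
```

In this toy corpus "cat" follows "the" more often than any other word, so it is suggested first. A practical autocomplete would typically train on a much larger corpus and may condition on more than one previous word.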
Developed a movie recommendation application in Django. It uses a content-based recommender system trained on movie descriptions to suggest movies that are similar to each other. The data comes from The Movies Dataset, which contains 45,000 movies released on or before July 2017.
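A content-based recommender of this kind is often built by turning each description into a TF-IDF vector and ranking other movies by cosine similarity. The sketch below assumes that approach; the profile does not say which weighting or similarity measure the application used. The titles, descriptions, and function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(descriptions):
    """Turn each description into a sparse {word: tf-idf weight} dict."""
    tokenized = [text.lower().split() for text in descriptions]
    n_docs = len(tokenized)
    # Document frequency: in how many descriptions each word appears.
    doc_freq = Counter(w for tokens in tokenized for w in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            w: (c / len(tokens)) * (math.log(n_docs / doc_freq[w]) + 1.0)
            for w, c in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(w, 0.0) for w, weight in a.items())
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(titles, vectors, title, k=2):
    """Return the k titles whose descriptions are closest to `title`."""
    idx = titles.index(title)
    scored = [(cosine(vectors[idx], vec), other)
              for other, vec in zip(titles, vectors) if other != title]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


# Made-up movies for the demo, not entries from The Movies Dataset.
movies = {
    "Space Quest": "astronauts explore a distant planet in space",
    "Star Voyage": "a crew of astronauts travels through space to a distant star",
    "Baking Love": "a pastry chef finds love in a small town bakery",
}
titles = list(movies)
vectors = tfidf_vectors(list(movies.values()))
print(most_similar(titles, vectors, "Space Quest", k=1))
```

Because the recommendation depends only on the descriptions, it needs no user ratings, which is the main appeal of the content-based approach.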
Education
Master's degree in Petroleum Engineering
Texas A&M University
2015 - 2016 (1 year)
Big Data Modeling and Management Systems; Introduction to Big Data; Deep Learning Specialization; Data Science