Kevin is a driven Data Engineer focused on building real-time ETL pipelines with Python, streaming data with Kafka, and processing and modeling it with Spark to power real-time interactive dashboards. He builds Machine Learning and Deep Learning models in Python for Time Series, Classification, and NLP problems, and has solid knowledge of AWS, GCP, and Azure for serverless applications involving data manipulation, storage, processing, and streaming.
Designed and built data pipelines for BI solutions, integrating REST API endpoints from applications such as Shopify and Rutter.
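A minimal sketch of the extraction pattern behind this kind of integration, assuming a hypothetical paginated endpoint and token (the real Shopify/Rutter URLs and auth schemes differ):

```python
import requests

# Hypothetical placeholders: base URL, bearer token, and page size.
BASE_URL = "https://api.example.com/v1/orders"
HEADERS = {"Authorization": "Bearer <token>"}

def fetch_all(url: str) -> list[dict]:
    """Walk a paginated REST endpoint until an empty page comes back."""
    records, page = [], 1
    while True:
        resp = requests.get(url, headers=HEADERS, params={"page": page, "limit": 250})
        resp.raise_for_status()
        batch = resp.json().get("data", [])
        if not batch:
            break
        records.extend(batch)
        page += 1
    return records
```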
Migrated cloud data warehouse data from Snowflake and AWS Redshift to Google BigQuery using Airflow. Used Machine Learning to predict the likelihood of an organic visit to one of the client's stores.
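The load side of such a migration can be sketched with the google-cloud-bigquery client; the project, dataset, table, and GCS URI below are hypothetical placeholders:

```python
from google.cloud import bigquery

# Hypothetical project, bucket, and table names.
client = bigquery.Client(project="my-project")

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,  # infer the schema from the exported file
)

load_job = client.load_table_from_uri(
    "gs://my-bucket/exports/redshift_orders.csv",
    "my-project.analytics.orders",
    job_config=job_config,
)
load_job.result()  # block until the load job completes
```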
Automated data upload/aggregation in Postgres using Python for easy extraction from the backend. Set up migration scripts with rollbacks in Knex for the frontend DB using JavaScript. Set up an ETL pipeline using AWS Transfer, S3, Lambda (Python), and Postgres RDS.
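A minimal sketch of the Lambda step in that pipeline, assuming a hypothetical staging table and a connection string supplied via an environment variable:

```python
import csv
import io
import os

import boto3
import psycopg2

s3 = boto3.client("s3")

def handler(event, context):
    # Triggered by an S3 event for a file landed via AWS Transfer.
    record = event["Records"][0]["s3"]
    obj = s3.get_object(Bucket=record["bucket"]["name"], Key=record["object"]["key"])
    rows = csv.reader(io.StringIO(obj["Body"].read().decode("utf-8")))
    next(rows)  # skip the header row

    # PG_DSN and staging.uploads are hypothetical placeholders.
    conn = psycopg2.connect(os.environ["PG_DSN"])
    with conn, conn.cursor() as cur:
        cur.executemany(
            "INSERT INTO staging.uploads (id, payload) VALUES (%s, %s)",
            list(rows),
        )
    conn.close()
```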
Worked on the creation of Digital Twins for Supply Chain procedures and presented the data architecture proposal to management.
Streamed real-time data from a MySQL source to GBQ and replicated it to Azure and AWS using a multi-node Kafka cluster.
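The producer side of such a flow can be sketched with kafka-python; the broker addresses and topic name are hypothetical placeholders:

```python
import json

from kafka import KafkaProducer

# Hypothetical multi-node broker list and topic.
producer = KafkaProducer(
    bootstrap_servers=["broker1:9092", "broker2:9092", "broker3:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish(row: dict) -> None:
    """Publish one MySQL change row for downstream replication."""
    producer.send("mysql.orders", value=row)

publish({"order_id": 42, "status": "shipped"})
producer.flush()  # make sure buffered messages reach the brokers
```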
Applied real-time processing with Spark to data streamed from the database to the cloud. Automated data extraction from the Cognos Framework Manager XML model using Python.
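A minimal Structured Streaming sketch of that Spark stage, reading the change stream from Kafka and landing it in cloud storage (broker, topic, and bucket paths are hypothetical):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("stream-to-cloud").getOrCreate()

# Read the Kafka topic carrying database changes.
stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "mysql.orders")
    .load()
)

# Write the raw payloads to cloud storage with checkpointing.
query = (
    stream.selectExpr("CAST(value AS STRING) AS payload")
    .writeStream.format("parquet")
    .option("path", "gs://my-bucket/orders/")
    .option("checkpointLocation", "gs://my-bucket/checkpoints/orders/")
    .start()
)
query.awaitTermination()
```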
Performed web scraping, mining, and validation of data relevant to client requests, ranging from commodity futures and general financial market data to geolocation.
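A scraping-and-validation sketch in the spirit of that work, using requests and BeautifulSoup; the URL and CSS selectors are hypothetical:

```python
import requests
from bs4 import BeautifulSoup

resp = requests.get("https://example.com/commodity-futures", timeout=30)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")
rows = []
for tr in soup.select("table.quotes tr")[1:]:  # skip the header row
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if len(cells) < 2:
        continue  # drop malformed rows
    symbol, price = cells[0], cells[1]
    try:
        rows.append({"symbol": symbol, "price": float(price.replace(",", ""))})
    except ValueError:
        continue  # drop rows whose price field fails validation
```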
Worked on data ETL for customer insight, as well as feature engineering and data modeling, using supervised and unsupervised learning for forecasting and classification of multiple phenomena and events.
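A compact sketch of the supervised side of such modeling; synthetic data stands in here for the engineered customer features:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for features produced by the ETL / feature engineering.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```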
Created a Python script to upload batches of data directly into Google BigQuery to test a serverless approach to data migration. Performed string matching and database merging by implementing NLP techniques such as edit-based and token-based similarity measures combined with machine learning.
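The two families of measures can be sketched in plain Python: Levenshtein distance as the edit-based measure and Jaccard similarity over token sets as the token-based one. Pairs passing both thresholds get merged; borderline pairs can be handed to a classifier.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit-based measure: minimum number of single-character edits."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def jaccard(a: str, b: str) -> float:
    """Token-based measure: overlap between the two token sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

print(levenshtein("ACME Corp", "ACME Corp."))     # 1
print(jaccard("ACME Corp Ltd", "ltd acme corp"))  # 1.0
```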
The project involved assisting the team in building the ETL pipeline that connected to different endpoints using TypeScript. The transformation and validation of the data were done using EMR, while the orchestration of the pipeline was handled with Kubernetes clusters. I also built most of the views for the dashboard used by the underwriting team, leveraging dbt Cloud.
The project involved building a product that would flag potentially fraudulent invoices. Here I leveraged my skills in Machine Learning and Feature Engineering to build the solution. Another of my tasks was to build a pipeline that converted geolocation data into human-readable addresses for our clients.
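The address-conversion step can be sketched with geopy's Nominatim reverse geocoder; the user agent string is a hypothetical placeholder, and a production pipeline would add rate limiting and caching:

```python
from geopy.geocoders import Nominatim

# Hypothetical user agent for the geocoding service.
geolocator = Nominatim(user_agent="invoice-pipeline-sketch")

def to_address(lat: float, lon: float) -> str:
    """Turn a (lat, lon) pair into a readable street address."""
    location = geolocator.reverse((lat, lon), exactly_one=True)
    return location.address if location else ""

print(to_address(41.3874, 2.1686))  # a point in central Barcelona
```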
The project involved creating an Airflow ETL pipeline. I connected it to different endpoints, be they data lakes, API endpoints, or SFTP servers, extracted the data, and pushed it to AWS S3, from which point a Glue job I created transformed and validated the data and pushed it to the warehouse in Redshift.
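A minimal sketch of that orchestration, assuming hypothetical DAG, task, job, and bucket names, with the Glue run triggered via boto3:

```python
from datetime import datetime

import boto3
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_to_s3(**_):
    ...  # pull from the data lake / API / SFTP source and write to S3

def start_glue_job(**_):
    # Glue job name is a hypothetical placeholder.
    glue = boto3.client("glue")
    glue.start_job_run(JobName="transform-and-load-redshift")

with DAG(
    dag_id="endpoint_to_redshift",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_to_s3", python_callable=extract_to_s3)
    transform = PythonOperator(task_id="start_glue_job", python_callable=start_glue_job)
    extract >> transform  # Glue runs only after the extract lands in S3
```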
Education
Master's degree, Data Engineering
Jacobs University Bremen
2017 - 2019 (2 years)
Master's degree, International Business Management
Universitat Autònoma de Barcelona
2009 - 2010 (1 year)
Bachelor's degree, Economics and Business Administration