Research and Development of a High-performance Shellcode Detector
A model capable of detecting shellcode, a type of cyber threat, in raw network data.
This project consisted of two large parts: researching a model capable of very high-speed inference, and implementing it. Due to an NDA, I can't go into details about the research side. What I can say is that the model needed a high detection rate combined with a very low false positive rate, so that false positives would not overwhelm the analysts.
One of the main project constraints was speed. The detector acts as a firewall, so it must decide in real time whether each packet is allowed through. It was estimated that the solution would need to process at least 1 Gbps on an average laptop. The solution's security was also critical, which led me to use Rust, a language that combines high performance with memory safety. To reach the expected speed, data had to be kept in the lowest levels of the CPU cache and processed with CPU-level vectorization (SIMD).
The final solution reached 5 Gbps on an average laptop, with a detection rate above 95% and a false positive rate below 0.000000001%, which means roughly one false positive per terabyte of data.
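To see why cache residency and SIMD matter at these speeds, it helps to turn a link speed into a per-byte time budget. The sketch below does that arithmetic; the 3 GHz clock and the single-core assumption are hypothetical values chosen for illustration, not figures from the project.

```python
def bytes_per_second(gbps: float) -> float:
    """Convert a link speed in gigabits per second to bytes per second."""
    return gbps * 1e9 / 8


def ns_per_byte(gbps: float) -> float:
    """Time budget, in nanoseconds, available to inspect each byte."""
    return 1e9 / bytes_per_second(gbps)


def cycles_per_byte(gbps: float, clock_ghz: float) -> float:
    """CPU cycles available per byte on one core (assumption: single core, fixed clock)."""
    return clock_ghz * ns_per_byte(gbps)


if __name__ == "__main__":
    for gbps in (1, 5):
        print(
            f"{gbps} Gbps: {bytes_per_second(gbps) / 1e6:.0f} MB/s, "
            f"{ns_per_byte(gbps):.2f} ns/byte, "
            f"{cycles_per_byte(gbps, 3.0):.1f} cycles/byte at 3 GHz"
        )
```

At 5 Gbps the budget is about 1.6 ns, or under 5 cycles per byte on a hypothetical 3 GHz core, far less than the cost of a single main-memory access. That is why data has to stay in cache and why each instruction must process many bytes at once.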
Created MLJ.jl, Julia's Largest Machine Learning Framework
As part of my master's degree, I realized there was a significant gap in the ecosystem of the Julia programming language: the absence of a unifying machine learning framework like scikit-learn in Python.
I therefore designed, architected, and implemented the first version of that framework. The main difficulty was designing a well-balanced interface: generic enough to accommodate any model, yet strict enough that it would not become verbose or too abstract to use.
By the end of my master's, a dozen of the most fundamental machine learning libraries had been unified under the framework, and the project had attracted the attention of the Alan Turing Institute. I was invited to present it at JuliaCon, and the Institute ended up taking it over and has continued to develop it ever since.
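The core idea of a unifying framework is a small contract that every model implements, so that generic code (evaluation, tuning, pipelines) works with any of them. The sketch below illustrates that idea in Python; it is not MLJ's actual interface, which is written in Julia and considerably richer. `Model`, `MeanRegressor`, `OneFeatureLinearRegressor`, and `evaluate` are hypothetical names.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class Model(ABC):
    """Hypothetical minimal contract that every wrapped model must satisfy."""

    @abstractmethod
    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> "Model":
        """Learn from features X and targets y; return self for chaining."""

    @abstractmethod
    def predict(self, X: Sequence[Sequence[float]]) -> List[float]:
        """Return one prediction per row of X."""


class MeanRegressor(Model):
    """Baseline: always predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class OneFeatureLinearRegressor(Model):
    """Ordinary least squares on the first feature only: y = a + b * x."""

    def fit(self, X, y):
        xs = [row[0] for row in X]
        n = len(xs)
        mx, my = sum(xs) / n, sum(y) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (t - my) for x, t in zip(xs, y))
        self.slope_ = sxy / sxx
        self.intercept_ = my - self.slope_ * mx
        return self

    def predict(self, X):
        return [self.intercept_ + self.slope_ * row[0] for row in X]


def evaluate(model: Model, X_train, y_train, X_test, y_test) -> float:
    """Generic code: works with any Model. Returns the mean squared error."""
    preds = model.fit(X_train, y_train).predict(X_test)
    return sum((p - t) ** 2 for p, t in zip(preds, y_test)) / len(y_test)
```

Because `evaluate` only relies on `fit` and `predict`, adding a new library means writing one small adapter class, which is the balance between genericity and strictness described above.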
Creator and Owner of Websek.co
Websek.co is a simple SaaS tool that alerts users to potential security issues and misconfigurations on their websites. I developed the whole project with the following stack and structure:
• A REST API, used by the front end, built on AWS Lambda
• A scheduler that launches EC2 analysis instances when required, using AWS Lambda and AWS EC2
• A monitoring agent, running on AWS Lambda, that verifies the instances are healthy
• An analysis tool based on OWASP ZAP
• A PostgreSQL database
• A simple Bootstrap and jQuery front end
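The scheduler's job can be reduced to two decisions: which sites are due for a scan, and how to split them across analysis instances. The sketch below shows that logic only; `Site`, `due_sites`, `plan_instances`, the per-site rescan interval, and the batch size are hypothetical, not Websek.co's actual code. In a real Lambda, each batch could then be handed to an EC2 launch call such as boto3's `run_instances`, which is deliberately left out here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Site:
    url: str
    last_scan: Optional[datetime]  # None means the site was never scanned
    interval: timedelta            # hypothetical: how often the site should be rescanned


def due_sites(sites: List[Site], now: datetime) -> List[Site]:
    """Sites never scanned or whose rescan interval has elapsed, oldest scan first."""
    due = [s for s in sites if s.last_scan is None or now - s.last_scan >= s.interval]
    return sorted(due, key=lambda s: s.last_scan or datetime.min)


def plan_instances(sites: List[Site], now: datetime, sites_per_instance: int) -> List[List[str]]:
    """Split due sites into batches; each batch would be analysed by one instance."""
    due = due_sites(sites, now)
    return [
        [s.url for s in due[i:i + sites_per_instance]]
        for i in range(0, len(due), sites_per_instance)
    ]
```

Launching instances only when `plan_instances` returns a non-empty plan is what keeps EC2 costs proportional to the actual scanning workload.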
Co-author of a Peer-reviewed Scientific Paper
I took responsibility for implementing and studying the random interchange loop model, which we used to carry out a numerical analysis of the quantum dynamics of ferromagnets.
The project consisted of a numerical analysis of the model to determine how various attributes change depending on environmental parameters. This environment lives on a 4D lattice (three space dimensions and one time dimension), and the results become more accurate as the lattice grows. Earlier research showed that the margin of error becomes acceptable from a lattice size of around 100.
Before this paper, the only implementation of the model took 10 minutes to simulate a lattice of size 10. Such a small lattice cannot be used for accurate numerical analysis, and the implementation was far too slow to scale to larger ones.
I redesigned and implemented the model in C, producing software that could simulate a lattice of size 160 in a few seconds. With such performance, we could run a grid search over multiple parameters, giving us a global view of how each parameter affects the results. The paper collects the most important results and explains their further implications.
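For readers unfamiliar with the model, the sketch below samples the textbook random interchange process on a periodic 3D lattice and returns the lengths of its loops (the cycles of the resulting permutation). It is written in Python for readability, not taken from the C implementation, and it assumes continuous time on [0, beta] rather than the discretized time dimension described above. The function names and the parameter `beta` are hypothetical.

```python
import random
from typing import List, Tuple


def lattice_edges(L: int) -> List[Tuple[int, int]]:
    """Nearest-neighbour edges of a periodic L x L x L cubic lattice."""
    if L < 3:
        raise ValueError("L must be at least 3 to avoid duplicate periodic edges")

    def idx(x: int, y: int, z: int) -> int:
        return (x % L) * L * L + (y % L) * L + (z % L)

    edges = []
    for x in range(L):
        for y in range(L):
            for z in range(L):
                v = idx(x, y, z)
                edges.append((v, idx(x + 1, y, z)))
                edges.append((v, idx(x, y + 1, z)))
                edges.append((v, idx(x, y, z + 1)))
    return edges


def random_interchange_cycles(L: int, beta: float, rng: random.Random) -> List[int]:
    """Sample the process on [0, beta] and return the cycle (loop) lengths.

    Each edge carries an independent rate-1 Poisson clock. Equivalently, events
    arrive at total rate len(edges) and each one picks a uniformly random edge.
    """
    edges = lattice_edges(L)
    n = L ** 3
    perm = list(range(n))
    t = rng.expovariate(len(edges))
    while t < beta:
        a, b = rng.choice(edges)
        perm[a], perm[b] = perm[b], perm[a]  # compose with the transposition (a b)
        t += rng.expovariate(len(edges))

    # Decompose the final permutation into cycles; each cycle is one loop.
    seen = [False] * n
    lengths = []
    for start in range(n):
        if not seen[start]:
            length, v = 0, start
            while not seen[v]:
                seen[v] = True
                v = perm[v]
                length += 1
            lengths.append(length)
    return lengths
```

A parameter grid search then amounts to calling `random_interchange_cycles` for each combination of lattice size and `beta` and aggregating loop statistics, which is only practical when a single simulation is fast.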
Invester: Stock Price Prediction Bot
A stock price prediction bot written in Julia.
Backstory: I was finishing my degree in mathematics and, having been interested in stock trading for years, I decided to try writing my own trading agent. The idea was not real-time algorithmic trading but mostly mid- to long-term suggestions.
Tech: I created a framework that supports different trading strategies so that I could compare ideas independently. I also implemented backtesting mechanisms and various strategies, some relying heavily on machine learning and others following simpler signals. The code was deployed to the cloud and ran daily, producing a list of the top 10 buy suggestions.
Interesting observation: the more complex the algorithm, the worse it handled the early COVID-19 period.
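A pluggable strategy framework can be as simple as treating each strategy as a function from price history to a position, and running every strategy through the same backtest and ranking code. The Python sketch below illustrates that design; it is not the bot's Julia code, and `buy_and_hold`, `moving_average_crossover`, `backtest`, and `top_k` are hypothetical examples of the "simpler signals" mentioned above.

```python
from typing import Callable, Dict, List, Sequence

# A strategy maps the price history seen so far to a position: 1 = hold, 0 = stay out.
Strategy = Callable[[Sequence[float]], int]


def buy_and_hold(history: Sequence[float]) -> int:
    return 1


def moving_average_crossover(short: int = 3, long: int = 5) -> Strategy:
    """Hold when the short moving average is above the long one."""

    def strategy(history: Sequence[float]) -> int:
        if len(history) < long:
            return 0  # not enough data yet
        return int(sum(history[-short:]) / short > sum(history[-long:]) / long)

    return strategy


def backtest(prices: Sequence[float], strategy: Strategy) -> float:
    """Final equity starting from 1.0; each day's decision uses only past prices."""
    equity = 1.0
    for t in range(len(prices) - 1):
        if strategy(prices[: t + 1]) == 1:
            equity *= prices[t + 1] / prices[t]
    return equity


def top_k(histories: Dict[str, List[float]],
          score: Callable[[List[float]], float], k: int = 10) -> List[str]:
    """Rank tickers by a strategy-specific score and keep the k best suggestions."""
    return sorted(histories, key=lambda s: score(histories[s]), reverse=True)[:k]
```

Slicing `prices[: t + 1]` before calling the strategy is the key backtesting safeguard: a strategy can never see the price it is trying to predict.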
Twitter Sentiment Analysis
A sentiment analysis classifier for tweets.
Designed, researched, and developed several classifiers to process and study tweets and predict their sentiment as positive, negative, or neutral.
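As an example of the kind of baseline such a project typically starts from, here is a multinomial naive Bayes classifier with add-one smoothing for the three sentiment classes. It is a generic sketch, not one of the project's actual classifiers; the tokenizer and the class name `NaiveBayesSentiment` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and keep runs of letters and apostrophes (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts: List[str], labels: List[str]) -> "NaiveBayesSentiment":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, text: str) -> str:
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.class_counts:
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(self.class_counts[label] / n_docs)
            denom = self.total_words[label] + len(self.vocab)
            for w in tokenize(text):
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Working in log space avoids numerical underflow when a tweet contains many words, and smoothing keeps a single unseen word-class pair from forcing a probability of zero.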
Train Ticket Cost Optimizer
Software that scrapes train ticket providers and sends a notification when tickets are available at an advantageous price. It ran daily, scraping data from multiple train ticket providers to find the best deal for a specific route under specified constraints, such as time periods and the number of station changes.
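Once offers have been scraped, finding the best deal is a filter-then-minimize step followed by a threshold check for the notification. The sketch below assumes the scraping has already produced a list of offers; `Offer`, `best_offer`, `should_notify`, and the target-price threshold are hypothetical names, not the original tool's code.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Offer:
    provider: str
    departure: datetime
    price: float
    changes: int  # number of station changes on the journey


def best_offer(offers: List[Offer], earliest: datetime, latest: datetime,
               max_changes: int) -> Optional[Offer]:
    """Cheapest offer departing within [earliest, latest] with few enough changes."""
    candidates = [
        o for o in offers
        if earliest <= o.departure <= latest and o.changes <= max_changes
    ]
    return min(candidates, key=lambda o: o.price, default=None)


def should_notify(offer: Optional[Offer], target_price: float) -> bool:
    """Notify only when a matching offer exists at or below the target price."""
    return offer is not None and offer.price <= target_price
```

Applying the constraints before comparing prices ensures a cheap but unsuitable ticket (wrong time, too many changes) never triggers a notification.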