I am a Data Engineer with 4+ years of proven SQL experience, passionate about building innovative predictive analytics and data science solutions. My technical background, including 5+ years of Python programming expertise, lets me research unfamiliar domains and pick up cutting-edge technologies effectively. Most importantly, I am a team player with a bias for action and a strong drive to keep learning. Right now I’m also fascinated by reinforcement learning, blockchain, and their significance for the future.
Developed an Airflow environment with a CD pipeline and migrated our ETL jobs to it, greatly improving monitoring and maintenance. Automated ETL pipeline creation in Airflow, reducing the time spent on new pipelines to almost nothing.
Automated the creation and delivery of I&A fee reports to all our clients, achieving a 98% successful delivery rate.
Created an automated, real-time dashboard for advertisers to track ad and publisher performance, significantly increasing data transparency.
Created a new Airflow environment for our ETL pipelines with continuous, automated deployment using Puppet and a testing environment using Docker. Automated ETL pipeline creation for our most common patterns: the user drops a query, DDL, or ingestion spec into a folder and a pipeline is generated. Monitoring and maintenance became much more sustainable, allowing our team to spend more time adding value instead of firefighting.
Launched; it is now the preferred home for our ETL processes and is used by our whole data engineering team.
Technologies Used: Python, MySQL, Hive, Druid, Puppet, Docker, Linux
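The drop-a-file workflow above can be sketched in plain Python (the folder layout, `pipeline_id` naming, and `@daily` default below are illustrative assumptions, not the actual implementation, which emitted Airflow DAGs):

```python
import os
import tempfile

def discover_pipelines(folder):
    """Scan a folder for SQL files and generate one pipeline spec per file.

    Illustrative sketch: the real version also handled DDL and ingestion
    specs and produced Airflow DAGs rather than plain dicts.
    """
    pipelines = []
    for name in sorted(os.listdir(folder)):
        base, ext = os.path.splitext(name)
        if ext == ".sql":
            pipelines.append({
                "pipeline_id": f"etl_{base}",
                "sql_path": os.path.join(folder, name),
                "schedule": "@daily",  # assumed default cadence
            })
    return pipelines

# Usage: drop a query file into the folder and a pipeline spec appears.
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "daily_revenue.sql"), "w") as f:
        f.write("SELECT office, SUM(fee) FROM fees GROUP BY office;")
    specs = discover_pipelines(folder)
    print([s["pipeline_id"] for s in specs])  # → ['etl_daily_revenue']
```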
Built cloud data infrastructure enabling real-time ingestion into the warehouse, data models, and visualization dashboards. Created monitoring dashboards and alerts to track usage and resources. Real-time analysis has been very helpful for understanding users at this early stage, and it provides an easy and accurate way to monitor the usage of third-party APIs.
Launched in beta; currently open only to a small audience, but it should have no difficulty scaling.
Technologies Used: Python, Apache Beam, Graph DB, BigQuery, Pub/Sub, Dataflow, Cloud Functions, Linux
Automated communication between our DB and the Plaid API via cloud functions to keep bank account and transaction information synced for all users. This virtually eliminated the need for client-side calls to fetch new data, decreasing latency and improving the user experience.
Launched in beta; currently open only to a small audience, but it should have no difficulty scaling.
Technologies Used: Python, Graph DB (Firestore), REST, Dataflow, Cloud Functions, Linux, Flask
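The sync step above reduces to an idempotent upsert keyed by transaction id. A hedged sketch, with plain dicts standing in for the Firestore collection and the Plaid response (function and field names are illustrative, not the actual SDK):

```python
def sync_transactions(store, fetched):
    """Upsert freshly fetched transactions into a per-user store.

    `store` maps transaction_id -> transaction record; `fetched` is a
    list of transaction dicts, as a Plaid sync call might return.
    Re-running with the same data is a no-op, so the cloud function
    can fire on a schedule without double-counting.
    """
    for tx in fetched:
        store[tx["transaction_id"]] = tx  # insert or overwrite by id
    return store

# Usage: running twice with overlapping data leaves one copy per id.
store = {}
batch = [{"transaction_id": "t1", "amount": 12.50},
         {"transaction_id": "t2", "amount": 3.99}]
sync_transactions(store, batch)
sync_transactions(store, batch)  # idempotent re-run
print(len(store))  # → 2
```

Keying on the provider's transaction id is what makes the scheduled sync safe to retry.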
Built complex business logic into the ETL process to provide near-real-time insight into company revenue, both overall and by office. Provided an easy and straightforward way to detect anomalies and discrepancies in the data that could impact the health of the company.
The dashboard is used across the company, including by the CEO and COO.
Technologies Used: Python, MySQL, Tableau, SuperX, Airflow, Linux
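One simple form the anomaly checks could take is a standard-deviation rule over daily revenue. This is an illustrative sketch only; the production checks were business-specific, and the `threshold` default is an assumption:

```python
from statistics import mean, stdev

def flag_anomalies(daily_revenue, threshold=2.0):
    """Return the days whose revenue deviates from the mean by more
    than `threshold` sample standard deviations."""
    values = list(daily_revenue.values())
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # flat series: nothing to flag
    return [day for day, rev in daily_revenue.items()
            if abs(rev - mu) > threshold * sigma]

# Usage: ten flat days plus one spike flags only the spike.
daily = {f"2024-01-{i:02d}": 100.0 for i in range(1, 11)}
daily["2024-01-11"] = 300.0
print(flag_anomalies(daily))  # → ['2024-01-11']
```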