AWS Cloud Engineer - AWS Cloud Migration of marketing platform
The project was the migration of a marketing platform to the cloud; AWS was chosen as the platform. The entire pipeline was migrated from SSIS packages to PySpark ETL processes: S3 for storage; AWS Lambda for monitoring S3 buckets, triggering the pipeline workflow, and a few small microservices; EC2 for computation (the second stage of the project began moving to AWS EMR); RDS and Redshift for databases; AWS Step Functions and AWS DynamoDB for workflow orchestration and state; CloudFormation for infrastructure as code; CodeCommit for source control; and Snowflake/Tableau for visualization and reports. I created a cloud environment mirroring the legacy on-premises solution and built the whole data pipeline on AWS, from the S3 landing zone to RDS/Redshift storage. Data processing time was reduced by 25%, with room for further improvement through scaling. The first stage of the migration was moved to production successfully. Technologies used in the project: AWS EC2, AWS S3, AWS Redshift, AWS RDS, AWS Step Functions, AWS CloudFormation, AWS CodeCommit.
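The Lambda-triggered workflow described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the state machine ARN, bucket names, and payload fields are placeholders I invented for the example.

```python
import json
import urllib.parse

def extract_s3_object(event):
    """Return (bucket, key) from the first record of an S3 event.

    Keys in S3 event notifications arrive URL-encoded, so they must be
    decoded before use.
    """
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["object"]["key"])
    return bucket, key

def lambda_handler(event, context):
    """Hypothetical handler: a new object in the watched bucket starts
    one Step Functions execution of the ETL pipeline."""
    bucket, key = extract_s3_object(event)
    import boto3  # preinstalled in the AWS Lambda runtime
    sfn = boto3.client("stepfunctions")
    sfn.start_execution(
        # Placeholder ARN -- the real one would come from configuration.
        stateMachineArn="arn:aws:states:REGION:ACCOUNT:stateMachine:etl-pipeline",
        input=json.dumps({"bucket": bucket, "key": key}),
    )
    return {"statusCode": 200}
```

Keeping the event parsing in a separate pure function makes the trigger logic unit-testable without AWS credentials.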
Data Engineer / Data Scientist - Data Lake + Recommendation System for a Real Estate Company
The first stage of the project was streaming all changes from the Oracle database logs through Kafka queues into the S3 data lake, with NiFi as the ETL server distributing the queue workflow across multiple destinations and transformations. PySpark handled the various data transformations, data quality checks, data merging, and loading into S3 and Redshift. The second stage was building the recommendation system using scikit-learn, Apache Spark MLlib, and Numba. Numba was used to optimize the code that generates the recommendations, cutting the runtime from days to hours: each county has hundreds to thousands of properties, and computing the pairwise similarity among them had been taking many days. I created the data lake on AWS S3 and supported data science tasks on top of it, and built a recommendation system that reduced processing time from days to hours. Technologies used in the project: AWS EMR, AWS S3, AWS Redshift, AWS CodeCommit, Zeppelin, NiFi, Kafka, Python (scikit-learn, NumPy).
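The expensive step was the pairwise similarity within each county. The project compiled this kind of computation with Numba; below is a simplified NumPy sketch of the same idea (the feature matrix shape and the choice of cosine similarity are assumptions for illustration), which computes all pairs for a county in one vectorized pass instead of property by property.

```python
import numpy as np

def cosine_similarity_matrix(X):
    """Pairwise cosine similarity for an (n_properties, n_features) matrix.

    Normalizing each row once and taking a single matrix product yields
    the full n x n similarity matrix, avoiding an explicit Python loop
    over every property pair.
    """
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    normalized = X / np.clip(norms, 1e-12, None)  # guard against zero rows
    return normalized @ normalized.T
```

For matrices too large to vectorize in memory, the same loop can instead be JIT-compiled with Numba's `@njit(parallel=True)`, which is the kind of optimization the project applied.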
Data Warehouse Engineer - Re-Engineering of the IDB (International Development Bank) Data Warehouse
The project was a re-engineering of the existing financial data warehouse, moving from PL/SQL and SQL Server ETL packages to IBM DataStage, taking into account auditing, slowly and rapidly changing dimensions, bridge tables, and snapshot fact tables.
It covered multidimensional models, fact tables, slowly changing dimensions, and related structures, plus support for reporting: tuning, business rules, and integration with the data warehouse. I performed the re-engineering of the existing financial data warehouse, introducing a new data warehouse that supports existing and new key areas of the IDB, and managed to reduce ETL processing time by using the IBM DataStage server, which offers better visualization of the data flow and scheduling. The Financial department started using the new data warehouse in production. Technologies used in the project: IBM DataStage, IBM dashDB, IBM Db2, PL/SQL, Oracle SQL tuning.
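The slowly-changing-dimension handling mentioned above follows the standard Type 2 pattern: when a tracked attribute changes, the current dimension row is closed and a new current row is appended, preserving history. A minimal sketch of that logic, with illustrative column names (`is_current`, `start_date`, `end_date`) that are assumptions, not the warehouse's actual schema:

```python
from datetime import date

def apply_scd2(dimension_rows, incoming, business_key, tracked_attrs, today):
    """Apply Type 2 SCD logic for one incoming record.

    If the current row for the business key differs on any tracked
    attribute, it is closed (end_date set, is_current cleared) and a new
    current row is appended. Returns the updated list of rows.
    """
    rows = [dict(r) for r in dimension_rows]  # copy; no in-place mutation
    current = next(
        (r for r in rows
         if r[business_key] == incoming[business_key] and r["is_current"]),
        None,
    )
    if current and any(current[a] != incoming[a] for a in tracked_attrs):
        current["is_current"] = False
        current["end_date"] = today
    if current is None or not current["is_current"]:
        rows.append({**incoming,
                     "start_date": today,
                     "end_date": None,
                     "is_current": True})
    return rows
```

In DataStage this is typically implemented with the built-in SCD stage rather than hand-written code; the sketch only shows the row-versioning rule itself.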