Well-rounded Data Analyst with 2+ years of experience working on agile projects with Big Data and AWS cloud technology, and developing ETL algorithms in Python or Spark. I am passionate about descriptive analytics, data visualization, and applying data analysis solutions to Big Data. Proven experience in interpreting and analyzing data to drive growth for companies such as Santander, BNP Paribas, and Vivo.
Created Big Data solutions using shell scripts and the Hadoop framework (HDFS, HBase, Hive, Impala, Hue, Spark) for clients such as Santander, BNP Paribas, and Vivo.
Developed Python scripts to automate tasks and perform data analytical procedures.
Designed solutions independently and maintained the production systems.
Processed, analyzed, and interpreted structured and unstructured data using software tools and programming languages to extract insights for clients and support internal sub service lines.
Built predictive models and machine learning algorithms to enhance audit processes.
Built Tableau dashboards that compared multiple data sets side by side.
The project involved integrating all data sources in the cloud and performing ETL on daily and historical data. My role was to develop ETL algorithms using Python / Spark, perform data analytical procedures using Jupyter Notebook, and identify features and insights relevant to the business. Developed pipelines to perform ETL in the AWS cloud environment. Used Spark instead of Python for better performance on large volumes of data. Technologies used: AWS Cloud Technology (EC2, S3, Glue, Redshift), Jupyter Notebook, Visual Studio.
The project required understanding data and supporting the development of machine learning models to be incorporated into marketing strategies. I performed ETL using Oracle SQL and Python, created detailed logs, and performed data analytical procedures using Jupyter Notebook. Defined variables relevant to business goals and to the development of machine learning models. I also integrated external and internal data and selected relevant variables for the model. The variables identified in the data analysis were used to develop machine learning models, which achieved an accuracy between 80% and 90%.