Carlos G.

Machine Learning Developer

Tuusula, Finland

About Me

Carlos is an exceptional data generalist who brings vast experience in the design, implementation, and validation of data-intensive systems to all of his projects, along with deep expertise in machine learning and real-time stream processing. He has worked in the eCommerce and media industries, at both large corporations and startups. Carlos is a versatile engineer and looks forward to his next challenge.

Work history

MarkaVIP
Director of Data Science
2015 - 2016 (1 year)
Remote
  • Implemented real-time analytics on operations and modeled interventions on customer experience to address returns and cancellations. Built a policy optimizer through retrospective simulation with historical data and deployed it as a microservice.

  • Expanded the policy optimizer to improve order profitability by optimizing basket constraints and incentives.

  • Implemented various improvements to the product recommender, including the use of fine-grained recorded impressions as a negative signal and more flexibility in handling catalog metadata.

  • Built and deployed a foundational analytical backbone for the company in AWS with Kinesis, Redshift, and Spark.

  • Integrated continuous data ingestion from key systems into the analytical backbone, whenever practical, through low latency interfaces such as database replication.

  • Migrated some interaction tracking systems to the backbone and the recommender.

  • Conducted retrospective sourcing performance and pricing analysis by replaying row mutations continuously captured from database replication logs and stored in Redshift (Python/C++).

Oracle, MySQL, Redshift, Amazon Kinesis, C++, Java, R, Python, Optimization, d3.js, Machine Learning, Statistics, Bayesian Statistics, Recommendation Systems, Apache Spark, Git, Data Science, Data Engineering, ETL, Linux, Microservices, RESTful Microservices, REST APIs, Amazon S3 (AWS S3), PyTest
Codento's clients
Software and Data Architect
2011 - 2015 (4 years)
Remote
  • Built an image upload/pre-processing pipeline for a media startup using Node.js and MongoDB on AWS. Included single sign-on with a Ruby on Rails app in the back end.

  • Created custom, interactive data displays for a bespoke structured messaging application using D3.js. Implemented real-time updates.

  • Implemented a structured messaging application. Contributed to the Python/Django back-end and the CoffeeScript front end.

  • Built a custom C# distributed data analysis pipeline to perform MATLAB jobs on AWS.

  • Designed and implemented a custom interactive data analysis and visualization tool for economic data, with a Python back end and D3.js visualizations.

  • Assembled a custom nurse schedule and route optimization system for a healthcare software startup. Worked on pre-processing and mixed integer model formulation for Gurobi with Python/Pandas/NumPy/PuLP, D3.js visualization of solutions, and Flask API.

  • Modernized the system design and implementation of a Java/Spring back end for real-time transport logistics. Improved scalability and performance.

  • Designed and implemented a reference application for a high-security network architecture for a banking customer with Scala/Play, Slick, and two-factor authentication.

  • Contributed to a large-scale online storage system implementation using Python and PostgreSQL. Contributed to embedded security appliances in C.

  • Developed a custom MATLAB system to tune a legacy application from data using derivative-free (black-box) optimization.

Ruby on Rails (RoR), Java, CoffeeScript, Node.js, JavaScript, Python, Scala, Slick, MongoDB, PostgreSQL, d3.js, AWS CLI, C, Gurobi, Optimization, Mixed-integer Linear Programming, Front-end, CSS, HTML, Flask, Bottle.py, CVXOPT, Git, Matlab, Data Science, Data Engineering, Tornado, Linux, C#, .NET, Data Architecture, Architecture, SaaS, Microservices, RESTful Microservices, REST APIs, Spring, Amazon S3 (AWS S3), PyTest
Perceptive Constructs
Data Engineer
2010 - Present (14 years)
Remote
  • At Staq.io, a Fintech startup (GCP, RunPod): AI prototype to assist compliance officers. End-to-end NLP pipeline from PDF data extraction and transformation, through model fine-tuning, to RAG and a prototype UI.

  • At Intel471, a Cybersecurity company (AWS): Multilingual NLP pipeline for knowledge extraction from discussion forums. Data curation, extraction, transformation, and annotation. Model fine-tuning, custom decoding, evaluation, and deployment. Data visualisation.

  • At Bank Al Etihad, a Retail Bank (AWS, On-Premise): Analytics pipeline and data warehouse. Anonymisation. Streaming data fusion. Bespoke offering recommender from transaction data. Deposit rate optimisation and causal estimation. Churn, loan, and card risk models.

  • At Verto Analytics, a Behavioural Analytics startup (AWS): Processing pipeline optimisation. Workflow orchestration for deliveries.

  • At MarkaVIP, an E-commerce startup emphasizing Flash Sales (AWS): Bespoke low-latency recommender.

Freelance Clients
Freelance Data Scientist and Engineer
2010 - Present (14 years)
Remote
  • Designed and built various machine learning models for risk assessment, customer churn, and rate optimization for a retail bank using Python, Pandas, NumPy, SciPy, PyMC, TensorFlow, and scikit-learn. Deployed with a custom FastAPI back end.

  • Designed and built a custom system for recommending product offerings from transaction and demographic data (retail bank) using Python, Pandas, NumPy, and C++. Deployed with a custom FastAPI back end.

  • Designed and built a continuous analytics backbone and a data warehouse in a hybrid onsite/AWS environment with Kafka, Scala, Python, and Redshift.

  • Designed and built a real-time data fusion pipeline to create a complete picture of customer transactions from different systems.

  • Assembled a low-latency custom recommender system for flash sales at an eCommerce startup using Python and C++. Deployed with a custom C++/Boost.Asio back end.

  • Developed a custom marketing message timing optimizer for an eCommerce startup using Python and R.

  • Built a bespoke transaction risk analysis system for eCommerce using Python and R.

  • Refactored, optimized, fixed, streamlined, documented, and further developed a complex and largely undocumented Airflow delivery workflow.

  • Ported Spark UDFs, UDAFs, and UDTs from Java to Scala, then optimized, tested, and documented them. The functions covered text and URL matching, information extraction from text and URLs, and supporting data structures. Wrote Python bindings.

  • Built a prototype knowledge management system for a manufacturing/engineering company. Performed technical due diligence for the joint development, with a partner, of CAD software to be distributed as part of this system.

Redis, C++, Node.js, JavaScript, R, Python, Machine Learning, Amazon Kinesis, Amazon Elastic MapReduce (EMR), AWS IAM, AWS ELB, AWS CLI, Redshift, Redshift Spectrum, Amazon Athena, Spark SQL, Spark ML, Apache Spark, FastAPI, Flask, Apache Airflow, Tensorflow, Keras, Pytorch, Pandas, SciPy, Numpy, PyMC, Github, GitHub API, Docker, MLFlow, Prometheus, Grafana, Apache Kafka, Confluence, Scala, Deep Learning, Elasticsearch, SQL, Delta Lake, Pyspark, Bayesian Statistics, Statistics, PostgreSQL, C, d3.js, Optimization, Mixed-integer Linear Programming, RocksDB, Recommendation Systems, Distributed Computing, Natural Language Processing (NLP), Jupyter Notebook, Hadoop, Git, StatsModels, Data Science, Data Engineering, Theano, Seaborn, Matplotlib, ETL, Linux, Scripting, Data Extraction, Beautiful Soup, Command-line Interface (CLI), DevOps, Kubernetes, Data Architecture, Database Architecture, Architecture, Back-end, SaaS, Geopandas, Shapely, Algorithms, Microservices, RESTful Microservices, REST APIs, PyTest, Amazon S3 (AWS S3), Boost.Asio
Nokia
Chief Software Architect
2009 - 2010 (1 year)
Remote
  • Prototyped a voice- and gesture-based user interface for in-car mobile phone usage at various levels of fidelity ranging from Wizard of Oz to software proof-of-concept (Python, Java, Sphinx).

  • Defined software architecture for a family of in-car products, with input to hardware platform selection.

  • Planned costs, schedule, and execution of multiple new product development scenarios.

  • Organized and moderated usability studies for prototype validation and iteration.

  • Conducted rigorous feasibility studies and software architecture reviews at Gear.

Java, Python, Software Architecture, Bluetooth, Planning, Usability, Usability Testing, Speech Recognition, Architecture, Technical Leadership, Leadership
Nokia
Senior R&D Manager
2003 - 2009 (6 years)
Remote
  • Recruited and ramped up the Maemo application framework team from scratch.

  • Defined the application framework architecture and development strategy.

  • Led the implementation of three major software generations along with updates.

  • Influenced Nokia's entry into open-source development.

  • Developed a considerable subcontracting and partnering network for Linux development.

  • Contributed to the initial product concept definition.

IT Project Management, Agile Project Management, Software Architecture, Open Source, Due Diligence, Recruitment, Leadership
Nokia
Senior Software Engineer
2001 - 2003 (2 years)
Remote
  • Prototyped a small-footprint relational database for small Linux devices in C++ for the Nokia Research Center.

  • Prototyped a personal information manager for handheld devices based on semantic web technology in Python.

  • Studied and evaluated architectural options for an application framework aimed at Linux-based handheld devices, which was adopted by the nascent Maemo project.

Python, C, C++, Databases, Embedded Linux, Semantic Web, RDF, Software Architecture, Graphical User Interface (GUI), GNOME, Qt
Freelancer clients
GIS/Computer Graphics Freelancer
1998 - 2001 (3 years)
Remote
  • Built a geographic information system (GIS) to edit the land cadaster for the Portuguese Ministry of Agriculture using C++, Windows, and Oracle technologies.

  • Constructed a custom C++ framework for real-time manipulation of topologically integrated geographic vector data.

  • Assembled a geographical decision support system in C++ for semi-automated execution and optimization of land-consolidation projects for a specialized consultancy.

  • Developed, licensed, and finally sold a ray-tracing rendering module for use with interior design software written in C++.

  • Built a GIS to edit an olive tree cadaster for the Portuguese Ministry of Agriculture, with integrated olive tree recognition from aerial photography, using C++ on Windows.

  • Designed and implemented flooring tiling algorithms for 3D interior design software.

Oracle, Python, C++, Computational Geometry, Computer Graphics, GIS, Optimization

Portfolio

RawHash

An experimental, binary-friendly alternative to using a hash as a key-value cache, written in C++ for Node.js. Keys are binary buffer objects rather than strings. Values are arbitrary objects. RawHash is built on Google SparseHash and MurmurHash3.

Rdb-parser

An asynchronous streaming parser for Redis RDB database dumps, written entirely in JavaScript, intended for use in Node.js, and released as open source. It's useful for diagnostics, data conversion, or even as part of a data processing pipeline.

Incremental Random Forest

An implementation in C++, with Node.js and Python bindings, of a variant of Leo Breiman's random forests. To save resources, the forest is maintained incrementally as samples are added or removed, rather than fully rebuilt from scratch every time. It is not a streaming implementation, as all the samples are stored and will be re-seen when required to recursively rebuild invalidated subtrees. The effort to update each tree can vary substantially, but the overall effort to update the forest is averaged across the trees and tends not to vary significantly.

Data-Graft.js

An animation-friendly, differential document object model (DOM) template engine that is self-contained and framework-agnostic. Built to experiment with dynamic data/DOM binding, focusing on flexibility for animating data-change transitions.

Education

Master's Degree in Computer Science
Universidade Nova de Lisboa
1991 - 1996 (5 years)