Steven B.

Data Engineer | Data Scientist

Oechsen, Germany

About Me

Steven is a qualified Data Scientist and Data Engineer who develops and deploys full analysis pipelines for Machine Learning and data mining projects on large datasets. He has hands-on experience translating models into business insights and applying data knowledge to real-world business problems. Steven works across various technology stacks to deliver solutions for complex projects, keeping a practical focus and collaborating in cross-functional teams.

Work history

UpStack
Data Engineer | Data Scientist
2020 - Present (4 years)
Remote
  • Creating and implementing data analysis pipelines including data access, ingestion, munging/manipulation/cleansing, analysis/modeling, testing, and deployment/integration into business applications and services.

  • Enhancing operational aspects of businesses by increasing control of company data.

  • Working in cross-functional teams to provide data-driven solutions for increased efficiency and productivity.

Fraunhofer Institute for Integrated Circuits
Data Scientist
2017 - 2018 (1 year)
Dresden, Germany
  • Organized and analyzed large amounts of data by developing analysis pipelines, and visualized the results in dashboards for non-technical audiences.

  • Assisted the business decision-making process by implementing adaptive Machine Learning systems on large time series datasets.

  • Implemented software in a Scrum team and collaborated with engineering and product development teams to meet customer requirements.

AUDI AG
Data Scientist
2016 - 2016
Ingolstadt, Germany
  • Managed project activities in a goal-oriented manner, evaluating the knowledge transfer between domains of artificial and real images to minimize costs for labeled real data.

  • Enhanced Deep Learning approaches in Python to extrapolate patterns from large datasets and make predictions on new data.

  • Created Machine Learning tools that analyzed front car camera images to highlight objects with bounding boxes.

Otto-von-Guericke-University Magdeburg
Researcher | Data Scientist
2015 - 2015
Magdeburg, Germany
  • Worked on the development and implementation of intelligent flight behavior for aerial drones in a swarm intelligence research project.

  • Analyzed and interpreted patterns, recorded findings, and anticipated issues.

  • Conducted detailed research according to project requirements.

Inverso GmbH
Software & Database Engineer Intern
2013 - 2013
Ilmenau, Germany
  • Developed and implemented an ETL tool with Java and embedded SQL to extract data from the source systems.

  • Developed an automated data integration process to streamline and simplify data analysis.

  • Offered business insights to non-technical audiences by aggregating and analyzing relevant data.

Portfolio

Data Scientist - Deep Transfer Learning Approaches for Object Recognition (AUDI)

Deep Learning methods can achieve state-of-the-art results on challenging Computer Vision problems such as image classification, object detection, and face recognition. However, deep neural networks need large amounts of labeled data to train their large architectures. The main questions answered in this project were: a) to what extent artificial data is usable for training deep neural networks in object detection and classification, and b) how effectively the learned model transfers to real-world data. The main outcome of this project was a two-stage fine-tuning process: the models were initialized with pre-trained weights learned on ImageNet data, fine-tuned with mixed training data from both domains, and finally fine-tuned with subsets of the target domain data to reduce the amount of expensive data needed. The proposed approach required about half as much target domain data while yielding performance comparable to the real-world baseline. Technologies used: Computer Vision, Image Processing, Recording and Preprocessing of Imaging Data, Deep Learning, Python, scikit-learn, Caffe.

Data Scientist - ACME 4.0

The project combines new kinds of highly integrated sensor systems with innovative signal processing algorithms. The planned acoustic condition monitoring electronic (ACME) platform can also be quickly configured for use with other components via software. This platform makes it possible to quickly and conveniently design individual applications that incorporate intelligent, distributed, communicative, and self-adapting processes. Performed data analysis using Machine Learning algorithms and developed a preprocessing pipeline and classification models. Technologies used: Feature Engineering and Selection, Preprocessing, Classification Algorithms, Python, scikit-learn, PySpark.

Data Scientist - Employees Dashboard GCP

Google Cloud Platform (GCP), offered by Google, is a suite of cloud computing services that runs on the same infrastructure that Google uses internally for its end-user products, such as Google Search, Gmail, and YouTube. Alongside a set of management tools, it provides a series of modular cloud services including computing, data storage, data analytics, and machine learning. GCP provides infrastructure as a service, platform as a service, and serverless computing environments. Developed an example dashboard based on the GitHub employees dataset and GCP services. GCP offers a wide range of services for scalable data analysis and analytics pipelines, from ETL to visualization.

Data Scientist

The project involved recording thousands of input and sensor values from production machines. It is a multi-year project involving several partners. Analyzed data using Machine Learning algorithms and developed preprocessing and data analysis pipelines. Enhanced detection of machine breakdowns and optimized overall production performance. Analyzed and filtered potential features, and sanitized and preprocessed data. Technologies used: Feature Engineering and Selection, Preprocessing, Classification Algorithms, Python, scikit-learn, PySpark, Hadoop.

Data Scientist - Census International Dashboard

The International Data Base (IDB) was developed by the U.S. Census Bureau to provide access to accurate and timely demographic measures for populations around the world. The database includes a comprehensive set of indicators, as produced by the U.S. Census Bureau since the 1960s. This is an example dashboard built on the GCP Census Bureau International dataset and Google Cloud Platform services. GCP offers a wide range of services for building fast, scalable data analysis and analytics pipelines, from ETL to visualization.

Data Scientist - New York Citi Bike Dashboard

Citi Bike is New York City’s bike-share system and the largest in the nation. Citi Bike launched in May 2013 and has become an essential part of the transportation network. Citi Bike is available for use 24 hours/day, 7 days/week, 365 days/year, and riders have access to thousands of bikes at hundreds of stations across Manhattan, Brooklyn, Queens, and Jersey City. Developed an example dashboard based on the New York Citi Bike dataset and GCP services.

Education

MSc Data and Knowledge Engineering
Otto-von-Guericke University Magdeburg - Germany
2014 - 2016 (2 years)
BSc Computer Science
University of Applied Sciences Schmalkalden - Germany
2011 - 2014 (3 years)