Data Scientist - Deep Transfer Learning Approaches for Object Recognition (AUDI)
Deep learning methods achieve state-of-the-art results on challenging computer vision problems such as image classification, object detection, and face recognition, but training large network architectures requires massive amounts of labeled data. This project addressed two questions: a) to what extent artificial data can be used to train deep neural networks for object detection and classification, and b) how effectively the learned model transfers to real-world data. The main outcome was a two-stage fine-tuning process: models were initialized with weights pre-trained on ImageNet, fine-tuned on mixed training data from both domains, and finally fine-tuned on subsets of target-domain data to reduce the amount of expensive data required. The proposed approach cut the required target-domain data by about half while yielding performance comparable to the real-world baseline. Technologies used: Computer Vision, Image Processing, Recording and Preprocessing of Imaging Data, Deep Learning, Python, scikit-learn, Caffe.
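The two-stage schedule described above can be illustrated with a minimal sketch. It is not the project's Caffe implementation: a tiny logistic-regression model in pure Python stands in for the deep network, and the names `fine_tune`, `two_stage_fine_tune`, `real_fraction`, and the learning rates are hypothetical choices made only for illustration.

```python
import math
import random


def fine_tune(weights, data, lr=0.1, epochs=50):
    """Continue training a logistic-regression model from the given weights.

    `data` is a list of (features, label) pairs with label in {0, 1}.
    """
    w = list(weights)  # copy, so the starting weights are left untouched
    for _ in range(epochs):
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            p = 1.0 / (1.0 + math.exp(-z))
            for i, xi in enumerate(x):
                w[i] -= lr * (p - y) * xi  # gradient step on the log loss
    return w


def two_stage_fine_tune(pretrained, synthetic, real, real_fraction=0.5, seed=0):
    """Hypothetical two-stage schedule: mixed-domain data, then target subset."""
    rng = random.Random(seed)
    # Assumption: only a fraction of the expensive real-world data is used.
    k = max(1, int(len(real) * real_fraction))
    real_subset = rng.sample(real, k)

    # Stage 1: fine-tune the pre-trained weights on data from both domains.
    mixed = synthetic + real_subset
    rng.shuffle(mixed)
    stage1 = fine_tune(pretrained, mixed, lr=0.1)

    # Stage 2: fine-tune again on the target-domain subset only,
    # with a smaller learning rate (an illustrative choice).
    return fine_tune(stage1, real_subset, lr=0.05)
```

In this sketch, `pretrained` plays the role of the ImageNet-initialized weights, `synthetic` the artificial data, and `real` the target-domain data; `real_fraction=0.5` mirrors the roughly halved target-domain data mentioned above but is otherwise an arbitrary default.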
Data Scientist - ACME 4.0
The project combines new, highly integrated sensor systems with innovative signal processing algorithms. The planned acoustic condition monitoring electronic (ACME) platform can also be quickly reconfigured via software for use with other components. This platform makes it possible to quickly and conveniently design individual applications that incorporate intelligent, distributed, communicative, and self-adapting processes. Performed data analysis using Machine Learning algorithms and developed a preprocessing pipeline and classification models. Technologies used: Feature Engineering and Selection, Preprocessing, Classification Algorithms, Python, scikit-learn, PySpark.
Data Scientist - Employees Dashboard GCP
Google Cloud Platform (GCP) is a suite of cloud computing services that runs on the same infrastructure that Google uses internally for its end-user products, such as Google Search, Gmail, and YouTube. Alongside a set of management tools, it provides a series of modular cloud services including computing, data storage, data analytics, and machine learning. GCP provides infrastructure as a service, platform as a service, and serverless computing environments. Developed an example dashboard based on the GitHub employees dataset and GCP services, which cover scalable data analysis and analytics pipelines from ETL to visualization.
Data Scientist
The project involved recording thousands of input and sensor values from production machines. It is a multi-year project involving several partners. Analyzed and filtered potential features, and sanitized and preprocessed the data. Analyzed data using Machine Learning algorithms and developed preprocessing and data analysis pipelines. Enhanced detection of machine breakdowns and optimized overall production performance. Technologies used: Feature Engineering and Selection, Preprocessing, Classification Algorithms, Python, scikit-learn, PySpark, Hadoop.
Data Scientist - Census International Dashboard
The International Data Base (IDB) was developed by the U.S. Census Bureau to provide access to accurate and timely demographic measures for populations around the world. The database includes a comprehensive set of indicators, as produced by the U.S. Census Bureau since the 1960s. Developed an example dashboard based on the GCP Census Bureau International dataset and Google Cloud Platform services, which support fast, scalable data analysis and analytics pipelines from ETL to visualization.
Data Scientist - New York Citi Bike Dashboard
Citi Bike is New York City’s bike-share system and the largest in the nation. Citi Bike launched in May 2013 and has become an essential part of the transportation network. Citi Bike is available for use 24 hours/day, 7 days/week, 365 days/year, and riders have access to thousands of bikes at hundreds of stations across Manhattan, Brooklyn, Queens, and Jersey City. Developed an example dashboard based on the New York Citi Bike dataset and GCP services.