Denis V.

Data Scientist/Data Engineer

Russia

About Me

Denis is a senior full-stack AI engineer and data scientist, highly skilled in modern generative tech (GPT-4, Midjourney, etc.), machine learning, ETL pipelines, data analysis, mathematical modeling, big data, and MLOps. He has a Ph.D. in mathematics, and his data science expertise includes probabilistic risk modeling, revenue forecasting, geospatial data analysis, handwriting recognition, anomaly detection in time series, data engineering, and team leadership.

Work history

OkGPT
Founder, CTO
2023 - Present (1 year)
Remote
  • Created OkGPT, an AI personal assistant bot. It accepts a text, voice, or video message, transcribes it, and sends the result to GPT-4.

  • Integrated multiple APIs, including Telegram, OpenAI, Google, Redis, Amplitude, and more.

  • Optimized the code to process user queries in parallel, increasing the bot's performance by an order of magnitude.

  • Supervised three team members and a few external collaborators.

  • Implemented support for most languages in text and voice messages.

  • Set up continuous integration/deployment procedures.

Python 3, Telegram Bot API, Telegram Bots, Telegram Messenger API, AsyncIO, Python AsyncIO, Async/Await, OpenAI GPT-4 API, OpenAI GPT-3 API, GPT, Generative Pre-trained Transformer 3 (GPT-3), Generative Pre-trained Transformers (GPT), Speech to Text, Google Speech API, Text to Speech (TTS), Speech Recognition, Speech Synthesis, Natural Language Processing (NLP), Deep Learning, Google API, Google APIs, AIOps, Machine Learning Operations (MLOps), Python, DataDog, Amplitude, Selenium, Selenium API, Redis, PostgreSQL, SQLAlchemy, Docker, Docker Compose, Railway, Poetry, Mypy, CI/CD Pipelines, Git, Github, API Hooking, Web Search, Cloud, Continuous Deployment, Continuous Integration (CI), Continuous Delivery (CD), Dashboards, Data Analytics, Analytics, Customer Retention, User Retention, PDF Scraping, Databases, CircleCI, Slack API, Unit Testing, PyTest, pylint, Asynchronous I/O, Coroutines
Turn LLC
Senior Python Engineer
2022 - 2023 (1 year)
Remote
  • Developed an app that translated English text into a special pseudo-phonetic alphabet.

  • Acted as a consultant to help define the deliverables and then the overall architecture of the translation tool.

  • Helped to define text annotation requirements and oversaw the annotation process.

Programming, Python, Generative Pre-trained Transformers (GPT), GPT, Natural Language Processing (NLP), CSV
Israel-based HR Tech Startup
Freelance Senior Data Scientist and Data Engineer
2021 - 2022 (1 year)
Remote
  • Created a BI dashboard to query and summarize a large volume of semi-structured data.

  • Set up and tuned an Elasticsearch cluster and Kibana on AWS cloud.

  • Developed an ETL pipeline to ingest a terabyte of raw data into Elasticsearch.

  • Architected and directed the creation of a core similarity engine to score candidates.

  • Created a Big Data pipeline in Databricks and Spark to enrich the input data and prepare the features for ML.

  • Used pre-trained NLP deep neural networks to create semantic text embeddings, which significantly improved the quality of the similarity engine's results.

  • Developed a score using Spark GraphX to measure the company's attractiveness in the job market.

  • Prepared custom deep learning models to build richer embeddings, including various data sources and metadata.

  • Led communications with external data providers and created infrastructures to interface with their APIs.

  • Drove the implementation of DevOps and MLOps best practices to improve the reliability and reproducibility of the ETL, feature generation, model training, and inference subsystems.

Elasticsearch, Kibana, Amazon S3 (AWS S3), Amazon EC2, AWS CLI, Visual Studio Code (VS Code), Jupyter Notebook, Flask, Databricks, Spark, Delta Lake, Parquet, JSON, Star Schema, FAISS, Apache Spark, User-defined Functions (UDF), Spark SQL, Deep Neural Networks, Generative Pre-trained Transformers (GPT), GPT, Natural Language Processing (NLP), Machine Learning Operations (MLOps), Python, Python 3, Spark ML, JIRA, Monday.com, Big Data, Data Science, Data Engineering, Machine Learning, Data Wrangling, ETL, GraphX, Cloud, Data Analytics, Data Integration, MongoDB, Deep Learning, Objectives & Key Results (OKRs), Keras, Algorithms, XGBoost, Predictive Modeling, Programming, Recommendation Systems, Generative Pre-trained Transformer 3 (GPT-3), Hugging Face, PostgreSQL, Azure Databricks, Data Analysis, Linear Regression, Random Forests, Random Forest Regression, Data Modeling, Pattern Matching, Language Models, Data Matching, LSTM, Forecasting, Amazon Web Services (AWS), Data Build Tool (dbt), DevOps, Mathematical Analysis, Full-stack Architecture, Pytorch, CSV, Amazon Machine Learning, Amazon DynamoDB, API Integration, Databases
US-based Ops/Tech Startup
Freelance Senior Data Scientist and Data Engineer
2020 - 2021 (1 year)
Remote
  • Built a foundational end-to-end machine learning solution that predicts fair prices of real-estate properties, eliminating the need for manual assessment and enabling the company to respond quickly to its customers.

  • Designed and implemented an automatically refreshing ETL pipeline that ingests, cleans, joins, and enriches new data from AWS S3 storage daily.

  • Developed an interpretable machine learning model with Scikit-learn, CatBoost, Lifelines, FBProphet, FAISS, and SHAP that consists of several submodels and satisfies business monotonicity constraints.

  • Set up a continuous machine learning procedure for daily model retraining and redeploying based on the newly collected data.

  • Designed and implemented an automatic model promotion mechanism to ensure that models produced via the daily retraining process get deployed to production only if they have sufficiently good performance metrics and satisfy business constraints.

  • Created a historical data simulation system to generate synthetic data before the company’s launch and enable backtesting capabilities.

  • Architected and built the required infrastructure in AWS cloud: EC2 instances and VPCs, Docker environments for development, testing, and production, an Airflow pipeline for ETL and ML, and MLFlow model storage.

  • Created various dashboards for data exploration, data quality management, model performance monitoring, and prediction visualization.

  • Supervised other data science team members and coordinated with the engineering team.

Amazon S3 (AWS S3), Amazon RDS, Amazon EC2, AWS Elastic Beanstalk, AWS Lambda, Amazon SageMaker, Grafana, MLFlow, Apache Airflow, Metabase, Amazon CloudWatch, AWS CloudFormation, Airtable, Amazon Elastic Container Registry (Amazon ECR), CircleCI, Git, Docker, Docker Compose, Python, Dagster, Jupyter, Jupyter Notebook, Time Series, Predictive Modeling, Analysis, Dask, GIS, Machine Learning, Data Science, Data Engineering, JIRA, AWS CLI, Flask, SQL, PostgreSQL, SQLAlchemy, Pandas, Numpy, CatBoost, Data pipelines, Data Architecture, ETL, Data Visualization, Data Validation, Data Wrangling, Modeling, Amazon EBS, Solution Architecture, Software Architecture, Cloud, Data Analytics, Data Integration, Data Inference, Business Intelligence (BI), ETL Tools, APIs, Python 3, Machine Learning Operations (MLOps), ARIMA, Algorithms, XGBoost, Programming, Geopandas, Data Analysis, Linear Regression, Random Forests, Random Forest Regression, Data Modeling, Data Matching, Forecasting, Amazon Web Services (AWS), DevOps, Mathematical Analysis, Technical Leadership, Full-stack Architecture, CSV, Amazon Machine Learning, API Integration, Databases
Netology
Lecturer
2019 - 2020 (1 year)
Remote
  • Prepared and lectured a course on calculus for data scientists.

  • Developed and presented a course on linear algebra for data scientists.

  • Designed, prepared, and lectured a course on probability theory for data scientists.

Mathematical Modeling, Machine Learning, Data Science, Mathematics
KPMG
Senior Data Scientist
2017 - 2019 (2 years)
Remote
  • Created a machine learning model that predicted revenues for a retail store chain based on store location, local demographic data, GIS features, seasonality, and other factors.

  • Developed and deployed an interpretable machine learning model that scored B2B customers for payment default risk and provided explanations for the scores. The model substantially reduced the workload of weekly risk assessment.

  • Built a probabilistic Bayesian machine learning model to predict which apartment buildings still under construction would fail to be commissioned in time. The model helped halve the funds needed to hedge risks.

  • Developed and deployed NLP models to automatically label a vast body of housing contracts by contract type and extract contractor party names, address entities, and other attributes.

  • Constructed and deployed a model to predict the problematic clogging of the evaporator in a chemical factory. This allowed for the timely preemptive service of the unit before it broke down, saving millions of dollars in production time.

  • Led and mentored a team of junior and mid-level data scientists in the projects mentioned above.

  • Communicated with clients, ensuring business goals were correctly translated into data science and machine learning tasks, and explained insights and models to clients.

  • Architected ETL pipelines, including data acquisition, data ingestion, merging internal and external datasets, data cleaning and validation, data transformation, and feature engineering on several distinct projects.

  • Designed model performance metrics and their measurement protocols on several distinct projects.

  • Developed an ML system for a retail bank to recommend bank products to clients based on their past transaction patterns. This included building an ETL pipeline and an ML recommendation system.

Data Science, Mathematical Modeling, Machine Learning, Big Data, Artificial Intelligence (AI), Apache Hive, Hadoop, Spark, Dash, Plotly, Pandas, Git, Jupyter Notebook, Python, Pytorch, Tensorflow, Pyspark, Spark SQL, Data Wrangling, Docker, Docker Compose, SQL, Data Analysis, CatBoost, XGBoost, Time Series, Anomaly Detection, Analysis of Variance (ANOVA), Keras, CircleCI, Bayesian Statistics, Data Analytics, Business Intelligence (BI), Solution Architecture, Software Architecture, Data Integration, Data Inference, ETL, ETL Tools, APIs, Deep Learning, Python 3, Machine Learning Operations (MLOps), Natural Language Processing (NLP), GPT, Generative Pre-trained Transformers (GPT), Deep Neural Networks, Apache Spark, OCR, ARIMA, Algorithms, Predictive Modeling, Programming, Image Processing, Recommendation Systems, PostgreSQL, Azure, Geopandas, Linear Regression, Random Forests, Random Forest Regression, Data Modeling, Computer Vision, Handwriting Recognition, Language Models, ARIMA Models, SARIMA, LSTM, Forecasting, R, Prefect, DevOps, Statistical Analysis, Google Cloud Platform (GCP), Mathematical Analysis, Technical Leadership, Full-stack Architecture, Manufacturing, CSV
National Research University — Higher School of Economics
Centre for Advanced Studies (CAS) - Postdoctoral Researcher
2015 - 2017 (2 years)
Remote
  • Invented a novel mathematical method for cross-frequency synchronization analysis in the human brain.

  • Implemented the method as a MATLAB toolbox and ran tests confirming that the results agreed with previously known scientific data.

  • Prepared and published the method and findings in a top-level journal.

  • Supervised the master's degree projects of several students.

  • Lectured a master's-level course on computational neuroscience.

Python, Matlab, Mathematical Modeling, ETL, Data Preparation, Signal Processing, Medical Imaging, EEG, EEG Libraries for Python, Software Architecture, Solution Architecture, Research, Science, Data Science, Life Science, APIs, Mathematical Analysis
University of Rome (Tor Vergata)
ERC Advanced Grant Postdoctoral Researcher
2013 - 2015 (2 years)
Remote
  • Discovered a new geometric phenomenon accountable for the rigidity of certain mathematical models related to heat conduction in crystals.

  • Discovered a new stability property of attractors of multidimensional piecewise isometry maps related to Markov field models.

  • Discovered that almost every interval translation map of three intervals is of finite type.

  • Prepared papers describing the findings and published them in high-ranking journals.

Mathematical Modeling, Mathematics, Research, Science, Mathematical Analysis
Institute for Basic Research in Developmental Disabilities
Visiting Scientist (Consultant in Mathematics and 3D Scanning)
2013 - 2015 (2 years)
Remote
  • Created and verified mathematical models for the growth of blood vessels in the human placenta during the gestation period.

  • Developed the protocol for 3D data collection, including 3D surface scanning and micro-CT scans of the specimen.

  • Created an ETL pipeline to clean up and preprocess the collected samples.

  • Analyzed the collected data, fitted the mathematical models, and interpreted the findings.

Data Science, Data Processing, 3D Reconstruction, 3D Scanning, Mathematical Modeling, Machine Vision, Python, Matlab, ETL, Research, Science, Life Science, Medical Imaging, APIs, Algorithms, Programming, Data Analysis, Data Modeling, Computer Vision, Healthcare, Statistical Analysis, Mathematical Analysis
KTH Royal Institute of Technology
Göran Gustafsson Postdoctoral Researcher
2012 - 2013 (1 year)
Remote
  • Established that the rotation numbers of circle maps' semigroups define their generators.

  • Discovered a fractal structure of attractors of piecewise isometry maps related to Markov field models.

  • Prepared the papers describing the findings and published them in leading journals.

  • Lectured a PhD-level course on the structural stability of dynamical systems.

Matlab, Mathematical Modeling, Mathematics, Research, Science, Mathematical Analysis
SISSA
Postdoctoral Researcher
2010 - 2012 (2 years)
Remote
  • Discovered a new class of dynamical systems that have persistent massive attractors.

  • Established a deep relationship between skew product dynamical systems over Markov chains and nonlinear random walks.

  • Prepared the papers describing the findings and published them in major journals.

Mathematical Modeling, Mathematics, Science, Research, Mathematical Analysis
A4Vision
Researcher and Software Engineer
2004 - 2007 (3 years)
Remote
  • Designed and implemented algorithms for face detection on a 2D image, facial features detection, and alignment on a 3D surface.

  • Implemented and tuned 3D surface reconstruction algorithms.

  • Built and implemented statistical test procedures for new machine learning algorithms for face recognition.

  • Constructed and implemented a system for automatic test report generation based on log parsing.

  • Implemented and managed the automated build system on a dedicated build server.

  • Calibrated optical cameras and lasers for 3D scanners.

  • Migrated the algorithmic core to an embedded platform.

Git, Mathematical Modeling, Artificial Intelligence (AI), Embedded Development, Neural Networks, Machine Learning, Subversion (SVN), Matlab, C++, Machine Vision, 3D Reconstruction, Research, Software, Software Architecture, Solution Architecture, Algorithms, Programming, Image Processing, Computer Vision, Mathematical Analysis, Software Development, Computer Science
Parascript
Intern
2000 - 2004 (4 years)
Remote
  • Developed the mathematical foundations for novel machine learning methods for handwritten text recognition.

  • Implemented novel methods for handwritten text recognition as a C++ library.

  • Presented the research at scientific conferences and seminars.

Mathematical Modeling, Artificial Intelligence (AI), Machine Learning, OpenCV, C++, OCR, Research, Science, Software, Algorithms, Image Processing, Linear Regression, Computer Vision, Handwriting Recognition, Language Models, Mathematical Analysis, Computer Science
Generative Tech Startup (ChatGPT)
AI Engineer
Present
Remote
  • Developed an MVP of an app to query enterprise data in natural language. Given access to a database and a question in natural language about the data, the app would output the answer as a plot or a small table.

  • Engineered and fine-tuned the prompts to improve the quality and correctness of SQL code generation.

  • Created an automatic annotator for the database columns and the final table.

Pandas, Big Data, Generative Pre-trained Transformers (GPT), GPT, Natural Language Processing (NLP), Text Generation, Code Generators, SQL, Language Models, Fine-tuning, CSV, ChatGPT, OpenAI, API Integration, OpenAI GPT-3 API, Chatbots, Databases
Blockchain Security Company
Senior Machine Learning Engineer
Present
Remote
  • Created a machine learning model to automatically detect malicious smart contracts before they can cause harm.

  • Built a visualization tool for model output to audit its decisions.

  • Deployed the model to the AWS cloud platform as a serverless Lambda function.

Blockchain, Ethereum Smart Contracts, Smart Contracts, AWS Lambda, Generative Pre-trained Transformers (GPT), GPT, Natural Language Processing (NLP), Linear Regression, Decision Tree Regression, CSV, Databases
Skillbox
Data Science Evangelist
Present
Remote
  • Reviewed and improved core courses in mathematics, data science, and machine learning.

  • Supervised the creation of new courses, including video lectures and exercises, on Data Science, Analytics, SQL, Power BI, and Tableau.

  • Recruited, interviewed, and screened lecturers and tutors for new courses.

Pandas, Mathematical Modeling, Machine Learning, Data Science, Mathematics, Python 3, Algorithms, XGBoost, Linear Regression, Random Forest Regression, Data Analysis
Artec Group
Researcher and Software Engineer
Present
Remote
  • Designed and implemented biometric machine learning face recognition algorithms.

  • Created and implemented statistical test procedures for new recognition algorithms.

  • Developed calibration procedures for 3D laser and flash scanners.

  • Refactored Win32 code to make it cross-platform.

  • Implemented and tuned 3D surface reconstruction algorithms.

  • Applied software for barcode encoding and scanning.

  • Managed the build server and was responsible for the team's CI/CD process.

Mathematical Modeling, Artificial Intelligence (AI), Neural Networks, Machine Learning, Git, Matlab, wxWidgets, C++, 3D Reconstruction, Research, Software, Software Architecture, Solution Architecture, Algorithms, Programming, Image Processing, Computer Vision, Mathematical Analysis, Software Development, Computer Science

Portfolio

House Rental Price Prediction

An end-to-end cloud ML solution for highly accurate house rental price prediction and risk estimation. The solution was foundational to a Bay Area-based startup that aimed to "uberize" the house rental market. It included raw data ingestion, a complex ETL pipeline, a suite of predictive models, and MLOps processes including CI/CD, model and data versioning, and production model monitoring. I supervised a few other engineers who joined the project later to further improve the system.

Revenue Prediction for Retail Store Chain

Built a machine learning model that predicted revenues for a retail store chain based on store location, local demographic data, GIS features, seasonality, and other factors. I was the tech lead in a group of data scientists who ran the whole cycle: data extraction, web scraping, ETL, exploratory analysis, data preprocessing, feature engineering, machine learning, packaging the model as a standalone service, and implementing a dashboard.

Payment Default Risk Scoring

Built and deployed an interpretable machine learning model that scored B2B customers for payment default risk and provided explanations for the scores. The model substantially reduced the workload of weekly risk assessment. I was the tech lead in a group of data scientists who ran the whole cycle: data extraction, merging several different data sources, ETL, exploratory analysis, data preprocessing, feature engineering, machine learning, packaging, and deploying the model to the client's premises.

Probabilistic Model for Building Commission Times

Built a probabilistic Bayesian machine learning model to predict which apartment buildings still under construction would fail to be commissioned in time. The model helped halve the funds needed to hedge risks. In addition to typical data science project activities, which included data exploration, ETL, and ML, this project also involved setting up machinery for the explicit Bayesian inference of structured models using GPUs.

Education

Doctoral Degree in Mathematics
Lomonosov Moscow State University
2007 - 2010 (3 years)
Master's Degree in Mathematics
Lomonosov Moscow State University
1999 - 2004 (5 years)