Syed M.

Data Engineer

Karachi, Sindh

About Me

Muneeb has worked in the data management industry for over five years, focusing primarily on data engineering, warehousing, and analytics. He has served companies handling both relational and big data needs. An expert SQL query writer and data developer, Muneeb is passionate about building databases, ETL pipelines, and visualization dashboards. He also has hands-on experience with multiple cloud platforms, including Alibaba Cloud, Azure, and Google Cloud (BigQuery).

Skills

Python (7 years), SQL (7 years), ETL (7 years), Data Engineering (7 years), Data Warehousing (7 years), Databases (7 years), Query Optimization (7 years), Business Intelligence (BI) (7 years), SQL Stored Procedures (7 years), Query Plan (7 years), Data Integration (7 years), Data Pipelines (7 years), MySQL (7 years), Database Design (7 years), Data Migration (4 years), Data Analytics (7 years), Data Queries (7 years), Data Warehouse Design (4 years), Azure (3 years), Microsoft SQL Server (7 years), Alibaba Cloud (3 years), Realtime (3 years), Big Data (5 years), Apache Hive (4 years), MinIO (3 years), HDFS (Hadoop Distributed File System) (4 years), Azure Stream Analytics (3 years), Azure Logic Apps (2 years), ADF (3 years), Talend ETL (4 years), Microsoft Data Transformation Services (now SSIS) (3 years), Data Modeling (7 years), Cloud Computing (5 years), Flask-SQLAlchemy (3 years), DuckDB (3 years), PostgreSQL (4 years), GitHub (5 years)

Work history

Zoetis
Consultant Data Engineer
2024 - Present
Remote
  • Working on a data warehousing project that integrates various source streams using both batch and streaming processes on Databricks with Apache Spark, storing data in ADLS as Parquet for faster retrieval. The project eliminates multiple database and ETL hops and replaces Azure Stream Analytics with Spark on Databricks, improving both cost and time efficiency.

  • Designed and optimized ETL pipelines using ADF, ADLS Gen2, Databricks, and Azure SQL/Cosmos DB, while creating Databricks pipelines and Spark notebooks for batch and streaming data from IoT Hub.

  • Developed real-time data streams with Azure Stream Analytics, implemented CI/CD pipelines for ADF using GitHub, and established robust alerting and monitoring in ADF and Databricks to ensure reliable data workflows and streamlined deployments.

Data Engineering, Data Warehousing, ETL Implementation & Design, Data Integration (ELT/ETL), Azure, Azure Cloud Services, Azure Blob Storage, Azure Data Factory, Azure Data Studio, Azure Data Lake, Azure Databricks, Azure Cosmos DB, Python, Azure SQL, Azure SQL Data Warehouse (SQL DW), Azure Synapse, MSSQL Server, PostgreSQL, SQL Server Integration Services (SSIS), SQL Server Management Studio (SSMS), Flask-REST-JSONAPI, .NET API, Big Data Architecture, Big Data Architect, Azure Stream Analytics, Azure Service Bus, Azure Functions
Dataquartz
Lead Data Engineer
2022 - 2024 (2 years)
Remote
  • Led the development of an in-house data ingestion product built with Python, Flask, DuckDB, and Postgres, using Grafana for dynamic visualization and Prefect for ETL workflow management. Also containerized the entire application with Docker for better scalability and manageability.

  • Orchestrated end-to-end ETL pipelines, incorporating audit logging and data integrity checks along with failure/data discrepancy alerts.

  • Pioneered bug tracking, resolution, and new feature development in the data model.
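The audit-logging and integrity-check pattern described above can be sketched in plain Python; the function and field names here are illustrative assumptions, not actual Dataquartz code:

```python
# Illustrative sketch of an ETL row-count integrity check with audit logging.
# The tolerance and naming are assumptions for demonstration only.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl_audit")

def check_row_counts(source_rows: int, loaded_rows: int,
                     tolerance: float = 0.0) -> bool:
    """Compare source vs loaded row counts; write an audit record and flag an
    alert-worthy discrepancy when the difference exceeds the tolerance."""
    diff = abs(source_rows - loaded_rows)
    allowed = source_rows * tolerance
    log.info("audit: source=%d loaded=%d diff=%d", source_rows, loaded_rows, diff)
    if diff > allowed:
        log.error("data discrepancy: %d rows beyond tolerance", diff)
        return False  # caller raises an alert (e-mail, Grafana, etc.)
    return True
```

In a real pipeline the boolean result would feed the failure/discrepancy alerting mentioned above rather than just a log line.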

Data Engineering, Data Warehouse Design, Flask, Python 3, SQLAlchemy, DuckDB, PostgreSQL, Grafana, Prometheus, Database Design, Database Development, JIRA, Agile Sprints, ETL Development, Prefect, Apache Airflow, Big Data, MinIO, Cloud Deployment, CI/CD, GitHub
Seeloz
Data Engineering Manager
2022 - 2022
Remote
  • Developed ETL projects with SQL, PySpark, Scala, ADF, ADLS, and Azure Logic Apps, seamlessly extracting data from ERP systems and loading into the data models.

  • Implemented streaming data pipelines using Azure Stream Analytics to process real-time data while maintaining data integrity, and built a robust monitoring framework with PySpark, Postgres, and Grafana to verify data correctness.

  • Crafted impactful data visualization reports, providing insights into key business metrics.

Azure Logic Apps, Apache Hive, Google BigQuery, PySpark, Spark SQL, SQL, Database Design, Scala, Data Warehousing, Azure Blobs, Data Analysis, Data Engineering, Databricks, Query Optimization, Big Data Architecture, Data Pipelines, Data Quality Analysis, IntelliJ IDEA, Shell, Data Integration, Data Queries, Analysis, ETL, Business Intelligence (BI), Python 3, PyCharm, Big Data, Data Warehouse Design, Azure Cloud Infrastructure, ETL Tools, Databases, Python, CI/CD Pipelines, GitHub, Azure SQL, Data Analytics, Database Analytics, RDBMS, Data Processing, Business Intelligence (BI) Platforms, Azure SQL Databases, Azure SQL Data Warehouse (SQL DW), API Integration, SQL DML, SQL Performance, Performance Tuning, T-SQL (Transact-SQL), Reports, BI Reports, Apache Spark, Relational Databases, Data Modeling, Database Modeling, MariaDB, Business Logic, APIs, Data Architecture, Database Architecture, Logical Database Design, Database Schema Design, Relational Database Design, REST APIs, Azure Service Bus, JSON, Quality Management, MySQL, Dimensional Modeling, ELT, Pandas, Spark, Microsoft Azure, Schemas, Jupyter Notebook, Relational Data Mapping, BigQuery, Reporting, BI Reporting, Windows PowerShell, XML, AnyDesk, Apache Airflow, NoSQL, PostgreSQL, Database Optimization, DuckDB, Apache Flink, Data Extraction, CSV Export, CSV, Scripting, MongoDB, .NET
Daraz | Alibaba Group
Big Data Engineering and Governance Lead
2019 - 2022 (3 years)
Pakistan
  • Built and managed a DWH architecture, and wrote automated ETL scripts using HiveQL, HDFS, HBase, Python, and Shell on a cloud platform for data ingestions.

  • Developed BI dashboards on Power BI, vShow, and FBI to gauge important metrics related to domains like customer funnel, marketing, and logistics.

  • Developed and maintained an enterprise data warehouse and monitored data ingestion pipelines on a daily basis using SQL, Python, Flink, ODPS, and ETL flows.

Apache Hive, SQL, Alibaba Cloud, Data Engineering, Data Warehousing, Data Governance, Big Data, Python 3, Shell, Data Visualization, Business Intelligence (BI), Query Optimization, Data Integration, PostgreSQL, MySQL, Data Analysis, Big Data Architecture, Data Pipelines, Data Quality Analysis, Data Queries, Analysis, ETL, Database Design, Data Warehouse Design, Azure Cloud Infrastructure, ETL Tools, Databases, Python, CI/CD Pipelines, GitHub, Data Analytics, Database Analytics, RDBMS, Data Processing, Business Intelligence (BI) Platforms, Azure SQL Databases, Azure SQL Data Warehouse (SQL DW), Microsoft SQL Server, API Integration, SQL DML, SQL Performance, Performance Tuning, T-SQL (Transact-SQL), Reports, BI Reports, PySpark, Apache Spark, Relational Databases, Data Modeling, Database Modeling, Stored Procedure, Tableau, Dashboards, Dashboard Development, MariaDB, Business Logic, Microsoft Power BI, APIs, Data Architecture, Database Architecture, Logical Database Design, Database Schema Design, Relational Database Design, REST APIs, JSON, MySQL Workbench, Quality Management, IntelliJ IDEA, Dimensional Modeling, ELT, Pandas, Spark, Schemas, Jupyter Notebook, Relational Data Mapping, BigQuery, Reporting, BI Reporting, Windows PowerShell, XML, AnyDesk, Apache Airflow, NoSQL, Database Optimization, Hadoop, HDFS, Docker, Google BigQuery, Apache Flink, Data Extraction, CSV Export, CSV, Scripting, MongoDB
Qordata
Technical Consultant
2019 - 2019
Pakistan
  • Designed and developed end-to-end data ingestion pipelines to ensure data flow daily.

  • Implemented and managed data flow jobs for data modeling solutions relevant to the health and life science industry, using tools like SQL Server Integration Services (SSIS) and Microsoft SQL Server.

  • Developed SQL queries, stored procedures, and dynamic SQL and optimized existing complex SQL queries to speed up day-to-day processes.

SQL, SQL Server Integration Services (SSIS), SQL Server Management Studio, Data Analysis, Data Quality Analysis, Data Queries, Query Plan, Query Optimization, SQL Stored Procedures, Data Engineering, Data Pipelines, Shell, Data Integration, Analysis, ETL, Data Warehousing, Business Intelligence (BI), Database Design, Data Warehouse Design, ETL Tools, Databases, Data Analytics, Database Analytics, RDBMS, Data Processing, Business Intelligence (BI) Platforms, Microsoft SQL Server, SQL DML, SQL Performance, Performance Tuning, T-SQL (Transact-SQL), Relational Databases, Data Modeling, Database Modeling, Business Logic, Data Architecture, Database Architecture, Logical Database Design, Database Schema Design, Relational Database Design, Visual Studio, Quality Management, MySQL, Dimensional Modeling, ELT, Schemas, Jupyter Notebook, Relational Data Mapping, Reporting, BI Reporting, Windows PowerShell, NoSQL, PostgreSQL, Database Optimization, Data Extraction, CSV Export, CSV, Scripting, MongoDB, .NET
Afiniti
Data Engineer
2017 - 2019 (2 years)
Pakistan
  • Designed and developed a database architecture and data model for a business flow using Talend Open Studio, SSIS, and MySQL Workbench.

  • Performed large-scale data conversions, migrations, and optimization to reduce resource and time costs while maintaining data integrity.

  • Wrote SQL stored procedures and Python scripts for data quality checks and ad-hoc analyses.

SQL, MySQL, SQL Server Integration Services (SSIS), SQL Server Management Studio, Talend, Talend ETL, Data Engineering, Data Pipelines, Data Analysis, Analysis, Data Visualization, Business Intelligence (BI), Query Optimization, Data Quality Analysis, Shell, Data Integration, Data Queries, SQL Stored Procedures, ETL, Data Warehousing, Database Design, Python 3, Data Warehouse Design, ETL Tools, Databases, Python, Data Analytics, Database Analytics, RDBMS, Data Processing, Business Intelligence (BI) Platforms, Microsoft SQL Server, SQL DML, SQL Performance, Performance Tuning, T-SQL (Transact-SQL), Relational Databases, Data Modeling, Database Modeling, Stored Procedure, Business Logic, MariaDB, Microsoft Power BI, Data Architecture, Database Architecture, Logical Database Design, Database Schema Design, Relational Database Design, Visual Studio, MySQL Workbench, Quality Management, Dimensional Modeling, ELT, Pandas, Schemas, Jupyter Notebook, Relational Data Mapping, Reporting, BI Reporting, Windows PowerShell, AnyDesk, NoSQL, Database Optimization, Data Extraction, CSV Export, CSV, Scripting, .NET

Portfolio

Payment Risk Engine | COD Blocking

A system that identifies customers with poor buying histories and blocks the cash-on-delivery (COD) option for them. Previously, we had no way of tracking customer behavior, so many customers rejected delivered orders at their doorsteps, leaving Daraz to bear the failed logistics cost. This system let us block COD for such customers and require them to pay for their orders in advance. It involves a delicate trade-off: blocking COD increases gross-to-net revenue but can also shrink the customer base. I first conducted a thorough data analysis to quantify the impact on the business, then built the data pipelines and a performance dashboard to gauge the system's effect on Daraz's overall business.
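The blocking rule can be sketched as a simple threshold check; the thresholds and field names below are illustrative assumptions, since the actual Daraz rules are not public:

```python
# Hypothetical sketch of a COD-blocking decision. The minimum-history and
# rejection-rate thresholds are invented for illustration.
from dataclasses import dataclass

@dataclass
class CustomerStats:
    delivered_orders: int   # orders successfully handed over
    rejected_orders: int    # orders refused at the doorstep

def should_block_cod(stats: CustomerStats,
                     min_orders: int = 5,
                     max_rejection_rate: float = 0.5) -> bool:
    """Block COD once a customer has enough history and rejects too often."""
    total = stats.delivered_orders + stats.rejected_orders
    if total < min_orders:          # not enough history to judge fairly
        return False
    return stats.rejected_orders / total > max_rejection_rate

# A customer who rejected 4 of 6 orders would lose the COD option:
print(should_block_cod(CustomerStats(delivered_orders=2, rejected_orders=4)))  # True
```

The `min_orders` guard reflects the trade-off mentioned above: blocking too eagerly on thin history would shrink the customer base for little revenue gain.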

Delayed Order Notification System

An automated alert system that notifies customers about delayed orders based on specific logistics metrics, improving the customer experience. I developed the system's end-to-end data pipelines, designed the business flow, and built a BI dashboard to gauge its performance. Beyond the customer experience, the project helped measure Daraz's logistics performance and highlighted key metrics that needed fixing.
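A minimal sketch of the delay check behind such an alert, assuming a hypothetical 48-hour grace period and invented field names:

```python
# Illustrative delay-detection logic; the grace period and alert fields are
# assumptions, not Daraz's actual logistics rules.
from datetime import datetime, timedelta

def is_delayed(promised: datetime, now: datetime,
               grace: timedelta = timedelta(hours=48)) -> bool:
    """An undelivered order counts as delayed once it exceeds the promised
    date plus a grace period."""
    return now > promised + grace

def build_alert(order_id: str, promised: datetime, now: datetime):
    """Return the notification payload for a delayed order, else None."""
    if not is_delayed(promised, now):
        return None
    return {"order_id": order_id,
            "message": "Your order is delayed; we apologise for the wait.",
            "hours_overdue": round((now - promised).total_seconds() / 3600)}
```

In production the `build_alert` output would be handed to a notification pipeline rather than returned directly.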

Dashboard Usage Analysis

Every data visualization dashboard consumes computing and memory resources, so knowing how much of the assigned cloud quota the dashboards consume is imperative in the eCommerce industry. Daraz currently has more than 700 dashboards; when they are refreshed daily, they consume significant resources and slow down other processes. I therefore needed to identify which dashboards were used most frequently and which could be decommissioned to save resources. I created a meta dashboard that ranked the dashboards by tracking daily, weekly, and monthly active users and their visits. It also tracked each user's history across multiple dashboards, i.e., how many dashboards a particular user regularly visits, which helped us filter out the executives' dashboards.
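The ranking logic behind such a meta dashboard can be illustrated in a few lines; the visit-log field names are assumptions:

```python
# Toy version of the meta-dashboard ranking: given visit logs, rank dashboards
# by the number of distinct active users. Field names are illustrative.
from collections import defaultdict

def rank_dashboards(visits: list) -> list:
    """visits: [{'dashboard': ..., 'user': ..., 'date': ...}, ...]
    Returns (dashboard, distinct_users) pairs, most-used first."""
    users = defaultdict(set)
    for v in visits:
        users[v["dashboard"]].add(v["user"])   # distinct users per dashboard
    return sorted(((d, len(u)) for d, u in users.items()),
                  key=lambda pair: pair[1], reverse=True)

visits = [
    {"dashboard": "marketing", "user": "a", "date": "2021-01-01"},
    {"dashboard": "marketing", "user": "b", "date": "2021-01-01"},
    {"dashboard": "logistics", "user": "a", "date": "2021-01-01"},
]
print(rank_dashboards(visits))  # [('marketing', 2), ('logistics', 1)]
```

Filtering the same logs by date window gives the daily/weekly/monthly active-user views mentioned above.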

Enterprise Data Warehouse

At my previous company, Afiniti, multiple clients used the Afiniti engine to optimize their call center performance through data-driven customer-agent pairing. The legacy enterprise data portal that Afiniti used to gauge clients' performance had limitations: there was no change data capture or historical analysis of clients, and the optimization metrics used to calculate a client's performance, such as handle time and wait time, were not recorded historically. The enterprise data warehouse (EDW) addresses all of these limitations and adds features such as a standardized model that fits different business requirements without any change in architecture. It let us track historical changes to clients' performance and provided a holistic view of all clients in a single portal at any time. I built the entire data warehouse from scratch, including all data pipelines and the dimensional modeling.
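The historical tracking the legacy portal lacked is commonly implemented as Type 2 slowly changing dimensions; the sketch below shows that general pattern with assumed column names, not the actual EDW code:

```python
# Minimal Type 2 slowly-changing-dimension upsert: instead of overwriting a
# client's metrics, close out the old row and append a new versioned one.
# Column names (client_id, valid_from, valid_to) are illustrative assumptions.
from datetime import date

def apply_scd2(dim: list, client_id: str, new_metrics: dict, today: date) -> None:
    """Close the client's current row (if its metrics changed) and open a new one."""
    current = next((r for r in dim
                    if r["client_id"] == client_id and r["valid_to"] is None), None)
    if current is not None:
        if all(current[k] == v for k, v in new_metrics.items()):
            return                      # nothing changed; keep the open row
        current["valid_to"] = today     # close out the previous version
    dim.append({"client_id": client_id, **new_metrics,
                "valid_from": today, "valid_to": None})
```

Querying rows where a given date falls between `valid_from` and `valid_to` then reconstructs any client's metrics as of that date, which is exactly the historical analysis described above.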

Data Pull from Dynamics 365 Using Azure Logic Apps

A data integration pipeline that pulls data from selected data entities in Microsoft Dynamics 365 into our supply chain meta-model at Seeloz. I developed the pipeline in Azure Logic Apps to fetch data from the entities and load it into Azure Blob Storage, where our downstream ETL could consume it. All communication went through Azure Service Bus. The app was triggered by an HTTP POST request, with the required arguments passed in a JSON payload, and error handling and logging were implemented at each step.
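A hypothetical caller for the HTTP trigger described above; the payload fields and the trigger URL are illustrative assumptions, not the actual Seeloz interface:

```python
# Sketch of invoking an HTTP-triggered Logic App with a JSON payload.
# The field names (entity, target_container) are invented for illustration.
import json
import urllib.request

def build_trigger_payload(entity: str, container: str) -> bytes:
    """JSON arguments passed to the Logic App via its HTTP POST trigger."""
    return json.dumps({"entity": entity,
                       "target_container": container}).encode("utf-8")

def trigger_data_pull(trigger_url: str, entity: str, container: str) -> int:
    """POST the payload; the Logic App then pulls the Dynamics 365 entity
    into Azure Blob Storage. Returns the HTTP status code."""
    req = urllib.request.Request(trigger_url,
                                 data=build_trigger_payload(entity, container),
                                 headers={"Content-Type": "application/json"},
                                 method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

In the real pipeline the Logic App's Request trigger parses this payload, and each subsequent step carries its own error handling and logging as described above.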

Education

Master's Degree in Computer Science
National University of Computer and Emerging Sciences
2018 - 2021 (3 years)
Bachelor's Degree in Computer Science
National University of Computer and Emerging Sciences
2013 - 2017 (4 years)