
Summary

  • Hands-on Cloud Data Technologist and data blogger/author with 17+ years in data engineering/architecture and 8+ years in data science/machine learning.
  • Designed & developed many data-intensive technology solutions using various tools in data architecture, data science, big data & cloud.
  • Translated complex business problems into technology & analytics solutions for data-driven decision-making.
  • Demonstrated knowledge of the business success drivers, industry trends, regulatory issues, and competitive marketplace.
  • Continuously in sync with the latest developments, methodologies and best practices in the data analytics field.
  • Participating in Kaggle (Data Science) competitions since 2014; achieved Kaggle Expert level in 2017.
  • Experienced in leading technical teams and handling technical & business stakeholders.
  • Familiar with project management aspects (scope, time & budget) of deliverables.
  • Built analytics teams from scratch by hiring & developing analytics professionals. 
  • Experienced in full SDLC (Requirement Analysis, Estimation, Architecture/Design Specifications, Coding/Code Reviews, Testing and Deployment) with Agile/Scrum and Waterfall methodologies.
  • Completed 20+ project module implementations (end to end) with 9 clients/businesses.
  • B.Tech (Electronics) from Harcourt Butler Technological Institute (HBTI), Kanpur in 2005.
  • Possesses strong analytical, problem-solving and communication/presentation skills.

Skills Overview

  • Development: SQL, Python, PySpark, Scala
  • Big Data/Hadoop/Spark: HDFS, YARN/MapReduce, Pig, Hive, Cassandra
  • Cloud Computing: AWS, Azure, Google Cloud Platform (GCP)
  • DevOps/MLOps (CI/CD): Kubernetes, Jenkins
  • Data Science/AI: Classification/Regression, Clustering/Associative Mining/PCA, NLP, Image/Video Analytics
  • BI/Reporting: PowerBI/Tableau/OBIEE
  • Data Governance: Data Quality, Master/Reference Data, Metadata management
  • Architecture: Enterprise (Business/Application/Data/Technology) Architecture (TOGAF/ArchiMate2)
  • Design/Modeling: Database Design (Oracle/Teradata/MySQL), ER/Dimensional Modeling (Erwin/Visio), ETL Designs (Informatica), Normalization, De-normalization
  • Functional/Domain: Banking (AML, Mortgages), Health Insurance & ATI

Work experience

Principal Engineer

Dec '20 – Till Date
Diebold Nixdorf

Project: Data and AI Platform (Dec '20 – till date)

Project Environment: SQL, Python, Spark (SQL/PySpark/MLlib), Azure (ADLS2, ADF, HDInsight, Databricks, Purview), DevOps/MLOps (Azure).

Domain: FinTech 

Team-size: 15

DN Banking and Retail portfolios/products (Vynamic, AllConnect) require in-built intelligence to collect, process & analyze data so that our customers can take appropriate actions in a timely manner. The objective of the programme is to optimize operations & improve decision-making using cutting-edge technologies like ML & AI.

Achievements as Cloud Data Technologist:

  • Establishing data practice and designing/building a modern data platform.
  • Working on Security Recommendation Engine, Customer Segmentation, Cash Optimization, DN Smart Search Engine.
  • Building scalable data pipelines to deliver analytics for consumption in production.
  • Integrating ML/DL models to existing products & portfolios.
  • Mentoring data engineers & data scientists to build the platform and analytics models.

Lead Architect

Dec '17 – Dec '20
SITA.aero

Project: Data Science Platform (Dec '17 – Dec '20)

Project Environment: SQL, Python, AWS (EMR, S3, Kinesis, Glue, Athena), Spark (SQL/PySpark/MLlib), Cassandra, Sqoop/Kafka, Tableau, DevOps (Kubernetes/GoCD).

Domain: Air Travel Industry (ATI) 

Team-size: 11

SITA portfolios/products (FlightOps/AMS) require in-built intelligence to collect, process & analyze data so that our customers can take appropriate actions in a timely manner. The objective of the programme is to optimize operations & improve decision-making using cutting-edge technologies like ML & DL.

Achievements as Data Science Architect:

  • Built end-to-end Data Science Platform integrating various data sources to the data lake.
  • Worked on Flight Prediction, Stand Allocation & TAT Prediction ML/DL projects.
  • Built scalable data pipelines to deliver analytics for consumption in production.
  • Integrated ML/DL models to existing products & portfolios.
  • Mentored fellow data engineers & data scientists to build the platform and analytics models.

Principal Consultant

Jun '16 – Dec '17
Genpact HCM

Project: Digital Analytics Platform (Jun '16 – Dec '17)

Project Environment: Python/R/Unix, Keras/TensorFlow, Azure, Spark (SQL/PySpark/MLlib), MongoDB, PowerBI.

Domain: Finance & Accounting (F&A), Insurance 

Team-size: 16

We started the Digital Analytics Platform to help executives improve business decisions using data and to automate, via Analytics/ML models, exceptions previously handled by operators. We studied the current business processes, identified Analytics/ML opportunities, and designed/developed prediction and recommendation models to deliver tangible business benefits.

Achievements as Data Scientist/Data Architect 

  • Built an end-to-end Digital Analytics Platform integrating different data sources into the data lake.
  • Designed & deployed ML/DL models using Python & R.
  • Built models for Case Recommendation System, Dashboard Forecasting.
  • Guided analytics professionals to build the platform and analytics models.

Technical Lead

Nov '10 – Jun '16
RBS IDC

Project: Advanced Analytics - Retail Banking (Jul '13 – Jun '16)

Project Environment: SQL/R/ODM, Cloud (GCP), Unix/Java, Hadoop (HDFS/YARN/HBase/Pig/Hive/Sqoop/Flume), QlikView.

Domain: Retail Banking (AML/Mortgages) 

Team-size: 12

We started the Advanced Analytics - Retail Banking programme to build Data Science capability within the bank, explore the possibilities of processing/analysing existing data via Big Data/Cloud, and compare the results with traditional data processing/analysis before the actual migration projects.

Achievements as Data Specialist 

  • Evaluated different Hadoop/ML/Cloud components for data collection, processing, analysis & reporting.
  • Built analytics platform on the cloud using Hadoop by ingesting data from traditional data sources.
  • Translated complex business problems to analytical solutions.
  • Mentored team members on Big Data, Cloud & Analytics.
  • Built models for Customer Profiling, Market Basket Analysis & Anomaly Detection.

Project: MTP (Business/Technology Transformation) (Nov '10 – Jul '13)

Project Environment: Oracle 11g, UNIX server, Java/J2EE, Informatica, OBIEE, Teradata.

Domain: Mortgages (Retail Banking) 

Team-size: 15

MTP stands for Mortgage Transformation Programme, which focused on Business Architecture driven technology transformation. Under the programme, based on the current and aspired Business Architecture, we transformed the Application/Data/Technology Architecture to consolidate all mortgage systems into a single strategic one and introduced Fee & Product configuration across the Mortgages platform.

Achievements as Database Architect

  • Built data architecture for business requirements (functional & non-functional).
  • Derived design (Visio/ErWin) changes in current Data Architecture (OLTP/ETL/OLAP).
  • Performed Data Analysis/Mining using ODM and R to gain in-depth insights.

Sr. Software Engineer

Oct '07 – Nov '10
Mastek Ltd

Project: Apollo Munich (Oct '07 – Nov '10)

Client: Apollo Munich Health Insurance, Gurgaon

Project Environment: Oracle 10g with Report server, UNIX Server, Java/J2EE.

Domain: Health Insurance 

Team-size: 8

The Apollo Munich application was a policy life-cycle management solution designed by Mastek for Apollo Munich, delivering a group-wide, customer-centric system to handle membership, finance, reinsurance, claims and payment processing, with the ability to support multiple products/brands. We built an MIS dashboard for the business to track how it was growing.

Achievements as ETL/BI Lead

  • Designed & built a data warehouse (DWH) for business.
  • Built ETL pipelines from DBs to DWH.
  • Designed & built MIS reports based on business requirements.

Associate

Jul '05 – Oct '07
Perot Systems

Project: LGRS Application (Jul ‘05 – Oct ‘07)

Client: Blue Cross Blue Shield of Rhode Island (BCBSRI)

Project Environment: Oracle 9i (SQL, PL/SQL), VB 6.0, UNIX Server.

Domain: Healthcare 

Team-size: 6

The project involved the Rating System for Large Group claims in Healthcare for BCBSRI. With the help of the application, BCBSRI evaluated its customers' performance on a monthly basis. The application pulled data from a data mart using SQL*Loader, and the tables maintaining customer information and claims for BCBSRI were manipulated by PL/SQL modules.

Achievements as DB Developer

  • Built rating system using SQL queries & PL/SQL programming.
  • Created MIS reports for business using SQL queries.

Education

Continuous Self-learning

2012 – till now
Books, MOOCs, Blogs
  • Attending the 'Machine Learning System Design' (CS-329S) course on the Stanford portal in 2022.
  • Learnt from 'MS in Data Science' (CS-109A/B) courses on Harvard University Portal in 2018.
  • Attended Deep Learning course by Vincent Vanhoucke on Udacity in 2016.
  • Learnt Probability & Statistics for deeper understanding of Data Science in 2014.
  • Attended Machine Learning course by Andrew Ng on Coursera in 2012.

B. Tech

Aug '01 – Jun '05
Harcourt Butler Technological Institute (HBTI), Kanpur
  • Passed in Electronics Engg with 66% marks.
  • Attended Vocational Training in HCL Infosystems.
  • Presented a model ‘Intruder Alarm with Timer’ in Tech-Era (a national-level seminar on recent trends in electronics technology) in 2003.
  • Executive Member of Literary Sub-Council in college.

Other Activities/Achievements

  • Presented a talk in 'PyData Global 2021' on 'Machine Learning Observability'.
  • Authored 'DS/AI Self-Starter Handbook' & 'Probability & Statistics for Data Science' books in 2019-20 (https://ankitrathi.com/books).
  • Presented a talk in 'PyData Delhi 2019' on 'Explainable Artificial Intelligence'.
  • Featured blogger for various AI topics on 'Towards Data Science' & 'HackerNoon' publications.
  • Achieved ‘Kaggle Expert’ level on Kaggle Data Science platform in 2017.
  • Attended a workshop on ‘ArchiMate2/TOGAF’ in 2016.
  • Participated in a seminar on ‘Data Governance & Architecture’ in 2014.
  • Attended a workshop on ‘Data Analytics in Banking’ in 2012.
  • Passed Oracle SQL and PL/SQL certification (OCA) with 93% score in 2009.
  • Passed the MCP (Microsoft Certified Professional) exam for ASP.NET with a 98.4% score in 2005.
  • Attended Vocational Training in HCL Infosystems on ‘Hardware components troubleshooting and maintenance’ in 2004.
  • Presented a model ‘Intruder Alarm with Timer’ in Tech-Era (a national-level seminar on recent trends in electronics technology) held at HBTI, Kanpur, in 2003.
