// Senior Data Engineer & Data Scientist

Anjali
Gupta

ML · Gen AI · Agentic AI · Cloud Data Platforms

4+ years building large-scale data pipelines, ETL/ELT systems, and ML/AI solutions across healthcare, telecom & financial services. Expert in AWS, PySpark, Snowflake, Kafka & Python.

50M+
Records/day processed
$678K
Annual savings delivered
4+
Years experience

// 01. About

Who I Am

I'm a Senior Data Engineer & Data Scientist based in Bangalore, Karnataka, with a passion for solving complex data challenges at scale. My work sits at the intersection of data engineering, machine learning, and cloud infrastructure.

I've had the opportunity to build impactful platforms at companies like Baxter International, Airtel, and Yash Technologies — working across healthcare, telecom, and financial services verticals.

My current focus is on Gen AI & Agentic AI pipelines, where I'm integrating LLMs with cloud-native data infrastructure to automate complex workflows and deliver intelligent analytics at enterprise scale.

I hold a B.Tech in Computer Science from Medicaps University (CGPA: 8.0/10, 2017–2021).

☁️

Cloud-Native Architecture

Designing end-to-end AWS & Azure data platforms — from ingestion to analytics — with a focus on reliability and cost efficiency.

🤖

Gen AI & ML Engineering

Building LLM-powered pipelines, ML feature stores, and predictive models that drive real business outcomes.

📡

Real-Time Data Streaming

Architecting Kafka + PySpark streaming systems handling 150M+ records from 30+ sources daily.

// 02. Experience

Work History

Senior Data Engineer
Baxter International · Bangalore
Sep 2025 – Present
  • Architected Gen AI & Agentic AI pipelines with LLMs + PySpark on AWS, cutting processing time by 35%.
  • Built event-driven workflows using AWS Step Functions, Lambda, S3 & EventBridge for automated data orchestration.
  • Engineered Snowflake DWH with PySpark & AWS Glue; designed ETL/ELT pipelines for healthcare analytics.
  • Built ML feature pipelines in Python & PostgreSQL; deployed predictive models for patient outcomes.
  • Migrated legacy batch jobs to Control-M, improving reliability across 50+ production pipelines.
  • Designed scalable data ingestion frameworks integrating 20+ structured & unstructured sources into cloud DWH.
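As a minimal sketch of the feature-pipeline pattern above — aggregating raw events into one feature row per entity. The table, columns, and snapshot date are hypothetical placeholders; in production the events would come from PostgreSQL rather than an in-memory frame:

```python
import pandas as pd

# Hypothetical patient-visit events (stand-in for a PostgreSQL source table)
events = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2],
    "visit_date": pd.to_datetime(
        ["2025-01-05", "2025-02-10", "2025-03-01", "2025-01-20", "2025-03-15"]
    ),
    "lab_score": [0.7, 0.9, 0.8, 0.4, 0.6],
})

def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate raw events into one feature row per patient."""
    feats = df.groupby("patient_id").agg(
        visit_count=("visit_date", "count"),
        mean_lab_score=("lab_score", "mean"),
        last_visit=("visit_date", "max"),
    ).reset_index()
    # Recency feature relative to a fixed snapshot date
    snapshot = pd.Timestamp("2025-04-01")
    feats["days_since_last_visit"] = (snapshot - feats["last_visit"]).dt.days
    return feats.drop(columns=["last_visit"])

features = build_features(events)
print(features)
```

The same groupby/aggregate shape scales up directly in PySpark for larger volumes.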
Senior Data Engineer
Airtel · Pune
Dec 2023 – Aug 2025
  • Built PySpark pipelines on AWS processing 50M+ telecom records daily for customer analytics.
  • Modelled data in Snowflake with SQL & Python, enabling real-time churn prediction & revenue analytics.
  • Designed AWS Glue + S3 + Redshift data lake architecture reducing reporting latency by 60%.
  • Implemented real-time data streaming using Kafka for batch-to-streaming migration of critical pipelines.
  • Leveraged Azure Databricks & ADF for cross-cloud data integration pipelines.
  • Delivered Power BI dashboards tracking KPIs across 10M+ subscribers; cut compute costs 30% on EMR.
Big Data Engineer
Yash Technologies · Bangalore
Sep 2021 – Nov 2023
  • Led SAP → MongoDB migration via Athena, Lambda & S3, delivering $678K in annual savings and a 14% performance gain.
  • Built Kafka + PySpark real-time pipeline ingesting 150M+ records from 30+ data sources daily.
  • Architected Azure ADF + Databricks pipeline scaling product from 0 → 125,000 active users.
  • Automated Airflow scheduling for 40+ pipelines; built Power BI dashboards saving 10 hrs/week.
Data Engineering Intern
Wiley mthree (Client: Morgan Stanley) · Bangalore
Feb 2021 – Jun 2021
  • Built SQL + Python reporting infrastructure for real-time financial insights; improved Redshift ETL performance by 45%.
  • Automated ETL across billions of rows; designed Kafka clusters & built 20+ streaming pipelines.

// 03. Skills

Technical Expertise

☁️ AWS
S3 · Lambda · Glue · Athena · EC2 · Step Functions · EventBridge · Redshift · CloudWatch · EMR
🔷 Azure
ADF · Databricks · Synapse Analytics
💻 Languages
Python · SQL · PySpark · Bash
⚡ Big Data
Spark · Kafka · Hive · Hadoop · Sqoop
🗄️ Databases
Snowflake · PostgreSQL · MySQL · MongoDB · Cassandra · Redshift · SQL Server · Oracle
🤖 ML / AI
Gen AI · Agentic AI · LLMs · Scikit-learn · XGBoost · MLOps · Feature Engineering
📊 Data Science
Pandas · NumPy · Matplotlib · Seaborn · Statistical Modeling · Predictive Analytics
🔧 Orchestration & DevOps
Airflow · Control-M · Docker · Git · CI/CD · Agile
📈 Viz / BI
Power BI · QuickSight · MS Excel

// 04. Achievements

Key Impact

$678K

Annual savings via SAP → MongoDB migration on AWS

125K

Active users scaled from 0 with Azure Databricks pipeline

50M+

Records processed daily at Airtel on PySpark + AWS

60%

Latency reduction via Redshift data lake architecture

35%

Faster processing with Gen AI pipelines at Baxter

30%

Compute cost reduction optimising Spark jobs on AWS EMR

// 05. Contact

Let's Connect

I'm open to senior data engineering and ML engineering opportunities. Feel free to reach out — I'd love to chat!

Send Message