// Senior Data Engineer & Data Scientist

Anjali
Gupta

ML · Gen AI · Agentic AI · Cloud Data Platforms

4+ years building large-scale data pipelines, ETL/ELT systems, and ML/AI solutions across healthcare, telecom & financial services. Expert in AWS, PySpark, Snowflake, Kafka & Python.

50M+
Records/day processed
$678K
Annual savings delivered
4+
Years experience

// 01. About

Who I Am

I'm a Senior Data Engineer & Data Scientist based in Bangalore, Karnataka, with a passion for solving complex data challenges at scale. My work sits at the intersection of data engineering, machine learning, and cloud infrastructure.

I've had the opportunity to build impactful platforms at companies like Baxter International, Airtel, and Yash Technologies — working across healthcare, telecom, and financial services verticals.

My current focus is on Gen AI & Agentic AI pipelines, where I'm integrating LLMs with cloud-native data infrastructure to automate complex workflows and deliver intelligent analytics at enterprise scale.

I hold a B.Tech in Computer Science from Medicaps University (CGPA: 8.0/10, 2017–2021).

☁️

Cloud-Native Architecture

Designing end-to-end AWS & Azure data platforms — from ingestion to analytics — with a focus on reliability and cost efficiency.

🤖

Gen AI & ML Engineering

Building LLM-powered pipelines, ML feature stores, and predictive models that drive real business outcomes.

📡

Real-Time Data Streaming

Architecting Kafka + PySpark streaming systems handling 150M+ records from 30+ sources daily.

// 02. Experience

Work History

Senior Data Engineer
Baxter International · Bangalore
Sep 2025 – Present
  • Architected Gen AI & Agentic AI pipelines with LLMs + PySpark on AWS, cutting processing time by 35%.
  • Built event-driven workflows using AWS Step Functions, Lambda, S3 & EventBridge for automated data orchestration.
  • Engineered Snowflake DWH with PySpark & AWS Glue; designed ETL/ELT pipelines for healthcare analytics.
  • Built ML feature pipelines in Python & PostgreSQL; deployed predictive models for patient outcomes.
  • Migrated legacy batch jobs to Control-M, improving reliability across 50+ production pipelines.
  • Designed scalable data ingestion frameworks integrating 20+ structured & unstructured sources into cloud DWH.
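As a minimal sketch of the feature-pipeline pattern above — aggregating raw events into one feature row per entity. The table, columns, and snapshot date are hypothetical placeholders; in production the events would come from PostgreSQL rather than an in-memory frame:

```python
import pandas as pd

# Hypothetical patient-visit events (stand-in for a PostgreSQL source table)
events = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2],
    "visit_date": pd.to_datetime(
        ["2025-01-05", "2025-02-10", "2025-03-01", "2025-01-20", "2025-03-15"]
    ),
    "lab_score": [0.7, 0.9, 0.8, 0.4, 0.6],
})

def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate raw events into one feature row per patient."""
    feats = df.groupby("patient_id").agg(
        visit_count=("visit_date", "count"),
        mean_lab_score=("lab_score", "mean"),
        last_visit=("visit_date", "max"),
    ).reset_index()
    # Recency feature relative to a fixed snapshot date
    snapshot = pd.Timestamp("2025-04-01")
    feats["days_since_last_visit"] = (snapshot - feats["last_visit"]).dt.days
    return feats.drop(columns=["last_visit"])

features = build_features(events)
print(features)
```

The same groupby/aggregate shape scales up directly in PySpark for larger volumes.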
Senior Data Engineer
Airtel · Pune
Dec 2023 – Aug 2025
  • Built PySpark pipelines on AWS processing 50M+ telecom records daily for customer analytics.
  • Modelled data in Snowflake with SQL & Python, enabling real-time churn prediction & revenue analytics.
  • Designed AWS Glue + S3 + Redshift data lake architecture reducing reporting latency by 60%.
  • Implemented real-time data streaming using Kafka for batch-to-streaming migration of critical pipelines.
  • Leveraged Azure Databricks & ADF for cross-cloud data integration pipelines.
  • Delivered Power BI dashboards tracking KPIs across 10M+ subscribers; cut compute costs 30% on EMR.
Big Data Engineer
Yash Technologies · Bangalore
Sep 2021 – Nov 2023
  • Led SAP → MongoDB migration via Athena, Lambda & S3, delivering $678K in annual savings and a 14% performance gain.
  • Built Kafka + PySpark real-time pipeline ingesting 150M+ records from 30+ data sources daily.
  • Architected Azure ADF + Databricks pipeline scaling product from 0 → 125,000 active users.
  • Automated Airflow scheduling for 40+ pipelines; built Power BI dashboards saving 10 hrs/week.
Data Engineering Intern
Wiley mthree (Client: Morgan Stanley) · Bangalore
Feb 2021 – Jun 2021
  • Built SQL + Python reporting infrastructure for real-time financial insights; improved Redshift ETL performance by 45%.
  • Automated ETL across billions of rows; designed Kafka clusters & built 20+ streaming pipelines.

// 03. Skills

Technical Expertise

☁️ AWS
S3 · Lambda · Glue · Athena · EC2 · Step Functions · EventBridge · Redshift · CloudWatch · EMR
🔷 Azure
ADF · Databricks · Synapse Analytics
💻 Languages
Python · SQL · PySpark · Bash
⚡ Big Data
Spark · Kafka · Hive · Hadoop · Sqoop
🗄️ Databases
Snowflake · PostgreSQL · MySQL · MongoDB · Cassandra · Redshift · SQL Server · Oracle
🤖 ML / AI
Gen AI · Agentic AI · LLMs · Scikit-learn · XGBoost · MLOps · Feature Engineering
📊 Data Science
Pandas · NumPy · Matplotlib · Seaborn · Statistical Modeling · Predictive Analytics
🔧 Orchestration & DevOps
Airflow · Control-M · Docker · Git · CI/CD · Agile
📈 Viz / BI
Power BI · QuickSight · MS Excel

// 04. Achievements

Key Impact

$678K

Annual savings via SAP → MongoDB migration on AWS

125K

Active users scaled from 0 with Azure Databricks pipeline

50M+

Records processed daily at Airtel on PySpark + AWS

60%

Latency reduction via Redshift data lake architecture

35%

Faster processing with Gen AI pipelines at Baxter

30%

Compute cost reduction optimising Spark jobs on AWS EMR

// 05. Contact

Let's Connect

I'm open to senior data engineering and ML engineering opportunities. Feel free to reach out — I'd love to chat!

Send Message