Demonstrated experience with Python.
Demonstrated experience with SQL.
Demonstrated experience with R.
Demonstrated experience with Bash.
CoreWeave, Inc.
Designed Python and SQL ETL pipelines orchestrated with Airflow to ingest 50K+ GPU telemetry logs per hour, improving data refresh rate by 40% and accelerating infrastructure analytics delivery for 10+ internal teams.
Assisted in maintaining AWS Glue jobs and S3 data lakes, automating semi-structured data ingestion workflows and reducing manual data preparation time by 35% across compute and billing datasets.
Supported PySpark batch processing of large-scale logs, tuning Spark jobs to reduce memory overhead by 30% and enabling stable data transformation across multi-terabyte daily workloads.
Implemented basic data validation and schema checks using Python and AWS Lambda functions, improving pipeline accuracy and reducing downstream data errors in production reports by 50%.
Introduced Power BI dashboards on Redshift to visualize GPU usage metrics, billing trends, and job throughput KPIs, enabling non-technical teams to track operational data with minimal support.
Participated in Agile sprints, documented workflows in Confluence, and wrote unit tests for ETL scripts, contributing to team velocity and improving the deployment success rate by 20%.
JPMorgan Chase & Co.
Designed and automated end-to-end data pipelines using Python, SQL, and Airflow, reducing manual workflows by 45% and keeping high-volume financial data available with 99.9% uptime.
Collaborated with BI teams to deliver Tableau dashboards monitoring risk and compliance KPIs, enabling faster audits and improving data-driven decision-making by 30% across regulatory functions.
Led development of ETL frameworks for structured and semi-structured data integration, standardizing internal datasets and raising data availability and accuracy to over 99.7% across enterprise reporting layers.
Developed AWS-based data solutions using Glue and S3, optimizing storage and compute usage and improving query response times by 60% on datasets exceeding 5 TB.
Coordinated with DevOps and ML engineers to deploy real-time fraud detection pipelines, improving fraud-detection accuracy and reducing false positives by over 20% in production environments.
Participated in Agile ceremonies, documented data workflows, and maintained sprint deliverables, enhancing delivery consistency and improving team velocity by 25%.
Integrated data lineage tracking and implemented alerting on pipeline failures, improving reliability and reducing the resolution time for data issues from 6 hours to under 1 hour.
Master of Science
Bachelor of Technology
Demonstrated experience with ETL/ELT pipeline design.
Citigroup Inc.
Designed scalable ETL pipelines in Python and SQL to process over 2 TB of transactional data weekly, reducing data latency and supporting real-time analytics across finance business units.
Redesigned custom data transformation scripts to clean, normalize, and enrich multi-source datasets, improving ML model training precision and enabling deeper customer segmentation.
Created self-service Power BI dashboards that gave business users real-time insights, reducing the data engineering team's manual report generation workload by 50%.
Tuned complex SQL queries, added indexes, and optimized joins across Oracle and PostgreSQL databases, speeding up core report generation by up to 70%.
Initiated PySpark batch pipelines processing 10M+ records daily and integrated Kafka streams that reduced data latency by 70%, enabling near-real-time ML model inputs for high-stakes risk modeling systems.
Conducted thorough data quality checks and led automated validations, helping meet internal audit compliance with a 98% success rate against data governance standards.
Worked in Agile teams, attended backlog grooming and sprint planning sessions, and drove user story completion for regulatory reporting and data integration projects across the India and US regions.