Lead Data Engineer @ Hard Rock Digital

Hi, I'm Vardhan Seepala

Lead Data Engineer with 6+ years building high-scale, cloud-native data platforms across GCP, AWS, and Azure. I lead engineering teams, define technical roadmaps, and architect real-time streaming pipelines that process millions of events with zero data loss.

6+
Years Experience
60x
Throughput Increase
100K+
Messages/Hour
16M+
Records Validated

All Things Data

Passionate about building data systems that drive real business impact

I'm a Lead Data Engineer with 6+ years of experience designing and operating high-scale, cloud-native data platforms across GCP, AWS, and Azure. Currently at Hard Rock Digital, I lead engineering teams, define technical roadmaps, and establish org-wide data engineering standards that accelerate delivery and improve quality.

I'm equally effective as a hands-on technical architect and as a people-and-process leader driving cross-functional alignment. My expertise spans real-time stream processing, CDC pipelines, Data Vault 2.0, MLOps, and developer productivity tooling, including LLM integration strategy.

👥

Engineering Leadership

Team enablement & technical roadmaps

🔄

CDC & Data Vault 2.0

Debezium, Pub/Sub, BigQuery pipelines

⚡

Stream Processing at Scale

100K+ msg/hr, exactly-once semantics

🤖

LLM Integration & MLOps

Copilot standards & AI deployment

Technologies & Tools

The stack I use to build scalable data solutions

☁️

Cloud Platforms

GCP BigQuery Dataflow Datastream Pub/Sub Cloud SQL Azure Synapse ADF ML Studio AWS EMR IAM
💻

Languages & Frameworks

Python Java SQL TypeScript JavaScript Apache Beam PySpark dbt Terraform Django PyTorch
🔄

Streaming & Orchestration

Apache Kafka Airflow Pub/Sub Debezium Protobuf Data Vault 2.0
🗄️

Databases & Warehousing

PostgreSQL MongoDB MS-SQL BigQuery NoSQL KQL
🔧

DevOps & Infrastructure

Kubernetes (GKE) Docker CI/CD Terraform CloudFormation Datadog Elasticsearch
📊

Analytics & Leadership

Looker Power BI LLM Integration Copilot Strategy Team Mentoring Agile/Scrum

Where I've Worked

A track record of delivering impactful data solutions

Lead Data Engineer

@ Hard Rock Digital

Mar 2025 – Present

  • Architected 5+ production Dataflow pipelines (Apache Beam) processing 100K+ messages/hour with exactly-once semantics and 100% data integrity
  • Designed real-time CDC ingestion framework using Debezium → Pub/Sub → BigQuery with Data Vault 2.0 Hub/Link/Satellite patterns
  • Achieved 60x throughput increase on PostgreSQL pipelines, reducing processing latency from 60 min to under 5 min for 320K+ records
  • Owned technical roadmap for migrating legacy Java streaming to Python/Apache Beam and upgrading 40+ Airflow DAGs to Airflow 3.0 on GKE
  • Established org-wide LLM integration standards (Claude Code/GitHub Copilot), accelerating code delivery velocity and unit-test coverage
  • Stress-tested endpoints to 10K msg/sec (3M+ sustained); implemented Protobuf validation and an API-driven validation engine for 16M+ records
  • Partnered with BI and executive stakeholders to deliver optimized analytical data marts for campaign and user engagement tracking

Lead Data Engineer

@ Viral Nation

Jun 2023 – Feb 2025

  • Led end-to-end delivery of data platform initiatives across GCP and Azure for 10+ concurrent client campaigns
  • Designed CDC pipelines (Cloud SQL → BigQuery) via Datastream, reducing transformation overhead by 40%
  • Deployed Terraform templates on private VPC, cutting infrastructure setup time by 50%
  • Architected Apache Beam Dataflow pipelines streaming MongoDB/PostgreSQL to BigQuery for near-real-time campaign decisions
  • Drove dbt adoption as team-wide standard, reducing data errors by 80% across 10+ sources
  • Built CI/CD for AI model deployment on Azure ML Studio, cutting time-to-market by 35% and costs by 25%
  • Led Kafka-based brand detection and profanity filtering models moderating 1M+ daily social media interactions

Data Engineer Co-op

@ BDO Canada

Jan 2023 – Apr 2023

  • Rebuilt ADF pipelines, cutting processing time by 30% (13 → 9 hours)
  • Built real-time Power BI dashboards integrating digital twin & MQTT metrics
  • Automated Azure Data Explorer queries using dynamic KQL generation
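The dynamic KQL generation mentioned above can be sketched as a small query builder. This is a minimal illustration, not the BDO implementation: the table and metric names are hypothetical, and the real queries were driven by pipeline metadata rather than hard-coded values.

```python
def build_kql(table: str, metrics: list[str], lookback_hours: int = 24) -> str:
    """Assemble a KQL aggregation query from a list of metric columns.

    `table` and `metrics` are illustrative placeholders; generating the
    query string lets one template serve many telemetry tables.
    """
    aggregates = ", ".join(f"avg_{m} = avg({m})" for m in metrics)
    return (
        f"{table}\n"
        f"| where Timestamp > ago({lookback_hours}h)\n"
        f"| summarize {aggregates} by bin(Timestamp, 1h)"
    )

query = build_kql("SensorTelemetry", ["temperature", "vibration"])
```

The generated string can then be submitted through the Azure Data Explorer client of choice.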

Senior Software Engineer

@ Oracle

Aug 2019 – Dec 2021

  • Designed AWS EMR pipelines serving 10K+ patients daily, reducing costs by 30% and runtime by 50%
  • Streamlined AWS IAM provisioning via CloudFormation, eliminating 90% of manual errors across 15+ clusters
  • Authored Java/Python test frameworks cutting integration testing time by 70% and post-release defects by 60%
  • Standardized healthcare data to HL7/FHIR compliance using Java, Python, and microservices
  • Facilitated Scrum ceremonies and sprint planning, improving velocity and cutting release cycles by 25%

DevOps Engineering Intern

@ Sigmoid

Apr 2019 – Jul 2019

  • Built Datadog/Elasticsearch dashboards reducing AWS cluster downtime by 50%
  • Created PoC demos for clients that cut cluster costs by 20%

Featured Work

Some of the impactful projects I've built

🔄

Data Vault 2.0 CDC Platform

Real-time ingestion framework using Hub/Link/Satellite patterns with Debezium → Pub/Sub → BigQuery. Eliminated downstream data staleness, serving as the platform foundation for all new event streams.

Debezium Pub/Sub BigQuery Data Vault 2.0 Protobuf
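The Hub/Satellite pattern above hinges on hash keys and hash-diffs: a satellite row is only appended when the descriptive attributes actually change. The sketch below illustrates that idea in plain Python; the field names are hypothetical, and the production framework derived schemas from Protobuf rather than raw dicts.

```python
import hashlib
import json

def md5_hex(value: str) -> str:
    return hashlib.md5(value.encode("utf-8")).hexdigest()

def to_satellite_row(cdc_event: dict, business_key_field: str) -> dict:
    """Map a CDC 'after' payload to a Data Vault 2.0 satellite row.

    The hub hash key is derived from the business key; the hash-diff is
    computed over the sorted descriptive attributes, so a changed hash
    signals that a new satellite version must be appended (inserts only).
    """
    payload = cdc_event["after"]
    hub_hash = md5_hex(str(payload[business_key_field]))
    descriptive = {k: v for k, v in payload.items() if k != business_key_field}
    hash_diff = md5_hex(json.dumps(descriptive, sort_keys=True))
    return {
        "hub_hash_key": hub_hash,
        "hash_diff": hash_diff,
        "load_ts": cdc_event["ts_ms"],
        "payload": descriptive,
    }

row = to_satellite_row(
    {"after": {"user_id": 42, "tier": "gold"}, "ts_ms": 1700000000000},
    business_key_field="user_id",
)
```

Insert-only satellites keyed this way preserve full history while keeping loads idempotent.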

⚡

High-Scale Stream Processing

Architected 5+ production Dataflow pipelines processing 100K+ messages/hour with exactly-once semantics and stateful processing, guaranteeing 100% data integrity across real-time event streams.

Apache Beam Dataflow GCP Python
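Exactly-once output over at-least-once delivery comes down to idempotent, stateful deduplication. The toy processor below sketches that pattern; in the actual Beam/Dataflow pipelines the per-key state is durable and managed by the runner, whereas here an in-memory set stands in for it, and all names are illustrative.

```python
class DedupProcessor:
    """Sketch of exactly-once semantics via stateful deduplication,
    in the spirit of a Beam stateful DoFn (names are illustrative)."""

    def __init__(self) -> None:
        self._seen: set[str] = set()   # stands in for durable, keyed state
        self.output: list[dict] = []

    def process(self, event: dict) -> None:
        msg_id = event["message_id"]
        if msg_id in self._seen:       # redelivered message: drop it
            return
        self._seen.add(msg_id)
        self.output.append({"message_id": msg_id, "value": event["value"] * 2})

p = DedupProcessor()
for e in [{"message_id": "a", "value": 1},
          {"message_id": "a", "value": 1},   # duplicate delivery
          {"message_id": "b", "value": 2}]:
    p.process(e)
```

Because the transform is keyed by message ID, redeliveries from the at-least-once source produce no duplicate output rows.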
📈

PostgreSQL Performance Optimization

Achieved 60x throughput increase on mission-critical pipelines by redesigning bulk load operations, slashing processing latency from 60 minutes to under 5 minutes for 320K+ records.

PostgreSQL Bulk Load Optimization Python
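The core of a bulk-load redesign like this is replacing row-by-row INSERTs with PostgreSQL's COPY protocol. The sketch below only builds the COPY text buffer (simplified: no escaping of tabs or newlines inside values) and is an illustration, not the production code.

```python
import io

def to_copy_buffer(rows: list[tuple]) -> io.StringIO:
    """Serialize rows into PostgreSQL COPY text format (tab-delimited,
    with \\N for NULL). One COPY of this buffer replaces thousands of
    individual INSERT round-trips."""
    buf = io.StringIO()
    for row in rows:
        buf.write("\t".join(r"\N" if v is None else str(v) for v in row) + "\n")
    buf.seek(0)
    return buf

buf = to_copy_buffer([(1, "alice"), (2, None)])
# With psycopg2 this buffer would feed cursor.copy_from(buf, "users")
```

Batching into COPY payloads is the standard lever for order-of-magnitude PostgreSQL ingest speedups.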
🤖

MLOps Model Deployment Platform

End-to-end CI/CD pipeline for AI model deployment using Azure ML Studio and Kubernetes, cutting time-to-market by 35% and operational costs by 25% while moderating 1M+ daily events.

Azure ML Kubernetes CI/CD Kafka
🏗️

Airflow 3.0 Migration on GKE

Technical roadmap for migrating 40+ Airflow DAGs to Airflow 3.0 on self-managed Kubernetes (GKE), delivering zero-downtime cutovers with full backward compatibility, unblocking 3+ downstream teams.

Airflow 3.0 GKE Kubernetes Python
🤝

LLM Integration Standards

Established org-wide engineering standards for LLM integration (Claude Code/GitHub Copilot), prompt engineering best practices, and security guardrails — accelerating code delivery velocity across the data engineering team.

Claude Code GitHub Copilot LLM Strategy Security

Let's Work Together

I'm always open to discussing new opportunities, interesting projects, or just chatting about data engineering.