Lead Data Engineer @ Hard Rock Digital

Hi, I'm Vardhan Seepala

Lead Data Engineer with 6+ years building high-scale, cloud-native data platforms across GCP, AWS, and Azure. I lead engineering teams, define technical roadmaps, and architect real-time streaming pipelines that process millions of events with zero data loss.

6+
Years Experience
60x
Throughput Increase
100K+
Messages/Hour
16M+
Records Validated

All Things Data

Passionate about building data systems that drive real business impact

I'm a Lead Data Engineer with 6+ years of experience designing and operating high-scale, cloud-native data platforms across GCP, AWS, and Azure. Currently at Hard Rock Digital, I lead engineering teams, define technical roadmaps, and establish org-wide data engineering standards that accelerate delivery and improve quality.

I'm equally effective as a hands-on technical architect and as a people-and-process leader driving cross-functional alignment. My expertise spans real-time stream processing, CDC pipelines, Data Vault 2.0, MLOps, and developer productivity tooling, including LLM integration strategy.

👥

Engineering Leadership

Team enablement & technical roadmaps

🔄

CDC & Data Vault 2.0

Debezium, Pub/Sub, BigQuery pipelines

⚡

Stream Processing at Scale

100K+ msg/hr, exactly-once semantics

🤖

LLM Integration & MLOps

Copilot standards & AI deployment

Technologies & Tools

The stack I use to build scalable data solutions

☁️

Cloud Platforms

GCP BigQuery Dataflow Datastream Pub/Sub Cloud SQL Azure Synapse ADF ML Studio AWS EMR IAM
💻

Languages & Frameworks

Python Java SQL TypeScript JavaScript Apache Beam PySpark dbt Terraform Django PyTorch
🔄

Streaming & Orchestration

Apache Kafka Airflow Pub/Sub Debezium Protobuf Data Vault 2.0
🗄️

Databases & Warehousing

PostgreSQL MongoDB MS-SQL BigQuery NoSQL KQL
🔧

DevOps & Infrastructure

Kubernetes (GKE) Docker CI/CD Terraform CloudFormation Datadog Elasticsearch
📊

Analytics & Leadership

Looker Power BI LLM Integration Copilot Strategy Team Mentoring Agile/Scrum

Where I've Worked

A track record of delivering impactful data solutions

Lead Data Engineer

@ Hard Rock Digital

Mar 2025 – Present

  • Architected 5+ production Dataflow pipelines (Apache Beam) processing 100K+ messages/hour with exactly-once semantics and 100% data integrity
  • Designed real-time CDC ingestion framework using Debezium → Pub/Sub → BigQuery with Data Vault 2.0 Hub/Link/Satellite patterns
  • Achieved 60x throughput increase on PostgreSQL pipelines, reducing processing latency from 60 min to under 5 min for 320K+ records
  • Owned technical roadmap for migrating legacy Java streaming to Python/Apache Beam and upgrading 40+ Airflow DAGs to Airflow 3.0 on GKE
  • Established org-wide LLM integration standards (Claude Code/GitHub Copilot), accelerating code delivery velocity and unit-test coverage
  • Stress-tested endpoints to 10K msg/sec (3M+ sustained); implemented Protobuf validation and an API-driven validation engine for 16M+ records
  • Partnered with BI and executive stakeholders to deliver optimized analytical data marts for campaign and user engagement tracking

Lead Data Engineer

@ Viral Nation

Jun 2023 – Feb 2025

  • Led end-to-end delivery of data platform initiatives across GCP and Azure for 10+ concurrent client campaigns
  • Designed CDC pipelines (Cloud SQL → BigQuery) via Datastream, reducing transformation overhead by 40%
  • Deployed Terraform templates on private VPC, cutting infrastructure setup time by 50%
  • Architected Apache Beam Dataflow pipelines streaming MongoDB/PostgreSQL to BigQuery for near-real-time campaign decisions
  • Drove dbt adoption as team-wide standard, reducing data errors by 80% across 10+ sources
  • Built CI/CD for AI model deployment on Azure ML Studio, cutting time-to-market by 35% and costs by 25%
  • Led Kafka-based brand detection and profanity filtering models moderating 1M+ daily social media interactions

Data Engineer Co-op

@ BDO Canada

Jan 2023 – Apr 2023

  • Rebuilt ADF pipelines, cutting processing time by 30% (13 → 9 hours)
  • Built real-time Power BI dashboards integrating digital twin & MQTT metrics
  • Automated Azure Data Explorer queries using dynamic KQL generation
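The dynamic KQL generation mentioned above can be sketched as a small query builder. This is a minimal illustration, not the BDO implementation: the table and metric names are hypothetical, and the real queries were driven by pipeline metadata rather than hard-coded values.

```python
def build_kql(table: str, metrics: list[str], lookback_hours: int = 24) -> str:
    """Assemble a KQL aggregation query from a list of metric columns.

    `table` and `metrics` are illustrative placeholders; generating the
    query string lets one template serve many telemetry tables.
    """
    aggregates = ", ".join(f"avg_{m} = avg({m})" for m in metrics)
    return (
        f"{table}\n"
        f"| where Timestamp > ago({lookback_hours}h)\n"
        f"| summarize {aggregates} by bin(Timestamp, 1h)"
    )

query = build_kql("SensorTelemetry", ["temperature", "vibration"])
```

The generated string can then be submitted through the Azure Data Explorer client of choice.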

Senior Software Engineer

@ Oracle

Aug 2019 – Dec 2021

  • Designed AWS EMR pipelines serving 10K+ patients daily, reducing costs by 30% and runtime by 50%
  • Streamlined AWS IAM provisioning via CloudFormation, eliminating 90% of manual errors across 15+ clusters
  • Authored Java/Python test frameworks cutting integration testing time by 70% and post-release defects by 60%
  • Standardized healthcare data to HL7/FHIR compliance using Java, Python, and microservices
  • Facilitated Scrum ceremonies and sprint planning, improving velocity and cutting release cycles by 25%

DevOps Engineering Intern

@ Sigmoid

Apr 2019 – Jul 2019

  • Built Datadog/Elasticsearch dashboards reducing AWS cluster downtime by 50%
  • Created PoC demos for clients that cut cluster costs by 20%

Featured Work

Some of the impactful projects I've built

🔄

Data Vault 2.0 CDC Platform

Real-time ingestion framework using Hub/Link/Satellite patterns with Debezium → Pub/Sub → BigQuery. Eliminated downstream data staleness, serving as the platform foundation for all new event streams.

Debezium Pub/Sub BigQuery Data Vault 2.0 Protobuf
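The Hub/Satellite pattern above hinges on hash keys and hash-diffs: a satellite row is only appended when the descriptive attributes actually change. The sketch below illustrates that idea in plain Python; the field names are hypothetical, and the production framework derived schemas from Protobuf rather than raw dicts.

```python
import hashlib
import json

def md5_hex(value: str) -> str:
    return hashlib.md5(value.encode("utf-8")).hexdigest()

def to_satellite_row(cdc_event: dict, business_key_field: str) -> dict:
    """Map a CDC 'after' payload to a Data Vault 2.0 satellite row.

    The hub hash key is derived from the business key; the hash-diff is
    computed over the sorted descriptive attributes, so a changed hash
    signals that a new satellite version must be appended (inserts only).
    """
    payload = cdc_event["after"]
    hub_hash = md5_hex(str(payload[business_key_field]))
    descriptive = {k: v for k, v in payload.items() if k != business_key_field}
    hash_diff = md5_hex(json.dumps(descriptive, sort_keys=True))
    return {
        "hub_hash_key": hub_hash,
        "hash_diff": hash_diff,
        "load_ts": cdc_event["ts_ms"],
        "payload": descriptive,
    }

row = to_satellite_row(
    {"after": {"user_id": 42, "tier": "gold"}, "ts_ms": 1700000000000},
    business_key_field="user_id",
)
```

Insert-only satellites keyed this way preserve full history while keeping loads idempotent.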

⚡

High-Scale Stream Processing

Architected 5+ production Dataflow pipelines processing 100K+ messages/hour with exactly-once semantics and stateful processing, guaranteeing 100% data integrity across real-time event streams.

Apache Beam Dataflow GCP Python
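Exactly-once output over at-least-once delivery comes down to idempotent, stateful deduplication. The toy processor below sketches that pattern; in the actual Beam/Dataflow pipelines the per-key state is durable and managed by the runner, whereas here an in-memory set stands in for it, and all names are illustrative.

```python
class DedupProcessor:
    """Sketch of exactly-once semantics via stateful deduplication,
    in the spirit of a Beam stateful DoFn (names are illustrative)."""

    def __init__(self) -> None:
        self._seen: set[str] = set()   # stands in for durable, keyed state
        self.output: list[dict] = []

    def process(self, event: dict) -> None:
        msg_id = event["message_id"]
        if msg_id in self._seen:       # redelivered message: drop it
            return
        self._seen.add(msg_id)
        self.output.append({"message_id": msg_id, "value": event["value"] * 2})

p = DedupProcessor()
for e in [{"message_id": "a", "value": 1},
          {"message_id": "a", "value": 1},   # duplicate delivery
          {"message_id": "b", "value": 2}]:
    p.process(e)
```

Because the transform is keyed by message ID, redeliveries from the at-least-once source produce no duplicate output rows.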
📈

PostgreSQL Performance Optimization

Achieved 60x throughput increase on mission-critical pipelines by redesigning bulk load operations, slashing processing latency from 60 minutes to under 5 minutes for 320K+ records.

PostgreSQL Bulk Load Optimization Python
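The core of a bulk-load redesign like this is replacing row-by-row INSERTs with PostgreSQL's COPY protocol. The sketch below only builds the COPY text buffer (simplified: no escaping of tabs or newlines inside values) and is an illustration, not the production code.

```python
import io

def to_copy_buffer(rows: list[tuple]) -> io.StringIO:
    """Serialize rows into PostgreSQL COPY text format (tab-delimited,
    with \\N for NULL). One COPY of this buffer replaces thousands of
    individual INSERT round-trips."""
    buf = io.StringIO()
    for row in rows:
        buf.write("\t".join(r"\N" if v is None else str(v) for v in row) + "\n")
    buf.seek(0)
    return buf

buf = to_copy_buffer([(1, "alice"), (2, None)])
# With psycopg2 this buffer would feed cursor.copy_from(buf, "users")
```

Batching into COPY payloads is the standard lever for order-of-magnitude PostgreSQL ingest speedups.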
🤖

MLOps Model Deployment Platform

End-to-end CI/CD pipeline for AI model deployment using Azure ML Studio and Kubernetes, cutting time-to-market by 35% and operational costs by 25% while moderating 1M+ daily events.

Azure ML Kubernetes CI/CD Kafka
🏗️

Airflow 3.0 Migration on GKE

Technical roadmap for migrating 40+ Airflow DAGs to Airflow 3.0 on self-managed Kubernetes (GKE), delivering zero-downtime cutovers with full backward compatibility, unblocking 3+ downstream teams.

Airflow 3.0 GKE Kubernetes Python
🤝

LLM Integration Standards

Established org-wide engineering standards for LLM integration (Claude Code/GitHub Copilot), prompt engineering best practices, and security guardrails — accelerating code delivery velocity across the data engineering team.

Claude Code GitHub Copilot LLM Strategy Security

Let's Work Together

I'm always open to discussing new opportunities, interesting projects, or just chatting about data engineering.