// Data Engineer · GCP Certified

Salvador Guzman

Building cloud-native data platforms at scale.

Google Cloud Certified Professional Data Engineer with 3+ years building real-time and batch pipelines, modern data warehouse and lakehouse architectures, and automated data quality frameworks. Currently building MCP-based AI agents on top of BigQuery.

3+ YEARS EXP.
70% QUERY PERF. ↑
GCP CERTIFIED

Skills & Technologies

// LANGUAGES
Python SQL (advanced) Go Java Bash Shell scripting
// DATA PROCESSING
PySpark Apache Beam Pandas Polars dbt Dataform
// GCP STACK
BigQuery Cloud Storage Pub/Sub Dataflow Cloud Composer Dataplex Data Catalog Datastream Cloud Functions Cloud Run
// CLOUD & WAREHOUSES
Google Cloud AWS Azure Snowflake Databricks Delta Lake Apache Iceberg Cloud Computing
// STREAMING & BATCH
Pub/Sub Datastream (CDC) Apache Kafka Apache Spark Dataflow
// ORCHESTRATION & APIs
Apache Airflow Cloud Composer Cloud Functions FastAPI APIs
// DATABASES
BigQuery Snowflake PostgreSQL MySQL SQL Server
// MODELING
Kimball Dimensional SCD Partitioning Clustering Query Optimization Data Modeling
// IaC, CI/CD & CONTAINERS
Cloud Build GitHub Actions Docker Git / GitHub Terraform Virtualization
// DATA GOVERNANCE & QUALITY
Dataplex Data Catalog Great Expectations IAM Data Contracts Data Lineage Sensitivity Tags
// AI & AGENTIC DEV
Claude Gemini GitHub Copilot MCP LLM Agents
// BI & METHODOLOGY
Looker Studio ELT / ETL Data Lakehouse Automated Validation

Experience

2023 — PRESENT
Applaudo Studios
Data Engineer · Data Quality focus
  • Designed and deployed real-time CDC pipelines (source → message broker → cloud data warehouse) enabling near real-time analytics from transactional healthcare sources.
  • Led the migration of legacy on-prem healthcare databases to a cloud data warehouse, improving average query performance by ~70% and shortening reporting cycles from hours to minutes.
  • Implemented automated data quality frameworks with custom Python validators, catching schema drift and anomalies before production consumption.
  • Built reusable ELT models with enforced testing and version control across multiple environments.
  • Built MCP-based AI agents on top of the data warehouse to accelerate data delivery — enabling natural-language access to governed datasets and automated metadata enrichment.
  • Designed and operated lakehouse architectures with open table formats (Apache Iceberg / Delta Lake), decoupling storage from compute and enabling multi-engine querying.
  • Implemented data contracts and shift-left validation between producers and consumers, formalizing schemas, SLAs and ownership across domains.
  • Deployed a data observability stack (freshness, volume, schema and lineage monitors) to detect silent failures and reduce time-to-detection of incidents.
  • Partnered with stakeholders to translate business questions into reliable, observable data products — not just pipelines.
CDC Cloud DW ELT Lakehouse Apache Iceberg Delta Lake Data Contracts Data Observability MCP Python SQL
2021 — 2022
Applaudo Studios
Data Quality Engineer
  • Reduced manual testing effort by 60% by introducing reusable API tests with Postman and Python.
  • Led integration and end-to-end testing across frontend and backend, ensuring system reliability in CI/CD pipelines.
  • Developed and enforced QA standards for REST API validation: JSON schema checks, authentication, and performance tests.
  • Built robust test data generation strategies to support scalable and repeatable automated tests across multiple environments.
  • Implemented early data quality checks on incoming datasets — schema validation, null/duplicate detection and value-range assertions — before they reached downstream consumers.
  • Authored reusable Python validators and integrated them into CI/CD pipelines to block regressions in data contracts.
  • Collaborated with developers, product managers and DevOps teams to define acceptance criteria and ensure feature testability.
  • Contributed to Agile ceremonies — sprint planning and retrospectives — advocating for testability and QA early in the cycle.
  • Designed and delivered internal workshops to train QA engineers on new tools, frameworks and testing strategies.
  • Conducted root-cause analysis for production issues and worked cross-functionally to implement preventative measures.
  • Created detailed bug reports and collaborated closely with dev teams to drive resolution and continuous improvement.
Python Postman REST APIs JSON Schema CI/CD Agile Test Automation Data Quality

Education

Computer Science Engineer
Universidad Francisco Gavidia
Powered by Arizona State University
Database Specialization
Business Intelligence · Universidad Francisco Gavidia
Powered by Arizona State University
Google Cloud Professional
Data Engineer Certification
AWS Certified Cloud Practitioner
Amazon Web Services

Languages

Spanish
Native
English
B2 · Upper-Intermediate
German
A1 · Beginner

Get in Touch

Let's build something big.

Open to data engineering opportunities, freelance projects and consulting. Available remote across LATAM and beyond.

contact.py
python contact.py
name = "Salvador Guzman"
role = "Data Engineer"
available = True
open_to = ["full-time", "freelance", "consulting"]
# ready to build at scale