Semantic Layer Version Control: Managing Metric Definitions as Code
Learn how to apply software engineering version control practices to semantic layer definitions, enabling collaboration, audit trails, and safe deployment of metric changes.
Semantic layer version control is the practice of managing metric definitions, dimension hierarchies, and semantic model configurations using the same version control systems and practices that software engineering teams use for application code. By treating semantic layer definitions as code, organizations gain collaboration capabilities, audit trails, rollback ability, and disciplined change management for their critical business metrics.
When a CFO asks why revenue numbers changed between yesterday and today, version control provides the answer. When a metric definition needs to evolve, version control enables safe, reviewable changes. This discipline transforms semantic layers from fragile configurations into robust, engineering-grade systems.
Why Version Control for Semantic Layers
The Change Management Challenge
Semantic layer changes can have significant business impact:
- A revenue calculation change affects executive dashboards
- A dimension hierarchy update changes how users filter data
- A join modification alters metric accuracy
Without version control, these changes happen without visibility, review, or recourse.
Benefits of Version Control
Audit trail: Every change is recorded with who, when, and why.
Collaboration: Multiple team members can work on changes simultaneously.
Review process: Changes can be examined before deployment.
Rollback capability: If something breaks, revert to the previous version.
Environment promotion: Move changes through dev, staging, production safely.
Semantic Layer as Code
Definition File Structure
Organize semantic layer definitions in version-controllable files:
semantic-layer/
metrics/
finance/
revenue.yaml
costs.yaml
profitability.yaml
sales/
pipeline.yaml
performance.yaml
dimensions/
time.yaml
geography.yaml
customer.yaml
relationships/
customer_orders.yaml
product_hierarchy.yaml
access/
roles.yaml
policies.yaml
Definition Format
Use declarative, version-friendly formats:
# metrics/finance/revenue.yaml
metric:
name: revenue
version: 2.1.0
definition:
calculation: SUM(amount)
source: orders
filters:
- status = 'complete'
- type != 'refund'
dimensions:
- region
- product_category
- customer_segment
- time
metadata:
owner: finance-team
certification: certified
description: |
Total recognized revenue from completed orders,
excluding refunds and credits.
changelog:
- version: 2.1.0
date: 2024-02-15
change: Added customer_segment dimension
author: jsmith
- version: 2.0.0
date: 2024-01-10
change: Changed to exclude credits (breaking change)
author: mjones
Why YAML or Similar Formats
Text-based formats enable version control features:
- Diff visibility: See exactly what changed between versions
- Merge capability: Combine changes from multiple contributors
- Search: Find all metrics referencing a specific table
- Automation: Parse and validate programmatically
Git Workflows for Semantic Layers
Branch Strategy
Main branch: Always reflects production semantic layer state.
Feature branches: Short-lived branches for specific changes.
main
└── feature/add-customer-segment-dimension
└── feature/update-revenue-filter
└── fix/cost-calculation-bug
Environment branches (optional):
main (production)
└── staging
└── development
Pull Request Process
Every change goes through review:
1. Create feature branch
2. Make changes to definition files
3. Run automated validation
4. Open pull request
5. Business owner reviews semantic correctness
6. Technical owner reviews implementation
7. Automated tests pass
8. Merge to main
9. Deploy to production
Commit Messages
Write meaningful commit messages:
feat(revenue): add customer_segment dimension
- Added customer_segment as valid dimension for revenue metric
- Updated documentation with segmentation guidance
- Verified backward compatibility with existing queries
Requested-by: finance-team
Reviewed-by: data-platform
Automated Validation
Pre-Commit Checks
Validate before allowing commits:
# .pre-commit-config.yaml
repos:
- repo: local
hooks:
- id: yaml-lint
name: Lint YAML files
entry: yamllint
files: \.yaml$
- id: semantic-validate
name: Validate semantic definitions
entry: semantic-layer validate
files: ^semantic-layer/
CI/CD Pipeline Validation
Run comprehensive checks on pull requests:
# .github/workflows/semantic-layer-ci.yaml
name: Semantic Layer CI
on: [pull_request]
jobs:
validate:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Validate syntax
run: semantic-layer validate --syntax
- name: Check metric tests
run: semantic-layer test --metrics
- name: Verify relationships
run: semantic-layer validate --relationships
- name: Check for breaking changes
run: semantic-layer diff --breaking-only
Test Types
Syntax validation: Ensure definition files are well-formed.
Metric calculation tests: Verify metrics return expected results for test data.
Relationship validation: Confirm joins and references are valid.
Performance tests: Check that metrics meet query time expectations.
Breaking change detection: Identify changes that affect existing consumers.
Deployment Strategies
Environment Promotion
Move changes through environments:
Development → Staging → Production
Development:
- Immediate deployment on merge to dev branch
- Sample data, frequent changes
Staging:
- Deployment on merge to staging branch
- Production-like data (anonymized)
- Integration testing with BI tools
Production:
- Deployment on merge to main branch
- Approval gates required
- Rollback procedures ready
Blue-Green Deployment
Run two versions simultaneously:
1. Current production (blue) serves all traffic
2. Deploy new version (green) alongside
3. Validate green with test queries
4. Switch traffic to green
5. Keep blue available for quick rollback
Canary Releases
Gradual rollout of changes:
1. Deploy new version
2. Route 5% of queries to new version
3. Monitor for errors or anomalies
4. Gradually increase percentage
5. Full rollout when confident
Handling Breaking Changes
What Constitutes Breaking
Breaking changes:
- Removing a metric
- Changing a metric calculation significantly
- Removing a valid dimension
- Changing filter behavior
Non-breaking changes:
- Adding a new metric
- Adding a dimension to a metric
- Improving documentation
- Performance optimization without result changes
Breaking Change Process
1. Announce deprecation with timeline
2. Document migration path
3. Add deprecation warnings in API responses
4. Support both old and new versions temporarily
5. Remove old version after transition period
Versioned Metrics
For significant changes, version the metric:
# Old version - deprecated
metric:
name: revenue_v1
deprecated: true
deprecation_date: 2024-06-01
migration_to: revenue_v2
# New version
metric:
name: revenue_v2
# New calculation...
Collaboration Practices
Ownership Model
Define clear ownership:
# CODEOWNERS
/semantic-layer/metrics/finance/ @finance-data-team
/semantic-layer/metrics/sales/ @sales-analytics
/semantic-layer/metrics/product/ @product-analytics
/semantic-layer/dimensions/ @data-platform
/semantic-layer/access/ @security-team
Review Requirements
Enforce appropriate reviews:
# Branch protection rules
main:
required_reviews: 2
required_reviewers:
- business-owner
- technical-owner
required_status_checks:
- semantic-layer-ci
- breaking-change-review
Communication
Keep stakeholders informed:
- Announce upcoming changes in team channels
- Publish changelog for each release
- Notify affected dashboard owners
- Document changes in metric descriptions
Best Practices
Keep Definitions DRY
Use inheritance and references:
# _templates/standard_metric.yaml
template:
metadata:
owner: data-team
certification: draft
# metrics/revenue.yaml
metric:
extends: _templates/standard_metric
name: revenue
# Override or extend as needed
Document Intent
Include context in definitions:
metric:
name: revenue
documentation:
purpose: Primary top-line financial metric
usage_guidance: Use for all external reporting
known_limitations: Excludes pending transactions
related_metrics: [bookings, arr, recognized_revenue]
Automate Everything
- Automated syntax validation
- Automated testing
- Automated deployment
- Automated documentation generation
- Automated change notifications
Version control transforms semantic layer management from a fragile, manual process into a robust, collaborative discipline. Teams can confidently evolve metric definitions knowing that every change is tracked, reviewed, and reversible.
Questions
It depends on your organization. Separate repos provide clearer ownership and different review processes. Combined repos ensure sync between application and metric changes. Many organizations use a dedicated analytics repo that includes semantic layer definitions alongside dbt models.