What's the difference between a fact table and a dimension table?

Fact tables store measurable events and metrics (orders, clicks, transactions) with numeric values that can be aggregated. Dimension tables store descriptive attributes (customer info, product details, dates) used to filter and group facts. Facts answer 'how much' while dimensions answer 'who, what, when, where.'

How do I choose the right grain for a fact table?

The grain should be the lowest level of detail that supports your analytical questions. Start by asking: What business event does each row represent? A transaction fact might be one row per order line item. A snapshot fact might be one row per customer per day. Document the grain explicitly - ambiguous grain causes aggregation errors.

Should I store calculated metrics in fact tables?

Store additive base measures in fact tables - values that can be summed across any dimension. Calculate derived metrics (ratios, averages, complex formulas) at query time through a semantic layer. This preserves flexibility and ensures calculations remain consistent across different aggregation levels.

How do factless fact tables work?

Factless facts record events that occurred without any associated measures - like student attendance, product promotions, or coverage eligibility. They contain only foreign keys to dimensions. They're useful for tracking what happened (or didn't happen) when there's nothing to measure.

Fact Table Design Patterns for Analytics and AI

Fact tables are the quantitative foundation of dimensional models. While dimension tables describe the context of business events - who, what, when, where - fact tables capture the measurable events themselves - the orders, clicks, transactions, and activities that drive business metrics.

Designing fact tables correctly is essential for analytics accuracy and performance. Poor fact table design leads to incorrect aggregations, slow queries, and AI systems that produce misleading results. This article covers the core fact table design patterns and how to implement them for AI-ready analytics.

Fundamental Concepts

Grain: The Most Important Decision

Grain defines what each row in a fact table represents. It's the single most important design decision and must be declared explicitly:

"One row per order line item"
"One row per customer per day"
"One row per support ticket per status change"

Every column in the fact table must be consistent with the declared grain. Mixing grains - like having some rows represent orders and others represent order items - causes aggregation errors that are difficult to detect.

AI implication: AI systems cannot infer grain from schema alone. Without explicit grain documentation in a semantic layer, AI may aggregate at wrong levels, producing results that look plausible but are mathematically incorrect.

Measures and Additivity

Fact tables contain measures - numeric values that can be aggregated. Measures fall into three additivity categories:

Fully additive: Can be summed across all dimensions

Revenue, quantity, cost
Most common and most useful

Semi-additive: Can be summed across some dimensions but not others

Account balances (can sum across accounts, not across time)
Inventory levels (can sum across locations, not across time)

Non-additive: Cannot be meaningfully summed

Ratios (profit margin, conversion rate)
Unit prices, percentages

Store fully additive base measures in fact tables. Calculate non-additive derived metrics through the semantic layer to ensure correct aggregation behavior.

Dimension Foreign Keys

Fact tables connect to dimensions through foreign keys. Each foreign key represents a dimensional context for the fact:

CREATE TABLE order_facts (
  order_key INT PRIMARY KEY,
  customer_key INT REFERENCES dim_customers,
  product_key INT REFERENCES dim_products,
  date_key INT REFERENCES dim_dates,
  store_key INT REFERENCES dim_stores,
  quantity INT,
  unit_price DECIMAL,
  total_amount DECIMAL,
  discount_amount DECIMAL
);

The combination of foreign keys plus grain defines the fact table's dimensional space - all the ways facts can be sliced and aggregated.

Core Fact Table Patterns

Transaction Fact Tables

Transaction facts capture discrete business events at the moment they occur. Each row represents one atomic transaction.

Examples:

Order line items
Payment transactions
Website clicks
Support ticket creation

Characteristics:

Rows inserted once, rarely updated
Grain is typically the individual event
Dates reflect when events occurred
Can grow very large over time

Schema example:

CREATE TABLE sales_transactions (
  transaction_id INT PRIMARY KEY,
  transaction_datetime TIMESTAMP,
  customer_key INT,
  product_key INT,
  store_key INT,
  employee_key INT,
  quantity INT,
  unit_price DECIMAL(10,2),
  discount_amount DECIMAL(10,2),
  sales_amount DECIMAL(10,2),
  cost_amount DECIMAL(10,2)
);

AI considerations: Transaction facts are straightforward for AI because each row represents a complete event. The main risk is grain confusion - ensure AI understands whether rows represent orders or line items.

Periodic Snapshot Fact Tables

Periodic snapshots capture the state of something at regular intervals. Instead of recording events, they record conditions at specific points in time.

Examples:

Daily account balances
Weekly inventory levels
Monthly sales pipeline
Daily active user counts

Characteristics:

One row per entity per period
Regular, predictable cadence (daily, weekly, monthly)
Measures are semi-additive across time
Supports trend analysis over time

Schema example:

CREATE TABLE daily_inventory_snapshot (
  snapshot_date DATE,
  product_key INT,
  warehouse_key INT,
  quantity_on_hand INT,
  quantity_reserved INT,
  quantity_available INT,
  days_of_supply DECIMAL(10,2),
  last_receipt_date DATE,
  PRIMARY KEY (snapshot_date, product_key, warehouse_key)
);

AI considerations: Semi-additivity is a common source of AI errors. The semantic layer must explicitly indicate that inventory quantities cannot be summed across dates - that would double-count. AI should understand that "total inventory" means latest snapshot, not sum of all snapshots.

Accumulating Snapshot Fact Tables

Accumulating snapshots track the lifecycle of a process with a defined beginning and end. Rows are inserted at process start and updated as the process progresses through milestones.

Examples:

Order fulfillment (ordered → picked → shipped → delivered)
Loan origination (applied → approved → funded → closed)
Support tickets (opened → assigned → resolved → closed)
Sales opportunities (created → qualified → proposed → won/lost)

Characteristics:

One row per process instance
Multiple date columns for milestones
Updated as milestones are reached
Includes lag measures between milestones

Schema example:

CREATE TABLE order_fulfillment (
  order_key INT PRIMARY KEY,
  order_date DATE,
  pick_date DATE,
  ship_date DATE,
  delivery_date DATE,
  days_to_pick INT,
  days_to_ship INT,
  days_to_deliver INT,
  total_fulfillment_days INT,
  order_amount DECIMAL(10,2),
  current_status VARCHAR(20)
);

AI considerations: Accumulating snapshots require understanding of process state. AI must know that NULL milestone dates indicate incomplete stages, and that current_status determines which milestones are valid for analysis.

Advanced Patterns

Factless Fact Tables

Factless facts record events without measures - capturing that something happened (or could happen) without any associated quantity or amount.

Coverage factless facts: Record what's possible

Which products are eligible for which promotions
Which employees can work which shifts
Which customers are assigned to which representatives

Event factless facts: Record what happened

Student attendance (present/absent)
Website page views (without revenue)
Employee clock-in events

Schema example:

CREATE TABLE promotion_coverage (
  date_key INT,
  product_key INT,
  promotion_key INT,
  store_key INT,
  PRIMARY KEY (date_key, product_key, promotion_key, store_key)
);

Counting rows provides the measure - how many products were on promotion, how many students attended.

Consolidated Fact Tables

When multiple fact tables share the same grain, they can be consolidated into a single table with measures from multiple sources. This simplifies queries that need to combine measures.

Example: Consolidating sales facts and returns facts

CREATE TABLE consolidated_sales (
  date_key INT,
  product_key INT,
  store_key INT,
  sales_quantity INT,
  sales_amount DECIMAL,
  returns_quantity INT,
  returns_amount DECIMAL,
  net_quantity INT,
  net_amount DECIMAL
);

Trade-off: Consolidation simplifies queries but couples the loading of related facts. If sales and returns are loaded at different times, consolidated tables require careful update logic.

Bridge Tables

Bridge tables handle many-to-many relationships between dimensions and facts - like orders with multiple payment methods or employees with multiple skills.

Schema example:

CREATE TABLE order_payment_bridge (
  order_key INT,
  payment_method_key INT,
  payment_amount DECIMAL,
  weighting_factor DECIMAL,
  PRIMARY KEY (order_key, payment_method_key)
);

The weighting factor allocates fact measures across multiple dimension members to prevent double-counting.

Design Best Practices

Declare Grain First and Explicitly

Before adding any columns, write down exactly what each row represents. This should be a complete sentence that leaves no ambiguity:

"Each row represents one line item on a customer order, capturing the quantity and amount for a single product purchased in a single transaction."

Include Degenerate Dimensions

Degenerate dimensions are dimensional attributes stored in the fact table rather than a separate dimension table. They're typically identifiers that don't warrant their own dimension:

Transaction ID, order number, invoice number
Typically used for drill-through to source systems
No additional attributes to model

Design for Common Query Patterns

Analyze how the fact table will be queried and optimize accordingly:

Add commonly used derived measures as stored columns
Include foreign keys to frequently joined dimensions
Consider pre-aggregated summary tables for high-volume queries

Document Everything for AI

For AI-ready fact tables, document:

Grain statement (required)
Additivity of each measure
Business meaning of each measure
Valid dimension combinations
Update patterns (insert-only vs. updateable)

This documentation should live in the semantic layer, making it accessible to AI systems generating queries.

Common Mistakes

Mixed grain: Rows representing different things in the same table. Always separate facts at different grains into different tables.

Storing calculated ratios: Ratios can't be aggregated correctly. Store components and calculate ratios at query time.

Missing dates: Every fact should have at least one date foreign key. Dateless facts can't support time-based analysis.

Unclear additivity: Failing to document which measures are semi-additive leads to incorrect sums across time.

Premature aggregation: Storing only summarized data loses detail that may be needed later. Aggregate in queries or materialized views, not in base fact tables.

Well-designed fact tables form the quantitative backbone of analytics. Combined with proper dimension design and semantic layer documentation, they enable both human analysts and AI systems to produce accurate, trustworthy insights.

Fact Table Design Patterns for Analytics and AI

Fundamental Concepts

Grain: The Most Important Decision

Measures and Additivity

Dimension Foreign Keys

Core Fact Table Patterns

Transaction Fact Tables

Periodic Snapshot Fact Tables

Accumulating Snapshot Fact Tables

Advanced Patterns

Factless Fact Tables

Consolidated Fact Tables

Bridge Tables

Design Best Practices

Declare Grain First and Explicitly

Include Degenerate Dimensions

Design for Common Query Patterns

Document Everything for AI

Common Mistakes

Questions

Related