Skip to content

Semantic Models

Semantic models map your physical Vulcan models to business-friendly representations. They define what consumers can do with each model: which columns are exposed as dimensions, what aggregations are available as measures, which reusable filters exist as segments, and how models relate to each other through joins.


Structure

A semantic model wraps a single Vulcan model. Use kind: semantic and put one model per file in models/semantics/:

kind: semantic
name: users                  # Business-friendly name used in queries
depends_on: b2b_saas.users   # Fully qualified Vulcan model this wraps
description: Core user dimension (semantic layer)

dimensions: [...]            # Columns consumers can group by and filter on
measures: [...]              # Aggregated calculations
segments: [...]              # Reusable filter conditions
joins: [...]                 # Relationships to other semantic models

Top-level fields:

Field Required Description
kind: semantic Yes Declares the file as a semantic model.
name Yes Business-friendly identifier consumers reference (e.g. users). Lowercase identifier (see Naming rules). This is the identifier you use in {name.column} references everywhere else.
depends_on Yes The fully qualified Vulcan model this wraps (e.g. b2b_saas.users). Must match a model defined in your models/ directory.
dimensions Yes List of dimensions. Must be non-empty. See Dimensions.
description No Human-readable explanation of the semantic model.
owner No Team or person responsible. Free-form string.
tags No List of tags. See Naming rules for the allowed pattern.
terms No List of business glossary references (e.g. revenue.subscription). See Naming rules.
ai_context No Hints for AI/LLM consumers. See AI context.
measures No List of named aggregations.
segments No List of reusable filter conditions.
joins No List of relationships to other semantic models.

Snowflake and other case-sensitive engines

Snowflake stores unquoted identifiers in uppercase by default. When targeting Snowflake, use uppercase column names in your dimension lists, expressions, and filters to match the warehouse schema. Lowercase examples in this guide assume a case-insensitive engine like DuckDB or Postgres.


Naming rules

Vulcan validates every identifier in a semantic model. Use this section as a quick reference if validation fails:

Identifier Pattern Notes
Semantic model name, measure name, segment name, join name, granularity name, metric name ^[a-z][a-z0-9_]{0,63}$ Lowercase, starts with a letter, underscores allowed, max 64 chars.
Dimension name (i.e. a column reference) ^[a-zA-Z_][a-zA-Z0-9_]{0,63}$ Mixed case allowed so warehouse identifiers like CUSTKEY survive.
tags[*] ^[a-zA-Z0-9.:_-]+$ Supports key:value patterns, e.g. classification:PII.
terms[*] ^[a-zA-Z0-9._-]+$ Typically dotted FQNs, e.g. revenue.subscription.

Unknown keys fail validation

The wire-level schemas (ai_context, rolling_window, granularities) use Pydantic's extra="forbid". Any unknown key inside these blocks will cause validation to fail. Stick to the documented fields.


Dimensions

dimensions: is a required, non-empty list. Each item can be either a bare column name (shorthand) or a full dictionary with metadata, granularities, and formatting.

Shorthand: a bare column name

If you only need to expose a column, write it as a string:

dimensions:
  - plan_type
  - status
  - email
  - industry
  - user_id

Each string is the name of a column in the underlying Vulcan model.

Full form: a dict with metadata

When you want to add documentation, tags, glossary terms, granularities, or a display format, write the dimension as a dict:

dimensions:
  - name: signup_date
    description: When the user signed up (used for cohort time axis)
    tags:
      - temporal
      - acquisition
      - cohort
    terms:
      - customer.signup_date
      - temporal.signup_timestamp

  - name: signup_channel
    description: How the user signed up (organic, paid, referral, etc.)
    tags:
      - acquisition
      - channel
    terms:
      - customer.signup_channel

You can freely mix shorthand and full-form entries in the same list.

Granularities

For time-like dimensions, attach a granularities: list inline on the dimension. Each granularity has a name and an interval:

dimensions:
  - name: session_start
    granularities:
      - name: hour
        interval: 1 hour
      - name: day
        interval: 1 day
      - name: week
        interval: 1 week
      - name: month
        interval: 1 month

Interval grammar: any positive quantity of minute, hour, day, week, month, or year (e.g. 15 minutes, 3 months, 1 year).

Granularity fields:

Field Required Description
name Yes Lowercase identifier (see Naming rules). Must be unique within the dimension.
interval Yes Duration string like 1 hour, 30 minutes, 1 month.
description No Human-readable explanation.
ai_context No Hints for AI/LLM consumers. See AI context.

Display format

For dimensions whose values benefit from a presentation hint, set format: inline:

dimensions:
  - name: pages_viewed
    format: percent
  - name: avg_order_value
    format: currency

format is a free-form string passed through to downstream consumers (BI tools, APIs). Vulcan does not validate the value against a fixed list. Common values include percent and currency; use whatever the consumer at the other end understands.

Dimension properties (full form)

Property Required Description
name Yes Column name in the underlying Vulcan model. Mixed case allowed (see Naming rules).
description No Human-readable explanation.
tags No List of categorization labels.
terms No List of business glossary references.
granularities No List of time buckets. Only meaningful on TIMESTAMP/DATETIME columns. Granularity names must be unique within the dimension.
format No Free-form display hint (e.g. percent, currency).
ai_context No Hints for AI/LLM consumers. See AI context.
public No Whether the dimension is visible to consumers (default: true).

Measures

measures: is a list. Each item is a dict that defines a named aggregation. Reference columns from the underlying model using {name.column} syntax, where name is the semantic model's name:.

measures:
  - name: total_users
    type: count
    expression: "{users.user_id}"
    description: Total registered users
    tags:
      - user
      - count
      - metric
    terms:
      - customer.total_users
      - metric.user_count

  - name: active_users
    type: count
    filters:
      - "{users.status} = 'active'"
    description: Currently active users
    tags:
      - user
      - active
    terms:
      - customer.active_users

  - name: avg_mrr_per_account
    type: avg
    expression: "{subscriptions.mrr}"
    filters:
      - "{subscriptions.status} = 'active'"
    description: Average MRR per active subscription
    tags:
      - revenue
      - financial
    terms:
      - revenue.avg_mrr

Measure types

Type Description Expression required?
count Row count No (see below)
count_distinct Distinct count Yes
count_distinct_approx Approximate distinct count Yes
sum Sum aggregation Yes
avg Average aggregation Yes
min Minimum value Yes
max Maximum value Yes
number Custom numeric expression Yes
string Custom string expression Yes
time Custom time expression Yes
boolean Custom boolean expression Yes

count measures and expression

count is the only type where expression: is optional. Three forms all work:

measures:
  - name: row_count
    type: count
    # No expression: counts every row (equivalent to COUNT(*))

  - name: total_users
    type: count
    expression: "*"
    # Same as above, written explicitly

  - name: users_with_email
    type: count
    expression: "{users.email}"
    # Counts non-NULL values of users.email

Pick the form that best describes intent: omit or "*" for "count rows", a {name.column} reference for "count non-null values in this column".

Measure properties

Property Required Description
name Yes Lowercase identifier (see Naming rules). Must be unique among measures and segments in this semantic model.
type Yes Aggregation type (see table above). Normalized to lowercase.
expression Conditionally Column reference ({name.column}) or SQL expression. Required for every type except count.
filters No List of SQL conditions that restrict which rows are aggregated. Only allowed on count, count_distinct, count_distinct_approx, sum, avg, min, max. Never allowed on number. Use {name.column} references.
description No Human-readable explanation.
tags No List of categorization labels.
terms No List of business glossary references.
rolling_window No Window configuration. See Rolling windows.
ai_context No Hints for AI/LLM consumers. See AI context.
public No Whether the measure is visible to consumers (default: true).

Reserved name

count is a reserved measure name: Vulcan adds an implicit count measure automatically. Use a different name like total_users, row_count, or subscription_count.

Rolling windows

Attach a rolling_window: to a measure to compute it over a sliding time window relative to the query period.

measures:
  - name: trailing_7d_revenue
    type: sum
    expression: "{subscriptions.mrr}"
    rolling_window:
      trailing: 7 days
      offset: end
Field Required Allowed values
trailing No unbounded, or a signed duration like 1 day, 30 days, -7 days. Same grammar as granularity intervals.
leading No Same grammar as trailing.
offset No start or end (default end). Controls whether the window is anchored to the start or end of the bucket.

Use trailing for look-backs (rolling averages, trailing totals), leading for look-aheads, and combine them for centered windows. unbounded removes the bound on that side.

Pydantic extra=\"forbid\"

Any key other than trailing, leading, offset inside rolling_window: will fail validation.


Segments

Segments are reusable filter conditions that define meaningful subsets of your data. Instead of writing WHERE status = 'active' in every query, define it once as a segment.

segments: is a list of dicts:

segments:
  - name: high_value_accounts
    expression: "{users.plan_type} IN ('pro', 'enterprise')"
    description: Paid plan users
    tags:
      - customer
      - segment
      - revenue
    terms:
      - customer.high_value
      - segment.premium

  - name: recent_signups
    expression: "{users.signup_date} >= CURRENT_DATE - INTERVAL '7 days'"
    description: Users signed up in last 7 days
    tags:
      - acquisition
      - temporal
      - growth

  - name: at_risk_users
    expression: "{users.status} = 'active' AND {users.plan_type} = 'free'"
    description: Free users who might churn
    tags:
      - churn
      - risk

Segment properties

Property Required Description
name Yes Lowercase identifier (see Naming rules).
expression Yes SQL boolean condition. Must reference columns of the current semantic model only (e.g. {usage_sessions.device_type} = 'mobile'). Cross-model filters belong on a metric or a measure.
description No Human-readable explanation.
tags No List of categorization labels.
terms No List of business glossary references.
ai_context No Hints for AI/LLM consumers. See AI context.
public No Visibility to consumers (default: true).

Uniqueness constraint

Measure and segment names must be unique within a single semantic model. You cannot have a measure and a segment with the same name.


Joins

Joins define relationships between semantic models so you can analyze across tables. joins: is a list of dicts. The name: of each join entry must match the name: of another semantic model in the project.

joins:
  - name: subscriptions
    type: one_to_many
    expression: "{users.user_id} = {subscriptions.user_id}"

  - name: usage_events
    type: one_to_many
    expression: "{users.user_id} = {usage_events.user_id}"

Join properties

Property Required Description
name Yes Lowercase identifier (see Naming rules). Must match the name: of an existing semantic model in the project. Must not equal the current model's own name.
type Yes One of one_to_one, one_to_many, many_to_one. Normalized to lowercase.
expression Yes SQL-like join predicate referencing both sides as {model_a.col} = {model_b.col}.
ai_context No Hints for AI/LLM consumers. See AI context.
fqn No Fully-qualified name of the join target. Engine-set; rarely authored by hand.

Joins do not accept metadata

Joins do not support description, tags, terms, or public. Only the fields above are allowed; extra keys fail validation.

Join types

Type Cardinality Example
one_to_one One row matches one row user to user_profile
one_to_many One row matches many rows user to subscriptions
many_to_one Many rows match one row subscriptions to subscription_plans

many_to_many is not supported. Model many-to-many relationships through an intermediate join model (a semantic model wrapping the bridge table) and chain two joins.

The join expression: uses {name.column} syntax on both sides. The cardinality helps Vulcan handle aggregations correctly and prevent double-counting.

Cross-model references

Once joins are defined, you can reference columns from joined models in measure filters:

measures:
  - name: enterprise_revenue
    type: sum
    expression: "{subscriptions.arr}"
    filters:
      - "{users.plan_type} = 'enterprise'"
    description: ARR from enterprise plan users

Here enterprise_revenue is defined on the subscriptions semantic model but filters by users.plan_type from the joined users model. Vulcan resolves the join path automatically.


Complete example

A B2B SaaS subscriptions semantic model with dimensions, measures, segments, and joins:

kind: semantic
name: subscriptions
depends_on: hello.subscriptions
description: Subscription lifecycle and revenue (semantic layer)

dimensions:
  - subscription_id
  - user_id
  - plan_id
  - start_date
  - end_date
  - name: plan_type
    description: Subscription plan tier (free, pro, enterprise, etc.)
    tags:
      - product
      - pricing
      - segment
    terms:
      - subscription.plan_type
      - product.plan_tier
  - status
  - billing_cycle
  - revenue_category
  - mrr
  - seats
  - arr

measures:
  - name: total_arr
    type: sum
    expression: "{subscriptions.arr}"
    filters:
      - "{subscriptions.status} = 'active'"
    description: Total Annual Recurring Revenue
    tags:
      - revenue
      - financial
      - arr
    terms:
      - revenue.total_arr
      - finance.annual_recurring_revenue

  - name: subscription_count
    type: count
    filters:
      - "{subscriptions.status} = 'active'"
    description: Total active subscriptions
    tags:
      - subscription
      - count
      - metric

  - name: churn_count
    type: count
    filters:
      - "{subscriptions.status} = 'cancelled'"
      - "{subscriptions.end_date} >= CURRENT_DATE - INTERVAL '30 days'"
    description: Subscriptions churned in last 30 days
    tags:
      - churn
      - retention

segments:
  - name: active_subscriptions
    expression: "{subscriptions.status} = 'active'"
    description: Currently active subscriptions

  - name: high_value_accounts
    expression: "{subscriptions.mrr} >= 1000"
    description: High-value accounts (>= $1000 MRR)
    tags:
      - revenue
      - high_value

  - name: enterprise_subscriptions
    expression: "{subscriptions.plan_type} = 'enterprise'"
    description: Enterprise plan subscriptions

joins:
  - name: subscription_plans
    type: many_to_one
    expression: "{subscriptions.plan_id} = {subscription_plans.plan_id}"

  - name: usage_sessions
    type: one_to_many
    expression: "{subscriptions.subscription_id} = {usage_sessions.subscription_id}"

AI context

Most spec objects (the semantic model itself, dimensions, granularities, measures, segments, joins) accept an optional ai_context: block to help AI/LLM consumers understand the object. All three fields are optional, and unknown keys fail validation.

kind: semantic
name: subscriptions
depends_on: hello.subscriptions

ai_context:
  instructions: |
    Subscription rows represent the contract lifecycle, not invoice events.
    Use `total_arr` for steady-state ARR and `churn_count` for retention questions.
  synonyms:
    - "contracts"
    - "subscriptions"
    - "accounts"
  examples:
    - "How many active enterprise subscriptions exist this month?"
    - "What is ARR by plan_type for the last quarter?"

dimensions:
  - name: plan_type
    ai_context:
      synonyms:
        - "plan"
        - "tier"
        - "subscription_level"
Field Type Description
instructions String Free-form guidance for how to think about this object.
synonyms List of strings Alternate names consumers/LLMs might use.
examples List of strings Example questions, queries, or values that illustrate intent.

Validation

Vulcan validates semantic model definitions automatically when you create a plan. It checks that:

  • depends_on references an existing Vulcan model
  • All identifier names match the patterns in Naming rules
  • dimensions is non-empty
  • Column references in measures and segments point to real columns
  • Segment expressions reference only columns of the current semantic model
  • Filters on measures are only used with allowed measure types
  • Join names reference existing semantic models and are not equal to the current model's name
  • Join type is one of one_to_one, one_to_many, many_to_one
  • Cross-model references have valid join paths
  • No duplicate names exist among measures, among segments, among joins, or among granularities within a single dimension
  • count is not used as an explicit measure name
  • No unknown keys appear inside ai_context, rolling_window, or granularities (Pydantic extra="forbid")

Validation runs before anything is materialized, so errors are caught early.


Next steps

  • Learn about Business Metrics that combine measures with time and dimensions
  • Explore working examples in your project's models/semantics/ directory
  • See the Semantics Overview for the complete picture