Semantic Models¶
Semantic models map your physical Vulcan models to business-friendly representations. They define what consumers can do with each model: which columns are exposed as dimensions, what aggregations are available as measures, which reusable filters exist as segments, and how models relate to each other through joins.
Structure¶
A semantic model wraps a single Vulcan model. Use kind: semantic and put one model per file in models/semantics/:
kind: semantic
name: users # Business-friendly name used in queries
depends_on: b2b_saas.users # Fully qualified Vulcan model this wraps
description: Core user dimension (semantic layer)
dimensions: [...] # Columns consumers can group by and filter on
measures: [...] # Aggregated calculations
segments: [...] # Reusable filter conditions
joins: [...] # Relationships to other semantic models
Top-level fields:
| Field | Required | Description |
|---|---|---|
| `kind: semantic` | Yes | Declares the file as a semantic model. |
| `name` | Yes | Business-friendly identifier consumers reference (e.g. users). Lowercase identifier (see Naming rules). This is the identifier you use in {name.column} references everywhere else. |
| `depends_on` | Yes | The fully qualified Vulcan model this wraps (e.g. b2b_saas.users). Must match a model defined in your models/ directory. |
| `dimensions` | Yes | List of dimensions. Must be non-empty. See Dimensions. |
| `description` | No | Human-readable explanation of the semantic model. |
| `owner` | No | Team or person responsible. Free-form string. |
| `tags` | No | List of tags. See Naming rules for the allowed pattern. |
| `terms` | No | List of business glossary references (e.g. revenue.subscription). See Naming rules. |
| `ai_context` | No | Hints for AI/LLM consumers. See AI context. |
| `measures` | No | List of named aggregations. |
| `segments` | No | List of reusable filter conditions. |
| `joins` | No | List of relationships to other semantic models. |
Snowflake and other case-sensitive engines
Snowflake stores unquoted identifiers in uppercase by default. When targeting Snowflake, use uppercase column names in your dimension lists, expressions, and filters to match the warehouse schema. Lowercase examples in this guide assume a case-insensitive engine like DuckDB or Postgres.
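As a rough sketch of what this means in practice, the fragment below wraps a hypothetical Snowflake-backed model. The model `tpch.customer` and the columns `CUSTKEY` and `MKTSEGMENT` are assumptions for illustration and are not part of this guide's example project:

```yaml
kind: semantic
name: customers                      # semantic model names stay lowercase
depends_on: tpch.customer            # hypothetical Vulcan model
dimensions:
  - CUSTKEY                          # uppercase to match the warehouse column
  - MKTSEGMENT
measures:
  - name: distinct_customers         # measure names stay lowercase
    type: count_distinct
    expression: "{customers.CUSTKEY}"
segments:
  - name: building_segment
    expression: "{customers.MKTSEGMENT} = 'BUILDING'"
```

Only the column references change case here. The semantic model, measure, and segment names still follow the lowercase pattern described under Naming rules.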
Naming rules¶
Vulcan validates every identifier in a semantic model. Use this section as a quick reference if validation fails:
| Identifier | Pattern | Notes |
|---|---|---|
| Semantic model name, measure name, segment name, join name, granularity name, metric name | `^[a-z][a-z0-9_]{0,63}$` | Lowercase, starts with a letter, underscores allowed, max 64 chars. |
| Dimension name (i.e. a column reference) | `^[a-zA-Z_][a-zA-Z0-9_]{0,63}$` | Mixed case allowed so warehouse identifiers like CUSTKEY survive. |
| `tags[*]` | `^[a-zA-Z0-9.:_-]+$` | Supports key:value patterns, e.g. classification:PII. |
| `terms[*]` | `^[a-zA-Z0-9._-]+$` | Typically dotted FQNs, e.g. revenue.subscription. |
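To make the patterns concrete, the sketch below annotates a few identifiers. The names and the `shop.orders` model are hypothetical; the comments only restate which pattern from the table applies:

```yaml
kind: semantic
name: monthly_orders          # ok: lowercase, starts with a letter
# name: 2024_orders           # would not match: starts with a digit
# name: MonthlyOrders         # would not match: uppercase in a model name
depends_on: shop.orders       # hypothetical Vulcan model
tags:
  - classification:PII        # ok: key:value form
  - finance-core              # ok: hyphens are allowed in tags
terms:
  - revenue.subscription      # ok: dotted name
  # - revenue:subscription    # would not match: ':' is not in the terms pattern
dimensions:
  - ORDERKEY                  # ok: dimension names may be mixed case
  - order_status
```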
Unknown keys fail validation
The wire-level schemas (ai_context, rolling_window, granularities) use Pydantic's extra="forbid". Any unknown key inside these blocks will cause validation to fail. Stick to the documented fields.
Dimensions¶
dimensions: is a required, non-empty list. Each item can be either a bare column name (shorthand) or a full dictionary with metadata, granularities, and formatting.
Shorthand: a bare column name¶
If you only need to expose a column, write it as a string:
Each string is the name of a column in the underlying Vulcan model.
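For instance, a shorthand list might look like the sketch below. The column names are borrowed from the users examples elsewhere in this guide and are assumptions about your schema:

```yaml
dimensions:
  - user_id
  - email
  - plan_type
  - status
```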
Full form: a dict with metadata¶
When you want to add documentation, tags, glossary terms, granularities, or a display format, write the dimension as a dict:
dimensions:
  - name: signup_date
    description: When the user signed up (used for cohort time axis)
    tags:
      - temporal
      - acquisition
      - cohort
    terms:
      - customer.signup_date
      - temporal.signup_timestamp
  - name: signup_channel
    description: How the user signed up (organic, paid, referral, etc.)
    tags:
      - acquisition
      - channel
    terms:
      - customer.signup_channel
You can freely mix shorthand and full-form entries in the same list.
Granularities¶
For time-like dimensions, attach a granularities: list inline on the dimension. Each granularity has a name and an interval:
dimensions:
  - name: session_start
    granularities:
      - name: hour
        interval: 1 hour
      - name: day
        interval: 1 day
      - name: week
        interval: 1 week
      - name: month
        interval: 1 month
Interval grammar: any positive quantity of minute, hour, day, week, month, or year (e.g. 15 minutes, 3 months, 1 year).
Granularity fields:
| Field | Required | Description |
|---|---|---|
| `name` | Yes | Lowercase identifier (see Naming rules). Must be unique within the dimension. |
| `interval` | Yes | Duration string like 1 hour, 30 minutes, 1 month. |
| `description` | No | Human-readable explanation. |
| `ai_context` | No | Hints for AI/LLM consumers. See AI context. |
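Intervals are not limited to a quantity of one. The sketch below uses multi-unit intervals together with the optional description field; the granularity names `quarter_hour` and `quarter` are hypothetical choices that satisfy the naming pattern:

```yaml
dimensions:
  - name: session_start
    granularities:
      - name: quarter_hour
        interval: 15 minutes
        description: 15-minute buckets for intraday traffic analysis
      - name: quarter
        interval: 3 months
        description: Three-month buckets for quarterly reporting
```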
Display format¶
For dimensions whose values benefit from a presentation hint, set format: inline:
format is a free-form string passed through to downstream consumers (BI tools, APIs). Vulcan does not validate the value against a fixed list. Common values include percent and currency; use whatever the consumer at the other end understands.
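A minimal sketch of the inline form follows. The `discount_rate` column is hypothetical, and the values `percent` and `currency` only mean something if your downstream tool interprets them:

```yaml
dimensions:
  - name: discount_rate          # hypothetical column
    description: Discount applied to the subscription
    format: percent
  - name: mrr
    format: currency
```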
Dimension properties (full form)¶
| Property | Required | Description |
|---|---|---|
| `name` | Yes | Column name in the underlying Vulcan model. Mixed case allowed (see Naming rules). |
| `description` | No | Human-readable explanation. |
| `tags` | No | List of categorization labels. |
| `terms` | No | List of business glossary references. |
| `granularities` | No | List of time buckets. Only meaningful on TIMESTAMP/DATETIME columns. Granularity names must be unique within the dimension. |
| `format` | No | Free-form display hint (e.g. percent, currency). |
| `ai_context` | No | Hints for AI/LLM consumers. See AI context. |
| `public` | No | Whether the dimension is visible to consumers (default: true). |
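As an illustration of the public property, the sketch below hides one dimension and leaves another at the default. The choice of `user_id` as the hidden column is an assumption made for the example:

```yaml
dimensions:
  - name: user_id
    description: Internal identifier
    public: false              # not visible to consumers
  - name: signup_channel
    description: How the user signed up
    # public defaults to true when omitted
```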
Measures¶
measures: is a list. Each item is a dict that defines a named aggregation. Reference columns from the underlying model using {name.column} syntax, where name is the value of the semantic model's name: field.
measures:
  - name: total_users
    type: count
    expression: "{users.user_id}"
    description: Total registered users
    tags:
      - user
      - count
      - metric
    terms:
      - customer.total_users
      - metric.user_count
  - name: active_users
    type: count
    filters:
      - "{users.status} = 'active'"
    description: Currently active users
    tags:
      - user
      - active
    terms:
      - customer.active_users
  - name: avg_mrr_per_account
    type: avg
    expression: "{subscriptions.mrr}"
    filters:
      - "{subscriptions.status} = 'active'"
    description: Average MRR per active subscription
    tags:
      - revenue
      - financial
    terms:
      - revenue.avg_mrr
Measure types¶
| Type | Description | Expression required? |
|---|---|---|
| `count` | Row count | No (see below) |
| `count_distinct` | Distinct count | Yes |
| `count_distinct_approx` | Approximate distinct count | Yes |
| `sum` | Sum aggregation | Yes |
| `avg` | Average aggregation | Yes |
| `min` | Minimum value | Yes |
| `max` | Maximum value | Yes |
| `number` | Custom numeric expression | Yes |
| `string` | Custom string expression | Yes |
| `time` | Custom time expression | Yes |
| `boolean` | Custom boolean expression | Yes |
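A short sketch of a few aggregation types side by side may help. It assumes the `plan_id`, `seats`, and `mrr` columns from the subscriptions examples in this guide; the measure names are hypothetical:

```yaml
measures:
  - name: distinct_plans
    type: count_distinct
    expression: "{subscriptions.plan_id}"
  - name: total_seats
    type: sum
    expression: "{subscriptions.seats}"
  - name: largest_mrr
    type: max
    expression: "{subscriptions.mrr}"
```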
count measures and expression¶
count is the only type where expression: is optional. Three forms all work:
measures:
  - name: row_count
    type: count
    # No expression: counts every row (equivalent to COUNT(*))
  - name: total_users
    type: count
    expression: "*"
    # Same as above, written explicitly
  - name: users_with_email
    type: count
    expression: "{users.email}"
    # Counts non-NULL values of users.email
Pick the form that best describes intent: omit or "*" for "count rows", a {name.column} reference for "count non-null values in this column".
Measure properties¶
| Property | Required | Description |
|---|---|---|
| `name` | Yes | Lowercase identifier (see Naming rules). Must be unique among measures and segments in this semantic model. |
| `type` | Yes | Aggregation type (see table above). Normalized to lowercase. |
| `expression` | Conditionally | Column reference ({name.column}) or SQL expression. Required for every type except count. |
| `filters` | No | List of SQL conditions that restrict which rows are aggregated. Only allowed on count, count_distinct, count_distinct_approx, sum, avg, min, max. Never allowed on number. Use {name.column} references. |
| `description` | No | Human-readable explanation. |
| `tags` | No | List of categorization labels. |
| `terms` | No | List of business glossary references. |
| `rolling_window` | No | Window configuration. See Rolling windows. |
| `ai_context` | No | Hints for AI/LLM consumers. See AI context. |
| `public` | No | Whether the measure is visible to consumers (default: true). |
Reserved name
count is a reserved measure name: Vulcan adds an implicit count measure automatically. Use a different name like total_users, row_count, or subscription_count.
Rolling windows¶
Attach a rolling_window: to a measure to compute it over a sliding time window relative to the query period.
measures:
  - name: trailing_7d_revenue
    type: sum
    expression: "{subscriptions.mrr}"
    rolling_window:
      trailing: 7 days
      offset: end
| Field | Required | Allowed values |
|---|---|---|
| `trailing` | No | unbounded, or a signed duration like 1 day, 30 days, -7 days. Same grammar as granularity intervals. |
| `leading` | No | Same grammar as trailing. |
| `offset` | No | start or end (default end). Controls whether the window is anchored to the start or end of the bucket. |
Use trailing for look-backs (rolling averages, trailing totals), leading for look-aheads, and combine them for centered windows. unbounded removes the bound on that side.
Pydantic extra="forbid"
Any key other than trailing, leading, offset inside rolling_window: will fail validation.
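The other window shapes mentioned above can be sketched as follows. The measure names are hypothetical, and the exact bucket boundaries depend on how the engine evaluates the window:

```yaml
measures:
  - name: cumulative_mrr
    type: sum
    expression: "{subscriptions.mrr}"
    rolling_window:
      trailing: unbounded        # no bound on the look-back side
  - name: centered_mrr_avg
    type: avg
    expression: "{subscriptions.mrr}"
    rolling_window:
      trailing: 3 days           # look back 3 days
      leading: 3 days            # look ahead 3 days
      # offset omitted, so the default (end) applies
```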
Segments¶
Segments are reusable filter conditions that define meaningful subsets of your data. Instead of writing WHERE status = 'active' in every query, define it once as a segment.
segments: is a list of dicts:
segments:
  - name: high_value_accounts
    expression: "{users.plan_type} IN ('pro', 'enterprise')"
    description: Paid plan users
    tags:
      - customer
      - segment
      - revenue
    terms:
      - customer.high_value
      - segment.premium
  - name: recent_signups
    expression: "{users.signup_date} >= CURRENT_DATE - INTERVAL '7 days'"
    description: Users signed up in last 7 days
    tags:
      - acquisition
      - temporal
      - growth
  - name: at_risk_users
    expression: "{users.status} = 'active' AND {users.plan_type} = 'free'"
    description: Free users who might churn
    tags:
      - churn
      - risk
Segment properties¶
| Property | Required | Description |
|---|---|---|
| `name` | Yes | Lowercase identifier (see Naming rules). |
| `expression` | Yes | SQL boolean condition. Must reference columns of the current semantic model only (e.g. {usage_sessions.device_type} = 'mobile'). Cross-model filters belong on a metric or a measure. |
| `description` | No | Human-readable explanation. |
| `tags` | No | List of categorization labels. |
| `terms` | No | List of business glossary references. |
| `ai_context` | No | Hints for AI/LLM consumers. See AI context. |
| `public` | No | Visibility to consumers (default: true). |
Uniqueness constraint
Measure and segment names must be unique within a single semantic model. You cannot have a measure and a segment with the same name.
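A small sketch of the collision and one way around it. The name `active_user_segment` is a hypothetical rename; any distinct identifier that matches the naming pattern would do:

```yaml
measures:
  - name: active_users
    type: count
    filters:
      - "{users.status} = 'active'"
segments:
  # - name: active_users         # would collide with the measure above
  - name: active_user_segment    # distinct name avoids the collision
    expression: "{users.status} = 'active'"
```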
Joins¶
Joins define relationships between semantic models so you can analyze across tables. joins: is a list of dicts. The name: of each join entry must match the name: of another semantic model in the project.
joins:
  - name: subscriptions
    type: one_to_many
    expression: "{users.user_id} = {subscriptions.user_id}"
  - name: usage_events
    type: one_to_many
    expression: "{users.user_id} = {usage_events.user_id}"
Join properties¶
| Property | Required | Description |
|---|---|---|
| `name` | Yes | Lowercase identifier (see Naming rules). Must match the name: of an existing semantic model in the project. Must not equal the current model's own name. |
| `type` | Yes | One of one_to_one, one_to_many, many_to_one. Normalized to lowercase. |
| `expression` | Yes | SQL-like join predicate referencing both sides as {model_a.col} = {model_b.col}. |
| `ai_context` | No | Hints for AI/LLM consumers. See AI context. |
| `fqn` | No | Fully-qualified name of the join target. Engine-set; rarely authored by hand. |
Joins do not accept metadata
Joins do not support description, tags, terms, or public. Only the fields above are allowed; extra keys fail validation.
Join types¶
| Type | Cardinality | Example |
|---|---|---|
| `one_to_one` | One row matches one row | user to user_profile |
| `one_to_many` | One row matches many rows | user to subscriptions |
| `many_to_one` | Many rows match one row | subscriptions to subscription_plans |
many_to_many is not supported. Model many-to-many relationships through an intermediate join model (a semantic model wrapping the bridge table) and chain two joins.
The join expression: uses {name.column} syntax on both sides. The cardinality helps Vulcan handle aggregations correctly and prevent double-counting.
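The bridge-model approach to many-to-many relationships could be sketched as below. Everything about the bridge is an assumption for illustration: the `team_memberships` and `teams` models, their columns, and the file names. The `---` separator is used only to show two files in one listing, since each semantic model lives in its own file:

```yaml
# models/semantics/users.yml (hypothetical file name)
kind: semantic
name: users
depends_on: b2b_saas.users
dimensions:
  - user_id
joins:
  - name: team_memberships           # users -> bridge
    type: one_to_many
    expression: "{users.user_id} = {team_memberships.user_id}"
---
# models/semantics/team_memberships.yml (hypothetical bridge model)
kind: semantic
name: team_memberships
depends_on: b2b_saas.team_memberships
dimensions:
  - user_id
  - team_id
joins:
  - name: teams                      # bridge -> teams
    type: many_to_one
    expression: "{team_memberships.team_id} = {teams.team_id}"
```

A `teams` semantic model would also need to exist, because every join name must match a semantic model in the project.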
Cross-model references¶
Once joins are defined, you can reference columns from joined models in measure filters:
measures:
  - name: enterprise_revenue
    type: sum
    expression: "{subscriptions.arr}"
    filters:
      - "{users.plan_type} = 'enterprise'"
    description: ARR from enterprise plan users
Here enterprise_revenue is defined on the subscriptions semantic model but filters by users.plan_type from the joined users model. Vulcan resolves the join path automatically.
Complete example¶
A B2B SaaS subscriptions semantic model with dimensions, measures, segments, and joins:
kind: semantic
name: subscriptions
depends_on: hello.subscriptions
description: Subscription lifecycle and revenue (semantic layer)
dimensions:
  - subscription_id
  - user_id
  - plan_id
  - start_date
  - end_date
  - name: plan_type
    description: Subscription plan tier (free, pro, enterprise, etc.)
    tags:
      - product
      - pricing
      - segment
    terms:
      - subscription.plan_type
      - product.plan_tier
  - status
  - billing_cycle
  - revenue_category
  - mrr
  - seats
  - arr
measures:
  - name: total_arr
    type: sum
    expression: "{subscriptions.arr}"
    filters:
      - "{subscriptions.status} = 'active'"
    description: Total Annual Recurring Revenue
    tags:
      - revenue
      - financial
      - arr
    terms:
      - revenue.total_arr
      - finance.annual_recurring_revenue
  - name: subscription_count
    type: count
    filters:
      - "{subscriptions.status} = 'active'"
    description: Total active subscriptions
    tags:
      - subscription
      - count
      - metric
  - name: churn_count
    type: count
    filters:
      - "{subscriptions.status} = 'cancelled'"
      - "{subscriptions.end_date} >= CURRENT_DATE - INTERVAL '30 days'"
    description: Subscriptions churned in last 30 days
    tags:
      - churn
      - retention
segments:
  - name: active_subscriptions
    expression: "{subscriptions.status} = 'active'"
    description: Currently active subscriptions
  - name: high_value_accounts
    expression: "{subscriptions.mrr} >= 1000"
    description: High-value accounts (>= $1000 MRR)
    tags:
      - revenue
      - high_value
  - name: enterprise_subscriptions
    expression: "{subscriptions.plan_type} = 'enterprise'"
    description: Enterprise plan subscriptions
joins:
  - name: subscription_plans
    type: many_to_one
    expression: "{subscriptions.plan_id} = {subscription_plans.plan_id}"
  - name: usage_sessions
    type: one_to_many
    expression: "{subscriptions.subscription_id} = {usage_sessions.subscription_id}"
AI context¶
Most spec objects (the semantic model itself, dimensions, granularities, measures, segments, joins) accept an optional ai_context: block to help AI/LLM consumers understand the object. The block has three fields (instructions, synonyms, examples); all of them are optional, and unknown keys fail validation.
kind: semantic
name: subscriptions
depends_on: hello.subscriptions
ai_context:
  instructions: |
    Subscription rows represent the contract lifecycle, not invoice events.
    Use `total_arr` for steady-state ARR and `churn_count` for retention questions.
  synonyms:
    - "contracts"
    - "subscriptions"
    - "accounts"
  examples:
    - "How many active enterprise subscriptions exist this month?"
    - "What is ARR by plan_type for the last quarter?"
dimensions:
  - name: plan_type
    ai_context:
      synonyms:
        - "plan"
        - "tier"
        - "subscription_level"
| Field | Type | Description |
|---|---|---|
| `instructions` | String | Free-form guidance for how to think about this object. |
| `synonyms` | List of strings | Alternate names consumers/LLMs might use. |
| `examples` | List of strings | Example questions, queries, or values that illustrate intent. |
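The same block can be attached to measures and segments. The sketch below reuses `total_arr` and `active_subscriptions` from the complete example; the hint text itself is made up for illustration:

```yaml
measures:
  - name: total_arr
    type: sum
    expression: "{subscriptions.arr}"
    ai_context:
      instructions: Prefer this measure for steady-state ARR questions.
      synonyms:
        - "annual recurring revenue"
        - "ARR"
segments:
  - name: active_subscriptions
    expression: "{subscriptions.status} = 'active'"
    ai_context:
      examples:
        - "Show ARR for active subscriptions only"
      # hints: "..."   # would fail: unknown key inside ai_context
```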
Validation¶
Vulcan validates semantic model definitions automatically when you create a plan. It checks that:
- depends_on references an existing Vulcan model
- All identifier names match the patterns in Naming rules
- dimensions is non-empty
- Column references in measures and segments point to real columns
- Segment expressions reference only columns of the current semantic model
- Filters on measures are only used with allowed measure types
- Join names reference existing semantic models and are not equal to the current model's name
- Join type is one of one_to_one, one_to_many, many_to_one
- Cross-model references have valid join paths
- No duplicate names exist among measures, among segments, among joins, or among granularities within a single dimension
- count is not used as an explicit measure name
- No unknown keys appear inside ai_context, rolling_window, or granularities (Pydantic extra="forbid")
Validation runs before anything is materialized, so errors are caught early.
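As a reading aid, here is a deliberately invalid sketch with each problem annotated against the rules in this guide. It is not output from Vulcan, and whether all problems are reported at once or one at a time is not specified here:

```yaml
kind: semantic
name: subscriptions
depends_on: hello.subscriptions
dimensions: []                     # invalid: dimensions must be non-empty
measures:
  - name: count                    # invalid: count is a reserved measure name
    type: count
segments:
  - name: enterprise_users
    expression: "{users.plan_type} = 'enterprise'"   # invalid: references another model
joins:
  - name: subscriptions            # invalid: equals the current model's own name
    type: many_to_many             # invalid: not a supported join type
    expression: "{subscriptions.plan_id} = {subscription_plans.plan_id}"
    description: Plan lookup       # invalid: joins do not accept description
```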
Next steps¶
- Learn about Business Metrics that combine measures with time and dimensions
- Explore working examples in your project's models/semantics/ directory
- See the Semantics Overview for the complete picture