Overview¶
Models transform raw data into tables and views. Define what you want (the metadata) and how to make it (the SQL query), and Vulcan handles the rest.
Models live in .sql and .py files in the models/ directory of your project. Vulcan automatically figures out how your models relate to each other by parsing your SQL, so you don't have to manually configure dependencies. Write your SQL, and Vulcan handles the lineage.
Every model has two parts:
-
DDL (Data Definition Language) - The
MODELblock that tells Vulcan what this model is (name, schedule, how to materialize it, etc.) -
DML (Data Manipulation Language) - The
SELECTquery that does the actual transformation work
The DDL defines the model metadata. The DML contains the transformation logic.
Model Structure¶
You can write models in SQL or Python. Both work the same way conceptually; they just look different. Let's see both:
Breaking it down:
-
Lines 1-21: The DDL (
MODELblock) - tells Vulcan this is a daily sales model with metadata, tags, and column documentation -
Lines 23-31: The DML (
SELECTquery) - the actual transformation that aggregates orders by date
Breaking it down:
-
Lines 7-34: The DDL (
@modeldecorator) - same metadata as SQL with tags, terms, and column documentation -
Lines 35-54: The DML (function body) - runs the SQL and returns a DataFrame
Both formats do the same thing. Choose the one you prefer.
DDL: The MODEL Block¶
The MODEL block is where you tell Vulcan about your model. It's the first thing in your file (after any comments) and uses a simple, declarative syntax.
Basic Syntax¶
Here's what a MODEL block looks like:
This tells Vulcan:
-
name- What to call this model (schema.table format) -
kind- How to materialize it (FULL rebuilds everything, INCREMENTAL only updates changes, etc.) -
cron- When to run it (@dailymeans every day) -
grains- What makes each row unique (uses tuple syntax with parentheses) -
tags- Labels for categorization (uses tuple syntax) -
terms- Business glossary terms using dot notation -
description- Human-readable description of the model
Common Properties¶
Here are the properties you'll use most often:
| Property | What It Does | Example |
|---|---|---|
name |
Fully qualified model name (schema.table) | sales.daily_sales |
kind |
Materialization strategy | FULL, INCREMENTAL, VIEW |
cron |
When to run (scheduling) | '@daily', '0 0 * * *' |
grains |
Column(s) that make rows unique | (order_date) or (customer_id, order_date) |
owner |
Who owns this model (for governance) | analytics_team |
description |
Human-readable description | 'Daily sales aggregates' |
tags |
Labels for organizing models | ('gold', 'analytics', 'customer') |
terms |
Business glossary terms | ('customer.rfm_analysis') |
More DDL Properties
There are more properties available beyond these common ones, including column_descriptions, column_tags, and column_terms for column-level metadata. Check out the Model Properties reference for the complete list of all available model properties and their configurations.
DML: The SELECT Query¶
The SELECT query is where the magic happens. This is your transformation logic, the SQL that actually does the work.
This query:
-
Reads from
raw.raw_orders -
Groups by
order_date -
Counts orders, sums revenue, finds the latest order ID
-
Returns the results ordered by date
Pretty standard SQL! Vulcan will automatically figure out that this model depends on raw.raw_orders and build the dependency graph for you.
Conventions¶
Vulcan tries to be smart and infer as much as possible from your SQL. This means you don't have to write a bunch of YAML config files, just write SQL and Vulcan figures it out. But to do this, your SQL needs to follow some conventions.
SQL Model Conventions¶
Unique Column Names¶
Your final SELECT needs unique column names. No duplicates allowed!
If you have duplicate column names, Vulcan won't know which one you mean, and that causes problems.
Explicit Types¶
Cast your types explicitly. This prevents surprises and ensures your schema is consistent:
Vulcan uses PostgreSQL-style casting (x::int), but don't worry, it automatically converts this to whatever your execution engine needs. So you write ::INTEGER and Vulcan handles the rest.
Why this matters: Without explicit types, you might get FLOAT when you expected INTEGER, or VARCHAR when you wanted TIMESTAMP. Explicit casting prevents these surprises.
Inferrable Names¶
Your columns need names that Vulcan can figure out. If Vulcan can't infer a name, you need to add an alias:
If you forget an alias, Vulcan's formatter will add one automatically when it renders your SQL. But it's better to be explicit, you'll know what the column is called!
Column Metadata¶
Document and categorize your columns using column-level metadata properties. There are several options:
Column Descriptions (Recommended)
This keeps all your documentation in one place, in the MODEL block.
Column Tags and Terms
Beyond descriptions, you can also add tags and business glossary terms to columns:
column_tags- Categorize columns by role (dimension,measure), sensitivity (pii), or purposecolumn_terms- Link columns to business glossary terms for semantic understanding
See Model Properties for detailed documentation on all column-level metadata options.
Priority
If you use column_descriptions in the DDL, Vulcan will use those and ignore any inline comments in your query. DDL descriptions take priority, so if you define descriptions in both places, the DDL version wins.
Option 2: Inline Comments
If you don't specify column_descriptions in the DDL, Vulcan will automatically pick up comments from your query:
Vulcan registers these comments as column descriptions in your database.
Table comments: If you put a comment before the MODEL block, Vulcan will use it as the table description. But if you also specify description in the MODEL block, that takes priority.
Python Model Conventions¶
Python models work a bit differently because Python doesn't have the same type inference capabilities as SQL.
Explicit Column Definitions¶
You must define your columns explicitly in the @model decorator:
Vulcan can't infer types from Python code the way it can from SQL, so you need to tell it explicitly.
Explicit Dependencies¶
Unlike SQL models (where Vulcan figures out dependencies automatically), Python models need you to list them:
This is because Vulcan can't parse your Python code to find FROM clauses and joins. You need to tell it what this model depends on.
Column Metadata¶
Python models can't use inline comments for column descriptions. Instead, specify them in the decorator using column_descriptions, column_tags, and column_terms:
Column name validation
Vulcan will error if you put a column name in column_descriptions, column_tags, or column_terms that doesn't exist in columns. This prevents typos and keeps things consistent, if you describe a column, it better exist!
Return Type¶
Your execute function must return a pandas DataFrame, and the columns must match what you defined in columns:
The DataFrame columns need to match your columns definition exactly, same names, compatible types.
Learn more
See Python Models for detailed information, advanced patterns, and more examples.
Comment Registration¶
Vulcan registers comments (descriptions) in your database so they show up in your BI tools and data catalogs.
How Comments Get Registered¶
Model-level comments:
-
If you put a comment before the
MODELblock, Vulcan uses it as the table comment -
If you also specify
descriptionin the MODEL block, that takes priority
Column-level comments:
-
Use
column_descriptionsin the DDL (recommended) -
Or use inline comments in your SELECT query (if
column_descriptionsisn't specified)
What Gets Registered¶
Not everything gets comments registered:
-
Physical tables - Comments are registered (tables in the
vulcan__[project schema]schema) -
Production views - Comments are registered
-
Temporary tables - No comments (they're temporary)
-
Non-production views - No comments (keeps things clean)
Note: Some engines automatically pass comments from physical tables to views that select from them. So even if Vulcan didn't explicitly register a comment on a view, it might still show up if the engine does this automatically.
Engine Support¶
Different databases support comments differently. Some can register comments in the CREATE statement (one command), others need separate commands for each comment.
Here's what each engine supports:
| Engine | TABLE comments |
VIEW comments |
|---|---|---|
| Postgres | Y | Y |
| Snowflake | Y | Y |
| Spark | Y | Y |
If your engine doesn't support comments, Vulcan will skip registration (no errors, it just won't register them).
Macros¶
Macros are like variables for your SQL. They let you parameterize queries and avoid repetition. Vulcan provides several built-in macros (like @start_ds and @end_ds for incremental models), and you can define your own.
Macros use the @ prefix. For example, @this_model refers to the current model being processed, and @start_ds is the start date for incremental processing.
See the macros documentation for details.