Overview

Your Vulcan project needs a configuration file. It tells Vulcan how to connect to your data warehouse, where to store state, and what defaults to use for your models. Without it, Vulcan doesn't know where your data lives or how to run your transformations.

Configuration File

Create a configuration file in your project root. Choose one:

  • config.yaml: YAML format. Use this for most projects. Simple and readable.

  • config.py: Python format. Use this if you need dynamic configuration or want to generate settings programmatically.

Example Configuration

Here's what a typical configuration file looks like:

# Project identity
name: orders-analytics
display_name: Orders Analytics Platform
tenant: engineering
description: Orders Analytics is a centralized data product delivering clean, trusted insights across the full order lifecycle.

# Classification
tags:
  - e-commerce
  - retail
  - sales_analytics
  - customer_analytics
  - postgres

terms:
  - glossary.data_product
  - glossary.analytics_platform
  - glossary.sales_operations

# Metadata
metadata:
  domain: sales_operations
  use_cases:
    - Daily and weekly sales reporting
    - Customer segmentation and RFM analysis
    - Sales funnel conversion tracking
    - Product performance analytics
  limitations:
    - Demo dataset with synthetic data (100 customers, 1000 orders)
    - Historical data from November 2025 onwards

# Gateway Connection
gateways:
  default:
    connection:
      type: postgres
      host: warehouse
      port: 5432
      database: warehouse
      user: vulcan
      password: "{{ env_var('DB_PASSWORD') }}"
    state_connection:
      type: postgres
      host: statestore
      port: 5432
      database: statestore
      user: vulcan
      password: "{{ env_var('STATE_DB_PASSWORD') }}"

default_gateway: default

# Model Defaults (required)
model_defaults:
  dialect: postgres
  start: 2024-01-01
  cron: '@daily'

# Linting Rules
linter:
  enabled: true
  rules:
    - ambiguousorinvalidcolumn
    - invalidselectstarexpansion

Configuration Structure

graph TB
    Config[config.yaml]
    Config --> Project[Project Settings]
    Config --> Metadata[Metadata]
    Config --> Gateways[Gateways]
    Config --> ModelDefaults[Model Defaults]
    Config --> Options[Optional Features]
    Project --> Name[name, display_name, tenant]
    Project --> Desc[description]
    Project --> Tags[tags, terms]
    Metadata --> Domain[domain]
    Metadata --> UseCases[use_cases]
    Metadata --> Limitations[limitations]
    Gateways --> Connection[connection]
    Gateways --> StateConn[state_connection]
    Gateways --> TestConn[test_connection]
    Options --> Linter[linter]
    Options --> Notifications[notifications]
    Options --> Variables[variables]

Configuration Sections

Project Settings

These fields identify your project. Most are purely descriptive and help with organization and discovery, though name and tenant are also used for resource naming and isolation (see the reference table below).

| Option | Description | Type | Required |
|---|---|---|---|
| `name` | Project identifier (used internally) | string | Yes |
| `tenant` | Tenant or organization name | string | Yes |
| `description` | Project description | string | Yes |
| `display_name` | Human-readable project name for UI/docs | string | No |
| `tags` | Labels for categorization and filtering | array of string | No |
| `terms` | Business glossary terms using dot notation (e.g., glossary.data_product) | array of string | No |
| `metadata` | Project metadata object (domain, use_cases, limitations) | object | No |

# Project identity
name: orders-analytics
display_name: Orders Analytics Platform
tenant: engineering
description: Orders Analytics delivers insights across the full order lifecycle.

# Classification
tags:
  - e-commerce
  - retail
  - sales_analytics

terms:
  - glossary.data_product
  - glossary.analytics_platform
  - glossary.sales_operations

Metadata

Metadata fields provide additional context about your project's purpose and scope. Use these to document what your project does, where it applies, and any known constraints.

| Option | Description | Type | Required |
|---|---|---|---|
| `domain` | Business domain or data area (e.g., sales_operations, marketing, finance) | string | No |
| `use_cases` | List of primary use cases or business problems this project addresses | array of string | No |
| `limitations` | List of known constraints, caveats, or edge cases to be aware of | array of string | No |

# Metadata
metadata:
  domain: sales_operations
  use_cases:
    - Daily and weekly sales reporting
    - Customer segmentation and RFM analysis
    - Sales funnel conversion tracking
  limitations:
    - Demo dataset with synthetic data (100 customers, 1000 orders)
    - Historical data from November 2025 onwards

Gateways

Gateways define how Vulcan connects to your data warehouse and state backend. Define multiple gateways for different environments: dev, staging, prod. Each gateway has its own connection settings.

| Component | Description | Type | Required |
|---|---|---|---|
| `connection` | Primary data warehouse connection | object | Yes |
| `state_connection` | Where Vulcan stores internal state | object | No |
| `test_connection` | Connection for running tests | object | No |
| `scheduler` | Scheduler configuration | object | No |
| `state_schema` | Schema name for state tables | string | No |

# Gateway Connection
gateways:
  default:
    connection:
      type: postgres
      host: warehouse
      port: 5432
      database: warehouse
      user: vulcan
      password: "{{ env_var('DB_PASSWORD') }}"
    state_connection:
      type: postgres
      host: statestore
      port: 5432
      database: statestore
      user: vulcan
      password: "{{ env_var('STATE_DB_PASSWORD') }}"

default_gateway: default
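
The text above suggests one gateway per environment. As a rough sketch of what that could look like, the fragment below defines two gateways using only keys documented on this page. The gateway names (dev, prod), the host names, and the environment variable names are assumptions made for illustration; they are not values Vulcan requires.

```yaml
# Sketch only: gateway names, hosts, and env var names are hypothetical.
gateways:
  dev:
    connection:
      type: postgres
      host: dev-warehouse        # hypothetical host
      port: 5432
      database: warehouse
      user: vulcan
      password: "{{ env_var('DEV_DB_PASSWORD') }}"
  prod:
    connection:
      type: postgres
      host: prod-warehouse       # hypothetical host
      port: 5432
      database: warehouse
      user: vulcan
      password: "{{ env_var('PROD_DB_PASSWORD') }}"
    state_connection:
      type: postgres
      host: statestore
      port: 5432
      database: statestore
      user: vulcan
      password: "{{ env_var('STATE_DB_PASSWORD') }}"

# Name of the gateway to treat as the default.
default_gateway: dev
```

In this sketch the dev gateway omits state_connection. According to the reference table on this page, state then falls back to the gateway's connection.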

Model Defaults

The model_defaults section is required. At minimum, specify dialect to tell Vulcan what SQL dialect your models use. Other defaults are optional but apply to all models automatically, so you don't repeat the same settings in every model file.

model_defaults:
  dialect: postgres     # Required
  owner: data-team
  start: 2024-01-01
  cron: '@daily'

See Model Defaults for all available options.

Variables

Store sensitive information like passwords and API keys without hardcoding them. Use environment variables, .env files, or configuration overrides. Variables also let you override configuration values dynamically.

See Variables for details.
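
As an illustration, the fragment below combines the two documented places where variables can live (the root-level variables key and gateways.<name>.variables) with the env_var pattern used elsewhere on this page. The variable names and values are hypothetical, and how the two levels interact is not covered here; treat the Variables page as the authority on that.

```yaml
# Sketch only: variable names and values are hypothetical.
variables:
  reporting_currency: USD
  lookback_days: 7

gateways:
  default:
    connection:
      type: postgres
      host: warehouse
      port: 5432
      database: warehouse
      user: vulcan
      password: "{{ env_var('DB_PASSWORD') }}"   # secret read from the environment
    variables:
      lookback_days: 30    # gateway-specific variable
```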

Execution Hooks

Run SQL statements automatically at the start and end of vulcan plan and vulcan run commands. Use before_all for setup tasks like creating temporary tables or granting permissions. Use after_all for cleanup or post-processing.

See Execution Hooks for detailed examples and use cases.
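
A minimal sketch of the two hook keys, assuming a PostgreSQL warehouse. Both keys take a list of SQL statements, per the reference table on this page. The schema name and role name are hypothetical.

```yaml
# Sketch only: schema and role names are hypothetical; SQL is PostgreSQL syntax.
before_all:
  - "CREATE SCHEMA IF NOT EXISTS scratch"
after_all:
  - "GRANT SELECT ON ALL TABLES IN SCHEMA analytics TO reporting_role"
```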

Linter

The linter runs automatic code quality checks when you create a plan or run the lint command. It catches common mistakes and enforces coding standards. Use built-in rules or create custom ones.

See Linter for rules and custom linter configuration.
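
The reference table on this page lists two rule lists: rules (error level) and warn_rules (warnings). The sketch below splits the two rule names from the example configuration across them. Which rule belongs at which level is an arbitrary choice made for illustration.

```yaml
linter:
  enabled: true
  rules:                          # enforced at error level
    - ambiguousorinvalidcolumn
  warn_rules:                     # reported as warnings
    - invalidselectstarexpansion
```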

Notifications

Set up alerts via Slack or email. Get notified when plans start or finish, when runs complete, or when audits fail.

See Notifications for Slack webhooks, API, and email setup.

Supported Engines

Vulcan works with these data warehouses and compute engines:

| Engine | Status |
|---|---|
| PostgreSQL | Available |
| Snowflake | Available |
| BigQuery | Available |
| Databricks | WIP |
| Redshift | WIP |
| Spark | WIP |
| Trino | WIP |
| Microsoft Fabric | WIP |
| SQL Server | WIP |
| MySQL | WIP |
| Lakehouse | Coming Soon |

Complete Configuration Reference

The tables below list all available configuration keys in config.yaml. The Documentation column names the page that covers each key in detail.

Project Identity & Metadata

| Configuration Key | Description | Type | Required | Default | Documentation |
|---|---|---|---|---|---|
| `name` | Project identifier (used for resource naming) | string | Yes | - | - |
| `tenant` | Tenant or organization name (used for isolation) | string | Yes | - | - |
| `description` | Project description and purpose | string | Yes | - | - |
| `display_name` | Human-readable name for UI/docs | string | No | null | - |
| `tags` | Labels for categorization and filtering | array | No | [] | - |
| `terms` | Business glossary terms (e.g., glossary.data_product) | array | No | [] | - |
| `metadata` | Project metadata (domain, use_cases, limitations) | object | No | null | See above |
| `metadata.domain` | Business domain (sales, marketing, finance, etc.) | string | No | null | - |
| `metadata.use_cases` | List of primary use cases this project addresses | array | No | [] | - |
| `metadata.limitations` | Known constraints or caveats | array | No | [] | - |

Gateway & Connection Configuration

| Configuration Key | Description | Type | Required | Default | Documentation |
|---|---|---|---|---|---|
| `gateways` | Gateway configurations for different environments | object | Yes* | {"": {}} | See above |
| `gateways.<name>.connection` | Primary data warehouse connection | object | Yes | - | Engines |
| `gateways.<name>.state_connection` | Where Vulcan stores internal state | object | No | Uses connection | - |
| `gateways.<name>.test_connection` | Connection for running unit tests | object | No | DuckDB | - |
| `gateways.<name>.scheduler` | Scheduler configuration | object | No | builtin | - |
| `gateways.<name>.state_schema` | Schema name for state tables | string | No | vulcan** | - |
| `gateways.<name>.variables` | Gateway-specific variables | object | No | {} | Variables |
| `default_gateway` | Name of the default gateway | string | No | "" | - |
| `default_connection` | Root-level default connection | object | No | null | - |
| `default_test_connection` | Root-level default test connection | object | No | DuckDB | - |
| `default_scheduler` | Root-level default scheduler | object | No | builtin | - |

* At least one gateway with a connection is required.
** With root-level state connection, defaults to {tenant}_{name} (normalized).

Model Configuration

| Configuration Key | Description | Type | Required | Default | Documentation |
|---|---|---|---|---|---|
| `model_defaults` | Default values applied to all models | object | Yes* | {} | Model Defaults |
| `model_defaults.dialect` | SQL dialect (postgres, snowflake, bigquery, etc.) | string | Yes | - | Model Defaults |
| `model_defaults.owner` | Default owner for all models | string | No | null | - |
| `model_defaults.start` | Default start date for backfilling | string | No | Inferred | - |
| `model_defaults.cron` | Default cron schedule (e.g., @daily) | string | No | null | - |
| `model_defaults.kind` | Default model kind (FULL, INCREMENTAL, etc.) | string/object | No | VIEW | - |
| `model_defaults.interval_unit` | Temporal granularity of data intervals | string | No | From cron | - |
| `model_defaults.batch_concurrency` | Max concurrent batches for incremental models | integer | No | 1 | - |
| `model_defaults.table_format` | Table format (iceberg, delta, hudi) | string | No | null | - |
| `model_defaults.storage_format` | Storage format (parquet, orc) | string | No | null | - |
| `model_defaults.on_destructive_change` | Action on destructive schema changes | string | No | error | - |
| `model_defaults.on_additive_change` | Action on additive schema changes | string | No | apply | - |
| `model_defaults.physical_properties` | Properties for physical tables/views | object | No | {} | - |
| `model_defaults.virtual_properties` | Properties for virtual layer views | object | No | {} | - |
| `model_defaults.session_properties` | Engine-specific session properties | object | No | {} | - |
| `model_defaults.audits` | Audit/assertion functions for all models | array | No | [] | - |
| `model_defaults.optimize_query` | Whether to optimize SQL queries | boolean | No | true | - |
| `model_defaults.allow_partials` | Whether models can process incomplete intervals | boolean | No | false | - |
| `model_defaults.enabled` | Whether models are enabled by default | boolean | No | true | - |
| `model_defaults.pre_statements` | SQL statements executed before model runs | array | No | null | - |
| `model_defaults.post_statements` | SQL statements executed after model runs | array | No | null | - |

* The model_defaults.dialect field is required.

Variables & Environment

| Configuration Key | Description | Type | Required | Default | Documentation |
|---|---|---|---|---|---|
| `variables` | Root-level variables for models/macros | object | No | {} | Variables |
| `env_vars` | Environment variable overrides | object | No | {} | Variables |

Execution Hooks

| Configuration Key | Description | Type | Required | Default | Documentation |
|---|---|---|---|---|---|
| `before_all` | SQL statements executed at start of plan/run | array | No | null | Execution Hooks |
| `after_all` | SQL statements executed at end of plan/run | array | No | null | Execution Hooks |

Code Quality & Linting

| Configuration Key | Description | Type | Required | Default | Documentation |
|---|---|---|---|---|---|
| `linter` | Linting configuration | object | No | {enabled: false} | Linter |
| `linter.enabled` | Enable or disable linting | boolean | No | false | Linter |
| `linter.rules` | List of rules to enforce (error level) | array | No | [] | Linter |
| `linter.warn_rules` | List of rules to warn about | array | No | [] | Linter |

Notifications & Users

| Configuration Key | Description | Type | Required | Default | Documentation |
|---|---|---|---|---|---|
| `notification_targets` | List of notification targets (Slack, email, console) | array | No | [] | Notifications |
| `users` | List of users for approvals/notifications | array | No | [] | - |
| `username` | Single user to receive notifications | string | No | "" | - |

Environment & Schema Management

| Configuration Key | Description | Type | Required | Default | Documentation |
|---|---|---|---|---|---|
| `default_target_environment` | Default environment for plan/run commands | string | No | prod | - |
| `snapshot_ttl` | Time before unused snapshots are deleted | string | No | in 1 week | - |
| `environment_ttl` | Time before dev environments are deleted | string | No | in 1 week | - |
| `pinned_environments` | Environments not deleted by janitor | array | No | [] | - |
| `physical_schema_mapping` | Map model patterns to physical schema names | object | No | {} | - |
| `environment_suffix_target` | Where to append environment name | string | No | schema | - |
| `environment_catalog_mapping` | Map environments to catalog names | object | No | {} | - |
| `physical_table_naming_convention` | How to name tables at physical layer | string | No | full | - |
| `virtual_environment_mode` | How to handle environments | string | No | full | - |
| `gateway_managed_virtual_layer` | Whether gateways manage virtual layer | boolean | No | false | - |

Project Management

| Configuration Key | Description | Type | Required | Default | Documentation |
|---|---|---|---|---|---|
| `ignore_patterns` | Glob patterns for files to ignore | array | No | Standard list | - |
| `time_column_format` | Default format for model time columns | string | No | %Y-%m-%d | - |
| `infer_python_dependencies` | Auto-detect Python package requirements | boolean | No | true | - |
| `log_limit` | Default number of logs to keep | integer | No | 20 | - |
| `cache_dir` | Directory to store SQLMesh cache | string | No | .cache | - |
| `loader` | Loader class for loading project files | class | No | SqlMeshLoader | - |
| `loader_kwargs` | Arguments to pass to loader instance | object | No | {} | - |

Command Configuration

| Configuration Key | Description | Type | Required | Default | Documentation |
|---|---|---|---|---|---|
| `format` | SQL formatting options | object | No | Default | - |
| `ui` | UI server configuration | object | No | Default | - |
| `plan` | Plan command configuration | object | No | Default | - |
| `migration` | Migration configuration | object | No | Default | - |
| `run` | Run command configuration | object | No | Default | - |
| `janitor` | Cleanup task configuration | object | No | Default | - |
| `cicd_bot` | CI/CD bot configuration | object | No | null | - |

Integrations & External Services

| Configuration Key | Description | Type | Required | Default | Documentation |
|---|---|---|---|---|---|
| `dbt` | DBT-specific configuration | object | No | null | - |
| `object_store` | Object storage for query results | object | No | null | - |
| `transpiler` | External transpiler service | object | No | Default | - |
| `graphql` | GraphQL API configuration | object | No | Default | - |
| `state` | Root-level state connection (production) | object | No | null | - |
| `pgq` | PostgreSQL Queue for async jobs | object | No | Default | - |
| `analytics` | CloudEvents telemetry configuration | object | No | {enabled: false} | - |
| `openlineage` | OpenLineage data lineage integration | object | No | null | - |
| `heimdall` | Authentication service configuration | object | No | {enabled: false} | - |

Minimal Valid Configuration

The absolute minimum configuration required to start:

name: my-project
tenant: my-org
description: My project description

gateways:
  default:
    connection:
      type: postgres
      host: localhost
      port: 5432
      database: mydb
      user: myuser
      password: mypass

model_defaults:
  dialect: postgres

Best Practices

Use environment variables for sensitive data like passwords and API keys. This keeps secrets out of your config files and makes it easier to manage different environments.
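
For instance, the connection from the minimal configuration could read its password from the environment instead of a literal value, using the env_var pattern from the example configuration earlier on this page. The variable name DB_PASSWORD is only an example.

```yaml
gateways:
  default:
    connection:
      type: postgres
      host: localhost
      port: 5432
      database: mydb
      user: myuser
      password: "{{ env_var('DB_PASSWORD') }}"   # replaces the literal password
```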

Set meaningful defaults in model_defaults to reduce boilerplate. If most of your models use the same dialect, start date, or cron schedule, set it once here instead of repeating it everywhere.

Enable linting to catch common errors early in development. Fix issues before they make it to production.

Separate the state connection from your data warehouse for better isolation. This prevents state operations from interfering with your data processing.

Use a separate gateway for each environment (dev, staging, prod), each with its own database configuration, so you can test changes safely before deploying to production.