Run and Scheduling¶
This guide covers Vulcan's run functionality and scheduling strategies. You'll learn how vulcan run processes new data intervals and how to automate it for production.
The run command differs from plan: it is for regular scheduled execution, not for applying changes. Once you understand the difference, you'll know when to use each one.
Run and Scheduler Architecture¶
The following diagram illustrates how Vulcan's run system works with cron-based scheduling:
graph TB
subgraph "Scheduler Triggers"
CRON[Cron Job / CI/CD<br/>Runs periodically]
MANUAL[Manual Execution<br/>vulcan run]
end
subgraph "Run Process"
START[vulcan run<br/>Command starts]
CHECK[Check for missing intervals<br/>Compare with state]
CRON_CHECK[Check cron schedules<br/>Which models are due?]
FILTER[Filter models<br/>Only process due intervals]
end
subgraph "Model Execution"
M1[sales.daily_sales<br/>cron: @daily<br/>Due: Yes]
M2[sales.weekly_sales<br/>cron: @weekly<br/>Due: No]
M3[sales.monthly_sales<br/>cron: @monthly<br/>Due: No]
end
subgraph "State Management"
STATE[State Database<br/>Tracks processed intervals]
UPDATE[Update State<br/>Mark intervals as processed]
end
subgraph "Execution Flow"
EXEC1[Execute daily_sales<br/>Process missing intervals]
EXEC2[Skip weekly_sales<br/>Not due yet]
EXEC3[Skip monthly_sales<br/>Not due yet]
end
subgraph "Results"
SUCCESS[Run Complete<br/>Intervals processed]
LOG[Log Results<br/>Execution summary]
end
CRON -->|"Scheduled"| START
MANUAL -->|"Triggered"| START
START -->|"to"| CHECK
CHECK -->|"to"| CRON_CHECK
CRON_CHECK -->|"to"| FILTER
FILTER -->|"Due"| M1
FILTER -->|"Not due"| M2
FILTER -->|"Not due"| M3
M1 -->|"to"| EXEC1
M2 -->|"to"| EXEC2
M3 -->|"to"| EXEC3
EXEC1 -->|"to"| STATE
EXEC2 -.->|"Skip"| STATE
EXEC3 -.->|"Skip"| STATE
STATE -->|"to"| UPDATE
UPDATE -->|"to"| SUCCESS
SUCCESS -->|"to"| LOG
style CRON fill:#e3f2fd,stroke:#1976d2,stroke-width:3px,color:#000
style START fill:#fff3e0,stroke:#f57c00,stroke-width:3px,color:#000
style CHECK fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#000
style CRON_CHECK fill:#fff9c4,stroke:#fbc02d,stroke-width:2px,color:#000
style M1 fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#000
style M2 fill:#ffebee,stroke:#d32f2f,stroke-width:2px,color:#000
style M3 fill:#ffebee,stroke:#d32f2f,stroke-width:2px,color:#000
style EXEC1 fill:#c8e6c9,stroke:#2e7d32,stroke-width:2px,color:#000
style SUCCESS fill:#c8e6c9,stroke:#2e7d32,stroke-width:3px,color:#000
Key Concepts Illustrated¶
- Scheduler Triggers: Run can be triggered by cron jobs, CI/CD pipelines, or manually
- Interval Detection: Vulcan checks for missing intervals by comparing current state with model schedules
- Cron-Based Filtering: Only models whose cron schedules indicate they're due are executed
- State Tracking: Processed intervals are tracked in the state database
- Efficient Execution: Models not due are skipped, saving computational resources
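The interval detection described above can be sketched in a few lines of Python. This is a minimal illustration, not Vulcan's implementation: the function names and the in-memory `processed` set are hypothetical stand-ins for the state database.

```python
from datetime import date, timedelta


def expected_daily_intervals(start: date, end: date) -> list[date]:
    """Every daily interval from start through end, inclusive."""
    days = (end - start).days
    return [start + timedelta(days=i) for i in range(days + 1)]


def missing_intervals(expected: list[date], processed: set[date]) -> list[date]:
    """Intervals that are expected but not yet recorded in state."""
    return [d for d in expected if d not in processed]


# Hypothetical state: 2025-01-13 through 2025-01-15 are already processed.
processed = {date(2025, 1, 13), date(2025, 1, 14), date(2025, 1, 15)}
expected = expected_daily_intervals(date(2025, 1, 13), date(2025, 1, 16))
todo = missing_intervals(expected, processed)
print(todo)  # [datetime.date(2025, 1, 16)]
```

Only the interval that is absent from state is returned, which is why models that are already up to date cost nothing.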
Cron Schedule Flow¶
The following diagram shows how different cron schedules determine model execution:
gantt
title Model Execution Timeline (Example: Hourly, Daily, Weekly)
dateFormat YYYY-MM-DD HH:mm
axisFormat %H:%M
section Hourly Model
Run every hour :active, hourly1, 2025-01-20 00:00, 1h
Run every hour :active, hourly2, 2025-01-20 01:00, 1h
Run every hour :active, hourly3, 2025-01-20 02:00, 1h
Run every hour :active, hourly4, 2025-01-20 03:00, 1h
section Daily Model
Run once daily :active, daily1, 2025-01-20 00:00, 24h
section Weekly Model
Run once weekly :active, weekly1, 2025-01-20 00:00, 168h
Visual Explanation:
- Hourly models run every hour when vulcan run executes
- Daily models run once per day (at the scheduled time)
- Weekly models run once per week (at the scheduled time)
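A rough way to picture the "is this model due?" decision is shown below. It assumes the three cron macros map to fixed cadences, which is a simplification: real cron expressions need a proper parser. The `is_due` helper and `CADENCE` table are hypothetical names, not part of Vulcan.

```python
from datetime import datetime, timedelta

# Assumed mapping from cron macros to a fixed cadence (simplification).
CADENCE = {
    "@hourly": timedelta(hours=1),
    "@daily": timedelta(days=1),
    "@weekly": timedelta(weeks=1),
}


def is_due(cron: str, processed_until: datetime, now: datetime) -> bool:
    """Due once a full cadence has elapsed since the last processed interval ended."""
    return now - processed_until >= CADENCE[cron]


processed_until = datetime(2025, 1, 20, 0, 0)
now = datetime(2025, 1, 21, 0, 0)
due = {cron: is_due(cron, processed_until, now) for cron in CADENCE}
print(due)  # {'@hourly': True, '@daily': True, '@weekly': False}
```

One day after the last processed interval, the hourly and daily models are due while the weekly model is still waiting.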
Understanding Run vs Plan¶
| Aspect | vulcan plan | vulcan run |
|---|---|---|
| Purpose | Apply model changes to environment | Execute existing models on schedule |
| When to Use | When models are modified/added/removed | When no changes, just process new data |
| Change Detection | Compares local files vs environment | No file comparison needed |
| Backfill | Backfills based on changes | Processes missing intervals only |
| Cron Schedule | Not used (processes all affected dates) | Uses model's cron to determine what runs |
| User Interaction | Prompts for change categorization | Runs automatically |
| Output | Shows diffs and change summary | Shows execution progress |
Key Insight: Use plan when you've changed code. Use run for regular scheduled execution.
Think of it this way: plan deploys changes, while run processes new data.
How Run Works¶
The vulcan run command processes missing data intervals for models that haven't changed:
flowchart TD
START[vulcan run<br/>Command starts] --> CHECK{Check model<br/>definitions}
CHECK -->|"Changed"| ERROR[Error: Use 'vulcan plan'<br/>to apply changes first]
CHECK -->|"No changes"| STATE[Query state database<br/>Get processed intervals]
STATE --> CRON[Check cron schedules<br/>Which models are due?]
CRON --> FILTER{Filter models<br/>by cron schedule}
FILTER -->|"Due"| EXEC1[Execute Model 1<br/>Process missing intervals]
FILTER -->|"Due"| EXEC2[Execute Model 2<br/>Process missing intervals]
FILTER -->|"Not due"| SKIP1[Skip Model 3<br/>Not due yet]
FILTER -->|"Not due"| SKIP2[Skip Model 4<br/>Not due yet]
EXEC1 --> UPDATE[Update state database<br/>Mark intervals as processed]
EXEC2 --> UPDATE
SKIP1 -.->|"Skip"| UPDATE
SKIP2 -.->|"Skip"| UPDATE
UPDATE --> SUCCESS[Run complete<br/>Summary output]
ERROR --> END[Exit]
SUCCESS --> END
style START fill:#e3f2fd,stroke:#1976d2,stroke-width:3px,color:#000
style CHECK fill:#fff9c4,stroke:#fbc02d,stroke-width:2px,color:#000
style ERROR fill:#ffebee,stroke:#d32f2f,stroke-width:2px,color:#000
style CRON fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#000
style FILTER fill:#fff9c4,stroke:#fbc02d,stroke-width:2px,color:#000
style EXEC1 fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#000
style EXEC2 fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#000
style SKIP1 fill:#ffebee,stroke:#d32f2f,stroke-width:2px,color:#000
style SKIP2 fill:#ffebee,stroke:#d32f2f,stroke-width:2px,color:#000
style UPDATE fill:#e1f5fe,stroke:#0277bd,stroke-width:2px,color:#000
style SUCCESS fill:#c8e6c9,stroke:#2e7d32,stroke-width:3px,color:#000
Process Steps:
- No Model Changes: Assumes no model definitions have changed - if they have, you'll get an error telling you to use plan first
- Cron-Based Execution: Each model's cron parameter determines if it should run - daily models run daily, weekly models run weekly, etc.
- Missing Intervals: Only processes intervals that haven't been processed yet - efficient!
- Automatic: No prompts or user interaction required. Works well for automation.
The run command works well for scheduled execution. It's fast, automatic, and only processes what's needed.
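The process steps above can be sketched as a single function. This is a simplified model under stated assumptions, not Vulcan's code: `Model`, `ModelState`, `ModelsChangedError`, and the fixed-cadence `CADENCE` table are all hypothetical, and the fingerprint is treated as an opaque string.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed mapping from cron macros to a fixed cadence (simplification).
CADENCE = {"@hourly": timedelta(hours=1), "@daily": timedelta(days=1), "@weekly": timedelta(weeks=1)}


@dataclass
class Model:
    name: str
    cron: str
    fingerprint: str  # identifies the local model definition


@dataclass
class ModelState:
    fingerprint: str           # definition applied by the last plan
    processed_until: datetime  # end of the last processed interval


class ModelsChangedError(Exception):
    """Raised when local definitions differ from what was last applied."""


def run(models: list[Model], state: dict[str, ModelState], now: datetime):
    # Step 1: refuse to run if any definition changed since the last plan.
    changed = [m.name for m in models if m.fingerprint != state[m.name].fingerprint]
    if changed:
        raise ModelsChangedError(f"Changed models: {changed}. Use 'vulcan plan' first.")
    executed, skipped = [], []
    for m in models:
        s, step = state[m.name], CADENCE[m.cron]
        # Step 2: skip models whose cadence has not elapsed yet.
        if now - s.processed_until < step:
            skipped.append(m.name)
            continue
        # Step 3: process every complete missing interval, then record it in state.
        while now - s.processed_until >= step:
            s.processed_until += step
        executed.append(m.name)
    return executed, skipped


models = [Model("sales.daily_sales", "@daily", "abc"), Model("sales.weekly_sales", "@weekly", "def")]
state = {
    "sales.daily_sales": ModelState("abc", datetime(2025, 1, 20)),
    "sales.weekly_sales": ModelState("def", datetime(2025, 1, 20)),
}
executed, skipped = run(models, state, now=datetime(2025, 1, 21, 0, 30))
print(executed, skipped)  # ['sales.daily_sales'] ['sales.weekly_sales']
```

The function needs no input beyond the models, the stored state, and the current time, which is why the same invocation can be repeated safely by any scheduler.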
Scenario 1: First Run - Processing New Data¶
After applying your first plan, use run to process new data as it arrives.
Expected Output:
======================================================================
Checking for missing intervals...
----------------------------------------------------------------------
Models to execute:
└── sales.daily_sales: 2025-01-16 (1 interval)
Executing model batches ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% • 1/1 • 0:00:02
[1/1] sales.daily_sales [insert 2025-01-16 - 2025-01-16] 2.1s
✔ All model batches executed successfully
What Happened?
- sales.daily_sales has cron: '@daily', so it runs daily - Vulcan checks if enough time has passed
- Yesterday's plan processed up to 2025-01-15 - that's what's already done
- Today (2025-01-16) is a new interval that needs processing - this is what's missing
- run automatically processes this missing interval - no prompts, just works
This is the strength of run: it automatically figures out what needs processing and does it. Set it up once, and it keeps running!
Scenario 2: Cron-Based Execution¶
Different models can have different cron schedules. run respects each model's schedule.
Daily Model Execution¶
Expected Output (Day 2):
Weekly Model Execution¶
After 7 days, both daily and weekly models run:
Expected Output:
Models to execute:
├── sales.daily_sales: 2025-01-18 - 2025-01-24 (7 intervals)
└── sales.weekly_sales: 2025-01-20 - 2025-01-20 (1 interval)
Executing model batches ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% • 2/2 • 0:00:08
[1/2] sales.daily_sales [insert 2025-01-18 - 2025-01-24] 5.2s
[2/2] sales.weekly_sales [insert 2025-01-20 - 2025-01-20] 2.8s
✔ All model batches executed successfully
Understanding Cron Schedules:
- Daily model (@daily): Processes missing daily intervals - runs every day when run executes
- Weekly model (@weekly): Only processes when 7 days have elapsed - skips if not enough time has passed
- Efficient: Each model only processes what's due based on its schedule - no wasted compute
This is why cron schedules are important. They tell Vulcan when each model should run, so you don't process things unnecessarily.
Scenario 3: Run with No Missing Intervals¶
When all intervals are up to date, run skips execution:
Expected Output:
======================================================================
Checking for missing intervals...
----------------------------------------------------------------------
No models to execute. All intervals are up to date.
✔ Run completed successfully
This is normal when running frequently. Nothing to process means everything is up to date. Your automation is working and keeping things current.
Scenario 4: Run After Model Changes (Error Case)¶
If models have changed, Vulcan detects this and requires a plan first:
Expected Output:
======================================================================
Error: Model definitions have changed. Use 'vulcan plan' to apply changes first.
Changed models:
└── sales.daily_sales
Please run 'vulcan plan' to apply these changes before using 'vulcan run'.
Workflow: Always plan first to apply changes, then run for scheduled execution.
This is the key workflow: use plan when you've changed code, then use run for regular data processing. Don't mix them up!
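One plausible way to detect changed definitions is to compare a hash of each local model against the hash recorded when it was last applied. The sketch below assumes exactly that. How Vulcan actually computes its fingerprints is not described here, and `fingerprint` and `changed_models` are hypothetical helpers.

```python
import hashlib


def fingerprint(sql: str) -> str:
    """Hash of the query text with runs of whitespace collapsed (naive normalization)."""
    normalized = " ".join(sql.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def changed_models(local: dict[str, str], applied: dict[str, str]) -> list[str]:
    """Names whose local definition no longer matches the applied fingerprint."""
    return sorted(name for name, sql in local.items() if fingerprint(sql) != applied.get(name))


# Fingerprint recorded by the last plan (hypothetical query).
applied = {
    "sales.daily_sales": fingerprint("SELECT order_date, SUM(amount) FROM orders GROUP BY order_date"),
}
# The local file now adds a column alias.
local = {
    "sales.daily_sales": "SELECT order_date, SUM(amount) AS total FROM orders GROUP BY order_date",
}
print(changed_models(local, applied))  # ['sales.daily_sales']
```

A non-empty result corresponds to the error case above: the change has to go through plan before run will process the model again.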
Scheduling for Production¶
The vulcan run command doesn't run continuously: it executes once and exits. For production, you need to schedule it to run periodically. This is where automation comes in: you'll set up cron jobs, CI/CD pipelines, or Kubernetes CronJobs to trigger run on a schedule.
Built-in Scheduler Architecture¶
graph TB
subgraph "Automation Layer - Triggers"
CRON[Cron Job<br/>Schedule: Every hour<br/>Example: 0 * * * *]
CI[CI/CD Pipeline<br/>GitHub Actions / GitLab CI<br/>Scheduled workflows]
K8S[Kubernetes CronJob<br/>Container orchestration<br/>K8s native scheduling]
MANUAL[Manual Trigger<br/>Developer runs manually<br/>vulcan run]
end
subgraph "Vulcan Run Command"
RUN[vulcan run<br/>Command starts]
VALIDATE[Validate Models<br/>Check for changes<br/>Error if modified]
QUERY[Query State Database<br/>Get execution history<br/>Read processed intervals]
end
subgraph "State Database"
STATE[State Storage<br/>PostgreSQL / SQL Engine<br/>Transaction-safe storage]
subgraph "State Tables"
INTERVALS[Processed Intervals<br/>model_name, start_ds, end_ds<br/>status: completed]
CRON_STATE[Cron Execution State<br/>model_name, last_run_time<br/>next_run_time]
MODEL_STATE[Model State<br/>model_name, fingerprint<br/>environment, version]
end
end
subgraph "Cron Evaluation Engine"
CRON_CHECK[Evaluate Cron Schedules<br/>Compare current time<br/>with last execution]
CALC[Calculate Missing Intervals<br/>Determine what's due<br/>Based on cron + state]
FILTER[Filter Models<br/>Only select due models<br/>Skip not-due models]
end
subgraph "Model Execution Queue"
QUEUE[Execution Queue<br/>Ordered by dependencies<br/>Upstream first]
EXEC1[Execute Hourly Model<br/>@hourly - Due<br/>Process missing intervals]
EXEC2[Execute Daily Model<br/>@daily - Due<br/>Process missing intervals]
SKIP[Skip Weekly Model<br/>@weekly - Not due<br/>Wait for next week]
end
subgraph "Update State"
UPDATE[Update State Database<br/>Mark intervals processed<br/>Update cron state]
COMMIT[Commit Transaction<br/>Ensure consistency<br/>Rollback on error]
end
subgraph "Results & Logging"
LOG[Log Execution<br/>Summary output<br/>Success/failure status]
NOTIFY[Notifications<br/>Optional: Slack/Email<br/>On success/failure]
end
CRON -->|"Scheduled trigger"| RUN
CI -->|"Pipeline trigger"| RUN
K8S -->|"K8s trigger"| RUN
MANUAL -->|"Manual trigger"| RUN
RUN -->|"1. Validate"| VALIDATE
VALIDATE -->|"2. Query state"| QUERY
QUERY -->|"Read"| STATE
STATE -->|"Intervals"| INTERVALS
STATE -->|"Cron state"| CRON_STATE
STATE -->|"Model state"| MODEL_STATE
INTERVALS -->|"Compare"| CRON_CHECK
CRON_STATE -->|"Check schedule"| CRON_CHECK
MODEL_STATE -->|"Get models"| CRON_CHECK
CRON_CHECK -->|"Evaluate"| CALC
CALC -->|"Calculate"| FILTER
FILTER -->|"Due models"| QUEUE
FILTER -.->|"Skip"| SKIP
QUEUE -->|"Execute"| EXEC1
QUEUE -->|"Execute"| EXEC2
EXEC1 -->|"Update"| UPDATE
EXEC2 -->|"Update"| UPDATE
SKIP -.->|"No update"| UPDATE
UPDATE -->|"Commit"| COMMIT
COMMIT -->|"Success"| LOG
LOG -->|"Optional"| NOTIFY
style CRON fill:#e3f2fd,stroke:#1976d2,stroke-width:3px,color:#000
style CI fill:#e3f2fd,stroke:#1976d2,stroke-width:3px,color:#000
style K8S fill:#e3f2fd,stroke:#1976d2,stroke-width:3px,color:#000
style MANUAL fill:#e3f2fd,stroke:#1976d2,stroke-width:3px,color:#000
style RUN fill:#fff3e0,stroke:#f57c00,stroke-width:3px,color:#000
style VALIDATE fill:#fff9c4,stroke:#fbc02d,stroke-width:2px,color:#000
style STATE fill:#e8f5e9,stroke:#388e3c,stroke-width:3px,color:#000
style INTERVALS fill:#e1f5fe,stroke:#0277bd,stroke-width:2px,color:#000
style CRON_STATE fill:#e1f5fe,stroke:#0277bd,stroke-width:2px,color:#000
style MODEL_STATE fill:#e1f5fe,stroke:#0277bd,stroke-width:2px,color:#000
style CRON_CHECK fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#000
style CALC fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#000
style FILTER fill:#fff9c4,stroke:#fbc02d,stroke-width:2px,color:#000
style QUEUE fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#000
style EXEC1 fill:#c8e6c9,stroke:#2e7d32,stroke-width:2px,color:#000
style EXEC2 fill:#c8e6c9,stroke:#2e7d32,stroke-width:2px,color:#000
style SKIP fill:#ffebee,stroke:#d32f2f,stroke-width:2px,color:#000
style UPDATE fill:#e1f5fe,stroke:#0277bd,stroke-width:2px,color:#000
style COMMIT fill:#c8e6c9,stroke:#2e7d32,stroke-width:2px,color:#000
style LOG fill:#c8e6c9,stroke:#2e7d32,stroke-width:2px,color:#000
style NOTIFY fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#000
Built-in Scheduler Components¶
The built-in scheduler consists of several key components working together:
- Automation Layer: External triggers (cron, CI/CD, Kubernetes) that periodically execute vulcan run
- State Database: Stores execution history, processed intervals, and cron state
- Cron Evaluation Engine: Determines which models are due based on their schedules
- Execution Queue: Orders models by dependencies and executes them
- State Updates: Records what was processed for future runs
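To illustrate what a transaction-safe state update looks like, here is a small sketch using SQLite from the Python standard library as a stand-in for the state database. The column names follow the diagram above (model_name, start_ds, end_ds, status). The actual schema Vulcan uses is an assumption here, as is the `record_intervals` helper.

```python
import sqlite3


def record_intervals(conn: sqlite3.Connection, rows: list[tuple[str, str, str]]) -> None:
    """Mark intervals as completed; the whole batch commits or none of it does."""
    # "with conn" commits on success and rolls back if an exception is raised.
    with conn:
        conn.executemany(
            "INSERT INTO intervals (model_name, start_ds, end_ds, status) "
            "VALUES (?, ?, ?, 'completed')",
            rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE intervals ("
    "model_name TEXT NOT NULL, start_ds TEXT NOT NULL, end_ds TEXT NOT NULL, "
    "status TEXT NOT NULL, UNIQUE (model_name, start_ds, end_ds))"
)
record_intervals(conn, [("sales.daily_sales", "2025-01-16", "2025-01-16")])
count = conn.execute("SELECT COUNT(*) FROM intervals").fetchone()[0]
print(count)  # 1
```

If any row in a batch fails (for example, a duplicate interval), the rollback leaves state exactly as it was, so the next run recomputes the same missing intervals.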
Key Features:
- Stores state in your SQL engine (or separate state database)
- Automatically detects missing intervals
- Respects each model's cron schedule
- Processes only what's due
- Transaction-safe state updates
- Dependency-aware execution order
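Dependency-aware ordering amounts to a topological sort of the model graph. The sketch below uses Python's standard `graphlib` module. The dependency graph is hypothetical (in particular, `raw.orders` is an invented upstream model), and this is not a claim about how Vulcan builds its queue.

```python
from graphlib import TopologicalSorter

# Hypothetical graph: each model maps to the set of models it reads from.
deps = {
    "raw.orders": set(),
    "sales.daily_sales": {"raw.orders"},
    "sales.weekly_sales": {"sales.daily_sales"},
}

# static_order() yields every model after all of its upstream dependencies.
order = list(TopologicalSorter(deps).static_order())
print(order)  # ['raw.orders', 'sales.daily_sales', 'sales.weekly_sales']
```

Running upstream models first means a downstream model never reads an interval that its inputs have not produced yet.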
Setting Up Automation¶
Run vulcan run periodically using one of these methods:
Option 1: Linux/Mac Cron Job¶
# Edit crontab
crontab -e
# Run every hour
0 * * * * cd /path/to/project && vulcan run >> /var/log/vulcan-run.log 2>&1
# Run every 15 minutes
*/15 * * * * cd /path/to/project && vulcan run >> /var/log/vulcan-run.log 2>&1
Option 2: CI/CD Pipeline¶
GitHub Actions Example:
name: Vulcan Run
on:
  schedule:
    - cron: '0 * * * *'  # Every hour
  workflow_dispatch:
jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run Vulcan
        run: |
          docker run --network=vulcan --rm \
            -v $PWD:/workspace \
            tmdcio/vulcan:latest vulcan run
GitLab CI Example:
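vulcan_run:
  rules:
    # Run only in scheduled pipelines. GitLab defines the schedule itself
    # (for example 0 * * * * for every hour) in the project's pipeline schedule settings.
    - if: $CI_PIPELINE_SOURCE == "schedule"
  script:
    - docker run --network=vulcan --rm -v $PWD:/workspace tmdcio/vulcan:latest vulcan run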
Option 3: Kubernetes CronJob¶
apiVersion: batch/v1
kind: CronJob
metadata:
  name: vulcan-run
spec:
  schedule: "0 * * * *"  # Every hour
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: vulcan
              image: tmdcio/vulcan:latest
              command: ["vulcan", "run"]
          restartPolicy: OnFailure
Determining Run Frequency¶
Set your automation frequency based on your most frequent model's cron:
graph TD
subgraph "Model Cron Schedules"
H[Hourly Model<br/>cron: @hourly]
D[Daily Model<br/>cron: @daily]
W[Weekly Model<br/>cron: @weekly]
end
subgraph "Automation Frequency"
AUTO_H[Run every hour<br/>vulcan run]
AUTO_D[Run daily<br/>vulcan run]
AUTO_W[Run weekly<br/>vulcan run]
end
subgraph "Execution Result"
RESULT1[Hourly: Runs every time<br/>Daily: Runs when due<br/>Weekly: Runs when due]
RESULT2[Hourly: Skipped<br/>Daily: Runs when due<br/>Weekly: Runs when due]
RESULT3[Hourly: Skipped<br/>Daily: Skipped<br/>Weekly: Runs when due]
end
H -->|"Requires"| AUTO_H
D -->|"Can use"| AUTO_H
W -->|"Can use"| AUTO_H
AUTO_H -->|"Hour 1"| RESULT1
AUTO_H -->|"Hour 2-23"| RESULT2
AUTO_H -->|"Week 1"| RESULT3
style H fill:#e3f2fd,stroke:#1976d2,stroke-width:2px,color:#000
style D fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#000
style W fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#000
style AUTO_H fill:#e8f5e9,stroke:#388e3c,stroke-width:3px,color:#000
style RESULT1 fill:#c8e6c9,stroke:#2e7d32,stroke-width:2px,color:#000
style RESULT2 fill:#fff9c4,stroke:#fbc02d,stroke-width:2px,color:#000
style RESULT3 fill:#ffe082,stroke:#f9a825,stroke-width:2px,color:#000
Rule: Schedule vulcan run based on your fastest model's cron.
- Hourly models → Run automation every hour - if you have hourly models, you need to run at least hourly
- Daily models → Run automation daily - if your fastest model is daily, you can run daily
- Weekly models → Run automation weekly - if your fastest model is weekly, you can run weekly
Example: If your fastest model runs @hourly, schedule vulcan run to execute hourly. Models with slower schedules (daily, weekly) only process when their intervals are due. Vulcan won't process daily models every hour. It waits until they're actually due.
The key insight: you can run vulcan run more frequently than your slowest model's schedule. Vulcan will just skip models that aren't due yet.
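The effect of the trigger frequency can be checked with a small simulation. It assumes fixed cadences for the three cron macros and counts how often each model actually executes. `simulate` and the model names are hypothetical, and the logic is a simplification of what a real scheduler does.

```python
from datetime import datetime, timedelta

# Assumed mapping from cron macros to a fixed cadence (simplification).
CADENCE = {"@hourly": timedelta(hours=1), "@daily": timedelta(days=1), "@weekly": timedelta(weeks=1)}


def simulate(model_crons: dict[str, str], start: datetime, every: timedelta, runs: int) -> dict[str, int]:
    """Count executions per model when the trigger fires `runs` times, once per `every`."""
    last = {name: start for name in model_crons}
    counts = {name: 0 for name in model_crons}
    for k in range(1, runs + 1):
        now = start + k * every
        for name, cron in model_crons.items():
            step = CADENCE[cron]
            if now - last[name] >= step:
                # Catch up on every complete interval in a single execution.
                last[name] += ((now - last[name]) // step) * step
                counts[name] += 1
    return counts


crons = {"hourly_model": "@hourly", "daily_model": "@daily", "weekly_model": "@weekly"}
start = datetime(2025, 1, 20)
hourly_trigger = simulate(crons, start, every=timedelta(hours=1), runs=24)
print(hourly_trigger)  # {'hourly_model': 24, 'daily_model': 1, 'weekly_model': 0}
```

With an hourly trigger over one day, the hourly model executes on every run, the daily model once, and the weekly model not at all. In this sketch, triggering only once a day would still catch the hourly model up, but each execution would cover 24 intervals at once, so its data would lag by up to a day.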
Advanced Run Options¶
Run Specific Models¶
Use vulcan run --select-model "model" to process only the specified model and its upstream dependencies.
Ignore Cron Schedules¶
Use vulcan run --ignore-cron to process all missing intervals regardless of cron schedules. Use it sparingly, typically to catch up on missed intervals after your automation has been down for a while. Normally you want Vulcan to respect cron schedules, since skipping models that aren't due is the purpose of the schedule.
Custom Execution Time¶
Use vulcan run --execution-time "..." to simulate running at a specific time. This is useful for testing cron schedules.
Run in Different Environments¶
Use vulcan run dev to run models in the dev environment, maintaining separate execution state from production.
State Database Considerations¶
By default, Vulcan stores scheduler state in your SQL engine. For production:
Recommended: Use a separate PostgreSQL database for state storage when:
- Your SQL engine is BigQuery (not optimized for frequent transactions)
- You observe performance degradation
- You need better isolation
See Configuration Guide for configuring a separate state database.
Best Practices¶
Here are some tips to help you use run effectively:
- Use run for scheduled execution - Don't use plan for regular data processing. They serve different purposes!
- Set up automation - Schedule vulcan run based on your most frequent model's cron. Set it and forget it.
- Monitor execution - Check logs to ensure intervals are processing correctly. Make sure your automation is actually working.
- Use --ignore-cron sparingly - Only when catching up on missed intervals. Normally, let Vulcan respect cron schedules.
- Separate state database - Consider PostgreSQL for state storage in production. Some SQL engines aren't optimized for frequent transactions.
- Handle errors gracefully - Set up notifications for run failures.
Following these practices will help you build reliable, automated data pipelines.
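As one way to act on the monitoring and error-handling tips, a thin wrapper can run the command and call a notification hook when it fails. The sketch below assumes that vulcan run exits with a non-zero status on failure, which this guide does not confirm. `run_and_notify` and the `notify` callback are hypothetical; swap in your own Slack or email sender.

```python
import subprocess
import sys
from typing import Callable, Sequence


def run_and_notify(cmd: Sequence[str], notify: Callable[[str], None]) -> int:
    """Run a command; on a non-zero exit code, send the tail of its output to notify()."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Keep the last few lines so the alert stays short.
        tail = (result.stderr or result.stdout).strip().splitlines()[-5:]
        notify(f"{' '.join(cmd)} failed with exit code {result.returncode}\n" + "\n".join(tail))
    return result.returncode


def main() -> int:
    # Assumption: 'vulcan' is on PATH and signals failure through its exit code.
    return run_and_notify(["vulcan", "run"], notify=lambda msg: print(msg, file=sys.stderr))

# In a wrapper script invoked by cron or CI, call: sys.exit(main())
```

Returning the original exit code keeps the scheduler's own failure detection working (for example, a failed CI job or a retried Kubernetes pod) alongside the notification.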
Quick Reference¶
| Scenario | Command | When to Use |
|---|---|---|
| Regular Run | vulcan run | Scheduled execution (cron jobs, CI/CD) |
| Dev Environment | vulcan run dev | Running models in dev environment |
| Select Models | vulcan run --select-model "model" | Running specific models only |
| Ignore Cron | vulcan run --ignore-cron | Catch up on all missing intervals |
| Custom Time | vulcan run --execution-time "..." | Testing/simulating runs |
Next Steps¶
- See the Plan Guide to learn about applying model changes