Deployment Steps¶
This guide provides step-by-step instructions for deploying Vulcan data products in a DataOS environment.
Prerequisites¶
Before deploying a Vulcan data product, ensure you have the following resources configured in your DataOS environment:
1. DataOS 2.0 CLI¶
Ensure you have the DataOS CLI installed and configured:
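A quick way to confirm the CLI is reachable is to check that the `ds` binary used by the commands in this guide is on your PATH. This is a minimal sketch: it does not verify the CLI version or login state, and the function name `check_ds_cli` is illustrative.

```shell
# Report whether the `ds` binary is on PATH.
check_ds_cli() {
  if command -v ds >/dev/null 2>&1; then
    echo "ds CLI found"
  else
    echo "ds CLI not found on PATH" >&2
    return 1
  fi
}

check_ds_cli || echo "Install and configure the DataOS CLI before continuing."
```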
2. Depot (Data Source Connection)¶
A depot must be configured to connect to your data warehouse (e.g., Snowflake, BigQuery, Databricks).
List available depots:
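A minimal sketch of the listing command, assuming the `ds resource -t <type> get -a` form that this guide uses for stacks in Step 3 also lists all depots and compute resources when `-n` is omitted. The `ds_list` wrapper is a hypothetical helper, not part of the CLI.

```shell
# Hypothetical helper: list every resource of one type.
# Assumption: `get -a` without `-n` lists all resources of that type,
# as the stack command in Step 3 of this guide does.
ds_list() {
  case "$1" in
    depot|stack|compute)
      ds resource -t "$1" get -a
      ;;
    *)
      echo "usage: ds_list depot|stack|compute" >&2
      return 2
      ;;
  esac
}
```

Run `ds_list depot` to see the depots available to you.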
Note: Ensure the depot has read/write permissions for your data warehouse schema.
3. Engine Stack¶
An engine stack defines the execution environment for Vulcan operations (e.g., Snowflake, BigQuery, Spark).
List available stacks:
ds resource -t stack get -a
Supported engines:
- snowflake
- bigquery
- databricks
- postgres
- redshift
- trino
- mysql
- mssql
4. Compute Resource¶
A compute resource provides the execution environment for running Vulcan workflows.
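List available compute resources:
# Assumes that omitting -n lists all compute resources, as it does for stacks in Step 3
ds resource -t compute get -a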
Example compute resources:
- cyclone-compute (general purpose)
- minerva-compute (query engine)
- Custom compute clusters
5. Git-Sync Secret¶
A secret is required to access your private Git repository containing Vulcan models and configurations.
Create a git-sync secret from a manifest such as the following example:
name: git-sync
version: v2alpha
type: secret
workspace: system
layer: user
description: "Secret for git-sync authentication (Bitbucket)"
secret:
  type: key-value
  data:
    GITSYNC_USERNAME: "<your-git-username>"
    GITSYNC_PASSWORD: "<your-git-token-or-password>"
Important: Replace the GITSYNC_USERNAME and GITSYNC_PASSWORD values with your actual Git repository credentials or access token.
Configuration Files¶
Vulcan deployments require two key configuration files:
1. config.yaml - Vulcan Configuration¶
This file contains Vulcan-specific configurations including model defaults, gateways, notifications, and metadata.
Location: <project-root>/config.yaml
Key sections:
Basic Metadata¶
name: <data-product-name>
display_name: <Data Product Title>
tenant: <tenant-name>
description: <Description .... >
tags:
  - <tag1>
  - <tag2>
terms:
  - glossary.<term1>
  - glossary.<term2>
metadata:
  domain: <business-domain>
  use_cases:
    - <use-case-1>
    - <use-case-2>
  limitations:
    - <limitation-1>
    - <limitation-2>
Model Defaults¶
model_defaults:
  dialect: <engine-dialect>   # Database dialect eg. snowflake, bigquery
  start: '2025-01-01'         # Start date for time-based models
  cron: '<cron>'              # Default scheduling cadence eg. '@daily'
Gateway Configuration¶
gateways:
  default:
    connection:
      type: depot
      address: dataos://<depot-name>   # eg. dataos://snowflakevulcan2
Users and Ownership¶
users:
  - username: <username1>
    email: <username1@email.id>
    type: OWNER
  - username: <username2>
    email: <username2@email.id>
    type: CONTRIBUTOR
Complete config.yaml Example¶
name: user-engagement
display_name: User Engagement Analytics
tenant: engineering
description: User Engagement Analytics is a comprehensive data product delivering insights into user engagement patterns.
tags:
  - snowflake
  - user_engagement
  - device_analytics
terms:
  - glossary.data_product
  - glossary.analytics_platform
  - glossary.user_engagement
metadata:
  domain: product_analytics
  use_cases:
    - User engagement tracking and analysis
    - Device usage analytics
    - Session and activity monitoring
  limitations:
    - Data available from 2025 onwards
    - Refreshes daily at midnight UTC
model_defaults:
  dialect: snowflake
  start: '2025-01-01'
  cron: '@daily'
gateways:
  default:
    connection:
      type: depot
      address: dataos://snowflakevulcan2
notification_targets:
  - type: console
    notify_on:
      - apply_failure
      - run_failure
      - check_failure
users:
  - username: <owner-username>
    email: <owner-email@example.com>
    type: OWNER
  - username: <contributor-username>
    email: <contributor-email@example.com>
    type: CONTRIBUTOR
2. domain-resource.yaml - DataOS Resource Configuration¶
This file defines the DataOS-specific resource configuration for deploying Vulcan as a managed service.
Location: <project-root>/domain-resource.yaml
You can create this file manually, or generate a starter deploy manifest using the Vulcan CLI:
vulcan create_deploy_yaml
Key sections:
Resource Metadata¶
version: v1alpha
type: vulcan
name: <data-product-name>
tags:
  - <tag1>
  - <tag2>
Execution Configuration¶
spec:
  runAsUser: "<dataos-username>"   # DataOS user identity
  compute: <compute-name>          # Compute cluster name eg. cyclone-compute
  engine: <engine-name>            # Execution engine eg. snowflake, bigquery
Repository Configuration¶
repo:
  url: <git-repository-url>            # eg. https://github.com/org/repo
  syncFlags:
    - '--ref=<branch-name>'            # Git branch eg. main
    - '--submodules=off'
  baseDir: <path-to-project-in-repo>   # Path to project folder
  secret: <workspace>:<secret>         # Git credentials secret eg. engineering:git-sync-name
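The `secret` value combines a workspace and a secret name separated by a colon. Below is a small sketch of how such a reference splits, using the `engineering:git-sync` value from the complete example. It assumes, though this guide does not state it, that the workspace part should match the `workspace` field of the secret manifest.

```shell
# Split a <workspace>:<secret> reference into its two parts.
ref="engineering:git-sync"
workspace="${ref%%:*}"    # text before the first colon
secret_name="${ref#*:}"   # text after the first colon
echo "workspace=$workspace secret=$secret_name"
```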
Depot References¶
depots:
  - dataos://<depot-name>?purpose=rw   # eg. dataos://snowflakevulcan2?purpose=rw
Workflow Configuration¶
workflow:
  type: schedule                   # Run on a schedule
  schedule:
    crons:
      - '<cron-expression>'        # eg. '*/45 * * * *' (at minutes 0 and 45 of every hour)
    endOn: '<end-date>'            # eg. '2027-01-01T00:00:00-00:00'
    timezone: '<timezone>'         # eg. 'US/Pacific'
    concurrencyPolicy: Forbid
  logLevel: INFO
  resource:                        # Resource allocation
    request:
      cpu: "<cpu-request>"         # eg. "200m"
      memory: "<memory-request>"   # eg. "512Mi"
    limit:
      cpu: "<cpu-limit>"           # eg. "1000m"
      memory: "<memory-limit>"     # eg. "1Gi"
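In standard five-field cron syntax, a step such as `*/45` in the minute field is evaluated within each hour. It therefore matches minutes 0 and 45 rather than a rolling 45-minute interval. The expansion can be reproduced with `seq`, assuming the scheduler follows standard cron semantics; `cron_minutes` is an illustrative helper name.

```shell
# Minutes matched by a */STEP expression in the cron minute field (0-59).
cron_minutes() {
  seq 0 "$1" 59 | paste -sd' ' -
}

cron_minutes 45   # prints: 0 45
cron_minutes 15   # prints: 0 15 30 45
```

For a cadence that does not divide 60 evenly, consider a step that does (for example 15 or 30), so that runs are evenly spaced.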
Vulcan Commands¶
migrate:                  # Schema migration
  command: [vulcan]
  arguments: [migrate]
plan:                     # Plan changes
  command: [vulcan]
  arguments:
    - --log-to-stdout
    - plan
    - --auto-apply
run:                      # Execute models
  command: [vulcan]
  arguments:
    - --log-to-stdout
    - run
API Configuration¶
api:
  replicas: <replica-count>        # eg. 1
  logLevel: INFO
  resource:
    request:
      cpu: "<cpu-request>"         # eg. "200m"
      memory: "<memory-request>"   # eg. "512Mi"
    limit:
      cpu: "<cpu-limit>"           # eg. "5000m"
      memory: "<memory-limit>"     # eg. "4Gi"
Complete domain-resource.yaml Example¶
version: v1alpha
type: vulcan
name: user-engagement
tags:
  - snowflake-analytics
  - user_engagement
  - device_analytics
spec:
  runAsUser: "<dataos-username>"
  compute: cyclone-compute
  engine: snowflake
  repo:
    url: https://bitbucket.org/rubik_/vulcan-examples
    syncFlags:
      - '--ref=main'
      - '--submodules=off'
    baseDir: vulcan-examples/customer-usecase/usdk
    secret: engineering:git-sync
  depots:
    - dataos://snowflakevulcan2?purpose=rw
  workflow:
    type: schedule
    schedule:
      crons:
        - '*/45 * * * *'
      endOn: '2027-01-01T00:00:00-00:00'
      timezone: 'US/Pacific'
      concurrencyPolicy: Forbid
    logLevel: INFO
    resource:
      request:
        cpu: "200m"
        memory: "512Mi"
      limit:
        cpu: "1000m"
        memory: "1Gi"
    migrate:
      command:
        - vulcan
      arguments:
        - migrate
    plan:
      command:
        - vulcan
      arguments:
        - --log-to-stdout
        - plan
        - --auto-apply
    run:
      command:
        - vulcan
      arguments:
        - --log-to-stdout
        - run
  api:
    replicas: 1
    logLevel: INFO
    resource:
      request:
        cpu: "200m"
        memory: "512Mi"
      limit:
        cpu: "5000m"
        memory: "4Gi"
Deployment Steps¶
Step 1: Prepare Your Repository¶
- Create your Vulcan project structure:
  your-project/
  ├── config.yaml            # Vulcan configuration
  ├── domain-resource.yaml   # DataOS resource definition
  ├── models/                # SQL model files
  │   ├── staging/
  │   └── marts/
  ├── seeds/                 # Static data files
  ├── checks/                # Data quality checks
  ├── audits/                # Audit queries
  └── semantics/             # Semantic layer definitions
- Configure config.yaml with your project settings
- Generate domain-resource.yaml with vulcan create_deploy_yaml or configure it manually with your DataOS settings
- Push your code to a Git repository
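The directory layout for this step can be scaffolded from the shell. This is a minimal sketch that creates the directories and an empty `config.yaml`; `your-project` is the placeholder name from the tree. It deliberately leaves `domain-resource.yaml` to be generated or written by hand.

```shell
# Create the project skeleton described in Step 1.
project="your-project"
mkdir -p \
  "$project/models/staging" \
  "$project/models/marts" \
  "$project/seeds" \
  "$project/checks" \
  "$project/audits" \
  "$project/semantics"

# Empty placeholder; fill in the Vulcan configuration before deploying.
touch "$project/config.yaml"
```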
Step 2: Create Required Secrets¶
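This step applies the git-sync secret described in the prerequisites. The sketch below assumes the manifest is saved as `git-sync-secret.yaml` (a hypothetical file name). It also assumes that secrets are applied with the same `ds resource apply -f` command used for the Vulcan resource in Step 4. The guard refuses to apply a manifest that is missing or still contains the `<your-git-...>` placeholders.

```shell
# Hypothetical helper: apply the secret manifest only after the
# placeholder credentials have been replaced.
apply_git_secret() {
  manifest="${1:-git-sync-secret.yaml}"
  if [ ! -f "$manifest" ]; then
    echo "Manifest not found: $manifest" >&2
    return 1
  fi
  if grep -q '<your-git-' "$manifest"; then
    echo "Replace the placeholder credentials in $manifest first" >&2
    return 1
  fi
  ds resource apply -f "$manifest"
}
```

Call `apply_git_secret` from the directory that contains the manifest, or pass the manifest path as the first argument.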
Step 3: Verify Prerequisites¶
# Verify depot exists
ds resource -t depot get -n <depot-name> -a
# Verify compute exists
ds resource -t compute get -n <compute-name> -a
# Verify stack exists
ds resource -t stack get -a
Step 4: Deploy Vulcan Resource¶
# Generate the deploy manifest if you haven't created it yet
vulcan create_deploy_yaml
# Apply the domain-resource configuration
ds resource apply -f domain-resource.yaml
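The two commands in this step can be combined so that the manifest is generated only when it is missing. This minimal sketch assumes `vulcan create_deploy_yaml` writes `domain-resource.yaml` into the current directory. Step 1 implies that, but the output path is not stated explicitly. `deploy_vulcan` is an illustrative helper name.

```shell
# Generate the manifest only if it is absent or empty, then apply it.
deploy_vulcan() {
  if [ ! -s domain-resource.yaml ]; then
    vulcan create_deploy_yaml || return 1
  fi
  ds resource apply -f domain-resource.yaml
}
```

Run `deploy_vulcan` from the project root so that both commands see the same `domain-resource.yaml`.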
Step 5: Monitor Deployment¶
# Get resource status
ds resource -t vulcan -n <data-product-name> get
# Check logs
ds resource -t vulcan -n <data-product-name> logs
Verification¶
Verify Models in Data Warehouse¶
Connect to your data warehouse and verify that tables/views have been created:
-- For Snowflake
SHOW TABLES IN SCHEMA <database>.<schema>;
-- Check specific table
SELECT * FROM <database>.<schema>.<table-name> LIMIT 10;