Get Started¶
This guide shows you how to set up a complete Vulcan project on your local machine.
The example project runs locally using a PostgreSQL engine. Vulcan automatically generates all necessary project files and configuration.
To get started, ensure your system meets the prerequisites below, then follow the step-by-step instructions for your operating system.
Prerequisites¶
Before you begin, make sure you have Docker installed and configured on your system. Follow the instructions below for your operating system.
1. Verify Docker Installation
First, check if Docker Desktop (Mac) or Docker Engine (Linux) is installed and running:
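The verification commands are not shown in this extract; the standard Docker version checks are:

```shell
# Verify the Docker engine is installed and reachable
docker --version

# Verify the Docker Compose plugin is available
docker compose version
```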
If both commands return version numbers, Docker is installed. Make sure Docker Desktop is running (you should see the Docker icon in your menu bar or system tray).
2. Install Docker (if needed)
- Mac: Download and install Docker Desktop for Mac
- Linux: Install Docker Engine and Docker Compose following the official Docker installation guide
3. Configure Resources
Ensure Docker Desktop has at least 4GB of RAM allocated. You can adjust this in Docker Desktop settings under Resources → Advanced.
1. Verify Docker Installation
Check if Docker Desktop for Windows is installed and running:
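As on Mac and Linux, the same version checks work in PowerShell or Command Prompt:

```shell
docker --version
docker compose version
```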
If both commands return version numbers, Docker is installed. Make sure Docker Desktop is running (you should see the Docker icon in your system tray).
2. Install Docker (if needed)
If Docker is not installed, download and install Docker Desktop for Windows
3. Configure Resources
Ensure Docker Desktop has at least 4GB of RAM allocated. You can adjust this in Docker Desktop settings under Settings → Resources → Advanced.
Vulcan Setup Locally¶
Follow these steps to set up Vulcan on your local machine. The setup process will create all necessary infrastructure services and prepare your environment for development.
The download includes: Docker Compose files, Makefile, and a comprehensive README
Step 1: Extract and Navigate
Extract the downloaded zip file and open the vulcan-project folder in VS Code or your preferred IDE:
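Assuming the archive extracts to a folder named vulcan-project (as the step above states):

```shell
cd vulcan-project
code .   # optional: open the folder in VS Code, if the `code` CLI is installed
```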
Step 2: Run Setup
Important: Before running setup, ensure Docker Desktop is running on your machine and that you are logged into RubikLabs.
Execute the setup command:
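The setup command itself is not shown in this extract; since the download bundle includes a Makefile, it is presumably a Make target along these lines (an assumption; check the bundled README for the actual target name):

```shell
# Assumption: the bundled Makefile exposes a `setup` target
make setup
```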
This command starts the full Vulcan stack in one step:
- statestore (PostgreSQL): Stores Vulcan's internal state, including model definitions, plan information, and execution history. This database persists your semantic model, plans, and tracks materialization state.
- minio (Object Storage): Stores query results, artifacts, and other data objects that Vulcan generates. This service provides data retrieval and caching for your workflows.
- vulcan-transpiler: Transpiler API for converting semantic queries to SQL (available at http://localhost:8100)
- vulcan-api: REST API server for querying your semantic model (available at http://localhost:8000)
- vulcan-graphql: GraphQL interface for querying your semantic layer (available at http://localhost:3000)
- vulcan-mysql (optional): MySQL wire protocol access for BI tool connectivity (available at localhost:3307)
- MySQL proxy: Proxy for BI tools to connect via MySQL protocol (available at localhost:3306)
Note: The setup process typically takes 1-2 minutes to complete. All services are essential for Vulcan's operation.
State Connection Default
By default, you should use Postgres for your state connection. When configuring your config.yaml, set state_connection to use Postgres. This ensures reliable state management and is the recommended approach for most projects.
Verify Services Are Running
Before proceeding, verify that all services are up and running.
Check running containers
Use Docker directly to confirm that all containers are running:
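For example, using the Docker CLI directly:

```shell
# List all running containers with their status
docker ps

# Or scope the check to the project's infra compose file
docker compose -f docker/docker-compose.infra.yml ps
```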
You should see containers corresponding to the following services with a status of Up:
- statestore (PostgreSQL) – State storage service
- minio – Object storage service
- vulcan-api – REST API service
- vulcan-graphql – GraphQL service
- vulcan-transpiler – Transpiler service
If a container is missing from the list or not in an Up state, it may have stopped or failed to start.
Validate container logs
To inspect logs for any specific service, use:
For example:
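The log check takes this general shape (the exact container name may carry a Compose prefix; confirm it in the `docker ps` output):

```shell
# General form: substitute the container you want to inspect
docker logs <container-name>

# Example: review the state store's startup output
docker logs statestore
```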
Review the logs for errors, crash messages, or failed startup checks.
Once all containers are running properly and logs look healthy, proceed to the next step.
Step 3: Configure Vulcan CLI Access
Create an alias to access the Vulcan CLI. The alias uses an engine-specific Docker image. Postgres is shown by default (recommended for most users). If you're using a different engine, select it from the tabs below:
Automatic Updates
Docker image versions in this section are automatically synchronized with the engine configuration files. When engine image versions are updated, this section is automatically updated as well.
alias vulcan="docker run -it --network=vulcan --rm -v .:/workspace tmdcio/vulcan-postgres:0.228.1.10 vulcan"
alias vulcan="docker run -it --network=vulcan --rm -v .:/workspace tmdcio/vulcan-bigquery:0.228.1.10 vulcan"
alias vulcan="docker run -it --network=vulcan --rm -v .:/workspace tmdcio/vulcan-databricks:0.228.1.10 vulcan"
alias vulcan="docker run -it --network=vulcan --rm -v .:/workspace tmdcio/vulcan-fabric:0.228.1.6 vulcan"
alias vulcan="docker run -it --network=vulcan --rm -v .:/workspace tmdcio/vulcan-mssql:0.228.1.6 vulcan"
alias vulcan="docker run -it --network=vulcan --rm -v .:/workspace tmdcio/vulcan-mysql:0.228.1.6 vulcan"
alias vulcan="docker run -it --network=vulcan --rm -v .:/workspace tmdcio/vulcan-redshift:0.228.1.6 vulcan"
alias vulcan="docker run -it --network=vulcan --rm -v .:/workspace tmdcio/vulcan-snowflake:0.228.1.10 vulcan"
alias vulcan="docker run -it --network=vulcan --rm -v .:/workspace tmdcio/vulcan-spark:0.228.1.6 vulcan"
alias vulcan="docker run -it --network=vulcan --rm -v .:/workspace tmdcio/vulcan-trino:0.228.1.6 vulcan"
Note: This alias is temporary and will be lost when you close your shell session. To make it permanent, add this line to your shell configuration file (~/.bashrc for Bash or ~/.zshrc for Zsh), then restart your terminal or run source ~/.zshrc (or source ~/.bashrc).
Once all services are running, you're ready to create your first project!
The download includes: Docker Compose files, Windows batch scripts, and a comprehensive README
Step 1: Extract and Navigate
Extract the downloaded zip file and navigate to the vulcan-project directory:
Step 2: Run Setup
Important: Before running setup, ensure Docker Desktop for Windows is running and that you are logged into RubikLabs.
Execute the setup script:
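The script name is not shown in this extract; the bundle includes Windows batch scripts, so the invocation is presumably something like the following (an assumption; check the bundled README for the actual script name):

```shell
.\setup.bat
```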
This script creates and starts three essential services:
- statestore (PostgreSQL): Stores Vulcan's internal state, including model definitions, plan information, and execution history. This database persists your semantic model, plans, and tracks materialization state.
- minio (Object Storage): Stores query results, artifacts, and other data objects that Vulcan generates. This service provides data retrieval and caching for your workflows.
- minio-init: Initializes MinIO buckets and policies with the correct configuration. This service runs once to set up the storage infrastructure.
Note: These services are essential for Vulcan's operation and must be running before you can use Vulcan. The setup process typically takes 1-2 minutes to complete.
State Connection Default
By default, you should use Postgres for your state connection. When configuring your config.yaml, set state_connection to use Postgres. This ensures reliable state management and is the recommended approach for most projects.
Verify Services Are Running
Before proceeding, verify that all required infrastructure services (engine and storage) are up and running.
Check running containers
Use Docker directly to confirm that all containers are running:
You should see containers corresponding to the following services with a status of Up:
- statestore (PostgreSQL) – State storage service
- minio – Object storage service
- warehouse (PostgreSQL) – Data warehouse engine
If a container is missing from the list or not in an Up state, it may have stopped or failed to start.
Validate container logs
To inspect logs for any specific service, use:
For example:
Review the logs for errors, crash messages, or failed startup checks.
Once all containers are running properly and logs look healthy, proceed to the next step.
Step 3: Configure Vulcan CLI Access
Create a function to access the Vulcan CLI. Open PowerShell and run the following command. Postgres is shown by default (recommended for most users). If you're using a different engine, select it from the tabs below:
function vulcan {
docker run -it --network=vulcan --rm -v .:/workspace tmdcio/vulcan-postgres:0.228.1.10 vulcan $args
}
function vulcan {
docker run -it --network=vulcan --rm -v .:/workspace tmdcio/vulcan-bigquery:0.228.1.10 vulcan $args
}
function vulcan {
docker run -it --network=vulcan --rm -v .:/workspace tmdcio/vulcan-databricks:0.228.1.10 vulcan $args
}
function vulcan {
docker run -it --network=vulcan --rm -v .:/workspace tmdcio/vulcan-fabric:0.228.1.6 vulcan $args
}
function vulcan {
docker run -it --network=vulcan --rm -v .:/workspace tmdcio/vulcan-mssql:0.228.1.6 vulcan $args
}
function vulcan {
docker run -it --network=vulcan --rm -v .:/workspace tmdcio/vulcan-mysql:0.228.1.6 vulcan $args
}
function vulcan {
docker run -it --network=vulcan --rm -v .:/workspace tmdcio/vulcan-redshift:0.228.1.6 vulcan $args
}
function vulcan {
docker run -it --network=vulcan --rm -v .:/workspace tmdcio/vulcan-snowflake:0.228.1.10 vulcan $args
}
function vulcan {
docker run -it --network=vulcan --rm -v .:/workspace tmdcio/vulcan-spark:0.228.1.6 vulcan $args
}
function vulcan {
docker run -it --network=vulcan --rm -v .:/workspace tmdcio/vulcan-trino:0.228.1.6 vulcan $args
}
Note: This function is temporary and will be lost when you close your PowerShell session. To make it permanent, add it to your PowerShell profile. Run notepad $PROFILE to open your profile file, paste the function, and save.
Step 4: Start API Services
Configure Environment Variables
Before starting the services, open docker\docker-compose.vulcan.yml and replace the following placeholders with your actual values:
| Variable | Placeholder | Description |
|---|---|---|
| DATAOS_RUN_AS_USER | `<your-dataos-username>` | Your DataOS user ID |
| DATAOS_RUN_AS_APIKEY | `<your-dataos-api-key>` | Your DataOS API key |
| HEIMDALL_URL | `<your-dataos-context>` | Your DataOS context URL (e.g., https://my-context.dataos.app/heimdall) |
Generate SSL Certificates for MySQL Wire Protocol (Optional)
If you plan to use the Vulcan MySQL wire protocol service (vulcan-mysql), SSL/TLS certificates are required. Generate them before starting the service:
mkdir docker\ssl
openssl req -x509 -nodes -days 365 -newkey rsa:2048 ^
-keyout docker\ssl\server.key -out docker\ssl\server.crt ^
-subj "/CN=vulcan-mysql"
Start the services:
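Since the compose file for these services is docker\docker-compose.vulcan.yml (referenced above), a typical start command would be:

```shell
docker compose -f docker\docker-compose.vulcan.yml up -d
```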
This command starts the following services:
- vulcan-api: A REST API server for querying your semantic model (available at http://localhost:8000)
- vulcan-graphql: A GraphQL interface for querying your semantic layer (available at http://localhost:3000)
- vulcan-mysql (optional): MySQL wire protocol access to Vulcan for BI tool connectivity (available at localhost:3307)
Once these services are running, you're ready to create your first project!
Create Your First Project¶
Now that your environment is set up, let's create your first Vulcan project. This section walks you through initializing a project, verifying the setup, running your first plan, and querying your data.
Step 1: Initialize Your Project
Initialize a new Vulcan project: Learn more about init
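Run the init command through the alias or function configured earlier:

```shell
vulcan init
```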
When prompted:
- Choose `DEFAULT` as the project type
- Select `Postgres` as your SQL engine

This command creates a complete project structure with 7 directories:

- `models/` - Contains `.sql` and `.py` files for your data models
- `seeds/` - CSV files for static datasets
- `audits/` - Write logic to assert data quality and block downstream models if checks fail
- `tests/` - Test files for validating your model logic
- `macros/` - Write custom macros for reusable SQL patterns
- `checks/` - Write data quality checks
- `semantics/` - Semantic layer definitions (measures, dimensions, etc.)
Configure Your Connection
After initialization, verify your config.yaml has the correct connection values. Replace the connection values (host, port, database, user, password) with values that match your actual database setup. For Docker setups, use the service names (warehouse, statestore) as hostnames. For local or remote databases, use the actual hostname or IP address.
Step 2: Verify Your Setup
Check your project configuration and connection status: Learn more about info
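Run the info command through the alias or function configured earlier:

```shell
vulcan info
```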
This command displays:
- Connection status to your database
- Number of models, macros, and other project components
- Project configuration details
Important: Verify that the setup is correct before proceeding to run plans. If you see any errors, check the Troubleshooting section below.
Step 3: Create and Apply Your First Plan
Generate a plan for your models: Learn more about plan
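Run the plan command through the alias or function configured earlier:

```shell
vulcan plan
```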
This command performs three key actions:
- Validates your models and creates the necessary database objects (tables, views, etc.)
- Calculates which data intervals need to be backfilled based on your model's `start` date and `cron` schedule
- Prompts you to apply the plan

When prompted, enter `y` to apply the plan and backfill your models with historical data.
Note: The backfill process may take a few minutes depending on the amount of historical data to process.
Step 4: Query Your Models
Execute SQL queries against your models: Learn more about fetchdf
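The exact query depends on your models; a hypothetical example (the model name `my_model` is a placeholder, not from this guide):

```shell
vulcan fetchdf "SELECT * FROM my_model LIMIT 10"
```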
This command executes a SQL query and returns the results as a pandas DataFrame.
Step 5: Query Using Semantic Layer
Use Vulcan's semantic layer to query your data: Learn more about transpile
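The argument format for transpile is not shown in this extract; a hypothetical invocation against a semantic layer might look like the following (the query shape and the measure/dimension names are assumptions):

```shell
# Hypothetical semantic query - the real syntax depends on your semantics/ definitions
vulcan transpile '{"measures": ["orders.count"], "dimensions": ["orders.status"]}'
```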
This command transpiles your semantic query into SQL that can be executed against your data warehouse. The semantic layer provides a business-friendly interface for querying your data models.
Stopping Services¶
When you're done working with Vulcan, you can stop the services to free up system resources. Use the commands below based on your operating system.
Stop All Services
To stop all running services:
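Assuming the same compose files referenced elsewhere in this guide, stopping everything looks like:

```shell
docker compose -f docker/docker-compose.infra.yml down
docker compose -f docker/docker-compose.warehouse.yml down
docker compose -f docker/docker-compose.vulcan.yml down
```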
Stop and Clean Up (Warning: This deletes all data)
To stop all services and remove volumes (this will delete all data):
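Adding `-v` removes the named volumes along with the containers (again assuming the compose files referenced elsewhere in this guide):

```shell
# -v deletes the volumes, and with them all persisted state and data
docker compose -f docker/docker-compose.infra.yml down -v
docker compose -f docker/docker-compose.warehouse.yml down -v
```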
Stop Individual Service Groups
You can also stop specific service groups:
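For example, to stop only the API-layer services while leaving state storage running:

```shell
docker compose -f docker/docker-compose.vulcan.yml down
```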
Stop All Services
To stop all running services:
Stop and Clean Up (Warning: This deletes all data)
To stop all services and remove volumes (this will delete all data):
Stop Individual Services
To stop only the Vulcan API services:
Troubleshooting¶
If you encounter any issues during setup or while using Vulcan, refer to the solutions below.
Common Issues and Solutions
Services Won't Start
If services fail to start, ensure Docker Desktop is running with at least 4GB RAM allocated. You can check and adjust this in Docker Desktop settings:
- Mac: Docker Desktop → Settings → Resources → Advanced
- Windows: Docker Desktop → Settings → Resources → Advanced
Invalid Connection Config Error
If you see an error like:
Error: Invalid 'postgres' connection config:
Field 'host': Input should be a valid string
Field 'user': Input should be a valid string
Field 'password': Input should be a valid string
Field 'port': Input should be a valid integer
Field 'database': Input should be a valid string
This means your config.yaml file is missing or incomplete. You need to create or update your config.yaml file with proper gateway configuration before running vulcan info or other Vulcan commands.
Solution:
- If you haven't initialized your project yet, run `vulcan init` first. This creates a `config.yaml` file with the correct structure.
- If you already have a project, ensure your `config.yaml` file includes a `gateways` section with all required connection fields. Here's a minimal example for Postgres:
gateways:
default:
connection:
type: postgres
host: warehouse
port: 5432
database: warehouse
user: vulcan
password: vulcan
state_connection:
type: postgres
host: statestore
port: 5432
database: statestore
user: vulcan
password: vulcan
default_gateway: default
model_defaults:
dialect: postgres
Connection Values
Important: Replace the connection values (host, port, database, user, password) with values that match your actual database setup. For Docker setups, use the service names (warehouse, statestore) as hostnames. For local or remote databases, use the actual hostname or IP address.
See the Configuration Overview for detailed information about gateway configuration.
Network Errors
If you encounter network-related errors, ensure the vulcan Docker network exists:
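One way to check for the network and create it if it is missing:

```shell
# Inspect exits non-zero when the network doesn't exist, triggering creation
docker network inspect vulcan >/dev/null 2>&1 || docker network create vulcan
```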
Port Conflicts
If you see errors about ports already being in use, one of the required ports (5431, 5433, 9000, 9001, or 8000) is likely occupied by another application. You have two options:
- Stop the conflicting application using that port
- Modify the port mappings in the Docker Compose files (`docker/docker-compose.infra.yml` and `docker/docker-compose.warehouse.yml`)
Can't Connect to Services
If you're unable to connect to Vulcan services, verify that all required services are running:
docker compose -f docker/docker-compose.infra.yml ps
docker compose -f docker/docker-compose.warehouse.yml ps
All services should show as "Up" or "running". If any service shows as "Exited" or "Stopped", check the logs:
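For example, to inspect the state store's logs via its compose file:

```shell
# Substitute the name of the failing service
docker compose -f docker/docker-compose.infra.yml logs statestore
```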
Access MinIO Console
You can access the MinIO console to manage your object storage:
- URL: http://localhost:9001
- Username: `admin`
- Password: `password`
The MinIO console allows you to browse buckets, upload files, and manage storage policies.
Permission Denied Errors
If you see a permission denied error related to logging, create a `.logs` folder in the project directory manually and change its permissions so Vulcan can write to it.
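A minimal way to do this (the permission bits here are a suggestion; tighten them as your environment requires):

```shell
# Create the logs directory and make it writable by the current user
mkdir -p .logs
chmod u+rwx .logs
```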
Next Steps¶
You've set up Vulcan and created your first project. Here are recommended next steps:
- Learn more about Vulcan CLI commands - Explore all available commands and their options
- Explore Vulcan concepts - Deep dive into how models work and how to structure your data pipeline
- Read the model kinds documentation - Understand different model types and when to use them