Custom materializations
Vulcan comes with a variety of model kinds that handle the most common ways to evaluate and materialize your data transformations. But what if you need something different?
Sometimes your use case doesn't quite fit any of the built-in model kinds. Maybe you need custom logic for how data gets inserted, or you want to implement a materialization strategy that's unique to your workflow. That's where custom materializations come in: they let you write your own Python code to control exactly how your models are materialized.
Advanced Feature
Custom materializations replace Vulcan's built-in DDL/DML for a model kind. Reach for one only after you've ruled out the standard kinds: most workloads fit FULL, INCREMENTAL_BY_TIME_RANGE, INCREMENTAL_BY_UNIQUE_KEY, or INCREMENTAL_BY_PARTITION. If a built-in kind is almost what you need, file an issue first; an improvement there helps everyone.
What is a materialization?
A materialization is the "how" behind your model execution. When Vulcan runs a model, it needs to figure out how to get that data into your database. The materialization is the set of methods that handle executing your transformation logic and managing the resulting data.
Some materializations are straightforward. For example, a FULL model kind completely replaces the table each time it runs, so its materialization is essentially just CREATE OR REPLACE TABLE [name] AS [your query].
Other materializations are more complex. An INCREMENTAL_BY_TIME_RANGE model needs to figure out which time intervals to process, query only that data, and then merge it into the existing table. That requires more logic.
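As a rough illustration of the interval bookkeeping, the sketch below works out which daily intervals in a requested range haven't been processed yet. It is not Vulcan's scheduler code: the daily grain, the half-open [start, end) interval convention, and the helper name missing_intervals are all assumptions made for this example.

```python
from datetime import date, timedelta
from typing import List, Set, Tuple


def missing_intervals(
    start: date, end: date, processed: Set[date]
) -> List[Tuple[date, date]]:
    """Hypothetical helper: return half-open daily intervals [day, day + 1)
    from start through end (inclusive) whose day is not in `processed`."""
    intervals: List[Tuple[date, date]] = []
    day = start
    while day <= end:
        if day not in processed:
            intervals.append((day, day + timedelta(days=1)))
        day += timedelta(days=1)
    return intervals


# Jan 2 was already loaded, so only Jan 1 and Jan 3 still need processing.
todo = missing_intervals(date(2024, 1, 1), date(2024, 1, 3), {date(2024, 1, 2)})
```

Each returned interval would then drive a query filtered to that time slice, and the resulting rows would be written into the matching slice of the target table.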
The materialization logic can also vary by SQL engine. PostgreSQL doesn't support CREATE OR REPLACE TABLE, so FULL models on Postgres use DROP then CREATE instead. Vulcan handles all these engine-specific details for built-in model kinds, but with custom materializations, you're in control.
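To make the engine difference concrete, here is a hypothetical helper that picks full-refresh statements by dialect. It is not Vulcan's code; the dialect strings and the set of engines treated as supporting CREATE OR REPLACE TABLE are assumptions for this sketch.

```python
from typing import List

# Engines this sketch assumes support CREATE OR REPLACE TABLE.
SUPPORTS_CREATE_OR_REPLACE = {"bigquery", "databricks", "duckdb", "snowflake"}


def full_refresh_statements(table: str, query: str, dialect: str) -> List[str]:
    """Illustrative only: SQL a FULL-style refresh might issue on a given engine."""
    if dialect in SUPPORTS_CREATE_OR_REPLACE:
        return [f"CREATE OR REPLACE TABLE {table} AS {query}"]
    # Engines such as PostgreSQL lack CREATE OR REPLACE TABLE: drop, then recreate.
    return [
        f"DROP TABLE IF EXISTS {table}",
        f"CREATE TABLE {table} AS {query}",
    ]
```

On PostgreSQL, running both statements in one transaction keeps other sessions from observing the moment when the table doesn't exist.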
How custom materializations work
Custom materializations are like creating your own model kind. You define them in Python, give them a name, and then reference that name in your model's MODEL block. They can accept configuration arguments that you pass in from your model definition.
Here's what every custom materialization needs:
- Python code: written as a Python class
- Base class: must inherit from Vulcan's CustomMaterialization class
- Insert method: at minimum, you need to implement the insert method
- Auto-loading: Vulcan automatically discovers materializations in your materializations/ directory
You can also:
- Override other methods from the MaterializableStrategy or EngineAdapter classes
- Execute arbitrary SQL using the engine adapter
- Perform Python processing with Pandas or other libraries (though in most cases that logic belongs in a Python model instead)
Vulcan will automatically load any Python files in your project's materializations/ directory. Or, if you prefer, you can package your materialization as a Python package and install it like any other dependency.
Creating a custom materialization
To create a custom materialization, just add a .py file to your project's materializations/ folder. Vulcan will automatically import all Python modules in this folder when your project loads, so your materializations will be ready to use.
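Conceptually, directory auto-loading amounts to importing each module file so the classes it defines come into existence. The sketch below shows that idea with the standard library's importlib; it is not Vulcan's loader, and the synthetic module names and the rule of skipping files whose names start with an underscore are choices made for this example.

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType
from typing import List


def load_materialization_modules(directory: Path) -> List[ModuleType]:
    """Import every top-level .py file in `directory` (illustrative only)."""
    modules: List[ModuleType] = []
    for path in sorted(directory.glob("*.py")):
        if path.name.startswith("_"):
            continue  # skip __init__.py and private helpers (a choice for this sketch)
        module_name = f"materializations.{path.stem}"
        spec = importlib.util.spec_from_file_location(module_name, path)
        if spec is None or spec.loader is None:
            continue
        module = importlib.util.module_from_spec(spec)
        # Register before executing so code that looks the module up in
        # sys.modules (dataclasses, pickling) works during import.
        sys.modules[module_name] = module
        spec.loader.exec_module(module)
        modules.append(module)
    return modules
```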
Your materialization class needs to inherit from CustomMaterialization and implement at least the insert method. Let's look at some examples to see how this works.
Simple example
Here's a complete example that shows custom insert logic with some helpful logging:
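A minimal sketch of such a class, assuming Vulcan exports CustomMaterialization from the top-level vulcan package and that the engine adapter exposes replace_query() and insert_append(); verify both against your installed version. The except ImportError branch is only a stand-in so the snippet runs without Vulcan, and replacing on the first insert but appending afterwards is an illustrative choice.

```python
import logging
import typing as t

try:
    from vulcan import CustomMaterialization  # assumed import path
except ImportError:
    # Stand-in base class so this sketch runs without Vulcan installed.
    class CustomMaterialization:  # type: ignore[no-redef]
        def __init__(self, adapter: t.Any) -> None:
            self.adapter = adapter


logger = logging.getLogger(__name__)


class SimpleCustomMaterialization(CustomMaterialization):
    NAME = "simple_custom"  # referenced from the model definition

    def insert(
        self,
        table_name: str,
        query_or_df: t.Any,
        model: t.Any,
        is_first_insert: bool,
        render_kwargs: t.Dict[str, t.Any],
        **kwargs: t.Any,
    ) -> None:
        logger.info("Inserting into %s (first insert: %s)", table_name, is_first_insert)
        if is_first_insert:
            # First load for this model version: write the full result set.
            self.adapter.replace_query(table_name, query_or_df)  # assumed adapter method
        else:
            # Later loads: append the new rows.
            self.adapter.insert_append(table_name, query_or_df)  # assumed adapter method
        logger.info("Finished inserting into %s", table_name)
```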
Let's break down what's happening here:
| Component | What It Does |
|---|---|
| NAME | The identifier you'll use in your model definition (like simple_custom) |
| table_name | The target table where your data will be inserted |
| query_or_df | Either a SQL query string or a DataFrame (works with Pandas, PySpark, Snowpark) |
| model | The full model definition object, which gives you access to all model properties |
| is_first_insert | True if this is the first time inserting data for this model version |
| render_kwargs | Dictionary of arguments used to render the model query |
| self.adapter | The engine adapter: your interface for executing SQL and interacting with the database |
Minimal example
If you just want a simple full-refresh materialization, here's the minimal version:
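A sketch, assuming the vulcan import path and an engine-adapter method named replace_query() that overwrites a table with the result of a query or DataFrame; check both in your version. The except ImportError stand-in only lets the snippet run without Vulcan, and my_custom_full is a hypothetical name.

```python
import typing as t

try:
    from vulcan import CustomMaterialization  # assumed import path
except ImportError:
    # Stand-in base class so this sketch runs without Vulcan installed.
    class CustomMaterialization:  # type: ignore[no-redef]
        def __init__(self, adapter: t.Any) -> None:
            self.adapter = adapter


class CustomFullMaterialization(CustomMaterialization):
    NAME = "my_custom_full"  # hypothetical name used in the model definition

    def insert(
        self,
        table_name: str,
        query_or_df: t.Any,
        model: t.Any,
        is_first_insert: bool,
        render_kwargs: t.Dict[str, t.Any],
        **kwargs: t.Any,
    ) -> None:
        # Overwrite the whole table on every run, like the FULL kind.
        self.adapter.replace_query(table_name, query_or_df)  # assumed adapter method
```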
That's it! This will completely replace the table contents each time the model runs, just like a FULL model kind.
Controlling table creation and deletion
You can also customize how tables and views are created and deleted by overriding the create and delete methods:
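A sketch, assuming create(table_name, model, is_table_deployable, render_kwargs, **kwargs) and delete(name, **kwargs) signatures, an adapter execute() method that runs a SQL string, an adapter insert_append() method, and a model.columns_to_types mapping of column names to types. All of these are assumptions to verify against your Vulcan version; explicit_ddl is a hypothetical name.

```python
import typing as t

try:
    from vulcan import CustomMaterialization  # assumed import path
except ImportError:
    # Stand-in base class so this sketch runs without Vulcan installed.
    class CustomMaterialization:  # type: ignore[no-redef]
        def __init__(self, adapter: t.Any) -> None:
            self.adapter = adapter


class ExplicitDdlMaterialization(CustomMaterialization):
    NAME = "explicit_ddl"  # hypothetical

    def create(
        self,
        table_name: str,
        model: t.Any,
        is_table_deployable: bool,
        render_kwargs: t.Dict[str, t.Any],
        **kwargs: t.Any,
    ) -> None:
        # Build column definitions from the model's schema (assumed attribute).
        columns = ", ".join(
            f"{name} {col_type}" for name, col_type in model.columns_to_types.items()
        )
        self.adapter.execute(f"CREATE TABLE IF NOT EXISTS {table_name} ({columns})")

    def insert(
        self,
        table_name: str,
        query_or_df: t.Any,
        model: t.Any,
        is_first_insert: bool,
        render_kwargs: t.Dict[str, t.Any],
        **kwargs: t.Any,
    ) -> None:
        self.adapter.insert_append(table_name, query_or_df)  # assumed adapter method

    def delete(self, name: str, **kwargs: t.Any) -> None:
        # Drop the object if it still exists.
        self.adapter.execute(f"DROP TABLE IF EXISTS {name}")
```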
This gives you full control over the lifecycle of your data objects.
Using a custom materialization
To use the materialization, set the model's kind to CUSTOM and pass the class's NAME (for example, simple_custom) as the materialization argument.
Passing properties to the materialization
You can pass configuration to your materialization using materialization_properties in the model definition. This is useful when you want to customize behavior per model.
Then access these properties in your materialization code via model.custom_materialization_properties:
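A sketch, assuming model.custom_materialization_properties behaves like a dict and the adapter exposes replace_query() and insert_append(). The mode property and the configurable NAME are hypothetical, chosen to show one materialization switching behavior per model.

```python
import typing as t

try:
    from vulcan import CustomMaterialization  # assumed import path
except ImportError:
    # Stand-in base class so this sketch runs without Vulcan installed.
    class CustomMaterialization:  # type: ignore[no-redef]
        def __init__(self, adapter: t.Any) -> None:
            self.adapter = adapter


class ConfigurableMaterialization(CustomMaterialization):
    NAME = "configurable"  # hypothetical

    def insert(
        self,
        table_name: str,
        query_or_df: t.Any,
        model: t.Any,
        is_first_insert: bool,
        render_kwargs: t.Dict[str, t.Any],
        **kwargs: t.Any,
    ) -> None:
        props = model.custom_materialization_properties
        # "mode" is a hypothetical property; default to a full replace when absent.
        mode = str(props.get("mode", "replace")).lower()
        if mode == "replace":
            self.adapter.replace_query(table_name, query_or_df)  # assumed adapter method
        elif mode == "append":
            self.adapter.insert_append(table_name, query_or_df)  # assumed adapter method
        else:
            raise ValueError(f"Unsupported mode {mode!r} for model {model.name}")
```

Failing loudly on an unknown value keeps a typo in one model's properties from silently falling back to a different write strategy.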
This lets you create flexible materializations that can adapt to different use cases.
Extending CustomKind
Warning
This approach subclasses Vulcan internals, which means more surface area to maintain. If a plain CustomMaterialization subclass works for you, stick with it. Only use this when you need to validate custom properties before any database connection happens.
The standard approach works for most cases. Subclass CustomKind only when you need to validate or coerce custom properties before Vulcan opens any database connection, or when a property must be present (and of the right type) at parse time rather than at runtime.
In those cases, create a subclass of CustomKind. When your project loads, Vulcan picks up the subclass through the materialization it's linked to and uses it in place of the standard CustomKind.
Creating a custom kind
Here's how you'd create a custom kind that validates a primary_key property:
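A sketch of such a kind, assuming Vulcan's CustomKind is a pydantic v2 model exposing materialization_properties as a dict-like mapping, and that a property value arrives either as a single string or as a list or tuple of names. The stand-in in the except ImportError branch exists only so the snippet runs without Vulcan, and PrimaryKeyKind is a hypothetical name. Raising ValueError inside the validator makes pydantic report a validation error as soon as the kind is built.

```python
import typing as t

from pydantic import model_validator

try:
    from vulcan import CustomKind  # assumed import path
except ImportError:
    from pydantic import BaseModel

    # Stand-in so the sketch runs without Vulcan: only the field this example reads.
    class CustomKind(BaseModel):  # type: ignore[no-redef]
        materialization_properties: t.Dict[str, t.Any] = {}


class PrimaryKeyKind(CustomKind):
    _primary_key: t.List[str]  # private attribute, filled in by the validator

    @model_validator(mode="after")
    def _validate_primary_key(self) -> "PrimaryKeyKind":
        raw = self.materialization_properties.get("primary_key")
        # Accept a single column name or a list/tuple of names (assumed value shapes).
        if isinstance(raw, str):
            keys = [raw]
        elif isinstance(raw, (list, tuple)):
            keys = [str(key) for key in raw]
        else:
            keys = []
        if not keys:
            raise ValueError("materialization_properties must set primary_key")
        self._primary_key = keys
        return self

    @property
    def primary_key(self) -> t.List[str]:
        return self._primary_key
```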
Using the custom kind in a model
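In the model definition, set kind to CUSTOM as usual and supply primary_key under materialization_properties; if it is missing, validation fails when the project loads.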
Linking to your materialization
To connect your custom kind to your materialization, specify it as a generic type parameter:
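A sketch, assuming CustomMaterialization is generic over its kind type and is exported alongside CustomKind from the vulcan package. PrimaryKeyKind here is a bare placeholder for the primary_key-validating kind, and kind_type_for is a hypothetical helper that imitates the generic-parameter lookup; it is not Vulcan's code.

```python
import typing as t

try:
    from vulcan import CustomKind, CustomMaterialization  # assumed import path
except ImportError:
    # Stand-ins so this sketch runs without Vulcan installed.
    KindT = t.TypeVar("KindT")

    class CustomKind:  # type: ignore[no-redef]
        pass

    class CustomMaterialization(t.Generic[KindT]):  # type: ignore[no-redef]
        def __init__(self, adapter: t.Any) -> None:
            self.adapter = adapter


class PrimaryKeyKind(CustomKind):
    """Placeholder for the primary_key-validating kind."""


class PrimaryKeyMaterialization(CustomMaterialization[PrimaryKeyKind]):
    NAME = "primary_key_custom"  # hypothetical

    def insert(self, table_name, query_or_df, model, is_first_insert, render_kwargs, **kwargs) -> None:
        # model.kind is a PrimaryKeyKind here, so model.kind.primary_key is already validated.
        raise NotImplementedError("upsert logic keyed on model.kind.primary_key goes here")


def kind_type_for(materialization_cls: type) -> type:
    """Hypothetical helper: return the CustomKind subclass given as a generic
    parameter of the materialization class, or CustomKind if there is none."""
    for base in getattr(materialization_cls, "__orig_bases__", ()):
        for arg in t.get_args(base):
            if isinstance(arg, type) and issubclass(arg, CustomKind):
                return arg
    return CustomKind
```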
When Vulcan loads your materialization, it inspects the type signature for generic parameters that are subclasses of CustomKind. If it finds one, it uses your subclass when building model.kind instead of the default.
Why would you want this? Two main benefits:
- Early validation: your primary_key validation happens at load time, not evaluation time. Issues get caught before you even create a plan.
- Type safety: model.kind resolves to your custom kind object, so you get access to extra properties without additional validation.
Sharing custom materializations
Two ways to share a materialization across projects:
Copying files
The simplest approach is to copy the materialization code into each project's materializations/ directory. It works, but it's not very maintainable: you'll need to update each copy manually whenever you make changes.
If you go this route, we strongly recommend keeping the materialization code in version control and setting up a reliable way to notify users when updates are available.
Python packaging
Packaging the materialization as a Python package solves the copy-paste problem and also covers the case where the scheduler (Airflow, etc.) runs on machines that don't have access to the project's materializations/ directory.
Package your materialization using setuptools entrypoints:
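A setup.py sketch. The package name, module path, and especially the entry-point group name vulcan.materializations are assumptions; use the group name Vulcan documents for your version.

```python
# setup.py (sketch)
from setuptools import find_packages, setup

setup(
    name="my_materializations",  # hypothetical package name
    version="0.1.0",
    packages=find_packages(include=["my_materializations", "my_materializations.*"]),
    entry_points={
        # Assumed entry-point group name; confirm the group Vulcan scans.
        "vulcan.materializations": [
            "my_custom_full = my_materializations.full:CustomFullMaterialization",
        ],
    },
)
```

Each entry has the form name = module.path:ClassName, so the class can be imported without the project's materializations/ directory being present.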
Once the package is installed, Vulcan automatically discovers and loads your materialization from the entrypoint list. No manual configuration needed!