Transform Data

Ultimate dbt Jinja Functions Cheat Sheet

Last Updated:

October 4, 2023

October 4, 2023

Mayra Peña
Linkedin

Data Engineering | Technical Customer Success.
Solving enterprise data challenges quickly with dbt & Airflow.

Jinja is the game changing feature of dbt Core that allows us to create dynamic SQL code. In addition to the standard Jinja library, dbt Core includes additional functions and variables to make working with dbt even more powerful out of the box.

See our original post, The Ultimate dbt Jinja Cheat Sheet, to get started with Jinja fundamentals like syntax, variable assignment, looping and more. Then dive into the information below which covers Jinja additions added by dbt Core.

This cheatsheet references the extra functions, macros, filters, variables and context methods that are specific to dbt Core.

Enjoy!

dbt Core: Functions

These pre-defined dbt Jinja functions are essential to the dbt workflow by allowing you to reference models and sources, write documentation, print, write to the log and more.

dbt Core: Macros

These macros are provided in dbt Core to improve the dbt workflow.

dbt Core: Filters

These dbt Jinja filters are used to modify data types.

dbt Core: Project context variables

These dbt core "variables" such as config, target, source, and others contain values that provide contextual information about your dbt project and configuration settings. They are typically used for accessing and customizing the behavior of your dbt models based on project-specific and environment-specific information.

Project Context Variables
adapters
dbt uses the adapter to communicate with the database. Setup correctly using the Database specific adapter i.e Snowflake, RedShift. The adapter has methods that will be translated into SQL statements specific to your database.
builtins The builtins variable is a dictionary containing keys for dbt context methods ref, source, and config.
{% macro ref(modelname) %}
    {% set db_name = builtins.ref(modelname).database | lower %}}
    {% if db_name.startswith('staging') or 
        db_name.endswith('staging')  %}
        {{ return(builtins.ref(modelname).include(database=false)) }}
    {% else %}
        {{ return(builtins.ref(modelname)) }}
    {% endif %}
{% endmacro %}
config
The config variable allows you to get configuration set for dbt models and set constraints that assure certain configurations must exist for a given model:

config.get('<config_key>', default='value'): Fetches a configuration named <config_key> for example "materialization". - If no default is provided or the configuration item is not defined, it this will return None
{%- set unique_key = config.get('unique_key', default='id') -%}
config.require("<config_key>"): Strictly requires a key named <config_key> is defined in the configuration. - Throws error if not set.
{%- set unique_key = config.require('unique_key') -%}
config.get and config.require are commonly seen in the context of custom materializations, however, they can be used in other macros, if those macros are used within a model, seed, or snapshot context and they have the relevant configurations set.
dbt_version
The dbt_version variable is helpful for debugging by returning the installed dbt version. This is the dbt version running not what you define in your project. e.g. If you make a project with dbt 1.3 and run it on another machine with dbt 1.6, this will say 1.6
execute
The execute variable is set to True when dbt runs SQL against the databse such as when executing dbt run. When running dbt commands such as dbt compile where dbt parses your project but no SQL is run against the database execute is set to False.

This variable is helpful when your Jinja is relying on a result from the database. Wrap the jinja in an if statement.
{% set payment_method_query %}
  SELECT DISTINCT
    payment_method
  FROM {{ ref('raw_payments') }}
  ORDER BY 1
{% endset %}

{% set payment_methods = run_query(payment_method_query) %}

{% if execute %}
  {# Extract the payment methods from the query results #}
  {% set payment_methods_list = payment_methods.columns[0].values() %}
{% else %}
  {% set payment_methods_list = [] %}
{% endif %}
flags This variable holds the value of the flags passed in the command line such as FULL_REFRESH, STORE_FAILURES, and WHICH(compile, run, build, run-operation, test, and show). It allows you to set logic based on run modes and run hooks based on current commands/type.
{% if flags.STORE_FAILURES %}
--your logic
{% else %}
--other logic
{% endif %}
graph
The graph variable is dictionary that hold the information about the nodes(Models, Sources, Tests, Snapshots) in your dbt project.
model
This is the graph object of the current model. This object allows you to access the contents of a model, model structure and JSON schema, config settings, and the path to the model.
modules This variable contains Python Modules for working with data including:datetime, pytz, re, and itertools.
modules.<desired_module>.<desired_function> 
{% set now = modules.datetime.datetime.now() %}
project_name
This variable returns the name for the root-level project that is being run.
target
This variable contains information about your dbt profile target, such as your warehouse connection information.

Use the dot notation to access more such as:
target.name or target.schema or target.database, ect

dbt Core: Run context variables

These special variables provide information about the current context in which your dbt code is running, such as the model, schema, or project name.

Run Context Variablels
database_schemas
Only available in the context for on-run-end. This variable allows you to reference the databases and schemas. Useful if using multiple different databases
invocation_id This function outputs a UUID every time you run or compile your dbt project. It is useful for auditing. You may access it in the query-comment, info dictionary in events and logs, and in the metadata dictionary in dbt artifacts.
results
Only available in the context for on-run-end. This variable contains a list of Results Objects. Allows access to the information populated in run results JSON artifact.
run_started_at
This variable outputs a timestamp for the start time of a dbt run and defaults to UTC. It Is a python datetime object. Use standard strftime formatting.
select
    '{{ run_started_at.strftime("%Y-%m-%d") }}' as run_start_utc
schemas
Only available in the context for on-run-end. This variable allows you to reference a list of schemas for models built during a dbt run. Useful for granting privileges.
selected_resources
This variable allows you to access a list of selected nodes from the current dbt command.
The items in the list depend on the parameters of —select, —exclude,—selector.
this
{{ this }} is the database representation of the current model. Use the dot notation to access more properties such as: {{ this.database }} and {{ this.schema }}.
thread_id The thread_id is a unique identifier assigned to the Python threads that are actively executing tasks, such as running code nodes in dbt. It typically takes the form of names like "Thread-1," "Thread-2," and so on, distinguishing different threads.

dbt Core: Context methods

These methods allow you to retrieve information about models, columns, or other elements of your project.

Context Methods
set(value, default) Allows you to use the python method set(), which converts an iterable into a unique set of values. This is NOT the same as the jinja expression set which is used to assign a value to a variable. This will return none if the value entered is not an iterable. In the example below both the python set() method and the jinja set expression are used to remove a duplicate element in the list.
{% set my_list = ['a', 'b', 'b', 'c'] %}
{% set my_set = set(my_list) %}
{% do print(my_set) %}


{# evaluates to {'a','b', 'c'} #}
set_strict(value) Same as the set method above however it will raise a TypeError if the entered value is not a valid iterable.
{% set my_set = set_strict(my_list) %}
exceptions Is used to raise errors and warnings in a dbt run:

raise_compiler_error will raise an error, print out the message. Model will FAIL.
exceptions.raise_compiler_error("Custom Error message")
warn will raise a compiler warning and print out the set method. Model will still PASS.
exceptions.warn("Custom Warning message") 
fromjson(string, default) Is used to deserialize a JSON string into a python dict or list.
fromyaml(string, default) Is used to deserialize a YAML string into a python dict or list.
tojson(value, default) Serializes a Python dict or list to a JSON string.
toyaml(value, default ) Serializes a Python dict or list to a YAML string.
local_md5
This variable locally creates an MD5 hash of the given string.
{%- set hashed_value = local_md5(“String to hash”) -%}
select 
     '{{ hashed_value }}' as my_hashed_value
zip(*args, default) This method is allows you to combine any number of iterables.
{% set my_list_a = [1, 2] %}
{% set my_list_b = ['Data', 'Coves'] %}
{% set my_zip = zip(my_list_a, my_list_b) | list %}
{% do print(my_zip) %}  

{# Result [(1, 'Data'), (2, 'Coves')] #}
zip_strict(value) Same as the zip method but will raise a TypeError if one of the given values are not a valid iterable.
{% set my_list_a = 123 %}
{% set my_list_b = 'Datacoves' %}
{% set my_zip = zip_strict(my_list_a, my_list_b) | list %}
{# This will fail #}

Please contact us with any errors or suggestions.

Author:

Mayra Peña

Looking for an enterprise data platform?

Datacoves offers managed dbt core and Airflow and can be deployed in your private cloud.

LEARN MORE

Table of Contents