dbt Core vs dbt Cloud – Key Differences as of 2023
If you've taken an interest in dbt (data build tool) and are on the fence about whether to opt for dbt Cloud or dbt Core, you're in the right place. Perhaps you're already using one of the dbt platforms and are considering a change. Regardless of your current position, understanding the differences of these options is crucial for making an informed decision. In this article, we'll delve deep into the key distinctions between dbt Cloud and dbt Core.
dbt, dbt Core, dbt Cloud: What do they mean?
For those new to the dbt community, navigating the terminology can be a tad confusing. "dbt," "dbt Core," and "dbt Cloud" may sound similar but each represents a different facet of the dbt ecosystem. Let's break it down.
dbt is the generic name for the open-source tool and when people say dbt the features are mainly those in dbt Core. dbt allows users to write, document, and execute SQL-based transformations, making it easier to produce reliable and up-to-date analytics. By facilitating practices like version control, testing, and documentation, dbt enhances the analytics engineering workflow, turning raw data into actionable insights.
Once you decide dbt is right for your organization, the next step is to determine how you'll access dbt. The two most prevalent methods are dbt Core and dbt Cloud. While dbt Cloud offers an enhanced experience with additional features, its abstraction can sometimes limit the desired flexibility and control over the workflow especially when it comes to using dbt with the complexities of an enterprise.
Throughout this article we'll observe that by using dbt Core and incorporating other tools, you can achieve many of the same functionalities as dbt Cloud while maintaining flexibility and control. While this approach offers enhanced flexibility, it consequently introduces increased complexity, maintenance, and an added workload. When adopting a dbt platform it is important to understand the tradeoffs to truly know what will work best for your data team.
What is dbt Core?
dbt Core is an open-source data transformation tool that enables data analysts and engineers to transform and model data to derive business insights. dbt Core is the foundational, open-source version of dbt that provides users with the utmost flexibility. The term "flexible" implies that users have complete autonomy over its implementation, integration, and configuration within their projects.
Even though dbt Core is free, to meet or exceed the functionality of dbt Cloud, it will need to be paired with additional tooling as we will discuss below.These open source solutions may be leverage at no cost, but this increases the platform maintenance overhead and may impact the total cost of ownership and the platform's time to market. Alternatively, managed dbt Core platforms exist, like Datacoves, which simplify this process.
Using and installing dbt Core is done manually. Depending on which data warehouse you are using, you select the appropriate dit adapter such as dit-snowflake, dbt-databricks, dt-redshift, etc. You can see all available dbt adapters on our dbt libraries. If you are using Snowflake you can check out our detailed Snowflake with dbt getting started guide.
Given that you have installed the pre-requisites, installing dbt is just a matter of installing dbt-snowflake.
What is dbt Cloud?
dbt Cloud is a hosted dbt platform to develop and deploy dbt projects. dbt Cloud leverages all the power of dbt Core with some extra features such as a proprietary Web-based UI, a dbt job scheduler, APIs, integration with Continuous Integrations platforms like Github Actions, and a proprietary Semantic layer. dbt Cloud's features are all intended to facilitate the dbt workflow.
dbt cloud pricing has three tiers: Enterprise, Team and Developer. Developer is a free tier meant for a single developer with a hard limit of 3000 model runs per month. The Team Plan pricing starts at $100 per developer for teams up to 8 with 15,000 successful models built per month; any additional models will cost $0.01.
dbt Core vs dbt Cloud: Development - IDE
When it comes to the Integrated Development Environment (IDE), both dbt Cloud and dbt Core present distinct advantages and challenges. Whether you prioritize flexibility, ease of setup, or a blend of both, your choice will influence how your team develops, tests, and schedules your data transformations. Let's explore how each option handles the IDE aspect and the impact on developers and analytic engineers.
In the instance of IDEs, using dbt Core requires setting up a dev environment on each member's device or a virtual space like AWS workspace. This involves installing a popular dbt IDE such as VS Code, dbt Core, connecting to a data warehouse, and handling dependencies like Python versions.
Enterprise dbt setups typically include additional dependencies to enhance productivity. Some notable VS Code extensions for this include dbt Power User, SQLFluff, and the official dbt Snowflake VS Code extension.
When companies are ramping up with dbt, one of the pain points is setting up and managing dbt IDE environments. Analytic Engineers coming to dbt may not be familiar with concepts like version control with git or using the command line. The dbt Cloud IDE simplifies developer onboarding by providing a web-based SQL IDE to team members so they can easily write, test, and refine data transformation code without having to install anything on their computers. Complexities like starting a git branch are tucked behind a big colorful button so users know that is the first step in their new feature journey.
However, Developers who are accustomed to more versatile local IDEs, such as VS Code, may find the dbt Cloud experience limiting as they cannot leverage extensions such as those from the VS Code Marketplace nor can they extend dbt Core using the vast array of Python libraries.
It is possible to get the best of both worlds - the flexibility of dbt Core in VS Code and the quick setup that dbt Cloud Offers - with a Managed dbt Core Platform like Datacoves. In a best-in-class developer setup, new users are onboarded in minutes with simple configuration screens that remove the need to edit text files such as profiles.ymland remove the complexity of creating and managing SSH keys. Version upgrades of dbt or any dependent library should be transparent to users. Spinning up a pristine environment should be a matter of clicks.
dbt Core vs dbt Cloud: Scheduling Jobs
Scheduling in a dbt project is crucial for ensuring timely and consistent data updates. It's the backbone of reliable and up-to-date analytics in a dbt-driven environment.
While an orchestrator does not come out of the box with dbt Core, when setting up a deployment environment companies can leverage any orchestration tool, such as Airflow, Dagster, or Prefect. They can connect steps prior to or after the dbt transformations and they can trigger any tool that exists within or outside the corporate network.
dbt Cloud makes deploying a dbt Core project simple. It allows you to define custom environment variables and the specific dbt commands (seed, run, test) that you want to run during production runs. The dbt Cloud scheduler can be configured to trigger at specific intervals using an intuitive UI.
dbt Cloud is primarily focused on running dbt projects. Therefore, if a data pipeline has more dependencies, an external orchestration tool may be required. Fortunately, if you do use an external orchestrator, dbt Cloud offers an API to trigger dbt Cloud jobs from your orchestrator.
dbt Core vs dbt Cloud: DataOps - Continuous Integration
DataOps emphasizes automating the integration of code changes, ensuring that data transformations are consistently robust and reliable. Both platforms approach CI/CD differently. How seamless is the integration? How does each platform handle tool compatibility?
When using dbt Core for your enterprise data platform, you will need to not only define and configure the automation scripts, but you will also need to ensure that all the components, such as a git server, CI server, CI runners, etc. are all working harmoniously together.
Since dbt Core can be run within the corporate firewall, it can be integrated with any CI tool and internal components such as Jira, Bitbucket, and Jenkins. To do this well, all the project dependencies must be packed into reusable Docker containers. Notifications will also need to be defined across the various components and all of this will take time and money.
dbt Cloud has built in CI capabilities which reduce the need for third party tools. dbt Cloud can also be paired with Continuous Integration (CI) tools like GitHub Actions to validate data transformations before they are added to a production environment. Aspects such as code reviews and approvals will occur in the CI/CD tool of choice such as GitHub and dbt Cloud can report job execution status back to GitHub Actions. This allows teams to know when it is safe to merge changes to their code. One item to note is that each successful model run in your CI run will count against the monthly model runs as outlined in the dbt Cloud pricing.
Companies that have tools like Bitbucket, Jira, and Jenkins within their corporate firewall may find it challenging to integrate with dbt Cloud.
dbt Core vs dbt Cloud: Semantic Layer
A semantic layer helps businesses define important metrics like sales, customer churn, and customer activations with the flexibility to aggregate at run time. These metrics can be referenced by downstream tools as if they had been previously computed. End-users benefit from the flexibility to aggregate metrics at diverse grains without the company incurring the cost of pre-computing every permutation. These on-the-fly pivots ensure consistent and accurate insights across the organization.
dbt Core does not come with a built-in semantic layer, but there are open source and proprietary alternatives that allow you to achieve the same functionality. These include cube.dev, and Lightdash.
dbt Cloud has been rolling out a proprietary semantic layer which is currently in public preview. This feature is only available to dbt pricing plans Team and Enterprise. When using the dbt Cloud semantic layer your BI tool connects to a dbt Cloud proxy server which sits between the BI tool and your Data Warehouse.
dbt’s semantic layer offers a system where metrics are standardized as dbt metadata, visualized in your DAG, and integrated seamlessly with features like the Metadata API and the dbt proxy server.
dbt Core vs dbt Cloud: API
APIs play a crucial role in streamlining dbt operations and enhancing extensibility.
With dbt Core, users often rely on external solutions to integrate specific API functionalities.
Administrative API Alternative: There is currently no feature-to-feature alternative for the dbt Cloud administrative API. However, the Airflow API can be leveraged to enqueue runs for jobs which is a primary feature of the dbt Cloud Administrative API.
Discovery API Alternative: This API was formerly known as the dbt Cloud Metadata API. Tools such as Datahub can provide similar functionality. Datahub can consume dbt Core artifacts such as the manifest.json and expose an API for dbt metadata consumption.
Semantic Layer API Alternative: When it comes to establishing and managing the semantic layer, Cube.dev provides a mature, robust, and comprehensive alternative to the dbt Cloud Semantic layer. Cube also has an API tailored for this purpose.
dbt Cloud offers three APIs. These APIs are available to Team and Enterprise customers.
Administrative API: The dbt Cloud Administrative API is designed primarily for tasks like initiating runs from a job, monitoring the progress of these runs, and retrieving artifacts once the jobs have been executed. dbt Cloud is working on additional functionality for this API, such as operational functions within dbt Cloud.
Discovery API: Whenever you run a project in dbt Cloud, it saves details about that project, such as information about your data models, sources, and how they connect. The Discovery API lets you access and understand this saved information. Use cases include: performance, quality, discovery, governance and development.
Semantic Layer API: The dbt Semantic Layer API provides a way for users to interact with their data using a JDBC driver. By using this API, you can easily query metrics values from your data and get insights.
Examining the differences between dbt Core and dbt Cloud reveals that both can lead organizations to similar results. Much of what dbt Cloud offers can be replicated with dbt Core when combined with appropriate additional tools. While this might introduce some complexities, the increased control and flexibility might justify the trade-offs for certain organizations. Thus, when deciding between the two, it's a matter of prioritizing simplicity versus adaptability for the team.
The Datacoves platform simplifies the dbt Core experience and retains its inherent flexibility. It effectively bridges the gap, capturing many benefits of dbt Cloud while mitigating the challenges tied to a pure dbt Core setup. More information on Datacoves is available on our product page.
Looking for an enterprise data platform?
Datacoves offers managed dbt core and Airflow and can be deployed in your private cloud.