dbt Cloud vs dbt Core for your enterprise data management platform
Since its inception, dbt has served the needs of small to mid-sized companies to transform data into actionable insights. Now as larger enterprises look to adopt dbt, they must consider the unique complexities of large enterprises. These include integrating with existing systems behind a corporate firewall, orchestrating end-to-end data pipelines, and implementing DataOps processes to transform how teams collaborate across the organization.
This article compares three popular dbt development and deployment options: leveraging dbt Cloud, using dbt Core alone, or using a dbt Core managed platform other than dbt Cloud. To learn more about specific dbt Cloud features and how to implement them using dbt Core using other open-source tools checkout dbt Core vs dbt Cloud - Key Differences.
dbt Cloud for your enterprise data management platform
dbt Cloud is a hosted environment to develop and deploy dbt Core projects. dbt Cloud leverages dbt Core, the open-source data transformation framework we all know as dbt. dbt Cloud pricing starts at $100 per developer for teams or if you are a single developer, they offer a free single seat with limited features.
dbt Cloud IDE for dbt-core development
When companies are ramping up with dbt, one of the hardest parts is setting up and managing dbt IDE environments. Analytic Engineers coming to dbt may not be familiar with concepts like version control with git or using the command line. The dbt Cloud IDE simplifies developer onboarding by providing a web-based SQL IDE to team members so they can easily write, test, and refine data transformation code without having to install anything on their computers. Complexities like starting a git branch are tucked behind a big colorful button so users know that is the first step in their new feature journey.
Developers who are accustomed to more versatile local IDEs, such as VS Code, may find the dbt Cloud experience limiting as they cannot leverage extensions such as those from the VS Code Marketplace nor can they extend dbt Core using the vast array of Python libraries. An example of a common library employed in the dbt community is SQLFluff. With SQLFluff, teams can ensure a minimum level of code quality is achieved across developers.
dbt Cloud for DataOps
DataOps is a set of practices that combines data management and software engineering to improve the quality, speed, and collaboration of data analytics teams. In an enterprise environment, implementing DataOps processes early on in a project's lifecycle is crucial.
dbt Cloud can be paired with Continuous Integration (CI) tools like GitHub Actions to validate data transformations before they are added to a production environment.
If CI integration checks include things like SQLFluff and dbt-checkpoint, there is currently no way for users to run these validations in dbt Cloud prior to executing the automated CI checks in GitHub Actions. This may lead to a lot of back and forth between the GitHub Actions run failures and the dbt Cloud IDE to fix and commit the changes. Aspects such as code reviews and approvals will occur in the CI/CD tool of choice such as GitHub and dbt Cloud can report job execution status back to GitHub Actions. This allows teams to know when it is safe to merge changes to their code.
Companies that have tools like Bitbucket, Jira, and Jenkins within their corporate firewall may find it challenging to integrate with dbt Cloud.
dbt Cloud for deployment
dbt Cloud makes deploying a dbt Core project simple. It allows you to define custom environment variables and the specific dbt commands (seed, run, test) that you want to run during production runs. The dbt Cloud scheduler can be configured to trigger at specific intervals using an intuitive UI.
dbt Cloud is primarily focused on running dbt projects. Therefore if a data pipeline has more dependencies, an external orchestration tool may be required. Cases where this is necessary include when you need to connect the Extract and Load step to the Transformation happening in dbt. Without connecting these steps, you have no assurance that the load is complete before the transformation starts. This can be mitigated with freshness checks, but without an explicit connection between the load step and the transformation, you may be delaying transformation unnecessarily.
Fortunately, if you do use an external orchestrator, dbt Cloud offers an API to trigger dbt Cloud jobs from your orchestrator. Another reason an external orchestrator may be needed in an enterprise is when there are additional steps to run after the dbt transformation. These may include sending data to downstream BI tools, marketing campaign tools, MDM systems, or refreshing machine learning models. dbt Cloud recently announced a webhooks API that would allow for triggering steps after transformation. However, if these endpoints reside within a corporate firewall, this may not be an option and an orchestrator like Airflow may be necessary.
dbt Cloud offers industry-standard SaaS security appropriate for any enterprise. At the enterprise level, it offers Single Sign-On and Role Based Access Controls. Enterprises can also choose to have dbt Cloud provisioned on a dbt Labs managed private cloud account. dbt Cloud is only offered as a managed SaaS solution on AWS and thus cannot be deployed to a company's existing private cloud account.
Who should consider using dbt Cloud
dbt Cloud is especially attractive for organizations with limited in-house technical resources or those seeking to minimize the time and effort spent on initial setup, maintenance, and upgrades. dbt Cloud runs in the browser and thus removes the need for manual installation and setup of dbt components, allowing your team to focus on data transformation tasks. dbt Cloud handles updates and upgrades for you, so it is simple to stay up-to-date and secure. dbt Cloud includes built-in scheduling capabilities, allowing you to automate dbt runs and testing without relying on external schedulers or orchestration tools.
dbt Cloud may introduce additional complexities in large enterprises with more complex orchestration needs or those that need to host the environment within their private cloud. Finally, the dbt Cloud IDE may feel constraining for users who prefer a VS Code environment.
All these features can in turn reduce the time to value in setting up an end-to-end enterprise data management platform. It is possible to start out with a simple setup, but inevitably not accounting for the breadth of the entire platform may create technical debt and cause rework in the future.
Pros and cons of using dbt Cloud
- Simplified setup and management of dbt environment
- Enhanced collaboration capabilities for data teams
- Integrated dbt job scheduling and monitoring
- Managed upgrades and updates
- Web-based IDE for streamlined development
- Metadata API
- Reliance on a third-party platform
- May be more expensive for larger teams or organizations with complex requirements
- Not able to deploy within the corporate's private cloud
- Unable to leverage VS Code extensions and other Python libraries
- May require additional components or additional SaaS solutions for complex orchestrations
dbt Core for your enterprise data management platform
dbt Core is the heart of dbt and since it is open source, anyone can use it freely. That being said, an enterprise data management platform must account for additional requirements such as the development experience, deployment and scheduling dbt runs, and creating a DataOps process. This will impact the total cost of ownership and the platform's time to market.
IDE for dbt Core development
Leveraging dbt Core allows for ultimate flexibility, but it involves setting up a development environment on each team member's local machine or on a virtual environment such as an AWS workspace. Installation includes installing a popular dbt IDE like VS Code, installing dbt Core, configuring a connection to the data warehouse, and managing additional dependencies such asa compatible version of Python.
While the overall installation process for dbt is similar for both Windows and Mac users, there are some differences and potential gotchas to be aware of. Additionally, most organizations provide users with pre-configured computers and these systems may be delivered with software that conflicts with dbt Core. This complexity can lead to frustration and delays and may require assistance from senior members of the team, pulling them away from value-added work.
Once everything is installed, configuring dbt to connect to a data warehouse can also introduce complexity before a new team member can begin familiarizing themselves with the organization's dbt project structure and coding practices.
An enterprise setup of dbt should also include components that improve developer code quality and productivity. Popular IDEs like VS Code include a vast array of extensions that also improve the development experience. These include dbt Power User, VS Code extension for SQLFluff, and the dbt Snowflake extension.
Configuring a development environment can take anywhere from a few days to as long as a week. Depending on the person’s technical aptitude and experience with dbt. To improve this process, companies spend weeks or months dialing in the installation process and documenting it in knowledge base tools like Confluence or Notion.
A development environment is not evergreen, it must be upgraded as new versions of dbt Core are released for added functionality or to fix bugs. Upgrading can be as error-prone as new installations and some companies opt not to keep their enterprise data platform current. This, however, is ill-advised because this type of technical debt only gets harder to remediate over time.
dbt Core for DataOps
When using dbt Core for your enterprise data platform, you will need to not only define and configure the automation scripts, but you will also need to ensure that all the components, such as a git server, CI server, CI runners, etc. are all working harmoniously together.
Since dbt Core runs within the corporate firewall, it can be integrated with any CI tool and internal components such as Jira, Bitbucket, and Jenkins. This flexibility comes at a price. To do this well, all the project dependencies must be packed into reusable Docker containers. Notifications will also need to be defined across the various components and all of this will take time and money.
dbt Core for deployment
When setting up a deployment environment, companies using dbt Core can leverage any orchestration tool, such as Airflow. They can connect steps prior to or after the dbt transformations and they can trigger any tool that exists within or outside the corporate network.
However, scaling tools like Airflow is not straightforward and may require knowledge of additional technologies like Kubernetes. By default, Airflow uses Python files and therefore, the team working with these orchestration scripts will need to also know Python on top of SQL.
A mature enterprise platform will also require multiple Airflow environments for development, testing, and production, adding complexity and increasing the total cost of ownership.
dbt Core extensibility
Using dbt Core is inevitably the most flexible option, which is why many organizations choose to go this route. Companies can run dbt Core and leverage leading code editors like VS Code. They can install any additional Python dependencies, including running any proprietary code that is only available within the corporate firewall. They can also take advantage of the ever-growing VS Code extensions such as those for Snowflake or Databricks.
Teams using dbt Core can send notifications to Slack, MSTeams, or any other platform as long as they develop and maintain these solutions.
However, this extensibility will only increase the platform's total cost of ownership and the time it takes to deploy the end-to-end platform.
dbt Core security
One reason some enterprises choose to use dbt Core is that they are not able to leverage SaaS solutions due to compliance or data privacy requirements such as GDPR. By leveraging dbt Core, they can achieve ultimate security as dbt Core would run in and comply with corporate network policies.
Running dbt Core within the corporate firewall also eliminates the need for lengthy risk assessments. However, ensuring that security patches are applied is left to the organization. Managing Single Sign-on with tools like Airflow and setting up Role Based Access Controls are also things that will have to be solved when using dbt Core as a dbt enterprise data management platform.
Who should consider using dbt Core
A lot of the power and flexibility with using dbt Core comes from being able to customize the entire dbt experience. However, all of this comes with the expense of having to build a data management platform. It is not uncommon for organizations to take six months or up to a year to set up and test their custom dbt enterprise data management platform. They will also likely need to hire external consultants who have worked with a myriad of technologies and understand how to integrate them.
Building a platform is not for the faint of heart and since it is something custom to the organization, it will also increase the total cost of ownership. Some organizations get the basics in place just to make their teams productive, but eventually, they find that without proper design and implementation, they cannot take full advantage of the value of dbt and the rest of the modern data stack.
Using dbt Core as a dbt enterprise data platform can be a cost-effective option for smaller teams with limited requirements and budgets. It can also be good for those trying out dbt. Large organizations with stringent security requirements of sensitive data and strong technical platform teams may also leverage this option as it allows them to deploy in a way that is compliant with corporate policies.
As platform complexity grows, so does the cost and time needed to implement the end-to-end platform. Larger organizations should consider whether developing something custom is better than using a managed solution or taking shortcuts that can reduce the value of dbt and the modern data stack.
Pros and cons of using dbt Core
- Full control over the dbt environment and configurations
- No reliance on third-party services
- No subscription costs
- Able to meet corporate security requirements by installing within the corporate firewall
- Requires technical expertise in defining the installation steps and documenting them
- May require hands-on involvement during setup and upgrades
- It can be time-consuming and challenging to maintain consistency across team members
- Upgrades and dependency management may be more complex compared to managed solutions
- Resolving installation issues can consume senior team members' time
- Total cost of ownership of the platform may be higher than using a managed solution
- Additional expertise will be needed to set up all the required technologies
- It will take longer to implement than a managed solution
Managed dbt Core for your enterprise data management platform
We have seen that dbt Cloud is a great dbt Core managed platform, but it has some limitations for large or complex deployments. On the other hand, using dbt Core alone gives organizations ultimate flexibility, but it comes at an added expense in terms of the time and cost that it takes to stand up and maintain a custom, one-of, enterprise data management platform.
What if you could combine the simplicity of dbt Cloud with the flexibility of dbt Core? A managed dbt Core platform, like Datacoves, can combine some of the best of both worlds. However, not all managed platforms are the same, below we discuss what to look for when considering a managed dbt Core enterprise data platform.
Managed VS Code as a dbt IDE for development
By far, the tool most of the dbt enterprise users will interact with is the IDE. When considering a managed dbt Core platform, consider the developer experience. Will they have VS Code in the browser? Will you get a curated, best in class dbt IDE like VS Code with integrated libraries like SQLFluff and extensions that simplify the developer workflow? When required, will you be able to customize the developer environment with company specific Python libraries and extensions?
In a best-in-class developer setup, new users are onboarded in minutes with simple configuration screens that remove the need to edit text files such as profiles.yml and remove the complexity of creating and managing SSH keys. Version upgrades of dbt or any dependent library should be transparent to users.
Some platforms will also have added flexibility and let you leverage VS Code for non-dbt development tasks, such as opening a Jupyter notebook environment or a Python development environment with libraries such as Python Black.
Spinning up a pristine environment should be a matter of clicks. To test new dbt versions, you should be able to run your project in an isolated environment before switching all developers to the new version of dbt.
Managed dbt Core for DataOps
The advantage of a managed dbt Core platform when it comes to DataOps is that you do not have to worry about creating Docker images with all the dependencies you need as you perform your Continuous integration tests. Working with a partner that has experience with enterprise CI/CD platforms like Jenkins will also accelerate your time to value. A knowledgeable dbt Core managed platform partner will know best practices and guide you in setting up a world-class workflow that will scale with your enterprise. Developers should also be able to execute CI validations locally before pushing their changes to GitHub, thus reducing the back and forth to fix issues during the CI checks.
Some managed dbt Core platforms can run within your corporate firewall, allowing you to leverage existing tools like Jira, Bitbucket, Jenkins, or Gitlab. All this simplifies and accelerates the rollout of a dbt data management platform.
Managed dbt Core for deployment
Enterprises are complex, and as such, they must consider a flexible orchestration tool that has wide adoption, documentation and can scale to the needs of the company.
A mature enterprise dbt Core platform will include a full-fledged orchestrator such as Airflow. With Airflow, organizations can orchestrate pipelines that include Extract, Load, Transform, and Activation steps. Other aspects to consider is whether Airflow jobs can scale by leveraging technologies such as Kubernetes and whether it will be simple to send notifications to enterprise tools such as MS Teams.
Some dbt Core platforms that leverage dbt have also adopted mechanisms that simplify Airflow DAG creation by letting users create them using yml files vs creating traditional Python-based DAGs.
A complete enterprise data management platform will allow you to easily create multiple Airflow environments for development, testing, and production. It will also eliminate the burden of having to create Docker images with your dbt dependencies.
Managed dbt Core extensibility
An enterprise dbt Core platform will streamline the process of ramping up with dbt and it will offer the organization the flexibility it requires. This may include triggering AWS Glue jobs as well as Fivetran jobs. It may include triggering internal and external APIs or it may require a pre-configured development environment with Python libraries like Snowpark or Streamsets.
Managed dbt Core security
When it comes to security, different types of industries have different requirements. Regulated industries in Life Sciences or Finance may not want to or and unable to use SaaS solutions, so being able to deploy a dbt enterprise data management platform within the corporate firewall is a must.
Integrating with SSO solutions like Okta or PingID is also a must, and managing user's authorization using existing mechanisms like Active Directory simplifies compliance with corporate policies.
Another area that may be overlooked when using open-source solutions is validating that libraries are up to date and there are no vulnerabilities introduced into the network. A robust dbt Core managed platform will be able to leverage internal registries like Artifactory, which are set up to scan for these issues.
Finally, at enterprise scale, the dbt Core managed platform must have robust Role Based Access Controls to assure users have the appropriate level of access.
Who should consider using a managed dbt Core platform
Just like dbt Cloud, a reliable enterprise dbt Core data management platform will reduce the time to value. Teams will not have to figure out how to host and configure the multitude of tools and libraries and they will have a starting point for implementing best practices at the start of the project.
By eliminating the need to stand up a platform, teams can focus on change management and new ways of working that will have the biggest impact on the organization. The total cost of ownership will be lower for the company since they do not have to build and constantly maintain their one-of platform. In contrast, they will benefit by working with a partner that works with other large organizations and looks for ways to improve and enhance the platform for everyone.
A managed dbt Core enterprise data platform is necessary for organizations that are not in the business of creating and managing complex technologies and those who need the flexibility that comes with complex enterprise tools and processes. When dbt Core must be deployed within the corporate firewall, an enterprise managed dbt data platform can also be a good option for companies with stringent security requirements to protect sensitive data.
Smaller organizations that want to leverage a managed VS Code environment or who want to implement DataOps from the outset can also realize the benefits of a SaaS managed dbt Core data platform.
A managed dbt Core enterprise data platform gives companies the simplicity of dbt Cloud with the peace of mind of dbt Core. Best of all, if the platform leverages open-source technologies, there is no vendor lock-in.
Pros and Cons of using a managed dbt Core platform
- Simplified developer onboarding
- Robust IDE such as VS Code
- Ability to configure a dbt environment
- Managed orchestrator such as Airflow
- Able to meet corporate security requirements by installing within the corporate firewall
- Requires a lot less technical expertise to set up and run
- Simplified dependency management across the development environment, CI/CD worker, and Airflow
- Lower total cost of ownership
- Unable to have full control over the platform
- Reliance on third-party
- Licensing costs
Selecting between dbt Cloud, dbt Core, or a managed dbt Core enterprise platform comes down to the internal experience within an organization and their desire to focus on data vs developing and maintaining a custom dbt platform. The beauty of the dbt ecosystem is that the components are freely available and with enough time and money, anyone can do what they need. However, enterprises that want to move fast and leverage the full potential of dbt, should consider a managed dbt Core enterprise platform such as Datacoves to reduce the total cost of ownership and accelerate time to market.
Looking for an enterprise data platform?
Datacoves offers managed dbt core and Airflow and can be deployed in your private cloud.