Datacoves blog

Learn more about dbt Core, ELT processes, DataOps,
modern data stacks, and team alignment by exploring our blog.
Comparing cooking to data solutions you can trust

In 3 Core Pillars to a Data-Driven Culture, I discussed the reasons why decision makers don’t trust analytics and outlined the alignment and change management aspects of any solution. Once you know what you want, how do you deliver it? The cloud revolution has brought a new set of challenges for organizations, challenges which have nothing to do with delivering solutions. The main problem is that people are faced with a Cheesecake Factory menu when most would be better served with Omakase.

For those who may not be aware, the Cheesecake Factory menu has 23 pages and over 250 items to choose from. There are obviously people who want that variety, and there is certainly nothing wrong with it, but my best meals have been the ones where I left the decision to the chef.

Omakase, in a Japanese restaurant, is a meal consisting of dishes selected by the chef; it literally means “I'll leave it up to you.”

Omakase leaves the decision to the chef

How does this relate to the analytics landscape? Well, there is a gold rush in the analytics space. There is a lot of investment, and there are literally hundreds of tools to choose from. I have been following this development over the last five years, and if anything, the introduction of new tools has accelerated.

This eye chart represents the ever-growing list of analytics tools

Most people are where I was back in 2016. While I had been working in this space for many years, the cloud and big data were all new to me. There was a lot I needed to learn, and I was always questioning whether I was making the right decision. I know many people today who do POC after POC to see which tool will work best; I know, I did the same thing.

Contrast this process with my experience learning a web development framework called Ruby on Rails. When I started learning Rails in 2009, I was focused on what I was trying to build, not on the set of tools and libraries needed to create a modern web application. That’s because Rails is Omakase.

When you choose Omakase with Rails, you are trusting many people with years of experience and training to share that knowledge with you. Not only does this help you get going faster, it also brings you into a community of like-minded people, so that when you run into problems, there are people ready to help. Below I present my opinionated view of a three-course-meal data stack that can serve most people, and the rationale behind it. This solution may not be perfect for everyone, but neither is Rails.

Appetizer: Loading data

You are hungry to get going and start doing analysis, but we need to start off slowly. You want to get the data, but where do you start? Well, there are a few things to consider.

- Where is the data coming from?
- Is it structured into columns and rows, or is it semi-structured (JSON)?
- Is it coming in at high velocity?
- How much data are you expecting?

What I find is that many people want to over-engineer a solution or focus on optimizing for one dimension, usually cost, since that is simple to grasp. The problem is that if you focus only on cost, you are giving up something else, usually a better user experience. You don’t have a lot of time to evaluate solutions and build extract and load scripts, so let me make this simple: if you start with Snowflake as your database and Fivetran as your Extract and Load solution, you’ll be fine. Yes, there are reasons not to choose those solutions, but you probably don’t need to worry about them, especially if you are starting out and you are not Apple.

Why Snowflake, you ask? Well, I have used Redshift, MS SQL Server, Databricks, Hadoop, Teradata, and others, but when I started using Snowflake I felt like a weight was lifted. It “just worked.” Do you think you will need to mask some data at some point? Snowflake has dynamic data masking. Do you want to scale compute and storage independently? It separates compute from storage. Do you like waiting for data vendors to extract data from their system and then having to import it on your side? Or do you need to collaborate with partners and send them data? Well, Snowflake has a way for companies to share data securely. Gone are the days of moving data around; now you can securely grant access to groups within or outside your organization. Simple, elegant. What about enriching your data with external data sources? There is a data marketplace too, and it is bound to grow. Security is well thought out, and you can tell they are focused on the user experience because they do things to improve analyst happiness, like MATCH_RECOGNIZE. Oh, and Snowflake also handles structured and semi-structured data amazingly well, all without having to tweak endless knobs. With one solution, I have been able to eliminate the need to answer the questions above, because Snowflake can very likely handle your use case regardless of the answers. I could go on and on, but trust me, you’ll be satisfied with your Snowflake appetizer. If it’s good enough for Warren Buffett, it’s good enough for me.

But what about Fivetran, you say? Well, you have better things to do than replicate data from Google Analytics, Salesforce, Square, Concur, Workday, Google Ads, etc. Here’s the full list of current connectors Fivetran supports. Just set it and forget it. No one will give you a medal for mapping data from standard data sources to Snowflake. So do the simple thing, and let’s get to the main dish.


Main dish: Transforming data

Now that we have all our data sources in Snowflake, what do we do? Well, I haven’t met anyone who doesn’t want some level of data quality, documentation, and lineage for impact analysis, delivered in a collaborative way that builds trust in the process.

I’ve got you covered. Just use dbt. Yup, that’s it: a single, simple tool that can do documentation, lineage, data quality, and more. dbt is a key component in our DataOps process because, like Snowflake, it just works. It was developed by people who were analysts themselves and appreciated software development best practices like DRY (Don’t Repeat Yourself). They knew that SQL is the great common denominator and all it needed was some tooling around it. It’s hard enough finding good analytics engineers, let alone ones that know Python. Leave the Python to Data Science and first build a solid foundation for your transformation process. Don’t worry, I didn’t forget about your ambition to create great machine learning models; Snowflake has you covered there as well, check out Snowpark.

You will need a little more than dbt to schedule your runs and bring some order to what would otherwise become chaos, but dbt will get you a long way there. If you want to know how we solve this with Datacoves, reach out and we’ll share our knowledge in a 1-hour free consultation.


Dessert: Reporting on data

This three-course meal is quickly coming to an end, but I couldn’t let you go home before dessert. If you need dashboards but also want self-service, you can’t go wrong with Looker. I am not the only chef saying this; have a look at this.

One big reason for choosing Looker, in addition to the above, is that version control is part of the process. If you want things documented, reused, and following software development best practices, then you need everything in version control. You can no longer depend on the secret recipe one of your colleagues has on their laptop. People get promoted, move to other companies, forget… and you need a data stack that is not brittle. So choose your dessert wisely.

Finish a great meal with dessert

Conclusion 

There are a lot of decisions to be made when creating a great meal. You need to know your guests’ dietary needs, what you have available, and how to turn raw ingredients into a delicious plate. When it comes to data, the options and permutations are endless, and most people need to get to delivering solutions so decision makers can improve business results. While no solution is perfect, in my experience there are certain ingredients that, when put together well, enable users to start building quickly. If you want to deliver analytics your decision makers can trust, just go Omakase.

Document and test data with dbt

In our previous article on the various dbt tests, we talked about the importance of testing data and how dbt, a tool developed by dbt Labs, helps data practitioners validate the integrity of their data. In that article we covered the various packages in the dbt ecosystem that can be used to run a variety of tests on data. Many people have legacy ETL processes and are unable to move to dbt quickly, but they can still leverage the power of dbt and, by doing so, slowly begin the transition. In this article, I’ll discuss how you can use dbt to test and document your data even if you are not using dbt for transformation.

Why dbt?


Ideally, we can prevent erroneous data from ever reaching our decision makers and this is what dbt was created to do. dbt allows us to embed software engineering best practices into data transformation. It is the “T” in ELT (Extract, Load, and Transform) and it also helps capture documentation, testing, and lineage. Since dbt uses SQL as the transformation language, we can also add governance and collaboration via DataOps, but that’s a topic for another post.

I often talk to people who find dbt very appealing, but they have a lot of investment in existing tools like Talend, Informatica, SSIS, Python, etc. They often have gaps in their processes around documentation and data quality, and while other tools exist, I believe dbt is a good alternative. By leveraging dbt to fill the gaps in your current data processes, you open the door to incrementally moving your transformations to dbt.

Eventually dbt can be fully leveraged as part of the modern data workflow to produce value from data in an agile way. The automated and flexible nature of dbt allows data experts to focus more on exploring data to find insights.

Why ELT?

The term ELT can be confusing; some people hear ELT and ETL and think they are fundamentally the same thing. This is muddied by marketers who try to appeal to potential customers by suggesting their tool can do it all. The way I define ELT is by making sure that data is loaded from the source without any filters or transformation. This is EL (Extract and Load). We keep all rows and all columns, and data is replicated even if there is no current need. While this may seem wasteful at first, it allows Analytics and Data Engineers to react quickly to business needs.

Have you ever needed to answer a question only to find that the field you need was never imported into the data warehouse? This is common, especially in traditional thinking where it was costly to store data, or when companies had limited resources because their data warehouses coupled compute with storage. Today, warehouses like Snowflake have removed this constraint, so we can load all the data and keep it synchronized with the sources.

Another aspect of modern EL solutions is making the process of loading and synchronizing data simple. Tools like Fivetran and Airbyte allow users to easily load data by selecting pre-built connectors for a variety of sources and choosing the destination where the data should land. Gone are the days of creating tables in target data warehouses and dealing with changes when sources add or remove columns. The new way of working lets users set it and forget it.

In a modern data flow, Data Loaders are the tools that handle the extract and load process to get data into the RAW area of the data warehouse. These tools include Stitch, Fivetran, and Airbyte. Once the data is in the warehouse, dbt can be leveraged for the transformation. dbt delivers transformed data and also enables snapshotting, testing, documenting, and deploying.


Plugging in dbt for testing

In an environment where other transformation tools are used, you can still leverage dbt to address gaps in testing. There are over 70 pre-built tests that can be leveraged, and custom tests can be created using plain SQL. dbt can test data anywhere in the transformation lifecycle. It can be used at the beginning of the workflow to verify assumptions about data sources, and the best part is that these data sources or models do not need to be part of any ongoing dbt project. Imagine you have a raw customer table you are loading into Snowflake. We can connect this table to dbt by creating a source yml file that tells dbt where to find the table: the name of the database, schema, and table. We can then add the columns and, while we are at it, descriptions.

Below we add tests for a CUSTOMER table in the TPCH_SF100 schema of the SNOWFLAKE_SAMPLE_DATA database.


We can do tests at the table level. Here we check that the table has between 1 and 10 columns.

We can also do tests at the column level. Here we assure that the C_CUSTKEY column has no duplicates by leveraging dbt’s unique test, and we check that the column is always populated with the not_null test.
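Put together in YAML, such a source definition might look like the following sketch (the table-level column-count check assumes the dbt_expectations package is installed; the unique and not_null tests are built into dbt):

```yaml
version: 2

sources:
  - name: tpch                          # logical source name used by source()
    database: SNOWFLAKE_SAMPLE_DATA
    schema: TPCH_SF100
    tables:
      - name: CUSTOMER
        tests:
          # table-level test: the table should have between 1 and 10 columns
          - dbt_expectations.expect_table_column_count_to_be_between:
              min_value: 1
              max_value: 10
        columns:
          - name: C_CUSTKEY
            tests:
              # column-level tests: no duplicates, no missing values
              - unique
              - not_null
```

Running `dbt test` will then execute all three checks against the table, even though dbt is not transforming it.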

Testing non-source tables

So far we have done what you would learn in a standard dbt tutorial: start with a source, connect it to dbt, and add some tests. But in reality, dbt doesn’t care whether the table we are pointing to is a true "source" table. To dbt, any table can be a source, even an aggregation, a reporting table, or a view. The process is the same: create a yml file, specify the “source”, and add tests.

Let’s say we have a table that aggregates the number of customers by market segment. We can add a source that points to this table and check for the existence of specific market segments and for an expected range of customers per segment.
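As a sketch (the database, schema, table, and column names here are hypothetical; the range check assumes the dbt_utils package), that source definition could look like this:

```yaml
version: 2

sources:
  - name: aggregates                    # hypothetical source name
    database: ANALYTICS
    schema: REPORTING
    tables:
      - name: CUSTOMERS_BY_SEGMENT
        columns:
          - name: MARKET_SEGMENT
            tests:
              # every expected segment must be present, nothing else
              - accepted_values:
                  values: ['AUTOMOBILE', 'BUILDING', 'FURNITURE',
                           'HOUSEHOLD', 'MACHINERY']
          - name: NUM_CUSTOMERS
            tests:
              # customer counts per segment should fall in a sane range
              - dbt_utils.accepted_range:
                  min_value: 1
                  max_value: 100000
```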


Using this approach, we can leverage the tests available in dbt anywhere in the data transformation pipeline. We can use dbt_utils.equal_rowcount to validate that two relations have the same number of rows to assure that a transformation step does not inadvertently drop some rows. 

When we are aggregating, we can also check that the resulting table has fewer rows than the table we are aggregating by using the dbt_utils.fewer_rows_than test.
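Both row-count tests take a compare_model argument, and because every table involved is registered as a source, we can point them at each other with the source() macro. A sketch with hypothetical source and table names:

```yaml
version: 2

sources:
  - name: aggregates                    # hypothetical source name
    database: ANALYTICS
    schema: REPORTING
    tables:
      - name: CUSTOMER_CLEANED
        tests:
          # a cleaning step should not drop any rows
          - dbt_utils.equal_rowcount:
              compare_model: source('tpch', 'CUSTOMER')
      - name: CUSTOMERS_BY_SEGMENT
        tests:
          # an aggregate must have fewer rows than the table it rolls up
          - dbt_utils.fewer_rows_than:
              compare_model: source('tpch', 'CUSTOMER')
```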


Notice that you can use the source() macro when referring to another model outside of dbt. As long as you register both models as sources, you can refer to them. So when you see documentation that refers to the ref() macro, just substitute the source() macro as I did above.


Also note that even though the documentation may say this is a model test, you can use it in your source definition as I have done above.

Documenting tables

In dbt sources, we can also add documentation like so:
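A sketch of descriptions added to the CUSTOMER source from earlier (the wording of the descriptions is illustrative):

```yaml
version: 2

sources:
  - name: tpch
    database: SNOWFLAKE_SAMPLE_DATA
    schema: TPCH_SF100
    tables:
      - name: CUSTOMER
        description: "Raw customer records replicated from the source system."
        columns:
          - name: C_CUSTKEY
            description: "Unique identifier for a customer."
          - name: C_MKTSEGMENT
            description: "Market segment the customer belongs to."
```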


These descriptions will then show up in the dbt docs.

With only sources in dbt docs, you will not get dbt’s full lineage capability, but this is more than many people have.

Conclusion

dbt is a great tool for transforming data and capturing documentation and lineage, but if your company has a lot of transformation scripts using legacy tools, the migration to dbt may seem daunting and you may think you cannot leverage its benefits.

By leveraging source definitions you can take advantage of dbt’s ecosystem of tests and ability to document even if transformations are done using other tools.

Gradually the organization will realize the power of dbt, and you can migrate to dbt over time. For data to be trusted, it needs to be documented and tested, and dbt can help you in this journey.

dbt Core vs dbt Cloud

dbt Core and dbt Cloud both run the same transformation engine. The difference is in who manages the infrastructure around it.

dbt Core is open-source and free. It gives you full control over your environment but requires your team to build and maintain orchestration, CI/CD, developer environments, and secrets management.

dbt Cloud is a managed SaaS platform built on dbt Core. It simplifies setup with a built-in IDE, job scheduler, and CI/CD, but limits flexibility, restricts private cloud deployment, and can get expensive at scale.

Managed dbt Core platforms like Datacoves offer a third path: the operational simplicity of dbt Cloud with the flexibility and security of dbt Core, deployed in your own private cloud.

The right choice depends on your team's engineering capacity, security requirements, and how much infrastructure you want to own.

What Are dbt Core and dbt Cloud?


dbt (data build tool) is an open-source transformation framework for building, testing, and deploying SQL-based data models. When people say "dbt," they're almost always talking about dbt Core, the engine that everything else is built on.

dbt Core is the open-source CLI tool maintained by dbt Labs. It's free, runs in any environment, and gives teams full control over their setup. Scheduling, CI/CD, and developer tooling are not included. Teams assemble those separately.

dbt Cloud is a managed SaaS platform built on dbt Core. It adds a web IDE, job scheduler, CI/CD integrations, a proprietary semantic layer, and metadata APIs. Setup is faster, but flexibility and private cloud deployment are limited.

Managed dbt platforms like Datacoves run dbt inside your own cloud with the surrounding infrastructure already in place: IDE, orchestration, CI/CD, secrets management, all managed for you.

All three run the same transformation engine. Everything else is a platform decision.

How dbt Core and dbt Cloud Compare at a Glance

The table below covers the key decision points. Sections that follow go deeper on each one.

Developer Environment: IDE and Setup

dbt Core

With dbt Core, every developer sets up their own environment. That means installing dbt, configuring a connection to the warehouse, managing Python versions, and handling dependencies like SQLFluff or dbt Power User. On paper, straightforward. In practice, setup can take anywhere from a few hours to several days depending on the developer's experience and the organization's IT constraints.
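The warehouse connection each developer configures lives in a profiles.yml. A rough sketch, assuming the dbt-snowflake adapter (every value below is a placeholder):

```yaml
# ~/.dbt/profiles.yml -- per-developer connection config for dbt Core
my_project:                 # must match the profile name in dbt_project.yml
  target: dev
  outputs:
    dev:
      type: snowflake
      account: your_account           # e.g. org-account_name
      user: your_user
      authenticator: externalbrowser  # or password / key-pair auth
      role: TRANSFORMER
      database: ANALYTICS
      warehouse: TRANSFORMING
      schema: DBT_YOURNAME            # personal development schema
      threads: 8
```

Every developer maintains a copy of this file locally, which is exactly the per-machine setup and drift the managed options aim to eliminate.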

Pre-configured company laptops often ship with software that may conflict with dbt. Proxy settings, restricted package registries, and corporate firewall rules add friction before a developer writes a single line of SQL.

The upside is full control. Teams can use any IDE they prefer: VS Code, Cursor, PyCharm, or whatever fits their workflow. There are no constraints on tooling choices, and developers who already have strong local environment preferences can keep working the way they work best.

The maintenance challenge grows with team size. Every dbt version upgrade needs to happen in sync across all developers. On a small team that's manageable. On a team of 20 or more, someone is always on a different version, and those mismatches cause inconsistent behavior, failed CI runs, and debugging sessions that should never have happened. Organizations that skip upgrades to avoid the coordination cost accumulate technical debt that gets harder to unwind over time.

dbt Cloud

dbt Cloud's web IDE lets developers log in through a browser and start writing SQL without installing anything locally. No Python, no CLI, no profiles.yml. For analytics engineers who are new to dbt or unfamiliar with command-line tools, this is a genuine advantage.

The trade-off is flexibility. The web IDE does not support VS Code extensions or custom Python libraries. Teams that rely on SQLFluff configurations, internal Python packages, or warehouse-specific extensions like the Snowflake VS Code plugin will find it limiting.

dbt Cloud also offers a CLI option that lets developers work locally in VS Code while dbt Cloud handles compute. Many teams end up running both: newer analysts in the web IDE, senior engineers on the CLI. But the CLI path reintroduces the local environment problems the web IDE was supposed to solve. SQLFluff versions, Python dependencies, and VS Code extensions still need to be installed and kept in sync across every developer's machine. On larger teams, that version drift shows up quickly.

Managed dbt

Datacoves provides VS Code running in the browser, fully managed and pre-configured. Developers get the VS Code they already know, without any local installation. Warehouse connections, Git configuration, Python environments, and tooling like SQLFluff are set up out of the box.

Where Datacoves differs from dbt Cloud's web IDE: the environment is fully extensible. Teams can install any VS Code extension, add internal Python libraries, and configure the workspace to match their standards. Organizations with proprietary packages or warehouse-specific tooling can bring those into the environment without workarounds.

Onboarding a new developer is a matter of clicks, not days. When dbt or a dependent library needs an upgrade, Datacoves handles it. Developers work in a consistent, current environment without touching it.

Scheduling and Orchestration

dbt Core

dbt Core has no built-in scheduler. Teams choose their own orchestration tool, with Apache Airflow being the most common choice in enterprise environments. This gives full flexibility: you can connect ingestion, transformation, and downstream activation steps into a single pipeline, trigger internal tools behind the firewall, and orchestrate anything in your stack.

That flexibility comes with real cost. Airflow is not simple to operate. Running it reliably at scale requires Kubernetes knowledge, careful resource management, and dedicated engineering attention. A production-grade Airflow setup with separate local development, testing, and production environments is a multi-month investment for most teams. Add advanced features like external secrets management, alerting, and DAG version control, and the scope grows further.

Teams that underestimate this often end up with a fragile single-environment setup or become dependent on the key people who understand how everything works until it doesn't.

dbt Cloud

dbt Cloud includes a built-in job scheduler with a clean UI for configuring run frequency, retries, and alerts. For teams that only need to run dbt on a schedule, it works well and requires no additional tooling.

The limitation becomes clear when pipelines grow beyond dbt. If you need to connect an ingestion step before transformation, trigger a downstream tool after a model run, or orchestrate anything outside dbt's scope, the built-in scheduler is not enough. dbt Cloud offers an API to trigger jobs from an external orchestrator, but that adds integration overhead and means maintaining two systems.

Enterprise teams with existing Airflow infrastructure often end up running dbt Cloud jobs triggered by Airflow anyway, which raises the question of why they're paying for a scheduler they're not using.

Managed dbt

Datacoves includes managed Airflow as part of the platform. Two environments come pre-configured: a personal Airflow sandbox for each developer to test DAGs without affecting anyone else, and a shared Teams Airflow for production workflows. Both come pre-integrated with dbt, so DAG creation for dbt runs is straightforward without custom operators or glue code.

Because Airflow runs inside your private cloud alongside dbt, it can reach internal systems, on-premise databases, and tools behind the corporate firewall. End-to-end pipelines that include ingestion, transformation, and activation steps all run in one orchestration layer without external API calls or cross-network dependencies.

Spinning up additional Airflow environments takes minutes, so enterprises can provision separate development, testing, and production environments without infrastructure work. Teams with complex testing requirements or multiple projects can have as many environments as they need.

Datacoves also supports simplified DAG creation using YAML, reducing the Python burden on teams that are primarily SQL-focused.

dbt Cloud covers transformation and scheduling, but it does not cover orchestration of the broader pipeline. Teams still need to run and maintain Airflow or another orchestrator alongside it.

CI/CD and DataOps

dbt Core

dbt Core gives teams complete control over their CI/CD pipeline. Any Git provider works: GitHub, GitLab, Bitbucket, Azure DevOps, or internal systems like Bitbucket Server. Any CI tool works too: GitHub Actions, GitLab CI, Jenkins, CircleCI, or whatever the organization already runs behind the firewall.

That flexibility is genuinely valuable for enterprises that have invested in internal tooling. A team on Jenkins with Bitbucket can build a world-class dbt CI pipeline without compromising on either tool.

The cost is setup time. Docker images need to be built and maintained with the right dbt version, SQLFluff configuration, and Python dependencies. CI runners need to be provisioned and kept current. Notification routing to Slack, MS Teams, or email needs to be configured separately. None of this is insurmountable, but it adds up fast and requires platform engineering skills that not every data team has.

Developers also have no way to run CI checks locally before pushing, which means failed CI runs often require multiple commits to fix, slowing down the feedback loop.
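To make the assembly work concrete, a minimal dbt Core CI pipeline might look like the following sketch. It assumes GitHub Actions, the dbt-snowflake adapter, credentials stored as repository secrets, and production manifest artifacts available at ./prod-artifacts for state comparison (all of these are illustrative choices, not requirements):

```yaml
# .github/workflows/dbt_ci.yml -- sketch of a PR check for a dbt Core project
name: dbt CI
on:
  pull_request:

jobs:
  dbt-ci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dbt and linting tools
        run: pip install dbt-snowflake sqlfluff
      - name: Lint SQL
        run: sqlfluff lint models/
      - name: Build and test only modified models
        env:
          SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
          SNOWFLAKE_USER: ${{ secrets.SNOWFLAKE_USER }}
          SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
        run: dbt build --select state:modified+ --defer --state ./prod-artifacts
```

Keeping the Docker image or runner dependencies, the secrets, and the prod artifacts current is the ongoing maintenance cost described above.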

dbt Cloud

dbt Cloud has built-in CI that automatically triggers a run when a pull request is opened. It builds only the modified models and their downstream dependencies in a temporary schema, posts results back to the PR, and cleans up when the PR is merged or closed. For teams on GitHub or GitLab, this works well and requires minimal configuration.

The constraints appear quickly in enterprise contexts. Native automated CI only works with GitHub, GitLab, and Azure DevOps on Enterprise plans. Teams on Bitbucket, AWS CodeCommit, Jenkins, or any internal Git or CI system get no automated CI. They can use the dbt API to trigger jobs manually, but that requires custom integration work that undermines the simplicity dbt Cloud is supposed to provide.

Customization is also limited. The CI pipeline runs dbt checks. Adding custom steps, internal validation scripts, or governance checks outside of what dbt Cloud natively supports requires workarounds. Teams with mature DataOps practices often find the built-in CI too rigid to fit their standards.

Managed dbt

Datacoves provides pre-built CI/CD pipelines that work with any Git provider and any CI tool, including Jenkins and internal enterprise systems behind the firewall. The pipeline comes configured with dbt testing, SQLFluff linting, dbt-checkpoint governance checks, and deployment steps out of the box.

Developers can run the same CI checks locally before pushing changes, which catches issues before they reach the pipeline and dramatically reduces the back-and-forth of fixing failed CI runs. When the local check passes, the CI check passes.

Because the pipeline is fully customizable, teams can add any step they need: internal approval workflows, custom validation scripts, notifications to MS Teams, or integration with ticketing systems like Jira. There are no constraints on providers or tools.

Semantic Layer

dbt Core

dbt Core has no built-in semantic layer. Teams choose from several mature options depending on their warehouse and BI tool preferences.

Cube.dev is the most widely adopted open-source choice. It provides a headless semantic layer with its own API, caching, and broad BI tool support. Lightdash and Omni are strong alternatives that integrate tightly with dbt models and work well for teams that want metric definitions to live close to their transformation code.

For Snowflake users, the dbt_semantic_view package lets teams manage Snowflake Semantic Views directly from their dbt project. Metrics defined this way live in the warehouse itself and are accessible to any tool connected to Snowflake, without routing data through a third-party service.

The open-source path requires more setup and maintenance than a managed semantic layer, but it gives teams full control over where metrics are defined, how they are served, and which tools consume them.

dbt Cloud

dbt Cloud includes a hosted semantic layer powered by MetricFlow. MetricFlow was acquired from Transform in 2023 and open-sourced under Apache 2.0 at Coalesce 2025. The engine itself is now free to use. The hosted service in dbt Cloud is a paid feature available on Starter plans and above. Usage is metered by queried metrics per month, and caching, which reduces repeated warehouse hits, is an Enterprise-only feature.

Supported BI integrations include Tableau, Power BI, Google Sheets, and Excel, among others. Most are generally available. The exception is Power BI, which is still in public preview and requires additional setup through an On-premises Data Gateway for Power BI Service.

Warehouse support is incomplete. Microsoft Fabric is not supported. When queries run through the dbt Cloud semantic layer, data passes through dbt Labs servers on the way back from the warehouse. For organizations in regulated industries with strict data residency requirements, that is a hard blocker.

The spec itself is also in flux. dbt Labs recently modernized the MetricFlow YAML spec with the Fusion engine, and the new spec is coming to dbt Core in version 1.12. dbt Labs has also joined the Open Semantic Interchange initiative alongside Snowflake, Salesforce, BlackRock, and others to work toward an open standard, though no engine is fully OSI compliant yet. Teams investing heavily in the dbt Cloud semantic layer today should be aware that the spec is still evolving.

Managed dbt

Datacoves does not lock teams into a single semantic layer approach. Depending on your warehouse and BI stack, you can use Snowflake Semantic Views via a dbt package, Cube.dev, Lightdash, or Omni. All options run inside your private environment, with no query data passing through third-party servers.

Because Datacoves runs dbt Core, teams can adopt MetricFlow natively when dbt Core 1.12 ships the new spec. No migration friction, no proprietary hosting layer to work around, and no metered query limits to plan around.

The OSI standard is still developing. Until compliance is widespread across tools, flexibility is the lower-risk position. Datacoves gives you that flexibility without requiring a bet on any single vendor's implementation.

Documentation and Lineage

dbt Core

dbt Core generates documentation automatically from your project: model descriptions, column definitions, tests, and a DAG showing upstream and downstream dependencies. You run dbt docs generate to build the static site and dbt docs serve to view it locally.

The limitation is hosting. dbt Core produces a static artifact. Your team is responsible for serving it somewhere accessible, keeping it updated after each run, and managing access controls. Many teams end up with stale docs because the pipeline to publish and refresh them is never properly automated. As projects grow across multiple teams and hundreds of models, the static site format also becomes a constraint. Navigation slows down, search is limited, and there is no real multi-project support.

dbt Cloud

dbt Cloud hosts your documentation automatically and updates it after each production run. On Starter plans, teams get dbt Catalog rather than the static dbt Docs experience. The features that matter most at enterprise scale, including column-level lineage, multi-project lineage, and project recommendations, are gated behind the Enterprise plan.

It is also worth noting that Snowflake now provides native lineage, including column-level lineage, directly in the platform, which covers a significant portion of what teams historically needed a separate docs tool to provide.

Managed dbt

Datacoves automates documentation generation and hosting as part of the CI/CD pipeline. Docs are updated on every merge without manual intervention, and the hosted site is available to your full team inside your private environment at no additional cost.

For teams that have outgrown the static dbt docs experience, Datacoves also offers TributaryDocs. Unlike the default dbt docs site, TributaryDocs is a client-server application, which means it scales to enterprise-sized projects without the performance and navigation limitations of a static site. It includes an MCP server, enabling AI tools to query your documentation directly and making your data catalog part of your AI-assisted development workflow.

Datacoves customers can also connect external catalogs like Alation or Atlan, or use the catalog built into their warehouse. Snowflake, for example, includes native column-level lineage directly in the platform.

APIs and Extensibility

dbt Core

dbt Core produces a set of artifacts after every run: manifest.json, catalog.json, and run_results.json. These files contain your full project metadata and are the foundation for any custom tooling, observability integrations, or downstream automation you want to build.

Because dbt Core is open source, you have complete access to these artifacts and full control over how you use them. The tradeoff is that everything is self-managed. Parsing artifacts, building pipelines around them, and integrating with other systems requires custom engineering work that your team owns and maintains.
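As a sketch of what building on these artifacts looks like, the node graph in manifest.json can be walked with nothing but the standard library. The manifest below is a trimmed, illustrative stand-in; a real one, produced by dbt compile or dbt build, carries far more metadata per node, but the nodes and depends_on structure shown here matches the real schema.

```python
import json

# Trimmed stand-in for a real manifest.json (illustrative node names).
sample_manifest = json.dumps({
    "nodes": {
        "model.shop.stg_orders": {
            "resource_type": "model",
            "depends_on": {"nodes": ["source.shop.raw_orders"]},
        },
        "model.shop.fct_orders": {
            "resource_type": "model",
            "depends_on": {"nodes": ["model.shop.stg_orders"]},
        },
    }
})

def model_dependencies(manifest_json: str) -> dict:
    """Map each model's unique_id to its upstream dbt nodes."""
    manifest = json.loads(manifest_json)
    return {
        name: node["depends_on"]["nodes"]
        for name, node in manifest["nodes"].items()
        if node["resource_type"] == "model"
    }

deps = model_dependencies(sample_manifest)
for model, upstream in deps.items():
    print(f"{model} <- {', '.join(upstream)}")
```

The same traversal generalizes to impact analysis, custom lineage views, or feeding metadata into an observability system, which is exactly the kind of tooling the artifacts were designed to enable.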

dbt Cloud

dbt Cloud exposes a set of APIs including the Discovery API for metadata queries, the Administrative API for managing jobs and environments, and webhooks for event-driven automation. These are well-documented and cover most standard integration scenarios.

The limitations show up at the edges. CI/CD integrations are constrained to supported Git providers. Some API capabilities are plan-gated, with full access requiring Enterprise. Teams building complex internal tooling or integrating with systems outside dbt's supported ecosystem may find the platform less flexible than working directly with dbt Core artifacts.

Managed dbt

Datacoves runs dbt Core, so all native artifacts are available with no restrictions. Teams can build against manifest.json and run_results.json directly, integrate with any internal system, and use any CI tool or Git provider without platform constraints.

Datacoves also provides a dbt API that enables pushing and pulling artifacts programmatically. This is particularly useful for slim CI, where only changed models are tested, and for deferral, where development runs reference production state without rebuilding the entire project.
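The mechanics behind slim CI are worth spelling out: compare the current manifest against a stored production manifest and select only the models whose definitions changed. A minimal sketch of that comparison, with illustrative model names and simplified checksum values:

```python
def modified_models(prod_manifest: dict, ci_manifest: dict) -> list:
    """Return models whose checksum differs from production,
    plus models that are new in the CI manifest."""
    prod_nodes = prod_manifest.get("nodes", {})
    changed = []
    for name, node in ci_manifest["nodes"].items():
        if node.get("resource_type") != "model":
            continue
        old = prod_nodes.get(name)
        if old is None or old["checksum"] != node["checksum"]:
            changed.append(name)
    return changed

# Illustrative manifests: fct_orders changed, stg_orders did not.
prod = {"nodes": {
    "model.shop.stg_orders": {"resource_type": "model", "checksum": "aaa"},
    "model.shop.fct_orders": {"resource_type": "model", "checksum": "bbb"},
}}
ci = {"nodes": {
    "model.shop.stg_orders": {"resource_type": "model", "checksum": "aaa"},
    "model.shop.fct_orders": {"resource_type": "model", "checksum": "ccc"},
}}

print(modified_models(prod, ci))
```

This is the selection dbt itself performs with dbt build --select state:modified --defer --state path/to/artifacts, where the stored production manifest provides the comparison state; pushing and pulling those artifacts is what the Datacoves dbt API automates.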

On the orchestration side, Datacoves exposes the Airflow API, giving teams full programmatic control over their pipelines. This enables event-driven architectures using Airflow datasets, where DAGs trigger based on data availability rather than fixed schedules. Datacoves also uses run_results.json within Airflow to enable retries from the point of failure, so when a model fails mid-run, the DAG resumes from that model rather than restarting the entire pipeline.
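The retry-from-failure idea can be sketched in a few lines: read run_results.json, collect the nodes that errored or were skipped downstream of the failure, and build a selector that resumes from there. The result payload below is a trimmed, illustrative stand-in for the real artifact (dbt Core 1.6+ also ships a dbt retry command built on the same file):

```python
import json

# Trimmed stand-in for run_results.json; the real artifact carries
# timing, adapter responses, and messages for every node.
run_results = json.dumps({
    "results": [
        {"unique_id": "model.shop.stg_orders", "status": "success"},
        {"unique_id": "model.shop.fct_orders", "status": "error"},
        {"unique_id": "model.shop.mart_revenue", "status": "skipped"},
    ]
})

def retry_selector(run_results_json: str) -> str:
    """Build a dbt --select argument covering failed nodes and the
    downstream nodes that were skipped because of them."""
    results = json.loads(run_results_json)["results"]
    to_retry = [
        r["unique_id"].split(".")[-1]  # node name from unique_id
        for r in results
        if r["status"] in ("error", "fail", "skipped")
    ]
    return " ".join(f"{name}+" for name in to_retry)

print(retry_selector(run_results))  # fct_orders+ mart_revenue+
```

Wiring this into the orchestrator, so a rerun invokes dbt with the generated selector instead of rebuilding everything, is the pattern Datacoves applies inside Airflow.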

For teams that want API-driven metadata beyond what dbt Core artifacts provide, TributaryDocs exposes an MCP server that makes your documentation and lineage queryable by AI tools and external systems.

AI and LLM Integration

dbt Core

dbt Core has no built-in AI capabilities. Teams can integrate any AI tool they choose by connecting it to their local development environment. VS Code extensions like GitHub Copilot, Cursor, or any MCP-compatible client can work alongside dbt Core projects with full access to your codebase.

The flexibility is real, but so is the setup overhead. Each developer configures their own AI tooling independently, which means inconsistent experiences across the team and no centralized control over which models or providers are in use.

dbt Cloud

dbt Cloud includes dbt Copilot, an AI assistant built into the Cloud IDE. Copilot can generate documentation, tests, semantic models, and SQL based on the context of your dbt project. It is generally available on Enterprise plans and available in limited form on Starter.

The constraint is that Copilot is tied to OpenAI. Teams cannot bring their own LLM or route requests through their own Azure OpenAI instance unless they are on Enterprise and configure bring-your-own-key. Usage is also metered: 100 actions per month on Developer, 5,000 on Starter, and 10,000 on Enterprise. dbt Cloud also provides its own MCP server for integrating dbt context into AI workflows, but does not support connecting arbitrary third-party MCP servers within the platform. For organizations with strict data governance policies around which AI providers can touch their code and metadata, the lack of model choice is a hard limitation.

Managed dbt

Datacoves supports any LLM your organization has approved. Teams can connect Anthropic, OpenAI, Azure OpenAI, GitHub Copilot, or Snowflake Cortex CLI directly to the VS Code environment without platform restrictions. Snowflake Cortex CLI also supports skills, enabling teams to build custom AI-powered workflows grounded in their warehouse data. There are no metered AI actions and no dependency on a single provider.

Because Datacoves provides VS Code in the browser, teams can configure any MCP server alongside their dbt project, not just a single platform-provided one. This means connecting Snowflake's MCP server, TributaryDocs' MCP server, or any other MCP-compatible tool is a configuration choice, not a platform constraint.

For organizations in regulated industries where AI provider choice is a compliance requirement, the bring-your-own-LLM architecture is not a nice-to-have. It is a prerequisite.

Security and Compliance

dbt Core

dbt Core has no built-in security controls. All security decisions sit with your team: where the environment runs, how credentials are managed, who has access, and how secrets are stored. For teams with the engineering capacity to implement this properly, that is complete flexibility. For everyone else, it is undifferentiated heavy lifting.

The most common gaps are secrets management, environment isolation, and consistent access controls across developers. These are solvable problems, but solving them requires deliberate investment and ongoing maintenance.

dbt Cloud

dbt Cloud is a SaaS product. Your data stays in your warehouse, but your code, metadata, and credentials pass through dbt Labs infrastructure. For many teams that is an acceptable tradeoff. For organizations in regulated industries such as pharma, healthcare, finance, and government, it often is not.

dbt Cloud offers SSO, role-based access control, and SOC 2 Type II compliance. PrivateLink and IP restrictions are available, but only on Enterprise+ plans. Teams that need their entire development and orchestration environment to remain inside their own network perimeter will find that dbt Cloud cannot meet that requirement regardless of plan.

Managed dbt

Datacoves can be deployed in your private cloud account. Your code, your credentials, your metadata, and your pipeline execution all stay inside your own network. There is no VPC peering required and no data transiting a third-party SaaS environment.

Datacoves integrates with your existing identity provider via SSO and SAML, connects to your secrets management system such as AWS Secrets Manager, and supports your organization's logging and audit requirements. Security controls are not bolt-ons; they are part of the deployment architecture from day one.

For organizations in regulated industries, this is the architecture that passes security reviews without exceptions. You are not asking your security team to approve a SaaS vendor touching your pipeline. You are showing them that everything runs in your own account, under your own controls.

Total Cost of Ownership

dbt Core

dbt Core is free. The cost is everything around it. A team that builds its own platform on dbt Core needs to provision and maintain developer environments, stand up and operate Airflow, build CI/CD pipelines, manage secrets, handle upgrades, and onboard every new developer into a custom setup.

That work falls on your most senior engineers. It is not a one-time cost. Every version upgrade, every new team member, and every incident that traces back to environment inconsistency is time your team is not spending on data products. Open source looks free the way a free puppy looks free.

dbt Cloud

dbt Cloud starts at $100 per developer seat per month on the Starter plan, capped at five developers. Full enterprise capabilities require an Enterprise contract with custom pricing. Semantic Layer usage is metered separately. Copilot usage is metered separately. Teams that grow beyond five developers or need features like multi-project lineage, column-level lineage, or advanced CI/CD will find that the total bill looks very different from the entry price.

There is also an indirect cost. dbt Cloud covers transformation and scheduling, but it does not cover orchestration of the broader pipeline. Teams still need to run and maintain Airflow or another orchestrator alongside it, which means the dbt Cloud platform cost is only part of the picture.

Managed dbt

Datacoves provides the full environment: VS Code, dbt Core, Airflow, CI/CD, secrets management, documentation hosting, and governance guardrails. There is no separate orchestration bill, no environment infrastructure to maintain, and no platform engineering team required to keep it running.

Onboarding a new developer takes minutes, not days. Datacoves customers report reducing onboarding time by approximately 30 hours per developer. At scale, across a team of 20 or 30 engineers, that compounds quickly.

The right comparison is not Datacoves versus dbt Cloud's license fee. It is Datacoves versus the total cost of dbt Cloud plus Airflow infrastructure plus the engineering time to build and maintain the environment around them.

The Third Option: Managed dbt Core

Most comparisons of dbt Core and dbt Cloud treat the choice as binary. It is not.

dbt Core gives you full control at zero license cost, but leaves your team responsible for building and maintaining everything around it. dbt Cloud removes that burden but constrains your tooling, your security posture, and your budget as you scale. Both options make tradeoffs that many enterprise teams cannot accept.

The third option is a managed dbt platform that runs in your own cloud, on your own terms.

A managed dbt platform provides the operational simplicity of dbt Cloud with the flexibility and security of dbt Core, deployed in your own private cloud.

Datacoves delivers the operational simplicity of dbt Cloud without the SaaS architecture, the vendor lock-in, or the platform constraints. Your team gets a fully configured environment from day one: VS Code in the browser, dbt Core, managed Airflow, CI/CD pipelines, secrets management, and governance guardrails, all running inside your private cloud account.

You keep full ownership of your code and your data. You choose your warehouse, your Git provider, your CI tool, your LLM, and your BI stack. When your requirements change, the platform adapts. There is no migration to a new vendor and no renegotiation of what the platform will and will not support.

For enterprise teams in regulated industries, for organizations that have outgrown dbt Cloud's constraints, and for data leaders who want the best-practice foundation of a managed platform without surrendering control, Datacoves is the path that does not require a compromise.

Datacoves doesn't replace your tools. It gives them a proper home.

How to Choose: dbt Core vs dbt Cloud vs Managed dbt

The right choice depends on your team's size, security requirements, and how much of the platform you want to own.

Choose dbt Core if:

  • You have a small, highly technical team that is comfortable building and maintaining infrastructure
  • You want complete control over every component of your stack
  • You have existing Airflow infrastructure and the engineering capacity to integrate it properly
  • Budget is the primary constraint and you can absorb the hidden costs of DIY

Choose dbt Cloud if:

  • Your security and compliance requirements allow for SaaS-based code and metadata hosting
  • You want a fully managed transformation environment without standing up your own infrastructure
  • Your orchestration needs are met by dbt's built-in scheduler and you do not need Airflow
  • You are comfortable with OpenAI-based AI tooling or can configure bring-your-own-key on Enterprise
  • You are just getting started with dbt and that is your only priority right now

Choose Datacoves if:

  • You are in a regulated industry where data and code must stay inside your own cloud
  • You have outgrown dbt Cloud's constraints around Git providers, CI tooling, or orchestration
  • You need managed Airflow alongside dbt without building and maintaining the integration yourself
  • You want AI flexibility, including bring-your-own-LLM, without metered usage caps
  • You are modernizing from legacy ETL and need a proven architecture with best practices built in
  • You want the operational simplicity of a managed platform without surrendering control of your environment

If you are evaluating dbt Core and dbt Cloud and neither feels quite right, that is usually a signal. Most enterprise teams do not lack good tools. They lack a proper platform to run them in.

Ready to see how Datacoves works in your environment?

Book a demo to walk through the platform with a Datacoves expert.

Get our free ebook: dbt Cloud vs dbt Core

Get the PDF