Noel Gomez

Datacoves Co-founder | 15+ Years Data Platform Expert. Solving enterprise data challenges quickly with dbt & Airflow.

Why executives can’t ignore data orchestration
5 mins read

Data orchestration is the foundation that ensures every step in your data value chain runs in the correct order, with the right dependencies, and with full visibility. Without it, even the best tools such as dbt, Airflow, Snowflake, or your BI platform operate in silos. This disconnect creates delays, data fires, and unreliable insights.

For executives, data orchestration is not optional. It prevents fragmented workflows, reduces operational risk, and helps teams deliver trusted insights quickly and consistently. When orchestration is built into the data platform from the start, organizations eliminate hidden technical debt, scale more confidently, and avoid the costly rework that slows innovation.

In short, data orchestration is how modern data teams deliver reliable, end-to-end value without surprises.

In today’s fast-paced business environment, executives are under increased pressure to deliver quick wins and measurable results. However, one capability that is often overlooked is data orchestration.

This oversight can sabotage progress: data modernization efforts then fail to deliver the expected ROI and efficiency gains.

In this article, we will explain what data orchestration is, the risks of not implementing proper data orchestration, and how executives benefit from end-to-end data orchestration.

What Is Data Orchestration? (Simple Definition for Executives)

Data orchestration ensures every step in your data value chain runs in the right order, with the right dependencies, and with full visibility.
An infographic of data orchestration practice

Data orchestration is the practice of coordinating all the steps in your organization’s data processes so they run smoothly, in the right order, and without surprises. Think of it as the conductor ensuring each instrument plays at the right time to create beautiful music.

Generating insights is a multi-tool process, and that setup has a problem: each tool may include its own scheduler, and each runs in a silo. Even if an upstream step fails or is delayed, the subsequent steps still run. This disconnect leads to surprises for executives expecting trusted insights, which in turn leads to delays and data fires that are disruptive and inefficient for the organization.

Imagine you are baking a chocolate cake. You would need a recipe, all the ingredients, and a functioning oven. However, you wouldn’t turn on the oven before buying the ingredients, and you wouldn’t mix the batter if your milk had spoiled. Without someone orchestrating all the steps in the right sequence, the process becomes disorganized, inefficient, and wasteful. You also know not to continue if there is a critical issue, such as spoiled milk.

Data orchestration solves the problem of having siloed tools by connecting all the steps in the data value chain. This way, if one step is delayed or fails, subsequent steps do not run. With a data orchestration tool, we can also notify someone to resolve the issue so they can act quickly, reducing fires and providing visibility to the entire process.
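The behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not how Airflow or any specific orchestrator is implemented; the `Step` class, the `run_pipeline` function, and the three stand-in jobs are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    run: Callable[[], None]  # raises an exception on failure


def run_pipeline(steps: list[Step], notify: Callable[[str], None]) -> dict[str, str]:
    """Run steps in order; after the first failure, skip everything downstream."""
    status: dict[str, str] = {}
    failed = False
    for step in steps:
        if failed:
            status[step.name] = "skipped"
            continue
        try:
            step.run()
            status[step.name] = "success"
        except Exception as exc:
            failed = True
            status[step.name] = "failed"
            notify(f"{step.name} failed: {exc}")  # alert someone right away
    return status


def load_raw_data() -> None:
    pass  # stand-in for an ingestion job


def transform_data() -> None:
    raise RuntimeError("source table is empty")  # simulated upstream problem


def refresh_dashboard() -> None:
    pass  # stand-in for a BI refresh


alerts: list[str] = []
result = run_pipeline(
    [
        Step("ingest", load_raw_data),
        Step("transform", transform_data),
        Step("dashboard", refresh_dashboard),
    ],
    notify=alerts.append,
)
print(result)
print(alerts)
```

Here the dashboard refresh never runs on top of a failed transformation, and the alert names the step that broke, which is the visibility siloed schedulers lack.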

Key Components of Modern Data Orchestration

  • Coordinated Workflows: Makes sure all tools and teams work together without unnecessary manual steps.
  • Problem Detection: Identifies issues early so they don’t disrupt reporting or decision-making.
  • Clear Oversight: Gives executives and teams visibility into the data value chain across the organization.
  • Scalable Processes: Ensures your data operations can grow with your business without causing chaos.

Data Orchestration vs. ETL (Clear Distinction)

ETL (Extract, Transform, and Load) focuses on moving and transforming data, but data orchestration is about making sure everything happens in the right sequence across all tools and systems. It’s the difference between just having the pieces of a puzzle and putting them together into a clear picture.

Why Orchestration Matters: The Business Case

Faster, More Reliable Analytics Delivery

Without data orchestration, even the best tools operate in silos, creating delays, data fires, and unreliable insights.

Executives make many decisions but rarely have the time to dive into technical details. They delegate research and expect quick wins, which often leads to mixed messaging. Leaders want resilient, scalable, future-proof solutions, yet they also pressure teams to deliver “something now.” Vendors exploit this tension. They sell tools that solve one slice of the data value chain but rarely explain that their product won't fix the underlying fragmentation. Quick wins may ship, but the systemic problems remain.

Data orchestration removes this friction. When workflows are unified, adding steps to the data flow is straightforward, pipelines are predictable, and teams deliver high-quality data products faster and with far fewer surprises.

Reduced Firefighting and Operational Risk

A major Datacoves customer summarized the difference clearly:
“Before, we had many data fires disrupting the organization. Now issues still occur, but we catch them immediately and prevent bad data from reaching stakeholders.”

Without orchestration, each new tool adds another blind spot. Teams don’t see failures until they hit downstream systems or show up in dashboards. This reactive posture creates endless rework, late-night outages, and a reputation problem with stakeholders.

With orchestration, failures surface early. Dependencies, quality checks, and execution paths are clear. Teams prevent incidents instead of reacting to them.

Governance and Visibility Across the Data Lifecycle

Data orchestration isn’t just about automation; it’s about governance.
It ensures:

  • Clear ownership
  • Predictable workflows
  • Consistent development processes
  • End-to-end visibility across ingestion, transformation, analytics, and activation.

This visibility dramatically improves trust. Stakeholders no longer get “chocolate cake” made with spoiled milk. A new tool may bake faster, but if upstream data is broken, the final product is still compromised.

Orchestration ensures the entire value chain is healthy, not just one ingredient.

Supports Ingestion and dbt Scheduling with Airflow

Modern data teams rely heavily on tools like dbt and Airflow, but these tools do not magically align themselves. Without orchestration:

  • dbt jobs run inconsistently
  • Airflow DAGs are difficult to track across environments
  • Quality checks run out of order, or not at all

With orchestration in place, ingestion, dbt scheduling, and activation become reliable, governed, and transparent, ensuring every step runs at the right time, in the right order, with the right dependencies. Learn more in our guide on the difference between dbt Cloud and dbt Core.

For more details on how dbt schedules and runs models, see the official dbt documentation.

To learn how Airflow manages task dependencies and scheduling, visit the official Apache Airflow documentation.
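To make "the right order, with the right dependencies" concrete, here is a small sketch that derives a valid run order from declared dependencies using Python's standard-library `graphlib`. The step names are hypothetical, and a real orchestrator such as Airflow adds scheduling, retries, and monitoring on top of this basic idea.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps that must finish before it starts.
dependencies = {
    "ingest_orders": set(),
    "dbt_run": {"ingest_orders"},
    "dbt_test": {"dbt_run"},
    "refresh_dashboards": {"dbt_test"},
    "sync_to_crm": {"dbt_test"},
}

# static_order() yields every step only after all of its upstream steps.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Declaring dependencies once and letting the orchestrator compute the order means quality checks cannot run before the models they validate, and nothing downstream starts before testing is complete.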

Data orchestration:

  • Bridges gaps across systems
  • Reduces hidden costs and technical debt
  • Provides end-to-end visibility
  • Prevents costly rework as you scale

The Cost of NOT Having Orchestration

It is tempting to postpone data orchestration until the weight of data problems makes it unavoidable. Even the best tools and talented teams can struggle without a clear orchestration strategy. When data processes aren’t coordinated, organizations face inefficiencies, errors, and lost opportunities.

Implementing data orchestration early reduces hidden technical debt, prevents rework, and helps teams deliver trusted insights faster.

Fragmented Tools Create Operational Inefficiencies

When data pipelines rely on multiple systems that don’t communicate well, teams spend extra time manually moving data, reconciling errors, and firefighting issues. This slows decision-making and increases operational costs.

Common symptoms of fragmented tools include:

  • Manual data movement
  • Frequent reconciliations
  • Increased firefighting
  • Slower decision-making

Quick Wins Without Orchestration Create Long-Term Pain

Many organizations take a “quick wins” approach, only to discover that the cost of moving fast is a long-term loss of agility. The approach may deliver immediate results, but it leads to technical debt, wasted spend, and fragile data processes that are hard to scale. A great example is Data Drive’s journey. Before adding data orchestration, they had to spend time debugging each step of their disconnected process whenever an issue occurred. Now it is clear where an issue has occurred, enabling them to resolve issues faster for their stakeholders.

Costly Rework and Compounding Technical Debt

As organizations grow, the absence of orchestration forces teams to revisit and fix processes repeatedly. Embedding orchestration from the start avoids repeated firefighting, accelerates innovation, and makes scaling smoother. Improving one step alone cannot deliver the desired outcome, just like a single egg cannot make a cake. 

Limited Visibility Erodes Trust in Insights

Organizations without data orchestration are effectively flying blind. Disconnected processes run out of order and issues are discovered by frustrated stakeholders. Resource-constrained data teams spend their time firefighting instead of delivering new insights. The result is delays in decision-making, higher operating costs, and an erosion of trust in data.

Data orchestration diagram showing integrated workflows across tools

Common Roadblocks and How to Avoid Them

If data orchestration is so important, why do organizations go without it? We often hear some common objections: 

Roadblock: Lack of Awareness

Many organizations have not heard of data orchestration, and tool vendors rarely highlight the need for it. Often it is only after a painful experience that they realize how essential it is.

Roadblock: It Will Add Complexity

It’s true that data orchestration adds another layer, but without it, you have disconnected, siloed processes. The real cost comes from chaos, not from coordination. 

Roadblock: Another Tool Will Make Things Harder

Vendor sprawl can indeed introduce additional risks. That’s why all-in-one platforms like Datacoves reduce integration overhead by bundling enterprise-grade orchestration, like Airflow, without increasing vendor lock-in. Explore Datacoves’ Integrated Orchestration Platform.

Roadblock: Orchestration Makes Processes More Complex

Data value chains are inherently complex, with multiple data sources, ingestion processes, transformations, and data consumers. Data orchestration does not introduce complexity; it provides visibility and control over this complexity. 

It may seem reasonable to postpone data orchestration in the short term. But every mature data organization, both large and small, eventually needs to scale. By building data orchestration into the data platform from the start, you set up your teams for success, reduce firefighting, and avoid costly and time-consuming rework. Most importantly, the business receives trustworthy insights faster.

How to Implement Data Orchestration Successfully

Implementing data orchestration doesn’t have to be complicated. The key is to approach it strategically, ensuring that every process is aligned, visible, and scalable.

Step 1: Start with a Clear, Business-Aligned Plan

Begin by mapping your existing data processes and identifying where inefficiencies or risks exist. Knowing exactly how data flows across teams and tools allows you to prioritize the areas that will benefit most from orchestration.

Key outcomes:

  • Clear understanding of current workflows
  • Prioritized areas for improvement
  • Better alignment across teams
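One way to start the mapping exercise in Step 1 is to write down each step, the scheduler that triggers it, and what it depends on, then look for hand-offs that cross scheduler boundaries. The sketch below assumes a hypothetical inventory; the step names, scheduler labels, and the `unmanaged_handoffs` helper are illustrative only.

```python
# Hypothetical inventory: each step, the scheduler that triggers it,
# and the steps it depends on.
inventory = {
    "ingest_orders": {"scheduler": "ingestion tool cron", "upstream": []},
    "dbt_run": {"scheduler": "dbt job schedule", "upstream": ["ingest_orders"]},
    "dbt_test": {"scheduler": "dbt job schedule", "upstream": ["dbt_run"]},
    "refresh_dashboards": {"scheduler": "BI tool schedule", "upstream": ["dbt_test"]},
}


def unmanaged_handoffs(steps: dict) -> list[tuple[str, str]]:
    """Return (upstream, downstream) pairs triggered by different schedulers."""
    handoffs = []
    for name, info in steps.items():
        for upstream in info["upstream"]:
            if steps[upstream]["scheduler"] != info["scheduler"]:
                handoffs.append((upstream, name))
    return handoffs


risky = unmanaged_handoffs(inventory)
print(risky)
```

Each pair returned is a place where a downstream job starts on a timer, not because its upstream finished successfully, so these are natural first candidates for orchestration.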

Step 2: Automate High-Value Workflows First

Focus first on automating repetitive and error-prone steps such as data collection, cleaning, and routing. Automation reduces manual effort, frees up your team for higher-value work, and ensures processes run consistently.

Key outcomes:

  • Reduced manual effort
  • More consistent execution
  • Teams freed for strategic work

Step 3: Build Cross-Pipeline Visibility and Monitoring

Implement dashboards or monitoring tools that provide executives and teams with real-time visibility into data flows. Early detection of errors prevents costly mistakes and increases confidence in the insights being delivered.

Key outcomes:

  • Faster error detection
  • Increased trust in insights
  • Smoother incident response
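As a rough illustration of the monitoring in Step 3, the sketch below scans the latest run of each step and flags failures and stale data. The record layout, the 24-hour freshness threshold, and the `find_problems` function are assumptions made for this example; in practice this information would come from the orchestrator's own metadata.

```python
from datetime import datetime, timedelta, timezone


def find_problems(runs: list[dict], now: datetime, max_age: timedelta) -> list[str]:
    """Flag steps whose latest run failed or finished too long ago."""
    problems = []
    for run in runs:
        if run["status"] != "success":
            problems.append(f"{run['step']}: last run {run['status']}")
        elif now - run["finished_at"] > max_age:
            problems.append(f"{run['step']}: data is stale")
    return problems


# Fixed timestamp so the example is reproducible.
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
latest_runs = [
    {"step": "ingest_orders", "status": "success", "finished_at": now - timedelta(hours=1)},
    {"step": "dbt_run", "status": "failed", "finished_at": now - timedelta(hours=1)},
    {"step": "refresh_dashboards", "status": "success", "finished_at": now - timedelta(hours=30)},
]

issues = find_problems(latest_runs, now, max_age=timedelta(hours=24))
print(issues)
```

A check like this surfaces both kinds of surprise executives care about: a step that broke, and a dashboard that looks healthy but has not been refreshed.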

Step 4: Scale Gradually with Dependencies and Governance

Start small with high-impact processes and expand orchestration across more workflows over time. Scaling gradually ensures that teams adopt the changes effectively and that processes remain manageable as data volume grows.

Key outcomes:

  • More predictable scaling
  • Stable workflows as volume grows
  • Stronger process governance

Step 5: Choose Tools that Align with Ingestion & dbt Scheduling

Select tools that integrate well with your existing systems and provide flexibility for future growth. Popular choices include Airflow for orchestration and dbt for transformation, but the best choice depends on your organization’s specific workflows and needs. Explore how these capabilities come packaged in the Datacoves Platform Features overview.

Key outcomes:

  • Better tool compatibility
  • Lower integration overhead
  • Simpler long-term evolution of your stack

Top Benefits and ROI of Data Orchestration

Investing in data orchestration delivers tangible business value. Organizations that implement orchestration gain efficiency, reliability, and confidence in their decision-making.

Improved Efficiency Across Data & Analytics Teams

Data orchestration reduces manual work, prevents duplicated efforts, and streamlines processes. Teams can focus on higher-value initiatives instead of firefighting data issues.

More Reliable, Trustworthy Insights

With coordinated workflows and monitoring, executives and stakeholders can trust the data they rely on. Decisions are backed by accurate, timely, and actionable insights.

Reduced Operational Costs and Technical Debt

By embedding data orchestration early, organizations avoid expensive rework, reduce errors, and prevent the accumulation of technical debt from ad hoc solutions.

Faster Innovation and Scalable Growth

Data orchestration ensures that data pipelines scale smoothly as the organization grows. Teams can launch new analytics initiatives faster, confident that their underlying processes are robust and repeatable.

Enhanced Visibility Across the Data Lifecycle

Executives gain a clear view of the entire data lifecycle, enabling better oversight, risk management, and strategic planning.

Final Recommendation: Orchestration Is the Foundation, Not the Finish Line

Data orchestration should not be seen as a “nice to have” feature that can be postponed. Mature organizations understand that it is the foundation needed to deliver trusted insights faster. Without it, companies risk siloed tools, more data firefighting, and eroding trust in both the data and the data team. With it, organizations gain visibility, agility, and the confidence that insights fueling decisions are accurate.

The real question for strategic leaders is whether to try to piece together disconnected solutions, focusing only on short-term wins, or invest in data orchestration early and unlock the full potential of a connected ecosystem.

For executives, prioritizing data orchestration will mean fewer data fires, accelerated innovation, and an environment where trusted insights flow as reliably as the business demands. 

To see how orchestration is built into the Datacoves platform, visit our Integrated Orchestration page.

Don’t wait until complexity forces your hand. Your team deserves to move faster and fight fewer fires.

Book a personalized demo to see how data orchestration with Datacoves helps leaders unlock value from day one.

New Features from the Databricks AI Summit 2025
5 mins read

The Databricks AI Summit 2025 revealed a major shift toward simpler, AI-ready, and governed data platforms. From no-code analytics to serverless OLTP and agentic workflows, the announcements show Databricks is building for a unified future.

In this post, we break down the six most impactful features announced at the summit and what they mean for the future of data teams.

1. Databricks One and Genie: Making Analytics Truly Accessible

Databricks One (currently in private preview) introduces a no-code analytics platform aimed at democratizing access to insights across the organization. Powered by Genie, users can now interact with business data through natural language Q&A, no SQL or dashboards required. By lowering the barrier to entry, tools like Genie can drive better, faster decision-making across all functions.

Datacoves Take: As with any AI we have used to date, having a solid foundation is key. AI cannot solve ambiguous metrics and a lack of knowledge. As we have mentioned, there are some dangers in trusting AI, and these caveats still exist.

2. Lakebase: A Serverless Postgres for the Lakehouse

In a bold move, Databricks launched Lakebase, a Postgres-compatible, serverless OLTP database natively integrated into the lakehouse. Built atop the foundations laid by the NeonDB acquisition, Lakebase reimagines transactional workloads within the unified lakehouse architecture. This is more than just a database release; it’s a structural shift that brings transactional (OLTP) and analytical (OLAP) workloads together, unlocking powerful agentic and AI use cases without architectural sprawl. 

Datacoves Take: We see both Databricks and Snowflake integrating Postgres into their offering. DuckLake is also demonstrating a simpler future for Iceberg catalogs. Postgres has a strong future ahead, and the unification of OLAP and OLTP seems certain.

3. Agent Bricks: From Prototype to Enterprise-Ready AI Agents

With the introduction of Agent Bricks, Databricks is making it easier to build, evaluate, and operationalize agents for AI-driven workflows. What sets this apart is the use of built-in “judges” - LLMs that automatically assess agent quality and performance. This moves agents from hackathon demos into the enterprise spotlight, giving teams a foundation to develop production-grade AI assistants grounded in company data and governance frameworks.

Datacoves Take: This looks interesting, and the key here still lies in having a strong data foundation with good processes. Reproducibility is also key. Testing and proving that the right actions are performed will be important for any organization implementing this feature.

4. Databricks Apps: Interfaces That Inherit Governance by Design

Databricks introduced Databricks Apps, allowing developers to build custom user interfaces that automatically respect Unity Catalog permissions and metadata. A standout demo showed glossary terms appearing inline inside Chrome, giving business users governed definitions directly in the tools they use every day. This bridges the gap between data consumers and governed metadata, making governance feel less like overhead and more like embedded intelligence.

Datacoves Take: Metadata and catalogs are important for AI, so we see both Databricks and Snowflake investing in this area. As with any of these changes, technology is not the only change needed in the organization. Change management is also important. Without proper stewardship, ownership, and review processes, apps can’t provide the experience promised.

5. Unity Catalog Enhancements: Open Governance at Scale

Unity Catalog took a major step forward at the Databricks AI Summit 2025, now supporting managed Apache Iceberg tables, cross-engine interoperability, and introducing Unity Catalog Metrics to define and track business logic across the organization.

This kind of standardization is critical for teams navigating increasingly complex data landscapes. By supporting both Iceberg and Delta formats, enabling two-way sync, and contributing to the open-source ecosystem, Unity Catalog is positioning itself as the true backbone for open, interoperable governance.

Datacoves Take: The Iceberg data format has the momentum behind it; now it is up to the platforms to enable true interoperability. Organizations are expecting a future where a table can be written and read from any platform. DuckLake is also getting in the game, simplifying how metadata is managed and how multi-table transactions are enabled. It will be interesting to see if Unity and Polaris take some of the DuckLake learnings and integrate them in the next few years.

6. Forever-Free Tier and $100M AI Training Fund

In a community-building move, Databricks introduced a forever-free edition of the platform and committed $100 million toward AI and data training. This massive investment creates a pipeline of talent ready to use and govern AI responsibly. For organizations thinking long-term, this is a wake-up call: governance, security, and education need to scale with AI adoption, not follow behind.

Datacoves Take: This feels like a good way to get more people to try Databricks without a big commitment. Hopefully, competitors take note and do the same. This will benefit the entire data community.

Read the full post from Databricks here:
https://www.databricks.com/blog/summary-dais-2025-announcements-through-lens-games

What Data Leaders Must Do Next After Databricks AI Summit 2025

Democratizing Data Access Is Critical

With tools like Databricks One and Genie enabling no-code, natural language analytics, data leaders must prioritize making insights accessible beyond technical teams to drive faster, data-informed decisions at every level.

Simplify and Unify Data Architecture

Lakebase’s integration of transactional and analytical workloads signals a move toward simpler, more efficient data stacks. Leaders should rethink their architectures to reduce complexity and support real-time, AI-driven applications.

Operationalize AI Agents for Business Impact

Agent Bricks and built-in AI judges highlight the shift from experimental AI agents to production-ready, measurable workflows. Data leaders need to invest in frameworks and governance to safely scale AI agents across use cases.

Governance Must Span Formats and Engines

Unity Catalog’s expanded support for Iceberg, Delta, and cross-engine interoperability emphasizes the need for unified governance frameworks that handle diverse data formats while maintaining business logic and compliance.

Invest in Talent and Training to Keep Pace

The launch of a free tier and $100M training fund underscores the growing demand for skilled data and AI practitioners. Data leaders should plan for talent development and operational readiness to fully leverage evolving platforms.

The Road Ahead: Operationalizing AI the Datacoves Way

The Databricks AI Summit 2025 signals a fundamental shift: from scattered tools and isolated workflows to unified, governed, and AI-native platforms. It’s not just about building smarter systems; it’s about making those systems accessible, efficient, and scalable for the entire organization.

While these innovations are promising, putting them into practice takes more than vision; it requires infrastructure that balances speed, control, and usability.

That’s where Datacoves comes in.

Our platform accelerates the adoption of modern tools like dbt, Airflow, and emerging AI workflows, without the overhead of managing complex environments. We help teams operationalize best practices from day one, reducing total cost of ownership while enabling faster delivery, tighter governance, and AI readiness at scale. Datacoves supports Databricks, Snowflake, BigQuery, and any data platform with a dbt adapter. We believe in an open and interoperable future where tools are integrated without increasing vendor lock-in. Talk to us to find out more.

Want to learn more? Book a demo with Datacoves.

Snowflake Summit 2025
5 mins read

It is clear that Snowflake is positioning itself as an all-in-one platform—from data ingestion, to transformation, to AI. The announcements covered a wide range of topics, with AI mentioned over 60 times during the 2-hour keynote. While time will tell how much value organizations get from these features, one thing remains clear: a solid foundation and strong governance are essential to deliver on the promise of AI.

Snowflake Intelligence (Public Preview)

Conversational AI via natural language at ai.snowflake.com, powered by Anthropic/OpenAI LLMs and Cortex Agents, unifying insights across structured and unstructured data. Access is available through your account representative.  

Datacoves Take: Companies with strong governance—including proper data modeling, clear documentation, and high data quality—will benefit most from this feature. AI cannot solve foundational issues, and organizations that skip governance will struggle to realize its full potential.

Data Science Agent (Private Preview)

An AI companion for automating ML workflows—covering data prep, feature engineering, model training, and more.

Datacoves Take: This could be a valuable assistant for data scientists, augmenting rather than replacing their skills. As always, we'll be better able to assess its value once it's generally available.

Cortex AISQL (Public Preview)

Enables multimodal AI processing (like images, documents) within SQL syntax, plus enhanced Document AI and Cortex Search.

Datacoves Take: The potential here is exciting, especially for teams working with unstructured data. But given historical challenges with Document AI, we’ll be watching closely to see how this performs in real-world use cases.

AI Observability in Cortex AI (GA forthcoming)

No-code monitoring tools for generative AI apps, supporting LLMs from OpenAI (via Azure), Anthropic, Meta, Mistral, and others.

Datacoves Take: Observability and security are critical for LLM-based apps. We’re concerned that the current rush to AI could lead to technical debt and security risks. Organizations must establish monitoring and mitigation strategies now, before issues arise 12–18 months down the line.

Snowflake Openflow (GA on AWS)

Managed, extensible multimodal data ingestion service built on Apache NiFi with hundreds of connectors, simplifying ETL and change-data capture.

Datacoves Take: While this simplifies ingestion, GUI tools often hinder CI/CD and code reviews. We prefer code-first tools like dlt that align with modern software development practices. Note: Openflow requires additional AWS setup beyond Snowflake configuration.

dbt Projects on Snowflake (Public Preview)

Native dbt development, execution, and monitoring, with Git integration and AI-assisted code in Snowsight Workspaces.

Datacoves Take: While this makes dbt more accessible for newcomers, it’s not a full replacement for the flexibility and power of VS Code. Our customers rely on VS Code not just for dbt, but also for Python ingestion development, managing security as code, orchestration pipelines, and more. Datacoves provides an integrated environment that supports all of this—and more. See this walkthrough for details: https://www.youtube.com/watch?v=w7C7OkmYPFs

Enhanced Apache Iceberg support (Public/Private Preview)

Read/write Iceberg tables via Open Catalog, dynamic pipelines, VARIANT support, and Merge-on-Read functionality.

Datacoves Take: Interoperability is key. Many of our customers use both Snowflake and Databricks, and Iceberg helps reduce vendor lock-in. Snowflake’s support for Iceberg with advanced features like VARIANT is a big step forward for the ecosystem.

Modern DevOps extensions

Custom Git URLs, Terraform provider now GA, and Python 3.9 support in Snowflake Notebooks.

Datacoves Take: Python 3.9 is a good start, but we’d like to see support for newer versions. With PyPI integration, teams must carefully vet packages to manage security risks. Datacoves offers guardrails to help organizations scale Python workflows safely.

Snowflake Semantic Views (Public Preview)

Define business metrics inside Snowflake for consistent, AI-friendly semantic modeling.

Datacoves Take: A semantic layer is only as good as the underlying data. Without solid governance, it becomes another failure point. Datacoves helps teams implement the foundations—testing, deployment, ownership—that make semantic layers effective.

Standard Warehouse Gen2 (GA)

Hardware and performance upgrades delivering ~2.1× faster analytics for updates, deletes, merges, and table scans.

Datacoves Take: Performance improvements are always welcome, especially when easy to adopt. Still, test carefully—these upgrades can increase costs, and in some cases existing warehouses may still be the better fit.

SnowConvert AI

Free, automated migration of legacy data warehouses, BI systems, and ETL pipelines with code conversion and validation.

Datacoves Take: These tools are intriguing, but migrating platforms is a chance to rethink your approach—not just lift and shift legacy baggage. Datacoves helps organizations modernize with intention.

Cortex Knowledge Extensions (GA soon)

Enrich native apps with real-time content from publishers like USA TODAY, AP, Stack Overflow, and CB Insights.

Datacoves Take: Powerful in theory, but only effective if your core data is clean. Before enrichment, organizations must resolve entities and ensure quality.

Sharing of Semantic Models (Private Preview)

Internal/external sharing of AI-ready datasets and models, with natural language access across providers.

Datacoves Take: Snowflake’s sharing capabilities are strong, but we see many organizations underutilizing them. Effective sharing starts with trust in the data—and that requires governance and clarity.

Agentic Native Apps Marketplace

Developers can build and monetize Snowflake-native, agent-driven apps using Cortex APIs.

Datacoves Take: Snowflake has long promoted its app marketplace, but adoption has been limited. We’ll be watching to see if the agentic model drives broader use.

Improvements to Native App Framework

Versioning, permissions, app observability, and compliance badging enhancements.

Datacoves Take: We’re glad to see Snowflake adopting more software engineering best practices—versioning, observability, and security are all essential for scale.

Snowflake Adaptive Compute (Private Preview)

Auto-scaling warehouses with intelligent routing for performance optimization without cost increases.

Datacoves Take: This feels like a move toward BigQuery’s simplicity model. We’ll wait to see how it performs at scale. As always, test before relying on this in production.

Horizon Catalog Interoperability & Copilot (Private Preview)

Enhanced governance across Iceberg tables, relational DBs, dashboards, with natural-language metadata assistance.

Datacoves Take: Governance is core to successful data strategy. While Horizon continues to improve, many teams already use mature catalogs. Datacoves focuses on integrating metadata, ownership, and lineage across tools—not locking you into one ecosystem.

Security enhancements

Trust Center updates, new MFA methods, password protections, and account-level security improvements.

Datacoves Take: The move to enforce MFA and support for Passkeys is a great step. Snowflake is making it easier to stay secure—now organizations must implement these features effectively.

Enhanced observability tools

Upgrades to Snowflake Trail, telemetry for Openflow, and debug/monitor tools for Snowpark containers and GenAI agents/apps.

Datacoves Take: Observability is critical. Many of our customers build their own monitoring to manage costs and data issues. With these improvements, Snowflake is catching up—and Datacoves complements this with pipeline-level observability, including Airflow and dbt.

Read the full post from Snowflake here:
https://www.snowflake.com/en/blog/announcements-snowflake-summit-2025/

Build vs buy Data Platform
5 mins read

The modern data stack promised to simplify everything. Pick best-in-class tools, connect them, and ship insights. The reality for most data teams looks different: months spent configuring Kubernetes, debugging Airflow dependencies, and managing Python environments before a single pipeline runs in production. Who manages the infrastructure around those tools matters more than which tools you pick.

This article breaks down the build vs. buy decision for the two tools at the core of every modern data platform: dbt Core for transformation and Apache Airflow for orchestration. Both are open source. Both are powerful. And both are significantly harder and more expensive to self-host than most teams anticipate.

What Does "Build vs. Buy" Actually Mean for Data Teams?

In the context of the modern data stack, this decision is not about building software from scratch. dbt Core and Apache Airflow already exist. They are battle-tested, open source, and free to use under permissive licenses.

The real question is: who manages the infrastructure that makes them run in production?

What "Build" Really Means

Building means your team owns the infrastructure. You provision and manage Kubernetes clusters, configure Git sync for DAGs, handle Python virtual environments, manage secrets, set up CI/CD pipelines, and keep everything running as tools release new versions. The tools are free. The operational burden is not.

What "Buy" Really Means

Buying means a managed platform handles that infrastructure for you. Vendors like dbt Cloud, MWAA, Astronomer, and Datacoves build on top of the open-source foundation and manage the environment so your team does not have to. For a detailed feature comparison, see dbt Core vs dbt Cloud. You trade some control for significantly less operational overhead. The key word is "some": the best managed platforms give up very little flexibility while eliminating most of the burden.

This raises the important question: Should you self-manage or pay for your open-source analytics tools?

Build vs. Buy: The Real Tradeoffs

Both options have legitimate strengths. The right call depends on your team's size, technical depth, compliance requirements, and how much platform maintenance you can absorb without slowing down delivery. Here is a look at each.

The Case for Building In-House

The primary argument for building is control. Your team owns every configuration decision: how secrets are stored, how DAGs are synced, how environments are structured, and how tools integrate with your existing systems. For organizations with specialized workflows that no managed platform supports, this matters.

The tradeoff is real and significant. A production-grade Airflow deployment on Kubernetes requires deep DevOps expertise. You will spend weeks on initial setup before writing a single DAG. Ongoing maintenance, dependency management, version upgrades, and security hardening become a permanent part of your team's workload. And when the engineer who built it leaves, that institutional knowledge walks out the door.

Building also means your team is running version 1 of your own platform. Edge cases, security gaps, and scaling issues will surface in production. That is not a risk with a managed solution that has been hardened across many enterprise deployments.

The Case for Buying a Managed Platform

Managed platforms eliminate the infrastructure burden so your team can focus on what actually drives business value: building data models, delivering pipelines, and getting insights to stakeholders faster.

The common concern is flexibility. Many managed platforms lock you into standardized workflows, limit your tool choices, or make migration difficult. That concern is valid for some vendors, not the category as a whole. The right question is not "build or buy" but "which managed platform gives us the control we need without the overhead we do not want."

A well-chosen managed platform gets your team writing and running code in days, not months. It handles upgrades, secrets management, CI/CD scaffolding, and environment consistency. And unlike version 1 of your homegrown solution, it has already solved the edge cases you have not encountered yet.

Open Source Is Not Free: The Hidden Costs of Self-Hosting

Open source looks free the way a free puppy looks free. The license costs nothing. Everything that comes after it does. For most data teams, self-hosting dbt Core and Airflow on Kubernetes carries high hidden costs in engineering time alone, before infrastructure spend.

For dbt and Airflow, the real costs fall into three categories: engineering time, security and compliance, and scaling complexity. Most teams underestimate all three.

Before diving into each category, here is what self-hosting dbt Core and Airflow actually costs your team:

  • Weeks of initial setup before a single pipeline runs in production
  • $5,250 to $28,830 per month in engineering salaries spent on platform management
  • Kubernetes expertise required for deployment and scaling
  • Security and compliance implementation from scratch
  • Ongoing dependency management and version upgrades
  • Institutional knowledge loss every time an engineer leaves
  • Extended downtime costs when things break at scale

Engineering Time and Expertise

Setting up a production-grade Airflow environment on Kubernetes is not a weekend project. Teams routinely spend weeks configuring DAG sync via Git or S3, managing Python virtual environments, wiring up secrets management, and debugging dependency conflicts before anything runs reliably.

Then there is the ongoing cost. Upgrades, incident response, onboarding new engineers, and keeping the environment consistent across developers all consume time that could be spent delivering data products. A senior data engineer earns between $126,000 and $173,000 per year (Glassdoor, ZipRecruiter). For a team of two to four engineers spending 25 to 50 percent of their time on platform management, that's $5,250 to $28,830 per month in engineering costs alone, before a dollar of infrastructure spend. And that's assuming no one leaves. For a deeper breakdown of what these tools actually cost to run, see what open source analytics tools really cost.
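The arithmetic behind that range can be sketched in a few lines of Python. This is a rough illustration rather than a costing model: the function name `monthly_platform_cost` is hypothetical, and the sketch assumes salary is the only cost, with no benefits, taxes, or infrastructure spend.

```python
def monthly_platform_cost(engineers: int, annual_salary: float, time_share: float) -> float:
    """Monthly salary cost of the time a team spends on platform management.

    Hypothetical helper: salary only, no benefits or infrastructure.
    """
    if not 0 <= time_share <= 1:
        raise ValueError("time_share must be between 0 and 1")
    return engineers * (annual_salary / 12) * time_share


# Low end: 2 engineers at $126,000 spending 25% of their time on the platform.
low = monthly_platform_cost(2, 126_000, 0.25)
# High end: 4 engineers at $173,000 spending 50% of their time on the platform.
high = monthly_platform_cost(4, 173_000, 0.50)

print(f"${low:,.0f} to ${high:,.0f} per month")
```

Under these assumptions the low end comes to $5,250 and the high end to about $28,833, which matches the range quoted above after rounding. Changing the team size or time share shows how quickly the figure moves.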

A managed platform can have your team writing and running code in days. Datacoves helped J&J set up their data stack in weeks, with full visibility and automation from day one.

Security and Compliance Overhead

With open-source tools, your team is responsible for implementing security best practices from the ground up. Secrets management, credential rotation, SSO integration, audit logging, and network isolation do not come preconfigured. Each one requires research, implementation, and ongoing maintenance.

For regulated industries like healthcare, finance, or government, compliance requirements add another layer. Meeting HIPAA, SOX, or internal governance standards through a self-managed stack is a process of iteration and refinement. Every hour spent here is an hour not spent on data products, and every gap is a potential audit finding.

Scaling Complexity

Scaling a self-hosted Airflow deployment means scaling your Kubernetes expertise alongside it. As DAG count grows, as team size increases, and as pipeline complexity compounds, the operational surface area expands. Memory issues, worker contention, and environment drift become recurring problems.

Extended downtime at scale is not just an engineering problem. Business users who depend on fresh data feel it directly. The hidden cost is not just the engineering hours spent fixing it. It is the trust lost with stakeholders when the data is late or wrong.

The Case for Buying a Managed Platform

The strongest argument for a managed platform is compounding speed, not convenience.

Every week your team spends managing infrastructure is a week not spent building data products. That gap compounds. A team that gets into production in days instead of months delivers more value, builds more trust with stakeholders, and develops faster than one still debugging Kubernetes configurations three months in.

Managed platforms handle the infrastructure layer your team should not be owning: upgrades, secrets management, environment consistency, CI/CD scaffolding, and scaling. What used to take months of setup is available on day one. And because you are running a platform that has been hardened across many enterprise deployments, the edge cases have already been solved.

The reliability argument matters too. Your homegrown solution is version 1. A mature managed platform is version 1,000. The difference shows up in production at the worst possible times.

The Vendor Lock-in Question

The most common objection to buying is vendor lock-in. It is a legitimate concern, and it applies to some platforms more than others.

The risk is real when a managed platform abstracts away the underlying tools with a proprietary layer, when you do not own your code and metadata, or when switching providers requires a full rebuild. Some vendors in this space do exactly that.

The risk is low when the platform is built on open-source tooling at the core, when you retain full ownership of your code, models, and DAGs, and when the architecture is designed to be warehouse and tool agnostic. Before signing with any vendor, ask three questions: Can I see the underlying dbt Core and Airflow configurations? Do I own everything I build? Can I swap components as my stack evolves?

If the answers are yes, lock-in is not the risk. Slow delivery is.

Where Managed Platforms Fall Short

Pipeline orchestration and transformation do not exist in isolation. For a deeper look at how dbt and Airflow work together as a unified pair, see dbt and Airflow: The Natural Pair for Data Analytics.

Not all managed platforms are built for enterprise complexity. Some are designed for fast starts, not long-term scale. The most common failure modes are rigid workflow standardization that does not match how your team actually works, SaaS-only deployment that cannot meet strict data sovereignty requirements, and limited support once the contract is signed.

MWAA, for example, manages Airflow infrastructure but still requires significant configuration to integrate with dbt and handle memory issues at scale. dbt Cloud covers the transformation layer well but uses per-seat pricing that scales steeply for larger teams and does not address orchestration. Neither covers the full data engineering lifecycle in a unified environment.

The right managed platform gives your tools a proper home.

Why Datacoves Is the Buy That Feels Like a Build

Datacoves was designed so you don't have to sacrifice control to get rid of the operational burden.

Datacoves is an end-to-end data engineering platform that runs entirely inside your cloud, under your security controls, and adapts to the tools your team already uses. It manages the infrastructure layer so your team does not have to, without locking you into a rigid workflow or a proprietary toolchain.

What Datacoves Actually Manages

Every developer gets the same consistent workspace from day one: in-browser VS Code, dbt Core, Python virtual environments, Git integration, CI/CD pipelines, and secrets management, all preconfigured and aligned to best practices. There is no weeks-long setup. There is no "figure it out yourself" onboarding. Your team opens the environment and everything works.

Managed Airflow covers both development and production. My Airflow gives individual developers a personal sandbox for fast iteration. Teams Airflow handles shared production orchestration, with DAG syncing from Git, built-in dbt operators, and simplified retry logic. Troubleshooting across the full pipeline, from ingestion through transformation to deployment, happens in one place.

Flexibility Without the Overhead

Datacoves is warehouse agnostic. It works with Snowflake, Databricks, BigQuery, Redshift, DuckDB, and any database with a dbt adapter. It supports dbt Mesh for multi-project, multi-team setups. It integrates with your existing identity provider, logging systems, and ingestion tools. You bring what you have. Datacoves manages the rest.

Unlike dbt Cloud, which is locked to its own runtime and per-seat pricing, or MWAA, which still requires significant configuration work, Datacoves covers the full data engineering lifecycle in a single environment. And because it is built entirely on open-source tooling, there is no proprietary layer trapping your code or your team.

The Private Cloud Advantage

For security-conscious and regulated organizations, Datacoves is the only managed platform in this category that can be deployed entirely within your private cloud account. Your data never leaves your environment. No VPC peering required. No external access to internal resources. Full SSO and role-based access integration with your existing security controls.

This is the difference between a platform that asks you to trust their security and one that puts security entirely in your hands. For teams in healthcare, finance, pharma, or government, that distinction is not a nice-to-have. It is a requirement.

Best Practices Built In

Beyond infrastructure, Datacoves brings a proven architecture foundation. Branching standards, CI/CD enforcement, secrets management patterns, deployment guardrails, and onboarding templates are all pre-baked into the platform. Your team does not need to research and implement best practices from scratch. They inherit them on day one.

Dedicated onboarding, a Resident Solutions Architect on call, and white-glove support mean that best practices do not stay with the champion who led the evaluation. They spread across the whole team. Most tool purchases don't change how a team works. This one does.

Standardized environments and templates reduce onboarding time significantly. Guitar Center onboarded in days, not months, with their full data stack running on Datacoves from the start.

Build makes sense when:

  • Your team has dedicated DevOps and infrastructure engineers with Kubernetes expertise
  • Your workflows have highly specialized requirements no managed platform supports
  • You have the long-term capacity to maintain the platform without sacrificing delivery velocity

Buy makes sense when:

  • Your team's primary job is delivering data products, not managing infrastructure
  • You operate in a regulated industry with strict data sovereignty requirements
  • You need to onboard engineers quickly and consistently
  • You want best practices built in from day one without researching and implementing them yourself

Conclusion: Stop Building What You Should Be Buying

The build vs. buy question is really a resource allocation question. What should your team own, and what should be managed for you?

The answer for most data teams is clear. Own your data models, your business logic, your stakeholder relationships and your architecture decisions. Do not own Kubernetes clusters, Airflow upgrades, and CI/CD pipeline scaffolding. That work consumes engineering time without delivering business value, and it compounds the longer you wait to address it.

As Joe Reis and Matt Housley argue in Fundamentals of Data Engineering, data teams should prioritize extracting value from data rather than managing the tools that support them. The teams that move fastest are not the ones who built the most. They are the ones who made smart decisions about what not to build.

Open source isn't free, and self-hosting is harder than it looks. And the gap between a working proof of concept and a production-grade, secure, scalable data platform is wider than most teams expect until they are already in it.

Datacoves closes that gap. It gives your team the flexibility of a custom build, the reliability of a mature platform, and the security of a private cloud deployment, without the operational burden that makes building so expensive. Your team focuses on data products. Datacoves handles everything underneath them.

If your team is spending more time managing infrastructure than building pipelines, that’s the signal. See Datacoves in action and discover how teams simplify their data platform so they can focus on building, not maintaining.

dbt pricing build vs buy
5 mins read

Organizations often opt for open-source tools because "free" seems like an easy decision, especially compared to the higher price of managed versions of the same tooling. However, as with many things, there is no such thing as a free lunch. It is easy to say that Airflow and dbt pricing is $0 and therefore a cost-saving choice, but hidden expenses that are hard to ignore quickly surface.

dbt Core and Apache Airflow are a natural pair in modern data analytics. dbt Core simplifies SQL-based data transformations, empowering data teams to create and maintain clean, well-documented, structured pipelines. Apache Airflow takes care of orchestrating these workflows, automating the movement and processing of data through the data engineering life cycle. Together, they can drive a powerful analytics stack that’s flexible and scalable—when used correctly. But this flexibility often comes at a price.  

In this article, we’ll examine the build vs. buy dilemma, highlighting the flexibility and true costs of open-source tools like dbt Core and Apache Airflow. We’ll also compare them to managed solutions such as dbt Cloud and Datacoves, providing the insights you need to evaluate the trade-offs and choose the best option for your organization.

dbt and Airflow pricing

Open-source dbt pricing

The open-source tool dbt is free to download and use. However, the actual cost emerges when considering the technical resources required for effective implementation and management. Tasks such as setting up infrastructure, ensuring scalability, and maintaining the tool demand skilled engineers.

Assuming a team of 2–4 engineers is responsible for these tasks, with annual salaries ranging from $120,000 to $160,000 (approximately $10,000 to $13,000 per month), even dedicating 25–50% of their time to managing dbt Core results in a monthly cost of $5,000 to $26,000. As your use of dbt scales, you may need to hire a dedicated team to manage the open-source solution full-time, leading to costs equating to 100% of their combined salaries.
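As a rough sketch of how that estimate grows once the team becomes dedicated, the snippet below uses the rounded monthly salaries from the paragraph above. The helper name `cost_range` is hypothetical, and the figures cover salary only.

```python
# Rounded monthly salaries from the estimate above (USD per engineer).
MONTHLY_SALARY_LOW = 10_000
MONTHLY_SALARY_HIGH = 13_000


def cost_range(min_engineers, max_engineers, min_share, max_share):
    """Return (low, high) monthly cost for a team-size and time-share range."""
    low = min_engineers * MONTHLY_SALARY_LOW * min_share
    high = max_engineers * MONTHLY_SALARY_HIGH * max_share
    return low, high


part_time = cost_range(2, 4, 0.25, 0.50)  # platform work shared with other duties
dedicated = cost_range(2, 4, 1.00, 1.00)  # full-time platform team

for label, (low, high) in [("25-50% of time", part_time), ("100% of time", dedicated)]:
    print(f"{label}: ${low:,.0f} to ${high:,.0f} per month")
```

With these inputs, the part-time range is $5,000 to $26,000 per month, and a fully dedicated team of the same size costs $20,000 to $52,000 per month.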

So we can begin to see the true open-source dbt pricing, especially at scale. In addition to engineering labor, there are other costs, such as the time and effort required to maintain and scale the platform. More on that later.

dbt Cloud Pricing

On engineering cost alone, we can begin to compare the open-source and managed solutions. dbt Labs offers a hosted solution, dbt Cloud, with added features and tiered pricing options.

  • Developer Plan: Best for individual users, this free tier includes a single developer seat, up to 3,000 models built per month, and support for one dbt project.
  • Team Plan: Aimed at small to mid-sized teams, this plan supports up to 8 developer seats, one dbt project, and 15,000 models built per month. Priced at $100/user/month, additional model builds cost $0.01 per model.
  • Enterprise Plan: Designed for larger organizations, and the plan most medium and large organizations will need, this tier supports unlimited users and projects. The MSRP jumps to a whopping $4,800 per seat and includes a fixed number of model runs, after which the $0.01 per model price also applies.
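To make the Team plan figures above concrete, here is a minimal sketch of the monthly bill. It uses only the numbers listed in this article, which may not match current dbt Labs pricing, and the function name `team_plan_monthly_cost` is hypothetical.

```python
SEAT_PRICE = 100          # USD per developer seat per month (Team plan, as listed above)
INCLUDED_MODELS = 15_000  # models built per month included in the plan
OVERAGE_CENTS = 1         # cost per additional model built, in cents ($0.01)
MAX_SEATS = 8             # seat limit on the Team plan


def team_plan_monthly_cost(seats: int, models_built: int) -> float:
    """Estimate the monthly Team plan cost in USD from seats and model builds."""
    if not 1 <= seats <= MAX_SEATS:
        raise ValueError(f"Team plan supports 1 to {MAX_SEATS} seats")
    extra_models = max(0, models_built - INCLUDED_MODELS)
    # Work in cents for the overage to avoid floating-point rounding surprises.
    return seats * SEAT_PRICE + extra_models * OVERAGE_CENTS / 100


# 8 seats building 40,000 models: 25,000 models over the included amount.
print(team_plan_monthly_cost(8, 40_000))
```

In this example the seats cost $800 and the 25,000 extra model builds add $250, for $1,050 per month. Usage-based overage is the part that is easy to overlook when comparing against a flat per-seat number.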

Opting for a managed solution allows your organization to cut engineering costs or free your engineers to focus on other projects. However, while dbt Cloud reduces the infrastructure burden somewhat, it only focuses on the T of ELT. You still need engineers to manage the other pieces of the stack, which can result in a disconnected data pipeline.

Open-source Airflow

It is worth noting that some companies decide to use dbt Cloud for its scheduler feature, which can quickly become limiting as workflows become more complex. The next step is always a full-fledged orchestrator such as Airflow.

Just like dbt Core, Apache Airflow is also free to use, but the true cost comes from deploying and maintaining it securely and at scale, which requires significant expertise, particularly in areas like Kubernetes, dependency management, and high-availability configurations.

Assuming 2–4 engineers with annual salaries between $130,000 and $170,000 (around $11,000 to $14,000 per month) dedicate 25–50% of their time to Airflow, the monthly cost ranges from $5,500 to $28,000. The pattern we saw with dbt Core rings true here as well. As your workflows grow, hiring a dedicated team to manage Airflow becomes necessary, leading to costs equating to 100% of their salaries.

Managed Airflow from AWS, MWAA

For teams looking to sidestep the complexities of managing Airflow in-house, managed solutions provide an appealing alternative:

  • AWS Managed Workflows for Apache Airflow (MWAA): A managed Airflow service from Amazon, MWAA simplifies deployment and scaling but has variable pricing based on environment size and execution time, which can make costs unpredictable.
  • Other Providers: Options like Astronomer and Google Cloud Composer offer similar managed Airflow solutions, each with unique features, performance considerations, and pricing structures.

A managed Airflow solution typically costs between $5,000 and $15,000 per year, depending on workload, resource requirements, and the number of Airflow instances. By choosing a managed solution, organizations can save on infrastructure maintenance and reduce the overall operational burden.
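Because the self-managed estimate is quoted per month and the managed fee per year, it helps to put both on the same basis before comparing them. The sketch below takes the figures in this section at face value. It assumes the managed fee is the only cost on the managed side, which is a simplification, since some engineering time is still needed to write and operate DAGs.

```python
# Figures quoted in this section (USD).
self_managed_monthly = (5_500, 28_000)  # engineering time spent on self-hosted Airflow
managed_yearly = (5_000, 15_000)        # managed Airflow service fee


def to_monthly(yearly_range):
    """Convert a (low, high) yearly range to whole dollars per month."""
    return tuple(round(amount / 12) for amount in yearly_range)


managed_monthly = to_monthly(managed_yearly)

print(f"Self-managed: ${self_managed_monthly[0]:,} to ${self_managed_monthly[1]:,} per month")
print(f"Managed fee:  ${managed_monthly[0]:,} to ${managed_monthly[1]:,} per month")
```

On a monthly basis the managed fee works out to roughly $417 to $1,250, against $5,500 to $28,000 of engineering time for the self-managed option.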

The hidden costs of open-source tools

Setting up and managing infrastructure for Airflow and dbt Core isn’t as straightforward, or as “free,” as it might seem. The day-to-day work of managing Python virtual environments, keeping dependencies in check, and tackling scaling challenges requires ongoing expertise and attention. Beyond salaries and benefits, what starts as an open-source experiment can quickly morph into significant operational overhead full of hidden costs. Let’s look at three areas: time and expertise, security and compliance, and scaling complexities. If overlooked, each can lead to side effects such as extended downtime and security issues.

Time and expertise

The time it takes to configure, customize, and maintain a complex open-source solution is often underestimated. It’s not until your team is deep in the weeds (resolving issues, figuring out integrations, and troubleshooting configurations) that the actual costs start to surface. With each passing day your ROI is threatened. You want to start gathering insights from your data as soon as possible. Datacoves helped Johnson and Johnson set up their data stack in weeks.

And then there’s the learning curve. Not all engineers on your team will be senior, and turnover is inevitable. New hires will need time to get up to speed before they can contribute effectively. This is the human side of technology: while the tools themselves might move fast, people don’t. That ramp-up period, filled with training and trial-and-error, represents yet another hidden cost.

Security and compliance

Security and compliance add another layer of complexity. With open-source tools, your team is responsible for implementing best practices, like securely managing sensitive credentials with a solution like AWS Secrets Manager. Unlike managed solutions, these features don’t come prepackaged and need to be built and integrated with the system.

Compliance is no different. Ensuring your solution meets enterprise governance requirements takes time, research, and careful implementation. It’s a process of iteration and refinement, and every hour spent here is another hidden cost. If it is not done correctly, it also puts security at risk.

Scaling complexities

Scaling open-source tools is where things often get complicated. Beyond everything already mentioned, your team will need to ensure the solution can handle growth. For many organizations, this means deploying on Kubernetes. But with Kubernetes comes steep learning curves and operational challenges. Making sure you always have a knowledgeable engineer available to handle unexpected issues and downtime can become a challenge. Extended downtime is a hidden cost because business users who rely on your insights are directly impacted.

Comparing build vs. buy: Key tradeoffs

Throughout this article, we have uncovered the true costs of open-source tools, bringing us to the critical decision between building in-house or buying a managed solution. Even with the actual cost of open source in view, the decision isn’t just about price. It’s also about the flexibility a custom build offers.

Managed solutions often adopt a one-size-fits-all approach designed to attract the widest range of customers. While this can simplify implementation for many organizations, it may not always meet the specific needs of your team. To make an informed decision, let’s examine the key advantages and challenges of each approach.

Building In-House

Pros:

  • Customization: The biggest advantage of building in-house is the flexibility to customize the tool to fit your exact use case. You maintain full control, allowing you to align configurations with your organization’s unique needs. However, with great power comes great responsibility—your team must have a deep understanding of the tools, their options, and best practices.
  • Control: Owning the entire stack gives your team the ability to integrate deeply with existing systems and workflows, ensuring seamless operation within your ecosystem.
  • Cost Perception: Without licensing fees, building in-house may initially appear more cost-effective, particularly for smaller-scale deployments.

Cons:

  • High Upfront Investment: Setting up infrastructure requires a significant time commitment from developers. Tasks like configuring environments, integrating tools like Git or S3 for Airflow DAG syncing, and debugging can consume weeks of developer hours.
  • Operational Complexity: Ongoing maintenance—such as managing dependencies, handling upgrades, and ensuring reliability—can be overwhelming, especially as the system grows in complexity.
  • Skill Gaps: Many teams underestimate the level of expertise needed to manage Kubernetes clusters, Python virtual environments, and secure credential storage systems like AWS Secrets Manager.

Example:
A team building Airflow in-house may spend weeks configuring a Kubernetes-backed deployment, managing Python dependencies, and setting up DAG synchronization via S3 or Git. While the outcome can be tailored to their needs, the time and expertise required represent a significant investment.

Buying a managed solution

Pros:

  • Faster Time to Value: With a managed solution, your team can get up and running quickly without spending weeks—or months—on setup and configuration.
  • Reduced Operational Overhead: Managed providers handle infrastructure, maintenance, and upgrades, freeing your team to focus on business objectives rather than operational minutiae.
  • Predictable Costs: Managed solutions typically come with transparent pricing models, which can make budgeting simpler compared to the variable costs of in-house built tooling.

Cons:

  • Potentially Less Flexibility: Managed solutions may not allow for the same level of customization as building in-house, which could limit certain niche use cases.
  • Dependency on a Vendor: Relying on a vendor for your analytics stack introduces some level of risk, such as service disruptions or limited migration paths if you decide to switch providers.

Example:

Using a solution like MWAA, teams can leverage managed Airflow and eliminate infrastructure worries. However, it may not have the flexibility or interoperability they need with other aspects of their stack.

With a solution like Datacoves, by contrast, teams can leverage managed Airflow and pre-configured environments for dbt Core. This eliminates the need for infrastructure setup, simplifies day-to-day operations, and allows teams to focus on deriving value from their analytics, not maintaining the tools that support them.

Verdict on build vs buy

There is no universal right answer to the build vs. buy dilemma—every use case is unique. However, it’s important to recognize that many problems have already been solved. Unless there is a compelling reason to reinvent the wheel, leveraging existing solutions can save time, money, and effort.

In Fundamentals of Data Engineering, Joe Reis and Matt Housley emphasize the importance of focusing on delivering insights rather than getting entangled in the complexities of building and maintaining data infrastructure. They advocate for using existing solutions wherever possible to streamline processes and allow teams to concentrate on extracting value from data. The key question to ask is: Will building this solution provide your organization with a competitive edge? If the answer is no, it’s worth seeking out an existing solution that fits your needs. Managed platforms can reduce the need for dedicated personnel as we saw above and provide predictable costs, making them an attractive option for many teams.

This philosophy underpins why we built Datacoves. We believe data teams shouldn’t be bogged down by the operational complexities of tools like dbt and Airflow. We also believe that data teams should have access to the flexibility a custom-built solution has to offer. Datacoves offers the flexibility these tools are known for while removing the infrastructure burden, enabling your team to focus on what really matters: generating actionable insights that drive your organization forward.

Why teams choose Datacoves for dbt and Airflow  

Simplifying complex analytics stacks

Datacoves delivers the best of both worlds: the flexibility of a custom-built open-source solution combined with the rich features and zero-infrastructure maintenance of a managed platform—all with minimal vendor lock-in. How does Datacoves achieve this? By focusing on open-source tools and eliminating the burden of maintenance. Datacoves has already done the challenging work of identifying the best tools for the job, configuring them to work seamlessly together, and optimizing performance.

With Datacoves, your team can stop worrying about infrastructure and focus entirely on generating insights. The platform includes bundled in-browser VS Code, dbt Core, and Python extensions, alongside ready-to-use virtual environments tailored to analytics needs. Add to this a fully managed Airflow experience, and you have a solution where the code remains yours, but the operational headaches are gone.

Managed Airflow with unique features

Datacoves has enhanced Airflow with features designed to make DAG development more intuitive and enjoyable:

  • Flexible Airflow: Datacoves can be used with or without dbt, making it an ideal choice for teams seeking a robust, scalable Airflow solution without the hassle of managing it themselves. Anything you can do with open-source Airflow, you can do with Datacoves—without requiring a dedicated team.
  • Shared “Team Airflow” for Collaboration: This shared space is designed for developing and testing DAGs in a production-like environment, ensuring that buggy DAGs don’t disrupt your production workflows. Team members can collaborate in an isolated development environment, enabling faster iteration and testing.
  • Developer-Specific “My Airflow” for Isolated Workflows: Recognizing the limitations of a shared environment, Datacoves created “My Airflow,” a standalone instance for individual developers. Changes made to DAGs immediately reflect in this instance, allowing for rapid iteration and testing before moving to “Team Airflow” for more robust validation.
  • Streamlined DAG Deployment (S3/Git Sync): Whether syncing DAGs via S3 or Git, Datacoves simplifies the process, eliminating common deployment pain points associated with Airflow orchestration.

Enterprise-grade infrastructure

  • Kubernetes-Backed Scalability: Building a scalable platform often requires Kubernetes expertise, which can be costly and time-intensive. Datacoves eliminates this need with a skilled team that manages Kubernetes deployments, handling upgrades, maintenance, and downtime risks for you.
  • SaaS and Private Cloud Options: Datacoves offers deployment flexibility, accommodating enterprises with strict security requirements. Choose between a SaaS offering or a private deployment within your organization’s VPC, depending on your needs.

Cost predictability

One of the key benefits of Datacoves is the elimination of hidden costs through its all-in-one platform approach. Teams often realize too late that piecing together the modern data stack—combining open-source tools, hosting solutions, and server infrastructure—results in unpredictable costs. A single misstep in configuration can lead to high cloud bills.

Datacoves removes the guesswork. Its optimized infrastructure provides predictable billing for non-variable services, along with clear guidelines for variable costs. By implementing best practices and optimizations, Datacoves ensures that your costs remain as low as possible without sacrificing performance.

Datacoves makes it easier for teams to harness the power of open-source tools like dbt and Airflow, without the operational burden. From simplifying complex workflows to delivering enterprise-grade infrastructure and predictable costs, Datacoves empowers teams to focus on what matters most: driving insights and business value.

Conclusion

Open-source tools like Airflow are incredibly powerful, offering the flexibility and extensibility that modern analytics teams need. However, as we have seen, the initial appeal of "free" tools is misleading. The direct costs of salaries and benefits, along with the hidden costs of implementation, scaling, and long-term maintenance, are very real and expensive. Paid solutions exist for a reason, and finding the one that best suits your needs is essential.

The most flexible managed data platform on the market

If your team is looking to scale its analytics stack without the operational burden of managing open-source tools, Datacoves offers the perfect balance of flexibility, simplicity, and cost-efficiency.  Explore Datacoves to learn more about our all-in-one platform for dbt Core and Airflow or check out our case studies and testimonials to see how other teams have accelerated their analytics engineering journey with Datacoves.

Why don't decision makers trust your analytics
5 mins read

The reason companies fail at leveraging analytics stems from the fact that people tend to focus on the destination instead of the journey that will lead to the solutions that will have the most impact on the business. Time and time again, I see people focus on the so-called shiny objects, like new tools, new techniques, or even new people, that appear to be the silver bullet everyone needs. The truth is, if you go back to the first principles and start with true alignment, good data processes, and user-centric experiences, project success and satisfaction are achievable.

Lack of Alignment Reduces Faith in the Solution

Every project I have been a part of started with a sense of optimism and excitement. The honeymoon phase was great. Everyone was united; we had gotten the funding, selected vendor partners, and purchased whatever technology was part of the solution. We all spoke the same language, everyone got to work, management started getting progress updates, and everyone thought we were off to a great start.

It wasn't until real decisions needed to be made that we realized the honeymoon was over. In every single instance, an excessive amount of time was spent in meetings arguing and reaching some level of consensus until the next decision. This happened because we never really spent the time to get on the same page. People assumed that we were aligned because, at a high level, we were all talking about the key points of the given initiative: digital transformation, self-service analytics, customer mastering, data lakes, etc.

But we were not really thinking the same things. Everyone had different backgrounds and had expertise in different parts of the solution: regulatory requirements, technology limitations, end-user needs, etc. There were also things no one knew at the start, and we didn't have a north star to guide these decisions. We all appeared to be saying the same things, but we were thinking very differently.

I have seen the pressure to get started on a project and show progress lead to delays and ultimate dissatisfaction with the end result. On projects where we have spent a couple of weeks getting aligned using a structured approach to product discovery, we ended up with better estimates and better overall satisfaction.

In any analytics-related project, the same things apply: the team needs to understand the business objectives, the current state (so the new process isn't worse), and the risks, and it needs to prioritize the high-level features. Most importantly, the team needs to align on what's NOT in the new solution and on the prioritization criteria, such as quality, feature completeness, or usability, that will be used when making decisions. Agile does not mean no planning.

Trust starts by listening to people and creating a shared vision that sets the right expectations from day one. You can create an achievable plan if everyone knows what you are trying to achieve.

People Don't Trust the Data

Let's face it, your data processes get no love. This is usually because they are seen as "too technical." Your users don't care about databases, schemas, tables, or columns, let alone the process of converting raw facts into business-ready insights. It's easy for management to see a fancy dashboard and get excited about the possibility of machine learning, but talk about data and people's eyes glaze over.

It kind of makes sense; most people don't understand how the power grid works. We all take it for granted. We flip a switch, the light turns on, and we move forward. No one cares about electricity until something goes wrong. In a lot of organizations, things go wrong with data more often than you would think. Sometimes people notice right away, but other times failures are silent. When something does go wrong, everyone goes into firefighting mode. Meetings are held, issues are discovered, and patches to "prevent" the failure are put in place. The time to think about the inevitable is not once things break; you need to anticipate failure and design for resilience.

The issue here is that we don't think of the process of going from raw data to insights as a single system. It is all interconnected and needs to be treated as such. When it comes to analytics, sometimes it feels like companies want to build a mansion on a foundation atop quicksand. Initially, all seems fine, and everyone is in the house decorating until someone notices that a corner of the house is sinking. Everyone goes outside, props up the corner, and they happily go back inside to decide what color to paint the next room.

You can't build a house on quicksand; you need to set up repeatable processes with quality built in from the start. If we want collaboration, we have to build it in. If you want to be able to do impact analysis, guess what? You can't retrofit that later if you didn't do it from the start. Having documented analytics is not magic; you need this to be part of the culture and part of the process. The good thing is that many smart people have faced the same issues, and there are examples we can see where people are doing things right.

If you want users to trust data analytics, they need to trust the data, and they need to believe in a solid process that is built on a solid foundation.  

Bad User Experiences Erode Confidence

When you try to please everyone, you please no one, and in many companies, technical teams try to do everything they are asked. They jump through hoops to deliver projects, but it is very common for people to be dissatisfied with the end results. I have also seen new tools used like old ones. Teams sometimes take the approach that the new process is just affecting some part of the current broken process, so they only incrementally change it. I have seen Tableau dashboards that are essentially Excel on the web with some automation.

Instead of asking users what they want, we need to understand what they need and why. What are they trying to accomplish? What's wrong with how they do things today? Is the new process / tool you are putting in place better than what they already have? Sometimes it makes more sense to leave a current process as-is until other parts of the system are improved.

When you understand the real need for an omni-channel dashboard or a sales dashboard, you design the solution to help you achieve that goal. If your users need to quickly get in and out of the tool, you can find ways to reduce the number of clicks it takes them to get there. You simplify access, and you surface the most important information first. You build the solution around them, and more importantly, you are able to justify your decisions and why certain things need to be de-prioritized. When users see that you empathize with them, they trust you. They don't push back on every choice, because you have demonstrated time and again that you have their best interests at heart.

Conclusion

Getting decision-makers to trust data analytics is no different than getting anyone to trust anything. You need to start with alignment and set the right expectations; you need to build end-to-end processes that are robust; and you need to deliver the tools that facilitate the job users do.

dbt won't fix your data maturity problem
5 mins read

Implementing dbt (data build tool) can revolutionize your organization's data maturity. However, if your organization is not ready to take advantage of the benefits of dbt, it might not be the right time to start. Why? Because the success of data initiatives often hinges on aspects beyond the tooling itself.

Many companies rush into implementing dbt without assessing their organization’s maturity, and this leads to poor implementation. A poorly implemented dbt initiative can leave the organization frustrated and overwhelmed with technical debt, with resources wasted along the way. To avoid these pitfalls and ensure your organization is truly ready for dbt, you should complete an assessment of your organization's readiness by answering the questions presented later in this article.

What is Data Maturity?

Before diving into the maturity assessment questions, it’s important to understand what data maturity means. Data maturity is the extent to which an organization can effectively leverage its data to drive business value. It encompasses multiple areas, including:

Data-Driven Culture: Fostering an environment where data is integral to decision-making processes.

Data Quality: Ensuring data is accurate, consistent, and reliable.

Data Governance: Implementing policies and procedures to manage data assets.

Data Integration: Seamlessly combining data from various sources for a unified view.

A mature data organization not only ensures data accuracy and consistency but also embeds data-driven decision-making into its core operations.  

How dbt Improves Data Maturity

By leveraging dbt's features, organizations can significantly enhance their data maturity, leading to better decision-making, improved data quality, robust governance, and seamless integration. For example:

Data-Driven Culture: By using dbt, you can improve many aspects that contribute to creating a data-driven culture within an organization. One way is by encouraging business users to provide or review the model and column descriptions that are embedded in dbt. You can also involve them in defining what data to test with dbt. Better data quality will improve trust in the data, and more trust in the data leads to more frequent use of and reliance on it.

Data Quality and Observability: dbt enables automated testing and validation of data transformations. This ensures data quality by catching issues like schema changes or data anomalies early in the pipeline. As your data quality and data observability needs grow you can assess where you are on the data maturity curve.  For example, in a sales data model, we can write tests to ensure there are no negative order quantities and that each order has a valid customer ID. With dbt you can also understand data lineage and this can improve impact and root cause analysis when insights don’t seem quite right.

Data Governance: dbt facilitates version control and documentation for all transformations, enhancing transparency and accountability. Organizations can track changes to data models ensuring compliance with data governance policies.

Data Integration: dbt supports the integration of data from multiple sources by providing a framework for consistent and reusable transformations. This allows for the creation of unified data models that provide a holistic view of business operations.
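
The sales-model checks mentioned under Data Quality and Observability can be sketched in a few lines. In dbt these rules would be declared as tests on the model itself. The plain-Python version below only illustrates the logic, and the column names (order_id, customer_id, quantity) are hypothetical.

```python
# Illustrative only: dbt would express these rules as tests on the model,
# not as Python. All column names and sample rows here are hypothetical.

def find_quality_issues(orders, valid_customer_ids):
    """Return (order_id, problem) pairs for rows that break either rule."""
    issues = []
    for order in orders:
        # Rule 1: order quantities must never be negative.
        if order["quantity"] < 0:
            issues.append((order["order_id"], "negative quantity"))
        # Rule 2: every order must point at a known customer.
        if order["customer_id"] not in valid_customer_ids:
            issues.append((order["order_id"], "unknown customer"))
    return issues


orders = [
    {"order_id": 1, "customer_id": "C1", "quantity": 3},
    {"order_id": 2, "customer_id": "C9", "quantity": 1},   # customer not in list
    {"order_id": 3, "customer_id": "C2", "quantity": -5},  # negative quantity
]
customers = {"C1", "C2"}

for order_id, problem in find_quality_issues(orders, customers):
    print(f"order {order_id}: {problem}")
# prints:
# order 2: unknown customer
# order 3: negative quantity
```

The point is less the code than the habit: the rules are written down once, run on every build, and fail loudly before a bad number reaches a dashboard.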

dbt Readiness Assessment

Now that we understand what data maturity is and how dbt can help improve it, you might be ready to jump on the dbt bandwagon. But first, we encourage you to assess your organization’s readiness for dbt. The journey to data maturity involves not only choosing the right tools but also ensuring that your organization is philosophically and operationally prepared to take full advantage of these tools. It is important to recognize that dbt’s approach requires a shift in mindset towards modern data practices, emphasizing transparency, collaboration, and automation.

To determine if your organization is mature enough for dbt or if dbt is the right fit, consider the following assessment questions:

Are you philosophically aligned?

dbt requires a philosophical alignment with its principles, such as ELT (Extract, Load, Transform) instead of the traditional ETL (Extract, Transform, Load) approach. dbt is also based on idempotency, meaning that given the same input, you will always get the same output. This differs from traditional ETL, which may rely on constructs such as auto-incrementing primary keys whose values depend on when and in what order rows were loaded. If your organization prefers processes that are incompatible with dbt’s methodology, you will end up fighting the dbt framework to make it do something it was not intended to do.
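
To make the idempotency point concrete, here is a minimal sketch in plain Python (not dbt code) comparing keys assigned by arrival order with keys derived from the data itself. The function names and sample emails are hypothetical, and hashing a business key is shown only as one common order-independent alternative.

```python
import hashlib


def auto_increment_keys(rows):
    """Assign keys by arrival order, as an auto-incrementing column would."""
    return {row: i for i, row in enumerate(rows, start=1)}


def hashed_keys(rows):
    """Derive each key from the row's own value, independent of load order."""
    return {row: hashlib.sha256(row.encode("utf-8")).hexdigest() for row in rows}


first_run = ["alice@example.com", "bob@example.com"]
second_run = ["bob@example.com", "alice@example.com"]  # same data, different order

# Arrival-order keys change when the same data is reloaded in another order.
print(auto_increment_keys(first_run) == auto_increment_keys(second_run))  # False

# Keys derived from the data are identical on every run.
print(hashed_keys(first_run) == hashed_keys(second_run))  # True
```

When a rebuild produces the same keys every time, a model can be dropped and recreated without breaking anything downstream, which is the property dbt's approach depends on.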

Are you just going to do a lift and shift?

Simply migrating existing processes and code to dbt without rethinking them won’t leverage dbt’s full potential. Assess whether you’re ready to redesign your workflows to take advantage of dbt’s capabilities such as incremental tables, snapshots, seeds, etc.

Are you going to take advantage of features like data quality and documentation?

dbt offers excellent features for data quality and documentation. Evaluate if your team is prepared to prioritize the utilization of these features to enhance transparency and trust in your data. Tests and model descriptions will not write themselves. When it comes to good descriptions, they shouldn't come from a data engineering team that does not know how the data is used or the best data quality rules to implement. Good descriptions must involve business user review at a minimum.

Are you going to open up dbt to teams outside of IT?

The goal of dbt is to empower various teams including IT and business users by using the same tooling. Consider if your organization is ready to foster this cross-functional collaboration. When you implement dbt correctly, you will empower anyone who knows SQL to contribute. You can have multiple teams contribute to the insight delivery process and still ensure proper governance and testing before updating production.

Are you going to automate processes like deployment and testing?

Automation is key to achieving efficiency with dbt. Implementing automated deployment, testing, and CI/CD pipelines can significantly improve your workflows. If you aren’t ready to automate, the benefits of dbt may not be fully realized. If you simply put in dbt without thinking about the end-to-end process and its failure points, you will miss opportunities to prevent errors. The spaghetti code you have today didn't happen just because you were not using dbt.

Are you changing more than just tooling?

dbt is a framework, not a silver bullet. Merely changing tools without altering your underlying processes will not solve existing issues. This is a huge issue with organizations that have not done the work to create a data-driven culture. Assess if your team is ready to adopt better naming conventions and more structured processes to make data more understandable.  

Conclusion

Data immaturity might manifest as a reliance on manual processes, lack of data quality controls, or poor documentation practices. These factors can derail the effective implementation of dbt since dbt thrives in environments where data practices are robust and standardized. In other words, dbt alone will not solve these problems.

Ensuring your organization is ready for the changes that come with implementing dbt is not just best practice, it is essential for success. By thoroughly assessing your readiness, you can avoid technical debt, optimize your workflows, and fully harness the power of dbt. Remember, dbt is a powerful tool, but its effectiveness depends on the readiness of your organization to improve data practices and its alignment with dbt’s philosophy.

What is holding you back from true digital transformation
5 mins read

Digital transformation is often seen through the lens of technological advancement and process optimization. Most blog posts and guides out there revolve around implementing new software, automating tasks, and digitizing operations. Yet, there's a pivotal element that's frequently overlooked in these discussions, especially when it comes to an enterprise: the mindset and culture within an organization. This article aims to shed light on why this is crucial in achieving true digital transformation. But first, let's investigate what digital transformation is and why it is important.

Digital transformation defined

Digital transformation is the integration of digital technology into all areas of a business, fundamentally changing how it operates and delivers value to customers. It is more than just a technological upgrade; it is a cultural shift that requires organizations to continually challenge the status quo, experiment, and get comfortable with failure. This often means walking away from long-standing business processes that companies were built upon to embrace new ways of working. Most organizations find this part the most challenging.

Why is digital transformation important

  • Keeping Up with the Digital Economy: In a world where technology evolves rapidly, businesses must adapt to stay relevant. Digital transformation allows companies to remain competitive in an increasingly digital economy.
  • Enhanced Data Collection and Analysis: Digital transformation creates a system for gathering the right data and fully utilizing it for better business decisions, efficiencies, and customer insights.
  • Customer Expectations: Today's customers expect a seamless digital experience. Businesses need to engage with customers on their terms, using digital tools and platforms that are convenient and user-friendly.
  • Increased Agility and Innovation: Adopting digital solutions empowers organizations to be more agile and responsive to changes in the marketplace or industry. It fosters a culture of innovation, encouraging new ideas and approaches.
  • Operational Efficiency: Automation and streamlining of processes reduce operational costs and improve efficiency. This allows employees to focus on more strategic tasks that add value to the business.
  • Risk Management and Compliance: With the increasing importance of data security and privacy, digital transformation helps businesses keep up with changing regulations and protect sensitive information.
  • Sustainability: Digital processes can reduce waste and improve energy efficiency, contributing to more sustainable business practices.

Enterprise digital transformation

To achieve digital transformation in an enterprise, 9 times out of 10 there must be a change in company culture. However, changing a company's culture is a formidable task. It is rare to hear statements like, “We need to fundamentally change our problem-solving approach.” This became clear to me through past experience, as I noticed that managers often lacked the influence to drive change at the highest organizational levels. Additionally, the pressure to deliver quick results within budget cycles frequently hindered genuine cultural transformation.

During my tenure at various companies, under numerous managers, the consistent message was the need for improvement. However, I have come to understand that organizations, much like fireflies, develop their own rhythms. It is this unique rhythm that sets apart innovative and transformative companies from those that merely follow without achieving similar success. What do I mean by this? Let’s turn to nature for an explanation.

Firefly phenomenon - Does it mean conformity or innovation in your organization?

Nature is fascinating, especially when observing how hundreds or thousands of fireflies can synchronize their flashes.

In organizations, a similar phenomenon occurs. People will sync up and follow the status quo, even if it is not what is best for the organization. This dramatically hinders digital transformation because the loudest are not always right and yet they cause others to sync up with them. This will cause innovation to be stopped in its tracks.

In addition to this firefly phenomenon, often action differs from ambition. I recall a staff meeting with a former CIO discussing a future less dependent on Microsoft and more open to non-Windows devices. It was clear that iPhones were going to change the corporate landscape. Despite this, every new tool implemented was still optimized for Internet Explorer. This discrepancy between ambition and action often drives analytical people like me to frustration. To effect change, persistence is key. I have had ideas initially dismissed as “not my job,” only to see one later turn into a patented invention.

This manifests itself in other ways as well; have you ever seen a company advocate for fewer meetings while simultaneously criticizing those who do not include “everyone” in decision-making? I have been in such situations and can attest that decision-making by committee is not inherently superior. In fact, the more people involved in an initiative, the less effective it tends to be. This, I believe, is due to the Dunning-Kruger effect.

The more people you involve in a transformation initiative, the more likely the discussions will deteriorate into bikeshedding. When there is a disconnect between what is said and what is done, people take notice, and it breeds discontent.

Bike Shed Effect

One firefly can only affect their neighbors

Even in my most successful transformation initiatives, the radius of transformation has been limited to my sphere of influence. Sure, some of my tools and processes got global and cross-functional acceptance, but the underlying principles never took hold because they were too radical for the organization at the time. I was not part of the IT organization so the things I did were typically seen as shadow IT. Instead of focusing on what I should not be doing, it would have been more progressive for them to see how I was practicing Agile principles. They could have inquired about how my project was doing DevOps before that was in style, or how it was that this non-sanctioned product was extremely well received and people sought me out to help them improve their processes.

This means if you want the organization to be more innovative, you need to find the obstacles that hold people back from being innovative. Often politics and bureaucracy impact an initiative more than the solution itself. If you force everyone to comply with existing tools and processes, then you are imposing a constraint on the team that will limit innovation.

A typical way this manifests itself is leadership pushing the idea that one platform or process can solve every need. This can take the form of requiring that a particular group do all data transformation, or that a single visualization tool be the way everyone does analytics. I have never seen one tool that is good at everything, and you end up supplementing the single solution with an unmanageable array of tools and processes. A healthy organization is a learning organization that is always open to improvement. When management encourages pushing boundaries and not taking anything as fact, the company can innovate.

A great example of driving innovation is seen in the approach of Steve Jobs, co-founder of Apple Inc. Jobs was known for his ability to challenge conventional wisdom and existing standards in the technology industry. He emphasized the importance of understanding the fundamental principles underlying a problem to innovate and create groundbreaking solutions. One notable instance was the development of the iPhone, which revolutionized the smartphone industry. Jobs and his team did not just improve on existing phones; they rethought what a phone could be, focusing on user experience and simplicity. This approach led to a product that dramatically altered how people interact with technology.  

As a leader, you need to look for the fireflies who are using first principles like Steve Jobs to deliver innovative solutions and nurture, or create, a corporate culture that truly challenges what has been done without artificial constraints.

Reasoning by first principles removes the impurity of assumptions and conventions. What remains is the essentials. It’s one of the best mental models you can use to improve your thinking because the essentials allow you to see where reasoning by analogy might lead you astray.

Most fireflies eventually comply, or fly away – Loss of innovators

The transformative and innovative thinkers will either comply or leave, both of which are undesirable. In my case I tended to leave. In every organization where I have worked, I have managed to make a significant impact, often through sheer determination. During my time at one such company, our goal was to introduce a data catalog. By analyzing the problem, I was able to separate what was essential for our organization from an elaborate, idealistic vision of a catalog capable of doing everything. While the IT organization felt it would be better to create a home-grown catalog, I understood that our biggest obstacle was getting people to use a catalog in the first place, so time to market was critical. I found that Alation met our needs, while IT kept to their vision of building an all-encompassing catalog. In 3 months I had deployed Alation, and 1.5 years later the home-grown solution was a tenth as good. This approach of breaking down the problem to its basic elements and building up from there was critical. It is often underestimated how challenging it is to develop and maintain custom software. This experience highlights the effectiveness of first principles thinking in deploying practical and efficient solutions.

The reality is that not everyone possesses the tenacity to advocate for change, especially in the face of substantial resistance. Not only that, but I have also witnessed people being ostracized for thinking differently, while others were promoted for fitting in. It is crucial to seek out divergent thinkers and consider the validity of their perspectives, instead of forcing them to conform. This is why true digital transformation necessitates a shift in culture.  

When an individual, much like a firefly that does not flash in unison with the rest, finds themselves out of sync with the collective rhythm, they face a decision: conform and synchronize with the group or venture out to find a new collective that resonates with their unique spark.

How do we change the flash for all? Aligning mindsets for transformation

True transformational change must come from the top. Achieving enterprise digital transformation requires a deep and bold questioning of the status quo. We must critically assess our processes: Is a particular task truly necessary for a certain group? Can we identify and eliminate inefficiencies? Will adding another layer of approval or inspection genuinely enhance outcomes? It is essential to remember that human behavior often has a more profound impact than any technology or process we implement. When decision-making is centralized within one group, solutions are inevitably skewed to reflect their viewpoint. Too often, I have witnessed decisions justified by cost considerations that, upon closer inspection, proved detrimental in the broader context. An effective strategy involves analyzing the entire system, recognizing that optimizing the whole may require accepting lower efficiency in some areas.

The key is to align with the needs of users and the organization and engage leadership in this journey. With a united front, tackling the 'corporate dragons' becomes a more manageable endeavor. One practical approach is employing methodologies like the 'Jobs to Be Done' framework.

Conclusion

Company culture and change management are frequently overlooked in the pursuit of process improvement. Employees operate within their limitations, while management ponders the lack of innovation and agility compared to other companies. The simpler path might seem to be increasing staff or updating technology, but the heart of transformation lies in the mindset of the organization. Leaders aiming for a lasting impact must embrace first principles thinking, ready to scrutinize and challenge established norms. Transformational change rarely stems from incremental improvements; truly innovative companies are those that dare to think and act differently. The organization thus faces a pivotal choice: will it adapt to a new rhythm, or compel its 'new fireflies' to fall in line with the existing order?
