A lean analytics stack built with dlt, DuckDB, DuckLake, and dbt delivers fast insights without the cost or complexity of a traditional cloud data warehouse. For teams prioritizing speed, simplicity, and control, this architecture provides a practical path from raw data to production-ready analytics.
In practice, teams run this stack using Datacoves to standardize environments, manage workflows, and apply production guardrails without adding operational overhead.
The Lean Data Stack: Tools and Roles
A lean analytics stack works when each tool has a clear responsibility. In this architecture, ingestion, storage, and transformation are intentionally separated so the system stays fast, simple, and flexible.
- dlt handles ingestion. It reliably loads raw data from APIs, files, and databases into DuckDB with minimal configuration and strong defaults.
- DuckDB provides the analytical engine. It is fast, lightweight, and ideal for running analytical queries directly on local or cloud-backed data.
- DuckLake defines the storage layer. It stores tables as Parquet files with centralized metadata, enabling a true lakehouse pattern without a heavyweight platform.
- dbt manages transformations. It brings version control, testing, documentation, and repeatable builds to analytics workflows.
Together, these tools form a modern lakehouse-style stack without the operational cost of a traditional cloud data warehouse.
Setting Up DuckDB and DuckLake with MotherDuck
Running DuckDB locally is easy. Running it consistently across machines, environments, and teams is not. This is where MotherDuck matters.
MotherDuck provides a managed control plane for DuckDB and DuckLake, handling authentication, metadata coordination, and cloud-backed storage without changing how DuckDB works. You still query DuckDB. You just stop worrying about where it runs.
To get started:
- Create a MotherDuck account.
- In Settings → Integrations, generate an API token (MOTHERDUCK_TOKEN).
- Configure access to your object storage, such as S3, for DuckLake-managed tables.
- Export the token as an environment variable on your local machine.
This single token is used by dlt, DuckDB, and dbt to authenticate securely with MotherDuck. No additional credentials or service accounts are required.
At this point, you have:
- A DuckDB-compatible analytics engine
- An open table format via DuckLake
- Centralized metadata and storage
- A setup that works the same locally and in production
That consistency is what makes the rest of the stack reliable.
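A quick way to confirm the setup is to open a MotherDuck connection from the standard DuckDB Python client. This is a minimal sanity check, assuming a recent duckdb package is installed and MOTHERDUCK_TOKEN is exported:

import duckdb

# "md:" connects through MotherDuck; the token is picked up from MOTHERDUCK_TOKEN.
con = duckdb.connect("md:")

# Ordinary DuckDB SQL works as usual; list the databases visible through MotherDuck.
con.sql("SHOW DATABASES").show()
con.sql("SELECT 42 AS sanity_check").show()

If both queries return results, dlt, dbt, and any other DuckDB client on the machine can reach the same catalog with no further configuration.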
Ingesting Data with dlt into DuckDB
In a lean data stack, ingestion should be reliable, repeatable, and boring. That is exactly what dlt is designed to do.
dlt loads raw data into DuckDB with strong defaults for schema handling, incremental loads, and metadata tracking. It removes the need for custom ingestion frameworks while remaining flexible enough for real-world data sources.
In this example, dlt ingests a CSV file and loads it into a DuckDB database hosted in MotherDuck. The same pattern works for APIs, databases, and file-based sources.
To keep dependencies lightweight and avoid manual environment setup, we use uv to run the ingestion script with inline dependencies.
pip install uv
touch us_populations.py
chmod +x us_populations.py

The script below uses dlt’s MotherDuck destination. Authentication is handled through the MOTHERDUCK_TOKEN environment variable, and data is written to the raw database in DuckDB, under the us_population schema.
#!/usr/bin/env -S uv run
# /// script
# dependencies = [
#     "dlt[motherduck]==1.16.0",
#     "psutil",
#     "pandas",
#     "duckdb==1.3.0"
# ]
# ///
"""Loads a CSV file to MotherDuck"""
import dlt
import pandas as pd

# Project-specific helper from the Datacoves setup; resolves the directory
# where dlt keeps its pipeline working state.
from utils.datacoves_utils import pipelines_dir


@dlt.resource(write_disposition="replace")
def us_population():
    url = "https://raw.githubusercontent.com/dataprofessor/dashboard-v3/master/data/us-population-2010-2019.csv"
    df = pd.read_csv(url)
    yield df


@dlt.source
def us_population_source():
    return [us_population()]


if __name__ == "__main__":
    # Configure the MotherDuck destination with explicit credentials
    motherduck_destination = dlt.destinations.motherduck(
        destination_name="motherduck",
        credentials={
            "database": "raw",
            "motherduck_token": dlt.secrets.get("MOTHERDUCK_TOKEN")
        }
    )

    pipeline = dlt.pipeline(
        progress="log",
        pipeline_name="us_population_data",
        destination=motherduck_destination,
        pipelines_dir=pipelines_dir,
        # dataset_name is the target schema name in the "raw" database
        dataset_name="us_population"
    )

    load_info = pipeline.run([
        us_population_source()
    ])
    print(load_info)

Running the script loads the data into DuckDB:
./us_populations.py

At this point, raw data is available in DuckDB and ready for transformation. Ingestion is fully automated, reproducible, and versionable, without introducing a separate ingestion platform.
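The example above replaces the table on every run. For sources that grow over time, the same resource pattern extends naturally to incremental loading with dlt’s built-in cursor tracking. The sketch below is illustrative only; the endpoint, field names, and cursor column are hypothetical:

import dlt
import requests


@dlt.resource(write_disposition="merge", primary_key="id")
def events(
    updated_at=dlt.sources.incremental("updated_at", initial_value="2024-01-01T00:00:00Z")
):
    # dlt persists the highest "updated_at" seen, so each run only requests newer records.
    response = requests.get(
        "https://api.example.com/events",  # hypothetical endpoint
        params={"updated_since": updated_at.last_value},
    )
    response.raise_for_status()
    yield response.json()

Swapping a resource like this into the same pipeline keeps the MotherDuck destination and schema handling unchanged; only the write disposition and cursor logic differ.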
Transforming Data with dbt and DuckLake
Once raw data is loaded into DuckDB, transformations should follow the same disciplined workflow teams already use elsewhere. This is where dbt fits naturally.
dbt provides version-controlled models, testing, documentation, and repeatable builds. The difference in this stack is not how dbt works, but where tables are materialized.
By enabling DuckLake, dbt materializes tables as Parquet files with centralized metadata instead of opaque DuckDB-only files. This turns DuckDB into a true lakehouse engine while keeping the developer experience unchanged.
To get started, install dbt and the DuckDB adapter:
pip install dbt-core==1.10.17
pip install dbt-duckdb==1.10.0
dbt init

Next, configure your dbt profile to target DuckLake through MotherDuck:
default:
  outputs:
    dev:
      type: duckdb
      # This requires the environment var MOTHERDUCK_TOKEN to be set
      path: 'md:datacoves_ducklake'
      threads: 4
      schema: dev # this will be the prefix used in the duckdb schema
      is_ducklake: true
  target: dev

This configuration does a few important things:
- Authenticates using the MOTHERDUCK_TOKEN environment variable
- Writes tables using DuckLake’s open format
- Separates transformed data from raw ingestion
- Keeps development and production workflows consistent
With this in place, dbt models behave exactly as expected. Models materialized as tables are stored in DuckLake, while views and ephemeral models remain lightweight and fast.
From here, teams can:
- Add dbt tests for data quality
- Generate documentation and lineage
- Run transformations locally or in shared environments
- Promote models to production without changing tooling
This is the key advantage of the stack: modern analytics engineering practices, without the overhead of a traditional warehouse.
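Because DuckLake tables are plain Parquet files with shared metadata in MotherDuck, anything that speaks DuckDB can read dbt’s output directly. A minimal sketch, assuming MOTHERDUCK_TOKEN is set; the model name below is illustrative rather than taken from the example project:

import duckdb

# Connect to the same DuckLake catalog that dbt writes to.
con = duckdb.connect("md:datacoves_ducklake")

# Inspect what dbt has materialized so far.
con.sql("SELECT table_schema, table_name FROM information_schema.tables").show()

# Query a transformed model like any other DuckDB table (hypothetical model name).
con.sql("SELECT * FROM dev.my_first_dbt_model LIMIT 10").show()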
When This Stack Makes Sense
This lean stack is not trying to replace every enterprise data warehouse. It is designed for teams that value speed, simplicity, and cost control over heavyweight infrastructure.
This approach works especially well when:
- You want fast analytics without committing to a full cloud warehouse.
- Your team prefers open, file-based storage over proprietary formats.
- You are building prototypes, internal analytics, or domain-specific data products.
- Cost predictability matters more than elastic, multi-tenant scale.
- You want modern analytics engineering practices without platform sprawl.
The trade-offs are real and intentional. DuckDB and DuckLake excel at analytical workloads and developer productivity, but they are not designed for high-concurrency BI at massive scale. Teams with hundreds of dashboards and thousands of daily users may still need a traditional warehouse.
Where this stack shines is time to value. You can move from raw data to trusted analytics quickly, with minimal infrastructure, and without locking yourself into a platform that is expensive to unwind later.
In practice, many teams use this architecture as:
- A lightweight production analytics stack
- A proving ground before scaling to a larger warehouse
- A cost-efficient alternative for departmental or embedded analytics
When paired with Datacoves, teams get the operational guardrails this stack needs to run reliably. Datacoves standardizes environments, integrates orchestration and CI/CD, and applies best practices so the simplicity of the stack does not turn into fragility or the operational drag of DIY platform management over time.
See it in action
If you want to see this stack running end to end, watch the Datacoves + MotherDuck webinar. It walks through ingestion with dlt, transformations with dbt and DuckLake, and how teams operationalize the workflow with orchestration and governance.
The session also covers:
- When DuckDB and DuckLake work well in production
- How to add orchestration with Airflow
- How teams visualize results with Streamlit
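As a small taste of that last piece, a transformed DuckLake table can feed a Streamlit app through the same DuckDB connection. This is a rough sketch with illustrative table and column names, not code from the webinar:

import duckdb
import streamlit as st

# Connect to the DuckLake catalog through MotherDuck (MOTHERDUCK_TOKEN must be set).
con = duckdb.connect("md:datacoves_ducklake")

# Illustrative model and columns; substitute a real dbt model from your project.
df = con.sql(
    "SELECT year, population FROM dev.us_population_by_year ORDER BY year"
).df()

st.title("US Population by Year")
st.bar_chart(df, x="year", y="population")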



