Love desserts? Explain Big Data Analytics with a Delicious Analogy; Baking Cakes
It’s important to make information Tasty and accessible
No one bakes for the sake of baking alone; cakes are meant to be shared. If no one bought, ate, or gifted someone with their delicious chocolate cake, then there would be no bakers. The same is true for data analytics. Our goal as data practitioners is to feed our organization with the information needed to make decisions.
If our cake doesn’t taste good or isn’t available when people want dessert, then it doesn’t matter that we made it from scratch. When it comes to big data, your goal should be to have the equivalent of a delicious cake - usable data - available when someone needs it.
Exploring the ever-changing taste of data analytics
Life would be simple if everyone were happy with a single flavor of the cake. Metrics play a crucial role in our organization, and two of the most fundamental ones are ARR (Annual Recurring Revenue) and NRR (Net Recurring Revenue). These metrics are like chocolate and vanilla - they remain popular and relevant. Yet, these flavors alone are not enough. Just as with ice cream flavors, we eventually need to try something new, the same goes for insights. It's important to experiment and explore different perspectives.
When something is novel, we love it. When we first start baking, we are not consistent. But, over time, quality improves. We go from ok, to good, and to great. Even with our newfound expertise, the chocolate cake will become boring. We all want something new.
With data, you often start with a simple metric. Having something is better than nothing, but that only lasts so long. We discover something new about our business and we translate that information into action. This is good for a while, but we will eventually see diminishing returns. While a LTV (Life Time Value) analysis may have a significant impact today, its usefulness is likely to diminish within a short period of time. Your stakeholders will crave something new, something more innovative and updated.
Someone is going to ask for a deep dive into how CAC (Customer Acquisition Cost) impacts LTV, or they might ask for a new kind of icing on that cake you just made. Either way, the point remains – as those around you start making use (or eating) of what you’re providing, they will inevitably ask for something more.
How much expertise does your business need to Bake successfully?
It depends. Some organizations can do more than enough with the reports and dashboards built into their CRM, web analytics, or e-commerce system. This works for many companies. There is no need to spend extra time and money on more complex data systems when something simple will suffice. There is a reason that chocolate and vanilla are popular flavors; many people like their taste and know what they are getting with their order. The same can be said for your data infrastructure.
However, flexibility is the challenge. You are limited by your reporting options and the data analytics you can access. You can have a cake you can eat, but it’s going to look a certain way, having only one type of icing, and certainly will not have any premium fillings.
If you want those things, you need to look for something a bit more nuanced.
Start Baking with an Easy Bake Oven
How do you start to accommodate your organization’s new demand for analysis or your family’s newly refined palate? By using new tools and techniques.
The Easy Bake Oven is the simplest first step – you get a pouch of ingredients, mix them, and within a few minutes, you have a cake. You might think it’s a children’s toy, but there are very creative recipes out there for the adventurous types. Unfortunately, you can only go so far; you’re limited by the size of the oven, and the speed you can bake each item, and it is targeted at kids, of course.
The corollary in the data world is Excel – a fabulous tool, but something that has its limitations. While it is possible to obtain data from other tools and manipulate it in a spreadsheet to make it more manageable, you are still limited by the pre-designed reports offered by the vendor tools. Excel is flexible enough for many, but it is not a perfect end-state reporting solution for everyone.
Eventually, you’ll need to address inconsistencies between systems and automate the process of data prep before it gets into your spreadsheet.
Elevating Your Baking with More Advanced Techniques
We have a feel for the basics, but now we want to improve our process. There are certain things we do repeatedly, regardless of the recipes we’re making. We need to create an assembly line. Whether we’re calculating metrics or mixing the wet ingredients and the dry ingredients, there are steps we need to complete in a specific order and to a certain level of quality.
We’ve moved on from the Easy Bake Oven and are now baking using a full-size oven. We need to be more careful about our measurements, ensure the oven is at the correct temperature, and write down notes about the various steps in our more complicated recipes. We need to make sure that different cakes, fillings, and decorations are ready at the same time, even if they have wildly different prep times. We need consistent results every time we make the cake, and we need others to be able to make the same recipe and achieve the same results by following our instructions.
Documenting and transferring this knowledge to others is difficult. Sure, you can write things down, but it is easy to skip a step that you take for granted. Perhaps your handwriting is hard to read, or someone is having trouble with the oven. You may be aware that using a hairdryer while baking can cause the circuit breaker to trip, but unless you document this information, others will not know this quirk. If your recipe has changed, such as using scratch ingredients instead of a pre-made cake mix, it is important to clearly specify these adjustments, otherwise your friends and family may struggle to replicate your delicious cake.
There are plenty of data analytics tools out there that assist at this stage – Alteryx,Tableau Prep, and Datameer are just a few of many. In large enterprise organizations, you might find Informatica Power Center, Talend, or Matillion. These types of tools have graphical user interfaces (GUI); they give you the flexibility to extract data, load data, inspect data and transform it. Many enable you to define and calculate metrics. But they require you to work within each tool’s set of rules and constraints. This works well if you are starting and need something less complex.
The process that was once simple now is not; there are hidden assumptions, configurations, and requirements. You’re not using a pen-and-paper recipe anymore; now you’re working within a new system.
GUI-based tools are great for companies whose workflows fit the way the systems work. But between the way some tools are licensed, and the skill needed to use them, they are only available to the IT organization. This leaves users to find shortcuts, develop workarounds, and become dependent on the business’ shadow IT. Inevitably, you’re going to run into maintainability issues.
Recipes often have plenty of steps and ingredients; “data recipes” are no different. There are dependencies between operations, different run times for different transformations, different release cadences, and data availability SLAs. Your team might be able to manage your entire workflow quite well with one of these tools, but once your team starts introducing custom SQL logic or additional overlapping tools into the ecosystem, you introduce another layer of complexity. The result is an increase in the total cost of ownership.
Often, this complexity is opaque, too. It is not obvious what that SQL transformation is doing, but they are strung together to build something usable. Over time, the complexity continues to grow; more custom SQL logic is introduced, and more steps in the chain. Eventually, abstractions begin to form. The data engineers decide that these processing pipelines look quite similar and can be customized based on some basic configurations. Less overhead, more output.
You’re now on the path of building custom ELT (Extract, Transform, and Load) pipelines, stitched together within the constraints of a GUI-based system. For some companies, this is OK, and it works. But there is a hidden cost – it is harder to maintain high quality inputs and outputs. The layers are tightly coupled and a mistake in one step is not caught until the whole pipeline is complete.
IT may not be aware of downstream issues because they occur in other tools outside of their domain; one change here breaks something else there. This is like buying a ready-made cake mix and “enhancing it” with your custom ingredients. It works until it does not. One day, Duncan Hines changes their ingredients, and without you realizing, there is a bad reaction between your “enhancements” and the new mix. Your once great recipe is not so great anymore, but there was no way for Duncan Hines to know. They expected you to follow their instructions; everything has been going according to plan until now.
Even if your tool has strong version control built in, it’s often difficult to reverse the changes before it is too late. If your recipe calls for 1 cup of sugar but you accidentally add in 11, we don’t want to wait until the cake is baked to discover the error. We want to catch that mistake as soon as it happens.
Streamlining the Baking Process for Consistency
Everything to this point serves a specific profile of a business, but what happens when the business matures beyond what these tools offer? What happens when you know how to bake a cake, but struggle to consistently produce hundreds or thousands of the same quality cakes?
When we aim to expand our baking operations, it's crucial to maintain consistency among bakers, minimize accidental mistakes, and have the ability to swiftly recover from any errors that occur. We need enough mixers and ovens to support the demand for our cakes, and we need an organized pantry with the correct measuring cups and spoons. We need to know which ingredients are running low, which are delayed, and which have common allergens.
In data analytics, we have our own supply chain, often called ELT. You will find tools like Airbyte and Fivetran are common choices for bringing in our data “ingredient delivery”. They manage data extraction and ingestion so you can skip the manual CSV downloads that once served you so well.
We want to ensure quality, have traceability, document our process, and successfully produce and deliver our cakes. To do this, we need a repeatable process, with clear sequential steps. In the world of baking, we use recipes to achieve this.
All baking recipes have a series of steps, some of which are common across different recipes. For example, creaming eggs and sugar, combining the wet and dry ingredients, and whipping the icing are repeatable steps that, when performed in the correct order, result in a delicious dessert. Recipes also follow a standard format: preheat the oven to the specified temperature, list the ingredients in the order in which they are used, and provide the preparation steps last. The sequence is intentional and provides a clear understanding of what to expect during the baking process.
We can apply this model to our data infrastructure by using a tool called dbt (data build tool). Instead of repeating miscellaneous transformation steps in various places, we can centralize our transformation logic into reusable components. Then we can reference those components throughout our project. We can also identify which data is stale, review the chain of dependencies between transformations, and capture the documentation alongside that logic.
We no longer need a GUI-driven tool to review our data; instead, we use the process as defined in the code to inform our logic and documentation. Our new teammate can now confidently create her cake to the same standard as everyone else; she can be confident that she is avoiding common allergens, too.
Better yet, we have a history of changes to our process and “data recipe”. Version control and code reviews are an expectation, so we know when modifications to our ingredient list will cause a complete change in the final product. Our recipes are no longer scattered, but part of a structured system of reusable, composable steps.
Why doesn’t everyone do it this way?
It comes down to our readiness. When we started with our Easy Bake Oven, little could go wrong, but little could be tweaked. As we build a more robust system, we can take advantage of its increased flexibility, but we also need to maintain more pieces and ensure quality throughout a more complex process.
We need to know the difference between baking soda and baking powder. Which utensils are best, and which oven cooks most evenly? How do we best organize our new suite of recipes? How do you set up the kitchen and install the appliances? There are many decisions to make, and not every organization is ready to make them. All this can be daunting for even large organizations.
But you don’t have to do things all at once. Many organizations choose to make gradual improvements, transforming their big data process from disorganized to consistent.
You can subscribe to services like Fivetran or Airbyte for data loading and use dbtCloud for development. If your work grows increasingly complex, you can make use of another set of tools (such as Astronomer or Dagster) to orchestrate your end-to-end process. What you gain in flexibility you have lost in simplicity.
Can you become a master baker quickly?
This is what we focus on at Datacoves. We aim to help organizations create mature processes, even when they have neither the time nor resources to figure everything out.
We give you “a fully stocked kitchen” - all the appliances, recipes, and best practices to make them work cohesively. You can take the guess work out of your data infrastructure, and instead, use a suite of tools designed to help your team perform timely and efficient analytics.
Whether your company is early in its data analytics journey or ready to take your processes to the next level, we are here to help. If your organization has strong info-sec or compliance requirements, we can also deploy within your private cloud. Datacoves is designed to get you “baking delicious cake” as soon as possible.
Set aside some time to speak with our co-founder and learn how Datacoves has helped both small and large companies deploy mature analytics processes from the start.
Looking for an enterprise data platform?
Datacoves offers managed dbt core and Airflow and can be deployed in your private cloud.
Considering dbt and whether you should use open source dbt Core or pay for a managed solution? Our eBook offers an in-depth look at dbt Core, dbt Cloud, and Managed dbt Core options. Gain valuable insights into each alternative and how each solution fits best with your organization. From small companies to large enterprise environments, this guide is your key to understanding the dbt landscape, pricing, and total cost of ownership.
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.