dbt Coalesce 2021 - Day 1 Takeaways

Blake Burch

Did you miss out on the first day of the dbt Coalesce conference? Want a quick recap? We've got you covered! There are still 4 more days of content (2 in the US timezones) that you can register to attend for free.

Here are our recaps for Day 2 and Day 3.

After attending most of the sessions yesterday, here's what stood out to us the most.

The Data Wave is Bigger than the Software Wave

In the opening keynote, Tristan Handy, CEO of dbt Labs, and Martin Casado, Partner at Andreessen Horowitz, chatted through their perspectives on how the modern data stack was shaping the industry and just how far they thought the current trend of data advancements would play out.

The general consensus was that the data wave is very similar to the recent software wave, and that for data, there's an even bigger upside.

Martin made a great point that over the past 10-15 years, everyone has gradually become a software consumer when that used to be something reserved for tech teams. You use apps on your phone. You use SaaS for project management at work. You subscribe to media services like Netflix or Spotify. In much the same way, he expects that we'll get to a state where everyone is a data consumer... but we're still stuck in the phase where only the tech teams can effectively access it.

Once everyone is an active consumer of data, there's an unlimited number of possibilities we can tap into. Tristan mentioned that even if you knew about the car in the 1900s, you would have no way to predict the creation of fast food restaurants like McDonald's or the rise of suburbs. Those ideas wouldn't even be on your radar. In much the same way, unlocking data could unlock a world of new opportunities. Martin added that science fiction is probably a better indicator of what's possible than any analyst report.

The good news is that almost every problem we can imagine in the future right now is a data problem. Once one problem is solved, there will be some new problem that data can be applied to. We're 100% on board for helping create the future tooling that lets everyone be a data consumer, using data to quickly build solutions and solve problems.

Self-Service Analytics is a mix of Empathy, Patience, and Documentation

It's easier to scale knowledge than it is to scale people. That's why Erica "Ric" Louie, Head of Data at dbt Labs, is so adamant about building out the right processes to facilitate a data-literate organization.

According to Erica, if you want your organization to be self-serve with data, you have to make a concerted effort to figure out how to provide internal users with the right training to feel confident using the data. In the first month of employment, you need to have a way to train any business user about how the data works and make yourselves available as a resource. You need to have clear and concise written documentation as well as video tutorials that can scale beyond your team size. People won't use the resources you provide them if they get confused or frustrated.

"If your colleagues are too scared to ask questions, it doesn't matter how many resources you end up building."

If someone requests data from your team rather than using the resources provided, figure out why and iterate! On the flip side, if someone uses your resources and pulls a report on their own - celebrate it! This encourages them to keep self-serving while also showing the rest of the company what's possible when you self-serve.

All in all, we thought this session had some great advice for scaling up self-service analytics and were impressed with the level of detail and rigor the team had put in place!

dbt Labs internal data team documentation

Taking a dbt Project from Infancy to Adulthood

Dave Connors, Analytics Engineer at dbt Labs, held a session about what it looks like to have a dbt project that's fully mature. You've probably heard the phrase "crawl, walk, run" and well... let's just say Dave took that metaphor of aging to heart.

While it's easier than ever to deploy dbt to the cloud, it's not as easy to know what features you should and shouldn't be using. That's why dbt Labs is working towards creating guides that help you figure out how you can naturally improve your dbt projects as they evolve over time.

At a high level the stages are:

  • Stage 1 - Get started by running your existing ETL processes and dbt side-by-side. You can use really big SQL statements - it's ok.
  • Stage 2 - Break apart those SQL statements and create models that reference each other. Build tests and docs ONLY for the final data that everyone should be using.
  • Stage 3 - Create formal naming conventions and organize your models more clearly. Start testing and documenting every model, not just the final ones. Build macros for repeatable bits of logic.
  • Stage 4 - At this point, things will start to get a bit custom, based on your business needs. You'll want to augment your data with metadata and start building customized macros. You'll also want to increase the frequency of deployment for fresher data.
  • Stage 5 - Start separating your dbt project into different environments for development and production. Build out exposures so you know what downstream systems get impacted by your dbt data (or connect these processes with an orchestration tool like Shipyard).

Below is the summary of every stage of the dbt lifecycle and what features should be implemented at every stage. If you're interested in digging in further, check out Dave's blog post or the associated public repository!

The stages of dbt project maturity

Reverse ETL Can Make You a Data Hero

If you haven't heard of Reverse ETL, you're not alone! In the past year, this new segment of the modern data stack has surged in popularity. While it seems like it's not anyone's favorite term, it's the best description we have (and it seems like it's going to stick).

While ETL (Extract, Transform, Load) is all about getting data from your SaaS tools into your warehouse, Reverse ETL is all about getting data from your warehouse back into your SaaS tools. This article from our friends at Hightouch provides a great overview.

We thought Rachel Bradley-Haas, co-founder of Big Time Data, made some really good points throughout the session. With modern tooling, "data people finally get to be the hero" since they can quickly move data wherever business users need it. Rachel talked about how every programmer hates repeated code - and Reverse ETL is just a way to eliminate that. Instead of repeating 80% of your code and writing some brand new API calls (which would cause some engineers to quit), you can now say "run this SQL again, but now for this destination".
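To make the "run this SQL again, but now for this destination" idea concrete, here is a minimal Python sketch. It is not how Hightouch or any specific Reverse ETL tool works internally. The in-memory SQLite database stands in for a warehouse, and the `customers` table, the `make_destination` helper, and the `crm` and `email_tool` destination names are all hypothetical, made up purely for illustration.

```python
import sqlite3
from typing import Callable, Dict, List

Row = Dict[str, object]
# A destination is anything that accepts rows and reports how many it took.
Destination = Callable[[List[Row]], int]


def run_query(conn: sqlite3.Connection, sql: str) -> List[Row]:
    """Run one SQL statement and return the result as a list of dicts."""
    cursor = conn.execute(sql)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, values)) for values in cursor.fetchall()]


def make_destination(name: str, outbox: Dict[str, List[Row]]) -> Destination:
    """Hypothetical stand-in for a SaaS API client: it only records the rows."""
    def send(rows: List[Row]) -> int:
        outbox[name] = list(rows)
        return len(rows)
    return send


def sync(conn: sqlite3.Connection, sql: str,
         destinations: Dict[str, Destination]) -> Dict[str, int]:
    """Run the query once, then push the same rows to every destination."""
    rows = run_query(conn, sql)
    return {name: send(rows) for name, send in destinations.items()}


# Stand-in "warehouse" with a small, made-up customers table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (email TEXT, lifetime_value REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("a@example.com", 120.0), ("b@example.com", 15.0), ("c@example.com", 980.0)],
)

# The business logic lives in one query; adding a destination adds no new SQL.
HIGH_VALUE_SQL = (
    "SELECT email, lifetime_value FROM customers "
    "WHERE lifetime_value >= 100 ORDER BY email"
)

outbox: Dict[str, List[Row]] = {}
destinations = {
    "crm": make_destination("crm", outbox),
    "email_tool": make_destination("email_tool", outbox),
}
result = sync(conn, HIGH_VALUE_SQL, destinations)
print(result)
```

The point of the sketch is the shape: the query is defined once, and supporting another tool means adding one more entry to `destinations` rather than copying the logic.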

The end-goal with tools in the modern data stack is to centralize all of your data in your warehouse so you can maintain control and ownership over the logic. Tejas Manohar, Co-founder of Hightouch, mentioned that Reverse ETL tools just help you take your data and quickly push it out to tools you already use. It's all about driving a quicker time to value with your data.

This philosophy is pervasive even outside of Reverse ETL tools. Even as a data orchestration platform, Shipyard still helps data teams focus on moving data between systems with ease. Our 60+ low-code templates make it easy to avoid the repetitive code so you can get a solution up and running quickly, becoming the hero of your organization. We're excited to see how things continue to shift in the future.

That's a wrap! We hope this gives you a taste of what you missed out on from Day 1 of the conference. Stay tuned for our upcoming recaps of the 2nd and 3rd day of the Coalesce 2021 conference. And be sure to check out some of the freebies you can get from attending this free conference!