Data Council 2023 Highlights

Blake Burch, Angel Catalan

Did you miss out on the Data Council conference in Austin this week? The week was filled with great talks, great people, and lots of thoughts on where the data landscape is headed in the future.

While we’re retiring our captain’s hats for the week, we wanted to recap some of our favorite talks that helped spark new ideas and new ways of thinking about the data landscape. Hopefully these tidbits spark your curiosity!

We’ll be sure to update this article with links to the video presentations once those are released!

Generative AI for Product Builders

Tristan Zajonc (CEO & Co-founder, Continual)

Do you know what complexities are involved with working with generative AI tools? Want to incorporate new AI features with them inside of your product? At a high level, this talk gave me a lot of great ways to assess different build avenues as we try to incorporate more AI in our own platform.

Tristan mentioned that the real goal with generative AI is to get to a point of “zero-shot” learning, where the AI model is given no previous examples to refer to — only a highly specific prompt to work from.

One of the bigger concerns with incorporating AI into your product is being cautious about not passing sensitive customer information through the model. This can be solved with smarter prompting where reference values are used to look up the sensitive information after the fact, rather than ever passing the values to the model.
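One way to sketch that pattern: swap sensitive values for opaque reference tokens before prompting, then substitute the real values back into the model’s response afterward. The function names and token format below are illustrative, not from the talk:

```python
# Hypothetical sketch: the model only ever sees placeholder tokens,
# never the underlying sensitive values.
def redact(text, sensitive_values):
    """Swap each sensitive value for a placeholder like {{REF_0}}."""
    mapping = {}
    for i, value in enumerate(sensitive_values):
        token = f"{{{{REF_{i}}}}}"
        mapping[token] = value
        text = text.replace(value, token)
    return text, mapping

def restore(text, mapping):
    """Re-insert the real values after the model responds."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text

prompt, refs = redact(
    "Write a renewal reminder for jane@example.com on account 4417.",
    ["jane@example.com", "4417"],
)
# model_reply = call_llm(prompt)  # placeholder for a real API call
model_reply = "Hi {{REF_0}}, your account {{REF_1}} renews soon."
print(restore(model_reply, refs))
```

The lookup happens entirely on your side, so the sensitive values never leave your system.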

Most notably, I learned a few different ways to reinforce AI generated output beyond the standard thumbs-up or thumbs-down system.

  • One alternative is to use implicit evaluation, where a user accepting the result or not editing it means it was a good response, but generating a new response or making any edits is a bad response. In this way, you use someone’s behavior as a way to determine the value of the response rather than requiring direct feedback from them.
  • Another alternative is to train a “Constitutional AI” that abides by very strict principles. According to the principles you set, you can have it evaluate two responses and choose which one better aligns with those principles.
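The implicit-evaluation idea can be sketched in a few lines: map observed user actions to implied quality scores instead of asking for explicit feedback. The action names and score values here are assumptions for illustration, not from the talk:

```python
# Hypothetical sketch of implicit evaluation: infer response quality
# from what the user actually did with the output.
IMPLICIT_SCORES = {
    "accepted": 1.0,      # used the response as-is -> good
    "no_edit": 1.0,       # left it untouched -> good
    "edited": 0.0,        # had to change it -> bad
    "regenerated": 0.0,   # asked for a new one -> bad
}

def implicit_feedback(events):
    """Average the implied quality of a batch of user actions."""
    scores = [IMPLICIT_SCORES[e] for e in events if e in IMPLICIT_SCORES]
    return sum(scores) / len(scores) if scores else None

print(implicit_feedback(["accepted", "edited", "accepted", "regenerated"]))  # 0.5
```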

This talk was impressively in-depth, and I highly recommend checking it out for a full overview of what it takes to build AI into your own product.

LLMs & Semantic Layer: Self-serve has Entered the Chat

Paul Blankley (Co-founder & CTO, Zenlytic)

Paul managed to do the impossible with this talk - help boil down why the semantic layer is so important for the future of analytics, especially as we start to see a rise in generative AI.

At its core, LLMs like GPT-4 can generate SQL, but that’s not good enough for analytics. When a user asks a question that needs to be turned into SQL, the LLM needs to know which columns of data are available and in what context each column should be used. For example, when you ask for profit per customer, the model won’t know how profit is calculated. Or if you ask for average basket size per trip, it doesn’t know whether basket size is based on unique products or total quantity of products.

The semantic layer acts as an interpretation of your database’s schema and how the columns should be combined, grouped, and filtered according to business logic. It encodes the tribal knowledge your business has and provides information of how the columns are supposed to be used.

Paul believes that we shouldn’t be relying on AI to generate SQL. Instead, the models should generate the semantic definition of the question being asked, and based on that information, the SQL should be generated.
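A toy sketch of that flow: the model only names a metric and a dimension (the semantic definition), and the semantic layer — which encodes the business logic — compiles it to SQL. The metric definitions, table name, and function names below are invented for illustration:

```python
# Hypothetical semantic layer: business definitions live here, not in the LLM.
METRICS = {
    "profit": "SUM(revenue - cost)",
    "avg_basket_size": "AVG(total_quantity)",  # business chose quantity, not unique products
}
DIMENSIONS = {"customer": "customer_id", "trip": "trip_id"}

def compile_sql(semantic_query, table="orders"):
    """Turn {'metric': ..., 'group_by': ...} into SQL using the definitions above."""
    metric_sql = METRICS[semantic_query["metric"]]
    dim_col = DIMENSIONS[semantic_query["group_by"]]
    return (
        f"SELECT {dim_col}, {metric_sql} AS {semantic_query['metric']}\n"
        f"FROM {table}\nGROUP BY {dim_col}"
    )

# "profit per customer" -> the model only emits the semantic query:
print(compile_sql({"metric": "profit", "group_by": "customer"}))
```

Because the SQL is generated from curated definitions rather than by the model, every answer stays consistent with how the business actually calculates its metrics.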

Building the control plane for data

Shirshanka Das (Co-founder & CEO Acryl Data)

As the data stack has progressed, we’ve seen things like Data Storage, Data Movement, Data Querying, Data Visualization, Data Transformation, and real-time data get significantly easier as companies have openly adopted tooling to help automate and manage these tasks. But while some things have gotten easier over time, things like Data Discovery, Data Quality, and Data Management have gotten more complicated.

With data discovery, the over-indexing of data warehouses has made metadata counterintuitive. Beyond data discovery, the explosion of tools on the market has made providing quality data difficult. With many moving parts in the data stack, it’s gotten increasingly difficult to share outputs and perform quality checks.

Having a data control plane allows organizations to integrate quality signals across multiple tools. With intelligent computation, you’re able to trigger events based on data arrival, ensuring that quality is being managed along every step of the process.
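One way that event-driven pattern might look in code: quality checks registered against a dataset and triggered when new data arrives, rather than on a fixed schedule. Everything here (the function names, the `orders` dataset, the check itself) is a hypothetical sketch:

```python
# Hypothetical sketch: checks fire on data arrival, not on a cron schedule.
from collections import defaultdict

checks = defaultdict(list)

def on_arrival(dataset):
    """Decorator: run the wrapped check whenever `dataset` lands."""
    def register(fn):
        checks[dataset].append(fn)
        return fn
    return register

@on_arrival("orders")
def no_null_ids(rows):
    return all(r.get("order_id") is not None for r in rows)

def signal_arrival(dataset, rows):
    """Called by ingestion when new data lands; returns check results."""
    return {fn.__name__: fn(rows) for fn in checks[dataset]}

print(signal_arrival("orders", [{"order_id": 1}, {"order_id": None}]))
# {'no_null_ids': False}
```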

Feed the alligators with the lights on: How Data Engineers can see who really uses data

Mark Grover (CEO & Co-founder, Stemma)

Mark talked about data observability tooling and how these tools break down into three categories: Know your Data, Access Control, and Data Quality. His perspective is that many teams aren’t able to appropriately plan how a data change may break the existing table, break other people’s ability to make decisions, or break existing automated processes.

Right now the tools that exist show you the mess, but they don’t actually help clean up the mess. I thought Mark had a very interesting approach for building tooling that helps you see everywhere your data is being used, allowing you to quickly message all folks that would be impacted in a single click before making a change. In this way, a data team can be more proactive about the updates they make, all while making changes with a higher level of confidence that there won’t be unintended consequences.

The end of history? Convergence of batch and realtime data technologies

Matt Housley (Co-founder, Ternary Data)

With the progress of the data industry, most companies consider themselves to be “data-driven”. As organizations have established foundational data capabilities, the focus has now shifted to doing better things with all of this newfound data. The CAP theorem, presented in The Fundamentals of Data Engineering, states that a database can only guarantee two of the following three properties: Consistency, Availability, and Partition tolerance.

Looking at the future of data, there will be ways to beat the CAP theorem. Focusing on real-time data will allow businesses to provide better insights to organizations, and with proper tooling in place, you can attain all three components mentioned in the CAP theorem.

Creating Self-Service, High-Velocity Data Cultures

DeVaris Brown (CEO, Meroxa)

In his work at Twitter, DeVaris quickly learned that having a consolidated data platform was crucial for organizations looking to collaborate. A plethora of data silos, no source of truth, and no way to measure the value of data creates an environment where a trusted data culture is not possible.

When looking at establishing a data culture, you need:

  1. Purpose: Having a collective understanding amongst the team of the ideal end state. This includes perspectives from both the data and business teams to ensure there’s a common understanding of the company’s priorities.
  2. Process: Having a set of coordinated, required actions that repeatedly and predictably achieve the desired results. It’s crucial to have the process come before the platform.
  3. Platform: Once the process has been established, teams should look to implement the systems needed to automate the process as much as possible.

Data needs to be available in the right place at the right time for people to consume it, while also ensuring they have an understanding of what that data means. Some best practices to accomplish this are:

  1. Appoint a DRI (directly responsible individual)
  2. This individual should be obsessed with customer/stakeholder success
  3. Perform routine audit and gap analysis of process and platform
  4. Build around the “blessed cowpaths”
  5. Measure time to value on everything

Getting buy in from business teams can be tough, and it’s always best to start with a small project that could produce quick wins. Getting this win and visibility will allow your team to see the value data can bring to an organization, and help create champions for larger data initiatives.

Growing a Developer Community

Wesley Faulkner (Senior Community Manager, AWS)

When looking to establish relationships with developers, it’s important to understand that developers tend not to respond to general marketing. Vague descriptors like “fast” and “easy” will always invite more questions. “Fast compared to what? Easy compared to whom?” is typically where an engineer’s mind goes before they move on to their next task.

When establishing a technical community, the founder should have the technical skillset to develop marketing language that speaks to the developer mindset. Establishing communities is hard, but understanding what is not a community is the first step. Forums, social media, YouTube viewers, and paying customers are not communities. While they’re a great way to get the word out, a true community requires:

  1. Peer-to-Peer interaction: This fosters a sense of belonging and mutual support among community members
  2. Clear core interest: Defines the purpose and identity of the community and attracts like-minded people
  3. Individual expression: Allows community members to showcase their personality and creativity, enriching the diversity of the community
  4. Moderation: Ensures that the community is respectful, safe, and inclusive for everyone

These core principles are a great north star when creating a community, but the right type of community varies from organization to organization. Common options include:

  1. Establishing an ambassador program
  2. Crafting a newsletter
  3. User groups
  4. Events
  5. Internal

When looking at any of these communities (or any combination of them), uniqueness is what will bring success. Having a “push/pull” dynamic where you highlight community members and their accomplishments will help with organic growth and ensure community participation.

Looking to start a community? Follow these steps to get started:

  1. Identify your target audience and their stage in the funnel
  2. Address the cold start problem by providing incentives, social proof, and feedback loops
  3. Re-engage your existing community members by offering value, recognition and opportunities
  4. Encourage self-selection by creating clear and relevant segments, messages and calls to action

Data Products aren’t just for data teams

Katie Hindson (Head of Product & Data, Lightdash)

With the shift in the data landscape, data literacy is the new computer literacy. Using data allows teams to be more productive and innovative while also enhancing the customer and employee experience. Today, 82% of leaders expect all of their employees to be data literate. The same leaders expect 70% of employees to heavily use data by 2025, up from 40% in 2018. With leadership acknowledging the importance of all team members using data, making it accessible is the number one priority.

The focus now is to start building tools for a data-literate future. A great starting point is to meet business users in their preferred UX: providing insights in tools like Slack or Notion will help adoption, as will exploring things like natural language querying. Making business users successful means data teams have to increase and standardize the context for data tools, and better leverage metadata to help users feel confident in self-serve decision-making. Otherwise, data tools will only be a barrier to increasing data literacy for these personas.

And that's a wrap! We're looking forward to next year's Data Council conference. Thanks to Pete and the entire team for putting on such a great event.

Want to keep the conversation going about observability, LLMs, and delivering quality data? Get a custom demo from our team of data experts!