Getting Started with dbt (data build tool) Deployment in the Cloud


In this tutorial, we'll walk you through the steps it takes to deploy and automate dbt models in the cloud, using the models that you create in dbt's own jaffle-shop tutorial. However, you can use this as a general guide to deploy any dbt models that you may have created for your organization.

Complete the dbt Tutorial

Setting up | docs.getdbt.com
In this tutorial, we will be turning the below query into a dbt project that is tested, documented, and deployed — you can check out the generated documentation for the project we’re building here.

Work your way through the above dbt tutorial, following all of the steps related to the dbt CLI until you reach the step for "Deploying your Project".

Alternatively, you can skip this step by forking our dbt-tutorial repository and using the code found on the finished-dbt-tutorial branch. However, you'll still need to provide your own BigQuery credentials.

Make Adjustments for Running dbt in Production

  1. Update your profiles.yml file to use environment variables in place of sensitive data or references to local files and directories.

    For the dbt tutorial, the only change your profiles.yml needs is to replace the keyfile location with "{{ env_var('BIGQUERY_KEYFILE') }}". The final result will look something like this:
jaffle_shop: # this needs to match the profile: in your dbt_project.yml file
  target: dev
  outputs:
    dev:
      type: bigquery
      method: service-account
      project: dbt-demos # Replace this with your project id
      dataset: dbt_shipyard # Replace this with dbt_your_name, e.g. dbt_bob
      threads: 4
      timeout_seconds: 300
      location: US
      priority: interactive
      keyfile: "{{ env_var('BIGQUERY_KEYFILE') }}"

2. Move your profiles.yml file to the root directory of your dbt project.

3. Remove the target and logs folders, alongside their contents.

4. Add the following code snippet, named execute_dbt.py, to the root directory of your dbt project.

import subprocess
import os
import json

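# JSON contents of the service-account key, passed in as a string
bigquery_credentials = os.environ.get('BIGQUERY_CREDS')
directory_of_file = os.path.dirname(os.path.realpath(__file__))
# dbt CLI command to execute; falls back to `dbt run`
dbt_command = os.environ.get('DBT_COMMAND', 'dbt run')

# Run from the project root so dbt can find dbt_project.yml
os.chdir(directory_of_file)
# Only build the keyfile when credentials were actually provided
if bigquery_credentials and bigquery_credentials != 'None':
    bigquery_credentials = json.loads(bigquery_credentials)
    with open('bigquery_creds.json', 'w') as outfile:
        json.dump(bigquery_credentials, outfile)

# check=True raises CalledProcessError if dbt exits with a non-zero status
subprocess.run(['sh', '-c', dbt_command], check=True)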
execute_dbt.py

This script accomplishes 3 main things.

  1. Switches the current working directory to the location where the execute_dbt.py script lives.

    This ensures that when you run dbt <command>, as long as this file lives in the root directory of your dbt project, it will always be able to execute properly.
  2. Creates a BigQuery credential file, named bigquery_creds.json, using any JSON passed through an environment variable named BIGQUERY_CREDS.

    This is only necessary for BigQuery connections (which the dbt tutorial uses). In Shipyard, you can't upload a credential file, so instead we have to build it from the provided JSON string. You cannot use service-account-json to connect to BigQuery, due to limitations in passing multi-line private keys as environment variables.
  3. Runs a dbt CLI command using an environment variable of DBT_COMMAND. By default, running this script without providing the environment variable will execute dbt run.
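
If you want to test the credential-writing step (item 2) on its own before deploying, it can be pulled out into a small function. The following is only a sketch: the helper name `write_keyfile` is hypothetical and is not part of dbt, Shipyard, or the Blueprint, and the demo payload is a stand-in for a real service-account file.

```python
import json
import os
import tempfile


def write_keyfile(raw_creds, path="bigquery_creds.json"):
    """Write service-account JSON from a string to `path`.

    Returns True if a file was written, False if no credentials were given.
    `write_keyfile` is a hypothetical helper used for illustration only.
    """
    # Treat an unset variable, an empty string, and the literal 'None'
    # as "no credentials provided".
    if not raw_creds or raw_creds == "None":
        return False
    # json.loads raises json.JSONDecodeError if the string is malformed.
    parsed = json.loads(raw_creds)
    with open(path, "w") as outfile:
        json.dump(parsed, outfile)
    return True


if __name__ == "__main__":
    # Fake payload for the demo; a real keyfile contains more fields.
    demo_path = os.path.join(tempfile.mkdtemp(), "bigquery_creds.json")
    print(write_keyfile('{"type": "service_account"}', demo_path))
    print(write_keyfile(None, demo_path))
```

Parsing the string before writing it means a badly pasted credential fails immediately with a JSON error, rather than later as a less obvious dbt connection error.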

With these 4 updates in place, we're ready to deploy dbt on Shipyard.

Sign up for Shipyard

To get started, sign up with your work email and your company name.

Once you've created an account and logged in you can start deploying and automating your dbt models at scale.

Connect your GitHub Account to Shipyard

In order to use the dbt Blueprint to its fullest potential, you'll need to set up an integration to sync your GitHub account to Shipyard. You can follow the first half of this guide.

While you can upload your dbt project directly to Shipyard, we recommend connecting to GitHub so you can always stay in sync with your dbt code updates.

Copy the dbt Blueprint from our Blueprint Library

  1. On the sidebar, click the "Blueprint Library" button.

2. Search for "dbt" and click "Add to Org" on the "dbt - Execute CLI Command" Blueprint.

3. On the next pop-up, you have the chance to rename the Blueprint before it gets added to your organization. You can leave this as the default for this tutorial.

4. On the sidebar, click the Blueprints button and click on the name of the recently created "dbt - Execute CLI Command" Blueprint.

5. At this point, you should have landed on the "Inputs" tab of the dbt Blueprint. Switch to the "Code" tab.

Edit the Code of the dbt Blueprint

  1. Select the option for "Git" on the Code tab.
The dbt Blueprint starts with "filler code" that shows an error if the setup in this tutorial has not been completed.

2. Select the repo where your dbt project lives, select the branch or tag you want to sync with, and leave the Git Clone Location as "default".

3. Edit the "File to Run" field to contain <your-repo-name>/execute_dbt.py

The default clone location requires that we add our repo name as a folder in the File to Run field.

4. Click "Save" and switch to the "Requirements" tab.

Edit the Requirements of the dbt Blueprint

On this tab, you'll want to add and edit any environment variables that may be used in your dbt project. For the dbt tutorial project, you'll need to make the following adjustments:

  1. Copy and paste the contents of your BigQuery credential file into the environment variable named BIGQUERY_CREDS.
  2. Don't touch the environment variable of BIGQUERY_KEYFILE or DBT_PROFILES_DIR. They are set to ./bigquery_creds.json and . respectively.

3. Update the version of dbt to the one you would prefer to use. Alternatively, you can remove the version of ==0.18.1 altogether and the latest version of dbt will always be installed.

Note: If you don't want to manage packages directly in Shipyard, you can remove them and instead include a `requirements.txt` file in your dbt repository that contains dbt.

Read our documentation if you're interested in learning more about how we treat Environment Variables and Packages.

4. Click "Save".

Create a Vessel to Execute dbt in the Cloud

  1. In the top-right corner, click "Use this Blueprint".
  2. Fill out the dbt Command you want to run. If left blank, dbt run will be used by default.

NOTE: We support running multiple commands successively. e.g. dbt compile && dbt run. However, we recommend splitting out commands into separate Vessels that are part of a larger Fleet to allow for better visibility into each function.

3. Click "Next Step" at the bottom.

4. Add any schedule triggers that you may need to run dbt on. We recommend choosing Daily for starters.

5. Click "Next Step" at the bottom.

6. Give your Vessel a name and select the Project where you want to create it.

7. Add the email addresses of anyone who should be notified if the Vessel errors.

8. Update your guardrails to let the Vessel automatically retry if it runs into errors. We recommend at least 1 retry and a 5 minute retry delay.

Final settings for the dbt Tutorial Vessel

9. Click "Save & Finish" at the bottom. You've successfully made a Vessel that automates your dbt project in the cloud! Click "Run your Vessel" to test it out with an On Demand voyage.

10. If you followed this guide and the dbt tutorial to a T, you should see the following information in the output.

You can see that Shipyard prints out the git commit hash being used when running your Vessel and continues by printing out all of the same logging that you would get while running dbt locally. Logs can be accessed for troubleshooting at any time.
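
The note under step 2 about chaining commands works because execute_dbt.py hands DBT_COMMAND to `sh -c`, so the shell itself interprets `&&`. The sketch below illustrates that behavior with `echo` and `false` standing in for dbt commands (an assumption made so it runs without dbt installed); the helper name `run_command` is hypothetical.

```python
import subprocess


def run_command(command):
    """Run `command` through `sh -c`, as execute_dbt.py does, and capture its output."""
    return subprocess.run(
        ["sh", "-c", command],
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # The second command runs because the first one succeeds.
    ok = run_command("echo compile && echo run")
    print(ok.returncode, ok.stdout.split())

    # `false` exits non-zero, so the shell never reaches the second command.
    failed = run_command("false && echo run")
    print(failed.returncode, repr(failed.stdout))
```

Because execute_dbt.py calls `subprocess.run` with `check=True`, a non-zero exit anywhere in the chain raises `CalledProcessError` and the script itself exits with an error. With a chained command you only learn that the chain failed, not which step, which is one reason to split commands across separate Vessels.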

Running into Issues?
Check out the main branch of our dbt-tutorial repository for an example of what your final repository should look like. If you still can't figure it out, click the chat bubble in the bottom right or email us at support@shipyardapp.com.

Next Steps

Set up your organization's dbt project

Now that you've successfully deployed the dbt tutorial of jaffle-shop to the cloud, you can set up your organization's dbt project! By selecting your dbt repository and making a few changes to your setup, you can ensure that your team is always running the latest dbt models.

NOTE: Each dbt Blueprint can only be connected to a single GitHub repository. If you need to manage multiple Blueprints for multiple GitHub repos, you can either duplicate the Blueprint made in this tutorial or go to the Blueprint Library and add the dbt Blueprint to your organization under a different name.

Create More Vessels with your dbt Blueprint

Since the dbt Blueprint allows you to execute any dbt command against your dbt repository, you can create multiple Vessels using the same Blueprint. Set up multiple Vessels to:

  • Run compile, test, or run, or execute only a portion of your models.
  • Create different Vessels for running QA, Staging, and Production.

Use a requirements.txt file for package installation

By default, the dbt Blueprint in Shipyard includes the installation of the dbt package. If you would prefer not to manage package installation through Shipyard, you can include a requirements.txt file in your dbt project's root directory. If you do this, make sure to remove any overlapping packages from the Shipyard UI.

Connect with other databases

This tutorial walked through a very specific path to connect to BigQuery, but connecting to other databases is even easier!

First, update your profiles.yml to include a profile from any other supported database type.

Next, update the specific credential fields to use the Jinja template for environment variables, "{{ env_var('DATABASE_CREDENTIAL') }}".

Then, add those specific environment variables and their associated values directly to Shipyard. These will be stored securely and passed to your script every time the Vessel runs.
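
Before running a Vessel, it can help to confirm that every variable your profiles.yml references has actually been set. The sketch below scans the file's text for `env_var('NAME')` references and reports any names missing from the environment. The function names and the sample field names (`DB_USER`, `DB_PASSWORD`) are hypothetical, and the regular expression is a simplification that does not run Jinja.

```python
import os
import re

# Matches the first argument of env_var('NAME') or env_var("NAME").
ENV_VAR_PATTERN = re.compile(r"env_var\(\s*['\"]([^'\"]+)['\"]")


def referenced_env_vars(profile_text):
    """Return the sorted, de-duplicated names referenced via env_var(...)."""
    return sorted(set(ENV_VAR_PATTERN.findall(profile_text)))


def missing_env_vars(profile_text, environ=None):
    """Return referenced names that are absent from `environ` (default: os.environ)."""
    environ = os.environ if environ is None else environ
    return [name for name in referenced_env_vars(profile_text) if name not in environ]


if __name__ == "__main__":
    # Hypothetical credential fields, for illustration only.
    sample = """
    user: "{{ env_var('DB_USER') }}"
    password: "{{ env_var('DB_PASSWORD') }}"
    """
    print(missing_env_vars(sample, environ={"DB_USER": "analyst"}))  # ['DB_PASSWORD']
```

Note that this check also flags variables that have a default value supplied as a second argument to `env_var`, even though dbt would accept those, so treat the result as a prompt to double-check rather than a hard failure.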

Store Your dbt Logs and dbt Targets

By default, Shipyard wipes all of your data from the platform as soon as a voyage completes. That means that all of the targets, logging, and documentation created by dbt are immediately deleted. However, it doesn't have to be that way!

If you would like to store the targets or logs generated, you'll have to make a Fleet with Vessels that run after your dbt Vessel. The good news is that this process is extremely easy in Shipyard!

We recommend searching the Blueprint Library for any Blueprints with the name "Upload File". These Blueprints will be able to swiftly upload files in <dbt-repo-name>/logs or <dbt-repo-name>/target to your cloud storage location of choice.
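
If you would rather upload a single file per folder than many individual files, one option is to zip the logs and target folders first. This is an optional sketch, not something the Blueprints require: the function name `archive_dbt_output` and the `dbt_<folder>.zip` naming are assumptions, and where you run it (for example, at the end of execute_dbt.py) depends on your own setup.

```python
import os
import shutil
import tempfile


def archive_dbt_output(project_dir, output_dir, folders=("logs", "target")):
    """Zip each existing dbt output folder and return the archive paths created."""
    archives = []
    for folder in folders:
        source = os.path.join(project_dir, folder)
        if not os.path.isdir(source):
            continue  # dbt has not created this folder, so skip it
        base_name = os.path.join(output_dir, f"dbt_{folder}")
        # make_archive appends ".zip" and returns the full archive path.
        archives.append(shutil.make_archive(base_name, "zip", root_dir=source))
    return archives


if __name__ == "__main__":
    # Build a throwaway project with a fake log file for the demo.
    project = tempfile.mkdtemp()
    os.makedirs(os.path.join(project, "logs"))
    with open(os.path.join(project, "logs", "dbt.log"), "w") as log_file:
        log_file.write("demo log line\n")
    for path in archive_dbt_output(project, tempfile.mkdtemp()):
        print(os.path.basename(path))
```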

Create Fleets to Connect dbt to your entire Data Stack

Fleets are a powerful way to share data between Vessels and create complex workflows for any use case. As long as your scripts are written in Python or Bash, they'll be able to run on Shipyard.

With dbt Blueprints under your belt, you can set up Fleets to:

  • Kick off data loading jobs with Fivetran and immediately start running your dbt projects upon their completion.
  • Tie your action-based scripts (reporting, ML models, API updates, etc.) to the status of individual dbt models being run.
  • Send custom emails or Slack messages to vendors when data issues are found with dbt.

We hope this guide has helped you quickly deploy your dbt projects in the cloud with Shipyard. If you have any questions about automating dbt in the cloud, reach out to us at support@shipyardapp.com.

About Shipyard:
Shipyard is a serverless data workflow platform that helps Data Teams launch, monitor, and share their solutions 10x faster. Driven by a mission to simplify every company’s data operations, they are creating an ecosystem where organizations can break down data silos and move beyond dashboards towards a future of fully automated, data-driven actions. The founding team draws on their previous experience at top agencies and media companies, handling high-throughput digital advertising and inventory data for Fortune 500 companies. For more information, visit www.shipyardapp.com or get started with a 14-day free trial.



Blake Burch

Blake is the co-founder of Shipyard, focused on the product roadmap and ensuring customer success. He's an enthusiast of new technology, data, privacy, automation, and board games.