Bulk Creating Fleets Using the Shipyard API

Steven Johnson

We believe Shipyard is the easiest platform on the market to connect your data processes together. While it is easy to get a Fleet (workflow) made, we realize that it can be quite tedious to create the same workflow over and over again. As your data sources and transformations continue to grow, so too does the time investment. Thankfully, the Shipyard API can help you bulk create Fleets using YAML.

As a data team, this allows you to quickly bulk create the Fleets that are common across your organization. If your team wanted to create five Fleets that each ran through a process of Fivetran -> dbt Cloud -> Tableau, they would have to build each one manually, step by step and input by input. While that process isn't hard, it takes more time than it should.

If you are a consultant, you will generally have baseline processes that you implement initially for new clients. It can be complicated to implement the same workflows, with slight tweaks for each unique client, in a timely fashion, especially if the baseline process needs to be updated across the board.

Our API here at Shipyard can help solve these types of problems for data teams and consultancies, freeing up valuable time to focus on building analyses that drive value for your business. Let's dive in!

How I Built It

To start, we're going to set up a process that uses the Shipyard API to create three Fleets that go through a workflow of Executing a Fivetran Sync, Executing a dbt Cloud Job, and Triggering a Tableau Datasource Refresh for three separate clients.

To build out this process, you start inside Shipyard, then move to your favorite Python coding environment to run the API call.

Shipyard Steps

  • Choose vendors for the process. I chose Fivetran, dbt Cloud, and Tableau.
  • Using the Library Blueprints in Shipyard, create a Vessel for each vendor and enter dummy values for each input.
  • Grab the Fleet YAML code.

Python Steps

  • Create a dictionary for the three totally real companies and their information for each vendor.
  • Create a Python script that generates a unique Fleet YAML per company and creates each Fleet through the Shipyard API.

After running through these steps, we will have a new Fleet in Shipyard for each company, and the script can be reused when you want to replicate the same process or add a new customer. Let's jump into the details...

1. Process and Vendor Selection

You can use the Shipyard API to create Fleets with any of our Library Blueprints or even include custom code. With that in mind, I chose to build out a flow that many of our customers use here at Shipyard. The flow starts with a Fivetran sync. If the Fivetran sync is successful, a dbt Cloud job will kick off. If the data is successfully transformed in dbt, a datasource refresh will begin in Tableau.

2. Building an Initial Fleet in Shipyard

While it is completely possible to build a Fleet from scratch using YAML, I enjoy using our graphical Fleet Builder too much to skip it in this process. I created the three-Vessel Fleet that I discussed above using our Library Blueprints. In the inputs for each Vessel, I added dummy values, since we will fill in the specific values for our three companies in the Python script later on.

3. Grabbing our Fleet's YAML

Now that we have our initial flow created, we need to take the YAML from Shipyard to our coding environment. To do this, click the YAML editor button, copy the code into a text editor, and save it as tutorial.yaml. You will also need to delete the id field for the Fleet and for each Vessel so that Shipyard generates new ids for the Fleets created with the API call later. Here is the YAML if you want to follow along:
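Deleting the id fields by hand is easy to forget whenever the template changes. Here is a minimal sketch of doing it in Python instead. It assumes the exported YAML has already been loaded into a dictionary (for example with yaml.safe_load) and that the ids sit under a key literally named id at the top level of the Fleet and of each Vessel. The helper name strip_ids and the sample id values are made up for illustration and are not part of any Shipyard tooling.

```python
import copy


def strip_ids(fleet):
    """Return a copy of a loaded Fleet dictionary without Fleet or Vessel ids.

    Assumption: ids are stored under the key "id" at the top level of the
    Fleet and at the top level of each Vessel.
    """
    cleaned = copy.deepcopy(fleet)  # leave the caller's data untouched
    cleaned.pop("id", None)
    for vessel in (cleaned.get("vessels") or {}).values():
        vessel.pop("id", None)
    return cleaned


# Tiny stand-in for an exported Fleet; the id values are invented.
exported = {
    "name": "Fleet for Tutorial",
    "id": "fleet-123",
    "vessels": {
        "Execute Fivetran Sync": {
            "id": "vessel-456",
            "source": {"type": "BLUEPRINT"},
        },
    },
}
template = strip_ids(exported)
```

Only the two levels mentioned above are touched, so a Vessel input that happens to be called id would be left alone.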

name: Fleet for Tutorial
vessels:
    Execute Fivetran Sync:
        source:
            blueprint: Fivetran - Execute Sync
            inputs:
                FIVETRAN_API_KEY: YOUR_FIVETRAN_API_KEY
                FIVETRAN_API_SECRET: YOUR_API_SECRET
                FIVETRAN_CONNECTOR_ID: YOUR_CONNECTOR_ID
            type: BLUEPRINT
        guardrails:
            retry_count: 1
            retry_wait: 0s
            runtime_cutoff: 4h0m0s
            exclude_exit_code_ranges:
        notifications:
            emails: []
            after_error: true
            after_on_demand: false
    Execute dbt Cloud Job:
        source:
            blueprint: dbt Cloud - Execute Job
            inputs:
                DBT_ACCOUNT_ID: YOUR_ACCOUNT_ID
                DBT_API_KEY: YOUR_DBT_API_KEY
                DBT_JOB_ID: YOUR_JOB_ID
            type: BLUEPRINT
        guardrails:
            retry_count: 1
            retry_wait: 0s
            runtime_cutoff: 4h0m0s
            exclude_exit_code_ranges:
                - "200"
                - "201"
                - "211"
                - "212"
        notifications:
            emails: []
            after_error: true
            after_on_demand: false
    Trigger Tableau Datasource Refresh:
        source:
            blueprint: Tableau - Trigger Datasource Refresh
            inputs:
                TABLEAU_DATASOURCE_NAME: YOUR_DS_NAME
                TABLEAU_PASSWORD: YOUR_PASSWORD
                TABLEAU_PROJECT_NAME: YOUR_PROJECT_NAME
                TABLEAU_SERVER_URL: YOUR_SERVER_URL
                TABLEAU_SIGN_IN_METHOD: username_password
                TABLEAU_SITE_ID: YOUR_SITE_ID
                TABLEAU_USERNAME: YOUR_USERNAME
            type: BLUEPRINT
        guardrails:
            retry_count: 1
            retry_wait: 0s
            runtime_cutoff: 1h0m0s
            exclude_exit_code_ranges:
                - 200-205
        notifications:
            emails: []
            after_error: true
            after_on_demand: false
connections:
    Execute Fivetran Sync:
        Execute dbt Cloud Job: SUCCESS
    Execute dbt Cloud Job:
        Trigger Tableau Datasource Refresh: SUCCESS
notifications:
    emails: []
    after_error: true
    after_on_demand: false

After we complete step three, we will transition into our development environment to run the Python script to create the Fleets for the three companies.
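Once the template is in your development environment, a quick sanity check can catch a renamed or mistyped Vessel before anything is sent to the API. The sketch below assumes the YAML has been loaded into a dictionary shaped like the file above, with a vessels mapping and a connections mapping. The function name undefined_vessels is my own, and this is only a local check, not something the Shipyard API requires.

```python
def undefined_vessels(fleet):
    """Return Vessel names used under `connections` but missing from `vessels`."""
    defined = set(fleet.get("vessels") or {})
    referenced = set()
    for upstream, downstream in (fleet.get("connections") or {}).items():
        referenced.add(upstream)
        referenced.update(downstream or {})
    return sorted(referenced - defined)


# Trimmed-down version of the tutorial Fleet, shaped like the loaded YAML.
fleet = {
    "vessels": {
        "Execute Fivetran Sync": {},
        "Execute dbt Cloud Job": {},
        "Trigger Tableau Datasource Refresh": {},
    },
    "connections": {
        "Execute Fivetran Sync": {"Execute dbt Cloud Job": "SUCCESS"},
        "Execute dbt Cloud Job": {"Trigger Tableau Datasource Refresh": "SUCCESS"},
    },
}

# An empty list means every connection points at a defined Vessel.
print(undefined_vessels(fleet))
```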

4. Define Companies and Their Credentials

At this point, I needed to grab the credentials for each company and put them into a dictionary that I can reference in the script. Below you will find the dictionary that includes the completely real information for these businesses. They were okay with me sharing their credentials publicly; however, they would appreciate it if you didn't use their credentials in the services.

clients = {
    "pizza_planet": {
        "COMPANY_NAME": "pizza_planet",
        "FIVETRAN_API_KEY": "pizza",
        "FIVETRAN_API_SECRET": "planet",
        "FIVETRAN_CONNECTOR_ID": "pizza_sync",
        "DBT_ACCOUNT_ID": "buzz",
        "DBT_API_KEY": "lightyear",
        "DBT_JOB_ID": "toinfinity",
        "TABLEAU_DATASOURCE_NAME": "forky",
        "TABLEAU_PASSWORD": "isnottrash",
        "TABLEAU_PROJECT_NAME": "toys_data",
        "TABLEAU_SERVER_URL": "https://toys.online.tableau.com/",
        "TABLEAU_SITE_ID": "toydevelopment",
        "TABLEAU_USERNAME": "andy"
    },
    "dinoco": {
        "COMPANY_NAME": "dinoco",
        "FIVETRAN_API_KEY": "route",
        "FIVETRAN_API_SECRET": "sixty_six",
        "FIVETRAN_CONNECTOR_ID": "cars_sync",
        "DBT_ACCOUNT_ID": "lightning",
        "DBT_API_KEY": "mcqueen",
        "DBT_JOB_ID": "kachow",
        "TABLEAU_DATASOURCE_NAME": "piston",
        "TABLEAU_PASSWORD": "cozy_cone",
        "TABLEAU_PROJECT_NAME": "flos_cafe",
        "TABLEAU_SERVER_URL": "https://cars.online.tableau.com/",
        "TABLEAU_SITE_ID": "cardevelopment",
        "TABLEAU_USERNAME": "mater"
    },
    "monsters_inc": {
        "COMPANY_NAME": "monsters_inc",
        "FIVETRAN_API_KEY": "monsters",
        "FIVETRAN_API_SECRET": "incorporated",
        "FIVETRAN_CONNECTOR_ID": "monsters_sync",
        "DBT_ACCOUNT_ID": "mike",
        "DBT_API_KEY": "sully",
        "DBT_JOB_ID": "roar",
        "TABLEAU_DATASOURCE_NAME": "wescare",
        "TABLEAU_PASSWORD": "becausewecare",
        "TABLEAU_PROJECT_NAME": "randall",
        "TABLEAU_SERVER_URL": "https://monsters.online.tableau.com/",
        "TABLEAU_SITE_ID": "monstersdevelopment",
        "TABLEAU_USERNAME": "boo"
    }
}
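With a dictionary this size, a missing or empty key for one client is easy to overlook, and the script would only fail partway through the loop. A small check up front can flag that earlier. This is a sketch under my own assumptions: the required key names mirror the entries above, and REQUIRED_KEYS and missing_keys are hypothetical names rather than anything Shipyard provides.

```python
# Keys every client entry is expected to carry (taken from the entries above).
REQUIRED_KEYS = {
    "COMPANY_NAME",
    "FIVETRAN_API_KEY",
    "FIVETRAN_API_SECRET",
    "FIVETRAN_CONNECTOR_ID",
    "DBT_ACCOUNT_ID",
    "DBT_API_KEY",
    "DBT_JOB_ID",
    "TABLEAU_DATASOURCE_NAME",
    "TABLEAU_PASSWORD",
    "TABLEAU_PROJECT_NAME",
    "TABLEAU_SERVER_URL",
    "TABLEAU_SITE_ID",
    "TABLEAU_USERNAME",
}


def missing_keys(clients):
    """Map each company to the sorted required keys it lacks or leaves empty."""
    problems = {}
    for company, creds in clients.items():
        missing = sorted(key for key in REQUIRED_KEYS if not creds.get(key))
        if missing:
            problems[company] = missing
    return problems


# Illustrative data: one complete entry and one that is mostly empty.
example_clients = {
    "pizza_planet": {key: "placeholder" for key in REQUIRED_KEYS},
    "dinoco": {"COMPANY_NAME": "dinoco"},
}
problems = missing_keys(example_clients)
for company, keys in problems.items():
    print(f"{company} is missing: {', '.join(keys)}")
```

Running the check before the loop means no Fleets get created for anyone until every client entry is complete.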

5. Creating and Running a Script to Generate Fleets

The Python script has two sections. The first section substitutes the credentials from each business into the YAML that we created earlier in Shipyard. The second section sends a request to the Shipyard API to create the Fleets. If you would like to take a look at the script in one piece, check it out on GitHub. Let's take a look at each section:

YAML Generation

import yaml  # provided by the PyYAML package

for company in clients:
    creds = clients[company]

    # Load a fresh copy of the template so every Fleet starts from the same baseline
    with open('tutorial.yaml', 'r') as f:
        data = yaml.safe_load(f)

    fivetran_inputs = data['vessels']['Execute Fivetran Sync']['source']['inputs']
    dbt_inputs = data['vessels']['Execute dbt Cloud Job']['source']['inputs']
    tableau_inputs = data['vessels']['Trigger Tableau Datasource Refresh']['source']['inputs']

    # Name the Fleet after the company
    data['name'] = f'{creds["COMPANY_NAME"]} Fleet'

    # Swap the dummy values for the company's credentials and job details
    fivetran_inputs['FIVETRAN_API_KEY'] = creds['FIVETRAN_API_KEY']
    fivetran_inputs['FIVETRAN_API_SECRET'] = creds['FIVETRAN_API_SECRET']
    fivetran_inputs['FIVETRAN_CONNECTOR_ID'] = creds['FIVETRAN_CONNECTOR_ID']
    dbt_inputs['DBT_ACCOUNT_ID'] = creds['DBT_ACCOUNT_ID']
    dbt_inputs['DBT_API_KEY'] = creds['DBT_API_KEY']
    dbt_inputs['DBT_JOB_ID'] = creds['DBT_JOB_ID']
    tableau_inputs['TABLEAU_DATASOURCE_NAME'] = creds['TABLEAU_DATASOURCE_NAME']
    tableau_inputs['TABLEAU_PASSWORD'] = creds['TABLEAU_PASSWORD']
    tableau_inputs['TABLEAU_PROJECT_NAME'] = creds['TABLEAU_PROJECT_NAME']
    tableau_inputs['TABLEAU_SERVER_URL'] = creds['TABLEAU_SERVER_URL']
    tableau_inputs['TABLEAU_SITE_ID'] = creds['TABLEAU_SITE_ID']
    tableau_inputs['TABLEAU_USERNAME'] = creds['TABLEAU_USERNAME']

    # Write the company-specific YAML; sort_keys=False keeps the template's key order
    with open(f'{creds["COMPANY_NAME"]}_fleet.yaml', 'w') as f:
        yaml.dump(data, f, sort_keys=False, default_flow_style=False)

You can see in the above script that we loop through each company to change the credentials and job information in the YAML. Once those values are replaced, we dump the result into a new YAML file named {company}_fleet.yaml.
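If the list of inputs grows, the one-assignment-per-field approach gets long. As an alternative sketch, not the script from this post, the substitution can be written generically: walk every Vessel's inputs and overwrite any input whose name also appears in the company's dictionary. The function name apply_credentials is hypothetical, and it assumes each Vessel has a source.inputs mapping like the template above.

```python
import copy


def apply_credentials(template, creds):
    """Return a copy of the template with matching Vessel inputs replaced.

    Any input whose name appears in `creds` is overwritten. Inputs without
    a match (such as TABLEAU_SIGN_IN_METHOD) keep their template value.
    """
    fleet = copy.deepcopy(template)  # never mutate the shared template
    fleet["name"] = f'{creds["COMPANY_NAME"]} Fleet'
    for vessel in fleet["vessels"].values():
        inputs = vessel["source"]["inputs"]
        for key in inputs:
            if key in creds:
                inputs[key] = creds[key]
    return fleet


# Cut-down template and credentials, for illustration only.
template = {
    "name": "Fleet for Tutorial",
    "vessels": {
        "Execute dbt Cloud Job": {
            "source": {
                "inputs": {"DBT_JOB_ID": "YOUR_JOB_ID", "OTHER_INPUT": "keep me"}
            }
        }
    },
}
fleet = apply_credentials(template, {"COMPANY_NAME": "dinoco", "DBT_JOB_ID": "kachow"})
print(fleet["name"])
```

The trade-off is that a typo in a key name is skipped silently instead of raising a KeyError, so the explicit version in the script is the stricter of the two.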

Fleet Creation

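import requests

# Define these once, before the loop, replacing the placeholders with your own values
shipyard_api_key = 'YOUR_API_KEY_HERE'
org_id = 'YOUR_ORG_ID_HERE'
project_id = 'YOUR_PROJECT_ID_HERE'

headers = {
    'X-Shipyard-API-Key': shipyard_api_key,
    'Content-Type': 'application/x-www-form-urlencoded',
}

# The remaining lines are indented because they belong inside the
# `for company in clients:` loop, after the company's YAML file is written
    with open(f'{clients[company]["COMPANY_NAME"]}_fleet.yaml', 'rb') as f:
        api_data = f.read()

    # Send the YAML to the Shipyard API to create the Fleet
    response = requests.put(
        f'https://api.app.shipyardapp.com/orgs/{org_id}/projects/{project_id}/fleets',
        headers=headers,
        data=api_data,
    )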

The second part of the script sits in the same loop as the first part. This section uses the Python requests package to take the YAML file that we just created and send it to the Shipyard API to create the Fleet. Once we run the script, we can head back into Shipyard to see the created Fleets.
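When creating many Fleets in one run, it helps to know which requests failed without opening Shipyard. The sketch below assumes you collect response.status_code and response.text for each company inside the loop, and that the API signals success with a 2xx status code, as is conventional for HTTP APIs. The function name failed_fleets and the sample results are invented for illustration.

```python
def failed_fleets(results):
    """Return {company: (status_code, body)} for every non-2xx response.

    `results` maps a company name to a (status_code, body) tuple, for
    example (response.status_code, response.text) from the requests call.
    """
    return {
        company: (status, body)
        for company, (status, body) in results.items()
        if not 200 <= status < 300
    }


# Made-up results; real values would come from the API responses.
results = {
    "pizza_planet": (200, "ok"),
    "dinoco": (401, "unauthorized"),
}
failures = failed_fleets(results)
for company, (status, body) in failures.items():
    print(f"{company}: HTTP {status} - {body}")
```

Inside the loop, collecting a result would look like results[company] = (response.status_code, response.text) right after the requests.put call.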

Conclusion

While this was a simple example, this process can be replicated to be as complex as you want it to be. I'm hoping that this allows you to move closer to one of our core goals at Shipyard, which is to take away the grunt work that data teams have to deal with every day.

Keep an eye out for part two, where we will replicate this process but swap out the BI vendor for our example companies.