Data Movement: Snowflake Workflows and Orchestrating External Systems in Shipyard
Captain's Compass

Steven Johnson

Welcome back to another post in the Captain's Compass series, where we strive to address your queries, concerns, and curiosity about data operations. Following last week's enlightening discussion about ephemeral file storage, we're going to dive deeper into data movement within Shipyard, exploring various workflows and understanding the exact locations of your data at different stages.

Two Key Workflows in Detail

We'll be exploring two major workflows today, both highlighting unique aspects of the data management and flow within Shipyard's systems.

1. The Snowflake Workflow

This common workflow starts with a Snowflake vessel that downloads data from Snowflake as a CSV file, uses a Python script to filter the data down to specific columns, and finally emails that file out. Here's a closer look:

  • Download from Snowflake: Using Snowflake's API, the query results are downloaded as a CSV file. At this point, the data enters Shipyard's file systems.
  • Filtering Data Using Python: This stage produces two files: the original CSV file and a second one containing only the filtered columns. Both files reside in our file system.
  • Emailing the File: While the file is emailed to the recipient, it remains in our system until, as last week's episode illustrated, our Ephemeral File Storage system removes it.

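The filtering step in the middle of this workflow can be sketched in plain Python. This is a minimal example, not Shipyard's actual script: the file name `orders.csv` and the column names are hypothetical stand-ins for whatever the Snowflake vessel downloaded.

```python
import csv

# Hypothetical sample standing in for the CSV the Snowflake vessel downloaded.
with open("orders.csv", "w", newline="") as f:
    f.write("id,customer,total\n1,Ada,19.99\n2,Grace,42.50\n")

def filter_columns(src_path, dest_path, columns):
    """Write only the named columns of src_path to dest_path."""
    with open(src_path, newline="") as src, \
         open(dest_path, "w", newline="") as dest:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dest, fieldnames=columns)
        writer.writeheader()
        for row in reader:
            writer.writerow({c: row[c] for c in columns})

# After this call, both the original and the filtered CSV sit on the
# file system until Ephemeral File Storage removes them.
filter_columns("orders.csv", "orders_filtered.csv", ["id", "total"])
```

Note that both files exist side by side after the script runs, which is exactly why the cleanup step in the next stage matters.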
The Snowflake workflow highlights how data interacts with our systems during the running process, whether it's downloaded from databases like Snowflake, Redshift, BigQuery or from sources like Google Drive or an S3 bucket. In each case, the data remains on our systems during the process and gets deleted afterward using our Ephemeral File Storage system.

2. Orchestrating External Systems

The second fleet we'll discuss is one that orchestrates external systems, such as executing a Fivetran sync, running a dbt Cloud job, and triggering a Tableau data source refresh. Key points include:

  • No Data Inside Shipyard (unless you choose): By orchestrating outside systems, none of your proprietary data enters Shipyard. It all stays in those external systems.
  • Optional Artifacts: You could generate artifacts (e.g., from a dbt Cloud job) to save inside Shipyard, but it's optional.
  • Deletion after Use: As with the first workflow, our Ephemeral File Storage system ensures deletion right after use.

This second fleet illustrates scenarios where your data never even enters our system, emphasizing the flexibility of Shipyard to cater to various data operation needs.

Trusting Shipyard with Your Data

After these two editions of Captain's Compass, we hope you have a comprehensive understanding of what happens with your data when using Shipyard. Security is our utmost priority, and we have robust features to ensure that your data remains secure. By walking you through the process, we aim to foster trust, so you can confidently rely on Shipyard for your data operations.

Be sure to check out our Substack, where our internal team curates articles weekly from all across the data space. Ready to get moving with Shipyard? Get started with our free Developer Plan now.