Integrating DuckDB into your Fleets

Steven Johnson

In the ever-evolving world of data, one trend has been unmistakable: the rise of ducks! From popular movies to our furry friends' favorite toys, ducks have captured the imagination in the data space.

Keeping up with this trend, we're thrilled to announce that we are partnering with MotherDuck! With this partnership, you will be able to use MotherDuck's products in Shipyard. We will be working on blog posts and use cases that outline that functionality as our flock grows.

To begin, we thought we would start with DuckDB, the popular open-source database that MotherDuck is built on. In this post, we'll walk you through a practical use case of DuckDB and Shipyard. If you'd rather follow along in video form, check out the video below!

Bringing DuckDB to Shipyard: A Practical Example

Our journey begins with a common task: integrating a CSV file from Google Drive into a DuckDB database. For this demonstration, we'll use the "Slinky Dog Dash" ride data from Walt Disney World, a dataset familiar to those who've followed our tutorials.

Step 1: Downloading Files from Google Drive

First, we download the necessary files from Google Drive into Shipyard. Using Shipyard's ephemeral file storage system, we securely import our DuckDB database and the CSV file. Our Google Drive Blueprints make this step quick.

Step 2: Creating a Python Vessel for Data Integration

Once the files are in Shipyard, the next step is to write a Python script that connects to and manipulates these data sources. Our script begins by establishing a connection to the DuckDB database. Then, we insert the CSV data into a new table within the database. This process is streamlined through SQL commands and Shipyard’s intuitive interface. Note that the token is saved as an environment variable.

Although we're just creating the table in this example, the capabilities are much greater with DuckDB and Shipyard. The DuckDB integration in Shipyard is particularly exciting as it opens up avenues for more advanced use cases, such as performing transformations directly within Shipyard, offering an alternative to tools like dbt or Python-based data manipulation.

Step 3: Uploading Back to Google Drive

Finally, we wrap up our process by uploading the modified database back to Google Drive. This step demonstrates the seamless integration between cloud storage and Shipyard’s processing capabilities. We also make sure to include DuckDB in the Vessel's Python packages, since the script cannot run without it.

Why Shipyard with DuckDB?

Shipyard’s ephemeral file storage system stands out, especially when handling databases. It lets you manipulate data freely, with the assurance that changes are saved back to permanent storage such as Google Drive. Moreover, Shipyard offers triggers for scheduling tasks and version control for managing changes effectively, making it an excellent tool for experimentation and development.

The Outcome

By the end of our process, we successfully integrated a CSV into our DuckDB database and uploaded the updated database back to Google Drive. The ease of this integration showcases the power and versatility of using DuckDB within Shipyard, particularly for data manipulation and orchestration tasks.

Looking Ahead

As we continue to explore and integrate new technologies like DuckDB in Shipyard, the possibilities for data manipulation and management keep expanding. Whether it's handling CSV files, managing databases, or automating data workflows, Shipyard proves to be an invaluable tool in the data space.

Sign up for Shipyard's free Dev plan today, no credit card required.