What is Snowpark?
Captain's Compass

Steven Johnson

Hello, and welcome to another edition of Captain's Compass! This is Steven Johnson, your friendly data community advocate at Shipyard. Today, we're delving into an exciting topic that's creating quite a buzz in the data world: Snowpark. We'll discuss what it brings to the table, why you should consider it, and how it stacks up against Shipyard.

What is Snowpark?

For the uninitiated, Snowpark is a Snowflake feature, launched last year, that lets you build and run Python, Java, and Scala code within a Snowflake environment. Before Snowpark, SQL was the main language for working in Snowflake; Snowpark broadens the options for developers.
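To make that a little more concrete, here is a minimal sketch of what Snowpark code can look like in Python. The connection values are placeholders, and the ORDERS table with its AMOUNT and REGION columns is a hypothetical example, not something from a real account. Snowpark DataFrames are lazy, so the query only runs inside Snowflake once you ask for results.

```python
# Placeholder connection details (assumptions); replace them with your own.
CONNECTION_PARAMETERS = {
    "account": "<account_identifier>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}


def create_session(parameters=CONNECTION_PARAMETERS):
    """Open a Snowpark session from a dict of connection parameters."""
    # Imported here so the file still loads where Snowpark is not installed.
    from snowflake.snowpark import Session

    return Session.builder.configs(parameters).create()


def total_by_region(session, min_amount=100):
    """Build a lazy DataFrame that sums order amounts per region."""
    from snowflake.snowpark.functions import col, sum as sum_

    orders = session.table("ORDERS")  # hypothetical table name
    return (
        orders.filter(col("AMOUNT") >= min_amount)
        .group_by("REGION")
        .agg(sum_("AMOUNT").alias("TOTAL_AMOUNT"))
    )


# Typical usage (requires real credentials and the snowflake-snowpark-python package):
#   session = create_session()
#   total_by_region(session).show()   # the work happens in Snowflake here
#   session.close()
```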

Why Should You Try Out Snowpark?

One of the key reasons Snowpark stands out is its ability to bridge the gap between data engineers and data scientists by allowing them to work on the same platform. This kind of integration is crucial given the significant disconnect that often exists between different teams such as data engineers, data analysts, data scientists, and business users, who typically use separate tools and platforms.

Snowpark encourages a more unified workflow by letting all the stakeholders work within Snowflake, enhancing visibility into the pipeline, especially at the endpoint.

Another significant feature of Snowpark is its integration with Anaconda, which gives users access to a large catalog of Python packages. So, you're not limited to a short list of packages pre-selected by Snowflake. If a package is available through the Anaconda channel that Snowflake integrates with, you can use it in Snowpark.
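As a rough illustration of how a package gets pulled in, the sketch below registers a Python function as a Snowpark UDF and names numpy as a dependency. The function name, the UDF name LOG1P_AMOUNT, and the choice of numpy are all assumptions made for the example; the package has to be available in Snowflake's Anaconda channel for the registration to succeed.

```python
# Packages the UDF needs on the Snowflake side (an example choice, not a requirement).
UDF_PACKAGES = ["numpy"]


def log1p_amount(amount: float) -> float:
    """Handler that runs inside Snowflake; numpy is resolved from Anaconda there."""
    import numpy as np

    return float(np.log1p(amount))


def register_log1p_udf(session):
    """Register the handler as a temporary UDF on an existing Snowpark session."""
    # Imported here so the file still loads where Snowpark is not installed.
    from snowflake.snowpark.types import FloatType

    return session.udf.register(
        log1p_amount,
        return_type=FloatType(),
        input_types=[FloatType()],
        name="LOG1P_AMOUNT",    # hypothetical UDF name
        packages=UDF_PACKAGES,  # dependencies Snowflake installs for the UDF
        replace=True,
    )
```

Once registered, the returned UDF object can be applied to a DataFrame column just like a built-in function.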

Perhaps one of the most appealing aspects of Snowpark is its use of the same infrastructure as your Snowflake instance. This feature can save users time, effort, and money as there's no need to spin up a separate server for Python tasks.

Snowpark vs Shipyard

One of the most frequent questions we receive is whether Snowpark is the same as Shipyard. To clarify: while both have their strengths, they serve different purposes and can complement each other.

Data Location

While Snowpark does offer some capabilities to pull outside data using its API, it's primarily designed to work with data already present within Snowflake. If your data is in Snowflake and you want to perform transformations, run ML models, or conduct analysis, Snowpark is perfect for the task.

On the other hand, Shipyard shines in its flexibility, handling data regardless of where it is located. Whether your data is in Snowflake or elsewhere, Shipyard enables data extraction, transformations, and ML model running.

Process Integration

Another aspect where Shipyard takes the lead is in integrating upstream processes with downstream processes. While I'm not sure if Snowpark supports extraction or other upstream tasks, Shipyard allows you to combine the extraction layer with your transformation layer, BI layer, and ML layer.

Package Installation

A minor difference lies in package installation. As mentioned earlier, Snowpark uses Anaconda, while Shipyard uses pip. This isn't about which is better; it's just a point of distinction between the two.

Can Snowpark Run Inside Shipyard?

Yes! This is one of the exciting features we've explored recently. Based on our testing, it's definitely possible to run Snowpark within Shipyard using Snowpark's API. Stay tuned, as we'll soon be sharing content on how you can use Snowpark inside Shipyard.
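In the meantime, here is a rough sketch of a standalone Snowpark script, the kind of Python file that any scheduler or orchestrator could run. It is not Shipyard's documented setup. The environment variable names are assumptions, so swap in whatever names your platform exposes for credentials.

```python
import os

# Hypothetical environment variable names, mapped to Snowpark connection keys.
REQUIRED_SETTINGS = {
    "account": "SNOWFLAKE_ACCOUNT",
    "user": "SNOWFLAKE_USER",
    "password": "SNOWFLAKE_PASSWORD",
    "warehouse": "SNOWFLAKE_WAREHOUSE",
}


def load_settings(env=os.environ):
    """Read connection parameters from the environment, failing fast if any are missing."""
    missing = [var for var in REQUIRED_SETTINGS.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in REQUIRED_SETTINGS.items()}


def main(env=os.environ):
    """Connect to Snowflake, run a trivial query, and return an exit code."""
    settings = load_settings(env)

    # Imported here so a missing package only matters when the job actually runs.
    from snowflake.snowpark import Session

    session = Session.builder.configs(settings).create()
    try:
        rows = session.sql("SELECT CURRENT_VERSION()").collect()
        print("Connected to Snowflake version", rows[0][0])
    finally:
        session.close()
    return 0
```

To run this as a job, end the file with a call such as raise SystemExit(main()). Keeping credentials in environment variables rather than in the script means the same file can run locally or on a hosted platform without changes.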

Conclusion

In this post, we've taken a look at what Snowpark is, the benefits it offers, and how it compares to Shipyard. It's not a question of Snowpark vs Shipyard, but how these two powerful tools can be used in tandem for effective data processes.

Be sure to check out our Substack, where our internal team curates articles weekly from across the data space. Ready to try Shipyard? Get started with our free Developer Plan now.