Where Does Data Orchestration Happen?

Shipyard Staff

This is part four of a six-part series on ‘Simplifying Data Orchestration.’ Expertise is found not in using complexity, but in the ability to take a complex topic and break it down for broader audiences.


Introduction

There are some common discovery questions when evaluating a new tool, product, or process: who does data orchestration, what is data orchestration, when does data orchestration happen, where does data orchestration happen, why does data orchestration exist, and how do you do data orchestration?

In the first three articles of this series, I answered why, what, and how with regard to data orchestration. What I’ve noticed is that it's difficult to find concrete answers to these questions that anyone can understand. That is why this series was created. In this fourth installment, I’ll answer the “where” question.

Setting the Stage


Let's start with some basics. There are some terms you'll see used when talking about "where" anything in data happens. In the last few years, two terms have stood out: SaaS and OSS.

SaaS (Software as a Service) is a model where a third-party service provides and hosts applications. In this model, users don't need to download or install the software on their local devices. Instead, they access it through a web browser or a dedicated application interface.

OSS (Open Source Software) is where things get a bit trickier. Open source is code that's publicly accessible. Anyone can look at, change, and distribute the code as it suits their needs. OSS relies on peer review and community production to be developed in a decentralized way. There's also a second distinction here between open source and open core. For open source, the entire product or platform is accessible; for open core, only the core parts are, and the rest is proprietary. Many times, both of these options are more flexible and come at a cheaper initial price than SaaS. However, there are downsides.


Build vs Buy (briefly)


This leads into another topic of debate that comes and goes: build vs. buy. With open source code, you build, and with SaaS, you buy. There are hundreds if not thousands of places you could go to further your reading on this topic. But that's not the point of this article. What you do need to know is that whether you use a SaaS tool or one that's completely open source, the context matters. You'll spend the money either way: either by hiring people to build, or by paying a third party that's built it for you.

What gets tricky here is when you ask, "So how much coding happens in SaaS vs OSS?" It depends. You can buy a product and still have to write code, and you can build a product without writing a ton of code yourself. Wait, what? Yep. It can get confusing. There's a whole spectrum of data tools, ranging from no code at all to writing all the code yourself. While SaaS tends to involve less coding, that's a guideline, not a rule. You may be able to use an open source product where the community has done most of the heavy lifting for you. In this instance, the code you have to write and maintain yourself is minimized.


Cloud-Hosted vs Self-Hosted


Self-hosting refers to the practice of running and maintaining software applications on your own infrastructure, servers, or computer systems, instead of relying on third-party hosting services or cloud providers. In other words, you're responsible for managing and paying for the components required to make the application accessible to users.

When you self-host an application, you have full control over the hosting environment and can tailor it to your needs. This approach is often preferred by individuals or organizations that prioritize customization, have strict security requirements, and have the technical expertise to manage it.

Self-hosting an application doesn't necessarily mean that the code you're using is all open source. Self-hosting simply means that you're running and maintaining the application on your own servers or infrastructure, rather than using a cloud-based service or relying on a third-party to host it for you.

The code you use for self-hosting can be either open source or proprietary. Let's look at both possibilities:

  1. Open Source: If the application you're self-hosting is based on OSS, the source code of the application is available to the public for use. When you self-host an open-source application, you typically have access to the source code, which allows you to modify and customize it.
  2. Proprietary Software: Alternatively, the application you self-host can be based on proprietary software. Proprietary software is distributed under a license that may restrict users' ability to view, modify, or distribute the source code. In such cases, you may have access to the compiled application, but you won't be able to view or modify the underlying source code. Proprietary software is often licensed on a commercial basis, and self-hosting such applications usually requires purchasing the necessary licenses from the software vendor.

Cloud hosting is exactly what it sounds like - your application is hosted in the cloud by a third party. Instead of relying on a single physical server, cloud hosting uses a network of interconnected servers distributed across data centers. This setup allows for easy scalability, as resources can be dynamically allocated or scaled down based on demand.

I already mentioned that self-hosting can be either open source or proprietary. Similarly, cloud-hosting can also be open source or proprietary.
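For readers who think better in code, here is a minimal sketch of the idea that hosting and licensing are two independent choices. The labels and the function name are hypothetical, picked only to mirror the terms used in this article; they don't come from any real product or API.

```python
from itertools import product

# Hypothetical labels for illustration only. They mirror the two
# independent choices described in this article.
HOSTING = ("cloud-hosted", "self-hosted")
LICENSING = ("open source", "proprietary")


def deployment_options():
    """Return every hosting/licensing pairing as a list of tuples."""
    return list(product(HOSTING, LICENSING))


# Print each pairing on its own line.
for hosting, licensing in deployment_options():
    print(f"{hosting} + {licensing}")
```

The takeaway is that knowing where a tool runs tells you nothing on its own about whether its code is open, and vice versa.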


Where am I going with this?


This article could have been one line.

Data orchestration happens in either a cloud-based SaaS platform or OSS with cloud-based and self-hosted options.

For one, I'm sure my company wouldn't be very happy with me if that's all I had to say. Secondly, I would be doing the more non-technical readers of this article a massive disservice.

In a way, answering the question of where data orchestration exists is simple. What's difficult is answering the sub-questions that provide additional context to the original question. So now that I've given a brief overview of SaaS and OSS, let's pick apart my one-liner answer.


Where Does Data Orchestration Happen?

  • In the cloud (Via a SaaS company, usually)
    • varying level of open-source code

OR

  • Self-hosted (running on your own machine or infrastructure)
    • varying level of open-source code

Here's a visual guide of the possible combinations for data orchestration. You'll notice there are some question marks. To the best of my knowledge, there isn't currently a company solely dedicated to data orchestration that fits these categories.

[Image: visual guide to the possible combinations for data orchestration]

Does that mean Shipyard fits all these green checkboxes? Great question. We currently don't offer the ability to self-host Shipyard on your own cloud or on-premises infrastructure. While we frequently get this request, we designed Shipyard so that data teams never have to worry about infrastructure while running workflows on our platform. Allowing for self-hosting means we would have to account for many different architectures, rather than specializing in one that "just works."

Of course, different teams and companies have different needs, skill sets, and many other considerations. We at Shipyard know we aren't the only data orchestration tool out there, but we do strive to be the most usable one for the broadest audience.

Conclusion

Now that you understand a little more about where data orchestration can happen, stay tuned for the rest of the six-part series on simplifying data orchestration. We'll dig into even more of the important discovery questions. In the interim, check out our Substack of articles that our internal team curates weekly from all across the data space. Ready? Get started with our free Developer Plan now.