Automate Transcribing your YouTube Videos with the Shipyard API

Last week, we released a blog post discussing the new functionality of our API. In that post, we walked through a workflow that emailed a CSV generated from a Snowflake query, which is a popular use case among our customers. The Shipyard API lets you change variables at runtime, so you can create one Fleet that handles many variations of a workflow instead of maintaining multiple Fleets that do the same thing.

This time, I'm excited to apply that API enhancement to a project we are currently working on inside Shipyard. If you didn't know already, we have a YouTube channel where our team shows off use cases and tutorials for Shipyard (shameless plug).

YouTube provides an automatically generated transcription, but we've noticed that it doesn't always do a great job. The accuracy is okay most of the time, but it does no cleanup, such as removing filler words like "uh" or adding punctuation.

We know that adding accurate transcriptions to our YouTube videos makes them easier to watch without sound and helps boost their performance in the YouTube algorithm. We have manually added transcripts to a couple of our videos using OpenAI's Whisper and saw that they were much more accurate than the ones automatically generated by YouTube.

With that in mind, I wanted to create a system where I could send through the ID of a video when we upload it to YouTube and have the transcriptions from Whisper be automatically created and added to the video. Since this is a process that will be continually repeated with minimal changes, I knew this would be a perfect use case for our new API endpoint. Let's dive in and see how I built it. Check out a video version of this post below:

Building the Fleet Template

Similar to the blog post from last week, we need to start by building a Fleet in Shipyard that we can send parameters to. The specific input values aren't important at this point, since we will override them at runtime. The Fleet will need to accomplish 3 things:

  • Download the audio of the YouTube video.
  • Transcribe the video using Whisper's API.
  • Upload the created transcription to YouTube as a caption.

Thankfully, Shipyard has low-code Blueprints pre-made to handle the first two tasks. We will need to write a Python script that handles uploading captions to YouTube. You can see the code for that task along with the rest of the Fleet's setup in the YAML below:

name: Youtube Flow
vessels:
    Upload Captions to YouTube:
        source:
            language: PYTHON
            version: "3.9"
            file:
                name: test.py
                content: |-
                    print('Importing')
                    import os
                    import googleapiclient.discovery
                    from google_auth_oauthlib.flow import InstalledAppFlow
                    from google.oauth2.credentials import Credentials
                    import googleapiclient.errors
                    from googleapiclient.http import MediaFileUpload
                    from google.auth.transport.requests import Request
                    print('Finished Importing')

                    SCOPES = ['https://www.googleapis.com/auth/youtube.force-ssl']
                    TRANSCRIPT_FILE = 'transcription.txt'
                    # Video to caption; overridden at runtime through the Shipyard API
                    VIDEO_ID = os.environ.get('VIDEO_ID')
                    YOUTUBE_TRANSCRIPT_NAME = 'Shipyard Transcription'
                    CLIENT_ID = os.environ.get('GOOGLE_CLIENT_ID')
                    CLIENT_SECRET = os.environ.get('GOOGLE_CLIENT_SECRET')

                    def authenticate():
                        creds = None

                        refresh_token = os.environ.get('REFRESH_TOKEN')
                        if refresh_token:
                            creds = Credentials.from_authorized_user_info({
                                'refresh_token': refresh_token,
                                'client_id': CLIENT_ID,
                                'client_secret': CLIENT_SECRET,
                            }, SCOPES)
                            creds.refresh(Request())
                        else:
                            flow = InstalledAppFlow.from_client_secrets_file('google_creds.json', SCOPES)
                            creds = flow.run_local_server(port=0)
                            print('Your refresh token is: {}'.format(creds.refresh_token))

                        return googleapiclient.discovery.build('youtube', 'v3', credentials=creds)

                    youtube = authenticate()

                    # Set up the media file upload; MediaFileUpload reads the transcript from disk
                    media = MediaFileUpload(TRANSCRIPT_FILE, mimetype='application/octet-stream')

                    # Call the captions.insert method
                    request = youtube.captions().insert(
                        part="snippet",
                        body={
                            "snippet": {
                                "videoId": VIDEO_ID,
                                "language": "en",
                                "name": YOUTUBE_TRANSCRIPT_NAME,
                                "isDraft": False
                            }
                        },
                        media_body=media
                    )

                    response = request.execute()

                    print(response)
            file_to_run: test.py
            environment:
                - name: GOOGLE_APPLICATION_CREDENTIALS
                  value: SHIPYARD_HIDDEN
                - name: REFRESH_TOKEN
                  value: SHIPYARD_HIDDEN
                - name: VIDEO_ID
                  value: SHIPYARD_HIDDEN
                - name: GOOGLE_CLIENT_ID
                  value: SHIPYARD_HIDDEN
                - name: GOOGLE_CLIENT_SECRET
                  value: SHIPYARD_HIDDEN
            packages:
                - name: google-api-python-client
                  version: ==1.7.2
                - name: google-auth
                  version: ==1.8.0
                - name: google-auth-httplib2
                  version: ==0.0.3
                - name: google-auth-oauthlib
                  version: ==0.4.1
            type: CODE
        guardrails:
            retry_count: 0
            retry_wait: 0s
            runtime_cutoff: 1h0m0s
        notifications:
            emails:
            after_error: true
            after_on_demand: false
    Whisper Transcribe Audio:
        source:
            blueprint: Whisper - Transcribe Audio
            inputs:
                WHISPER_DESTINATION_FILE_NAME: transcription.txt
                WHISPER_FILE: youtube.webm
            type: BLUEPRINT
        guardrails:
            retry_count: 0
            retry_wait: 0s
            runtime_cutoff: 1h0m0s
        notifications:
            emails:
            after_error: true
            after_on_demand: false
    Youtube Download Video To Shipyard:
        source:
            blueprint: Youtube - Download Video to Shipyard
            inputs:
                YOUTUBE_DOWNLOAD_TYPE: audio
                YOUTUBE_FILE_NAME: youtube.webm
                YOUTUBE_VIDEO_ID: EkWW4tlzjMU
            type: BLUEPRINT
        guardrails:
            retry_count: 0
            retry_wait: 0s
            runtime_cutoff: 1h0m0s
        notifications:
            emails:
                - blake@shipyardapp.com
            after_error: true
            after_on_demand: false
connections:
    Whisper Transcribe Audio:
        Upload Captions to YouTube: SUCCESS
    Youtube Download Video To Shipyard:
        Whisper Transcribe Audio: SUCCESS
notifications:
    emails:
    after_error: true
    after_on_demand: false

Choosing Fields to Change at Runtime

The beauty of Shipyard's new Trigger Fleet Run API endpoint really shows up here. Before this endpoint was available, I would have had to create a separate Fleet for every video or go back and edit the Fleet's inputs each time I wanted to caption a new video. Now I can send the parameters I want to change to the endpoint, and those values are overridden at runtime for me.

The variables that we want to change at runtime are:

Youtube Download Video To Shipyard:
  • YOUTUBE_VIDEO_ID
Upload Captions to YouTube:
  • VIDEO_ID
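Both fields take the bare video ID (for example `EkWW4tlzjMU` in the template above), not a full link. If you are starting from a URL, a small helper along these lines could pull the ID out first. This is only a sketch: `extract_video_id` is a hypothetical name, and it assumes links of the form `https://www.youtube.com/watch?v=<id>` or `https://youtu.be/<id>` that include the scheme.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url_or_id: str) -> str:
    """Return the YouTube video ID from a full URL, or the input if it is already an ID.

    Assumes URLs include the scheme (https://...); other link formats are not handled.
    """
    parsed = urlparse(url_or_id)
    if not parsed.netloc:
        # No host part, so treat the value as a bare video ID.
        return url_or_id
    if parsed.netloc.lower().endswith("youtu.be"):
        # Short links carry the ID in the path: https://youtu.be/<id>
        return parsed.path.lstrip("/")
    ids = parse_qs(parsed.query).get("v")
    if ids:
        # Standard watch links carry the ID in the "v" query parameter.
        return ids[0]
    raise ValueError(f"Could not find a video ID in {url_or_id!r}")
```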

Run the Fleet with the API

Using the sample code provided in the Shipyard documentation, we can now run the template Fleet we built earlier with our own values. The overrides need to be placed in a dictionary.

json_data = {
  "vessel_overrides": [
    {
      "name": "Youtube Download Video To Shipyard",
      "environment_variable_overrides": {
        "YOUTUBE_VIDEO_ID": "mcoQPPHdsPo"
      }
    },
    {
      "name": "Upload Captions to YouTube",
      "environment_variable_overrides": {
        "VIDEO_ID": "mcoQPPHdsPo"
      }
    }
  ]
}
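Since both Vessels receive the same video ID, the dictionary can also be generated from a single value so the two entries never drift apart. The sketch below assumes the Vessel names and override keys shown above; `build_vessel_overrides` is a hypothetical helper name, not part of the Shipyard API.

```python
def build_vessel_overrides(video_id: str) -> dict:
    """Build the request body that sets the same video ID on both Vessels."""
    if not video_id:
        raise ValueError("video_id must be a non-empty string")
    return {
        "vessel_overrides": [
            {
                # Vessel that downloads the audio
                "name": "Youtube Download Video To Shipyard",
                "environment_variable_overrides": {"YOUTUBE_VIDEO_ID": video_id},
            },
            {
                # Vessel that uploads the finished captions
                "name": "Upload Captions to YouTube",
                "environment_variable_overrides": {"VIDEO_ID": video_id},
            },
        ]
    }


json_data = build_vessel_overrides("mcoQPPHdsPo")
```

The result has the same shape as the hand-written dictionary, so it can be passed to the API call unchanged.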

We will now send the json_data variable to the Shipyard API, which triggers the Fleet and replaces the template values from earlier with the custom ones. To run the code below, fill in your Shipyard API key along with your organization, project, and Fleet IDs:

import requests

headers = {
    'Accept': 'application/json',
    'X-Shipyard-API-Key': 'YOUR_API_KEY',
    'Content-Type': 'application/json',
}



response = requests.post(
    'https://api.app.shipyardapp.com/orgs/<YOUR_ORG_ID>/projects/<YOUR_PROJECT_ID>/fleets/<YOUR_FLEET_ID>/fleetruns',
    headers=headers,
    json=json_data,
)
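The call above does not check whether the request succeeded. As a rough sketch, the same POST can be wrapped in a function that builds the URL from the IDs and raises on a failed request. This version swaps `requests` for the standard library's `urllib.request` so it runs without extra packages. The helper names `build_fleet_run_request` and `trigger_fleet_run` are hypothetical, and nothing is assumed about the shape of the response body, which is returned as raw text.

```python
import json
import urllib.request

API_BASE = "https://api.app.shipyardapp.com"


def build_fleet_run_request(api_key, org_id, project_id, fleet_id, json_data):
    """Prepare the POST request for the fleetruns endpoint without sending it."""
    url = f"{API_BASE}/orgs/{org_id}/projects/{project_id}/fleets/{fleet_id}/fleetruns"
    headers = {
        "Accept": "application/json",
        "X-Shipyard-API-Key": api_key,
        "Content-Type": "application/json",
    }
    body = json.dumps(json_data).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def trigger_fleet_run(api_key, org_id, project_id, fleet_id, json_data, timeout=30):
    """Send the request; urlopen raises urllib.error.HTTPError on 4xx/5xx responses."""
    request = build_fleet_run_request(api_key, org_id, project_id, fleet_id, json_data)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")


# Example call (placeholders must be replaced with real values):
# status, text = trigger_fleet_run("YOUR_API_KEY", "<YOUR_ORG_ID>",
#                                  "<YOUR_PROJECT_ID>", "<YOUR_FLEET_ID>", json_data)
```

Keeping the request construction separate from the network call makes it easy to inspect the URL and payload before anything is sent.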

The Fleet runs with the variables we supplied and uploads the captions to the matching video. With this setup, we can keep running the exact same Fleet with different inputs, so we no longer need to duplicate a Fleet every time we want to add captions to a video.