Databricks SQL Warehouse - Upload File to Table

Overview

Quickly upload a file from Shipyard to a SQL table in Databricks.

Recommended Setup

This Blueprint should be used immediately after downloading data from another source. Although neither is required to connect, it is recommended that you provide the Catalog and the Schema you will query; otherwise, the connection falls back to the defaults and the uploaded table will reside there.

The recommended approach is to provide the volume where the uploaded file will be staged before being copied into the target table. Shipyard removes the staged file after it has been successfully copied into the target. It is also recommended to use one volume per schema, though this is not strictly enforced. If the volume provided does not exist, Shipyard will create it.
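Under the hood, this staged-copy flow corresponds roughly to the sequence below. This is a minimal sketch using the databricks-sql-connector Python package, not the Blueprint's actual implementation; the host, token, catalog, schema, volume, table, and file names are all hypothetical placeholders, and the target table is assumed to already exist:

from databricks import sql

with sql.connect(
    server_hostname="dbc-xxxx.cloud.databricks.com",  # Databricks Server Host
    http_path="/sql/1.0/warehouses/xxxx",             # Warehouse HTTP Path
    access_token="dapi-xxxx",                         # Access Token
    staging_allowed_local_path="/tmp",                # allow PUT from this local directory
) as conn:
    with conn.cursor() as cursor:
        # Create the staging volume if it does not exist (mirrors the Blueprint's behavior).
        cursor.execute("CREATE VOLUME IF NOT EXISTS my_catalog.my_schema.staging")
        # Stage the local file in the volume.
        cursor.execute(
            "PUT '/tmp/orders.csv' INTO "
            "'/Volumes/my_catalog/my_schema/staging/orders.csv' OVERWRITE"
        )
        # Copy the staged file into the target table (assumed to exist).
        cursor.execute(
            "COPY INTO my_catalog.my_schema.orders "
            "FROM '/Volumes/my_catalog/my_schema/staging/orders.csv' "
            "FILEFORMAT = CSV "
            "FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')"
        )
        # Remove the staged file after a successful copy.
        cursor.execute("REMOVE '/Volumes/my_catalog/my_schema/staging/orders.csv'")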

Additionally, the selected match type greatly affects how this Blueprint works. For example, with exact_match the value of "Shipyard File Name" matches a single file such as orders.csv, while with glob_match a pattern such as orders_*.csv can match several files at once.

Note: This Blueprint cannot upload a file from your local machine.

Variables

| Name | Reference | Type | Required | Default | Options | Description |
|------|-----------|------|----------|---------|---------|-------------|
| Access Token | DATABRICKS_SQL_ACCESS_TOKEN | Password | ✅ | - | - | The access token generated in Databricks for programmatic access |
| Databricks Server Host | DATABRICKS_SQL_SERVER_HOST | Alphanumeric | ✅ | - | - | The URL address of the SQL warehouse |
| Warehouse HTTP Path | DATABRICKS_SQL_HTTP_PATH | Alphanumeric | ✅ | - | - | The extended path for the SQL warehouse |
| Catalog | DATABRICKS_SQL_CATALOG | Alphanumeric | ➖ | - | - | The optional catalog to connect to. If none is provided, this defaults to the Hive Metastore |
| Schema | DATABRICKS_SQL_SCHEMA | Alphanumeric | ➖ | - | - | The optional schema to connect to. If none is provided, the Blueprint connects to the default schema |
| Volume | DATABRICKS_SQL_VOLUME | Alphanumeric | ➖ | - | - | The name of the volume in which to stage the file |
| Table Name | DATABRICKS_SQL_TABLE | Alphanumeric | ✅ | - | - | The table in Databricks to write to |
| Data Types | DATABRICKS_SQL_DATATYPES | Alphanumeric | ➖ | - | - | The optional Spark data types to use in Databricks, provided as a JSON object (see the example after this table). If none are provided, the data types will be inferred |
| Insert Method | DATABRICKS_SQL_INSERT_METHOD | Select | ✅ | append | Append: append; Create or Replace: replace | Decides whether to append to an existing table or overwrite an existing table |
| File Type | DATABRICKS_SQL_FILE_TYPE | Select | ✅ | csv | CSV: csv; Parquet: parquet | The file type to load |
| Shipyard File Match Type | DATABRICKS_SQL_MATCH_TYPE | Select | ✅ | exact_match | Exact Match: exact_match; Glob Match: glob_match | Determines whether the text in "Shipyard File Name" matches one file exactly or multiple files using glob patterns |
| Shipyard Folder Name | DATABRICKS_SQL_FOLDER_NAME | Alphanumeric | ➖ | - | - | The optional name of the folder in Shipyard where the file is located |
| Shipyard File Name | DATABRICKS_SQL_FILE_NAME | Alphanumeric | ✅ | - | - | The name of the file in Shipyard to load to Databricks |
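The Data Types input referenced above expects a JSON object that maps column names to Spark data type names, passed as a string. A minimal hypothetical example as it would appear in the Fleet YAML (the column names and types are illustrative only):

DATABRICKS_SQL_DATATYPES: '{"order_id": "integer", "customer": "string", "amount": "double"}'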

YAML

Below is the YAML template for this Blueprint; it can be used in the Fleet YAML Editor.

source:
  blueprint: Databricks SQL Warehouse - Upload File to Table
  inputs:
    DATABRICKS_SQL_ACCESS_TOKEN: null ## REQUIRED
    DATABRICKS_SQL_SERVER_HOST: null ## REQUIRED
    DATABRICKS_SQL_HTTP_PATH: null ## REQUIRED
    DATABRICKS_SQL_CATALOG: null
    DATABRICKS_SQL_SCHEMA: null
    DATABRICKS_SQL_VOLUME: null
    DATABRICKS_SQL_TABLE: null ## REQUIRED
    DATABRICKS_SQL_DATATYPES: null
    DATABRICKS_SQL_INSERT_METHOD: append ## REQUIRED
    DATABRICKS_SQL_FILE_TYPE: csv ## REQUIRED
    DATABRICKS_SQL_MATCH_TYPE: exact_match ## REQUIRED
    DATABRICKS_SQL_FOLDER_NAME: null
    DATABRICKS_SQL_FILE_NAME: null ## REQUIRED
  type: BLUEPRINT
  guardrails:
    retry_count: 1
    retry_wait: 0h0m0s
    runtime_cutoff: 1h0m0s
    exclude_exit_code_ranges:
      - '200'
      - '202'
      - '203'
      - '204'
      - '205'
      - '206'
      - '207'
      - '208'
      - '209'
      - '210'
      - '211'
      - '249'
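As a usage example, here is the same source block filled in with hypothetical values; the guardrails block carries over unchanged from the template above. The host, path, token, and object names are placeholders, and a real access token should be stored as a secret rather than hard-coded. This particular configuration replaces the target table with every file matching the orders_*.csv glob:

source:
  blueprint: Databricks SQL Warehouse - Upload File to Table
  inputs:
    DATABRICKS_SQL_ACCESS_TOKEN: dapi-xxxx ## placeholder; use a secret in practice
    DATABRICKS_SQL_SERVER_HOST: dbc-xxxx.cloud.databricks.com
    DATABRICKS_SQL_HTTP_PATH: /sql/1.0/warehouses/xxxx
    DATABRICKS_SQL_CATALOG: my_catalog
    DATABRICKS_SQL_SCHEMA: my_schema
    DATABRICKS_SQL_VOLUME: staging
    DATABRICKS_SQL_TABLE: orders
    DATABRICKS_SQL_DATATYPES: '{"order_id": "integer", "amount": "double"}'
    DATABRICKS_SQL_INSERT_METHOD: replace
    DATABRICKS_SQL_FILE_TYPE: csv
    DATABRICKS_SQL_MATCH_TYPE: glob_match
    DATABRICKS_SQL_FOLDER_NAME: downloads
    DATABRICKS_SQL_FILE_NAME: orders_*.csv
  type: BLUEPRINT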