Azure Databricks Spark clusters
- Last Updated: May 13, 2024
Microsoft Azure Databricks is a hosted data analytics platform that processes large amounts of data for analytics, machine learning, data engineering, and more. To handle the streams of data required to perform these roles, the platform employs open-source Spark clusters for accessibility and scalability. The driver can connect your application or business intelligence (BI) tool to these clusters, providing a method for your applications to query your data.
To connect to an Azure Databricks Spark cluster:
- Set the Host Name option to specify the name or the IP address of the server to which you want to connect.
- Set the Database Name (Database) option to provide the name of the database to which you want to connect.
- Set the Port Number option to provide the TCP port of the primary database server that is listening for connections to the Spark SQL database. This value is typically 443 for Azure Databricks Spark clusters.
- Set the Transport Mode (TransportMode) option to 1 (HTTP).
- Set the HTTP Path (HTTPPath) option to specify the path of the endpoint to be used for HTTP/HTTPS requests.
  Note: Refer to the Databricks documentation for instructions on retrieving your HTTP path.
- Set the Encryption Method (EncryptionMethod) option to 1 (SSL). Data encryption behavior can be further configured using the connection properties described in "Summary of data encryption related options."
- Optionally, set the User Agent Tag (UserAgent) option to specify the string value of the User-Agent header to be used in HTTP requests. If no value is specified, the following header value is used: Progress/8.0 (SparkSQL ODBC driver).
- Set the User Name (LogonID) option to specify token or the username used for authentication. Setting a value of token dictates that a token value must be specified for the Password option.
- Set the Password option to specify the user-generated token or password used for authentication.
Note: The User Name and Password options do not have to be stored in the connection string or odbc.ini file. The application can instead send them separately using the SQLConnect ODBC API. For SQLDriverConnect and SQLBrowseConnect, they must be specified in the connection string.
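The steps above can be sketched as a small pre-flight check. This is a minimal illustration, not part of the driver: the helper name check_databricks_options and the REQUIRED_OPTIONS tuple are hypothetical, and treating every option named in the steps (other than Database, UserAgent, and the credentials) as required is an assumption. The dictionary stands for the full set of options, wherever the application ends up supplying them.

```python
# Hypothetical helper: checks a dict of connection options against the steps above.
# Which options count as "required" here is an assumption made for illustration.
REQUIRED_OPTIONS = ("HostName", "PortNumber", "TransportMode", "HTTPPath", "EncryptionMethod")


def _text(options, name):
    """Return the option value as a stripped string, or '' when it is absent."""
    value = options.get(name)
    return "" if value is None else str(value).strip()


def check_databricks_options(options):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    for name in REQUIRED_OPTIONS:
        if not _text(options, name):
            problems.append(f"{name} is not set")

    # The steps call for HTTP transport and SSL encryption.
    transport = _text(options, "TransportMode")
    if transport and transport != "1":
        problems.append("TransportMode should be 1 (HTTP)")
    encryption = _text(options, "EncryptionMethod")
    if encryption and encryption != "1":
        problems.append("EncryptionMethod should be 1 (SSL)")

    # A LogonID of "token" means the Password option must carry the token.
    if _text(options, "LogonID") == "token" and not _text(options, "Password"):
        problems.append("LogonID=token requires a token in Password")
    return problems


if __name__ == "__main__":
    sample = {
        "HostName": "myinstance.cloud.databricks.com",
        "PortNumber": 443,
        "TransportMode": 1,
        "HTTPPath": "sql/protocolv1/o/12345/1234-123456-1a2b3c4",
        "EncryptionMethod": 1,
        "LogonID": "token",
    }
    for problem in check_databricks_options(sample):
        print(problem)
```

The port is not checked against 443 because that value is only typical, not mandatory.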
The following examples demonstrate a basic connection to an Azure Databricks Spark cluster:
Using a connection string:
DRIVER=DataDirect 8.0 Apache Spark SQL Wire Protocol;
HostName=myinstance.cloud.databricks.com;PortNumber=443;TransportMode=1;
HTTPPath=sql/protocolv1/o/12345/1234-123456-1a2b3c4;EncryptionMethod=1;LogonID=token;
Password=a1b2c3-e4f4g5-h6i6
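An application might assemble the same connection string at run time rather than hard-coding it. The following Python sketch shows one way to do that; the helper build_connection_string and the environment variable DATABRICKS_TOKEN are hypothetical names chosen for illustration, and the simple join assumes no value contains a semicolon. The pyodbc call is left commented out because it requires the pyodbc package, an installed driver, and a reachable cluster.

```python
import os


def build_connection_string(options):
    """Join option/value pairs into a semicolon-delimited ODBC connection string.

    Simplification: values are assumed not to contain ';' or braces.
    """
    return ";".join(f"{key}={value}" for key, value in options.items())


options = {
    "DRIVER": "DataDirect 8.0 Apache Spark SQL Wire Protocol",
    "HostName": "myinstance.cloud.databricks.com",
    "PortNumber": 443,
    "TransportMode": 1,      # 1 = HTTP
    "HTTPPath": "sql/protocolv1/o/12345/1234-123456-1a2b3c4",
    "EncryptionMethod": 1,   # 1 = SSL
    "LogonID": "token",
    # DATABRICKS_TOKEN is a hypothetical variable name; the fallback is the
    # placeholder token from the example above, not a real credential.
    "Password": os.environ.get("DATABRICKS_TOKEN", "a1b2c3-e4f4g5-h6i6"),
}

connection_string = build_connection_string(options)

# Assumed usage with the third-party pyodbc package:
# import pyodbc
# connection = pyodbc.connect(connection_string)
```

Reading the token from the environment keeps it out of source control, while still placing it in the connection string as SQLDriverConnect requires.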
Using the odbc.ini file:
Driver=ODBCHOME/lib/ivsparkxx.so
Description=DataDirect Apache Spark SQL Wire Protocol
...
EncryptionMethod=1
...
HostName=myinstance.cloud.databricks.com
...
HTTPPath=sql/protocolv1/o/12345/1234-123456-1a2b3c4
...
LogonID=token
...
Password=a1b2c3-e4f4g5-h6i6
...
PortNumber=443
...
TransportMode=1
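If the data source definition is generated by a script, the entries above can be rendered with Python's standard configparser module. This is a sketch under stated assumptions: the function render_odbc_ini_section and the data source name "Databricks Spark" are hypothetical, only the options shown in the example are included, and the Driver path keeps the ODBCHOME and xx placeholders from the example rather than a real install location.

```python
import configparser
import io


def render_odbc_ini_section(dsn_name, options):
    """Render one data source section in odbc.ini key=value form."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve option-name case (default lowercases keys)
    parser[dsn_name] = {key: str(value) for key, value in options.items()}
    buffer = io.StringIO()
    # space_around_delimiters=False writes "Key=value" instead of "Key = value".
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


section_text = render_odbc_ini_section(
    "Databricks Spark",  # hypothetical data source name
    {
        "Driver": "ODBCHOME/lib/ivsparkxx.so",  # placeholder path from the example
        "Description": "DataDirect Apache Spark SQL Wire Protocol",
        "EncryptionMethod": 1,
        "HostName": "myinstance.cloud.databricks.com",
        "HTTPPath": "sql/protocolv1/o/12345/1234-123456-1a2b3c4",
        "LogonID": "token",
        "PortNumber": 443,
        "TransportMode": 1,
    },
)
print(section_text)
```

The Password entry is left out of this sketch on purpose, since the credentials can be supplied by the application instead of being stored in the file.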