Apache Knox is a gateway system that serves as a reverse proxy to Apache Hadoop clusters. Its primary advantage is a single point of authentication, which simplifies security policy enforcement while still providing REST API access to the clustered environment. The driver connects to Apache Knox in much the same way as a standard connection that uses HTTP mode.

To connect to an Apache Knox gateway:

  1. Configure the minimum options required for a connection:
    • Set the DatabaseName property to provide the name of the Apache Hive database to which you want to connect.
    • Set the ServerName property to provide the name or IP address of the Apache Knox server to which you want to connect.
    • Set the PortNumber property to provide the port of the primary Apache Knox server that is listening for connections. The default for an Apache Knox gateway instance is 8443.
  2. Set the TransportMode property to http.
  3. Set the HTTPPath property to provide the path of the endpoint to be used for HTTP/HTTPS requests. This value is the Hive endpoint as defined by Apache Knox and corresponds to the name of the server topology file. The default for an Apache Knox gateway is gateway/default/hive.
  4. Optionally, if your server is configured for SSL, set the EncryptionMethod property to ssl to enable SSL data encryption. Data encryption behavior can be further configured using the connection properties described in "Data encryption properties."
  5. Optionally, if your server is configured for Kerberos authentication:
    1. Set the AuthenticationMethod property to kerberos.
    2. Set the ServicePrincipalName property to provide the service principal name for your Apache Knox gateway to be used for Kerberos authentication. The default for an Apache Knox gateway instance is knox/servername@REALM.COM. For example, knox/knoxserver1.example.com@EXAMPLE.COM.

The following example demonstrates a basic connection URL to Apache Knox using Kerberos and SSL data encryption.

jdbc:datadirect:hive://knoxserver1:8443;DatabaseName=hivedb1;
AuthenticationMethod=kerberos;EncryptionMethod=ssl;
HTTPPath=gateway/default/hive;
ServicePrincipalName=knox/knoxserver1.example.com@EXAMPLE.COM;TransportMode=http;
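The same URL can also be assembled in application code. The following is a minimal Java sketch, not part of the driver: the class KnoxUrlExample and its methods buildUrl and exampleProperties are hypothetical helpers introduced for illustration, and the host, database, and principal values are the sample values from the URL above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class KnoxUrlExample {
    // Hypothetical helper: joins host, port, and properties into a
    // semicolon-delimited URL of the form shown in the example above.
    static String buildUrl(String host, int port, Map<String, String> props) {
        StringBuilder url = new StringBuilder("jdbc:datadirect:hive://")
                .append(host).append(':').append(port);
        for (Map.Entry<String, String> entry : props.entrySet()) {
            url.append(';').append(entry.getKey()).append('=').append(entry.getValue());
        }
        return url.append(';').toString();
    }

    // Sample property values taken from the example URL; replace with your own.
    static Map<String, String> exampleProperties() {
        Map<String, String> props = new LinkedHashMap<>(); // keeps insertion order
        props.put("DatabaseName", "hivedb1");
        props.put("AuthenticationMethod", "kerberos");
        props.put("EncryptionMethod", "ssl");
        props.put("HTTPPath", "gateway/default/hive");
        props.put("ServicePrincipalName", "knox/knoxserver1.example.com@EXAMPLE.COM");
        props.put("TransportMode", "http");
        return props;
    }

    public static void main(String[] args) {
        // Prints the assembled URL; no connection is attempted here.
        System.out.println(buildUrl("knoxserver1", 8443, exampleProperties()));
    }
}
```

With the driver on the classpath, the resulting string would typically be passed to java.sql.DriverManager.getConnection. The sketch only prints the URL so that it runs without the driver or a Knox gateway.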
Note: If you receive an HTTP/1.1 500 Server Error message when attempting to insert a large number of rows with Apache Knox, reduce the value specified for the ArrayInsertSize property until the operation succeeds.
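One way to apply the note above is to retry the insert with progressively smaller ArrayInsertSize values. The Java sketch below assumes a halving strategy, which is only one possible choice. The class ArrayInsertSizeTuner, the method findWorkingSize, and the tryInsert callback are hypothetical. In a real application the callback would reconnect with the candidate ArrayInsertSize value and rerun the insert; here it is replaced by a stand-in that simulates a gateway rejecting batches above an arbitrary size.

```java
import java.util.function.IntPredicate;

class ArrayInsertSizeTuner {
    // Halves the candidate size until tryInsert reports success.
    // Returns the first size that worked, or -1 if even a size of 1 failed.
    static int findWorkingSize(int startSize, IntPredicate tryInsert) {
        for (int size = startSize; size >= 1; size /= 2) {
            if (tryInsert.test(size)) {
                return size;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Stand-in for a real insert attempt (assumption for illustration):
        // pretend any batch larger than 600 rows triggers the 500 error.
        IntPredicate simulatedInsert = size -> size <= 600;
        System.out.println(findWorkingSize(2048, simulatedInsert)); // prints 512
    }
}
```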