Performance considerations

Save PDF

Last Updated: June 12, 2020
4 minute read

DataDirect Connectors
JDBC
Apache Hive 6.0
Documentation

ArrayFetchSize: To improve throughput, consider increasing the value of ArrayFetchSize. By increasing the value of ArrayFetchSize, you increase the number of rows the driver will retrieve from the server for a fetch. In turn, increasing the number of rows that the driver can retrieve reduces the number, and expense, of network round trips. For example, if an application attempts to fetch 100,000 rows, it is more efficient for the driver to retrieve 2000 rows over the course of 50 round trips than to retrieve 500 rows over the course of 200 round trips. Note that improved throughput does come at the expense of increased demands on memory and slower response time. Furthermore, if the fetch size exceeds the available buffer memory of the server, an out of memory error is returned when attempting to execute a fetch. If you receive this error, decrease the value specified until fetches are successfully executed.

For many applications, throughput is the primary performance measure. In contrast, response time (how fast the first set of data is returned) can be of greater importance for interactive or Web-based applications. Since smaller fetch sizes are generally returned more quickly than larger ones, you can improve response time by decreasing the value of ArrayFetchSize.

BatchMechanism: When Batchmechanism is set to multiRowInsert, the driver executes a single insert statement for all the rows contained in a parameter array. If the size of the insert statement exceeds the available buffer memory of the server, the driver executes multiple statements. This default behavior provides substantial performance gains for batch inserts.

BinaryDescribeType: When BinaryDescribeType is set to longvarbinary, the driver not only maps Binary to Longvarbinary, but also allocates more space to cache the long data. Because more space is allocated for the long data, your application will incur a performance penalty.

CatalogMode: Apache Hive’s native catalog functions return incorrect information in certain scenarios. To address this issue, by default, the driver uses a combination of driver-discovered information and native functions to retrieve more accurate catalog information than native functions alone. While using driver-discovered information improves accuracy, it does so at an expense to performance. If accurate catalog information is not required, you can improve performance by setting Catalog Mode connection option to native.

EnableCookieAuthentication:To improve response time when using HTTP mode (TransportMode=http), enable cookie based authentication (EnableCookieAuthentication=true). When cookie based authentication is enabled (the default), the driver uses cookies authenticate requests to the server after the initial authentication that occurs at connection. This eliminates the overhead associated with executing a standard re-authentication attempt for each request to the server.

EncryptionMethod: Data encryption may adversely affect performance because of the additional overhead (mainly CPU usage) required to encrypt and decrypt data.

InsensitiveResultSetBufferSize: To improve performance when using scroll-insensitive result sets, the driver can cache the result set data in memory instead of writing it to disk. By default, the driver caches 2 MB of insensitive result set data in memory and writes any remaining result set data to disk. Performance can be improved by increasing the amount of memory used by the driver before writing data to disk or by forcing the driver to never write insensitive result set data to disk. The maximum cache size setting is 2 GB.

MaxPooledStatements: To improve performance, the driver's own internal prepared statement pooling should be enabled when the driver does not run from within an application server or from within another application that does not provide its own prepared statement pooling. When the driver's internal prepared statement pooling is enabled, the driver caches a certain number of prepared statements created by an application. For example, if the MaxPooledStatements property is set to 20, the driver caches the last 20 prepared statements created by the application. If the value set for this property is greater than the number of prepared statements used by the application, all prepared statements are cached.

Refer to Designing JDBC applications for performance optimization in the Progress DataDirect for JDBC Drivers Reference for more information about using prepared statement pooling to optimize performance.

StringDescribeType: To obtain data from String columns with the getClob() method, the StringDescribeType connection property must be set to longvarchar. (Otherwise, calling getClob() results in an "unsupported data conversion" exception.) When StringDescribeType is set to longvarchar, the driver not only maps String to Longvarchar, but also allocates more space to cache the long data. Because more space is allocated for the long data, your application will incur a performance penalty.

UseCurrentSchema: If your application needs to access tables and views owned only by the current user, performance of your application can be improved by setting this property to true. When this property is set to true, the driver returns only tables and views owned by the current user when executing getTables() and getColumns() methods. Setting this property to true is equivalent to passing the user ID used on the connection as the schemaPattern argument to the getTables() or getColumns() call.

Progress DataDirect for JDBC for Apache Hive User's Guide Release 6.0.1

Performance considerations

Table of Contents

Performance considerations