Performance considerations
- Last Updated: June 12, 2020
- 4 minute read
- DataDirect Connectors
- JDBC
- Apache Hive 6.0
- Documentation
ArrayFetchSize: To improve throughput, consider increasing the value of ArrayFetchSize. By increasing the value of ArrayFetchSize, you increase the number of rows the driver will retrieve from the server for a fetch. In turn, increasing the number of rows that the driver can retrieve reduces the number, and expense, of network round trips. For example, if an application attempts to fetch 100,000 rows, it is more efficient for the driver to retrieve 2000 rows over the course of 50 round trips than to retrieve 500 rows over the course of 200 round trips. Note that improved throughput does come at the expense of increased demands on memory and slower response time. Furthermore, if the fetch size exceeds the available buffer memory of the server, an out of memory error is returned when attempting to execute a fetch. If you receive this error, decrease the value specified until fetches are successfully executed.
For many applications, throughput is the primary performance measure. In contrast, response time (how fast the first set of data is returned) can be of greater importance for interactive or Web-based applications. Since smaller fetch sizes are generally returned more quickly than larger ones, you can improve response time by decreasing the value of ArrayFetchSize.
BatchMechanism: When Batchmechanism is set to multiRowInsert, the driver executes a single insert
statement for all the rows contained in a parameter array. If the size of the insert
statement exceeds the available buffer memory of the server, the driver executes
multiple statements. This default behavior provides substantial performance gains for
batch inserts.
BinaryDescribeType: When BinaryDescribeType is set to longvarbinary, the driver not only maps Binary to Longvarbinary, but also
allocates more space to cache the long data. Because more space is allocated for the
long data, your application will incur a performance penalty.
CatalogMode: Apache Hive’s native catalog functions return incorrect information
in certain scenarios. To address this issue, by default, the driver uses a combination
of driver-discovered information and native functions to retrieve more accurate catalog
information than native functions alone. While using driver-discovered information
improves accuracy, it does so at an expense to performance. If accurate catalog
information is not required, you can improve performance by setting Catalog Mode
connection option to native.
EnableCookieAuthentication:To improve response time when using HTTP mode
(TransportMode=http), enable cookie based authentication
(EnableCookieAuthentication=true). When cookie based authentication
is enabled (the default), the driver uses cookies authenticate requests to the server
after the initial authentication that occurs at connection. This eliminates the overhead
associated with executing a standard re-authentication attempt for each request to the
server.
EncryptionMethod: Data encryption may adversely affect performance because of the additional overhead (mainly CPU usage) required to encrypt and decrypt data.
InsensitiveResultSetBufferSize: To improve performance when using scroll-insensitive result sets, the driver can cache the result set data in memory instead of writing it to disk. By default, the driver caches 2 MB of insensitive result set data in memory and writes any remaining result set data to disk. Performance can be improved by increasing the amount of memory used by the driver before writing data to disk or by forcing the driver to never write insensitive result set data to disk. The maximum cache size setting is 2 GB.
MaxPooledStatements: To improve performance, the driver's
own internal prepared statement pooling should be enabled when the driver does not run from
within an application server or from within another application that does not provide its own
prepared statement pooling. When the driver's internal prepared statement pooling is enabled,
the driver caches a certain number of prepared statements created by an application. For
example, if the MaxPooledStatements property is set to 20,
the driver caches the last 20 prepared statements created by the application. If the value set
for this property is greater than the number of prepared statements used by the application,
all prepared statements are cached.
Refer to Designing JDBC applications for performance optimization in the Progress DataDirect for JDBC Drivers Reference for more information about using prepared statement pooling to optimize performance.
StringDescribeType: To obtain data from String
columns with the getClob() method, the StringDescribeType connection property must be
set to longvarchar. (Otherwise, calling getClob()
results in an "unsupported data conversion" exception.) When StringDescribeType is set
to longvarchar, the driver not only maps String to
Longvarchar, but also allocates more space to cache the long data. Because more space is
allocated for the long data, your application will incur a performance penalty.
UseCurrentSchema: If your application needs to access tables and views
owned only by the current user, performance of your application can be improved by setting
this property to true. When this property is set to true, the driver returns only tables and views owned by the current
user when executing getTables() and getColumns() methods. Setting this property to true is equivalent to passing the user ID used on the connection as
the schemaPattern argument to the getTables() or getColumns() call.