The driver exposes the Google Analytics data model in the form of a relational schema. Google Analytics stores two types of data: management data and analytics data. Management data consists of objects related to the management of Google Analytics services. The driver maps these objects directly as relational tables. Analytics data consists of all the data that Google Analytics collects about a website. This data is stored in a table named Data. Given the volume of information stored in the Data table, and the challenges in formulating useful queries against such a large data set, the driver hides the Data table by default. The driver provides thirteen Custom tables that contain compatible Google Analytics metrics and dimensions. In effect, custom tables allow you to simplify and write more effective queries against Google Analytics. Custom tables can be added to your Google Analytics relational schema, and further customized, using the Configuration Manager. Nevertheless, by setting the ShowInternalTables connection property to true, you can expose the Data table, allowing you to query it directly.

Note: The driver itself defines a number of functional tables, such as INCOMPATIBILITY and METADATA. The driver uses these tables to translate the Google Analytics data model into a relational schema. The metadata for these tables, as well as the management and custom tables, is provided in corresponding topics beginning with the ACCOUNT topic.

The following sections provide information to help you write effective queries against Google Analytics.

Google Analytics data structure

Google Analytics is a service that generates detailed statistics about a website's traffic and traffic sources. But Google Analytics is not just a database. It is a multi-dimensional hypercube containing all kinds of measurements about traffic to a website. When you connect to Google Analytics using the driver, you can reach into this repository and flatten it into relational data that can be used with your JDBC application.

Imagine a very small store of data about your website. For each hit, the Google Analytics service logs the date, language of user, country of origin, new or returning user, and their time on the site (in seconds).

2019-01-01 en US new 5.3
2019-01-01 nl DK new 90.4
2019-01-01 es ES new 24
2019-01-01 ja JP new 4.2
2019-01-01 es MX new 345.3
2019-01-01 ja JP returning 655.9
2019-01-02 en US new 45.7
2019-01-02 en US new 345.9
2019-01-02 es ES new 57.7
2019-01-02 en US new 6.8
2019-01-03 es MX new 876.1
2019-01-03 ja JP returning 5.7
2019-01-03 en GB new 5.6
2019-01-03 en US new 617.9
2019-01-03 en US returning 56.1
2019-01-04 es MX new 45.1
2019-01-04 jp JP new 178
2019-01-04 en US returning 103.9

Google Analytics collected data for this website over four days. The data is broken down by date, language, country and user type. For each visit, the time spent on the site was recorded. The time on the site is a metric, and the other columns are dimensions.

In an actual scenario, Google Analytics aggregates a vast amount of information about your website. It measures hundreds of things, and categorizes them by hundreds of dimensions. The query interface that Google Analytics provides allows you to fetch these metrics and group them. Because of the massive amount of information they store, their interface limits you to fetching up to ten metrics at a time, grouped by no more than seven dimensions.

For example, suppose you want to know how much time new visitors spent on the site. Your dimension is user type and your metric is time. You would get back two rows:

new 2648
returning 821.6

How much data you get back depends on how you ask for it. If you ask for two dimensions, you get even more data, because you get one row per permutation. Requesting how much time users have spent on each day, broken down by country, returns more rows:

2019-01-01 DK 90.4
2019-01-01 ES 24
2019-01-01 JP 660.1
2019-01-01 MX 345.3
2019-01-01 US 5.3
2019-01-02 ES 57.7
2019-01-02 US 398.4
2019-01-03 GB 5.6
2019-01-03 JP 5.7
2019-01-03 MX 876.1
2019-01-03 US 674
2019-01-04 JP 178
2019-01-04 MX 45.1
2019-01-04 US 103.9

Custom table query

The Overview table is one of the custom tables exposed by the driver. The Overview table exposes the Audience Overview information made available via the Google Analytics dashboard. Audience Overview has a graph showing Sessions, Users, Pageviews, Pages/Session, Average Session Duration, Bounce Rate and Percent of New Sessions. In the lower right, there is a breakdown of sessions by language.

Once connected to Google Analytics, you can query custom tables in one of two ways. First, you can formulate queries against custom tables by explicitly providing the Google Analytics View ID in each WHERE clause. For example:


SELECT * FROM Overview WHERE viewId = 'ga:12345678'
Note: A View ID can be obtained from your Google Analytics dashboard (Dashboard>Admin>View Settings>View ID).

Alternatively, you can specify the DefaultView connection property in your connection string. This avoids having to include the View ID in each WHERE clause. The value of DefaultView should be the value of the "View Name" field in the Google Analytics dashboard (Dashboard>Admin>View Settings>View Name). With the DefaultView property specified, the WHERE clause filter is not needed. For example:


SELECT * FROM Overview

Each of these queries provides the same results:

VIEWID SEGMENTID STARTDATE ENDDATE _BROWSER _OPERATINGSYSTEM
ga:12345678 NULL "2019-01-01" "2019-01-30" NULL NULL
Note: To make the difference between metrics and dimensions clear, dimension names have an underscore prefix.
Important: Only one row was returned, and all of the dimensions came out as NULL because the driver enforces a rule that says if you ask for all dimensions, like we did with the SELECT *, then no dimensions are returned.

The DefaultQueryOptions connection property can also be used to filter results. DefaultQueryOptions specifies the values of Google Analytics startDate, endDate, and viewId query parameters for WHERE clause filtering during a session. Providing values for these parameters via DefaultQueryOptions simplifies queries and may produce more useful results.

Direct query

As discussed above, all the analytics data in Google Analytics resides in a hidden table named Data. If you want to query the Data table directly, you must expose the Data table by the ShowInternalTables connection property to true. With ShowInternalTables=true, you could execute the following query.


SELECT _LANGUAGE,SESSIONS FROM Data

In this case, the query against the Data table is functionally equivalent to the following custom table query.


SELECT _LANGUAGE,SESSIONS FROM Overview

However, it should be noted that hiding the Data table avoids queries such as SELECT TOP 10 * FROM Data, which is unlikely to yield useful results.

Adding tables

As discussed above, custom tables can be added to your Google Analytics relational schema, and further customized, using the Configuration Manager. Once you open the Configuration Manager, you can add and customize tables from the Configure Logical Schema dialog (Schema Settings>Add Tables>Configure Logical Schema). This dialog allows you define a new table, or customize an existing table, in terms of metrics and dimensions. You can select up to ten metrics and seven dimensions per table. Based on your selections, a table definition is created in the form of a JSON string which is incorporated into the connection string as a value for the AddTables property. For example:


AddTables={"MyTable":["sessions","_language","_country"]}

The table that results from this example would include the three selected columns, plus the columns viewId, segmentId, startDate and endDate. Now, as opposed to querying the Data table as we did above, we accomplish the functional equivalent by querying MyTable:


SELECT _LANGUAGE,SESSIONS FROM MyTable
Note: This table could be defined based on sessions and language alone, but the driver enforces a rule that says if you ask for all dimensions, then no dimensions are returned. This means that both SELECT _LANGUAGE,SESSIONS and SELECT * reference one dimension, and therefore, the data is not broken down by language. There is no harm in adding extra dimensions to your definition.
Note:

The SubtractTables connection property allows you to specify a comma-separated list of tables that you do not want to expose in the relational view of your data.

Defining the columns

The Metadata table can be used to define the columns in a table. The Metadata table has a list of all metrics and dimensions. Use only the metrics and dimensions that are marked with a "PUBLIC" status. By default, the driver ignores metrics and dimensions with a "DEPRECATED" status. You can expose deprecated metrics and dimensions by setting ShowDeprecatedObjects to true.

Not all combinations of metrics and dimensions are valid. Refer to the table called Incompatibility. If you see a row in that table that contains both columns, it means they can't be used in the same query.