Introduction to the Google Analytics Data Model
- Last Updated: July 2, 2025
- 6 minute read
- DataDirect Connectors
- JDBC
- Documentation
The driver exposes the Google Analytics data model in the form of a
relational schema. Google Analytics stores two types of data: management data and
analytics data. Management data consists of objects related to the management of
Google Analytics services. The driver maps these objects directly as relational
tables. Analytics data consists of all the data that Google Analytics collects about
a website. This data is stored in a table named Data. Given the volume of information stored in the Data table, and the challenges in formulating useful
queries against such a large data set, the driver hides the Data table by default. The driver provides thirteen Custom tables that
contain compatible Google Analytics metrics and dimensions. In effect, custom tables
allow you to simplify and write more effective queries against Google Analytics.
Custom tables can be added to your Google Analytics relational schema, and further
customized, using the Configuration Manager. Nevertheless, by setting the ShowInternalTables
connection property to true, you can expose the
Data table, allowing you to query it
directly.
INCOMPATIBILITY and
METADATA. The driver uses these tables to
translate the Google Analytics data model into a relational schema. The metadata for
these tables, as well as the management and custom tables, is provided in
corresponding topics beginning with the ACCOUNT topic.The following sections provide information to help you write effective queries against Google Analytics.
Google Analytics data structure
Google Analytics is a service that generates detailed statistics about a website's traffic and traffic sources. But Google Analytics is not just a database. It is a multi-dimensional hypercube containing all kinds of measurements about traffic to a website. When you connect to Google Analytics using the driver, you can reach into this repository and flatten it into relational data that can be used with your JDBC application.
Imagine a very small store of data about your website. For each hit, the Google Analytics service logs the date, language of user, country of origin, new or returning user, and their time on the site (in seconds).
| 2019-01-01 | en | US | new | 5.3 |
| 2019-01-01 | nl | DK | new | 90.4 |
| 2019-01-01 | es | ES | new | 24 |
| 2019-01-01 | ja | JP | new | 4.2 |
| 2019-01-01 | es | MX | new | 345.3 |
| 2019-01-01 | ja | JP | returning | 655.9 |
| 2019-01-02 | en | US | new | 45.7 |
| 2019-01-02 | en | US | new | 345.9 |
| 2019-01-02 | es | ES | new | 57.7 |
| 2019-01-02 | en | US | new | 6.8 |
| 2019-01-03 | es | MX | new | 876.1 |
| 2019-01-03 | ja | JP | returning | 5.7 |
| 2019-01-03 | en | GB | new | 5.6 |
| 2019-01-03 | en | US | new | 617.9 |
| 2019-01-03 | en | US | returning | 56.1 |
| 2019-01-04 | es | MX | new | 45.1 |
| 2019-01-04 | jp | JP | new | 178 |
| 2019-01-04 | en | US | returning | 103.9 |
Google Analytics collected data for this website over four days. The data is broken down by date, language, country and user type. For each visit, the time spent on the site was recorded. The time on the site is a metric, and the other columns are dimensions.
In an actual scenario, Google Analytics aggregates a vast amount of information about your website. It measures hundreds of things, and categorizes them by hundreds of dimensions. The query interface that Google Analytics provides allows you to fetch these metrics and group them. Because of the massive amount of information they store, their interface limits you to fetching up to ten metrics at a time, grouped by no more than seven dimensions.
For example, suppose you want to know how much time new visitors spent on the site. Your dimension is user type and your metric is time. You would get back two rows:
| new | 2648 |
| returning | 821.6 |
How much data you get back depends on how you ask for it. If you ask for two dimensions, you get even more data, because you get one row per permutation. Requesting how much time users have spent on each day, broken down by country, returns more rows:
| 2019-01-01 | DK | 90.4 |
| 2019-01-01 | ES | 24 |
| 2019-01-01 | JP | 660.1 |
| 2019-01-01 | MX | 345.3 |
| 2019-01-01 | US | 5.3 |
| 2019-01-02 | ES | 57.7 |
| 2019-01-02 | US | 398.4 |
| 2019-01-03 | GB | 5.6 |
| 2019-01-03 | JP | 5.7 |
| 2019-01-03 | MX | 876.1 |
| 2019-01-03 | US | 674 |
| 2019-01-04 | JP | 178 |
| 2019-01-04 | MX | 45.1 |
| 2019-01-04 | US | 103.9 |
Custom table query
The Overview table is one of the custom tables exposed by
the driver. The Overview table exposes the Audience Overview
information made available via the Google Analytics dashboard. Audience Overview has
a graph showing Sessions, Users, Pageviews, Pages/Session, Average Session Duration,
Bounce Rate and Percent of New Sessions. In the lower right, there is a breakdown of
sessions by language.
Once connected to Google Analytics, you can query custom tables in one of two ways. First, you can formulate queries against custom tables by explicitly providing the Google Analytics View ID in each WHERE clause. For example:
SELECT * FROM Overview WHERE viewId = 'ga:12345678'
Alternatively, you can specify the DefaultView connection property in your connection string. This avoids having to include the View ID in each WHERE clause. The value of DefaultView should be the value of the "View Name" field in the Google Analytics dashboard (Dashboard>Admin>View Settings>View Name). With the DefaultView property specified, the WHERE clause filter is not needed. For example:
SELECT * FROM Overview
Each of these queries provides the same results:
VIEWID |
SEGMENTID |
STARTDATE |
ENDDATE |
_BROWSER |
_OPERATINGSYSTEM |
ga:12345678 |
NULL |
"2019-01-01" |
"2019-01-30" |
NULL |
NULL
|
The DefaultQueryOptions connection property can also be used to filter results.
DefaultQueryOptions specifies the values of Google Analytics
startDate, endDate, and
viewId query parameters for WHERE clause filtering during a
session. Providing values for these parameters via DefaultQueryOptions simplifies
queries and may produce more useful results.
Direct query
As discussed above, all the analytics data in Google Analytics resides in a
hidden table named Data. If you want to query the
Data table directly, you must expose the
Data table by the ShowInternalTables
connection property to true. With ShowInternalTables=true, you could execute the
following query.
SELECT _LANGUAGE,SESSIONS FROM Data
In this case, the query against the Data
table is functionally equivalent to the following custom table query.
SELECT _LANGUAGE,SESSIONS FROM Overview
However, it should be noted that hiding the Data table avoids queries such as SELECT TOP
10 * FROM Data, which is unlikely to yield useful results.
Adding tables
As discussed above, custom tables can be added to your Google Analytics relational schema, and further customized, using the Configuration Manager. Once you open the Configuration Manager, you can add and customize tables from the Configure Logical Schema dialog (Schema Settings>Add Tables>Configure Logical Schema). This dialog allows you define a new table, or customize an existing table, in terms of metrics and dimensions. You can select up to ten metrics and seven dimensions per table. Based on your selections, a table definition is created in the form of a JSON string which is incorporated into the connection string as a value for the AddTables property. For example:
AddTables={"MyTable":["sessions","_language","_country"]}
The table that results from this example would include the three selected
columns, plus the columns viewId, segmentId, startDate
and endDate. Now, as opposed to querying the
Data table as we did above, we accomplish the
functional equivalent by querying MyTable:
SELECT _LANGUAGE,SESSIONS FROM MyTable
SELECT _LANGUAGE,SESSIONS and
SELECT * reference one
dimension, and therefore, the data is not broken down by language. There is no harm
in adding extra dimensions to your definition.The SubtractTables connection property allows you to specify a comma-separated list of tables that you do not want to expose in the relational view of your data.
Defining the columns
The Metadata table can be used to define the columns in a
table. The Metadata table has a list of all metrics and dimensions.
Use only the metrics and dimensions that are marked with a "PUBLIC" status. By
default, the driver ignores metrics and dimensions with a "DEPRECATED" status. You
can expose deprecated metrics and dimensions by setting ShowDeprecatedObjects to
true.
Not all combinations of metrics and dimensions are valid. Refer to the table
called Incompatibility. If you see a row in that table that
contains both columns, it means they can't be used in the same query.