Introduction to the Google Analytics 4 data model
- Last Updated: June 12, 2023
- 4 minute read
- DataDirect Connectors
- ODBC
- Documentation
The driver exposes the Google Analytics data model in the form of a relational schema. Google Analytics stores two types of data: admin data and analytics data. Admin data consists of objects related to the management of Google Analytics services. The driver maps these objects directly as relational tables. Analytics data consists of all the data that Google Analytics collects about a website.
Given the large volume of analytics data available and the limitations around which aspects of analytic data that can be returned in a single result set, it can be challenging to formulate valid SQL queries. To address this issue, the driver requires that you create custom tables containing the specific metrics and dimensions that you want to run queries against. The Configuration Manager includes a table designer tool to assist with custom table creation. See Adding custom tables for details.
INCOMPATIBILITY and
METADATA. The driver uses these tables to
translate the Google Analytics data model into a relational schema. The metadata for
these tables, as well as the admin and custom tables, is provided in corresponding
topics beginning with the ACCOUNTS topic.The following sections provide information to help you write effective queries against Google Analytics.
Google Analytics data structure
Google Analytics is a service that generates detailed statistics about a website's traffic and traffic sources. But Google Analytics is not just a database. It is a multi-dimensional hypercube containing all kinds of measurements about traffic to a website. When you connect to Google Analytics using the driver, you can reach into this repository and flatten it into relational data that can be used with your JDBC application.
Imagine a very small store of data about your website. For each hit, the Google Analytics service logs the date, language of user, country of origin, new or returning user, and their time on the site (in seconds).
| 2019-01-01 | en | US | new | 5.3 |
| 2019-01-01 | nl | DK | new | 90.4 |
| 2019-01-01 | es | ES | new | 24 |
| 2019-01-01 | ja | JP | new | 4.2 |
| 2019-01-01 | es | MX | new | 345.3 |
| 2019-01-01 | ja | JP | returning | 655.9 |
| 2019-01-02 | en | US | new | 45.7 |
| 2019-01-02 | en | US | new | 345.9 |
| 2019-01-02 | es | ES | new | 57.7 |
| 2019-01-02 | en | US | new | 6.8 |
| 2019-01-03 | es | MX | new | 876.1 |
| 2019-01-03 | ja | JP | returning | 5.7 |
| 2019-01-03 | en | GB | new | 5.6 |
| 2019-01-03 | en | US | new | 617.9 |
| 2019-01-03 | en | US | returning | 56.1 |
| 2019-01-04 | es | MX | new | 45.1 |
| 2019-01-04 | jp | JP | new | 178 |
| 2019-01-04 | en | US | returning | 103.9 |
Google Analytics collected data for this website over four days. The data is broken down by date, language, country and user type. For each visit, the time spent on the site was recorded. The time on the site is a metric, and the other columns are dimensions.
In an actual scenario, Google Analytics aggregates a vast amount of information about your website. It measures hundreds of things, and categorizes them by hundreds of dimensions. The query interface that Google Analytics provides allows you to fetch these metrics and group them. Because of the massive amount of information they store, their interface limits you to fetching up to ten metrics at a time, grouped by no more than nine dimensions.
For example, suppose you want to know how much time new visitors spent on the site. Your dimension is user type and your metric is time. You would get back two rows:
| new | 2648 |
| returning | 821.6 |
How much data you get back depends on how you ask for it. If you ask for two dimensions, you get even more data, because you get one row per permutation. Requesting how much time users have spent on each day, broken down by country, returns more rows:
| 2019-01-01 | DK | 90.4 |
| 2019-01-01 | ES | 24 |
| 2019-01-01 | JP | 660.1 |
| 2019-01-01 | MX | 345.3 |
| 2019-01-01 | US | 5.3 |
| 2019-01-02 | ES | 57.7 |
| 2019-01-02 | US | 398.4 |
| 2019-01-03 | GB | 5.6 |
| 2019-01-03 | JP | 5.7 |
| 2019-01-03 | MX | 876.1 |
| 2019-01-03 | US | 674 |
| 2019-01-04 | JP | 178 |
| 2019-01-04 | MX | 45.1 |
| 2019-01-04 | US | 103.9 |
Adding custom tables
Custom tables can be added to your Google Analytics relational schema, and further customized, using the Configuration Manager. Once you open the Configuration Manager, you can add and customize tables from the Configure Logical Schema button on the Schema Settings tab. This tool allows you define a new table, or customize an existing table, in terms of metrics and dimensions. You can select up to ten metrics and nine dimensions per table. Based on your selections, a table definition is created in the form of a JSON string which is incorporated into the connection string as a value for the Add Tables (AddTables) option. For example:
AddTables='{"MyTable":["sessions","_language","_country"]}'
"_browser". The driver uses the
underscore character to distinguish dimensions from metrics.The table that results from this example would include the three selected
columns, plus the columns currencyCode,
keepEmptyRows, propertyId,
rowId, and timeZone. The
following is an example query for your custom table:
SELECT _LANGUAGE,SESSIONS FROM MyTable
Paging support
The driver employs row offset paging when returning results using the Reports API. The page size returned for results is set to 10,000 rows.
Realtime reports
Realtime reports return events and usage data for periods ranging from the present to 30 minutes ago (up to 60 minutes for Google Analytics 360 properties). Reports can return up to 14 dimensions and 4 metrics. For more information on the supported dimensions and metrics, refer to the documentation for the Google Analytics Data API.
startMinutesAgo: The inclusive start minutes for the query with Realtime Report API. The default is29(twenty-nine minutes prior to the current time).endMinutesAgo: The inclusive end minutes for the query with Realtime report API. The default is0(current time).
SELECT _appVersion,_city,activeUsers,eventCount FROM REALTIMEREPORTS where propertyID=123344545
Note that paging is not supported when querying realtime reports, and a maximum of 100,000 rows can be returned.
Related Links
- ACCOUNTS
- ACCOUNTS_DATASHARINGSETTINGS
- ACCOUNTS_SEARCHCHANGEHISTORYEVENTS
- ACCOUNTS_SEARCHCHANGEHISTORYEVENTS_CHANGES
- ACCOUNTS_USERLINKS
- ACCOUNTS_USERLINKS_AUDIT
- ACCOUNTS_USERLINKS_AUDIT_DIRECTROLES
- ACCOUNTS_USERLINKS_AUDIT_EFFECTIVEROLES
- ACCOUNTS_USERLINKS_DIRECTROLES
- ACCOUNTS_USERLINKS_NAMES
- ACCOUNTSLIST
- ACCOUNTSUMMARIES
- ACCOUNTSUMMARIES_PROPERTYSUMMARIES
- INCOMPATIBILITY
- METADATA
- METADATA_DIMENSIONS
- METADATA_DIMENSIONS_DEPRECATEDAPINAMES
- METADATA_METRICS
- METADATA_METRICS_BLOCKEDREASONS
- METADATA_METRICS_DEPRECATEDAPINAMES
- PROPERTIES
- PROPERTIES_ACCESSREPORT
- PROPERTIES_ACCESSREPORT_DIMENSIONHEADERS
- PROPERTIES_ACCESSREPORT_METRICHEADERS
- PROPERTIES_ACCESSREPORT_ROWS
- PROPERTIES_ACKNOWLEDGEUSERDATACOLLECTION
- PROPERTIES_ATTRIBUTIONSETTINGS
- PROPERTIES_AUDIENCES
- PROPERTIES_AUDIENCES_FILTERCLAUSES
- PROPERTIES_AUDIENCES_SEQUENCESTEPS
- PROPERTIES_CONVERSIONEVENTS
- PROPERTIES_CUSTOMDIMENSIONS
- PROPERTIES_CUSTOMMETRICS
- PROPERTIES_CUSTOMMETRICS_RESTRICTEDMETRICTYPE
- PROPERTIES_DATARETENTIONSETTINGS
- PROPERTIES_DATASTREAMS
- PROPERTIES_DATASTREAMS_GLOBALSITETAG
- PROPERTIES_DATASTREAMS_MEASUREMENTPROTOCOLSECRETS
- PROPERTIES_DISPLAYVIDEO360ADVERTISERLINKS
- PROPERTIES_DISPLAYVIDEO360ADVERTISERLINKPROPOSALS
- PROPERTIES_FIREBASELINKS
- PROPERTIES_GOOGLEADSLINKS
- PROPERTIES_GOOGLESIGNALSSETTINGS
- PROPERTIES_USERLINKS
- PROPERTIES_USERLINKS_AUDIT
- PROPERTIES_USERLINKS_AUDIT_DIRECTROLES
- PROPERTIES_USERLINKS_AUDIT_EFFECTIVEROLES
- PROPERTIES_USERLINKS_DIRECTROLES
- PROPERTIESLIST
- REALTIMEREPORTS