Visit the Dev Docs
A better version of this page exists at https://developer.mixpanel.com/docs/mixpanel-bigquery-export-design.This guide describes how Mixpanel exports your data to a Google BigQuery dataset.
You must provide a Google group email address to use the BigQuery export by using Mixpanel's Data Warehouse Export API.
Mixpanel exports transformed data into BigQuery at a specified interval.
Mixpanel creates a dataset in its own BigQuery instance and gives "View" access to the account(s) provided at the time of creating the pipeline.
Mixpanel applies transformation rules to make the data compatible with data warehouses.
This section describes the schemas for BigQuery. Event data and user data are loaded into separate BigQuery tables. User profile data is loaded into one table, and events are loaded into either a single table for each event or one table for all events.
Using One Table for All Events
You can export all of your Mixpanel data into a single BigQuery table. Mixpanel recommends a single table schema because it makes querying the data simpler.
If you select the single table schema, Mixpanel creates a mp_master_event table.
The table has one column per unique property name across all events in the history of the dataset.
Suppose you have an event with a purchase_amount
property and one with a referrer
property, the table will have a purchase_amount column and a referrer column. Events that don’t have a property contain a NULL value in that column.
For examples of one table for all events, see One Table for All Events.
Using One Table for Each Event
Note
One table for each event is not available during the trial period.
Each Mixpanel event can be loaded into its own table.
If you select the table per event option, Mixpanel creates one table per unique event name. Each table is named for the event. For example, if you have an event called signup
, the table name is signup.
Each table for each event has one column per unique property name across all events in the history of the dataset.
For an example of one table for each event, see One Table for Each Event.
Here is an example dataset with different schema options in either one table for all events or one table for each event.
{
“event”: “Signup”,
“properties”: {
“plan”: “free”,
“browser”: “chrome”,
“distinct_id”: 1,
“time”: 123
}
}
{
“event”: “Purchase”,
“properties”: {
“product_id”: “123”,
“browser”: “chrome”,
“distinct_id”: 1,
“time”: 124
}
}
{
“event”: “Signup”,
“properties”: {
“plan”: “paid”,
“browser”: “firefox”,
“ab_test”: 1,
“distinct_id”: 2,
“time”: 125
}
“Signup”
123
1
“free”
“chrome”
NULL
NULL
“Purchase”
124
1
NULL
“chrome”
“123”
NULL
“Signup”
125
2
“paid”
“firefox”
NULL
1
123
1
“free”
“chrome”
125
2
“paid”
“firefox”
1
Table: purchase
124
1
“chrome”
“123”
Note
Nested and repeated fields are not supported during the trial period. The values are flattened currently.
BigQuery supports nested and repeated fields. Mixpanel exports array properties as repeated fields and complex objects as nested fields into BigQuery.
Comments
Please sign in to leave a comment.