Visit the Dev DocsA better version of this page exists at https://developer.mixpanel.com/docs/data-warehouse-export.
Mixpanel enables you to export your Mixpanel data to four data warehouse solutions:
This documentation targets users with intermediate or advanced knowledge of databases and specific knowledge of Amazon Web Services, Google Cloud Platform, or Snowflake technology.
Data Warehouse Export is currently available as a 30 day trial. You can create one data pipe during this period. Data pipelines will close when the trial period ends.
You must configure your data warehouse to accept the data before you can export data from Mixpanel to a data warehouse.
For additional information on configuring the Mixpanel export for each type of data warehouse, see:
After configuring the data warehouse, you can use Mixpanel's data warehouse export API to export the data by creating a data pipeline.
After configuring the warehouse and creating a pipeline with the API, you can begin to query Mixpanel data from the data warehouse. This opens up the use of SQL use from within Google Bigquery, Snowflake, and Amazon Redshift Spectrum.
Mixpanel can export event and user data.
The event data that is exported using the data warehouse export API is the same data exported from the Mixpanel data export API.
The user data that is exported using the data warehouse export API is the same data exported from the Mixpanel engage API.
Mixpanel supports hourly and daily exports.
When an export is started, a backfill can be scheduled so that historical data is also exported to the data warehouse.
Historical data will be exported one day at a time over the course of a few days when a pipeline is created. Use the from_date parameter to specify when the date you want to use to export historical data.
The completion time for a backfill depends on the number of days and the amount of data in the project.
Scheduling is daily as part of the trial.
User data is exported to a single table named
mp_people_data (user data is accessible as a view of
mp_people_data in BigQuery).
Since user profiles are mutable, the data in the table is replaced every time an export happens based on schedule (daily or hourly).
Exports from projects with ID merge enabled will need to use the identity mapping table to replicate the user counts seen in UI reporting. When ID merge is enabled, Mixpanel assigns multiple identifiers to an individual. Mixpanel resolves these into one identifier, and uses that for reporting unique user counts. Read more about how Mixpanel resolves IDs here.
Pipelines export event data as they appear when Mixpanel ingests them. This means exported event data has the original user identifier, not the resolved identifier. Use the identity mappings table to accurately count unique users. This will allow you to recreate the identity cluster that Mixpanel creates.
Mixpanel automatically exports the ID mapping table when you create a people export pipeline from a project with ID merge enabled.
When using the ID mappings table, you should use the resolved
distinct_id in place of the non-resolved
distinct_id whenever present. If there is no resolved
distinct_id, you can then use the
distinct_id from the existing people or events table.
Below is an example SQL query that references the ID mapping table to count unique users in San Francisco viewing Mixpanel's Insights report:
SELECT CASE WHEN m.resolved_distinct_id IS NOT NULL THEN m.resolved_distinct_id WHEN m.resolved_distinct_id IS NULL THEN e.distinct_id END as resolved_distinct_id, tab_1, COUNT(tab_1) AS count FROM mixpanel_nessie_day_partitioned_4.mp_master_event e FULL OUTER JOIN mixpanel_nessie_day_partitioned_4.mp_identity_mappings_data_tOMEp1HMp2 m ON e.distinct_id = m.distinct_id WHERE tab_1="insights" AND mp_city="San Francisco" AND CAST(time AS DATETIME) >= "2020-04-01" GROUP BY resolved_distinct_id, tab_1 LIMIT 100
Mixpanel has the following policy for data latency:
Mixpanel adds 24 hours of end to end latency between when the data is exported from Mixpanel until the data reaches the data warehouse.
Mixpanel adds an additional 24 hours for data that reaches the pipeline late. Mixpanel defines late data as any data point or user profile update that reaches Mixpanel servers later than two hours after the end of export window.
Event data stored in Mixpanel’s datastore and event data in the data warehouse can fall out of sync.
The discrepancy can be be attributed to several different causes:
- Late data can arrive multiple days later due to a mobile client being offline.
- The import API can add data to previous days.
- Delete requests related to GDPR can cause deletion of events and event properties.
Mixpanel is able to detect any changes in your data and replace the old data with the latest version. Data sync helps keep the data fresh, minimizes missing data points, and most importantly keeps your data warehouse GDPR compliant.
Sync is not available as part of the trial.
Event data needs to be cleaned when some characters are not legal for table or column names, or when collisions occur in the dataset. This section provides the rules on how Mixpanel cleans data.
Mixpanel applies these rules for table and column names:
- Special characters and whitespaces are replaced with
- Letters are converted to lowercase.
- Maximum name length is 128 characters. (The name is truncated after 128 characters.)
- Properties that start with a
$(dollar sign) will have a prefix of
- Properties that conflict with reserved keywords are prefixed with
There are several naming transformations that occur based on character conflicts.
Capitalized letters are converted to lower case. Any duplicate strings will have a numeric value appended to them to differentiate.
For example if values “XY” and “Xy” are sent in:
- Both are transformed to “xy”.
- If “Xy” is sent in after "XY", it becomes “xy_1”.
- Any subsequent “xy” values inherit incremental numeric values (i.e. xy_2, xy_3, etc.).
Mixpanel transforms values to resolve type conflicts.
If a property value is passed with a particular data type and is passed subsequently with a different data type, then the new data type appends to the property name.
For example, if “X” appears with type INT first, then subsequently appears as type STRING (or VARCHAR), then the property name will be transformed to "x_string" with a data type of string.
New properties that were not present in previous imports will append to the old rows with a
Mixpanel offers a 30-day trial version of the Data Warehouse Export feature and the Data Pipelines API. The trial allows for one data export pipeline per project to be created. The pipeline will close and no more data will be exported from Mixpanel after the trial period has expired.
Deleting the one pipeline for a project will not allow you to make a second pipeline for that same project.
A trial pipeline has several limitations when compared to the full feature. In the trial:
- Export scheduling is daily only.
- Data sync is unavailable.
- You can only create one pipeline per project.
- Data is only backfilled for the one day, being the previous day from when the pipeline is created.
- Pipelines will, by default, include both event and user data.
- The pipeline cannot filter by event name.
- The “Create Pipeline” parameters will default to the values highlighted to list in the parameters table.
- BigQuery pipelines will flatten nested or repeated fields.