Every data point sent to Mixpanel is stored as JSON in our data store, with the exception of Autotrack events.
This export API allows you to download your raw event data as it is received and stored within Mixpanel, complete with all event properties (including distinct_id) and the exact timestamp the event was fired. This returned raw JSON can then be used for a variety of tasks.
All data returned from the export API is real-time.
The URL endpoint for this API is data.mixpanel.com, which differs from that of our main API. Considerations for using this endpoint are listed in the notes further down this page.
Mixpanel API Endpoints
The first step to exporting your data from Mixpanel is choosing the API endpoint you need to use depending on what data you need:
Raw export API: Provides a full export of your raw Event data; i.e., instead of returning the number of users who performed an action on a given day like a Mixpanel report would, it returns all instances of that action and all associated properties with each action.
Engage API: Provides access to Mixpanel People data, including all users with their associated people properties.
Historical Data Export
The amount of historical raw data you can access depends on your plan type (free, startup, or enterprise). To determine how much data history you can access, visit the pricing page.
Shortcuts for Exporting Data
This Gist contains a sample Python script that downloads raw event data from the /export/ API endpoint and writes it to a file.
This Gist contains a sample Python script that downloads People data from the /engage/ API endpoint and writes it to a file.
- You can export the past 12 months of events by using a JQL query or Mixpanel's export APIs. The Mixpanel-API Python module makes this easy.
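As a rough illustration of what such a script does, here is a minimal Python sketch of a raw export request against data.mixpanel.com. The `/api/2.0/export/` path, the `from_date` and `to_date` parameters, and HTTP Basic authentication with the API secret are assumptions not stated on this page, and the function names are hypothetical; check them against the API reference before relying on them.

```python
import base64
import gzip
import shutil
import urllib.parse
import urllib.request

# Assumed endpoint path on the data.mixpanel.com host described above.
EXPORT_BASE = "https://data.mixpanel.com/api/2.0/export/"


def build_export_url(from_date, to_date, base=EXPORT_BASE):
    """Build the export URL for a date range given as YYYY-MM-DD strings."""
    query = urllib.parse.urlencode({"from_date": from_date, "to_date": to_date})
    return f"{base}?{query}"


def download_export(api_secret, from_date, to_date, out_path, timeout=300):
    """Download raw events to out_path, using a generous timeout."""
    request = urllib.request.Request(build_export_url(from_date, to_date))
    # Assumption: the API secret is sent as the Basic auth username.
    token = base64.b64encode(f"{api_secret}:".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept-Encoding", "gzip")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        stream = response
        # Decompress on the fly if the server gzipped the transfer.
        if response.headers.get("Content-Encoding") == "gzip":
            stream = gzip.GzipFile(fileobj=response)
        with open(out_path, "wb") as out_file:
            shutil.copyfileobj(stream, out_file)
```

The timeout is set well above 60 seconds because large exports can take a few minutes to generate.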
Set Up An ETL (Extract, Transform, and Load) Configuration
Once you’ve exported your data, you can manipulate the response before sending it on to another platform; that is, grab the specific part of the data you need from the response. For example, if you’re exporting People data to MailChimp to sync user unsubscribe status, you can look for a user’s email address and the value of the $unsubscribed key in a People export.
Once you grab the desired data from the dataset and format the data appropriately, you would then hit another API such as MailChimp or Salesforce to add this Mixpanel data into the other platform in the relevant format for that platform.
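The transform step for the MailChimp example might look like the following Python sketch. It assumes each exported People profile is a dictionary with a `$properties` object that holds `$email` and `$unsubscribed`; that layout and the function name `extract_unsubscribe_status` are illustrative assumptions, and the load step (calling the MailChimp API) is left out.

```python
def extract_unsubscribe_status(profiles):
    """Reduce People profiles to the fields needed for an unsubscribe sync.

    Assumes each profile looks like:
    {"$distinct_id": "...", "$properties": {"$email": "...", "$unsubscribed": ...}}
    """
    rows = []
    for profile in profiles:
        properties = profile.get("$properties", {})
        email = properties.get("$email")
        if not email:
            # Profiles without an email address cannot be matched downstream.
            continue
        rows.append({
            "email": email,
            # Pass the raw value through; None means the key was absent.
            "unsubscribed": properties.get("$unsubscribed"),
        })
    return rows
```

Each resulting row can then be reshaped into whatever format the destination platform expects.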
The best way to automate this process is to first run tests until a working ETL model is built between Mixpanel and the final data location. Once you’ve built a successful test model, you can automate the process to run at desired intervals (with a cron or scheduled job).
For example, your crontab file for the example above might contain an entry like:
# start the Mixpanel ETL job daily at 6am server time
0 6 * * * /usr/bin/mixpanel_to_mailchimp_etl.py
- Due to client-side queueing, offline iOS and Android data may take up to 5 days to enter the raw data store.
- For this API, returned timestamps are expressed in seconds since January 1, 1970 in your project's timezone, not UTC. Because most epoch converters assume epoch timestamps are in UTC, converting the raw exported timestamps with them produces incorrect offsets. You must add back the offset between project time and UTC before storing or processing the data. For example, if your project is set to Pacific time, you would add 7 hours during daylight saving time (60 sec * 60 min * 7 hours = 25,200 seconds), or 8 hours otherwise (28,800 seconds), to convert the timestamp into UTC.
- This endpoint uses gzip to compress the transfer; as a result, raw exports should not be processed until the file is received in its entirety. While this process is normally quick and results in a smaller file size, some large exports can take a few minutes to generate. Ensure the timeout set on the receiving client is large enough to account for this process (e.g. larger than 60 seconds).
- Data returned from this endpoint is JSONL (newline-delimited JSON). Most receiving client libraries will automatically assume they are getting a single JSON string back and attempt to decode it. The output of this API is not valid JSON in aggregate, but each row of the output is valid JSON on its own. Thus, raw exports, once received in full, should be parsed line by line instead of as an array of JSON objects.
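The last two notes can be sketched in Python as follows: parse a fully received export one line at a time, then shift project-time timestamps to UTC with the arithmetic described above. The function names are hypothetical, and the sample event shape (an `event` name plus a `properties` object containing `time` and `distinct_id`) is an assumption about the export format.

```python
import json


def parse_export(raw_text):
    """Parse a complete JSONL export: one JSON object per non-empty line."""
    events = []
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue
        events.append(json.loads(line))
    return events


def project_time_to_utc(timestamp, hours_behind_utc):
    """Shift a project-time epoch timestamp to UTC.

    hours_behind_utc is 7 for Pacific time during daylight saving time
    and 8 otherwise, as in the example above.
    """
    return timestamp + hours_behind_utc * 60 * 60


raw = (
    '{"event": "Signup", "properties": {"time": 1000, "distinct_id": "u1"}}\n'
    '{"event": "Login", "properties": {"time": 2000, "distinct_id": "u2"}}\n'
)
events = parse_export(raw)
utc_times = [project_time_to_utc(e["properties"]["time"], 7) for e in events]
```

A fixed hour offset keeps the sketch simple; a production job that spans a daylight saving transition would need to pick the offset per timestamp.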
Example Use Cases
- You receive a spike of 10K events, notice that only a few users contributed to it, and would like to dive deeper into the data.
- You are buying mobile ads and would like to dive deeper into the exact UDIDs to see which installs you can really attribute to those ads.
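For the first use case, a short Python sketch shows how parsed raw events could be grouped by user to see who drove a spike. It assumes events have already been parsed into dictionaries with an `event` name and a `properties` object containing `distinct_id`; the function name `top_contributors` is hypothetical.

```python
from collections import Counter


def top_contributors(events, event_name, n=5):
    """Return the n users who fired event_name most often, with counts."""
    counts = Counter(
        event["properties"].get("distinct_id")
        for event in events
        if event.get("event") == event_name
    )
    return counts.most_common(n)
```

Running this over the export for the day of the spike gives a ranked list of (distinct_id, count) pairs to investigate further.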