Visit the Dev Docs
A better version of this page exists at https://developer.mixpanel.com/docs/data-warehouse-export-api. The Data Pipelines API contains a list of endpoints supported by Mixpanel that help you create and manage your data pipelines.
A pipeline is an end to end unit that is created to export Mixpanel data and move it into a data warehouse.
Trial Version
The Data Warehouse Export API offers a one-time trial. You can call and schedule a trial export by passing trial=true when creating a pipeline. The trial export automatically stops after 30 calendar days.
Mixpanel currently supports a data warehouse export pipeline and a raw data pipeline. When you create a pipeline, the type
parameter determines whether a data warehouse export pipeline or a raw data pipeline is created.
Export to Data Warehouse
The data warehouse export pipeline is a fully managed pipeline that includes transformations and scheduling. Visit the data warehouse export documentation for more information.
Raw Export Pipeline
The raw export pipeline is a scheduled export that moves your unaltered Mixpanel data to a blob storage destination. Visit the raw export pipeline documentation for more information.
Before exporting data from Mixpanel, you must configure your data warehouse to accept the data. For additional information on configuring the Mixpanel export for each type of data warehouse, see the documentation for that warehouse.
To ensure the security of your data, the Mixpanel API requires a basic system of authentication.
Required Parameter
api_secret - This can be found by clicking the settings gear in the upper right-hand corner and selecting Project Settings.
Authorization Steps
The Data Export API accepts basic access authentication over HTTPS as an authorization method. To make an authorized request, put your project's API Secret in the "username" field of the basic access authentication header. Make sure you use HTTPS and not HTTP - our API rejects requests made over HTTP, since this sends your API Secret over the internet in plain text.
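As a sketch of what that header contains: HTTP Basic auth base64-encodes "username:password", so curl's -u API_SECRET: flag produces the header shown below. This is illustrative only; API_SECRET here is a placeholder, not a real secret.

```shell
# Build the Basic auth header that `curl -u API_SECRET:` sends.
# "API_SECRET" is a placeholder for your project's API Secret; the
# password field after the colon is intentionally empty.
API_SECRET="API_SECRET"
AUTH=$(printf '%s:' "$API_SECRET" | base64)
echo "Authorization: Basic $AUTH"
```

With curl the -u flag handles this for you; you only need to construct the header manually when using an HTTP client without built-in Basic auth support.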
Request Type: POST
This request creates the export pipeline. The type parameter defines the kind of pipeline that is initiated. The following data warehouse types are supported:
bigquery
Mixpanel exports events and/or user data into Google BigQuery.
aws
This option creates the S3 data export and Glue schema pipeline. Mixpanel exports events and/or user data as JSON packets. Mixpanel also creates a schema for the exported data in AWS Glue. Customers can use AWS Glue to query the exported data using AWS Athena or AWS Redshift Spectrum.
snowflake
This option creates the Snowflake export pipeline. Mixpanel exports events and/or user data into Snowflake.
URI: https://data.mixpanel.com/api/2.0/nessie/pipeline/create
Headers:
Content-Type: application/x-www-form-urlencoded
Parameters:
type
string, required
Data Warehouse Export: type values include bigquery, snowflake, azure-blob, and aws.
Raw Export Pipeline: type values include s3-raw, gcs-raw, and azure-raw, which initialize s3-raw, gcs-raw, or azure-raw pipelines accordingly.
trial
boolean, optional
Default: false. A trial pipeline will be created if the value is true.
The trial exports all of your events and user data for thirty calendar days, starting from one day before the API call was made. A trial pipeline has default values for the following parameters:
data_source: events and people
sync: false
from_date: <defaults to previous day>
to_date: <no value>
frequency: daily
events: <no value>
schema_type
string, optional
Default: monoschema. Allowed options are monoschema and multischema. monoschema loads all events into a single table. multischema loads every event into its own dedicated table. All user data is exported as monoschema.
data_source
string, optional
Default: events. data_source can be either events or people. events exports Mixpanel event data. people exports Mixpanel user data.
sync
boolean, optional
Default: false. A value of true updates exported data with any changes that occur in your Mixpanel dataset. These changes include deletions, late data, and imports that fall into your export window.
from_date
string, required
The starting date of the export window, formatted as YYYY-MM-DD.
to_date
string, optional
The ending date of the export window, formatted as YYYY-MM-DD. The export will continue indefinitely if to_date is empty.
frequency
string, optional
Default: daily. frequency can be either hourly or daily. hourly exports the data every hour. daily exports the data at midnight (based on the project's timezone). frequency should only be passed if your export window is indefinite.
events
string, optional
A whitelist of the events you intend to export. You may pass this parameter multiple times to whitelist multiple events. All events in the project will be exported if no events are specified.
where
string, optional
A selector expression used to filter events data, such as by event properties. Learn more about how to construct event selector expressions here. This parameter is only valid when data_source is events.
data_format
string, optional
Default: json. The file format of the exported data. data_format can be either json or parquet.
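Two of the parameters above are easy to get subtly wrong: from_date must be formatted as YYYY-MM-DD, and repeated events values containing spaces must be URL-encoded in the form body. A minimal sketch of both, assuming GNU date; no request is sent:

```shell
# 1) Compute a from_date in the YYYY-MM-DD form the API expects.
#    Assumes GNU date (Linux); on BSD/macOS use `date -v-1d +%Y-%m-%d`.
FROM_DATE=$(date -d "yesterday" +%Y-%m-%d)
case "$FROM_DATE" in
  [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) echo "from_date ok: $FROM_DATE" ;;
  *) echo "unexpected format: $FROM_DATE" ;;
esac

# 2) The form body produced by whitelisting two events whose names
#    contain spaces. With curl, --data-urlencode "events=Page View"
#    performs this encoding for you; plain -d sends the value as-is.
printf 'events=%s&events=%s\n' "Page%20View" "Item%20Purchase"
```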
Return: The create API returns the names of the pipelines created. Use a pipeline's name to check its status or to cancel it.
For BigQuery pipelines, the request returns the BigQuery dataset name and URL. Use this URL to access the BigQuery dataset.
Mixpanel creates the dataset within its own Google Cloud Platform project. The service shares a read-only view of the dataset with the user/group provided to the API endpoint.
{
"pipeline_names":[
"trial-events-daily-bigquery-monoschema",
"trial-people-daily-bigquery-monoschema"
],
"bigquery_dataset_name":"https://bigquery.cloud.google.com/dataset/mixpanel-prod-1:sample_dataset_name"
}
Mixpanel creates a dataset in its own BigQuery instance and gives "View" access to the account(s) provided at the time of creating the pipeline.
The following parameters are specific to BigQuery exports.
bq_region
string, required
Default: US.
The following regions are supported for BigQuery:
US
US_EAST_1
US_WEST_2
US_EAST_4
NORTH_AMERICA_NORTHEAST_1
SOUTH_AMERICA_EAST_1
EU
EUROPE_NORTH_1
EUROPE_WEST_2
EUROPE_WEST_3
EUROPE_WEST_6
ASIA_SOUTH_1
ASIA_EAST_1
ASIA_EAST_2
ASIA_NORTHEAST_1
ASIA_NORTHEAST_2
ASIA_NORTHEAST_3
ASIA_SOUTHEAST_1
ASIA_SOUTHEAST_2
AUSTRALIA_SOUTHEAST_1
bq_share_with_group
string, required
Group account email address to share the dataset with.
Example Request
#Replace API_SECRET with your project's API secret
#Whitelist the "Page View" and "Item Purchase" events
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/create \
-u API_SECRET: \
-d type="bigquery" \
-d bq_region="US_EAST_4" \
-d trial=true \
-d bq_share_with_group="bq-access-alias@somecompany.com" \
-d events="Page View" \
-d events="Item Purchase"
Example Response
Use the URL returned as bigquery_dataset_name to access the BigQuery dataset.
{
"pipeline_names":[
"trial-events-daily-bigquery-monoschema",
"trial-people-daily-bigquery-monoschema"
],
"bigquery_dataset_name":"https://bigquery.cloud.google.com/dataset/mixpanel-prod-1:sample_dataset_name"
}
snowflake_share_with
string, required
Name of the account with which the dataset should be shared.
region
string, required
The valid region for the Snowflake instance:
us-west-aws
us-east-aws
Example Request
#Replace API_SECRET with your project's API secret
#Whitelist the "Page View" and "Item Purchase" events
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/create \
-u API_SECRET: \
-d type="snowflake" \
-d region="us-west-aws" \
-d trial=true \
-d snowflake_share_with="mysnowflakeaccountname" \
-d events="Page View" \
-d events="Item Purchase"
s3_bucket
string, required
The S3 bucket to which the data should be exported.
s3_region
string, required
The valid S3 region for the bucket.
The following regions are supported for AWS S3:
us-east-2
us-east-1
us-west-1
us-west-2
ap-south-1
ap-northeast-3
ap-northeast-2
ap-southeast-1
ap-southeast-2
ap-northeast-1
ca-central-1
cn-north-1
cn-northwest-1
eu-central-1
eu-west-1
eu-west-2
eu-west-3
eu-north-1
sa-east-1
s3_role
string, required
There is no default value. The AWS role the writer should assume when writing to S3.
s3_prefix
string, optional
There is no default value. The path prefix for the export.
s3_encryption
string, optional
Default: none. Options are none, aes, and kms. The at-rest encryption used by the S3 bucket.
s3_kms_key_id
string, optional
There is no default value. If s3_encryption is set to kms, this can specify the custom key ID you wish to use.
use_glue
boolean, optional
Default: false. Use the Glue schema export.
glue_database
string, conditionally required
The Glue database to which the schema should be exported. Required if use_glue is true.
glue_role
string, conditionally required
There is no default value. The role that needs to be assumed for updating Glue. Required if use_glue is true.
glue_table_prefix
string, optional
There is no default value. Prefix to add to table names when creating them.
Example Request
#Replace API_SECRET with your project's API secret
#Whitelist the "Page View" and "Item Purchase" events
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/create \
-u API_SECRET: \
-d type="aws" \
-d trial=true \
-d s3_bucket="example-s3-bucket" \
-d s3_region="us-east-1" \
-d s3_prefix="example_custom_prefix" \
-d s3_role="arn:aws:iam::<account-id>:role/example-s3-role" \
-d use_glue=true \
-d glue_database="example-glue-db" \
-d glue_role="arn:aws:iam::<account-id>:role/example-glue-role" \
-d glue_table_prefix="example_table_prefix" \
-d events="Page View" \
-d events="Item Purchase"
Additional Azure Parameters
The following parameters are specific to Azure Blob Storage, Azure Data Lake, and Azure Raw exports.
storage_account
string, required
The Blob Storage account where the data will be exported.
container_name
string, required
The Blob container within the account where the data will be exported.
prefix
string, optional
A custom prefix for all the data being exported to the container.
client_id
string, required
clientId from the Service Principal credentials.
client_secret
string, required
clientSecret from the Service Principal credentials.
tenant_id
string, required
tenantId from the Service Principal credentials. This is specific to the Active Directory instance where the Service Principal resides.
Example Request
#Replace API_SECRET with your project's API secret
#Whitelist the "Page View" and "Item Purchase" events
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/create \
-u API_SECRET: \
-d type="azure-blob" \
-d trial="true" \
-d data_format="parquet" \
-d storage_account="mystorageaccount" \
-d container_name="mixpanel-export" \
-d prefix="custom_prefix/for/data" \
-d schema_type="multischema" \
-d client_id="REDACTED" \
-d client_secret="REDACTED" \
-d tenant_id="REDACTED" \
-d events="Page View" \
-d events="Item Purchase"
Additional GCS Raw Scheduled Export Parameters
The following parameters are specific to raw scheduled exports to GCS blob storage.
gcs_bucket
string, required
The GCS bucket to export the Mixpanel data to.
gcs_prefix
string, required
The GCS path prefix of the bucket.
gcs_region
string, required
The GCS region for the bucket.
The following regions are supported for GCS:
northamerica-northeast1
us-central1
us-east1
us-east4
us-west1
us-west2
southamerica-east1
europe-north1
europe-west1
europe-west2
europe-west3
europe-west4
europe-west6
asia-east1
asia-east2
asia-northeast1
asia-northeast2
asia-northeast3
asia-south1
asia-southeast1
australia-southeast1
Request Type: POST
For a given pipeline name, this request cancels the pipeline and stops any future jobs from being scheduled for it.
URI: https://data.mixpanel.com/api/2.0/nessie/pipeline/cancel
Headers:
Content-Type: application/x-www-form-urlencoded
Parameters:
name
stringrequired
The name that uniquely identifies the pipeline.
Example Request:
#Replace API_SECRET with your project's API secret
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/cancel \
-u API_SECRET: \
-d name="sample_job_name"
Return: A 200 OK response indicates a successful cancellation. Any other response indicates that the cancellation failed.
Request Type: POST
Given the name of a pipeline, this endpoint returns the pipeline's status: a summary and the status of all recently run export jobs for the pipeline.
URI: https://data.mixpanel.com/api/2.0/nessie/pipeline/status
Headers:
Content-Type: application/x-www-form-urlencoded
Parameters:
name
stringrequired
The name that uniquely identifies the pipeline.
summary
string, optional
Default: false. If true, only task counts by status are listed, with no details.
status
array of strings, optional
Filters the tasks by the given status. Valid options for status are pending, running, retried, failed, canceled, and timed_out.
Example Request: Status with Summary
#Replace API_SECRET with your project's API secret
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/status \
-u API_SECRET: \
-d name="YOUR_PIPELINE_NAME" \
-d summary="true"
Example Return: Status With Summary
// with summary
{
"canceled": 933,
"retried": 80,
"succeeded": 1
}
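As a small sketch of consuming this summary, the per-status counts can be totaled to see how many tasks the pipeline has run overall. The JSON below is a hard-coded copy of the example return above, not live API output; python3 is used only for JSON parsing.

```shell
# Total the per-status task counts from a status?summary=true response.
# RESP is the example return shown above, inlined for illustration.
RESP='{"canceled": 933, "retried": 80, "succeeded": 1}'
echo "$RESP" | python3 -c 'import json, sys; print(sum(json.load(sys.stdin).values()))'
```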
Example Request: Status with no Summary and a Filter
#Replace API_SECRET with your project's API secret
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/status \
-u API_SECRET: \
-d name="YOUR_PIPELINE_NAME" \
-d status="running"
Example Return: Status with no Summary and a Filter
//no summary.
{
"canceled": [
{
"name": "company-july-2016-backfill-hourly-monoschema",
"state": "canceled",
"last_finish": "0000-12-31T16:00:00-08:00",
"run_at": "2016-07-26T00:00:00-07:00",
"from_date": "2016-07-26T00:00:00-07:00",
"to_date": "2016-07-26T00:00:00-07:00"
},
{
"name": "company-july-2016-backfill-hourly-monoschema",
...
Request Type: GET
This API endpoint returns the list of all the pipelines scheduled for a project.
URI: https://data.mixpanel.com/api/2.0/nessie/pipeline/jobs
Example Request:
#Replace API_SECRET with your projects API secret
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/jobs \
-u API_SECRET:
Example Result
{
"9876543210": [
{
"name": "events-daily-bigquery-monoschema",
"Dispatcher": "backfill",
"last_dispatched": "2019-02-01 12:00:00 US/Pacific",
"frequency": "hourly",
"sync_enabled": "true"
}
]
}
Request Type: GET
This endpoint returns the timestamps of all syncs grouped by date.
URI: https://data.mixpanel.com/api/2.0/nessie/pipeline/timeline
Parameters:
name
stringrequired
The name that uniquely identifies the pipeline.
Example Request:
#Replace API_SECRET with your project's API secret
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/timeline \
-u API_SECRET: \
-d name="YOUR_PIPELINE_NAME"
Example Return:
{
"day_syncs": [
{
"date": "2019-08-19",
"sync_times": [
"2019-08-19 14:27:46.044605 -0700 PDT"
],
"status": "synced"
},
{
"date": "2019-08-20",
"sync_times": [
"2019-08-20 14:33:09.315098 -0700 PDT"
],
"status": "synced"
}
]
}
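As a sketch of consuming this response, the dates with a "synced" status can be pulled out with a few lines of python3 for JSON parsing. The sample JSON below is an abridged copy of the example return above, not live API output; in practice RESP would hold the body returned by the curl request.

```shell
# Extract the dates whose status is "synced" from a /timeline response.
# RESP is a hard-coded, abridged copy of the example return above.
RESP='{"day_syncs":[{"date":"2019-08-19","status":"synced"},{"date":"2019-08-20","status":"synced"}]}'
echo "$RESP" | python3 -c '
import json, sys
for day in json.load(sys.stdin)["day_syncs"]:
    if day["status"] == "synced":
        print(day["date"])
'
```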