Data Warehouse Export API

Contents: [Configure the Data Warehouse, Authentication, Create a Pipeline, Cancel a Pipeline, Check the Status of a Pipeline, Get a List of Scheduled Pipelines]
[Trial Version, Required Parameter, Authorization Steps, Additional BigQuery Parameters, Additional Snowflake Parameters, Additional AWS S3 and Glue Parameters]

Visit the Dev Docs

A better version of this page exists at https://developer.mixpanel.com/docs/data-warehouse-export-api.

The Data Warehouse Export API provides a set of endpoints, supported by Mixpanel, for creating and managing your data warehouse pipelines.

A pipeline is an end-to-end unit created to export Mixpanel data and pipe it into a data warehouse.

Trial Version

The Data Warehouse Export API offers a one-time trial. You can call and schedule a trial export by passing trial=true when creating a pipeline. The trial export automatically stops after 30 calendar days.
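For example, a trial export can be scheduled with a single create call. The sketch below assumes a BigQuery pipeline (see Create a Pipeline below for the full parameter list); the group alias is a placeholder:

#Replace API_SECRET with your project's API secret
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/create \
-u API_SECRET: \
-d type="bigquery" \
-d trial=true \
-d bq_share_with_group="bq-access-alias@somecompany.com"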

Configure the Data Warehouse

Before exporting data from Mixpanel, you must configure your data warehouse to accept the data.

For additional information on configuring the Mixpanel export for each supported data warehouse, see the setup guides for BigQuery, Snowflake, and AWS S3 with Glue.

Authentication

To ensure the security of your data, the Mixpanel API requires a basic system of authentication.

Required Parameter

api_secret - This can be found by clicking on the settings gear in the upper right-hand corner and selecting Project Settings.

Authorization Steps

The Data Export API accepts basic access authentication over HTTPS as an authorization method. To make an authorized request, put your project's API Secret in the "username" field of the basic access authentication header. Make sure you use HTTPS and not HTTP - our API rejects requests made over HTTP, since this sends your API Secret over the internet in plain text.
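For example, with curl you can pass the API Secret through the -u flag; the trailing colon leaves the password field empty:

#Replace API_SECRET with your project's API secret
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/jobs \
-u API_SECRET: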

Create a Pipeline


Request Type: POST

This request creates the export pipeline. The type parameter defines the kind of pipeline that is initiated. The following data warehouse types are supported:

  1. bigquery - Mixpanel exports events and/or people data into Google BigQuery.

  2. aws - This option creates the S3 data export and Glue schema pipeline. Mixpanel exports events and/or people data as JSON packets, and also creates a schema for the exported data in AWS Glue. Customers can use AWS Glue to query the exported data with AWS Athena or AWS Redshift Spectrum.

  3. snowflake - This option creates the Snowflake export pipeline. Mixpanel exports events and/or people data into Snowflake.

URI: https://data.mixpanel.com/api/2.0/nessie/pipeline/create

Headers:

Content-Type: application/x-www-form-urlencoded

Parameters:

type (array of strings, required)
Type options are bigquery, snowflake, and aws, initializing a bigquery, snowflake, or aws pipeline accordingly.

trial (boolean, optional)
Default: false. A trial pipeline is created if the value is true. The trial exports all of your events and people data for thirty calendar days, starting from one day before the API call was made. A trial pipeline uses default values for the following parameters:
  schema_type: monoschema
  data_source: events and people
  sync: false
  from_date: <defaults to previous day>
  to_date: <no value>
  frequency: daily
  events: <no value>

schema_type (string, optional)
Default: monoschema. Allowed options are monoschema and multischema. monoschema loads all events into a single table; multischema loads every event into its own dedicated table. All people data is exported as monoschema.

data_source (string, optional)
Default: events. data_source can be either events or people. events exports Mixpanel event data; people exports Mixpanel people data.

sync (boolean, optional)
Default: false. A value of true updates exported data with any changes that occur in your Mixpanel dataset. These changes include deletions, late data, and imports that fall into your export window.

from_date (string, required)
The starting date of the export window, formatted as YYYY-MM-DD.

to_date (string, optional)
The ending date of the export window, formatted as YYYY-MM-DD. The export continues indefinitely if to_date is empty.

frequency (string, optional)
Default: daily. frequency can be either hourly or daily. hourly exports the data every hour; daily exports the data at midnight (based on the project's timezone). frequency should only be passed if your export window is indefinite.

events (string, optional)
A whitelist of the events you intend to export. All events in the project are exported if no events are specified.

where (string, optional)
A selector expression used to filter events data. Learn more about how to construct event selector expressions here. This parameter is only valid when data_source is events.
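As an illustration, the windowed-export parameters above can be combined in a single create call. The sketch below assumes a BigQuery pipeline (so it uses the sharing parameter described under Additional BigQuery Parameters below); the dates, event name, and email address are placeholders:

#Replace API_SECRET with your project's API secret
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/create \
-u API_SECRET: \
-d type="bigquery" \
-d schema_type="multischema" \
-d from_date="2019-01-01" \
-d to_date="2019-03-31" \
-d events="Sign Up" \
-d bq_share_with_user="myemail@somecompany.com"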

Return: On success, the create request returns the name of the pipeline, which is used in any future calls to the pipeline. Snowflake and AWS create requests return nothing else.

For BigQuery pipelines, the request also returns the BigQuery dataset name and URL. Use this URL to access the BigQuery dataset.

Mixpanel creates the dataset within its own Google Cloud Platform project and shares a read-only view of it with the user/group provided to the API endpoint.

{
  "pipeline_names": [
    "trial-events-daily-bigquery",
    "trial-people-daily-bigquery"
  ],
  "bigquery_dataset_name": "https://bigquery.cloud.google.com/dataset/mixpanel-prod-1:sample_dataset_name"
}

Additional BigQuery Parameters

The following parameters are specific to BigQuery exports.

bq_region (string, optional)
Default: US. Allowed choices are US, EU, and ASIA_NORTHEAST_1. The location of the BigQuery instance.

bq_share_with_user (string, conditionally required)
A list of account email addresses to share the dataset with. Required if bq_share_with_group is not passed.

bq_share_with_group (string, conditionally required)
A list of group account email addresses to share the dataset with. Required if bq_share_with_user is not passed.

Example Request

#Replace API_SECRET with your project's API secret
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/create \
-u API_SECRET: \
-d type="bigquery" \
-d trial=true \
-d bq_share_with_group="bq-access-alias@somecompany.com" \
-d bq_share_with_user="myemail@somecompany.com" \
-d bq_share_with_user="user2email@somecompany.com"

Example Response

Use the URL returned as bigquery_dataset_name to access the BigQuery dataset.

{
  "pipeline_names": [
    "trial-events-daily-bigquery",
    "trial-people-daily-bigquery"
  ],
  "bigquery_dataset_name": "https://bigquery.cloud.google.com/dataset/mixpanel-prod-1:sample_dataset_name"
}

Additional Snowflake Parameters

The following parameters are specific to Snowflake exports.

snowflake_share_with (string, required)
The name of the account with which the dataset should be shared.

Example Request

#Replace API_SECRET with your project's API secret
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/create \
-u API_SECRET: \
-d type="snowflake" \
-d trial=true \
-d snowflake_share_with="mysnowflakeaccountname"

Additional AWS S3 and Glue Parameters

s3_bucket (string, required)
The S3 bucket to which the data is exported.

s3_region (string, required)
A valid S3 region for the bucket.

s3_prefix (string, optional)
No default value. The path prefix for the export.

s3_encryption (string, optional)
Default: none. Options are none, aes, and kms. The at-rest encryption used by the S3 bucket.

s3_kms_key_id (string, optional)
No default value. If s3_encryption is set to kms, this specifies the custom key ID to use.

s3_role (string, optional)
No default value. The AWS role the writer should assume when writing to S3.

use_glue (boolean, optional)
Default: false. If true, exports the schema to AWS Glue.

glue_database (string, conditionally required)
The Glue database to which the schema is exported. Required if use_glue is true.

glue_role (string, conditionally required)
No default value. The role that needs to be assumed for updating Glue. Required if use_glue is true.

glue_table_prefix (string, optional)
No default value. A prefix to add to table names when creating them.

Example Request

#Replace API_SECRET with your project's API secret
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/create \
-u API_SECRET: \
-d type="aws" \
-d trial=true \
-d s3_bucket="example-s3-bucket" \
-d s3_region="us-east-1" \
-d s3_prefix="example_custom_prefix" \
-d s3_role="arn:aws:iam::<account-id>:role/example-s3-role" \
-d use_glue=true \
-d glue_database="example-glue-db" \
-d glue_role="arn:aws:iam::<account-id>:role/example-glue-role" \
-d glue_table_prefix="example_table_prefix"

Cancel a Pipeline


Request Type: POST

For a given pipeline name, this request cancels the pipeline and stops any future jobs from being scheduled for it.

URI: https://data.mixpanel.com/api/2.0/nessie/pipeline/cancel

Headers:

Content-Type: application/x-www-form-urlencoded

Parameters:

name (string, required)
The name that uniquely identifies the pipeline.

Example Request:

#Replace API_SECRET with your project's API secret
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/cancel \
-u API_SECRET: \
-d name="sample_job_name"

Return: 200 OK indicates a successful cancellation. Any other response indicates that the cancellation failed.
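To see the status code directly, curl's -i flag prints the HTTP status line along with the response body:

#Replace API_SECRET with your project's API secret
curl -i https://data.mixpanel.com/api/2.0/nessie/pipeline/cancel \
-u API_SECRET: \
-d name="sample_job_name"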

Check the Status of a Pipeline


Request Type: GET

Given the name of a pipeline, this API returns the pipeline's status: a summary and the status of all recently run export jobs for the pipeline.

URI: https://data.mixpanel.com/api/2.0/nessie/pipeline/status

Headers:

Content-Type: application/x-www-form-urlencoded

Parameters:

name (string, required)
The name that uniquely identifies the pipeline.

summary (string, optional)
Default: false. If true, lists only the task counts by status, with no details.

status (array of strings, optional)
Filters the tasks by the given status. Valid options are pending, running, retried, failed, canceled, and timed_out.

Example Request: Status with Summary

#Replace API_SECRET with your project's API secret
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/status \
-u API_SECRET: \
-d name="YOUR_PIPELINE_NAME" \
-d summary="true"

Example Return: Status With Summary

// with summary
{
  "canceled": 933,
  "retried": 80,
  "succeeded": 1
}

Example Request: Status with no Summary and a Filter

#Replace API_SECRET with your project's API secret
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/status \
-u API_SECRET: \
-d name="YOUR_PIPELINE_NAME" \
-d status="pending" \
-d status="running"

Example Return: Status with no Summary and a Filter

// no summary
{
  "canceled": [
    {
      "name": "shutterfly-july-2016-backfill-hourly",
      "state": "canceled",
      "last_finish": "0000-12-31T16:00:00-08:00",
      "run_at": "2016-07-26T00:00:00-07:00",
      "from_date": "2016-07-26T00:00:00-07:00",
      "to_date": "2016-07-26T00:00:00-07:00"
    },
    {
      "name": "shutterfly-july-2016-backfill-hourly",
      ...

Get a List of Scheduled Pipelines


Request Type: GET

This API returns the list of all the pipelines scheduled for a project.

URI: https://data.mixpanel.com/api/2.0/nessie/pipeline/jobs

Example Request:

#Replace API_SECRET with your project's API secret
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/jobs \
-u API_SECRET:

Example Result

{
  "9876543210": [
    {
      "name": "foo-bq-pipeline",
      "Dispatcher": "backfill",
      "last_dispatched": "2019-02-01 12:00:00 US/Pacific",
      "frequency": "hourly"
    }
  ]
}