Data Pipelines API

Contents: Data Pipeline Types, Configure the Destination to Receive Mixpanel Data, Authentication, Create a Pipeline, Cancel a Pipeline, Check the Status of a Pipeline, Get a List of Scheduled Pipelines

Visit the Dev Docs

A better version of this page exists at https://developer.mixpanel.com/docs/data-warehouse-export-api.

The Data Pipelines API provides a set of endpoints for creating and managing your data pipelines.

A pipeline is an end-to-end unit that exports Mixpanel data and moves it into a data warehouse.

Trial Version

The Data Warehouse Export API offers a one-time trial. You can call and schedule a trial export by passing trial=true when creating a pipeline. The trial export stops automatically after 30 calendar days.

Data Pipeline Types

Mixpanel currently supports a data warehouse export pipeline and a raw data pipeline. When you create a pipeline, the type parameter determines whether a data warehouse export pipeline or a raw data pipeline is created.

Export to Data Warehouse

The data warehouse export pipeline is a fully managed pipeline that includes transformations and scheduling. Visit the data warehouse export documentation for more information.

Raw Export Pipeline

The raw export pipeline is a scheduled export that moves your unaltered Mixpanel data to a blob storage destination. Visit the raw export pipeline documentation for more information.

Configure the Destination to Receive Mixpanel Data

Before exporting data from Mixpanel, you must configure your data warehouse to accept the data.

For additional information on configuring each destination (BigQuery, Snowflake, AWS S3 and Glue, Azure Blob Storage) to receive Mixpanel data, see the corresponding setup documentation.

Authentication

To ensure the security of your data, the Mixpanel API requires a basic system of authentication.

Required Parameter

api_secret - This can be found by clicking on the settings gear in the upper right-hand corner and selecting Project Settings.

Authorization Steps

The Data Export API accepts basic access authentication over HTTPS as an authorization method. To make an authorized request, put your project's API Secret in the "username" field of the basic access authentication header. Make sure you use HTTPS and not HTTP - our API rejects requests made over HTTP, since this sends your API Secret over the internet in plain text.
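The basic access authentication header described above can be sketched in Python with the standard library: the API secret goes in the username field and the password is left empty (the secret value here is a placeholder).

```python
import base64

def basic_auth_header(api_secret: str) -> str:
    # Basic auth: base64-encode "username:password", with the API secret
    # as the username and an empty password.
    token = base64.b64encode(f"{api_secret}:".encode("ascii")).decode("ascii")
    return f"Basic {token}"

print(basic_auth_header("MY_API_SECRET"))
```

This produces the same header that `curl -u API_SECRET:` sends. Remember to make the request over HTTPS only.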

Create a Pipeline


Request Type: POST

This request creates the export pipeline. The type parameter defines the kind of pipeline that is initiated. The following data warehouse types are supported:

  1. bigquery Mixpanel exports events and/or people data into Google BigQuery.

  2. aws This option creates the S3 data export and Glue schema pipeline. Mixpanel exports events and/or people data as JSON packets, and also creates a schema for the exported data in AWS Glue. Customers can use AWS Glue to query the exported data with AWS Athena or AWS Redshift Spectrum.

  3. snowflake This option creates the Snowflake export pipeline. Mixpanel exports events and/or people data into Snowflake.

  4. azure-blob This option creates the Azure Blob Storage export pipeline. Mixpanel exports events and/or people data to an Azure Blob Storage container.

URI: https://data.mixpanel.com/api/2.0/nessie/pipeline/create

Headers:

Content Type

application/x-www-form-urlencoded

Parameters:

Parameter
Type
Description

type

array of strings
required

Data Warehouse Export:
Type parameters include bigquery, snowflake, azure-blob and aws.

Raw Export Pipeline: Type parameters include s3-raw and gcs-raw. Initializes s3-raw or gcs-raw pipelines accordingly.

trial

boolean
optional

Default: false. A trial pipeline will be created if value is true.

The trial exports all of your events and people data for thirty calendar days, starting from one day before the API call was made. A trial pipeline has default values for the following parameters:

schema_type : monoschema
data_source: events and people
sync: false
from_date: <defaults to previous day>
to_date: <no value>
frequency: daily
events: <no value>

schema_type

string
optional

Default: monoschema. Allowed options are monoschema and multischema. monoschema loads all events into a single table. multischema loads every event into its own dedicated table. All people data is exported as monoschema.

data_source

string
optional

Default: events. data_source can be either events or people. events exports Mixpanel event data. people exports Mixpanel people data.

sync

boolean
optional

Default: false. A value of true updates exported data with any changes that occur in your Mixpanel dataset. These changes include deletions, late data, and imports that fall into your export window.

from_date

string
required

The starting date of the export window. It is formatted as YYYY-MM-DD.

to_date

string
optional

The ending date of the export window. It is formatted as YYYY-MM-DD. The export will continue indefinitely if to_date is empty.

frequency

string
optional

Default: daily. frequency can be either hourly or daily. hourly exports the data every hour. daily exports the data at midnight (based on the project's timezone). frequency should only be passed if your export window is indefinite.

events

string
optional

A whitelist of events you intend to export. If no events are specified, all events in the project are exported.

where

string
optional

A selector expression used to filter event data, such as by event properties. Learn more about how to construct event selector expressions here. This parameter is only valid when data_source is events.

data_format

string
optional

Default: json. The file format of the exported data. data_format can be either json or parquet.
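Since the endpoint accepts application/x-www-form-urlencoded parameters, the request body can be sketched with Python's standard library. The values below are illustrative only; from_date is required and the rest are optional:

```python
from urllib.parse import urlencode

# Hypothetical create-pipeline parameters (not sent over the network here).
params = {
    "type": "bigquery",
    "schema_type": "monoschema",  # default
    "data_source": "events",      # default
    "from_date": "2021-01-01",    # required, YYYY-MM-DD
    "frequency": "daily",         # only pass when the export window is indefinite
}
body = urlencode(params)
print(body)
```

The resulting string is what curl builds from its repeated `-d` flags in the examples below.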

Return: The create API returns the names of the pipelines created. Use the returned name to manage the pipeline.

For BigQuery pipelines, the request returns the BigQuery dataset name and URL. Use this URL to access the BigQuery dataset.

Mixpanel creates the dataset within its own Google Cloud Platform project. The service shares a read-only view of the dataset with the user/group provided to the API endpoint.

{
   "pipeline_names": [
      "trial-events-daily-bigquery",
      "trial-people-daily-bigquery"
   ],
   "bigquery_dataset_name": "https://bigquery.cloud.google.com/dataset/mixpanel-prod-1:sample_dataset_name"
}

Additional BigQuery Parameters

Mixpanel creates a dataset in its own BigQuery instance and gives "View" access to the account(s) provided at the time of creating the pipeline.

The following parameters are specific to BigQuery exports.

Parameter
Type
Description

bq_region

string
optional

Default: US. Allowed choices are US, EU and ASIA_NORTHEAST_1. Location of BigQuery instance. For a full list of BigQuery supported regions, see this list.

bq_share_with_group

string
required

Group account email address to share the dataset with.

Example Request

#Replace API_SECRET with your project's API secret
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/create \
-u API_SECRET: \
-d type="bigquery" \
-d trial=true \
-d bq_share_with_group="bq-access-alias@somecompany.com" 

Example Response

Use the URL that returns as the bigquery_dataset_name to access the BigQuery dataset.

{
   "pipeline_names": [
      "trial-events-daily-bigquery",
      "trial-people-daily-bigquery"
   ],
   "bigquery_dataset_name": "https://bigquery.cloud.google.com/dataset/mixpanel-prod-1:sample_dataset_name"
}
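A response shaped like the example above can be handled in a few lines of Python; the JSON below is the sample response from this document, embedded as a string for illustration.

```python
import json

# Sample create-pipeline response, copied from the example above.
sample = """
{
   "pipeline_names": [
      "trial-events-daily-bigquery",
      "trial-people-daily-bigquery"
   ],
   "bigquery_dataset_name": "https://bigquery.cloud.google.com/dataset/mixpanel-prod-1:sample_dataset_name"
}
"""
resp = json.loads(sample)
names = resp["pipeline_names"]                 # keep these to check status or cancel later
dataset_url = resp["bigquery_dataset_name"]    # open this URL to view the dataset
print(names, dataset_url)
```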

Additional Snowflake Parameters

The following parameters are specific to Snowflake exports.

Note

Currently, Mixpanel only supports the us-west region for Snowflake pipelines.

Parameter
Type
Description

snowflake_share_with

string
required

Name of the account with which the dataset should be shared.

Example Request

#Replace API_SECRET with your project's API secret
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/create \
-u API_SECRET: \
-d type="snowflake" \
-d trial=true \
-d snowflake_share_with="mysnowflakeaccountname"

Additional AWS S3 and Glue Parameters

Parameter
Type
Description

s3_bucket

string
required

The S3 bucket to which the data is exported.

s3_region

string
required

The valid s3 region for the bucket. For a full list of AWS supported regions, see this list.

s3_prefix

string
optional

There is no default value. The path prefix for the export.

s3_encryption

string
optional

Default: none. Options are none, aes and kms. The at-rest encryption used by the S3 bucket.

s3_kms_key_id

string
optional

There is no default value. If s3_encryption is set to kms, this can specify the custom key id you desire to use.

s3_role

string
optional

There is no default value. AWS Role the writer should assume when writing to s3.

use_glue

boolean
optional

Default: false. Set to true to also export a Glue schema.

glue_database

string
conditionally required

The glue database to which the schema needs to be exported. Required if use_glue is true.

glue_role

string
conditionally required

There is no default value. The role that needs to be assumed for updating glue. Required if use_glue is true.

glue_table_prefix

string
optional

There is no default value. Prefix to add to table names when creating them.

Example Request

#Replace API_SECRET with your project's API secret
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/create \
-u API_SECRET: \
-d type="aws" \
-d trial=true \
-d s3_bucket="example-s3-bucket" \
-d s3_region="us-east-1" \
-d s3_prefix="example_custom_prefix" \
-d s3_role="arn:aws:iam::<account-id>:role/example-s3-role" \
-d use_glue=true \
-d glue_database="example-glue-db" \
-d glue_role="arn:aws:iam::<account-id>:role/example-glue-role" \
-d glue_table_prefix="example_table_prefix"

Additional Azure Blob Storage Parameters

The following parameters are specific to Azure Blob Storage exports.

Parameter
Type
Description

storage_account

string
required

Blob Storage Account where the data will be exported.

container_name

string
required

The Blob Container within the account where data will be exported.

prefix

string
optional

A custom prefix for all the data being exported to the container.

client_id

string
required

clientId from the Service Principal credentials.

client_secret

string
required

clientSecret from the Service Principal credentials.

tenant_id

string
required

tenantId from the Service Principal credentials. This is specific to the Active Directory instance where the Service Principal resides.

Example Request

#Replace API_SECRET with your project's API secret
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/create \
	-u API_SECRET: \
	-d type="azure-blob" \
	-d trial="true" \
	-d data_format="parquet" \
	-d storage_account="mystorageaccount" \
	-d container_name="mixpanel-export" \
	-d prefix="custom_prefix/for/data" \
	-d schema_type="multischema" \
	-d client_id="REDACTED" \
	-d client_secret="REDACTED" \
	-d tenant_id="REDACTED"

Cancel a Pipeline


Request Type: POST

For a given pipeline name, this request cancels the pipeline and stops any future jobs from being scheduled for it.

URI: https://data.mixpanel.com/api/2.0/nessie/pipeline/cancel

Headers:

Content Type

application/x-www-form-urlencoded

Parameters:

Parameter
Type
Description

name

string
required

The name that uniquely identifies the pipeline.

Example Request:

#Replace API_SECRET with your project's API secret
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/cancel \
-u API_SECRET: \
-d name="sample_job_name"

Return: 200 OK indicates a successful cancellation. Any other response indicates that the cancellation failed.

Check the Status of a Pipeline


Request Type: GET

Given the name of a pipeline, this API returns its status: a summary and the status of all recently run export jobs for the pipeline.

URI: https://data.mixpanel.com/api/2.0/nessie/pipeline/status

Headers:

Content Type

application/x-www-form-urlencoded

Parameters:

Parameter
Type
Description

name

string
required

The name that uniquely identifies the pipeline.

summary

string
optional

Default: false. When true, returns only a count of tasks by status, with no details.

status

array of strings
optional

Filters the tasks by the given status. Valid options for status are pending, running, retried, failed, canceled, and timed_out.

Example Request: Status with Summary

#Replace API_SECRET with your project's API secret
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/status \
-u API_SECRET: \
-d name="YOUR_PIPELINE_NAME" \
-d summary="true"

Example Return: Status With Summary

// with summary
{
  "canceled": 933,
  "retried": 80,
  "succeeded": 1
}

Example Request: Status with no Summary and a Filter

#Replace API_SECRET with your project's API secret
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/status \
-u API_SECRET: \
-d name="YOUR_PIPELINE_NAME" \
-d status="pending" \
-d status="running"

Example Return: Status with no Summary and a Filter

//no summary.
{
  "canceled": [
    {
      "name": "shutterfly-july-2016-backfill-hourly",
      "state": "canceled",
      "last_finish": "0000-12-31T16:00:00-08:00",
      "run_at": "2016-07-26T00:00:00-07:00",
      "from_date": "2016-07-26T00:00:00-07:00",
      "to_date": "2016-07-26T00:00:00-07:00"
    },
    {
      "name": "shutterfly-july-2016-backfill-hourly",
      ...

Get a List of Scheduled Pipelines


Request Type: GET

This API returns the list of all pipelines scheduled for a project.

URI: https://data.mixpanel.com/api/2.0/nessie/pipeline/jobs

Example Request:

#Replace API_SECRET with your project's API secret
curl https://data.mixpanel.com/api/2.0/nessie/pipeline/jobs \
-u API_SECRET:

Example Result

{
  "9876543210": [
    {
      "name": "foo-bq-pipeline",
      "Dispatcher": "backfill",
      "last_dispatched": "2019-02-01 12:00:00 US/Pacific",
      "frequency": "hourly"
    }
  ]
}
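The jobs response is keyed by project ID, with each value a list of scheduled pipelines. A small sketch iterating over the sample result above:

```python
import json

# Sample jobs response, copied from the example result above.
sample = """
{
  "9876543210": [
    {
      "name": "foo-bq-pipeline",
      "Dispatcher": "backfill",
      "last_dispatched": "2019-02-01 12:00:00 US/Pacific",
      "frequency": "hourly"
    }
  ]
}
"""
jobs = json.loads(sample)
for project_id, pipelines in jobs.items():
    for p in pipelines:
        print(project_id, p["name"], p["frequency"])
```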