Snowflake

Contents: [ Schema , Queries ]
[One Table for All Events, The Table, The View, Example, One Table for Each Event, The Tables, The Views, Example, Queries for One Table for All Events, Queries for One Table for Each Event]

Visit the Dev Docs

A better version of this page exists at https://developer.mixpanel.com/docs/mixpanel-snowflake-export.

This guide describes how Mixpanel data is exported into a Snowflake dataset.

Use the Data Warehouse Export API to export your Mixpanel data into Snowflake. The following page describes the how the data transformations are applied as the data is exported.

Once an export job is scheduled, Mixpanel exports data to Snowflake on a recurring basis. This data will be a transformed version of the raw data stored in Mixpanel.

Mixpanel exports data to its own Snowflake account and gives your Snowflake account access to read the data. As a result, will need a Snowflake account to be able to use the exported data. For more information on how to use your shared data, see Sharing Data in Snowflake.

Schema

This section describes the schemas for Snowflake.

Mixpanel exports data into a single-column raw data table. Mixpanel then creates a view to expose all properties as columns.

Mixpanel recommends you place all events into a single table to make querying easier.

One Table for All Events

If you select one table for all events then the data is exported into one table. The table is clustered based on the composite of distinct_id and time columns. The raw table format is as follows:

The Table

Table name: mp_master_event_raw

Column name
Type
Description

data

VARIANT

cleaned JSON data.

The following properties are guaranteed to be present in every row of the table.

Property name
Type
Description

mp_event_name

string

The name of the event.

time

int

The timestamp that represents when the event occurred.

distinct_id

string

The unique ID of the user who triggered the event.

The View

The name for the view is mp_master_event, which is generated from raw table.

The data column is not present in the view. Instead, the view will have a column mapped to every unique property present in the raw table.

As a convenience, the time property converts to a timestamp type.

Example

Raw Data:

     {
       “event”: “Signup”, 
       “properties”: {
       “plan”: “free”, 
       “browser”: “chrome”, 
       “distinct_id”: 1, 
       “time”: 123
      }
}
{
       “event”: “Purchase”, 
       “properties”: {
       “product_id”: “123”, 
       “browser”: “chrome”, 
       “distinct_id”: 1, 
       “time”: 124
      }
}
{
       “event”: “Signup”, 
       “properties”: {
       “plan”: “paid”, 
       “browser”: “firefox”, 
       “ab_test”: 1, 
       “distinct_id”: 2, 
       “time”: 125
      }
}

Raw Table: mp_master_event_raw

mp_master_event_raw

data

{
“mp_event_name”: ”Signup”,
“plan”: “free”,
“browser”: “chrome”,
“distinct_id”: “1”,
“time”: 123
}

{
“mp_event_name”: “Purchase”,
“product_id”: “123”,
“browser”: “chrome”,
“distinct_id”: “1”,
“time”: 124
}

{
“mp_event_name”: ”Signup”,
“plan”: “paid”,
“browser”: “firefox”,
“ab_test”: 1,
“distinct_id”: “2”,
“time”: 125
}

The View: mp_master_event

mp_event_name
time
distinct_id
plan
browser
product_id
ab_test

“Signup”

Thursday, January 1, 1970 12:02:03 AM

“1”

“free”

“chrome”

NULL

NULL

“Purchase”

Thursday, January 1, 1970 12:02:04 AM

“1”

NULL

“chrome”

"123"

NULL

“Signup”

Thursday, January 1, 1970 12:02:05 AM

“2”

“paid”

“firefox”

NULL

1

One Table for Each Event

The Tables

Every table has a single column. Each table shares the cleaned name of the event of which it corresponds with appended with a “_raw” suffix.

The time property in the data column displays the UTC timezone, not the project timezone.

Each table is clustered, based on the composite of distinct_id and time columns.

Column name
Type
Description

data

VARIANT

The cleaned JSON data event.

The following properties are guaranteed to be present in every row of the table.

Property name
Type
Description

time

int

The timestamp representing when the event occurred.

distinct_id

string

The unique ID of the user who triggered the event.

The Views

Each view is named after the corresponding cleaned name of the event.

The data column is not present. Instead, the view will have a column mapped to every unique property present in the raw table.

As a convenience, the time property converts to a timestamp type.

Example

Raw Data

{
       “event”: “Signup”, 
       “properties”: {
       “plan”: “free”, 
       “browser”: “chrome”, 
       “distinct_id”: 1, 
       “time”: 123
}

{
       “event”: “Purchase”, 
       “properties”: {
       “product_id”: “123”, 
       “browser”: “chrome”, 
       “distinct_id”: 1, 
       “time”: 124
}

{
       “event”: “Signup”, 
       “properties”: {
       “plan”: “paid”, 
       “browser”: “firefox”, 
       “ab_test”: 1, 
       “distinct_id”: 2, 
       “time”: 125
}

The Raw Data Tables
The result are two raw tables: signup_raw and purchase_raw that contain all events.

signup_raw

data

{
“plan”: “free”,
“browser”: “chrome”,
“distinct_id”: “1”,
“time”: 123
}

{
“plan”: “paid”,
“browser”: “firefox”,
“ab_test”: 1,
“distinct_id”: “2”,
“time”: 125
}

purchase_raw

data

{
“product_id”: “123”,
“browser”: “chrome”,
“distinct_id”: “1”,
“time”: 124
}

The Views
The views of the signup_raw and purchase_raw tables display as follows:

View: signup

time
distinct_id
plan
browser
ab_test

Thursday, January 1, 1970 12:02:03 AM

“1”

“free”

“chrome”

NULL

Thursday, January 1, 1970 12:02:05 AM

“2”

“paid”

“firefox”

1

View: purchase

time
distinct_id
browser
product_id

Thursday, January 1, 1970 12:02:04 AM

“1”

“chrome”

“123”

Queries

A query is a request for data results. You can perform actions on the data, such as combine data from different tables; add, change, or delete table data; and perform calculations.

Snowflake supports a VARIANT type that can store JSON objects and arrays.

Mixpanel exposes array and object top-level properties as VARIANT columns in the view.

Queries for One Table for All Events

In this example, here’s how to query the raw table.

SELECT count(*)
FROM mixpanel.mp_master_event_raw
WHERE data:mp_event_name::string = “Signup”;

Using the view, you would run this same query as:

SELECT count(*)
FROM mixpanel.mp_master_event
WHERE mp_event_name = “Signup”;

Queries for One Table for Each Event

In this example, here’s how to query the raw table.

SELECT count(*)
FROM mixpanel.signup_raw
WHERE data:distinct_id::string = “1”;

Using the view, you would run this same query as:

SELECT count(*)
FROM mixpanel.signup
WHERE distinct_id = “1”;
Is this article helpful?
0 out of 0 found this helpful

Comments

0 comments

Please sign in to leave a comment.