Lookup tables

Introduction

We know that there is no perfect data model, and we can’t always have all the data prepared exactly right to answer every possible question. Moreover, there are times when the relevant data doesn’t make it into Mixpanel or the data actually changes after being sent into Mixpanel, which makes answering specific user behavioral questions very difficult, if not impossible.

Enter Lookup tables. Lookup tables provide the ability for Mixpanel customers to augment their data within Mixpanel with additional data (properties). If you are familiar with VLOOKUP in Microsoft Excel and Google Spreadsheets, then lookup tables is nothing but VLOOKUP for Mixpanel.

This is not a way for users to “upload” events into Mixpanel - rather, if there are existing properties that are being tracked (e.g. ProductID), users can enrich those properties by adding new properties that are mapped with existing properties (e.g. Product Category, Product price, Product color..).

Let’s say that a customer tracks purchases within Mixpanel, but they only track ProductID as a property, but not Price, Category or Color. They can easily augment and enrich their data within Mixpanel by uploading a products table with ProductID, Price, Category and color, and they can immediately start analyzing purchases not just by ID, but also by these new attributes of price, category and color, which show up just like any other property, and this analysis can be done retroactively as long as the identifier (in this case, ProductID) is present on events.

Screen_Shot_2020-05-27_at_10.57.31_AM.png

With just ProductID, questions can only be asked using ProductID and not with desired attributes like category, price or color.
Screen_Shot_2020-05-27_at_10.59.02_AM.png

Users might have the mapping between ProductID to category, color and price in a CSV, and it might look something like this:

Screen_Shot_2020-05-27_at_10.59.54_AM.png

Once they map this lookup table to the Mixpanel property (here, ProductID), then the available data to ask questions is not just ProductID, but also category, color and price:

Screen_Shot_2020-05-27_at_11.00.41_AM.png

Screen_Shot_2020-05-27_at_11.00.53_AM.png

Now each of the questions with the additional attributes (category, color, price) can be answered:

Screen_Shot_2020-05-27_at_11.01.02_AM.png

Just like User profiles augment details about the User, Lookup tables augment details about any property within Mixpanel, and here’s a quick overview of this:

Currently, this is in closed beta and available to customers on the Growth and Enterprise plans. Please reach out to your Customer Success Manager or email us at insights@mixpanel.com if you’d like to be included in this closed beta.

Use-cases

  • Enrich property values

    • If you are an e-commerce customer and are tracking Purchases by Product, and have only been tracking the ProductID and the Product price, but now want to analyze Purchases by Product Category (e.g. Clothing, Shoes…) and Product Sub-Category (“Mens”, “Womens”, “Children”), you can upload your product catalog as a lookup table within Mixpanel, and can do the analysis for all Purchases by Product Category and Product Sub-Category, even retroactively.
    • If you are a marketplace platform, and are tracking a OrderCompleted, and want to do revenue analysis, but don’t have the Commission Rate that needs to be paid to the seller at the time of tracking the OrderCompleted, you can upload the lookup table with OrderID and Commission Rate into Mixpanel, and do a complete revenue analysis.
    • If you are a media platform, and are tracking video plays but were only tracking the “video name” but not the “genre”, you can upload a lookup table with the mapping of “video name” and “genre” and then do the video engagement analysis by genre within seconds.
      • Before Lookup table mappingLT_newvalue_genre_old.gif
      • After lookup table mapping:LT_newvalue_genre_new.gif
  • Change property values

    • If you are a media company, and are tracking video categories as a property, but your category names have changed (Thriller -> Suspense & Thriller), you will end up seeing two groups of values for the same videos - all the video watches prior to the change will show up as Thriller, and the rest of the video watches after the change will show up as Suspense & Thriller, and this makes accurate calculations pretty annoying. If you upload your category table as a lookup table with the old values and the new values (Categoryname: Thriller, Categoryname(new): Suspense & Thriller) and then use the new property (Categoryname(new), you can now do all the video analysis with the correctly mapped category names without needing any data cleanup.
    • If you are a media platform and are tracking the attribution channels for your signed up users, but changed from tracking Google as “Google Ads” -> “Google Search” and Google is now showing up as two different segments, you can fix this by uploading a lookup table of the attribution channel mapping of “Attribution channel” -> “New attribution channel”:
      • Before lookup table mapping:LT_remapping_old.gif
      • After lookup table mapping:LT_remapping_new.gif
  • Filter by a long list of values

    • If you are a media company, and are trying to analyze the performance of a subset of all the videos in your catalog that were part of an experiment, you’d want the ability to filter down the report with a long list of video names that were considered in that experiment. Choosing property values one at a time within the UI can be quite cumbersome, so you can upload a lookup table of videos with an additional property (e.g. Part of Experiment?) of whether or not the video was a part of the experiment, and then filter the report by that new property (Filter -> Video name -> Part of Experiment? -> true).
      • Before lookup table mapping:LT_filterlist_old.gif
      • After lookup table mapping:LT_filterlist_new.gif

  

Who will have access to this? 

Currently, this is in closed beta and available to customers on the Growth and Enterprise plans. Please reach out to your Customer Success Manager or email us at insights@mixpanel.com if you’d like to be included in this closed beta.

 

Step-by-step guide

Step 0: Know how lookup tables get mapped to existing Mixpanel properties

  • As mentioned earlier, lookup tables are a way to add new properties based on the existing properties that are already being tracked within Mixpanel, so you’ll need to know what is the base property that is already being tracked.
    • In our earliest example, ProductID is the event property that was already being tracked within Mixpanel. 
  • This base property will be used as the join key to connect the contents of the lookup table with the events data within Mixpanel.
  • Here’s a more descriptive representation of this:

Events in Mixpanel

Event

DistinctID

ProductID

Purchase

U1

1

Purchase

U2

1

Purchase

U1

2

 Products lookup table 

ProductID

Product Category

Product Color

Product price

1

jeans

blue

39.99

2

tshirt

white

25

3

tie

red

20

 Events table with mapped product lookup table

Event

DistinctID

ProductID

Product Category

Product Color

Product price

Purchase

U1

1

jeans

blue

39.99

Purchase

U2

1

jeans

blue

39.99

Purchase

U1

2

tshirt

white

25

 The ProductID is the join key that connects the Mixpanel events to the lookup table.

Step 1: Configure the CSV that you want to upload into Mixpanel

  • Open a new CSV in your choice of spreadsheet software (Excel, Google Spreadsheet..)
  • The column that represents the join key within the products lookup table needs to be added as the first column in the CSV.
  • The column name DOES NOT NEED to be the same as the join key (in this case, the first column in the products lookup table didn’t need to be “ProductID”) - it just needs to be the first column in the CSV.
  • Now add a new column for each new attribute that you are trying to get into Mixpanel.
    • From our earlier example, Product Category, Product color and Product price will be its own columns.
  • Now add all the values as separate rows.
  • Save this CSV/Download the CSV. XLS files are not supported at this time.
  • Please note: There cannot be rows with duplicate values of the join key, as it is a 1:1 map with the property within Mixpanel.

Right:

ProductID

Product Category

Product Color

Product price

1

jeans

blue

39.99

2

tshirt

white

25

3

tie

red

20

 Wrong:

ProductID

Product Category

Product Color

Product price

1

jeans

blue

39.99

1

jeans

white

45

2

tie

red

20

 

In this case, the second row with the duplicate ProductId will be dropped during the upload process.

LT_CSVdownloadandupload.gif

Step 2: Upload CSV into Mixpanel

  • Go to Lexicon 
  • Navigate to lookup tables
  • Hit “Upload lookup table”

Screen_Shot_2020-05-27_at_11.32.50_AM.png

Step 3: Map the Mixpanel property to the lookup table

  • Navigate to the property that you want the lookup table to be joined against (aka your join key)
  • Select the property
  • Select “Map to lookup table”
  • Select the lookup table from the list of lookup tables
  • Select “Apply”

LT_mappingtolookuptable.gif

Step 4: Analyze!

  • Go to the Mixpanel report where you are doing your analysis
  • Select the operation that you want to do (Filter/Breakdown/Aggregate) and then select the join key property (in our earlier example, ProductID) and then select which of the new properties you want to use for the analysis.

Before the Lookup table mapping:

LT_newvalue_genre_old.gif

After the lookup table mapping:

LT_newvalue_genre_new.gif

You can re-upload a new version of the lookup table as often as you want. This replaces the lookup table that you have (not an update new rows or columns, but a complete replace) but keeps all the existing property mappings in place, so the properties that were mapped to an earlier version of the lookup table doesn't need to be re-mapped after the lookup table has been replaced.

We are also looking to add API support to update lookup tables programmatically.

 

FAQ 

  • Are there any limitations to the size of the lookup tables? 

This is still in closed beta, so we currently have a limitation of 100 MB per imported lookup table, and up to 5 million rows of data across all lookup tables uploaded. Just to be clear, you can use multiple lookup tables in your projects, but the total count of rows has to be less than 5 million rows across all uploaded CSVs.

  • How do I specify which column in the CSV should be used to join/match against the Mixpanel property?

By default, the first column in the CSV is treated as the join key. Which means if i'm adding a lookup profile for "ProductID" (Mixpanel event property), then I'll need to have the ProductID in the first column of the uploaded CSV. The name of that column doesn't have to be the same as the Mixpanel property (it doesn’t have to be “ProductID”); it just has to be the first column.

  • Can the property that I'm using to join with the lookup table be a list property?

    YES! You can just map a list property to a lookup table the same way you'd map a string property. Here is a detailed article that talks about how filters and breakdowns work with list properties in Mixpanel (with and without lookup tables).

  • I see that numbers and dates in the lookup table are all being imported as strings. Is there a way to specify the data type of these properties when I'm importing the CSV?

    At this moment, all the data in the lookup table is treated as strings, and users can typecast the values within each of the reports. In the future, we will provide the option to change data types of the lookup profile properties.

  • Can the value of a Lookup table property (lookup profile property) be a list?

    At this moment, no.

  • Can Custom properties be created from Lookup profile properties?

    Yes!

  • Can each Mixpanel property (join key) only be mapped to one lookup table?

    Yes. One Mixpanel property can only map to ONE lookup table.

  • Can multiple properties map to the same lookup table?

    Yes. For example, first_trip_city_id and last_trip_city_id can both map to the City lookup table, but one Mixpanel property cannot map to multiple lookup tables (if first_trip_city_id is mapped to "City" lookup table, it can't also map to "Region" lookup table, before unmapping from the "City" lookup table.)

  • Can I use APIs to import Lookup tables?

    YES! Here is a pointer to the HTTP docs on how to use the HTTP API to update lookup tables.

  • Can I import lookup tables using import pipelines?

    Not at the moment.

  • Can I export lookup profile properties with Export APIs / Pipelines?

    Not at the moment.

  • If I had mapped a property to a lookup table, and then I delete the lookup table, does the property mapping get unmapped?

    Yes.

  • If I had mapped a property to a lookup table, and then I deleted the lookup table, what happens to saved reports that used these lookup profile properties (e.g. ProductID → Product sub category)?

    We throw an error saying that "Product sub category" is no longer available for use, and that they should use another property to filter/aggregate/breakdown on.

  • Do I need to have paid for User profiles to avail of Lookup tables and Lookup profile properties?

    No. There is no such requirement. You can upload lookup tables and map those to your event properties without requiring user profiles.

Concepts

  • Lookup table is a collection of lookup profiles. An existing event or profile property can be mapped to a lookup table (for e.g. Song name can be the event property in Mixpanel, which is mapped to the Songs lookup table). When using APIs to programmatically update lookup tables, group_key uniquely identifies the lookup table as a whole.
  • Lookup profile is a single profile that is mapped to a single mixpanel event/profile property. A lookup profile consists of a join key and one or more lookup profile properties. So for example, the mixpanel property value of Song name = "Gangnam style" can be mapped to a single lookup profile in the Songs lookup table which could have the unique identifier of "Gangnam style", and properties of "Genre: pop" and "hashtags: [amazing]".
  • Join key is the primary identifier of the lookup profile, and this is what is used to join against the mapped Mixpanel event property. In the above example, "Gangnam style" is a join key for that lookup profile. When using APIs to programmatically update lookup tables, group_id refers to the join key.
  • Lookup profile property is a property on a single lookup profile. For example, "Genre" is a lookup profile property key, and "pop" is a lookup profile property value for the lookup profile of "Gangnam style".

Here's a quick illustration of how lookup table, mixpanel property, join key and lookup profile property are related:

lookuptableconcepts_1.jpg

lookuptableconcepts_2.jpg

 

Implementation

You can now programmatically update lookup tables - visit Mixpanel's Developer Documentation to learn more:

 

Did this answer your question?

Comments

0 comments

Please sign in to leave a comment.