Data discrepancies

If you are seeing discrepancies in your data, you have come to the right place! This article walks you through the most common causes of data discrepancies and how to debug them. If the discrepancy involves our pipeline feature, please head to our Developer Docs.

Discrepancies in Mixpanel reports

Mixpanel reports calculate data in different ways. While the Insights report defaults to the total event count ('totals'), the Funnels report defaults to the unique user count ('uniques'). So if you are seeing discrepancies between a Funnels report and an Insights report, first check which count each report uses, then look at the filtering for the events. Note that 'totals' in Funnels shows total conversions, not the total event count. Please refer to Funnels for more information on this.

With discrepancies within the Mixpanel interface, it's important to look out for:

  • Unique user count vs. total event count - Are you comparing the same kind of count? In Funnels, are you using unique conversions, total conversions or session conversions?
  • Differences in properties - E.g. event property vs. user property vs. custom property

If you took a screenshot of a report a while ago and the data has changed since then, check whether the report uses any user properties: user properties change over time, while event properties stay constant. You can also check whether data was imported or ingested later by breaking down the report by the property mp_processing_time_ms.
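As a rough illustration of the late-ingestion check, the sketch below flags events whose processing time is much later than their event time. It assumes exported event records shaped like the sample data, with `time` in Unix seconds and `mp_processing_time_ms` in Unix milliseconds, both in UTC. The record shape, the one-hour threshold and the helper names are assumptions made for this example.

```python
def ingestion_delay_seconds(event):
    """Seconds between when the event happened and when it was processed."""
    props = event["properties"]
    # Assumption: `time` is Unix seconds, `mp_processing_time_ms` is Unix milliseconds.
    return props["mp_processing_time_ms"] / 1000 - props["time"]


def late_events(events, threshold_seconds=3600):
    """Return events processed more than `threshold_seconds` after they occurred."""
    return [e for e in events if ingestion_delay_seconds(e) > threshold_seconds]


# Hypothetical sample data: one event processed after 5 seconds, one after 25 hours.
events = [
    {"event": "Sign Up",
     "properties": {"time": 1700000000, "mp_processing_time_ms": 1700000005000}},
    {"event": "Sign Up",
     "properties": {"time": 1700000000, "mp_processing_time_ms": 1700090000000}},
]

late = late_events(events)
print(len(late), "of", len(events), "events were ingested late")
```

Events flagged this way would not have been visible in a report viewed shortly after they occurred, which can explain why an older screenshot shows lower numbers.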

A good way to start is to remove all filters and breakdowns from the reports to check if the underlying data is the same, then re-add them one at a time and see when the discrepancy appears. The culprit will likely be a filter or a breakdown.
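The same idea can be applied to raw data outside the interface. The sketch below is only an illustration: it assumes you have the underlying rows of both reports as lists of dictionaries, and it re-applies filters one at a time until the row counts diverge. The field names and filters are hypothetical.

```python
def first_diverging_filter(rows_a, rows_b, filters):
    """Apply filters cumulatively and return the name of the first filter
    after which the two row counts differ, or None if they never differ."""
    if len(rows_a) != len(rows_b):
        return "unfiltered data"
    for name, predicate in filters:
        rows_a = [r for r in rows_a if predicate(r)]
        rows_b = [r for r in rows_b if predicate(r)]
        if len(rows_a) != len(rows_b):
            return name
    return None


# Hypothetical rows behind two reports that should show the same numbers.
rows_a = [
    {"country": "US", "plan": "pro"},
    {"country": "US", "plan": "free"},
    {"country": "DE", "plan": "pro"},
]
rows_b = [
    {"country": "US", "plan": "pro"},
    {"country": "US", "plan": "pro"},
    {"country": "DE", "plan": "pro"},
]
filters = [
    ("country is US", lambda r: r["country"] == "US"),
    ("plan is pro", lambda r: r["plan"] == "pro"),
]

culprit = first_diverging_filter(rows_a, rows_b, filters)
print("Discrepancy first appears after:", culprit)
```

Here the unfiltered counts and the country filter agree, so the plan property is the place to look.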

Discrepancies between Mixpanel and other sources

Two systems will always track data somewhat differently due to their nature, and it is very likely that they will never report exactly the same numbers. However, it is important to get to the bottom of what's causing the discrepancy so you can establish trust in your data.

Common causes

  • Ad blockers and Do Not Track (DNT) settings for client-side integrations
  • Different timezones
  • Different queries
    • Are both systems looking at the same event and the same timeframe?
    • Are any filters applied to the query? Does the discrepancy persist if you remove them?
    • Are you looking at event or user properties?
  • Different calculations
    • Some of our reports have calculations applied, such as Funnels or Retention. Does the same calculation apply to the data in your other source?
  • Client-side vs. server-side tracking
    • Client-side integrations are more vulnerable to data tracking issues due to ad blockers and DNT settings
    • Mixpanel's SDKs may need time to load before they can trigger the first event
  • Systems use different triggers for data sending
    • E.g. the First App Open event in Mixpanel triggers once our SDK has loaded, while other systems might trigger a comparable event earlier
    • The event definitions might be different - Think of a button click on the client-side triggering the event in one system vs. an API call triggering the event in the other system
  • Data might be ingested at later points or not at all
    • Mixpanel accepts data that was triggered a while ago, either via mobile SDKs or event import. You can check the $import property and mp_processing_time_ms to confirm when data was ingested.
    • Events older than 5 days sent to our /track endpoint will not be ingested by Mixpanel, but other systems (e.g. Firebase) might accept them. Check how old an event was at the point of ingestion in the other system to confirm.
  • Cohort export: A cohort might show more in Mixpanel than what is actually being exported to the partner. You can find out more about troubleshooting this here.
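To illustrate the timezone cause from the list above, the sketch below buckets the same three events into daily counts under two different reporting timezones. It uses fixed UTC offsets and ignores daylight saving time for simplicity; the timestamps and offsets are made up for the example.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def daily_counts(event_times, utc_offset_hours):
    """Count events per calendar day as seen in a fixed-offset timezone."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return Counter(t.astimezone(tz).date().isoformat() for t in event_times)


# Hypothetical event times, all recorded in UTC.
event_times = [
    datetime(2024, 1, 1, 23, 30, tzinfo=timezone.utc),
    datetime(2024, 1, 2, 0, 30, tzinfo=timezone.utc),
    datetime(2024, 1, 2, 7, 30, tzinfo=timezone.utc),
]

counts_utc = daily_counts(event_times, 0)       # system reporting in UTC
counts_minus_8 = daily_counts(event_times, -8)  # system reporting in UTC-8

print("UTC:  ", dict(counts_utc))
print("UTC-8:", dict(counts_minus_8))
```

The totals are identical, but the per-day numbers differ, which is why both systems need to be compared in the same timezone.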

Debugging

To debug, please make sure that you are looking at the same kind of data in both systems:

  • data triggered at the same point of the user journey
  • in the same timeframe and timezone
  • with the same filters applied to the query
  • in the same unit (unique user count, total event count or session count)

Once you have established this, we recommend drilling down into the data, for example into the one day that shows the biggest discrepancy. You can also drill down into specific segments such as country to identify users that have been tracked in one system, but not the other. The goal is to confirm which users are present in one system but not the other, so that you can spot a pattern.
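A minimal sketch of this drill-down, assuming you can export daily counts and the list of user IDs for a given day from both systems. The variable names and sample values are hypothetical.

```python
def day_with_largest_gap(counts_a, counts_b):
    """Return the day whose counts differ most between the two systems."""
    days = sorted(set(counts_a) | set(counts_b))
    return max(days, key=lambda d: abs(counts_a.get(d, 0) - counts_b.get(d, 0)))


def users_only_in(users_a, users_b):
    """Return user IDs present in the first system but not in the second."""
    return sorted(set(users_a) - set(users_b))


# Hypothetical daily counts exported from both systems.
mixpanel_counts = {"2024-03-01": 100, "2024-03-02": 80, "2024-03-03": 120}
other_counts = {"2024-03-01": 98, "2024-03-02": 110, "2024-03-03": 119}

worst_day = day_with_largest_gap(mixpanel_counts, other_counts)
print("Drill into:", worst_day)

# Hypothetical user IDs seen on that day in each system.
mixpanel_users = ["u1", "u2", "u3"]
other_users = ["u2", "u3", "u4"]

print("Only in Mixpanel:", users_only_in(mixpanel_users, other_users))
print("Only in other system:", users_only_in(other_users, mixpanel_users))
```

The users that appear on only one side are the ones worth inspecting individually, for example by platform, country or app version.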

You can also compare the total event count versus the unique user count in the affected systems. If the totals match, but the unique user count shows discrepancies, it likely points to an ID management issue. 
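As a sketch of this comparison, assuming each system's events are available as records with a user identifier (called `distinct_id` here for both systems purely for illustration):

```python
def summarize(events):
    """Return the total event count and the unique user count."""
    return {
        "total": len(events),
        "unique": len({e["distinct_id"] for e in events}),
    }


def diagnose(events_a, events_b):
    """Give a rough hint about where a discrepancy comes from."""
    a, b = summarize(events_a), summarize(events_b)
    if a["total"] != b["total"]:
        return "total event counts differ"
    if a["unique"] != b["unique"]:
        return "totals match but uniques differ: possible ID management issue"
    return "totals and uniques match"


# Hypothetical data: the same three events, but the second system
# attributes one of them to a separate anonymous ID.
system_a = [
    {"distinct_id": "user-1"},
    {"distinct_id": "user-1"},
    {"distinct_id": "user-2"},
]
system_b = [
    {"distinct_id": "user-1"},
    {"distinct_id": "anon-123"},
    {"distinct_id": "user-2"},
]

hint = diagnose(system_a, system_b)
print(hint)
```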

A 'last resort' to get to the bottom of things is to implement your own server-side tracking of the data. This is more reliable because it is less prone to the issues above and is independent of Mixpanel and other systems. While this is resource-intensive, it gives you a source of truth against which you can compare any other system.

Discrepancies with Segment

The first step here is to check whether you are tracking with a cloud-mode or device-mode integration; see Segment: Set Up Guide.

If the discrepancy is between Mixpanel and another source, but you're tracking via Segment in cloud-mode, you can do the following to troubleshoot:

  • If Segment and Mixpanel show the same data, we recommend reaching out to Segment Support, as this likely points to an issue with Segment tracking.
  • If Mixpanel and Segment don’t show the same data, and both have a discrepancy with a third source, we also recommend reaching out to Segment Support to troubleshoot the discrepancy between the third source and Segment. In Mixpanel, you can check for specific distinct_ids that should have events in Mixpanel, or for specific events that should be in Mixpanel but might have been ingested with a different distinct_id.

If the discrepancy is between Segment and Mixpanel only, keep in mind that device-mode tracking will send the data to Mixpanel directly. If there are discrepancies, you can check for data by searching for specific events, or specific distinct_ids, depending on what you have available from Segment, as it might be that the data has been ingested but allocated to the wrong distinct_id.
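One way to look for events that were ingested under a different distinct_id is sketched below. It assumes you have event lists from both sides and that an event can be matched on its name and timestamp, which is a crude heuristic. The record shapes, the `user_id` field and the sample values are assumptions for this example, not an actual export format.

```python
def classify_events(source_events, mixpanel_events):
    """Match source events to Mixpanel events by (event name, timestamp)
    and report whether each one is matched, reassigned or missing."""
    ids_by_key = {}
    for e in mixpanel_events:
        ids_by_key.setdefault((e["event"], e["time"]), []).append(e["distinct_id"])

    result = {"matched": [], "reassigned": [], "missing": []}
    for e in source_events:
        ids = ids_by_key.get((e["event"], e["time"]))
        if ids is None:
            # No event with this name and timestamp was found in Mixpanel.
            result["missing"].append(e["user_id"])
        elif e["user_id"] in ids:
            result["matched"].append(e["user_id"])
        else:
            # The event exists, but under a different distinct_id.
            result["reassigned"].append((e["user_id"], ids))
    return result


# Hypothetical events as seen by the source and by Mixpanel.
segment_events = [
    {"event": "Purchase", "time": 1700000000, "user_id": "user-1"},
    {"event": "Purchase", "time": 1700000100, "user_id": "user-2"},
    {"event": "Purchase", "time": 1700000200, "user_id": "user-3"},
]
mixpanel_events = [
    {"event": "Purchase", "time": 1700000000, "distinct_id": "user-1"},
    {"event": "Purchase", "time": 1700000100, "distinct_id": "anon-abc"},
]

report = classify_events(segment_events, mixpanel_events)
print(report)
```

Events in the reassigned group were ingested but attributed to another distinct_id, while events in the missing group never arrived or arrived with a different name or timestamp.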

If you are tracking via cloud-mode, data is sent from Segment to Mixpanel. The approach is the same as above, except that you will also need information from Segment Support on when and how the data was sent, because in cloud-mode the issue lies in the communication between the two systems.

 

 
