Statistical significance in funnels helps you validate whether an increase or decrease in conversion rate for a property or cohort segment is real, or whether it could be explained by random chance relative to the overall conversion rate. When you choose a property or cohort to group by, the overview table calculates a p-value indicating the statistical significance of each segment.
In statistical hypothesis testing, the p-value (probability value) is the probability of seeing a difference at least as large as the observed one between a segment’s conversion rate and the overall conversion rate if that difference were driven only by random chance. The smaller the p-value, the less likely it is that the difference is due to chance. This value is shown for every segment by default.
To make statistical significance easier to interpret, the segmentation chart shows the confidence level of each segment, defined as 1 - p.
- > 0.95 = statistically significant, indicated in green. This variation in conversion rate is likely not driven by random chance.
- < 0.95 = not statistically significant, indicated in red. This variation in conversion rate is likely driven by random chance.
Scrolling further down the table takes you to the statistically insignificant segments. If a segment has fewer than 30 samples, its p-value is not shown, because the sample size is too small to detect a difference from the overall population. Such segments are labeled “Insufficient samples”.
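The display rules above can be summarized as a small decision function. This is a minimal sketch, not Mixpanel's implementation: the function name `classify_segment`, its return strings, and treating a confidence level of exactly 0.95 as not significant are assumptions made for illustration.

```python
# Thresholds taken from the rules described above.
MIN_SAMPLES = 30
CONFIDENCE_THRESHOLD = 0.95


def classify_segment(p_value: float, samples: int) -> str:
    """Return a display label for a segment (hypothetical helper).

    Treating a confidence level of exactly 0.95 as "not significant" is an
    assumption; the rules above only cover values above and below 0.95.
    """
    if samples < MIN_SAMPLES:
        # Too few funnel entries to compare against the overall population,
        # so no p-value is reported.
        return "Insufficient samples"
    confidence = 1 - p_value
    if confidence > CONFIDENCE_THRESHOLD:
        return "significant (green)"
    return "not significant (red)"


print(classify_segment(0.01, samples=120))  # significant (green)
print(classify_segment(0.20, samples=120))  # not significant (red)
print(classify_segment(0.01, samples=12))   # Insufficient samples
```

Note that the sample check comes first: a segment with very few entries is labeled as insufficient regardless of how extreme its p-value would be.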
The number of samples is the same as the count of entries into the funnel. If the funnel is looking at the unique count, this will be the number of unique users who entered the funnel in that segment. If the funnel is looking at total count, this will be the total number of entries into the funnel in that segment.
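To make the difference between unique and total counts concrete, here is a sketch that counts samples from a list of funnel entries. The entry format and the helper `sample_count` are hypothetical; Mixpanel does not necessarily expose funnel entries in this form.

```python
# Hypothetical funnel entries for one segment, as (user_id, entry_date) pairs.
entries = [
    ("alice", "2024-05-01"),
    ("alice", "2024-05-03"),  # same user entering the funnel a second time
    ("bob", "2024-05-02"),
]


def sample_count(entries: list[tuple[str, str]], unique: bool) -> int:
    """Number of samples for a segment (hypothetical helper)."""
    if unique:
        # Unique count: each user counts once, however often they entered.
        return len({user_id for user_id, _ in entries})
    # Total count: every entry into the funnel counts.
    return len(entries)


print(sample_count(entries, unique=True))   # 2
print(sample_count(entries, unique=False))  # 3
```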
You can sort the overview table by any column, in ascending or descending order, by clicking the column header. If you sort by statistical significance, segments with a confidence level above 0.95 are shown first, followed by segments with a confidence level below 0.95. Within each group, segments are sorted by their overall conversion rate through the funnel.
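The ordering used when you sort by statistical significance can be expressed as a two-part sort key. In this sketch the `Segment` class, the segment names, and their values are hypothetical, and sorting each group by conversion rate in descending order is an assumption; the text above does not state the direction of the secondary sort.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    confidence: float       # confidence level, 1 - p
    conversion_rate: float  # the segment's overall conversion rate through the funnel


def sort_by_significance(segments: list[Segment]) -> list[Segment]:
    # Primary key: False (confidence above 0.95) sorts before True.
    # Secondary key: higher conversion rate first (assumed direction).
    return sorted(segments, key=lambda s: (s.confidence <= 0.95, -s.conversion_rate))


segments = [
    Segment("iOS", confidence=0.99, conversion_rate=0.20),
    Segment("Android", confidence=0.60, conversion_rate=0.50),
    Segment("Web", confidence=0.97, conversion_rate=0.35),
]
print([s.name for s in sort_by_significance(segments)])  # ['Web', 'iOS', 'Android']
```

Android has the highest conversion rate but still comes last, because the significance group takes priority over the secondary key.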
Find Interesting Segments
You can determine which users are converting and which are not by using the built-in “Find Interesting Segments” feature.
Find Interesting Segments can help you discover:
- Whether certain property segments outperform the overall funnel conversion.
- Which cohorts perform best, to get ideas for optimizing cohort behavior.
- Which segments are underperforming.
- Changes in the conversion rates of segments.
View the top and bottom converting segments in your funnel by clicking the Find interesting segments button at the bottom of the segmentation chart.
After you click the button, Mixpanel automatically sends you an email that breaks down the top and bottom converting segments of your funnel based on statistical significance. If no statistically significant segments are found, the email shows segments that are not statistically significant instead.
The email also includes a time comparison chart, which shows the segments whose conversion changed most significantly between the current date period and the previous date period.
Mixpanel automatically compares the currently selected date period to the previous one. For example, if you are viewing the current week, the email will compare to the week before.
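The previous period is a window of the same length that ends the day before the current one starts. The sketch below assumes both endpoints are inclusive, whole days; `previous_period` is a hypothetical helper, not a Mixpanel API.

```python
from datetime import date, timedelta


def previous_period(start: date, end: date) -> tuple[date, date]:
    """Return the equally long date range that ends the day before `start`.

    Assumes inclusive endpoints; an illustration only.
    """
    length = (end - start).days + 1          # number of days in the current range
    prev_end = start - timedelta(days=1)     # day before the current range starts
    prev_start = prev_end - timedelta(days=length - 1)
    return prev_start, prev_end


# A Monday-to-Sunday week is compared with the Monday-to-Sunday week before it.
print(previous_period(date(2024, 5, 6), date(2024, 5, 12)))
```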
Rather than searching through multiple segment breakdowns yourself to find significant differences, you can let this feature do it for you.
Mixpanel combs through your event properties and cohorts and shows you which segments convert higher or lower than average by a statistically significant margin.
This feature is currently in beta release, and user properties are not yet supported.
Why did I get an email saying ‘no interesting segments’?
When your analysis email says “no interesting segments”, this means that none of the segments you analyzed were behaving significantly differently from the overall population at a large enough volume.
To resolve this issue, try extending the date range of the funnel or try a different funnel.
Why don’t the numbers in the email match what I see in the report?
If the analysis covered dates within the past five days and your data is sent from a mobile SDK, some of that data may have been delayed and therefore not included at the time of the analysis.
How are the results/segments sorted?
Mixpanel takes into consideration the property, the number of people in the funnel, and the deviation from overall conversion behavior.
Deep Dive for Statisticians
This section is intended for users who want to understand the mathematics involved in statistical significance in depth.
To determine whether a particular segment’s conversion rate is significantly different from the overall conversion rate, we use a hypergeometric distribution to calculate statistical significance. The hypergeometric distribution models the probability of drawing exactly k items of a particular type in n draws, without replacement, from a population of size N that contains K items of that type.
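Written out, this is the standard hypergeometric probability mass function. The numerator counts the ways to choose k items of the chosen type from the K available and fill the remaining n - k draws from the other N - K items; the denominator counts all possible draws of size n.

```latex
P(k \mid N, K, n) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}
```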
For example, let’s say we have a sock drawer with 20 socks, 10 blue and 10 red. If we randomly picked 10 socks one at a time without putting any back between picks, and we wanted to know the probability that 9 of them are red and 1 is blue, we would use a hypergeometric distribution to calculate it.
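The sock example can be computed directly from the hypergeometric probability mass function, C(K, k) · C(N - K, n - k) / C(N, n), using Python's `math.comb`. The helper name `hypergeom_pmf` is our own, used only for illustration.

```python
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """Probability of exactly k items of one type in n draws without
    replacement from N items, K of which are of that type."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# Sock drawer: N = 20 socks, K = 10 red, n = 10 picks, k = 9 red (so 1 blue).
p_nine_red = hypergeom_pmf(9, 20, 10, 10)
print(f"{p_nine_red:.6f}")  # 0.000541
```

The result is 100 / 184,756, or about 0.054%: drawing 9 red socks out of 10 from an evenly mixed drawer is very unlikely by chance alone.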
This is applied to funnels by treating all users who enter the funnel as a finite population of size N, of which a subset of K users convert. We then estimate the probability of getting k conversions in a particular segment (given that n users entered the funnel in that segment) if the users in that segment had been picked at random from the overall user set. The higher this probability, the more likely it is that the variation we see in the segment’s conversion rate is due to random chance.
To calculate the actual p-value, we estimate the hypergeometric cumulative distribution function (CDF) for N, K, n.
From the CDF we get P(k < X), the probability that a random draw of n users would produce fewer conversions than the X conversions observed in the segment. Its complement, 1 - P(k < X) = P(k >= X), is the probability that a random draw would produce at least as many conversions as the segment.
These two probabilities represent how likely it is that the selected segment outperforms (P(k < X)) or underperforms (P(k >= X)) the overall set of users. Mixpanel takes the higher of the two probabilities and calculates the p-value as 1 - max(P(outperform), P(underperform)).
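In terms of the hypergeometric probabilities, the two tails are sums over the possible conversion counts, where terms with an out-of-range binomial coefficient are zero:

```latex
P(k < X) = \sum_{i=0}^{X-1} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}},
\qquad
P(k \ge X) = 1 - P(k < X)
```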
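Putting the pieces together, the following sketch computes a segment's p-value and confidence level from the four counts, following the description above. It is an illustration under these stated definitions, not Mixpanel's code; the function names and the example counts are hypothetical. Exact big-integer arithmetic keeps it simple but slow for very large funnels. SciPy's `scipy.stats.hypergeom` provides a `cdf` method that could be used instead, but note that its parameter names differ: `M` is the population size, `n` the number of success states, and `N` the number of draws.

```python
from math import comb


def hypergeom_pmf(i: int, N: int, K: int, n: int) -> float:
    """P(exactly i conversions) when n entries are drawn at random, without
    replacement, from N funnel entries of which K converted."""
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible counts contribute zero probability.
    return comb(K, i) * comb(N - K, n - i) / comb(N, n)


def segment_p_value(N: int, K: int, n: int, X: int) -> float:
    """N: total entries, K: total conversions,
    n: entries in the segment, X: conversions in the segment (X <= n)."""
    p_fewer = sum(hypergeom_pmf(i, N, K, n) for i in range(X))  # P(k < X)
    p_at_least = 1.0 - p_fewer                                  # P(k >= X)
    # p-value = 1 - max(P(outperform), P(underperform))
    return 1.0 - max(p_fewer, p_at_least)


# Hypothetical funnel: 1,000 entries, 200 conversions (20%).
# Segment A: 100 entries, 35 conversions (35%).
# Segment B: 100 entries, 20 conversions (20%).
for name, X in [("A", 35), ("B", 20)]:
    p = segment_p_value(1000, 200, 100, X)
    print(name, f"p = {p:.4f}", f"confidence = {1 - p:.4f}")
```

Segment A, which converts well above the overall rate, gets a p-value far below 0.05, while segment B, which converts at the overall rate, does not come close to significance. Because the two tails always sum to 1, a p-value defined this way can never exceed 0.5, so the corresponding confidence level is always at least 0.5.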