Importing Data

Use Cloud Import to load old events, or a large batch of events and profiles, into Mixpanel at once. This is useful if you are a new customer who has already collected events and user profiles in a different system, or if you exported data from Mixpanel to fix data quality problems and want to import the corrected data. With Cloud Import, you copy events and/or user profiles into an Amazon S3 or Google Cloud Storage bucket, and Cloud Import ingests the data from there.

Cloud Import is the easiest way to get large volumes of events and profiles into Mixpanel. If you cannot or do not wish to use cloud storage services, you can use the /import endpoint to import events into Mixpanel directly. In that case, skip ahead to the “Sending old events without Cloud Import” section of this document.

Preparing event data for Cloud Import

Format each event as a JSON object on its own line of an NDJSON (newline-delimited JSON) file. Each object has the following attributes.

event

Name for the event (UTF-8 string)

properties

A JSON object holding the properties of the event. Each property name can be any valid UTF-8 string, and each value can be any valid JSON value. You must include the following special properties in every event.

time

The time the event occurred. The value should be a UNIX timestamp in seconds (seconds since midnight, Jan 1, 1970, UTC). Events without a time property are discarded.

distinct_id

The value of distinct_id is a string that uniquely identifies the user. It can be up to 255 characters in length.

$insert_id

The value of $insert_id is a string of up to 36 case-sensitive alphanumeric characters (hyphens are also allowed) that uniquely identifies the event.
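A UUID string is one value that fits this shape: it is 36 characters long and contains only hexadecimal digits and hyphens. The sketch below derives a repeatable ID from fields of the source record, so the same record always maps to the same $insert_id. The namespace value and the choice of fields are assumptions for illustration; pick fields that really do identify an event uniquely in your data.

```python
import uuid

# Hypothetical fixed namespace. Any UUID works, as long as it never changes.
NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")


def make_insert_id(distinct_id: str, event: str, time: int) -> str:
    """Derive a repeatable 36-character ID from fields that identify the event."""
    key = f"{distinct_id}|{event}|{time}"
    # uuid5 is deterministic: the same namespace and key always give the same UUID.
    return str(uuid.uuid5(NAMESPACE, key))
```

If your source system already stores a unique ID per event, using that value directly is simpler.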

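The following is an example of a properly formatted event. It is spread over several lines here for readability; in the NDJSON file, each event occupies a single line.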

{
   "event": "Purchase Complete",
   "properties": {
       "time": 1358208000,
       "distinct_id": "13793",
       "$insert_id": "p09unabdez2236d7",
       "city_id": "1",
       "payment_type": "cc"
   }
}
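As a rough illustration of the format above, the following Python sketch turns one event into a single NDJSON line and checks that the three special properties are present. The function names are hypothetical, and the check is deliberately minimal.

```python
import json
from datetime import datetime, timezone

REQUIRED_PROPERTIES = ("time", "distinct_id", "$insert_id")


def to_unix_seconds(dt: datetime) -> int:
    """Convert a timezone-aware datetime to whole seconds since the UNIX epoch."""
    if dt.tzinfo is None:
        raise ValueError("datetime must be timezone-aware")
    return int(dt.timestamp())


def event_to_ndjson_line(name: str, properties: dict) -> str:
    """Serialize one event as a single line of JSON (no trailing newline)."""
    missing = [key for key in REQUIRED_PROPERTIES if key not in properties]
    if missing:
        raise ValueError(f"missing required properties: {missing}")
    # json.dumps escapes newlines inside strings, so the result is always one line.
    return json.dumps({"event": name, "properties": properties}, ensure_ascii=False)
```

For example, `to_unix_seconds(datetime(2013, 1, 15, tzinfo=timezone.utc))` gives the `time` value used in the example event, 1358208000.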

Preparing people profiles for Cloud Import

Format each people profile as a JSON object on its own line of an NDJSON file. Each object has the following attributes.

$distinct_id

The value of $distinct_id is a string that uniquely identifies the user. It can be up to 255 characters in length.

$replace

The value of $replace is a JSON object of key-value pairs containing all the properties that should be set for the user. Cloud Import sets these properties on the user profile, replacing all of the profile's current properties if the profile already exists.

The following is an example of a properly formatted people profile. As with events, each profile occupies a single line in the NDJSON file.

{
   "$distinct_id": "aAdonfR3492123",
   "$replace": {
       "Address": "1313 Mockingbird Lane",
       "Birthday": "1948-01-01"
   }
}
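Profiles can be serialized in the same way as events. The sketch below is a minimal example with hypothetical function names; it applies the 255-character limit on $distinct_id described above and returns the NDJSON text, one profile per line.

```python
import json


def profile_to_ndjson_line(distinct_id: str, properties: dict) -> str:
    """Serialize one people profile as a single line of JSON."""
    if not distinct_id or len(distinct_id) > 255:
        raise ValueError("$distinct_id must be 1 to 255 characters long")
    record = {"$distinct_id": distinct_id, "$replace": properties}
    return json.dumps(record, ensure_ascii=False)


def profiles_to_ndjson(profiles) -> str:
    """Join (distinct_id, properties) pairs into NDJSON text."""
    lines = [profile_to_ndjson_line(d, p) for d, p in profiles]
    return "\n".join(lines) + "\n" if lines else ""
```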

Configuring cloud storage for Cloud Import

Amazon S3 configuration

Create a new data access policy or add the following permissions to an existing data access policy. 

  1. Go to the IAM service in AWS and create a policy.
  2. Click the JSON tab and paste the following code to create the required policy (replace <bucket-name> with your S3 bucket name):
{
   "Version": "2012-10-17",
   "Statement": [
       {
           "Sid": "MixpanelObjectAccess",
           "Effect": "Allow",
           "Action": [
               "s3:PutObject",
               "s3:GetObject",
               "s3:ListBucket",
               "s3:HeadBucket"
           ],
           "Resource": [
               "arn:aws:s3:::<bucket-name>/*",
               "arn:aws:s3:::<bucket-name>"
           ]
       }
   ]
}
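If you prefer to generate the policy text rather than edit it by hand, a small script can fill in the bucket name. This is only a convenience sketch: the function name is hypothetical, and the actions and resources are copied from the policy above.

```python
import json


def render_bucket_policy(bucket_name: str) -> str:
    """Return the access policy above as JSON text for the given bucket."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "MixpanelObjectAccess",
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:ListBucket",
                    "s3:HeadBucket",
                ],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}/*",  # objects in the bucket
                    f"arn:aws:s3:::{bucket_name}",    # the bucket itself
                ],
            }
        ],
    }
    return json.dumps(policy, indent=4)
```

The returned text can be pasted into the JSON tab in place of the template.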

After creating the policy above, create a cross-account IAM role and attach the policy to it.

  1. Go to the AWS IAM service on the console.
  2. Click Roles in the sidebar.
  3. Click Create Role.
  4. Select Other AWS Accounts on the trust policy page and enter "485438090326" for the account ID.
  5. On the Permissions page, find and select the policy you created above.
  6. On the Review page, enter a name and description for the role and click Create Role.

Next, limit the trust relationship to the Mixpanel export user to ensure only Mixpanel has the ability to assume this specific role.

  1. Navigate to the AWS IAM service in the console.
  2. Click Roles in the sidebar.
  3. Find and click the role you just created.
  4. Navigate to the Trust Relationships tab.
  5. Click Edit trust relationship.
  6. Replace the contents with the following JSON:
{
 "Version": "2012-10-17",
 "Statement": [
   {
     "Effect": "Allow",
     "Principal": {
       "AWS": "arn:aws:iam::485438090326:user/mixpanel-export"
     },
     "Action": "sts:AssumeRole",
     "Condition": {}
   }
 ]
}

Google Cloud Storage configuration

Create a bucket in Google Cloud Storage and ensure that export-upload@mixpanel-prod-1.iam.gserviceaccount.com has the following roles on it: Storage Object Creator and Storage Object Viewer.

After the bucket is created, share it with Mixpanel.

Running Cloud Import

Copy your prepared data to your cloud storage bucket. For the best performance, split your data into multiple files (by date, by event, and so on). There is no hard limit on file size, but imports run much faster when each data file is smaller than 10 GB.
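One simple way to split event data is by the UTC date of each event's time property. The sketch below groups NDJSON lines in memory and writes one file per day. The file naming scheme is an assumption for illustration, and very large exports would need a streaming approach instead of holding every line in memory.

```python
import json
import os
from collections import defaultdict
from datetime import datetime, timezone


def group_lines_by_day(lines):
    """Group NDJSON event lines by the UTC date of properties.time."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        timestamp = json.loads(line)["properties"]["time"]
        day = datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y-%m-%d")
        groups[day].append(line)
    return dict(groups)


def write_daily_files(groups, directory):
    """Write one events-YYYY-MM-DD.ndjson file per day and return the paths."""
    paths = []
    for day, day_lines in sorted(groups.items()):
        path = os.path.join(directory, f"events-{day}.ndjson")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("\n".join(day_lines) + "\n")
        paths.append(path)
    return paths
```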

To run Cloud Import or check on the status of a Cloud Import request, use the /import endpoint as described in the Developer Documentation.

Sending old events without Cloud Import

You can write a one-off script to send your events to Mixpanel.

To do this, make a POST request like the following for every event in your database:

curl https://api.mixpanel.com/import \
-u YOUR_API_SECRET: \
-d data=eyJldmVudCI6ICIkc2lnbnVwIiwgInByb3BlcnRpZXMiOiB7ImRpc3RpbmN0X2lkIjogIjQ4MSIsICJ0aW1lIjogMTMyMTQ5OTM3MSwgInRva2VuIjogIjEzZmUzZGRjODZlYjZmOTBjNGVlN2QwZDQ3NTYzMTUwIn19 \
-d verbose=1

This request is very similar to our standard HTTP API. The data parameter is a Base64-encoded JSON object containing the event you are importing ($signup) and its associated properties.

By decoding the Base64 data parameter from the above request, you can see the raw JSON:

{
   "event": "$signup",
   "properties": {
       "distinct_id": "481",
       "time": 1321499371,
       "token": "13fe3ddc86eb6f90c4ee7d0d47563150"
   }
}

  • Event: You can set this to any event, but $signup is particularly useful for retention analysis.
  • Distinct_id: The user ID you have been sending to Mixpanel up to this point for that user. In general, this is the value you pass into the identify method.
  • Time: A Unix epoch timestamp (in seconds, UTC) that tells Mixpanel when the event fired. This can be any time in the last 5 years. The above example, 1321499371, represents November 17th, 2011 at 3:09 AM GMT.
  • Token: The token property is your Mixpanel project token, which you can find by clicking your name in the upper right-hand corner of your Mixpanel project and selecting Settings from the dropdown. Don’t confuse your API Secret with your project token!
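The encoding step can be reproduced with a short script. The sketch below uses only the Python standard library and mirrors the curl command above: the event is JSON-encoded, then Base64-encoded, and sent as the data form field with the API Secret as the HTTP Basic auth user name. The function names are illustrative, and the request is built but not sent.

```python
import base64
import json
import urllib.parse
import urllib.request

IMPORT_URL = "https://api.mixpanel.com/import"


def encode_event(event: dict) -> str:
    """JSON-encode an event, then Base64-encode the result."""
    return base64.b64encode(json.dumps(event).encode("utf-8")).decode("ascii")


def decode_event(data: str) -> dict:
    """Reverse encode_event, for inspecting a data parameter."""
    return json.loads(base64.b64decode(data).decode("utf-8"))


def build_import_request(event: dict, api_secret: str) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl example."""
    form = {"data": encode_event(event), "verbose": 1}
    body = urllib.parse.urlencode(form).encode("ascii")
    # curl's "-u SECRET:" is Basic auth with the secret as user name and an empty password.
    credentials = base64.b64encode(f"{api_secret}:".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(IMPORT_URL, data=body)
    request.add_header("Authorization", f"Basic {credentials}")
    return request
```

Passing the returned object to `urllib.request.urlopen` would send the request.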

Batching Requests

Using the /import endpoint, you can also batch requests to Mixpanel instead of sending one event per request. The endpoint will accept up to 50 messages in a single batch.

You can read more about batching requests to Mixpanel in our HTTP API documentation.
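As a sketch of how batching might look in a script, the following splits a list of events into groups of at most 50 and encodes each group for the data parameter. It assumes a batch is sent as a Base64-encoded JSON array of event objects; check the HTTP API documentation for the exact format before relying on it. The function names are hypothetical.

```python
import base64
import json

MAX_BATCH_SIZE = 50  # the limit stated above for a single batch


def chunk(events, size=MAX_BATCH_SIZE):
    """Yield consecutive slices of at most `size` events."""
    for start in range(0, len(events), size):
        yield events[start:start + size]


def encode_batches(events):
    """Return one Base64-encoded JSON array per batch of events."""
    return [
        base64.b64encode(json.dumps(batch).encode("utf-8")).decode("ascii")
        for batch in chunk(events)
    ]
```

Each returned string would take the place of the single-event data value in the request shown earlier.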
