Creating Segment Objects

Segment objects can be created with five functions in Distill’s Segmentation package: create_segment, generate_segments, generate_fixed_time_segments, generate_collapsing_window_segments, and detect_deadspace. Each function creates Segment objects and returns them in a Segments object. These functions fall into three categories, described below: basic Segment creation, automatic Segment generation, and deadspace detection.

UserALE Log Preprocessing

Before Segment objects can be created, the UserALE logs must be put into the format expected by the Segment creation functions. Each function expects the logs to be structured as a dictionary sorted by clientTime. The keys of the dictionary are universally unique identifiers (UUIDs), one per log, and the value for each UUID key is the log itself. Distill provides analysts with a function that generates these IDs: getUUID. This function can be used as follows:

# A UserALE log
log

# Generate UUID
uuid = distill.getUUID(log)

Throughout the rest of this documentation, UUID and UID are used interchangeably to describe these unique identifiers. In addition, note that the functions within Distill’s Segmentation package expect the UserALE log clientTime field to be represented as either an integer or a Python datetime object. This conversion is another preprocessing step that must be completed before using the Segmentation functions.
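The preprocessing described above can be sketched with the standard library alone. In the snippet below, make_uid is a hypothetical stand-in for distill.getUUID (its ID scheme is invented for illustration), the sample logs are made up, and clientTime is assumed to arrive as epoch milliseconds.

```python
from datetime import datetime

def make_uid(log):
    # Hypothetical stand-in for distill.getUUID; the real ID scheme may differ.
    return f"{log['sessionID']}-{log['clientTime']}-{log['type']}"

# Assumed sample logs; real UserALE logs carry many more fields.
raw_logs = [
    {"sessionID": "s1", "clientTime": 1623691909000, "type": "load"},
    {"sessionID": "s1", "clientTime": 1623691890000, "type": "click"},
]

# Key each log by its UID.
data = {make_uid(log): log for log in raw_logs}

# Convert clientTime (assumed epoch milliseconds) to a datetime object.
for log in data.values():
    log["clientTime"] = datetime.fromtimestamp(log["clientTime"] / 1000)

# Sort by clientTime; dictionaries keep insertion order in Python 3.7+.
sorted_dict = dict(sorted(data.items(), key=lambda kv: kv[1]["clientTime"]))
```

The resulting sorted_dict has the shape the Segment creation functions are described as expecting: UID keys, log values, ordered by clientTime.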

Basic Segment Creation

The most direct way to create Segment objects is with the create_segment function. This function takes three parameters: a sorted dictionary of UserALE logs, a list of segment names, and a list of tuples that give the start clientTime and end clientTime of each segment. Given this information, Segment objects can be created as follows:

# Sorted dictionary of UserALE logs
sorted_dict

# List of segment names
segment_names = ["segment1", "segment2"]

# Time tuples
start_end_vals = [(start_time_1, end_time_1), (start_time_2, end_time_2)]

# Create Segments
segments = distill.create_segment(sorted_dict, segment_names, start_end_vals)

The above code will output a Segments object that contains each Segment object indicated.
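To make the relationship between time tuples and logs concrete, the sketch below selects the UIDs whose clientTime falls inside each window. It is not Distill's implementation: logs_in_window is a hypothetical helper, the sample logs are invented, and treating both bounds as inclusive is an assumption.

```python
def logs_in_window(sorted_dict, start, end):
    # Assumption: both bounds are inclusive; Distill may treat them differently.
    return [uid for uid, log in sorted_dict.items()
            if start <= log["clientTime"] <= end]

# Invented sample logs with integer clientTime values.
sorted_dict = {
    "uid1": {"clientTime": 100, "type": "load"},
    "uid2": {"clientTime": 250, "type": "click"},
    "uid3": {"clientTime": 400, "type": "click"},
}

segment_names = ["segment1", "segment2"]
start_end_vals = [(100, 250), (300, 500)]

# Pair each name with its (start, end) tuple and collect the matching UIDs.
uids_by_segment = {
    name: logs_in_window(sorted_dict, start, end)
    for name, (start, end) in zip(segment_names, start_end_vals)
}
```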

Automatic Segment Generation

If an analyst does not know the start and end times of interest within the UserALE logs, the Segment generation functions provide a more automatic way to create Segment objects. Three functions aid in the automatic creation of Segment objects: generate_segments, generate_fixed_time_segments, and generate_collapsing_window_segments. Each of these functions provides an optional parameter called label that sets a prefix used to name each generated Segment object.

Generate Segments

The generate_segments function creates Segment objects automatically by matching a particular UserALE log field against a list of possible values. For each match, the function generates a Segment object covering a window of time that starts a given number of seconds before the matched log and ends a given number of seconds after it; both durations are function parameters. The below code illustrates the basic use of this function:

# Sorted dictionary of UserALE logs
sorted_dict

# Generate segment objects based on user clicks
segments = distill.generate_segments(sorted_dict, 'type', ['click'], 1, 2)

The above code will return a Segments object that contains Segment objects that represent windows of time 1 second prior to a ‘click’ type and 2 seconds after a ‘click’ type. If we wanted to generate Segment objects that matched both ‘click’ and ‘load’ types, then we could use the following code:

# Sorted dictionary of UserALE logs
sorted_dict

# Generate segment objects based on user clicks and loads
segments = distill.generate_segments(sorted_dict, 'type', ['click', 'load'], 1, 2)

Note that generate_segments does not overlap Segment objects. If two matching events happen back-to-back within the logs and the second log is already in the Segment generated by the first, the second log will not have its own Segment created. This non-overlapping behavior may also create Segment objects that are shorter in time than expected. For instance, if a Segment is created with an end time that is after the start time of a new Segment, the new Segment object’s start time will default to the end time of the previous Segment.
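The non-overlapping rule can be illustrated with a small standalone sketch. This is not Distill's code: non_overlapping_windows is a hypothetical helper, event times are plain numbers in seconds, and treating an event that lands exactly on the previous end time as already covered is an assumption.

```python
def non_overlapping_windows(event_times, before, after):
    """Return (start, end) windows around events, skipping or clamping overlaps."""
    windows = []
    prev_end = None
    for t in sorted(event_times):
        if prev_end is not None and t <= prev_end:
            # The event already falls inside the previous window: no new window.
            continue
        start = t - before
        if prev_end is not None and start < prev_end:
            # Clamp the start to the previous end so the windows do not overlap.
            start = prev_end
        end = t + after
        windows.append((start, end))
        prev_end = end
    return windows

# Matching events at 10, 11, 12.5, and 20 seconds; 1 second before, 2 after.
windows = non_overlapping_windows([10, 11, 12.5, 20], before=1, after=2)
```

Here the event at 11 gets no window of its own because it lies inside the first window (9 to 12), and the window for the event at 12.5 starts at 12 rather than 11.5, which makes it shorter than the nominal 3 seconds.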

Generate Fixed Time Segments

The generate_fixed_time_segments function generates Segment objects based on fixed time intervals. An example usage of this function is shown below:

# Sorted dictionary of UserALE logs
sorted_dict

# Generate segment objects based on 5 second intervals
segments = distill.generate_fixed_time_segments(sorted_dict, 5, label="generated")

The above code will create a Segments object that contains Segment objects based on 5-second intervals. This example also demonstrates the use of the optional label parameter.

Note that, by default, this function does not trim leftover logs: if the time between the first log and the last log is not evenly divisible by the indicated interval, the logs that do not fit into a complete fixed time window are not trimmed. To change this, generate_fixed_time_segments has an optional argument called trim. If trim is true, the logs that do not fit into an additional fixed time window are trimmed.
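The effect of trim on window boundaries can be sketched as follows. fixed_windows is a hypothetical helper, not part of Distill, and times are plain seconds. How Distill sizes the final window when trim is off is not stated here, so the sketch simply assumes it ends at the last log.

```python
def fixed_windows(first_time, last_time, interval, trim=False):
    """Return (start, end) boundaries for fixed-length windows."""
    windows = []
    start = first_time
    # Add every window that fits completely before the last log.
    while start + interval <= last_time:
        windows.append((start, start + interval))
        start += interval
    if start < last_time and not trim:
        # Assumption: leftover logs are kept in a final, shorter window.
        windows.append((start, last_time))
    return windows

# 12 seconds of logs split into 5-second windows.
untrimmed = fixed_windows(0, 12, 5)
trimmed = fixed_windows(0, 12, 5, trim=True)
```

With 12 seconds of logs and a 5-second interval, two complete windows fit; the remaining 2 seconds are kept when trim is left at its default and dropped when trim is true.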

Generate Collapsing Window Segments

The generate_collapsing_window_segments function generates Segment objects based on a window of time in which the given field name has a value matching one of the values in the field_values_of_interest list parameter. An example usage of this function is shown below:

# Sorted dictionary of UserALE logs
sorted_dict

# Generate segment objects based on a collapsing window
segments = distill.generate_collapsing_window_segments(sorted_dict, "path", ["Window"])

The above code creates a Segments object that contains Segment objects that begin when the path field contains the string “Window” and end when the path field no longer contains “Window.”
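A minimal sketch of the collapsing-window idea is shown below. collapsing_windows is a hypothetical helper rather than Distill's implementation, the sample logs are invented, and the path field is assumed to be a list of strings. Closing each window at the last matching log (instead of at the first non-matching log) is also an assumption.

```python
def collapsing_windows(sorted_dict, field, values_of_interest):
    """Return (start, end) times for each run of consecutive matching logs."""
    windows = []
    start = None
    last_match = None
    for log in sorted_dict.values():
        field_value = log.get(field) or []
        if any(value in field_value for value in values_of_interest):
            if start is None:
                start = log["clientTime"]  # a new window opens here
            last_match = log["clientTime"]
        elif start is not None:
            # The field no longer matches: close the open window.
            windows.append((start, last_match))
            start = None
    if start is not None:
        # Close a window that is still open at the end of the logs.
        windows.append((start, last_match))
    return windows

sorted_dict = {
    "uid1": {"clientTime": 1, "path": ["button", "Window"]},
    "uid2": {"clientTime": 2, "path": ["div", "Window"]},
    "uid3": {"clientTime": 3, "path": ["div", "document"]},
    "uid4": {"clientTime": 4, "path": ["Window"]},
}
windows = collapsing_windows(sorted_dict, "path", ["Window"])
```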

Detecting Deadspace

The final Segment creation function automatically detects deadspace within the sorted UserALE log dictionary. Deadspace is time in which the user is idle. The detect_deadspace function creates Segment objects based on deadspace in the logs, given a threshold for what is considered to be ‘deadspace’. An example of this is shown below:

# Sorted dictionary of UserALE logs
sorted_dict

# Create segment objects based on detected deadspace
segments = distill.detect_deadspace(sorted_dict, 20, 1, 2)

The above code will output a Segments object holding Segment objects that represent deadspace. In this case, we have defined ‘deadspace’ to be any idle time that reaches the 20-second threshold. Each time deadspace is detected, the logs that occurred 1 second before and 2 seconds after that idle time are recorded in the Segment. Note that the optional label parameter is also available for the detect_deadspace function.
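The detection step can be sketched by comparing consecutive log times. find_deadspace is a hypothetical helper, not Distill's implementation, and times are plain seconds. Counting a gap exactly equal to the threshold as deadspace is an assumption.

```python
def find_deadspace(times, threshold):
    """Return (idle_start, idle_end) for each gap of at least `threshold`."""
    gaps = []
    for earlier, later in zip(times, times[1:]):
        if later - earlier >= threshold:
            gaps.append((earlier, later))
    return gaps

# Sorted log times in seconds; idle periods follow the logs at 5 and 40.
times = [0, 5, 30, 40, 70]
gaps = find_deadspace(times, threshold=20)

# Widen each idle period by 1 second before and 2 seconds after.
padded = [(start - 1, end + 2) for start, end in gaps]
```

The padded tuples correspond to the time ranges whose logs would be recorded in each deadspace Segment under these assumptions.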