Creating Segment Objects
Segment objects can be created with five functions in Distill’s Segmentation package: create_segment,
generate_segments, generate_fixed_time_segments, generate_collapsing_window_segments, and detect_deadspace.
Each function creates Segment objects and returns them in the form of a Segments object.
These functions fall under three categories, described below: basic Segment creation, automatic Segment
generation, and deadspace detection.
UserALE Log Preprocessing
Before Segment objects can be created, the UserALE logs must be put into the format expected by the Segment
creation functions. Each function expects the logs to be structured as a dictionary sorted by clientTime. The keys
of the dictionary are universally unique identifiers (UUIDs), one for each log, and the value for each UUID key is
the log itself. Distill provides analysts with a function that generates these IDs: getUUID. This function can be
used as follows:
# A UserALE log
log
# Generate UUID
uuid = distill.getUUID(log)
Throughout the rest of this documentation, UUID and UID are used interchangeably to describe these unique identifiers.
In addition, note that the functions within Distill’s Segmentation package expect the UserALE log clientTime field
to be represented either as an integer or as a Python datetime object. This conversion is another preprocessing step
that must be completed before using the Segmentation functions.
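The preprocessing described above can be sketched in plain Python. This is a minimal sketch, not Distill's own code: the sample logs are invented, make_uid is a hypothetical stand-in for distill.getUUID so that the example runs without Distill installed, and the sketch assumes clientTime may arrive as a numeric string that needs converting to an integer.

```python
# Hypothetical sample logs; real UserALE logs carry many more fields.
raw_logs = [
    {"sessionID": "s1", "clientTime": "1623691904488", "logType": "raw", "type": "click"},
    {"sessionID": "s1", "clientTime": "1623691890459", "logType": "raw", "type": "load"},
    {"sessionID": "s1", "clientTime": "1623691909728", "logType": "raw", "type": "click"},
]


def make_uid(log):
    """Hypothetical stand-in for distill.getUUID; any unique key per log works."""
    return f"{log['sessionID']}{log['clientTime']}{log['logType']}"


def preprocess(logs):
    """Return a dict of UID -> log, ordered by integer clientTime."""
    keyed = {}
    for log in logs:
        log = dict(log)                              # avoid mutating the caller's data
        log["clientTime"] = int(log["clientTime"])   # normalize to an integer
        keyed[make_uid(log)] = log
    # Python dicts preserve insertion order, so inserting in sorted order
    # yields a dictionary that iterates by ascending clientTime.
    return dict(sorted(keyed.items(), key=lambda item: item[1]["clientTime"]))


sorted_dict = preprocess(raw_logs)
```

With Distill installed, the make_uid call would presumably be replaced by distill.getUUID(log); the sorting and type conversion steps stay the same.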
Basic Segment Creation
The most direct way to create Segment objects is the create_segment function. This function takes three
parameters: a sorted dictionary of UserALE logs, a list of segment names, and a list of tuples that give the
start clientTime and end clientTime of each segment. Given this information, Segment objects can be created
as follows:
# Sorted dictionary of UserALE logs
sorted_dict
# List of segment names
segment_names = ["segment1", "segment2"]
# Time tuples
start_end_vals = [(start_time_1, end_time_1), (start_time_2, end_time_2)]
# Create Segments
segments = distill.create_segment(sorted_dict, segment_names, start_end_vals)
The above code will output a Segments object that contains each of the indicated Segment objects.
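To make the relationship between names, time tuples, and logs concrete, the following sketch mimics the idea in plain Python. It is not Distill's implementation: SketchSegment and sketch_create_segments are hypothetical names, the logs are invented, and treating both window bounds as inclusive is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class SketchSegment:
    """Hypothetical stand-in for a Segment; not Distill's class."""
    name: str
    start: int
    end: int
    uids: list = field(default_factory=list)


def sketch_create_segments(sorted_logs, names, start_end_vals):
    """Collect, for each named window, the UIDs of the logs it covers."""
    segments = []
    for name, (start, end) in zip(names, start_end_vals):
        # Assumption: both window bounds are inclusive.
        uids = [uid for uid, log in sorted_logs.items()
                if start <= log["clientTime"] <= end]
        segments.append(SketchSegment(name, start, end, uids))
    return segments


# Hypothetical logs keyed by UID, already sorted by integer clientTime.
sorted_dict = {
    "uid-a": {"clientTime": 100, "type": "load"},
    "uid-b": {"clientTime": 250, "type": "click"},
    "uid-c": {"clientTime": 400, "type": "click"},
}
segments = sketch_create_segments(
    sorted_dict, ["segment1", "segment2"], [(50, 300), (350, 500)]
)
```

Each name is paired with the time tuple at the same position, so the two lists are expected to have the same length.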
Automatic Segment Generation
If an analyst does not know the start and end times of interest within the UserALE logs, Segment generation
functions provide a more automatic way to create Segment objects. Three functions aid in the automatic creation
of Segment objects: generate_segments, generate_fixed_time_segments, and generate_collapsing_window_segments.
Each of these functions provides an optional parameter named label that denotes a prefix to use when naming
each generated Segment object.
Generate Segments
The generate_segments function creates Segment objects automatically by matching a particular UserALE log field
against a list of possible values. For each match, the function generates a Segment object covering a window of
time that starts a given number of seconds before the matching log and ends a given number of seconds after it.
Both durations are passed as function parameters. The code below illustrates the basic use of this function:
# Sorted dictionary of UserALE logs
sorted_dict
# Generate segment objects based on user clicks
segments = distill.generate_segments(sorted_dict, 'type', ['click'], 1, 2)
The above code will return a Segments object containing Segment objects that represent windows of time from
1 second prior to a ‘click’ type to 2 seconds after it. If we wanted to generate Segment objects that matched both
‘click’ and ‘load’ types, then we could use the following code:
# Sorted dictionary of UserALE logs
sorted_dict
# Generate segment objects based on user clicks and loads
segments = distill.generate_segments(sorted_dict, 'type', ['click', 'load'], 1, 2)
Note that generate_segments does not overlap Segment objects. If two matching events happen back-to-back
within the logs and the second log is already in the Segment generated by the first, the second log will not have
its own Segment created. This non-overlapping behavior may also create Segment objects that are shorter in time
than expected. For instance, if a Segment has an end time that is after the start time of a new Segment,
the new Segment object’s start time will default to the end time of the previous Segment.
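The non-overlapping rules above can be sketched as follows. This is an illustration of the described behavior, not Distill's source code: sketch_match_windows is a hypothetical helper, the click times are invented, and treating a match that lands exactly on the previous window's end time as already covered is an assumption.

```python
from datetime import datetime, timedelta


def sketch_match_windows(match_times, seconds_before, seconds_after):
    """Build non-overlapping (start, end) windows around matching log times."""
    windows = []
    for t in sorted(match_times):
        start = t - timedelta(seconds=seconds_before)
        end = t + timedelta(seconds=seconds_after)
        if windows:
            prev_end = windows[-1][1]
            if t <= prev_end:
                # The match already falls inside the previous window: skip it.
                continue
            if start < prev_end:
                # Shorten the new window so it begins where the previous one ended.
                start = prev_end
        windows.append((start, end))
    return windows


base = datetime(2021, 6, 14, 12, 0, 0)
# Hypothetical click times, in seconds after `base`.
clicks = [base + timedelta(seconds=s) for s in (10, 11, 12.5, 20)]
windows = sketch_match_windows(clicks, 1, 2)
offsets = [((s - base).total_seconds(), (e - base).total_seconds()) for s, e in windows]
```

In this sketch the click at 11 seconds produces no window because it sits inside the first one, and the window for the click at 12.5 seconds starts at 12 rather than 11.5, which makes it shorter than the requested 3 seconds.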
Generate Fixed Time Segments
The generate_fixed_time_segments function generates Segment objects based on fixed time intervals. An example
usage of this function is shown below:
# Sorted dictionary of UserALE logs
sorted_dict
# Generate segment objects based on 5 second intervals
segments = distill.generate_fixed_time_segments(sorted_dict, 5, label="generated")
The above code will create a Segments object that contains Segment objects based on 5 second intervals.
This example also demonstrates the usage of the optional label parameter.
Note that by default this function does not trim logs that fall outside the last full fixed time window. Such logs
occur when the time between the first log and the last log is not evenly divisible by the indicated interval.
To discard them, generate_fixed_time_segments has an optional argument named trim. If trim is True, the logs that
do not fit into an additional fixed time window are trimmed.
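The effect of trim can be illustrated with a small sketch of the window arithmetic. It is written under stated assumptions rather than taken from Distill: sketch_fixed_windows is a hypothetical helper, times and the interval share one unit, and the untrimmed leftover is represented as one final, shorter window.

```python
def sketch_fixed_windows(first_time, last_time, interval, trim=False):
    """Return (start, end) windows of `interval` length covering the logs."""
    windows = []
    start = first_time
    while start + interval <= last_time:
        windows.append((start, start + interval))
        start += interval
    if start < last_time and not trim:
        # Assumption: the leftover logs are kept in one final, shorter window.
        windows.append((start, last_time))
    return windows


# Logs span 12 time units, which is not divisible by the interval of 5.
untrimmed = sketch_fixed_windows(0, 12, 5)
trimmed = sketch_fixed_windows(0, 12, 5, trim=True)
```

Here the untrimmed result keeps a third window for the last 2 time units, while the trimmed result stops after the second full window.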
Generate Collapsing Window Segments
The generate_collapsing_window_segments function generates Segment objects based on a window of time in which the
given field name has a value matching one of the values indicated by the field_values_of_interest list parameter.
An example usage of this function is shown below:
# Sorted dictionary of UserALE logs
sorted_dict
# Generate segment objects based on a collapsing window
segments = distill.generate_collapsing_window_segments(sorted_dict, "path", ["Window"])
The above code creates a Segments object that contains Segment objects that begin when the path field contains the
string “Window” and end when the path field no longer contains “Window”.
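The open-and-close behavior described above can be sketched in plain Python. This is an illustrative sketch, not Distill's implementation: sketch_collapsing_windows is a hypothetical helper, the logs are invented, and ending each window at the last matching log (rather than at the first non-matching one) is an assumption.

```python
def sketch_collapsing_windows(sorted_logs, field_name, values_of_interest):
    """Return (start, end) times of runs of logs whose field matches a value."""
    windows = []
    start = last = None
    for log in sorted_logs.values():
        # `in` covers list membership as well as substring checks on strings.
        matches = any(v in log.get(field_name, []) for v in values_of_interest)
        if matches:
            if start is None:
                start = log["clientTime"]   # a new window opens here
            last = log["clientTime"]
        elif start is not None:
            windows.append((start, last))   # the window collapses
            start = None
    if start is not None:
        windows.append((start, last))       # close a window still open at the end
    return windows


# Hypothetical logs keyed by UID, sorted by integer clientTime.
sorted_dict = {
    "uid-1": {"clientTime": 1, "path": ["button", "body", "Window"]},
    "uid-2": {"clientTime": 2, "path": ["div", "body", "Window"]},
    "uid-3": {"clientTime": 3, "path": ["document"]},
    "uid-4": {"clientTime": 4, "path": ["body", "Window"]},
    "uid-5": {"clientTime": 5, "path": ["Window"]},
}
windows = sketch_collapsing_windows(sorted_dict, "path", ["Window"])
```

The log at clientTime 3 has no “Window” entry in its path, so it closes the first window and the next matching log opens a second one.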
Detecting Deadspace
The final Segment creation function automatically detects deadspace within the sorted UserALE log dictionary.
Deadspace is time in which the user is idle. The detect_deadspace function creates Segment objects based on deadspace
in the logs, given a threshold for what is considered to be ‘deadspace’. An example of this is shown below:
# Sorted dictionary of UserALE logs
sorted_dict
# Create segment objects based on detected deadspace
segments = distill.detect_deadspace(sorted_dict, 20, 1, 2)
The above code will output a Segments object holding Segment objects that represent deadspace. In this case, we
have defined ‘deadspace’ to be any idle time of 20 seconds. Each time deadspace is detected, the logs that occurred
1 second before and 2 seconds after that idle time are recorded in the Segment. Note that the optional label
parameter is also available for the detect_deadspace function.
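The idea behind deadspace detection can be sketched as a scan over the gaps between consecutive logs. This sketch is not Distill's implementation: sketch_deadspace_windows is a hypothetical helper, the times are invented and expressed in seconds, and counting a gap exactly equal to the threshold as deadspace is an assumption.

```python
def sketch_deadspace_windows(times, deadspace_limit, time_before, time_after):
    """Return (start, end) windows around idle gaps of at least deadspace_limit."""
    windows = []
    for prev_time, next_time in zip(times, times[1:]):
        if next_time - prev_time >= deadspace_limit:
            # Widen the idle gap to capture the logs just before and after it.
            windows.append((prev_time - time_before, next_time + time_after))
    return windows


# Hypothetical sorted log times, in seconds.
log_times = [0, 5, 30, 40, 65]
windows = sketch_deadspace_windows(log_times, 20, 1, 2)
```

In this data the gaps of 25 seconds (5 to 30 and 40 to 65) exceed the threshold of 20, so two windows are produced, each extended by 1 second before and 2 seconds after the idle period.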