LLM for User Interaction Understanding

Background

This project closely relates to the Visual Analytics for Sensemaking project. Please check that project webpage for an introduction to sensemaking.

A published paper describing the idea and a link to an online demo are available on this page. The source code of the system is available on GitHub, and a recording of the paper presentation at VIS 2022 is also available.

The goal of this project is to understand how users make sense of data by visualising and analysing the sensemaking process represented as a high-dimensional vector sequence.

As described in the paper, each step in sensemaking is captured as a ‘provenance vector’, which includes all the information necessary to reconstruct the visualisation state. This includes information such as what data are displayed, how they are visualised, and the user interactions.

In the paper, Gapminder is used as an example (see the screenshot below). There, the provenance vector includes information such as what the x and y axes represent, the colour and size of each circle, and the year of the data.

Gapminder
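To make the idea concrete, here is a minimal sketch of how one Gapminder-like visualisation state could be encoded as a numeric provenance vector. The field names and encoding (one-hot axis attributes plus a scaled year) are illustrative assumptions, not the paper's actual scheme.

```python
# A minimal sketch (hypothetical field names and encoding) of turning one
# Gapminder-like visualisation state into a numeric 'provenance vector'.

# Assumed set of attributes an axis can show; the real system may differ.
ATTRIBUTES = ["income", "population", "life_expectancy"]

def encode_state(x_attr, y_attr, year, selected_country=None):
    """One-hot encode the axis attributes and append the (scaled) year."""
    vec = []
    for attr in ATTRIBUTES:            # one-hot for the x axis
        vec.append(1.0 if attr == x_attr else 0.0)
    for attr in ATTRIBUTES:            # one-hot for the y axis
        vec.append(1.0 if attr == y_attr else 0.0)
    vec.append((year - 1800) / 220.0)  # year scaled to roughly [0, 1]
    vec.append(0.0 if selected_country is None else 1.0)  # any selection?
    return vec

state = encode_state("income", "life_expectancy", 1900)
print(state)
```

Every user action produces a new such vector, so a whole session becomes a sequence of them.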

As the user explores the data, they may change the year, select a different attribute for an axis (e.g., showing ‘population’ instead of ‘income’ on the x axis), and so on. Each such change updates the ‘provenance vector’, and a sensemaking process can be described as a temporal sequence of such vectors. Such sequences (called ‘provectories’ in the paper) can be projected into lower dimensions with methods such as t-SNE or UMAP (the middle of the top figure). It is also possible to show the traces of many user sessions together to see if there are any similarities or differences (the right of the top figure).
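The projection step described above can be sketched as follows, using synthetic vectors in place of real session logs and scikit-learn's t-SNE implementation:

```python
# A sketch of projecting a 'provectory' (a sequence of provenance vectors)
# to 2-D with t-SNE; synthetic data stands in for real session logs.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# One synthetic session: 30 steps, each an 8-dimensional provenance vector.
provectory = rng.random((30, 8))

# Perplexity must be smaller than the number of steps in the sequence.
trace_2d = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(provectory)
print(trace_2d.shape)  # (30, 2)
```

Plotting `trace_2d` as a connected line then shows the user's path through the state space; overlaying several sessions' traces allows visual comparison.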

Research Questions (Project Ideas)

Q1. Sensemaking Pattern Discovery

Q1.1 Single User

Once the sensemaking process is captured, it can be analysed by either visualisation or machine learning to identify any interesting patterns:

For a single user, such patterns can be:

  • Are there any frequent patterns, i.e., a sequence of actions that appears multiple times? If so, what does this mean, i.e., why was the user doing this?
  • Is it possible to infer the analysis actions, such as comparing two similar states or analysing the clustering in the data?
  • Is it possible to infer what strategy the user is using, such as a depth- or breadth-first search?
  • Is it possible to show what data have been explored and what haven’t?
  • Is it possible to infer when a user is stuck and could use some help with further analysis?
  • Did the user find the answer, and what is it?
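As a starting point for the first question, frequent patterns can be found with simple n-gram counting over the logged action sequence. The action names below are hypothetical; a real log would come from the provenance data.

```python
# A minimal sketch of frequent-pattern discovery: count how often each
# short subsequence (n-gram) of actions occurs in a logged session.
from collections import Counter

# Hypothetical action log; real sessions would come from provenance data.
actions = ["set_year", "zoom", "set_year", "zoom", "set_year", "zoom", "select"]

def ngram_counts(seq, n=2):
    """Count every length-n window of consecutive actions."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))

counts = ngram_counts(actions, n=2)
print(counts.most_common(1))  # [(('set_year', 'zoom'), 3)]
```

A recurring pair such as ('set_year', 'zoom') would then prompt the follow-up question of why the user kept repeating it.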

There is a previous student project on this topic.

Q1.2 Multiple Users

There are also many interesting questions about a group of users, such as:

  • What are the differences and similarities among the user sequences?
  • Is it possible to tell which users are experts and which are novices?

As mentioned, visualisation and/or machine learning can be used to answer these questions.
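One simple way to quantify the similarity between two user sequences is dynamic time warping (DTW), which tolerates sessions of different lengths and speeds. This is a sketch with toy 2-D states standing in for real provenance vectors; DTW is one option among many (edit distance or embedding-space comparisons would also work).

```python
# A sketch of comparing two user sessions with dynamic time warping (DTW);
# synthetic 2-D states stand in for real provenance vectors.
import math

def dtw(a, b):
    """DTW distance between two sequences of equal-length numeric vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean step cost
            cost[i][j] = d + min(cost[i - 1][j],      # skip a step in a
                                 cost[i][j - 1],      # skip a step in b
                                 cost[i - 1][j - 1])  # match both steps
    return cost[n][m]

session_a = [(0, 0), (1, 0), (2, 0)]
session_b = [(0, 0), (0, 1), (1, 1), (2, 0)]
print(dtw(session_a, session_a))  # 0.0
print(dtw(session_a, session_b) > 0)  # True
```

Pairwise DTW distances between all sessions could then be clustered to see whether, say, experts and novices form separate groups.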

Q2. Provenance Embedding

If you are familiar with Large Language Models (LLMs; more information in the Human-AI Teaming project description), such as ChatGPT, you will know that they create a vector representation of text. The provenance vector is, similarly, a vector representation of sensemaking. Since the former is usually called a ‘text embedding’, we call the latter a ‘provenance embedding’.

Q2.1 Analysis with Large Language Models (LLMs)

Besides visualisation and machine learning, can we use Large Language Models (LLMs) to analyse the provenance data? This may work because the provenance embedding and the text embedding used by LLMs are similar in format. However, since LLMs are trained on text data while provenance embeddings contain a lot of numerical data (such as time, mouse position, etc.), would it be more effective to describe each step of the provenance in text rather than as a vector, e.g., ‘user changed the year from 1800 to 1802’? The starting point would be simply giving the provenance to an LLM and asking it to analyse it.
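Converting provenance steps into text could be as simple as describing the difference between consecutive states. This is a minimal sketch with hypothetical field names; the real state schema may differ.

```python
# A sketch of turning consecutive provenance states into text for an LLM,
# using hypothetical field names; the real state schema may differ.
def describe_change(prev, curr):
    """Describe each changed field as a short English sentence."""
    sentences = []
    for key in curr:
        if prev.get(key) != curr[key]:
            sentences.append(f"user changed the {key} from {prev.get(key)} to {curr[key]}")
    return "; ".join(sentences) or "no change"

s0 = {"year": 1800, "x_axis": "income"}
s1 = {"year": 1802, "x_axis": "income"}
print(describe_change(s0, s1))  # user changed the year from 1800 to 1802
```

The resulting sentences, concatenated per session, form a natural-language transcript that can be pasted into an LLM prompt for analysis.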

Q2.2 Provenance Embedding Model

LLMs are trained to predict the next word in a sentence. However, they exhibit ‘intelligence’ well beyond this, such as summarising text and answering questions. So the question is: is it possible to do something similar with provenance embeddings? Can we train a model that predicts the next step in sensemaking? What other ‘intelligence’ might such a model have?
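A toy version of next-step prediction can be set up as follows: fit a linear map from each provenance vector to the next by least squares, as a stand-in for a real sequence model such as a transformer. The data here is synthetic.

```python
# A toy sketch of next-step prediction for provenance sequences: fit a
# linear map from each step to the next by least squares (a stand-in for
# a real sequence model such as a transformer); synthetic data only.
import numpy as np

rng = np.random.default_rng(1)
seq = np.cumsum(rng.random((50, 4)), axis=0)  # one synthetic 4-D provectory

X, Y = seq[:-1], seq[1:]                      # pairs (step_t, step_t+1)
W, *_ = np.linalg.lstsq(X, Y, rcond=None)     # W maps step_t -> step_t+1

pred = X @ W                                  # predicted next steps
print(pred.shape)  # (49, 4)
```

A model trained this way (or, more realistically, a sequence model trained on many sessions) could then be probed for the kinds of emergent capabilities the question asks about, such as recognising an analysis strategy from a partial session.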

Reading