# Evaluation
How to tell if a visualization is working.
- assess the domain problem (do others agree with your assumptions)
- determine the data and tasks (can people build the knowledge they need)
- choose the right encodings (can people see the patterns they need correctly)
- map data to encodings via algorithm (does the algorithm perform correctly)
- design interactions to explore data (can people quickly and intuitively interact with the data)
# Holistic Evaluation
- What new knowledge can my users gain? (insight based evaluation)
- Does my visualization work better than other methods? (experimental evaluation)
# Insight-Based Evaluation
Quantify the knowledge users gain from a visualization through open-ended observation, similar to ethnography.
# Experimental Evaluation
Run a controlled study to measure how quickly and accurately people can complete tasks by using different visualizations.
# Insight
The fundamental unit of measurement when evaluating data visualizations, and the purpose of visualization itself.
A unit of discovery: what does the tool enable someone to find? Insights are:
- complex: involve all or large amounts of the data in synergistic ways
- deep: builds up over time, accumulating and building on itself
- qualitative: can be uncertain and subjective and have multiple levels of resolution
- unexpected: often unpredictable, serendipitous, and creative
- relevant: deeply embedded in the data domain and connecting to existing knowledge
Two types:
- Knowledge-building insight: substance that accumulates over time
- Spontaneous insight: discrete event of discovery
Metrics for Insight-Based Evaluation (a tallying sketch follows this list):
- time to insight
- number of insights
- importance of insights
- depth of insights
- system adoption rate (people electing to use a system)
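These metrics are usually tallied from coded session recordings. Below is a minimal sketch of that tally, assuming each insight has already been coded into a record with a timestamp and analyst-assigned 1-5 scores; the `Insight` schema is hypothetical, not a standard format.

```python
from dataclasses import dataclass

@dataclass
class Insight:
    """One coded insight from a session recording (hypothetical schema)."""
    minutes_elapsed: float  # time from session start to this insight
    importance: int         # analyst-assigned score, e.g., 1-5
    depth: int              # analyst-assigned score, e.g., 1-5

def summarize(insights: list[Insight]) -> dict:
    """Aggregate the basic insight-based metrics for one session."""
    if not insights:
        return {"count": 0}
    return {
        "count": len(insights),
        "time_to_first_insight": min(i.minutes_elapsed for i in insights),
        "mean_importance": sum(i.importance for i in insights) / len(insights),
        "mean_depth": sum(i.depth for i in insights) / len(insights),
    }

# One participant's coded session: three insights over ~30 minutes.
session = [Insight(4.5, 2, 1), Insight(12.0, 4, 3), Insight(31.5, 5, 4)]
print(summarize(session))
```

Here "time to insight" is operationalized as time to the first insight; other definitions (e.g., mean time between insights) are equally valid. Adoption rate is measured over longer periods, outside any single session.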
# Qualitative Evaluation
Collect quotes, use cases, and anecdotes that help illustrate the effectiveness of your solution.
- systematic surveys: read and assemble knowledge on your target topic and existing solutions
- semi-structured interviews: guided but open-ended conversations with collaborators
- think-aloud: observe collaborators as they narrate their process while using your solution on their own data
- journaling: ask collaborators to use your tool over time and note their observations
Use insight as the key measure during these processes. Document as much as possible, including metadata about your participants.
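One lightweight way to keep that documentation consistent is a structured observation log. The sketch below assumes a flat CSV schema; the field names and the `log_observation` helper are hypothetical, not from any particular toolkit.

```python
import csv
import datetime
from pathlib import Path

# Hypothetical schema: one row per quote, anecdote, or observed insight,
# with participant metadata stored alongside each note.
FIELDS = ["timestamp", "participant_id", "role", "years_experience",
          "method", "note"]

def log_observation(path: Path, row: dict) -> None:
    """Append one observation, writing the header if the file is new."""
    is_new = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)

log_observation(Path("sessions.csv"), {
    "timestamp": datetime.datetime.now().isoformat(timespec="seconds"),
    "participant_id": "P03",
    "role": "biologist",
    "years_experience": 8,
    "method": "think-aloud",
    "note": "Expected brushing the scatterplot to filter the table.",
})
```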
# Systematic Surveys
- understand the problem and prior solutions
- determine what expertise is necessary for a holistic solution
- specify key tasks, datasets, and other considerations
# Semi-Structured Interviews
- figure out the problems that experts see as most impactful
- decide who will best serve in which roles
- verify your understanding of the target problem
- get focused, fluid feedback about what is and isn't working
# Think-Aloud Studies
- Establish the limitations of current solutions and the key workflows
- Get direct, design-grounded feedback on low-fidelity prototypes
- Identify strengths and weaknesses of the current design
# Journaling Studies
- Give your collaborators the tool and let them explore it on their own. Ask them to document interesting findings, confusing settings, and bugs, ideally with screenshots
# Experimental Design
More precise than insight-based evaluation, but less grounded in the domain. Measure how people complete a specific set of tasks under different conditions.
- Form a specific question
- Generate a set of falsifiable hypotheses
- Determine your independent (what you change) and dependent (what you measure) variables.
- Build your stimuli & experimental infrastructure (task framing, how to complete task, data collection, etc.)
- experimental tasks: statistical (e.g., estimate values or trends from a chart), comparison (which approach is more effective), decision making (choose an approach for a given scenario)
- experimental stimuli: the items people use to complete the tasks (where does data come from?)
- independent variables: the conditions you manipulate (each participant may see one condition, between-subjects, or several or all, within-subjects)
- dependent variables: things to measure (time, accuracy, number of interactions, etc.)
- control variables: factors you hold constant or counterbalance because they might otherwise affect your result (task order, etc.); see the counterbalancing sketch below
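As one concrete example of handling order as a control variable, here is a minimal counterbalancing sketch; the study design, condition names, and `assign_orders` helper are all hypothetical.

```python
import itertools
import random

# Hypothetical independent variable: chart type, with three levels,
# shown to every participant (within-subjects design).
CONDITIONS = ["bar", "pie", "treemap"]

def assign_orders(n_participants: int, seed: int = 0) -> list[list[str]]:
    """Counterbalance presentation order (a control variable) by cycling
    through all permutations of the conditions across participants."""
    rng = random.Random(seed)  # fixed seed so assignment is reproducible
    orders = list(itertools.permutations(CONDITIONS))
    rng.shuffle(orders)
    return [list(orders[i % len(orders)]) for i in range(n_participants)]

for pid, order in enumerate(assign_orders(6), start=1):
    print(f"participant P{pid:02d}: {order}")
```

With three conditions there are six orderings, so every group of six participants sees each ordering exactly once.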
# Analyze Your Data
Descriptive statistics:
Use measures of the data's distribution to estimate whether there are differences between conditions.
Inferential tests:
Use statistical tests to estimate the likelihood that your observed differences reflect true differences rather than chance.
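A minimal sketch of both steps, using `scipy.stats` on made-up completion times for two hypothetical designs:

```python
from statistics import mean, stdev

from scipy import stats

# Hypothetical dependent variable: task completion time (seconds) for
# the same task under two levels of the independent variable.
design_a = [12.1, 9.8, 14.3, 11.0, 13.5, 10.2, 12.7, 11.9]
design_b = [15.4, 13.9, 16.8, 14.1, 12.9, 17.2, 15.0, 14.6]

# Descriptive statistics: summarize each condition's distribution.
for name, data in [("A", design_a), ("B", design_b)]:
    print(f"design {name}: mean={mean(data):.1f}s, sd={stdev(data):.1f}s")

# Inferential test: a two-sample t-test estimates how likely a difference
# this large would be if both designs truly performed the same.
result = stats.ttest_ind(design_a, design_b)
print(f"t={result.statistic:.2f}, p={result.pvalue:.4f}")
```

If the timing data are skewed, a non-parametric test such as `scipy.stats.mannwhitneyu` is a common substitute for the t-test.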
# Evaluation Trade-Offs
Qualitative:
- holistic input
- real-world use
- richer data
- less controlled
- slower
- less precise
Experimental:
- highly precise
- more generalizable
- easier to understand
- less specific
- less detailed
- more abstract
# Formative Evaluation
Test your understanding of the problem space and gather insight into users' processes. Measure how well different designs optimize for a given set of tasks or goals.
- Area survey: what are the core tasks and needs of the problem?
- Preference mining: which designs do people like?
# Summative Qualitative Evaluation
Measure how well a given tool supports a domain. Provide a measure of performance against a target baseline.
- Think-aloud: what are users' impressions of a tool?
- Horse race: which design is most efficient for a set of tasks?
- Popular vote: which design do people like best?