Scientific foundations

affect-kit is grounded in four decades of affective science: the dimensional model of emotion, validated lexicons, FACS-based face design, and longitudinal affect dynamics. This document consolidates the theory, design decisions, and primary literature behind every component in the package.

The measurement gap

Measuring emotional experience is harder than it looks. The dominant approaches each carry structural limitations that leave a significant gap in what researchers and practitioners can actually deploy.

Simple self-report scales

Single-item ratings ("How do you feel? 1–10"), emoji sliders, and visual analogue scales are easy to administer and widely used. But they are not grounded in any validated model of emotion. They collapse a complex, multidimensional experience into a single number — losing almost all signal in the process. A "7" before an experience and a "7" after could reflect entirely different emotional states. There is no principled way to compare ratings across people, sessions, or contexts because the scale itself has no anchored meaning.

Clinical instruments

Validated instruments like the PHQ-9 (depression), GAD-7 (anxiety), PANAS (positive/negative affect schedule), and POMS (profile of mood states) are rigorously developed and clinically meaningful. But they carry significant structural limitations for general-purpose before/after emotional capture:

  • Clinically specific. The PHQ-9 measures depression. The GAD-7 measures anxiety. Neither spans the full range of human affective experience — positive states, mixed states, and states that don't map onto a clinical diagnosis are outside their scope.
  • Burdensome to administer. PHQ-9 takes 2–3 minutes; PANAS and POMS take 10–20. For before/after capture or repeated daily administration this is prohibitive. Completion rates fall sharply with length.
  • Opaque to users. "Over the last two weeks, how often have you had little interest or pleasure in doing things? — Not at all / Several days / More than half the days / Nearly every day." This question requires clinical framing to interpret. The person answering rarely understands what their score measures or how it relates to how they actually feel right now. This is a trust and engagement failure on top of the scope limitation.
  • Not composable. No single instrument spans the full affective space. Using multiple instruments compounds the administration burden and produces disconnected data streams.

The gap affect-kit addresses

There is no widely adopted, evidence-grounded instrument that is simultaneously:

  • Fast enough for before/after capture and repeated administration
  • Broad enough to cover any emotional state, not just clinical ones
  • Grounded in validated affective science, not arbitrary scales
  • Legible to the people taking it — producing results they recognize and understand

affect-kit addresses this gap using Russell's circumplex model of affect as the scientific foundation [2] and the NRC VAD Lexicon as the vocabulary backbone [3]. The design goal: an instrument a researcher can defend and a user can connect with — simultaneously. A capture takes under 60 seconds, produces a structured Rating object grounded in validated V/A/D coordinates, and uses language users recognize because it is drawn from how people actually describe their feelings.
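
To make that data contract concrete, here is an illustrative TypeScript sketch of a Rating object. Only face and composite are described in this document (§2.4); the remaining field names and the exact shape are assumptions, not the published affect-kit API.

// Illustrative shape only; not the published affect-kit API.
interface VAD {
  v: number; // valence, −1..+1
  a: number; // arousal, −1..+1
  d: number; // dominance, −1..+1
}

interface Rating {
  face: { v: number; a: number };              // pre-verbal gesture placement (no D; see §2.4)
  composite: VAD | null;                       // intensity-weighted word composite; null if no words selected
  words: { word: string; level: 1 | 2 | 3 }[]; // hypothetical: selected labels with intensity
  timestamp: string;                           // hypothetical: ISO 8601 capture time
}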

1. The dimensional model of affect

Why dimensions, not categories

Discrete-emotion theories — Ekman's "basic emotions" of anger, disgust, fear, happiness, sadness, and surprise — have been challenged since the 1980s by evidence that emotion is better modeled as a continuous space than as a set of named categories. Russell's circumplex model [2] and Mehrabian & Russell's PAD/VAD model [1] propose that any emotional state can be located in a space defined by three orthogonal dimensions:

  • Valence (V) — pleasantness. How positive or negative the experience is. Ranges from −1 (maximally negative) to +1 (maximally positive).
  • Arousal (A) — activation. How energized or calm. Ranges from −1 (maximally calm/lethargic) to +1 (maximally energized/excited).
  • Dominance (D) — control. The degree of agency, from −1 (completely helpless) to +1 (completely in control). See §1.2 for why this dimension is preserved even though it isn't visualized.

The dimensional approach has accumulated substantial evidence:

  • Posner et al. (2005) found that PET and fMRI studies map emotional experience onto valence and arousal axes more cleanly than onto discrete categories.
  • Lindquist et al. (2012) [7] found that "basic emotion" categories lack consistent neural signatures, while valence and arousal dimensions do.
  • Barrett (2017) [6], in How Emotions Are Made, argues that emotions are constructed from core affect (valence × arousal) plus conceptual knowledge — strong support for treating dimensions as primary and labels as secondary refinements.

[Figure: The three-dimensional VAD space (Mehrabian & Russell 1974/1980; Russell 1980). Selected emotion examples plotted at their NRC VAD Lexicon coordinates [3]. Axes (valence: pleasant, arousal: energized, dominance: in-control) extend from −1 to +1. Anger and anxious share similar V/A positions but differ substantially in D (control): anger D=+0.31, anxious D=−0.13.]

1.1 Why include dominance even though it's not visualized

Two emotions can share nearly identical V and A coordinates but differ meaningfully in D:

  • Frustrated (V=−0.84, A=0.30, D=−0.49): negative, somewhat activated, helpless. Angry but without agency.
  • Anger (V=−0.67, A=0.73, D=+0.31): negative, highly activated, agentic. Angry and feeling capable of acting.

These distinctions have genuine clinical relevance: high-D negative states call for assertion and boundary skills; low-D negative states call for containment and grounding. The difference also surfaces in longitudinal patterns — chronic low-D states correlate with depression and learned helplessness; chronic high-D negative states with anger dysregulation.

In affect-kit, the face glyph and color visualization use only V and A — those are the dimensions a face can naturally communicate. D is preserved in every Rating object as analytical metadata, available for downstream research export and aggregation, but not visualized in the UI.

2. The interaction model: orient, refine, commit

2.1 The three-step flow

A capture has three steps. They are deliberately distinct so each one does a single thing well:

  1. Orient. Drag the face to indicate roughly where you feel you are in V/A space. The face mirrors your placement and the color tints in real time; the 55 emotion words sort themselves so the most relevant rise to the top. This step is a gesture, not a measurement — your body knows it's here before you have a word for it.
  2. Refine. Tap the words that fit, up to five, at three intensity levels. Each word carries lexicon-validated VAD coordinates, and the intensity-weighted composite of your selections is the measurement.
  3. Commit. Tap Done to finalize. The rating preserves both the pre-verbal placement (face) and the composite VAD from your words (composite), separately.

This sequence is grounded in affective science:

  • Feeling precedes naming (Damasio 1999 [16]; Barrett 2017 [6]). Pre-verbal somatic awareness is real and distinct from verbal categorization.
  • Labeling emotions reduces their intensity — "affect labeling" (Lieberman et al. 2007 [9]) — so naming after the gesture avoids contaminating the somatic signal.
  • The face provides immediate non-judgmental feedback: it is a mirror of the user's state, not an evaluation of it.
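
As a concrete sketch of the capture flow, here is hypothetical wiring for the rater element. The element name <affect-kit-rater> is real (see §7); the 'commit' event name and its detail shape are assumptions for illustration.

// Hypothetical wiring; the event name and detail shape are assumptions.
const rater = document.createElement('affect-kit-rater');
document.body.appendChild(rater);

rater.addEventListener('commit', (event) => {
  const rating = (event as CustomEvent).detail; // Rating object (see §2.4)
  console.log('pre-verbal placement:', rating.face);
  console.log('word composite:', rating.composite);
});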

2.2 Why the gesture is not the measurement

The orientation step might feel like a rating, but mechanically it isn't. The gesture does three things, and exactly three:

  1. Drives the face glyph expression in real time
  2. Drives the color tint interpolation
  3. Sorts the NRC VAD Lexicon by Euclidean distance from the gesture position so the most relevant words rise to the top

The gesture is an affordance for navigation through a 55-word vocabulary — an intuitive, pre-verbal entry point into the lexicon. A user who drags toward the upper-left (negative V, positive A) sees anger, panic, and frustration rise to the top; a user who drags toward the lower-right sees calm, peaceful, and serene. The gesture replaces what would otherwise be a flat, overwhelming list.
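
A minimal sketch of this sorting step, assuming the lexicon is an array of word entries with VAD coordinates (names here are illustrative, not the package internals):

interface LexiconEntry { word: string; v: number; a: number; d: number; }

// Rank lexicon entries by Euclidean distance from the gesture in V/A space.
function sortByGesture(lexicon: LexiconEntry[], gesture: { v: number; a: number }): LexiconEntry[] {
  return [...lexicon].sort((p, q) => {
    const dp = Math.hypot(p.v - gesture.v, p.a - gesture.a);
    const dq = Math.hypot(q.v - gesture.v, q.a - gesture.a);
    return dp - dq; // nearest words first
  });
}

// Dragging toward (v: −0.7, a: +0.7) surfaces anger, panic, frustration first.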

2.3 Words compose the V/A/D measurement

The validated measurement comes from the selected words and their intensity levels, weighted using each word's NRC VAD Lexicon coordinates:

For each selected word at intensity level ∈ {1, 2, 3}:

weight = level / 3                       // 0.33, 0.67, or 1.0

composite.v = Σ(word.v × weight) / Σ(weight)
composite.a = Σ(word.a × weight) / Σ(weight)
composite.d = Σ(word.d × weight) / Σ(weight)

If no words are selected, composite is null — the rating falls back to the gesture position with d=0. This preserves the somatic signal even when the user doesn't refine further.
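
A direct transcription of the formula into TypeScript, including the null-composite fallback; this is a sketch, not the package source:

type Selection = { v: number; a: number; d: number; level: 1 | 2 | 3 };

function compositeVAD(
  selections: Selection[],
  gesture: { v: number; a: number },
): { v: number; a: number; d: number } {
  // No words selected: fall back to the gesture position with d = 0.
  if (selections.length === 0) return { v: gesture.v, a: gesture.a, d: 0 };

  let sumW = 0, v = 0, a = 0, d = 0;
  for (const s of selections) {
    const w = s.level / 3; // 0.33, 0.67, or 1.0
    sumW += w;
    v += s.v * w;
    a += s.a * w;
    d += s.d * w;
  }
  return { v: v / sumW, a: a / sumW, d: d / sumW };
}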

This design is why affect-kit can produce genuine dominance values despite showing only a 2D gesture surface: dominance enters the rating through the lexicon. A user who taps frustrated (NRC d=−0.49, low-control) versus anger (d=+0.31, agentic) generates structurally different ratings. The gesture alone cannot make this distinction; the words can.

2.4 The face-vs-composite gap as a signal

Even though the gesture isn't the formal measurement, it carries real signal — and the rating object preserves it separately. The Rating object holds both the pre-verbal placement (rating.face) and the label-aggregated composite coordinates (rating.composite). The gap between them is itself research data.

Work on emotional granularity (Barrett 2001; Kashdan, Barrett & McKnight 2015 [8]) links the ability to make fine-grained distinctions between similar emotions to mental health outcomes. Large, persistent gaps between where a user gestures and what they eventually label are operationally close to measures of alexithymia — difficulty identifying or describing one's own feelings. Longitudinal tracking of this gap is a future research use case for the package.
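
One way to operationalize that gap as a scalar is Euclidean distance in the shared V/A plane (D has no face counterpart). This metric is an illustrative assumption, not a published measure:

// Face-vs-composite gap as V/A distance; illustrative, not a published measure.
function faceCompositeGap(rating: {
  face: { v: number; a: number };
  composite: { v: number; a: number } | null;
}): number | null {
  if (!rating.composite) return null; // no words selected, so no gap to measure
  return Math.hypot(
    rating.composite.v - rating.face.v,
    rating.composite.a - rating.face.a,
  );
}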

3. The face glyph

3.1 FACS action units

The face glyph encodes V and A using Facial Action Coding System (FACS, Ekman & Friesen 1978 [5]) action units selected for their cross-cultural consistency — validated in 17 cultures by Cordaro et al. (2018) [10]:

| Action unit | What it does | Driven by |
| --- | --- | --- |
| AU 1+2 (inner + outer brow raiser) | Surprise / sad brow lift | Sadness: negative V × negative A |
| AU 4 (brow lowerer) | Frowning, anger furrow | Anger: negative V × positive A |
| AU 6 (cheek raiser) | Crow's feet, Duchenne smile | Joy: positive V × positive A |
| AU 12 (lip corner puller) | Smile arc | Joy + general positive V |
| AU 15 (lip corner depressor) | Down-turned mouth | Sadness: all negative V |
| Eye openness (width + thickness) | Wide vs. narrow | Arousal: positive A = wider |
| Mouth openness | Parted lips | Arousal: positive A = open |

Example expressions: anger = AU 4 + narrow eyes; sadness = AU 1+2 + AU 15; neutral = baseline; calm = soft upward arch (no crow's feet); joy = AU 6 + AU 12 (Duchenne).

Calm vs. joy distinction. The calm face uses a soft upward eye arch (positive V, negative A) without crow's feet. Joy adds the Duchenne crow's feet: the orbicularis oculi contracts only with high-intensity positive affect. This anatomical distinction matters for cross-cultural recognition.
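
A sketch of how V/A could drive the action-unit weights in the table above. clamp01 keeps each activation in 0..1; the quadrant logic follows the table, but the exact response curves in affect-kit may differ:

// Illustrative V/A-to-AU mapping; quadrant logic per the table, curves assumed.
const clamp01 = (x: number) => Math.max(0, Math.min(1, x));

function actionUnits(v: number, a: number) {
  return {
    au1_2: clamp01(-v) * clamp01(-a), // sad brow lift: negative V × negative A
    au4:   clamp01(-v) * clamp01(a),  // anger furrow: negative V × positive A
    au6:   clamp01(v) * clamp01(a),   // Duchenne crow's feet: positive V × positive A
    au12:  clamp01(v),                // smile arc: positive V
    au15:  clamp01(-v),               // down-turned mouth: negative V
    eyeOpenness:   0.5 + 0.5 * a,     // arousal: wider when A is positive
    mouthOpenness: clamp01(a),        // arousal: parted lips when A is positive
  };
}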

3.2 Why a reductive glyph

The face is intentionally minimal: black ink linework, no skin tone, no eye whites, no hair, no body. This is by design:

  1. Cultural neutrality. Skin color, hair texture, and facial structure encode race, ethnicity, and gender. Removing them removes that load from a tool designed to reflect universal affective dimensions.
  2. Universality. The features are a graph of relationships — brows above eyes, mouth below nose — not a portrait of a specific person.
  3. Self-projection. A non-photorealistic face is easier to project onto. Users see "their" feeling, not a depicted person's feeling.
  4. Research-aligned. Schematic faces are recognized at the V/A level as reliably as photographs (Aviezer et al. 2008 [15]).

3.3 Animation principles

The face on the rater is alive but not theatrical:

  • Breath. Subtle vertical translation at ~0.55 Hz, amplitude ~0.5 viewBox units. Conveys a living presence without being distracting.
  • V-driven cohesion. At high arousal with negative valence (anger zone), facial features tremor and are displaced from each other. At high arousal with positive valence, features are stable but bouncy. Calm states are still. The rule: "out of control but still a face."
  • Shock on chip selection. Selecting a high-arousal label triggers a brief burst of additional motion, reinforcing the connection between the word and the face's expression.
  • Reduced motion. prefers-reduced-motion: reduce disables tremor and breath. Functional transitions (drag morphing, chip level changes) are preserved but shortened (see the sketch below).
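
A minimal sketch of the reduced-motion handling; matchMedia and the prefers-reduced-motion query are standard web APIs, while the flag names and durations here are illustrative:

// Standard media query; flag names and durations are assumptions.
const reduceMotion = window.matchMedia('(prefers-reduced-motion: reduce)');

function motionConfig() {
  return {
    breath: !reduceMotion.matches,                 // disable idle breathing
    tremor: !reduceMotion.matches,                 // disable anger-zone tremor
    transitionMs: reduceMotion.matches ? 80 : 240, // keep functional transitions, shortened
  };
}

reduceMotion.addEventListener('change', () => {
  // Re-read motionConfig() and update the glyph.
});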

4. Vocabulary: the NRC VAD Lexicon

4.1 Primary source

affect-kit's 55-word emotion vocabulary draws V/A/D coordinates from the NRC Valence-Arousal-Dominance Lexicon v2.1 (Mohammad 2025 [3]):

  • 20,000+ English words with V/A/D ratings, crowd-sourced from native speakers
  • Published values normalized to 0..1 (rescaled to −1..1 in affect-kit)
  • Free for research and commercial use
  • Multilingual extensions available for ~100 languages

Secondary cross-references: Warriner et al. (2013) [4] (13,915 words, V/A only, high precision) and Bradley & Lang (1999) ANEW norms (1,034 words, comprehensive V/A/D).
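
The 0..1 to −1..+1 rescaling mentioned in the list above is a single linear map:

// Rescale an NRC VAD norm from the published 0..1 range to −1..+1.
const rescale = (raw: number): number => 2 * raw - 1;

rescale(0.0); // -1  (e.g., minimum valence)
rescale(0.5); //  0  (neutral midpoint)
rescale(1.0); // +1  (maximum valence)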

4.2 Vocabulary curation principles

The 55 emotions were selected from the NRC lexicon following these principles:

  1. Quadrant coverage. Span all four V/A quadrants reasonably evenly. Avoid the common clinical bias toward negative-emotion vocabularies.
  2. Intensity gradation. Include emotions of varying intensity within each quadrant. Content and ecstatic are both positive-valence / positive-arousal but at very different intensities.
  3. Single-word. Prefer single words over phrases. "Overwhelmed" not "feeling overwhelmed." Compactness matters for chip rendering.
  4. Common language. Everyday words over clinical terms. "Sad" not "dysphoric."
  5. No moral loading. Skip words that imply judgment. "Resentful" is an emotion; "guilty" carries moral weight but is included because it's affectively specific. "Sinful" is not.

4.3 Emotional granularity as a trackable outcome

Research by Barrett and colleagues (Kashdan, Barrett & McKnight 2015 [8]) established that emotional granularity — the ability to make fine-grained distinctions between similar emotions — predicts mental health outcomes: higher granularity correlates with lower psychopathology; lower granularity with depression, anxiety, and alexithymia.

The intensity levels (1/2/3) on chips are themselves a granularity affordance. A user who selects only level-3 chips treats each emotion as binary; a user who uses all three levels makes finer distinctions. The number of distinct emotion words a user employs across sessions, and how they distribute across V/A space, are trackable granularity signals.
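
Two of those signals sketched in TypeScript; these operationalizations are illustrative, not published measures:

// Illustrative granularity signals across sessions; definitions assumed.
type Session = { words: { word: string; level: 1 | 2 | 3 }[] };

function granularitySignals(sessions: Session[]) {
  const allWords = sessions.flatMap((s) => s.words);
  const distinctWords = new Set(allWords.map((w) => w.word)).size;
  const levelsUsed = new Set(allWords.map((w) => w.level)).size; // 1 = binary use, 3 = full gradation
  return { distinctWords, levelsUsed };
}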

4.4 Why the vocabulary is not customizable

Consumers cannot pass custom emotion words via props. The reasons:

  • Trust. The entire rating computation depends on V/A/D values being empirically validated. Arbitrary words break that guarantee silently.
  • Calibration. V/A/D values are empirical norms, not opinions. Allowing overrides is a category mistake.
  • Internationalization is principled. Future language support will use validated multilingual lexicons (NRC has them for ~100 languages), selected via a strict language prop, not free-form data.

5. Color design

5.1 The four-quadrant palette

Color mode maps the V/A position to a continuous blend across four corner colors, each associated with a quadrant of emotional space:

| Color | Quadrant | Example states |
| --- | --- | --- |
| Pink | −V, +A | anger, frustration, agitation |
| Gold | +V, +A | joy, excitement, enthusiasm |
| Green | +V, −A | calm, content, peaceful |
| Blue | −V, −A | sad, lonely, defeated |

Color choices follow cross-cultural color-emotion associations (Adams & Osgood 1973 [18]; Hupka et al. 1997): red/pink for anger, gold for joy, green for calm, blue for sadness. Cyan is deliberately avoided at the lower-left (sadness) corner because in HRV/biofeedback contexts cyan conventionally signals "high coherence" — a semantic conflict with sadness.

5.2 OKLab interpolation

Colors blend in OKLab color space (Ottosson 2020 [14]) via bilinear interpolation across the four corners. OKLab is perceptually uniform, so V/A positions blend smoothly without the hue-flip and luminance-jump artifacts that HSL interpolation produces at quadrant interiors (particularly around gold, which has a narrow luminance band in HSL).
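
A sketch of the bilinear blend, assuming the four corner colors have already been converted to OKLab [L, a, b] triples (the palette's exact corner values are not reproduced here):

// Bilinear interpolation across four OKLab corners; corner values assumed given.
type Lab = [number, number, number];

function blendOKLab(
  v: number,
  arousal: number,
  corners: { pink: Lab; gold: Lab; green: Lab; blue: Lab },
): Lab {
  const x = (v + 1) / 2;       // 0 = negative V (left), 1 = positive V (right)
  const y = (arousal + 1) / 2; // 0 = negative A (bottom), 1 = positive A (top)
  const lerp = (p: Lab, q: Lab, t: number): Lab => [
    p[0] + (q[0] - p[0]) * t,
    p[1] + (q[1] - p[1]) * t,
    p[2] + (q[2] - p[2]) * t,
  ];

  const top = lerp(corners.pink, corners.gold, x);     // +A edge: pink to gold
  const bottom = lerp(corners.blue, corners.green, x); // −A edge: blue to green
  return lerp(bottom, top, y);
}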

5.3 When color is appropriate

Color mode defaults to off in clinical and wellness contexts where affect-kit is a supporting instrument. Reasons:

  • Don't compete with the primary signal. If the host product owns a color identity (e.g., HRV biofeedback with a green/red palette), affect-kit introducing a second color system creates visual noise.
  • Don't moralize the state. A bright pink surface for negative emotions reads as "bad." Monochrome lets the user have a state without being told it's good or bad.
  • Reduces visual stimulation for activated states. A user in a high-arousal anxious state does not benefit from a vibrant red surface.

Color mode is appropriate for standalone use — journaling, publication, social sharing — where visual expressiveness is the point.

6. Longitudinal foundations

affect-kit is a logbook substrate — it captures single ratings and renders them faithfully. Longitudinal interpretation is future territory, but the design decisions made now are constrained by what the literature can actually defend.

6.1 What progress means for emotional self-rating

The fundamental error in longitudinal emotion tracking is treating affect as having a monotonic "improvement" direction. Research shows otherwise:

  • Feeling more negative emotion is not worse if it's contextually appropriate (Davis, Gross & Ochsner 2011 [12]).
  • Emotional flexibility — the ability to access a range of emotions including negative ones — predicts wellbeing better than emotional positivity (Kashdan & Rottenberg 2010 [13]).
  • Pattern interruption carries more clinical information than long-term averages: sudden shifts in emotion patterns over weeks matter more than whether average valence is slightly positive (Bolger, Davis & Rafaeli 2003 [17]).

6.2 Three operationally-defined longitudinal patterns

The affect-dynamics literature converges on three patterns that can be meaningfully tracked longitudinally, anchored on valence as the primary axis (the dimension with the most settled operational definitions and the most intuitive user interpretation):

Inertia (Kuppens et al. 2010 [19]). The autocorrelation of affect across consecutive measurements. High inertia — especially sustained negative valence — indexes a system that fails to respond to context. The clinical reading is stuck, not stable. Operationally: autocorrelation of V across a session window (a minimal sketch follows these definitions).

Resilience (Ong et al. 2006 [20]). The rate at which affect returns toward baseline after a perturbation. Faster, more reliable recovery is associated with better long-term adaptation. Chronic non-recovery is a warning sign. Operationally: V's slope back toward baseline after a negative excursion.

Range (Gruber et al. 2013 [21]; Barrett 1998). Two distinct constructs live under this heading: affective variability — how much V actually moves across a time window (the spread of V values) — and emotional granularity — how finely the user discriminates among states (the intraclass correlation across self-reported emotion ratings). A person can gain granularity without gaining variability, and vice versa.
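
A minimal sketch of the inertia computation referenced above: lag-1 autocorrelation of valence across a session window. Thresholds and window sizes are deliberately left to the consumer (§6.3 and §7):

// Lag-1 autocorrelation of valence; one way to operationalize inertia [19].
function lag1Autocorrelation(v: number[]): number {
  const n = v.length;
  if (n < 3) return NaN; // too few points to estimate
  const mean = v.reduce((s, x) => s + x, 0) / n;
  let num = 0, den = 0;
  for (let i = 0; i < n; i++) {
    den += (v[i] - mean) ** 2;
    if (i > 0) num += (v[i] - mean) * (v[i - 1] - mean);
  }
  return num / den;
}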

6.3 Why valence is the anchor for future recipes

Of the three VAD dimensions, valence is the strongest anchor for longitudinal recipes. Arousal carries less interpretive consensus over time — the same elevated arousal reading can mean excitement, anxiety, or flow depending on context. Dominance has very little experience-sampling research behind it as an outcome variable. Affect-kit captures all three dimensions in every Rating, but the first longitudinal recipe implementations will anchor on valence first.

Each future recipe must be parameterized (thresholds, windows, definitions set by the consumer), cite the minimum evidence it would defend, and avoid composite "wellness scores." The construct opposite of emotional granularity is alexithymia — not contentment. Both matter for different reasons.

6.4 HRV + affect: parallel tracks

In contexts pairing affect-kit with HRV biofeedback, the two signals must be tracked in parallel and never combined into a single score. HRV is physiological (autonomic state, objective normal ranges); affect is phenomenological (felt experience, no normal ranges). A user with low HRV who reports calm is not an error — it is interesting data. Show both as parallel timelines that share an x-axis, and let users see the divergences themselves.

7. Design position: logbook substrate, not diagnostic

affect-kit's published components capture, render, and aggregate emotion ratings. They do not interpret trajectories or make clinical claims.

Longitudinal interpretation of dimensional affect is genuinely unsettled science. Applying clinical-flavored names like "stuck," "anxious loop," or "resilience score" to regions of V/A space prescribes meaning the data alone doesn't carry. The rater UI uses face, color, and quadrant position as affordances for label selection — navigation aids — not as semantic claims about quadrants themselves. Treating those quadrants as semantically meaningful in longitudinal views would import meaning that the capture flow never asserted.

What affect-kit ships today

  • <affect-kit-rater> — single-session capture
  • <affect-kit-result> — single-rating display
  • <affect-kit-face> — standalone V/A face glyph
  • <affect-kit-compare> — two snapshots side-by-side with no diff metric, no "improvement" claim

Future recipe conditions

A longitudinal recipe can graduate from the playground to the published package when it satisfies:

  1. A defended default (specific thresholds with cited evidence)
  2. Cited papers (minimum: one peer-reviewed study supporting the measure)
  3. A story for what the widget doesn't claim

Privacy requirements

Emotion ratings are among the most sensitive categories of personal data. Any application built on affect-kit's rating objects should:

  • Store ratings with end-to-end encryption at rest
  • Provide full deletion paths, irrevocably, on demand
  • Offer a local-first option — ratings storable entirely on-device
  • Never share ratings with third parties without explicit per-event consent
  • Export in standard formats (VAD CSV, PMHC-compatible for clinical use)

Bibliography

Listed in order of significance to the package, following the source document.

  1. Mehrabian, A., & Russell, J. A. (1974). An approach to environmental psychology. MIT Press. — Original VAD/PAD model.
  2. Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39(6), 1161–1178. doi:10.1037/h0077714
  3. Mohammad, S. M. (2025). NRC Valence-Arousal-Dominance Lexicon v2.1. National Research Council Canada. saifmohammad.com/WebPages/nrc-vad.html — Primary vocabulary source.
  4. Warriner, A. B., Kuperman, V., & Brysbaert, M. (2013). Norms of valence, arousal, and dominance for 13,915 English lemmas. Behavior Research Methods, 45(4), 1191–1207. doi:10.3758/s13428-012-0314-x
  5. Ekman, P., & Friesen, W. V. (1978). Facial Action Coding System. Consulting Psychologists Press. — FACS; basis for face glyph design.
  6. Barrett, L. F. (2017). How Emotions Are Made: The Secret Life of the Brain. Houghton Mifflin Harcourt. — Constructionist theory; rationale for V/A primacy.
  7. Lindquist, K. A., et al. (2012). The brain basis of emotion: a meta-analytic review. Behavioral and Brain Sciences, 35(3), 121–143. doi:10.1017/S0140525X11000446
  8. Kashdan, T. B., Barrett, L. F., & McKnight, P. E. (2015). Unpacking emotion differentiation: transforming unpleasant experience by perceiving distinctions in negativity. Current Directions in Psychological Science, 24(1), 10–16. doi:10.1177/0963721414543222
  9. Lieberman, M. D., et al. (2007). Putting feelings into words: affect labeling disrupts amygdala activity. Psychological Science, 18(5), 421–428. doi:10.1111/j.1467-9280.2007.01916.x
  10. Cordaro, D. T., et al. (2018). Universals and cultural variations in 22 emotional expressions across five cultures. Emotion, 18(1), 75–93. doi:10.1037/emo0000301
  11. Mehling, W. E., et al. (2012). The Multidimensional Assessment of Interoceptive Awareness (MAIA). PLoS One, 7(11), e48230. doi:10.1371/journal.pone.0048230
  12. Davis, R. N., Gross, J. J., & Ochsner, K. N. (2011). Psychological distance and emotional experience: what you see is what you get. Emotion, 11(2), 438–444. doi:10.1037/a0022983
  13. Kashdan, T. B., & Rottenberg, J. (2010). Psychological flexibility as a fundamental aspect of health. Clinical Psychology Review, 30(7), 865–878. doi:10.1016/j.cpr.2010.03.001
  14. Ottosson, B. (2020). A perceptual color space for image processing. bottosson.github.io/posts/oklab — OKLab specification.
  15. Aviezer, H., et al. (2008). Angry, disgusted, or afraid? Studies on the malleability of emotion perception. Psychological Science, 19(7), 724–732. doi:10.1111/j.1467-9280.2008.02140.x
  16. Damasio, A. (1999). The Feeling of What Happens. Harcourt. — Somatic-marker hypothesis; feeling-before-naming.
  17. Bolger, N., Davis, A., & Rafaeli, E. (2003). Diary methods: capturing life as it is lived. Annual Review of Psychology, 54, 579–616. doi:10.1146/annurev.psych.54.101601.145030
  18. Adams, F. M., & Osgood, C. E. (1973). A cross-cultural study of the affective meanings of color. Journal of Cross-Cultural Psychology, 4(2), 135–156. doi:10.1177/002202217300400203
  19. Kuppens, P., et al. (2010). Emotional inertia and psychological maladjustment. Psychological Science, 21(7), 984–991. doi:10.1177/0956797610361794 — Inertia: autocorrelation of affect across time.
  20. Ong, A. D., et al. (2006). Psychological resilience, positive emotions, and successful adaptation to stress in later life. Journal of Personality and Social Psychology, 91(4), 730–749. doi:10.1037/0022-3514.91.4.730 — Resilience: rate of recovery toward baseline.
  21. Gruber, J., et al. (2013). A matter of time: temporal dynamics of emotional experience affect well-being. Emotion, 13(6), 1030–1036. doi:10.1037/a0033788 — Affective variability.