DOTE (Distributed Open Transcription Environment)

Guide for DOTE users

Glossary of key terms in DOTE

Below is a list of key terms in alphabetical order with short definitions and links to relevant help pages for more information.

2D Video

In DOTE terminology, 2D Video refers to the traditional recordings made by video cameras, which flatten the visible 3D world onto a rectangular frame of digital pixels. Nomenclature based on dimension and angle (e.g. 2D, 2½D, 3D, 180, 360) is becoming standardised in technical research on advanced recording and immersive technologies.

See also 360-degree Video.

360-degree Video

A 360-degree Video recording is made using a special 360-degree camera with at least two wide-angle lenses arranged in a fixed configuration around a central axis. Using special algorithms, a spherical view (panorama) of everything visible around the fixed point of the camera can be reconstructed and displayed using a Projection. Such video recordings are passive in that the Viewport can only be rendered from that fixed point, whether in a digital media player or in Virtual Reality. Some 360-degree cameras can also capture depth and render a stereoscopic view in all directions. 180-degree videos represent the visible field as a half-sphere. DOTE does not support 180-degree videos directly, but they can be imported if they are converted to the equirectangular format with the other half of the sphere, behind the camera, rendered black.

See Equirectangular and Spatial Audio.

Active Media

When media files are added to a Project using the Media Manager, they can be activated to appear in the current Transcript. If a media file is deactivated, it is not available for use in the current Transcript.

Alignment Symbol

An Alignment Symbol (Mondadaian conventions) is a unique symbol reserved for use in a Transcript to indicate the temporal alignment of an action in a Subtier with the HEAD of the Neighbourhood.

See Realignment.

Ambisonic

An Ambisonic audio recording is a representation of the Spatial Audio field around a fixed point. Special Ambisonic microphones with at least four microphone capsules in a fixed configuration can be used to capture multiple channels of sound that later can be rendered through headphones to reconstruct the Spatial Audio field from the 'point of view' of the hearer. In later releases, DOTE will support Ambisonic recordings in a standard format and render the Spatial Audio field to match the 360-degree Video recording with which it is associated.

Autocompletion

Autocompletion is the automatic, context-sensitive suggestion of possible character or symbol strings from which the user can select. Several aspects of a Transcript can be Autocompleted in the Editor:

  1. The initial Speaker-id column.
  2. Named Subtier Types: action, translation and gloss.
  3. Special symbols and symbol pairs.
  4. Overlaps.

Autobackup

DOTE provides a behind-the-scenes automatic backup system called Autobackup. At a user-defined time interval (set in Settings), a new snapshot of the current Transcript is taken if there are unsaved changes. The Transcript itself is not saved to disk; only a backup copy is made.

Checkpoint

A Checkpoint is a user-initiated snapshot of the current Transcript, taken if changes have been made since the last Checkpoint, regardless of whether or not the Transcript has been saved. A message can be added to each Checkpoint so that the history of changes reads as a story. Messages are usually written in the imperative mood and describe the changes made since the previous Checkpoint, e.g. "Edit lines 45-47: add stress and loudness."

Comment

A Comment is a message written into a Transcript by a transcriber that concisely describes something happening in the data that cannot easily be represented using the traditional Conventions. For example, in the Jeffersonian conventions comments are written within double parentheses: ((comment)).

Cf. Technical Comment

Conventions

In qualitative research, events in social interaction can be described in textual form using a variety of Conventions that carefully document specific phenomena of speech and action. DOTE has been implemented to adhere as closely as possible to two standardised transcription Conventions: the Jeffersonian and the Mondadaian.

CS Mode

When a media file that is Active in a Transcript is played back, a toggle option in the Editor Panel called CS Mode synchronises playback with the lines of the Transcript that match the current timecode segment on the Timeline. In DOTE, this is only possible if Sync-codes have been manually added to the Transcript.

See also Synchronised Media.
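
As a rough illustration of how CS Mode might find the lines to highlight, the sketch below assumes that Sync-codes are stored as (seconds, Editor line number) pairs sorted by time. That representation and the function name lines_for_time are hypothetical, not DOTE's internals. A binary search finds the last Sync-code at or before the current playback position; its segment runs to the line before the next Sync-code.

```python
from bisect import bisect_right

# Hypothetical representation (not DOTE's internal format): each Sync-code is
# a (Timeline position in seconds, Editor line number) pair, sorted by time.
SYNC_CODES = [(0.0, 1), (4.2, 5), (9.8, 12), (15.0, 20)]


def lines_for_time(sync_codes, t, last_line):
    """Hypothetical helper: return the (first, last) Editor lines whose
    segment contains playback time t, or None before the first Sync-code."""
    times = [time for time, _ in sync_codes]
    i = bisect_right(times, t) - 1  # index of the last Sync-code at or before t
    if i < 0:
        return None
    first = sync_codes[i][1]
    # The segment ends on the line before the next Sync-code, or at the last line.
    last = sync_codes[i + 1][1] - 1 if i + 1 < len(sync_codes) else last_line
    return first, last


print(lines_for_time(SYNC_CODES, 10.5, last_line=30))  # (12, 19)
```

Binary search keeps each lookup logarithmic in the number of Sync-codes, which matters when the playhead position is queried many times per second.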

Editor

The Editor is the Panel in which the text of the Transcript is created and edited.

Equirectangular

An Equirectangular video uses one method for squeezing a 360-degree Video panoramic recording into a standard 2D rectangular video frame. Unavoidably, the result is heavily distorted, especially towards the top and bottom of the frame, and thus difficult to 'read' in terms of the relative positioning of people and objects and of the directionality of actions and gaze, e.g. who is looking at whom. Equirectangular is also the name of one of the Projections that can be rendered from the recording into a Viewport.
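
In the equirectangular format, horizontal position in the frame is proportional to yaw (longitude) and vertical position to pitch (latitude). The sketch below shows that mapping, plus the horizontal stretch factor 1/cos(pitch), which explains why people and objects near the top and bottom of the frame look so distorted. The yaw/pitch sign conventions and the function names are assumptions for illustration; cameras and players differ.

```python
import math


def direction_to_pixel(yaw_deg, pitch_deg, width, height):
    """Hypothetical helper: map a viewing direction to (x, y) in an
    equirectangular frame. Assumed conventions: yaw in [-180, 180) with 0 at
    the frame centre; pitch in [-90, 90] with 0 at the horizon, +90 straight up."""
    x = ((yaw_deg + 180.0) / 360.0 * width) % width  # longitude -> column, wrapping at the seam
    y = (90.0 - pitch_deg) / 180.0 * height          # latitude -> row, top of frame = straight up
    return x, y


def horizontal_stretch(pitch_deg):
    """How much wider a small object appears at this latitude than at the horizon."""
    return 1.0 / math.cos(math.radians(pitch_deg))


print(direction_to_pixel(0, 0, 3840, 1920))    # (1920.0, 960.0): straight ahead is the frame centre
print(direction_to_pixel(90, 45, 3840, 1920))  # (2880.0, 480.0)
print(round(horizontal_stretch(60), 3))        # 2.0
```

At 60 degrees above or below the horizon, everything is already drawn twice as wide as it would be at eye level.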

Git

Git is a free, open source distributed version control system used by DOTE.

See Version Control and Checkpoint.

HEAD

The HEAD of a Neighbourhood is always its first line. It is either a primary speaker line or a Timing Interval line, and it provides a notional temporal structure to which the other lines in the Neighbourhood adhere.

Jump Cut

Video-cues can initiate a transition between one view of the video and another that is displayed in the Viewport of a Video Panel. This transition can be sudden -- a Jump Cut -- or Smooth.

Cf. Zoom and Pan.

Line number

There are two senses of Line Number in DOTE:

  1. The abstract Line Numbers in the Editor. The Transcript Editor assigns a temporary, unique number in ascending order to each and every line.
  2. The Line Numbers assigned when the Transcript is Exported to an RTF document. The Exporter can assign a permanent, unique Line Number in ascending order to every line, or only to some lines according to a rule, such as assigning a Line Number only to lines that have a Speaker-id, a Timing Interval and/or a Comment. Thus, the two senses are not equivalent: the same line may have different numbers in the Editor and in the exported document.

Loop

Media can be looped by selecting a portion of the Waveform on the Timeline. During Playback, the selected section plays repeatedly.
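
Looping amounts to wrapping the playback position back into the selected range. A minimal sketch, with a hypothetical function name and positions in seconds:

```python
def looped_position(start, end, elapsed):
    """Hypothetical helper: media position (seconds) after `elapsed` seconds
    of Playback that began at `start`, repeating the selection [start, end)."""
    if end <= start:
        raise ValueError("the loop must end after it starts")
    return start + elapsed % (end - start)


print(looped_position(12.0, 15.0, 7.5))  # 13.5
```

After 7.5 seconds of a 3-second loop, playback has wrapped twice and sits 1.5 seconds into the selection.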

Media Manager

The Media Manager is a tool to add media files to an individual Project. It is used to import (by copying), configure and delete media files, as well as make them Active in the current Transcript. A Project can contain multiple media files, and each Transcript in a Project can activate one or more of these media to use in the Timeline and Video Panel(s).

Neighbourhood

A Neighbourhood is a concept we developed to better capture what goes on during a relatively short, fixed period of time in a recording. Sets of lines in a Transcript can be grouped together temporally because they represent events that happen within a single, continuous duration. In practice, this duration is most often dictated by the maximum number of characters (at a specific font size) that fit on a single line of a standard page, e.g. A4/Letter. A new Neighbourhood begins where the lines of the prior Neighbourhood end, which may be before that maximum is reached. Each Neighbourhood is temporally contiguous with the one immediately before it and the one after it. Everything transcribed in a single Neighbourhood occurs within that single duration, whether it be speech, multimodal action or events happening in the scene. Relevant actions taking place sequentially or simultaneously are represented in a set of Transcript lines using a Script-based Transcription System. A Neighbourhood contains all those actions, represented using the Jeffersonian or Mondadaian conventions, sometimes in concurrent Subtiers.

See also HEAD.

Non-Sequential Overlap

Sometimes, speakers or actions may overlap in duration but are not sequentially relevant to each other. For example, two separate groups (A and B) may be talking at the same time in a room, so that the speech of one person in group A inadvertently overlaps with that of a speaker in group B. In DOTE, this is marked with matching single curly brackets -- {non-sequential overlap} -- on more than one line in a Neighbourhood.

Overlap

Overlaps between speakers are traditionally marked between matching single square brackets -- [overlap] -- on more than one line in a Neighbourhood.

See Non-Sequential Overlap.

Pan

Video-cues can initiate a transition between one view of the video and another that is displayed in the Viewport of a Video Panel. A Pan transition smoothly and linearly tracks between an initial view of the video and a target view. This applies only if the same media is selected and Smooth Transition is enabled.

See also Zoom and Jump Cut.

Play Transport

The Play Transport consists of the media controls that affect Playback.

Projection

A Projection is a rendering of a 360-degree Video recording onto a 2D rectangular Viewport by mapping geometrical perspective according to a specific algorithm. As with printed world maps and atlases -- unlike globes -- each Projection preserves certain features and distorts others.

See Equirectangular.

Project

A DOTE Project consists of all temporally synchronised Media and all Transcripts associated with one continuous event.

Proportional Timing Interval

A Proportional Timing Interval is a non-standard method of indicating durations of time in a Transcript. DOTE supports a special Unicode symbol, ◘, each occurrence of which indicates the passing of 0.1 seconds, e.g. ◘◘◘◘◘ = 0.5 seconds. It is especially useful in the Mondadaian system for marking Timing Interval tiers in the HEAD position of a Neighbourhood, instead of the more conventional non-proportional 'pause' indications, e.g. (0.1).
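
Because each ◘ stands for exactly 0.1 seconds, converting between a duration and its Proportional Timing Interval is simple arithmetic. A minimal sketch with hypothetical function names; durations are rounded to the nearest tenth of a second.

```python
# ◘ is U+25D8 INVERSE BULLET; each symbol stands for 0.1 seconds.
UNIT = "◘"


def seconds_to_proportional(seconds):
    """Hypothetical helper: 0.5 -> '◘◘◘◘◘'. Python's round() sends exact
    halves to the nearest even number of tenths."""
    return UNIT * round(seconds * 10)


def proportional_to_seconds(marks):
    """Hypothetical helper: '◘◘◘◘◘' -> 0.5 (other characters are ignored)."""
    return marks.count(UNIT) / 10


print(seconds_to_proportional(0.5))        # ◘◘◘◘◘
print(proportional_to_seconds(UNIT * 12))  # 1.2
```

Unlike the pause notation (1.2), the length of a run of ◘ symbols on the page is itself proportional to the duration, which is what makes it useful in a HEAD Timing Interval tier.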

Realignment

In qualitative research, the visual layout of transcripts is semantically important, especially for Overlaps and Subtiers. Built into DOTE is a sophisticated parser that tracks vertical alignment within and across Neighbourhoods in both the Jeffersonian and Mondadaian systems. When something falls out of alignment, DOTE can flag it and suggest how to automatically realign everything in the Neighbourhood.
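
Realignment depends on character columns lining up in a monospaced font. The simplified sketch below checks only one of the things DOTE's parser tracks, the vertical alignment of overlap-onset brackets within a single Neighbourhood, and repairs it by padding with spaces. The function names and the padding strategy are illustrative assumptions, not DOTE's algorithm.

```python
def onset_columns(lines, bracket="["):
    """Hypothetical helper: map line index -> column of the first `bracket`."""
    return {i: line.index(bracket) for i, line in enumerate(lines) if bracket in line}


def is_aligned(lines):
    """True if every overlap onset in the Neighbourhood starts in the same column."""
    return len(set(onset_columns(lines).values())) <= 1


def realign(lines):
    """Pad lines with spaces before their '[' so all onsets share the right-most column."""
    cols = onset_columns(lines)
    target = max(cols.values(), default=0)
    fixed = list(lines)
    for i, col in cols.items():
        fixed[i] = lines[i][:col] + " " * (target - col) + lines[i][col:]
    return fixed


neighbourhood = [
    "A:  so I said [no way",
    "B:         [really?",
]
print(is_aligned(neighbourhood))  # False
for line in realign(neighbourhood):
    print(line)
```

Padding only ever moves brackets to the right, so no text before an onset is lost; a fuller tool would also need to handle closing brackets, several overlaps per line and the Neighbourhood's maximum line length.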

Recamming

Recamming is a term from computer gaming that describes using the virtual cameras in a game to create a new video narrative from the game assets. DOTE supports recamming video recordings with Video-cues, both while transcribing and when presenting Transcripts.

See Zoom, Pan, Smooth Transition and Jump Cut. See also Machinima.

Regular Expression

A Regular Expression (regex) is a pattern language of wildcards and search operators for structuring a search in Unicode text, such as a Transcript.
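
For example, Python's built-in re module can search a Transcript for Jeffersonian ((comments)) or for all lines by one speaker. The sample Transcript, the layout of a Speaker-id followed by a colon, and the function names are assumptions for illustration; DOTE's own search syntax may differ.

```python
import re

TRANSCRIPT = """A:  well (0.8) I don't know
B:  ((laughs)) you never do
A:  (1.5) ((looks at B)) maybe"""

# ((comment)): text between double parentheses, containing no ')' itself.
COMMENT = re.compile(r"\(\(([^)]*)\)\)")


def find_comments(text):
    """Hypothetical helper: return the text of every ((comment))."""
    return COMMENT.findall(text)


def lines_by_speaker(text, speaker_id):
    """Hypothetical helper: return the text of every line by `speaker_id`."""
    # ^ and $ match at every line boundary because of re.MULTILINE.
    pattern = rf"^{re.escape(speaker_id)}:[ \t]*(.*)$"
    return re.findall(pattern, text, flags=re.MULTILINE)


print(find_comments(TRANSCRIPT))          # ['laughs', 'looks at B']
print(lines_by_speaker(TRANSCRIPT, "B"))  # ['((laughs)) you never do']
```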

Rich Text Format (RTF)

The Rich Text Format (RTF) is a common, lightweight document file format for representing simple text formatting beyond plain text. It is readable by most word processors.

Script-based Transcription System

A Script-based Transcription System denotes a set of conventions and ways of writing that assume that speech (and action) can be written in chunks (Neighbourhoods) of dialogue, much like a play or film script. The assumption in DOTE is that the script is read from left to right and from top to bottom. An alternative is a score-based transcription system, which represents all speech and action as running in parallel in one long, continuous score of subtiers that notionally continues indefinitely, much like a musical score. This alternative is partially implemented in DOTE through Subtier Types within a Neighbourhood. In future releases, an optional, dedicated score-based micro-tool will be interchangeable with the script-based system.

Smooth Transition

Video-cues can initiate a transition between one view of the video and another that is displayed in the Viewport of a Video Panel. A Smooth Transition smoothly and linearly transforms an initial view of the video into a target view; otherwise the transition is sudden (a Jump Cut).

See also Zoom, Pan and Jump Cut.
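
A Smooth Transition can be modelled as linear interpolation between two views over the duration of the transition: a Pan changes the viewing direction, a Zoom changes the field of view, and a zero-length transition is a Jump Cut. The sketch below assumes a 360-degree Viewport described by yaw, pitch and field of view in degrees; the View type, the parameter names and the choice to pan the short way round are hypothetical, not DOTE's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    """Hypothetical description of a 360-degree Viewport, in degrees."""
    yaw: float    # horizontal viewing direction
    pitch: float  # vertical viewing direction
    fov: float    # field of view; smaller values are more zoomed in


def lerp(a, b, t):
    return a + (b - a) * t


def wrap_yaw(yaw):
    """Normalise a yaw angle into [-180, 180)."""
    return (yaw + 180.0) % 360.0 - 180.0


def view_at(start, target, cue_time, duration, now):
    """Hypothetical helper: view shown at time `now` for a transition
    triggered at `cue_time`. duration <= 0 gives a sudden Jump Cut."""
    if now < cue_time:
        return start
    if duration <= 0 or now >= cue_time + duration:
        return target
    t = (now - cue_time) / duration               # linear progress from 0 to 1
    yaw_delta = wrap_yaw(target.yaw - start.yaw)  # pan the short way round
    return View(
        yaw=wrap_yaw(start.yaw + yaw_delta * t),
        pitch=lerp(start.pitch, target.pitch, t),
        fov=lerp(start.fov, target.fov, t),
    )


start = View(yaw=170.0, pitch=0.0, fov=90.0)
target = View(yaw=-170.0, pitch=10.0, fov=60.0)
print(view_at(start, target, cue_time=5.0, duration=2.0, now=6.0))
# View(yaw=-180.0, pitch=5.0, fov=75.0)
```

Halfway through, the view has turned 10 degrees through the back of the sphere (yaw 180 and -180 are the same direction) rather than 170 degrees the long way round, while zooming in halfway from 90 to 60 degrees.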

Spatial Audio

Spatial Audio refers to recordings of sound that represent, and can be used to reproduce, the experience of sound spatialised around the hearer, closely matching how it would be perceived by someone present at the recorded event at the location of the microphone. Stereo recordings (2.0) are not really Spatial Audio since they only reproduce the position of sound sources along one axis (left/right).

See Ambisonic.

Speaker-id

Every line in the Transcript that has a speaker, including the translation and interlinear gloss Subtier Types, requires a Speaker-id in the initial column that uniquely identifies that speaker. Likewise, every action Subtier requires a unique Speaker-id to identify the participant (actant) performing the action in question. A Speaker-id can be from 1 to 20 characters long. It should contain only letters and numbers (alphanumeric) and avoid special symbols and punctuation.
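
The length and character rules above are easy to check mechanically. A minimal sketch with a hypothetical function name; accepting non-ASCII letters such as 'ø' (which Python's str.isalnum allows) is an assumption, since the rule only says "letters and numbers".

```python
def check_speaker_id(speaker_id):
    """Hypothetical validator: return a list of problems (empty if valid)."""
    problems = []
    if not 1 <= len(speaker_id) <= 20:
        problems.append("must be 1-20 characters long")
    # str.isalnum() accepts Unicode letters and digits (e.g. 'ø'), which is an
    # assumption about what counts as a "letter" here.
    if speaker_id and not speaker_id.isalnum():
        problems.append("should contain only letters and numbers")
    return problems


for candidate in ["ANNA", "Spk2", "A.B", ""]:
    print(repr(candidate), check_speaker_id(candidate) or "OK")
```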

Subtier Type

In a modification to the Mondadaian system, subtiers are structured into Types and given names. The named Subtier Types are assigned to specific speakers or participants, and a unique symbol is assigned to each. There are four basic Subtier Types: .translation, @gloss, /action, #category.

Subtitles

DOTE can generate and export Subtitles derived from the Transcript in the Editor. The Subtitles are written to a file in SRT format, which most external media players can overlay on the video during playback.
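
An SRT file is plain text made of numbered blocks, each holding a start and end time in HH:MM:SS,mmm form separated by -->, then the subtitle text; blocks are separated by a blank line. The sketch below writes such blocks from (start, end, text) tuples. How DOTE derives those timings from the Transcript is not described here, so the input format and function names are hypothetical.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Hypothetical writer: cues is an iterable of (start_s, end_s, text)."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        # Each block: sequence number, time range, then the subtitle text.
        blocks.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)  # the extra newline leaves a blank line between blocks


cues = [(1.5, 3.2, "A: well (0.8) I don't know"), (3.2, 5.0, "B: you never do")]
print(to_srt(cues))
```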

Sync-code

To anchor the Transcript text to the media from which it is derived, Sync-codes can be created that tie a timecode on the Timeline to a specific line in the Transcript.

See also CS Mode.

Synchronised Media

DOTE assumes that in a specific Project, all imported Media are already synchronised to the same start point and end point in time in relation to the original recorded event. This must be undertaken externally in a video editor prior to importing.

Technical Comment

A Technical Comment is a comment on the structure of the Transcript itself. Among transcription conventions it is specific to DOTE, but the same syntax is used in many programming languages. The onset of the comment is marked by //, and everything after that on the same line is not part of the Transcript. Thus, a transcriber can annotate the Transcript line by line with brief meta-messages about the process of editing the Transcript or about interesting phenomena. Consecutive Technical Comments at the beginning of the Transcript are treated as a simple form of metadata. Technical Comments can be hidden when exporting the Transcript to RTF or Subtitles.

Cf. Comment.
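
The rule that everything after // on a line is not part of the Transcript can be sketched as below, together with collecting the consecutive Technical Comments at the top of the file as metadata. The function names are hypothetical, and this naive version would also cut a line at any // that belongs to the transcribed text itself.

```python
MARKER = "//"


def split_technical_comment(line):
    """Hypothetical helper: split a line into (transcript text, comment or None)."""
    text, found, comment = line.partition(MARKER)
    return text.rstrip(), (comment.strip() if found else None)


def strip_technical_comments(lines):
    """Hypothetical helper: return (metadata, body), i.e. the leading Technical
    Comments and the Transcript lines with every Technical Comment removed."""
    metadata, body = [], []
    in_header = True
    for line in lines:
        text, comment = split_technical_comment(line)
        if comment is not None and not text:
            if in_header:
                metadata.append(comment)
            continue  # comment-only lines vanish from the exported Transcript
        in_header = False
        body.append(text)
    return metadata, body


lines = [
    "// project: kitchen study",
    "// transcriber: JD",
    "A:  so I said [no way   // check onset",
    "// re-listen from 0:01:12.0",
    "B:         [really?",
]
metadata, body = strip_technical_comments(lines)
print(metadata)
print(body)
```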

Tier

A Tier in a Transcript is a line in a Neighbourhood that is dedicated to representing one specific phenomenon or event, such as the speech, eye gaze or hand movements of one speaker.

See Subtier Type.

Timecode (or Timestamp)

A Timecode is a temporal reference to a specific moment in a playable media file, starting at 0:00:00.0 [hour:min:sec.tenth_of_sec].

See Sync-code and Video-cue that both anchor to Timecodes.
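
Converting between this Timecode format and seconds is straightforward. A minimal sketch with hypothetical function names; it assumes exactly one decimal digit (tenths of a second) and unpadded hours, as in the format above.

```python
import re

# H:MM:SS.t, e.g. 0:01:12.5 (hours unpadded, one digit of tenths).
TIMECODE = re.compile(r"(\d+):([0-5]\d):([0-5]\d)\.(\d)")


def parse_timecode(text):
    """Hypothetical helper: '0:01:12.5' -> 72.5 seconds."""
    match = TIMECODE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a Timecode: {text!r}")
    hours, minutes, seconds, tenths = (int(group) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds + tenths / 10


def format_timecode(seconds):
    """Hypothetical helper: 72.5 -> '0:01:12.5', rounded to the nearest tenth."""
    tenths_total = round(seconds * 10)
    hours, rest = divmod(tenths_total, 36_000)
    minutes, rest = divmod(rest, 600)
    secs, tenths = divmod(rest, 10)
    return f"{hours}:{minutes:02d}:{secs:02d}.{tenths}"


print(parse_timecode("0:01:12.5"))  # 72.5
print(format_timecode(3723.4))      # 1:02:03.4
```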

Timeline

The Timeline is a linear graphical representation of elapsed time in a Project.

See Waveform and Timecode.

Timing interval

Intervals of time can be marked explicitly in a transcription system. In the Jeffersonian system, this is commonly represented as a pause: a duration in seconds within single parentheses, e.g. (1.5) = 1.5 seconds. In the multimodal Mondadaian system, the same pause notation is used, not only in a speaker tier but also in a dedicated tier in the HEAD position that contains one or more pause indications interspersed with Alignment Symbols. The latter is primarily used to represent the timing of non-verbal actions. In DOTE, this is called a primary Timing Interval tier, since it is dedicated to timing intervals.

See also Proportional Timing Interval.
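
As an illustration, timed pauses in the Timing interval notation can be extracted from a line with a regular expression and totalled. The sketch is a simplification under stated assumptions: it ignores untimed micropauses such as (.), skips ((double-parenthesis)) comments, and does not interpret Alignment Symbols; the function names are hypothetical.

```python
import re

# (1.5): digits with an optional decimal part inside single parentheses.
# The look-behind/look-ahead reject ((...)) so comments are not read as pauses.
PAUSE = re.compile(r"(?<!\()\((\d+(?:\.\d+)?)\)(?!\))")


def pauses(line):
    """Hypothetical helper: return the timed pauses in a line, in seconds."""
    return [float(value) for value in PAUSE.findall(line)]


def total_pause(line):
    """Hypothetical helper: total pause time, rounded to tenths to hide float noise."""
    return round(sum(pauses(line)), 1)


line = "A:  well (0.8) I ((coughs)) (1.5) don't know (.)"
print(pauses(line))       # [0.8, 1.5]
print(total_pause(line))  # 2.3
```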

Transcript

A Transcript is a specific textual object in DOTE that is created and edited in the Editor.

Transcript Heuristics

DOTE uses a sophisticated parser to interpret the structure of a Transcript. On that basis, descriptive Errors and Warnings can be given to help the transcriber conform their Transcript to the standard Conventions as well as to the special formatting required by DOTE. Moreover, DOTE makes heuristic suggestions to Realign Neighbourhoods.

Unicode

Unicode is a global standard for representing the characters of the world's languages, as well as symbols and emoji.

User Interface (UI)

The User Interface is the visual (and aural) presentation of the computer application to the user.

Version Control

Version Control is a broad term encompassing all computer systems that track changes in a set of digital documents or media files.

See Git, Autobackup and Checkpoint.

Video-cue

A Video-cue is a unique point on the Timeline that triggers a change in the Viewport of a Video Panel. Video-cues automate the presentation of media during Playback in a more cinematic fashion. A limited set of virtual camera movements for Recamming is supported.

See also Zoom, Pan, Smooth Transition and Jump Cut.

Video Panel

A Video Panel is a unique panel in the DOTE UI dedicated to displaying a selected video media file according to the user's wishes. Both 2D and 360-degree video can be displayed in a Video Panel. Moreover, the video can be zoomed and panned, and transitions can be sudden or smooth when using Video-cues.

See also Zoom, Pan, Smooth Transition and Jump Cut.

Viewport

The Viewport is the rectangular portion of the video that is currently visible in the frame of the Video Panel.

Warnings and Errors

DOTE can flag Warnings and Errors in the general use of DOTE, as well as in the Transcript in the Editor (see Transcript Heuristics).

Waveform

A Waveform representing the amplitude of a selected audio track over time can be displayed in the Timeline. When media is imported into a Project using the Media Manager, DOTE generates the Waveform by sampling the audio track and converting its amplitude into a graphical representation.
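
A common way to draw a Waveform is to split the decoded samples into one bucket per pixel column and keep each bucket's minimum and maximum amplitude. The sketch below assumes the samples are already decoded to floats between -1.0 and 1.0 (decoding the audio track is out of scope); the function name is hypothetical and DOTE's own method may differ.

```python
import math


def waveform_peaks(samples, buckets):
    """Hypothetical helper: reduce decoded samples (floats in -1.0..1.0) to at
    most `buckets` (min, max) pairs, e.g. one per pixel column of the Timeline."""
    if buckets <= 0 or not samples:
        return []
    size = math.ceil(len(samples) / buckets)  # samples per bucket
    return [
        (min(samples[i:i + size]), max(samples[i:i + size]))
        for i in range(0, len(samples), size)
    ]


# One second of a 440 Hz tone sampled at 8 kHz, drawn 800 columns wide.
samples = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
peaks = waveform_peaks(samples, 800)
print(len(peaks))  # 800
```

Keeping both the minimum and the maximum, rather than an average, preserves short loud transients that would otherwise vanish when thousands of samples share one pixel.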

Zoom

Video-cues can initiate a transition between one view of the video and another that is displayed in the Viewport of a Video Panel. A Zoom transition smoothly and linearly tracks between an initial view of the video and a target view by zooming in or out. This applies only if the same media is selected and Smooth Transition is enabled.

See also Pan and Jump Cut.