
Data Calculation and Summarization

Turning raw session tallies into rate, mean duration, percentage, and count, with worked math and when each summary measure actually fits.

Topic 6 of 8


Summary: Counting and timing during a session gives you raw data. Summarizing turns that raw data into one number your BCBA can graph and read at a glance: a rate, a mean duration, a percentage, or a count. This page walks through the four you’ll use most, with the actual arithmetic an RBT does between clients, plus when each one is the honest choice and when it quietly lies. The math is easy. Picking the right summary is the part the exam cares about.

You spend the session tallying clicks and timing episodes. Good. But nobody graphs a column of tally marks. At some point that raw data has to become a single number per session, and which number you report changes the story the graph tells. Report a count when you should have reported a rate and you can make a behavior look like it’s exploding when it’s flat. So this isn’t busywork. Summarizing is where measurement either stays honest or goes sideways.

Most of the time your data sheet or your clinic’s app does the arithmetic for you. You still have to understand it. You need to catch a number that looks wrong, explain a graph to a parent, and answer the exam questions that hand you raw data and ask for the summary. So we’ll do the math by hand.

The four summary measures you’ll actually use

There are really four you reach for over and over: count, rate, mean duration, and percentage. Each one compresses a pile of raw observations into a single value. The trick is that each one assumes something about your data, and when that assumption breaks, the summary misleads.

Here’s the shape of each before we dig in:

Summary        What it answers                         The math
Count          How many times did it happen            Just the total tally
Rate           How often per unit of time              Count ÷ time
Mean duration  How long does an episode typically run  Total duration ÷ number of episodes
Percentage     What proportion met the criterion       (Part ÷ whole) × 100

Work through them one at a time.

Count: the raw total

Count is the simplest summary, and it barely counts as a summary at all. You tally every occurrence of the behavior during the observation and the total is your number. Twelve mands, twelve. Three instances of aggression, three.

Count is honest only when your observation windows are the same length every time. If Monday’s session ran 30 minutes and Tuesday’s ran 60, a count of 12 on Monday and 12 on Tuesday does not mean the behavior held steady. It means the behavior happened half as often on Tuesday (12 ÷ 60 = 0.2 per minute, against 12 ÷ 30 = 0.4 on Monday), because the same number of responses was spread over twice the time. The count hides that completely.

So reach for a plain count when:

  • The behavior has a clear start and stop, so you can tell one occurrence from the next
  • Every observation period is the same length
  • You want the rawest, most direct number

The moment your session lengths vary, stop reporting count and convert to rate. That conversion is the next measure.

On the exam: A bare count is only comparable across sessions of equal duration. If a question gives you different session lengths and asks how to compare them fairly, the answer is rate, not count. They are testing whether you know a raw total can deceive when time isn’t held constant.
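
To see the Monday and Tuesday comparison in numbers, here’s a minimal Python sketch. The function name and the per-minute unit are choices made for this illustration, not something your clinic’s app is known to use.

```python
def rate_per_minute(count: int, minutes: float) -> float:
    """Responses per minute: the count divided by the observation time."""
    if minutes <= 0:
        raise ValueError("observation time must be positive")
    return count / minutes


# Same count on both days, but Tuesday's session was twice as long.
monday = rate_per_minute(12, 30)
tuesday = rate_per_minute(12, 60)

print(f"Monday:  12 in 30 min = {monday:.1f} per minute")
print(f"Tuesday: 12 in 60 min = {tuesday:.1f} per minute")
```

The counts match, the rates don’t. That gap is exactly what a bare count hides.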

Rate: count per unit of time

Rate takes your count and divides it by how long you observed. The number that comes out carries its own context, like “0.4 per minute,” so it stays fair even when your windows are different lengths.

Rate = count ÷ observation time

Worked example, the classic one:

  • 12 responses in 30 minutes
  • 12 ÷ 30 = 0.4 per minute

So you’d report 0.4 responses per minute. A few more so the arithmetic sticks:

  • 15 instances in 30 minutes = 15 ÷ 30 = 0.5 per minute
  • 45 words read in 5 minutes = 45 ÷ 5 = 9 words per minute
  • 24 instances in 2 hours = 24 ÷ 2 = 12 per hour

Two rules that keep rate clean. First, pick one time unit and stay with it. Per minute and per hour are both fine, but mixing them inside one data set turns your graph into nonsense. Second, always keep the raw count and the duration behind the rate, not just the quotient. If you log “0.4 per minute” with nothing else, nobody, including future you, can ever check the math.
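
Both rules can be sketched in a few lines of Python. This is only an illustration: the RateRecord class and its field names are hypothetical, and it assumes you’ve settled on minutes as your single time unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateRecord:
    """Keeps the raw count and observation time next to the computed rate."""
    count: int
    minutes: float  # always minutes, so every rate shares one unit

    @property
    def per_minute(self) -> float:
        if self.minutes <= 0:
            raise ValueError("observation time must be positive")
        return self.count / self.minutes

    def describe(self) -> str:
        # Report the inputs alongside the quotient so the math can be checked.
        return f"{self.count} in {self.minutes:g} min = {self.per_minute:g} per minute"


records = [
    RateRecord(12, 30),
    RateRecord(45, 5),
    RateRecord(24, 2 * 60),  # 2 hours converted to minutes before dividing
]
for record in records:
    print(record.describe())
```

Notice the third record: once the 2 hours become 120 minutes, the “12 per hour” example reads as 0.2 per minute and sits on the same scale as the others.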

Rate is also the right call whenever speed is the point, not just whether the behavior happened. Two learners each get 20 math facts correct. One finished in 2 minutes (10 per minute), the other took 10 minutes (2 per minute). The count calls them identical. The rate shows one is fluent and the other is accurate but slow. (Frequency and rate get the full treatment on their own page; here the point is that rate is the summary you compute from a count plus a time.)

Mean duration: average length of an episode

When the behavior is something you time rather than count, your raw data is a stack of episode lengths. Mean duration boils that stack down to one number: how long a typical episode runs.

Mean duration = total duration of all episodes ÷ number of episodes

Say a learner had three tantrums in a session, lasting 4 minutes, 8 minutes, and 6 minutes.

  • Total duration = 4 + 8 + 6 = 18 minutes
  • Number of episodes = 3
  • Mean duration = 18 ÷ 3 = 6 minutes per episode

So you’d report a mean tantrum duration of 6 minutes. This is the summary the BCBA wants when the clinical question is “how long does it last when it happens.” A behavior-reduction program built around shortening episodes lives on this number, and you watch it fall session over session.

Don’t confuse mean duration with total duration. Total duration is the whole 18 minutes, which answers “how much of the session was spent tantruming.” Mean duration is the 6, which answers “how long was a typical one.” Same raw timings, two different summaries, two different questions.

Common mistake: Dividing total duration by the session length instead of by the number of episodes. That gives you a proportion of the session, not an average episode. If you want the average length of one episode, the denominator is the number of episodes, full stop. Mixing those two up is one of the most common math slips supervisors catch on data sheets.
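
Here’s a small Python sketch that puts the three duration numbers side by side so the denominators are easy to compare. The function name is made up for illustration, and the 60-minute session length is an assumption, since the tantrum example doesn’t say how long the session ran.

```python
def summarize_durations(episode_minutes: list[float], session_minutes: float) -> dict[str, float]:
    """Return total duration, mean duration, and percent of session."""
    if not episode_minutes:
        raise ValueError("no episodes recorded, so there is no mean to compute")
    if session_minutes <= 0:
        raise ValueError("session length must be positive")
    total = sum(episode_minutes)
    return {
        "total_duration": total,                                 # how much time overall
        "mean_duration": total / len(episode_minutes),           # divide by EPISODES
        "percent_of_session": total * 100 / session_minutes,     # divide by SESSION length
    }


tantrums = [4, 8, 6]  # minutes, from the example above
summary = summarize_durations(tantrums, session_minutes=60)  # 60 is assumed

for name, value in summary.items():
    print(f"{name}: {value:g}")
```

Same three timings, three different answers. Only the one divided by the number of episodes is the mean duration.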

Percentage: a part out of a whole

Percentage is the workhorse of skill-acquisition data. Percent correct, percent of trials independent, percent of intervals with the behavior. Whenever you want to express one number as a proportion of another, you compute a percentage.

Percentage = (part ÷ whole) × 100

The “part” is the number of times the thing happened, and the “whole” is the number of chances it had to happen. Worked examples:

  • 8 correct responses out of 10 trials = (8 ÷ 10) × 100 = 80% correct
  • 6 independent responses out of 15 opportunities = (6 ÷ 15) × 100 = 40% independent
  • Behavior scored in 9 of 20 intervals = (9 ÷ 20) × 100 = 45% of intervals

The whole point of percentage is that it self-corrects for an uneven denominator. If you ran 10 trials one day and 20 the next, raw “8 correct” versus “12 correct” isn’t comparable. But 80% versus 60% is, instantly. That’s why mastery criteria are almost always written as a percentage across consecutive sessions: it stays fair no matter how many trials you got through.

One caution. A percentage divorced from its denominator can fool you. “100% correct” sounds great until you learn it was 1 out of 1. A single trial that happened to go right is not mastery, it’s a coin flip. Always keep the raw numbers next to the percentage so a tiny denominator can’t masquerade as strong data.
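
One way to keep the denominator attached is to never print the percentage without it. The sketch below does that in Python. The function names are hypothetical, and the min_whole cutoff of 5 is an arbitrary number picked for illustration, not a clinical standard.

```python
def percent(part: int, whole: int) -> float:
    """(part ÷ whole) × 100, with basic input checks."""
    if whole <= 0:
        raise ValueError("the whole (number of opportunities) must be positive")
    if not 0 <= part <= whole:
        raise ValueError("the part must be between 0 and the whole")
    return part * 100 / whole


def report(part: int, whole: int, label: str, min_whole: int = 5) -> str:
    """Format a percentage together with its raw numbers."""
    text = f"{percent(part, whole):g}% {label} ({part}/{whole})"
    if whole < min_whole:  # illustrative cutoff, an assumption
        text += " [small denominator]"
    return text


print(report(8, 10, "correct"))
print(report(6, 15, "independent"))
print(report(1, 1, "correct"))
```

The last line still says 100%, but the “(1/1)” right next to it makes sure nobody mistakes it for strong data.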

On the exam: Percentage of intervals comes from interval recording (partial interval, whole interval, momentary time sampling), and the denominator is the number of intervals, not the number of behaviors. A question might say a behavior occurred in 9 of 20 intervals and ask for the summary. That’s 45% of intervals. Don’t try to turn it into a count or a rate. Interval data summarizes as a percentage of intervals.
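
A quick Python sketch of the interval case, to show where the denominator comes from. The pattern of scored intervals here is invented so that it adds up to 9 of 20. Only the totals come from the example.

```python
# One entry per interval: True if the behavior was scored in that interval.
# The order is made up; what matters is 9 scored out of 20 intervals.
intervals = [True] * 9 + [False] * 11

scored = sum(intervals)          # number of intervals with the behavior
total_intervals = len(intervals) # the denominator is intervals, not behaviors

percent_of_intervals = scored * 100 / total_intervals
print(f"{scored} of {total_intervals} intervals = {percent_of_intervals:g}% of intervals")
```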

A full session, start to finish

Let me walk you through one real session so you see all of this in one place.

You’re running a 40-minute session with a 7-year-old, Daniel. Three things are on the program. First, a DTT block teaching him to tact common objects: you run 12 trials and record whether each response is correct. Second, you tally calling out (talking without raising his hand), which the team wants to bring down. Third, you time each episode of a related behavior, leaving the chair, because the BCBA wants its duration.

By the end of the session your raw data looks like this:

  • Tact trials: correct, correct, incorrect, correct, correct, correct, incorrect, correct, correct, correct, incorrect, correct (9 correct out of 12)
  • Calling out: tallied 8 times across the 40 minutes
  • Out-of-seat episodes: three of them, lasting 2 minutes, 5 minutes, and 2 minutes

Now you summarize each one with the right measure:

  • Tact accuracy is a percentage, because it’s correct-out-of-opportunities: (9 ÷ 12) × 100 = 75% correct.
  • Calling out is a rate, because what matters is how often it happens and you’ll want it comparable to sessions of other lengths: 8 ÷ 40 = 0.2 per minute.
  • Out-of-seat is a mean duration, because the clinical question is how long each episode runs: (2 + 5 + 2) ÷ 3 = 9 ÷ 3 = 3 minutes per episode.

Three behaviors, three different summaries, all from one session. Notice nobody summarized everything the same way. The measure followed the question each program was asking. That’s the whole skill.
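
If you want to check Daniel’s numbers yourself, here’s the same session as a short Python sketch. The variable names are made up for illustration. The data is copied straight from the session above.

```python
session_minutes = 40

# Raw data from the session. True means a correct tact response.
tact_trials = [True, True, False, True, True, True,
               False, True, True, True, False, True]
calling_out_count = 8
out_of_seat_minutes = [2, 5, 2]

# Percentage: correct responses out of trials run.
tact_percent = sum(tact_trials) * 100 / len(tact_trials)

# Rate: count divided by observation time.
calling_out_rate = calling_out_count / session_minutes

# Mean duration: total duration divided by the number of episodes.
out_of_seat_mean = sum(out_of_seat_minutes) / len(out_of_seat_minutes)

print(f"Tact accuracy: {sum(tact_trials)}/{len(tact_trials)} = {tact_percent:g}% correct")
print(f"Calling out: {calling_out_count} in {session_minutes} min = {calling_out_rate:g} per minute")
print(f"Out of seat: {sum(out_of_seat_minutes)} min over {len(out_of_seat_minutes)} episodes "
      f"= {out_of_seat_mean:g} min per episode")
```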

Choosing the right summary

This is the part the exam keeps circling back to, so make it automatic. Match the summary to what the program is trying to know:

  • The clinical question is “how often” and sessions are equal length → count
  • The clinical question is “how often” and sessions vary, or speed matters → rate
  • The clinical question is “how long does an episode last” → mean duration
  • The clinical question is “how much of the session was spent on it” → total duration (or duration as a percentage of the session)
  • The clinical question is “what proportion was correct, independent, or scored” → percentage

A quick gut check before you write a number down: ask what would make this comparable to last week’s session. If session lengths differ, you need rate or a percentage, not a raw count. If you’re averaging timed episodes, you need mean duration with the episode count as your denominator. The summary should survive a change in how long you happened to observe.
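
The matching rules above are simple enough to write down as a lookup. This Python sketch is just a memory aid: the question labels are hypothetical strings chosen for the example, and your BCBA’s program, not a function, decides which measure you actually report.

```python
def choose_summary(question: str, equal_sessions: bool = True, speed_matters: bool = False) -> str:
    """Map the clinical question to a summary measure, following the list above."""
    if question == "how often":
        # Count is only fair when sessions are equal length and speed isn't the point.
        if equal_sessions and not speed_matters:
            return "count"
        return "rate"
    if question == "how long per episode":
        return "mean duration"
    if question == "how much of the session":
        return "total duration"
    if question == "what proportion":
        return "percentage"
    raise ValueError(f"unrecognized question: {question!r}")


print(choose_summary("how often"))
print(choose_summary("how often", equal_sessions=False))
print(choose_summary("how long per episode"))
```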

Where the math goes wrong

Most summarization errors aren’t fancy. They’re small and they’re predictable:

  • Wrong denominator. Dividing duration by session length when you wanted average episode length, or dividing correct trials by the number of behaviors instead of the number of trials. The denominator is half the formula. Get it wrong and the answer is garbage that still looks plausible.
  • Mixed units. Minutes in one cell, hours in the next, and now your rates don’t compare and your graph is fiction.
  • Dropping the raw numbers. Logging only the summary. A 100% that was 1 of 1, or a rate with no count or time behind it, can’t be checked or trusted.
  • Mental math between clients. Division done in your head while you’re packing up is where slips happen. If the app computes it, let it. If you compute it, write down the inputs so the work is visible.
  • Summarizing the wrong dimension. Reporting a count for a continuous behavior, or a duration for something you can only sensibly count. The summary has to match how the behavior was measured in the first place.

When a summary looks off, go back to the raw data before you report it. A rate of 40 per minute or a tantrum averaging 90 minutes is usually a math error or a units error, not a real finding. Trust the raw tallies and timings, recompute, and only then log the number.
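
That “does this look off” check can be sketched in Python too. Treat this as a rough illustration: the function name is hypothetical, and the max_rate_per_minute ceiling of 10 is an arbitrary assumption you’d replace with whatever is plausible for the behavior you’re tracking. The duration check is plain logic: timed episodes can’t add up to more than the session.

```python
from typing import Optional, Sequence


def sanity_flags(
    session_minutes: float,
    count: Optional[int] = None,
    episode_minutes: Optional[Sequence[float]] = None,
    max_rate_per_minute: float = 10.0,  # assumed ceiling, adjust per behavior
) -> list[str]:
    """Return warnings for summaries that are probably math or units errors."""
    if session_minutes <= 0:
        raise ValueError("session length must be positive")
    flags = []
    if count is not None:
        rate = count / session_minutes
        if rate > max_rate_per_minute:
            flags.append(f"rate of {rate:g} per minute looks too high, recheck count and units")
    if episode_minutes:
        total = sum(episode_minutes)
        if total > session_minutes:
            flags.append(f"episodes total {total:g} min, longer than the {session_minutes:g} min session")
    return flags


print(sanity_flags(40, count=8, episode_minutes=[2, 5, 2]))  # plausible session
print(sanity_flags(40, count=1600))                          # 40 per minute
print(sanity_flags(40, episode_minutes=[90]))                # 90-minute "episode"
```

A flag doesn’t mean the number is wrong. It means you go back to the raw tallies and timings before you log it.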

The short version

  • Raw data has to become one number per session before it can be graphed. That number is your summary.
  • Count is the raw total. It’s only fair across sessions of equal length.
  • Rate is count ÷ time. Use it when sessions vary in length or when speed is the goal. 12 responses in 30 minutes = 0.4 per minute.
  • Mean duration is total duration ÷ number of episodes. Use it for “how long does an episode run.” Watch the denominator.
  • Percentage is (part ÷ whole) × 100. Use it for correct-out-of-opportunities and for percent of intervals. Keep the raw numbers so a tiny denominator can’t fool you.
  • Pick the summary by the question the program is asking, not out of habit.
  • Always keep the inputs behind every summary so the math can be checked.