David Cruz · ABA Best Practices · 15 min read
Mastery Criteria in ABA - Stop Guessing, Start Knowing
You've been tracking data for weeks. The graph looks promising. But has the learner actually hit 80% across 3 consecutive sessions? Here's how to stop eyeballing mastery and let your data answer definitively.

Key Takeaways
Mastery criteria turn “I think they got it” into “the data confirms it.” Define the metric, threshold, and consecutive periods upfront. Then collect data as usual - TallyFlex evaluates progress automatically and tells you the moment criteria are met. No more counting backwards through data sheets.
You’ve been running discrete trials for three weeks. The graph trends upward. Your gut says the learner has it. But when your BCBA asks “Did they hit 80% across 3 consecutive sessions?” - you’re not entirely sure.
So you flip through your data sheets. You count backwards. Session 12… 82%. Session 11… 76%. Wait, does that break the streak? Start over. Session 10…
This isn’t a clinical skill problem. It’s a tracking problem.
The Real Cost of Eyeballing Mastery
When mastery decisions rely on manual counting, two things happen:
You move on too early. The graph looked good enough, so you advanced to the next target. Two weeks later, the skill falls apart during maintenance probes. The learner didn’t actually reach criteria - the data just looked close.
You move on too late. You keep running trials on a skill the learner mastered three sessions ago. That’s session time spent on targets that should already be in maintenance, while new goals wait. The learner isn’t harmed by the extra practice - but the pace of acquisition slows when you could be teaching new skills.
Both cost time. And both happen because checking mastery criteria by hand is tedious enough that it doesn’t happen consistently.
Set It Once, Know for Certain
TallyFlex handles mastery evaluation automatically. Set the criteria when you create the objective. Collect data like you normally would. TallyFlex evaluates every session against your criteria and tracks the streak in real time.
Here’s what you configure:
- Metric - What you’re measuring (percent correct, frequency, duration, or latency)
- Threshold - The target value (80%, fewer than 3 occurrences, under 10 seconds)
- Period - How to group the data (per session or daily)
- Consecutive periods - How many in a row must meet the threshold
Example: “80% correct across 3 consecutive sessions” means TallyFlex checks each session’s percent correct. When three sessions in a row hit 80% or above, TallyFlex flags the objective as ready - then you review the data and confirm mastery.
No counting backwards. No second-guessing.
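If it helps to see the logic spelled out, the streak check can be sketched in a few lines of Python. This is an illustration only, not TallyFlex's implementation. The function names `current_streak` and `ready_for_review`, and the plain list of per-period values, are assumptions made for the example.

```python
def current_streak(period_values, threshold):
    """Count how many of the most recent periods in a row meet the threshold."""
    streak = 0
    for value in reversed(period_values):
        if value < threshold:
            break  # a miss ends the run
        streak += 1
    return streak


def ready_for_review(period_values, threshold=80, consecutive=3):
    """True when the latest `consecutive` periods all meet the threshold."""
    return current_streak(period_values, threshold) >= consecutive


# Hypothetical percent-correct values, oldest session first.
sessions = [76, 82, 85, 90]
print(current_streak(sessions, 80))  # 3
print(ready_for_review(sessions))    # True
```

The sketch assumes a "higher is better" metric. For a reduction target (fewer than 3 occurrences, under 10 seconds) the comparison would flip.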
For programs that need a full progression - say, building from a baseline of 5 spontaneous requests to a goal of 10 - TallyFlex can generate the entire sequence of STOs at once. Set the baseline, goal, and how you want to split the steps, and it creates all the objectives with correct mastery criteria ready to activate in order.
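As a rough sketch of what "splitting the steps" could mean, here is one way to space thresholds evenly between a baseline and a goal. How TallyFlex actually splits a progression is not described here, so treat the even spacing, the rounding, and the function name `sto_thresholds` as assumptions.

```python
def sto_thresholds(baseline, goal, steps):
    """Evenly spaced thresholds from just above baseline up to the goal."""
    if steps < 1 or goal <= baseline:
        raise ValueError("expects steps >= 1 and a goal above baseline")
    span = goal - baseline
    # Integer rounding to the nearest whole count keeps thresholds countable.
    return [baseline + (span * i + steps // 2) // steps for i in range(1, steps + 1)]


for number, threshold in enumerate(sto_thresholds(5, 10, 5), start=1):
    print(f"STO {number}: {threshold} or more spontaneous requests")
```

With a baseline of 5, a goal of 10, and five steps, this yields thresholds of 6, 7, 8, 9, and 10. Asking for two steps instead gives 8 and 10.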
Choosing the Right Period Type
The period type changes how data gets grouped for evaluation. Pick the one that matches your program:
Session works best for structured teaching programs and skill acquisition targets where you run multiple sessions per day. Each session is evaluated independently. If a learner has a rough morning session but nails the afternoon session, the morning counts as a miss and the afternoon starts a new streak.
Daily combines all sessions from a day into a single data point. This works well for general behavior tracking where you want an overall picture of the day rather than session-by-session granularity. Daily criteria can also promote natural generalization - the skill has to hold across different times, conditions, and sometimes different staff.
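To make the difference concrete, here is a small Python sketch that groups the same records both ways. The records are made up, and pooling every trial from the day (rather than averaging the session percentages) is an assumption about how a daily aggregate might be computed, not a statement of how TallyFlex does it.

```python
from collections import defaultdict

# Hypothetical records: (day, session label, correct trials, total trials).
records = [
    ("Mon", "AM", 9, 15),
    ("Mon", "PM", 14, 15),
    ("Tue", "AM", 12, 15),
    ("Tue", "PM", 13, 15),
]


def per_session(records):
    """One percent-correct data point per session."""
    return [round(100 * correct / total) for _, _, correct, total in records]


def per_day(records):
    """One data point per day, pooling every trial collected that day."""
    totals = defaultdict(lambda: [0, 0])
    for day, _, correct, total in records:
        totals[day][0] += correct
        totals[day][1] += total
    return {day: round(100 * c / t) for day, (c, t) in totals.items()}


print(per_session(records))  # [60, 93, 80, 87]
print(per_day(records))      # {'Mon': 77, 'Tue': 83}
```

Against an 80% threshold, the session view shows a miss followed by three passes in a row, while the daily view shows one miss (Monday) and one pass (Tuesday). Same data, different streak.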
For School Teams - STOs Map to IEP Goals
If you’re tracking IEP goals, this is where Short-Term Objectives (STOs) become your best friend. Each IEP goal typically includes mastery criteria already written into the objective:
“Given a verbal prompt, the student will identify sight words with 80% accuracy across 3 consecutive data collection days.”
That objective maps directly to an STO in TallyFlex:
- Metric: Percent correct
- Threshold: 80%
- Period: Daily
- Consecutive periods: 3
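For readers who think in data structures, the same four fields can be written down as a small Python record. The class name `MasteryCriteria` and its field names are hypothetical and exist only for this illustration; they are not a TallyFlex API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MasteryCriteria:
    metric: str               # e.g. "percent_correct"
    threshold: float          # target value for the metric
    period: str               # "session" or "daily"
    consecutive_periods: int  # how many periods in a row must pass

    def describe(self):
        return (f"{self.metric} >= {self.threshold:g} "
                f"for {self.consecutive_periods} consecutive {self.period} periods")


# The sight-word IEP objective quoted above.
sight_words = MasteryCriteria("percent_correct", 80, "daily", 3)
print(sight_words.describe())
# percent_correct >= 80 for 3 consecutive daily periods
```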
When progress report time comes, you don’t need to dig through binders. The mastery progress widget shows exactly where each student stands - which periods passed, the current streak, and whether criteria have been met.
For teams with multiple staff collecting data on the same student, everyone’s data feeds into the same evaluation. The paraprofessional’s morning session and the SPED teacher’s afternoon session both count toward the same daily aggregate.
Prompt Fading Made Measurable
For skill acquisition programs using prompt hierarchies, STOs create a structured path from prompted to independent responses.
Set up a sequence of objectives that progressively tighten the success criteria:
- 80% success at Verbal prompt level for 5 consecutive days
- 80% success at Gestural for 5 consecutive days
- 80% success at Independent for 5 consecutive days
When the learner masters each level, activate the next STO and update the Goal Support Level (the maximum prompt level that still counts as success). The prompt fading plan becomes data-driven instead of subjective.
What Happens When Mastery Is Confirmed
When the consecutive period requirement is met, TallyFlex surfaces a “Ready to mark as mastered” prompt. You review the data, confirm the decision, and the STO is marked as mastered with the exact session or day recorded. That’s your documentation - timestamped, objective, defensible.
From there, move the skill to maintenance, activate the next STO, and keep going. The data trail is already built.
Concrete Examples Per Criterion Type
The criterion type you pick should match the program goal, not your habit. Here are worked examples for each.
Percent-independent criterion is for skills where prompts are part of teaching but the goal is independent responding. For a learner working on requesting items using a picture exchange, you might write the criterion as “80% independent responses across 3 consecutive sessions.” Each session, TallyFlex calculates the percent of trials that were Independent (no prompt). When three sessions in a row hit 80% or more independent, the criterion is met. Trials that needed a Verbal, Gestural, Model, or Physical prompt still appear in the session total, but they don’t count as independent. This is the right criterion for goals where the prompt level matters - not just whether the response was correct.
All-independent criterion is the strictest version. The learner must complete every trial independently across the consecutive period. For a task analysis with 8 steps, this means all 8 steps Independent across the consecutive sessions you specify. It’s appropriate for skills where partial mastery isn’t acceptable - safety skills, for instance, where 80% independent on crossing the street isn’t a passing grade. Reach for this criterion sparingly. It often takes longer to meet than percent-independent, and the data has to be very dense to show whether the learner is actually getting there.
Max-prompt-level criterion is the right tool when you’re systematically fading prompts. The learner has to perform at or above a specified prompt level across the consecutive period. If your prompt hierarchy is Independent, Gestural, Verbal, Model, Partial Physical, Full Physical, you might set the criterion at “Verbal or better, 80% accuracy, 5 consecutive days.” Trials at Verbal, Gestural, or Independent count as success. Trials requiring Model, Partial Physical, or Full Physical count as failure. When the learner hits the criterion, you tighten the prompt level for the next STO. This is the criterion that drives a written prompt-fading plan.
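Here is a minimal Python sketch of scoring a session against a maximum prompt level, using the hierarchy from the example above. The trial data is invented and the function name `percent_success` is an assumption for illustration.

```python
# Least to most intrusive, matching the hierarchy in the example above.
HIERARCHY = ["Independent", "Gestural", "Verbal", "Model",
             "Partial Physical", "Full Physical"]


def percent_success(trial_prompts, max_level):
    """Percent of trials completed at `max_level` or a less intrusive prompt."""
    allowed = HIERARCHY.index(max_level)
    hits = sum(1 for prompt in trial_prompts if HIERARCHY.index(prompt) <= allowed)
    return 100 * hits / len(trial_prompts)


# Hypothetical 10-trial session, recorded as the prompt level each trial needed.
session = ["Verbal", "Independent", "Gestural", "Model", "Verbal",
           "Independent", "Verbal", "Partial Physical", "Gestural", "Verbal"]

print(percent_success(session, "Verbal"))       # 80.0
print(percent_success(session, "Independent"))  # 20.0
```

The same session passes at “Verbal or better” and falls well short once the level is tightened to Independent, which is exactly why the next STO in a fading plan starts a new streak.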
Frequency criterion is for behaviors where you want a count, not a percent. Per BACB Code 2.14, reduction targets should always be paired with the matching acquisition target: “Fewer than 3 occurrences of aggression per session AND 5 or more independent break-requests per session, both for 5 consecutive sessions” pairs the decrease in aggression with the FCT-taught replacement (requesting a break) on the same criterion clock. Pure acquisition criteria stand alone: “10 or more peer initiations per recess, 3 consecutive days.” The threshold can be a maximum (for behaviors you want to reduce, paired with the FCT replacement minimum) or a minimum (for behaviors you want to increase).
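The “same criterion clock” idea in the paired example can be sketched like this: a session only extends the streak when both thresholds are met. The counts are hypothetical and the function name `paired_streak` is an assumption for illustration.

```python
def paired_streak(sessions, max_aggression=3, min_requests=5):
    """Consecutive most-recent sessions meeting BOTH frequency thresholds.

    Each session is (aggression_count, break_request_count).
    """
    streak = 0
    for aggression, requests in reversed(sessions):
        # "Fewer than 3" is a strict maximum; "5 or more" is an inclusive minimum.
        if aggression < max_aggression and requests >= min_requests:
            streak += 1
        else:
            break
    return streak


# Hypothetical counts, oldest session first.
sessions = [(4, 3), (2, 5), (1, 6), (2, 4), (0, 7), (1, 5)]
print(paired_streak(sessions))  # 2
```

Note the fourth session: aggression was low enough, but only 4 break requests were recorded, so the streak reset. Low problem behavior alone does not advance the clock.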
Duration criterion measures time. “Sustained attention for at least 8 minutes, 3 consecutive sessions” is a duration acquisition goal. “Vocal disruption (defined as crying, screaming, or vocalizing above conversational volume for greater than 30 seconds, with onset marked at the first behavior and offset marked when the learner is calm and responsive for 30 consecutive seconds) under 2 minutes per episode, 5 consecutive sessions” is a duration reduction goal. The operational definition matters as much as the threshold - a duration measure on a vague topographical category like “tantrum” without a definition will not be reliable across observers.
Latency criterion measures the time between a stimulus and the response. “Responds to name within 5 seconds, 80% of trials, 3 consecutive sessions” combines latency with percent. The threshold is the latency value. The metric is percent of trials within that latency.
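A short Python sketch of that combination, assuming “within 5 seconds” includes exactly 5 seconds and that a trial with no response counts as a miss. Both are assumptions for the example, as is the function name.

```python
def percent_within_latency(latencies_s, limit_s=5.0):
    """Percent of trials answered within the latency limit (seconds).

    A trial with no response is recorded as None and counts as a miss.
    """
    hits = sum(1 for t in latencies_s if t is not None and t <= limit_s)
    return 100 * hits / len(latencies_s)


# Hypothetical session: seconds from the name call to the response.
session = [2.1, 4.8, 6.3, 3.0, None, 1.7, 5.0, 2.4, 4.1, 3.3]
print(percent_within_latency(session))  # 80.0
```

The per-session result then feeds the same consecutive-session check as any other percent metric.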
A Worked Example - Three Weeks of Real Data
Let’s walk through what mastery actually looks like in the data. The criterion: “80% correct across 3 consecutive sessions” on a sight-word identification target. The learner is in a school resource room, two sessions per day, four days per week.
Week 1:
- Monday AM: 60% (15 trials, 9 correct)
- Monday PM: 67% (15 trials, 10 correct)
- Tuesday AM: 73% (15 trials, 11 correct)
- Tuesday PM: 67% (15 trials, 10 correct)
- Wednesday AM: 80% (15 trials, 12 correct) - streak: 1
- Wednesday PM: 73% (15 trials, 11 correct) - streak broken
- Thursday AM: 80% (15 trials, 12 correct) - streak: 1
- Thursday PM: 87% (15 trials, 13 correct) - streak: 2
The week ended with a 2-session streak. The criterion isn’t met yet. Resist the urge to mark mastery on a feeling - the data isn’t there.
Week 2:
- Monday AM: 73% (streak broken back to 0)
- Monday PM: 80% - streak: 1
- Tuesday AM: 87% - streak: 2
- Tuesday PM: 80% - streak: 3 ✓
Tuesday PM, the criterion is met. TallyFlex surfaces the “Ready to mark as mastered” prompt. You review the three sessions: 80%, 87%, 80%. You confirm the mastery decision. The STO is marked as mastered with the timestamp of Tuesday PM. The next STO is ready to activate.
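If you want to check the streak arithmetic yourself, the twelve sessions above can be replayed in a few lines of Python. This is only a sketch of the counting logic, not TallyFlex's code.

```python
# Percent correct per session from the two weeks above, in order.
scores = [60, 67, 73, 67, 80, 73, 80, 87,   # week 1
          73, 80, 87, 80]                   # week 2

THRESHOLD, CONSECUTIVE = 80, 3

# Running streak: add one on a passing session, reset to zero on a miss.
streaks = []
streak = 0
for score in scores:
    streak = streak + 1 if score >= THRESHOLD else 0
    streaks.append(streak)

met_at = streaks.index(CONSECUTIVE) + 1  # 1-based session number

print(streaks)  # [0, 0, 0, 0, 1, 0, 1, 2, 0, 1, 2, 3]
print(met_at)   # 12
```

Session 12 is Tuesday PM of week 2, matching the walkthrough.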
Week 3 onward - Maintenance check. You move the skill to maintenance and probe it once a week to make sure it holds. Three weeks after mastery, a probe session comes back at 87%. The skill has stayed in the learner’s repertoire.
This is what mastery is supposed to look like - a clear data trail, not a feeling, not an “I think we’re there.” The criterion was met objectively, the decision is documented, and you can move on.
When the Learner Regresses After Mastery
Not every mastered skill stays mastered. Maintenance probes exist for exactly this reason. When a regression shows up, you have decisions to make.
Single-probe regression. The learner mastered a skill three weeks ago. The maintenance probe comes back at 60%. Don’t immediately re-open the STO. Run another probe within the week. Single-session dips happen for a hundred reasons - the learner is tired, the SD was different, the reinforcer wasn’t strong enough. If the second probe is back at criterion, you had a fluke. If the second probe is also low, the regression is real.
Real regression. Two consecutive probes below criterion means the skill is unstable. Re-activate the STO. Reset the streak counter and run teaching trials again. The data trail will show that the skill was mastered, regressed, and then re-mastered - which is honest documentation. Don’t try to hide the regression by leaving the original mastery date intact. Insurance audits and IEP reviews care about the actual data more than they care about a clean line.
Pattern regression across multiple skills. If multiple mastered skills regress at the same time, the issue isn’t the skills - it’s something environmental. New teacher, medication change, schedule disruption, new staff member, big life event. The regression data is signal. Look at what changed and address that before re-running teaching trials.
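The single-probe and two-probe rules above boil down to a small decision table. Here is a sketch in Python. The function name `probe_decision` and the returned labels are made up for illustration, and the clinical call always stays with the team.

```python
def probe_decision(probe_scores, criterion=80):
    """Suggest a next step from maintenance probes (oldest first)."""
    if not probe_scores or probe_scores[-1] >= criterion:
        return "continue maintenance"
    if len(probe_scores) >= 2 and probe_scores[-2] < criterion:
        return "re-activate STO"        # two consecutive low probes
    return "re-probe within the week"   # one low probe so far


print(probe_decision([87, 60]))      # re-probe within the week
print(probe_decision([87, 60, 85]))  # continue maintenance
print(probe_decision([87, 60, 65]))  # re-activate STO
```

The sketch looks at one skill at a time. It does not capture the pattern case, where several skills dip together and the first move is to look at what changed in the environment.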
When to Extend the Criteria, and When to Back Off
Extend the criteria when the bar is too low. If a learner hits 80% mastery and then performs at 75% in maintenance probes, the original criterion was probably too easy. The next time you write an STO for a similar skill, go to 85% or 90% - or to a more independent prompt level. The point of mastery criteria is to predict that the skill will hold. If it doesn’t hold, the criterion was wrong.
Back off the criteria when the bar is too high. If a learner has been on the same STO for ten weeks and the data is plateauing below criterion, something is wrong. The first thing to check is whether the criterion is reasonable for the learner. A criterion of 100% across 5 consecutive sessions might be appropriate for a fluent reader and inappropriate for a learner with significant cognitive impairment. Talk to the BCBA, look at the data, and consider whether 80% across 3 sessions would be a more honest goal that still represents real learning.
Extending consecutive periods after generalization concerns. If a learner masters a skill in the structured teaching environment but doesn’t generalize, consider switching the period type from session to daily, or extending the consecutive periods to require it to hold across more days. A criterion of “80% across 5 consecutive days” is harder to fake with one good session. It forces the data to show consistency.
Mastery Criteria and Phase Changes
Mastery criteria don’t operate in isolation. They live inside a Programs hierarchy of Program → Phase → Target, and the phase you’re in changes how mastery is interpreted.
Baseline phase. Mastery criteria during baseline don’t trigger an “advance” - they trigger an “auto-promote to teaching.” If the learner already meets the criterion during baseline, the target was probably already in the repertoire and the program should advance to a harder target. TallyFlex’s auto-promotion of baseline trackers handles this automatically when the criterion is met during the baseline phase.
Teaching phase. This is where most mastery decisions live. The criterion is met, the learner has demonstrated the skill, the STO is marked as mastered, and you advance to the next phase or the next STO.
Maintenance phase. After mastery is confirmed, reduce trial density and shift to periodic probes - for example, “performs at 80% during a maintenance probe, 3 probes over 4 weeks.” Maintenance is about whether the skill holds over time. The trial density is much lower (one session per week instead of multiple per day), and the criterion structure should reflect that.
Generalization is not a phase, it’s a strategy embedded throughout teaching. Stokes & Baer (1977) argued in “An implicit technology of generalization” that generalization should be programmed across the entire teaching arc - not bolted on at the end as a separate phase. The risk of treating generalization as a phase is “train and hope” - train in a structured setting, then hope generalization happens. The literature consistently shows that approach fails. Use multiple exemplars (different stimuli, different staff, different settings) starting in baseline. Vary the SD across trials. Rotate which RBT runs the session. When you write a generalization criterion (“skill demonstrated across 2 staff and 2 settings”), evaluate it from data you’ve been collecting all along, not from a separate generalization phase tacked on after teaching. If the skill only holds when the same RBT runs the session, the data is showing dependence on a specific person, not generalized mastery. Cross-staff data collection - which TallyFlex Teams supports - is the cleanest way to detect this.
Phase progression. When a phase’s terminal mastery criterion is met, TallyFlex can auto-promote to the next phase. This means the program flows on its own - baseline auto-promotes to teaching when criteria allow, and teaching auto-promotes to maintenance when mastery is confirmed. Generalization is not run as a separate phase; instead, generalization criteria are evaluated against the cross-stimuli, cross-staff, cross-setting data you collected throughout teaching. The whole sequence is configured upfront and runs without manual phase changes.
Common Questions
How many sessions of data do I need before I can evaluate a mastery criterion?
You need at least the consecutive period requirement. If your criterion is “80% across 3 consecutive sessions,” you need a minimum of 3 sessions. In practice, plan for at least 10-15 sessions of data so the trend is meaningful. A criterion that’s met on session 4 of teaching is probably evidence that the skill was already partly in the repertoire.
What if my IEP goal doesn’t list a consecutive period requirement?
Use a default of 3 consecutive sessions for percent-correct goals and 5 consecutive days for daily goals. Then write the consecutive period into the next IEP cycle. Most state IEP forms expect a consecutive period - if yours doesn’t, the BCBA or SPED case manager has been picking one mentally for years.
Can I change a mastery criterion mid-program?
Yes, but document it. If you tighten the criterion (say, from 80% to 90%), previously collected sessions may no longer pass, and the streak may reset against the new threshold. If you loosen it, sessions that already passed still count. The point is that mastery criteria are professional judgment, not laws. Change them when the program evidence supports a change, and write down why.
What’s the difference between mastery criteria and a simple goal?
A goal is the outcome you’re aiming for. Mastery criteria are how you’ll know you got there. “The learner will identify community signs” is a goal. “The learner will identify 10 community signs with 80% accuracy across 3 consecutive sessions” is a goal with mastery criteria. The criteria are what makes the goal measurable - and what tells you when you can stop teaching it.
Do I need to set mastery criteria on every target?
Not necessarily. Some targets are observational (you’re tracking behavior to inform a future program). Some are maintenance-only (you don’t need a criterion because the goal is stability). Set mastery criteria on targets where there’s a decision to be made about advancing. If there’s no advancement decision, the criterion is overhead.
Start Tracking Mastery Automatically
Mastery criteria aren’t complicated. The hard part was always tracking them consistently. TallyFlex makes it automatic - set the criteria, collect your data, and know the moment a skill is truly mastered.
For the full setup walkthrough, see our Mastery Criteria & STOs documentation.


