10 Validity Argument

The Dynamic Learning Maps® (DLM®) Alternate Assessment System is based on the core belief that all students should have access to challenging, grade-level academic content. Therefore, DLM assessments provide students with the most significant cognitive disabilities the opportunity to demonstrate what they know and can do. This technical manual provides evidence evaluating the propositions and assumptions that undergird the assessment system as described in the DLM Theory of Action (Figure 10.1; for more information, see description in Chapter 1).

This chapter synthesizes the evidence provided in this technical manual and places it within a validity framework to assess the program’s overall success at producing results that reflect their intended meaning and can be used for their intended purposes. In addition, we discuss future research studies as part of ongoing and iterative processes of continuous improvement.

10.1 Validity Framework

Validation is the process of evaluating the evidence and theory presented in the overall validity argument. All aspects of the validity argument must be carefully evaluated (Lissitz, 2009; Sireci, 2009). The purpose of the assessment and the support for intended uses of results are critical to the overall validity argument because they are indicative of the model from which the assessment was originally designed (Mislevy, 2009). It follows, therefore, that the evidence collected throughout the entire development, administration, and reporting processes should point to a clear and persuasive link between the assessment purpose and the intended uses and interpretations of the results and overall project outcomes. Cohesion between what can be observed (e.g., student responses to assessment tasks) and what must be inferred (e.g., student achievement in the subject) must inform the validity and interpretive arguments (Kane, 2006). It is equally important that the dimensions and organization of the overall validity argument include not only the content sampled and the procedural bases of the assessment, but also evidence for the underlying construct being assessed, any assessment elements that are irrelevant to the construct, and the relative importance of the consequences of the resulting scores (Linn, 2009; Messick, 1989).

The DLM program adopted an argument-based approach to assessment validation. The validity argument process began by specifying, with the DLM Governance Board, the intended, supported, and unsupported uses of DLM assessment results, consistent with expectations described in the Standards for Educational and Psychological Testing (Standards hereafter), which are the professional standards broadly used to inform development and evaluation of educational assessments (American Educational Research Association et al., 2014). We followed this with a three-tiered approach to validation, which included specification of 1) a Theory of Action with defining statements in the validity argument that must be in place to achieve the goals of the system and the chain of reasoning between those statements (the term “statement” is used here to mean a claim within the overall validity argument; the term “claim” is reserved in this technical manual for use specific to content claims, as described in Chapter 3), 2) an interpretive argument defining propositions that must be examined to evaluate each component of the Theory of Action, and 3) validity studies to evaluate each proposition in the interpretive argument.

We also drew on the organization of evidence according to the Standards (American Educational Research Association et al., 2014). The Standards define five sources of evidence for validity arguments: evidence based on test content, response processes, internal structure, relations to other variables, and consequences of testing. Evidence collected for propositions in the DLM validity argument is classified according to these types.

10.2 Intended Uses

In 2013, the DLM Governance Board determined uses of DLM assessment results. Table 10.1 shows intended, supported, and unsupported uses of DLM assessment results.

Table 10.1: Uses of Dynamic Learning Maps Assessment Results
Type of scoring: Mastery determinations obtained throughout the year from optional instructionally embedded assessments
  Intended uses
    1. Instructional planning, monitoring, and adjustment
  Supported uses (optional—states’ decision to use and responsibility to provide evidence)
    1. One source of information for evaluations of educator and principal effectiveness
  Uses that are NOT supported or intended
    1. Determining disability eligibility
    2. Placement
    3. Retention
    4. Inclusion in state accountability model to evaluate school and district performance (unless supported by state-conducted research)
    5. Direct comparisons with results on general education assessments
Type of scoring: Summative reporting
  Intended uses
    1. Reporting achievement within the instructed content aligned to grade-level content standards, to a variety of audiences, including educators and parents
    2. Inclusion in state accountability model to evaluate school and district performance
    3. Planning instructional priorities and program improvement for following school year
  Supported uses (optional—states’ decision to use and responsibility to provide evidence)
    1. One source of information for evaluations of educator and principal effectiveness
    2. Graduation (in states that use AA-AAS as exit exam)
  Uses that are NOT supported or intended
    1. Determining disability eligibility
    2. Placement
    3. Retention
    4. Graduation (sole source of evidence)
    5. Direct comparisons with results on general education assessments
Note. AA-AAS = Alternate Assessment based on Alternate Achievement Standards.

10.3 Theory of Action

The DLM Theory of Action includes statements about the assessment program spanning from its design, delivery, and scoring, to long-term intended outcomes. While validity arguments typically center on validation of scores for intended uses, the DLM Theory of Action also specifies intended long-term outcomes associated with use of the system (consistent with the definition given by the National Council on Measurement in Education, 2018). The Theory of Action is depicted in Figure 10.1. Letters are assigned to each statement. Connections between statements, referred to as inputs in this chapter, are designated by numbers. Dashed lines represent connections that are present when the optional instructionally embedded assessments are utilized. For more information on the DLM Theory of Action, see Chapter 1 of this manual.

Figure 10.1: Dynamic Learning Maps Theory of Action Diagram

The DLM Theory of Action diagram, showing each statement and its inputs.

10.4 Propositions and Validity Evidence

The three-tiered approach to validation relies on propositions underlying the statements in the Theory of Action and evidence evaluating those statements. For each statement in the Theory of Action, we summarize the underlying propositions and associated procedural and empirical evidence informing evaluation of the proposition, indicate the type of evidence (e.g., content, response process, consequences), and identify the chapters in this manual containing a full description of the evidence. We also describe how connections in the Theory of Action affect the extent to which statements are supported and defensible. We retain the organizational structure of the Theory of Action when describing the statements, propositions, and evidence, organizing them according to elements of design, delivery, scoring, and long-term outcomes.

10.4.1 Design

The design components of the DLM assessment system include the learning maps, Essential Elements (EEs), Kite® Suite, assessments, training, and professional development. Specifically, we evaluate the following six statements related to the design of DLM assessments.

  1. Map nodes and pathways accurately describe the development of knowledge and skills.
  2. Alternate content standards, the Essential Elements, provide grade-level access to college and career readiness standards.
  3. The Kite system used to deliver DLM assessments is designed to maximize accessibility.
  4. Instructionally relevant testlets are designed to allow students to demonstrate their knowledge, skills, and understandings relative to academic expectations.
  5. Training strengthens educator knowledge and skills for assessing.
  6. Professional development strengthens educator knowledge and skills for instructing and assessing students with significant cognitive disabilities.

10.4.1.1 A: Map Nodes and Pathways Accurately Describe the Development of Knowledge and Skills

Learning maps are one of the critical content structures in the DLM assessment system. Together with EEs, they define the knowledge, skills, and understandings measured by the assessments, as shown in Figure 10.2. Map nodes span early foundational skills to college and career readiness skills, with connections between nodes defining how skills develop over time. The nodes and connections provide all students with an access point for learning rigorous grade-level academic content.

Figure 10.2: Theory of Action Inputs for Learning Maps and Essential Elements

Small section of the DLM Theory of Action showing a bidirectional relationship between Statements A and B.

Three propositions describe the accuracy of map nodes and pathways in the DLM assessment system. Table 10.2 summarizes the propositions and evidence evaluating the accuracy of nodes and pathways in the learning maps. Because EEs serve as an input into map design (as shown in Figure 10.2), some of the evidence in the table relies on evidence collected for EE development. Rather than restate the evidence, the overlap is noted with a dagger (†) where applicable. We applied this principle throughout the chapter when describing evidence derived from inputs in the Theory of Action.

Table 10.2: Propositions and Evidence for Learning Maps
Proposition Procedural evidence Empirical evidence Type Chapter(s)
Nodes are specified at the appropriate level of granularity and are distinct from other nodes Description of map development process, mini-maps, Essential Element Concept Maps (EECMs) External review ratings Content 2
Nodes for linkage levels are correctly prioritized and are adequately spaced within breadth of map Description of map development process, mini-maps, EECMs Content 2, 3
Nodes are sequenced according to acquisition order, given population variability Description of map development process, mini-maps, EECMs External review ratings, modeling analyses, alignment study (linkage level vertical articulation) Content 2, 3
Relies on evidence from Theory of Action input statements, as shown in Figure 10.2.

For the map to accurately describe how students develop knowledge, skills, and understandings, nodes must be specified at the right grain size (i.e., neither too large nor too small) and be distinct from other nodes. Documentation describes how nodes were developed, including extensive literature syntheses and rounds of internal review to evaluate node size and distinction from other nodes. Essential Element Concept Maps (EECMs) and mini-maps available in Educator Portal depict the final selection of nodes assessed at each linkage level. Content and special education experts provided ratings on whether nodes were of correct size and distinct from other nodes. In the limited instances in which nodes were rated as being at the incorrect size or indistinct from other nodes, map developers reviewed the feedback and adjusted the map structure. Together, these sources of evidence support the proposition that nodes are of the appropriate grain size and are distinct from one another.

To demonstrate that the map is accurately specified, the nodes measured by each linkage level should be correctly prioritized from among the full breadth of nodes in the map and be adequately spaced throughout the map structure (i.e., neither bunched together too closely nor spread too far apart). Documentation describes the process that map developers used to select linkage level nodes, which included rounds of internal review examining node selection and spacing within the map. The EECMs and mini-maps in Educator Portal depict the final selection of nodes assessed at each linkage level. Because of the size and complexity of the full map, external review did not evaluate the selection of nodes prioritized for linkage levels. While additional evidence could be collected for this proposition, the current evidence provides moderate support for nodes being correctly prioritized and spaced through the map.

Finally, for the map to be accurately specified, nodes must be correctly sequenced. Documentation describes how map connections were developed, including extensive literature syntheses and rounds of internal review evaluating the correct ordering of nodes within the map structure. The EECMs used for test development and the mini-maps in Educator Portal depict the finalized node connections. Content and special education experts rated the ordering of node connections. In the limited instances of identified node connection reversals, the map developers adjusted the learning map connections. Additional indirect evidence of correct map sequencing comes from analyses evaluating linkage level ordering for each EE (linkage levels measure one or more nodes in the map). Modeling analyses indicated more than 80% of EEs showed evidence of correct hierarchical ordering (W. J. Thompson & Nash, 2019). The Target and Successor levels were flagged most often for potentially incorrect sequencing; these levels also had the smallest available sample sizes. External alignment study ratings of the ordered linkage levels demonstrated correct ordering for most EEs. In ELA, panelists rated 76 of 95 EEs (80%) as showing a clear progression from Precursor to Successor levels. In mathematics, panelists rated 64 of 66 EEs (97%) as showing a clear progression. Collectively, the evidence supports the proposition that nodes are correctly sequenced.
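
As a hypothetical illustration (not the analysis reported in Thompson & Nash, 2019; the mastery proportions below are invented), one simple screen for hierarchical ordering is to check that the estimated proportion of students mastering each linkage level of an EE decreases as the levels increase in complexity.

```python
# Minimal sketch of a hierarchical-ordering screen for one EE.
# Hypothetical values only; not the DLM operational analysis.
linkage_levels = ["Initial Precursor", "Distal Precursor", "Proximal Precursor",
                  "Target", "Successor"]
mastery_rates = [0.91, 0.78, 0.64, 0.52, 0.41]  # invented mastery proportions

# Ordering is supported when mastery does not increase as complexity increases.
correctly_ordered = all(earlier >= later
                        for earlier, later in zip(mastery_rates, mastery_rates[1:]))

for level, rate in zip(linkage_levels, mastery_rates):
    print(f"{level}: {rate:.2f}")
print("Hierarchical ordering supported" if correctly_ordered
      else "Flag EE for potential ordering reversal")
```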

10.4.1.2 B: Alternate Content Standards, the Essential Elements, Provide Grade-Level Access to College and Career Readiness Standards

The EEs are rigorous academic content standards measured by DLM assessments. Together with the learning map, the EEs make up the content structures for the DLM assessment system, as shown in Figure 10.2. Because students with significant cognitive disabilities have historically had limited opportunities to learn academic content (e.g., Karvonen, Flowers, et al., 2013), the DLM program prioritized adoption of rigorous academic content standards that provide adequate breadth and complexity of content.

Four propositions describe the EEs measured by DLM assessments. Table 10.3 summarizes the propositions and evidence for EEs providing grade-level access to rigorous academic expectations.

Table 10.3: Propositions and Evidence for Essential Elements
Proposition Procedural evidence Empirical evidence Type Chapter(s)
The grain size and description of Essential Elements (EEs) are sufficiently well-defined to communicate a clear understanding of the targeted knowledge, skills, and understandings Description of EE development, including state and content expert review Content 2
EEs capture what students should know and be able to do at each grade level to be prepared for postsecondary opportunities, including college, career, and citizenship Description of EE development and review, vertical articulation evidence Alignment study (EE to Common Core State Standards), postsecondary opportunities ratings Content 2, 7
The collection of EEs in each grade sufficiently sample the domain Description of development and review of EEs and blueprint Content 2
EEs have appropriately specified linkage levels measuring map nodes Helplet videos, Essential Element Concept Maps, description of development and review of EEs and map External review ratings, alignment study (EE to target node) Content 2
Relies on evidence from Theory of Action input statements, as shown in Figure 10.2.

For EEs to provide students with significant cognitive disabilities access to grade-level college and career readiness standards, the EEs must be well defined and must clearly articulate the targeted knowledge, skills, and understandings. Documentation describes the EE development process, which included instructions for test developers and educators to follow when specifying the targeted knowledge, skills, and understandings of each EE. The DLM Governance Board and other content experts reviewed the draft EEs and provided feedback on their clarity and grain size before adoption. While this proposition was not evaluated with empirical evidence such as rating data, the rigorous development process and rounds of internal and external review provide support for this proposition.

The EEs should also build across grades to prepare students for postsecondary opportunities. Documentation describes the development process, including procedures for establishing and reviewing the vertical articulation of EE content across grades and alignment with college and career expectations. The DLM Governance Board and other content experts reviewed the EEs for sequencing of content across grades. Empirical evidence includes external ratings on the alignment of EEs to the Common Core State Standards in each grade. Additional indirect evidence comes from a postsecondary opportunities study that evaluated the alignment of skills needed for postsecondary opportunities with the At Target performance level descriptors (PLDs) (Karvonen et al., 2022). Although panelists rated skills against grade- and content-specific PLDs, the PLDs are based on the types of skills students tend to master across EEs. Collectively, the evidence indicates that EEs capture what students should know and be able to do at each grade to be prepared for postsecondary opportunities.

For EEs to provide students with grade-level access to rigorous standards, the EEs must sufficiently sample the domain. Evidence includes a description of the process for developing EEs to sample the breadth of Common Core State Standards. The DLM Governance Board and other content experts reviewed the EEs for adequate sampling breadth. Documentation also describes the original blueprint development process in which blueprints were designed to sample content across the full range of claims and conceptual areas. Additional documentation describes the blueprint revision, which preserved the breadth of coverage across claims and conceptual areas. The DLM Governance Board and Technical Advisory Committee (TAC) reviewed both the EEs and the assessment blueprints for breadth of coverage before adoption. While this proposition was not evaluated with empirical evidence, the procedural evidence demonstrates that EEs sufficiently sample the domain.

Finally, to provide students with access to grade-level standards, EEs must have appropriately specified linkage levels, measuring nodes in the learning map. Documentation describes the process for simultaneously developing and revising the EEs and the learning maps to reflect each other and how linkage levels were developed. Short helplet videos describe how testlets measure map nodes. The EECMs demonstrate the map nodes measured by the five linkage levels of each EE. Project staff conducted rounds of review to consider alignment between map nodes and the EE linkage levels. During an alignment study, all EEs were rated as aligned to target-level nodes, with most EEs rated as having a “near” link to the target-level node. The combination of evidence across EE and map development and review provides strong support for the alignment of EEs to linkage levels measuring map nodes.

10.4.1.3 C: The Kite System Used to Deliver DLM Assessments Is Designed to Maximize Accessibility

For the DLM program to accurately measure student knowledge, skills, and understandings, the Kite Suite must be accessible to DLM students and their educators. There are no direct inputs into the Kite Suite design statement in the Theory of Action (see Figure 10.1).

Four propositions describe the design and accessibility of the Kite Suite used to deliver DLM assessments. Table 10.4 summarizes the propositions and evidence evaluating the accessible design of the Kite Suite.

Table 10.4: Propositions and Evidence for the Kite Suite
Proposition Procedural evidence Empirical evidence Type Chapter(s)
System design is consistent with accessibility guidelines (e.g., accessible portable item protocol standards and web content accessibility guidelines) and current best practices System documentation Content 3, 4
Supports needed by students are available within and outside of the assessment system Accessibility Manual, First Contact and Personal Needs and Preferences helplet video Personal Needs and Preferences Profile data, test administrator surveys, focus groups Content 4
Item types support the range of students in presentation and response Description of item types Cognitive labs, test administrator survey Response Process, Content 3, 4
Kite Suite is accessible and usable by educators Test Administration Manual, Educator Portal helplet video Test administrator survey, focus groups 4

For the Kite Suite to maximize accessibility, system design should be consistent with accessibility guidelines, including Accessible Portable Item Protocol (APIP) standards and the Web Content Accessibility Guidelines (WCAG), as well as current best practices. System documentation provides evidence that the Kite Suite design is consistent with these guidelines and standards.

For the Kite Suite to be fully accessible to all students, all supports needed by students within and outside of the assessment system should be available. Documentation describes the types and range of accessibility supports available. Test administrators record the accessibility supports used by the student in the Personal Needs and Preferences Profile (PNP). In 2021–2022, all available PNP supports were used, ranging from less than 1% of students (n = 109) using uncontracted braille to 79% (n = 75,846) using human read aloud. Test administrator surveys provide additional evidence that accessibility supports needed by the student are available within and outside the assessment system. Test administrator survey responses from 2017 to 2022 indicate that 92%–95% of students had access to all supports necessary to participate in the assessment. In spring 2020, DLM staff conducted focus groups with some of the educators who disagreed that students had all necessary supports. The findings revealed that most of the accessibility and system challenges that educators reported stemmed, in part, from uncertainty about allowable practices for assessment administration, rather than gaps in system functionality. This uncertainty included practices used during instruction that are not allowed during assessment (e.g., hand-over-hand support). Together, the evidence demonstrates that the supports needed by students are available within and outside the system.

System accessibility also means that item types support the full range of DLM students in presentation and response. Multiple item types, as well as both computer-delivered and educator-administered items, allow students to demonstrate their knowledge, skills, and understandings. Cognitive labs showed that students can respond to a range of item types. The 2022 test administrator survey findings show that educators agreed or strongly agreed that 89% of students responded to items to the best of their knowledge, skills, and understandings and that 86% of students were able to respond regardless of disability, behavior, or health concerns. These percentages are consistent with responses from the 2018 and 2019 test administrator surveys. Overall, the evidence suggests that the different item types support the full range of students in presentation and response.

Finally, the Kite Suite must also be accessible to educators. Documentation describes how educators use Educator Portal, including the Instruction and Assessment Planner, to assign optional instructionally embedded testlets, manage student data, and access Testlet Information Pages. The documentation also describes how educators use Student Portal to administer the testlet to students. Short helplet videos describe how to use Educator Portal and Student Portal. Spring 2019 test administrator survey data indicate that most respondents found both Educator Portal and Student Portal easy to use. Educators rated their overall experience with Educator Portal as good or excellent in 85% of cases, and their overall experience with Kite Student Portal as good or excellent in 91% of cases. In addition, when asked about their experiences with Student Portal, educators found it to be either somewhat easy or very easy to enter the site in 95% of cases, somewhat or very easy to navigate within a testlet in 96% of cases, somewhat or very easy to record a response in 97% of cases, somewhat or very easy to submit a completed testlet in 97% of cases, and somewhat or very easy to administer testlets on various devices in 93% of cases. Focus group feedback also indicated that educators found the Kite Suite easy to use. Overall, there is strong evidence that the Kite Suite is accessible and usable for educators.

10.4.1.4 D: Instructionally Relevant Testlets Are Designed to Allow Students to Demonstrate Their Knowledge, Skills, and Understandings Relative to Academic Expectations

To meet the needs of students with significant cognitive disabilities, DLM assessments are composed of short, instructionally relevant testlets. Testlet development relies on inputs from the learning maps (Statement A), EEs (Statement B), and Kite Suite (Statement C), shown as arrows in the excerpt of the Theory of Action in Figure 10.3.

Figure 10.3: Theory of Action Inputs for Test Design

Small section of the DLM Theory of Action showing Statement D and its inputs.

Seven propositions describe the design and development of DLM testlets and items, which are summarized in Table 10.5.

Table 10.5: Propositions and Evidence for Test Design
Proposition Procedural evidence Empirical evidence Type Chapter(s)
Items within testlets are aligned to linkage level Item writing handbook, fungible linkage level parameters, ELA and mathematics testlet helplet videos, Essential Element Concept Maps (EECMs) Alignment data, item analyses Content, Internal Structure 3, 4
Testlets are designed to be accessible to students Item writer qualifications, item writing handbook, Accessibility Manual, EECMs Internal and external review, test administrator survey, focus groups Content 3, 4
Testlets are designed to be engaging and instructionally relevant Item writer qualifications, item writing handbook, Accessibility Manual Internal and external review, test administrator survey, focus groups Content 3, 4
Testlets are written at appropriate cognitive complexity for the linkage level Item writing handbook, EECMs Cognitive process dimension ratings, text complexity, external review of passages, internal and external review Content 3
Items are free of extraneous content Item writing handbook, EECMs Internal and external review Content 3
Items do not contain content that is biased against or insensitive to subgroups of the population Item writing handbook Internal and external review, differential item functioning analyses Content, Internal Structure 3, 4
Items are designed to elicit consistent response patterns across different administration formats Item writing handbook, Accessibility Manual, Test Administration Manual, EECMs Internal and external review, item statistics Content, Internal Structure 3
Relies on evidence from Theory of Action input statements, as shown in Figure 10.3.

For testlet design to allow students to demonstrate their knowledge, skills, and understandings, items must appropriately measure the linkage level. The EECMs for each EE support item writing by specifying the linkage levels and nodes assessed. The item writing handbook provides guidance to item writers on how to write items measuring the linkage level. Items are written to be fungible, meaning that all items written to an EE and linkage level should measure the linkage level equally well; accordingly, in modeling analyses, all items measuring a linkage level share the same parameters. Helplet videos describing ELA and mathematics testlets illustrate these principles. External alignment ratings show that 96% of ELA items and 100% of mathematics items were rated as meeting content centrality criteria (content centrality is the degree of fidelity between the content of the learning map nodes and the assessment items; see Chapter 3 of this manual for details). Additionally, item analyses show that p-values for 98% of ELA items and 99% of mathematics items fall within two standard deviations of the mean for their linkage level, demonstrating that items consistently measure the linkage level and supporting the intended fungibility of items. Together, the evidence shows that items are aligned to the linkage level.
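
To make the flagging rule concrete, the short sketch below checks whether each item’s p-value falls within two standard deviations of the mean p-value for its linkage level. This is a hypothetical illustration rather than the operational DLM analysis, and the p-values are invented.

```python
# Minimal sketch: flag items whose p-value (proportion correct) falls more than
# two standard deviations from the mean p-value of their linkage level.
# Hypothetical values only; not the DLM operational code.
from statistics import mean, stdev

linkage_level_p_values = [0.62, 0.58, 0.65, 0.60, 0.57, 0.66, 0.61, 0.59, 0.63, 0.31]

level_mean = mean(linkage_level_p_values)
level_sd = stdev(linkage_level_p_values)
lower, upper = level_mean - 2 * level_sd, level_mean + 2 * level_sd

for p in linkage_level_p_values:
    status = "within range" if lower <= p <= upper else "flagged for review"
    print(f"p = {p:.2f}: {status} (range {lower:.2f}-{upper:.2f})")
```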

Testlets must also be accessible to students. Items are written by educators familiar with students with significant cognitive disabilities. The item writing handbook specifies how to write accessible testlet content, and the Accessibility Manual (Dynamic Learning Maps Consortium, 2021a) describes how item writing guidelines are based on the Universal Design for Learning. The EECMs specify accessibility considerations for the content being measured. Items are internally and externally reviewed for accessibility. Test administrator survey responses from 2022 indicate that 86% of students were able to respond regardless of disability, behavior, or health concerns, which is consistent with findings from 2018 and 2019. As previously described for Statement C, educator focus groups did not reveal gaps in system functionality that prevent accessibility. Together, the evidence suggests testlet content is accessible to students.

Testlets should be designed to be engaging and instructionally relevant. Items are written by educators familiar with instruction for students with significant cognitive disabilities. The item writing handbook describes how to write engaging and instructionally relevant content, and the Accessibility Manual (Dynamic Learning Maps Consortium, 2021a) describes the engagement activities that are designed to motivate students, provide a context for the items, and activate prior knowledge. Internal and external review evaluates testlet content. On the 2022 test administrator survey, educators indicated that testlets aligned with instruction for approximately 71% of students in ELA and approximately 62% of students in mathematics. Focus group feedback reveals some variability in educator perception that content is engaging and instructionally relevant. Together, there is moderate evidence that testlets are engaging and instructionally relevant, with opportunity for additional data collection.

Testlet content must also be written at the appropriate cognitive complexity for the linkage level. The EECMs specify the cognitive complexity of the content, while the item writing handbook provides guidance for writing items that differentiate the linkage levels. All items are rated for cognitive process dimension and evaluated for text complexity. Rounds of internal and external review by content experts provide additional evidence that items and testlets are written at the appropriate level. Together, the evidence indicates that testlet content is generally written at an appropriate cognitive complexity for the linkage level.

Items should also be free of extraneous content. The EECMs define the node(s) and linkage levels being measured to focus item writing on necessary content. The item writing handbook provides further guidance for focusing item writing to reduce cognitive load for students with significant cognitive disabilities (e.g., using concise language at an appropriate reading level). Rounds of internal review by ATLAS staff and external review by content panels further evaluate item content. Together, evidence supports the proposition that items are free of extraneous content.

Items should similarly be free of biased or sensitive content. The item writing handbook provides guidance for writing testlets, including consideration for diverse subgroups of the population and topics that may be sensitive or inaccessible for students with significant cognitive disabilities (e.g., references to mobility, tasks requiring sight). Internal and external review includes bias and sensitivity review, incorporating the evaluation of items, testlets, and ELA texts. Differential item functioning analyses conducted across years demonstrate that more than 99% of items have no or negligible evidence of differential functioning across student subgroups. Together, evidence shows items to be free of sensitive content and bias.
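
The sketch below illustrates one widely used DIF statistic, the Mantel-Haenszel common odds ratio with a simplified ETS delta-scale classification. It is a hypothetical example with invented counts, not the DLM operational procedure, which is documented in the chapters cited in Table 10.5.

```python
# Minimal Mantel-Haenszel DIF sketch with invented counts; not the DLM procedure.
import math

# For each total-score stratum:
# (reference correct, reference incorrect, focal correct, focal incorrect)
strata = [
    (40, 10, 35, 15),
    (30, 20, 28, 22),
    (20, 30, 18, 32),
]

num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
or_mh = num / den                   # common odds ratio across strata
mh_d_dif = -2.35 * math.log(or_mh)  # ETS delta-scale transformation

# Simplified ETS categories (the full rules also consider statistical significance).
if abs(mh_d_dif) < 1.0:
    category = "A (negligible DIF)"
elif abs(mh_d_dif) <= 1.5:
    category = "B (moderate DIF)"
else:
    category = "C (large DIF)"

print(f"MH odds ratio = {or_mh:.2f}, MH D-DIF = {mh_d_dif:.2f}, category {category}")
```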

Finally, items should elicit consistent responses regardless of administration format. The EECMs document the content measured by each linkage level to promote consistency within and across testlets. The item writing handbook further describes procedures to promote consistency so that item writers develop items and testlets according to fungibility principles (i.e., all items measuring an EE and linkage level have equal parameters). Documentation describes administration procedures for promoting consistency regardless of form type (e.g., braille) or supports used (e.g., switch use). Internal review and external review evaluate item content. Item statistics demonstrate that items consistently measure the linkage level. While there is opportunity for additional data collection, the collected evidence indicates that items elicit consistent responses regardless of administration format.

10.4.1.5 E: Training Strengthens Educator Knowledge and Skills for Assessing

Required training prepares educators to administer DLM assessments with fidelity. Because the format of DLM assessments differs from other assessments (e.g., short testlets, computer-based rather than portfolio), required training covers critical concepts educators need to know to administer DLM assessments as intended. There are no direct inputs for the design of required training in the Theory of Action.

Three propositions pertain to the required test administrator training for DLM assessments. Table 10.6 summarizes the propositions and evidence evaluating required training for educators.

Table 10.6: Propositions and Evidence for Required Training
Proposition Procedural evidence Empirical evidence Type Chapter(s)
Required training is designed to strengthen educator knowledge and skills for assessing Documentation of scope of training and process of developing, posttest-passing requirements Content 9
Training prepares educators to administer DLM assessments Documentation of scope of training, Test Administration Manual and Accessibility Manual as supplemental materials Test administrator survey Consequences 4, 9
Required training is completed by all test administrators System requirement, description of self-directed and facilitated training options, state-provided guidance on training, Kite training status extracts, state and local monitoring Data extracts Consequences 9

Required training should strengthen educator knowledge and skills for assessing students. Documentation describes the scope and development process for identifying the set of critical knowledge and skills that test administrators need to administer DLM assessments. To demonstrate mastery of critical knowledge and skills, all test administrators must pass the module posttests with at least 80% accuracy. Since 2015, project staff have updated required training modules to reflect the most critical knowledge needed for administration while keeping the time needed to complete the training short. Together, the evidence supports the statement that required training is designed to strengthen educator knowledge and skills for assessing students.

Training should prepare educators to administer assessments. Annual training covers topics educators need to know to administer DLM assessments. Test administrators also have access to the Test Administration Manual (DLM Consortium, 2021) and the Accessibility Manual (Dynamic Learning Maps Consortium, 2021a) to prepare for test administration. In 2022, 89% of surveyed test administrators agreed or strongly agreed that test administrator training prepared them for their responsibilities as test administrators. These results are consistent with prior findings from 2017 to 2021, which ranged from 87% to 91%. These sources of evidence show that training prepares educators to administer assessments, with some opportunity for continuous improvement.

Test administrators must complete required training. The Kite Suite prevents test administration until required training is completed. Self-directed and facilitated training formats are available to meet diverse needs. States and districts determine the format of training and provide test administrators with additional guidance and resources as needed. State, district, and building staff can monitor training completion using the Training Status Extract generated by the Kite Suite. Data extracts show that all test administrators completed required training.

10.4.1.6 F: Professional Development Strengthens Educator Knowledge and Skills for Instructing Students With Significant Cognitive Disabilities

Professional development modules support educators in providing high-quality, rigorous academic instruction. Because students with significant cognitive disabilities have historically had limited opportunities to learn academic content (e.g., Karvonen, Flowers, et al., 2013), the DLM assessment program prioritized development of professional development resources and their inclusion in the Theory of Action. Professional development modules are optionally available in all states, with some states including a subset of modules in their required educator training. Professional development relies on one input (EEs, Statement B), as shown in Figure 10.4.

Figure 10.4: Theory of Action Inputs for Professional Development

Small section of the DLM Theory of Action showing Statement F and its inputs.

Three propositions are related to the design and structure of professional development for educators who instruct students who take DLM assessments. Table 10.7 summarizes the evidence evaluating professional development.

Table 10.7: Propositions and Evidence for Professional Development
Proposition Procedural evidence Empirical evidence Type Chapter(s)
Professional development (PD) modules cover topics relevant to instruction Description of approach to PD, list of PD modules, Essential Elements PD rating data Consequences 9
Educators access PD modules Description of process for accessing PD modules PD prevalence data Consequences 9
Educators implement the practices on which they have been trained Description of PD content and its application to practice PD rating data Consequences 9
Relies on evidence from Theory of Action input statements, as shown in Figure 10.4.

For professional development to strengthen educator knowledge and skills for instructing students with significant cognitive disabilities, it must cover topics relevant to instruction. The DLM professional development system includes 51 modules that address instruction in ELA and mathematics and support educators in creating Individualized Education Programs (IEPs) that are aligned with the DLM EEs. Empirical evidence of the instructional relevance of the professional development modules comes from educator evaluations collected after completing the modules. During the 2021–2022 year, approximately 83% of educators completing the surveys either agreed or strongly agreed that the modules addressed content that is important for professionals working with students with significant cognitive disabilities, and 82% agreed that the modules presented new ideas to improve their work. This combined evidence shows that professional development covers topics relevant to instruction.

For professional development to strengthen knowledge, educators should access the modules. To facilitate access, the modules are available on the DLM professional development website in two formats, self-directed and facilitated. There are also bundles of related content. A subset of states requires that specific professional development modules be completed within the required training interface. In 2021–2022, nine states required at least one professional development module as part of their required test administrator training. Across all states administering DLM assessments in 2022, 21,888 modules were completed by 4,431 new test administrators and 3,033 returning test administrators as part of required training. Additionally, a total of 5,987 modules were completed by 1,426 individuals through the DLM professional development website. Since the first professional development module was launched in the fall of 2012, a total of 77,061 modules have been completed via the DLM professional development website. Overall, these findings show somewhat limited adoption of the professional development modules.

For professional development to strengthen educator knowledge, educators should implement the practices covered in the modules. Documentation (e.g., facilitator guides) describes application of professional development content to instructional practice. In 2021–2022, 83% of educators indicated that they intended to apply what they learned in the professional development modules to their professional practice. There is opportunity for additional evidence to be collected for this proposition.

10.4.2 Delivery

According to the chain of reasoning in the Theory of Action, the collective set of design statements informs delivery of DLM assessments. Theory of Action statements regarding delivery of DLM assessments pertain to administration and implementation of the DLM System. This portion of the Theory of Action encompasses statements about educators’ instruction, assessment administration, student interactions with the system, and the combined set of administered testlets. The specific statements related to the delivery of DLM assessments are:

  1. Educators provide instruction aligned with content standards and at an appropriate level of challenge.
  2. Educators administer assessments with fidelity.
  3. Students interact with the system to show their knowledge, skills, and understandings.
  4. The combination of administered assessments measures knowledge and skills at the appropriate breadth and complexity.

10.4.2.1 G: Educators Provide Instruction Aligned With Content Standards and at an Appropriate Level of Challenge

Students with significant cognitive disabilities have historically had limited opportunity to learn the full breadth of grade-level academic content (e.g., Karvonen et al., 2009, 2011; Karvonen, Wakeman, et al., 2013). To address this, the DLM program prioritizes educators providing instruction aligned with content standards at an appropriate level of challenge. Several inputs in the Theory of Action inform instructional practice, including the availability of rigorous grade-level expectations as reflected in the map and EEs (Statements A and B), the content of instructionally relevant testlets (Statement D), and professional development that strengthens educator knowledge and skills for instructing the DLM population of students (Statement F), as shown in Figure 10.5.

Figure 10.5: Theory of Action Inputs for Aligned Instruction

Small section of the DLM Theory of Action showing Statement G and its inputs.

Two propositions are related to the alignment of instruction with DLM content standards, as shown in Table 10.8.

Table 10.8: Propositions and Evidence for Aligned Instruction
Proposition Procedural evidence Empirical evidence Type Chapter(s)
Educators provide students with the opportunity to learn content aligned with the assessed grade-level Essential Elements State and local guidance, blueprints Test administrator survey, First Contact survey Content 4
Educators provide instruction at an appropriate level of challenge, based on knowledge of the student Instructional resources, required training, mini-maps Focus groups, test administrator survey Content 4
Relies on evidence from Theory of Action input statements, as shown in Figure 10.5.

Educators should provide students with the opportunity to learn content aligned with the EEs. States and districts provide additional guidance to support educators in providing high-quality instruction aligned with grade-level EEs. Blueprints specify the breadth of EE coverage across claims and conceptual areas. The test administrator survey collects information on the approximate number of hours during the school year that educators spent on instruction for each of the DLM conceptual areas. In ELA, 51% of test administrators provided 11 or more hours of instruction per conceptual area to their students; in mathematics, 42% did so. The results are consistent with prior survey data, showing variability in student opportunity to learn the full breadth of academic content. One factor that affects opportunity to learn is student engagement during instruction. Based on First Contact survey responses, 62% of the students who take DLM assessments demonstrate fleeting attention to educator-directed instruction, and 56% demonstrate fleeting attention to computer-directed instruction. A small percentage of students demonstrate little or no attention to educator-directed (14%) or computer-directed (13%) instruction. These results collectively indicate some variability in the amount of instruction and in the level of student engagement in instruction for the full breadth of academic content, with opportunity for continuous improvement.

Educators should also provide instruction at an appropriate level of challenge based on knowledge of the student. Instructional resources on the DLM website are intended to be used to inform instruction prior to administration of DLM assessments. This includes directions for accessing mini-maps in Educator Portal to identify the map nodes measured at each linkage level to inform instruction. Spring 2022 test administrator survey responses indicated that most or all ELA testlets matched instruction for approximately 71% of students, and most or all mathematics testlets matched instruction for approximately 62% of students. These percentages have annually increased since 2017. While there is opportunity for additional data collection and continuous improvement in instructional practice, current data provides some evidence supporting the proposition that educators provide instruction at an appropriate level of challenge based on knowledge of the student.

10.4.2.2 H: Educators Administer Assessments With Fidelity

Educators should administer DLM assessments as intended. Administering assessments with fidelity relies on two inputs in the DLM Theory of Action, the Kite Suite design (Statement C) and required training content (Statement E), and aligned instruction, as shown in Figure 10.6.

Figure 10.6: Theory of Action Inputs for Administration Fidelity

Small section of the DLM Theory of Action showing Statement H and its inputs.

Four propositions are related to fidelity of assessment administration, as shown in Table 10.9.

Table 10.9: Propositions and Evidence for Administration Fidelity
Proposition Procedural evidence Empirical evidence Type Chapter(s)
Educators are trained to administer testlets with fidelity (i.e., as intended) Test Administration Manual (TAM), Accessibility Manual, Testlet Information Pages, required training Test administrator survey, test administration observations (TAOs) Response Process, Content 4, 9
Educators enter accurate information about administration supports TAM, Personal Needs and Preferences Profile (PNP) helplet video, required training, PNP system functionality PNP selections Response Process, Content 4, 9
Educators allow students to engage with the system as independently as they are able TAM PNP, TAOs, First Contact responses Response Process, Content 4, 9
Educators enter student responses with fidelity (where applicable) TAM PNP, TAOs, writing interrater agreement Response Process, Content 4, 9
Relies on evidence from Theory of Action input statements, as shown in Figure 10.6.

For educators to administer assessments with fidelity, they must be appropriately trained, which relies in part on evidence from the required training input (Statement E) in the Theory of Action (Figure 10.6). The Test Administration Manual (DLM Consortium, 2021), Accessibility Manual (Dynamic Learning Maps Consortium, 2021a), and Testlet Information Pages provide test administrators with information needed to administer assessments as intended. On the spring 2022 test administrator survey, educators agreed or strongly agreed that the required test administrator training prepared them for their responsibilities as test administrators in 89% of cases and agreed or strongly agreed that they were confident administering DLM testlets in 96% of cases. These results are consistent with responses received since 2017. From 2019 to 2022, educators agreed or strongly agreed that they used manuals and/or the DLM Educator Resource Page materials in 89%–93% of cases. Indirect evidence comes from test administration observation data; observations indicated educators administered assessments as intended, suggesting that training effectively prepared them to administer assessments. Together, the evidence shows that educators are trained to administer assessments with fidelity.

Fidelity of assessment administration also includes educators entering accurate information about administration supports used by the student. In addition to being covered in required training, documentation and a helplet video describe expectations for educators completing the student’s PNP to indicate supports used during assessment administration. During the 2021–2022 academic year, 88% of students had at least one support indicated on their PNP. The most selected supports in 2021–2022 were human read aloud, test administrator enters responses for student, and spoken audio. However, data are not available to evaluate the accuracy of educator PNP selections (i.e., whether supports were actually used on the assessment) or their consistency with supports used during instruction. While there is opportunity for additional data collection, the evidence provides general support for educators entering accurate information about supports used by the student.

Administering DLM assessments with fidelity requires that educators allow students to engage with the system as independently as they are able. The Test Administration Manual (DLM Consortium, 2021) provides guidance for administering computer-delivered assessments with flexibility to meet individual student needs. In 2022, 54% of student PNP records indicated the test administrator entered responses on the student’s behalf. Data are not available to evaluate whether students in fact used or needed this support during administration. On the First Contact survey, educators indicated that 40% of students could independently use a computer with or without assistive technology. Test administration observations collect additional data on test administrator behaviors. In 2022, 60% of observed behaviors were classified as supporting (e.g., clarifying directions), 38% neutral (e.g., asking the student to clarify their response), and 2% nonsupporting actions (e.g., physically directing the student to a response). While there is opportunity for additional data collection, evidence generally supports the proposition that educators allow students to engage with the system as independently as they are able.

Finally, educators should enter student responses into the system with fidelity. While all testlets that are educator-administered require test administrators to enter students’ responses, test administrators may also enter responses for students during computer-delivered testlets (i.e., if PNP support indicates the student needs the test administrator to enter responses). Resources provide educators with guidance for entering student assessment responses, including allowed and not-allowed practices. Test administration observations collected in 2021–2022 showed that test administrators entered responses for the student in approximately 18% of test observations (21 of 115), which is lower than the percentage of students whose PNP indicated the test administrator enters responses (54%). In approximately 86% of those cases (18 of 21 observations), observers indicated that the entered response matched the student’s response. The remaining three observers either responded that they could not tell whether the entered response matched the student’s response or left the item blank. Additional fidelity evidence comes from evaluation of the consistency of scoring student writing samples, which are educator-administered and require the test administrator to enter responses in the system. Across years, interrater reliability of student writing sample scoring has demonstrated high levels of agreement, indicating that educators enter student responses with fidelity. While evidence generally shows educators enter responses with fidelity, there is opportunity for additional data collection to further evaluate this proposition.
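
As a hypothetical illustration of how such agreement might be summarized (not the DLM operational analysis; the rater scores below are invented), the sketch computes exact agreement and Cohen’s kappa for two raters scoring the same set of writing samples.

```python
# Minimal interrater-agreement sketch with invented scores; not the DLM analysis.
from collections import Counter

rater_1 = [2, 3, 1, 2, 2, 3, 1, 2, 3, 2]  # hypothetical scores from rater 1
rater_2 = [2, 3, 1, 2, 3, 3, 1, 2, 3, 2]  # hypothetical scores from rater 2

n = len(rater_1)
observed = sum(a == b for a, b in zip(rater_1, rater_2)) / n

# Chance agreement from each rater's marginal score distribution.
marg_1, marg_2 = Counter(rater_1), Counter(rater_2)
expected = sum((marg_1[s] / n) * (marg_2[s] / n) for s in set(rater_1) | set(rater_2))

kappa = (observed - expected) / (1 - expected)
print(f"Exact agreement = {observed:.2f}, Cohen's kappa = {kappa:.2f}")
```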

10.4.2.3 I: Students Interact With the System to Show Their Knowledge, Skills, and Understandings

Students should be able to interact with the system to show their knowledge, skills, and understandings. As shown in Figure 10.7, there are several relevant inputs in the Theory of Action that affect students’ ability to show their knowledge and skills. These include system design (Statement C), testlet design (Statement D), instruction (Statement G), and administration fidelity (Statement H), as well as indirect inputs such as EEs (Statement B) and required training (Statement E).

Figure 10.7: Theory of Action Inputs for Student Interaction With the System

Small section of the Theory of Action showing Statement I and its inputs.

Three propositions correspond to students interacting with the system to show their knowledge, skills, and understandings on DLM assessments. Table 10.10 summarizes the propositions and evidence evaluating student interaction with the system.

Table 10.10: Propositions and Evidence for Student Interaction With the System
Proposition Procedural evidence Empirical evidence Type Chapter(s)
Students are able to respond to tasks, regardless of sensory, mobility, health, communication, or behavioral constraints Test Administration Manual (TAM), Accessibility Manual, administration fidelity, system and testlet design Test administrator surveys, Personal Needs and Preferences Profile/alternate form completion rates, test administration observations (TAOs), focus groups Response Process, Content 3, 4
Student responses to items reflect their knowledge, skills, and understandings System and testlet design, administration fidelity, received aligned instruction Cognitive labs, test administrator surveys Response Process, Content 3, 4
Students are able to interact with the system as intended TAM, Accessibility Manual, system design Cognitive labs, TAOs, test administrator surveys Response Process, Content 4
Relies on evidence from Theory of Action input statements, as shown in Figure 10.7.

To show their knowledge, skills, and understandings in the DLM System, all students should be able to respond to tasks regardless of sensory, mobility, health, communication, or behavioral constraints. Documentation describes procedures for selecting accessibility supports and administering the assessment in ways that align with individual student needs while also maintaining fidelity to intended practice. System and testlet accessibility design support students being able to show their knowledge, skills, and understandings. On the 2022 DLM test administrator survey, educators agreed or strongly agreed that 86% of students were able to respond regardless of disability, behavior, or health concerns. This percentage is consistent with responses since 2017. In 2022, educators indicated that 92% of students had access to all supports necessary to participate. This percentage is consistent with data since 2017. Test administration observation data showed that in 2021–2022, in 41 of 42 cases, students were able to respond without encountering difficulty using accessibility supports. The PNP selection data provide further evidence that educators select supports to meet individual student sensory, mobility, and communication needs, including enabling various display settings, switch use, or administration of alternate forms (e.g., braille). In 2021–2022, alternate forms for students with blindness or low vision were selected for 2,068 students (2%), and uncontracted braille was selected for 109 students (0.1%). As described for Statement C, accessibility focus groups revealed that the challenges educators reported stemmed in part from uncertainty about allowable practices for assessment administration rather than gaps in system functionality. This evidence shows that students, in general, can respond to tasks regardless of sensory, mobility, health, communication, or behavioral constraints.

Student responses to assessment items should reflect their knowledge, skills, and understandings rather than construct-irrelevant factors. Construct-irrelevant variance is minimized through system and testlet design and through administration fidelity (demonstrated by connection with Statements C, D, and H in Figure 10.7). Responses should similarly reflect students’ knowledge, skills, and understandings after receiving rigorous aligned instruction (demonstrated by connection with Statement G). Cognitive labs provide evidence that items do not present barriers to the intended response process due to construct-irrelevant testlet features or item response demands. In addition, on the 2022 test administrator survey, 89% agreed that students responded to items to the best of their knowledge, skills, and understandings. This percentage is consistent with agreement rates in 2018 and 2019. Evidence generally indicates that student responses reflect their knowledge, skills, and understandings.

Finally, students should be able to interact with the system as intended. Documentation describes intended student interactions, including allowed and not-allowed practices. Confidence that students interact with the system as intended relies on system and testlet design and administration fidelity (demonstrated by connections with Statements C, D, and H in Figure 10.7). Cognitive labs, test administration observations, and test administrator survey data on Student Portal functionality provide evidence that students interact with assessments as intended. Furthermore, in 2019, more than 90% of educators indicated that Student Portal made it easy to navigate testlets, record responses, submit testlets, and administer testlets on a variety of devices (computer, iPad, etc.). Combined, this evidence supports the proposition that students are able to interact with the system as intended.

10.4.2.4 J: The Combination of Administered Assessments Measures Knowledge and Skills at the Appropriate Breadth and Complexity

The Theory of Action includes a statement that the combination of administered assessments measures knowledge and skills at the appropriate breadth and complexity. Administering testlets at the appropriate breadth and complexity relies on several direct inputs in the DLM Theory of Action, including testlet content being available for all EEs and linkage levels (Statement D), educators administering testlets with fidelity (Statement H), and students interacting with the Kite Suite (Statement I; Figure 10.8). Additionally, this statement includes indirect inputs such as the map structure and EEs.

Figure 10.8: Theory of Action Inputs for Appropriate Combination of Testlets

Small section of the Theory of Action showing Statement J and its inputs.

Three propositions are related to the breadth and complexity of administered assessments, as summarized in Table 10.11.

Table 10.11: Propositions and Evidence for the Appropriate Combination of Testlets
Proposition Procedural evidence Empirical evidence Type Chapter(s)
First Contact survey correctly assigns students to appropriate complexity band Description of First Contact survey design and algorithm development, First Contact helplet video Pilot analyses, educator adjustment patterns Content, Response Process 4
Administered testlets are at the appropriate linkage level Administration fidelity, mini-maps Adaptive routing patterns, linkage level parameters and item statistics, educator focus groups Content 3, 4
Administered testlets cover the full blueprint Test Administration Manual, monitoring extracts, blueprints, administration fidelity Blueprint coverage extracts and analyses, Special Circumstance codes Content 4
Relies on evidence from Theory of Action input statements, as shown in Figure 10.8.

For the combination of administered assessments to measure knowledge and skills at the appropriate level of complexity, the First Contact survey should correctly assign students to a complexity band. Documentation and a helplet video describe expectations for educators to accurately complete the First Contact survey. Documentation also details the algorithms that use a subset of First Contact items to assign students to complexity bands, which in turn determine each student’s first spring testlet in each subject. Evidence from the fixed-form pilot administration of DLM assessments demonstrated that complexity bands appropriately assigned students to linkage levels (Clark et al., 2014). Data from the 2022 spring assessment window indicate that 33% of ELA students and 41% of mathematics students did not adapt after their first testlet. In instances in which adaptations did occur, students adapted up and down at similar rates. Taken together, the evidence provides some support for the proposition that First Contact survey responses correctly assign students to complexity bands, with opportunity for additional research.
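
As an illustration of the general logic only, the following sketch assigns a complexity band from a small, hypothetical subset of First Contact items. The item identifiers, score values, and cut scores are illustrative assumptions and do not represent the operational DLM algorithm.

```python
# Illustrative sketch (not the operational DLM algorithm): assigning a complexity
# band from a hypothetical subset of First Contact survey items. Item names,
# score values, and thresholds are assumptions for demonstration only.

EXPRESSIVE_ITEMS = ["speech", "sign", "aac_device"]           # hypothetical item IDs
ACADEMIC_ITEMS = {"ela": ["reading_words", "comprehension"],  # hypothetical item IDs
                  "math": ["counting", "computation"]}

def complexity_band(responses: dict, subject: str) -> str:
    """Map First Contact responses (0-3 ratings) to a complexity band."""
    expressive = sum(responses.get(item, 0) for item in EXPRESSIVE_ITEMS)
    academic = sum(responses.get(item, 0) for item in ACADEMIC_ITEMS[subject])
    total = expressive + academic

    # Hypothetical cut scores; the operational algorithm is documented separately.
    if total <= 3:
        return "Foundational"
    elif total <= 7:
        return "Band 1"
    elif total <= 11:
        return "Band 2"
    return "Band 3"

# Example: a student with some expressive communication and emerging academic skills
print(complexity_band({"speech": 2, "reading_words": 1, "comprehension": 1}, "ela"))
# -> "Band 1" under these hypothetical cuts
```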

All administered testlets should be at the appropriate linkage level to allow students to demonstrate their knowledge, skills, and understandings at the appropriate level of complexity. This proposition assumes that testlets correctly measure linkage levels (Statement D in Figure 10.8). Documentation describes linkage level initialization and adaptive routing, whereby students are assigned their first testlet based on their complexity band and subsequent testlets are assigned via adaptive routing based on performance on prior testlets. Routing data show some evidence of adaptation between testlets, which supports the proposition that assessments are at the appropriate linkage level. Linkage level modeling parameters (i.e., the probabilities of masters and nonmasters providing correct responses and the base rate of mastery) and item statistics (i.e., p-values and standardized difference values) show that students perform as expected on assessed linkage levels, providing some evidence that administered testlets are at the appropriate level. However, educator feedback from focus groups indicates some variability in perceptions of whether testlets are of appropriate difficulty. While evidence generally supports the proposition, there is opportunity for additional data collection.
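
To illustrate the adaptive routing concept, the following sketch shows one way a next-testlet linkage level could be selected from performance on the prior testlet. The percentage-correct thresholds are hypothetical and do not represent the operational routing rules.

```python
# Illustrative sketch of between-testlet adaptive routing. Thresholds are
# hypothetical; the operational routing rules are described in the technical
# documentation. The next testlet moves up, down, or stays at the same linkage
# level depending on the percentage of items answered correctly.

LINKAGE_LEVELS = ["Initial Precursor", "Distal Precursor", "Proximal Precursor",
                  "Target", "Successor"]

def next_linkage_level(current_level: str, items_correct: int, items_total: int) -> str:
    idx = LINKAGE_LEVELS.index(current_level)
    pct_correct = items_correct / items_total

    if pct_correct >= 0.80:          # hypothetical "adapt up" threshold
        idx = min(idx + 1, len(LINKAGE_LEVELS) - 1)
    elif pct_correct <= 0.35:        # hypothetical "adapt down" threshold
        idx = max(idx - 1, 0)
    return LINKAGE_LEVELS[idx]

# Example: a student answering 4 of 5 items correctly on a Proximal Precursor testlet
print(next_linkage_level("Proximal Precursor", 4, 5))  # -> "Target"
```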

For the combination of testlets to measure skills of appropriate breadth of content (i.e., adequate construct representation), administered assessments should cover the full blueprint. Documentation describes assignment procedures, including how frequently the system assigns subsequent testlets. Educators can monitor whether coverage requirements are met using Blueprint Coverage and Test Administration Monitoring extracts, which are available to educators and to building, district, and state users. Across all grades, 96% of students in ELA and 97% of students in mathematics were assessed on all of the EEs and met blueprint requirements. Special Circumstance codes entered in the system explain why some students do not meet coverage requirements (e.g., chronic absence). These results provide evidence that nearly all students are administered testlets to cover the full blueprint.
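
The following minimal sketch illustrates the kind of check that underlies a blueprint coverage extract: comparing the EEs on which each student was assessed against the EEs required by the blueprint. The record format, EE codes, and data shown are hypothetical.

```python
# Minimal sketch of a blueprint coverage check, assuming a simple record format
# (student ID, EE code) for administered testlets and a set of required EEs.
# Field names, EE codes, and data are hypothetical.

from collections import defaultdict

required_ees = {"ELA.EE.RL.3.1", "ELA.EE.RL.3.2", "ELA.EE.RI.3.1"}  # hypothetical blueprint

administered = [  # hypothetical testlet assignment records
    ("student_001", "ELA.EE.RL.3.1"),
    ("student_001", "ELA.EE.RL.3.2"),
    ("student_001", "ELA.EE.RI.3.1"),
    ("student_002", "ELA.EE.RL.3.1"),
]

covered = defaultdict(set)
for student, ee in administered:
    covered[student].add(ee)

for student, ees in covered.items():
    missing = required_ees - ees
    status = "meets blueprint requirements" if not missing else f"missing {sorted(missing)}"
    print(f"{student}: {status}")
```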

10.4.3 Scoring

According to the chain of reasoning demonstrated in the Theory of Action, the collective set of design and delivery statements informs scoring of DLM assessments. Scoring statements in the Theory of Action encompass students’ mastery determinations, overall achievement, and usability of results. The specific statements are:

  1. Mastery results indicate what students know and can do.
  2. Results indicate summative performance relative to alternate achievement standards.
  3. Results can be used for instructional decision-making.

10.4.3.1 K: Mastery Results Indicate What Students Know and Can Do

Because DLM assessments report results as the set of mastered skills across all EEs, mastery results should be accurate indications of students’ knowledge, skills, and understandings. According to the Theory of Action, the only direct input informing mastery results is the combination of administered testlets being at the appropriate breadth and complexity (Statement J). Indirect connections in the Theory of Action also influence mastery reporting (e.g., map structure, EEs, aligned instruction, administration fidelity, and students’ interaction with the Kite Suite), as shown in Figure 10.9.

Figure 10.9: Theory of Action Inputs for Mastery Results

Small section of the Theory of Action showing Statement K and its inputs.

There are three propositions corresponding to mastery results accurately indicating what students know and can do. Table 10.12 summarizes the propositions and evidence evaluating mastery results.

Table 10.12: Propositions and Evidence for Mastery Results
Proposition Procedural evidence Empirical evidence Type Chapter(s)
Mastery status reflects students’ knowledge, skills, and understandings Documentation of mastery results, scoring method, and model fit procedures, combination of testlets, student interaction with system, aligned instruction Model fit, model parameters Internal Structure 5
Linkage level mastery classifications are reliable Description of reliability method Reliability results Internal Structure 8
Mastery results are consistent with other measures of student knowledge, skills, and understandings Description of mastery results Analyses on relationship of mastery results to First Contact ratings Relation to Other Variables 4, 7
Relies on evidence from Theory of Action input statements, as shown in Figure 10.9.

For mastery results to accurately indicate what students know and can do, linkage level mastery statuses must reflect students’ knowledge, skills, and understandings. Accuracy of mastery results assumes students were assessed on the full breadth of content and at the appropriate complexity (Statement J) and that students were able to interact with the system to show their knowledge, skills, and understandings (Statement I). Accuracy also relies on earlier indirect inputs in the Theory of Action (e.g., received aligned instruction [Statement G], testlets appropriately measure linkage levels [Statement D]). Documentation describes the mastery results produced by the assessment, the diagnostic scoring method used to determine mastery, and the procedures for evaluating model fit. Overall, 98% of the estimated linkage level models showed acceptable levels of absolute model fit and/or classification accuracy. In instances of poor fit, items flagged for misfit are prioritized for retirement to improve the fit of the remaining items measuring the linkage level. At the parameter level, 93% of linkage levels have a conditional probability of masters providing a correct response greater than .6, 60% of linkage levels have a conditional probability of nonmasters providing a correct response less than .4, and 71% of linkage levels have a discrimination greater than .4. Together, the evidence indicates that mastery status reflects students’ knowledge, skills, and understandings, with opportunity for continued data collection.
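
To make the parameter criteria concrete, the following sketch screens hypothetical linkage level parameters against the thresholds described above, assuming discrimination is computed as the difference between the conditional probabilities for masters and nonmasters.

```python
# Illustrative screen of linkage level parameters against the criteria noted above
# (masters' conditional probability > .6, nonmasters' conditional probability < .4,
# discrimination > .4), assuming discrimination is the difference between the two
# conditional probabilities. Parameter values in the examples are hypothetical.

def flag_linkage_level(p_master: float, p_nonmaster: float) -> dict:
    """Return which of the three parameter criteria a linkage level meets."""
    discrimination = p_master - p_nonmaster
    return {
        "masters_above_.6": p_master > 0.6,
        "nonmasters_below_.4": p_nonmaster < 0.4,
        "discrimination_above_.4": discrimination > 0.4,
    }

# Example: a well-separated linkage level and one where nonmasters often respond correctly
print(flag_linkage_level(p_master=0.85, p_nonmaster=0.25))  # all three criteria met
print(flag_linkage_level(p_master=0.80, p_nonmaster=0.55))  # nonmaster and discrimination criteria not met
```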

Linkage level mastery classifications should also be reliable. In 2022, 731 linkage levels (90%) met the recommended .7 cutoff for fair classification consistency (Johnson & Sinharay, 2018), indicating linkage level classifications are generally consistent and have low measurement error. These findings were consistent across linkage levels, suggesting precision of measurement across the continuum of knowledge, skills, and understandings defined by the five linkage levels.
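
For illustration, the following rough simulation sketches one way classification consistency could be approximated for a single linkage level: simulate two parallel five-item administrations under a simple latent class model, classify each administration, and compute the agreement rate. The base rate, conditional probabilities, and posterior classification threshold are hypothetical values; the operational method follows Johnson and Sinharay (2018).

```python
# Rough, simulation-based sketch of linkage level classification consistency.
# All parameter values (base rate, conditional probabilities, posterior threshold)
# are hypothetical and used only to illustrate the concept.

import random
from math import comb

random.seed(1)

BASE_RATE = 0.5      # hypothetical proportion of masters
P_MASTER = 0.80      # hypothetical P(correct | master)
P_NONMASTER = 0.25   # hypothetical P(correct | nonmaster)
N_ITEMS = 5

def posterior_mastery(n_correct: int) -> float:
    """Posterior probability of mastery given the number of correct responses."""
    lik_m = comb(N_ITEMS, n_correct) * P_MASTER**n_correct * (1 - P_MASTER)**(N_ITEMS - n_correct)
    lik_n = comb(N_ITEMS, n_correct) * P_NONMASTER**n_correct * (1 - P_NONMASTER)**(N_ITEMS - n_correct)
    return BASE_RATE * lik_m / (BASE_RATE * lik_m + (1 - BASE_RATE) * lik_n)

def classify(is_master: bool) -> bool:
    """Simulate one testlet administration and classify the student as a master."""
    p = P_MASTER if is_master else P_NONMASTER
    n_correct = sum(random.random() < p for _ in range(N_ITEMS))
    return posterior_mastery(n_correct) >= 0.8  # hypothetical classification threshold

n_students = 10_000
agreements = 0
for _ in range(n_students):
    truth = random.random() < BASE_RATE
    agreements += classify(truth) == classify(truth)  # two independent administrations
print(f"Estimated classification consistency: {agreements / n_students:.3f}")
```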

Finally, mastery results should be consistent with other measures of student knowledge, skills, and understandings. Documentation describes the grain size of the mastery results reported in the Learning Profile portion of student reports and, when students take optional instructionally embedded assessments, the mastery results displayed in the Instruction and Assessment Planner. Linkage level mastery had a moderately positive correlation with educator ratings of academic items on the First Contact survey. Together, the current evidence supports this proposition but is limited; there is opportunity for further study.

10.4.3.2 L: Results Indicate Summative Performance Relative to Alternate Achievement Standards

Mastery results are combined to summarize overall achievement in the subject, using four performance levels. Mastery results (Statement K) serve as a direct input into summative performance, as shown in Figure 10.10.

Figure 10.10: Theory of Action Inputs for Alternate Achievement Standards

Small section of the Theory of Action showing Statement L and its inputs.

Three propositions are related to the accuracy and reliability of summative performance relative to alternate achievement standards, as shown in Table 10.13.

Table 10.13: Propositions and Evidence for Alternate Achievement Standards
Proposition Procedural evidence Empirical evidence Type Chapter(s)
Performance levels provide meaningful differentiation of student achievement Standard setting procedure, grade- and subject-specific performance level descriptors, accurate mastery results Standard setting survey data, performance distributions Internal Structure, Content 6
Performance level determinations are reliable Description of reliability method, accurate mastery results Reliability analyses Internal Structure, Content 8
Performance level results are useful for communicating summative achievement in the subject to a variety of audiences Intended uses documentation, Performance Profile development, General Research File, supplemental resources Interviews, focus groups, governance board feedback Consequences, Content 7
Relies on evidence from Theory of Action input statements, as shown in Figure 10.10.

For results to indicate summative performance relative to alternate achievement standards, the performance levels must meaningfully differentiate student achievement. Performance levels are calculated from student mastery of linkage levels and rely on accurate mastery results (Statement K). Documentation describes the profile-based DLM standard setting method (Clark et al., 2017) and procedures for establishing grade- and subject-specific performance level descriptors (PLDs). Across all cut points (N = 414), panelists indicated they were comfortable with the group-recommended cut points in 94% of cases. Documentation also describes the adjustments made to the standards and the cut points for the grade- and subject-specific PLDs in 2022 because of the blueprint revisions, which reduced the total number of linkage levels available to be mastered. Using the adjusted cut points for the grade- and subject-specific PLDs, annual results for each subject demonstrate that student achievement is distributed across the four performance levels. This evidence shows that performance levels meaningfully differentiate student achievement.

Performance level determinations should also be reliable. Performance level reliability depends on accurate mastery determinations. Documentation describes the method for calculating performance level reliability (W. J. Thompson et al., 2019). In 2022, reliability results were high (e.g., polychoric correlations ranging from .789 to .928). These results indicate that the DLM scoring procedure of assigning and reporting performance levels based on total linkage levels mastered produced reliable performance level determinations.
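
The scoring step referenced above is a simple mapping, as shown in the following sketch, which assigns one of the four performance levels from the total number of linkage levels mastered. The cut points shown are hypothetical; operational cut points are grade- and subject-specific and were established through the DLM standard setting process.

```python
# Minimal sketch of the performance level assignment described above: map the
# total number of linkage levels a student mastered to one of the four
# performance levels. The cut points are hypothetical.

def performance_level(linkage_levels_mastered: int, cuts=(10, 25, 40)) -> str:
    """Assign a performance level from total linkage levels mastered."""
    emerging_cut, target_cut, advanced_cut = cuts
    if linkage_levels_mastered < emerging_cut:
        return "Emerging"
    if linkage_levels_mastered < target_cut:
        return "Approaching the Target"
    if linkage_levels_mastered < advanced_cut:
        return "At Target"
    return "Advanced"

# Example: a student mastering 27 linkage levels with these hypothetical cuts
print(performance_level(27))  # -> "At Target"
```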

Performance level results should also be useful for communicating summative achievement in the subject to a variety of audiences. Consistent with the intended uses of DLM assessments, student performance levels were developed with the intention of communicating summative achievement to a variety of audiences. The Performance Profile portion of individual student score reports indicates the student’s overall achievement in the subject, which is intended to communicate summative achievement to educators and parents. Aggregate reports, which are provided at the class, school, district, and state levels, also indicate students’ performance levels. State education agencies additionally receive the General Research File, which includes each student’s performance level. The General Research File can be input into state data warehouses and included in state accountability metrics. Supplemental resources support use of these files for various audiences, including parent, educator, district, and state interpretive guides and documentation. Interviews and focus groups with educators indicate the Performance Profile is useful for summarizing overall achievement and for communicating about overall achievement with parents (Clark et al., 2018; Karvonen et al., 2017). The governance board also provided feedback on the design of individual student score reports, aggregate reports, and the General Research File. Together, the evidence supports the proposition that performance level results are useful for communicating summative achievement in the subject to a variety of audiences.

10.4.3.3 M: Results Can Be Used for Instructional Decision-Making

DLM mastery results are intended to inform instructional decision-making. When educators use the optional instructionally embedded assessments, mastery results can inform instructional planning, monitoring, and adjustment. Mastery information provided on summative score reports can inform instructional planning in the subsequent academic year. The use of results for instructional decision-making is directly influenced by mastery results (Statement K). Indirect connections in the Theory of Action also influence the use of results (e.g., map structure, EEs, aligned instruction, administration fidelity, training and professional development, students’ interaction with the Kite Suite, and the combination of administered testlets), as shown in Figure 10.11.

Figure 10.11: Theory of Action Inputs for Results Being Instructionally Useful

Small section of the Theory of Action showing Statement M and its inputs.

Three propositions are related to instructional use of results. Table 10.14 summarizes the propositions and evidence evaluating the relevance of results for instruction.

Table 10.14: Propositions and Evidence for Results Being Instructionally Useful
Proposition Procedural evidence Empirical evidence Type Chapter(s)
Score reports are appropriately fine-grained Report development process, Instruction and Assessment Planner structure Interview data, educator cadre feedback Consequences, Content 4, 7
Score reports are instructionally relevant and provide useful information for educators Report development process, scoring and reporting ad hoc committee, accuracy of mastery results Interview data, test administrator survey Consequences, Content 7
Educators can use results to communicate with parents about instructional plans and goal setting Design of score reports, parent interpretive guide, Talking with Parents guide Interview data, test administrator survey Consequences, Content 7
Relies on evidence from Theory of Action input statements, as shown in Figure 10.11.

For results to be instructionally useful, they should provide educators with appropriately fine-grained (i.e., specific and actionable) information. Documentation describes development and evaluation of the Learning Profile portion of score reports, which summarizes student linkage level mastery for each EE. The development process included rounds of interviews in which educators provided feedback on report structure and content (Clark et al., 2018; Karvonen et al., 2016; Karvonen et al., 2017). For educators who use the optional instructionally embedded assessments, the Instruction and Assessment Planner adopts the structure of the Learning Profile to report mastery information that educators can use to inform instruction. Its design was informed by feedback collected during educator cadres. Collectively, the evidence demonstrates that score reports are appropriately fine-grained.

Score reports should also be instructionally relevant and provide useful information for educators. Documentation describes the process for designing and evaluating score report content for instructional utility, including annually convening a governance board scoring and reporting ad hoc committee to discuss score report improvements. Feedback from educator interviews generally indicates that the Learning Profile mastery results, along with the conceptual area mastery bar graphs in the Performance Profile, are instructionally relevant and provide useful information for instructional planning and goal setting (Clark et al., 2018). Evidence supports the proposition that score reports are instructionally relevant and provide useful information for educators.

Finally, educators should be able to use results to communicate with parents about instructional plans and goal setting. Resources to support communicating with parents include a parent interpretive guide and the Educator Guide for Talking With Parents. Focus group feedback indicates educators use score reports to discuss mastery results with parents, including ways parents can support student learning outside school (Clark et al., 2018). While there is opportunity for additional research, evidence to date supports the proposition that educators are able to use results to communicate with parents.

10.4.4 Long-Term Outcomes

The DLM program intends to achieve several long-term outcomes. According to the chain of reasoning in the Theory of Action, design, delivery, and scoring statements all combine to produce the following intended long-term outcomes for the DLM assessment system:

  1. State and district education agencies use results for monitoring and resource allocation.
  2. Educators make instructional decisions based on data.
  3. Educators have high expectations.
  4. Students make progress toward higher expectations.

10.4.4.1 N: State and District Education Agencies Use Results for Monitoring and Resource Allocation

Students with significant cognitive disabilities were only recently included in accountability reporting (No Child Left Behind Act, 2002). One of the long-term intended outcomes of the DLM program is that state and district education agencies use aggregated DLM assessment results in their decision-making processes. Use of DLM results at the district and state level relies on inputs from summative results (Statement L), as shown in Figure 10.12.

Figure 10.12: Theory of Action Inputs for State and District Use

Small section of the Theory of Action showing Statement N and its inputs.

There is one proposition for state and district education agencies’ use of assessment results. Table 10.15 summarizes the evidence evaluating the proposition on state and district use of results.

Table 10.15: Propositions and Evidence for State and District Use
Proposition Procedural evidence Empirical evidence Type Chapter(s)
District and state education agency staff use aggregated information to evaluate programs and adjust resources Available state and district reports, resources, state guidance Consequences 7

The DLM program intends for aggregated DLM results to inform educational policy decisions at the state and district level. Aggregated reports, which summarize student achievement at the class, school, district, and state levels, are available to state and district education agency staff. Supplemental resources describe the contents of reports and provide guidance on how they can be used. Individual states provide additional guidance on use of aggregated DLM assessment data for monitoring and resource allocation and are responsible for evaluating the effectiveness of the data for these purposes. Due to variation in policy and practice around use of DLM results for program evaluation and resource allocation within and across states, the DLM program presently relies on states to collect their own evidence for this proposition. DLM staff could collect data from states implementing the DLM System to strengthen the evidence for this proposition.

10.4.4.2 O: Educators Make Instructional Decisions Based on Data

Historically, score reports from alternate assessments largely classified all students as proficient and provided limited information that could inform instructional decision-making (Nitsch, 2013). One intended long-term outcome of the DLM System is that educators use DLM assessment results to make instructional decisions. The use of DLM results for making instructional decisions relies on inputs from the utility of the results (Statement M), as shown in Figure 10.13.

Figure 10.13: Theory of Action Inputs for Educators Making Instructional Decisions Based on Data

Small section of the Theory of Action showing Statement O and its inputs.

There are three propositions corresponding to educators making sound instructional decisions based on data from DLM assessments. Table 10.16 summarizes the propositions and evidence evaluating data-based instructional decision-making.

Table 10.16: Propositions and Evidence for Educators Making Instructional Decisions Based on Data
Proposition Procedural evidence Empirical evidence Type Chapter(s)
Educators are trained to use assessment results to inform instruction Test Administration Manual (TAM), interpretation guides, state and local guidance, score report helplet videos Video review feedback and use rates, focus groups Consequences, Content 7
Educators use assessment results to inform subsequent instruction, including in subsequent academic year Interpretation guides Focus groups, test administrator survey Consequences, Content 4, 7
Educators reflect on their instructional decisions TAM, interpretation guides, score report helplet videos Focus groups Consequences, Content 3, 4

Educators should be trained to use DLM assessment results to inform instruction. The Test Administration Manual describes how to monitor student progress during the optional instructionally embedded window using mastery results in the Instruction and Assessment Planner and how to access summative individual student score reports. Documentation also describes the various resources to help educators interpret assessment results as intended, including an interpretive guide and a set of four short score report interpretation videos. State and local education agencies provide additional guidance on score report use. DLM staff conducted multiple rounds of internal review for helplet video content, followed by TAC review and external review by educators from six states. Feedback from each group was incorporated into the final videos. However, use of the videos has been minimal: since 2018, the videos have had 4,851 plays but only 601 completions. Additional information about educator training to use assessment results came from focus groups. Educators described variability in the training they received on using assessment results to inform instruction: some received formal training from their building coordinator or district education agency, others relied on online DLM resources, and others received no training (Clark et al., 2018, 2022). There is opportunity for continuous improvement around educator training for using results to inform instruction.

Educators should use assessment results from the optional instructionally embedded window to inform subsequent instruction, and they should use summative assessment results to plan instructional priorities in the subsequent academic year. Interpretation guides describe using results from the optional instructionally embedded window to inform subsequent instruction. However, there is variability in the actual use of data to inform instruction. Focus groups and interviews provide some evidence that educators use DLM assessment results, including in the subsequent academic year. Some educators described using the Learning Profile portion of score reports to plan instructional priorities, whereas others reported not using the results (Clark et al., 2018). While there is some evidence to support the proposition that educators use results to inform instructional decision-making, given the variability in reported use, there is opportunity for continuous improvement in educator use of results to inform instruction.

Finally, educators should use assessment results to reflect on their instructional decisions. Interpretation guides and other resources describe ways educators can reflect on their instructional decision-making. Focus group feedback indicated various ways educators reflect on their instructional decisions, including how DLM assessment data inform their decision-making (Clark et al., 2018, 2022). While current data provide evidence that some educators reflect on their instructional decisions, additional data should be collected for this proposition.

10.4.4.3 P: Educators Have High Expectations

Historically, students with significant cognitive disabilities have not been held to high academic expectations (e.g., Karvonen, Flowers, et al., 2013). As a result, the DLM program prioritizes a long-term outcome of educators having high expectations for all students who take DLM assessments. The Theory of Action demonstrates the reciprocal relationship between educators having high expectations and students making progress toward higher expectations (Statement Q in Figure 10.14). As students demonstrate progress toward higher expectations, educators similarly increase their expectations, and increased expectations lead to students making greater progress.

Figure 10.14: Theory of Action Inputs for Educator Expectations and Student Progress

Small section of the Theory of Action showing a bidirectional relationship between Statements P and Q and their inputs.

Two propositions correspond to educators’ expectations for students taking DLM assessments. Table 10.17 summarizes the propositions and evidence evaluating educator expectations.

Table 10.17: Propositions and Evidence for Educators Having High Expectations
Proposition Procedural evidence Empirical evidence Type Chapter(s)
Educators believe students can attain high expectations Map, Essential Elements, student interaction with system, student progress Test administrator survey, focus groups Consequences 3
Educators hold their students to high expectations Aligned instruction; testlet breadth and complexity Test administrator survey, focus groups, skill mastery survey Consequences 4, 7
Relies on evidence from Theory of Action input statements, as shown in Figure 10.14.

Evidence from Theory of Action statements and propositions supports the proposition that educators believe students with significant cognitive disabilities can attain high expectations. For instance, this belief is supported by the map structure, rigorous grade-level expectations, opportunities for students to engage with the system as independently as they are able, and observation of students’ academic progress. Spring 2022 survey responses describe educators’ perceptions of the academic content in the DLM assessments. In 2022, educators indicated that content reflected high expectations for 85% of students, which reflects a consistent annual increase since 2015, when it was 72%. Educators also indicated that testlet content measured important academic skills for 77% of students, which similarly reflects a consistent annual increase since 2015, when it was 50%. These data suggest that educators recognize that DLM assessments contain challenging content but are somewhat more divided about the importance of that content. Focus group data similarly reveal variability in educator beliefs about students with significant disabilities achieving high expectations (Clark et al., 2018, 2022). The DLM program will continue to collect evidence regarding the extent to which educators believe students can attain high expectations.

Educators should also hold their students to high expectations. Procedural evidence for this proposition relies heavily on Theory of Action inputs. For instance, when educators hold their students to high expectations, they provide rigorous academic instruction. A survey of 95 educators on their perceptions of their students’ skill mastery revealed that many educators defined mastery as 75%–80% success on multiple trials. However, other educators perceived mastery as consistent skill demonstration over a longer period of time, generalization and transfer of skills, independence and speed of demonstration, and students’ being able to explain the concept (Nehler & Clark, 2019). While these definitions differ from how mastery is determined on DLM assessments, they reflect some evidence of educators holding students to high expectations. Focus group findings reveal variability in how educators describe their expectations for students (Clark et al., 2022; Kobrin et al., 2022). While there is some evidence for this proposition, additional data collection is needed regarding educators holding all students to high expectations.

10.4.4.4 Q: Students Make Progress Toward Higher Expectations

A historic challenge for students who take alternate assessments is low expectations for their demonstration of academic skills (e.g., Timberlake, 2014). Historically, these students have been taught a largely functional curriculum intended to prepare them for independent living (Ryndak et al., 2014). Their achievement was often described as proficient, despite their instruction on relatively low-level academic skills (Altman et al., 2010; Nitsch, 2013). Therefore, an intended long-term outcome of the DLM assessment system is that students make progress toward higher expectations over time. In addition to the reciprocal relationship with educator expectations (Statement P), students making progress toward higher expectations has direct inputs from summative results (Statement L) and the utility of the results (Statement M) (Figure 10.14).

There are three propositions pertaining to students making progress toward higher expectations. The procedural and empirical evidence for these propositions, as well as evidence based on test content, constitutes evidence of the consequences of the DLM System. Table 10.18 summarizes the propositions and evidence evaluating student progress toward higher expectations.

Table 10.18: Propositions and Evidence for Student Progress Toward Higher Expectations
Proposition Procedural evidence Empirical evidence Type Chapter(s)
Alternate achievement standards are vertically articulated Postsecondary opportunity (PSO) vertical articulation argument, grade- and subject-specific performance level descriptors (PLDs), description of PLD development process Review of grade- and subject-specific PLDs Content 6, 7
Students who meet alternate achievement standards are on track to pursue postsecondary opportunities Description of PSO panel work, description of vertical articulation PSO panel identification of academic skills, PSO panel alignment evidence Consequences, Content 7
Students demonstrate academic progress toward higher expectations Instruction and Assessment Planner, map, Essential Elements, aligned instruction, student interaction with system to show knowledge Test administrator survey, system data, focus groups Consequences 4, 7
Relies on evidence from Theory of Action input statements, as shown in Figure 10.14.

For students to make progress toward higher expectations, alternate achievement standards should be vertically articulated across grades. A postsecondary opportunities report (Karvonen et al., 2022) describes the vertical articulation of EEs across grades and map content from foundational to college- and career-ready expectations. The vertical articulation of DLM content structures informs vertical articulation of the achievement standards. Grade- and subject-specific PLDs similarly build across performance levels and grades, describing the types of skills students achieving at each level tend to master. Grade- and subject-specific PLDs were developed as part of the mastery profile-based standard setting method under advisement of the DLM TAC and included rounds of internal and governance board review. Collectively, the evidence indicates that the alternate achievement standards are vertically articulated.

Students who meet alternate achievement standards should be on track to pursue postsecondary opportunities. Documentation from the postsecondary opportunities study describes the procedures for convening panels to evaluate the extent to which PLDs for the DLM At Target achievement standards align with the skills necessary for pursuing postsecondary opportunities (postsecondary education or training, or competitive integrated employment). The panel ratings show that students who demonstrate proficiency on DLM assessments demonstrate less complex versions of the skills necessary for postsecondary opportunities in the elementary grades, and those skills continue to develop through the middle and high school grades. Together, the evidence indicates that students who demonstrate proficiency on DLM assessments are on track to pursue postsecondary opportunities. However, as of 2022, there is a large percentage of students who do not yet demonstrate proficiency on DLM assessments.

Finally, students should demonstrate academic progress toward higher expectations. Procedural evidence for this proposition relies in part on Theory of Action inputs. For instance, the map structure, the rigorous grade-level expectations in the EEs, and aligned instruction position students to make progress toward higher expectations, and students can demonstrate that progress by interacting with the system to show their knowledge, skills, and understandings. When educators use the optional instructionally embedded assessments, the Instruction and Assessment Planner summarizes student mastery, enabling educators to observe student progress over time. Focus group feedback provides some additional anecdotal evidence of students demonstrating progress toward higher expectations (Clark et al., 2018, 2022). Because of the challenges with reporting growth on alternate assessments (Nehler et al., 2019), under advisement of the DLM TAC, growth is not reported for DLM assessments. Additionally, caution is warranted when comparing cross-year performance level distributions, because these comparisons are based on independent samples whose composition changes for a grade and subject across years and because the overall definition of the population is changing over time (e.g., as states work toward the 1% threshold required under the Every Student Succeeds Act, 2015). While there is some evidence of students making progress over time, additional evidence is needed to evaluate student progress toward higher expectations.

10.5 Evaluation Summary

In the three-tiered validity argument approach, we evaluate the extent to which statements in the Theory of Action are supported by the underlying propositions. Propositions are evaluated by the set of procedural and empirical evidence collected through 2021–2022. Table 10.19 summarizes the overall evaluation of the extent to which each statement in the Theory of Action is supported by the underlying propositions and associated evidence. We describe evidence according to its strength. We consider evidence for a proposition to be strong if the amount of evidence is sufficient, it includes both procedural and empirical evidence, and it is not likely to be explained by an alternative hypothesis. We consider the evidence for a proposition to be moderate if it includes only procedural evidence and/or if there is some likelihood that the evidence can be explained by an alternative hypothesis. We also note where current evidence is limited or additional evidence could be collected.

Table 10.19: Evaluation of Propositions for Each Theory of Action Statement
Statement Overall evaluation
A. Map nodes and pathways accurately describe the development of knowledge and skills. There is strong evidence that nodes are specified at the appropriate granularity, based on descriptions of the node development process and external review ratings. There is moderate evidence that nodes were correctly prioritized for linkage levels, based on description of the map development process, including rounds of internal review. There is strong evidence that nodes are correctly sequenced, based on the description of the procedures for specifying node connections and external review ratings, and indirect evidence from modeling analyses and alignment data for the correct ordering of linkage levels. Collectively, the propositions support the statement that map nodes and pathways accurately describe the development of knowledge and skills.
B. Alternate content standards, the Essential Elements, provide grade-level access to college and career readiness standards. There is evidence that the grain size and description of Essential Elements (EEs) are sufficiently well-defined to communicate a clear understanding of the targeted knowledge, skills, and understandings, based on the development process and state and content expert review. There is strong evidence that EEs capture what students should know and be able to do at each grade to be prepared for postsecondary opportunities, including college, career, and citizenship, based on the development process, alignment studies, vertical articulation evidence, and indirect evidence from the postsecondary opportunities study. There is evidence that the EEs in each grade sufficiently sample the domain, based on the development process for EEs and blueprints prior to their adoption. There is strong evidence that EEs are accurately aligned to nodes in the learning maps, based on documentation of the simultaneous development process, external review ratings, and alignment study data. Collectively, the propositions support the statement that the EEs provide grade-level access to college and career readiness standards.
C. The Kite system used to deliver DLM assessments is designed to maximize accessibility. There is evidence that system design is consistent with accessibility guidelines and contemporary code. There is strong evidence that supports needed by the student are available within and outside of the assessment system, based on system documentation, Personal Needs and Preferences Profile (PNP) data, test administrator survey responses, and focus groups. There is evidence that item types support the range of students in presentation and response, based on cognitive labs and test administrator survey responses. There is also evidence that the Kite Suite is accessible to educators, based on test administrator survey responses and focus group findings. Collectively, the propositions support the statement that the Kite Suite is designed to maximize accessibility.
D. Instructionally relevant testlets are designed to allow students to demonstrate their knowledge, skills, and understandings relative to academic expectations. There is strong evidence that items within testlets are aligned to linkage levels, based on test development procedures, alignment data, and item analyses. There is generally strong evidence that testlets are designed to be accessible to students, based on test development procedures and evidence from external review, test administrator survey responses, and focus groups. There is evidence that testlets are designed to be engaging and instructionally relevant, with some opportunity for improvement, based on test development procedures and focus group feedback. There is strong evidence that testlets are written at appropriate cognitive complexity for the linkage level and that items are free of extraneous content. There is strong evidence from item analysis and differential item functioning analyses that items do not contain content that is biased against or insensitive to subgroups of the population. There is some evidence that items elicit consistent response patterns across different administration formats, with opportunity for additional data collection. Together, the propositions support the statement that instructionally relevant testlets are designed to allow students to demonstrate their knowledge, skills, and understandings relative to academic expectations.
E. Training strengthens educator knowledge and skills for assessing. There is strong evidence that required training is designed to strengthen educator knowledge and skills for assessing, based on the documentation of the scope of training and passing requirements. There is some evidence that required training prepares educators to administer DLM assessments, based on survey responses, with opportunities for continuous improvement. There is strong evidence that required training is completed by all test administrators, based on Kite training status data files, state and local monitoring, and data extracts. Together, the propositions support the statement that training strengthens educator knowledge and skills for assessing.
F. Professional development strengthens educator knowledge and skills for instructing students with significant cognitive disabilities. There is strong evidence that professional development covers topics relevant to instruction, based on the list of modules and educators’ rating of the content. There is some evidence that educators access the professional development modules, based on the module completion data. There is some evidence that educators implement the practices on which they have been trained, but overall use of professional development modules is low. While there is opportunity for continuous improvement in the use of professional development modules and their application to instructional practice, when propositions are fulfilled (i.e., professional development is used), professional development strengthens educator knowledge and skills for instructing students with significant cognitive disabilities.
G. Educators provide instruction aligned with Essential Elements and at an appropriate level of challenge. Evidence that educators provide students the opportunity to learn content aligned with the grade-level EEs shows variable results, based on responses to the test administrator survey and First Contact survey. There is some evidence that educators provide instruction at an appropriate level of challenge using their knowledge of the student, based on test administrator survey responses and focus groups. However, there is also some evidence from the opportunity to learn section of the test administrator survey that some students may not be receiving instruction aligned with the full breadth of academic content measured by the DLM assessment. There is a need for additional data collection on instructional practice. Together, the propositions provide some support that educators provide instruction aligned with EEs and at an appropriate level of challenge.
H. Educators administer assessments with fidelity. There is strong evidence that educators are trained to administer testlets with fidelity, based on training documentation, test administrator survey responses, and test administration observations (TAOs). Documentation describes entry of accessibility supports, which is supported by PNP data from the system, but currently, there is no evidence available to evaluate the accuracy of accessibility supports enabled for students or their consistency with supports used during instruction. There is some evidence that educators allow students to engage with the system as independently as they are able, based on TAOs, test administrator survey responses, and PNP data, with the opportunity to collect additional data. There is evidence that educators enter student responses with fidelity, based on TAOs and writing interrater agreement studies. Overall, available evidence for the propositions indicates that educators administer assessments with fidelity, with opportunity for additional data collection.
I. Students interact with the system to show their knowledge, skills, and understandings. There is evidence that students can respond to tasks regardless of sensory, mobility, health, communication, or behavioral constraints, based on test administrator survey responses, PNP selection data, TAOs, and focus groups. There is evidence from cognitive labs, test administrator survey responses, and TAOs that student responses to items reflect their knowledge, skills, and understandings and that students can interact with the system as intended. Evidence for the propositions collectively demonstrates that students interact with the system to show their knowledge, skills, and understandings.
J. The combination of administered testlets measures knowledge and skills at the appropriate breadth and complexity. There is generally strong evidence that the First Contact survey correctly assigns students to complexity bands, based on pilot analyses, with the opportunity for additional research. There is strong evidence that administered testlets cover the full blueprint based on blueprint coverage data and Special Circumstance codes. There is also generally strong evidence that administered testlets are at the appropriate linkage level, with the opportunity to collect additional data about adaptation. Overall, the propositions moderately support the statement that the combination of administered testlets is at the appropriate breadth and complexity.
K. Mastery results indicate what students know and can do. There is strong evidence that mastery status reflects students’ knowledge, skills, and understandings, based on modeling evidence, and there is strong evidence that linkage level mastery classifications are reliable based on reliability analyses. There is some evidence that mastery results are consistent with other measures of student knowledge, skills, and understandings, but additional evidence is needed. Overall, the propositions support the statement that mastery results indicate what students know and can do.
L. Results indicate summative performance relative to alternate achievement standards. There is strong evidence that performance levels meaningfully differentiate student achievement, based on the standard setting procedure, standard setting survey data, and performance distributions. There is strong evidence that performance level determinations are reliable, based on documentation and reliability analyses. The propositions support the statement that results indicate summative performance relative to alternate achievement standards.
M. Results can be used for instructional decision-making. There is strong evidence that reports are fine-grained, based on documentation of score report development and on interview and focus group data. There is evidence that score reports are instructionally relevant and useful and that they provide relevant information for educators, based on documentation of the report development process, interview, and test administrator survey responses. There is evidence of variability in training on how educators can use results to inform instruction, based on training content and focus groups. The evidence suggests there is opportunity to collect additional data. There is evidence that educators can use results to inform instructional choices and goal setting, based on documentation of the report development, interview data, focus groups, and test administrator survey responses. There is evidence that educators can use results to communicate with parents, but there is variability in actual practice. Additional data should be collected. Overall, the propositions support results being useful for instructional decision-making.
N. State and district education agencies use results for monitoring and resource allocation. There is some procedural evidence that district and state staff use aggregated information to evaluate programs and adjust resources. Because states and district policies vary regarding how results should be used for monitoring and resource allocation, states are responsible for collecting their own evidence for this proposition. Additional evidence could be collected from across state education agencies.
O. Educators make instructional decisions based on data. There is some evidence that educators are trained to use assessment results to inform instruction, based on available resources, video use rates and feedback, and focus group feedback, with some variability in use. There is some evidence that educators use assessment results to inform instruction, based on feedback from focus groups and the test administrator survey. There is some evidence that educators reflect on their instructional decisions, based on focus group feedback. Evidence for each proposition provides some support for the statement that some educators make instructional decisions based on data, but variability indicates that not all educators receive training or use data to inform instruction. Additional evidence collection and continuous improvement in implementation would strengthen the propositions and provide greater support for this long-term outcome of the system.
P. Educators have high expectations. There is some evidence that educators believe students can attain high expectations and hold their students to high expectations. Survey and focus group responses show variability in educator perspectives. There is opportunity for continued data collection of educators’ understanding of high expectations as defined in the DLM System. Presently, the propositions provide some support for the statement that educators have high expectations for their students.
Q. Students make progress toward higher expectations. There is strong evidence that the alternate achievement standards are vertically articulated, based on the performance level descriptor development process and postsecondary opportunities vertical articulation argument. There is strong evidence, based on the postsecondary opportunities alignment evidence, that students who meet alternate achievement standards are on track to pursue postsecondary opportunities. However, there is a large percentage of students who do not yet demonstrate proficiency on DLM assessments. There is some evidence showing students make progress toward higher expectations based on input from prior Theory of Action statements and focus group data. There are complexities in reporting growth for DLM assessments, and evaluating within-year progress relies on educator use of optional instructionally embedded assessments, which to date is low. Evidence collected for the propositions to date provides some support for the statement that students make progress toward higher expectations.

10.6 Continuous Improvement

The DLM program is committed to continuous improvement of assessments, educator and student experiences, and technological delivery of the assessment system. As described in Chapter 1, the DLM Theory of Action guides ongoing research, development, and continuous improvement. Through formal research and evaluation as well as informal feedback, guided by the DLM Governance Board, the TAC, and others, the DLM program has made many improvements since its launch in 2015, including the blueprint and test pool revision in 2019–2020. This section describes examples of improvements related to the design, delivery, and scoring of DLM assessments to support achieving the program’s intended long-term outcomes.

10.6.1 Design Improvements

The Learning Maps and Test Development teams continually improve the learning map development, item writing, and testlet development processes. They use multiple sources of information from the field, research findings, and data collected throughout the school year. For example, in January 2018, the item writing process shifted from having all items written on site to a hybrid model involving both on-site and remote (online) activities. This hybrid approach is more efficient and maintains the high quality of items written. ATLAS staff have also refined procedures for retiring content from testlet pools over time. Model fit analyses, item-level differential item functioning evaluations, and content reviews are used to prioritize items and testlets for retirement. The aim of these retirement processes is to systematically refresh the operational pool with high-quality items.

10.6.2 Delivery Improvements

Improvements to test delivery and administration procedures focus on promoting accessibility, accurate delivery of testlet assignments, and a high-quality assessment experience for educators and students. Continuous improvements to the functionality of the Kite Suite have resulted in several changes in recent years. Staff annually prioritize updates to Educator Portal and Student Portal to improve user experience, drawing from input from the governance board and educator feedback from surveys and cadre groups where applicable.

10.6.3 Scoring and Reporting Improvements

DLM staff also make continuous improvements to scoring and reporting. The DLM TAC, including a modeling subcommittee of TAC members, regularly provides feedback on methodological psychometric topics. Examples of improvements based on this work include methods for evaluating model- and item-level fit and research to evaluate the ordering of linkage levels within EEs. Feedback from educator cadres and focus groups, along with a scoring and reporting ad hoc committee of DLM Governance Board members, has led to minor adjustments in score report contents (e.g., adding a link on score reports that directs users to additional resources on the DLM website) and to the creation of helplet videos to support score report interpretation and use.

10.7 Future Research

The evaluation of evidence for the statements in the Theory of Action identifies areas for future research. DLM staff will plan future research to collect additional evidence for propositions where current evidence is moderate or limited. We describe areas for future research throughout this technical manual. Longitudinal data collection is ongoing as part of the regular operations of the assessment system. As the DLM System continues to mature, additional evidence will be collected on a continuous basis. The DLM Governance Board will continue to collaborate on additional data collection as needed. Future studies will be guided by advice from the DLM TAC, using processes established over the life of the DLM System. Some examples of future research are described here.

In the area of design, DLM staff will conduct additional research on methods for evaluating the structure of the learning maps to provide a more comprehensive evaluation of the hierarchical ordering of linkage levels. In the area of delivery, DLM staff are examining new ways to gather information on students’ opportunity to learn in order to further evaluate the extent to which educators provide aligned instruction. In the area of scoring, the spring 2023 test administrator survey will collect information on educator ratings of student mastery as additional evidence to evaluate the extent that mastery ratings are consistent with other measures of student knowledge, skills, and understandings. To evaluate long-term outcomes, the annual test administrator survey will continue to provide a source of data from which to investigate changes over time in the long-term effects of the assessment system for students and educators.