11 References

Accessible Teaching, Learning, and Assessment Systems. (2021). 2020–2021 DLM administration during COVID-19: Participation, performance, and educational experience (Technical Report No. 21-02). University of Kansas. https://dynamiclearningmaps.org/sites/default/files/documents/publication/DLM-COVID.pdf
Almond, R. G., Mislevy, R. J., Steinberg, L. S., Yan, D., & Williamson, D. M. (2015). Bayesian networks in educational assessment. Springer. https://doi.org/10.1007/978-1-4939-2125-6
Altman, J. R., Lazarus, S. S., Quenemoen, R. F., Kearns, J., Quenemoen, M., & Thurlow, M. L. (2010). 2009 survey of states: Accomplishments and new issues at the end of a decade of change. University of Minnesota, National Center on Educational Outcomes. https://files.eric.ed.gov/fulltext/ED511742.pdf
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. American Educational Research Association.
Babu, G. J. (2011). Resampling methods for model fitting and model selection. Journal of Biopharmaceutical Statistics, 21(6), 1177–1186. https://doi.org/10.1080/10543406.2011.607749
Bechard, S., Clark, A. K., Swinburne Romine, R., Karvonen, M., Kingston, N., & Erickson, K. (2019). Use of evidence-centered design to develop learning maps-based assessments. International Journal of Testing, 19(2), 188–205. https://doi.org/10.1080/15305058.2018.1543310
Bechard, S., Hess, K., Camacho, C., Russell, M., & Thomas, K. (2012). Understanding learning progressions and learning maps to inform the development of assessment for students in special populations [White paper]. SRI International Symposium, Menlo Park, CA.
Bechard, S., & Sheinker, A. (2012). Basic framework for item writers using Evidence Centered Design (ECD). University of Kansas, Center for Educational Testing and Evaluation.
Betancourt, M. (2018). A conceptual introduction to Hamiltonian Monte Carlo. arXiv. https://arxiv.org/abs/1701.02434
Bradshaw, L. (2016). Diagnostic classification models. In A. A. Rupp & J. Leighton (Eds.), The handbook of cognition and assessment: Frameworks, methodologies, and applications (1st ed., pp. 297–327). John Wiley & Sons. https://doi.org/10.1002/9781118956588.ch13
Bradshaw, L., Izsák, A., Templin, J., & Jacobson, E. (2014). Diagnosing teachers’ understandings of rational numbers: Building a multidimensional test within the diagnostic classification framework. Educational Measurement: Issues and Practice, 33(1), 2–14. https://doi.org/10.1111/emip.12020
Bradshaw, L., & Levy, R. (2019). Interpreting probabilistic classifications from diagnostic psychometric models. Educational Measurement: Issues and Practice, 38(2), 79–88. https://doi.org/10.1111/emip.12247
Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items (Vol. 4). SAGE Publications, Inc.
Carlin, B. P., & Louis, T. A. (2001). Empirical Bayes: Past, present and future. In A. E. Raftery, M. A. Tanner, & M. T. Wells (Eds.), Statistics in the 21st century. Chapman and Hall/CRC. https://doi.org/10.1201/9781420035391
Carpenter, B., Gelman, A., Hoffman, M. D., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M., Guo, J., Li, P., & Riddell, A. (2017). Stan: A probabilistic programming language. Journal of Statistical Software, 76(1), 1–32. https://doi.org/10.18637/jss.v076.i01
Casella, G. (1985). An introduction to empirical Bayes data analysis. The American Statistician, 39(2), 83–87. https://doi.org/10.2307/2682801
Center for Applied Special Technology. (2018, January 20). Universal design for learning guidelines version 2.2. https://udlguidelines.cast.org
Chandler, P., & Sweller, J. (1991). Cognitive load theory and the format of instruction. Cognition and Instruction, 8(4), 293–332. https://doi.org/10.1207/s1532690xci0804_2
Chen, J., de la Torre, J., & Zhang, Z. (2013). Relative and absolute fit evaluation in cognitive diagnosis modeling. Journal of Educational Measurement, 50(2), 123–140. https://doi.org/10.1111/j.1745-3984.2012.00185.x
Cicchetti, D. V. (1994). Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychological Assessment, 6(4), 284–290. https://doi.org/10.1037/1040-3590.6.4.284
Cicchetti, D. V., & Feinstein, A. R. (1990). High agreement but low kappa: II. Resolving the paradoxes. Journal of Clinical Epidemiology, 43, 551–558. https://doi.org/10.1016/0895-4356(90)90159-M
Cizek, G. J. (1996). Standard-setting guidelines. Educational Measurement: Issues and Practice, 15(1), 13–21. https://doi.org/10.1111/j.1745-3992.1996.tb00802.x
Cizek, G. J., & Bunch, M. B. (2006). Standard setting: A guide to establishing and evaluating performance standards on tests. SAGE Publications, Inc. https://doi.org/10.4135/9781412985918
Clark, A. K., Karvonen, M., Kingston, N., Anderson, G., & Wells-Moreaux, S. (2015). Designing alternate assessment score reports that maximize instructional impact. National Council on Measurement in Education Annual Meeting, Chicago, IL. https://dynamiclearningmaps.org/sites/default/files/documents/presentations/NCME_2015_Score_Report_Utility_Paper.pdf
Clark, A. K., Karvonen, M., Swinburne Romine, R., & Kingston, N. (2018). Teacher use of score reports for instructional decision-making: Preliminary findings. National Council on Measurement in Education Annual Meeting, New York, NY. https://dynamiclearningmaps.org/sites/default/files/documents/presentations/NCME_2018_Score_Report_Use_Findings.pdf
Clark, A. K., Kingston, N., Templin, J., & Pardos, Z. (2014). Summary of results from the fall 2013 pilot administration of the Dynamic Learning Maps Alternate Assessment System (Technical Report No. 14-01). University of Kansas, Center for Educational Testing and Evaluation. https://dynamiclearningmaps.org/sites/default/files/documents/publication/pilot_summary_of_findings.pdf
Clark, A. K., Kobrin, J., & Hirt, A. (2022). Educator perspectives on instructionally embedded assessment (Research Synopsis No. 22-01). University of Kansas, Accessible Teaching, Learning, and Assessment Systems. https://dynamiclearningmaps.org/sites/default/files/documents/publication/IE_Focus_Groups_project_brief.pdf
Clark, A. K., Nash, B., Karvonen, M., & Kingston, N. (2017). Condensed mastery profile method for setting standards for diagnostic assessment systems. Educational Measurement: Issues and Practice, 36(4), 5–15. https://doi.org/10.1111/emip.12162
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Routledge.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159. https://doi.org/10.1037/0033-2909.112.1.155
Confrey, J., Gianopulos, G., McGowan, W., Shah, M., & Belcher, M. (2017). Scaffolding learner-centered curricular coherence using learning maps and diagnostic assessments designed around mathematics learning trajectories. ZDM – Mathematics Education, 49(5), 717–734. https://doi.org/10.1007/s11858-017-0869-1
Confrey, J., Maloney, A. P., & Corely, A. K. (2014). Learning trajectories: A framework for connecting standards with curriculum. ZDM – Mathematics Education, 46(5), 719–733. https://doi.org/10.1007/s11858-014-0598-7
Cramér, H. (1946). Mathematical methods of statistics. Princeton University Press.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297–334. https://doi.org/10.1007/BF02310555
DeBarger, A. H., Seeratan, K., Cameto, R., Haertel, G., Knokey, A.-M., & Morrison, K. (2011). Alternate assessment design-mathematics (Technical Report 9: Implementing evidence-centered design to develop assessments for students with significant cognitive disabilities: Guidelines for creating design patterns and development specifications and exemplar task templates for mathematics). SRI International. http://alternateassessmentdesign.sri.com/techreports/AAD_M_TechRpt9_032911final.pdf
Dennis, A., Erickson, K., & Hatch, P. (2013). The Dynamic Learning Maps core vocabulary: Overview [Technical review]. https://www.dropbox.com/s/99ay2ypx37m6lps/DLM_Core_Vocabulary_Overview.pdf
DLM Consortium. (2021). Test Administration Manual 2021–2022. University of Kansas, Accessible Teaching, Learning, and Assessment Systems.
Dynamic Learning Maps Consortium. (2014). Guide to external review of testlets. University of Kansas, Center for Educational Testing and Evaluation. https://dynamiclearningmaps.org/sites/default/files/documents/External_Review/External_Review_Guide_IM.pdf
Dynamic Learning Maps Consortium. (2021a). Accessibility Manual 2021–2022. University of Kansas, Accessible Teaching, Learning, and Assessment Systems.
Dynamic Learning Maps Consortium. (2021b). Guide to DLM Required Test Administrator Training. University of Kansas, Accessible Teaching, Learning, and Assessment Systems.
Dynamic Learning Maps Consortium. (2022a). 2021–2022 Technical Manual Update—Science. University of Kansas, Accessible Teaching, Learning, and Assessment Systems.
Dynamic Learning Maps Consortium. (2022b). Technology Specifications Manual 2021–2022. University of Kansas, Accessible Teaching, Learning, and Assessment Systems.
Efron, B. (2014). Two modeling strategies for empirical Bayes estimation. Statistical Science, 29(2), 285–301. https://doi.org/10.1214/13-STS455
Erickson, K. A., Hatch, P., & Clendon, S. (2010). Literacy, assistive technology, and students with significant disabilities. Focus on Exceptional Children, 42(5), 1–17. https://doi.org/10.17161/foec.v42i5.6904
Erickson, K. A., & Karvonen, M. (2014). College and career readiness instruction and assessment for pre-intentional and pre-symbolic communicators. Office of Special Education Programs Project Directors’ Meeting, Washington, DC.
Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data (revised edition). MIT Press. https://doi.org/10.7551/mitpress/5657.001.0001
Every Student Succeeds Act, 20 U.S.C. § 6301 (2015). https://www.congress.gov/114/plaws/publ95/PLAW-114publ95.pdf
Falconer, J. R., Frank, E., Polaschek, D. L. L., & Joshi, C. (2022). Methods for eliciting informative prior distributions: A critical review. Decision Analysis. https://doi.org/10.1287/deca.2022.0451
Family Educational Rights and Privacy Act, 20 U.S.C. § 1232g; 34 CFR Part 99 (1974). https://www.govinfo.gov/content/pkg/USCODE-2020-title20/pdf/USCODE-2020-title20-chap31-subchapIII-part4-sec1232g.pdf
Fang, G., Liu, J., & Ying, Z. (2019). On the identifiability of diagnostic classification models. Psychometrika, 84(1), 19–40. https://doi.org/10.1007/s11336-018-09658-x
Feinstein, A. R., & Cicchetti, D. V. (1990). High agreement but low kappa: I. The problems of two paradoxes. Journal of Clinical Epidemiology, 43, 543–549. https://doi.org/10.1016/0895-4356(90)90158-L
Fleiss, J. L. (1981). Statistical methods for rates and proportions (2nd ed., pp. 38–46). John Wiley.
Flowers, C., Turner, C., Herrera, B., Towles-Reeves, L., Thurlow, M., Davidson, A., & Hagge, S. (2015). Developing a large-scale assessment using components of evidence-centered design: Did it work? Paper presentation. National Council on Measurement in Education.
Flowers, C., & Wakeman, S. (2016a). Alignment of Dynamic Learning Maps operational items to grade-level content standards: Integrated model for testing. ACERI Partners, LLC.
Flowers, C., & Wakeman, S. (2016b). Alignment of Dynamic Learning Maps operational items to grade-level content standards: Year-End models of testing. ACERI Partners, LLC.
Flowers, C., & Wakeman, S. (2020). Alignment of Dynamic Learning Maps operational items to grade-level content standards: Year-End model of testing. ACERI Partners, LLC.
Flowers, C., Wakeman, S., Browder, D. M., & Karvonen, M. (2009). Links for academic learning (LAL): A conceptual model for investigating alignment of alternate assessments based on alternate achievement standards. Educational Measurement: Issues and Practice, 28(1), 25–37. https://doi.org/10.1111/j.1745-3992.2009.01134.x
Gentry, J. R. (1982). An analysis of developmental spelling in "GNYS AT WRK". The Reading Teacher, 36(2), 192–200. https://www.jstor.org/stable/20198182
Goldstein, J., & Behuniak, P. (2010). Assumptions in alternate assessment: An argument-based approach to validation. Assessment for Effective Intervention, 36(3), 179–191. https://doi.org/10.1177/1534508410392208
Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4), 255–282. https://doi.org/10.1007/BF02288892
Hambleton, R. K., Pitoniak, M. J., & Copella, J. M. (2012). Essential steps in setting performance standards on educational tests and strategies for assessing the reliability of results. In G. J. Cizek (Ed.), Setting performance standards: Foundations, methods, and innovations (2nd ed., pp. 47–76). Routledge. https://doi.org/10.4324/9780203848203
Henson, R., & Douglas, J. (2005). Test construction for cognitive diagnosis. Applied Psychological Measurement, 29(4), 262–277. https://doi.org/10.1177/0146621604272623
Henson, R., Templin, J. L., & Willse, J. T. (2009). Defining a family of cognitive diagnosis models using log-linear models with latent variables. Psychometrika, 74(2), 191–210. https://doi.org/10.1007/s11336-008-9089-5
Hess, K. K. (2012). Learning progressions in K–8 classrooms: How progress maps can influence classroom practice and perceptions and help teachers make more informed instructional decisions in support of struggling learners (Synthesis Report No. 87). University of Minnesota, National Center on Educational Outcomes. https://nceo.umn.edu/docs/OnlinePubs/Synthesis87/SynthesisReport87.pdf
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2), 65–70. http://www.jstor.org/stable/4615733
Jodoin, M. G., & Gierl, M. J. (2001). Evaluating Type I error and power rates using an effect size measure with the logistic regression procedure for DIF detection. Applied Measurement in Education, 14(4), 329–349. https://doi.org/10.1207/S15324818AME1404_2
Johnson, M. S., & Sinharay, S. (2018). Measures of agreement to assess attribute-level classification accuracy and consistency for cognitive diagnostic assessments. Journal of Educational Measurement, 55(4), 635–664. https://doi.org/10.1111/jedm.12196
Johnstone, C., Altman, J. R., & Moore, M. (2011). Universal design and the use of cognitive labs. In M. Russell & M. Kavanaugh (Eds.), Assessing students in the margin: Challenges, strategies, and techniques (pp. 425–442). Information Age Publishing.
Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17–64). Praeger.
Karvonen, M., Bechard, S., & Wells-Moreaux, S. (2015). Accessibility considerations for students with significant cognitive disabilities who take computer-based alternate assessments. Paper presentation. American Educational Research Association Annual Meeting, Chicago, IL.
Karvonen, M., Clark, A. K., & Kavitsky, L. (2022). Aligned academic achievement standards to support pursuit of postsecondary opportunities: Year-End model (Technical Report No. 22-01). University of Kansas, Accessible Teaching, Learning, and Assessment Systems. https://dynamiclearningmaps.org/sites/default/files/documents/publication/YE_PSO_technical_report.pdf
Karvonen, M., Clark, A. K., & Kingston, N. M. (2016). Alternate assessment score report interpretation and use: Implications for instructional planning. National Council on Measurement in Education Annual Meeting, Washington, DC. https://dynamiclearningmaps.org/sites/default/files/documents/presentations/NCME_2016_DLM_Score_Report.pdf
Karvonen, M., Clark, A. K., & Nash, B. (2015). 2015 Year-End model standard setting: English language arts and mathematics (Technical Report No. 15-03). University of Kansas, Center for Educational Testing and Evaluation. https://dynamiclearningmaps.org/sites/default/files/documents/publication/Standard_Setting_Tech_Report_YE.pdf
Karvonen, M., Flowers, C., & Wakeman, S. (2009). Predictors of access to the general curriculum for students with significant cognitive disabilities. Paper presentation. American Educational Research Association Annual Meeting, San Diego, CA.
Karvonen, M., Flowers, C., & Wakeman, S. Y. (2013). Factors associated with access to general curriculum for students with intellectual disability. Current Issues in Education, 16(3). https://cie.asu.edu/ojs/index.php/cieatasu/article/view/1309
Karvonen, M., Swinburne Romine, R., Clark, A. K., Brussow, J., & Kingston, N. M. (2017). Promoting accurate score report interpretation and use for instructional planning. National Council on Measurement in Education Annual Meeting, San Antonio, TX. https://dynamiclearningmaps.org/sites/default/files/documents/presentations/NCME_2017_Score_Reports.pdf
Karvonen, M., Wakeman, S. Y., Browder, D. M., Rogers, M. A. S., & Flowers, C. (2011). Academic curriculum for students with significant cognitive disabilities: Special education teacher perspectives a decade after IDEA 1997 [Research Report]. National Alternate Assessment Center. https://files.eric.ed.gov/fulltext/ED521407.pdf
Karvonen, M., Wakeman, S., Flowers, C., & Moody, S. (2013). The relationship of teachers’ instructional decisions and beliefs about alternate assessments of student achievement. Exceptionality, 21(4), 238–252. https://doi.org/10.1080/09362835.2012.747184
Kearns, J. F., Towles-Reeves, E., Kleinert, H. L., Kleinert, J. O. R., & Thomas, M. K.-K. (2011). Characteristics of and implications for students participating in alternate assessments based on alternate academic achievement standards. The Journal of Special Education, 45(1), 3–14. https://doi.org/10.1177/0022466909344223
Kingston, N., & Tiemann, G. C. (2012). Setting performance standards on complex assessments: The body of work method. In G. J. Cizek (Ed.), Setting performance standards: Foundations, methods, and innovations (2nd ed., pp. 201–224). Routledge. https://doi.org/10.4324/9780203848203
Kleinert, H. L., Browder, D. M., & Towles-Reeves, E. A. (2009). Models of cognition for students with significant cognitive disabilities: Implications for assessment. Review of Educational Research, 79(1), 301–326. https://doi.org/10.3102/0034654308326160
Kobrin, J., Clark, A. K., & Kavitsky, E. (2022). Exploring educator perspectives on potential accessibility gaps in the Dynamic Learning Maps alternate assessment (Research Synopsis No. 22-02). University of Kansas, Accessible Teaching, Learning, and Assessment Systems. https://dynamiclearningmaps.org/sites/default/files/documents/publication/Accessibility_Focus_Groups_project_brief.pdf
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174. https://doi.org/10.2307/2529310
Leighton, J., & Gierl, M. (Eds.). (2007). Cognitive diagnostic assessment for education: Theory and applications. Cambridge University Press. https://doi.org/10.1017/CBO9780511611186
Linn, R. L. (2009). The concept of validity in the context of NCLB. In R. W. Lissitz (Ed.), The concept of validity: Revisions, new directions, and applications (pp. 195–212). Information Age Publishing, Inc.
Lissitz, R. W. (2009). Introduction. In R. W. Lissitz (Ed.), The concept of validity: Revisions, new directions, and applications (pp. 1–18). Information Age Publishing, Inc.
Liu, D. C., & Nocedal, J. (1989). On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45(1), 503–528. https://doi.org/10.1007/BF01589116
Maris, E. (1999). Estimating multiple classification latent class models. Psychometrika, 64(2), 187–212. https://doi.org/10.1007/BF02294535
Messick, S. (1989). Validity. In R. Linn (Ed.), Educational measurement (3rd ed., pp. 13–104). American Council on Education and Macmillan.
Mislevy, R. J. (2009). Validity from the perspective of model-based reasoning. In R. W. Lissitz (Ed.), The concept of validity: Revisions, new directions, and applications (pp. 83–110). Information Age Publishing, Inc.
Mislevy, R. J., Almond, R. G., & Lukas, J. F. (2003). A brief introduction to evidence-centered design (Research Report RR-03-16). Educational Testing Service. https://doi.org/10.1002/j.2333-8504.2003.tb01908.x
Mislevy, R. J., & Gitomer, D. H. (1995). The role of probability-based inference in an intelligent tutoring system. User Modeling and User-Adapted Interaction, 5(3–4), 253–282. https://doi.org/10.1007/BF01126112
Mislevy, R. J., & Riconscente, M. M. (2005). Evidence-centered assessment design: Layers, structures, and terminology (PADI Technical Report No. 9). SRI International. https://padi.sri.com/downloads/TR9_ECD.pdf
Mislevy, R. J., Steinberg, L. S., & Almond, R. G. (1999). Evidence-centered assessment design. Educational Testing Service. https://learnlab.org/research/wiki/images/5/51/Evidence-Centered-Assessment-Design.pdf
Moinester, M., & Gottfried, R. (2014). Sample size estimation for correlations with pre-specified confidence interval. The Quantitative Methods for Psychology, 10(2), 124–130. https://doi.org/10.20982/tqmp.10.2.p124
Nabi, S., Nassif, H., Hong, J., Mamani, H., & Imbens, G. (2022). Bayesian meta-prior learning using empirical Bayes. Management Science, 68(3), 1737–1755. https://doi.org/10.1287/mnsc.2021.4136
Nash, B., Clark, A. K., & Karvonen, M. (2016). First Contact: A census report on the characteristics of students eligible to take alternate assessments. University of Kansas, Center for Educational Testing and Evaluation. https://dynamiclearningmaps.org/sites/default/files/documents/publication/First_Contact_Census_2016.pdf
National Council on Measurement in Education. (2018). Position statement on theories of action for testing programs. https://higherlogicdownload.s3.amazonaws.com/NCME/c53581e4-9882-4137-987b-4475f6cb502a/UploadedImages/Documents/NCME_Position_Paper_on_Theories_of_Action_-_Final_July__2018.pdf
National Governors Association Center for Best Practices and Council of Chief State School Officers. (2010). Common core state standards. National Governors Association Center for Best Practices; Council of Chief State School Officers.
Neal, R. M. (2011). MCMC using Hamiltonian dynamics. In S. Brooks, A. Gelman, G. Jones, & X.-L. Meng (Eds.), Handbook of Markov Chain Monte Carlo. Chapman and Hall/CRC. https://doi.org/10.1201/b10905-6
Nehler, C., & Clark, A. K. (2019). Teacher rating of student mastery: Pilot (Research Synopsis No. 19-02). University of Kansas, Accessible Teaching, Learning, and Assessment Systems. https://dynamiclearningmaps.org/sites/default/files/documents/publication/Teacher_Rating_of_Student_Mastery.pdf
Nehler, C., Clark, A. K., & Karvonen, M. (2019). White paper: Considerations for measuring academic growth on Dynamic Learning Maps (DLM) alternate assessments. University of Kansas, Accessible Teaching, Learning, and Assessment Systems.
Nippold, M. A. (2007). Later language development: School-age children, adolescents, and young adults (3rd ed.). Pro-Ed.
Nitsch, C. (2013). Dynamic Learning Maps: The Arc parent focus groups. The Arc. https://dynamiclearningmaps.org/sites/default/files/documents/publication/TheArcParentFocusGroups.pdf
No Child Left Behind Act, 20 U.S.C. § 6319 (2002). https://www.congress.gov/107/plaws/publ110/PLAW-107publ110.pdf
Nocedal, J., & Wright, S. J. (2006). Numerical optimization. Springer. https://doi.org/10.1007/978-0-387-40065-5
O’Leary, S., Lund, M., Ytre-Hauge, T. J., Holm, S. R., Naess, K., Dalland, L. N., & McPhail, S. M. (2014). Pitfalls in the use of kappa when interpreting agreement between multiple raters in reliability studies. Physiotherapy, 100, 27–35. https://doi.org/10.1016/j.physio.2013.08.002
Pearl, J. (1988). Probabilistic reasoning in intelligent systems. Morgan Kaufmann. https://doi.org/10.1016/C2009-0-27609-4
Perie, M. (2008). A guide to understanding and developing performance-level descriptors. Educational Measurement: Issues and Practice, 27(4), 15–29. https://doi.org/10.1111/j.1745-3992.2008.00135.x
Petrone, S., Rousseau, J., & Scricciolo, C. (2014). Bayes and empirical Bayes: Do they merge? Biometrika, 101(2), 285–302. https://doi.org/10.1093/biomet/ast067
Pontius, R. G., Jr., & Millones, M. (2011). Death to kappa: Birth of quantity disagreement and allocation disagreement for accuracy assessment. International Journal of Remote Sensing, 32, 4407–4429. https://doi.org/10.1080/01431161.2011.552923
Popham, W. J. (2011). How to build learning progressions: Keep them simple, Simon! American Educational Research Association Annual Meeting, New Orleans, LA.
Ravand, H., & Baghaei, P. (2020). Diagnostic classification models: Recent developments, practical issues, and prospects. International Journal of Testing, 20(1), 24–56. https://doi.org/10.1080/15305058.2019.1588278
Rupp, A. A., Templin, J., & Henson, R. (2010). Diagnostic measurement: Theory, methods, and applications. Guilford Press.
Ryndak, D. L., Taub, D., Jorgensen, C. M., Gonsier-Gerdin, J., Arndt, K., Sauer, J., Ruppar, A. L., Morningstar, M. E., & Allcock, H. (2014). Policy and the impact on placement, involvement, and progress in general education: Critical issues that require rectification. Research and Practice for Persons with Severe Disabilities, 39(1), 65–74. https://doi.org/10.1177/1540796914533942
Sarama, J., & Clements, D. H. (2009). Early childhood mathematics education research: Learning trajectories for young children. Routledge. https://doi.org/10.4324/9780203883785
Shepard, L. A. (2018). Learning progressions as tools for assessment and learning. Applied Measurement in Education, 31, 165–174. https://doi.org/10.1080/08957347.2017.1408628
Sinharay, S., & Johnson, M. S. (2019). Measures of agreement: Reliability, classification accuracy, and classification consistency. In M. von Davier & Y.-S. Lee (Eds.), Handbook of Diagnostic Classification Models (pp. 359–377). Springer International Publishing. https://doi.org/10.1007/978-3-030-05584-4_17
Sireci, S. G. (2009). Packing and unpacking sources of validity evidence. In R. W. Lissitz (Ed.), The concept of validity: Revisions, new directions, and applications (pp. 19–38). Information Age Publishing, Inc.
Stan Development Team. (2022). RStan: The R interface to Stan. https://mc-stan.org/
Stefan, A. M., Evans, N. J., & Wagenmakers, E.-J. (2020). Practical challenges and methodological flexibility in prior elicitation. Psychological Methods, 27(2), 177–197. https://doi.org/10.1037/met0000354
Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27(4), 361–370. https://www.jstor.org/stable/1434855
Templin, J., & Bradshaw, L. (2013). Measuring the reliability of diagnostic classification model examinee estimates. Journal of Classification, 30(2), 251–275. https://doi.org/10.1007/s00357-013-9129-4
Templin, J., & Bradshaw, L. (2014). Hierarchical diagnostic classification models: A family of models for estimating and testing attribute hierarchies. Psychometrika, 79(2), 317–339. https://doi.org/10.1007/s11336-013-9362-0
Templin, J., & Henson, R. (2008). Understanding the impact of skill acquisition: Relating diagnostic assessments to measurable outcomes. Paper presentation. American Educational Research Association Annual Meeting, New York, NY.
Thompson, S. J., Johnstone, C. J., & Thurlow, M. L. (2002). Universal design applied to large scale assessments. University of Minnesota, National Center on Educational Outcomes. https://nceo.umn.edu/docs/OnlinePubs/Synth44.pdf
Thompson, W. J. (2019). Bayesian psychometrics for diagnostic assessments: A proof of concept (Research Report No. 19-01). University of Kansas, Accessible Teaching, Learning, and Assessment Systems. https://doi.org/10.35542/osf.io/jzqs8
Thompson, W. J. (2020). Reliability for the Dynamic Learning Maps assessments: A comparison of methods (Technical Report No. 20-03). University of Kansas, Accessible Teaching, Learning, and Assessment Systems. https://dynamiclearningmaps.org/sites/default/files/documents/publication/Reliability_Comparison.pdf
Thompson, W. J., Clark, A. K., & Nash, B. (2019). Measuring the reliability of diagnostic mastery classifications at multiple levels of reporting. Applied Measurement in Education, 32(4), 298–309. https://doi.org/10.1080/08957347.2019.1660345
Thompson, W. J., & Nash, B. (2019). Beyond learning progressions: Maps as assessment architecture: Illustrations and results. Symposium. National Council on Measurement in Education Annual Meeting, Toronto, Canada. https://dynamiclearningmaps.org/sites/default/files/documents/presentations/Thompson_Nash_Empirical_evaluation_of_learning_maps.pdf
Thompson, W. J., & Nash, B. (2022). A diagnostic framework for the empirical evaluation of learning maps. Frontiers in Education, 6, 714736. https://doi.org/10.3389/feduc.2021.714736
Timberlake, M. T. (2014). Weighing costs and benefits: Teacher interpretation and implementation of access to the general education curriculum. Research and Practice for Persons with Severe Disabilities, 39(2), 83–99. https://doi.org/10.1177/1540796914544547
Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27, 1413–1432. https://doi.org/10.1007/s11222-016-9696-4
Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P.-C. (2021). Rank-normalization, folding, and localization: An improved \(\widehat{R}\) for assessing convergence of MCMC. Bayesian Analysis, 16(2), 667–718. https://doi.org/10.1214/20-BA1221
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., & Gabry, J. (2022). Pareto smoothed importance sampling. arXiv. https://doi.org/10.48550/arXiv.1507.02646
Wang, W., Song, L., Chen, P., Meng, Y., & Ding, S. (2015). Attribute-level and pattern-level classification consistency and accuracy indices for cognitive diagnostic assessment. Journal of Educational Measurement, 52(4), 457–476. https://doi.org/10.1111/jedm.12096
Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research, 11, 3571–3594. http://www.jmlr.org/papers/v11/watanabe10a.html
Xu, G. (2019). Identifiability and cognitive diagnosis models. In M. von Davier & Y.-S. Lee (Eds.), Handbook of Diagnostic Classification Models (pp. 333–357). Springer International Publishing. https://doi.org/10.1007/978-3-030-05584-4_16
Xu, G., & Zhang, S. (2016). Identifiability of diagnostic classification models. Psychometrika, 81(3), 625–649. https://doi.org/10.1007/s11336-015-9471-z
Yorkston, K., Dowden, P., Honsinger, M., Marriner, N., & Smith, K. (1988). A comparison of standard and user vocabulary lists. Augmentative and Alternative Communication, 4(4), 189–210. https://doi.org/10.1080/07434618812331274807
Zumbo, B. D., & Thomas, D. R. (1997). A measure of effect size for a model-based approach for studying DIF [Working paper]. University of Northern British Columbia, Edgeworth Laboratory for Quantitative Behavioral Science.