11 References

Accessible Teaching, Learning, and Assessment Systems. (2021). 2020–2021 DLM administration during COVID-19: Participation, performance, and educational experience (Technical Report No. 21-02). University of Kansas. https://dynamiclearningmaps.org/sites/default/files/documents/publication/DLM-COVID.pdf
Almond, R. G., Mislevy, R. J., Steinberg, L. S., Yan, D., & Williamson, D. M. (2015). Bayesian networks in educational assessment. Springer. https://doi.org/10.1007/978-1-4939-2125-6
Altman, J. R., Lazarus, S. S., Quenemoen, R. F., Kearns, J., Quenemoen, M., & Thurlow, M. L. (2010). 2009 survey of states: Accomplishments and new issues at the end of a decade of change. University of Minnesota, National Center on Educational Outcomes. https://files.eric.ed.gov/fulltext/ED511742.pdf
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. American Educational Research Association.
Babu, G. J. (2011). Resampling methods for model fitting and model selection. Journal of Biopharmaceutical Statistics, 21(6), 1177–1186. https://doi.org/10.1080/10543406.2011.607749
Bechard, S., Clark, A. K., Swinburne Romine, R., Karvonen, M., Kingston, N., & Erickson, K. (2019). Use of evidence-centered design to develop learning maps-based assessments. International Journal of Testing, 19(2), 188–205. https://doi.org/10.1080/15305058.2018.1543310
Bechard, S., Hess, K., Camacho, C., Russell, M., & Thomas, K. (2012). Understanding learning progressions and learning maps to inform the development of assessment for students in special populations [White paper]. SRI International Symposium, Menlo Park, CA.
Bechard, S., & Sheinker, A. (2012). Basic framework for item writers using Evidence Centered Design (ECD). University of Kansas, Center for Educational Testing and Evaluation.
Betancourt, M. (2018). A conceptual introduction to Hamiltonian Monte Carlo. arXiv. https://arxiv.org/abs/1701.02434
Bradshaw, L. (2016). Diagnostic classification models. In A. A. Rupp & J. Leighton (Eds.), The handbook of cognition and assessment: Frameworks, methodologies, and applications (1st ed., pp. 297–327). John Wiley & Sons. https://doi.org/10.1002/9781118956588.ch13
Bradshaw, L., Izsák, A., Templin, J., & Jacobson, E. (2014). Diagnosing teachers’ understandings of rational numbers: Building a multidimensional test within the diagnostic classification framework. Educational Measurement: Issues and Practice, 33(1), 2–14. https://doi.org/10.1111/emip.12020
Bradshaw, L., & Levy, R. (2019). Interpreting probabilistic classifications from diagnostic psychometric models. Educational Measurement: Issues and Practice, 38(2), 79–88. https://doi.org/10.1111/emip.12247
Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items (Vol. 4). SAGE Publications, Inc.
Carlin, B. P., & Louis, T. A. (2001). Empirical Bayes: Past, present and future. In A. E. Raftery, M. A. Tanner, & M. T. Wells (Eds.), Statistics in the 21st century. Chapman and Hall/CRC. https://doi.org/10.1201/9781420035391
Carpenter, B., Gelman, A., Hoffman, M. D., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M., Guo, J., Li, P., & Riddell, A. (2017). Stan: A probabilistic programming language. Journal of Statistical Software, 76(1), 1–32. https://doi.org/10.18637/jss.v076.i01
Casella, G. (1985). An introduction to empirical Bayes data analysis. The American Statistician, 39(2), 83–87. https://doi.org/10.2307/2682801
Center for Applied Special Technology. (2018, January 20). Universal design for learning guidelines version 2.2. https://udlguidelines.cast.org
Chandler, P., & Sweller, J. (1991). Cognitive load theory and the format of instruction. Cognition and Instruction, 8(4), 293–332. https://doi.org/10.1207/s1532690xci0804_2
Chen, J., de la Torre, J., & Zhang, Z. (2013). Relative and absolute fit evaluation in cognitive diagnosis modeling. Journal of Educational Measurement, 50(2), 123–140. https://doi.org/10.1111/j.1745-3984.2012.00185.x
Cicchetti, D. V. (1994). Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychological Assessment, 6(4), 284–290. https://doi.org/10.1037/1040-3590.6.4.284
Cicchetti, D. V., & Feinstein, A. R. (1990). High agreement but low kappa: II. Resolving the paradoxes. Journal of Clinical Epidemiology, 43, 551–558. https://doi.org/10.1016/0895-4356(90)90159-M
Cizek, G. J. (1996). Standard-setting guidelines. Educational Measurement: Issues and Practice, 15(1), 13–21. https://doi.org/10.1111/j.1745-3992.1996.tb00802.x
Cizek, G. J., & Bunch, M. B. (2006). Standard setting: A guide to establishing and evaluating performance standards on tests. SAGE Publications, Inc. https://doi.org/10.4135/9781412985918
Clark, A. K., Karvonen, M., Kingston, N., Anderson, G., & Wells-Moreaux, S. (2015). Designing alternate assessment score reports that maximize instructional impact. National Council on Measurement in Education Annual Meeting, Chicago, IL. https://dynamiclearningmaps.org/sites/default/files/documents/presentations/NCME_2015_Score_Report_Utility_Paper.pdf
Clark, A. K., Karvonen, M., Swinburne Romine, R., & Kingston, N. (2018). Teacher use of score reports for instructional decision-making: Preliminary findings. National Council on Measurement in Education Annual Meeting, New York, NY. https://dynamiclearningmaps.org/sites/default/files/documents/presentations/NCME_2018_Score_Report_Use_Findings.pdf
Clark, A. K., Kingston, N., Templin, J., & Pardos, Z. (2014). Summary of results from the fall 2013 pilot administration of the Dynamic Learning Maps Alternate Assessment System (Technical Report No. 14-01). University of Kansas, Center for Educational Testing and Evaluation. https://dynamiclearningmaps.org/sites/default/files/documents/publication/pilot_summary_of_findings.pdf
Clark, A. K., Kobrin, J., & Hirt, A. (2022). Educator perspectives on instructionally embedded assessment (Research Synopsis No. 22-01). University of Kansas, Accessible Teaching, Learning, and Assessment Systems. https://dynamiclearningmaps.org/sites/default/files/documents/publication/IE_Focus_Groups_project_brief.pdf
Clark, A. K., Nash, B., Karvonen, M., & Kingston, N. (2017). Condensed mastery profile method for setting standards for diagnostic assessment systems. Educational Measurement: Issues and Practice, 36(4), 5–15. https://doi.org/10.1111/emip.12162
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Routledge.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159. https://doi.org/10.1037/0033-2909.112.1.155
Confrey, J., Gianopulos, G., McGowan, W., Shah, M., & Belcher, M. (2017). Scaffolding learner-centered curricular coherence using learning maps and diagnostic assessments designed around mathematics learning trajectories. ZDM – Mathematics Education, 49(5), 717–734. https://doi.org/10.1007/s11858-017-0869-1
Confrey, J., Maloney, A. P., & Corely, A. K. (2014). Learning trajectories: A framework for connecting standards with curriculum. ZDM – Mathematics Education, 46(5), 719–733. https://doi.org/10.1007/s11858-014-0598-7
Cramér, H. (1946). Mathematical methods of statistics. Princeton University Press.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297–334. https://doi.org/10.1007/BF02310555
DeBarger, A. H., Seeratan, K., Cameto, R., Haertel, G., Knokey, A.-M., & Morrison, K. (2011). Alternate assessment design-mathematics (Technical Report 9: Implementing evidence-centered design to develop assessments for students with significant cognitive disabilities: Guidelines for creating design patterns and development specifications and exemplar task templates for mathematics). SRI International. http://alternateassessmentdesign.sri.com/techreports/AAD_M_TechRpt9_032911final.pdf
Dennis, A., Erickson, K., & Hatch, P. (2013). The Dynamic Learning Maps core vocabulary: Overview [Technical review]. https://www.dropbox.com/s/99ay2ypx37m6lps/DLM_Core_Vocabulary_Overview.pdf
DLM Consortium. (2021). Test Administration Manual 2021–2022. University of Kansas, Accessible Teaching, Learning, and Assessment Systems.
Dynamic Learning Maps Consortium. (2014). Guide to external review of testlets. University of Kansas, Center for Educational Testing and Evaluation. https://dynamiclearningmaps.org/sites/default/files/documents/External_Review/External_Review_Guide_IM.pdf
Dynamic Learning Maps Consortium. (2021a). Accessibility Manual 2021–2022. University of Kansas, Accessible Teaching, Learning, and Assessment Systems.
Dynamic Learning Maps Consortium. (2021b). Guide to DLM Required Test Administrator Training. University of Kansas, Accessible Teaching, Learning, and Assessment Systems.
Dynamic Learning Maps Consortium. (2022a). 2021–2022 Technical Manual Update—Science. University of Kansas, Accessible Teaching, Learning, and Assessment Systems.
Dynamic Learning Maps Consortium. (2022b). Technology Specifications Manual 2021–2022. University of Kansas, Accessible Teaching, Learning, and Assessment Systems.
Efron, B. (2014). Two modeling strategies for empirical Bayes estimation. Statistical Science, 29(2), 285–301. https://doi.org/10.1214/13-STS455
Erickson, K. A., Hatch, P., & Clendon, S. (2010). Literacy, assistive technology, and students with significant disabilities. Focus on Exceptional Children, 42(5), 1–17. https://doi.org/10.17161/foec.v42i5.6904
Erickson, K. A., & Karvonen, M. (2014). College and career readiness instruction and assessment for pre-intentional and pre-symbolic communicators. Office of Special Education Programs Project Directors’ Meeting, Washington, DC.
Ericsson, K. A., & Simon, H. A. (1993). Protocol analysis: Verbal reports as data (revised edition). MIT Press. https://doi.org/10.7551/mitpress/5657.001.0001
Every Student Succeeds Act, 20 U.S.C. § 6301 (2015). https://www.congress.gov/114/plaws/publ95/PLAW-114publ95.pdf
Falconer, J. R., Frank, E., Polaschek, D. L. L., & Joshi, C. (2022). Methods for eliciting informative prior distributions: A critical review. Decision Analysis. https://doi.org/10.1287/deca.2022.0451
Family Educational Rights and Privacy Act, 20 U.S.C. § 1232g; 34 CFR Part 99 (1974). https://www.govinfo.gov/content/pkg/USCODE-2020-title20/pdf/USCODE-2020-title20-chap31-subchapIII-part4-sec1232g.pdf
Fang, G., Liu, J., & Ying, Z. (2019). On the identifiability of diagnostic classification models. Psychometrika, 84(1), 19–40. https://doi.org/10.1007/s11336-018-09658-x
Feinstein, A. R., & Cicchetti, D. V. (1990). High agreement but low kappa: I. The problems of two paradoxes. Journal of Clinical Epidemiology, 43, 543–549. https://doi.org/10.1016/0895-4356(90)90158-L
Fleiss, J. L. (1981). Statistical methods for rates and proportions (2nd ed., pp. 38–46). John Wiley.
Flowers, C., Turner, C., Herrera, B., Towles-Reeves, L., Thurlow, M., Davidson, A., & Hagge, S. (2015). Developing a large-scale assessment using components of evidence-centered design: Did it work? Paper presentation. National Council on Measurement in Education.
Flowers, C., & Wakeman, S. (2016a). Alignment of Dynamic Learning Maps operational items to grade-level content standards: Integrated model for testing. ACERI Partners, LLC.
Flowers, C., & Wakeman, S. (2016b). Alignment of Dynamic Learning Maps operational items to grade-level content standards: Year-End models of testing. ACERI Partners, LLC.
Flowers, C., & Wakeman, S. (2020). Alignment of Dynamic Learning Maps operational items to grade-level content standards: Year-End model of testing. ACERI Partners, LLC.
Flowers, C., Wakeman, S., Browder, D. M., & Karvonen, M. (2009). Links for academic learning (LAL): A conceptual model for investigating alignment of alternate assessments based on alternate achievement standards. Educational Measurement: Issues and Practice, 28(1), 25–37. https://doi.org/10.1111/j.1745-3992.2009.01134.x
Gentry, J. R. (1982). An analysis of developmental spelling in "GNYS AT WRK". The Reading Teacher, 36(2), 192–200. https://www.jstor.org/stable/20198182
Goldstein, J., & Behuniak, P. (2010). Assumptions in alternate assessment: An argument-based approach to validation. Assessment for Effective Intervention, 36(3), 179–191. https://doi.org/10.1177/1534508410392208
Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4), 255–282. https://doi.org/10.1007/BF02288892
Hambleton, R. K., Pitoniak, M. J., & Copella, J. M. (2012). Essential steps in setting performance standards on educational tests and strategies for assessing the reliability of results. In G. J. Cizek (Ed.), Setting performance standards: Foundations, methods, and innovations (2nd ed., pp. 47–76). Routledge. https://doi.org/10.4324/9780203848203
Henson, R., & Douglas, J. (2005). Test construction for cognitive diagnosis. Applied Psychological Measurement, 29(4), 262–277. https://doi.org/10.1177/0146621604272623
Henson, R., Templin, J. L., & Willse, J. T. (2009). Defining a family of cognitive diagnosis models using log-linear models with latent variables. Psychometrika, 74(2), 191–210. https://doi.org/10.1007/s11336-008-9089-5
Hess, K. K. (2012). Learning progressions in K–8 classrooms: How progress maps can influence classroom practice and perceptions and help teachers make more informed instructional decisions in support of struggling learners (Synthesis Report No. 87). University of Minnesota, National Center on Educational Outcomes. https://nceo.umn.edu/docs/OnlinePubs/Synthesis87/SynthesisReport87.pdf
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2), 65–70. http://www.jstor.org/stable/4615733
Jodoin, M. G., & Gierl, M. J. (2001). Evaluating Type I error and power rates using an effect size measure with the logistic regression procedure for DIF detection. Applied Measurement in Education, 14(4), 329–349. https://doi.org/10.1207/S15324818AME1404_2
Johnson, M. S., & Sinharay, S. (2018). Measures of agreement to assess attribute-level classification accuracy and consistency for cognitive diagnostic assessments. Journal of Educational Measurement, 55(4), 635–664. https://doi.org/10.1111/jedm.12196
Johnstone, C., Altman, J. R., & Moore, M. (2011). Universal design and the use of cognitive labs. In M. Russell & M. Kavanaugh (Eds.), Assessing students in the margin: Challenges, strategies, and techniques (pp. 425–442). Information Age Publishing.
Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17–64). Praeger.
Karvonen, M., Bechard, S., & Wells-Moreaux, S. (2015). Accessibility considerations for students with significant cognitive disabilities who take computer-based alternate assessments. Paper presentation. American Educational Research Association Annual Meeting, Chicago, IL.
Karvonen, M., Clark, A. K., & Kavitsky, L. (2022). Aligned academic achievement standards to support pursuit of postsecondary opportunities: Year-End model (Technical Report No. 22-01). University of Kansas, Accessible Teaching, Learning, and Assessment Systems. https://dynamiclearningmaps.org/sites/default/files/documents/publication/YE_PSO_technical_report.pdf
Karvonen, M., Clark, A. K., & Kingston, N. M. (2016). Alternate assessment score report interpretation and use: Implications for instructional planning. National Council on Measurement in Education Annual Meeting, Washington, DC. https://dynamiclearningmaps.org/sites/default/files/documents/presentations/NCME_2016_DLM_Score_Report.pdf
Karvonen, M., Clark, A. K., & Nash, B. (2015). 2015 Year-End model standard setting: English language arts and mathematics (Technical Report No. 15-03). University of Kansas, Center for Educational Testing and Evaluation. https://dynamiclearningmaps.org/sites/default/files/documents/publication/Standard_Setting_Tech_Report_YE.pdf
Karvonen, M., Flowers, C., & Wakeman, S. (2009). Predictors of access to the general curriculum for students with significant cognitive disabilities. Paper presentation. American Educational Research Association Annual Meeting, San Diego, CA.
Karvonen, M., Flowers, C., & Wakeman, S. Y. (2013). Factors associated with access to general curriculum for students with intellectual disability. Current Issues in Education, 16(3). https://cie.asu.edu/ojs/index.php/cieatasu/article/view/1309
Karvonen, M., Swinburne Romine, R., Clark, A. K., Brussow, J., & Kingston, N. M. (2017). Promoting accurate score report interpretation and use for instructional planning. National Council on Measurement in Education Annual Meeting, San Antonio, TX. https://dynamiclearningmaps.org/sites/default/files/documents/presentations/NCME_2017_Score_Reports.pdf
Karvonen, M., Wakeman, S. Y., Browder, D. M., Rogers, M. A. S., & Flowers, C. (2011). Academic curriculum for students with significant cognitive disabilities: Special education teacher perspectives a decade after IDEA 1997 [Research Report]. National Alternate Assessment Center. https://files.eric.ed.gov/fulltext/ED521407.pdf
Karvonen, M., Wakeman, S., Flowers, C., & Moody, S. (2013). The relationship of teachers’ instructional decisions and beliefs about alternate assessments of student achievement. Exceptionality, 21(4), 238–252. https://doi.org/10.1080/09362835.2012.747184
Kearns, J. F., Towles-Reeves, E., Kleinert, H. L., Kleinert, J. O. R., & Thomas, M. K.-K. (2011). Characteristics of and implications for students participating in alternate assessments based on alternate academic achievement standards. The Journal of Special Education, 45(1), 3–14. https://doi.org/10.1177/0022466909344223
Kingston, N., & Tiemann, G. C. (2012). Setting performance standards on complex assessments: The body of work method. In G. J. Cizek (Ed.), Setting performance standards: Foundations, methods, and innovations (2nd ed., pp. 201–224). Routledge. https://doi.org/10.4324/9780203848203
Kleinert, H. L., Browder, D. M., & Towles-Reeves, E. A. (2009). Models of cognition for students with significant cognitive disabilities: Implications for assessment. Review of Educational Research, 79(1), 301–326. https://doi.org/10.3102/0034654308326160
Kobrin, J., Clark, A. K., & Kavitsky, E. (2022). Exploring educator perspectives on potential accessibility gaps in the Dynamic Learning Maps alternate assessment (Research Synopsis No. 22-02). University of Kansas, Accessible Teaching, Learning, and Assessment Systems. https://dynamiclearningmaps.org/sites/default/files/documents/publication/Accessibility_Focus_Groups_project_brief.pdf
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174. https://doi.org/10.2307/2529310
Leighton, J., & Gierl, M. (Eds.). (2007). Cognitive diagnostic assessment for education: Theory and applications. Cambridge University Press. https://doi.org/10.1017/CBO9780511611186
Linn, R. L. (2009). The concept of validity in the context of NCLB. In R. W. Lissitz (Ed.), The concept of validity: Revisions, new directions, and applications (pp. 195–212). Information Age Publishing, Inc.
Lissitz, R. W. (2009). Introduction. In R. W. Lissitz (Ed.), The concept of validity: Revisions, new directions, and applications (pp. 1–18). Information Age Publishing, Inc.
Liu, D. C., & Nocedal, J. (1989). On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45(1), 503–528. https://doi.org/10.1007/BF01589116
Maris, E. (1999). Estimating multiple classification latent class models. Psychometrika, 64(2), 187–212. https://doi.org/10.1007/BF02294535
Messick, S. (1989). Validity. In R. Linn (Ed.), Educational measurement (3rd ed., pp. 13–104). American Council on Education and Macmillan.
Mislevy, R. J. (2009). Validity from the perspective of model-based reasoning. In R. W. Lissitz (Ed.), The concept of validity: Revisions, new directions, and applications (pp. 83–110). Information Age Publishing, Inc.
Mislevy, R. J., Almond, R. G., & Lukas, J. F. (2003). A brief introduction to evidence-centered design (Research Report RR-03-16). Educational Testing Service. https://doi.org/10.1002/j.2333-8504.2003.tb01908.x
Mislevy, R. J., & Gitomer, D. H. (1995). The role of probability-based inference in an intelligent tutoring system. User Modeling and User-Adapted Interaction, 5(3–4), 253–282. https://doi.org/10.1007/BF01126112
Mislevy, R. J., & Riconscente, M. M. (2005). Evidence-centered assessment design: Layers, structures, and terminology (PADI Technical Report No. 9). SRI International. https://padi.sri.com/downloads/TR9_ECD.pdf
Mislevy, R. J., Steinberg, L. S., & Almond, R. G. (1999). Evidence-centered assessment design. Educational Testing Service. https://learnlab.org/research/wiki/images/5/51/Evidence-Centered-Assessment-Design.pdf
Moinester, M., & Gottfried, R. (2014). Sample size estimation for correlations with pre-specified confidence interval. The Quantitative Methods for Psychology, 10(2), 124–130. https://doi.org/10.20982/tqmp.10.2.p124
Nabi, S., Nassif, H., Hong, J., Mamani, H., & Imbens, G. (2022). Bayesian meta-prior learning using empirical Bayes. Management Science, 68(3), 1737–1755. https://doi.org/10.1287/mnsc.2021.4136
Nash, B., Clark, A. K., & Karvonen, M. (2016). First Contact: A census report on the characteristics of students eligible to take alternate assessments. University of Kansas, Center for Educational Testing and Evaluation. https://dynamiclearningmaps.org/sites/default/files/documents/publication/First_Contact_Census_2016.pdf
National Council on Measurement in Education. (2018). Position statement on theories of action for testing programs. https://higherlogicdownload.s3.amazonaws.com/NCME/c53581e4-9882-4137-987b-4475f6cb502a/UploadedImages/Documents/NCME_Position_Paper_on_Theories_of_Action_-_Final_July__2018.pdf
National Governors Association Center for Best Practices and Council of Chief State School Officers. (2010). Common core state standards. National Governors Association Center for Best Practices; Council of Chief State School Officers.
Neal, R. M. (2011). MCMC using Hamiltonian dynamics. In S. Brooks, A. Gelman, G. Jones, & X.-L. Meng (Eds.), Handbook of Markov Chain Monte Carlo. Chapman and Hall/CRC. https://doi.org/10.1201/b10905-6
Nehler, C., & Clark, A. K. (2019). Teacher rating of student mastery: Pilot (Research Synopsis No. 19-02). University of Kansas, Accessible Teaching, Learning, and Assessment Systems. https://dynamiclearningmaps.org/sites/default/files/documents/publication/Teacher_Rating_of_Student_Mastery.pdf
Nehler, C., Clark, A. K., & Karvonen, M. (2019). White paper: Considerations for measuring academic growth on Dynamic Learning Maps (DLM) alternate assessments. University of Kansas, Accessible Teaching, Learning, and Assessment Systems.
Nippold, M. A. (2007). Later language development: School-age children, adolescents, and young adults (3rd ed.). Pro-Ed.
Nitsch, C. (2013). Dynamic Learning Maps: The Arc parent focus groups. The Arc. https://dynamiclearningmaps.org/sites/default/files/documents/publication/TheArcParentFocusGroups.pdf
No Child Left Behind Act, 20 U.S.C. § 6319 (2002). https://www.congress.gov/107/plaws/publ110/PLAW-107publ110.pdf
Nocedal, J., & Wright, S. J. (2006). Numerical optimization. Springer. https://doi.org/10.1007/978-0-387-40065-5
O’Leary, S., Lund, M., Ytre-Hauge, T. J., Holm, S. R., Naess, K., Dalland, L. N., & McPhail, S. M. (2014). Pitfalls in the use of kappa when interpreting agreement between multiple raters in reliability studies. Physiotherapy, 100, 27–35. https://doi.org/10.1016/j.physio.2013.08.002
Pearl, J. (1988). Probabilistic reasoning in intelligent systems. Morgan Kaufmann. https://doi.org/10.1016/C2009-0-27609-4
Perie, M. (2008). A guide to understanding and developing performance-level descriptors. Educational Measurement: Issues and Practice, 27(4), 15–29. https://doi.org/10.1111/j.1745-3992.2008.00135.x
Petrone, S., Rousseau, J., & Scricciolo, C. (2014). Bayes and empirical Bayes: Do they merge? Biometrika, 101(2), 285–302. https://doi.org/10.1093/biomet/ast067
Pontius, R. G., Jr., & Millones, M. (2011). Death to kappa: Birth of quantity disagreement and allocation disagreement for accuracy assessment. International Journal of Remote Sensing, 32, 4407–4429. https://doi.org/10.1080/01431161.2011.552923
Popham, W. J. (2011). How to build learning progressions: Keep them simple, Simon! American Educational Research Association Annual Meeting, New Orleans, LA.
Ravand, H., & Baghaei, P. (2020). Diagnostic classification models: Recent developments, practical issues, and prospects. International Journal of Testing, 20(1), 24–56. https://doi.org/10.1080/15305058.2019.1588278
Rupp, A. A., Templin, J., & Henson, R. (2010). Diagnostic measurement: Theory, methods, and applications. Guilford Press.
Ryndak, D. L., Taub, D., Jorgensen, C. M., Gonsier-Gerdin, J., Arndt, K., Sauer, J., Ruppar, A. L., Morningstar, M. E., & Allcock, H. (2014). Policy and the impact on placement, involvement, and progress in general education: Critical issues that require rectification. Research and Practice for Persons with Severe Disabilities, 39(1), 65–74. https://doi.org/10.1177/1540796914533942
Sarama, J., & Clements, D. H. (2009). Early childhood mathematics education research: Learning trajectories for young children. Routledge. https://doi.org/10.4324/9780203883785
Shepard, L. A. (2018). Learning progressions as tools for assessment and learning. Applied Measurement in Education, 31, 165–174. https://doi.org/10.1080/08957347.2017.1408628
Sinharay, S., & Johnson, M. S. (2019). Measures of agreement: Reliability, classification accuracy, and classification consistency. In M. von Davier & Y.-S. Lee (Eds.), Handbook of Diagnostic Classification Models (pp. 359–377). Springer International Publishing. https://doi.org/10.1007/978-3-030-05584-4_17
Sireci, S. G. (2009). Packing and unpacking sources of validity evidence. In R. W. Lissitz (Ed.), The concept of validity: Revisions, new directions, and applications (pp. 19–38). Information Age Publishing, Inc.
Stan Development Team. (2022). RStan: The R interface to Stan. https://mc-stan.org/
Stefan, A. M., Evans, N. J., & Wagenmakers, E.-J. (2020). Practical challenges and methodological flexibility in prior elicitation. Psychological Methods, 27(2), 177–197. https://doi.org/10.1037/met0000354
Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27(4), 361–370. https://www.jstor.org/stable/1434855
Templin, J., & Bradshaw, L. (2013). Measuring the reliability of diagnostic classification model examinee estimates. Journal of Classification, 30(2), 251–275. https://doi.org/10.1007/s00357-013-9129-4
Templin, J., & Bradshaw, L. (2014). Hierarchical diagnostic classification models: A family of models for estimating and testing attribute hierarchies. Psychometrika, 79(2), 317–339. https://doi.org/10.1007/s11336-013-9362-0
Templin, J., & Henson, R. (2008). Understanding the impact of skill acquisition: Relating diagnostic assessments to measurable outcomes. Paper presentation. American Educational Research Association Annual Meeting, New York, NY.
Thompson, S. J., Johnstone, C. J., & Thurlow, M. L. (2002). Universal design applied to large scale assessments. University of Minnesota, National Center on Educational Outcomes. https://nceo.umn.edu/docs/OnlinePubs/Synth44.pdf
Thompson, W. J. (2019). Bayesian psychometrics for diagnostic assessments: A proof of concept (Research Report No. 19-01). University of Kansas, Accessible Teaching, Learning, and Assessment Systems. https://doi.org/10.35542/osf.io/jzqs8
Thompson, W. J. (2020). Reliability for the Dynamic Learning Maps assessments: A comparison of methods (Technical Report No. 20-03). University of Kansas, Accessible Teaching, Learning, and Assessment Systems. https://dynamiclearningmaps.org/sites/default/files/documents/publication/Reliability_Comparison.pdf
Thompson, W. J., Clark, A. K., & Nash, B. (2019). Measuring the reliability of diagnostic mastery classifications at multiple levels of reporting. Applied Measurement in Education, 32(4), 298–309. https://doi.org/10.1080/08957347.2019.1660345
Thompson, W. J., & Nash, B. (2019). Beyond learning progressions: Maps as assessment architecture: Illustrations and results. Symposium. National Council on Measurement in Education Annual Meeting, Toronto, Canada. https://dynamiclearningmaps.org/sites/default/files/documents/presentations/Thompson_Nash_Empirical_evaluation_of_learning_maps.pdf
Thompson, W. J., & Nash, B. (2022). A diagnostic framework for the empirical evaluation of learning maps. Frontiers in Education, 6, 714736. https://doi.org/10.3389/feduc.2021.714736
Timberlake, M. T. (2014). Weighing costs and benefits: Teacher interpretation and implementation of access to the general education curriculum. Research and Practice for Persons with Severe Disabilities, 39(2), 83–99. https://doi.org/10.1177/1540796914544547
Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27, 1413–1432. https://doi.org/10.1007/s11222-016-9696-4
Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P.-C. (2021). Rank-normalization, folding, and localization: An improved \(\widehat{R}\) for assessing convergence of MCMC. Bayesian Analysis, 16(2), 667–718. https://doi.org/10.1214/20-BA1221
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., & Gabry, J. (2022). Pareto smoothed importance sampling. arXiv. https://doi.org/10.48550/arXiv.1507.02646
Wang, W., Song, L., Chen, P., Meng, Y., & Ding, S. (2015). Attribute-level and pattern-level classification consistency and accuracy indices for cognitive diagnostic assessment. Journal of Educational Measurement, 52(4), 457–476. https://doi.org/10.1111/jedm.12096
Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research, 11, 3571–3594. http://www.jmlr.org/papers/v11/watanabe10a.html
Xu, G. (2019). Identifiability and cognitive diagnosis models. In M. von Davier & Y.-S. Lee (Eds.), Handbook of Diagnostic Classification Models (pp. 333–357). Springer International Publishing. https://doi.org/10.1007/978-3-030-05584-4_16
Xu, G., & Zhang, S. (2016). Identifiability of diagnostic classification models. Psychometrika, 81(3), 625–649. https://doi.org/10.1007/s11336-015-9471-z
Yorkston, K., Dowden, P., Honsinger, M., Marriner, N., & Smith, K. (1988). A comparison of standard and user vocabulary lists. Augmentative and Alternative Communication, 4(4), 189–210. https://doi.org/10.1080/07434618812331274807
Zumbo, B. D., & Thomas, D. R. (1997). A measure of effect size for a model-based approach for studying DIF [Working paper]. University of Northern British Columbia, Edgeworth Laboratory for Quantitative Behavioral Science.