Science assessment: the next generation

Published on Smithsonian STEMvisions

As states adopt the Next Generation Science Standards (NGSS), everyone is wondering what the assessments will look like. This is not because everyone is suddenly fascinated with the finer points of educational measurement, but because assessment is often called the “tail that wags the dog” of education: it has a disproportionately large impact on curriculum, instruction, and outcomes. Effective assessment will be one of the most important aspects of successful implementation for states adopting the NGSS. However, the NGSS are designed to be different from previous standards, and creating well-aligned assessments will require a great deal of new thinking, research, and development.

The NGSS fuse disciplinary core ideas (facts and concepts within a discipline), practices (skills like argumentation and using models), and crosscutting concepts (ideas that apply across all scientific disciplines) into intertwined performance expectations (PEs). These PEs can be described as statements of “blended knowledge”. Past assessments have typically only targeted subject-specific facts and concepts. How can we develop assessment tasks that accurately and usefully measure blended knowledge?

[Image: NGSS rope with three strands representing practices, crosscutting concepts, and core ideas]

NGSS performance expectations blend practices, crosscutting concepts, and core ideas. Image by the Smithsonian Science Education Center (SSEC).

Professor Nancy Songer of the University of Michigan’s School of Education has been on the cutting edge of research and development work in assessing blended knowledge. She serves on the National Academy of Sciences Board on Testing and Assessment, has served on advisory panels for the College Board’s AP Biology redesign, and conducts her own research on assessing blended knowledge. I recently talked with Professor Songer about some of the questions facing those preparing to implement the NGSS and develop well-aligned curriculum materials and assessments. A condensed, paraphrased summary of our conversation is below.

Considering how new the standards are, where can we look for examples of tasks that require blended knowledge?

Professor Songer presented at the Invitational Research Symposium on Science Assessment, held last month in Washington, DC. Her presentation included some example assessment tasks, and the slides from all of the talks are now available online. Additionally, the National Assessment of Educational Progress (NAEP), the Programme for International Student Assessment (PISA), and the Advanced Placement (AP) Biology tests all moved toward assessing blended knowledge while the NGSS were being drafted and finalized, and these can serve as relevant prototypes.

How do we align items to performance expectations?

Professor Songer worries that there are misconceptions about this among some NGSS stakeholders.

The most important thing for people to know is that there is not a one-to-one relationship or alignment between a PE and an item. Most PEs are too complex and have too many parts to be assessed by one item, or even one cluster of items. The PEs must be broken down into manageable pieces. Incidentally, knowing this can also be important for alleviating some of the anxiety people feel about the complexity of the PEs when they are first exploring the NGSS.

In an article published earlier this year in Science, Professor James Pellegrino explained that assessment developers need to decide what evidence is sufficient to conclude that a student has mastered a part of a PE.

In essence, the performance expectations found in the NGSS are claims about student proficiency. Claims about the student must be linked to forms of evidence that would support those claims.

Once assessment developers decide on the evidence necessary to make the claims, items should be written to match these explicit evidence statements.

What if a student cannot successfully answer an item that requires blended knowledge? How can we determine which aspect(s) of that knowledge the student did not have?

We’ve had success with clusters of items centered on one idea, rather than isolated items. For example, you might need to ask a few basic multiple-choice items about certain crucial facts or graph interpretation skills in addition to a larger task that requires developing a full scientific explanation based on data.
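To make the cluster idea concrete, here is a minimal sketch of how supporting items might help locate a gap when a student misses the larger task. The component names, the results, and the `likely_gaps` helper are hypothetical illustrations, not part of Professor Songer's materials.

```python
# Hypothetical item cluster built around one context (for example, a single data set).
# Each supporting item probes one component that the larger blended task depends on.
supporting_results = {
    "crucial facts": True,          # student answered the fact item correctly
    "graph interpretation": False,  # student missed the graph item
}
blended_task_correct = False  # student did not complete the full explanation task


def likely_gaps(blended_correct, supporting):
    """Return the components whose supporting items were missed.

    An empty list is returned when the blended task itself was answered
    correctly, since there is then no failure to explain.
    """
    if blended_correct:
        return []
    return [component for component, ok in supporting.items() if not ok]


print(likely_gaps(blended_task_correct, supporting_results))
# ['graph interpretation']
```

If the student misses the blended task but answers every supporting item correctly, the list is also empty, which would suggest the difficulty lies in combining the pieces rather than in any single piece.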

Can skills like basic graph reading be assessed outside of a particular science context?

No, we’ve found that knowledge is context-dependent. Just because a student can interpret a graph in a biology context does not mean that the skill will transfer to a physics context. You learn more about the desired outcome by centering all items in the cluster on the same context.

How should these item clusters be scored?

It’s important that the items assessing blended knowledge outnumber, and count for more than, the items that assess smaller pieces of knowledge. We’ve found that the balance should be around 80% blended items and 20% non-blended items. Partial credit should be given for partially correct answers on items assessing blended knowledge.
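As a rough illustration of this scoring approach, the sketch below scores one hypothetical cluster. The item names, point values, and the reading of "80%" as a share of item count are all assumptions made for the example, not a scheme taken from Professor Songer's work.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    blended: bool      # True if the item targets blended knowledge
    max_points: float  # how much the item counts toward the cluster score
    earned: float      # points awarded; partial credit is allowed


def blended_item_share(items):
    """Fraction of items in the cluster that target blended knowledge."""
    return sum(1 for item in items if item.blended) / len(items)


def cluster_score(items):
    """Fraction of the available points that the student earned."""
    for item in items:
        if not 0 <= item.earned <= item.max_points:
            raise ValueError(f"invalid score for {item.name!r}")
    return sum(i.earned for i in items) / sum(i.max_points for i in items)


# Hypothetical cluster: four blended tasks worth 2 points each and one
# supporting multiple-choice item worth 1 point.
cluster = [
    Item("explanation from data", True, 2, 2),
    Item("argument from evidence", True, 2, 1),   # partial credit
    Item("model revision", True, 2, 1),           # partial credit
    Item("prediction with justification", True, 2, 2),
    Item("graph reading (multiple choice)", False, 1, 1),
]

print(blended_item_share(cluster))  # 0.8
print(cluster_score(cluster))       # 7 of 9 available points
```

Giving each blended task more points than the supporting item is one simple way to make blended items both outnumber and outweigh the rest.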

Will these assessments take more time (to complete and to score)?

The knowledge being assessed is more complicated, but that doesn’t necessarily mean that more class time must be spent on assessment. We will have to be more selective about what we assess. Instead of asking about 30 small pieces of knowledge with a multiple-choice test, we might have 8 to 10 questions, mostly targeting blended knowledge. Part of the selection process involves determining “gatekeeper” ideas that are absolutely crucial for students to move forward. Scoring will take more time, but the investment is worth it to get rich measures of blended knowledge.

What’s next?

Perhaps the most immediate challenge for NGSS-aligned assessment development is a lack of students who have had the opportunity to use curriculum materials that truly build the kind of blended knowledge we want to assess. In his Science article, Professor Pellegrino noted:

“Considerable research and development will be needed to create and evaluate assessment tasks and situations that can provide adequate evidence of the proficiencies implied in the NGSS. This research must be carried out in instructional settings where students have had an adequate opportunity to construct the integrated knowledge envisioned by the NGSS.”

More development work must be done to create prototypes for coordinated systems of curriculum materials and assessments before research can move forward. With more materials and students to work with, much more could be learned about how students learn this integrated knowledge and how to measure it.

Gotwals, A. W., & Songer, N. B. (2013). Validity evidence for learning progression-based assessment items that fuse core disciplinary ideas and science practices. Journal of Research in Science Teaching, 50(5). doi:10.1002/tea.21083

Pellegrino, J. W. (2013). Proficiency in science: Assessment challenges and opportunities. Science, 340(6130), 320–323. doi:10.1126/science.1232065

Pellegrino, J. W. (2012). Assessment of science learning: Living in interesting times. Journal of Research in Science Teaching, 49(6), 831–841. doi:10.1002/tea.21032

Songer, N. B., & Gotwals, A. W. (2012). Guiding explanation construction by children at the entry points of learning progressions. Journal of Research in Science Teaching, 49(2), 141–165. doi:10.1002/tea.20454

Slides from the Invitational Research Symposium on Science Assessment (September 2013), Washington, DC.