The following is an example of a literature review on high-stakes testing written by the text's author, Geoff Mills.
As a result of the reauthorization in the United States of the Elementary and Secondary Education Act (ESEA), also known as the "No Child Left Behind" legislation, there is considerable renewed interest among teachers, administrators, and parents in the impact of high-stakes testing on student achievement. Many stakeholders in K-12 education are challenging taken-for-granted assumptions about the impact of these tests, especially as they relate to teaching and learning. Imig (2001) has described the legislation as "profound," "sweeping," "intrusive," "far reaching," and "unprecedented." As part of this legislation, states will be required to test students in reading and math every year in grades 3 through 8, with sanctions for schools that fail to make adequate yearly progress.
Regardless of our professional preparation, few of us are unaffected by this far-reaching legislation in the United States. However, the US is not the only country dealing with high-stakes testing. For example, Australia, England, France, and many East Asian countries have used high-stakes tests to determine whether a student is eligible to enroll at a university, and to which university and field of study he or she will be admitted (Bishop, 2001). In short, most of us are probably involved professionally in the high-stakes testing scenario or know somebody (a teacher, administrator, or student) who is.
A review of the literature on high-stakes testing published in the past few years, a period that coincides with the passage of the "No Child Left Behind" legislation by the US federal government, revealed that most studies focused on the following themes related to the impact of high-stakes testing: teacher quality, student dropout and retention, test validity, curriculum, and teaching and learning. The issue of high-stakes testing has become so prominent that the American Educational Research Association (AERA), the largest professional organization of educational researchers in the world, has taken a position on it.
According to AERA, certain uses of achievement tests are termed "high stakes" if they carry serious consequences for students and for educators. As AERA's position statement notes, "these various high-stakes testing applications are enacted by policy makers with the intention of improving education" (http://www.aera.net/policyandprograms/?id=378). The statement presents a dozen conditions that the organization considers essential to the "sound" implementation of high-stakes testing programs:
Protection Against High-Stakes Decisions Based on a Single Test
Adequate Resources and Opportunity to Learn
Validation for Each Separate Intended Use
Full Disclosure of Likely Negative Consequences of High-Stakes Testing Programs
Alignment Between the Test and the Curriculum
Validity of Passing Scores and Achievement Levels
Opportunities for Meaningful Remediation for Examinees Who Fail High-Stakes Tests
Appropriate Attention to Language Differences Among Examinees
Appropriate Attention to Students with Disabilities
Careful Adherence to Explicit Rules for Determining Which Students Are To Be Tested
Sufficient Reliability for Each Intended Use
Ongoing Evaluation of Intended and Unintended Effects of High-Stakes Testing
(For a complete discussion of each of these policy issues, visit the AERA Web site at http://www.aera.net/policyandprograms/?id=378.)
Because discussion of the effects of high-stakes testing on student achievement is relatively new, many areas of AERA's policy statement remain to be investigated. However, this review of related literature uncovered studies that address issues beyond the policy statement, issues that are likely to affect the way all of us view high-stakes testing.
One perception commonly held by teachers is that the mandate for high-stakes testing has a negative effect on teacher quality. For example, Hilliard (2000) has argued that the net impact of "standardization" in testing (and teaching) is that teachers have become "robots that lead drill-and-kill sessions" (p. 302) to prepare students for success on standardized tests.
For many, the joy is gone, the hope is gone, and their pride is crushed. The climate in many schools that I see is more like a factory than anything else. (Hilliard, 2000, p. 302)
However, Bishop (2001) characterizes the impact of high-stakes testing on teacher quality in a far more positive light:
Fears that curriculum-based exams have caused the quality of instruction [and teaching] to deteriorate appear to be unfounded. Students in nations with rigorous exam systems were less likely to report that memorization is the best way to learn and more likely to report that they conducted experiments in science class. (Bishop, 2001, p. 4)
Bishop's portrayal of "teaching to the high-stakes test" thus stands in sharp contrast to Hilliard's. In fact, Bishop goes on to argue that teachers in schools that have adopted high-stakes testing were more likely than teachers in non-high-stakes testing environments to adopt "best practice" teaching strategies.
As we turn to the literature on high-stakes testing and its effect on student dropout rates, the conflicting findings show how complex the issue is. As Jacob (2001) asserts, it is extremely difficult to establish a causal relationship from the data because researchers cannot control for intervening variables such as student intelligence, the socioeconomic status of the student's home and school, and motivation to succeed in school. Jacob (2001) also points out that many of the studies conducted to date have focused on the impact of Minimum Competency Tests (MCT); that is, tests that are curriculum-based and have little long-term impact on the student. Citing Lillard and DeCicca (2001), Jacob asserts that the "MCT has no statistically significant effect on dropout decisions" (p. 4).
In contrast to Jacob's findings, Bishop (2001) reports:
Our analysis showed that states that reward schools for success and sanction schools that are failing had significantly higher achievement levels than states without these incentives. We also found that they had lower dropout rates. (p. 10)
Amrein and Berliner (2003), however, have argued that the rising dropout rate in the United States can be blamed, at least in part, on high-stakes testing. Their synthesis of the literature suggests that dropout rates were 4 to 6 percent higher in schools with high school graduation exams and that students in the bottom quintile in states with high-stakes tests were 25 percent more likely to drop out than their peers in states without high-stakes tests. Amrein and Berliner go on to conclude that "88 percent of the states with high school graduation tests have higher dropout rates than do states without graduation tests" (p. 2).
Another key variable in the debate on high-stakes testing is the validity of the tests themselves; that is, the extent to which a test measures what it purports to measure. Hilliard (2000) makes a passionate plea to policymakers to recognize the unprofessional use of high-stakes tests:
However, following the conservative agenda and practices, making final admissions and placement decisions solely on the basis of these tests (high-stakes decisions) is unscientific, unprofessional, and bad public policy.... There is a fundamental disconnect between high-stakes standardized testing and the movement toward excellence in education.... Simply put, it is rare to find high-stakes standardized tests that have meaningful validity. (pp. 297–298)
Classroom teachers confronted with the results of their students' performance on standardized tests are often quick to claim (especially if their students performed poorly) that the tests clearly are not aligned with the school's or classroom's curriculum and are, therefore, invalid measures of their students' real abilities.
Linn (2000) places test validity at the center of the debate, treating it as perhaps the most important question facing the use of high-stakes tests. In reviewing five decades of assessment-based reform in the United States, Linn reports little evidence to support claims that standardized tests are a valid measure of student performance when scores are compared across states.
Finally, the American Educational Research Association policy statement on high-stakes testing in preK-12 education raises concerns about the use of a single measure as a valid indicator of student performance:
Decisions that affect individual students' life chances or educational opportunities should not be made on the basis of test scores alone. Other relevant information should be taken into account to enhance the overall validity of such decisions. As a minimum assurance of fairness, when tests are used as part of making high-stakes decisions for individual students such as promotion to the next grade or high school graduation, students must be afforded multiple opportunities to pass the test (http://www.aera.net/policyandprograms/?id=378).
This position on the use of multiple measures to improve the validity of high-stakes testing is also echoed by Linn (2000):
Don't put all of the weight on a single test. Instead, seek multiple indicators. The choice of construct matters and the use of multiple indicators increases the validity of inferences based upon observed gains in achievement. (p. 15)
The message here seems fairly clear: Policymakers should use multiple measures to "triangulate" their assessments of individual student performance rather than base high-stakes decisions on a single test.
Teaching and Learning
Perhaps the most fundamental issue confronting policymakers who wish to promote high-stakes testing is the real impact such tests will have on how teachers teach and how students learn. Hilliard (2000) believes that few teachers actually use high-stakes tests as a tool to improve student achievement. In fact, Hilliard argues that there may be no link at all between high standards and high-stakes testing, and that one of the unanticipated outcomes of the high-stakes testing reform movement is the promotion of teaching strategies contrary to our beliefs about effective teaching:
Many [teachers] are being turned into robots that lead drill-and-kill sessions to prepare for low-level high-stakes achievement tests. For many, the joy is gone, the hope is gone, and their pride is crushed. The climate in many schools that I see is more like a factory than anything else. (p. 302)
Similarly, Linn (2000) concludes that there is little evidence to support a causal relationship between the use of high-stakes tests and improved education:
Instead, I am led to conclude that in most cases the instruments and technology have not been up to the demands that have been placed on them by high-stakes accountability. Assessment systems that are useful monitors lose much of their dependability and credibility for that purpose when high stakes are attached to them. The unintended negative effects of the high-stakes accountability uses often outweigh the intended positive effects. (p. 14)
Amrein and Berliner (2003) make a similar claim, arguing that one unintended outcome of high-stakes testing has been that teachers take greater control of the learning environment and, in so doing, limit students' opportunities to direct their own learning.
However, Bishop (2001) paints a far different picture of the impact of high-stakes testing on the quality of teaching and student learning:
Fears that curriculum-based exams have caused the quality of instruction to deteriorate appear to be unfounded. Students in nations with rigorous exam systems were less likely to report that memorization is the best way to learn and more likely to report that they conducted experiments in the science class. Apparently, teachers subject to the subtle pressure of an external exam four years into the future adopted strategies that are conventionally viewed as best practices, not strategies designed to maximize scores on multiple-choice tests. (p. 3)
Finally, Rosenshine (2003) challenges Amrein and Berliner's assertion that teachers in high-stakes testing states "teach to the test." Offering an alternative interpretation of the same evidence, Rosenshine suggests that one desirable outcome of the accountability movement is a "strong academic focus" in classrooms and schools.
If there is one thing that is clear from this review of the literature on high-stakes testing, it is that there is no agreement about the outcomes of the practice. For example, Amrein and Berliner (2003) and Rosenshine (2003) reviewed similar studies and arrived at different conclusions about the impact of high-stakes testing. One possible explanation for these divergent interpretations is that researchers have been unable to employ a methodology that controls for all of the intervening variables that affect student achievement. What the literature does make clear, however, is that there are serious concerns about the validity of high-stakes tests and about the potentially undesirable, unanticipated impact such tests have on the culture of schools. Any discussion of high-stakes testing must therefore take into consideration the sociocultural context of the school environment and the possible added impact of socioeconomic status (SES) on student achievement as measured by high-stakes tests of questionable validity.
American Educational Research Association. (2000). AERA position statement concerning high-stakes testing in Pre K-12 education. http://www.aera.net/policyandprograms/?id=378
Amrein, A. L., & Berliner, D. C. (2002). The impact of high-stakes tests on student academic performance: An analysis of NAEP results in states with high-stakes tests; and ACT, SAT, and AP test results in states with high school graduation exams (EPSL-0211-126-EPRU). Education Policy Studies Laboratory, Arizona State University. http://www.edpolicylab.org
Amrein, A. L., & Berliner, D. C. (2003). The effects of high-stakes testing on student motivation and learning. Educational Leadership, 60(5), 32–38.
Bishop, J. H. (2001). A steeper, better road to graduation. Hoover Institution. http://www.educationnext.org/20014/56.html
Hilliard, A. G. (2000). Excellence in education versus high-stakes standardized testing. Journal of Teacher Education, 51(4), 293–304.
Jacob, B. A. (2001). Getting tough? The impact of high school graduation exams. Educational Evaluation and Policy Analysis, 23(2), 99–121.
Linn, R. L. (2000). Assessments and accountability. Educational Researcher, 29(2), 4–15.
Rosenshine, B. (2003). High-stakes testing: Another analysis. Education Policy Analysis Archives, 11(24), 1–8.
Zwick, R. (2002). Is the SAT a 'wealth test'? Phi Delta Kappan, 84(4), 307–311.
|Amrein & Berliner 2002||x||x||x||x|
|Amrein & Berliner 2003||x||x|
|AERA Policy Statement 2000||x||x||x|