Is There Any Longitudinal Effect of the Washington Assessment
of Student Learning (WASL) on Student Achievement?
Donald C. Orlich
Science Mathematics Engineering
Education Center
Washington State University
Pullman, Washington 99164-4237
September 6, 2002
An accountability conundrum has emerged with the passage of the "No Child Left
Behind Act of 2001" (PL 107-110) in January 2002. States are now required by
federal law to show that students meet adequate yearly progress targets, which
will be measured through high-stakes testing (see Linn, Baker, and Betebenner,
2002).
Washington State's Model
The State of Washington established the Washington Assessment of Student
Learning (WASL) as its accountability tool. The WASL is primarily keyed to the
state's standards, called "Essential Academic Learning Requirements." The WASL
is used to test all 4th, 7th, and 10th graders in mathematics, reading, and
writing. The 5th, 8th, and 10th graders will be assessed in science. Listening
is also being assessed. Using the data collected from the 1998 through 2001
WASL administrations, I calculated effect sizes to observe trends.
Purpose of study. The purpose of this study is to
determine the effect on student achievement as a consequence of the
longitudinal administration of the Washington Assessment of Student Learning (WASL). The WASL scale score means and standard deviations were
available for the years 1998, 1999, 2000, and 2001 for mathematics and reading and
are shown in Table 1.
Table 1. Means and Standard Deviations for 4th, 7th, and 10th Grade Mathematics and Reading

| Grade Level     | Spring 1998 Mean | Spring 1998 SD | Spring 1999 Mean | Spring 1999 SD |
| 4 - Mathematics | 383.5            | 32.2           | 386.5            | 33.9           |
| 4 - Reading     | 402.1            | 19.3           | 404.2            | 19.5           |
| 7 - Math        | 357.4            | 46.4           | 364.7            | 52.0           |
| 7 - Reading     | 390.1            | 20.1           | 393.1            | 20.2           |
| 10 - Math       | N/R              | N/R            | 382.2            | 42.8           |
| 10 - Reading    | N/R              | N/R            | 402.8            | 29.5           |

| Grade Level     | Spring 2000 Mean | Spring 2000 SD | Spring 2001 Mean | Spring 2001 SD |
| 4 - Mathematics | 391.2            | 34.9           | 393.3            | 34.9           |
| 4 - Reading     | 407.3            | 19.6           | 405.7            | 18.6           |
| 7 - Math        | 369.1            | 53.6           | 368.7            | 51.6           |
| 7 - Reading     | 393.8            | 20.9           | 394.5            | 20.6           |
| 10 - Math       | 387.6            | 40.0           | 390.8            | 41.1           |
| 10 - Reading    | 407.3            | 30.2           | 410.0            | 30.5           |

All means and standard deviations are from files of the Office of State
Superintendent of Public Instruction, Olympia, Washington.
An initial inspection of the scale score means shows a rather
small incremental increase in most means.
However, there is a scale point decline of 0.4 in the mean of Grade 7,
2001 math scores compared to 2000.
A similar decline is noted in 2001 for Grade 4 reading, where the mean
scale points dropped by 1.6 compared to 2000.
These patterns have been praised by state policy makers as showing
evidence of student progress.
However, are the scores truly reflective of student achievement? To answer that question,
I used a statistical measure called "effect size" (Cohen, 1988).
Effect size. The effect size is a tool for judging the relative
learning gain between independent samples.
In this case, it gauges what evidence there is that administering and
teaching to the WASL has a positive impact on student achievement.
(See Bloom 1984, Glass 1980, Marzano
et al. 2001, and Walberg 1999.)
The concept of effect size is based on a normal distribution of
test scores. The so-called "Bell Curve" is a distribution of randomly
occurring events. The curve can be subdivided into regions whose widths are
measured in units called "standard deviations." The measure of effect size is
based on how much of a standard deviation scores change. For example, if a
sample set of test scores shows a move of one full standard deviation on the
curve as a consequence of some specific intervention, then the effect size
would be 1.0.
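To make the idea of a shift "on the curve" concrete, the sketch below converts an effect size into the percentile that the shifted group mean would occupy in the original distribution. It assumes scores are normally distributed, as the paragraph above does; the function name `percentile_after_shift` and the sample effect sizes are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist  # standard library, Python 3.8+


def percentile_after_shift(effect_size: float) -> float:
    """Percentile of the original normal curve reached by a mean shifted by `effect_size` SDs."""
    return 100 * NormalDist().cdf(effect_size)


# Illustrative values only; an effect size of 0.0 leaves the mean at the 50th percentile.
for es in (0.0, 0.25, 0.5, 1.0):
    print(f"effect size {es:.2f} -> percentile {percentile_after_shift(es):.1f}")
```

Under this assumption, an effect size of 1.0 places the shifted mean at roughly the 84th percentile of the original distribution.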
Computing effect size
To compute an effect size, you need a control group (or a
pre-test), an experimental group (or a post-test), test scores yielding
averages (means), and standard deviations.
(The latter is a measure of variability within a group, showing how widely
the scores are spread around the mean.)
With independent samples, such as those from successive WASL administrations,
one can determine the effect sizes by comparing the means of two different
years.
Jacob Cohen (1988) defined an effect size as the difference
between two means divided by the standard deviation of either group. The effect
size is then expressed as a decimal fraction or multiple of a normal curve
standard deviation. Cohen then suggested that the relative efficacy of an
effect could be stated in nominal terms. If an effect size (ES) were at least
0.2, it was labeled as small.
An ES of at least 0.5 was labeled as medium, while an ES of 0.8
or greater was large. Effect sizes less than 0.2 are not considered important.
Thus, an effect size of at least 0.2 is required to show efficacy of
learning. Table 2 shows the effect
size calculations and nominal descriptors for this study.
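The definition and cut-offs above can be written compactly as follows. The notation is assumed here for illustration and is not taken from Cohen's text: \bar{X} denotes a group mean and s a standard deviation, and the earlier group's standard deviation is used as the divisor, which is one of the two choices the definition permits.

```latex
\[
ES = \frac{\bar{X}_{\text{later}} - \bar{X}_{\text{earlier}}}{s_{\text{earlier}}},
\qquad
\text{label}(ES) =
\begin{cases}
\text{large}         & ES \ge 0.8 \\
\text{medium}        & 0.5 \le ES < 0.8 \\
\text{small}         & 0.2 \le ES < 0.5 \\
\text{not important} & ES < 0.2
\end{cases}
\]
```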
An example follows showing how I computed effect sizes. In 1998, the Grade 4 mathematics score
mean was 383.5, while the 1999 group mean was 386.5. The standard deviation for 1998 was 32.2
points. The difference between the means is 3.0, which divided by 32.2
yields an effect size of 0.09. An effect size of 0.09 is defined as having no
effect. An effect size calculation is a professional and objective tool for
estimating the learning effect that might be expected if the WASL were a
useful tool to increase student learning.
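The worked example above can be reproduced with a short Python sketch. The function names `effect_size` and `cohen_label` are hypothetical, chosen only for illustration, and the "Negative" label for values below zero follows the usage in Table 2 rather than Cohen's own categories.

```python
def effect_size(later_mean: float, earlier_mean: float, earlier_sd: float) -> float:
    """Difference between two means divided by the earlier group's standard deviation."""
    return (later_mean - earlier_mean) / earlier_sd


def cohen_label(es: float) -> str:
    """Nominal descriptor for an effect size, using the 0.2 / 0.5 / 0.8 cut-offs."""
    if es < 0:
        return "Negative"  # assumption: mirrors the label used in Table 2
    if es < 0.2:
        return "None"
    if es < 0.5:
        return "Small"
    if es < 0.8:
        return "Medium"
    return "Large"


# Grade 4 mathematics, 1998 versus 1999 (values from Table 1).
es = effect_size(later_mean=386.5, earlier_mean=383.5, earlier_sd=32.2)
print(f"{es:.2f} {cohen_label(es)}")  # prints "0.09 None"
```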
Discussion of Data Sets
Table 2 shows the effect sizes for the 4th, 7th,
and 10th grade mathematics and reading scores from 1998 to
2001. Examining Table 2, you may
note that at the 4th grade level, five scores show no effect
in achievement, while there is one negative learning effect on Grade 4
reading in 2001, that is, a decline in achievement.
Table 2. Effect Size Calculations for 4th, 7th, and 10th Grade Mathematics and Reading

| Grade Level  | 1999/1998 | Effect | 2000/1999 | Effect | 2001/2000 | Effect   |
| 4 - Math     | 0.09      | None   | 0.14      | None   | 0.06      | None     |
| 4 - Reading  | 0.11      | None   | 0.16      | None   | -0.08     | Negative |
| 7 - Math     | 0.16      | None   | 0.08      | None   | -0.01     | Negative |
| 7 - Reading  | 0.15      | None   | 0.03      | None   | 0.03      | None     |
| 10 - Math    | N/R       | N/R    | 0.13      | None   | 0.08      | None     |
| 10 - Reading | N/R       | N/R    | 0.15      | None   | 0.09      | None     |

The effect is described in nominal terms as per Jacob Cohen's (1988) definitions.
The Grade 7 pattern is similar, showing no effect on five of
the six scores and one negative effect in mathematics for 2001. The Grade 10
results show no effect on mathematics and reading scores in all cases.
(Appendix A shows all calculations used in this study.)
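As a cross-check on the pattern described above, the year-over-year effect sizes can be recomputed directly from the Table 1 means and standard deviations. This is a minimal sketch: the dictionary layout and the names `SCORES`, `yearly_effect_sizes`, and `EFFECTS` are assumptions made for illustration, and each comparison divides by the earlier year's standard deviation, as in the worked example.

```python
# (mean, standard deviation) per year, copied from Table 1.
# Grade 10 was not reported (N/R) for 1998, so its series starts in 1999.
SCORES = {
    "4 - Math": {1998: (383.5, 32.2), 1999: (386.5, 33.9),
                 2000: (391.2, 34.9), 2001: (393.3, 34.9)},
    "4 - Reading": {1998: (402.1, 19.3), 1999: (404.2, 19.5),
                    2000: (407.3, 19.6), 2001: (405.7, 18.6)},
    "7 - Math": {1998: (357.4, 46.4), 1999: (364.7, 52.0),
                 2000: (369.1, 53.6), 2001: (368.7, 51.6)},
    "7 - Reading": {1998: (390.1, 20.1), 1999: (393.1, 20.2),
                    2000: (393.8, 20.9), 2001: (394.5, 20.6)},
    "10 - Math": {1999: (382.2, 42.8), 2000: (387.6, 40.0), 2001: (390.8, 41.1)},
    "10 - Reading": {1999: (402.8, 29.5), 2000: (407.3, 30.2), 2001: (410.0, 30.5)},
}


def yearly_effect_sizes(series):
    """Return {(later_year, earlier_year): effect size} for consecutive years."""
    years = sorted(series)
    result = {}
    for earlier, later in zip(years, years[1:]):
        earlier_mean, earlier_sd = series[earlier]
        later_mean, _ = series[later]
        result[(later, earlier)] = round((later_mean - earlier_mean) / earlier_sd, 2)
    return result


EFFECTS = {test: yearly_effect_sizes(series) for test, series in SCORES.items()}

for test, effects in EFFECTS.items():
    print(test, effects)

all_values = [es for effects in EFFECTS.values() for es in effects.values()]
reaching = sum(es >= 0.2 for es in all_values)
print(f"{len(all_values)} comparisons; {reaching} reach the 0.2 threshold")
# prints "16 comparisons; 0 reach the 0.2 threshold"
```

Keeping the data keyed by year makes it straightforward to add a later administration and rerun the same comparison.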
Under Cohen's (1988) definitions, none of the 16 effect sizes shows a positive
effect, and none would meet the federally mandated target. Setting the
criterion measure of an adequate yearly progress target may become an exercise
of definitions and be truly subjective, if not capricious.
Conclusion. Using an effect
size measurement and Cohen's (1988) nominal definitions, there is no effect,
that is, no positive impact on student achievement as a consequence of the
longitudinal administration of the Washington Assessment of Student Learning (WASL).
The results of this study parallel the findings of Audrey L. Amrein and David C. Berliner (2002), who analyzed the
consequences of high-stakes tests in 18 states. They reported that in 17 of the 18 states,
student learning remained at the same level as it was before the policy of
high-stakes tests was instituted.
Policy implications. Washington State policy makers must
re-examine the intent of the WASL and the empirical
data sets that analyze it to determine its educational worthiness and continued
fiscal expense. (See
Orlich 2000, Abbott and Joireman 2001, Basarab 2001, Fouts 2002, and Keim 2002.) The Oregon Board of Education voted to
kill student high-stakes tests in science, mathematics and writing in grades 3,
5 and 8 due to budget cuts (Oregonian, August 9, 2002). Considering Washington's one
billion-dollar budget shortfall and a WASL cost of
$61,673,910, that action must be considered. Further, state policy makers must inform
federal educational officials of the inherent problems regarding the use of
adequate yearly progress targets, which are statistically illogical.
The author of this study, Donald C. Orlich, is Professor Emeritus, Science
Mathematics Engineering Education Center at Washington State University. His
telephone number is (509) 335-4844 and email address is dorlich@wsu.edu. This
study reflects the author's work and is not endorsed by Washington State
University, which encourages scholarship and academic freedom.
References
Abbott, M. L. & Joireman, J. (2001, July). The Relationships among
Achievement, Low Income, and Ethnicity across Six Groups of Washington State
Students. Lynnwood, WA: Washington School Research Center, Technical Report #1.
Amrein, A. L. &
Berliner, D. C. (2002, March 28). "High-stakes Testing,
Uncertainty, and Student Learning." Educational Policy Analysis
Archives, 10, (18), 1-56. Retrieved April 1, 2002 from http://epasa.asu.edu/epaa/v10n18/
Basarab, S. (2001, February). An Overview of Student Assessment in
Washington State. Unpublished Report.
Citizens United for Responsible Education (CURE). Burien, WA. Web site of CURE
is at: http://www.eskimo.com/~cure/
Bloom, B. S. (1984). "The 2 Sigma Problem: The Search for Methods of Group
Instruction as Effective as One-to-One Tutoring." Educational
Researcher, 13, (6), 4-16.
Cohen, J. (1988). Statistical Power Analysis
for the Behavioral Sciences. 2nd
edition. Hillsdale, NJ: Lawrence Erlbaum Associates.
Fouts, J. T. (2002, April). The Power of Early Success: A Longitudinal Study
of Student Performance on the Washington Assessment of Student Learning,
1998-2001. Lynnwood, WA: Washington School Research Center, Research Report #1.
Glass, G. V. (1980). "Summarizing Effect
Sizes." In New Directions for Methodology of Social and
Behavioral Science: Quantitative Assessment of Research Domain. R.
Rosenthal, Ed. San Francisco: Jossey-Bass.
Keim, W. G. (2002). School
Accountability and Fairness: A Policy Study of the 2000 Washington State School
Accountability Criteria. Pullman, WA: Washington State University, Doctoral
Dissertation.
Linn, R. L., Baker, E. V., & Betebenner, D. W. (2002). "Accountability
Systems: Implications of Requirements of the No Child Left Behind Act of
2001." Educational Researcher, 31, (6), 3-16.
Marzano, R. J., Pickering, D. J., & Pollock, J. E. (2001). Classroom
Instruction that Works: Research-Based Strategies for Increasing Student
Achievement. Alexandria, VA: Association for Supervision and Curriculum
Development.
"No Child Left Behind Act of 2001," Public Law No.
107-110, 115 Stat. 1425 (2002)
Orlich, D. C. (2000). "A Critical Analysis of the Grade Four Washington Assessment
of Student Learning." Curriculum In
Context, 27, (2), 10-14. (On March 16, 2001 this paper was selected for
the "Outstanding Affiliate Article Award" by the 160,000 member
Association for Supervision and Curriculum Development at its Annual Conference
in Boston.)
Walberg, H. J. (1999). "Productive Teaching." In New Directions for Teaching
Practice and Research. H. C. Waxman and H. J. Walberg, Eds. Berkeley, CA:
McCutchan Publishing Corporation.
Appendix A

m = mean; sd = standard deviation.

Grade 4 Mathematics
(m 1999 - m 1998) / sd 1998 = (386.5 - 383.5) / 32.2 = 0.09
(m 2000 - m 1999) / sd 1999 = (391.2 - 386.5) / 33.9 = 0.14
(m 2001 - m 2000) / sd 2000 = (393.3 - 391.2) / 34.9 = 0.06

Grade 4 Reading
(m 1999 - m 1998) / sd 1998 = (404.2 - 402.1) / 19.3 = 0.11
(m 2000 - m 1999) / sd 1999 = (407.3 - 404.2) / 19.5 = 0.16
(m 2001 - m 2000) / sd 2000 = (405.7 - 407.3) / 19.6 = -0.08

Grade 7 Mathematics
(m 1999 - m 1998) / sd 1998 = (364.7 - 357.4) / 46.4 = 0.16
(m 2000 - m 1999) / sd 1999 = (369.1 - 364.7) / 52.0 = 0.08
(m 2001 - m 2000) / sd 2000 = (368.7 - 369.1) / 53.6 = -0.01

Grade 7 Reading
(m 1999 - m 1998) / sd 1998 = (393.1 - 390.1) / 20.1 = 0.15
(m 2000 - m 1999) / sd 1999 = (393.8 - 393.1) / 20.2 = 0.03
(m 2001 - m 2000) / sd 2000 = (394.5 - 393.8) / 20.9 = 0.03

Grade 10 Mathematics
(m 2000 - m 1999) / sd 1999 = (387.6 - 382.2) / 42.8 = 0.13
(m 2001 - m 2000) / sd 2000 = (390.8 - 387.6) / 40.0 = 0.08

Grade 10 Reading
(m 2000 - m 1999) / sd 1999 = (407.3 - 402.8) / 29.5 = 0.15
(m 2001 - m 2000) / sd 2000 = (410.0 - 407.3) / 30.2 = 0.09