Is There Any Longitudinal Effect of the Washington Assessment of Student Learning (WASL) on Student Achievement?

Donald C. Orlich

Science Mathematics Engineering Education Center

Washington State University

Pullman, Washington 99164-4237

September 6, 2002

An accountability conundrum has emerged since the "No Child Left Behind Act of 2001" (PL 107-110) was enacted in January 2002. Federal law now requires states to set adequate yearly progress targets for students, which are to be met through high-stakes testing (see Linn, Baker, and Betebenner, 2002).

Washington State's Model

The State of Washington established the Washington Assessment of Student Learning (WASL) as its accountability tool. The WASL is keyed primarily to the state's standards, called the "Essential Academic Learning Requirements." The WASL is used to test all 4th, 7th, and 10th graders in mathematics, reading, and writing; 5th, 8th, and 10th graders will be assessed in science. Listening is also being assessed. Using the data collected from the 1998 through 2001 WASL administrations, I calculated effect sizes to observe trends.

Purpose of study. The purpose of this study is to determine the effect on student achievement of the longitudinal administration of the Washington Assessment of Student Learning (WASL). WASL scale score means and standard deviations were available in mathematics and reading for 1998, 1999, 2000, and 2001 (for Grade 10, from 1999 onward); they are shown in Table 1.

 

Table 1. Means and Standard Deviations for 4th, 7th, and 10th Grade Mathematics and
Reading Scores on the Washington Assessment of Student Learning, 1998-2001

 

Grade and Subject   Spring 1998    Spring 1999    Spring 2000    Spring 2001
                    Mean (SD)      Mean (SD)      Mean (SD)      Mean (SD)
4 - Mathematics     383.5 (32.2)   386.5 (33.9)   391.2 (34.9)   393.3 (34.9)
4 - Reading         402.1 (19.3)   404.2 (19.5)   407.3 (19.6)   405.7 (18.6)
7 - Mathematics     357.4 (46.4)   364.7 (52.0)   369.1 (53.6)   368.7 (51.6)
7 - Reading         390.1 (20.1)   393.1 (20.2)   393.8 (20.9)   394.5 (20.6)
10 - Mathematics    N/R            382.2 (42.8)   387.6 (40.0)   390.8 (41.1)
10 - Reading        N/R            402.8 (29.5)   407.3 (30.2)   410.0 (30.5)

 

All means and standard deviations are from the files of the Office of Superintendent of Public Instruction, Olympia, Washington.

 

An initial inspection of the scale score means shows small incremental increases in most of them. However, the Grade 7 mathematics mean declined by 0.4 scale points from 2000 to 2001, and the Grade 4 reading mean declined by 1.6 scale points over the same period.
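The year-to-year changes described above can be checked directly from Table 1. The short Python sketch below hard-codes the Table 1 means (the names `MEANS`, `YEARS`, and `yearly_changes` are illustrative, not part of the original study) and prints each change in scale points, flagging declines. Unreported years (N/R) are skipped.

```python
# Table 1 scale score means for 1998-2001; None marks a year reported as N/R.
MEANS = {
    "4 - Mathematics":  [383.5, 386.5, 391.2, 393.3],
    "4 - Reading":      [402.1, 404.2, 407.3, 405.7],
    "7 - Mathematics":  [357.4, 364.7, 369.1, 368.7],
    "7 - Reading":      [390.1, 393.1, 393.8, 394.5],
    "10 - Mathematics": [None, 382.2, 387.6, 390.8],
    "10 - Reading":     [None, 402.8, 407.3, 410.0],
}
YEARS = [1998, 1999, 2000, 2001]


def yearly_changes(means):
    """Return (later_year, change) pairs for consecutive reported years."""
    changes = []
    for i in range(1, len(means)):
        if means[i - 1] is None or means[i] is None:
            continue  # skip comparisons that involve an unreported year
        # Round to one decimal, the precision of the reported means.
        changes.append((YEARS[i], round(means[i] - means[i - 1], 1)))
    return changes


if __name__ == "__main__":
    for label, means in MEANS.items():
        for year, delta in yearly_changes(means):
            flag = "  <- decline" if delta < 0 else ""
            print(f"{label:17s} {year - 1}->{year}: {delta:+.1f}{flag}")
```

Only two comparisons are flagged as declines: Grade 7 mathematics and Grade 4 reading, both from 2000 to 2001.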

These patterns have been praised by state policy makers as evidence of student progress. But are the scores truly reflective of student achievement? To answer that question, I used a statistic called "effect size" (Cohen, 1988).

Effect size. Effect size is a tool for judging the relative magnitude of a learning gain between independent samples. Here the question is: what evidence is there that administering, and teaching to, the WASL has a positive impact on student achievement? Effect size is the gauge used to answer it. (See Bloom 1984, Glass 1980, Marzano et al. 2001, and Walberg 1999.)

The concept of effect size is based on a normal distribution of test scores. The so-called "bell curve" describes how randomly occurring events are distributed. Its horizontal axis can be marked off in units called "standard deviations," and fixed proportions of the area under the curve fall between those marks. Effect size expresses how far scores move in standard deviation units. For example, if a set of test scores moves one full standard deviation along the curve as a consequence of some specific intervention, the effect size is 1.0.
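Because an effect size is measured in standard deviation units, it can be translated into a position on the normal curve. The minimal Python sketch below assumes normally distributed scores, the assumption described above. It converts an effect size into the percentile of the earlier score distribution at which the later group's average would fall. The helper names `normal_cdf` and `percentile_of_shift` are hypothetical. The normal cumulative distribution is computed from the standard library's `math.erf`.

```python
import math


def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def percentile_of_shift(effect_size):
    """Percentile of the earlier score distribution reached by a group whose
    mean has moved by `effect_size` standard deviations (normality assumed)."""
    return 100.0 * normal_cdf(effect_size)


if __name__ == "__main__":
    for es in (0.0, 0.09, 0.2, 0.5, 0.8, 1.0):
        print(f"ES = {es:4.2f} -> percentile {percentile_of_shift(es):5.1f}")
```

Under this assumption, an effect size of 1.0 moves the average student from the 50th to roughly the 84th percentile. An effect size of 0.09 moves that student only to about the 54th.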

Computing effect size

To compute an effect size, you need a control group (or a pre-test), an experimental group (or a post-test), test scores yielding averages (means), and standard deviations. (The standard deviation measures the variability of scores around a group's mean, that is, the spread of a distribution of scores.) With independent samples, such as the WASL, where each year's results for a grade come from a different group of students, one can determine effect sizes by comparing the means of two different years.

Jacob Cohen (1988) defined an effect size as the difference between two means divided by the standard deviation of either group. The result is expressed as a decimal fraction (or multiple) of a normal-curve standard deviation. Cohen also suggested that the relative size of an effect could be stated in nominal terms: an effect size (ES) of at least 0.2 is small, an ES of at least 0.5 is medium, and an ES of 0.8 or greater is large. Effect sizes below 0.2 do not reach even the "small" benchmark and are not considered important. Thus, an effect size of at least 0.2 is required to show efficacy of learning. Table 2 shows the effect size calculations and nominal descriptors for this study.
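Cohen's definition and the nominal descriptors can be written as two small Python functions. This is a sketch with hypothetical function names. Following Appendix A, the earlier year's standard deviation is used as the denominator. The labels "None" (0 up to, but not including, 0.2) and "Negative" (below 0) reflect how this study applies Cohen's categories; they are not terms Cohen himself defined.

```python
def effect_size(mean_new, mean_old, sd_old):
    """Difference between two means divided by a standard deviation.

    Following Appendix A, the earlier group's standard deviation is the
    denominator; Cohen (1988) allows the SD of either group.
    """
    if sd_old <= 0:
        raise ValueError("standard deviation must be positive")
    return (mean_new - mean_old) / sd_old


def cohen_label(es):
    """Nominal descriptor for an effect size, as applied in this study."""
    if es < 0:
        return "Negative"
    if es < 0.2:
        return "None"  # below Cohen's "small" benchmark
    if es < 0.5:
        return "Small"
    if es < 0.8:
        return "Medium"
    return "Large"


if __name__ == "__main__":
    es = effect_size(386.5, 383.5, 32.2)  # Grade 4 mathematics, 1999 vs. 1998
    print(f"{es:.2f} {cohen_label(es)}")
```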

The following example shows how I computed the effect sizes. In 1998, the Grade 4 mathematics mean was 383.5, while the 1999 mean was 386.5. The standard deviation for 1998 was 32.2 points. The difference between the means, 3.0, divided by 32.2 yields an effect size of 0.09. Because 0.09 falls below Cohen's 0.2 threshold, it is classified as no effect. Effect size is a professional and objective tool for gauging the learning effect that might be expected if the WASL were a useful tool for increasing student learning.
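Here is the same Grade 4 mathematics calculation as a formula. It uses the notation of Appendix A, with m for a year's mean and sd for that year's standard deviation:

```latex
\mathrm{ES}_{1999/1998}
  = \frac{m_{1999} - m_{1998}}{sd_{1998}}
  = \frac{386.5 - 383.5}{32.2}
  = \frac{3.0}{32.2}
  \approx 0.09
```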

Discussion of Data Sets

Table 2 shows the effect sizes for the 4th, 7th, and 10th grade mathematics and reading scores from 1998 to 2001. At the 4th grade level, five of the six effect sizes show no effect on achievement. The sixth, Grade 4 reading in 2001, is a negative learning effect, that is, a decline in achievement.

Table 2. Effect Size Calculations for 4th, 7th, and 10th Grade Mathematics and Reading
Scores on the Washington Assessment of Student Learning, 1998-2001

 

Grade and Subject   1999/1998         2000/1999         2001/2000
                    ES     Effect     ES     Effect     ES     Effect
4 - Mathematics     0.09   None       0.14   None       0.06   None
4 - Reading         0.11   None       0.16   None       -0.08  Negative
7 - Mathematics     0.16   None       0.08   None       -0.01  Negative
7 - Reading         0.15   None       0.03   None       0.03   None
10 - Mathematics    N/R    N/R        0.13   None       0.08   None
10 - Reading        N/R    N/R        0.15   None       0.09   None

 

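ES = effect size. Effect descriptors follow Jacob Cohen's (1988) nominal definitions as applied in this study: "None" denotes an effect size of at least 0 but below 0.2, and "Negative" denotes an effect size below 0.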
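Every entry in Table 2 can be regenerated from the Table 1 means and standard deviations with the Appendix A formula, (later mean - earlier mean) / earlier standard deviation. Checking the values this way catches transcription slips between the appendix and the summary table. The minimal Python sketch below hard-codes Table 1, with None marking N/R. The names `TABLE_1`, `label`, and `table_2` are illustrative.

```python
# Table 1: (mean, sd) for 1998-2001; None marks a year reported as N/R.
TABLE_1 = {
    "4 - Mathematics":  [(383.5, 32.2), (386.5, 33.9), (391.2, 34.9), (393.3, 34.9)],
    "4 - Reading":      [(402.1, 19.3), (404.2, 19.5), (407.3, 19.6), (405.7, 18.6)],
    "7 - Mathematics":  [(357.4, 46.4), (364.7, 52.0), (369.1, 53.6), (368.7, 51.6)],
    "7 - Reading":      [(390.1, 20.1), (393.1, 20.2), (393.8, 20.9), (394.5, 20.6)],
    "10 - Mathematics": [None, (382.2, 42.8), (387.6, 40.0), (390.8, 41.1)],
    "10 - Reading":     [None, (402.8, 29.5), (407.3, 30.2), (410.0, 30.5)],
}


def label(es):
    """Descriptor as used in Table 2; Cohen's small/medium/large otherwise."""
    if es < 0:
        return "Negative"
    if es < 0.2:
        return "None"
    return "Small" if es < 0.5 else "Medium" if es < 0.8 else "Large"


def table_2():
    """Effect size for each consecutive pair of years, rounded to two
    decimals; None where either year was not reported."""
    rows = {}
    for subject, cells in TABLE_1.items():
        row = []
        for i in range(1, len(cells)):
            if cells[i - 1] is None or cells[i] is None:
                row.append(None)  # N/R
                continue
            (m_old, sd_old), (m_new, _) = cells[i - 1], cells[i]
            row.append(round((m_new - m_old) / sd_old, 2))
        rows[subject] = row
    return rows


if __name__ == "__main__":
    for subject, row in table_2().items():
        cells = ["N/R" if es is None else f"{es:+.2f} {label(es)}" for es in row]
        print(f"{subject:17s}", " | ".join(cells))
```

The recomputed values match the Appendix A calculations, including 0.03 for Grade 7 reading, 2001/2000.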

 

The Grade 7 pattern is similar, showing no effect for five of the six scores and a negative effect in mathematics for 2001. The Grade 10 results show no effect in mathematics or reading in all four cases. (Appendix A shows all calculations used in this study.)

By Cohen's (1988) definitions, none of the 16 effect sizes reaches even the "small" threshold of 0.2: 14 show no effect and 2 are negative, so none would meet the federally mandated target. Setting the criterion for an adequate yearly progress target may therefore become an exercise in definitions, one that is truly subjective, if not capricious.
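The count behind this statement can be made explicit. The Python sketch below tallies the 16 effect sizes listed in Appendix A using the descriptors applied in Table 2. It also counts how many reach Cohen's 0.2 "small" threshold. The names `SMALL_EFFECT`, `APPENDIX_A`, and `tally` are illustrative.

```python
from collections import Counter

SMALL_EFFECT = 0.2  # Cohen's (1988) threshold for a "small" effect

# Effect sizes from Appendix A, in chronological order of comparison.
APPENDIX_A = {
    "4 - Mathematics":  [0.09, 0.14, 0.06],
    "4 - Reading":      [0.11, 0.16, -0.08],
    "7 - Mathematics":  [0.16, 0.08, -0.01],
    "7 - Reading":      [0.15, 0.03, 0.03],
    "10 - Mathematics": [0.13, 0.08],
    "10 - Reading":     [0.15, 0.09],
}


def tally(effect_sizes):
    """Count effect sizes by the descriptors used in Table 2."""
    counts = Counter()
    for values in effect_sizes.values():
        for es in values:
            if es < 0:
                counts["Negative"] += 1
            elif es < SMALL_EFFECT:
                counts["None"] += 1
            else:
                counts["Small or larger"] += 1
    return counts


if __name__ == "__main__":
    counts = tally(APPENDIX_A)
    print(sum(counts.values()), dict(counts))
```

The tally reports 16 effect sizes: 14 labelled "None", 2 "Negative", and none at or above 0.2.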

Conclusion. Using effect size measurements and Cohen's (1988) nominal definitions, I found no effect, that is, no positive impact on student achievement, as a consequence of the longitudinal administration of the Washington Assessment of Student Learning (WASL).

The results of this study parallel the findings of Audrey L. Amrein and David C. Berliner (2002), who analyzed the consequences of high-stakes testing in 18 states. They reported that in 17 of the 18 states, student learning remained at the same level as before the policy of high-stakes testing was instituted.

Policy implications. Washington State policy makers must re-examine the intent of the WASL and the empirical data and analyses bearing on it to determine its educational worthiness and its continued fiscal expense. (See Orlich 2000, Abbott and Joireman 2001, Basarab 2001, Fouts 2002, and Keim 2002.) The Oregon Board of Education voted to eliminate student high-stakes tests in science, mathematics, and writing in grades 3, 5, and 8 due to budget cuts (Oregonian, August 9, 2002). Given Washington's one-billion-dollar budget shortfall and a WASL cost of $61,673,910, similar action must be considered. Further, state policy makers must inform federal education officials of the inherent problems in using adequate yearly progress targets, which are statistically illogical.

The author of this study, Donald C. Orlich, is Professor Emeritus, Science Mathematics Engineering Education Center at Washington State University.  His telephone number is (509) 335-4844 and email address is dorlich@wsu.edu.  This study reflects the author's work and is not endorsed by Washington State University, which encourages scholarship and academic freedom.

References

Abbott, M. L. & Joireman, J. (2001, July). The Relationships among Achievement, Low Income, and Ethnicity across Six Groups of Washington State Students. Lynnwood, WA: Washington School Research Center, Technical Report #1.

Amrein, A. L. & Berliner, D. C. (2002, March 28). "High-stakes Testing, Uncertainty, and Student Learning." Education Policy Analysis Archives, 10, (18), 1-56. Retrieved April 1, 2002, from http://epasa.asu.edu/epaa/v10n18/

Basarab, S. (2001, February). An Overview of Student Assessment in Washington State. Unpublished Report. Citizens United for Responsible Education (CURE). Burien, WA. Web site of CURE is at: http://www.eskimo.com/~cure/

Bloom, B. S. (1984). "The 2 Sigma Problem: The Search for Methods of Group Instruction as Effective as One-to-One Tutoring." Educational Researcher, 13, (6), 4-16.

Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences. 2nd edition. Hillsdale, NJ: Lawrence Erlbaum Associates.

Fouts, J. T. (2002, April). The Power of Early Success: A Longitudinal Study of Student Performance on the Washington Assessment of Student Learning, 1998-2001. Lynnwood, WA: Washington School Research Center, Research Report #1.

Glass, G. V. (1980). "Summarizing Effect Sizes." In New Directions for Methodology of Social and Behavioral Science: Quantitative Assessment of Research Domain. R. Rosenthal, Ed. San Francisco: Jossey-Bass.

Keim, W. G. (2002). School Accountability and Fairness: A Policy Study of the 2000 Washington State School Accountability Criteria. Pullman, WA: Washington State University, Doctoral Dissertation.

Linn, R. L., Baker, E. L., & Betebenner, D. W. (2002). "Accountability Systems: Implications of Requirements of the No Child Left Behind Act of 2001." Educational Researcher, 31, (6), 3-16.

Marzano, R. J., Pickering, D. J., & Pollock, J. E. (2001). Classroom Instruction that Works: Research-Based Strategies for Increasing Student Achievement. Alexandria, VA: Association for Supervision and Curriculum Development.

"No Child Left Behind Act of 2001," Public Law No. 107-110, 115 Stat. 1425 (2002)

Orlich, D. C. (2000). "A Critical Analysis of the Grade Four Washington Assessment of Student Learning." Curriculum In Context, 27, (2), 10-14. (On March 16, 2001 this paper was selected for the "Outstanding Affiliate Article Award" by the 160,000 member Association for Supervision and Curriculum Development at its Annual Conference in Boston.)

Walberg, H. J. (1999). "Productive Teaching." In New Directions for Teaching Practice and Research. H. C. Waxman and H. J. Walberg, Eds. Berkeley, CA: McCutchan Publishing Corporation.

Appendix A. Effect Size Calculations Used in this Study

Notation: m = mean; sd = standard deviation. Each effect size is (later mean - earlier mean) / earlier standard deviation.

Grade 4 Mathematics
(m 1999 - m 1998) / sd 1998 = (386.5 - 383.5) / 32.2 = 0.09
(m 2000 - m 1999) / sd 1999 = (391.2 - 386.5) / 33.9 = 0.14
(m 2001 - m 2000) / sd 2000 = (393.3 - 391.2) / 34.9 = 0.06

Grade 4 Reading
(m 1999 - m 1998) / sd 1998 = (404.2 - 402.1) / 19.3 = 0.11
(m 2000 - m 1999) / sd 1999 = (407.3 - 404.2) / 19.5 = 0.16
(m 2001 - m 2000) / sd 2000 = (405.7 - 407.3) / 19.6 = -0.08

Grade 7 Mathematics
(m 1999 - m 1998) / sd 1998 = (364.7 - 357.4) / 46.4 = 0.16
(m 2000 - m 1999) / sd 1999 = (369.1 - 364.7) / 52.0 = 0.08
(m 2001 - m 2000) / sd 2000 = (368.7 - 369.1) / 53.6 = -0.01

Grade 7 Reading
(m 1999 - m 1998) / sd 1998 = (393.1 - 390.1) / 20.1 = 0.15
(m 2000 - m 1999) / sd 1999 = (393.8 - 393.1) / 20.2 = 0.03
(m 2001 - m 2000) / sd 2000 = (394.5 - 393.8) / 20.9 = 0.03

Grade 10 Mathematics
(m 2000 - m 1999) / sd 1999 = (387.6 - 382.2) / 42.8 = 0.13
(m 2001 - m 2000) / sd 2000 = (390.8 - 387.6) / 40.0 = 0.08

Grade 10 Reading
(m 2000 - m 1999) / sd 1999 = (407.3 - 402.8) / 29.5 = 0.15
(m 2001 - m 2000) / sd 2000 = (410.0 - 407.3) / 30.2 = 0.09