Understanding Tests and Measurements
for the Parent and Advocate
by Peter W. D. Wright, Esq
&
Pamela Darr Wright, MA, MSW
"If something exists, it exists in some amount.
If it exists in some amount, then it is capable of being measured."
Rene Descartes, Principles of Philosophy, 1644
I. INTRODUCTION
Most parents of special needs children know that they must understand the law and their rights. Few parents know that they
must also understand the facts. The "facts" of their child's case are contained in the various tests and evaluations that have been
administered to the child. Changes in test scores over time provide the means to assess educational benefit or regression. Most
important educational decisions, from eligibility to the intensity of educational services provided, are based on the results of
psychological and educational achievement testing. Parents who obtain appropriate special educational programs for their
children have learned what different tests measure and what the test results mean.
As an attorney who specializes in representing special education children, I know that many parents consult with me after
deciding that their child's special education program is not appropriate. These parents are often right. However, in most cases
they do not have the evidence to support their belief, nor do they know how to interpret and use the evidence contained in
educational and psychological tests. They need evidence to support their beliefs.
Often these parents are convinced that a special education program is not providing sufficient help for the child --- that under
the present special education program, the child is failing to make progress and has fallen further behind. These parents
experience a sense of urgency --- the child has usually received special education for several years and time is running out.
Critical educational decisions are often made based on the subjective beliefs of parents and educators. As a parent, you may
believe that your child is not making adequate progress in a special education program. The special education staff may firmly
believe that he is doing as well as he can --- or that your expectations are too high. Without objective information, both sides
will take positions that are based upon emotions --- and tempered by hopes and fears. Effective educational decision-making
must be based on objective information and facts, not subjective emotional reactions and beliefs.
Before you can participate in the development of an appropriate special education program, you must have a thorough
understanding of your child's strengths and weaknesses. This information is contained in the various tests that are used to
measure the child's ability and educational achievement.
To successfully advocate for your child, you must also learn about tests and measurements --- statistics. Statistics are ways of
measuring progress or lack of progress, using numbers. After you analyze the scores your child obtains when tested and
understand what these numbers mean, you will be able to develop an appropriate educational program for your child --- a
program from which the child benefits.
As you master the material contained in this article, you will understand what various tests and evaluations measure and how to
use information from tests to measure academic progress. You will learn how to use graphs to visually demonstrate your child's
progress or lack of educational progress in a very powerful and compelling manner.
The United States Supreme Court
Florence County School District Four v. Shannon Carter
- November 9, 1993
In Florence County School District Four v. Shannon Carter, U.S., 114 S. Ct. 361 (1993), the United States Supreme Court
issued a landmark decision. In Carter, the school system defaulted on its obligation to provide a free
appropriate education to Shannon Carter, a child with learning disabilities and an Attention Deficit Disorder. Let's look at how
the courts viewed the facts and the law in the Carter case.
A. Background
When Shannon was in the seventh grade, her parents talked to the public school staff and expressed concerns about Shannon's
reading and academic problems. She was evaluated by a public school psychologist who described Shannon as a "slow
learner" who was lazy, unmotivated and needed to be pressured to try harder. Her parents pressured her to work harder.
Despite the intense pressure, when Shannon was in the ninth grade, she failed several subjects. Her parents had her evaluated
by a child psychologist. That evaluator determined that Shannon's intellectual ability was actually above average. Educational
achievement testing demonstrated that sixteen year old Shannon was reading at the fifth grade level (5.4 GE) and doing math at
the sixth grade level (6.4 GE). Shannon had dyslexia. As she prepared to enter tenth grade, she was also
functionally illiterate.
In Shannon's case, the school district developed an IEP which proposed that after a year of special education in the tenth
grade, Shannon would read at the 5.8 grade equivalent level and perform math at the 6.8 grade equivalent level. In other
words, after one year of special education designed to remediate her learning disabilities, Shannon was expected to gain only
four tenths of a year, as measured by her scores on the Woodcock Reading Mastery and Key Math educational achievement
tests: a gain from 5.4 to 5.8 in reading and from 6.4 to 6.8 in math.
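To see how small the proposed gain was, the arithmetic can be written out as a short Python sketch. It is only an illustration: the function name `projected_gain` is our own, and the only data are the grade equivalent scores quoted from Shannon's IEP.

```python
def projected_gain(current_ge, goal_ge):
    """Return the gain, in grade equivalent (GE) years, that an IEP goal proposes."""
    # Round to one decimal place because GE scores are reported in tenths of a year.
    return round(goal_ge - current_ge, 1)

# Grade equivalent scores taken from Shannon's proposed IEP
reading_gain = projected_gain(5.4, 5.8)
math_gain = projected_gain(6.4, 6.8)

print(f"Proposed reading gain: {reading_gain} GE years")
print(f"Proposed math gain: {math_gain} GE years")
```

Both goals work out to four tenths of a grade equivalent year for a full year of special education.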
Shannon's parents insisted that their daughter required a more intensive program so that she could master necessary reading,
writing and math skills. They felt that the proposed program was inadequate, and worried that Shannon would still be
functionally illiterate when she graduated in three years. Emory Carter insisted that his daughter should be able to read, write
and do arithmetic at a twelfth grade level when she graduated from high school.
Although Emory and Elaine Carter shared their concerns and wishes with the public school officials, the administrators took a
"take it or leave it" position and refused to provide Shannon with a more intensive special education program that provided
actual remediation in reading, writing, and arithmetic. Seeking more services for their daughter, the parents requested a special
education due process hearing. The Hearing Officer ruled that the public school IEP was appropriate. The parents appealed
this decision to a Review Panel and lost again.
At that point, Emory and Elaine Carter withdrew Shannon from her local public high school and enrolled her in Trident
Academy. Trident is a private school in Mt. Pleasant, South Carolina that specializes in remediating children with learning
disabilities, including dyslexia. Shannon's parents then appealed the Review decision to the U. S. District Court. They asked
Judge Houck to award them reimbursement for Shannon's private school education at Trident.
When Shannon graduated from Trident Academy three years later, her reading and math scores were on a high school level.
After hearing testimony and reviewing the transcripts and documents from the Due Process and Review Hearings, U. S. District
Court Judge Houck found that the school district's IEP was "wholly inadequate" to meet Shannon's needs. He ruled that
Shannon had received an appropriate education at Trident and ordered Florence County to reimburse Shannon's parents for
the costs of her education.
On what basis did Judge Houck decide that the IEP proposed by Florence County was inappropriate? What evidence caused
him to decide that Shannon received an appropriate education at Trident Academy?
B. Evidence & Law
The crucial legal decisions in Shannon's case, and in many special education cases, rest on the evidence provided by various
tests and evaluations of the individual child. When Judge Houck wrote that Florence County's IEP was "wholly inadequate"
to meet Shannon's needs, he was relying on his interpretation of the results of testing. Judge Houck understood the importance
of accurately interpreting test scores. He charted Shannon's test scores and included this data as part of his U. S. District Court
decision. (See also Hall v. Vance, 555 EHLR 437 [E.D. NC 1983], affirmed at 774 F. 2d 629, 557 EHLR 155
[4th Cir. 1985], in which U. S. District Court Judge Dupree charted James Hall's test scores to support his 1983 decision that
Vance County, North Carolina did not provide James with an appropriate education in the public school program.) When you
finish this article, you will also be able to interpret and chart your child's test scores and measure educational progress or lack of
progress.
Florence County appealed Judge Houck's decision to the U. S. Circuit Court of Appeals for the Fourth Circuit. Appeals from
the U. S. District Courts in Maryland, Virginia, West Virginia, North Carolina and South Carolina are heard in the U. S. Court
of Appeals for the Fourth Circuit by a three judge panel. The Fourth Circuit is composed of thirteen judges. Appeals from U.S. Circuit Courts of Appeals are filed in the U. S. Supreme Court. Occasionally a U. S. Circuit Court of Appeals will convene
all Judges appointed to the Circuit to hear a case. This is called an en banc review. A three judge panel of the Fourth Circuit
affirmed Judge Houck's decision as to the inadequacy of Florence County's proposed IEP. Florence County then appealed to
the United States Supreme Court. On November 9, 1993, the United States Supreme Court issued a unanimous decision on
Shannon's behalf. In the Carter decision, authored by Justice Sandra Day O'Connor, the Court upheld the lower decisions,
ruled against Florence County School District Four, and ordered them to reimburse Shannon's parents for the costs of her
tuition, room and board, and attorney's fees.
C. Objective Measurement of Progress
IEPs must include objective means of measuring the child's progress in a special education program. Title 34 of the Code of
Federal Regulations, Section 300.346, entitled "Content of individualized education program," states that an IEP must include:
appropriate objective criteria and evaluation procedures and schedules for determining, on at least an annual basis, whether the short term instructional objectives are being
achieved.
In Shannon's case, her IEP stated that she "will be able to improve total reading level from the 5.4 grade level to the 5.8 grade
level as measured by the Woodcock Reading Mastery Test . . . (and that she) will improve math skills from the 6.4 grade
equivalent to the 6.8 grade equivalent as measured by the Key Math Diagnostic Test." This IEP complied with 34 C.F.R. §
300.346 by including "appropriate objective criteria." The criteria required a re-administration of the Woodcock Reading
Mastery and Key Math tests to measure progress. The U. S. District Court and the Fourth Circuit found that the proposed gain
of four months after a year of special education was "wholly inadequate."
In an effort to avoid Florence County's fate, many school districts around the country now develop IEPs that include no
objective measures of the child's progress. Instead of including educational goals where the child's progress is measured using
objective tests and measurements, as Florence County did with Shannon, many schools now propose IEPs that rely exclusively
on subjective teacher observations of the child's progress. Let's see how this works.
Let's take the case of Johnny, a child who has a learning disability that is manifested in the area of reading. Johnny is below
grade level in reading. Instead of developing an IEP that will measure progress in reading on a specific objective test, the
special education staff may come up with a goal such as: "Johnny will make measurable progress in reading, as measured by
teacher observation and teacher made tests at 80% accuracy."
"Objective measurement of progress" becomes the teacher's subjective observation as to whether the child has improved in
reading, writing, or arithmetic. The criterion of mastery becomes 80% of a subjective opinion. When parents object and ask for
a more intense program with clear independent objective standards, they are often rebuffed or criticized.
Many school board counsel and state departments of education have advised schools to move away from using objective
measurements of progress for special education children.
If you believe that the special education your child is receiving is inadequate, you must be able to provide evidence to support
your position. You will find this evidence in the public school and private sector testing that has been or will be completed on
your child.
After you master the material contained in this article, you will understand what the various tests and evaluations measure and
how the test results are reported. You will know how to convert the scores on different tests into numbers that are easily
understood. And, you will know how to measure educational progress or lack of progress, i.e. regression.
Michael
Three years ago, your eight year old son Mike began to
have serious difficulties in school. By the time he reached third grade, his
difficulty in reading was of great concern. His handwriting was nearly
illegible and homework was a nightmare. On several occasions, you consulted
with Mike's teacher about the problems he was having. Eventually, the teacher
sent Mike's "case" to a special education committee. You attended a
meeting of this committee --- which recommended that Mike be evaluated through
the school's special education department. Relieved that something was going
to be done, you consented to this battery of tests.
According to the evaluations, your son has a learning
disability. In Mike's case, he has visual-perceptual problems and visual-motor
problems that negatively affect his ability to read and write. Based on the
results of the evaluations, your son was found eligible for special education
services through his neighborhood school.
After Mike was found eligible for special education,
you attended a meeting to develop his Individualized Education Program (IEP).
This IEP provided for Mike to receive one period of special education in an
"LD Resource" class every day. It was your understanding that Mike
would receive individualized help in reading and writing from a teacher who
was specially trained to remediate his learning disability problems.
Three years have passed. Mike hasn't made much
progress, despite the special education help. He still has difficulty reading
aloud. His spelling is poor, and his handwriting is unreadable. He is behind
most of the children in his class. His attitude has changed. He is angry and
depressed and says he "hates school."
When you discussed your concerns about Mike's lack of
progress with his special education teacher, she reassured you that he was
making progress and told you to be patient. You think that patience is not the
issue; you are worried that your son will never master basic educational
skills. What kind of future will he have?
At a recent IEP meeting, you reiterated your concerns
about Mike's lack of progress and expressed the belief that he needs more help
than he is getting in the Resource program. The committee disagreed with you.
One person told you that Mike was getting all the help he needs and that he
was really doing quite well. Another committee member told you that your
expectations were too high --- and that if you didn't accept Mike's
limitations, you would damage him emotionally.
What should you do? You know that the time in the LD
resource class with several other children is not providing Mike with the
individualized help he needs. The school has not focused on teaching your son
how to read, write and do arithmetic. Now, the IEP team suggests more
"accommodations" and "modifications." They propose to
reduce his workload, give him untimed tests, and provide him with
"talking books" and a calculator. They do not propose to give
him individualized help so that he will learn to read, write, and do
arithmetic.
You believe that Mike's emerging "emotional
problems" are due to shame and embarrassment about not being successful
in school. How can you, a parent, prove this to the staff at Mike's
school so that they will develop an appropriate educational program for him?
How will you know when he is getting the help he needs?
II. THE PROCESS OF EDUCATIONAL DECISION-MAKING
Many parents erroneously assume that interpreting
test data is beyond their competence and is the responsibility of school
personnel. If parents default on their responsibility and obligation to
understand this information, then the interpretation of the test data is left
to the school psychologist --- a person who often has very limited information
about your child, aside from test scores.
The basic principles of tests and measurements are not
difficult to master. As you read this article, you will see that you are
already familiar with many of the concepts discussed. Statistics and
statistical terms are used in many other areas of life, from business and
sports to medicine. Newspaper and magazine articles use statistics to inform
readers of change or lack of change. You read articles about changes in the
population, the economy --- even public opinion polls --- that include
statistical information to inform you or persuade you of a point.
Parents need to expend time and effort to develop an
adequate degree of expertise in statistics. You should re-read parts of this
article several times. Underline, make margin notes, and use a highlighter to
help you master the material. Be patient and put in the time. The time you
expend will help to change your child's life.
As you study this material, you will probably
encounter some terms and concepts that seem confusing at first --- terms like
standard deviation, standard score, and grade and age equivalents. Other
concepts will be familiar --- averages, percentiles.
After you master this information, you will understand
the educational and psychological tests that are administered to your child.
You will be able to use this information to make wise educational decisions.
You will find that your newfound knowledge and expertise exceed those of many
of the special education committee members.
When you attend your next IEP or Eligibility meeting,
you will be glad you did your homework!
A. Katie
Katie is a fourteen year old ninth grader. She
"hates school" and is failing several subjects. As a young child,
Katie was bright, happy, and curious. When she entered third grade, her
attitude began to change. Now, she locks herself in her room, lies on her bed,
and listens to music for hours. She is sullen and angry and says she can't
wait to quit school.
In desperation, Katie's parents took her to a child
psychologist for testing. At a meeting to interpret the test results for Katie
and her parents, the psychologist explained that Katie scored two
"standard deviations" above the mean on the Similarities
subtest of the Wechsler Intelligence Scale for Children, Third Edition (WISC-III)
and two and a half "standard deviations" below the mean on
the spontaneous writing sample of the Test of Written Language, Third Edition
(TOWL-III).
Test publishers are constantly updating and revising
their tests. The Wechsler Intelligence Scale for Children was originally known
as the WISC. Later, it was revised and became known as the WISC-R. Several
years ago, the next version was published as the WISC-III. The first Test of
Written Language (TOWL) was replaced by the TOWL-II and was recently revised
again.
The Woodcock-Johnson battery of tests was originally known as the
Woodcock-Johnson Psycho-Educational Battery (WJPEB). The WJPEB included educational
achievement testing and cognitive ability testing. Dr. Woodcock also produced
the Woodcock Reading Mastery Test. Today, the current test series is called
the Woodcock-Johnson Psycho-Educational Battery, Revised (WJ-R), which is an
educational achievement test that includes the Test of Cognitive Abilities.
The current version of any popular test is probably
being revised. A competing test publisher is probably trying to
develop a new and better version of its competitor's product. This article
will not focus on an analysis of each test's strengths and weaknesses.
Weaknesses in a current test will probably be eliminated by the next version,
which will be out within a couple of years.
Parents must understand that tests do not necessarily
measure what they purport to measure. As you will see, a child's score on a
push-up test can be represented as an overall fitness score, a measure of arm
strength, an upper body measurement score, a measure of perseverance and
persistence, or a measure of a child's motivation. A score may measure only
one of these variables or it may accurately reflect all of them.
To demonstrate this point, let's look at tests that
measure reading ability. One test that measures a child's reading ability
actually measures the child's ability to correctly read aloud and pronounce
isolated words out of context, i.e., a word recognition test. The test
includes a list of words, e.g., cat, tree, dog, house, person. This kind
of reading test does not measure true reading and may be adversely impacted by
speech or word finding problems.
Another reading test measures reading by having the
child read a passage of text, then answer a series of multiple choice
questions about the passage. In this case, the child's score may be a measure
of the child's ability to intellectually eliminate certain answers of the
multiple choice format, i.e., a test of reasoning, not true reading. Some very
bright children may need to recognize and interpret only a few words to
discern the total context. Other children have excellent word recognition
abilities but cannot link or interpret the words in a body of text or passage.
Another reading test has the child read a passage of text aloud (measuring
oral reading) and then answer questions. The accuracy of the words read aloud
and the child's understanding of the passage make up the reading score.
You need to know exactly how the test was administered
and what it measured.
When we first discussed Katie, we saw that she scored
two "standard deviations" above the mean on the Similarities
subtest of the Wechsler Intelligence Scale for Children, Third Edition (WISC-III)
and two and a half "standard deviations" below the mean on
the spontaneous writing sample of the Test of Written Language, Third Edition
(TOWL-III).
Do these test scores explain the academic problems
Katie is having? Do they have anything to do with her moodiness and her
intense dislike of school? (Answers: Yes and Yes.) When we return to Katie's
case later in this article, you will understand the significance of her test
scores. You will also understand why Katie's self esteem has plummeted.
Remember: After you master the material contained in
this article, you will understand and be able to interpret your child's test
scores. You will be able to go back to the preceding paragraph and understand
the significance of Katie's scores. You will have acquired skills that will
enable you to answer questions like these:
How is your child functioning, compared with other
children the same age?
How is your child functioning, compared with others
in the same grade?
How much educational progress has your child made (what
has been learned) since the last test battery?
If your child is receiving special education, has
the child progressed or regressed in the special education placement?
If your child has shown an increase in age and grade
equivalent test scores, has the child actually fallen further behind the
peer group?
And, you will learn how to incorporate objective
measurements into your child's IEP so that educational progress can be charted
on a regular basis.
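The question about falling further behind can be answered with simple subtraction. The sketch below reuses the reading scores from Shannon Carter's proposed IEP, discussed earlier. It assumes, for illustration only, that a student starting tenth grade has a grade placement of 10.0 and that the placement one year later is 11.0. The function name `gap_behind_grade` is hypothetical.

```python
def gap_behind_grade(grade_placement, grade_equivalent):
    """Return how many grade equivalent years a score falls below grade placement."""
    return round(grade_placement - grade_equivalent, 1)

# Assumption: grade placement is 10.0 at the start of tenth grade
# and 11.0 one year later. The GE scores come from Shannon's IEP.
gap_before = gap_behind_grade(10.0, 5.4)
gap_after = gap_behind_grade(11.0, 5.8)

fell_further_behind = gap_after > gap_before

print(f"Gap at start of year: {gap_before} years")
print(f"Gap after one year:   {gap_after} years")
print(f"Fell further behind:  {fell_further_behind}")
```

Under these assumptions, the reading score rises from 5.4 to 5.8, yet the gap between the child and the peer group grows from 4.6 to 5.2 years. A rising grade equivalent score does not, by itself, show that a child is catching up.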
B. Measurement of Change: Rulers, Yardsticks and Other Tools
To clarify these points, let's change the facts. You
can measure your child's physical growth with a measuring tape and a bathroom
scale. You can measure growth by charting how much height increases, as
measured in inches, and how much weight increases, as measured by pounds, over
a period of months or years. Using these tools, you can document your child's physical
growth. You don't need to be a doctor to understand that increases in these
measurements prove that your child is growing.
Assume that your child's height was five feet, three
inches last year. This year, the child is five feet, six inches tall. You can
report this information in several ways. You can say that last year, your
child was sixty-three inches tall and is now sixty-six inches tall. Or, you
can say that your child was 5.25 feet tall and is now five and a half feet
tall. You can even say that a year ago, your child was 160 centimeters tall
and is now 168 centimeters tall. Or, that your child was 1.75 yards tall and
is now 1.83 yards tall!
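If you want to check these conversions yourself, a few lines of Python will do it. This is a minimal sketch; the function name `describe_height` and the rounding choices are our own, picked to match the figures above.

```python
INCHES_PER_FOOT = 12
INCHES_PER_YARD = 36
CM_PER_INCH = 2.54

def describe_height(feet, inches):
    """Report one height measurement in four different units."""
    total_inches = feet * INCHES_PER_FOOT + inches
    return {
        "inches": total_inches,
        "feet": round(total_inches / INCHES_PER_FOOT, 2),
        "centimeters": round(total_inches * CM_PER_INCH),
        "yards": round(total_inches / INCHES_PER_YARD, 2),
    }

last_year = describe_height(5, 3)
this_year = describe_height(5, 6)

print("Last year:", last_year)
print("This year:", this_year)
print("Growth in inches:", this_year["inches"] - last_year["inches"])
```

The four numbers look different, but each pair describes the same three inches of growth. Educational test scores work the same way: one performance can be reported on several scales.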
If you (or your child's pediatrician) have been
measuring your child at regular intervals, you can create a chart or graph
that documents changes in height or weight over time. Your child's
pediatrician has "growth charts" that you can use to compare your
child's growth with the growth of the "average" child.
Likewise, educational growth can be measured and
charted. The yardsticks used for measurement are different, but the principles
are the same. Measuring educational growth or progress is not much different
from charting physical growth. Instead of a tape measure and a set of bathroom
scales, you need psychological and educational achievement test results. Where
will you find the information you need? How can you measure change?
Most school districts test their students on
standardized group educational achievement tests at regular intervals. The
results of these tests provide information about how well school districts are
accomplishing their mission of educating children. The information contained
in the group standardized tests can provide you with some basic information.
Standardized educational achievement tests are general
measures. The information they provide is similar to that provided by medical
screening tests. Medical screening tests can suggest that a problem exists.
Additional testing is usually necessary before the problem can be accurately
identified and a treatment plan developed. Children's learning problems can be
identified in a similar manner. In most public schools, specific individual
ability and achievement tests to clarify learning problems are administered by
school psychologists and educational diagnosticians.
C. What Do Evaluations Tell You?
As you continue on your advocacy journey, you must
understand the exact nature of your child's disabling condition(s). How does
the disability affect her? In what areas? How serious is it? What are her
strengths and weaknesses? Does she need special education? What educational
issues need to be addressed? How will you know if she is making progress? How
much progress is sufficient? The answers to these questions will be found in
the evaluations and tests that are administered to children and adolescents.
Many parents erroneously believe that they cannot
understand the tests. They believe that this information is beyond their
ability to understand or comprehend. Usually, their reasoning goes like this:
"Gosh. I'm just a parent. I didn't even finish
college. I don't have any training in education or special education so I
can't understand that stuff!"
or
"The people who did that testing on my kid went to
school for years to learn how to do that. Who am I to think I can
understand it? I'm not a psychologist!"
If you believe that you "can't" understand
your child's testing, it's time to change your beliefs. You may be reading
this article because your son or daughter is performing poorly in school ---
or has been identified with learning problems --- and now believes that he or
she "can't" read or write or do arithmetic. Your child must confront
and overcome these erroneous beliefs about learning new or difficult material.
And, so must you.
III. STATISTICS: GENERAL PRINCIPLES
Statistics are simply ways to measure things
and to describe relationships between things, using numbers. Part of the
confusion that many people experience when they first begin to learn
statistics comes from the unfamiliar terms and concepts. As we learned in
our earlier discussion about measuring physical growth, there are several
different ways to report the same information (inches, feet, yards,
centimeters, etc.). In the beginning, this can be confusing.
First, let's look at another familiar example that
many of us deal with regularly --- how to measure our car's gas mileage.
Remember: When using statistics, we can use several different terms to
describe the same concepts. If you want to describe your car's gas mileage,
you can make any of the following statements:
My gas tank is half full.
My gas tank is half empty.
I am at the fifty percent mark.
My odometer shows that I have another 150 miles
before the next fill-up.
My odometer shows that I have traveled 150 miles
since I last filled the tank.
All of these statements accurately describe your car's
consumption of gas.
With this information, you can make decisions.
When will you need to buy more gas? You know that your car has a fifteen
gallon gas tank. According to the gas gauge, your tank is slightly below the
halfway mark. You've been driving in the city. You'll be driving on the
highway for the rest of your trip. You have used a precise amount of gas and
have a precise amount of gas left in your tank. You can describe and define
this information in several ways --- gallons used, gallons remaining, miles
driven, miles to go, percentage full, and so forth. Using the information
above, you can do some simple math calculations and learn that your car
averages between seventeen and twenty-three miles to a gallon of gas, depending
on driving conditions.
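Here is one way to sketch those "simple math calculations" in Python. The fifteen gallon tank and the 150 miles come from the example above. Reading the gauge as exactly half full is a simplifying assumption, and the variable names are our own.

```python
TANK_CAPACITY_GALLONS = 15   # from the example above
fraction_remaining = 0.5     # assumption: gauge read as exactly half full
miles_since_fill_up = 150    # from the odometer statement above

gallons_remaining = TANK_CAPACITY_GALLONS * fraction_remaining
gallons_used = TANK_CAPACITY_GALLONS - gallons_remaining

# Miles per gallon is the distance driven divided by the gas used to drive it.
miles_per_gallon = miles_since_fill_up / gallons_used

# Estimated range assumes the same mileage for the rest of the tank.
estimated_range = gallons_remaining * miles_per_gallon

print(f"Gallons used: {gallons_used}")
print(f"Gallons remaining: {gallons_remaining}")
print(f"Miles per gallon: {miles_per_gallon}")
print(f"Estimated miles before the next fill-up: {estimated_range}")
```

With these numbers the car gets twenty miles per gallon, which falls inside the range given above. The same tank of gas has now been described as gallons, a fraction, miles driven, and miles to go.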
Using this information or data, you can also measure
change. If you compare your car's present or current mileage to the
mileage you obtained last month, before you had your car tuned up, you can
measure miles per gallon before and after the tune-up. In this way, you can
measure the impact of the tune-up on your car's gas consumption. You can also
compare your car's mileage performance to that of other vehicles.
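Measuring change always comes down to the same step: take the same measurement before and after, then compare. The sketch below shows that step in Python. The two mileage readings are hypothetical numbers chosen only to illustrate the calculation, and the function name `measure_change` is our own.

```python
def measure_change(before, after):
    """Return the absolute change and the percent change between two measurements."""
    change = after - before
    percent = change / before * 100
    return round(change, 1), round(percent, 1)

# Hypothetical readings, chosen only to illustrate the calculation
mpg_before_tune_up = 18.0
mpg_after_tune_up = 21.0

change, percent = measure_change(mpg_before_tune_up, mpg_after_tune_up)
print(f"Change: {change:+} miles per gallon ({percent:+}%)")
```

A negative result would mean the mileage got worse after the tune-up. The same before-and-after comparison applies to any score that is measured the same way on two occasions.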
Let's look at another common way in which we use tests
and measurements. When you last visited your doctor, you mentioned that you
were feeling tired and sluggish. Your doctor asked several questions, then
recommended that you have some lab work. After reviewing the test results, the
doctor explained that your blood glucose level was moderately elevated.
To lower your blood glucose level, the doctor
recommended a plan of treatment that included a special diet and a daily
program of moderate exercise. After a month, you return for a follow-up visit.
More lab work is completed. If your glucose level has returned to normal, it
is unlikely that you will require additional treatment. But, if your glucose
level remains high, despite the diet and exercise program, you may need more
intensive treatment. By measuring change after an intervention and using "appropriate
objective criteria and evaluation procedures," you and your doctor
can make rational decisions about your medical treatment.
Remember: The principles that enable you to compute
your car's gas mileage and make medical decisions will also enable you to
understand educational change. When you measure educational progress (just as
when you measure your gas mileage and blood levels), the test scores can be
reported and compared in several different ways.
Because educational test scores are often reported in
different formats and compared in different ways, it is essential for parents
and advocates to understand all of the scoring methods used in
measuring and evaluating educational progress, including:
Age equivalent scores (AE)
Grade equivalent scores (GE)
Standard scores (SS) and standard deviations (SD),
and
Percentile ranks (PR).
Knowledge about statistics will enable you to assess
your child's progress or lack of progress in a particular educational program.
Lack of progress is usually referred to as regression. Unfortunately,
regression is a common educational problem that we will discuss in more detail
later. You must learn how to recognize regression and reverse the downward
spiral before your child is further damaged.
A. Statistics: Applied
Let's turn our attention to the performance of a group
of children. You must understand how an individual child scores when compared
with other children who are his age or in his grade --- and what this means.
First, we will examine a single component of physical
fitness in a group of elementary school students. Our group or sample
consists of 100 fifth grade students. These children are enrolled in a
physical fitness class to prepare them to take the President's Physical
Fitness Challenge. We will assume that the average chronological age (CA)
of these children is exactly ten years, zero months. (CA=10-0) The children
are tested in September, at the beginning of the school year.
To qualify as "physically fit," each child
must meet several goals. Push-ups are one measure of upper body strength. Each
child must complete as many push-ups as possible in a period of time. Each
child's raw score is the number of push-ups completed. The term raw
score is simply another way of describing the number of items correctly
answered or performed.
After all of the fifth grade students complete the
push-up test, their scores are listed. The results are as follows:
Half of the children completed ten push-ups or more.
Half of the children completed ten push-ups or less.
The average child completed 10 push-ups.
The average or mean number of push-ups
completed by this class of 100 fifth grade students is 10.
Half of the children scored above the mean
score of 10.
Half of the children scored below the mean or
average score of 10.
50 percent of the children scored 10 or above
50 percent of the children scored 10 or below.
As we continue to analyze the children's scores, we
see patterns:
One-third of the children scored between 7 and 10
push-ups.
One-third of the class completed between 10 and 13
push-ups.
Two-thirds of the children scored between 7 and 13
push-ups.
Half of the children (50 percent) completed between
8 and 12 push-ups.
The lowest scoring child completed 1 push-up.
The highest scoring child completed 19 push-ups.
Again, two-thirds of the children in this fifth grade
class were able to complete between 7 and 13 push-ups. The remaining third of
the children did fewer than 7 or more than 13 push-ups. Nearly all of the
children -- about 95 out of 100 -- were able to complete between 4 and 16 push-ups.
This information is represented below in a bell curve chart.

The test results provide us with a sample of
data. As we analyze the data in our sample, we can compare the performance of
any individual child with that of the entire group. As we make these
comparisons, the data will enable us to recognize any individual child's
strengths and weaknesses when compared with the peer group of similar
youngsters.
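As a rough check on the proportions quoted for the push-up sample, the short Python sketch below treats the scores as an idealized bell curve with a mean of 10 and a standard deviation of 3 (the values this article uses for the push-up example). The function name share_between is our own, and a real class of 100 children would only approximate these figures.

```python
from statistics import NormalDist

# Idealized bell curve for the push-up example: mean 10, standard deviation 3.
pushups = NormalDist(mu=10, sigma=3)

def share_between(low, high, dist=pushups):
    """Fraction of children expected to score between low and high."""
    return dist.cdf(high) - dist.cdf(low)

for low, high in [(7, 10), (10, 13), (7, 13), (8, 12), (4, 16)]:
    print(f"{low:>2} to {high:>2} push-ups: {share_between(low, high):.1%}")
```

On this idealized curve the five bands come out at roughly one-third, one-third, two-thirds, one-half, and 95 percent of the children.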
If we conduct an identical push-up test with children
in other grades, we can compare our original group of 100 fifth grade children
with other groups of youngsters -- children who are older, younger, in
different grades, in different schools. If we gather enough information or
data from other sources, we can compare our original group of fifth graders --
or an individual child within our group -- to a national population of
children who are being tested for their upper body strength as measured by
their ability to do push-ups.
IV. MEASURING PROGRESS: THE BELL CURVE
In nature, traits and characteristics distribute
themselves along theoretical curves. For our purposes, the most important
curve is called the normal frequency distribution or bell curve.
Because the percentages of areas along the bell curve are well-known and
thoroughly researched, they become our frame of reference.
By using the bell curve, we can now develop an actual
diagram or graph of the children's push-up scores. This map --- on the bell
curve --- provides us with additional information. We can see what percentages
of children were able to complete specific numbers of push-ups. When we use
the bell curve, we can visually demonstrate where any particular child
scores, when compared with other children who are the same age or in the same
grade. Likewise, with educational test scores, we can visually demonstrate
scores and change over time.
If we compare the push-up scores obtained by children
who attend different schools, we can determine whether the physical fitness of
children, as measured by their ability to do push-ups, varies in different
schools, neighborhoods, states, or countries.
We can also measure progress over time -- with
push-ups and with improvement in reading skills. Let's look at our class of
fifth graders again. We want to gather information as to whether the physical
fitness class is effective -- whether the children's fitness levels improve.
How can we answer this question?
To measure the effectiveness of the fitness class, we
will measure the children's number of push-ups before they take the
class and compare this score with their score after they take the
class. If the class is effective, we should see individual improvement and
group improvement. Some children will have minimal improvement -- these
children will fall further behind the peer group. Other children who performed
below their peers may show significant improvement. Some children will improve
so much that they now perform as well or better than the "average"
youngster.
We will measure the children's progress on one or more
occasions as they progress through the class. If the fitness class is
"working," that is, if the children's fitness levels are improving,
then their ability to perform fitness skills should improve measurably over
time. In our example, physical fitness improvement is being assessed using "appropriate
objective criteria and evaluation procedures . . ." (34 C.F.R.
§300.346).
Because of its enormous usefulness in measuring
educational progress, we will return to the subject of the bell curve
repeatedly throughout this article.
A. Understanding The Bell Curve
On all bell curves, the bottom or horizontal line is
called the X axis. In our sample of fifth graders, the X axis
represents "number of push-ups." And, on all bell curves, the
up-and-down vertical line is called the Y axis. In our sample, the Y
axis represents the number of children who earned a specific score (number of
push-ups completed).

As you can see in the diagram (above), the highest
point of the bell curve on the X axis equals a score of ten push-ups. You
recall that more children completed ten push-ups than any other number. Thus,
the highest point on this bell curve represents a score of ten. The next most
frequently obtained scores were 9 and 11, followed by 8 and 12. This pattern
continues out toward the extreme ends of the bell curve. In our example, the
extremes occurred at 1 and 19 push-ups.
Using the bell curve, we can now chart each child's
score and compare it to the score achieved by all 100 students in the class.
Look at the bell curve above, and find 10 push-ups. We know that Amy completed
10 push-ups so her raw score was 10. Ten push-ups placed her squarely in the
middle of the class. Half of the youngsters in Amy's class earned a score of
10 or more; half of the children scored 10 or less. If you look at the bell
curve diagram (below), you see that Amy's score of 10 placed her at the 50%
level. The individual's percent level is referred to as their percentile
rank (PR). Amy's percentile rank is 50 (PR=50).

Erik completed thirteen push-ups. Looking at the bell
curve above, you see that his score of 13 placed him at the 84th percent
level. Erik's percentile rank is 84 (PR=84). Erik's ability to do
push-ups placed him at the 84th position out of the 100 fifth grade children
tested on our measure of upper body strength.
Sam completed seven push-ups. His raw score of 7
placed him at the (bottom) 16 percent. Sam's percentile rank was 16 (PR=16).
Out of our sample of 100 fifth grade children, 84 children earned a higher
score than Sam.
Larry completed 6 push-ups. We can convert his raw
score of 6 to a percentile rank of 9 (PR=9). 91 children scored higher and 8
children scored lower than Larry in upper body strength as measured by the
ability to do push-ups.
Oscar completed 2 push-ups. His raw score of 2 placed
him in the bottom 1 percent of fifth graders tested (PR=1).
Nancy's raw score of 17 placed her at the 99 percent
level. We say that Nancy scored at the 99th percentile rank (PR=99).
You can see the relationship between the number of
push-ups completed and the child's percentile rank (PR) reproduced in the
table below:
PUSH-UP SCORES AND PERCENTILE RANKS

Push-ups | Percentile Rank | Push-ups | Percentile Rank
   19    |       99        |    9     |       37
   18    |       99        |    8     |       25
   17    |       99        |    7     |       16
   16    |       98        |    6     |        9
   15    |       95        |    5     |        5
   14    |       91        |    4     |        2
   13    |       84        |    3     |        1
   12    |       75        |    2     |        1
   11    |       63        |    1     |        1
   10    |       50        |          |
The bell curve is a powerful tool. When you use
the bell curve, you can objectively compare any child's percentile rank to
that of a group of children. You can also measure a single child's progress or
regression relative to the group.
Using the bell curve, you can compare a single child's
score to the scores obtained by other children who are older or younger or in
different grades.
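For readers who want to see where the percentile ranks in the push-up table come from, here is a minimal Python sketch. It assumes the scores follow a bell curve with a mean of 10 and a standard deviation of 3 (the values this article uses for the push-up example), and it keeps every rank between 1 and 99, as the table does. The function name percentile_rank is our own.

```python
from statistics import NormalDist

def percentile_rank(raw_score, mean=10, sd=3):
    """Convert a raw score to a percentile rank, kept within 1..99."""
    pr = round(NormalDist(mean, sd).cdf(raw_score) * 100)
    return min(99, max(1, pr))

# Print the conversion for every score from 19 push-ups down to 1.
for pushups_done in range(19, 0, -1):
    print(f"{pushups_done:>2} push-ups -> PR {percentile_rank(pushups_done)}")
```

Under these assumptions the sketch gives the same percentile rank as the table for every score from 1 to 19. Published tests rely on their own norm tables, so treat this only as an illustration of the idea.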
Let's see how this works. Again, we will measure the
children's upper body strength by the number of push-ups they can perform. In
this case, we decide to evaluate all children in all the elementary
grades, from Kindergarten through fifth grade. We will assume that the average
chronological age of these elementary school children is exactly eight years
(CA=8-0 years).
After we test the third graders, we find that the average
or mean score of our sample of 100 eight year old third graders is 6
push-ups. This means that the "average" third grade child (who is 8
years old) can do 6 push-ups. We can also compare an individual child's score
on arithmetic problems answered correctly with the average number answered
correctly by children the same age.
How can we compare children from different groups?
Let's look at Larry who was a member of our original group of fifth graders.
Although the average fifth grader performed 10 push-ups, Larry only completed
6 push-ups. His raw score of 6 converts to a percentile rank of nine (PR=9).
When we compare Larry's performance to all elementary
school students, we learn that Larry (a fifth grader) is functioning at the
level of the average third grader --- who is also eight years old ---
in the ability to do push-ups. Therefore, we see that Larry's age equivalent
score is 8 years (AE=8-0) and his grade equivalent score is at the
third grade level (GE=3-0).
Fifth Grade Students: Push Up Scores

Child's Name | Raw Score | Percentile Rank
Oscar        |     3     |        1
Larry        |     6     |        9
Sam          |     7     |       16
Amy          |    10     |       50
Erik         |    13     |       84
Frank        |    15     |       95
Nancy        |    17     |       99
Look at the table above and find Amy. At the time of
testing, Amy was 10-0 years old and in the fifth grade. She scored at the mean
for her peers, i.e., 10 push-ups. Her grade equivalent score was fifth
grade (GE=5-0) and her age equivalent score was 10.0 years
(AE=10-0). If we tested a 20 year old person and found that this person was
able to do 10 push-ups, then the 20 year old has an age equivalent score of
10-0 and a grade equivalent score of 5.0, i.e., the same score as Amy.
Look again at the table of scores above and find
Frank's name. You see that Frank earned a raw score of 15 push-ups which
converts to a percentile rank of 95 (PR=95). Frank's score looks great ---
until we remember that Frank was "held back" three times. Although
he is in the fifth grade, Frank is 13 years old!
With this new information, let's take another look at
Frank's performance. The average score for 8th graders (who are 13 years old)
is 15. Frank scored 15. Frank had a grade equivalent score of 8th grade (GE
= 8.0) and an age equivalent score of 13 years (AE = 13-0). When
we compare Frank with other children in his expected grade, we see that
his achievement is in the average range. Frank is in the 95th percentile level
when compared to fifth graders, not when compared to eighth
graders.
Frank's case brings up some additional questions.
Frank (age 13) was included in our sample of fifth graders who had an average
age of 10. When compared to this group of children who were younger than him,
Frank scored at the 95th percentile rank (PR) level. Question: If we
compare Frank's performance to that of children who are three years younger
than him, will this comparison provide us with an accurate picture of his
physical fitness? Answer: No.
In Frank's case, statistics inform us of two facts.
First, we see that Frank performs at a superior level when compared with other
children in his grade. Second, we see that he performs at an average
level when compared with children who are his age.
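The age equivalent and grade equivalent lookups for Larry, Amy, and Frank can be sketched in a few lines of Python. The NORMS table below holds only the three grade averages quoted in this article (third grade = 6, fifth grade = 10, eighth grade = 15); the table layout and the function name equivalents are our own assumptions, and a real test manual would supply norms for every age and grade.

```python
# Average raw scores quoted in this article: grade -> (typical age, mean push-ups).
# Only these three grades are given; a real norm table would cover every grade.
NORMS = {3: (8, 6), 5: (10, 10), 8: (13, 15)}

def equivalents(raw_score, norms=NORMS):
    """Return (age equivalent, grade equivalent) for the closest grade mean."""
    grade = min(norms, key=lambda g: abs(norms[g][1] - raw_score))
    age, _mean = norms[grade]
    return age, grade

for name, score in [("Larry", 6), ("Amy", 10), ("Frank", 15)]:
    age_eq, grade_eq = equivalents(score)
    print(f"{name}: raw score {score} -> AE={age_eq}-0, GE={grade_eq}-0")
```

Notice that the lookup never asks how old the child actually is. That is why Frank's age equivalent and grade equivalent scores tell a different story from his percentile rank among fifth graders.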
When you evaluate the significance of data from tests,
you must know how the scores are being reported. Test scores can be
reported using percentile ranks, age
equivalents, grade equivalents, raw scores, scale scores, subtest scores, or
standard scores.
Remember: Although Frank's performance was superior
for his grade, it was average for his age. If you did not
know Frank's age and grade, you would have been misled as to Frank's
actual achievement. But -- if Frank were an 8 year old 3rd grader, his scores
would be in the superior range, using both age equivalent and grade
equivalent measures.
The number of push-ups each child completed was his or
her raw score. Let's assume that we want to obtain an overall fitness
score. To obtain an overall or composite score, we will measure three
skills (sit-ups, push-ups, a timed 50 yard dash) and obtain scores on each of
these skills. In educational testing, the child's overall score (in reading,
math, etc.) is often a composite of several subtest scores.
Next, we will develop a weighting system that will
convert each child's raw score to a scale score. After we
convert the raw scores to scale scores, we will be able to compare each of the
three scores to each other (number of push-ups, number of sit-ups, seconds to
complete the 50 yard dash). How do we convert raw scores into scale scores?
One way to convert scores is by developing a rank
order system. In rank order scoring, the child who scores highest in an
event (most push-ups, most sit-ups, fastest run) receives a scale score
of 100; the lowest receives a score of 1. The other 98 children receive their
respective "rank" as their scale score.
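A rank order conversion like the one just described might look like the following Python sketch. The function name rank_order_scale and the sample numbers are hypothetical, and the article does not say how ties are handled; here tied children simply keep the order in which they were listed.

```python
def rank_order_scale(raw_scores, higher_is_better=True):
    """Give the weakest performer a scale score of 1 and the best the top rank."""
    # Sort positions from worst to best performance.
    order = sorted(
        range(len(raw_scores)),
        key=lambda i: raw_scores[i],
        reverse=not higher_is_better,
    )
    scale = [0] * len(raw_scores)
    for rank, position in enumerate(order, start=1):
        scale[position] = rank
    return scale

pushups = [4, 10, 7, 13, 1]            # more push-ups is better
dash_seconds = [9.1, 8.2, 10.5]        # fewer seconds is better

print(rank_order_scale(pushups))
print(rank_order_scale(dash_seconds, higher_is_better=False))
```

With 100 children in the group, the same function would hand out scale scores from 1 to 100, which puts push-ups, sit-ups, and dash times on a common footing.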
After each child's raw scores are converted to scale
scores, we can easily compare an individual child to the group and to all
children who are the same age or in the same grade. We can also compare an
individual child's performance at different times, i.e. before and after
completing the fitness course. Was the child able to do significantly more
push-ups after taking the fitness course? Was the child reading better after
receiving reading remediation?
B. Composite Scores
You can see that after we develop a global composite
score, the individual child's raw scores on each of the three
fitness subtests have less significance. This is exactly what happens with
educational achievement and psychological tests. Most educational tests are
composed of several subtests; the subtest scores are combined to develop
composite scores. More about this shortly.
Let's look at how composite scores can be used and
some of the problems that arise when we rely on them.
John is a member of our original group of 100 fifth
graders. He has good muscular strength (he scored at the 70% PR level in
push-ups and at the 78% PR in sit-ups). But, John is very slow and
uncoordinated. In the 50 yard dash, he finished 2nd from the last out of the
100 children (PR=2).
How will John's composite fitness score be derived? In
this example, we average John's percentile rank scores on the three events.
John's composite score is determined as follows: Add the percentile ranks of
each event (70 + 78 + 2 = 150), then divide this score by the number of events
(3). In John's case, 150 / 3 = 50.
John's composite score is 50. This composite
percentile rank score of 50 places him squarely in the "average"
range. Is John an "average" child? His individual scores
demonstrated a significant amount of subtest scatter. When you analyze
his three subtest scores, you see that he has specific strengths and a very
severe deficiency. Despite his average composite score, John is not an average
child! (If we performed the same analysis of John's composite score
using standard scores, he would have a standard score of 96.5 and percentile
rank of 41 --- again, John appears to be an average child).
Let's look at another example of composite scores to
see how they can mislead us. Oscar was at the 1 percent level in push-ups. But
when the other fitness subtests were given, Oscar was the fastest child in the
class scoring at the 99% level. He was average in sit-ups, scoring at the 50%
level. Oscar's composite fitness score, using percentile ranking, is 50%. Is
Oscar really an average child? Would he benefit from remediation to improve
his upper body strength, as measured by push-ups? Oscar also had a great deal of subtest
scatter, i.e., from extremely weak upper body strength to superior speed.
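The composite arithmetic for John and Oscar can be reproduced with a short Python sketch. The first function averages percentile ranks, as in the examples above. The second is our own guess at what "the same analysis using standard scores" involves: it converts each percentile rank to a standard score (mean 100, standard deviation 15), averages those, and converts back. Its result for John lands close to, but not exactly on, the 96.5 and 41 quoted above; the small gap likely reflects rounding in printed conversion tables.

```python
from statistics import NormalDist, mean

STANDARD = NormalDist(mu=100, sigma=15)  # standard-score scale: mean 100, SD 15

def composite_pr(percentile_ranks):
    """Composite as the plain average of the percentile ranks."""
    return mean(percentile_ranks)

def composite_standard(percentile_ranks):
    """Average the equivalent standard scores, then convert back to a PR."""
    scores = [STANDARD.inv_cdf(pr / 100) for pr in percentile_ranks]
    avg = mean(scores)
    return avg, STANDARD.cdf(avg) * 100

john = [70, 78, 2]    # push-ups, sit-ups, 50 yard dash
oscar = [1, 99, 50]   # push-ups, 50 yard dash, sit-ups

for name, prs in [("John", john), ("Oscar", oscar)]:
    spread = max(prs) - min(prs)
    print(f"{name}: composite PR {composite_pr(prs):.0f}, spread {spread} points")

avg_ss, pr = composite_standard(john)
print(f"John via standard scores: SS {avg_ss:.1f}, PR {pr:.0f}")
```

Both boys end up with a composite of 50 even though their individual scores are 76 and 98 percentile points apart, which is exactly the information a composite hides.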
C. Subtest Scatter
When subtest scores vary a great deal, this is called subtest
scatter. If significant scatter exists, this suggests that the child has
areas of strength and weakness that need to be explored.
How can you determine if significant subtest
scatter is present? Most subtests have a mean score of 10.
Most children will score within 3 points of the mean of 10, i.e. most
children will score between 7 and 13.
If the mean on a subtest is 10 (and most children
score between 7 and 13), then scores between 9 and 11 will represent minimal
subtest scatter. Let's assume that Child A is given a test that is composed of
10 subtests. The child's scores on the 10 subtests are as follows: on 4
subtests, the child scores 10, on 3 subtests, the child scores 9, and on 3
subtests, the child scores 11. In this case, the overall composite score is 10
and the scatter is very minimal. This child scored in the average range in all
10 subtests.
In our next example, we will assume that Child B earns
4 subtest scores of 10, 3 scores of 4, and 3 scores of 16. The child did
extremely well on 3 subtests, very poorly on 3 subtests, and average on 4 subtests.
Again, the child's composite score would be 10. Subtest scatter is the
difference between the highest and lowest scores. In this case, subtest
scatter would be 12 (16-4 = 12). Is this an "average" child? Because
the child's scores demonstrate very significant subtest scatter, we need to
know more about these weak and strong areas.
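The comparison between Child A and Child B is easy to verify. This small Python sketch computes the composite (the average of the ten subtest scores) and the scatter (highest score minus lowest score) for each child; the function name summarize is our own.

```python
from statistics import mean

def summarize(subtest_scores):
    """Return (composite, scatter): the average and the high-low difference."""
    return mean(subtest_scores), max(subtest_scores) - min(subtest_scores)

child_a = [10] * 4 + [9] * 3 + [11] * 3    # four 10s, three 9s, three 11s
child_b = [10] * 4 + [4] * 3 + [16] * 3    # four 10s, three 4s, three 16s

for name, scores in [("Child A", child_a), ("Child B", child_b)]:
    composite, scatter = summarize(scores)
    print(f"{name}: composite {composite}, scatter {scatter}")
```

Both children have a composite of 10, but Child A's scatter is 2 while Child B's is 12.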
In educational situations, it is essential that
parents understand the nature of the weak areas, what skills need to be
learned to strengthen those areas, and how the strong areas can be used to
help remediate the child's weak areas. The spread or variability between the
subtest scores is called subtest scatter.
How do these concepts (composite scores and subtest
scatter) relate to the information contained in your child's evaluations?
The results of educational tests given to children are
often provided in composite scores. On the Wechsler Intelligence
Scale for Children, Third Edition (WISC-III), three scores are usually
provided -- a Verbal IQ (VIQ), a Performance IQ (PIQ), and a Full
Scale IQ (FSIQ). Each of these IQs is a composite score. Both the Verbal
and Performance IQ scores are composites of five different subtests, each of
which measures a different area of ability. The Full Scale IQ is a composite
of the Verbal and Performance scores -- which makes it a composite of ten
different subtests. IQs between 90 and 110 are considered within the
"average range."
If we rely on composite IQ scores, we may easily be
misled -- with serious consequences. Katie is the 14 year old youngster whose
situation was outlined earlier in this article. On the Wechsler Intelligence
Scale for Children-III, Katie achieved a Full Scale IQ of 101. If the only
number you had was her Full Scale IQ score, you would probably assume that her
IQ of 101 placed her squarely in the "average range" of intellectual
functioning. Is Katie an "average" child?
Remember:
The Full Scale IQ score is actually a
"composite" of the Verbal IQ and Performance IQ scores. Checking
further, you learn that Katie's Verbal IQ is 114 and her Performance IQ is 86.
IQ scores between 110 and 90 are considered "average." You see that
there is a 28 point difference between Katie's Verbal and Performance IQ
scores. If you did not have these additional two IQ scores, you might view
Katie as an "average" child but you would be mistaken.
Katie's Verbal IQ of 114 translates into a percentile
rank of 82 (PR=82). Her Performance IQ of 86 converts to a percentile rank of
18 (PR = 18). We see that Katie has a percentile rank fluctuation of 64 points
(82-18=64) between her verbal and performance abilities. We will look at more
of Katie's test scores shortly.
One of the commonly administered individual
educational achievement tests is the Woodcock-Johnson
Psycho-Educational Battery-Revised (WJ-R). The Woodcock-Johnson consists
of a number of mandatory and optional subtests. The results obtained by the
child on these different subtests are combined into composite or cluster
scores. If we rely on composite or cluster scores, without examining the
child's scores on the individual subtests, we can easily overlook obvious
deficiencies and significant strengths. Relying on composite or 'cluster'
scores can lead to faulty educational decision-making, having tragic
consequences for children. To advocate effectively, parents must obtain all
of the subtest scores on the tests that have been administered to
their child.
D. When Apparent Progress Means Actual Regression
One serious concern that many parents have relates to
the belief that their child is not making adequate progress in a special
education program. How can parents determine if their perception is accurate?
And, how can parents persuade school officials that the special education
program being provided to the child needs to be strengthened?
Earlier in this article, we discussed how statistics
can be used in medical treatment planning. We demonstrated how a medical
problem was identified and the efficacy of treatment measured, using objective
tests. In our example, the patient had pre- and post-testing as a means to
determine whether or not the intervention was working. Based on the results of
new testing, more medical decisions would be made -- to continue, terminate or
change the treatment plan.
This practice of measuring change, called pre- and
post-testing, has great relevance to educational planning. After the child's
performance level is identified, we can re-test the child later to measure progress,
regression, or whether the child is maintaining the same position within
the group.
In this way, pre- and post-testing enables us to
measure educational benefit (or lack of educational benefit). Using the scores
obtained from pre- and post-testing, we can create graphs to visually
demonstrate the child's progress or lack of progress in an academic area.
To see how this works, let's revisit our fifth grade
fitness class. According to our earlier testing in September, Erik completed
13 push-ups which placed him at the 84th percentile of all youngsters in his
class. After a year of fitness training, all of the fifth grade children were
re-tested. When Erik was re-tested, he completed 14 push-ups.
Question: Has Erik progressed?
Answer: Yes and no.
The average performance of the fifth grade class
improved by 2 push-ups (from an average raw score of 10 to an average raw
score of 12). Erik's raw score increased by 1 push-up, from 13 to 14.
So, we see that although Erik's age equivalent and grade equivalent scores
increased slightly from the prior testing, his actual position in the group
dropped from the 84th to about the 75th percentile level. While still ahead of
his peers, Erik did regress.
What about Sam? Sam's push-up performance also
improved, from a raw score of 7 to a raw score of 8. Although Sam's age
equivalent and grade equivalent scores increased slightly, he also regressed.
According to the new scores, his percentile rank dropped from the 16th
percentile to about the 9th percentile. Sam is continuing to fall further
behind his peer group.
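Erik's and Sam's pre- and post-test positions can be recomputed with the Python sketch below. It assumes the class scores form a bell curve both times and that the spread stays at 3 push-ups while the class average moves from 10 to 12; the article implies this but does not state it, so treat the standard deviation as our assumption.

```python
from statistics import NormalDist

def percentile_rank(raw_score, class_mean, sd=3):
    """Percentile rank of a raw score within the class (assumes a bell curve)."""
    return round(NormalDist(class_mean, sd).cdf(raw_score) * 100)

# (September score, re-test score); the class mean moved from 10 to 12.
results = {"Erik": (13, 14), "Sam": (7, 8)}

for name, (before, after) in results.items():
    pr_before = percentile_rank(before, class_mean=10)
    pr_after = percentile_rank(after, class_mean=12)
    print(f"{name}: raw {before} -> {after}, PR {pr_before} -> {pr_after}")
```

Each boy's raw score went up by one push-up, yet each percentile rank went down, because the group improved by two.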
Let's assume that we test Sam again when he re-enters
school in the fall. Now, we have three sets of test data (beginning 5th grade,
end 5th grade, beginning 6th grade). Has Sam's score changed? If his
percentile rank continues to drop, Sam is experiencing regression. We need to
know how long it will take for Sam to recoup the skills he lost during the
summer. Regression and recoupment are primary issues in determining the
child's legal need for extended school year services (ESY) during the summer.
E. Norm Referenced versus Criterion Referenced Tests
Most standardized tests are either norm referenced
or criterion referenced.
When we evaluated our sample group of fifth graders,
we compared each child's performance to the norm group of fifth
graders. Both Erik (raw score of 13, percentile rank of 84) and Sam (raw score
of 7, percentile rank of 16) were referenced or compared to this norm group
of fifth graders. To evaluate benefit, we looked at the norm group and the
individual child's relative position in that group at the time of the first
and second tests. We computed each child's change in position, i.e. progress
or regression.
In our example, we also referenced the criterion
of number of push-ups completed. A criterion referenced analysis
determines whether or not a child meets certain criteria (without reference to
a norm group). For example, at the beginning of the year, Sam completed 7
push-ups. If the criterion for success was 8 push-ups, then Sam failed to reach
that goal. Let's assume that Sam received a year of physical fitness
remediation; after that year, Sam completed 8 push-ups. Does Sam now meet
the criterion for success? The answer to this question depends on whether the
criterion has increased now that Sam is a year older.
Another factor complicates this picture. We know that
Sam's peer group completed 10 push-ups at the beginning of the year and 12 at
the end of the year. Definitions of success are affected by the passage of
time. If we rely on criterion referenced measures, we can be misled as to
whether the child is falling further behind the peer group. We need to know
exactly what the criterion is and what this means when the child is compared
to a norm group.
F. Standard Deviation
Percentile ranks are
computed by determining the mean score and the amount of variation
of all scores around the mean score. Are the scores bunched around the number
10 in a tight uniform distribution? Are the scores evenly distributed? Do they
peak and taper slowly as in our earlier bell curves, or do they bunch at the
ends, without any scores in the middle? In other words, is there a great variance,
with the scores spread over a wide range with two or more peaks, or is there a
normal bell curve distribution of scores?
On our push-up test, most of the 5th grade children
earned scores around 10 push-ups, with an even distribution above and below 10
push-ups. But, if one-half of the children completed 5 push-ups, one-fourth
completed exactly 14 push-ups, and the remaining one-fourth completed 16
push-ups, then the average or mean number of push-ups would still be 10.
One-half of the children would have scored above 10 and one-half below 10.
In this case, the scores are not evenly
distributed in a smooth curve above and below the score of 10. In fact, the
variance is very large and would present a highly unusual curve with a peak at
5, a drop to zero between 6 and 13, then a jump at 14, a drop at 15, another
jump at 16. This distribution of scores would not present a normal bell curve
distribution. Educational and psychological tests are designed to present
normal bell curve distributions with predictable patterns of scores.
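To see why the mean alone is not enough, the following Python sketch builds the lopsided class described above (half the children at 5 push-ups, one-fourth at 14, one-fourth at 16) and computes its mean and its spread. The list name lopsided is our own.

```python
from collections import Counter
from statistics import mean, pstdev

# 100 hypothetical children: half do 5 push-ups, a quarter 14, a quarter 16.
lopsided = [5] * 50 + [14] * 25 + [16] * 25

print("mean:", mean(lopsided))
print("standard deviation:", round(pstdev(lopsided), 2))
print("counts by score:", sorted(Counter(lopsided).items()))
```

The mean is still 10, but the standard deviation is about 5 rather than the 3 of our original class, and not one child actually scored near the mean.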
We simply need to know the mean and standard deviation
of the test. In most educational and psychological tests, the mean is 100 and
the standard deviation is 15. (Mean = 100, SD = 15) In most subtests, the mean
is 10 and the standard deviation is 3. (Mean = 10, SD = 3) Average scores do
not deviate far from the mean. As scores fall significantly above or below the
mean, they are referred to as being a certain value or distance from the mean,
e.g., 1 or 2 standard deviations from the mean.
In all tests, the mean is at 0 (zero) standard
deviations from the mean. The next marker on the bell curve is +1 and -1
standard deviations from the mean, followed by +2 and -2 standard deviations from the
mean. To interpret your child's test scores, you will need to know the test
instrument's mean score and standard deviation score.
Using our original push-up example, the mean
score was 10 push-ups and the standard deviation (SD) was 3 push-ups.
This push-up example is identical to the subtest scores in almost all
standardized educational and psychological testing.
REMEMBER:
With most subtest scores, the
mean is 10, and the standard deviation is 3.
One standard deviation above the mean is 10
plus 3, i.e. 10 + 3 = 13. One standard deviation below the mean is 10
minus 3; i.e. 10 - 3 = 7. One standard deviation above the mean always falls
at the 84 percent level (PR = 84); one standard deviation below the mean is
always at the 16 percent level (PR = 16). Two SD's above the mean is always at
the 98 percent level (PR = 98); and two SD's below the mean is always at the
2 percent level (PR = 2).

Looking at actual test scores, we may see that the
child scored "one standard deviation below the mean" on a particular
test or subtest. If the score is one standard deviation below the mean,
then the child's percentile rank is 16.
REMEMBER:
The subtest scores of most tests used with our children have a mean of 10 and
standard deviation of 3. If a child scores 7 on a subtest, this means that
the child scored at the 16th percentile. A subtest score of 13 means that
the child scored at the 84th percentile.
G. Standard Scores
One of the most difficult concepts for most parents to
grasp is that of standard scores. Since many educational test scores are given
in standard scores, it is essential for parents to understand what they mean.
At an IEP meeting, a parent may be told that the child
earned a standard score of 85 in one area, a standard score of 70 in another
area. Most parents are relieved when they get this news -- because they
believe that these numbers are similar to grades with 100 as the top score and
0 as the lowest. This is absolutely incorrect! Standard scores are NOT like
grades.
In standard scores, the average score or mean
is 100, with a standard deviation of 15. The average child will earn a
standard score of 100. If a child scores 1 standard deviation above the mean,
the standard score is 100 plus 15; i.e. 100 + 15 = 115. If the child scores 1
standard deviation below the mean, this is 100 minus 15, i.e. 100 - 15 = 85.
Since a standard score of 115 is 1 standard deviation
above the mean, it is always at the 84 percent level. Since a standard
score of 85 is 1 standard deviation below the mean, it is always at the
16 percent level. A standard score of 130 (+2 SD) is always at the 98
percent level. A standard score of 70 (-2 SD) is always at the 2
percent level.
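For readers who are comfortable with a little programming, the arithmetic above can be sketched in a few lines of Python. This is only an illustration, and it assumes the scores follow the bell curve exactly. The function name percentile_rank is our own invention for this example and does not come from any test manual.

```python
from statistics import NormalDist

def percentile_rank(score, mean=100, sd=15):
    """Approximate percentile rank of a score, assuming a bell curve.

    The defaults (mean 100, SD 15) describe standard scores.
    For subtest scores, pass mean=10 and sd=3.
    """
    z = (score - mean) / sd               # distance from the mean in SD units
    return 100 * NormalDist().cdf(z)      # percent of scores at or below z

# Standard scores from the text: 70 -> 2, 85 -> 16, 100 -> 50, 115 -> 84, 130 -> 98
for score in (70, 85, 100, 115, 130):
    print(score, round(percentile_rank(score)))

# Subtest scores use a mean of 10 and an SD of 3: 7 -> 16, 13 -> 84
for score in (7, 13):
    print(score, round(percentile_rank(score, mean=10, sd=3)))
```

The same function handles both kinds of scores because the only things that change are the mean and the standard deviation.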
Remember Katie? Earlier, we learned that on the
Wechsler Intelligence Scale, Katie earned a Full Scale IQ of 101. Later, we
saw that this score was misleading because Katie's Verbal IQ score was 114
while her Performance IQ score was 86. The psychologist found that Katie
scored 2 standard deviations above the mean on the Similarities subtest
of the Wechsler Intelligence Scale for Children, 3rd Revision (WISC-III). What
does this mean?
You are learning that a score of 2 standard deviations
above the mean places the child at the 98 percent level in the area being
measured. Since the Similarities subtest of the WISC-III measures intellectual
reasoning power, Katie's intellectual reasoning power is at the 98 percent
level.
The psychologist also found that Katie had a standard
score of 68 -- which was 2.5 standard deviations below the mean -- on
the spontaneous writing sample of the Test of Written Language (TOWL-III). Two
SD's below the mean is at the two percent level. With your new knowledge, you
know that Katie's ability to produce spontaneous writing samples was actually
lower than the one percent level.
When we first introduced Katie, we posed two
questions:
1. Do these two test scores
help to explain the academic problems Katie is having?
2. Do her test scores tell
us anything about her moodiness and her intense dislike of school?
Katie's intellectual reasoning ability places her at
the 98 percent level, in the top 2 percent of all youngsters her age. However,
her ability to convey her thoughts in writing is below the one percent level.
If Katie is very
bright but is unable to convey her knowledge to her teachers on written
assignments and tests, would you expect her to feel frustrated and stupid? Do
you question why, after years of frustration, Katie is angry, depressed and
now wants to quit school?
H. Rules to Remember
All educational and psychological tests that report
scores using percentile ranks or standard scores are based on the bell curve.
To interpret the test results, you should know the mean and the standard
deviation. The Wechsler, Woodcock-Johnson, Kaufman, and most other
standardized tests use this format.
Since most educational and psychological tests use standard
scores (SS) with a mean of 100 and a standard deviation of 15, a
standard score of 100 is at the 50% percentile rank (PR)
level. A standard score of 85 is at the 16% PR level. A standard score of
115 is at the 84% PR level.
Most educational and psychological tests use subtest
scores with a mean of 10 and standard deviation of 3. A subtest
score of 10 is at the 50% PR level. Subtest scores of 7 and 13 are at the
16% and 84% PR levels.
One half of all children fall above and one half of
all children fall below the mean, which is at the 50% level and is also
represented as a standard score of 100. A standard score of 100 = PR 50.
About two-thirds (68 percent) of all children are between +1 and -1
standard deviations from the mean.
About two-thirds of all children are between the 16% and
84% percentile ranks. (84 minus 16 = 68).
A score 1 SD below the mean (-1 SD) is at the 16% level. The mean
(0 SD) is at the 50% level. A score 1 SD above the mean (+1 SD) is at the 84% level.
A standard score of 85 is at the 16% level. An SS of
100 is at the 50% level. An SS of 115 is at the 84% level.
A score 2 SD below the mean (-2 SD) is at the 2% level. A score
2 SD above the mean (+2 SD) is at the 98% level.
A standard score of 70 is at the 2% level. A
standard score of 130 is at the 98% level.
A standard score of 90 is at the 25% level. A
standard score of 110 is at the 75% level.
One half of all children fall between the 75% level
and 25% level. (75-25 = 50).
One half of all children achieve standard scores
between 90 and 110.
Percentile ranks between 25% and 75% correspond to
standard scores between 90 and 110 -- and are usually considered
to be within the "average range."
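If you would like to check the "two-thirds" and "one half" rules for yourself, here is a minimal sketch in Python. It assumes a perfect bell curve with a mean of 100 and a standard deviation of 15. The helper name share_between is made up for this example.

```python
from statistics import NormalDist

# Standard scores: mean of 100, standard deviation of 15
scale = NormalDist(mu=100, sigma=15)

def share_between(low, high):
    """Percent of children expected to score between low and high."""
    return 100 * (scale.cdf(high) - scale.cdf(low))

within_one_sd = share_between(85, 115)   # about 68 percent, roughly two-thirds
average_range = share_between(90, 110)   # roughly 50 percent, about one half

print(round(within_one_sd), round(average_range))
```

The results are close to, but not exactly, the rounded figures in the rules above, because percentile ranks in conversion tables are rounded to whole numbers.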
V. Understanding Test Data
The results of most educational tests are reported
using standard scores. Parents must know how to convert standard scores
into percentile ranks. Using the table below and the bell curve above, you can
convert a standard score into a percentile rank. The earlier
push-up example used standard educational scores.
| Standard score | Subtest score | % rank | Standard score | Subtest score | % rank | Standard score | Subtest score | % rank |
|---|---|---|---|---|---|---|---|---|
| 145 | 19 | >99 | 104 | -- | 61 | 91 | -- | 27 |
| 140 | 18 | >99 | 103 | -- | 58 | 90 | 8 | 25 |
| 135 | 17 | 99 | 102 | -- | 55 | 89 | -- | 23 |
| 130 | 16 | 98 | 101 | -- | 53 | 88 | -- | 21 |
| 125 | 15 | 95 | 100 | 10 | 50 | 87 | -- | 19 |
| 120 | 14 | 91 | 99 | -- | 47 | 86 | -- | 18 |
| 115 | 13 | 84 | 98 | -- | 45 | 85 | 7 | 16 |
| 110 | 12 | 75 | 97 | -- | 42 | 80 | 6 | 9 |
| 109 | -- | 73 | 96 | -- | 39 | 75 | 5 | 5 |
| 108 | -- | 70 | 95 | 9 | 37 | 70 | 4 | 2 |
| 107 | -- | 68 | 94 | -- | 34 | 65 | 3 | 1 |
| 106 | -- | 66 | 93 | -- | 32 | 60 | 2 | <1 |
| 105 | 11 | 63 | 92 | -- | 30 | 55 | 1 | <1 |
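The pairing of subtest scores with standard scores follows from the two sets of means and standard deviations described earlier. The short Python sketch below shows the arithmetic. The function name subtest_to_standard is our own, and the sketch assumes the usual scales (subtest mean 10, SD 3; standard score mean 100, SD 15).

```python
def subtest_to_standard(subtest_score):
    """Convert a subtest score (mean 10, SD 3) to the equivalent
    standard score (mean 100, SD 15)."""
    z = (subtest_score - 10) / 3      # how many SDs from the subtest mean
    return round(100 + 15 * z)        # same distance on the standard score scale

# Each subtest point is worth 5 standard score points
for subtest in (19, 13, 10, 7, 1):
    print(subtest, subtest_to_standard(subtest))
```

A subtest score of 7, for example, is one standard deviation below the mean, so it lines up with a standard score of 85 and a percentile rank of 16.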
A. Other Tests: Means and Standard Deviations
Adding to the confusion about tests is the fact that
test scores are sometimes reported differently. For example, test scores may
be reported as "Z Scores." Z scores simply express a score in
standard deviation units, with a mean of zero and a standard deviation of one
(Mean = 0, SD = 1, instead of a mean of 100 and SD of 15 as we found with
standard scores).
If you know that a particular child earned a Z score
of -1, then you also know that the child's score was one standard deviation
below the mean, which is a percentile rank of 16. If you convert this score,
using the standard score format with a mean of 100 and a standard deviation of
15, you will see that a Z score of -1 is the same as a standard score of 85.
Another test format uses T Scores. With
T scores, the mean is 50 and each unit of standard deviation is equal to 10. A
T score of 60 is the same as a Z score of +1. A T score of 60 and a Z score of
+1 are equal to a percentile rank of 84. A T score of 70 is equal to a Z score
of +2, a standard score of 130, and a percentile rank of 98.
Another format uses Stanines ("standard nines"), a nine-point scale
that runs from 1 to 9. With Stanines, the mean is five and the standard deviation is 2.
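Because every one of these formats is just a different mean and standard deviation laid over the same bell curve, a Z score can be restated in any of them. The Python sketch below illustrates this. The function name describe_z is hypothetical, and the Stanine line uses a common approximation (round to the nearest whole Stanine, then limit the result to 1 through 9), not a rule taken from any particular test manual.

```python
from statistics import NormalDist

def describe_z(z):
    """Restate a Z score in the other score formats discussed above."""
    return {
        "standard_score": 100 + 15 * z,                    # mean 100, SD 15
        "t_score": 50 + 10 * z,                            # mean 50, SD 10
        "stanine": min(9, max(1, round(5 + 2 * z))),       # mean 5, SD 2, limited to 1-9
        "percentile_rank": round(100 * NormalDist().cdf(z)),
    }

for z in (-1, 0, 1, 2):
    print(z, describe_z(z))
```

For a Z score of +1, for example, the sketch gives a standard score of 115, a T score of 60, and a percentile rank of 84, matching the figures in the text.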
B. Specific Tests
Since tests are always in a state of change with new
versions being produced, we will not attempt to review and describe each test.
There are a number of parent-oriented publications that you can refer to.
You may also ask the examiner to photocopy relevant portions of the test
manual for you. Examiners cannot copy actual test questions for you, but may
be able to copy the instructions and explanations. The manual is your best source of
current test information.
Earlier in this article, you learned that both the
Verbal and Performance IQ scores are actually composites or averages of five
different subtests. Each of the separate subtests measures very different
abilities. Let's analyze Katie's subtest scores to see what else we can learn
from them.
|
Wechsler Intelligence Scale for Children, 3rd Ed. (WISC-III) |
|
Verbal Subtests |
Performance Subtests |
|
Information |
10 |
| |