Testing Oral Production
A. What Is Meant by Speaking a Second Language?
Speaking is a complex skill requiring the simultaneous use of a number of
different abilities, which often develop at different rates. Five components
are generally recognized in analyses of the speech process:
1. Pronunciation (including the segmental features, vowels and consonants, and
the stress and intonation patterns)
2. Grammar
3. Vocabulary
4. Fluency (the ease and speed of the flow of speech)
5. Comprehension (oral communication requires a subject to respond to speech
as well as to initiate it)
B. The Major Problem in Measuring Speaking Ability
The central difficulty is the lack of general agreement on what “good”
pronunciation of a second language really means: is comprehensibility to be the
sole basis of the judgement, or must we demand a high degree of both phonemic
and allophonic accuracy? Without such agreement, we cannot put much confidence
in oral ratings.
C. Types of Oral Production Tests
Most tests of oral production fall into one of the following categories:
1. Relatively unstructured interviews, rated on a carefully constructed scale
Scored Interview
The simplest and most frequently employed method of measuring oral proficiency
is to have one or more trained raters interview each candidate separately and
record their evaluations of the candidate's competence in the spoken language.
As with other types of highly subjective measures, the great weakness of oral
ratings is their tendency to have rather low reliability. Positive steps can be
taken to achieve a tolerable degree of reliability for the scored interview.
These are:
1. Providing clear, precise, and mutually exclusive behavioral statements for
each scale point
2. Training the raters for their tasks
3. Pooling the judgements of at least two raters per interview
Use of More Than One Scorer
The scoring of oral ability is generally highly
subjective. Even with careful training, a single scorer is unlikely to be as
reliable as one would wish. If two testers are involved in a (loosely defined)
interview, then they can independently assess each candidate. If they agree,
there is no problem. If they disagree, even after discussion, then a third
assessor may be referred to.
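The two-scorer procedure just described can be sketched in Python. This is only
an illustration under stated assumptions: the function name `pool_ratings`, the
tolerance of one scale point that counts as agreement, and the use of the median
once a third rating is available are all hypothetical choices, not part of the
source. The discussion between the two testers is not modelled.

```python
from statistics import mean, median
from typing import Optional


def pool_ratings(first: float, second: float,
                 third: Optional[float] = None,
                 tolerance: float = 1.0) -> Optional[float]:
    """Combine two independent ratings of one candidate.

    Returns the pooled score, or None when the two raters disagree
    and no third rating has been supplied yet.
    """
    # Assumption: ratings within `tolerance` scale points count as agreement.
    if abs(first - second) <= tolerance:
        return mean([first, second])
    # Disagreement: a third assessor has to be consulted first.
    if third is None:
        return None
    # Assumption: with three ratings, take the middle one.
    return median([first, second, third])
```

A result of `None` signals that the candidate's score is still open and a third
assessor should be brought in, after which the function can be called again with
the extra rating.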
2. Highly structured speech samples (generally recorded), rated according to
very specific criteria
As a rule, highly structured speech-sample tests are in several parts, each
designed to elicit a somewhat different kind of speech sample.
1. Sentence Repetition
The examinee hears and then repeats a series of short sentences.
Scoring procedure: the rater listens to the pronunciation of two specific
pronunciation points per sentence, marking whether or not each is pronounced in
an acceptable way.
Sentences                          Points to be rated
1. Jack always likes good food.    vowel contrast in good : food
2. We’ll be gone for six weeks.    vowel contrast in six : weeks
3. They’ve gone farther south.     voiced-voiceless fricatives in farther : south
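The acceptable/unacceptable marking for sentence repetition can be tallied as in
the following Python sketch. The class and function names are hypothetical, the
two points per sentence are assumed to be the two words of each contrast, and
the marks in the example are invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class RatedSentence:
    text: str
    points: tuple[str, str]        # the two pronunciation points to listen for
    acceptable: tuple[bool, bool]  # the rater's mark for each point


def repetition_score(items: list[RatedSentence]) -> tuple[int, int]:
    """Return (points pronounced acceptably, points rated)."""
    earned = sum(sum(item.acceptable) for item in items)
    possible = sum(len(item.points) for item in items)
    return earned, possible


# Invented marks for one examinee on the three example sentences.
items = [
    RatedSentence("Jack always likes good food.", ("good", "food"), (True, False)),
    RatedSentence("We'll be gone for six weeks.", ("six", "weeks"), (True, True)),
    RatedSentence("They've gone farther south.", ("farther", "south"), (False, True)),
]
earned, possible = repetition_score(items)  # (4, 6)
```

Keeping the mark for each point separate, rather than storing only a total, lets
the rater see later which contrasts caused the examinee trouble.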
2. Reading Passage
The examinee is given several minutes to read a passage silently, after which
he is instructed to read it aloud at normal speed and with appropriate
expression.
Scoring procedure: the rater marks two or more pronunciation points per
sentence and then makes a general evaluation of the fluency of the reading.
Examiner’s copy of the test                        Points to be rated
While Mr. Brown read his newspaper, his wife       primary stress
finished packing his clothes for the trip.         voiced final consonant (s)
The suitcase was already quite full, and she       vowel quality
was having a great deal of difficulty finding      primary stress
room for the shirt, socks, and handkerchiefs.      series intonation
Turning to her husband, she asked, “Are you        consonant cluster
sure you really want to go on this trip?”          intonation contour
“I’m sure,” replied Mr. Brown,                     intonation contour
“but how about you?”                               stress and pitch
3. Sentence Conversion
The examinee is instructed to convert or transform sentences in specific ways
(from positive to negative, statement to question, present tense to past). The
voice on the tape gives the sentences one at a time, and the examinee supplies
the conversion in the pause that follows.
Scoring procedure: the rater scores each converted sentence on the basis of
whether or not it is grammatically acceptable.
4. Sentence Construction
The voice on the tape asks the examinee to compose sentences appropriate to
specific situations.
Scoring procedure: the rater scores each sentence on an acceptable-unacceptable
basis.
Examples:
1. “You are trying to find the post office in a strange city. Ask a policeman
for directions.”
2. “You have telephoned your friend Mary, but her mother answers and tells you
that Mary is not at home. Ask her to leave a message for Mary to call you when
she comes home.”
5. Response to Pictorial Stimuli
The examinee is given time to study each of a series of pictures and then
briefly describes what is going on in each scene.
Scoring procedure: for each picture the rater gives a separate rating of the
examinee's pronunciation, grammar, vocabulary, and fluency.
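Because the picture task yields four separate ratings per picture, the results
for one examinee can be summarized as a profile per component. The Python sketch
below assumes a numeric rating scale and averages across pictures; the source
does not say how the separate ratings are combined, so the averaging and the
name `component_profile` are assumptions.

```python
from statistics import mean

COMPONENTS = ("pronunciation", "grammar", "vocabulary", "fluency")


def component_profile(picture_ratings: list[dict[str, int]]) -> dict[str, float]:
    """Average each component's rating across all pictures."""
    return {
        name: mean(rating[name] for rating in picture_ratings)
        for name in COMPONENTS
    }


# Invented ratings for two pictures on a hypothetical 1-5 scale.
ratings = [
    {"pronunciation": 3, "grammar": 4, "vocabulary": 4, "fluency": 2},
    {"pronunciation": 5, "grammar": 4, "vocabulary": 2, "fluency": 4},
]
profile = component_profile(ratings)
```

Reporting the four averages side by side, instead of collapsing them into a
single number, reflects the point made earlier that the component abilities
often develop at different rates.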
3. Paper-and-pencil objective tests of pronunciation, presumably providing
indirect evidence of speaking ability
Characteristic item types appearing in paper-and-pencil pronunciation tests:
1. Rhyme words. The examinee is first presented with a test word which he is
instructed to read to himself, after which he is to select the one word from
among several alternatives which rhymes with the test word.
Ex: 1. could rhymes with   a. blood   b. food    c. would
    2. plays rhymes with   a. case    b. raise   c. press
2. Word stress. The examinee is to decide which
syllable in each test word receives the heaviest stress.
Ex: 1. Frequently
2. introduce
3. develop
3. Phrase stress. The examinee is to decide which one of several numbered
syllables in each utterance would receive the heaviest stress.
Ex: 1. I know that Henry went to the movie, but where did John go?
2. I’m certain Professor Brown wants to see you, but he’s in class just now.
To have confidence in such paper-and-pencil tests of speaking ability, we would
need strong statistical evidence of their validity, that is, evidence that they
are really testing what they purport to test. We would need some trustworthy
external criterion, some reliable measure of how the subjects actually do
speak, and it is the lack of such a measure that is still the chief stumbling
block to all our efforts to evaluate oral production with real precision.
Testing specialists who have used the paper-and-pencil objective tests have
attempted to validate them by comparing test results with judges' evaluations
of the subjects' oral reading of the test items. Although we have been unable
to establish either the validity or the invalidity of these tests by rigorous
statistical methods, we can cite a number of observations which cast
considerable doubt on their efficacy. First, the users of such tests have
frequently observed that some students with superior pronunciation have done
poorly on the tests, while high scores have sometimes been obtained by students
who could barely be understood.
Secondly, one cannot help wondering about the technique of testing the
production of the segmental phonemes by means of rhyme items.
Thirdly, even a casual examination of the range of problems
treated in these tests inspires the strongest suspicions that they sample the
total sound system most inadequately.
Summary
1. The validity of paper-and-pencil objective techniques remains largely
unproven; such techniques should therefore be used with caution, and certainly
never as the sole measure of oral proficiency.
2. The technique of eliciting and rating highly structured speech samples shows
much promise, but such testing is still in the experimental stage and requires
very great test-writing skill and experience.
3. The scored interview, though not so reliable a measure as we would wish, is
still probably the best technique for use in relatively informal, small-scale
testing situations, and ways can be shown for substantially improving the
effectiveness of this testing device.
D. Improving the Scored Interview
General procedures
1. Decide in advance on interview methods and rating standards.
Interviewers should devote approximately the same length of time to each
interview, speak to the candidates at about the same rate of speed, and
maintain the same level of difficulty in the questions they ask. For the
scoring, the raters should be able to reach basic agreement on methods and
standards, ensuring a reasonable degree of uniformity.
2. Conduct the interviews in some quiet place with suitable acoustics.
Noise and poor acoustics will naturally impose an unfair burden on the
candidates and greatly reduce the reliability and the validity of the ratings.
3. Reserve sufficient time for each interview.
Ten to fifteen minutes would seem essential as the
minimum for each interview, though the time required will vary somewhat from
candidate to candidate.
4. Use at least two raters for each candidate.
At least two independent ratings are necessary if
satisfactory rater reliability is to be obtained.
5. Rate the candidate without reference to other test scores.
Candidates' scores on other tests should be withheld from the raters until
after they have completed their evaluations.
6. Record the rating after the interview.
Scoring should be done after the candidate leaves the room, and the next
examinee should not enter until the marking has been completed.
7. Obtain each candidate's final score by pooling or averaging the two or more
ratings that have been given him.
A candidate whose ratings differ markedly should be called back for a second
evaluation.
Suggestions for conducting the interview
1. Beginning the interview
The interviewer should begin with social questions and speak at normal
conversational speed. If the candidate cannot comprehend what is being said,
the interviewer should modify his speech somewhat, speaking more slowly and
with some simplification of sentence structure and vocabulary, while making a
mental note to score the candidate accordingly.
2. Continuing the interview
The interviewer should move on to other areas of
discussion and follow lines of questioning which the examinee has not been able
to anticipate.
3. Concluding the interview
Whatever the precise form that the conclusion takes, care
should be taken not to give the candidate the impression that he is being cut
off in the middle of a discussion.
CONCLUSION
The accurate measurement of oral ability is not easy. It
takes considerable time and effort to obtain valid and reliable results.
Nevertheless, where backwash is an important consideration, the investment of
such time and effort may be considered necessary.
Readers are reminded that the appropriateness of content,
descriptions of criterial levels, and elicitation techniques used in oral
testing will depend upon the needs of
individual institutions or organizations.
REFERENCES
Harris, David P. Testing English as a Second Language.
Hughes, Arthur. Testing for Language Teachers. Cambridge University Press.