
1. TOPIC
AN ANALYSIS OF ENGLISH FINAL TEST OF THE FIRST SEMESTER STUDENTS GRADE V MADE BY MGMP OF ENGLISH OF NATIONAL EDUCATION MINISTRY OF SEMARANG AND MGMP OF ENGLISH OF RELIGION MINISTRY OF SEMARANG

2. BACKGROUND OF THE STUDY


Evaluation is a term we often hear in daily life. We evaluate when we want to know the progress and results of what we have done. When we work, evaluation tells us whether the work has been done well and whether any obstacle is blocking our plan. One way of carrying out an evaluation is by making an assessment. Assessment is the narrower and more specific term, because it is part of evaluation: evaluation can be done through assessment, but it can also be done in other ways, such as observation or judgment of performance during the process. The term assessment is most familiar in education, where teachers, trainers, and other education practitioners use it to measure and analyze how far students understand the material taught. It is also used by employers to assess their employees and follow the progress of their work. For example, a director makes an assessment and appraisal to analyze his or her employees' work; a headmaster assesses teachers' work; and a teacher assesses and evaluates students' understanding and achievement.

To assess students' understanding and achievement on the material that has been taught, teachers usually give their students questions in the form of a test. Students have to give the correct answers related to the subject's material. The questions can take the form of an essay test, in which students write several sentences about the theories and their understanding of them. Alternatively, teachers can give multiple-choice questions as a simple check of students' understanding and grasp of the material. However, every form of assessment has its advantages and disadvantages, which teachers should know. It is the teachers' authority to choose the form of questions they give to students, based on which aspects of students' intelligence they want to know.

Assessing a language subject, in this case English as a foreign language, is a little different from assessing a non-language subject. A non-language subject is tested with questions and items in the native language, so what teachers want to know is the students' knowledge of the subject matter. Testing a language subject, by contrast, does not only examine knowledge about the subject; in practice it should cover the skills needed to master a foreign language, so that students can be called successful learners. Thus, in language testing the questions have to be able to measure learners' mastery of listening, speaking, reading, and writing in the foreign language. Of course, the skills they have to master are in line with the students' level of education. For example, at senior high school level, students should master at least two or three skills to a minimum standard. This means that even though they may not be able to speak communicatively in English or write well-organized texts, they at least understand a conversation when they listen to it and statements when they read them.

It can be assumed that what students should master at elementary school level is not the same as at senior high school. At this level, students can be said to have mastered English when they can memorize vocabulary related to certain themes and use it in context. This means that elementary school students can only understand and produce very simple sentences or paragraphs in the foreign language, both orally and in writing. Usually, however, teachers at this level focus only on students' vocabulary mastery. Because language testing in elementary school focuses on vocabulary mastery and its use in context, the test usually comes in multiple-choice format. Teachers prefer this format because it measures students' vocabulary mastery simply, although multiple-choice items can measure not only vocabulary but also other aspects such as grammar and language context. In addition, to evaluate other aspects, teachers usually combine multiple-choice questions with essay items. It is therefore the teachers' authority to use whatever format fits the purpose.

Sometimes, however, teachers cannot write the test items given to their students themselves, because the test pack has already been prepared by an institution that has the right to make it. In this case, all teachers can do is prepare the students for the test. In Indonesia, this applies to the National Examination for graduation at each education level and to the final test of every semester. The Ministry of National Education has the authority to make the test items, because every school in Indonesia has to follow its official regulations. In addition, the Ministry of Religion has the same authority to make test packs for some schools. The schools that follow the education rules of the Ministry of Religion are those with more religious subjects in their curriculum, in this case Islamic ones. These schools are therefore subject to two sets of education regulations: those of the Ministry of National Education and those of the Ministry of Religion.

A problem arises when there are two different test packs for the same grade at the same education level. The question is whether the test pack organized by the Ministry of National Education has the same quality and characteristics as the one arranged by the Ministry of Religion. If there are differences, for example if the test pack made by the Ministry of National Education is easier than the one made by the Ministry of Religion or vice versa, this is unfair to one side. Another possibility is that one of the test packs is not appropriate to the instructional material, in this case the Standard Competence and Basic Competence. Given these problems, the writer wants to analyze the English final semester test pack made by the MGMP of English of the Ministry of National Education and compare it with the one made by the MGMP of English of the Ministry of Religion. The writer hopes that the study will find no difference between the two test packs. If differences exist, the findings can be used to address them, so that the education systems of both ministries are in line with each other.

3. IDENTIFICATION OF THE PROBLEMS

The items of the two test packs can be analyzed quantitatively to find out whether they make up a good test. Such measurement covers validity, reliability, level of difficulty, discrimination power, and item distractors. Yet a further study of test items should not rely on quantitative analysis alone. Qualitative analysis can be used to evaluate several non-statistical aspects of the items. It covers the appropriateness of the test items to the materials of the teaching and learning process (school-based curriculum), test construction, and the language used in the items.
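As a rough illustration of two of the quantitative measures named above, the sketch below computes a difficulty index (the proportion of test-takers who answer an item correctly) and an upper-lower discrimination index from a matrix of 0/1 item scores. The function names, the sample data, and the 27% group fraction are assumptions made for this example only; the formulas actually applied in this study are those of its Research Method chapter and may differ.

```python
from typing import List

Scores = List[List[int]]  # one row per test-taker, one 0/1 column per item


def difficulty_index(scores: Scores, item: int) -> float:
    """Proportion of test-takers who answered the item correctly (0.0 to 1.0)."""
    return sum(row[item] for row in scores) / len(scores)


def discrimination_index(scores: Scores, item: int, fraction: float = 0.27) -> float:
    """Difference in proportion correct between upper and lower scoring groups.

    `fraction` sets the size of each group; 0.27 is an assumed, commonly
    used convention, not a value taken from this study.
    """
    ranked = sorted(scores, key=sum, reverse=True)  # highest total score first
    n = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:n], ranked[-n:]
    p_upper = sum(row[item] for row in upper) / n
    p_lower = sum(row[item] for row in lower) / n
    return p_upper - p_lower


# Hypothetical data: 4 test-takers, 3 items.
sample = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(difficulty_index(sample, 0))
print(discrimination_index(sample, 1, fraction=0.5))
```

An item with a difficulty index near 1.0 is answered correctly by almost everyone, while a discrimination index near 0 suggests that the item does not separate stronger from weaker test-takers.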

4. FORMULATION OF THE PROBLEMS

In order not to discuss something irrelevant, the writer has limited the discussion by focusing her attention on the following problems:
4.1 What is the quality of the English final tests for first semester students made by the MGMP of English of the Ministry of National Education and of the Ministry of Religion of Semarang in terms of validity, reliability, difficulty level, discrimination power, and item distractors?
4.2 How appropriate are those test items in terms of instructional materials (Standard Competence and Basic Competence), test construction, and language use?
4.3 What are the differences between the test items made by the MGMP of English of the Ministry of National Education and those made by the MGMP of English of the Ministry of Religion of Semarang?

5. OBJECTIVES OF THE STUDY

Based on the problems formulated above, this study has the following objectives:
5.1 To describe the quality of the English final tests for first semester students made by the MGMP of English of the Ministry of National Education and of the Ministry of Religion of Semarang in terms of difficulty level, discrimination power, validity, and reliability.
5.2 To describe the appropriateness of those test items in terms of instructional materials (Standard Competence and Basic Competence), test construction, and language use.
5.3 To explain the differences between the test items made by the MGMP of English of the Ministry of National Education and those made by the MGMP of English of the Ministry of Religion of Semarang.

6. SIGNIFICANCES OF THE STUDY

In line with its objectives, this study is expected to offer three major kinds of significance. The first is theoretical. The study may give teachers, educators, trainers, and others a basic understanding that assessment and evaluation cannot be based only on students' outward performance or, in some cases, on guessing. They should know that test items have to be constructed to evaluate students' understanding and ability. Tests are also useful for developing their professionalism as educators.

The second is practical. The study is beneficial for test makers as an additional reference in constructing and analyzing test items and their procedures. The last is pedagogical. The study provides English teachers, especially elementary school teachers, with meaningful and useful information for efficient class discussion of test results, the general improvement of classroom instruction, evaluation of the teaching and learning process, and improvement in test making.

7. LIMITATION OF THE STUDY

This limitation is stated to bound the research so that it does not go beyond what the researcher wants to discuss. The study is both quantitative and qualitative. It examines test items in the form of multiple-choice questions. The test will be analyzed with a quantitative approach, in which its statistical features are measured. A qualitative approach will also be used to check the tests against the Standard Competence and Basic Competence, test construction, and language use. The test items used are the English test packs of the first semester final test for Grade V of elementary school. The study covers only Grade V of elementary school because of the limited time available for the research.

8. REVIEW OF THE RELATED LITERATURE

8.1 EVALUATION IN EDUCATION

Evaluation is the notion that the value or worth of someone or something is to be judged. It may draw on tests, measurements, or other objective information (Nitko, 1983:7). More specifically, Tuckman (1975:12) says that evaluation is a process wherein the parts, processes, or outcomes of a program are examined to see whether they are satisfactory, particularly with reference to the program's stated objectives, our own expectations, or our own standards of excellence. Educational evaluation here means a way of examining, investigating, and appraising any aspect of the education field. It is a process which involves the production, application, and analysis of instruments of educational measurement (Nurulia, 2011:12). In a narrower sense, it is the technique a headmaster uses to evaluate teachers and, more commonly, the technique teachers use to evaluate their students' understanding of the material. This is close to Gronlund's statement (in Nurulia, 2011:12) that evaluation is a systematic process of determining the extent to which instructional objectives are achieved by pupils. Cronbach states that evaluation is the collection and use of information to make decisions about an educational program (1984:60, in Nurulia, 2011:13).

8.2 LANGUAGE TESTING AND ASSESSMENT

A test is a method of measuring a person's ability, knowledge, or performance in a given domain (Brown, 2004:3). In this statement, Brown highlights testing as a method by which people's intelligence and achievement are explored. Testing is an important method for checking requirements or competencies in fields such as medicine, law, sport, and government, because it examines the many aspects that test takers must fulfill before going further in those fields. In the teaching and learning process, however, testing is a little different from those kinds of tests.

People commonly think that assessment is the same as testing and treat the two terms as synonyms. Tests, however, are prepared administrative procedures that occur at identifiable times in a curriculum when learners muster all their faculties to offer peak performance, knowing that their responses are being measured and evaluated, whereas assessment is an ongoing process that encompasses a much wider domain (Brown, 2004:4). Bachman (2004:7) states that assessment is the process of collecting information about a given object of interest according to procedures that are systematic and substantively grounded. Assessment takes place whenever a student responds to a question, offers a comment, or tries out a new word or structure; in these situations the teacher subconsciously makes an assessment of the student's performance (Brown, 2004:4). In educational programs, the results of assessment are most commonly used to describe both the processes and outcomes of learning for the purposes of diagnosis or evaluating achievement, or to make decisions that will improve the quality of teaching and learning and of the program itself (Bachman, 2004:6).

Language tests offer us many choices in test administration, test format, materials, scoring method, and test items (http://www.cal.org/flad/tutorial/practicality/2methodoftesting.html). Language tests have the potential to help us collect information that will benefit a wide variety of individuals (Bachman, 2004:3). While Alderson and others have argued that testers have long been concerned with matters of fairness and that striving for fairness is an aspect of ethical behavior, others have separated the issue of ethics from validity, as an essential part of the professionalizing of language testing as a discipline (Davies, 1997).

Tests, then, are a subset of assessment. They are certainly not the only form of assessment that a teacher can make. Tests can be useful devices, but they are only one among many procedures and tasks that teachers can ultimately use to assess students (Brown, 2004:4). In short, a test is part of assessment, so assessment is wider than testing. Assessment can be understood as a part of the teaching and learning process, and both testing and assessment have to be used in teaching. Language assessment takes place in a variety of situations, including educational programs and real-world settings (Bachman, 2004:6).

8.3 TYPES OF ASSESSMENT AND TESTING

To look more closely at assessment, this sub-chapter explains its types and forms. There are two types of assessment, informal and formal (Brown, 2004:5). Informal assessment can take a number of forms, starting with incidental, unplanned comments and responses, along with coaching and other impromptu feedback to the student (Brown, 2004:5). In this type of assessment, teachers record students' achievement with techniques that are not systematically designed; they may simply keep in mind what students do in the classroom during learning activities. Formal assessments, by contrast, are exercises or procedures specifically designed to tap into a storehouse of skills and knowledge (Brown, 2004:5). Unlike informal assessment, formal assessment is made intentionally by the teacher to obtain students' scores and know their achievement. It follows official standards and rules, and it is conducted systematically and periodically (Brown, 2004:6). We can say that all tests are formal assessments, but not all formal assessment is testing (Brown, 2004:6).

Assessment in the classroom usually has one of two functions, formative or summative (Brown, 2004:6). Formative assessment evaluates students in the process of forming their competencies and skills, with the goal of helping them to continue that growth process (Brown, 2004:6). It usually takes place during the teaching and learning process, carried out by the teacher to learn directly about students' achievement and to build up their understanding and skills along the way. Assessment is formative when teachers use it to check on the progress of their students, to see how far they have mastered what they should have learned, and then use this information to modify their future teaching plans (Hughes, 2005:5). Summative assessment, then, aims to measure, or summarize, what a student has grasped, and typically occurs at the end of a course or unit of instruction (Brown, 2004:6). It is used at the end of the term, semester, or year in order to measure what has been achieved both by groups and by individuals (Hughes, 2005:5). Teachers use it to measure and evaluate what students have achieved in the teaching and learning process; final exams are an example. In short, formative assessment is done during the semester as part of teaching and learning, whereas summative assessment is done at the end of the semester.

The object of this study is the final test of the first semester, which is a formal assessment with a summative function. In Indonesia, a final semester test pack usually consists of three parts: multiple-choice items, short-answer questions, and essay items. Each item type has its own definition and characteristics, and different formulas and measurements can be applied to each. The following sub-chapters explain the characteristics of each item type.

8.4 MULTIPLE-CHOICE TEST

Multiple-choice items, which may appear to be the simplest kind of item to construct, are extremely difficult to design correctly (Brown, 2004:55). Multiple-choice items take many forms, but their basic structure is a stem, or the question itself, and a number of options, one of which is correct, the others being distractors (Hughes, 2005:75). The most obvious advantage of multiple choice is that scoring can be perfectly reliable (Hughes, 2005:75). Scoring is also rapid and economical, and it is possible to include more items than would otherwise be possible in a given period of time. These items are designed to elicit specific responses from the student (Valette, 1967:6). The technique allows the testing of receptive skills without requiring the test taker to produce written or spoken language, which contributes to greater reliability (Hughes, 2005:76). The principles that stand out in multiple-choice items are practicality and reliability (Brown, 2004:55). In Language Assessment: Principles and Classroom Practices, Brown sets out the basic terminology of multiple-choice items:
a. Multiple-choice items are all receptive, or selective, response items, in that the test-taker chooses from a set of responses rather than creating a response.
b. Every multiple-choice item has a stem, which presents a stimulus, and several options or alternatives to choose from.
c. One of those options, the key, is the correct answer, while the others serve as distractors.

Hughes (1989:76) thus summarizes the main advantages of multiple choice as reliable, rapid, and economical scoring, together with the possibility of including more items in a given period of time. On the other hand, Hughes also lists a number of weaknesses of multiple-choice items (Hughes, 2005:76-78): they test only recognition knowledge; test takers may arrive at the correct answer by guessing; test takers can cheat easily; the technique severely restricts what can be tested; it is very difficult to write successful items; and the answer is restricted to the options provided. Test-takers therefore cannot elaborate on their answer or their understanding of the material.

Multiple-choice items make up the first part of the test packs faced by test-takers. These items can be analyzed statistically, as described in the next chapter, Research Method. Since there is only one right answer, the scorer can very rapidly mark an item as correct or incorrect (Valette, 1967:6). Simple codes can thus be used to record test-takers' answers: 1 represents a correct answer and 0 an incorrect answer. If a student chooses the correct answer, it is recorded as 1; if the student chooses a wrong answer, it is recorded as 0. This is explained further in the quantitative analysis later.
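The 1/0 coding described above can be sketched as follows. The answer key, the student responses, and the function name `score_responses` are hypothetical and serve only to illustrate the coding; they are not taken from the test packs analyzed in this study.

```python
from typing import List


def score_responses(responses: List[str], key: List[str]) -> List[int]:
    """Code each answer as 1 (matches the key) or 0 (does not match)."""
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    return [
        1 if given.strip().upper() == correct.upper() else 0
        for given, correct in zip(responses, key)
    ]


# Hypothetical answer key and one student's answers for five items.
answer_key = ["A", "C", "B", "D", "A"]
student_answers = ["A", "B", "B", "D", "C"]

coded = score_responses(student_answers, answer_key)
print(coded, sum(coded))
```

Applying the same function to every test-taker would give a table of 0/1 codes, one row per student and one column per item, which is the usual starting point for statistical item analysis.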

8.5 SHORT-ANSWER ITEMS

After answering the multiple-choice items in the first part of the test pack, test-takers go on to the short-answer items in the next part. The questions are of the same kind, but in these items students are not given distractors. The answers are usually only one or two words, and they should be exactly correct. This item type usually occurs in listening and reading tests (Hughes, 1989:79). Short-answer items measure students' acquisition and comprehension of knowledge. They come in two kinds of format, free choice and fixed choice. There are two basic free-choice formats, the unstructured format and the fill-in or completion format. Fixed-choice formats include true-false, other two-choice, multiple-choice, and matching items (Tuckman, 1975:77).

The short-answer items in the English final semester test packs used in this study are items that students answer by writing a short, brief response. How do they differ from essay items? In essay items, students should explore and elaborate on their answer; for example, if the question is about structure and grammar, students usually have to fill in the blank with a complete sentence. In short-answer items, by contrast, the answer is usually no more than two or three words, which is why they are called short-answer items. Such an item may require a one-word answer, a brief response to a question, or the filling in of missing elements (Valette, 1967:8). The correct answer has been determined by the teacher, so students cannot elaborate on their answer.

Both free-choice and fixed-choice items have a previously determined correct response. In the free-choice type, the student is not given choices from which to select the correct response, as he or she is in the fixed-choice type (Tuckman, 1975:77). In these formats, measurement basically involves asking students a question that requires them to state or name specific information or knowledge (Tuckman, 1975:77). In this part of the test pack, short-answer items usually take the unstructured or the completion (fill-in) format. In the unstructured format, students answer with a word, phrase, or number. In the completion or fill-in format, students must likewise construct their own response rather than choose from options; it differs from the unstructured item by requiring that they fill in or complete a sentence from which a word or phrase has been omitted (Tuckman, 1975:79).


To ensure the objective nature of short-answer items, teachers must prepare a scoring system in advance (Valette, 1967:8). Teachers should decide in advance how much credit an answer receives when the expected word is misspelled. However, since a short answer usually consists of a single word, the same credit points can be used as for multiple choice: 1 represents a correct answer and 0 an incorrect answer. Marking with 1 and 0 is sufficient because the answer has been determined by the test makers and there are no options for test-takers to choose from.

8.6 ESSAY TEST ITEMS

In the English final test of elementary school, besides multiple-choice and short-answer items, there is one more technique presented to test-takers in the final semester test pack: the essay test. Unlike short-answer items, essay items need longer answers and deeper analysis. While short-answer items are a continuation of the multiple-choice items, essay items call for deep thinking about test-takers' knowledge and understanding of the material. In language testing, this may include students' understanding of language structure and culture. Essay items provide test-takers with the opportunity to structure and compose their own responses within relatively broad limits (Tuckman, 1975:111). They enable test-takers to demonstrate their ability to apply knowledge and to analyze, synthesize, and evaluate new information in the light of their knowledge (Tuckman, 1975:111). This kind of test is therefore better suited to measuring the depth of students' understanding. In Measuring Educational Outcomes: Fundamentals of Testing, Tuckman says that several abilities can be measured with essay tests: application, analysis, synthesis, evaluation, and combinations of these four (1975:111-123).

Essay questions intended to measure application must require students to use knowledge they have acquired to describe a way of dealing with a concrete situation. Thus, the item must present a concrete situation, one that can somehow be included in the reality of the students being tested and to which they can relate (Tuckman, 1975:112).

In analysis items, the questions do not contain a specific problem. The situation is one with which the students are presumably familiar and which contains elements, relationships, or organizational principles that can be analyzed (Tuckman, 1975:115). Unlike application items, which give students a familiar problem, analysis items should give students a situation that requires understanding of the organization of, and relationships between, several variables.

To measure synthesis, the item should present a problem to be solved. The problem should be outside the range of the familiar or the practical and require the production of a new and unique solution. Moreover, the particular problem itself must also be new to the students (Tuckman, 1975:117-118). Synthesis can thus also be interpreted as how creative students are in finding a solution to a problem and creating a new model different from what the teacher has taught them.


In evaluation items, the question contains two parts: that which is to be evaluated, and response instructions. The response instructions include information about the criteria to be used in the evaluation. In addition, an essay item measuring evaluation provides a general criterion for evaluating and a general response instruction to provide detailed support for one's evaluative position (Tuckman, 1975:122). Students then have to show their skill in evaluating a problem related to the material taught by the teacher.

Analysis, synthesis, and evaluation can all be combined in a single question. Giving students an object, an organization, or an occurrence and asking them to analyze its parts or workings is the first step. Evaluating the parts or workings is the second step, and redesigning or improving upon it through synthesis is the third step (Tuckman, 1975:123). A single essay item can thus be used to test several aspects of students' intelligence and understanding.

Several keywords can be used in preparing essay questions, including analyze, compare or contrast, describe, define, evaluate, explain, summarize, justify, outline, and identify.

The scoring of essay items is very different from the scoring of objective items such as multiple choice. In objective items, the score for each item is exact and the same from item to item. In essay items, the first step is to determine the ideal answer, even though there is no strictly correct or wrong answer. The ideal answer receives the highest score, and the further a student's answer departs from it, the lower the score. Teachers should then create an interval scale running from the lowest to the highest score for each item, as in the figure below:

[Figure: interval scale running from 0 (not ideal answer) to 10 (ideal answer)]

The interval scale can then be used to measure how far students understand the material: the higher the score, the better the understanding. Teachers have the authority to set the range of the interval scale between the not-ideal and the ideal answer. It can run from 0 to 10, like the scale above, or from 0 to 3 or 5, according to their preference. The range may also be decided by taking into account the score of every item in the test, from the multiple-choice items through the short-answer items to the essay items.

8.7 QUALITATIVE ANALYSIS

8.7.1 School-Based Curriculum (KTSP) Curriculum is a document of an official nature, published by a leading or central education authority in order to serve as a framework or a set of guidelines for the teaching of a subject area in a broad varied context (Celce-Murcia, 2000). A curriculum in a school context refers to the whole body of knowledge that children acquire in school (Richards, 2001:39). More specific, BSNP defines it as a set of plan and arrangement of objective, content, and lesson material, and also manner that is used as the guidance of learning activities to achieve the aim of education (2006:1751). In short, we can say that curriculum is the fundamental

19

guidelines for teachers to reach the aims of education in school. It is a groundbase teachers should know in conducting teaching learning process. School-Based curriculum is as the same as the terms curriculum has stated in the subchapter before. It is a revised-edition of curriculum of 2004 which is in Bahasa Indonesia said Kurikulum Berbasis Kompetensi (Competence-Based Curriculum). This curriculum firstly used in any educational institution since 2006. It is the way in which any school can create and make policy and rule about their educational programs. Teacher can create their own syllabus, teachinglearning process, and learning goal that are appropriate for students in their school. KTSP is operational curriculum that is arranged and applied in every educational unit (Jumadi, 2). It is because KTSP is created based on schools need and condition. In this way, schools in big city may have different curriculum from school in a small city. The arrangement of the content itself is regarding with cultural and social condition of the students of a school. In order that, students that are in different places and areas have their own learning achievement that appropriate with their natural life. Even though based on Government Rule 19, 2005 about Education National Standard, every school is mandated to develop KTSP based in Passing Competence Standard (SKL), and Content Standard (SI) and based on the guidance arranged by Education National Standard Board (BSNP). Government publishes General Guidance in arranging KTSP in order that an educational unit or a school that has ability can develop KTSP started in


academic year 2006/2007 (Jumadi, 1). A school is considered able to arrange and develop KTSP if it has tried applying the 2004 curriculum. Under Minister of National Education Regulation No. 24 of 2006, the arrangement of KTSP involves teachers, staff, and the School Committee, in the hope that KTSP will reflect the community's aspirations, the environmental situation and conditions, and the community's needs. This is why the curriculum is more democratic than those used in schools before. It allows the curriculum to be determined democratically so that it suits the community where the school is located, its finances, its human resources, and its other circumstances, so that each school's potential can be optimized and schools can compete with one another (Handayani, 2010). KTSP consists of the educational goals at the educational unit level, the curriculum structure and content at the educational unit level, the educational calendar, and the syllabus (Jumadi, 2). Sutrisno in Handayani (2010:22) states that, as both a concept and a program, KTSP has the following characteristics:
a. KTSP emphasizes students' achievement of competence. Students are guided to develop knowledge, understanding, ability, values, attitudes, and the desire to become skilled and independent persons.
b. KTSP is oriented toward the learning process and variety.
c. The learning process uses various approaches and methods.
d. Teachers are not the only source; other educational sources are included.
e. Assessment emphasizes both the process and the result of study in achieving a competence.

KTSP consists of two basic documents: the school documents and the contents. School documents here means any information about the school in which


KTSP is arranged, for example the introduction of KTSP; the vision, mission, and goals of the school; the curriculum structure and content; and the education calendar, which the school makes independently. The KTSP structure and content at the elementary level, as stated in the Content Standard, involves five groups of subjects:
1) Group of religion and morality subjects
2) Group of citizenship and personality subjects
3) Group of science and technology subjects
4) Group of aesthetics subjects
5) Group of athletics and health subjects
(BSNP, 2006)

The goal of education at the elementary level is to lay the foundation of intelligence, knowledge, personality, noble character, and the skills to live independently and to continue to a higher level of education (Jumadi, 3). The education calendar is made by the school autonomously, following the guidance on education calendars established by the national education department. The other basic document relates to the particular subjects taught in the school. For every subject, the material consists of:
a. The syllabus and lesson plan for the Competence Standard and Basic Competence developed by the central government;
b. The syllabus and lesson plan for the Competence Standard and Basic Competence developed by the school (local-content subjects).
(Handayani, 2010)

English at the elementary level is still an extra lesson, since the subject is considered less important than other local-content lessons. Yet English is now


becoming an important subject in the globalization era. The goal of the English subject at the elementary level is to produce students who can develop oral communication competence, limited to language accompanying action in the school context, and who are aware of the nature and importance of English for increasing national competitiveness in the globalization era. Below, the writer presents the competence standards and basic competences of the English lesson for grade V, semester I, that relate to this study.

Listening
Competence Standard:
1. Students are able to understand very simple instructions with actions in the school context.
Basic Competence:
1.1 Students are able to respond to very simple instructions with logical actions in the class and school context.
1.2 Students are able to respond to very simple instructions verbally.

Speaking
Competence Standard:
2. Students are able to express very simple instructions and information in the school context.
Basic Competence:
2.1 Students are able to hold a very simple conversation accompanying action, involving the speech acts of giving an example of an action, giving a command, and giving an instruction.
2.2 Students are able to hold a very simple conversation to ask for and/or give something appropriately, involving the speech acts of asking for and giving help, and asking for and giving things.
2.3 Students are able to ask for and give information, involving the speech acts of introducing, inviting, asking for and giving permission, agreeing and disagreeing, and prohibiting.
2.4 Students are able to express politeness using the expressions "Do you mind" and "Shall we".

Reading
Competence Standard:
3. Students are able to understand English written texts and descriptive texts with pictures in the school context.
Basic Competence:
3.1 Students are able to read aloud with correct stress and intonation, involving words, phrases, and simple sentences.
3.2 Students are able to understand simple sentences, written messages, and descriptive texts with pictures accurately.

Writing
Competence Standard:
4. Students are able to spell and rewrite simple sentences in the school context.
Basic Competence:
4.1 Students are able to spell simple sentences accurately and correctly.
4.2 Students are able to copy and write very simple sentences accurately and acceptably, such as compliments, felicitations, invitations, and expressions of gratitude.

8.7.2 Syllabus

A syllabus is the lesson plan for a subject, a subject group, or a certain theme, and includes the competence standard, basic competence, learning material, learning activities, indicators, scoring, time allocation, and sources (Jumadi, 2). A syllabus is part of the curriculum. It can be defined as the systematic and specific content of the curriculum that teachers can apply in their teaching. Teachers can find the goals, process, and objectives of their teaching in it. Richards states that a syllabus is a specification of the contents of a course of instruction and lists what will be taught and tested (2001:2). A syllabus has to be in line with the curriculum because it is made based on the curriculum. BSNP defines it as a learning plan for one lesson, a group of lessons, or a certain theme, which covers the Competence Standard, Basic Competence, main learning material, learning activities, indicators, assessment, time allocation, and learning sources, materials, and tools (2006:1751). A syllabus can be developed by a teacher autonomously, by a group of teachers from several schools, by the subject teachers' forum (MGMP), or by the education office (Jumadi, 7). In elementary school, the teachers of grades I to VI can usually arrange the syllabus together. A school which cannot arrange and develop it autonomously should join other schools through the MGMP forum to develop it. We can see from this definition that a syllabus contains specific guidelines for teachers about what they have to do as educators. It covers not only what they have to do, but also how, when, and by what means they have to do it as professional educators. If teachers teach without following it, the objectives of their teaching may drift away from the national education goals. To make clear which syllabus will be used in the qualitative analysis of this study, the writer presents here the syllabus of the English lesson for grade V, semester I. It is important as the basic guidance for the teaching and learning process and for assessing it.

8.8 ITEM ANALYSIS DATA (QUANTITATIVE ANALYSIS)

8.8.1 Validity


Test validity refers to whether a test measures what we intend it to measure (Tuckman, 1975:229). Validity is an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment (Messick in Bachman, 2004:259). The objective of many tests is to measure the effect of certain experiences that have occurred prior to the test (Tuckman, 1975:229). A test, then, is used to monitor or assess an experience that has already occurred or to determine students' learning based on that experience (Tuckman, 1975:229). When selecting a test, it is important to make sure that the information provided by the test is sufficient for the decisions we are going to make based on the test scores (http://www.cal.org/flad/tutorial/validity/4testuse.html).

Brown says that the most important principle of a test is validity (2004:22). It is the extent to which inferences made from assessment results are appropriate, meaningful, and useful in terms of the purpose of the assessment (Gronlund, 1998:226). In some cases, it may be appropriate to examine the extent to which a test calls for performance that matches that of the course or unit of study being tested (Brown, 2004:22).

Two types of validity are most relevant to classroom tests: face validity and content validity (Brown, 2002:26). Face validity refers to the appearance of a test: whether it looks like it measures what it is supposed to measure. Mousavi (in Brown, 2004:26) states that face validity refers to the degree to which a test looks right and appears to measure the knowledge or abilities it claims to measure, based on the subjective judgment of the examinees who take it, the administrative personnel who decide on its use, and other psychometrically unsophisticated observers. In other words, face validity concerns how the test presents itself to test-takers: whether it looks good or bad to them and how they feel when the test-pack is given to them. Hughes states that a test is said to have face validity if it looks as if it measures what it is supposed to measure (1989:33). Brown (2004:27) states that face validity will likely be high if learners encounter:
a. A well-constructed, expected format with familiar tasks
b. A test that is clearly doable within the allotted time limit
c. Items that are clear and uncomplicated
d. Directions that are crystal clear
e. Tasks that relate to their course work
f. A difficulty level that presents a reasonable challenge

The aspects of the test-packs relevant to this study concern the appearance of the question sheet. One is the font used in the test-pack and whether it is easy or difficult to read. If the test-pack contains pictures, it should also be checked whether the pictures are clear enough. The most important aspect to examine is the arrangement of the test items, including the vocabulary, phrases, and sentence arrangement of the test. If all of the features


mentioned above are well organized, the test-takers will feel confident facing and answering the test-pack. A test which does not have face validity may not be accepted by candidates, teachers, education authorities, and employers (Hughes, 1989:33), because such a test does not appear standardized or appear to measure what it should. In contrast to face validity, a claim of content validity requires affirmation from an expert. The expert should look into whether the test content is representative of the skills that are supposed to be measured. This involves checking the consistency between the syllabus content, the test objectives, and the test contents. If the test contents cover the test objectives, which in turn are representative of the syllabus, the test can be said to possess content validity (Brown, 2002:23-24). A test is said to have content validity if its content constitutes a representative sample of the language skills, structures, etc. with which it is meant to be concerned (Hughes, 1989:26). This means that a test has content validity if its items match what teachers want to measure. If teachers want to test students' understanding of grammar and structures, the test should stay close to that topic and not stray from it. The importance of content validity is that the greater a test's content validity, the more likely it is to be an accurate measure of what it is supposed to measure (Hughes, 1989:27). Another reason is that areas that are not tested are likely to become areas ignored in teaching and learning. This means that teachers should set test items based on what they have taught their students.

8.8.2 Reliability


Reliability refers to the consistency of test results. A reliable test is consistent and dependable (Brown, 2004:20). Reliability has to hold across several aspects of conducting a test. First, a test should be reliable with respect to the students as test-takers. Bachman (2004:153) states that reliability is consistency of measures across different conditions in the measurement procedures. The most common learner-related issues in reliability are caused by temporary illness, fatigue, or anxiety in facing the test (Brown, 2004:21). Besides that, a test must have rater reliability. Rater reliability is the principle that the scoring process should match the testing and assessment; the scoring process must be standardized. Unreliability may also result from the conditions in which the test is administered (Brown, 2004:21). No measurement instrument or procedure is perfect (Tuckman, 1975:253). Neither a mechanical device such as a voltmeter nor a human device such as a test gives a result that is a perfect reflection of the property being measured (Tuckman, 1975:253). Test administration must also be reliable so that the test runs successfully and in a well-organized way. Bad administration and unplanned arrangements can spoil even a well-prepared test.

8.8.3 Level of Difficulty (Item Facility)

A good test is neither too easy nor too difficult for students. Its options should be plausible answers that students might reasonably choose. Very easy items build some affective feelings of success among lower-ability students and serve as warm-up items, and very difficult items can provide a challenge to the highest-ability students (Brown, 2004:59). A test that is too easy will not


stimulate students to work on it, and a test that is too difficult will discourage students from trying to find the answer (Arikunto, 2006:207). Level of difficulty, which Brown (2004:58) calls item facility, is the extent to which an item is easy or difficult for the proposed group of test-takers. Students come to recognize and remember the character of a teacher's tests if the tests they receive are always too easy or too difficult. Thus, a test should be standard and fulfill the characteristics of a good test. The number that shows the level of difficulty of an item is called the difficulty index (Arikunto, 2006:207). The index has a minimum and a maximum value. The lower the index, the more difficult the item is; conversely, the higher the index, the easier the item is. There are some factors that every test constructor must consider in setting the difficulty level of test items. Mehrens and Lehmann point out that the decision of how difficult a test should be depends on a variety of factors, notably 1) the purpose of the test, 2) the ability level of the students, and 3) the age or grade of the students.

8.8.4 Discrimination Power (Item Discrimination)

Discrimination power describes how well the items perform in separating the better students from the poorer ones (Nurulia, 2010:53). It is the extent to which an item differentiates between high-ability and low-ability test-takers. Discrimination is important because the more discriminating the items are, the more reliable the test will be (Hughes, 1989:226). It is defined as the ability of a test to separate master students from non-master students (Arikunto, 2006:211). A master student is a student with higher


scores on the test, and a non-master student is a student with lower scores on the test. As with difficulty level, discrimination has an index, the discrimination index. It is an indicator of how well an item discriminates between weak candidates and strong candidates (Hughes, 1989:226). The index measures the ability of an item to distinguish the upper group of students from the lower group. The upper group consists of the students with higher total scores, and the lower group of those with lower total scores. Unlike the difficulty index, the discrimination index can be negative. A negative value shows that the item makes master students look weak and non-master students look strong. A good question is one that the upper group can answer correctly and the lower group cannot. If a question is answered correctly by both the upper and the lower group, or cannot be answered correctly by either group, it is a bad item because its discrimination index is 0. The higher the discrimination index, the better the item discriminates. The theoretical maximum discrimination index is 1. An item that does not discriminate at all (weak and strong test-takers perform equally well on it) has a discrimination index of zero (Hughes, 1989:226). An item on which high-ability students (master students) and low-ability students (non-master students) score equally well would have poor item discrimination (ID) because it does not discriminate between the two groups. Conversely, an item that garners correct responses from most of the high-ability group and incorrect responses from most of the low-ability group has good discrimination power (Brown, 2004:59).


8.8.5 Answer of Questions Form (Item Distractors)

In addition to calculating discrimination indices and facility values, it is necessary to analyze the performance of distractors (Hughes, 1989:228). Distractor analysis looks at how test-takers are distributed across the optional answers (distractors) in multiple-choice questions (Arikunto, 2006:219). This analysis is as important as the others, given nearly 50 years of research showing a relationship between the distractors students choose and total test score (Nurulia, 2010:57). It is carried out by counting, from the students' answer sheets, the number of test-takers who choose each distractor. A distractor is good if it is chosen by at least 5% of the test-takers. One way to study responses to distractors is with a frequency table that tells us the proportion of students who selected a given distractor. Distractors selected by few or no students should be removed or replaced, because students find them implausible (Nurulia, 2010:57). Distractors that do not work, for example those chosen by very few test-takers, should be replaced by better ones, or the item should be otherwise modified or dropped (Hughes, 1989:228). Such distractors do not contribute to the test's ability to discriminate the good students from the poor students (Nurulia, 2010:57).
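The frequency-table check described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `distractor_report`, the four-option layout, and the answer data are hypothetical and are not taken from the test-packs; only the 5% threshold comes from the text.

```python
from collections import Counter

def distractor_report(responses, options, key, threshold=0.05):
    """Return {option: (proportion, functioning)} for the distractors of one item.

    responses: option letters chosen by the test-takers for this item
    options:   all option letters offered, e.g. "ABCD"
    key:       the correct option, which is left out of the report
    threshold: minimum proportion of test-takers (5% in the text above)
    """
    total = len(responses)
    if total == 0:
        raise ValueError("no responses to analyze")
    counts = Counter(responses)
    report = {}
    for option in options:
        if option == key:
            continue  # the key is not a distractor
        proportion = counts[option] / total
        report[option] = (proportion, proportion >= threshold)
    return report

# Hypothetical answers of 40 test-takers to one item whose key is "B".
answers = ["B"] * 25 + ["A"] * 8 + ["C"] * 6 + ["D"] * 1
report = distractor_report(answers, "ABCD", "B")
for option, (proportion, functioning) in report.items():
    status = "functioning" if functioning else "replace or revise"
    print(option, f"{proportion:.3f}", status)
```

In this made-up item, option D is chosen by only one of 40 test-takers, which falls below the 5% threshold, so it would be flagged for replacement.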

9. RESEARCH METHOD

9.1 THE RESEARCH HYPOTHESIS


Hatch (1982:3) states that a hypothesis is a tentative statement about the outcome of the research. In line with Hatch, Best says that a hypothesis is a tentative answer to a question (1977:26). In general terms, it can be described as the researcher's prior assumption about the result of the study. Furthermore, Best states that a statistical hypothesis should be stated in negative or null form. In this research, the hypothesis is that the students of both SDIT Al Kamila Semarang and MI Darus Saadah Semarang will get the same score on each of the test-packs used by the two schools. This follows from the assumption that both test-packs are of the same degree in their quantitative and qualitative aspects.

9.2 OBJECT AND SUBJECT OF THE STUDY

The object of this study is the multiple-choice test items in the English subject for Grade V of elementary school. The test items used are compared between the test-pack made by the Ministry of National Education and the one made by the Ministry of Religion. The two test-packs are compared because in Indonesia there are two ministries that deal with formal education and deliver formal tests from elementary to senior high school. The writer therefore wants to compare the quality of the two test-packs in terms of their statistical and non-statistical features. The two test-packs actually contain not only multiple-choice questions but also brief-response and essay questions. However, to keep the discussion from becoming too broad, the writer focuses only on analyzing the multiple-choice items.

The two test-packs of multiple-choice questions are then given to two different classes. One class is from a non-Islamic state elementary school, in this study SDIT Al Kamila Semarang, and the other is from an Islamic private elementary school, in this case MI Darus Saadah Semarang.

9.3 POPULATION AND SAMPLE

9.3.1 Population

The population of the study is the multiple-choice test items taken from the English final test for Grade V of elementary school and the Grade V students who will be given the test.

9.3.2 Sample

From the population above, we take the sample: the multiple-choice English final test of the first semester of academic year 2011/2012 for Grade V, and the Grade V students of SDIT Al Kamila Semarang and MI Darus Saadah Semarang in the same academic year.

9.4 RESEARCH DESIGN AND INSTRUMENT

9.4.1 Research Design

Bachman (2004:3) states that much of the data obtained from language assessment is quantitative, and that statistics is a set of logical and mathematical procedures for analyzing quantitative data. Thus, this study uses a quantitative method. However, the writer needs more than mathematical measurement to analyze multiple-choice tests, so she also uses a qualitative approach. The quantitative approach is used to measure the statistical features of the test items, such as


their validity, reliability, difficulty level, and discrimination power. To measure these, there are several formulas that will be presented in the next sub-chapter. In addition, the qualitative approach is used to check whether or not the test items are appropriate to the Standard and Basic Competence that the teaching and learning process uses as its fundamental instruction. In the qualitative approach, the language used in the test items will also be analyzed to judge whether it is good enough.

9.4.2 Instruments/ Unit analysis

The instruments used in this study are two test-packs. Each consists of multiple-choice, short-answer, and essay items. The two test-packs are taken from the English Final Test used by SDIT Al Kamila Semarang and MI Darus Saadah Semarang. Each test-pack will be given to one grade V class of SDIT Al Kamila Semarang and one of MI Darus Saadah. The two test-packs come from different institutions. The one used in SDIT Al Kamila is made by MGMP English of the Ministry of National Education Semarang. The other, used in MI Darus Saadah, is made by MGMP English of the Ministry of Religion Semarang. Thus, the grade V students will receive and answer two different test-packs. This is done to see whether there are any differences between the two test-packs. The instruments and units of analysis are:
1. Test items made by MGMP of English of National Education Ministry of Semarang and MGMP of English of Religion Ministry of Semarang, in the form of both multiple-choice and essay tests.
2. Students' scores on these formative tests.
3. Cards of item analysis which map the appropriateness of the test items to the material in the syllabus, the test construction, and the effectiveness of the language used.

9.5 METHOD OF COLLECTING DATA

9.5.1 Collecting method

In the collecting method, the writer collects two different test-packs, from which the multiple-choice items are taken as the object of the study. The two test-packs are taken from two different schools. The first is taken from SDIT Al Kamila Semarang, which falls under the Ministry of National Education and Culture. The other is taken from MI Darus Saadah, which falls under the Ministry of Religion since its curriculum covers religious subjects, especially Islamic ones, in depth. Both test-packs are obtained from the English teachers who teach in those schools.

9.5.2 Testing method

After the test items have been collected, the tests are given to the test-takers, in this case the grade V students of the two schools. Every class is given both tests, the one made by MGMP English of the Ministry of National Education and the one made by MGMP English of the Ministry of Religion of Semarang, to obtain comparable results and data for each test-pack. The diagram below explains the testing process:

X1 <-- Y1 --> X2
X1 <-- Y2 --> X2

Note:
Y1 : Final Semester Test-Pack made by MGMP English of Ministry of National Education Semarang
Y2 : Final Semester Test-Pack made by MGMP English of Ministry of Religion Semarang
X1 : SDIT Al Kamila Semarang
X2 : MI Darus Saadah Semarang

Y1, the test-pack made by MGMP English of the Ministry of National Education, will be given to the grade V students of both SDIT Al Kamila and MI Darus Saadah. Likewise Y2, the test-pack made by MGMP English of the Ministry of Religion of Semarang, will be given to the students of both schools.

9.6 METHOD OF ANALYZING DATA

9.6.1 Quantitative Analysis

Quantitative analysis deals with measuring the statistical features of the test items: validity, reliability, level of difficulty, discrimination power, and item distractors.

9.6.1.1 Validity

To determine the validity of each test item, we can use the product-moment formula below:

\[ r_{xy} = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\{N\sum X^2 - (\sum X)^2\}\{N\sum Y^2 - (\sum Y)^2\}}} \]

(Arikunto, 2006:72; Bachman, 2004:86; Tuckman, 1978:163)

Note:
\(r_{xy}\) = correlation coefficient between variables X and Y
\(N\) = number of test-takers
\(\sum X\) = sum of the item scores
\(\sum Y\) = sum of the total scores
\(\sum XY\) = sum of the products of item score and total score
\(\sum X^2\) = sum of the squared item scores
\(\sum Y^2\) = sum of the squared total scores

At the 5% significance level, if the calculation gives \(r_{measured} \ge r_{table}\), the test item is significant, or valid. If \(r_{measured} < r_{table}\), the test item is not significant, or not valid.

9.6.1.2 Reliability

Reliability is constancy. A test can be said to be reliable if it gives the same result whoever the test-takers are and whenever it is given. To measure reliability we can use the K-R 20 (Kuder-Richardson) formula as follows:


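\[ r_{11} = \left(\frac{k}{k-1}\right)\left(\frac{S^2 - \sum pq}{S^2}\right) \]

(Arikunto, 2006:100; Bachman, 2004:164)

Note:
\(r_{11}\) = reliability
\(p\) = proportion of subjects who answer the item correctly
\(q\) = proportion of subjects who answer the item incorrectly (\(q = 1 - p\))
\(k\) = number of items
\(\sum pq\) = sum of the products of p and q
\(S\) = standard deviation of the test (\(S^2\) is the variance)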
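The K-R 20 calculation above can be illustrated with a small Python sketch. It assumes that every item is scored 0 or 1 and that the variance is the population variance (sum of squared deviations divided by the number of test-takers); the function name `kr20` and the score matrix are hypothetical.

```python
def kr20(scores):
    """K-R 20 reliability for a matrix of 0/1 item scores.

    scores: one row per test-taker, one column per item.
    Assumption: S^2 is the population variance of the total scores.
    """
    n = len(scores)          # number of test-takers
    k = len(scores[0])       # number of items
    totals = [sum(row) for row in scores]
    mean_total = sum(totals) / n
    variance = sum((t - mean_total) ** 2 for t in totals) / n
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n   # proportion answering correctly
        sum_pq += p * (1 - p)                   # q = 1 - p
    return (k / (k - 1)) * ((variance - sum_pq) / variance)

# Hypothetical scores of five test-takers on four items.
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
r11 = kr20(data)
print(round(r11, 2))
```

For this made-up matrix the total scores have variance 2 and the sum of pq is 0.8, so the coefficient works out to (4/3)(1.2/2) = 0.8.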

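The variance \(S^2\) is computed from the total test scores. The reliability of essay test items can be measured using the Alpha formula below:

\[ r_{11} = \left(\frac{n}{n-1}\right)\left(1 - \frac{\sum \sigma_i^2}{\sigma_t^2}\right) \]

Note:
\(r_{11}\) = test reliability
\(\sum \sigma_i^2\) = sum of the variances of each test item
\(\sigma_t^2\) = variance of the test (total scores)
\(n\) = number of test items
(Arikunto, 2006:178)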
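As a minimal sketch of the Alpha formula above, the following Python code computes the coefficient for essay items scored on a numeric scale. The helper names and the scores are hypothetical, and the variances are assumed to be population variances (divided by the number of test-takers).

```python
def population_variance(values):
    """Variance with the number of values as divisor (an assumption here)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def alpha_reliability(scores):
    """Alpha reliability for a matrix of item scores.

    scores: one row per test-taker, one column per essay item.
    """
    n_items = len(scores[0])
    item_variances = [
        population_variance([row[j] for row in scores]) for j in range(n_items)
    ]
    total_variance = population_variance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical scores of four test-takers on three essay items.
essay_scores = [
    [4, 3, 5],
    [2, 2, 3],
    [3, 3, 4],
    [1, 1, 2],
]
alpha = alpha_reliability(essay_scores)
print(round(alpha, 3))
```

With these made-up scores the item variances sum to 3.1875 and the total variance is 9.1875, giving a coefficient of 48/49, or about 0.980.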

The classification of item reliability is:
0.00 < r11 ≤ 0.20 : very low
0.20 < r11 ≤ 0.40 : low
0.40 < r11 ≤ 0.60 : medium
0.60 < r11 ≤ 0.70 : high
0.70 < r11 ≤ 1.00 : very high

At the 5% significance level, if the calculation gives \(r_{11} \ge r_{table}\), the test instrument is significant, or reliable. If \(r_{11} < r_{table}\), the test instrument is not significant, or not reliable.

9.6.1.3 Level of difficulty

The number that shows how difficult or easy a test item is, is known as the difficulty index. The formula used to measure it is:

\[ P = \frac{B}{JS} \]

(Arikunto, 2006:208; Brown, 2004:59)

Note:
\(P\) = level of difficulty
\(B\) = number of test-takers answering the item correctly
\(JS\) = number of test-takers responding to that item

The classification of level of difficulty is:
P = 0.00 : the test item is too difficult
0.00 < P ≤ 0.30 : the test item is difficult
0.30 < P ≤ 0.70 : the test item is medium
0.70 < P < 1.00 : the test item is easy
P = 1.00 : the test item is too easy
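The difficulty index and its classification can be sketched in Python as below. The function names and the example counts are hypothetical; the band limits follow the classification given above.

```python
def item_facility(correct, responding):
    """P = B / JS: share of responding test-takers who answered correctly."""
    if responding <= 0:
        raise ValueError("at least one test-taker must respond to the item")
    return correct / responding

def classify_difficulty(p):
    """Map a difficulty index onto the bands listed in the text."""
    if p == 0.0:
        return "too difficult"
    if p <= 0.30:
        return "difficult"
    if p <= 0.70:
        return "medium"
    if p < 1.0:
        return "easy"
    return "too easy"

# Hypothetical item: 18 of 30 test-takers answered correctly.
p = item_facility(18, 30)
print(p, classify_difficulty(p))
```

An item answered correctly by 18 of 30 test-takers has P = 0.6 and falls in the medium band, which is also inside the 0.15 to 0.85 range that Brown regards as generally appropriate.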


There is no absolute P value that must be met to determine whether an item should be included in the test as is, modified, or thrown out, but appropriate test items will generally have Ps that range between 0.15 and 0.85 (Brown, 2004:59).

9.6.1.4 Discrimination Power

Test discrimination power is the ability of an item to distinguish smart test-takers (high ability) from less smart test-takers (low ability) (Arikunto, 2006:211). The number that shows the degree of discrimination power is known as the discrimination index. In this report, to find the discrimination power, the test-takers are split into two groups: the smart group, or top group, and the less smart group, or bottom group. The formula used to measure the discrimination power of multiple-choice test items is:
\[ D = \frac{B_A}{J_A} - \frac{B_B}{J_B} \]

(Arikunto, 2006:213)

Note:
\(D\) = test discrimination power
\(B_A\) = number of top-group test-takers who answer correctly
\(B_B\) = number of bottom-group test-takers who answer correctly
\(J_A\) = total number of test-takers in the top group
\(J_B\) = total number of test-takers in the bottom group

The classification of test discrimination power is:
D = 0.00 to 0.20 : poor discrimination power
D = 0.20 to 0.40 : sufficient discrimination power
D = 0.40 to 0.70 : good discrimination power
D = 0.70 to 1.00 : very good discrimination power
D = negative : the test item is not good. Thus, items that have a negative D score should be discarded.
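A short Python sketch of the discrimination index follows. The names and counts are hypothetical. Because the bands above share their boundary values, the sketch assumes that a boundary value belongs to the lower band.

```python
def discrimination_index(ba, ja, bb, jb):
    """D = BA/JA - BB/JB for one multiple-choice item."""
    return ba / ja - bb / jb

def classify_discrimination(d):
    """Map D onto the bands in the text (boundary values go to the lower band)."""
    if d < 0:
        return "negative: discard the item"
    if d <= 0.20:
        return "poor"
    if d <= 0.40:
        return "sufficient"
    if d <= 0.70:
        return "good"
    return "very good"

# Hypothetical item: 13 of 15 top-group and 5 of 15 bottom-group
# test-takers answered correctly.
d = discrimination_index(13, 15, 5, 15)
print(round(d, 2), classify_discrimination(d))
```

Here D is 8/15, about 0.53, so the item would count as having good discrimination power. Swapping the two groups' counts would give a negative D and mark the item for removal.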

The discrimination power of essay test items can be measured using the t-test as stated in Arifin (2009:278) below:

\[ t = \frac{M_H - M_L}{\sqrt{\dfrac{\sum x_1^2 + \sum x_2^2}{n_i (n_i - 1)}}} \]

Note:
\(M_H\) = mean of the high group
\(M_L\) = mean of the low group
\(\sum x_1^2\) = sum of the squared individual deviations of the high group
\(\sum x_2^2\) = sum of the squared individual deviations of the low group
\(n_i\) = number of test-takers in the high or the low group = 27% x N
\(N\) = total number of test-takers

Next, \(t_{measured}\) is compared with \(t_{table}\) at dk = \((n_1 - 1) + (n_2 - 1)\), where \(n_1 = n_2 = n_i\), and \(\alpha\) = 5%. If \(t_{measured} > t_{table}\), the discrimination power is significant. A practical use of discrimination power indices is to select items from a test bank that includes more items than we need (Brown, 2004:60).
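The t statistic above can be sketched in Python as follows, assuming the high and low groups have already been formed and are of equal size. The function name and the scores are hypothetical; the critical value still has to be looked up in a t table for the chosen degrees of freedom.

```python
import math

def discrimination_t(high, low):
    """t statistic for one essay item from high-group and low-group scores.

    high, low: item scores of the two groups, each of size n_i.
    """
    if len(high) != len(low):
        raise ValueError("the formula assumes groups of equal size")
    n = len(high)
    mean_high = sum(high) / n
    mean_low = sum(low) / n
    # Sums of squared individual deviations within each group.
    ss_high = sum((x - mean_high) ** 2 for x in high)
    ss_low = sum((x - mean_low) ** 2 for x in low)
    return (mean_high - mean_low) / math.sqrt((ss_high + ss_low) / (n * (n - 1)))

# Hypothetical item scores for four high-group and four low-group test-takers.
high_group = [8, 9, 7, 8]
low_group = [5, 4, 6, 5]
t_value = discrimination_t(high_group, low_group)
degrees_of_freedom = (len(high_group) - 1) + (len(low_group) - 1)
print(round(t_value, 3), degrees_of_freedom)
```

For these made-up scores the group means are 8 and 5 and each sum of squared deviations is 2, so t is 3 divided by the square root of 4/12, about 5.196, with 6 degrees of freedom.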


9.6.2 Qualitative analysis

Qualitative analysis deals with analyzing and studying the non-statistical features of the test items. The writer is going to study three aspects in this sub-chapter: analysis of instructional materials, analysis of test construction, and analysis of language use.

9.6.2.1 Analysis of Instructional Materials

Analysis of instructional materials deals with the appropriateness of the test items to the instructional materials of the teaching and learning process, stated in the curriculum as the Standard and Basic Competence. In this sub-chapter, the test items will be reviewed to see whether or not they match the Standard and Basic Competence, especially for elementary school. For this purpose, the writer will present the Standard and Basic Competence of elementary school for Grade V and match the test items against it.

9.6.2.2 Analysis of Test Construction

Test construction analysis deals with how well the construction of the test items by the test makers follows the principles of good multiple-choice questions. In this analysis, the test items will be analyzed to see whether or not they fulfill the characteristics of a good test stated in the previous chapter. The analysis covers several aspects, such as whether the question and its optional answers are effective, meaning that the answer is neither too easy nor too difficult to find. Another matter examined in this sub-chapter is whether the questions are easy or difficult to understand.

43

Another case is that, if some questions insert a picture, the picture may easy to be read or not and so on. 9.6.2.3 Analysis of Language Use

Analysis of language use examines the language used to construct the questions and answer options of the test items. Test makers may use difficult words, or grammatical features that are hard for students at this level to understand.

10.

ORGANIZATION OF THE STUDY


To help readers follow this study report, the writer organises the research paper as follows. Chapter I is the Introduction. It explains the background of the study, reasons for choosing the topic, statements of the problem, objectives of the study, significance of the study, and the outline of the study report. Chapter II presents the review of related literature, covering theoretical sources on language tests and assessment; measurement, assessment, and evaluation; types of assessment; forms of assessment; and theories on how to design, construct, and analyze a good test. Chapter III deals with the method of investigation. It presents the object of the study, population and sample, method and instrument, method of collecting the data, method of analyzing the data, and technique of reporting the results.


Chapter IV presents the findings and their interpretation. It consists of the analysis and discussion of the research findings. Chapter V closes the discussion with the conclusions and suggestions.

11. BIBLIOGRAPHY
Arikunto, S. 2006. Dasar-Dasar Evaluasi Pendidikan. Jakarta: Bumi Aksara.
Bachman, L. F. 2004. Statistical Analyses for Language Assessment. London: Cambridge University Press.
Best, J. W. 1977. Research in Education. New Zealand: Prentice Hall, Inc.
Brown, H. D. 2002. Principles of Language Learning and Teaching (4th Ed). New York: Addison Wesley Longman, Inc.
Brown, H. D. 2004. Language Assessment: Principles and Classroom Practices. San Francisco: Longman, Inc.
BSNP. 2006. Standar Isi dan Standar Kompetensi Lulusan Tingkat Sekolah Menengah Pertama dan Madrasah Tsanawiyah. Jakarta: PT. Binatama Raya.
Celce-Murcia et al. 2000. Discourse and Context in Language Teaching. London: Cambridge University Press.
Davies, A. 1997. Demands of being professional in language testing. Language Testing, 14(3), 328-39, in Alderson, J. C. and Banerjee, J. (Ed) 2008.
Gronlund, N. E. 1998. Assessment of Student Achievement. 6th Edition. Boston: Allyn and Bacon, in Brown, H. D. (Ed) 2004.
Hatch, E. and Farhady, H. 1982. Research Design and Statistics for Applied Linguistics. London: Newbury House Publishers, Inc.
Hughes, A. 2005. Testing for Language Teachers. 2nd Ed. London: Cambridge University Press.
Jumadi. ___. Pengertian KTSP dan Pengembangan Silabus dalam KTSP. A journal presented on Training and Implementation of KTSP in SD Wedomartini.
Mehrens, W. and Lehmann, I. J. 1984. Measurement and Evaluation in Education and Psychology. New York: Holt, Rinehart and Winston.
Meizaliana. 2009. Teaching Structure Through Games to the Students of Madrasah Aliyah Negeri I Kapahiang Bengkulu. A Thesis. Semarang: Diponegoro University.
Nitko, A. J. 1983. Educational Tests and Measurement: An Introduction. New York: Harcourt Brace Jovanovich, Inc.
Nurulia, L. 2011. An Analysis of Multiple-choice English Formative Test for Grade VIII of MTsN 1 and MTsN 2 Semarang. A Thesis. Semarang: Semarang State University.
Richards. 2001. Curriculum Development in Language Teaching. London: Cambridge University Press.
Tuckman, B. W. 1975. Measuring Educational Outcomes: Fundamentals of Testing. New York: Harcourt Brace Jovanovich, Inc.
Valette, R. M. 1967. Modern Language Testing. 2nd Ed. New York: Harcourt Brace Jovanovich Publishers.
__________. 2011. Understanding Assessment: A Guide for Foreign Language Educators. Accessed at http://www.cal.org/flad/tutorial/
