IS THE TEST SENSIBLE? DEVELOPING A CRITICAL READING TEST FOR INDONESIAN EFL LEARNERS

Abstract: This study aims to develop a critical reading test for Indonesian EFL students. This research and development (R&D) project is intended to assess students' progress and to offer accurate and trustworthy results. The subjects were twenty students from the English Language and Literature Department at Brawijaya University who took part in the tryout stage. The items' difficulty indices ranged from 0.26 to 0.89. All items met the point-biserial validity criterion, and the KR-20 reliability coefficient was 0.95, indicating that this multiple-choice critical reading comprehension test is valid and reliable. Although the test was declared valid and reliable, the students' scores on it during the tryout were less than satisfactory. This might be caused by the limited number of students taking the tryout or by problems in the teaching and learning of critical reading in the classroom that left students unable to do the test well. Future researchers are advised to address this issue by involving more participants in the tryout and by investigating the reasons underlying students' lack of ability in critical reading tests.


INTRODUCTION
Reading is a vehicle for developing critical thinking skills. It is an intellectual activity that requires thought (Nuttall, 2005; Grabe, 2009). Reading exercises allow readers to deepen their understanding of, and respond to, the subject they are reading. Readers carry out a sequence of tasks: interpreting the writing, comprehending the context of the text, examining the author's goals, evaluating the quality and value of the material, and making decisions about its content. Each of these tasks involves a mental process.
Reading can be classified into levels according to the thinking processes it employs. Burns, Roe, and Ross (1996) and Nurhadi (2009) propose a four-level reading taxonomy: (1) literal reading, (2) interpretive reading, (3) critical reading, and (4) creative reading. Literal reading involves the ability to obtain information stated directly in the text. Interpretive reading involves the ability to obtain information implied between the lines. Critical reading involves the ability to gain knowledge through critical thinking techniques. Creative reading requires the ability to envision and be creative in order to generate ideas.
Critical reading is one of the reading categories most needed today, since it exercises learners' ability to think critically. Critical thinking is a crucial skill in an increasingly competitive world; it is the main competency needed to perform effectively and successfully in the modern information-technology era (Morocco et al., 2008; Trilling & Fadel, 2009; The Partnership for 21st Century Skills, 2011). To meet these needs, critical thinking abilities should ideally become the main foundation of learning activities at all levels of education, including higher education. Critical thinking abilities are vital in determining students' success in education and their professional growth in the workplace. Lang and Evans (2006) explain that critical thinking is the process of properly employing thinking abilities to help someone formulate, analyze, and make decisions about what they believe or do. Critical thinking is thus an activity that assesses the veracity of information. Reading skills can be used to teach critical thinking: Zin and Eng (2014) found that critical reading is a form of reading skill associated with critical thinking. This is because critical reading requires readers to go beyond the literal meaning, read the text analytically, and then judge the value of the passage (Douglas, 2000). Pirozzi (2003) and Skiddel (2001) propose different exercises for developing critical reading skills, namely making inferences, distinguishing between facts and opinions, recognizing purpose, recognizing tone, analyzing advertisements, analyzing newspapers, problem solving, and expressing personal viewpoints.
In light of the importance of critical reading skills for EFL students, several researchers both within and outside Indonesia have investigated the issue. Their findings reveal the significant role of critical reading skill in students' critical thinking and academic success (Par, 2018; Sultan et al., 2018; Alqatanani, 2017; Karabay, 2015; Khodary & AbdAllah, 2014; Zin & Eng, 2014; Tsai et al., 2013; Camp & Camp, 2013; Ellozy & Mostafa, 2010; Tomasek, 2009). As far as the researchers are aware, however, no study has addressed the development of a critical reading test.
This study therefore aims to fill the gap left by previous research by developing an optimal critical reading test for English department students at the university level. The test consists of multiple-choice questions, a format widely used in testing and evaluation. Research has shown that multiple-choice tests play a crucial role in measuring reading comprehension skills (Yildiz & Centikaya, 2017; Ozdemir & Akyol, 2019).

LITERATURE REVIEW
This section discusses two fundamental themes, namely critical reading and critical reading assessment. Critical reading involves not only comprehending a text but also judging whether its content is true or false and evaluating the knowledge and viewpoints it presents (Sultan et al., 2018). Critical readers must therefore draw on their literal and interpretive reading skills, which indicates that literal and interpretive reading are prerequisites for becoming a critical reader (Sudarwati, 2013). Critical reading is a form of reading that equips the reader with the ability to examine, synthesize, and evaluate what is read (Hudson, 2010). The significance of developing critical reading skills is determined by two factors: (1) the reading factor and (2) the reader factor. In terms of the reading factor, the development of information and communication technology has made reading material far more abundant. Various kinds of information, both print and electronic, can be obtained easily, and people are constantly exposed to information through newspapers, magazines, and social media. However, this information is not necessarily accurate or true, and the available reading material does not necessarily meet the reader's needs. Some reading material is even deliberately presented to serve certain interests, such as influencing public perception, gaining sympathy, or instilling ideology.
One way to measure students' understanding of critical reading is to test them. Tests tell instructors how effective their instruction is. The basic aim of assessment is to determine the quality or importance of the instructional curriculum or the skill of the students, and a critical reading test can be used to achieve this. As far as the researchers are aware, no research has specifically focused on developing a test for critical reading. When developing a critical reading test, a researcher should follow a guideline for writing the test items; this guideline is called a blueprint. A test blueprint is a guide for test construction and use. Researchers have long used the standard form of evaluation, namely traditional assessment. It is a straightforward strategy that typically uses a pen-and-paper or computer-based examination with a fixed question format, such as multiple-choice, true-false, or matching items. This study uses the multiple-choice format. Multiple-choice tests were chosen because they are a common type of test that is relatively accurate and valid and can be administered quickly to a large number of people. They are also inexpensive and reusable, so they occupy a special place in the selection of test formats (Yildiz & Centikaya, 2017; Ozdemir & Akyol, 2019).

METHOD
The primary goal of this research is to produce a critical reading test for Indonesian EFL learners. This is a research and development (R&D) study. R&D is a product development model in which research findings are used to design new products (Gall et al., 2003). Twenty students were selected to participate in the tryout. The participants were third-semester students of the English Language and Literature Department at Brawijaya University in the 2020/2021 academic year. Two doctoral students in ELT and two lecturers teaching Critical Reading courses participated in the validation process at the peer debriefing stage.
This research and development study followed a set procedure. First, a preliminary study was undertaken to establish the need for a critical reading test. The researchers then developed the test through several steps: a) developing the test based on the course description, b) identifying the goal based on the course description and formulating instructional objectives/indicators/course learning outcomes, c) developing the test items, d) checking and rechecking the test items, and e) validating the test (peer debriefing). The procedure then continued with administering a tryout test, examining the results, and evaluating the difficulty level and reliability of the (multiple-choice) test items. The procedure is represented in the diagram that follows.
Brown (2004) defines validity as the degree to which a data collection procedure measures what it is intended to measure. The data from the reading test were examined for validity and reliability. To establish validity, the researchers examined item validity by computing the point-biserial correlation of each item on the reading test. The point-biserial correlation measures how strongly students' performance on a single item (correct or incorrect) is associated with their total test scores, and thus indicates whether the item measures the same ability as the test as a whole. In addition to validity, the researchers analyzed item difficulty and item discrimination. Next, the reliability of the test was assessed; a reliable test should yield consistent results (Brown, 2004). The researchers used the Kuder-Richardson Formula 20 (KR-20) to evaluate the reliability of the reading test.
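The two statistics named above have standard textbook forms, shown here for reference. The paper does not print them, so treat these as an assumption about the variants used. For a test of k items, let M_t and S_t be the mean and standard deviation of the students' total scores:

$$r_{pbi} = \frac{M_p - M_t}{S_t}\sqrt{\frac{p}{q}}$$

$$r_{KR20} = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} p_i q_i}{S_t^{2}}\right)$$

Here M_p is the mean total score of the students who answered the item correctly, p (or p_i for item i) is the proportion of students answering the item correctly, and q = 1 - p.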

FINDINGS
This section systematically describes what the researchers did at each stage of the procedure outlined above, presented in chronological order.
First, the researchers conducted a preliminary study in the previous semester, and its results established the urgency of developing an ideal critical reading test. Third-semester students were asked, via a Google Form, to evaluate the Critical Reading course they had taken the previous semester. The form was designed to obtain an overall view of how the course had been implemented. According to the students' responses, one of the obstacles they faced was difficulty in taking the final test, which resulted in unsatisfactory scores. The pie chart below depicts students' reactions to the Critical Reading course they took last semester.
f. Checking the outcome of testing.
g. Examining the difficulty level and reliability of the test items.
The purpose of this test is to assess students' ability to read and comprehend popular articles (about 2,500 words in length) using a variety of reading methods. The instructional objectives are as follows: a) differentiate between facts and opinions in reading texts; b) draw inferences from reading texts; c) generalize from reading texts; d) identify the writer's tone in reading texts; e) identify the writer's purpose in reading texts; and f) identify the writer's bias in reading texts.
Based on the goal and instructional objectives, the criteria/indicators of critical reading were formulated as shown in the blueprint in Table 1.
Based on the blueprint, the researchers created the reading test as the product. The test is an objective test consisting of 30 items, each with four options. Each item is scored dichotomously: one point for a correct answer and zero points for an incorrect one. The reading texts were adapted from critical reading books and online sources. The next stage was to validate the product through peer debriefing. Peer debriefing, also known as analytic triangulation, is the procedure through which a researcher consults a disinterested peer, that is, a peer who is not a participant in the research, to help probe the researcher's thinking about all or parts of the research process. The peers involved were two doctoral students in ELT who share an interest in developing reading tests and two critical reading lecturers. One suggestion concerned the number of test questions, which should suit the time allotment. Another concerned the response options, whose difficulty level should be adjusted, and the distractors, which should function well enough to distract test takers while they take the test. A further suggestion was to increase the number of response options from three to four. The detailed peer feedback can be seen in Table 2. The product was then revised based on the issues raised during the peer debriefing session. To improve item quality, several test items were changed from the easy to the moderate category. The distractors were also revised, since well-functioning distractors require students to make full use of their critical thinking abilities.
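The dichotomous scoring rule described above (one point per correct answer, zero otherwise) can be sketched in Python as follows. The answer key and responses are hypothetical and shortened to four items for illustration; the actual test has 30 four-option items. The resulting 0/1 matrix, with one row per student and one column per item, is the kind of input that item analyses such as the point-biserial correlation, KR-20, and the difficulty index work from.

```python
def score_responses(responses, key):
    """Score answers dichotomously: 1 if an answer matches the key, otherwise 0.

    Returns a students x items matrix of 0/1 scores.
    """
    for row in responses:
        if len(row) != len(key):
            raise ValueError("each response row needs exactly one answer per item")
    return [[1 if answer == correct else 0 for answer, correct in zip(row, key)]
            for row in responses]


# Hypothetical 4-item answer key and two students' answers (options A-D).
key = ["A", "C", "B", "D"]
responses = [
    ["A", "C", "B", "A"],
    ["B", "C", "D", "D"],
]

matrix = score_responses(responses, key)
totals = [sum(row) for row in matrix]  # one total score per student
print(matrix)  # [[1, 1, 1, 0], [0, 1, 0, 1]]
print(totals)  # [3, 2]
```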
The researchers also increased the number of options for each item from three to four to provide additional variety and more distractors. The time allotted to complete the test was reduced from 60 to 45 minutes. The time limit was set this way to minimize cheating, since the test was administered online with less supervision.
The next stage was to try out the product. In this step, the researchers administered the test to determine its validity and reliability. The researchers conducted a small-group tryout with twenty fourth-semester students from the English Language and Literature Department at Brawijaya University. Due to the pandemic, the tryout was held online through a Google Form at https://extendedforms.io/form/9393bd79-215b-4454-aa8b-0529182e7f87/login. The researchers then examined the test's validity using the point-biserial correlation formula and its reliability using the Kuder-Richardson Formula 20 (KR-20). The outcome of this stage was then used as the basis for the next stage.

Validity Test of Test Item
This study used point-biserial correlation (r_pbi), with the criterion that a test item is deemed valid if its r_pbi value is greater than the r table value (0.48). The item validity results are presented in Table 3. According to Table 3, the point-biserial correlation of every item is greater than 0.48, so all items meet the validity requirement.
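The validity check described above can be sketched in Python as follows. The sketch assumes the common textbook form r_pbi = (M_p - M_t) / S_t * sqrt(p / q), with S_t the population standard deviation of total scores (the paper does not print its formula), and uses the paper's critical value of 0.48. The 5-student score matrix is hypothetical; with so few students a different critical value would apply in practice, so the threshold is used here only to show the comparison.

```python
import math
from statistics import mean, pstdev

R_TABLE = 0.48  # critical value reported in the paper for the 20-student tryout


def point_biserial(item_scores, total_scores):
    """Point-biserial r for one dichotomously scored item (1 = correct, 0 = incorrect).

    Uses r_pbi = (M_p - M_t) / S_t * sqrt(p / q), with S_t the population SD of totals.
    """
    n = len(item_scores)
    correct_totals = [t for x, t in zip(item_scores, total_scores) if x == 1]
    p = len(correct_totals) / n  # proportion answering the item correctly
    q = 1 - p
    s_t = pstdev(total_scores)
    if p == 0 or p == 1 or s_t == 0:
        raise ValueError("r_pbi is undefined when p is 0 or 1 or all totals are equal")
    return (mean(correct_totals) - mean(total_scores)) / s_t * math.sqrt(p / q)


def validate_items(matrix, r_table=R_TABLE):
    """Return (r_pbi, is_valid) for every item column of a students x items 0/1 matrix."""
    totals = [sum(row) for row in matrix]
    results = []
    for j in range(len(matrix[0])):
        item = [row[j] for row in matrix]
        r = point_biserial(item, totals)
        results.append((r, r > r_table))
    return results


# Hypothetical 5-student x 4-item matrix, for illustration only.
matrix = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
for j, (r, ok) in enumerate(validate_items(matrix), start=1):
    print(f"item {j}: r_pbi = {r:.3f}, valid = {ok}")
# item 1: r_pbi = 0.645, valid = True
# item 2: r_pbi = 0.645, valid = True
# item 3: r_pbi = 0.791, valid = True
# item 4: r_pbi = -0.645, valid = False
```

With the population standard deviation, this form gives the same value as a Pearson correlation between the 0/1 item scores and the total scores, which makes a convenient cross-check.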

Reliability Test of Test Item
The reliability test in this study used the KR-20 statistic. The criterion is that the test can be considered reliable if its KR-20 value is greater than 0.90. The results of the reliability test are presented in Table 4.
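A minimal Python sketch of the KR-20 computation follows. It assumes the usual form r = k / (k - 1) * (1 - Σ p_i q_i / S_t²) with the population variance of total scores; textbooks differ on population versus sample variance, and the paper does not say which it used. The 0.90 criterion is the one cited in the Conclusion, and the 5 x 4 score matrix is hypothetical.

```python
from statistics import pvariance

KR20_CRITERION = 0.90  # criterion cited in the paper's Conclusion


def kr20(matrix):
    """Kuder-Richardson Formula 20 for a students x items matrix of 0/1 scores."""
    n = len(matrix)
    k = len(matrix[0])
    if n < 2 or k < 2:
        raise ValueError("KR-20 needs at least two students and two items")
    totals = [sum(row) for row in matrix]
    var_total = pvariance(totals)  # population variance of total scores
    if var_total == 0:
        raise ValueError("KR-20 is undefined when every total score is the same")
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in matrix) / n  # proportion answering item j correctly
        sum_pq += p * (1 - p)                  # p_j * q_j
    return k / (k - 1) * (1 - sum_pq / var_total)


# Hypothetical matrix: 5 students x 4 items.
matrix = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
r = kr20(matrix)
print(f"KR-20 = {r:.2f}, meets criterion: {r > KR20_CRITERION}")
# KR-20 = 0.80, meets criterion: False
```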

The Difficulty Level of Test Item
The formula for determining the difficulty level of the items is taken from Arikunto (1999) and is given below.

$$P = \frac{B}{J_k}$$
There are three variables to consider: the difficulty index (P), the number of correct answers (B), and the total number of test takers (J_k). Table 5 shows how values of P, which range from 0.00 to 1.00, are classified.
Table 5. Level of difficulty classification
P             Classification
0.00–0.29     Difficult
0.30–0.69     Moderate
0.70–1.00     Easy
Ningrum, A.S.B. & Sudarwati, E. (2022). Is the test sensible? Developing a critical reading test for Indonesian EFL learners.

The results of the difficulty index calculation are presented in Table 6. According to Table 6, most items have a moderate level of difficulty. Nine items are easy, namely items 2, 4, 5, 6, 7, 8, 9, 10, and 15.
Only one item, item 30, is difficult. The remaining 20 items fall within the moderate category.
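The difficulty index and the Table 5 bands can be sketched in Python as below. The item scores are hypothetical (20 test takers, matching the tryout size). Because Table 5 leaves small gaps between bands (for example between 0.29 and 0.30), the sketch assumes half-open intervals, so any P below 0.30 counts as difficult and any P below 0.70 as moderate.

```python
def difficulty_index(item_scores):
    """P = B / J_k: correct answers (B) divided by the number of test takers (J_k)."""
    if not item_scores:
        raise ValueError("at least one test taker is required")
    return sum(item_scores) / len(item_scores)


def classify(p):
    """Map P onto the Table 5 bands, using half-open intervals at 0.30 and 0.70."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("P must lie between 0 and 1")
    if p < 0.30:
        return "difficult"
    if p < 0.70:
        return "moderate"
    return "easy"


# Hypothetical items, each answered by 20 test takers (1 = correct, 0 = incorrect).
items = {
    "item A": [1] * 18 + [0] * 2,
    "item B": [1] * 11 + [0] * 9,
    "item C": [1] * 5 + [0] * 15,
}
for name, scores in items.items():
    p = difficulty_index(scores)
    print(f"{name}: P = {p:.2f} ({classify(p)})")
# item A: P = 0.90 (easy)
# item B: P = 0.55 (moderate)
# item C: P = 0.25 (difficult)
```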

DISCUSSION
This study developed a critical reading test for Indonesian EFL students. The findings show that the multiple-choice critical reading test is valid and reliable. Distractors play an essential role in items that test for misunderstanding and careful thought, affecting both the consistency and the difficulty of the test. Although the test proved valid and reliable, several students scored lower than expected during the tryout. This result appears consistent with the stereotype that Asian students are non-critical readers and thinkers, and with studies reporting inadequate critical reading abilities among EFL students in Malaysia (Zin & Eng, 2014), Jordan (Alqatanani, 2017), Indonesia (Par, 2018), and Saudi Arabia (Khodary & AbdAllah, 2014).
The students' low scores on the critical reading test in this study may have several explanations. First, only a small number of students took part in the tryout. Twenty students may not be enough to adequately represent the population under investigation. A larger field trial with more participants is required to gain a more accurate picture of how well students can perform on the test, since increasing the number of test takers increases the likelihood of obtaining a broader range of test outcomes (Karim & Haq, 2014; Ozdemir & Akyol, 2019; Azmi, 2020).
The second possible explanation concerns how critical reading is taught. Even with a valid and reliable test available, students' low scores may stem from problems in the classroom teaching and learning of critical reading. Students may not have fully grasped what they were taught, and this lack of understanding becomes visible when they take a critical reading test. Several studies have sought ways to help students improve their critical reading skills, which is a beneficial step.
Over the past decade, several studies have investigated the effectiveness of strategies for developing students' critical reading ability. For example, Alqatanani (2017) implemented a multiple intelligences-based program and found that it had a significant effect on students' critical reading skills. Another example is the WebQuest model implemented by Khodary and AbdAllah (2014), whose study showed that the model is effective in developing the critical reading achievement of Arab EFL learners. Sultan et al. (2017) found that the critical literacy approach is effective in improving students' critical reading skills. Other strategies successfully implemented to improve students' critical reading include utilizing e-maps (Ellozy & Mostafa, 2010), employing science news texts (Tsai et al., 2013), using content reading assignments (Camp & Camp, 2013), and utilizing science reading prompts (Tomasek, 2009). These studies suggest that EFL teachers should design appropriate teaching and learning activities to develop their students' critical reading skills.
Being a weak critical reader cannot be attributed to a single variable. Par (2018) found that students' cognitive style contributes significantly to their critical reading ability. His study investigated EFL students' critical reading skills across cognitive styles, comparing field-independent (FI) and field-dependent (FD) students. The results show a significant difference in critical reading ability between FI and FD students.
In short, students characterized as non-critical readers need to be investigated further. Other contributing factors, such as students' settings, culture, language proficiency, learning style, and learning motivation, may lead to different results. Students with different cultures, language backgrounds, learning styles, and motivation may tackle texts differently during reading, which in turn may lead to differences in critical reading ability. This is in line with the model of learning proposed by Dunkin and Biddle (1974), in which product variables, in this case critical reading skill, depend on prior variables, namely presage variables and context variables. Presage variables relate to teacher attributes, whereas context variables relate to student attributes.

CONCLUSION
The following conclusions are drawn from the research. This study carried out a systematic series of test development procedures in sequential stages: identifying the goal based on the course description, formulating the instructional objectives/indicators/course learning outcomes, developing the test items, checking and rechecking the test items, conducting a tryout, checking the results, and examining the difficulty level and reliability of the test items.
According to the research findings, the point-biserial correlation of every item is greater than 0.48, indicating that each item satisfies the validity criterion. The analysis also found a KR-20 value of 0.95, which exceeds 0.90, indicating that the test meets the reliability criterion. The critical reading test can therefore be considered suitable for use in a critical reading course. In terms of difficulty, most items were found to be moderately difficult. Nine items are easy, namely items 2, 4, 5, 6, 7, 8, 9, 10, and 15, and one item, item 30, is difficult. The remaining items fall into the moderate category. This distribution of difficulty levels also indicates that the test is suitable for use. Thus, the critical reading test does not need to be revised, because it meets the validity and reliability standards.
Although the test is valid and reliable, students' scores remained unsatisfactory. This might be due to the small number of students who participated in the tryout. The test should be tried out with a larger number of students so that researchers can learn more about students' abilities on it, which would address this limitation of the study.
Another possible reason for the low scores is problems in the teaching and learning of critical reading that prevent students from grasping critical reading texts well. Accordingly, English teachers are advised to use appropriate teaching strategies in class activities that encourage students to develop their critical reading skills.