EDD 631 College of Staten Island
Staten Island, NY
Statement of the Problem
A major concern of educators and parents today is whether high school students are ready for the rigors of college. In New York, students must pass a set of New York State Regents exams in order to receive a Regents diploma. Their ability to pass the exams is often seen as a sign of college readiness. A CBS poll conducted in 2013 found that nearly 80% of the city’s graduates needed remedial work before they could begin college courses (New York CBS Local, 2013). There have been concerns that even though New York City Regents exam scores have increased over the past few years, students still seem to have academic deficiencies (Crotty, 2013). More specifically, there is suspicion that the Regents exams have gotten easier through the years. For example, David Steiner, education commissioner in 2010, noted that while Regents scores spiked dramatically across the state, there were no similar score gains on other measurements, such as national exams (Medina, 2010).
The present study examined the hypothesis that New York’s Regents exams may have changed in complexity over the years. In New York high schools, students must take three units of science in order to earn a Regents diploma. One of these units is living environment – a life science – which has an associated Regents exam. Students are also required to take a physical science class such as chemistry, earth science, or physics. Chemistry also has an associated Regents exam. In order to see whether these Regents exams have gotten easier, it is important to look at trends in the complexity or difficulty levels of the questions on the tests. There are many measures of difficulty, among which the cognitive level of difficulty is an essential and fundamentally important one. The purpose of this paper is to examine the cognitive level of questioning on the Regents exams in chemistry and living environment to determine whether they have gotten “easier” in this respect during the past ten years.
Bloom’s taxonomy is widely used in education as a framework for assessments and curriculum design (Crowe, Dirks, & Wenderoth, 2008). There are six levels to the taxonomy, starting with the basic knowledge category and moving through increasingly difficult cognitive processes to the highest (Bloom, 1956). Traditionally, the cognitive processes are recognized as consisting of the following categories, demonstrating a gradual rise in difficulty from low to high: knowledge, comprehension, application, analysis, synthesis, and evaluation. Aside from this traditional classification, a revised Bloom’s taxonomy has also been used in educational research that consolidated the terms to make them simpler and easier to apply (Krathwohl, 2002). In this version, the definitions of the terms remain the same, but the revision adopts both a super-category structure that consolidates the original categories into two major dimensions and a sub-category structure that further differentiates them. Specifically, the taxonomy is broken down into a knowledge dimension (factual, conceptual, procedural, and metacognitive) and a cognitive process dimension. The cognitive process dimension of the revised taxonomy orders the processes as follows: remember, understand, apply, analyze, evaluate, and create (Anderson et al., 2001). The revised and original versions of the taxonomy are very similar with respect to cognitive processes, and the six levels have been shown to correspond closely with the original taxonomy categories. Both the original and revised taxonomies have been used as scaffolding tools in instruction. The taxonomy helps teachers formulate questions that prompt students to think at a higher level (Davis, 1969). It also assists teachers in thinking about how to differentiate a lesson plan or an assessment (Noble, 2004).
Tests and exams in all fields and subjects can be examined and revised in light of Bloom’s taxonomy to enhance the cognitive levels of the question items involved (Kastberg, 2003; Van Hoeij, Haarhuis, Wierstra, & Van Beukelen, 2004). Kastberg used a content-by-process matrix to analyze administered exams and noticed that they had failed to assess “higher level processes” for graphing lines and the algebraic form of lines on a math test. The matrix consists of learning content on one axis and cognitive processes on the other; it allows educators to visualize cognitive-level processes by content area. This distinction between “higher level processes” and “lower level processes” is the focal point of Kastberg’s study and of other researchers’ use of Bloom’s taxonomy. Researchers in various countries, such as Turkey and the United Kingdom, have also used Bloom’s taxonomy to assess the cognitive levels of science exams (Jones, Harland, Reid, Thayer, & Bartlett, 2009; Kocakaya & Gonen, 2010). Kocakaya and Gonen (2010), for example, compared the cognitive levels of physics exam questions given in high school classrooms with those of college entrance exams in physics. They focused on differentiating lower-order questions (knowledge, comprehension, application) from higher-order questions (those that require analysis, synthesis, and evaluation). In their analysis, they compared the relative predominance of question types and concluded that high school physics exams had a predominance of low-level application questions while the college entrance exam had a predominance of high-level analysis questions, making a valuable case for using higher-level cognitive questions in high school teaching.
Beyond comparing two different tests, the taxonomy is useful for evaluating trends in standardized tests. Researchers often use the taxonomy to analyze tests in order to infer possible changes occurring in the curriculum (Zawicki, Falconer, Henry, & Fischer, 2003). Zawicki et al. (2003) used Bloom’s taxonomy for test analysis and found that the conceptual level and overall exam difficulty had increased significantly since the implementation of the physics core curriculum in New York. Both theoretically and practically, alignment between exams and curricula is important but not always possible to achieve. Without alignment, exams cannot capture instructional effects; with alignment, there is still no guarantee that exams and curricula will be rigorous enough to promote learning. But by looking at trends in the tests, researchers should at least be able to infer curricular implications. Researchers have recognized that test trends can help in understanding curricula and have studied, to some extent, trends in standardized test design (e.g., Liu & Fulmer, 2008). In fact, Liu and Fulmer (2008) noted that in designing standardized tests, the cognitive difficulty level of an exam has always been a crucial issue. They examined cognitive difficulty levels over time and found discrepancies in topic and cognitive levels between the exams and the prescribed curricula in physics and chemistry. They found that in the analyzed chemistry tests the low-level category “remember” was consistently over-represented, while all other cognitive levels were under-represented. In the physics tests, the low-level category “understand” was under-represented while “remember” and “apply” were consistently over-represented. These findings point toward the need to further study trends in other areas, both to inform us about what tests require and, in turn, to infer what school instruction is like.
Beyond physics and chemistry, researchers have also examined the difficulty level of living environment Regents exams through Bloom’s taxonomy. Day and Matthews (2008) tried to assess whether living environment Regents exams adequately involved higher-order thinking questions in inquiry-related assessments. They tried to determine which questions on the exams were designed to evaluate inquiry and what level of thinking those questions demanded. The researchers examined Regents exams from June 2004 to August 2006. Their study revealed that the inquiry questions were predominantly higher-order questions in the application, analysis, synthesis, and evaluation categories (63%) rather than lower-order questions (37%) in the knowledge and comprehension categories, indicating that the test makers may associate inquiry with higher-order thinking. Methodologically, unlike Kocakaya and Gonen (2010), they grouped knowledge and comprehension into the lower-order thinking group, and they grouped application, analysis, synthesis, and evaluation into the higher-order thinking group. It appears that researchers have their own interpretations of what constitutes “lower order thinking” and “higher order thinking.” Even though the interpretation differs among researchers, the cognitive categories (such as knowledge, comprehension, application, analysis, synthesis, and evaluation) remain the same. Both Kocakaya and Gonen (2010) and Day and Matthews (2008) divided the cognitive categories into higher- and lower-order groups without looking at specific percentages of the various cognitive categories.
However, it should be noted that Day and Matthews (2008) analyzed only questions related to inquiry; they did not perform an overall assessment of all the questions on the living environment Regents. Even the existing analyses of chemistry and living environment Regents exams have focused on only a select number of exams. Liu and Fulmer (2008) analyzed certain chemistry and physics Regents exams without looking at trends in questioning through the years. A systematic analysis of the trends in questioning within and between the chemistry and living environment Regents is lacking. In addition, it is unclear whether students at a higher grade level (10th or 11th compared to 9th) are asked higher-order questions. An analysis of trends in questioning throughout the years has also been lacking. There has been speculation in the media that the tests have gotten easier, but there has not been a definitive quantitative analysis of the cognitive level of the test questions. This study attempts to address these gaps in the literature.
The research questions include the following:
1) What is the trend in the cognitive level of questioning on the living environment and chemistry Regents exams over the past ten years?
2) What cognitive level of questioning is stressed in the living environment Regents and chemistry Regents?
3) Is there a shift in the level of questioning between living environment Regents and chemistry Regents?
New York chemistry and living environment Regents exams from 2003 to 2013 were analyzed. These exams are usually given three times a year, in January, June, and August, with one exception: since 2011 the chemistry Regents has been administered only twice a year. Living environment Regents exams are usually given to 9th graders, and chemistry Regents exams are usually given to 10th or 11th graders. All chemistry and living environment Regents exams between 2003 and 2013 were analyzed; more specifically, the unit of analysis was the individual question on each exam.
I used a directed content analysis approach (Krippendorff, 2004) with Bloom’s taxonomy as the coding scheme. Questions were read and classified into the six cognitive categories (knowledge, comprehension, application, analysis, synthesis, evaluation). All questions were coded according to the scheme once. The cognitive categories are very specific. Questions were coded as “knowledge” if the problem primarily involved the retrieval of relevant knowledge from memory. Types of knowledge questions included recognition of science concepts and recall of learned concepts. For example, a question such as “what is a hypotonic solution?” constitutes a knowledge question because it asks the student to recall information. Questions were coded as “comprehension” when they involved determining the meaning of concepts. Elements of comprehension include interpretation, exemplification, summarizing, inferring, classifying, comparing, describing, and explaining. For example, the question “compare a cell in a hypotonic solution versus a hypertonic solution” is a comprehension question because it requires understanding of the concepts and not just recall of facts. Questions were coded as “application” when they asked students to use information to solve new problems or respond to novel situations. The question “predict how a cell might respond to an increase in extracellular sodium ions” asks students to apply concepts of tonicity to a novel situation involving sodium ion concentration; this would be considered an application question. Questions were coded as “analysis” if they involved inference and required an understanding of how parts work together or how parts relate to the whole. The question “interpret the results of the graph about water depth and fish species” would require inference and an understanding of how the independent and dependent variables relate to each other.
This question would be coded as “analysis.” Questions were coded as “synthesis” if they involved designing solutions, developing new models, or integrating ideas or parts. For example, the question “design an experiment to test how plant growth is affected by the air concentration of nitrogen” is considered a synthesis question: the students are asked to design a novel solution (the experiment) to a new situation based on previous understanding. Questions were coded as “evaluation” if they involved making a judgment or an assessment about concepts or ideas. The question “critique an experiment on glucose’s effect on cell osmosis” would be considered an evaluation question because it involves determining the merit of the experiment.
To ensure consistency in coding, the present study used an intra-coder repeated coding procedure. The steps were as follows. After the first round of coding, a two-week interval passed, after which a second round of coding of approximately 20% of the original set of questions proceeded. In total, 33 living environment Regents exams were coded and analyzed; among these, 7 were randomly chosen to be recoded during the second round. Likewise, 30 chemistry Regents exams were coded and analyzed, and 7 of these were randomly chosen to be recoded. Any differences in coding results were resolved through careful scrutiny and re-coding of the question items. The fraction of questions at each cognitive level was computed for each test as a whole (see Table 1). Trends in the percentages from each stratum (chemistry, living environment) were analyzed through time (2003 to 2013) and graphed. A comparative analysis of the trend in questioning between the living environment and chemistry Regents was done through quantitative percentage analysis. The average Bloom’s taxonomy level of the items was calculated, and the trend through the years was graphed.
Questions were categorized numerically into distinct cognitive categories: “1” represents knowledge questions, “2” comprehension, “3” application, “4” analysis, “5” synthesis, and “6” evaluation.
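The per-test summary statistics described above amount to a category count and a mean over the numeric codes. The following is a minimal sketch of that computation; the ten-question exam shown is hypothetical, not actual Regents data.

```python
from collections import Counter
from statistics import mean

# Numeric codes for the six Bloom categories, as defined above.
LEVELS = {"knowledge": 1, "comprehension": 2, "application": 3,
          "analysis": 4, "synthesis": 5, "evaluation": 6}

def summarize_test(codes):
    """Return (fraction of questions per category, mean Bloom level)
    for one exam, given the coded category of each question."""
    counts = Counter(codes)
    n = len(codes)
    fractions = {cat: counts.get(cat, 0) / n for cat in LEVELS}
    avg_level = mean(LEVELS[c] for c in codes)
    return fractions, avg_level

# Hypothetical 10-question exam (illustrative only).
codes = (["knowledge"] * 4 + ["comprehension"] * 3 +
         ["application"] * 2 + ["analysis"])
fractions, avg = summarize_test(codes)
# avg = (4*1 + 3*2 + 2*3 + 1*4) / 10 = 2.0
```

Applying this to every exam in a stratum and plotting the mean level by administration date reproduces the kind of trend line described in the results.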
The living environment average Bloom’s taxonomy level fluctuated between 1.5 and 2.25 between 2003 and 2013 (see Chart 1), indicating that the cognitive level is below the midpoint of the taxonomy. The average level prior to 2005 was 1.8. After 2005 there was an increase in the average level of thinking; by 2006 it was 0.5 points higher than pre-2004 levels. Afterwards, it fluctuated slightly between 2006 and 2013 without returning to the levels seen prior to 2005. The mean level throughout the years was 1.8732 with a standard deviation of 0.1489. The chemistry Regents average level of thinking showed micro-fluctuations within the 1.7 to 2.1 range, staying relatively stable throughout the ten-year period (see Chart 1). The mean level throughout the years was 1.9105 with a standard deviation of 0.0805.
Throughout the 2003-2013 period there was a fairly consistent fraction of knowledge and comprehension questions on the chemistry Regents, together making up more than 70% of the test questions. There were very few application questions prior to 2005. After 2005 there was a marked increase in the number of application questions, and this increase persisted through the 2013 test (see Chart 3).
All living environment Regents between 2003 and 2013 had a predominance of knowledge and comprehension type questions which made up more than 75% of the test questions. In 2006 there was a significant increase in the number of analysis type questions. The increase persisted and grew slightly between 2006 and 2013 (see Chart 2).
An independent unpaired t-test comparing the chemistry and living environment Regents indicated no significant difference [t(21) = 0.7302, p > .05].
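A comparison like this can be sketched with a standard two-sample t statistic using only the standard library. The yearly mean levels below are hypothetical placeholders, not the study’s actual data, and the function assumes equal variances (Student’s pooled-variance test).

```python
from statistics import mean, variance

def unpaired_t(a, b):
    """Student's two-sample t statistic with pooled variance
    (equal variances assumed), plus the degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / (pooled * (1 / na + 1 / nb)) ** 0.5
    return t, na + nb - 2

# Hypothetical per-year average Bloom levels (not the real study data).
chemistry = [1.90, 1.85, 1.95, 1.92, 1.88]
living_env = [1.87, 1.80, 1.93, 1.90, 1.86]
t, df = unpaired_t(chemistry, living_env)
```

In practice one would compare t against the critical value for df degrees of freedom at the chosen alpha level (or use a statistics package that also returns the p-value directly).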
The two codings of the knowledge questions are strongly correlated (r = .949), indicating highly reliable coding and recoding processes [t(12) = 10.427, p < .0001]. The two codings for the other question types are also strongly correlated, showing consistent coding reliability across categories (see Table 4).
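The intra-coder reliability check amounts to correlating the two rounds of coding and testing whether the correlation differs from zero. A minimal sketch, with hypothetical per-test counts rather than the study’s actual figures:

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between paired observations."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) *
           sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def r_to_t(r, n):
    """t statistic for testing whether r differs from zero,
    with n paired observations (df = n - 2)."""
    return r * ((n - 2) / (1 - r ** 2)) ** 0.5

# Hypothetical counts of knowledge questions per recoded test,
# from the first and second coding rounds (illustrative only).
round1 = [12, 15, 10, 14, 11, 13, 16]
round2 = [12, 14, 10, 15, 11, 13, 16]
r = pearson_r(round1, round2)   # 27/28, about 0.96
t = r_to_t(r, len(round1))
```

A high r between rounds indicates that the coder assigned categories consistently over time, which is the sense of reliability reported in Table 4.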
This section discusses the results of the data analysis in response to each of the research questions. The first two questions ask what trends in questioning and which cognitive levels are stressed in the living environment and chemistry Regents. There is a consistent trend of emphasizing the comprehension and knowledge cognitive levels on both exams; in fact, more than 70% of the test questions fall into these two categories. This indicates a stable pattern of question types in both Regents exams; the exams stay fairly consistent throughout the years. According to the data, there was a slight increase in the proportion of analysis questions in the living environment Regents after 2005, while the chemistry Regents saw a slight uptick in the proportion of application questions after 2004. The data indicate that the increase in both tests was maintained in subsequent years. This shows a slight shift in emphasis toward higher-level questioning, even though the majority of questions on both exams still focus on knowledge and comprehension.
The third research question asks whether there is a shift in questioning between the living environment and chemistry Regents. There is a slight shift in the type of higher-order questions asked. At the higher cognitive levels, analysis and evaluation are stressed more than synthesis and application on the living environment Regents, while on the chemistry Regents, application and analysis are stressed more than synthesis and evaluation. Both tests had very few synthesis questions, with some tests excluding this type of question altogether. Higher-order questions on the living environment Regents tend to probe the “whys” of biology, and the majority of the evaluation and analysis questions on that exam tend to be constructed responses rather than multiple choice; for example, students were asked to describe the factors that affect an experimental setup. On the other hand, higher-order questions on the chemistry Regents tend to focus on applying conceptual understanding to different situations; for example, students were asked to apply their understanding of gas laws to a set of new conditions. This slight shift in questioning reflects the emphases of the different scientific disciplines.
The implications of this study concern the lack of emphasis on higher-order thinking and the consequences of this neglect. From a survey of the question types, we can conclude that a lot of “what” is asked and not a lot of “why.” Higher-level thinking is not stressed on the Regents, especially thinking at the level of synthesis and evaluation. This has major consequences in terms of the implicit message sent to teachers, students, and administrators: it says that content knowledge is more important than evaluating the scientific concepts behind that content. In terms of the standards, the Regents place more focus on the knowledge component of the core curriculum standards than on the process skills outlined there. Teachers might perceive that as long as students can answer the multiple-choice questions (which are mostly comprehension and knowledge questions), it is acceptable to skimp on the long responses, because that is sufficient to pass the Regents exams. However, exams at a high cognitive level encourage deeper processing, which prepares students to perform at high levels (Jensen, McDaniel, Woodard, & Kummer, 2014). Focusing on low-level questions does not prepare students for college, where high-level performance is required. Students may come to see science as a static body of knowledge rather than as an evolving, dynamic body of scientific thinking and knowledge. Synthesis in the form of designing a scientific inquiry is a huge component of science in real life, yet this component of thinking is largely under-represented or absent on the Regents. Scientific inquiry helps students think like scientists, which is central to the process of science (Scott, Tomasek, & Matthews, 2010).
However, there are a few important limitations to the study. Only a decade’s worth of tests, between 2003 and 2013, was studied; more tests would have allowed a better overall perspective on the trend in question types throughout the history of the Regents exams. In addition, only two rounds of coding were done, and the second round involved only 20% of the original questions. A more thorough recoding, or a third round of recoding, could be done to increase the reliability of the data. Most importantly, the coding of the questions was done by one person from one perspective because of constraints on resources, which may introduce bias into the study. Coding from multiple perspectives could have enhanced this study tremendously.
In terms of future directions for this research, the next step is to understand how teachers are affected by the types of questions on the Regents exams. It is important to figure out how test content is reflected in teaching practices. Instruction may be narrowly focused on drilling students on questions that appeared on previous Regents exams, which may mean students know how to answer test-specific questions without gaining mastery of the broader scientific understanding (Hout & Elliot, 2011). Future studies could look at how teachers prepare for the Regents exams and what types of questions they focus on while doing so.
In summary, both Regents exams place a huge emphasis on comprehension and knowledge, thereby setting cognitive expectations below the midpoint of Bloom’s taxonomy. The pattern of emphasis is fairly stable despite a slight uptick in higher-level questions in 2004 (for chemistry) and 2005 (for living environment). There has been no drastic change in the setup of the exams for the period analyzed. This has great implications for teachers who use the Regents as a guide for teaching living environment and chemistry. Although there are important limitations to this study, there is significant evidence that the pattern of questioning found here is no coincidence. As educators, we must ask ourselves whether what we expect on the Regents exams will affect how we ultimately teach these dynamic disciplines.
REFERENCE LIST
Anderson, L. W., Krathwohl, D. R., Airasian, P. W., Cruikshank, K. A., Mayer, R. E., Pintrich, P. R., Raths, J., & Wittrock, M. C. (2001). A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom’s Taxonomy of Educational Objectives. New York, NY: Longman.
Bloom, B. S. (1956). Taxonomy of Educational Objectives, Handbook I: The Cognitive Domain. New York: David McKay Co Inc.
Crotty, J. M. (2013, May 8). Are New York City Students Getting Smarter Or Are Regents Exams Getting Easier? Forbes Magazine. Retrieved from http://www.forbes.com.
Crowe, A., Dirks, C., & Wenderoth, M. P. (2008). Biology in Bloom: Implementing Bloom’s Taxonomy to Enhance Student Learning in Biology. Life Sciences Education, 7(4), 368-381.
Davis, O. L. (1969). Studying the Cognitive Emphases of Teachers’ Classroom Questions. Educational Leadership, 26(7), 1-12.
Day, H. L., & Matthews, D. M. (2008). Do Large-Scale Exams Adequately Assess Inquiry? An Evaluation of the Alignment of the Inquiry Behaviors in New York State’s Living Environment Regents Examination to the NYS Inquiry Standard. The American Biology Teacher, 70(6), 336-341.
Hout, M. & Elliot, S. W. (2011). Incentives and Test-Based Accountability in Education. Washington, DC: The National Academic Press.
Jensen, J. L., McDaniel, M. A., Woodard, S. M., & Kummer, T. A. (2014). Teaching to the test…or testing to teach: Exams requiring higher order thinking skills encourage greater conceptual understanding. Educational Psychology Review.
Kastberg, S. (2003) Using Bloom’s Taxonomy as a Framework for Classroom Assessment. Mathematics Teacher, 96(6), 402-405.
Kocakaya, S., & Gonen, S. (2010). Analysis of Turkish High-School Physics-Examination Questions According to Bloom’s Taxonomy. Asia-Pacific Forum on Science Learning and Teaching, 11(1), 1-15.
Krathwohl, D. R. (2002). A Revision of Bloom’s Taxonomy: An Overview. Theory Into Practice, 41(4), 212-218.
Krippendorff, K. (2004). Content analysis: An introduction to its methodology. Thousand Oaks, CA: Sage Publications.
Liu, X., & Fulmer, G. (2008). Alignment Between the Science Curriculum and Assessment in Selected NY State Regents Exams. Journal of Science Education and Technology, 17(4), 373-383.
Medina, J. (2010, July 19). State’s Exams Became Easier to Pass, Education Officials Say. The New York Times. Retrieved from http://www.nytimes.com
Noble, T. (2004). Integrating the Revised Bloom’s Taxonomy with Multiple Intelligences: A Planning Tool for Curriculum Differentiation. Teachers College Record, 106(1), 193-211.
Officials: Most NYC High School Grads Need Remedial Help Before Entering CUNY Community Colleges. (2013, March 7). Retrieved from http://newyork.cbslocal.com.
Scott, C., Tomasek, T., & Matthews, C. E. (2010). Thinking Like A Scientist. Science & Children, 48(1), 38-42.
Van Hoeij, M. J., Haarhuis, J. C., Wierstra, R. F., & Van Beukelen, P. (2004). Developing a Classification Tool Based on Bloom’s Taxonomy to Assess the Cognitive Level of Short Essay Questions. Journal of Veterinary Medical Education, 31(3), 261-270.
Zawicki, J., Jabot, M., Falconer, K., MacIsaac, D., Henry, D., & Fischer, R. (2003). A preliminary analysis of the June 2003 New York State Regents Examination in Physics. Perspectives on Science Education (April). Penn Yan, NY: New York State Science Education Leadership Association.
Fraction of question types at different levels and mean level of questions for living environment regents
*Tests are labeled by type (L=living environment), month, and year given.
Fraction of question types at different levels and mean level of questions for chemistry regents
*Tests are labeled by type (C=chemistry), month, and year given.
Recoding of randomly selected Regents tests: fraction of question types at different levels for chemistry and living environment Regents
*Tests are labeled by type (L=living environment, C=chemistry), month, and year given.
Analysis of Regents Using Bloom’s Taxonomy
Intercoding reliability data