Thinking Skills and Creativity

journal homepage: http://www.elsevier.com/locate/tsc

Improving critical thinking skills and metacognitive monitoring through direct infusion

D. Alan Bensley ∗, Rachel A. Spero1

Frostburg State University, Department of Psychology, 101 Braddock Road, Frostburg, MD 21532, USA

Article info

Article history:
Received 28 June 2013
Received in revised form 1 December 2013
Accepted 7 February 2014
Available online 19 February 2014

Keywords:
Critical thinking
Argument analysis
Critical reading
Metacognitive monitoring
Direct infusion

Abstract

To test the effectiveness of the direct infusion instructional approach on the acquisition of argument analysis, critical reading, and metacognitive monitoring skills, we compared three similar groups of college students receiving different instruction of the same course material. The group receiving direct infusion of critical thinking (CT) was explicitly taught application of rules for analyzing psychological arguments and critical reading infused into their course work and given practice with assessments and feedback to guide skill acquisition. Compared to a second group receiving direct infusion of principles of memory improvement and a third focusing on content knowledge acquisition, the CT group showed significantly greater gains on tests of argument analysis and critical reading skills. Students in the CT group also showed significantly greater gains in the ability to accurately postdict their CT test scores. The results suggest that direct infusion can improve both CT skills and metacognitive monitoring, with implications for how they are related.

© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

The acquisition of critical thinking (CT) skills has for decades been a highly valued outcome of higher education; yet, instructors continue to question whether their pedagogical practices promote the acquisition of these important skills. Acquiring CT skills is important because they provide the means for students to question assumptions, analyze arguments, and evaluate the quality of information inside and outside of their chosen fields. The purpose of the present study was to test direct infusion, an approach for explicitly teaching CT skills that promotes their efficient acquisition.

Numerous studies have shown that explicit instruction is effective in promoting the acquisition of CT skills (e.g., Bensley, Crowe, Bernhardt, Buckner, & Allman, 2010; Marin & Halpern, 2011; Nieto & Saiz, 2008). See Abrami et al. (2008) for a review. Although the research on skills and explicit instruction has increased understanding of what makes CT instruction effective, a focus on skills alone is incomplete because CT is a multi-dimensional construct (Bensley, 2011). Research shows that CT performance involves not only various reasoning skills but also CT dispositions (Clifford, Boufal, & Kurtz, 2004; Taube, 1997) and metacognition (Ku & Ho, 2010; Magno, 2010). Although many theorists have linked CT skills with metacognition (e.g., Halpern, 1998; McGuinness, 1990; Swartz, 1989; Tarricone, 2011), few studies have examined how explicit CT instruction may affect the acquisition of CT skills and metacognitive monitoring. Accordingly, this study tested the effectiveness of a form of explicit instruction called direct infusion on the acquisition of argument analysis and critical reading skills and on the ability to accurately estimate performance on tests of those skills.

∗ Corresponding author. Tel.: +1 301 687 4195; fax: +1 301 687 7418. E-mail addresses: (D. Alan Bensley), (R.A. Spero).

1 1412 Eagle Run, Morgantown, WV 26508, USA. Tel.: +1 304 638 2915.

1871-1871/© 2014 Elsevier Ltd. All rights reserved.

D. Alan Bensley, R.A. Spero / Thinking Skills and Creativity 12 (2014) 55–68

The complexity of CT as a multi-dimensional construct presents many challenges to those seeking to scientifically study its instruction and assessment (Bensley & Murtagh, 2012).

Despite the attention paid to the improvement of CT skills, identification of specific skills and their relationship to components of CT remain controversial. Taxonomies have identified a wide range of CT skills, sometimes applying the same and sometimes different terminological labels to what appear to be similar skills (e.g., Ennis, 1987, 1992; Facione, 1990; Halpern, 1998). For example, although these taxonomies commonly include argument analysis skills as core CT skills, they identify different subskills for argument analysis.

The lack of consensus about CT skills is found in discussions of reflection and metacognition across disciplines as well. Although philosophers have emphasized the importance of self-reflection in CT (e.g., Ennis, 1987; Paul, 1993), their lists of skills do not include the related psychological concept of metacognition developed by Flavell (1979). Metacognition refers to knowledge, awareness, and control of one’s own cognition. Of particular relevance to our study is the component of metacognition called monitoring which involves a person’s self-assessment of how well they are comprehending, acquiring certain knowledge, and thinking. Psychologists have identified metacognitive monitoring as central to CT (e.g., Bensley, 2011; Halpern, 1998; Tarricone, 2011). Being able to accurately monitor one’s learning and performance is needed for effective self-regulation of cognitive activities such as knowing whether more study is needed or whether one is reasoning well (Stone, 2000). Unfortunately, with few exceptions (e.g., Ku & Ho, 2010; Magno, 2010) empirical studies have not directly examined the relationship between CT skills for argument analysis and metacognition.

From the perspective of psychological science, the problem of CT skill identification remains unsolved because the skills listed in taxonomies have been mostly identified through introspection, philosophical analysis, and informal, post hoc inspection of thinking instead of through systematic, scientific investigation. Empirical research is needed on how instruction impacts performance on tasks requiring specific CT skills and how skill use is related to metacognition. More specifically, the present study examined whether a form of explicit instruction called direct infusion could facilitate acquisition of both argument analysis and critical reading skills and metacognitive monitoring. Direct infusion involves the explicit instruction of CT rules and principles infused into course work, providing practice with exercises and formative assessments with feedback to guide skill acquisition. It might be expected that direct infusion of CT instruction could increase students’ awareness of their levels of CT knowledge and skills, facilitating acquisition of both CT skills and the ability to accurately monitor performance on tests of CT. To understand why this might be, we first examine explicit CT instruction in general, and then direct infusion, in particular, followed by a discussion of how metacognition may be related to direct infusion.

1.1. Review of research on explicit instruction

Ennis (1989) has used the explicitness of instruction as a criterion for classifying different CT instructional approaches. His scheme separates CT instructional approaches into four different types (general, immersion, infusion, and mixed) with the types differing in how explicitly CT principles are taught and how principles are taught in relation to course content. The general approach focuses instruction on explicitly teaching principles for thinking, usually separate from regular course content instruction and sometimes in a more abstract form as in a formal logic course. A second approach called ‘immersion’ does not make rules or principles of thinking explicit but instead relies on intense, thoughtful exposure or immersion to CT in subject matter. A third approach called ‘infusion’ resembles the general approach in that it involves explicit instruction of CT, but this explicit instruction is delivered in conjunction with the study of relevant subject matter as students are encouraged to think deeply about it. Finally, the mixed approach combines explicit teaching of CT principles as a separate thread of instruction with either immersion or infusion.

In one of the few empirical studies directly comparing the different approaches, Angeli and Valanides (2009) compared students taught with the general, immersion, and infusion approaches on their ability to write a CT discussion of an ill-defined issue. The infusion group received guided instruction in the use of CT skills while the immersion group was engaged in Socratic questioning about the essay without any explicit mention of the skills and a control group received no explicit instruction but simply prepared an outline of the essay. Statistically controlling for students’ initial CT test performance, Angeli and Valanides found that both the infusion and immersion groups performed significantly better than the control group on the outline task showing large effect sizes; however, the infusion group had the largest effect size.

Obtaining additional support for explicit infusion of CT, Abrami et al. (2008) assigned studies testing the different approaches identified by Ennis and performed a meta-analysis on their outcomes. Specifically, Abrami et al. (2008) found that the effect size for the infusion approach was larger than that of either the immersion or general approaches, but that the mixed approach had the largest effect size. This suggests that instructors should design courses in which CT skills are explicitly taught as a separate thread of instruction and then in concert with course content. Abrami et al. (2008) noted that more research was needed to clearly identify the instructional elements that promote CT.

Recently, Marin and Halpern (2011) conducted a study in which they more clearly specified the conditions of explicit CT instruction than in many previous studies. Explicit instruction involved modeling CT for their high school student participants, encouraging them to thoughtfully respond to questions, having them practice specific CT skills, as well as recognize the structure of problems, and discuss the process of thinking to promote metacognition. A second imbedded instruction group received instruction that encouraged CT in the classroom and covered principles of CT within course content coverage focusing on cognition and development. Although instruction of the imbedded group was designed to encourage use of CT skills, no explicit teaching of them was provided. A third wait-listed group received no explicit CT instruction and instead received the usual instruction in their classes.

To assess CT skill acquisition, Marin and Halpern (2011) pretested and posttested the explicit and imbedded groups with the Halpern Critical Thinking Assessment (HCTA), a test of various CT skills incorporating both forced-choice and open-ended responses to many everyday examples. They found that the explicit instruction group showed significant improvement on the HCTA in both studies; but in their first study the imbedded approach also showed significant improvement although the effect size for the explicit group was substantially larger. In the second study, the explicit instruction group had significantly higher HCTA posttest scores than the imbedded group, providing stronger support for the superiority of explicit CT instruction; but because Marin and Halpern did not also posttest the wait-listed control group, they were unable to assess change in performance without CT-related instruction.

A closer examination of explicit CT instruction in one field, psychology, may help isolate instructional components that commonly promote the acquisition of CT skills when infused into content instruction. In two of these studies of psychology students (Nieto & Saiz, 2008; Solon, 2007), explicit instruction based on Halpern (2003) has produced gains on the Cornell Test of Critical Thinking-Form Z, a test of general CT skills primarily assessing argument analysis skill. Solon (2007) found that after instruction a group of general psychology students receiving explicit instruction of CT skills infused into the course did significantly better on the Cornell Test than a similar control group studying the same content but not receiving the CT instruction. Further testing of the two groups showed that the explicit CT-instructed group also performed as well as the control group on a test of general psychology, suggesting that infusing explicit CT instruction into regular course instruction can improve CT skills without negatively impacting acquisition of subject matter knowledge while explicitly teaching more generic skills for reasoning.

Other studies with psychology students have shown that skills related to thinking critically in the discipline of psychology can be explicitly infused into course content instruction, affording the opportunity to teach students how to think critically on authentic tasks important to disciplinary thinking (Bensley & Murtagh, 2012). In one such study, Penningroth, Despain, and Gray (2007) explicitly taught a special CT class methods for analyzing psychological research based on Stanovich (2004). Penningroth et al. used the Stanovich textbook to model examples of thinking errors in psychology accompanied by quizzes and written assignments and feedback to guide CT instruction. They found that the CT class showed significantly greater gains than psychology students in another course not receiving the explicit CT instruction on a test of psychological critical thinking developed by Lawson (1999) and his colleagues.

An examination of the characteristics of instruction in the successful demonstrations of explicit CT skill instruction with psychology students reveals several commonalities. For example, in many (e.g., Bensley & Haynes, 1995; Nieto & Saiz, 2008; Penningroth et al., 2007; Solon, 2007), the methods of instruction employed common components such as targeting specific CT skills, making CT rules and principles explicit through the use of assignments and exercises designed to provide practice in their use, and providing corrective feedback on the assignments and exercises. These components are similar to ones found in effective, direct instruction approaches and in approaches to teaching CT proposed by Angelo (1995), Beyer (1997), and Halpern (1998).

1.2. The direct infusion approach to CT instruction

Consistent with the above findings, an approach called direct infusion (Bensley et al., 2010) incorporates these components of effective instruction with others found in the literature. These include explicit teaching of CT rules and principles, guided instruction (Mayer, 2004), direct instruction (Walberg, 2006) and feedback from formative assessments (Black & Wiliam, 1998) in the service of infusion or teaching students how to use CT rules and standards in thinking about subject matter. Specifically, it involves targeting and making specific rules explicit, teaching those rules and principles infused into coursework and subject matter discourse, guiding instruction through modeling of thinking, and providing practice in using the rules followed by corrective feedback. Pretests and posttests used in summative assessment differ in content from CT instructional and assessment materials but are otherwise well aligned with instruction (Bensley, 2011).

The direct infusion approach is also consistent with the findings of Abrami et al. (2008) showing the superiority of the mixed approach in which CT rules and principles are explicitly taught as a separate thread of instruction and then infused into subject matter instruction. In direct infusion, rules and principles for thinking are first directly taught with examples, and then their use is instantiated into the discourse of content instruction in exercises and assessments that require appropriate application of those rules and principles. Lockhart (1992) has proposed that CT is like pattern recognition and that thinkers must learn to access relevant reasoning schemata or conceptual structures to think effectively on a task. Along these lines, we might expect that direct infusion provides the means for explicitly associating reasoning rules with relevant contextual cues in subject matter discourse to promote access and later appropriate application in different discourse.

To test the effectiveness of direct infusion of argument analysis skills, Bensley et al. (2010) compared students from different research methods classes in which they either did or did not receive explicit argument analysis instruction directly infused into their coursework on a test of their argument analysis skill in psychology. They found that the CT-infused research methods group showed significantly greater gains than students in traditionally taught research methods classes on the test. Supporting the study’s internal validity was the fact that the groups were similar in academic achievement measured by GPA, in academic ability measured by SAT, and in CT disposition measured by the Need for Cognition scale; however, the groups differed by instructor and textbook used, creating potential threats to internal validity.

1.3. Limitations of previous studies

Methodological shortcomings found in the studies reviewed weaken the support they provide for the explicit instruction hypothesis. In some, comparison groups were taught by different instructors using different textbooks (e.g., Bensley et al., 2010; Penningroth et al., 2007) or when taught by the same instructor students were not randomly assigned to conditions (Solon, 2007). In other studies that did use randomization, individual students in different treatment groups were not always completely randomized before instruction (e.g., Marin & Halpern, 2011, Experiment 2; Nieto & Saiz, 2008). Such limitations are understandable given the difficulty of testing students receiving CT instruction in educational settings.

When random assignment is not possible due to classroom constraints, the best strategy is to make instruction in the comparison groups as comparable as possible, except for explicit infusion of CT. This can be accomplished by using the same instructor and textbook, and controlling for other individual difference variables that could affect CT performance. For example, this can be done by assessing students on academic performance-related variables and then matching participants in different instructional groups before instruction. This strategy, however, is difficult to implement when comparing students in treatment groups in different intact classrooms of differing sizes, as in the present study. Instead, individual differences in treatment group participants can be assessed to determine whether any differences exist in the groups that require statistical control, much as Solon (2007) did.

Another, more theoretical limitation of the previous studies on explicit CT instruction was that none of them examined the role of metacognition in the acquisition of CT skills despite the fact that some were conducted within theoretical frameworks assuming a metacognitive component (e.g., Bensley, 2011; Halpern, 1998). Consequently, it remains unknown how explicit instruction affects metacognition and whether the ability to accurately monitor CT test performance is related to the acquisition of CT skills.

1.4. Review of metacognition research related to CT

Two recent studies have examined metacognition in the context of argument analysis skills (Ku & Ho, 2010; Magno, 2010). Magno (2010) gave students the Watson-Glaser Critical Thinking Appraisal and the Metacognitive Assessment Inventory (MAI) of Schraw and Dennison (1994). Magno developed a model in which monitoring and other metacognitive factors from the subscales of the MAI successfully predicted critical thinking test performance on the Watson-Glaser. To study metacognitive processes involved in CT task performance, Ku and Ho (2010) had students think aloud as they performed tasks from part of the HCTA “Using Everyday Situations” of Halpern (2007). Ku and Ho found that good critical thinkers used more high-level metacognitive strategies than poorer critical thinkers, including monitoring, but especially evaluation and planning. These two studies of individual differences suggest the importance of monitoring in CT, but they did not test whether explicit instruction affects the acquisition of argument analysis skills or the role of metacognitive monitoring in this process.

The ability to accurately monitor skill levels is likely related to the skill levels themselves. Dunning and his colleagues have extensively studied monitoring of test performance on a variety of tests (Dunning, Johnson, Ehrlinger, & Kruger, 2003). They have consistently found that those people who performed better on various tests were also more accurate in postdicting or estimating their performance on tests they had taken. Dividing test takers into quartiles based on their performance, they found that those in the lowest quartile on the test tended to greatly overestimate the scores they obtained. As test performance improved, people became increasingly accurate in postdicting their actual scores. Interestingly, test takers in the top quartile tended to slightly underestimate how well they did, although they were much better calibrated than poorer performers.
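The quartile comparison described above can be illustrated with a short sketch. It is only a minimal illustration under stated assumptions: the function name `postdiction_bias_by_quartile`, the scoring of bias as estimated minus actual score, and the data are all hypothetical and are not taken from Dunning et al. (2003).

```python
from statistics import mean


def postdiction_bias_by_quartile(actual, estimated):
    """Return the mean (estimated - actual) score for each performance quartile.

    Index 0 is the lowest-scoring quartile and index 3 the highest.
    Positive values indicate overestimation, negative values underestimation.
    Assumes at least four test takers.
    """
    if len(actual) != len(estimated):
        raise ValueError("actual and estimated must have the same length")
    # Rank test takers by their actual score, lowest first.
    order = sorted(range(len(actual)), key=lambda i: actual[i])
    n = len(order)
    biases = []
    for q in range(4):
        # Slice the ranked list into four nearly equal groups.
        members = order[q * n // 4:(q + 1) * n // 4]
        biases.append(mean(estimated[i] - actual[i] for i in members))
    return biases


# Hypothetical scores, invented for illustration only.
actual = [3, 4, 5, 6, 7, 8, 9, 10]
estimated = [7, 7, 7, 7, 8, 8, 9, 9]
print(postdiction_bias_by_quartile(actual, estimated))
```

With these invented numbers the bias shrinks from the bottom quartile to the top and turns slightly negative at the top, which mirrors the qualitative pattern described in the text.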

Kruger and Dunning (1999) have explained the relationship between test performance and self-monitoring skill by proposing that poor performers are “unskilled but unaware.” Poor performers are doubly cursed in that not only do they lack the knowledge and skill of better performers but also their lack of knowledge and skill robs them of the ability to know that they lack knowledge and skill. In this view, knowledge and skill in a domain provide information that allows people to accurately monitor their test performance and to know if their performance is deficient. Postdiction accuracy is important because those who overestimate their scores may fail to realize that their knowledge or skill levels are inadequate and that they need to work on improving them.

Few, if any, studies have specifically examined the relationship between CT skill and test monitoring accuracy, but two experiments by Kruger and Dunning (1999) did examine reasoning skills and students’ ability to accurately estimate or postdict their performance on tests of those skills. In one, they examined the monitoring accuracy of 45 college students who took a logical reasoning test constructed from LSAT test preparation questions. As before, those who did the worst on the test greatly overestimated their performance on the test while the best performers tended to underestimate how well they did but were better calibrated.

In a follow-up study of disjunctive reasoning performance using the Wason (1966) selection task, Kruger and Dunning tested whether improving skill on the selection task would produce better self-monitoring in a group trained to do the selection task compared to a control group not receiving the training. In the first phase of their study before training, they obtained the usual pattern of estimation errors in their 140 participants. The poorest performers greatly overestimated the number of selection problem questions they answered correctly while those who performed better were better calibrated, and the best performers underestimated their scores. In the second phase of the experiment, they randomly assigned half of the students to a group that received a deductive reasoning training packet adapted from Cheng, Holyoak, Nisbett, and Oliver (1986) while the other half did a filler task. After training, they had all participants look at their original tests again and re-estimate how many they had correctly answered. This time, the accuracy scores of the poor performers in the training group rose to nearly the expert levels of the best performers with calibration increasing as performance improved. Monitoring accuracy of the untrained group showed no improvement. Kruger and Dunning (1999) interpreted their results as suggesting that the training improved reasoning skill on the Wason selection task and this also improved the accuracy of self-monitoring of that skill. When poorer performers were trained, they acquired knowledge that helped them recognize which questions they answered correctly and which they did not, resulting in improved calibration.

Other studies have also improved monitoring accuracy through training (e.g., Hacker, Bol, Horgan, & Rakow, 2000; Huff & Nietfeld, 2009; Nietfeld, Cao, & Osborne, 2006). These studies used interventions in which students received practice in estimating their scores on similar questions and feedback about their estimates. For example, Nietfeld et al. (2006) tested the effect of distributed exercises in monitoring with feedback. Specifically, participants in the treatment group rated their confidence in answering practice questions over the material and how well they understood the material covered during that weekly group period. Then, at the end of each group period, students received feedback about their answers and compared their performance to their confidence estimates. A second control section of the same group did not get the practice questions or feedback. In subsequent tests over course material, Nietfeld et al. found the group receiving distributed practice in monitoring was significantly better calibrated in estimating their test performance than the control group.

This review of monitoring studies suggests that increasing knowledge and skills needed to improve performance on tests of those skills may also improve the calibration of estimates of performance on those tests. Other studies showed that providing students with distributed practice on questions in which people estimate their performance and receive feedback can also improve calibration of estimates of test performance. Because the direct infusion approach incorporates distributed practice on CT exercises and formative assessments and provides feedback on that work, it might also be expected to improve postdiction accuracy as part of skill acquisition. In the process of learning how to improve performance on the CT exercises and formative assessments, students should improve their ability to postdict their test performance on later summative assessments without receiving explicit instruction in monitoring their test performance.

The improvement in CT and monitoring of CT test performance through direct infusion is expected to be greatest when relevant rules and principles for reasoning effectively on a task are explicitly taught. Although William James argued that “the art of remembering is the art of thinking. . .” (cited in Lockhart, 1992, p. 55), the rules and principles for effective thinking to improve memory differ from those used in the CT skills for argument analysis and critical reading. If so, then explicitly training students to use rules and principles for improving memory infused into their course work should not improve CT performance on tests of argument analysis or critical reading nor the monitoring of CT test performance in which such rules are used. We were unable to find any study in the literature that tested this hypothesis in either the laboratory or a classroom setting.

Metacognitive studies have mostly been done in the laboratory, highlighting the need for more research on metacognitive monitoring in the classroom. Hacker, Bol, and Keener (2008) persuasively argued for the importance of studying metacognitive monitoring in classroom settings to increase the ecological validity of such studies. Although laboratory studies may be better controlled and have greater internal validity than classroom studies, they can also produce results that do not readily generalize to actual classroom settings. Some kinds of classroom-based instruction, like instruction of CT and metacognitive monitoring, may take a fairly long time to produce results. This makes random assignment to treatment groups very difficult and quasi-experimental comparison of different classes receiving different forms of instruction a more practical option. Nevertheless, quasi-experimental comparison groups should be made as similar as possible except for the instructional interventions to increase internal validity and avoid some of the aforementioned methodological shortcomings.

1.5. Overview of the study

To achieve these goals, the present study tested the effectiveness of direct infusion of CT in a classroom setting to increase ecological validity while controlling potentially confounding variables related to differences in instructional groups. To increase internal validity, we measured other individual difference variables that could have contributed to CT performance, such as CT disposition, academic ability, academic achievement, background knowledge, and assessment motivation. We assessed their relationship to CT test performance and compared the groups on these variables to determine whether any of them needed to be statistically controlled.

Specifically, students from different classes of the same course using the same chapters from the same textbook and taught by the same instructor were assigned to groups either receiving or not receiving direct infusion of CT. To test the direct infusion hypothesis, we compared three groups, one receiving direct infusion of CT skills (CT-infused group), a second group receiving direct infusion of memory improvement skills (MI-infused group) but no explicit CT instruction, and a third similar group that received no explicit instruction of either CT or MI but instead focused on content knowledge acquisition of the same subject matter covered by the other groups (content knowledge group).

We expected that after instruction the CT-infused group would show significantly greater gains on tests of argument analysis and critical reading skills than the MI-infused group and the content knowledge group. Explicit instruction for how to think with rules for improving memory of course-related information should not improve argument analysis and critical reading performance because rules and principles of memory improvement are not useful for thinking critically about psychological subject matter. Nor should any implicit rules that the content knowledge group may have acquired as they studied course material be sufficient to improve their argument analysis and critical reading skills because we assume that people do not readily acquire these skills through immersion without receiving explicit CT instruction.

It might also be expected that after instruction the CT-infused group would show significantly greater gains than the two control groups in the ability to accurately postdict their performance on both the argument analysis and critical reading tests. This would be expected if receiving feedback about practice in using CT rules appropriately helped improve students’ knowledge of their performance. Students receiving direct infusion of MI principles given feedback from exercises and practice assessments would not be expected to improve on their ability to postdict CT test performance because they had not received practice in monitoring the use of relevant rules for performing on CT tests. Nor would the knowledge acquisition group be expected to improve their postdiction accuracy because they received no relevant feedback on CT rule use.
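A gain in postdiction accuracy of the kind hypothesized above could be quantified in several ways. The sketch below shows one simple possibility, assuming accuracy is scored as the absolute difference between a student's estimated and actual test score; this scoring rule, the function names, and the numbers are illustrative assumptions and may differ from the measure actually used in the study.

```python
from statistics import mean


def postdiction_error(actual, estimated):
    """Absolute gap between estimated and actual score for each student."""
    return [abs(e - a) for a, e in zip(actual, estimated)]


def accuracy_gain(pre_actual, pre_estimated, post_actual, post_estimated):
    """Mean reduction in absolute postdiction error from pretest to posttest.

    Positive values mean that estimates moved closer to actual scores.
    """
    pre = postdiction_error(pre_actual, pre_estimated)
    post = postdiction_error(post_actual, post_estimated)
    return mean(pre) - mean(post)


# Hypothetical scores for three students, invented for illustration only.
pre_actual, pre_estimated = [10, 12, 8], [15, 14, 13]
post_actual, post_estimated = [14, 15, 12], [15, 15, 14]
gain = accuracy_gain(pre_actual, pre_estimated, post_actual, post_estimated)
print(gain)
```

Under this assumed scoring, comparing the mean gain of the CT-infused group with that of the two comparison groups would correspond to the group comparison the hypothesis describes.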

2. Method

2.1. Participants

The sample included 103 psychology students from five sections of an upper level cognitive psychology course taught by the first author at a small mid-Atlantic public university. Of the original 103 students tested, 87 participated in both the pretest and posttest administrations of the CT tests and were retained for this sample. Of the 87 participants, none had received CT instruction related to argument analysis from the first author in other classes or were repeating the course. The 87 remaining participants included 72 women and 15 men. Their ages ranged from 19 to 40 (M = 21.84, SD = 3.59) years old and included 59 Caucasians, 23 African Americans, four Hispanic Americans, and one Asian American. Students received 2% extra credit in total course points for participating in the assessments. The five classes were divided into three groups with two classes assigned to the CT-infused group (N = 35), two classes receiving traditional instruction focused on content knowledge acquisition (N = 33), and one class receiving direct infusion of memory improvement (N = 19).

2.2. Instruction

Students in all three groups studied the same eight chapters from the manuscript of a cognitive psychology textbook written by the first author (Bensley, 2009a). This included chapters on an introduction to cognition, methods for studying cognition, perception, attention, an introduction to memory, long-term memory and accuracy, metamemory and memory improvement, and part of a problem solving chapter and an entire chapter on creative thinking with the latter two covered after all assessments were completed. All students received a set of study guide practice questions for each chapter to help them prepare for chapter quizzes and took three unit exams with approximately the same number of total course points available to each group. All three groups were taught the term “metacognition” and that people generally have some accurate knowledge of their memory performance.

Instruction in the three groups differed in terms of the skill focus of instruction, the exercises completed, and the content of certain quizzes and some questions on exams.

The CT-infused group focused on material from the textbook’s second chapter on argument analysis and scientific research methods used in the study of cognition. Discussion centered on rules and guidelines for evaluating the quality of evidence provided by different research methods such as true experiments versus correlation and evidence provided by non-scientific sources such as anecdotes, commonsense belief, and statements of authority. To scaffold this presentation, the CT-infused group received two tables that compared the strengths and weaknesses of non-scientific and scientific kinds of evidence adapted from Bensley (1998, 2011).

For practice, students in the CT-infused group first completed a graded exercise on recognizing kinds of evidence, comparing the quality of evidence, and on distinguishing arguments from non-arguments. After receiving feedback on this exercise, they completed three other exercises based on textbook chapters and took quizzes posing questions similar to those in the respective exercise. For example, the second exercise provided students evidence relevant to the question of whether perception is accurate from a literature review in the perception chapter, asking them to analyze and evaluate the evidence and draw a good inductive conclusion from it. They completed two other, similar, critical reading exercises accompanying the memory chapters, one presenting evidence from a literature review on whether flashbulb memory is accurate and another discussing whether hypnosis improves memory. Finally, they watched a video, Kidnapped by Aliens, and completed a graded exercise in which they identified and evaluated the kinds of evidence used in the video that discussed why people falsely remembered having been abducted by space aliens. After this, they received feedback on their identification and evaluation of the kinds of evidence.

The MI-infused group was also instructed in cycles of practice, feedback, assessment, and more feedback. Those students, however, were instructed in the use of principles and strategies for improving memory and practiced using these to learn material in the course, receiving feedback on their application in exercises and quizzes. This instruction included learning how to use memory principles such as paying attention, forming meaningful associations, visualizing material, and using organizational strategies and mnemonic strategies such as the first letter mnemonic, method of loci, and keyword method. Part of this training involved deciding which strategy would be most effective to learn certain kinds of material and making other metamemory judgments. Although they received feedback on their choice and implementation of MI strategies and mnemonics, they received no specific instruction in monitoring their performance. The MI-infused group watched the same Kidnapped by Aliens video but instead answered questions about problems with the memory retrieval methods, such as leading questions in hypnotic interviews, featured in the video as problematic techniques to access memories of alien abduction.

Students in the content knowledge group received instruction focused on acquisition of concepts and facts about cognitive psychology. They completed no graded exercises or quizzes designed to help improve their CT or memory skills and received no practice in monitoring their test performance. Instead, they took quizzes focusing on basic content and factual knowledge of cognitive psychology and received feedback on their performance. Their chapter quizzes were longer and covered more material from each chapter than those of the other two groups. Besides the same five questions posed to the other two groups in their chapter quizzes, the quizzes of the content knowledge group contained five additional content knowledge questions. This compensated for the additional questions the two direct infusion groups received that were focused on CT and MI skills and resulted in the content knowledge group having approximately the same number of total course points as the other two groups. The content knowledge group watched the same Kidnapped by Aliens video as the CT and MI-infused groups but instead answered factual questions about information in the video and received feedback on their responses.

2.3. Measures

All students completed two tests of a psychological CT battery (Bensley, 2009b), including an argument analysis test and critical reading test. The argument analysis test “Analyzing Psychological Statements” (APS) had 16 multiple-choice items based on the original 15-item test used by Bensley et al. (2010). The revised APS included eight items describing everyday, psychology-related situations and eight describing psychological research or clinical practice examples containing three items on recognizing kinds of evidence, five items on evaluating evidence, four items on distinguishing arguments from non-arguments, three items on finding assumptions, and a new item on drawing an appropriate inductive conclusion from research.

The critical reading test (CRT) entitled “Memory and Aging” had a 1500-word literature review on the question of whether memory declines with age. The literature review presented evidence and arguments on two sides of the question. Following the passage was a 16-item, multiple-choice test of the passage containing: four items on identifying kinds of evidence, three items on evaluating the quality of evidence, two items on interpreting evidence, one item on argument identification, two prediction/application items, one item on finding assumptions, and three items on drawing a conclusion from the evidence in the passage. Although the CRT asked some argument analysis questions like those on the APS, it required comprehension of extended arguments because claims, evidence, arguments, assumptions, and conclusions were all embedded within the text of the literature review.

After the CRT, participants completed a form containing self-report scales for rating motivation and prior knowledge related to the test. Instructions asked participants to rate how much effort they expended and then how much knowledge of the topic they had on 5-point Likert scales ranging from 1 = very little to 5 = an extreme amount. A third question had them list any courses they had taken relevant to the topic.

To test metacognitive monitoring of test performance on the CRT and APS, another form asked participants to estimate how many questions out of 16 they had correctly answered on each respective test after completing it and to estimate how many the average student in their course got right. Finally, students rated how good of a critical thinker they were, in general, and then in psychology, on two 5-point Likert items with scale values ranging from 1 = very poor to 5 = very good. After each of these questions, a parallel question asked them to rate the average student in their class on the same dimensions.

To assess CT disposition, we administered a 20-item version of the short form of the Need for Cognition scale (Cacioppo & Petty, 1982), a reliable and valid measure of open-minded, intellectual engagement that we adapted from the longer Rational-Experiential Inventory of Pacini and Epstein (1999). Participants also completed a brief demographics form after all testing was done.

2.4. Testing procedure

The first author tested all students in their classrooms beginning on the first day of class and again at approximately the tenth week of the semester after each class had finished the second unit of the course covering memory. Each time, he went over the consent form with participants, informing them that they would be asked to complete measures assessing their CT skills. They were told that the purpose was to assess students for program and group purposes and that the assessment was not part of their permanent academic record. They were urged, nevertheless, to do their best on the tests. They were further told that the department valued their participation and they would get extra credit for participation whether or not they consented to let us use their data for presentation or publication purposes.

Participants first read the CRT passage and then completed the questions, putting their answers on a Scantron form. After finishing this task, they were given the form for estimating their CRT score and the other ratings of their CT skill. After turning in all forms for the CRT, they completed the APS and again completed the metacognitive measures for the APS. This same procedure was conducted for the posttest. Pretesting and posttesting took about 40–50 min each time. After the posttest administration, students completed the Need for Cognition scale and demographics form.


Table 1
Mean pretest, posttest, and change scores on the critical reading test for the groups with and without explicit critical thinking instruction.

                               Critical reading test
Instructional group            Pretest          Posttest         Change
                               M       SD       M       SD       M      SD
Critical thinking (n = 35)     8.80    3.09     11.14   2.71     2.34   2.33
Memory improvement (n = 19)    10.10   2.45     10.32   2.94     0.21   2.49
Content knowledge (n = 33)     9.12    2.65     9.79    3.00     0.67   2.47

After completion of all testing, the university information service supplied each student’s overall grade point average (GPA) and psychology GPA to assess academic achievement and Scholastic Aptitude Test (SAT) score to assess academic ability.

3. Results

3.1. Critical thinking test performance

Initial comparisons of the three groups on their pretest CRT scores revealed no significant differences, F(2,84) = 1.36, p = .26. Table 1 shows the means and standard deviations of the pretest CRT scores. Likewise, the three groups showed no significant differences on the pretest APS scores, F(2,83) = 1.09, p = .34. Table 2 shows the means and standard deviations of the pretest APS scores. To examine changes due to instruction, pretest scores for the CRT and APS were subtracted from their corresponding posttest scores of each participant to produce change scores for each variable. For ease of interpretation, we used a one-way ANOVA to analyze change scores (cf. Huck & McLean, 1975). The means of the CRT and APS change scores appear in the far-right columns of Tables 1 and 2, respectively.

An ANOVA on the CRT change scores was significant, F(2,84) = 6.30, p = .003, ηp² = .13. Analyses using Levene’s test revealed no significant differences in the variances of the groups on the CRT or any other measure reported in this study. We used Tukey’s HSD test to compare the group means on the CRT change scores and all other measures in this study following significant omnibus F-test results. As expected, post hoc comparisons of the CRT change scores revealed that the CT-infused group showed a significantly greater gain (p < .01) than the MI-infused group and significantly greater gain (p < .05) than the content knowledge group. The effect size of the comparison of the CT-infused group to the MI-infused group (d = .90) was large, and the comparison with the content knowledge group (d = .70) was medium-sized according to criteria set by Cohen (1988). No significant sex difference was found in the CRT change scores.
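As a rough check on the omnibus test, a one-way ANOVA can be rebuilt from group sizes, means, and standard deviations alone. The sketch below uses the CRT change-score values from Table 1; the function names are illustrative and not taken from the study. Because the table values are rounded, the recomputed F should land near, but not exactly on, the reported 6.30. The second helper recovers partial eta squared from a reported F and its degrees of freedom.

```python
# Sketch: one-way ANOVA rebuilt from summary statistics (n, mean, SD per group).
# Input values are the rounded CRT change scores from Table 1; helper names are hypothetical.

def anova_from_summary(groups):
    """groups: list of (n, mean, sd). Returns (F, df_between, df_within, eta_squared)."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    # Between-groups sum of squares: weighted squared deviations of group means.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-groups sum of squares: (n - 1) * variance, summed over groups.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, eta_squared


def partial_eta_squared(f_stat, df_effect, df_error):
    """Partial eta squared recovered from a reported F ratio."""
    return (f_stat * df_effect) / (f_stat * df_effect + df_error)


crt_change = [(35, 2.34, 2.33), (19, 0.21, 2.49), (33, 0.67, 2.47)]
f_stat, df1, df2, eta_sq = anova_from_summary(crt_change)
print(f"F({df1},{df2}) = {f_stat:.2f}, eta squared = {eta_sq:.2f}")
print(f"partial eta squared from reported F: {partial_eta_squared(6.30, 2, 84):.2f}")
```

In a one-way design, eta squared and partial eta squared coincide, which is why both routes give about .13 here.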

An ANOVA on the APS change scores was also significant, F(2,82) = 6.21, p = .003, ηp² = .14. Again as expected, post hoc comparisons using Tukey’s HSD showed that the mean gain of the CT-infused group was significantly greater than both the MI-infused group, p < .05, and the content knowledge group, p < .01. The effect size for the comparison of the CT-infused group to the MI-infused group was medium-sized (d = .74) as was the comparison of the CT-infused group to the content knowledge group (d = .78).
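The reported effect sizes can be approximated from the APS change-score means and standard deviations in Table 2. The sketch below assumes Cohen's d was computed with a pooled standard deviation; the text does not state which variant was used, so that choice and the function name are assumptions.

```python
import math

# Sketch: Cohen's d with a pooled SD (an assumed formula; the study does not specify it).
# Inputs are the rounded APS change-score values from Table 2.

def cohens_d(n1, m1, sd1, n2, m2, sd2):
    """Standardized mean difference between two independent groups."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)


ct = (34, 2.62, 3.47)   # CT-infused group
mi = (19, 0.21, 2.82)   # MI-infused group
ck = (33, 0.15, 2.82)   # content knowledge group

d_ct_mi = cohens_d(*ct, *mi)
d_ct_ck = cohens_d(*ct, *ck)
print(f"CT vs MI: d = {d_ct_mi:.2f}; CT vs content knowledge: d = {d_ct_ck:.2f}")
```

Under this assumption the two values fall close to the reported d = .74 and d = .78.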

Comparison of male and female students on APS change scores revealed a significant difference, t(82) = 2.89, p = .005, with males (M = 3.27, SD = 3.58) showing a significantly greater gain than females (M = .65, SD = 3.08). This difference was based on only 15 males, who made up roughly 20% of each of the three groups. To test for possible sex differences in the instructional groups on the APS change scores, we conducted a 2 (sex) × 3 (instructional group) ANOVA. A test of the interaction between the two variables was not significant, suggesting that sex of subject did not contribute more to APS change for one group than another; however, the small number of males makes interpretation problematic.

Table 2
Mean pretest, posttest, and change scores on the APS, the argument analysis test, for the groups with and without explicit critical thinking instruction.

                               Argument analysis test
Instructional group            Pretest          Posttest         Change
                               M       SD       M       SD       M      SD
Critical thinking (n = 34)     8.41    3.39     11.03   2.76     2.62   3.47
Memory improvement (n = 19)    9.74    2.84     9.95    3.37     0.21   2.82
Content knowledge (n = 33)     8.76    3.09     8.91    2.67     0.15   2.82

3.2. Metacognitive analyses

3.2.1. Analyses of metacognition before instruction

We conducted a series of analyses to examine students’ metacognitive knowledge before instruction. Applying the method of Kruger and Dunning (1999), we calculated quartile ranges for the pretest CRT and APS scores and assigned participants to four quartile groups corresponding to the quartile range within which they fit. Then, we calculated the mean correct for each quartile and compared it to the number estimated to be correct on the CRT and APS, respectively, for each quartile. We found that students in the first quartile on the CRT pretest overestimated their scores by 30.6% while students in the fourth quartile underestimated their scores by 4%. On average, students overestimated their CRT scores by 8.5%. Applying the same procedure to the pretest APS scores, we obtained similar results, showing that students in the first quartile on the APS overestimated their scores by 33.8% while students in the fourth quartile underestimated their scores by 12.5%. On average, all students overestimated their APS scores by 8%.
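The quartile procedure can be sketched as follows. The scores are invented for illustration and are not study data. The sketch also assumes that over- and underestimation are expressed as a percentage of the 16 possible points, as in the posttest calculation described later in the Results; the text does not say how the pretest percentages were scaled.

```python
from statistics import mean, quantiles

MAX_SCORE = 16  # items per test

# Sketch of a Kruger and Dunning style quartile analysis on hypothetical data.
def quartile_bias(actual, estimated):
    """Mean (estimated - actual) per quartile of actual score, as % of MAX_SCORE.

    Quartile 1 is the lowest-scoring group. Positive values mean overestimation.
    """
    q1, q2, q3 = quantiles(actual, n=4)  # three cut points
    buckets = {1: [], 2: [], 3: [], 4: []}
    for a, e in zip(actual, estimated):
        quartile = 1 if a <= q1 else 2 if a <= q2 else 3 if a <= q3 else 4
        buckets[quartile].append(e - a)
    return {q: 100 * mean(diffs) / MAX_SCORE for q, diffs in buckets.items() if diffs}


# Hypothetical students: low scorers overestimate, high scorers slightly underestimate.
actual = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
estimated = [9, 10, 10, 9, 9, 10, 10, 11, 12, 12, 13, 14]

bias = quartile_bias(actual, estimated)
for q in sorted(bias):
    print(f"quartile {q}: {bias[q]:+.1f}% of possible points")
```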

To further examine individual differences in the postdiction accuracy of students overall, we calculated correlations between the number correct and the estimated number correct on the pretest APS and CRT scores. The number correct on the CRT pretest had a low, but significant, positive correlation with the estimated number correct, r(82) = .24, p = .03, as did the pretest APS number correct with the estimated number correct on the APS, r(83) = .24, p = .03. These results suggest that, despite the large overestimation of their performance by the lowest performing students before instruction, students, in general, had some knowledge of their test performance.
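A correlation of this size can be checked for significance by converting r to a t statistic with the same degrees of freedom. The sketch below does only that arithmetic; the helper name is illustrative, and obtaining the exact p value would require a t distribution table or a statistics library.

```python
import math

# Sketch: t statistic for testing a Pearson correlation against zero.
def t_from_r(r, df):
    """t = r * sqrt(df / (1 - r^2)), where df = N - 2."""
    return r * math.sqrt(df / (1 - r ** 2))


t_crt = t_from_r(0.24, 82)  # CRT pretest: actual vs. estimated number correct
t_aps = t_from_r(0.24, 83)  # APS pretest: actual vs. estimated number correct
print(f"CRT: t(82) = {t_crt:.2f}; APS: t(83) = {t_aps:.2f}")
```

Both values exceed the two-tailed .05 critical value of about 1.99 for roughly 80 degrees of freedom, in line with the reported p = .03.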

To further examine the extent to which students were aware of their overall levels of CT ability before instruction in relation to their CT pretest performance, we examined the ratings of critical thinking self-efficacy, calculating correlations between the number correct on the CT tests and self-estimates of CT ability. The number correct on the CRT pretest was not significantly correlated with either the rating of how good of a critical thinker students judged themselves to be, in general, or in psychology, in particular. In contrast, the number correct on the APS pretest showed a low, but significant, positive correlation with the ratings of how good of a critical thinker students judged themselves to be, in general, r(83) = .25, p = .02, and with their self-estimates of their ability to think critically in psychology, r(83) = .33, p = .002. These low, but reliable, correlations suggest that before instruction students had some, but minimal, knowledge of their own CT ability, in general, and in psychology, in particular, that was related to their score on the argument analysis test, but not the critical reading test.

To assess accuracy in postdicting CT performance of the three groups before instruction in a way that was unbiased by whether students over- or underestimated their scores, we computed the differences between each student’s CT test scores on the CRT and APS and their respective estimated test score for each test and then took the absolute value of each difference score. These calibration scores, therefore, represented the total absolute difference between the correct and estimated scores, regardless of whether that difference was under- or overestimated. ANOVAs on the pretest monitoring accuracy scores for the CRT and the APS revealed no significant differences in calibration on either of these new measures before instruction. Likewise, an ANOVA on the sum of the APS and CRT monitoring accuracy scores showed the three instructional groups did not differ in their calibration before instruction.

3.2.2. Analyses of metacognitive knowledge and postdiction accuracy after instruction

To measure change in monitoring accuracy, pretest monitoring accuracy scores were subtracted from posttest monitoring accuracy scores. This produced calibration change scores that, when negative, indicated that the discrepancy between actual and estimated scores on the posttest was reduced relative to the discrepancy on the pretest. Negative scores, therefore, represented a gain in absolute monitoring accuracy from pretest to posttest while positive scores indicated an increasing discrepancy between correct and estimated scores on the posttest with respect to pretest and a decline in absolute monitoring accuracy. It should be noted that a calibration change score of 0 indicated no change in discrepancy between the pretest and posttest, whether or not the participant was perfectly calibrated both times, making a score of 0 ambiguous; however, no participant obtained a score of 0 through perfect calibration both times.
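The calibration and calibration change scores described in this section amount to two small calculations, sketched here with hypothetical scores for a single student. The function names are illustrative only.

```python
# Sketch: absolute calibration and calibration change for one participant.
# All scores below are hypothetical.

def calibration(actual, estimated):
    """Absolute discrepancy between actual and self-estimated number correct."""
    return abs(actual - estimated)


def calibration_change(pre_actual, pre_est, post_actual, post_est):
    """Posttest calibration minus pretest calibration.

    Negative values mean the discrepancy shrank (monitoring accuracy improved);
    positive values mean it grew.
    """
    return calibration(post_actual, post_est) - calibration(pre_actual, pre_est)


# A student who scored 8 but guessed 12 at pretest, then scored 11 and guessed 12.
change = calibration_change(pre_actual=8, pre_est=12, post_actual=11, post_est=12)
print(change)  # pretest discrepancy 4, posttest discrepancy 1, so the change is -3
```

Taking the absolute value first means an overestimate of 3 and an underestimate of 3 count as equally miscalibrated, which is what makes the measure insensitive to the direction of the error.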

An ANOVA on the calibration change scores for the APS argument analysis test was significant, F(2,81) = 3.55, p < .05. Follow-up comparisons with Tukey’s HSD test, however, revealed that the gain of the CT-infused group (M = −1.20, SD = 2.63) was only approaching significance (p = .06) compared to the content knowledge group (M = .27, SD = 2.41). Likewise, the comparison to the MI-infused group (M = .05, SD = 2.53) was not significant (p = .09).

A second ANOVA on the calibration change scores for the CRT was not significant, F(2,81) = 1.94, p = .15. Further inspection of these means revealed that, although not significantly different, the magnitude of the gain in calibration accuracy for the CT-infused group (M = −.97, SD = 2.31) was relatively greater than that of the content knowledge group (M = −.13, SD = 2.43) and the MI-infused group (M = .26, SD = 2.38). As with the marginally significant gains in calibration accuracy in the APS, the magnitude of the increases in CRT calibration accuracy were in the predicted direction, that is, the CT-infused group showed a consistently greater (although not significantly greater) reduction in discrepancy than did the control groups.

These results suggested the existence of a consistent, albeit weak, relationship across instructional conditions on the calibration change scores for the two variables, not easily detectable with a single measure but which might be detected by combining the two measures. Supporting the rationale for combining the two variables, we found that the pretest and posttest measures of APS and CRT calibration scores were intercorrelated, suggesting they were, to some extent, measuring the same construct.

Accordingly, to increase the sensitivity of the measurement of calibration change, we computed a new composite calibration measure from the APS and CRT calibration change scores. We combined the two measures by first calculating z scores for the APS and CRT calibration scores, respectively, and then summing the z scores to compute a total calibration for the pretest and posttest. On this scale, a reduction in discrepancy between the actual and estimated scores from pretest to posttest is indicated by total standardized calibration change scores that are negative because when greater discrepancy on the pretest is subtracted from less discrepancy on the posttest this produces negative scores in total standardized calibration units. Although ANOVA on the total calibration scores on the pretest was not significant, F(2,78) = 2.10, p = .13, the ANOVA on the total standardized calibration change scores was significant, F(2,78) = 5.47, p < .01, ηp² = .12.
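One reading of the composite measure is sketched below. It assumes that z scores were computed across the whole sample, separately for each test and each testing occasion; the text does not spell this out, so that choice, the function names, and the data are all assumptions made for illustration.

```python
from statistics import mean, stdev

# Sketch: total standardized calibration change on hypothetical data.
# Assumption: z scores are taken across all participants, per test and per occasion.

def z_scores(values):
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def total_calibration(aps_cal, crt_cal):
    """Sum of the APS and CRT calibration z scores for each participant."""
    return [za + zc for za, zc in zip(z_scores(aps_cal), z_scores(crt_cal))]


def total_calibration_change(pre_aps, pre_crt, post_aps, post_crt):
    """Posttest total minus pretest total; negative means improved calibration."""
    pre_total = total_calibration(pre_aps, pre_crt)
    post_total = total_calibration(post_aps, post_crt)
    return [post - pre for pre, post in zip(pre_total, post_total)]


# Hypothetical absolute calibration scores for four students.
pre_aps, pre_crt = [4, 3, 2, 3], [3, 4, 2, 3]
post_aps, post_crt = [1, 3, 2, 4], [1, 4, 3, 4]

changes = total_calibration_change(pre_aps, pre_crt, post_aps, post_crt)
print([round(c, 2) for c in changes])
```

Under this reading the change scores average to zero across the full sample, so a negative group mean indicates improvement relative to the other participants rather than in absolute units.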

Inspection of the means of the three groups showed that only the total standardized calibration change mean of the CT-infused group was negative. Post hoc comparisons of the means revealed that the CT-infused group showed a significantly greater gain in overall calibration accuracy (M = −.74, SD = 1.75) than the MI-infused group (M = .68, SD = 1.22) and the content knowledge group (M = .37, SD = 1.77), both at p < .05. The effect size for the comparison of the CT-infused group to the MI-infused group was large (d = .90) while the effect size of the comparison of the CT-infused group to the content knowledge group was medium (d = .63).

To verify the significant effects for the CT-infused group on this complex dependent variable, we deconstructed the total standardized calibration scores and conducted paired sample t-tests on the pretest and posttest calibration scores for the APS and CRT for each group separately. We found that only the CT-infused group showed significantly less discrepancy on the posttest APS calibration than on the pretest, t(31) = 2.51, p = .02, with the posttest APS calibration mean (M = 2.25, SD = 1.55) showing significantly less discrepancy than the pretest APS calibration mean (M = 3.47, SD = 2.11). Likewise, the CT-infused group showed significantly less discrepancy on posttest CRT calibration than on the pretest, t(31) = 2.33, p = .03, with the posttest CRT calibration mean (M = 1.88, SD = 2.08) showing significantly less discrepancy than the pretest CRT calibration mean (M = 2.85, SD = 2.30). A reanalysis of these two t-test results and the four t-test results of the other two groups, using the more conservative Bonferroni procedure, revealed that the CT-infused group no longer showed significant gain in calibration after instruction on either measure. Together, these results are consistent with those obtained with the total standardized calibration variable, and are consistent with the non-significant findings of the earlier ANOVAs testing calibration on APS and CRT separately.
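The effect of the Bonferroni procedure on the two reported t-tests follows from simple arithmetic, sketched below. The family of six tests and the two p values come from the text; the p values of the other four tests are not reported and are not assumed here.

```python
# Sketch: Bonferroni-adjusted alpha for the family of six paired t-tests.
ALPHA = 0.05
N_TESTS = 6  # two tests (APS, CRT) for each of the three groups

alpha_adjusted = ALPHA / N_TESTS

# Only the CT-infused group's p values are reported in the text.
reported_p = {"APS": 0.02, "CRT": 0.03}
survives = {test: p < alpha_adjusted for test, p in reported_p.items()}

print(f"adjusted alpha = {alpha_adjusted:.4f}")
print(survives)
```

Both p values are below .05 but above the adjusted threshold of about .008, which is why neither gain remains significant after correction.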

To further describe the overall postdiction calibration of the three groups as a percentage for the posttests, we calculated the difference between the mean of the CT scores on the two posttests and the mean of the estimated scores for the two tests and divided by 16, the maximum possible mean score. On average, all three groups continued to overestimate on the tests, with the CT-infused group overestimating by 5.4%, the content knowledge group by 12.4%, and the MI-infused group by 6.6%. Although the CT-infused group overestimated the least, the differences between the groups were not significant.

3.3. Individual differences and validity

To better understand performance on the CT tests, we examined the correlations of the pretest scores on the CRT and APS with scores on academic variables including SAT, overall GPA, and psychology GPA. Pretest CRT scores showed significant, positive correlations with SAT, r(60) = .46, p < .001, overall GPA, r(76) = .46, p < .001, and with psychology GPA, r(75) = .38, p < .001. Likewise, pretest APS scores showed significant, positive correlations with SAT, r(59) = .51, p < .01, overall GPA, r(75) = .39, p < .001, and with psychology GPA, r(74) = .39, p < .001. These results suggest that those students with better academic ability as measured by the SAT and those with better academic achievement as measured by GPA all tended to do better on both the APS and CRT.

There was also a significant positive correlation between posttest APS scores and need for cognition (NFC) scores (thought to measure intellectual engagement), r(84) = .23, p < .05, but not between pretest APS scores and NFC, r(84) = .15, p = .16. Likewise, the posttest CRT scores showed a significant positive correlation with NFC, r(84) = .23, p < .05, but the pretest CRT scores were not significantly correlated with NFC, r(84) = .15, p = .17. These results provide partial support for the contribution of CT disposition as measured by the NFC to CT performance, at least on both posttests. The three groups, however, showed no significant difference on NFC before instruction, F(2,83) = 0.36, p = .70, supporting the study’s internal validity and suggesting that CT disposition in the groups did not explain their differences on the CT tests.

The positive correlations between CT test performance and the academic variables and with NFC suggest the possibility that group differences in these individual differences could have confounded CT test results. To examine this possibility, we conducted ANOVAs comparing the three groups on these variables. Results of the ANOVAs showed that the three instructional groups did not differ significantly, suggesting that individual differences in academic ability and achievement did not account for the significant gains on the CRT and APS reported earlier. See Table 3 for descriptive statistics on these individual difference variables.

Comparisons of the three groups on measures of test motivation and subject-relevant task knowledge on the CRT also revealed no differences. Specifically, the three groups did not differ significantly on self-reported amount of knowledge of memory and aging they possessed before reading the literature review of the CRT on either the pretest or posttest. Nor did they differ significantly on ratings of their effort on either the pretest or posttest CRT. On average, the three groups reported slightly more than a moderate amount of effort on the pretest (M = 3.40, SD = .66) and on the posttest (M = 3.42, SD = .61). These results suggest that differences between the groups on the CRT were not due to differences in background knowledge or in test motivation.


Table 3
Means and standard deviations of instructional groups on academic background variables and need for cognition.

Instructional group         Psyc. GPA   Overall GPA   SAT       NFC
Critical thinking     M     3.04        2.85          934.52    75.50
                      SD    0.65        0.55          141.06    10.47
Memory improvement    M     3.13        2.93          952.73    76.37
                      SD    0.63        0.55          139.72    6.40
Content knowledge     M     3.26        3.03          1011.00   74.24
                      SD    0.56        0.52          184.56    8.87

Note: Psyc. GPA = grade point average in psychology. Overall GPA = grade point average for all courses. SAT = combined verbal and quantitative SAT. NFC = need for cognition total score adapted from the rational-experiential inventory of Pacini and Epstein (1999).

4. Discussion

4.1. Interpretation of CT test performance

The results of this study provided additional support for the effectiveness of direct infusion of CT, a form of explicit instruction that infuses CT rule instruction into regular content instruction. Although the three instructional groups did not differ on the argument analysis and critical reading pretests, after instruction the group receiving direct infusion of CT showed significantly greater gains on the posttests of those skills as compared to a control group receiving direct infusion of MI instruction and a traditionally-taught control group focused on content knowledge acquisition. The effect sizes in comparing the CT-infused group to the other two groups were medium to large, suggesting a substantial contribution from direct infusion.

The results of the present study cannot be easily explained by individual differences in several variables that could have affected CT performance. The groups did not differ on academic ability as measured by SAT, on academic achievement as measured by overall GPA and psychology GPA, on CT disposition as measured by the Need for Cognition Scale, on self-reported motivation in completing the CRT, or on self-reported prior knowledge relevant to the critical reading task. Also, students who had received explicit CT instruction in previous courses were removed from the sample. Further supporting the study’s internal validity was the similarity of instruction in all groups except for the manipulation of the independent variable. The same instructor taught all three groups from the same textbook, covering the same material, except for portions of the second chapter specific to their treatment. Each group answered the same questions on quizzes and exams except for a small subset of questions focusing on their particular instructional focus. The two direct infusion groups had the same number of exercises, and all three groups had approximately the same number of total points.

These new findings with direct infusion replicated results from numerous studies that have demonstrated the greater effectiveness of explicit CT instruction as compared to less explicit CT instruction in college students from a variety of subject areas (Abrami et al., 2008). In addition, they showed that direct infusion was effective in producing significant gains in the same participants for both argument analysis and critical reading in psychology, findings previously obtained separately with explicit instruction of critical reading (Bensley & Haynes, 1995) and for argument analysis skills (e.g., Bensley et al., 2010; Nieto & Saiz, 2008; Solon, 2007).

The greater gains from the group receiving direct infusion of CT compared to the group receiving explicit MI skill instruction further suggest that it was specific instruction with CT rules and not thinking about rules for memory improvement that produced the gains in CT. The MI control group applied general rules for learning and remembering course material for which they also likely applied strategic thinking and metacognitive monitoring. Yet, practice in thinking about how to improve memory did not increase either CT test scores or accuracy in monitoring CT test performance. These results seem inconsistent with extreme forms of the generality hypothesis predicting that practice in thinking about one subject, such as memory improvement, would generalize to improvement of CT. At the same time, these findings are inconsistent with the views of those who maintain the specificity of thinking but also advocate immersion in CT instruction (McPeck, 1990). Although the findings of this study are relevant to questions about the domain specificity of CT, no conclusion can be drawn until a study systematically compares explicit instruction of each skill on acquisition of both CT and MI skills.

4.2. Interpretation of postdiction and other metacognitive results

The results also supported the hypothesis that direct infusion of CT skills would produce greater gains in accuracy of postdicting CT test performance than the other two kinds of instruction. Initial analyses of metacognitive measures revealed that, in general, students’ awareness of their test performance and CT ability was limited with large individual differences in their postdiction accuracy on the pretest. Specifically, students in the lowest quartile showed significantly and substantially poorer calibration in their postdictions than students in the top quartile. After instruction, however, the CT-infused group showed significantly greater gains in the accuracy of their estimates than the other two groups on the total standardized calibration change scores. We found that the lowest performing students on the pretest in the CT-infused group showed the most substantial increase in their postdiction accuracy on the posttest. The two control groups showed almost no gain in calibration accuracy, including those poorer performing students who continued to overestimate their performance. The effect size of the comparison of the CT-infused group to the MI-infused group was large while the comparison to the content knowledge group showed a medium effect size, suggesting a substantial influence of CT direct infusion on the gain in overall postdiction accuracy.
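The idea of a gain in postdiction calibration can be illustrated with a small sketch. It assumes calibration error is the absolute gap between a student's postdicted score and the actual score, and that gain is the reduction in that gap from pretest to posttest; the standardized calibration change scores analyzed in this study are not reproduced here, and the function names and example scores are hypothetical.

```python
def calibration_error(postdicted, actual):
    """Absolute gap between a postdicted test score and the actual score."""
    return abs(postdicted - actual)


def signed_bias(postdicted, actual):
    """Positive values indicate overestimation, negative values underestimation."""
    return postdicted - actual


def calibration_gain(pre_postdicted, pre_actual, post_postdicted, post_actual):
    """Reduction in calibration error from pretest to posttest.

    Positive values mean the student's estimate moved closer to the actual score.
    """
    pre_error = calibration_error(pre_postdicted, pre_actual)
    post_error = calibration_error(post_postdicted, post_actual)
    return pre_error - post_error


# Hypothetical low-scoring student who overestimates on the pretest
# and is better calibrated on the posttest; NOT data from the study.
print(signed_bias(18, 10))               # overestimation on the pretest
print(calibration_gain(18, 10, 15, 14))  # reduction in absolute error
```

Separating signed bias from absolute error keeps overestimation, as described for the poorer performing students, distinct from the overall accuracy of the estimate.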

It should be recalled, however, that the differences in the gain in calibration accuracy on the APS for the three instructional groups only approached significance, with the CT-infused group showing only marginally significant improvement as compared to the content acquisition group (p = .052) and only relatively, not significantly, greater gains in calibration on the CRT. These findings raise questions about the efficacy of direct infusion in improving accuracy in postdicting test performance. Although the CT-infused group showed a medium to strong gain in effect size on the total calibration scores compared to the other two groups, these results should be replicated with other groups receiving direct infusion of CT.

One explanation for the increase in total postdiction accuracy on the CT tests may be that CT direct infusion, by increasing argument analysis and critical reading skills, also increased students’ knowledge of what they knew and didn’t know, supporting the “unskilled but unaware” hypothesis of Kruger and Dunning (1999). Although the design of the present study allowed us to show that CT direct infusion improved both argument analysis and metacognitive monitoring accuracy, it remains for future studies to determine the direction of the relationship of these two variables and how to dissociate them. For example, future studies might also examine whether varying the explicitness of metacognitive monitoring instruction along with direct infusion of CT skills would affect the effect sizes of gains in both CT skills for argumentation and metacognitive monitoring accuracy.

Another explanation for the CT-infused group’s gains in postdiction accuracy is that some components of the direct infusion approach operate to improve both CT skills and monitoring skill because they are essentially the same. Although direct infusion of CT skills did not also involve direct instruction of monitoring, it did include components of effective monitoring instruction that other studies have found to improve monitoring accuracy. Specifically, practicing with the same type of questions and receiving feedback on them, even though the content of formative and summative test questions differed, may have produced gains in both CT skills and monitoring accuracy. Also, because students were motivated to improve their CT performance as part of their course grade, this likely motivated them to reflect on feedback from practice exercises and assessments and engage in strategic practice to improve their performance (Hacker et al., 2000).

Our findings have practical implications for instructors and researchers interested in improving CT. One implication is that direct infusion can provide a useful structure for explicitly and systematically infusing CT rules and principles into regular course instruction to promote acquisition of CT knowledge and skills. A second, based on the findings with our control groups, is that instructors should not assume that studying a challenging subject like cognitive psychology or even focusing instruction on improving another cognitive skill requiring thinking such as memory improvement will, in themselves, improve specific CT skills such as argument analysis and critical reading. A third is that CT direct infusion may both improve CT test performance related to disciplinary thinking and improve metacognitive monitoring accuracy without additional instruction beyond what students receive through direct infusion.

Our results, although showing promise, should be interpreted cautiously. Although the CT-infused group demonstrated significant gains in the ability to answer questions about some everyday psychological issues, these results do not demonstrate far transfer of CT, an important goal of CT instruction (Halpern, 1998) such as when students have acquired a rule for reasoning in the classroom and are later able to apply it correctly in an everyday context (e.g., Fong & Nisbett, 1991). Although this kind of transfer has been demonstrated, it probably occurs infrequently (Barnett & Ceci, 2002; Detterman, 1993) and may be especially difficult to produce with direct infusion because it is a deductive approach. Direct infusion promotes efficient acquisition through explicit instruction of CT principles and close alignment between training exercises and assessments, but may not promote far transfer.

The success of the direct infusion approach has already led our department to add a new standalone CT course to our curriculum that uses direct infusion to teach students to think critically about a variety of psychological questions (Bensley & Murtagh, 2012). This new CT course, required for all majors early in the curriculum, may also help to solve an ethical problem created when, in the present study, only two classes received the successful CT intervention while three other control group classes did not. We continue to assess the CT skills of seniors who took the CT course for maintenance and transfer of the skills and knowledge they acquired earlier.

Despite potential limitations in promoting transfer, the direct infusion approach may actually facilitate the study of transfer of the use of CT rules and principles because it offers a systematic approach to incorporating the use of rules and principles into instructional and assessment discourse. Future studies on direct infusion should test for transfer of skills by systematically varying the content and context of instruction and assessment. Instructors interested in taking a scientific approach to teaching and assessing acquisition of CT may find that direct infusion is not only effective in improving skills for argumentation, critical reading, and metacognitive monitoring but also useful for studying the conditions that promote transfer of CT (Bensley, 2011).



5. Conclusion

The results showed that direct infusion of CT was effective in producing significantly greater gains than control groups in argument analysis and critical reading test performance. The study showed that a particular kind of explicit CT instruction was effective under both controlled and ecologically valid instructional conditions. Moreover, the results extended previous studies of CT skill acquisition by showing that the group getting direct infusion of CT produced greater gains in calibration in postdicting their CT test scores than the control groups. A plausible explanation for this improvement in both CT and metacognitive monitoring was that the CT direct infusion approach shares components of successful instruction commonly found to facilitate acquisition of various kinds of knowledge and skill including those shown to improve monitoring. These include explicit practice, feedback, and proper motivation. The results suggest that instructors may find explicit CT instruction in the form of CT direct infusion useful for improving argument analysis, critical reading, and metacognitive monitoring of test performance.

Finally, our results further support the need for greater attention to the operational definition of CT in scientifically studying it (Williams, 1999). In particular, they illustrate the value of approaching the study of CT as a multi-dimensional construct with measures that assess the contribution of knowledge and skills for thinking critically, CT dispositions, and metacognition as well as individual differences in academic ability and achievement variables that may also be related to CT performance as opposed to testing a single outcome variable (Bensley, 2011). We found all of these variables to be related to CT performance, and measuring them allowed for the possibility of controlling them when necessary to increase the internal validity of our classroom quasi-experiment. Moreover, we found that operationally defining CT instruction as the explicit instruction of rules directly infused into the discourse of the course’s subject matter was effective not only in producing gains in CT performance but also in making the scientific study of explicit instruction more tractable.


Acknowledgements

We thank Selena Smith of the Frostburg State University Office of Institutional Research for helping us obtain academic background information on participants and the student participants who consented to let us use their data. Thanks to Lauren Powell for help with data entry and Stephanie Kuehne and Crystal Rainey for their help with the manuscript. Thanks also to our Frostburg State University colleagues, Paul Bernhardt for helpful discussions about statistical matters and to Chris Masciocchi, and two anonymous reviewers and the editor of this journal for helpful suggestions for revising our manuscript.


References

Abrami, P. C., Bernard, R. M., Borokhovski, E., Wade, A., Surkes, M. A., Tamim, R., et al. (2008). Instructional interventions affecting critical thinking skills and dispositions: A stage 1 meta-analysis. Review of Educational Research, 4, 1102–1134.

Angeli, C., & Valanides, N. (2009). Instructional effects on critical thinking performance on ill-defined issues. Learning and Instruction, 19, 322–334.

Angelo, T. A. (1995). Classroom assessment for critical thinking. Teaching of Psychology, 22, 6–7.

Bensley, D. A., & Murtagh, M. P. (2012). Guidelines for a scientific approach to critical thinking assessment. Teaching of Psychology, 39, 5–16.

Barnett, S. M., & Ceci, S. J. (2002). When and where do we apply what we learn? A taxonomy for far transfer. Psychological Bulletin, 128, 612–637.

Bensley, D. A. (1998). Critical thinking in psychology: A unified skills approach. Pacific Grove, CA: Brooks/Cole.

Bensley, D. A. (2009a). Thinking about cognition: An applied approach (unpublished manuscript).

Bensley, D. A. (2009b). The critical thinking in psychology test battery (unpublished manuscript).

Bensley, D. A. (2011). Rules for reasoning revisited: Toward a scientific conception of critical thinking. In C. P. Horvath, & J. M. Forte (Eds.), Critical thinking (pp. 1–36). Hauppauge, NY: Nova Science Publishers.

Bensley, D. A., Crowe, D. S., Bernhardt, P., Buckner, C., & Allman, A. L. (2010). Teaching and assessing critical thinking skills for argument analysis in psychology. Teaching of Psychology, 37, 91–96.

Bensley, D. A., & Haynes, C. (1995). The acquisition of general purpose strategic knowledge for argumentation. Teaching of Psychology, 22, 41–45.

Beyer, B. K. (1997). Improving student thinking: A comprehensive approach. Boston: Allyn & Bacon.

Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5, 7–73.

Cacioppo, J. T., & Petty, R. E. (1982). The need for cognition. Journal of Personality and Social Psychology, 42, 116–131.

Cheng, P. W., Holyoak, K. J., Nisbett, R. E., & Oliver, L. M. (1986). Pragmatic versus syntactic approaches to training deductive reasoning. Cognitive Psychology, 18, 293–328.

Clifford, J. S., Boufal, M. M., & Kurtz, J. E. (2004). Personality traits and critical thinking skills in college students: Empirical tests of a two-factor theory. Assessment, 11, 169–176.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.

Detterman, D. K. (1993). The case for the prosecution: Transfer as an epiphenomenon. In D. K. Detterman, & R. J. Sternberg (Eds.), Transfer on trial: Intelligence, cognition, and instruction (pp. 1–24). Norwood, NJ: Ablex.

Dunning, D., Johnson, K., Ehrlinger, J., & Kruger, J. (2003). Why people fail to recognize their own incompetence. Current Directions in Psychological Science, 12, 83–87.

Ennis, R. H. (1987). A taxonomy of critical thinking dispositions and abilities. In J. B. Baron, & R. F. Sternberg (Eds.), Teaching thinking skills: Theory and practice (pp. 9–26). New York: Freeman.

Ennis, R. H. (1989). Critical thinking and subject specificity: Clarification and needed research. Educational Researcher, 18, 4–10.

Ennis, R. H. (1992). Clarifications and directions for research. In S. P. Norris (Ed.), The generalizability of critical thinking: Multiple perspectives on an educational ideal (pp. 17–37). New York: Teachers College Press.

Facione, P. A. (1990). Critical thinking: A statement of expert consensus for purposes of educational assessment and instruction: Research findings and recommendations (The Delphi Report). Prepared for the Committee on Pre-College Philosophy of the American Philosophical Association. ERIC ED, 315, 423.

Flavell, J. (1979). Metacognition and cognitive monitoring. American Psychologist, 34, 90–911.

Fong, G. T., & Nisbett, R. E. (1991). Immediate and delayed transfer of training effects in statistical reasoning. Journal of Experimental Psychology, 120, 34–45.

Hacker, D. J., Bol, L., Horgan, D. D., & Rakow, E. A. (2000). Test prediction and performance in a classroom context. Journal of Educational Psychology, 92,



Hacker, D. J., Bol, L., & Keener, M. C. (2008). Metacognition in education: A focus on calibration. In J. Dunlosky, & R. A. Bjork (Eds.), Handbook of metacognition and memory (pp. 429–455). New York: Psychology Press.

Halpern, D. F. (1998). Teaching critical thinking for transfer across domains: Dispositions, skills, structure training, and metacognitive monitoring. American Psychologist, 53, 449–455.

Halpern, D. F. (2003). Thought and knowledge: An introduction to critical thinking (4th ed.). Mahwah, NJ: Erlbaum.

Halpern, D. F. (2007). The nature and nurture of critical thinking. In R. Sternberg, H. Roediger, & D. Halpern (Eds.), Critical thinking in psychology (pp. 1–14). Cambridge, UK: Cambridge University Press.

Huck, S. W., & McLean, R. A. (1975). Using a repeated measures ANOVA to analyze the data from a pretest-posttest design: A potentially confusing task. Psychological Bulletin, 82, 511–518.

Huff, J. D., & Nietfeld, J. L. (2009). Using strategy instruction and confidence judgments to improve metacognitive monitoring. Metacognition and Learning, 4, 161–176.

Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it: How difficulties in recognizing one’s own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology, 77, 1127–1134.

Ku, K. Y., & Ho, I. T. (2010). Metacognitive strategies that enhance critical thinking. Metacognition and Learning, 5, 251–267.

Lawson, T. J. (1999). Assessing psychological critical thinking as an outcome for psychology majors. Teaching of Psychology, 26, 207–209.

Lockhart, R. L. (1992). The role of conceptual access in the transfer of thinking skills. In S. P. Norris (Ed.), The generalizability of critical thinking: Multiple perspectives on an educational ideal (pp. 54–65). New York: Teachers College Press.

Magno, C. (2010). The role of metacognitive skills in developing critical thinking. Metacognition and Learning, 5, 137–156.

Marin, L. M., & Halpern, D. F. (2011). Pedagogy for developing critical thinking in adolescents: Explicit instruction produces greatest gains. Thinking Skills and Creativity, 6, 1–13.

Mayer, R. E. (2004). Should there be a three-strikes rule against pure discovery learning? The case for guided methods of instruction. American Psychologist, 59, 14–19.

McGuinness, C. (1990). Talking about thinking: The role of metacognition in teaching thinking. In K. J. Gilhooly, M. T. Keane, R. H. Logie, & G. Erdos (Eds.), Lines of thinking (Vol. 2) (pp. 301–312). New York: John Wiley.

McPeck, J. E. (1990). Teaching critical thinking: Dialogue and dialectic. New York: Routledge.

Nietfeld, J. L., Cao, L., & Osborne, J. W. (2006). The effect of distributed monitoring exercises on monitoring accuracy. Metacognition and Learning, 2, 159–179.

Nieto, A. M., & Saiz, C. (2008). Evaluation of Halpern’s structural component for improving critical thinking. The Spanish Journal of Psychology, 11(1), 266–274.

Pacini, R., & Epstein, S. (1999). The relation of rational and experiential information processing styles to personality, basic beliefs, and the ratio-bias phenomenon. Journal of Personality and Social Psychology, 76, 972–987.

Paul, R. (1993). Critical thinking: Fundamental to education for a free society. Santa Rosa, CA: Foundation for Critical Thinking.

Penningroth, S. L., Despain, L. H., & Gray, M. J. (2007). A course designed to improve psychological critical thinking. Teaching of Psychology, 34, 153–157.

Schraw, G., & Dennison, R. S. (1994). Assessing metacognitive awareness. Contemporary Educational Psychology, 19, 460–475.

Solon, T. (2007). Generic critical thinking infusion and course content learning in introductory psychology. Journal of Instructional Psychology, 34, 972–987.

Stanovich, K. E. (2004). How to think straight about psychology (7th ed.). Boston: Allyn & Bacon.

Stone, N. J. (2000). Exploring the relationship between calibration and self-regulated learning. Educational Psychology Review, 12, 437–475.

Swartz, R. J. (1989). Making good thinking stick: The role of metacognition, extended practice, and teacher modeling in the teaching of thinking. In D. M. Topping, D. C. Crowell, & V. N. Kobayashi (Eds.), Thinking across cultures (pp. 417–436). Hillsdale, NJ: Erlbaum.

Tarricone, P. (2011). The taxonomy of metacognition. New York: Psychology Press.

Taube, K. (1997). Critical thinking ability and disposition as factors of performance on a written critical thinking test. Journal of General Education, 46, 129–164.

Walberg, H. J. (2006). Improving educational productivity: A review of extant research. In R. F. Subotnik, & H. J. Walberg (Eds.), The scientific basis of educational productivity (pp. 103–159). Greenwich, CT: Information Age.

Wason, P. W. (1966). Reasoning. In B. M. Foss (Ed.), New horizons in psychology (pp. 135–151). Baltimore: Penguin Books.

Williams, R. L. (1999). Operational definitions and assessment of higher-order cognitive constructs. Educational Psychology Review, 11, 411–427.
