Assessment of Student Skills for Critiquing Published Primary Scientific Literature Using a Primary Trait Analysis Scale

MANUEL F. VARELA,*1 MARVIN M. F. LUTNESKY,1 AND MARCY P. OSGOOD2
Biology Department, Eastern New Mexico University, Portales, New Mexico 88130,1 and Department of Biochemistry and Molecular Biology, University of New Mexico School of Medicine, Albuquerque, New Mexico 871312

Instructor evaluation of progressive student skills in the analysis of primary literature is critical for the development of these skills in young scientists. Students in a senior or graduate-level one-semester course in Immunology at a Masters-level comprehensive university were assessed for abilities (primary traits) to recognize and evaluate the following elements of a scientific paper: Hypothesis and Rationale, Significance, Methods, Results, Critical Thinking and Analysis, and Conclusions. We tested the hypotheses that average recognition scores vary among elements and that scores change with time differently by trait. Recognition scores (scaled 1 to 5) and differences in scores were analyzed using analysis of variance (ANOVA), regression, and analysis of covariance (ANCOVA) (n = 10 papers over 103 days). By multiple comparisons testing, we found that recognition scores fell into two statistically distinct groups: high scores (for Hypothesis and Rationale, Significance, Methods, and Conclusions) and low scores (for Results and Critical Thinking and Analysis). Recognition scores increased significantly with time only for Hypothesis and Rationale and for Results. ANCOVA showed that changes in recognition scores for these elements were not significantly different in slope (F1,16 = 0.254, P = 0.621) but the Results trait was significantly lower in elevation (F1,17 = 12.456, P = 0.003). Thus, students improved with similar trajectories but started and ended with lower Results scores. We conclude that students have greatest difficulty evaluating Results and critically evaluating scientific validity. Our findings show extant student skills, and the significant increase in some traits shows learning. This study demonstrates that students start with variable recognition skills and that student skills may be learned at differential rates.
Faculty can use these findings or the primary trait analysis scoring scale to focus on specific paper elements for which they desire to improve recognition.

 

Development of critical thinking skills is universally acknowledged as a fundamental goal of higher education (4, 5, 7, 9–11). Critical thinking is briefly defined as a self-aware process that uses reasoned consideration of evidence, methods, and discipline-appropriate criteria to interpret, analyze, and evaluate knowledge (13). More specifically, in natural science disciplines, critical thinking means to: (i) apply and use known scientific facts and principles to solve a problem, and (ii) understand the process (method) by which science tests and applies scientific knowledge (facts and principles) and to use this process (4). In order to develop these abilities, our students need to be inquisitive, open-minded, and flexible in considering alternatives (5). In terms of the classic Bloom’s Taxonomy of Educational Objectives model (3), critical thinking requires the use of all levels of cognitive demand, from knowledge through evaluation, with an emphasis on the higher levels of analysis, synthesis, and evaluation. In terms of a more recent taxonomy of educational objectives (1), development of critical thinking skills likely requires the application of the cognitive process dimensions upon what they define as four knowledge dimensions: factual knowledge, which includes recalling and understanding technical terminology and details; conceptual knowledge, which encompasses classifying, summarizing, and comparing theories and models; procedural knowledge, which includes knowledge of techniques and the skills to use them, and the ability to analyze when to apply them; and metacognitive knowledge, which includes the awareness of one’s own ability to evaluate the cognitive demands of a particular task in context. Clearly, encouraging the development of such skills in students is no small task.

The first job of science faculty interested in fostering critical thinking skills in their students is to help them learn to use the criteria exploited by our disciplines in deciding which ideas to accept or to select among alternatives (18). Our goal should be to help develop students into scientists who can review conflicting perspectives and make decisions that are based on evidence and analysis (21). To do this, students must first be able to recognize, discriminate between, and critically evaluate the “pieces” of scientific inquiry: hypothesis, data, the techniques used to gather data, and the conclusions drawn from the results. It can be difficult to measure the various recognition and evaluative skills that combine to make a “critical thinker.” It is thus difficult to determine if our students are improving over time in these skills or to assess any of our pedagogical attempts to foster such improvement.

It is poorly understood what skills students possess for critically analyzing published literature and to what extent such skills are useful in their analyses. This has hindered university faculty development of effective teaching techniques for critical thinking skills. We hypothesize that practice over time increases student abilities in the recognition or evaluation of certain elements of published papers. The rationale for this hypothesis is that if assessment tools were available to measure these abilities, university faculty could then develop new pedagogical tools to improve the critical thinking skills that students use while reviewing scientific literature. We developed and tested an objective and easy-to-use primary trait analysis scale, based on that of Walvoord and Johnson-Anderson (24), designed to assess the abilities (primary traits) of students to recognize and critically evaluate pieces of a standard scientific paper. Using this method, we were able to measure changes over time in students’ recognition and evaluative abilities.

METHODS

Students participating in the study (enrollment = 25 students) were junior- or senior-level undergraduate Biology majors with plans to continue in professional or graduate school (plus one Masters-level graduate student) and were enrolled in a one-semester course in Immunology, for which a fundamental course in Microbiology (sophomore-level) was a prerequisite. No attempt was made in the study to differentiate between the levels of students; the goal of the study was to assess the change in the overall class abilities and not in individual student abilities. The institution where the study was conducted is a Masters-level comprehensive university.

Over the course of one semester (15 weeks of instruction), students attended 34 lectures and 11 discussions. The “recognition analysis” (primary trait) scoring scale for grading of students’ critiques is shown in Table 1. Students were required to read at least 10 (out of 11 total) recently published papers in areas related to Immunology and evaluate these articles beforehand by providing written answers to the six questions listed in Table 2. Students were given detailed guidelines for their summaries (shown in Table 2). It was important to require that these summaries of answers to specific questions be turned in to the instructor ahead of discussion time to ensure that all students had made an honest effort to focus on and understand the various parts of the paper. Other studies have used similar strategies to improve the level of student engagement in discussions of primary scientific literature (8, 9, 11, 12).

 

TABLE 1. Primary trait analysis scale (columns: Trait, Scale)

 

TABLE 2. Elements of the scientific paper and the criteria for student homework

Published paper element (designation) | Criteria and instructions for students
Hypothesis and Rationale (HR) | What are the hypotheses, and what are the logically formulated rationales for each? Explain these in terms of the biology.

 

The choice of the paper topic and the order of the evaluated papers were determined by the topic order of the textbook used in the course. If anything, papers became more complex during the semester, not less, thereby reducing the chance that any learning observed would be due to the scheduling of the subjects. Topics discussed in the papers were introduced beforehand in the course lectures and accompanied by assigned readings from the textbook. The choice of appropriate articles for the students was critical to achievement of the goals of the course. Muench (14) has elaborated on the importance of the selection of papers for use in undergraduate classes. We chose ours based upon both content and process aims. Content was matched to basic Immunology concepts that were introduced in lectures and auxiliary assigned readings. The papers needed to be challenging but not too difficult, up-to-date and reflective of current questions in the field. In addition, we also wanted to expose the students to a variety of papers for the broader reasons that are mentioned by many other educators who utilize the primary literature in their courses (6, 7, 8, 9, 11): to provide insight into the process and ways of thinking exemplified by scientific research, to allow practice in technical reading skills, to introduce vocabulary, to increase content understanding, and, we hoped, to hone the ability to analyze and evaluate data; in other words, to improve critical thinking skills. The published papers critiqued by the students in the present study are shown in Table 3. However, each time the course has been taught, the set of papers used has been different in order to keep up-to-date with the current immunological literature.

 

TABLE 3. Papers used in this study for analysis by students

 

Student ability to recognize or evaluate the basic elements of the published paper was assessed using the recognition scoring scale (Table 1) as a rubric. Students’ answers to the assigned questions (Table 2) were scored on a scale from 1 (worst) to 5 (best). This primary trait analysis (“recognition”) scale, developed according to criteria and standards for grading validity established by Walvoord and Johnson-Anderson (24), was used as a simple index of intensity (13) using the criteria in Table 1 to measure intensities of understanding. Average class scores for recognition and analysis (traits) of each element in the papers, denoted “recognition scores,” were calculated for each assigned paper.
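The averaging step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `recognition_scores`, the dictionary layout, and the three students' scores are hypothetical and are not data from the study; only the six trait codes and the 1-to-5 scale come from the text.

```python
from statistics import mean

# Trait codes as used in the text: Hypothesis and Rationale, Significance,
# Methods, Results, Conclusions, Critical Thinking and Analysis.
TRAITS = ["HR", "SBS", "M", "R", "C", "CTA"]


def recognition_scores(critiques):
    """Average each trait over all critiques handed in for one paper.

    `critiques` is a list of dicts mapping trait code -> rubric score (1 to 5).
    The layout is a hypothetical convenience, not the authors' data format.
    """
    for critique in critiques:
        for trait in TRAITS:
            if not 1 <= critique[trait] <= 5:
                raise ValueError(f"score out of range for {trait}: {critique[trait]}")
    return {trait: mean(c[trait] for c in critiques) for trait in TRAITS}


# Hypothetical rubric scores from three students for a single paper.
paper_1 = [
    {"HR": 4, "SBS": 4, "M": 3, "R": 2, "C": 4, "CTA": 3},
    {"HR": 3, "SBS": 5, "M": 4, "R": 3, "C": 4, "CTA": 2},
    {"HR": 5, "SBS": 3, "M": 5, "R": 4, "C": 4, "CTA": 4},
]
scores = recognition_scores(paper_1)
print(scores)
```

Repeating this for each of the assigned papers would give one class-average value per trait per paper, which is the unit of analysis (n = 10) used in the statistical tests.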

All students were required to participate in a classroom discussion of each paper and to hand in their written answers for assessment beforehand. Discussion centered on the questions in Table 2 and lasted one lecture period (50 minutes). Since the class size was relatively small (approximately 25 students), the class worked together as a group during oral discussions.

Grading bias was minimized by introducing the order of topics of the papers beforehand in lectures and required readings in the textbook and by one grader consistently and strictly adhering to the primary trait analysis scale in Table 1. Thus, students started the papers with a similar content knowledge base, and grading objectivity was maximized by simply following the rubric. Student identity was unknown to the grader during assessment. Students were given detailed guidelines (Table 2) for paper evaluation but not the specific primary trait analysis scale. The data collection protocol and research reported in this article were reviewed and approved by the institution’s institutional review board.

D’Agostino’s tests (25) were performed on each data set to test for significant differences from normality. No significant differences from normality were found (P > 0.05, all tests), so parametric analyses were employed. Single-factor ANOVA and Tukey multiple comparison tests (26) were used to test for significant differences in primary trait recognition scores. Data used in these analyses were grand average values obtained from the analysis of 10 papers (i.e., n = 10 for analyses, but average values were obtained from 17.9 ± 3.3 students [mean ± SD] for each paper). The data from one paper were omitted from analyses due to an irretrievable error in data recording. Thus, only 10 of the 11 papers were included in the study. Simple linear regression (26) and ANCOVA (22) were used to test for changes in recognition and evaluation scores over time and differences in how scores changed over time, respectively.
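As a minimal sketch of the single-factor ANOVA named above, the following Python computes the F statistic and its degrees of freedom from first principles. The function name `one_way_anova` and all data values are hypothetical placeholders chosen for illustration; they are not the study's scores, and this is not the statistics package the authors used. The layout (six traits with 10 paper-level averages each) mirrors the design described in the text.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a single-factor ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical data: 6 traits x 10 paper-level averages (placeholders only).
pattern = [-0.4, -0.3, -0.2, -0.1, 0.0, 0.0, 0.1, 0.2, 0.3, 0.4]
hypothetical_means = {"HR": 4.0, "SBS": 4.1, "M": 3.9, "R": 3.3, "C": 4.0, "CTA": 3.1}
groups = [[m + p for p in pattern] for m in hypothetical_means.values()]

f_stat, df_between, df_within = one_way_anova(groups)
print(f"F({df_between},{df_within}) = {f_stat:.2f}")
```

With six groups of 10 values the degrees of freedom come out as 5 and 54, which is the same pairing that accompanies the F statistic reported in the Results.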

RESULTS

The primary trait recognition scores (grand means and standard errors) for all evaluated papers for the entire semester are shown in Fig. 1. We found that students showed significant differences in recognition and evaluation of various elements of published papers (single-factor ANOVA, F5,54 = 8.623, P < 0.001). Tukey multiple comparison tests (least significant, P < 0.04) showed that recognition scores fell into two statistically distinct groups: high scores for traits Hypothesis and Rationale, Significance, Methods, and Conclusions, and low scores for traits Results and Critical Thinking and Analysis. Recognition and evaluation scores by students for all elements of the published paper showed an increase over the semester, but the change was significant for Hypothesis and Rationale and Results only (Fig. 2).

 


FIG. 1. Average score for the six elements evaluated in published papers. Grand average scores shown for all students for all papers during the whole of the semester. There was a significant difference among traits (single-factor ANOVA, F5,54 = 8.623, P < 0.001). Different letters (a versus b) indicate significant differences among means using Tukey multiple comparison tests (least significant, P < 0.04). Sample size (n) was equal to 10 papers for each bar; error bars equal standard error of the mean.

 


FIG. 2. Scores for student recognition and evaluation of components in published literature. The students’ scores are indicated for recognition and critical evaluation of Hypothesis and Rationale (HR), Significance of the Biological Sciences (SBS), Methods, Results, Conclusions, and Critical Thinking Skills and Analysis of validity of the published primary literature in the course as a function of time. Regression lines are shown only for significant relationships (P < 0.05). Sample size was n = 10.

 

 

A comparison of the change over time of recognition scores for Hypothesis and Rationale versus Results is shown in Fig. 3. The ANCOVA showed that the increases in recognition scores for these traits were not significantly different in slope (F1,16 = 0.254, P = 0.621) but that Results was significantly lower in elevation (F1,17 = 12.456, P = 0.003). Recognition scores for Results were lower to begin with than those for Hypothesis and Rationale, but the students improved in both traits at a similar rate. The fitted regression lines, with d denoting time, were Score = 3.487 + 0.012d for Hypothesis and Rationale and Score = 2.518 + 0.016d for Results; students thus followed a similar trajectory for both traits but started and ended with lower scores for Results.
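The slope and elevation comparisons above can be illustrated with a textbook-style comparison of two regression lines. The sketch below is an assumption-laden illustration, not the authors' procedure: the function names, the sampling days, and the noise values are all hypothetical, and the data are generated around the two reported lines purely so the example runs. It fits three nested models (separate lines, common slope with separate intercepts, one shared line) and forms F ratios from their residual sums of squares.

```python
def _sums(xs, ys):
    """Return (Sxx, Sxy, Syy): corrected sums of squares and cross products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return sxx, sxy, syy


def compare_two_lines(x1, y1, x2, y2):
    """F tests for equal slopes and equal elevations of two regression lines."""
    n = len(x1) + len(x2)
    sxx1, sxy1, syy1 = _sums(x1, y1)
    sxx2, sxy2, syy2 = _sums(x2, y2)
    # Model A: separate slope and intercept per group (4 parameters).
    sse_separate = (syy1 - sxy1 ** 2 / sxx1) + (syy2 - sxy2 ** 2 / sxx2)
    # Model B: common slope, separate intercepts (3 parameters).
    sse_common = (syy1 + syy2) - (sxy1 + sxy2) ** 2 / (sxx1 + sxx2)
    # Model C: one line for all points (2 parameters).
    sxx_t, sxy_t, syy_t = _sums(list(x1) + list(x2), list(y1) + list(y2))
    sse_single = syy_t - sxy_t ** 2 / sxx_t
    f_slope = (sse_common - sse_separate) / (sse_separate / (n - 4))
    f_elevation = (sse_single - sse_common) / (sse_common / (n - 3))
    return {"slope": (f_slope, 1, n - 4), "elevation": (f_elevation, 1, n - 3)}


# Hypothetical sampling days and noise; only the line coefficients are from the text.
days = [0, 11, 23, 34, 46, 57, 69, 80, 92, 103]
noise_hr = [0.10, -0.15, 0.05, 0.12, -0.08, -0.05, 0.15, -0.10, 0.02, -0.06]
noise_r = [-0.12, 0.08, 0.14, -0.10, 0.05, -0.15, 0.09, 0.11, -0.07, -0.03]
hr = [3.487 + 0.012 * d + e for d, e in zip(days, noise_hr)]
r = [2.518 + 0.016 * d + e for d, e in zip(days, noise_r)]

result = compare_two_lines(days, hr, days, r)
for name, (f_stat, df1, df2) in result.items():
    print(f"{name}: F({df1},{df2}) = {f_stat:.3f}")
```

With two traits and 10 papers each, the denominator degrees of freedom are 16 for the slope test and 17 for the elevation test, matching the subscripts on the reported F statistics. The F values printed here depend entirely on the invented noise and should not be compared with the published ones.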


FIG. 3. Scores for student recognition and evaluation of hypothesis and rationale versus results. Student scores are indicated for ability to find and critically evaluate HR (•) and R (○) as a function of time. The regression lines are not significantly different in slope, but they are in elevation (see text); thus, students learned with the same trajectory but had different initial abilities.
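To make the "same trajectory, different elevation" reading concrete, the two fitted lines reported in the text can be evaluated at the start and end of the observation period. This is a small worked example under an assumption: it treats d as days counted from 0 to 103, which is inferred from the 103-day span stated in the abstract and is not confirmed by the text. The helper name `predicted_score` is hypothetical.

```python
def predicted_score(intercept, slope, day):
    """Evaluate a fitted line of the form Score = intercept + slope * day."""
    return intercept + slope * day


# Coefficients as reported in the text for the two traits.
HR_LINE = (3.487, 0.012)  # Hypothesis and Rationale
R_LINE = (2.518, 0.016)   # Results

# Assumed endpoints: day 0 and day 103 (assumption, see lead-in).
for day in (0, 103):
    hr = predicted_score(*HR_LINE, day)
    r = predicted_score(*R_LINE, day)
    print(f"day {day}: HR = {hr:.3f}, R = {r:.3f}, gap = {hr - r:.3f}")
```

Under that assumption the fitted Results score stays below the fitted Hypothesis and Rationale score across the whole period. The nominal narrowing of the gap reflects the small difference between the two slope estimates, which the ANCOVA did not find to be significant.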

 

DISCUSSION

In this study we attempted to assess the ability of students to recognize and evaluate the elements of a standard scientific paper. We used our primary trait analysis scoring scale as a rubric to grade their ability (i) to clearly and logically identify hypotheses and their rationales in the published papers (HR), (ii) to thoughtfully and logically convey the significance of the biological sciences research areas in the paper (SBS), (iii) to describe fully the experimental methods used by the authors of the papers and evaluate whether the hypotheses were directly tested by the methods used (M), (iv) to clearly, concisely and completely describe the results (R), (v) to logically and clearly identify the conclusions (or deduce alternative conclusions) (C), and (vi) to critically (and thoughtfully) evaluate the experimental design (i.e., proper controls, direct testing of hypothesis, etc.) by providing an analysis that was independent of authors’ evaluation (i.e., an original student-derived analysis) (CTA).

We tested the hypothesis that average recognition scores varied among the distinctive elements of published papers. This implies that some elements are easier to recognize and critically evaluate than others. We also tested the hypothesis that the recognition scores change as a function of time differently by paper element. This implies that practice could improve certain recognition and evaluative skills. We found students’ scores to be the lowest for their recognition and evaluation of R (3.3 ± 0.22) and CTA (3.1 ± 0.15) in the assigned papers (Fig. 1). Other investigators have recognized this student weakness in the interpretation of the Results. Variation among the assigned papers may also present difficulties to the students in their analyses. Further studies involving a systematic application of papers that are similar in scope, field, style, and degree of difficulty would be required to test this contention. However, the improvement over time in recognition scores for R was significant (Fig. 3), indicating that practice in this activity improves the ability of students in general to evaluate published data. We think that the choice of papers used in our study did not influence the positive trend observed here because they were chosen based on content, and students were provided with introductory lectures and assigned textbook readings prior to their paper evaluations.

Though R scores improved over time, the other trait with low initial recognition scores, CTA, did not improve with time (Fig. 2). We suspect that since critical thinking skills require the use and integration of the highest levels of both knowledge and cognitive process dimensions (1, 3), more practice than can be provided in a single semester is necessary to see improvement, if indeed practice can develop such abilities. In fact, Parslow (17) wonders if critical thinkers are simply born that way, or if they can actually be created through our attempts at teaching the skill sets. We appreciate that grading bias is a complication in studies showing increases in critical thinking skills (10). However, again, we observed no such increases in CTA scores. Standardized tests can provide quantitative measures of critical thinking ability and could be used as a pre- and poststudy metric for determination of alterations in broadly defined critical thinking skills. Other studies have used such tools to determine changes in student critical thinking scores as a result of participation in a course that used (among other learning strategies) the analysis of primary scientific literature (12).

Our students in general had extant recognition skills in the areas of HR, SBS, M, and C (Fig. 1). It was not determined whether the observed extant skills were developed in previous classroom experiences or whether recognition of these elements in scientific papers is simply intrinsically easier. A fundamental course in Microbiology was a prerequisite for the Immunology course; in that class, a critique of one published paper was required. It is possible that such previous experience was helpful. Further research would be necessary to distinguish between these two, or other, possibilities. In any case, the significant increase in HR recognition scores with time (Fig. 3) suggests that there is always room for improvement, even in students with demonstrated academic abilities. In addition, there appeared to be a correlation between improvement over time in the recognition scores for HR and R, suggesting a possible link between recognition and evaluation of the two elements; i.e., if students understand HR, they have a better grasp of R, or vice versa.

The recognition scores for SBS, M, and C were relatively high at the beginning of the study but did not significantly improve as a function of time. The use of readings in the primary literature is common in science classes at all levels (6, 7, 8, 9, 11). Many instructors hope that such exposure will improve the ability of their students to perform as critical thinkers (the loftiest goal). Students become more familiar with difficult terminology and complex methodology as they read published scientific literature. These would help the students to understand what scientists do and how they do it (19). We certainly hope that they will learn to use the descriptions of methods to design similar experiments and that they will understand the necessity for controls and statistical analysis. We hope that they will learn to recognize clear and lucid writing and emulate it. Beyond such general aspirations for this pedagogical technique, an individual instructor can determine what literature-reading skills are most important for his or her class. Our primary trait analysis (“recognition”) scale could be generally helpful to instructors in the identification of skills in need of improvement in student utilization of the primary literature.

Students need to understand the role of scientific literature in the practice of investigative science. Conclusions of the executive summary of the National Research Council’s report on the future of biology education suggest that faculty emphasize teaching approaches that strengthen the abilities of students to communicate scientific knowledge and to design quantitative experiments (15). A synopsis of this report (23) further suggests that changes in biology curricula should include emphasis in teaching writing, reading, and critical thinking skills that result in clear communication. The use of the primary literature at all levels can promote these skills.

In conclusion, we have developed a simple-to-use, fairly objective evaluative scale for assessing recognition and understanding of the basic elements of a standard scientific paper. This scale may be useful in the identification of both extant and missing recognition and evaluative skills in university students. In addition, it may provide a tool for faculty recognition of student skills, including the important skill of critical thinking in science. Furthermore, such analyses may help faculty identify young scientists for possible recruitment to or recommendation for graduate and professional schools.

REFERENCES

  1. Anderson, L. W., and D. R. Krathwohl (ed.). 2001. A taxonomy for learning, teaching, and assessing: a revision of Bloom’s educational objectives. Allyn & Bacon, Boston.
  2. Antony-Cahill, S. 2001. Using the protein folding literature to teach biophysical chemistry to undergraduates. Biochem. Mol. Biol. Educ. 29:45–49.
  3. Bloom, B. 1956. Taxonomy of educational objectives: the classification of educational goals. Handbook I. Cognitive domain. Longmans, Green and Co., London.
  4. Borthick, A. F., H. Dangel, and C. Springer. 2003. Pedagogy and assessment that support critical thinking. In AAHE learning to change conference 2003, communities of practice, role, and identity. American Association for Higher Education, Washington, D.C.
  5. Fiancarlo, C. A., and P. A. Facion. 1997. A look across four years at the disposition toward critical thinking among undergraduate students. J. Gen. Educ. 50:29–55.
  6. Fortner, R. W. 1999. Using cooperative learning to introduce undergraduates to professional literature. J. Coll. Sci. Teaching 28:261–265.
  7. Herman, C. 1999. Reading the primary literature in the jargon-intensive field of molecular genetics. J. Coll. Sci. Teaching 28:252–253.
  8. Houde, A. 2000. Student symposia on primary research articles. J. Coll. Sci. Teaching 30:184–187.
  9. Janick-Buckner, D. 1997. Getting undergraduates to critically read and discuss the primary literature. J. Coll. Sci. Teaching 27:29–32.
  10. Kitchen, E., J. D. Bell, S. Reeve, R. R. Sudweeks, and W. S. Bradshaw. 2003. Teaching cell biology in the large-enrollment classroom: methods to promote analytical thinking and assessment of their effectiveness. Cell Biol. Educ. 2:180–194.
  11. Levine, E. 2001. Reading your way to scientific literacy. J. Coll. Sci. Teaching 31:122–125.
  12. Mangurian, L., S. Feldman, J. Clements, and L. Boucher. 2001. Analyzing and communicating scientific information. J. Coll. Sci. Teaching 30:440–445.
  13. Martin, P., and P. Bateson. 1986. Measuring behaviour: an introductory guide. Cambridge University Press, Cambridge, England.
  14. Muench, S. B. 2000. Choosing primary literature in biology to achieve specific educational goals. J. Coll. Sci. Teaching 29:255–260.
  15. National Research Council. 2003. Bio2010: transforming undergraduate education for future research biologists. The National Academies Press, Washington, D.C.
  16. Nelson, C. E. 1999. On the persistence of unicorns: the tradeoff between content and critical thinking revisited. In B. A. Pescosolido and R. Aminzade (ed.), The social worlds of higher education: handbook for teaching in a new century. Pine Forge Press, Thousand Oaks, Calif.
  17. Parslow, G. R. 2002. Commentary: critical thinking: can we teach it? Should we teach it? Biochem. Mol. Biol. Educ. 30:65.
  18. Perry, W. G. 1970. Forms of intellectual and ethical development in the college years: a scheme. Holt, Rinehart & Winston, New York.
  19. Smith, C. N. 2002. Using the cell signaling literature to teach molecular biology to undergraduates. Biochem. Mol. Biol. Educ. 30:380–383.
  20. Smith, G. 2001. Guided literature explorations. J. Coll. Sci. Teaching 30:465–469.
  21. Stokstad, E. 2001. Trends in undergraduate education. Science 293:1608–1610.
  22. Systat. 1996. SYSTAT for windows, statistics, version 6 ed. Systat, Evanston, Ill.
  23. Tabor, D., and E. Jakobsson. 2004. The Bio2010 revolution: what it is, why NIGMS is helping to lead it, and why you should join it. NIGMS Minority Programs Update Spring 2004:3.
  24. Walvoord, B. E., and V. Johnson-Anderson. 1998. Effective grading: a tool for learning and assessment. Jossey-Bass Publishers, San Francisco.
  25. Zar, J. H. 1974. Biostatistical analysis. Prentice Hall, Englewood Cliffs, N.J.
  26. Zar, J. H. 1999. Biostatistical analysis, 4th ed. Prentice Hall, Upper Saddle River, N.J.

___________

*Corresponding author. Mailing address: Department of Biology, Station 33, Eastern New Mexico University, Portales, NM 88130. Phone: (505) 562-2464. Fax: (505) 562-2192. E-mail: Manuel.Varela@enmu.edu.



DOI: 10.1128/jmbe.v6.80
MICROBIOLOGY EDUCATION, May 2005 Vol. 6
Copyright © 2005, American Society for Microbiology. All Rights Reserved.



JMBE
ISSN: 1935-7885