Assessment is an integral part of instruction, as it determines whether the goals of education are being met. Assessment affects decisions about grades, placement, advancement, instructional needs, curriculum, and, in some cases, funding (“Edutopia Staff”, 2008). Educators must always have a clear understanding of the reason for assessment, what is being assessed, the criteria for success, the method by which assessment is made, and how effectively the program serves the school community (McTighe and Wiggins, 2005). This paper will review various forms of assessment, beginning with definitions of formative and summative assessment and a variety of assessment strategies used within each method. Also discussed are criterion-referenced and norm-referenced assessments and their strengths and weaknesses. Last addressed will be the holistic role of assessment and its place within the IB’s Primary Years Programme (PYP).

FORMATIVE AND SUMMATIVE ASSESSMENTS

Summative assessments are used to report the level of student competence or program effectiveness. Results are usually expressed as a symbol, a letter grade or number, or a comparison to a standard, and could be used, for example, to determine how many students are and are not meeting standards in a certain subject for purposes of accountability (Chappius, 2009, pg. 5-6). Examples of summative assessments include state or district benchmark tests used to determine levels of achievement on state content standards and to evaluate program effectiveness (Chappius, 2009, pg. 8).

Formative assessments are designed to meet students’ information needs, maximizing both motivation and achievement by involving students in their own learning from the start (Chappius, 2009, pg. 11). This assessment method is carried out during the instructional process for the purpose of improving teaching or learning, and the information gathered is used immediately to make adjustments and shape new learning (Chappius, 2009, pg. 4). Formative assessments give students regular, frequent feedback that encourages engagement in thoughtful reflection and self-assessment and enables them to recognize the criteria for success (“IB Organization”, 2009, pg. 45).

It is important to note that it is not the instrument that is formative or summative; it is how the gathered information is used to adjust teaching and learning that defines the assessment method (Chappius, 2009, pg. 4-5). In classrooms we assess formally through assignments, quizzes, performances, projects, and surveys, or informally through questioning and dialogue, observation, and anecdotal note taking. If the information from any of these assessments, whether formal or informal, is used to communicate achievement status to others (typically as a symbol or letter grade) or to compare students to a standard, the assessment is summative. In some cases, an assessment intended to be summative can serve a formative purpose as well. On a district benchmark, results are used summatively to determine program effectiveness; if a group of students performs lower than their peers, the results could also be used formatively to identify program needs and plan targeted interventions (Chappius, 2009, pg. 5). Likewise, weekly lab reports used to adjust instructional methods and provide student feedback are considered formative, while weekly lab reports graded by level of mastery are considered summative.

CRITERION AND NORM REFERENCED ASSESSMENTS

Norm-referenced assessments are created by academic and curriculum experts to identify the major skills and bodies of knowledge that students are expected to know. Content is selected according to how well it ranks students from high achievers to low achievers, that is, by their relative status. The questions are first administered to a selected group of students serving as the representative group, or normative group (“Association of Test Publishers”, 2014). After the normative group completes the test, the same test is given to the general student body, and the results are compared to those of the normative group to rank students across a continuum of achievement from high achievers to low achievers. Test-takers cannot “fail” a norm-referenced assessment; each test-taker receives a score that compares the individual to others who have taken the test.

At the classroom level, norm-referenced assessments allow teachers to assess their students’ strengths, weaknesses, and learning attributes. They allow teachers to identify students who have similar learning needs, students who are eligible for special education, or students who should be placed in gifted programs (Bond, 1996). Any form of assessment can be considered norm-referenced, depending on the content selected and how the results are used. For example, a student must take a multiple-choice test and an on-the-road driving test to obtain a driver’s license. On a norm-referenced version of the driving test, test-takers would be ranked by who knew the most or least about driving rules, or by who drove best or worst, and scores would be reported as percentile ranks, with half scoring above and half below the midpoint. Auditions are also considered norm-referenced, as their goal is to identify the best candidate compared to others, not to determine how many candidates meet a fixed list of standards (“FairTest”, 2007). High-stakes norm-referenced tests include the SAT, LSAT, GRE, and MCAT.

PROS: Norm-referenced assessments are the best method for ranking the highest- and lowest-achieving students. Students and teachers know what to expect from the test and how it will be conducted and graded, so results are reasonably consistent and predictable. These assessments do not seek to enforce an expectation of student understanding; rather, levels of performance and inequity are taken as fact, not as defects to be removed by a redesigned system, and goals of student performance are not raised every year until all are proficient (“Wikipedia”, 2014, Advantages and limitations).

CONS: Norm-referenced assessments can focus on low-level, basic skills with little attention to conceptual understanding. Teachers feel pressured to teach information from the tests, resulting in an emphasis on low-level skills in the classroom. The scores give little information about what a student actually knows or can do; they do not accurately measure subject mastery, and ranking on a percentile scale is not constant in terms of standard-score units (Glutting, 2002). In some cases, one more question right or wrong can cause a large change in a student’s percentile score. Tests can also be biased to favor one kind of student over another for reasons that have nothing to do with the subject area being tested; questions on which, for example, minority groups perform well may be eliminated during test construction (“FairTest”, 2007).

Criterion-referenced assessment differs from norm-referenced assessment in intended purpose, content selection, scoring process, and result interpretation, or how we derive information from a score (Bond, 1996). In the driver’s license example above, if a student were to take both the multiple-choice test and the driving test as criterion-referenced assessments, the student would receive a passing or failing mark according to a pre-set standard for knowing driving rules and driving well. Results would be used to determine whether the student had achieved “mastery” as a driver and could obtain his or her license (“FairTest”, 2007). Thus, criterion-referenced assessments are used to determine what test-takers can do and what they know, not how they compare to others. They report how well students perform relative to a pre-determined performance level on a specified set of educational goals or standards (Glutting, 2002). Content is selected by how well it matches the learning outcomes deemed most important and whether a student can display “mastery” by answering enough items correctly. Many tests and quizzes written by school teachers can be considered criterion-referenced assessments if the objective is to see whether the student has mastered the material, not whether one test-taker is more skilled than another (“Wikipedia”, 2013, paragraph 1).

PROS: Results give detailed information about how well a student has performed on each educational goal. Criterion-referenced assessments give more information about how much of the valued content has been learned than norm-referenced assessments do, and they are considered better at reflecting the actual achievement of individual students than norm-referenced tests (“Wikipedia”, 2013, Comparison of criterion-referenced and norm-referenced tests).

CONS: Students are assessed against standards that define what they “should” know as set by a state’s objectives. Judges set bookmarks around items of varying difficulty without considering whether the items actually align with grade-level content standards or are developmentally appropriate. The difficulty of the items themselves and the cut-scores that determine passing levels also change from year to year, making the exam inconsistent for students and teachers (“Wikipedia”, 2013, Comparison of criterion-referenced and norm-referenced tests).

ROLE OF ASSESSMENT AND CONNECTION TO PYP

Assessment is defined as the gathering and analysis of information about student performance (“IB Organization”, 2009, pg. 44). It is designed by the teacher, often with student involvement, and provides feedback on the learning process as a basis for future learning (“IB Organization”, 2007, pg. 14). Students should be observed in a variety of situations, and a wide range of assessment strategies should be implemented. Assessment provides feedback on the learning processes of each element of learning and should drive the process of inquiry (“IB Organization”, 2009, pg. 47). Effective feedback acts like a global positioning system for students, telling them how close they are to the target and what steps they can take to reach it (Chappius, 2008, pg. 56). An explicit expectation of the PYP is that successful inquiry will lead to responsible action as a result of the learning process; student action reveals evidence of transferability and understanding gained through inquiry (“IB Organization”, 2009). Students should test their new-found knowledge within realistically contextualized situations and should be encouraged to take action (McTighe and Wiggins, 2005).

The assessment strategies and tools proposed by the PYP--rubrics, exemplars, anecdotal records, checklists, continuums, portfolios of work--are designed to accommodate a variety of intelligences and ways of knowing. Where possible, they provide an effective means of recording students' responses and performances in real-life situations that present real problems to solve (“IB Organization”, 2009, pg. 19). A unique characteristic of assessment in the PYP is The Exhibition, in which students in their final year of the PYP conduct an extended, collaborative inquiry project, or summative assessment, under the guidance of their teachers. The Exhibition is a significant moment in the students’ culminating, transdisciplinary experience, as they share with the whole school community the attributes of the learner profile they have developed through the program, specifically with regard to the five essential elements of the program: knowledge, concepts, skills, attitudes, and action. Students in the PYP do not take part in external examinations, but they do take other forms of summative assessment as defined by the standards and guidelines of their schools, districts, and states (“IB Organization”, 2009).

In conclusion, implementing and understanding successful assessment strategies is vital to a student’s success and knowledge development. Whether using formative or summative assessments, or criterion- or norm-referenced assessments, educators must understand the role each assessment method plays within a student’s curriculum framework and select the best possible method to encourage continual inquiry, understanding, open and ongoing feedback, and motivation to learn.

REFERENCES

Association of Test Publishers (2014). Questions About Testing in Schools. Retrieved on March 17, 2014 from http://www.testpublishers.org/testing-in-schools

Bond, Linda A. (1996). Practical Assessment, Research, & Evaluation. Retrieved on April 15, 2014 from http://pareonline.net/getvn.asp?v=5&n=2

Chappius, Jan. (2009). Seven Strategies of Assessment for Learning. Boston, MA, Pearson Education, Inc.

Edutopia Staff. (July 15, 2008). Why is Assessment Important? The George Lucas Educational Foundation. Retrieved March 17, 2014 from http://www.edutopia.org/assessment-guide-importance

FairTest. (2007). Criterion- and Standards-Referenced Tests. Retrieved on April 15, 2014 from http://www.fairtest.org/criterion-and-standards-referenced-tests

Glutting, Joseph J. (2002). Glutting’s Guide for Norm-Referenced Test Score Interpretation, Using a Sample Psychological Report. Retrieved on April 15, 2014 from http://www.udel.edu/educ/gottfredson/451/Glutting-guide.htm

International Baccalaureate (IB) Organization. (2007). A Continuum of International Education.

International Baccalaureate (IB) Organization. (2008). Primary Years Programme, Middle Years Programme, and Diploma Years Programme: Towards a Continuum of International Education. Cardiff, Wales, UK: Peterson House.

International Baccalaureate (IB) Organization. (2009). Primary Years Programme, Making the PYP Happen: A Curriculum Framework for International Primary Education. Cardiff, Wales, UK: Peterson House.

McTighe, Jay and Wiggins, Grant. (2005). Understanding by Design, Expanded 2nd Edition. Alexandria, VA: Association for Supervision and Curriculum Development.

Wikipedia. (2013). Criterion-referenced test. Retrieved on April 15, 2014 from http://en.wikipedia.org/wiki/Criterion-referenced_test

Wikipedia. (2014). Norm-referenced test. Retrieved on April 15, 2014 from http://en.wikipedia.org/wiki/Norm-referenced_test