The Design of Interprofessional Performance Assessments in Undergraduate Healthcare & Social Work Education: A Scoping Review

The increase in the number of people with multiple chronic conditions requires that future healthcare professionals learn to collaborate with professionals of various backgrounds. It is important that, after graduation, students are ready to think from an interprofessional perspective. To make valid inferences about students' interprofessional competencies, an interprofessional performance-oriented assessment approach is needed that yields evidence for the acquisition of interprofessional competencies. To date, little is known about the design of interprofessional assessments. The objective of this article is to examine the design quality of current interprofessional performance assessments in undergraduate healthcare and social work education. For this purpose, we conducted a scoping review and selected 28 studies. Based on a theoretical assessment design framework, we analyzed the description of the purpose of the assessment, the assessment performances, the assessment exercises, and the assessment rating plan. The results show that current interprofessional assessment practice uses a rich variety of assessment exercises and instruments. Much less is known about the role of the assessors and the performance criteria applied in the assessment of interprofessional competencies. We conclude that the design of interprofessional assessment performances and exercises that are aligned with the goals of interprofessional education needs further attention. We provide recommendations for future research towards a more systematic design of well-aligned performance assessments in interprofessional education.


Introduction
Globally, there has been a rapid increase in the number of people with multiple chronic conditions (Centers for Disease Control and Prevention, 2019). As a consequence, it becomes increasingly important that healthcare professionals learn to collaborate with professionals of various backgrounds to assure healthcare quality. Interprofessional (IP) collaboration can lead to improvements in quality of life for patients, clients, and their families, improved access to care, and enhanced patient and client safety (Hammick, Freeth, Koppel, Reeves, & Barr, 2007; Mickan, 2005; Reeves et al., 2008). The World Health Organization (WHO) recommends that health and social work education institutes enable students to develop IP competencies by engaging them in interprofessional education (IPE; WHO, 2010). IPE focuses on learning from, with, and about other professionals (in training) to improve collaboration and quality of care (CAIPE, 2014), and is considered crucial in the transformation of medical, nursing, and public healthcare education in the 21st century (Frenk et al., 2010). Recent reviews showed that IPE is effective in improving attitudes and perceptions towards professionals of other disciplines, and in improving the acquisition of knowledge, skills, and behavior regarding IP collaboration (Guraya & Barr, 2018; Spaulding et al., 2019). IPE has been shown to help healthcare practitioners in their approach to resolving complex issues with clients and has been instrumental in dispelling stereotypes (Guraya & Barr, 2018).
To ensure that students are ready to work in IP contexts with health professionals from multiple disciplines, well-designed performance assessments are needed to make reliable and valid inferences about students' IP competencies. Performance assessments focus on the ability to use combinations of acquired skills and knowledge, and the assessment task is described in terms of a certain performance that is perceived as worthwhile and relevant to students and teachers (Wiggins, 1989). The design of performance assessments that yield the appropriate evidence for the acquisition of IP competencies is a complex task (Delandshere & Petrosky, 1998; Sluijsmans, Prins, & Martens, 2006). To ensure that the assessment fits its purpose, an assessment approach is required that specifically focuses on the design of performance assessments. For the purpose of this study, we chose the approach of Stiggins (1987), who described that the design of any performance assessment includes four main steps (see Table 1). Stiggins' work originates from more than 40 years ago, but it is still topical and has been applied in many other contexts and research studies (e.g., Sluijsmans, Prins, & Martens, 2006). The design steps also align with current evidence-informed instructional design models, for example the Four-Component Instructional Design model (4C/ID model; Van Merriënboer & Kirschner, 2018), which helps to better prepare learners for real practice using authentic tasks.

Table 1. Design steps for performance assessments (Stiggins, 1987)
Step 1. Clarify reason(s) for assessment: (a) specify decision(s) to be made; (b) specify decision maker(s); (c) specify use to be made of results; (d) describe students to be assessed.
Step 2. Clarify performance to be evaluated: (a) specify the content or skill focus of the assessment; (b) select type of performance to be evaluated; (c) list performance criteria.
Step 3. Design exercises: (a) select form of exercises; (b) determine the obtrusiveness of assessment; (c) determine amount of evidence needed.
Step 4. Design performance rating plan: (a) determine type of score needed; (b) determine who is to rate performance; (c) clarify score recording method.
Designing a performance assessment starts with clarifying the reason for the assessment: no assessment should be conducted until the evaluator knows exactly how the results are to be used. In step 1a, the decision to be made on the basis of the assessment results is clarified, for example grading students' performance or evaluating a module. In step 1b, the decision maker(s) are specified: the people who will make decisions based on the results. Step 1c specifies the use to be made of the results, for example to determine mastery. Step 1d entails describing the students to be assessed: how many, with what background, in which year of study, and any other characteristics that may be important.
A key to quality assessment is the clear definition of the performance to be evaluated (step 2). First (step 2a), the content or skill focus of the assessment is specified, referring to the subject matter area. In step 2b, the type of performance to be evaluated is chosen, which can be a product, a process, or a combination of both. No single specification contributes more to the quality of the performance assessment than step 2c, the performance criteria: the observable behaviors or attributes of products that are considered in the rating (Stiggins, 1987).
After the performance to be evaluated is defined, decisions have to be made regarding how to sample, observe, and assess that performance (step 3). First (step 3a), the form of exercises is chosen: either everyday events as they naturally occur, or exercises specifically designed to elicit the performance. Next, a decision is made on the obtrusiveness of the assessment (step 3b): whether the assessment is announced, or takes place without examinees knowing beforehand. Step 3c consists of determining the amount of evidence to be gathered.
To complete the design of the performance assessment, it is necessary to plan how the performance will be scored (step 4). A first step (4a) is to determine the type of score needed, which can be holistic (a single overall index) or analytic (a more detailed breakdown of dimensions of student performance). Step 4b refers to who is to rate the performance and what their characteristics are. As a last step (4c), the score recording method is chosen, in which the performance criteria are translated into the scoring process.
When it comes to the design of interprofessional performance assessments (IPA), there are several knowledge gaps. First, there is a lack of clarity on the definition of the content or skill focus to be assessed. While most authors agree that IPE is important, there is little agreement on how to compile an IP curriculum that is aligned with competencies and assessment (Rogers et al., 2017). For the design of performance assessments that elicit the appropriate evidence, clarity about which IP competencies students need to acquire is imperative. Many frameworks on IP competencies are available, and much is known about the definition of IP competencies, but there is no empirical evidence to determine which outcome areas need to be assessed and achieved to ensure that graduates can provide effective collaborative care that benefits patients (Reeves, 2012). A major issue is knowing what 'competent to enter practice' means; more insight is therefore required into what pre-licensure students can reasonably achieve regarding IP competencies, as opposed to experts or masters with many years of professional experience (Rogers et al., 2017). Although the resources on competencies important for IPE may prove very useful, they often consist of lists of criteria covering too many skills or concepts.
Another gap lies in the lack of knowledge regarding the nature, frequency, and timing of the minimum set of assessment exercises required, across an aligned health professional curriculum, to ensure that students are collaboration-ready at graduation (Rogers et al., 2017). To date, we neither understand which exercises and performances are needed to assess the many different domains that make up IP competence, nor how these assessment tasks can best be organized within the undergraduate curriculum (Reeves, 2012).
Looking at step 4 of the design of performance assessments (Table 1), we lack understanding of who the assessors should be for IPA (Rogers et al., 2017). Contrary to more traditional forms of testing, performance assessments, in which students are often confronted with ill-defined problems, do not provide clear-cut right or wrong answers. The performance is evaluated in a way that allows informative scoring on multiple criteria, which may be difficult for the assessors. In IPE, it remains unclear what the characteristics of IP assessors should be, whether assessors should have the same professional background as the examinee or a different one, and how assessments should be moderated. Regarding the rating form to be used in IPA, the literature shows that several tools have been developed to assess teamwork performance (Havyer et al., 2016). However, it is important to note that tools developed to assess the performance of teams in health care are unlikely to be suitable for the assessment of teams comprised of students (Rogers et al., 2017).
Based on the aforementioned gaps in current knowledge, little is known about the quality of IPA in IPE from an instructional design perspective. The objective of this study is to explore the design quality of current IPA practices in undergraduate healthcare and social work education. The research questions are:
• What is the purpose of the IPA?
• Which performances are assessed in IPA?
• Which assessment exercises are used in IPA?
• Which performance rating plans are used in IPA?

Materials and methods
For this review, the five stages to conduct a scoping review as described by Levac, Colquhoun, and O'Brien (2010) were followed.

Stage 1: Identifying Relevant Literature
The search strategy was developed by one librarian (MvB) and one researcher (HS), using the research questions as a guide. The search terms (Table 2) were applied to six electronic databases: PubMed, ERIC, ScienceDirect, Web of Science, PsycINFO, and EMBASE. Eight journals that regularly publish articles in the fields of assessment and/or IPE were hand-searched. A thorough search of the grey literature using Google Scholar and ResearchGate was conducted to identify non-indexed literature, such as student manuals, practice reports, and educational documents, using the following search terms: "interprofessional", "assessment", "education", and "undergraduate". A total of eight experts in the field of IPE from the United Kingdom, Australia, Belgium, and Canada were consulted to collect unpublished articles or other relevant documents.

Table 2. Search terms for the scoping review on IPA. Note: the asterisk (*) indicates truncation, so the search engine also retrieves plural forms.

Stage 2: Study Selection
Inclusion and exclusion criteria were discussed with the research team at the beginning of the screening process and were refined during the screening of the articles at each stage, based on the fit of the literature to our research questions. We included publications on performance assessments for IPE in undergraduate healthcare and social work education. We excluded publications on attitudes or perceptions of students regarding IP collaboration. We also excluded literature reviews; however, we screened the reference lists of the reviews we found for publications that could match our inclusion criteria. Publications about the development, or the validity or reliability, of an assessment tool were excluded, because they provide little information on the elements of the design steps for performance assessments.
Three reviewers (HS, DS, XJ) independently screened each title based on whether it reported information about IPE or IPA; each title was then labelled yes/no/maybe. Next, two researchers (HS, DS) screened 10 abstracts together to specify the selection criteria. Subsequently, three researchers (HS, XJ, DS) independently screened the abstracts and scored them as yes/no/maybe. During weekly meetings, all discrepancies between the researchers in the choice of publications to include, as well as any studies labelled "maybe", were discussed (see Figure 1 for the numbers of all included publications). Hereafter, all full-text articles were screened by two researchers (HS, XJ). If the two researchers did not reach consensus, a third reviewer (AM) was consulted; this occurred for 21 publications.

Stage 3: Charting the Data
The publications were charted using a data extraction form constructed collaboratively by the research team (HS, JvM, AM, DS). Information from the publications was entered into a template (a Microsoft Excel spreadsheet). Two reviewers (HS, DS) independently charted the data and discussed the results. In case of discrepancies or doubts, a third researcher (AM) was consulted; this occurred for five publications.

Stage 4: Collating, Summarizing, and Reporting
A basic overview of the numbers and types of included studies was created, and then a thematic analysis of the publications was conducted, both deductively and inductively. We applied a deductive thematic analysis using the steps in Table 1, which allowed us to use key concepts for performance assessments. As a first step in this analysis, the publications were carefully studied. Next, the publications were coded using the four design steps for performance assessment (Table 1), and the codes were collated to formulate themes. The last step was a comparison of these themes among all publications to arrive at the final thematic map used within this scoping review (Braun & Clarke, 2006). In addition, an inductive method was used to identify relevant themes regarding IPA not covered by the framework.

Stage 5: Consultation
A consultation session about the findings was organized with six Dutch experts in the field of IPE. The findings as presented in stage four were used as input for the consultation. These experts were also asked to reflect critically on the results to identify any missing information.

Results
The literature search yielded 28 publications published between 2003 and 2019 (Figure 1). These publications included empirical research articles, book chapters, and descriptive practical articles. The largest share of included publications (n = 16) was from North America, followed by nine publications from Europe (UK, Germany, and Belgium). One publication was from Asia, one from Australia, and one reported a cross-country collaboration. Many publications from different parts of the world use the definition of IPE provided by the World Health Organization (2010): "Interprofessional education occurs when two or more professions learn about, from and with each other to enable effective collaboration and improve health outcomes".
Appendix A presents the full list of selected publications, including a short summary of the findings per publication regarding the four main design steps and their underlying elements. A subset of 10 publications used a quantitative design, in which quasi-experimental designs were employed most frequently. Four studies used qualitative methods, and seven studies used a mixed-methods design. Seven publications used a descriptive design to explain their IP module or, specifically, their IPA.

Reason for the Assessment
Six publications described how the assessment results can be used to make decisions on certification, evaluation, or other activities. Two publications used IPA for certification of the students. Three publications described evaluation as the reason for IPA, and one publication described a combination of certification and evaluation. In most publications, the exact purpose of the IPA remained unclear. In some publications the purpose was not explicitly described, although, for example, monitoring student learning with the aim of providing feedback was mentioned.
No information was provided on the decision makers for IPA.
Two publications mentioned how the results would be used; in both cases, this use was to determine mastery.
All publications provided information about the students to be assessed in IPA. Students participated in IPA within a broad range of educational programs, including nursing, occupational therapy, social work, and dentistry. The inclusion of both nursing and medical students (in combination with other professions) was most common, occurring in 14 publications. The educational year(s) in which IPA was implemented varied to a large extent, from the first to the last year of the educational programs. In some publications, the years in which students participated differed per educational program. For example, in one publication, medical students in year three participated in the IPA together with students of nursing and pharmacy in year four.

Performances
Almost all publications (n = 27) specified the content or skill focus at the basis of the IPA. Thirteen of these referred to (inter)national IP competency frameworks (see Table 3). Some publications reported a mix of different competency frameworks; for example, Kirwin et al. (2017) used a combination of IP and profession-specific competencies, namely the IPEC Core Competencies for Interprofessional Education and Collaborative Practice and the Centre for Advancement of Pharmacy Education competencies. Thirteen publications described competency domains instead of a framework; for example, Garbee et al. (2013) reported team performance, consisting of communication, role clarity, and situational awareness, as IP competency domains.
Nineteen publications described the IPA performance to be evaluated. Nine publications reported that they assessed IP competence by evaluating an assessment product, such as a reflective report or a care plan. Four publications reported that they assessed IP competence by evaluating assessment processes. An example can be found in Oza et al. (2015), in which students had to take a history from a standardized patient, perform a physical examination, and counsel the standardized patient. Six publications mentioned a mix of assessment products and processes. For example, Zaudke et al. (2016) described interviewing the standardized patient, creating a care plan, and communicating this care plan to the simulated patient(s) as assessment tasks.
Seven publications reported performance criteria that are considered in rating the students in the IPA. These were presented in the assessment instrument, for example in a rubric that was used to evaluate students' IP competencies.

Exercises
All but one publication described the exercises used in their IPA; 18 of these described simulation or role-play situations. The content of the simulations differed. Kirwin et al. (2017), for example, described a "curbside consult session," in which pharmacy students were asked a drug-related question by professionals from other disciplines.
Five publications mentioned an (IP team) objective structured clinical examination (OSCE/iTOSCE/TOSCE). An OSCE usually includes a circuit of several stations, each focusing on certain content, in which the student is examined using either real or simulated patients. Cullen et al. (2003), for example, described complicated childbirth situations in their Team OSCE.

Table 3. IP competency frameworks used in the publications

Furthermore, a great variety of assessment exercises was reported, such as coaching volunteer patients at risk of developing chronic diseases, IP international practice placements, and discussions of patient cases. A mix of assessment exercises was used in eight publications. An example of combined assessment tasks can be found in Anderson and Kinnair (2016), who used an IP workshop and single-best-answer and short-answer question tests, combined with simulation and objective structured clinical examinations.
The obtrusiveness of IPA was mentioned explicitly in one publication, namely in Reising et al. (2017), who mentioned that students receive the performance rubric and preparation one week ahead of the IPA.
In four publications, information was provided about the amount of evidence for acceptable performance that is planned to be gathered. In all cases, it concerned multiple samples of evidence that were collected either at one time or at multiple times. An example is Jorm et al. (2016), in which students had to hand in both a management plan for a patient and a creative video about an IP case to be evaluated.

Performance rating plan
No publication provided explicit information about the type of score needed (holistic or analytic).
Twenty-seven publications included information about who is to rate the performance in the IPA. Assessors varied across publications: professionals (in clinical practice), faculty members, patients, peers, or the learners themselves. Eight publications mentioned the professional background of the assessors; in all of these, the professional background was the same as the professions for which the students were being educated. In six publications, students were assessed by standardized (simulated) patients. Ten publications elaborated on other assessor characteristics, such as training of the assessors or calibration sessions between assessors. Lie et al. (2015), for example, mentioned that the faculty members were experienced in teaching and assessing students but had no experience with IPA; they therefore received training prior to the Team OSCE. In another publication, observers were paired: one nursing and one medical assessor (Ker et al., 2003).
The majority of publications (n = 21) described the score recording method used. The most cited methods were rubrics, observation checklists, and rating scales to assess IP performance. Seven publications described checklists to rate IP performance and reported the content of the checklist; for example, Poirier et al. (2017) used a yes/no checklist for real-time rating of communication, process, and team dynamics during a simulation training. Seven publications reported using rating scales, such as the McMaster-Ottawa Team Objective Structured Clinical Examination (Riesen et al., 2012), which assesses six IP competencies rated on a nine-point Likert scale. In six publications, rubrics were used to assess team performance or communication within teams. An example is the Indiana University Simulation Integration Rubric, whose items contain behaviors consistent with the IPEC competency domains of communication and teamwork.

Discussion
The objective of this study was to explore the design quality of current IPA practices in undergraduate healthcare and social work education. For this purpose, the design steps of Stiggins (1987) were used. Results show that much is already known about several elements of designing IPA. The results for the first research question - the purpose of IPA - revealed that little is published about the reason for IPA, the decision maker(s), and the use to be made of the results. Much is published about the participating students, who varied to a large extent in professional background and in the educational year in which the IPA was implemented.
With regard to the second research question - the performances - findings show that a variety of competency domains and competency frameworks are used. The majority of the studies refer to (inter)national IP competency frameworks and competency domains, using similar competencies regarding IP collaboration. We found much information about the performances that were assessed, whether products or processes, such as written care plans or reflections as products, or communication as a process, but little was published about the performance criteria.
The results for the third research question - the exercises - revealed that many different exercises are used in IPA, mostly simulation-based tasks and case discussions. Little is published about the obtrusiveness of the assessment and the amount of evidence needed to make a decision about IP competence.
With regard to the fourth research question -the performance rating plan -the findings indicated that little is known about the type of score used in IPA. The majority of the studies included information about the assessors in IPA, but less was published about assessor characteristics. Findings also showed that multiple instruments are used to evaluate IP performance, most of which were rubrics, observation checklists, and rating scales.
There is a lack of balance in the available knowledge regarding the four steps of performance assessments for assessing IP competencies. The design steps that were described often lacked clarity. An example concerns the performance criteria to be assessed: no other specification contributes more to the quality of the assessment than performance criteria formulated in observable terms (Stiggins, 1987). As mentioned by Reeves (2012), it is encouraging that most of the publications used similar core concepts of IP collaboration as the skill focus, such as communication, collaboration, and patient-centered care. However, the statements within the competency frameworks, or clusters of domains, are very complex, often combining multi-faceted attributes related to attitudes, values, knowledge, skills, and behaviors (Reeves, 2012). Many of the competencies are abstract and difficult to translate into observable and measurable behaviors (Rogers et al., 2017). In the publications reviewed, processes to be assessed were sometimes described merely as "teamwork", which is difficult for assessors to evaluate. It was often difficult to determine which specific student performances were evaluated.
We also see a lack of clear definitions of the reason for the assessment, especially regarding the decision to be made, the decision makers, and the use to be made of the assessment results. The lack of information about the performances and performance criteria to be assessed has implications for the quality of the design of the performance assessment as a whole. For high-quality performance assessments, it is therefore necessary to invest in clear definitions of all four steps: the purpose, performances, exercises, and rating plan of the IPA.
In addition to clarity within the four steps, an important aspect of designing performance assessments is constructive alignment between the steps. Constructive alignment is an outcome-based approach to teaching in which the learning outcomes that students are intended to achieve are defined before teaching takes place (Biggs, 1996). Teaching and assessment methods are then designed to best achieve those outcomes and to assess the standard at which they have been achieved. From this review, it appeared that much information was provided about the different elements of IPA. Many publications also provided at least some information about all four steps of their IPA (e.g., Forgey & Colarossi, 2003; Morison & Stewart, 2005), but it remained implicit how these components were related to one another. For example, most authors did refer to the competencies students worked on, but it remained unclear how these competencies were translated into observable performance criteria, the assessment exercises, and the rating form. There seems to be a fragmented view on the design of IPA.
This study and other publications reveal that, despite an increase in the amount of IPE literature, inconsistencies in IPA remain (e.g., CAIPE, 2017; Olson & Bialocerkowski, 2014; Reeves et al., 2010; Shrader et al., 2016). One of these inconsistencies concerns the timing of the IPA in undergraduate education; it remains unclear whether IPA is best implemented at the undergraduate, postgraduate, and/or practice level(s) (Michalec et al., 2015; Paradis & Whitehead, 2018; Reeves, 2012).
To enhance insight into designing IPAs, it is recommended that future curriculum design aims at designing valid and reliable performance assessments for IPA. Since the design of performance assessments is a difficult task (Delandshere & Petrosky, 1998; Sluijsmans, Prins, & Martens, 2006), we should make use of evidence-informed guidelines to design curricula and assessments; examples are the design steps described by Stiggins (1987) and the theory of constructive alignment by Biggs (1996). Assessing student performance is a complex cognitive task (Govaerts et al., 2011), and to draw valid inferences about IP performance, expert judgments are required (Govaerts & Van der Vleuten, 2013). The IP rating forms presented in the selected studies mainly contain subjective judgments, whether quantitative (using rating scales or rubrics) or qualitative (using narrative feedback). Considering this, and the guidelines provided by Stiggins (1987), it seems important to use multiple assessors to assess students' IP competencies; this also helps to reduce bias in assessors' judgments (Van Merriënboer & Kirschner, 2018). In addition to using multiple assessors, training of assessors is a crucial element of good assessment (Stiggins, 1987). Van der Vleuten, Heeneman, and Schuwirth (2017) state that the quality of the assessment lies in the users of the assessment instruments, not in the instruments themselves.
This review has some strengths and limitations. Initially, we approached IPA very generally, which led to a large number of studies on the assessment of attitudes measured by self-assessment. A strength of this review is that we narrowed down the inclusion criteria to focus on assessing student IP performance. This enabled us to provide a scoping review on the assessment of IP performance, which is an area that has received relatively little research attention (Havyer et al., 2016;Rogers et al., 2017).
The methodological rigor of the included publications was not examined, because we wanted to include grey literature and book chapters. Nevertheless, this study succeeded in providing an overview of IPA that can be used to inform future research or systematic reviews. Second, while also purposive, the lens for analysis was Stiggins' (1987) theory of performance assessments, which is one of many assessment theories that could be applied to the data analysis. We chose this theory because it is one of the most fundamental theories for designing performance assessments, suited to assessing complex competencies such as IP competencies.

Conclusions
The aim of this scoping review was to explore the design quality of current IP performance assessment practices in undergraduate healthcare and social work education. Several areas of IPA are well researched, such as IP competencies, exercises, and assessment rating plans. Less is known about other aspects of IPA, such as the required characteristics of assessors and the performance criteria for assessing IP competencies. Little attention is paid to determining how students' performance should be assessed. Existing systematic and evidence-informed assessment design approaches provide new and promising possibilities for the design of IPA. The findings of this scoping review underline the desirability of such a design-based approach in IPA and provide recommendations for an assessment model that can be used to design IPA. Future research should focus on theory-oriented, design-based IPA, in which the alignment between the constructs of the assessment program determines its quality.