Group Performance Trajectories of Athletic Training and Physical Therapy Students Engaged in Case-Based Learning Activities

PURPOSE This study examines the group performance trajectories of athletic training and physical therapy students as they engaged in case-based learning experiences over the course of a semester. METHODS We apply Tuckman and Jensen’s (1977) model of group development, which predicts a non-linear performance trend as groups progress through the stages of forming, storming, norming, performing, and adjourning. We also examine the extent to which performance trajectories differ between interprofessional and uniprofessional groups. RESULTS As predicted, results suggest a significant non-linear trajectory of group performance across case-study trials. Specifically, a significant cubic trajectory was observed, such that group performance was characterized by two inflection points over time. Although we observed raw mean differences in performance ratings between interprofessional and uniprofessional groups, no significant differences were found across trials. We offer some speculative explanations for why this occurred. CONCLUSION Ultimately, our findings provide insight to educators and practitioners regarding why and when groups may need additional support.

Received: 03/16/2019 Accepted: 08/20/2019 © 2019 Briggs, et al. This open access article is distributed under a Creative Commons Attribution License, which allows unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. ORIGINAL RESEARCH 3(4):eP1181


Introduction
Extant research suggests that interprofessional collaboration is an important part of providing effective patient care (Baggs et al., 1999; WHO, 2010). While some empirical work has examined interprofessional collaboration with respect to group effectiveness (Vinokur-Kaplan, 1995), little empirical research has been dedicated to understanding how groups collaborate and perform over time (i.e. group development). Arguably, this is important to understand in order to help groups overcome obstacles hindering collaboration at particular stages in their development. This study addresses a gap in the literature by examining the development of groups across performance trials, as well as the extent to which interprofessional groups and uniprofessional groups differ in their progression through each stage.
While many groups may struggle with conflict related to communication and clarifying roles and responsibilities, interprofessional groups may especially struggle with effectively integrating their distinct but overlapping professional identities into a cohesive unit. Thus, establishing roles and responsibilities within an interprofessional group goes beyond a general understanding of the work styles and task preferences of individual group members. Instead, elements of ownership and scope of practice issues are also involved. Case by case, interprofessional groups face the challenge of navigating instances in which multiple group members may feel that particular tasks or aspects of patient care belong to them as a function of their professional identity.
By anchoring our research with two strong theoretical frameworks, we are able to make and test specific hypotheses and avoid post hoc rationale to explain potentially spurious findings. Specifically, using Tuckman and Jensen's (1977) model of group development, we make predictions about how we expect groups to perform over time. In addition, with Hackman's (1987) model of group effectiveness, we offer theoretical rationale for the mechanisms that may drive performance fluctuations (e.g. conflict, resolution). As a strong theoretical underpinning is squarely at the center of sound scientific discovery, our aim is to prompt more theory-driven research on this topic, and to provide insight to practitioners and educators regarding patterns of performance they might see in their work groups.

Theoretical Framework and Literature Review
The primary theoretical framework guiding this study is Tuckman's (1965) model of group development, which describes the developmental stages of forming, storming, norming, performing, and adjourning (Tuckman & Jensen, 1977). In the forming stage, group members focus on general orientation and the establishment of both interpersonal and task boundaries. The storming stage is characterized by conflict, in which interpersonal issues are emphasized, emotional responses are common, and task work suffers. The norming stage is denoted by an overcoming of conflict and resistance in which group members experience cohesion, standards emerge, and individual roles are accepted. In the performing stage, task work is facilitated by interpersonal structure, and energy is channeled into task work. The adjourning stage represents the final stage of the group's life cycle in which group work concludes and group members disband. The stages and corresponding performance trajectory are presented in Figure 1.

Implications for Interprofessional Practice
• As interprofessional competencies rely heavily on effective group functioning, it is imperative to understand how groups develop over time, the challenges they face, and why their performance may fluctuate. We show that Tuckman and Jensen's (1977) model extends to interprofessional contexts.
• Our findings may help practitioners and educators better understand when groups may need additional support in terms of clarifying roles and responsibilities,
• help practitioners consider developmental group progress (especially during early and late stages) and the implications it has on patient care needs, and
• help support the inclusion of group development models in interprofessional curriculum, as it is reasonable to expect that learning about group development may facilitate the resolution of group conflict in more applied settings.

Figure 1. Stages of group development
In a narrative review, Bonebright (2010) notes that Tuckman's model was originally popular among practitioners and later became common in the academic literature. One of the only studies explicitly testing the model found support for it within a classroom setting (Runkel, Lawrence, Oldfield, Rider, & Clark, 1971). However, this study has been criticized because the observers who rated group member behavior were presented with the theoretical framework and were asked to fit their interpretations to the model, which may have contributed to biased findings (Tuckman & Jensen, 1977). In addition, Runkel et al. (1971) conducted no statistical analyses to qualify their conclusions.
Following these initial studies, group development has drawn some research interest, but we argue that it has not generated enough empirical evidence. As technological advancements emerged, some focused on the online experiences of groups, such as Vroman and Kovacich (2002), who found that virtual, interdisciplinary teams experience the same challenges and developmental stages that traditional teams experience, and Glowacki-Dudka and Barnett (2007), who examined student reflections following an online course and found evidence to support the stages predicted by Tuckman and Jensen. Other studies used Tuckman's model to guide their initiatives in practical settings.
For instance, Hope et al. (2005) evaluated the effectiveness of a team-building program while having team members self-report which developmental stage they perceived their team to be in. Lilley, David, and Hinson (2007) described the implementation of interprofessional supervision within a hospice setting and noted how groups were observed going through the stages of forming, storming, norming, and performing. Similarly, Lee (2008) discussed how a large-scale team in the field was observed experiencing Tuckman's stages.
Yet other studies have taken broader views either by considering how Tuckman's model applies beyond groups or by providing organizational recommendations. Specifically, two papers have taken a macro perspective by using Tuckman's model to describe the development of the automobile industry in South Africa (Anstey, 2006) and to describe the growth and development of local enterprise partnerships. Moreover, many papers have surfaced encouraging facilitators to be mindful of group dynamics, to understand how groups are progressing through stages, and to provide added support to groups during problematic stages such as storming (Davoli & Fine, 2004; Badali, 2008; Cusack et al., 2012; Kumar, Deshmukh, & Adhish, 2014).
Taken together, while group development appears to be a popular topic, the extant literature does not provide strong evidence about how well the model generalizes across groups. Important questions remain regarding whether groups tend to progress across these stages in a uniform manner, whether certain boundary conditions impact the sequence of stages, or whether performance is affected by a completely different set of theoretical mechanisms. Conceptually, we contend that it is quite possible for groups to experience these stages in many ways. Some groups might take longer than others to progress such that a single stage (e.g. storming) extends across more than one performance trial. Other groups might find themselves encountering the storming stage multiple times over the course of their life cycle, as new task demands emerge and roles are once again clarified. Along these lines, one study used Tuckman's model to examine software development teams and argued for group development models to expand and include stages of decay such as de-norming (McGrew, Bilotta, & Deeney, 1999).
Similarly, among groups who experience higher levels of attrition and more fluid group membership, the forming stage might be encountered many times. Some have even gone as far as re-conceptualizing Tuckman's group development stages as orientations to the individual, the group, the purpose, and the work, rather than behavioral outcomes (Cassidy, 2007). While our study does not address all of these questions, we do provide some hypotheses and empirical tests regarding whether or not groups generally experience this sequence of stages, and whether this progression is qualified by the composition of the group. Furthermore, our study provides a much-needed empirical test of Tuckman's model, which has not been offered by other studies in the extant literature.
As seen in Figure 1, the model of group development predicted by Tuckman and Jensen (1977) follows a cubic form, in which the relationship between group performance and time is characterized by two main inflection points. Based on the literature, we expect to observe similar trajectories. Thus, our first hypothesis is as follows: H1: There will be a significant relationship between group performance and time, such that it will follow a cubic form across case-study trials.
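As a brief illustration of what a cubic form implies, the relationship in H1 can be written as a third-degree polynomial in time. The symbols below (P for group performance, t for trial, and the coefficients) are notation assumed here for illustration and do not appear in the original analysis.

```latex
\begin{align}
P(t)  &= \beta_0 + \beta_1 t + \beta_2 t^2 + \beta_3 t^3, \qquad \beta_3 \neq 0, \\
P'(t) &= \beta_1 + 2\beta_2 t + 3\beta_3 t^2 .
\end{align}
```

Because the derivative is quadratic in t, it has at most two real roots, so the curve can change direction at most twice. Under this reading, the two bends described in the text correspond to the dip expected at storming and the later leveling or decline expected toward adjourning.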
The secondary theoretical framework guiding this study is the model described by Hackman (1987), which conceptualizes group effectiveness based on the criteria of output, viability, and individual group member satisfaction. Respectively, these criteria refer to the work the group produces, the extent to which the group is able to engage in future work, and the extent to which the experience is positive for individual members. Moreover, Hackman (1987) argues that group composition is the most important factor for facilitating the application of knowledge and skill. A visual representation of this framework is presented in Figure 2.
We are particularly interested in group member diversity as it pertains to the professional identities of group members. It is thought that groups are more effective when composed of individuals who are diverse enough to be complementary, but also similar enough to facilitate communication (Hackman, 1987). In the current study, group diversity is conceptualized as whether groups are composed of uniprofessional or interprofessional group members. As Shaw and Barrett-Power (1998) highlight, diversity within groups is important and development models can be quite useful for explaining the links between diversity and performance (See Figure 2).
The current study also examines the extent to which performance trajectories differ between uniprofessional and interprofessional groups. As shown in Figure 1, group performance is expected to fluctuate over time. Specifically, group performance is expected to decrease from the forming to the storming stage, and then increase from the storming to norming and performing stages.
When comparing uniprofessional and interprofessional groups, we expect to find two main differences. First, we expect that interprofessional groups will exhibit larger decreases in group performance from the forming stage to the storming stage. Our rationale is that interprofessional groups will experience more initial conflict and confusion as they communicate and establish an understanding across professional boundaries. That is, more conflict is expected to emerge as interprofessional groups negotiate task ownership, scope of practice issues, confront disagreements, and deal with potential dissatisfaction and withdrawal among group members.
Second, compared to uniprofessional groups, we expect interprofessional groups to exhibit larger increases in group performance from the storming stage to the norming and performing stages. Our rationale is based directly on Hackman's (1987) framework describing group member diversity as beneficial to group performance. Although we expect interprofessional groups to experience more conflict initially, we maintain that the benefits of the interprofessional groups' composition will ultimately prevail. Specifically, once they resolve any conflict experienced in the storming stage, the diverse perspectives within interprofessional groups are expected to contribute to aspects of group performance that are important for patient care outcomes (e.g. comprehensiveness of treatment plan). As such, we offer the following hypotheses: H2: Compared to uniprofessional groups, interprofessional groups will exhibit significantly larger decreases in performance from the forming stage to the storming stage.
H3: Compared to uniprofessional groups, interprofessional groups will exhibit significantly larger increases in performance from the storming stage to the norming and performing stages.

Sample
Our sample consists of physical therapy (PT) and athletic training (AT) students ranging from 18 to 25 years old enrolled in a therapeutic modalities course during the first semester of their professional programs (Doctor of Physical Therapy and Master of Athletic Training, respectively). A total of 90 PT students and 22 AT students were randomly divided into 36 three-to-four person groups, such that 18 groups consisted of only PT students and 18 groups consisted of both PT and AT students.
Our study utilizes the widely accepted definition that interprofessional education occurs when individuals from two or more professions learn about, from and with each other (WHO, 2010). As such, we conceptualize the groups with both PT and AT students as interprofessional, and those with only PT students as uniprofessional. While our study readily adopts this definition, it is worth noting that controversy still exists regarding the assumption that group demographics define the extent to which a group is interprofessional. We believe that this debate is very important for our paper and for the IPE literature more broadly. However, an extended discussion of this debate is beyond the scope of our paper. We feel that articulating the boundary conditions of what is and is not interprofessional would require a dedicated paper. For the purposes of this paper, we adopt the WHO definition and make a call for researchers in the interprofessional arena to formally engage in discourse in order to build much-needed consensus.

Procedures
As a first step, approval from our university's Institutional Review Board (IRB) was obtained. Course announcements were used to recruit participants, and only students enrolled in the course could participate. All participants were enrolled in the same course section (i.e. a single cohort). A syllabus was provided to each enrollee, and students were notified that their participation in the research was voluntary. Students were given the option to have their data excluded from the research study. Data were de-identified using four-digit codes and were then aggregated to the group level. A random number generator function built with R Programming Language was used to 1) assign each participant a three-digit number, 2) assign each participant to a team, and 3) assign each team a case for each trial. We did this separately for each condition such that each of the six cases was distributed evenly across conditions. Since we had 18 uniprofessional teams and 18 interprofessional teams, each of the six cases was randomly assigned to three teams from each condition at each trial.
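The balanced case assignment described above can be sketched as follows. The original procedure used R; this is only a minimal Python illustration, and the team labels, case labels, seed, and the function name `assign_cases` are all hypothetical. The source does not say whether a team could receive the same case in more than one trial, so this sketch does not prevent it.

```python
import random

# Hypothetical labels; the study used six cases and four trials.
CASES = [f"case_{i}" for i in range(1, 7)]
N_TRIALS = 4


def assign_cases(team_ids, cases, n_trials, rng):
    """At each trial, give every case to the same number of teams, in random order."""
    repeats, remainder = divmod(len(team_ids), len(cases))
    if remainder:
        raise ValueError("number of teams must be a multiple of the number of cases")
    schedule = {team: [] for team in team_ids}
    for _ in range(n_trials):
        pool = list(cases) * repeats  # each case appears `repeats` times
        rng.shuffle(pool)             # random pairing of cases with teams
        for team, case in zip(team_ids, pool):
            schedule[team].append(case)
    return schedule


rng = random.Random(2019)  # arbitrary seed, used only for reproducibility
uni_teams = [f"uni_{i:02d}" for i in range(1, 19)]  # 18 uniprofessional teams
ip_teams = [f"ip_{i:02d}" for i in range(1, 19)]    # 18 interprofessional teams

# Assignment is done separately within each condition, as in the text.
uni_schedule = assign_cases(uni_teams, CASES, N_TRIALS, rng)
ip_schedule = assign_cases(ip_teams, CASES, N_TRIALS, rng)
```

With 18 teams and six cases per condition, each case lands on exactly three teams per condition at every trial, which matches the distribution reported above.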

Case-Based Learning Experiences
Students in our sample engaged in four case-based learning activities (i.e. performance trials) with their assigned groups over the course of the semester. During each of the four trials, groups were given one of six cases that reflected relevant course content. Each trial lasted approximately 30 minutes, followed by ten minutes of facilitated discussion. We conceptualize Trial 1 as the forming stage, Trial 2 as the storming stage, Trial 3 as the norming and performing stages, and Trial 4 as the adjourning stage.
Ideally, we would have used five trials to test the five-stage model, but the course structure and timing only allowed for four trials. It is important to note that, had our design used five trials, we would have conceptualized the fourth trial as the performing stage rather than the adjourning stage. However, as previously defined, the adjourning stage represents the final stage of the group's life cycle and is characterized by the point at which the group's work concludes and group members disband. That is, the adjourning stage is marked by a more reflective period rather than a performance orientation. Since students knew that Trial 4 would be their last together, we argue that this knowledge prompted students to "take their foot off the gas" with respect to performance. As such, we believe it was most appropriate to conceptualize the last trial as the adjourning stage. Had students not known that Trial 4 would be their last together, we would have conceptualized it as the performing stage.
The cases used in this study included patient history and presentation. Groups were asked to establish treatment goals and to specify appropriate modalities to accomplish those goals. As such, students documented appropriate therapeutic interventions (i.e. main outcomes, assessment methods, physiological effects, contraindications, precautions, and parameters). An example of one of the case studies is as follows: "A 34 year old female presents with a diagnosis of chronic elbow tendinosis and complains of right lateral elbow pain and stiffness that she attributes to tennis activities. Exam reveals pain with palpation at the common wrist extensor tendon at the lateral elbow, pain and weakness with resisted wrist extension, and decreased active and passive range of motion (A/PROM) wrist flexion. Primary Medical History (PMHx): currently four months pregnant with gestational hypertension (HTN)."
Responses were evaluated based on accuracy, reasoning, and thoroughness. Evaluations were only used for purposes of the study, had no bearing on student grades, and were not shared with students. Responses were de-identified, such that the evaluator was blind with respect to whether particular responses were provided by interprofessional or uniprofessional groups. In addition, the evaluator was not aware of the theoretical framework underlying this study prior to her evaluations.

Measures
Case assignments were evaluated on a seven-point scale (1 = Least; 7 = Most) assessing accuracy, reasoning, and thoroughness, as these are important criteria in practical settings. For the purposes of this study, accuracy is defined as the degree of effectiveness of the treatment in addressing the patient's case and the correctness of the information provided regarding treatment application. Reasoning is defined as the degree to which valid, logical justification is used in the selection of a treatment. Thoroughness is defined as the degree of completeness in addressing all aspects of the patient case and in presenting all relevant information.
The evaluation criteria were determined by the course instructors and were judged by a single evaluator who is a certified athletic trainer by the Board of Certification, in good standing with the National Athletic Trainers' Association, and licensed in the state of Missouri. The evaluator has seven years of professional athletic training experience and is involved as an author of this research. The use of a single evaluator is not ideal but is in line with work settings in which a single supervisor rates performance, and thus this approach supports external validity (Campbell, 2017). Table 1 shows the rubric used.

Results
All statistical analyses were conducted using R Programming Language for Statistical Computing (R Development Core Team, 2008). Descriptive statistics across performance trials are presented in Table 2. Correlation analyses revealed that for each trial, the accuracy, reasoning, and thoroughness ratings were strongly correlated (ranging from .65 to .81) and exhibited roughly the same functional form (i.e. trajectory across trials). Therefore, a composite variable was created using the average of accuracy, reasoning, and thoroughness ratings for each trial, which we conceptualize as overall performance. We test our performance hypotheses using the composite variable of overall performance.
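The composite described above is a simple unweighted mean of the three ratings. A minimal Python sketch follows; the function name and the example ratings are hypothetical and are not data from the study.

```python
def overall_performance(accuracy, reasoning, thoroughness):
    """Average the three seven-point ratings into one composite score."""
    ratings = (accuracy, reasoning, thoroughness)
    if not all(1 <= r <= 7 for r in ratings):
        raise ValueError("each rating must fall on the 1-7 scale")
    return sum(ratings) / len(ratings)


# Hypothetical (accuracy, reasoning, thoroughness) ratings for one group
# at each of the four trials; illustrative values only.
example_ratings = [(5, 4, 6), (3, 4, 2), (6, 6, 6), (6, 5, 7)]
composites = [overall_performance(*r) for r in example_ratings]
```

Equal weighting is an assumption that follows the text's description of an average; a weighted composite would need a justification the source does not provide.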
To test H1, which predicted a cubic form of overall performance across trials, a hierarchical regression was conducted. As shown in Table 3, results from the hierarchical regression suggest a significant cubic relationship between overall performance and trial. More specifically, no significant linear relationship was found, and the cubic function was found to account for variance over and above the quadratic function. Overall, results support our hypothesis and suggest that our data (Figure 3) approximate the model of group development proposed by Tuckman and Jensen (1977).
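To illustrate how linear, quadratic, and cubic components can be separated, the sketch below uses orthogonal polynomial contrasts for four equally spaced trials. This is a swapped-in technique rather than the hierarchical regression the authors ran in R, although in a balanced design it yields the same sequential partition of between-trial variation. The trial means are hypothetical, the function names are illustrative, and no error term or significance test is included.

```python
# Standard orthogonal polynomial coefficients for four equally spaced levels.
CONTRASTS = {
    "linear": (-3, -1, 1, 3),
    "quadratic": (1, -1, -1, 1),
    "cubic": (-1, 3, -3, 1),
}


def contrast_ss(trial_means, n_per_trial, coeffs):
    """Sum of squares attributable to one polynomial contrast (balanced design)."""
    estimate = sum(c * m for c, m in zip(coeffs, trial_means))
    return n_per_trial * estimate ** 2 / sum(c * c for c in coeffs)


def between_trial_ss(trial_means, n_per_trial):
    """Total between-trial sum of squares around the grand mean."""
    grand_mean = sum(trial_means) / len(trial_means)
    return n_per_trial * sum((m - grand_mean) ** 2 for m in trial_means)


# Hypothetical trial means (NOT the study's data): a dip, a recovery, a plateau.
trial_means = [5.0, 4.0, 5.5, 5.4]
n_groups = 36  # number of groups rated at each trial

components = {
    name: contrast_ss(trial_means, n_groups, coeffs)
    for name, coeffs in CONTRASTS.items()
}
total = between_trial_ss(trial_means, n_groups)
```

With four trials, the three contrasts use up all between-trial degrees of freedom, so their sums of squares add up to the total and a cubic curve passes through the four trial means exactly. The substantive question is therefore whether the cubic component is large relative to the within-trial error, which this sketch does not compute.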

Table 3. Hierarchical regression summary of relationship between performance and trial
To test H2 and H3, which predicted that, compared to uniprofessional groups, interprofessional groups would experience significantly larger decreases in performance from the forming to the storming stage and significantly larger increases from the storming to the norming and performing stages, a regression analysis was conducted. Trials are numbered from 0 in these analyses, so Trial 0 is the first (forming) trial, Trial 1 is the second (storming) trial, and Trial 2 is the third (norming and performing) trial. Difference scores were calculated by subtracting overall performance scores of the storming stage (Trial 1) from the scores of the forming stage (Trial 0), and by subtracting scores of the storming stage (Trial 1) from scores of the norming and performing stages (Trial 2). Descriptive statistics between conditions are presented in Table 4.
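The difference scores described above can be sketched as follows. The scores, dictionary keys, and function names are hypothetical and serve only to show the direction of each subtraction; they are not the study's data or code.

```python
from statistics import mean


def difference_scores(trial_scores):
    """Return the two difference scores from a group's per-trial composites.

    trial_scores is ordered forming, storming, norming/performing, adjourning.
    """
    forming, storming, norming_performing = trial_scores[:3]
    return {
        # Positive values indicate a drop from forming to storming (H2).
        "forming_minus_storming": forming - storming,
        # Positive values indicate a rebound after storming (H3).
        "norming_minus_storming": norming_performing - storming,
    }


def mean_difference_by_condition(groups, key):
    """Average one difference score within each team condition."""
    by_condition = {}
    for group in groups:
        score = difference_scores(group["scores"])[key]
        by_condition.setdefault(group["condition"], []).append(score)
    return {condition: mean(values) for condition, values in by_condition.items()}


# Hypothetical composite scores for four groups; illustrative values only.
groups = [
    {"condition": "interprofessional", "scores": [5.0, 3.5, 5.5, 5.5]},
    {"condition": "interprofessional", "scores": [4.5, 4.0, 5.0, 5.0]},
    {"condition": "uniprofessional", "scores": [5.0, 4.5, 5.5, 5.0]},
    {"condition": "uniprofessional", "scores": [5.5, 4.5, 5.0, 5.5]},
]

drop = mean_difference_by_condition(groups, "forming_minus_storming")
rebound = mean_difference_by_condition(groups, "norming_minus_storming")
```

When a difference score is regressed on a single two-level condition indicator, the slope equals the gap between the two condition means, so comparing these averages mirrors what such a regression estimates.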
Using these difference scores as the dependent variables, results of the regression analysis suggest that no significant differences were found between team conditions. Compared to uniprofessional groups, interprofessional groups did not experience significantly larger decreases from the forming stage (Trial 0) to the storming stage (Trial 1), or significantly larger increases from the storming stage (Trial 1) to the norming and performing stages (Trial 2). Results of the regression analysis are presented in Table 5.

Discussion
In this study, we examine how groups who were engaged in case-based learning activities fared across four performance trials. We tested and found support for the model of group development posited by Tuckman and Jensen (1977), as we found a significant cubic relationship between group performance and trial. Specifically, our results suggest that after forming their groups in the first trial, groups experienced a marked decrease in performance during the second (i.e. storming) trial, presumably due to the conflict related to roles and responsibilities. Following the storming phase, groups generally experienced a marked uptick in performance during the third trial (i.e. norming and performing), again presumably as they resolved conflict and channeled their efforts back toward the task. During the fourth (i.e. adjourning) trial, group performance seems to have generally leveled off, which represents a change in trajectory compared to the previous trial. However, we did not observe a marked decrease in performance in the adjourning stage, as predicted by the model.
In addition, we did not find meaningful differences between the trajectories of uniprofessional groups and interprofessional groups as we hypothesized. Since we observed notable differences in the raw mean scores, we speculate that we may not have had a large enough sample size to detect a significant effect. Moreover, students involved in this study were in the first semester of their respective professional programs. It is therefore also possible that the professional identities of the students were not yet well formed at this early stage of their education. If so, the viewpoints within interprofessional groups may have been too closely aligned with those of uniprofessional group members, which would dampen any effect. Future research may consider selecting professions that have more distinct scopes of practice, as this would lend itself to more robust findings and broader conclusions regarding performance trajectories.
Our study has several limitations. First, we have a modest sample size of 36 groups, which did not provide enough power to detect smaller effect sizes. We suspect that with a larger sample, the pattern of performance would have resembled the model proposed by Tuckman and Jensen (1977) even more closely. Additionally, the imbalance between the available number of students (90 PT students, 22 AT students) in each respective profession limited the ability to create interprofessional groups with a similar number of students from each profession. To clarify, the interprofessional versus uniprofessional group differences were tested, and for this comparison, the sample sizes at the group level were equal. That is, there were as many interprofessional groups as there were uniprofessional groups. More importantly, we did not find any differences between these groups and so we made no conclusions in this regard. Testing our hypotheses with a larger sample size, more balanced groups, more disparate professions or those further along in their education might yield different results with respect to the interprofessional versus uniprofessional comparison. However, we would not expect this to have changed our overall finding related to the trajectory of group performance, since this test included the entire sample. Thus, while the disproportionate size of AT versus PT students might have impacted the results of the interprofessional versus uniprofessional comparisons, it has no implications for our primary finding regarding group performance trajectories. The group performance trajectory hypothesis was tested using the sample that included all groups and all participants, and we have no theoretical reason to believe that this trajectory would be different for AT versus PT students.

Table 5. Regression summary predicting difference scores
Theory suggests that in general, groups follow the stages of forming, storming, norming, performing, and adjourning, regardless of their composition. So, while some of our hypotheses predicted that group composition would affect the extent to which groups experienced one or more of these stages (i.e. more conflict during storming and lower levels of performance; more cohesion during performing and higher levels of performance), we would not expect that groups bypass any of these stages as a function of their composition. Put differently, theory dictates that the general trajectory (i.e. non-linear with two inflection points) of performance would still be observed across different types of group compositions, which is what we found in our study.
Another limitation of our study is the use of four performance trials to test a five-stage model. As such, we had to combine the norming and performing stages and represent both by a single trial. By including an additional trial, we suspect that we would have observed a notable increase in group performance from the norming to the performing stages. If so, our observed data would have fit the theoretical model more closely. Lastly, although some may consider the use of a single evaluator a limitation, most work settings use a single evaluator for performance assessment, thus providing justification in terms of external validity. In addition, we did not ask group members to report their experiences from trial to trial. Thus, we cannot conclude, for example, that the performance decreases we observed in the second trial were indeed a result of intragroup conflict. Instead, we rely on the theoretical model to speculate. Future research should consider collecting data regarding the intragroup experience and mapping these data onto the trials and trajectories.
The use of performance ratings that were not self-reported was a strength of our methodology as it mitigated problems related to common method bias. Moreover, our study overcomes limitations of previous studies as our rater was not aware of the theoretical framework underlying this study prior to her evaluations, therefore reducing undue bias in her ratings. In addition, unlike previous studies, we qualify our findings with statistical tests of the theoretical model.
Another strength of our study was our design as we used multiple cases at each trial, and we used a random number generator to assign each team a case for each trial. Some might question whether the relative difficulty of each case had any impact on the results. It is important to highlight that our primary finding (i.e. group performance trajectory) is based on the average level of group performance (i.e. across all groups) at each trial. Therefore, our methodology controls for and eliminates any case effects (i.e. differences in difficulty across cases) by ruling out the possibility that the performance trajectory we observed was a function of systematic change in case difficulty from one trial to the next. Put differently, performance ratings were aggregated to the trial level and even with small (i.e. non-significant) differences in difficulty across cases, groups exhibited the hypothesized trajectory. For reference, Table 6 shows the performance differences between cases across trials.
To conclude, our study provides additional evidence to support the model of group development proposed by Tuckman and Jensen (1977). Our results may help both educators and practitioners understand the typical performance trajectory of groups and may help inform decision making regarding when groups may need the most support. For instance, during the storming stage, groups may need additional support to facilitate communication and to clarify the roles and responsibilities of different group members, both of which have been highlighted as important competencies of interprofessional collaboration (Wood, Flavell, Vanstolk, Bainbridge, & Nasmith, 2009).