
Beyond the “Acid Test”: A Conceptual Review and Reformulation of Outcome Evaluation in Clinical Supervision

Abstract

Theoretical models abound within clinical supervision, but they have rarely been applied to supervision evaluation. Instead, reviewers and researchers appear simply to have transferred to supervision the conceptual frameworks used within medicine, especially the idea that clinical outcomes are the “acid test” of supervisory effectiveness or quality. Following a careful examination of the key literature, I argue in this paper that this transfer has led to an overemphasis on clinical outcomes, with the net effect of reducing scientific confidence in, understanding of, and the effectiveness of supervision. To begin to rectify this bias, an augmented fidelity framework is used to reformulate evaluation, drawing on key concepts guiding evaluation within related fields (i.e., service evaluation, staff development, psychotherapy, and applied research). The resulting evaluation model is specific to clinical supervision and can help to increase our understanding, enhance our practice, re-prioritize research, and inspire confidence in supervision.

Introduction

How should we evaluate clinical supervision? The dominant view is that the clinical effectiveness of the supervisee (therapist) should serve as the definitive outcome: “The impact of clinical supervision on client outcome is considered by many to be the acid test of the efficacy of supervision” (Ellis & Ladany, 1997, p. 485). According to this conceptual metaphor, good clinical outcomes for the supervisee signify that the supervision can be approved or verified as achieving “the gold standard” (the acid test was originally used to verify gold). Such metaphors play a valuable role in science, but they can also constrain or distort our understanding, leading to “crooked thinking.” According to Thouless (1930), using verbal devices such as metaphors or analogies to deduce conclusions is “not necessarily dishonest or a crooked way of thought, although it is a dangerous one, always requiring careful examination” (p. 140). In this review I therefore examine the acid test metaphor in detail. Following a definition and an overview of current issues in evaluating supervision, the concept of intervention fidelity will serve to structure a reformulation of the acid test reasoning, with constructive suggestions drawn from several parallel literatures.

Definition

The history of outcome evaluation within modern healthcare can be traced to Donabedian (1966), who argued that in medicine it represented the most frequently used of three quality criteria: structure, process, and outcome. He defined a health outcome as a change that resulted from antecedent healthcare, in terms of recovery, restoration of function, or survival. This builds on the dictionary definition, which refers to an outcome as a result or consequence (Concise Oxford English Dictionary, 2004). In supervision, Wampold and Holloway (1997) defined an outcome as “… any phenomena representing a change or state that persists beyond the actual supervision session” (p. 16). Thus, outcomes refer to improvements that are targeted by an intervention such as supervision, and they are traditionally regarded as the paramount measure of various forms of healthcare. In turn, an evaluation is “the use of social research methods to systematically investigate the effectiveness of social intervention programs …,” which “are designed to inform social action …” (Rossi et al., 2003, p. 16). Put more concretely, outcome evaluation entails a judgment about the extent to which supervision objectives are achieved.

At the time of their review, Ellis and Ladany (1997) reported that only nine outcome evaluations of supervisees’ clinical effectiveness had been published, all of poor quality. A decade later it was again concluded that little progress had been made, and the need for an “evolutionary leap” (p. 496) was acknowledged (Ellis, D’Iuso, & Ladany, 2008). Specifically, Inman and Ladany (2008) concluded that “more comprehensive and thoughtful” (p. 512) evaluation was required to address the complex processes and outcomes of supervision, and the disconnection between theory and research. Such negative conclusions about evaluations of the acid test have been reached repeatedly (e.g., Freitas, 2002; Holloway, 1984; Holloway & Neufeldt, 1995; Kilminster & Jolly, 2000; Watkins, 2011; Wheeler & Richards, 2007). Nonetheless, Ellis and Ladany (1997) recommended further research on client outcomes, as “the importance of formulating and testing inferences about the relations of clinical supervision to client outcomes seems obvious” (p. 488). Conversely, Freitas (2002) deemed it “axiomatic that clinical supervision is conducted for the benefit of … trainees …” (i.e., supervisees; p. 363), whilst Wampold and Holloway (1997) argued that “… changes in the therapist characteristics represent the primary goal of supervision” (p. 21). By contrast, Milne (2009) reverted to a traditional medical conception of the acid test, that of safety (“first, do no harm,” from the Hippocratic Oath), a review in which the many valued outcomes of supervision were bracketed together under the ultimate purpose of “safe and effective therapy” (p. 37). Finally, Wheeler and Richards (2007) reviewed 18 carefully selected studies and found that the “impact” of supervision had been measured in terms of eight types of supervisee outcome (e.g., enhanced competence), by means of 40 diverse instruments, qualitative approaches, and concepts. Not surprisingly, they concluded that a more clearly defined research agenda was needed, akin to the efforts made within the therapy literature to develop and apply core outcome measures (Barkham & Wheeler, 2014).

In summary, there is no consensus about the definitive test of supervision, and empirical support for the acid test has not been forthcoming, undermining the metaphor’s social validity. Progress is also limited by current conceptualizations of evaluation within supervision, which derive from medicine. How else might we construe evaluation? Can a discriminating use of the structure-process-outcome logic be complemented by other informative criteria? One promising approach is the fidelity framework.

The Fidelity Framework

Fidelity is defined as those methodological strategies that help to monitor and enhance the reliability and validity of behavioral interventions. Specifically, “the overall goal of enhancing treatment fidelity is to increase scientific confidence that changes in the dependent variable are attributable to the independent variable” (Borrelli et al., 2005, p. 852). According to Borrelli et al. (2005), there are five successive steps in a systematic evaluation, which essentially maintains the tradition of developing taxonomies to guide comprehensive evaluation in the social sciences (e.g., Anderson & Krathwohl, 2001). Step one in the framework is intervention design (Borrelli et al., 2005). In relation to the present review, this step offers a conceptualization of supervision, indicating how it should be evaluated (addressing the question: “What is the right thing to do?”). Step two concerns the “training” of supervisors, to ensure that the right thing is being done (describing and standardizing the way that supervisors are trained). Evaluation of this step includes demonstrating that competence has been acquired and maintained faithfully. Having designed and delivered a program for training supervisors, the next step is to evaluate whether the supervision has actually been conducted correctly: “Has the right thing been done?” This focuses on whether supervision was implemented faithfully, with what has traditionally been termed “integrity.” It is usually assessed by monitoring the supervisor’s proficiency, including interpersonal effectiveness (e.g., developing a strong alliance). Having designed a system for training supervisors and ensured that it is being delivered as intended, the next step in the fidelity framework is to evaluate whether the “right” initial impacts occur with the supervisee. This is termed “receipt” and corresponds to mini-outcomes in therapy (McCullough, Winston, Farber, Porter, Laikin, Vingiano, & Trujillo, 1991). It is the first fidelity step that involves the traditional view of outcome evaluation, and it asks: “Is supervision resulting in the right (mini) outcomes?” The next step is “enactment,” an assessment of whether or not supervisees apply what they have learned in supervision within their subsequent therapy. This is the final step in the fidelity framework as set out by Bellg et al. (2004) and Borrelli et al. (2005). But because a supervisor has been added to the usual therapist-client chain, a sixth step is required: evaluation of the clinical outcomes obtained by the supervisee (therapist) with clients, a further form of generalization. As noted above, the use of this criterion has often been regarded as the acid test of supervision (e.g., Ellis & Ladany, 1997; Stein & Lambert, 1995). A final logical step that has been added is systemic evaluation. These steps are summarized in Table 1.

Table 1 APPLYING AN AUGMENTED FIDELITY FRAMEWORK TO REFORMULATE OUTCOME EVALUATION IN CLINICAL SUPERVISION (BELLG ET AL., 2004; BORRELLI ET AL., 2005)

Fidelity Dimensions and Objectives / Reformulation of the Evaluation of Supervision:
A. Evaluation design
Ensure that a study can properly test its hypotheses, based on the relevant theory (i.e. hypothesis validity). Addresses the question:
“What is the right way to supervise”?
1.

“Do no harm”: examine for side effects of supervision and the associated therapy (e.g., Procedural evaluation: Were supervisors suitably qualified and experienced?).

2.

Supervision model-development and testing: replace medical model with supervision-specific formulation; conduct better-focused evaluations (e.g., measure “active ingredients”).

3.

Research question guided by model, leading to specific evaluation objectives.

B. Training in supervision
Standardise training (e.g., by using manuals) so as to meet competence criteria; monitor and boost competencies; assess adherence. Addresses the question:
“Has the right supervision been done”?
4.

Manipulation check: evaluate the training of supervisors, so that adherence to the intervention can be assessed (e.g., evaluation of “structure”: frequency, duration, or content of supervision).

C. Delivery of supervision
Monitor whether supervision is being provided as specified (i.e. tape sessions to check adherence and competence); attempt to control for non-specific factors; strengthen adherence. Addresses the question:
“Has supervision been done right”?
5.

Intervention integrity: monitor the supervisor’s skill and interpersonal effectiveness in adhering to the intervention (process evaluation of competence and alliance).

D. Receipt of supervision
Attend to whether the supervisee (i.e., the therapist) benefits during the supervision session, as in showing signs of better understanding or greater proficiency (e.g., demonstrating a competence within an educational role-play). Addresses the question:
“Did supervision result in the right outcomes”?
6.

Evaluate the successive impacts of supervision (“mini-outcomes” or mediator/mechanism evaluation: e.g., assessing if reflection or action-planning took place during supervision, using content evaluation).

E. Enactment of supervision
Assess the extent to which the supervisee demonstrates these competencies in therapy (transfer). Addresses the question:
“Did supervision result in the right therapy”?
7.

Evaluating the supervisee’s competence in providing therapy (stepwise evaluation).

F. Effects of therapy
Generalization of supervision to therapy and beyond (across people, settings and time). Monitor social context for adverse and positive reactions to therapy (e.g., heightened marital distress). Addresses the question:
“Did supervision result in the right clinical outcome”?
8.

Clinical outcome evaluation (the “acid test”): effect of supervision on clients’ daily lives (e.g., symptomatic distress; interpersonal problems; social role functioning). Can include comparative outcome evaluation or “efficiency” assessments, comparing alternative supervision approaches.

9.

System outcome evaluation: effect of changes in client on client’s social system (e.g., relationships; work functioning). Final step in stepwise outcome evaluation.


The fidelity framework essentially unpacks Donabedian’s (1966) structure-process-outcome approach, but it does not subsume all relevant types of evaluation. For example, evaluations of training-based interventions can also usefully encompass information on the organizational system (“procedures”) and the “contents” of an intervention. To offer a more comprehensive account, Milne (2007) summarized six evaluation criteria with the acronym SCOPPE (i.e., Structure, Content, Outcome, Processes, Procedures, and Efficiency). These six criteria are therefore incorporated within the augmented version of the fidelity framework set out in Table 1, to contribute to a supervision-specific reformulation of outcome evaluation.

In the next part of this review I offer justifications for all six criteria within this reformulation, alongside suggestions for future evaluations. Following Lambert and Ogles (1997) and Milne (2006), I do so by drawing on related literatures, extending their reasoning by turning for guidance to a wider sample: the literatures of staff development, service evaluation, applied research, and therapy.

Evaluation Design

1. “Do No Harm”: Client (and Supervisee) Safety Comes First

One reason to question the acid test is the traditional prioritization of patient safety within medicine (Milne, 2009). Before we turn our attention to effectiveness, we need to check whether our intervention is harmful to clients (or to supervisees). In a review of randomized clinical trials, about 35% of clients were considered to have experienced no benefit, and between 5% and 10% of patients deteriorated (Hansen, Lambert, & Forman, 2002).

One might say, by extension, that doing no harm to the supervisee is the second most important objective in supervision. Ellis (2010) summarized a survey of 363 multi-disciplinary supervisees, suggesting that 36% of these supervisees were currently receiving harmful supervision. Furthermore, 33% of them judged that their inadequate supervision was harmful to their clients. Ellis (2010) concluded that “some supervisors are harming supervisees and harming clients” (p. 109). These data underline the importance of clarifying the right way to supervise, and question the acid test logic.

2. Evaluate Supervision-Specific Models

One possible reason for harmful supervision is the absence of a guiding model of beneficial supervision. According to the fidelity framework, a clear conceptualization of supervision contributes to hypothesis validity (Wampold, Davis, & Good, 1990). In terms of evaluation, theoretically guided outcome measurement improves our inferences about process-outcome relationships (especially mechanisms of change), potentially enhancing supervision. For example, Lambert (1980) noted that measures up to that time had focused on the supervisee’s behavior, the clinical effectiveness of supervision, and the interaction between the supervisee and client. Subsequently, Holloway (1984) argued that other intervening outcomes should also be assessed, reflecting the multiple functions of supervision.

In terms of outcome criteria, Wampold and Holloway’s (1997) general model is a rare attempt to integrate the diverse outcomes and explanatory variables within the literature, and it broadly anticipated the fidelity framework. Another noteworthy feature of the model is its distinction between learning that concerns a specific client and its generalization to future clients. This is a further respect in which the acid test is questionable, in that “… change in all patients … present and future, is paramount …” (p. 21). Befitting their model, the authors concluded with some causal predictions.

Other instances of such explicit modeling can be found in Milne’s (2008) “evidence-based” approach, which at different times has included a detailed circumplex diagram, an elaborated analogy with a tandem bicycle (Milne, 2009), and a flow-chart diagram that formulated the causal chain in terms of moderators, mediators, and mechanisms (Milne, Aylott, Fitzpatrick, & Ellis, 2008). All three of these models treated the supervisee’s engagement in experiential learning as the main outcome (i.e., the dependent variable within the accompanying research studies), although all recognized clinical outcomes as an important future focus within a stepwise research program.

We readily recognize the need for concepts and models that are specific to clinical supervision, as they help us to better understand how identified variables can shape our practice and contribute to success. This need is reflected in the prominence of models within textbooks (e.g., Watkins, 1997). Conceptual models also heighten our awareness of the assumptions that we make about processes and outcomes, guiding research and development activities (e.g., Milne, Aylott, Fitzpatrick, & Ellis, 2008). Better theorizing also increases the likelihood that causal inferences are valid (Wampold, Davis, & Good, 1990).

3. Pose a Research Question and State Related Evaluation Objectives

Closely linked to models is the selection of a research question. This is an alternative to assuming that a particular aim, such as the acid test, should dominate research, and it represents a further counter to that logic. In experimental research, the basic scientific process is to focus on an appropriate research question within a given study, which sets the scene for hypotheses or evaluation objectives. By drawing on a suitable model, studies should justify and test specific predictions, affording vital clarity (Petticrew, 2011). In terms of such basic research practice, the “right” outcome to evaluate is the one that makes sense in relation to the question being posed: “… measures need to be carefully selected as appropriate to the question being asked …” (Craig et al., 2008, p. 18).

What is at stake is a premature and narrow focusing of supervision research on the acid test, excluding or devaluing other potentially important outcomes. These matter because supervision has many other objectives, because the links between variables need to be understood, and because we do not yet know enough about how supervision works to exclude additional measures. As Holloway (1984) put it, “… a narrow definition of outcome criteria prematurely limits empirical enquiry …” (p. 167). Instead, we should encourage study-specific outcomes, to foster focused research (Petticrew, 2011).

Training in Supervision

4. Conduct Manipulation Checks

Many supervisors have not been trained to supervise and there are diverse concepts of training (Wheeler, 2004). The consequence of this lack of consistent “structure” is that what is practiced in the name of supervision is likely to be equally diverse. Methodologically, this undermines the kind of causal reasoning that leads to inferences about the acid test. The principal inference at risk is that clinical effectiveness is due to an assumed kind of supervision, even though the supervisor may be untrained or not possess relevant competences. Conversely, the absence of an effect may be due to problems in applying high-quality supervision.

According to Craig et al. (2008), a second key question in evaluating complex interventions such as supervision is how the intervention works: What are the active ingredients? Knowledge of outcomes in the absence of knowledge of the explanatory processes does little to illuminate or advance interventions (Campbell et al., 2007; Donabedian, 1966; Rossi et al., 2003). One contender for the active ingredient or explanatory process is experiential learning, as this was included in 24 of the 28 effective supervisor training studies reviewed by Milne et al. (2008). It is by answering this kind of question that we can be more confident about our inferences and design more effective supervisor training, leading to better therapy. To illustrate, Schoenwald, Sheidow, and Chapman (2009) determined the degree of adherence to their treatment regime, finding that their carefully trained supervisors fostered therapist adherence, which in turn predicted outcomes. Although some reviewers have deemed this attention to process (including adherence) as “side-stepping” the challenges inherent in applying the acid test (Kilminster & Jolly, 2000, p. 831), there are sound reasons for complementing outcome evaluation with suitably linked variables. Ultimately, we need to know that “the right supervision has been done” before enquiring about outcomes; by ensuring adherence, we also stand to improve them.

Delivery of Supervision

5. Monitor Whether Supervision is Being Provided Competently

In asking whether “supervision has been done right,” we can also monitor whether supervision has been provided competently, complementing the assessment of adherence with one concerned with proficiency (including interpersonal effectiveness: logically, one can do the right thing wrong, or vice versa). Donabedian (1966; 2005) explicitly acknowledged conceptual and methodological difficulties with outcome evaluation, emphasizing that outcomes “… must be used with discrimination” (p. 694), in conjunction with process and structure data. Only by linking the resource investment to the nature of the intervention can we logically interpret the kinds of outcomes that an intervention such as supervision achieves, logic that is accepted in the mental health field generally (e.g., Rossi et al., 2003) and within supervision specifically (e.g., Inman & Ladany, 2008; Watkins, 2011; Westefeld, 2009).

It is customary to focus on the supervisor’s strengths and weaknesses (e.g., Shanfield, Mathews, & Hetherly, 1993), but we should also consider the supervisee’s contribution to the process, as either party may negate the other, preventing effective delivery. A case in point is resistance, in the form of the transactional “games” that supervisees or supervisors may play (Kadushin, 1968). Another form is collusion, where both parties engage in complementary safety behaviors that ultimately prevent any outcome from occurring (Milne, Leck, & Choudhri, 2009).

Receipt of Supervision

6. Evaluate Mediators and Mechanisms

If we want to answer the question “Did supervision result in the right outcomes?,” we need to know whether the supervisee responds positively, in terms of mini-outcomes. In the staff development literature, such mini-outcomes have been recognized as commencing with attendance, followed by “participation or completion” (Belfield et al., 2001). We should be particularly interested in active ingredients or change mechanisms, such as experiential learning, as these improve our understanding and effectiveness (Campbell et al., 2007). In this sense, collusion can be viewed as blocking the processes that lead to receipt and outcomes. A mechanism is integral to a change process, the means through which supervision achieves an effect on the outcome variable (e.g., reflection increasing understanding). By contrast, a mediator variable is a supervisory activity (e.g., Socratic questioning) that has a main or interactive effect on the outcome variable (Baron & Kenny, 1986). While this language implies a mechanistic or quantitative approach, qualitative methods are actually well suited to the study of these often subtle processes. To illustrate, Johnston and Milne (2012) used a grounded theory methodology, based on interviews with seven supervisees, to develop a model of receipt. This indicated that learning was facilitated when several core processes interacted (including reflection, Socratic information exchange, and alliance), against a developmental backdrop. Similarly, the “episode” approach (Ladany, Friedlander, & Nelson, 2005) draws on task analysis within psychotherapy research to highlight critical events. This is an example of “content evaluation,” as illustrated by Breese, Boon, and Milne (2012), who reported an intensive case study that detailed several mediators and mechanisms. If we can illuminate such explanatory processes and mechanisms, we stand to improve models, training, and measurement (Campbell et al., 2007).
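To make the mediator logic concrete, the regression steps popularized by Baron and Kenny (1986) can be sketched with simulated data. This is a minimal illustration only: the variable names and effect sizes are hypothetical, not drawn from any supervision dataset, and a linear model is assumed in which a supervisory activity (X, e.g., Socratic questioning) affects a supervisee outcome (Y, e.g., understanding) partly through a mechanism (M, e.g., reflection).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical simulated data for the causal chain X -> M -> Y.
x = rng.normal(size=n)                       # supervisory activity (mediator variable)
m = 0.6 * x + rng.normal(size=n)             # mechanism: reflection
y = 0.5 * m + 0.2 * x + rng.normal(size=n)   # outcome: understanding

def slopes(predictors, outcome):
    """Ordinary least-squares coefficients (intercept fitted, then dropped)."""
    X = np.column_stack([np.ones(len(outcome))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return beta[1:]

c = slopes([x], y)[0]             # total effect of X on Y
a = slopes([x], m)[0]             # path a: X -> M
b, c_prime = slopes([m, x], y)    # path b (M -> Y) and direct effect c'

indirect = a * b                  # mediated (indirect) effect
print(round(c, 2), round(c_prime, 2), round(indirect, 2))
```

In linear OLS models the total effect decomposes exactly as c = c' + a*b, which is why a sizeable indirect effect alongside a shrunken direct effect is taken as evidence of mediation.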

Enactment of Supervision

7. Conduct Stepwise Evaluations

It is one thing to establish that the supervisee has properly received supervision, in terms of such outcomes as engaging in experiential learning and developing competence; it is another to establish that such development has transferred to therapy, that it “resulted in the right therapy.” Generalization across time, situations, and people is surely amongst the most challenging of objectives within applied psychology, and it is difficult to evaluate. One coping strategy for researchers is to take things one step at a time. This is a well-established approach within staff development, where outcome evaluation often draws on Kirkpatrick’s (1967) taxonomy. This comprises four successive levels of outcome, ranging from simple reactions (i.e., supervisees’ satisfaction with their supervision) to impacts on a service system (including the acid test). The taxonomy has latterly been augmented by Kraiger et al. (1993), Alliger et al. (1997), and Belfield et al. (2001), adding changes in knowledge, skills, and attitudes as outcomes of learning. This builds on the use of taxonomies in education (Anderson & Krathwohl, 2001) and makes the approach highly relevant to supervision, as anticipated by Holloway (1984), Holloway and Neufeldt (1995), and Wampold and Holloway (1997). Specifically, the Wampold and Holloway (1997) evaluation model is a taxonomy that proceeds from the most proximal outcomes of supervision (supervisees’ reactions) to the most distal (clinical benefits). More distinctively, it presents a diagrammatic model that explicitly details a general causal chain between each step, from supervisee reactions (e.g., satisfaction; relationship quality) through supervisee performance in supervision (e.g., reflection), to therapist performance (e.g., skill) and on to client change, mediated by the supervisee’s characteristics as a therapist (e.g., attitudes). Wampold and Holloway (1997) judged that all five of these foci should be treated as supervision outcomes, based on the logic that each is a necessary condition for the next successive outcome (e.g., reflection is necessary for learning).

Such stepwise articulations of outcome evaluation complement the emphasis on mediators and mechanisms, helpfully drawing out the possible links between supervision and clinical effectiveness. They also afford an appropriately complex view of supervision as a multi-dimensional intervention (Heppner, Kivlighan, Burnet et al., 1994), reflecting the multifaceted roles of supervisor and trainee, together with their reciprocal influences (Holloway, 1984). Additionally, this gels with reviews of the psychotherapy outcome literature in stressing different levels of analysis treated collectively (Norcross, 2002). This includes using micro-analysis of content and outcome, treated as responsive to one another (Shapiro, 1995). It also accords with general evaluation logic. For example, Rossi et al. (2004) distinguished between “proximal” (mediating) and “distal” outcomes. In summary, adopting a stepwise (taxonomy-based) strategy to evaluation encourages a well-founded, progressive research agenda (Wampold & Holloway, 1997).

Effects of Therapy

8. Precise Clinical Outcome Evaluation

If we can establish that generalization of supervision has taken place within therapy, then the penultimate step within the fidelity framework is to ask whether supervision carries consequential clinical benefits for the patient, the so-called “acid test”: Did supervision result in the right clinical outcomes? The kinds of outcomes that have been assessed were itemized in a review of 24 studies by Milne (2007a). These studies, many of which were from the learning disabilities field, mostly assessed clinical functioning (e.g., self-harm; 42% of studies), followed by measures of distress (e.g., low mood; 25%) and clients’ quality of life (17%). Similar criteria were noted in a subsequent review of 18 supervision outcome studies by Watkins (2011), drawn mostly from adult mental health, who additionally noted such outcome variables as client satisfaction, social functioning, therapeutic alliance, and goal attainment. However, he regarded only three of these studies as interpretable; in them, the clinical outcomes were depressive symptom reduction and treatment completion (Bambling et al., 2006), reductions in psychotic symptoms (Bradshaw et al., 2007), and treatment satisfaction and quality of care (White & Winstanley, 2010). These criteria overlap with current thinking on outcome criteria, which recognizes that there is “… no single measure that can serve as the sole indicator of clients’ treatment-related gains …,” but that symptomatic distress, functional impairment, and quality of life are key domains to consider (Comer & Kendall, 2013, p. 32).

In summary, there is no consensus on the clinical outcomes of supervision, which is similar to the disagreement as to what constitutes a good therapy outcome (Sharpless and Barber, 2009). Although such diversity is entirely consistent with the need for specific research questions and model-driven outcomes, the lack of consensus undermines the idea of a definitive or acid test. The implication is to select and justify precise clinical outcome measures, as these enhance falsification efforts (Comer & Kendall, 2013).

9. Systemic Outcome Evaluation

Logically, the final outcome question is whether supervision resulted in the right outcome, as assessed systemically. This starts with clients’ interpersonal problems and social role functioning (e.g., work performance), and can extend to various side effects (unintended outcomes) or successes that are increasingly distal (generalizing over time, settings, or people). For instance, supervision might reduce supervisees’ job-related burnout across large geographical areas, courtesy of web-based training (e.g., Weingardt, Cucciare, Bellotti, & Lai, 2009), or lead to improvements in the quality of care across several hospital wards, through the application of team supervision (e.g., Hyrkas & Lehti, 2003). The outer limits can be defined as those of community psychology (e.g., Orford, 1992). For reasons of space, and given the paucity of relevant supervision research, the interested reader is referred to informative examples from research on staff development and parent management training. For example, Beidas and Kendall’s (2010) systemic perspective on staff development concluded that training only enhanced clinical outcomes when studies manipulated contextual factors (quality of training, organizational support, etc.). Amongst the organizational support factors was supervision, which was regarded as “crucial” for ensuring skillful therapy (p. 26). Beidas and Kendall (2010) regarded the therapist as nested within a range of variables that behaved in a transactional fashion, underlining the complex, dynamic nature of systems. Similarly, a review of parent management training (Kazdin & Weisz, 1998) concluded that treatment effects were evident in clinically significant improvements on a wide range of measures, given favorable contextual factors. In summary, thinking systemically helps us to recognize the complex determinants of effective supervision (Beidas & Kendall, 2010).

Discussion

The impact of clinical supervision on client outcome (the “acid test”) is widely accepted as a definitive measure of the effectiveness of supervision. This argument has been given careful examination in the present review. I questioned the acid test criterion on many grounds, starting with the traditional argument that client safety (doing no harm) is the most important outcome. However, automatically prioritizing any single outcome within a complex, hierarchical system seems misguided, as there are many reasons for providing supervision, each with a valid outcome, and it would be unfortunate if these were devalued by the acid test emphasis.

This is not to question calls for evidence of a direct link (Watkins, 2011), but rather to encourage reflection on the research priorities, leading to a systematic approach. To support this effort, I have outlined an integrative reconceptualization of outcome evaluation, built on the scaffold of the fidelity framework. This was augmented to include the supervisor, and used to reformulate outcome evaluation in clinical supervision. Nine suggestions for enhanced outcome evaluation, including clinical outcomes, were made, drawing on the neighbouring literatures in applied psychology, education, medicine, and research methodology.

Although I am enthusiastic about the augmented fidelity framework, I recognise that this taxonomy (and the appeal to neighbouring literatures) places a heavy reliance on a succession of analogies. As stressed in the introduction, analogies need to be treated sceptically and tested empirically. A second note of caution is that there needs to be a balance between fidelity and adaptability: We need to allow supervisors to exercise judgement (McHugh, Murray, & Barlow, 2009). Third, the framework may be unnecessarily cumbersome, in that there are elegant methodological options, such as regression analysis, which allow one to work back from precise clinical outcomes to the contributory steps (e.g., Callahan, Almstrom, Swift, Borja, & Heath, 2009). Another criticism of the present review is that it leans heavily on a quantitative paradigm and on postpositivist reasoning. It is surely the case that qualitative, constructivist methodologies can enhance this approach.
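The regression option can be sketched with a toy simulation. The variable names, effect sizes, and sample size below are invented purely for illustration; they are not drawn from Callahan et al. (2009) or any other cited study.

```python
# Toy illustration (not from any cited study): regressing backwards from a
# precise clinical outcome to its contributory steps. All variables and
# effect sizes below are invented for the sketch.
import random
import statistics

random.seed(0)
n = 2000

# Simulated chain: supervision quality -> supervisee adherence -> client outcome.
supervision_quality = [random.gauss(0, 1) for _ in range(n)]
adherence = [0.6 * s + random.gauss(0, 1) for s in supervision_quality]
client_outcome = [0.5 * a + random.gauss(0, 1) for a in adherence]

def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x (with intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den

# Working back from the clinical outcome: the proximal step (adherence)
# should show a larger slope than the distal step (supervision quality),
# whose effect is carried indirectly (roughly 0.6 * 0.5 = 0.3 here).
b_proximal = ols_slope(adherence, client_outcome)
b_distal = ols_slope(supervision_quality, client_outcome)
print(f"outcome ~ adherence:   {b_proximal:.2f}")
print(f"outcome ~ supervision: {b_distal:.2f}")
```

The attenuation of the distal slope relative to the proximal one is the pattern that mediation analyses (e.g., Baron & Kenny, 1986) formalize; in real data, of course, measurement error and confounding complicate this tidy picture considerably.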

In conclusion, this reconceptualization of outcome evaluation represents a supervision-specific, systematic agenda for evaluating clinical supervision. It affords a “more complete conceptual and empirical exploration of the definition of quality”, as urged by Donabedian (1966/2005, p. 716), and suggests a way to tackle the “supervision-patient outcome riddle” (Watkins, 2011, p. 252). The framework draws on accepted models and firm findings from related fields, proposing nine principles to guide supervision research and encourage innovative practice. It may even provide the foundation for an overdue “evolutionary leap” in evaluation (Ellis, D’Iuso, & Ladany, 2008, p. 496), given these advantages:

emphasising better theorising, increasing the likelihood that causal inferences are valid (Wampold, Davis, & Good, 1990);

encouraging study-specific outcomes, fostering focused research (Petticrew, 2011);

ensuring adherence, improving outcomes (Schoenwald, Sheidow, & Chapman, 2009);

improving proficiency, so we can logically interpret outcomes (e.g., Rossi et al., 2004);

illuminating the explanatory processes and mechanisms (Campbell et al., 2007), improving models, training, measurement, etc.;

adopting a stepwise (taxonomy-based) strategy to evaluation (Holloway & Neufeldt, 1995), encouraging a well-founded, progressive research agenda;

selecting precise clinical outcome measures, enhancing falsification efforts (Comer & Kendall, 2013), and

thinking systemically (Beidas & Kendall, 2010), to identify any side-effects and recognize the complex determinants of effective supervision.

School of Psychology, Newcastle University, England. 4th Floor, Ridley Building, University of Newcastle, Newcastle upon Tyne, NE1 7RU, England. e-mail:

Acknowledgments:

I am indebted to Robert Reiser for his comments on an earlier draft.

References

Alliger, G.M., Tannenbaum, S.I., Bennett, J.R., Traver, H., & Shotland, A. (1997). A meta-analysis of the relations among training criteria. Personnel Psychology, 50, 341–358.

Anderson, L.W., & Krathwohl, D.R. (Eds.). (2001). A taxonomy for learning, teaching and assessing: A revision of Bloom’s taxonomy of educational objectives. New York: Longman.

Bambling, M., King, R., Raue, P., Schweitzer, R., & Lambert, W. (2006). Clinical supervision: Its influence on client-rated working alliance and client symptom reduction in the brief treatment of depression. Psychotherapy Research, 16(3), 317–331.

Barkham, M., & Wheeler, S. (2014). A core evaluation battery for clinical supervision. In C.E. Watkins & D.L. Milne (Eds.), International handbook of clinical supervision. Chichester: Wiley.

Baron, R.M., & Kenny, D.A. (1986). The moderator-mediator variable distinction in social psychological research: Conceptual, strategic and statistical considerations. Journal of Personality and Social Psychology, 51, 1173–1182.

Beidas, R.S., & Kendall, P.C. (2010). Training therapists in evidence-based practice: A critical review of studies from a systems-contextual perspective. Clinical Psychology: Science & Practice, 17, 1–30.

Belfield, C., Thomas, H., Bullock, A., Eynon, R., & Wall, D. (2001). Measuring effectiveness for best evidence medical education: A discussion. Medical Teacher, 23, 164–170.

Bellg, A.J., Borrelli, B., Resnick, B., Hecht, J., Minicucci, D.S., Ory, M., Ogedegbe, G., Orwig, D., Ernst, D., & Czajkowski, S. (2004). Enhancing treatment fidelity in health behavior change studies: Best practices and recommendations from the NIH Behavior Change Consortium. Health Psychology, 23, 443–451.

Borrelli, B., Sepinwall, D., Ernst, D., Bellg, A.J., Czajkowski, S., Breger, R., DeFrancesco, C., Levesque, C., Sharp, D.L., Ogedegbe, G., Resnick, B., & Orwig, D. (2005). A new tool to assess treatment fidelity and evaluation of treatment fidelity across 10 years of health behaviour research. Journal of Consulting and Clinical Psychology, 73, 852–860.

Bradshaw, T., Butterworth, A., & Mairs, H. (2007). Does structured clinical supervision during psychosocial intervention education enhance outcome for mental health nurses and the service users they work with? Journal of Psychiatric and Mental Health Nursing, 14, 4–12.

Breese, L., Boon, A., & Milne, D.L. (2012). Detecting excellent episodes in clinical supervision: A case study, comparing two approaches. The Clinical Supervisor, 31, 121–137.

Callahan, J.L., Almstrom, C.M., Swift, J.K., Borja, S.E., & Heath, C.J. (2009). Exploring the contribution of supervisors to intervention outcomes. Training & Education in Professional Psychology, 3, 72–77.

Campbell, N.C., Murray, E., Darbyshire, J., Emery, J., Farmer, A., Griffiths, F., Guthrie, B., Lester, H., Wilson, P., & Kinmonth, A.L. (2007). Designing and evaluating complex interventions to improve health care. British Medical Journal, 334, 455–459. doi: 10.1136/bmj.39108.379965.BE

Comer, J.S., & Kendall, P.C. (2013). Methodology, design and evaluation in psychotherapy research. In M.J. Lambert (Ed.), Bergin and Garfield’s handbook of psychotherapy and behavior change (6th ed., pp. 21–48). Hoboken, NJ: Wiley.

Concise Oxford English Dictionary. (2004). Oxford: Oxford University Press.

Craig, P., Dieppe, P., Macintyre, S., Michie, S., Nazareth, I., & Petticrew, M. (2008). Developing and evaluating complex interventions: The new Medical Research Council guidance. British Medical Journal, 337, a1655. doi: 10.1136/bmj.a1655

Donabedian, A. (1966). Evaluating the quality of medical care. Milbank Memorial Fund Quarterly, 44(Suppl.), 166–206. Reprinted in The Milbank Quarterly, 83, 2005 (pp. 691–729). Cited page numbers are from the 2005 version.

Ellis, M.V. (2010). Bridging the science and practice of clinical supervision: Some discoveries, some misconceptions. The Clinical Supervisor, 29, 95–116. doi: 10.1080/07325221003741910

Ellis, M.V., D’Iuso, N., & Ladany, N. (2008). State of the art in the assessment, measurement and evaluation of clinical supervision. In A.K. Hess, K.D. Hess, & T.H. Hess (Eds.), Psychotherapy supervision: Theory, research and practice (2nd ed., pp. 473–499). Hoboken, NJ: Wiley.

Ellis, M.V., & Ladany, N. (1997). Inferences concerning supervisees and clients in clinical supervision: An integrative review. In C.E. Watkins (Ed.), The handbook of psychotherapy supervision. New York: Wiley.

Epstein, R.M., & Hundert, E.M. (2002). Defining and assessing professional competence. Journal of the American Medical Association, 287, 226–235.

Freitas, G.J. (2002). The impact of psychotherapy supervision on client outcome: A critical examination of two decades of research. Psychotherapy: Theory/Research/Practice/Training, 39, 354–367.

Grey, N., Salkovskis, P., Quigley, A., Clark, D.M., & Ehlers, A. (2008). Dissemination of cognitive therapy for panic disorder in primary care. Behavioural & Cognitive Psychotherapy, 36, 509–520.

Hansen, N.B., Lambert, M.J., & Forman, E.M. (2002). The psychotherapy dose-response effect and its implications for treatment delivery services. Clinical Psychology: Science and Practice, 9, 329–343.

Heppner, P.P., Kivlighan, D.M., Burnett, J.W., Berry, T.R., Goedinghaus, M., Doxsee, D.J., Hendricks, F.M., Krull, L.A., Wright, G.E., Bellatian, A.M., Durham, R.J., Tharp, A., Kim, H., Prossart, D.F., Wang, L-F., Witty, T.E., Kinder, M.H., Hertel, J.B., & Wallace, D.L. (1994). Dimensions that characterize supervisor interventions delivered in the context of live supervision of practicum counselors. Journal of Counseling Psychology, 41, 227–235.

Holloway, E.L. (1984). Outcome evaluation in supervision research. The Counseling Psychologist, 12, 167–174.

Holloway, E.L., & Neufeldt, S.A. (1995). Supervision: Its contribution to treatment efficacy. Journal of Consulting and Clinical Psychology, 63, 207–213.

Hyrkas, K., & Lehti, K. (2003). Continuous quality improvement through team supervision supported by continuous self-monitoring of work and systemic patient feedback. Journal of Nursing Management, 11, 177–188.

Inman, A.G., & Ladany, N. (2008). Research: The state of the field. In A.K. Hess, K.D. Hess, & T.H. Hess (Eds.), Psychotherapy supervision (pp. 500–517). Hoboken, NJ: Wiley.

Johnston, L.H., & Milne, D.L. (2012). How do supervisees learn during supervision? A grounded theory study of the perceived developmental process. The Cognitive Behaviour Therapist, 5, 1–23.

Kadushin, A. (1968). Games people play in supervision. Social Work, 13, 23–32.

Kazdin, A.E., & Weisz, J.R. (1998). Identifying and developing empirically-supported child and adolescent treatments. Journal of Consulting and Clinical Psychology, 66, 19–36.

Kilminster, S.M., & Jolly, B.C. (2000). Effective supervision in clinical practice settings: A literature review. Medical Education, 34, 827–840.

Kirkpatrick, D.L. (1967). Evaluation of training. In R.L. Craig & L.R. Bittel (Eds.), Training and development handbook (pp. 87–112). New York: McGraw-Hill.

Kraiger, K., Ford, J.K., & Salas, E. (1993). Application of cognitive, skill-based, and affective theories of learning outcomes to new methods of training evaluation. Journal of Applied Psychology, 78, 311–328.

Ladany, N., Friedlander, M., & Nelson, M. (2005). Critical events in psychotherapy supervision: An interpersonal approach. Washington, DC: American Psychological Association.

Lambert, M.J. (1980). Research and the supervisory process. In A.K. Hess (Ed.), Psychotherapy supervision: Theory, research and practice (pp. 423–450). New York: Wiley.

Lambert, M.J., & Ogles, B.M. (1997). The effectiveness of psychotherapy supervision. In C.E. Watkins (Ed.), Handbook of psychotherapy supervision (pp. 421–446). New York: Wiley.

McCullough, L., Winston, A., Farber, B.A., Porter, F., Laikin, M., Vingiano, W., & Trujillo, M. (1991). The relationship of patient-therapist interaction to outcome in brief psychotherapy. Psychotherapy, 28, 525–533.

McHugh, R.K., Murray, H.W., & Barlow, D.H. (2009). Balancing fidelity and adaptation in the dissemination of empirically-supported treatments: The promise of transdiagnostic interventions. Behaviour Research and Therapy, 47, 946–953. doi: 10.1016/j.brat.2009.07.005

Milne, D.L. (2006). Developing clinical supervision through reasoned analogies with therapy. Clinical Psychology and Psychotherapy, 13, 215–222.

Milne, D.L. (2007). Evaluation of staff development: The essential “SCOPPE”. Journal of Mental Health, 16, 389–400.

Milne, D.L. (2007a). An empirical definition of clinical supervision. British Journal of Clinical Psychology, 46, 437–447.

Milne, D.L. (2008). Evaluating and enhancing supervision: An experiential model. In C. Falender & E. Shafranske (Eds.), Clinical supervision: A competency-based approach casebook. Washington, DC: American Psychological Association.

Milne, D. (2009). Evidence-based clinical supervision: Principles and practice. Chichester, U.K.: BPS Blackwell.

Milne, D.L., Aylott, H., Fitzpatrick, H., & Ellis, M.V. (2008). How does clinical supervision work? Using a best evidence synthesis approach to construct a basic model of supervision. The Clinical Supervisor, 27, 170–190.

Milne, D.L., Leck, C., & Choudhri, N.Z. (2009). Collusion in clinical supervision: Literature review and case study in self-reflection. The Cognitive Behaviour Therapist, 2, 106–114.

Norcross, J.C. (2002). Psychotherapy relationships that work. Oxford: Oxford University Press.

Orford, J. (1992). Community psychology: Theory and practice. Chichester, U.K.: Wiley.

Petticrew, M. (2011). When are complex interventions complex? When are simple interventions simple? European Journal of Public Health, 21, 397–399.

Rossi, P.H., Freeman, H.E., & Lipsey, M.W. (2004). Evaluation: A systematic approach. London, U.K.: Sage.

Schoenwald, S.K., Sheidow, A.J., & Chapman, J.E. (2009). Clinical supervision in treatment transport: Effects on adherence and outcomes. Journal of Consulting and Clinical Psychology, 77, 410–421.

Shanfield, S.B., Mathews, K.L., & Hetherly, V. (1993). What do excellent supervisors do? American Journal of Psychiatry, 150, 1081–1084.

Shapiro, D.A. (1995). Finding out how psychotherapies help people change. Psychotherapy Research, 5, 1–21.

Sharpless, B.A., & Barber, J.P. (2009). A conceptual and empirical review of the meaning, measurement, development, and teaching of intervention competence in clinical psychology. Clinical Psychology Review, 29, 47–56.

Stein, D.M., & Lambert, M.J. (1995). Graduate training in psychotherapy: Are therapy outcomes enhanced? Journal of Consulting and Clinical Psychology, 63, 182–196.

Thouless, R.H. (1930). Straight and crooked thinking. London, U.K.: Pan Books.

Wampold, B.E., Davis, B., & Good, R.H. (1990). Hypothesis validity of clinical research. Journal of Consulting and Clinical Psychology, 58, 360–367.

Wampold, B.E., & Holloway, E.L. (1997). Methodology, design, and evaluation in psychotherapy supervision research. In C.E. Watkins (Ed.), Handbook of psychotherapy supervision (pp. 11–27). New York: Wiley.

Watkins, C.E. (Ed.). (1997). Handbook of psychotherapy supervision. New York: Wiley.

Watkins, C.E. (2011). Does psychotherapy supervision contribute to patient outcomes? Considering thirty years of research. The Clinical Supervisor, 30, 235–256.

Weingardt, K.R., Cucciare, M.A., Bellotti, C., & Lai, W.P. (2009). A randomized trial comparing two models of web-based training in cognitive-behavioral therapy for substance abuse counselors. Journal of Substance Abuse Treatment, 37, 219–227.

Westefeld, J.S. (2009). Supervision of psychotherapy: Models, issues, and recommendations. The Counseling Psychologist, 37, 296–316.

Wheeler, S. (2004). A review of supervisor training in the UK. In I. Fleming & L. Steen (Eds.), Supervision and clinical psychology: Theory, practice and perspectives (pp. 15–35). New York: Brunner-Routledge.

Wheeler, S., & Richards, K. (2007). The impact of clinical supervision on counsellors and therapists, their practice and their clients: A systematic review of the literature. Counselling and Psychotherapy Research, 7, 54–65.

White, E., & Winstanley, J. (2010). A randomized controlled trial of clinical supervision: Selected findings from a novel Australian attempt to establish the evidence base for causal relationships with quality of care and patient outcomes, as an informed contribution to mental health nursing practice development. Journal of Research in Nursing, 15, 151–167.