Research Article
Originally Published 1 September 2002

Improving the Assessment of Outcomes in Stroke: Use of a Structured Interview to Assign Grades on the Modified Rankin Scale

Abstract

Background and Purpose— The modified Rankin Scale is widely used to assess changes in activity and lifestyle after stroke, but it has been criticized for its subjectivity. The purpose of the present study was to compare conventional assessment on the modified Rankin Scale with assessment through a structured interview.
Methods— Sixty-three patients with stroke 6 to 24 months previously were interviewed and graded independently on the modified Rankin Scale by 2 observers. These observers then underwent training in use of a structured interview for the scale that covered 5 areas of everyday function. Eight weeks after the first assessment, the same observers reassessed 58 of these patients using the structured interview.
Results— Interrater reliability was measured with the κ statistic (weighted with quadratic weights). For the scale applied conventionally, overall agreement between the 2 raters was 57% (κw=0.78); 1 rater assigned significantly lower grades than the other (P=0.048). On the structured interview, the overall agreement between raters was 78% (κw=0.93), and there was no overall difference between raters in grades assigned (P=0.17). Rankin grades from the conventional assessment and the structured interview were highly correlated, but there was significantly less disagreement between raters when the structured interview was used (P=0.004).
Conclusions— Variability and bias between raters in assigning patients to Rankin grades may be reduced by use of a structured interview. Use of a structured interview for the scale could potentially improve the quality of results from clinical studies in stroke.
The Rankin Scale1 has wide acceptance as a measure of functional outcome after stroke and has become one of the most popular end points for clinical trials.2 The version of the scale most commonly used in trials is the modified Rankin Scale,3 which is a simple 6-point assessment that includes reference to both limitations in activity and changes in lifestyle.
The reliability of the modified Rankin Scale has been investigated and found to be satisfactory.3 However, comparison with the Barthel Index4 indicates lower levels of interrater agreement for the modified Rankin Scale5 and suggests that some raters may systematically assign higher or lower Rankin grades than others. The descriptions given for the categories of the modified Rankin Scale are broad and open to subjective interpretation. Walking is the only explicit criterion for assessment mentioned, and even for this criterion, it is not specified whether someone requiring an aid should be considered able to walk. It is therefore left open to raters to develop idiosyncratic criteria or to apply the scale in an impressionistic manner. Discrepancies are particularly striking for Rankin grades 2, 3, and 4, and it has been suggested that interviewers should use a checklist of activities of daily living (ADL) to produce greater uniformity in the application of the scale.3 Wolfe and colleagues5 have advocated using Barthel scores to generate ratings on the Rankin Scale and shown that this improves reliability. However, this approach can be applied only to the lower outcome categories of the modified Rankin Scale that relate to the basic ADL assessed by the Barthel Index.
The Glasgow Outcome Scale6 is similar in concept to the Rankin Scale, and the problem of impressionistic use of the Glasgow Outcome Scale has been addressed through the use of a structured interview.7 The purpose of the present study was to develop a structured interview for the modified Rankin Scale and to compare this form of assessment with the conventional application of the scale. We wanted to investigate whether using a structured interview could increase agreement between raters on the modified Rankin Scale. We carried out 2 interrater reliability studies with the same patients 8 weeks apart, the first with the conventional scale and the second with the structured interview. To reduce the possibility of change between the first and second assessments, only patients who had suffered stroke ≥6 months previously were included in the study.

Methods

Structured Interview

The structured interview (Table 1) differs from the conventional guided interview for the modified Rankin Scale by defining specific questions to grade each category. The structured interview developed for the study consists of 5 sections: (1) constant care, (2) basic ADL, (3) instrumental ADL, (4) limitations in participation in usual social roles, and (5) a checklist for the presence of common stroke symptoms. Items for inclusion in the interview were selected after a review of outcome assessments used in stroke and after focus groups held with stroke patients. An initial draft of the interview was piloted before the final version for the study was produced. Section 2 of the interview is based on items from the Barthel Index and follows the definitions provided by Collin et al.8 Sections 3 and 4 are adapted from the structured interview for the extended Glasgow Outcome Scale.7 Unlike a questionnaire, a structured interview allows reformulation of questions to suit the particular circumstances. In each section, restrictions in activities before stroke are recorded, and raters are instructed to discount preexisting limitations in the final rating. Raters were encouraged to interview a relative when possible and to base their assessments on ability to do the task rather than performance. Further details concerning the principles involved in administering a structured interview are given elsewhere.7 The structured interview took ≈15 minutes to administer.
Table 1. The Modified Rankin Scale and Corresponding Sections of the Structured Interview

| Grade | Modified Rankin Scale3 | Structured Interview for the Modified Rankin Scale |
|---|---|---|
| 5 | Severe disability: bedridden, incontinent, and requiring constant nursing care and attention. | Severe disability; someone needs to be available at all times; care may be provided by either a trained or an untrained caregiver. Question: Does the person require constant care? |
| 4 | Moderately severe disability: unable to walk without assistance, and unable to attend to own bodily needs without assistance. | Moderately severe disability; need for assistance with some basic ADL, but not requiring constant care. Question: Is assistance essential for eating, using the toilet, daily hygiene, or walking? |
| 3 | Moderate disability; requiring some help, but able to walk without assistance. | Moderate disability; need for assistance with some instrumental ADL but not basic ADL. Question: Is assistance essential for preparing a simple meal, doing household chores, looking after money, shopping, or traveling locally? |
| 2 | Slight disability; unable to carry out all previous activities but able to look after own affairs without assistance. | Slight disability; limitations in participation in usual social roles, but independent for ADL. Questions: Has there been a change in the person’s ability to work or look after others if these were roles before stroke? Has there been a change in the person’s ability to participate in previous social and leisure activities? Has the person had problems with relationships or become isolated? |
| 1 | No significant disability despite symptoms; able to carry out all usual duties and activities. | No significant disability; symptoms present but not other limitations. Question: Does the person have difficulty reading or writing, difficulty speaking or finding the right word, problems with balance or coordination, visual problems, numbness (face, arms, legs, hands, feet), loss of movement (face, arms, legs, hands, feet), difficulty with swallowing, or other symptom resulting from stroke? |
| 0 | No symptoms at all. | No symptoms at all; no limitations and no symptoms. |
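
The hierarchy in Table 1 can be read as a top-down decision rule in which the first section answered "yes" fixes the grade. The Python sketch below illustrates that reading only. The function name and the boolean parameter names are hypothetical and are not part of the published interview, and each answer is assumed to already discount limitations that were present before the stroke.

```python
def rankin_grade(constant_care: bool,
                 basic_adl_help: bool,
                 instrumental_adl_help: bool,
                 participation_limited: bool,
                 symptoms_present: bool) -> int:
    """Map yes/no answers for the 5 interview sections to a grade from 0 to 5.

    Assumption: every argument reflects a limitation attributable to the
    stroke, because preexisting limitations are discounted by the rater.
    """
    if constant_care:            # section 1: constant care
        return 5
    if basic_adl_help:           # section 2: eating, toilet, hygiene, walking
        return 4
    if instrumental_adl_help:    # section 3: meals, chores, money, shopping, travel
        return 3
    if participation_limited:    # section 4: work, leisure, relationships
        return 2
    if symptoms_present:         # section 5: symptom checklist
        return 1
    return 0                     # no limitations and no symptoms


# Example: needs help with shopping, independent for basic ADL.
print(rankin_grade(False, False, True, True, True))
```

Checking the most severe section first means a lower section can never override a higher one, which matches the "but not" wording of each category in Table 1.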

Participants

The study was confined to patients surviving stroke by ≥6 months. Study inclusion criteria were as follows: age ≥18 years; diagnosis of stroke 6 to 24 months previously; living at home, living in an institution, and/or attending outpatient clinics; and ability to respond appropriately to interview in English. Excluded from the study were patients with terminal cancer, seizure disorder, dementia, substance or alcohol abuse, or major organ failure (unstable cardiopulmonary function, impaired hepatic or renal function resulting in episodic alterations in functional ADL); those unable and/or unlikely to comprehend and follow the study protocol; and patients not contacted on the advice of their general practitioners. Informed consent was obtained for each study participant.

Procedure

Both raters were neurologists in training: rater 1 (T.B.) was a specialist registrar in neurology with 4 years of experience, and rater 2 (U.G.R.S.) was a senior house officer with 2 years of experience. Before beginning the study, the raters practiced applying the modified Rankin Scale in a stroke population. Patients were assessed on 2 occasions 8 weeks apart. On the first occasion, the 2 raters interviewed each patient independently and assigned a rating on the modified Rankin Scale. The raters were instructed not to confer about ratings of individual patients. Rankin grades were assigned immediately after the initial interview. After all patients had been assessed, the raters were trained to use the structured interview to assign Rankin grades. The patients were then recalled, and each patient was independently assessed by both raters with the structured interview.

Statistical Analysis

Strength of agreement between raters is described with the κ statistic that corrects for agreement by chance. When there are >2 points on an assessment, it is appropriate to use a weighted value (κw) to take into account the size of disagreements. To facilitate comparison with previous studies, we used quadratic weights for this analysis. Quadratic weights penalize extreme disagreements particularly heavily (differences are squared), and it has been shown that when weighted this way, κw is comparable to the intraclass correlation coefficient used for continuous measures.9 Brennan and Silman10 suggest the following interpretation of the κ statistic (weighted appropriately) for the agreement between clinical measures: 0 to 0.20=poor, 0.21 to 0.40=fair, 0.41 to 0.60=moderate, 0.61 to 0.80=good, and 0.81 to 1.00=very good. The 95% CIs for κw values are given. Ratings were also compared through the use of appropriate nonparametric tests.
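
For reference, the standard definitions of the unweighted and quadratically weighted κ statistics can be written as follows. The article does not print these formulas, so the notation is an assumption made here: p_{ij} is the proportion of patients graded i by rater 1 and j by rater 2, and p_{i·} and p_{·j} are the corresponding marginal proportions.

```latex
\[
\kappa = \frac{p_o - p_e}{1 - p_e}, \qquad
p_o = \sum_{i} p_{ii}, \qquad
p_e = \sum_{i} p_{i\cdot}\, p_{\cdot i}
\]
\[
\kappa_w = 1 - \frac{\sum_{i,j} w_{ij}\, p_{ij}}{\sum_{i,j} w_{ij}\, p_{i\cdot}\, p_{\cdot j}}, \qquad
w_{ij} = (i - j)^2
\]
```

With quadratic weights, a disagreement of 2 grades counts 4 times as much as a disagreement of 1 grade, which is why extreme disagreements are penalized particularly heavily.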

Results

Sixty-three patients were initially recruited into the study and took part in the first assessment, in which the modified Rankin Scale was applied in a conventional manner; 58 patients (92%) returned for a second assessment with the Rankin structured interview 8 weeks after the first. Reasons for loss to follow-up were death (n=1), serious illness (n=2), alcohol abuse (n=1), and failure to attend (n=1).
The 58 patients (31 men) who took part in both assessments were between 37 and 90 years of age (mean, 68.3 years; SD, 10.95 years). The first assessment took place 6 to 24 months after stroke (mean, 17.1 months; SD, 5.2 months).
Ratings from the first interview, in which the modified Rankin Scale was applied in the conventional manner, are given in Table 2. The overall agreement between raters was 57%; the unweighted κ statistic was 0.44, and κw was 0.78 (95% CI, 0.53 to 1.0). In 8 cases, rater 1 rated patients less favorably than rater 2, and in 17 cases, rater 1 rated patients more favorably. Comparison of grades given by raters indicated a significant overall difference between observers (Wilcoxon Z=−1.98, P=0.048, 2-tailed test).
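Table 2. Modified Rankin Scale: Cross-Tabulation of Rankin Grades Assigned by 2 Raters Who Independently Interviewed 58 Patients

| Rater 1 | Rater 2: 0 | Rater 2: 1 | Rater 2: 2 | Rater 2: 3 | Rater 2: 4 | Total |
|---|---|---|---|---|---|---|
| 0 | 2 |  |  |  |  | 2 |
| 1 |  | 7 | 3 |  |  | 10 |
| 2 | 1 | 3 | 9 | 2 |  | 15 |
| 3 |  |  | 5 | 5 | 3 | 13 |
| 4 |  | 1 |  | 7 | 10 | 18 |
| Total | 3 | 11 | 17 | 14 | 13 | 58 |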
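
The agreement figures quoted for the conventional assessment can be recomputed from the counts in Table 2. The following Python sketch does so using the standard definitions of Cohen's κ and of κw with quadratic weights. The function and variable names are illustrative assumptions, blank cells are taken to be zero, and the 95% CI is not computed.

```python
# Table 2 counts: rows are rater 1 grades 0-4, columns are rater 2 grades 0-4.
TABLE_2 = [
    [2, 0, 0, 0, 0],
    [0, 7, 3, 0, 0],
    [1, 3, 9, 2, 0],
    [0, 0, 5, 5, 3],
    [0, 1, 0, 7, 10],
]


def agreement_stats(table):
    """Return (overall agreement, unweighted kappa, quadratic-weighted kappa)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]

    # Observed and chance-expected proportions of exact agreement.
    p_o = sum(table[i][i] for i in range(k)) / n
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2
    kappa = (p_o - p_e) / (1 - p_e)

    # Observed and chance-expected mean squared disagreement (quadratic weights).
    d_obs = sum((i - j) ** 2 * table[i][j]
                for i in range(k) for j in range(k)) / n
    d_exp = sum((i - j) ** 2 * row_tot[i] * col_tot[j]
                for i in range(k) for j in range(k)) / n ** 2
    kappa_w = 1 - d_obs / d_exp
    return p_o, kappa, kappa_w


p_o, kappa, kappa_w = agreement_stats(TABLE_2)
print(f"agreement={p_o:.2f} kappa={kappa:.2f} kappa_w={kappa_w:.2f}")
# prints: agreement=0.57 kappa=0.44 kappa_w=0.78
```

The same function can be applied to any square cross-tabulation of two raters, including the structured-interview counts in Table 3.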
Ratings from the structured interview for the modified Rankin Scale are given in Table 3. The overall agreement between raters was 78%, κ=0.70, and κw=0.93 (95% CI, 0.67 to 1.0). In 4 cases, patients were rated less favorably by rater 1, and in 9 cases, they were rated more favorably. There was no significant difference in the overall rankings assigned (Wilcoxon Z=−1.4, P=0.17).
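Table 3. Structured Interview: Cross-Tabulation of Rankin Grades Assigned by 2 Raters for 58 Patients

| Rater 1 | Rater 2: 0 | Rater 2: 1 | Rater 2: 2 | Rater 2: 3 | Rater 2: 4 | Total |
|---|---|---|---|---|---|---|
| 0 | 1 |  |  |  |  | 1 |
| 1 |  | 14 |  |  |  | 14 |
| 2 |  | 7 | 8 | 2 |  | 17 |
| 3 |  |  | 2 | 7 | 2 | 11 |
| 4 |  |  |  |  | 15 | 15 |
| Total | 1 | 21 | 10 | 9 | 17 | 58 |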
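
The tests for a systematic difference between raters can also be approximated from the two cross-tabulations, because each cell gives the number of patients with a particular pair of grades. The sketch below implements a Wilcoxon signed-rank test with the large-sample normal approximation, a tie correction, and no continuity correction. That choice of variant is an assumption, since the article does not say which was used; the function and variable names are likewise illustrative.

```python
import math

# Rows are rater 1 grades 0-4, columns are rater 2 grades 0-4 (blank cells = 0).
CONVENTIONAL = [[2, 0, 0, 0, 0], [0, 7, 3, 0, 0], [1, 3, 9, 2, 0],
                [0, 0, 5, 5, 3], [0, 1, 0, 7, 10]]
STRUCTURED = [[1, 0, 0, 0, 0], [0, 14, 0, 0, 0], [0, 7, 8, 2, 0],
              [0, 0, 2, 7, 2], [0, 0, 0, 0, 15]]


def wilcoxon_from_crosstab(table):
    """Return (Z, two-sided P) for paired grades summarized in a cross-tab."""
    # Expand cells into signed differences (rater 1 minus rater 2), dropping zeros.
    diffs = [i - j
             for i, row in enumerate(table)
             for j, count in enumerate(row)
             for _ in range(count) if i != j]
    n = len(diffs)

    # Assign midranks to absolute differences and accumulate the tie term.
    abs_sorted = sorted(abs(d) for d in diffs)
    rank_of, tie_term, pos = {}, 0, 0
    while pos < n:
        end = pos
        while end < n and abs_sorted[end] == abs_sorted[pos]:
            end += 1
        t = end - pos
        rank_of[abs_sorted[pos]] = (pos + 1 + end) / 2  # mean of ranks pos+1..end
        tie_term += t ** 3 - t
        pos = end

    w_plus = sum(rank_of[abs(d)] for d in diffs if d > 0)
    w_minus = sum(rank_of[abs(d)] for d in diffs if d < 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (min(w_plus, w_minus) - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


z_conv, p_conv = wilcoxon_from_crosstab(CONVENTIONAL)
z_struct, p_struct = wilcoxon_from_crosstab(STRUCTURED)
print(f"conventional: Z={z_conv:.2f}, P={p_conv:.3f}")
print(f"structured:   Z={z_struct:.2f}, P={p_struct:.2f}")
```

Under these assumptions the sketch gives values that round to the reported Z=−1.98, P=0.048 for the conventional assessment and Z=−1.4, P=0.17 for the structured interview.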
To compare the studies, we analyzed disagreements between raters. There were 25 disagreements between raters in the first study and 13 in the second study. Rankin grades for rater 2 were subtracted from Rankin grades for rater 1, and the absolute differences with and without the structured interview were compared. The analysis showed that the extent of disagreement was less when the structured interview was used (Wilcoxon Z=−2.85, P=0.004).
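
The counts of 25 and 13 disagreements follow directly from the off-diagonal cells of Tables 2 and 3. The short Python sketch below tabulates the size of each disagreement for the two assessments. The names are illustrative assumptions, and the paired comparison across assessments (Z=−2.85) is not reproduced, because it needs patient-level data that the cross-tabulations do not contain.

```python
from collections import Counter

# Rows are rater 1 grades 0-4, columns are rater 2 grades 0-4 (blank cells = 0).
CONVENTIONAL = [[2, 0, 0, 0, 0], [0, 7, 3, 0, 0], [1, 3, 9, 2, 0],
                [0, 0, 5, 5, 3], [0, 1, 0, 7, 10]]
STRUCTURED = [[1, 0, 0, 0, 0], [0, 14, 0, 0, 0], [0, 7, 8, 2, 0],
              [0, 0, 2, 7, 2], [0, 0, 0, 0, 15]]


def abs_difference_counts(table):
    """Count patients by absolute difference in grade between the 2 raters."""
    counts = Counter()
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count:
                counts[abs(i - j)] += count
    return dict(sorted(counts.items()))


conv = abs_difference_counts(CONVENTIONAL)
struct = abs_difference_counts(STRUCTURED)
print(conv)    # {0: 33, 1: 23, 2: 1, 3: 1}
print(struct)  # {0: 45, 1: 13}
```

With the structured interview, every disagreement was by a single grade, whereas the conventional assessment produced 2 disagreements of more than 1 grade.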
The overall distributions of ratings for each rater (given in the “total” columns in Tables 2 and 3) do not differ substantially between the 2 assessments. Although there were differences in the individual ratings between the 2 assessments (22 for rater 1, 19 for rater 2), only 1 difference was by >1 category. To test whether there was a significant shift in overall scoring for each rater, we compared the Rankin scores given without the structured interview with those obtained with the structured interview using the Wilcoxon test. For both observers, there was no overall difference in the ratings assigned on the 2 assessments (Wilcoxon Z=−1.8, P=0.072 for rater 1; Z=−0.69, P=0.491 for rater 2). The 2 assessments were also highly correlated (Spearman’s correlation, 0.82, P<0.001 for rater 1; Spearman’s correlation, 0.90, P<0.001 for rater 2).

Discussion

The results for the conventionally applied modified Rankin Scale indicate good interrater reliability but also show that significant bias may be present. The structured interview for the modified Rankin Scale had very good interrater reliability, the extent of disagreement between raters was less, and significant bias was not present. Comparison of ratings on the structured interview and the conventional Rankin showed that they were highly correlated, indicating that the structured interview has satisfactory criterion validity when measured against the conventional Rankin as a standard.
The findings of the present study are consistent with previous studies of the reliability of the modified Rankin Scale.3,5,11 The interrater reliability of conventional assessment with the modified Rankin Scale is satisfactory but nonetheless open to improvement. Direct comparison of the present findings with previous reports is complicated because the distributions of gradings differ. The recruitment criteria for the present study tended to eliminate the most mildly disabled and the most severely disabled groups. To allow comparison with the present study, we reanalyzed the study of Van Swieten et al,3 confining analysis to 67 patients in Rankin categories 0 to 4. This analysis yielded an overall agreement of 61%, a κ of 0.49 and a κw of 0.80 (quadratic weights). Wolfe et al5 reported values of κw ranging from 0.75 to 0.96 for interobserver agreement on the Modified Rankin, and Bamford et al11 gave a value of 0.72 for the version of the Rankin Scale used in the Oxfordshire Community Stroke Project. The reliability of the conventional Rankin Scale in the present study is thus very similar to previous reports. In agreement with the study of Wolfe et al,5 the present study also demonstrates that significant bias may be present even when the κ value is satisfactory. Wolfe et al reported systematic differences in the overall rankings produced by their 3 raters.
Limitations of the present study are that only 2 raters were used, and both came from similar professional backgrounds. In large clinical trials, multiple observers contribute data, and 2 observers do not represent this situation. Using multiple observers with different backgrounds may lead to greater divergence in the application of the conventional Modified Rankin (ie, lower reliability), and there could consequently be a larger effect of introducing a standardized procedure. In the present study, it was not possible to counterbalance order of assessment with the conventional and structured interview, because exposure to the structured interview will inevitably affect the style of approach adopted in the conventional assessment. Once exposed to the structured interview, raters may simply continue to ask the same questions when asked to assess patients on the conventional scale. It is possible that some of the reduction in variability is due to a practice effect for raters or another time difference. However, interrater reliability for the conventional assessment obtained in our study was similar to previous reports; and interrater reliability with the structured interview was similar to that found for the use of a structured interview for the Glasgow Outcome Scale in head-injured patients.7 Our results are therefore comparable with previous studies and suggest that major differences resulting from practice are unlikely. Further study could define the relevance of any time effects and investigate interrater reliability when the assessment is used by multiple raters from different professional backgrounds. After the study, the raters were debriefed in detail, and reasons for disagreements were identified. We have subsequently developed a set of detailed guidelines for the interview and a video for use in training raters. 
Training raters is particularly likely to be of importance in multicenter clinical trials, and raters should have the opportunity to observe an interview being conducted and record responses themselves. It may also be desirable to have an accreditation process to ensure that observers are applying the scale appropriately.
Van Swieten et al3 noted that further improvement in the Rankin Scale should be possible, and our results confirm that it is possible to reduce variability in ratings. The present findings probably have most relevance to the conduct of multicenter clinical trials. Use of the modified Rankin Scale as a functional assessment in clinical trials is supported by a recent analysis,12 and it is likely to remain a popular end point. Demonstrating convincing treatment effects in acute stroke trials has proved to be a challenge, and it has been suggested that use of less-than-optimal methods to measure outcome may be responsible for problems of inconsistent findings.2 Choi and colleagues13 have demonstrated that misclassification not only reduces the power of a clinical trial but also reduces the size of the observed treatment effect on dichotomous outcomes. The present investigation shows that use of a structured interview for the modified Rankin Scale may help to reduce variation between raters and improve the quality of results in clinical studies.

Acknowledgments

This project was supported by a research grant from Pfizer UK to the University of Stirling. We would like to thank Peter J. Snyder, PhD, Robert Bagdorf, MD, and Michael Krams, MD, for their comments and support of the project.

Footnote

Copies of the Structured Interview for the Modified Rankin Scale and accompanying notes can be obtained from the corresponding author or at http://www.stir.ac.uk/psychology/staff/JTLW1.

References

1. Rankin J. Cerebral vascular accidents in patients over the age of 60: II. Prognosis. Scottish Med J. 1957; 2: 200–215.
2. Duncan PW, Jorgensen HS, Wade DT. Outcome measures in acute stroke trials: a systematic review and some recommendations to improve practice. Stroke. 2000; 31: 1429–1438.
3. van Swieten JC, Koudstaal PJ, Visser MC, Schouten HJ, van Gijn J. Interobserver agreement for the assessment of handicap in stroke patients. Stroke. 1988; 19: 604–607.
4. Mahoney FI, Barthel DW. Functional evaluation: the Barthel index. Maryland State Med J. 1965; 14: 61–65.
5. Wolfe CDA, Taub NA, Woodrow EJ, Burney PG. Assessment of scales of disability and handicap for stroke patients. Stroke. 1991; 22: 1242–1244.
6. Jennett B, Bond M. Assessment of outcome after severe brain damage: a practical scale. Lancet. 1975; 1: 480–484.
7. Wilson JTL, Pettigrew LEL, Teasdale GM. Structured interviews for the Glasgow Outcome Scale and Extended Glasgow Outcome Scale: guidelines for their use. J Neurotrauma. 1998; 15: 573–585.
8. Collin C, Wade DT, Davies S, Horne V. The Barthel ADL Index: a reliability study. Int Disabil Stud. 1988; 10: 61–63.
9. Fleiss JL, Cohen J. The equivalence of weighted kappa and the intraclass correlation as measures of reliability. Educ Psychol Meas. 1973; 33: 613–619.
10. Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. BMJ. 1992; 304: 1491–1494.
11. Bamford JM, Sandercock PAG, Warlow CP, Slattery J. Interobserver agreement for the assessment of handicap in stroke patients. Stroke. 1989; 20: 828. Letter.
12. Broderick JP, Lu M, Kothari R, Levine SR, Lyden PD, Haley EC, Brott TG, Grotta J, Tilley BC, Marler JR, Frankel M. Finding the most powerful measures of the effectiveness of tissue plasminogen activator in the NINDS tPA Stroke Trial. Stroke. 2000; 31: 2335–2341.
13. Choi SC, Clifton GL, Marmarou A. Misclassification and treatment effect on primary outcome measures in clinical trials of severe neurotrauma. J Neurotrauma. 2002; 19: 17–22.

Information & Authors

History

Received: 31 October 2001
Revision received: 6 May 2002
Accepted: 10 May 2002
Published online: 1 September 2002
Published in print: 1 September 2002

Keywords

  1. clinical trials
  2. disability evaluation
  3. outcome
  4. outcome assessment

Authors

Affiliations

J.T. Lindsay Wilson, PhD
Asha Hareendran, PhD
Marie Grant, BSc
Tracey Baird, MD
Ursula G.R. Schulz, MD
Keith W. Muir, MD
Ian Bone, MD

From the Department of Psychology, University of Stirling, Stirling, UK (J.T.L.W., M.G.); Outcomes Research, Pfizer Ltd, Sandwich, UK (A.H.); and Department of Neurology, Institute of Neurological Sciences, Glasgow, UK (T.B., U.G.R.S., K.W.M., I.B.).

Notes

Correspondence to J.T.L. Wilson, Department of Psychology, University of Stirling, Stirling FK9 4LA, UK. E-mail [email protected]

Published in Stroke, Vol. 33, No. 9.