To assess agreement amongst inexperienced raters, and between inexperienced and experienced raters, using a modified abstract spin checklist in physiotherapy clinical trials investigating robotics and virtual reality interventions.
This is a sub-study of a meta-research review. The search was performed in August 2020 for 2-armed clinical trials on robotics and virtual reality in any population. Details on searching, screening, data extraction, quality assessment and the 7-item spin checklist will be presented with the parent study. Spin ratings were performed by 10 pre-licensure Master of Physiotherapy coursework students using a modified 7-item spin checklist. Items and item descriptions are presented in the protocol (Stubbs et al. (2021) Physical Therapy Reviews. 26:2;102-108). The items were omission of primary outcomes (item-1), non-significant primary outcomes (item-2) and negative primary outcomes (item-3), omission of primary outcomes while including significant secondary outcomes (item-4), not mentioning adverse events (item-5), overenthusiastic interpretation of non-significant primary outcomes (item-6) and recommendation of interventions without clinically important effects on primary outcomes (item-7). Prior to rating, students read the protocol, received training on the checklist and completed 5 calibration articles that were not among the articles finally rated. Each article was rated independently by 2 randomly paired students (≈42 articles per student). When student ratings disagreed, the students discussed their ratings to reach consensus. Consensus ratings from pairs of experienced researchers were taken from another study. Spin items were rated as ‘Yes’ or ‘Not yes’ (‘No’ or ‘Not relevant’). Agreement between the 2 inexperienced raters, and agreement between the consensus ratings of the 2 inexperienced raters and those of the 2 experienced raters, were measured using Fleiss’ kappa (κ). Cut-offs were 0-0.20 (slight), 0.21-0.39 (minimal), 0.40-0.59 (weak), 0.60-0.79 (moderate), 0.80-0.90 (strong) or >0.90 (almost perfect).
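The agreement statistic described above can be illustrated with a minimal sketch. It assumes the standard Fleiss' kappa formula applied to 2 raters and the two rating categories ‘Yes’ and ‘Not yes’. The function name `fleiss_kappa` and the toy ratings are hypothetical and are not study data; the abstract does not state which software was used.

```python
from collections import Counter


def fleiss_kappa(ratings, categories=("Yes", "Not yes")):
    """Fleiss' kappa for a list of per-article rating tuples.

    Each tuple holds one label per rater; every article must have
    the same number of raters (2 in this illustration).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])

    # Count how many raters chose each category for each article.
    counts = []
    for item in ratings:
        tally = Counter(item)
        counts.append([tally[c] for c in categories])

    # Observed agreement: mean proportion of agreeing rater pairs per article.
    per_item = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall share of each category.
    total = n_items * n_raters
    shares = [sum(row[j] for row in counts) / total for j in range(len(categories))]
    p_expected = sum(s * s for s in shares)

    return (p_observed - p_expected) / (1 - p_expected)


# Toy example (hypothetical): 10 articles, 2 raters, 2 disagreements.
toy_ratings = (
    [("Yes", "Yes")] * 3
    + [("Not yes", "Not yes")] * 5
    + [("Yes", "Not yes")] * 2
)
print(round(fleiss_kappa(toy_ratings), 2))  # 0.58
```

In this toy example the raters agree on 8 of 10 articles, yet κ is only about 0.58 because a substantial share of that agreement would be expected by chance.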
A total of 207 articles were rated. Agreement amongst inexperienced raters was slight for item-3 (κ=0.02) and item-7 (κ=0.197), minimal for item-1 (κ=0.31), item-2 (κ=0.38), item-4 (κ=0.24) and item-6 (κ=0.29), and moderate for item-5 (κ=0.66). Agreement in consensus scores between inexperienced and experienced raters was slight for item-3 (κ=0.14) and item-7 (κ=0.17), minimal for item-6 (κ=0.30), weak for item-1 (κ=0.54) and item-2 (κ=0.55), moderate for item-4 (κ=0.62), and almost perfect for item-5 (κ=0.94).
Agreement amongst inexperienced raters was <0.50 for 6/7 items. Once consensus was reached, agreement between experienced and inexperienced raters was >0.50 for 4/7 items. Some elements of spin were easier to identify than others. The updated definitions and protocol guidance did not help raters identify all forms of spin, despite additional prompts.
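The item tallies against the 0.50 threshold can be checked directly from the κ values reported in the results. The following is a small sketch; the dictionary and function names are illustrative choices, and only the κ values come from the abstract.

```python
# Kappa values per checklist item, as reported in the results.
among_inexperienced = {1: 0.31, 2: 0.38, 3: 0.02, 4: 0.24, 5: 0.66, 6: 0.29, 7: 0.197}
consensus_vs_experienced = {1: 0.54, 2: 0.55, 3: 0.14, 4: 0.62, 5: 0.94, 6: 0.30, 7: 0.17}


def items_above(kappas, threshold=0.50):
    """Return the item numbers whose kappa exceeds the threshold."""
    return sorted(item for item, k in kappas.items() if k > threshold)


print(items_above(among_inexperienced))       # [5]
print(items_above(consensus_vs_experienced))  # [1, 2, 4, 5]
```

Only item-5 exceeds 0.50 amongst inexperienced raters, leaving 6 of 7 items below the threshold, whereas 4 of 7 items exceed it once consensus ratings are compared with those of experienced raters.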
The updated spin checklist helps inexperienced raters identify some, but not all, elements of spin. Better training in identifying spin is required for students and clinicians. To minimise spin in published articles, journals should attend to spin during peer review and authors should reduce spin when preparing and submitting articles.
education
research methods