Development of a Machine Learning Model for Predicting Low Back Pain in High School Soft Tennis Players: A Pilot Study

Tomonari Sugano, Takumi Watabu, Katsunori Mizuno, Hiroichi Miaki, Toru Tanabe

Purpose:

To develop and evaluate machine learning (ML) algorithms for predicting low back pain (LBP) in elite high school male soft tennis players.

Methods:

In Verification 1, we used previously collected cross-sectional data from 176 elite high school male soft tennis players to construct an ML model that classifies the presence of LBP. The dataset contained 23 variables, including basic attributes, flexibility, range of motion (ROM), and LBP status. Six ML algorithms (logistic regression, support vector machine [SVM], decision tree, random forest, XGBoost, and multilayer perceptron) were compared using 5-fold cross-validation, with accuracy and area under the curve (AUC) as the comparison metrics. Recursive feature selection identified three features: hip internal rotation (IR) ROM on the dominant and non-dominant sides, and GIRLoss (the difference in shoulder IR between the non-dominant and dominant sides). Bayesian optimization was used to tune hyperparameters. Feature importance was assessed using permutation importance, and SHAP values were calculated to determine the direction of each feature's influence.
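As an illustration of the permutation importance step, the following is a minimal Python sketch, not the authors' MATLAB code. It assumes accuracy as the scoring metric, a logistic model with hypothetical coefficients, and synthetic standardized features; none of these values come from the study. Each feature column is shuffled in turn, and the mean drop in accuracy is taken as that feature's importance.

```python
import math
import random

def predict_proba(row, weights, bias):
    """Logistic model: sigmoid of a weighted sum of the features."""
    z = bias + sum(w * x for w, x in zip(weights, row))
    return 1.0 / (1.0 + math.exp(-z))

def accuracy(X, y, weights, bias):
    """Share of rows whose thresholded prediction (0.5) matches the label."""
    preds = [1 if predict_proba(r, weights, bias) >= 0.5 else 0 for r in X]
    return sum(int(p == t) for p, t in zip(preds, y)) / len(y)

def permutation_importance(X, y, weights, bias, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling one feature column at a time."""
    rng = random.Random(seed)
    baseline = accuracy(X, y, weights, bias)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(X_perm, y, weights, bias))
        importances.append(sum(drops) / n_repeats)
    return importances

# Synthetic data and hypothetical coefficients (assumptions, not study values).
# The third feature has zero weight, so the model ignores it entirely.
data_rng = random.Random(42)
weights, bias = [-2.0, 0.5, 0.0], 0.0
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [1 if predict_proba(r, weights, bias) >= 0.5 else 0 for r in X]

imp = permutation_importance(X, y, weights, bias)
for name, value in zip(["feature_a", "feature_b", "feature_c"], imp):
    print(f"{name}: {value:.3f}")
```

In this toy setup the heavily weighted feature shows the largest drop and the ignored feature shows none, which is the same kind of ranking the study reports for its three features.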

In Verification 2, we assessed the predictive validity of the best-performing model from Verification 1 in a separate cohort of 20 elite high school male soft tennis players who were not included in the original dataset. Physical data, including the identified features, were collected before the season. The players were then monitored every two weeks to record the occurrence of LBP, defined as pain persisting for more than two weeks and affecting performance. Predictive validity was assessed using accuracy and AUC. All analyses were performed in MATLAB.
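The two validation metrics can be sketched in a few lines of Python (the study itself used MATLAB). This is a minimal illustration under stated assumptions: a 0.5 classification threshold, AUC computed as the proportion of case/non-case pairs ranked correctly (ties counted as 0.5), and hypothetical labels and probabilities that are not the study's data.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of players whose thresholded prediction matches observed LBP status."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == t) for p, t in zip(preds, y_true)) / len(y_true)

def auc(y_true, y_prob):
    """Probability that a random LBP case is scored above a random non-case."""
    pos = [p for p, t in zip(y_prob, y_true) if t == 1]
    neg = [p for p, t in zip(y_prob, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical outcomes (1 = developed LBP) and predicted probabilities
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.7, 0.3, 0.1, 0.5]

print(f"accuracy = {accuracy(y_true, y_prob):.3f}")  # 5 of 8 correct
print(f"AUC      = {auc(y_true, y_prob):.3f}")       # 11 of 15 pairs ranked correctly
```

Accuracy depends on the chosen threshold, whereas AUC depends only on how the predicted probabilities rank cases against non-cases, which is one reason to report both.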

Results:

Logistic regression demonstrated the highest classification performance (accuracy: 0.80 [95% CI: 0.78-0.83], AUC: 0.81 [95% CI: 0.77-0.85]) and was selected as the optimal model. Permutation importance indicated that non-dominant hip IR ROM was the most important feature (0.23), followed by dominant hip IR ROM (0.06) and GIRLoss (0.05). SHAP values showed that lower non-dominant hip IR ROM increased predicted LBP risk, whereas higher dominant hip IR ROM and higher GIRLoss increased predicted LBP risk. In Verification 2, the logistic regression model achieved an accuracy of 0.77 and an AUC of 0.75.
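For orientation, a three-feature logistic regression model has the general form shown below. The abstract does not report the fitted coefficients, so the symbols are placeholders: $x_{\mathrm{ND}}$ and $x_{\mathrm{D}}$ denote non-dominant and dominant hip IR ROM, and $x_{\mathrm{GIR}}$ denotes GIRLoss. The signs listed are an inference from the reported SHAP directions, assuming the usual correspondence between SHAP direction and coefficient sign in a linear model.

```latex
P(\mathrm{LBP}=1 \mid x)
  = \frac{1}{1+\exp\!\bigl(-(\beta_0 + \beta_1 x_{\mathrm{ND}} + \beta_2 x_{\mathrm{D}} + \beta_3 x_{\mathrm{GIR}})\bigr)},
\qquad \beta_1 < 0,\quad \beta_2 > 0,\quad \beta_3 > 0
```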

Conclusions:

Logistic regression appears to be a promising ML model for predicting LBP in elite high school male soft tennis players. The identified factors may contribute to increased mechanical stress on the lower back during stroke movements. Further research with a larger sample size is needed to refine the model and validate its predictive accuracy.

Implications:

Predicting LBP in elite high school male soft tennis players can contribute to developing effective preventive strategies.

Funding acknowledgements:
This research received no external funding.
Keywords:
Low Back Pain Prediction
Machine Learning
Hip and Shoulder Internal Rotation
Primary topic:
Sport and sports injuries
Second topic:
Innovative technology: information management, big data and artificial intelligence
Third topic:
Musculoskeletal: spine
Did this work require ethics approval?:
Yes
Name the institution and ethics committee that approved your work:
Ethics Review Committee of the Nittaduka Medical Welfare Center
Provide the ethics approval number:
2019-56
Has any of this material been/due to be published or presented at another national or international conference prior to the World Physiotherapy Congress 2025?:
No