International multispecialty expert physician preoperative identification of extranodal extension in patients with oropharyngeal cancer using computed tomography: Prospective blinded human inter‐observer performance evaluation
Onur Sahin, Serageldin Kamel, Kareem A. Wahid, Cem Dede, Nicolette Taku, Renjie He, Mohamed A. Naser, Christina S. Sharafi, Antti Mäkitie, Benjamin H. Kann, Kimmo Kaski, Jaakko Sahlsten, Joel Jaskari, Moran Amit, Gregory M. Chronowski, Eduardo M. Diaz, Adam S. Garden, Ryan P. Goepfert, Jeffrey P. Guenette, G. Brandon Gunn, Jussi Hirvonen, Frank Hoebers, Katherine A. Hutcheson, Nandita Guha‐Thakurta, Jason Johnson, Diana Kaya, Shekhar D. Khanpara, Kristofer Nyman, Stephen Y. Lai, Miriam Lango, Kim O. Learned, Anna Lee, Carol M. Lewis, Anastasios Maniakas, Amy C. Moreno, Jeffrey N. Myers, Jack Phan, Kristen B. Pytynia, David I. Rosenthal, Vlad C. Sandulache, Dawid Schellingerhout, Shalin J. Shah, Andrew G. Sikora, Abdallah S. R. Mohamed, Melissa M. Chen, Clifton D. Fuller
Abstract
Background
Pathologic extranodal extension (pENE) is a crucial prognostic factor in oropharyngeal cancer (OPC), but determining pENE from imaging has high inter‐observer variability. The role of clinician specialty in the accuracy of imaging‐detected extranodal extension (iENE) remains unclear. The purpose of this study is to assess the influence of clinician specialty on the accuracy of preoperative iENE detection in human papillomavirus (HPV)‐positive OPC using computed tomography (CT) imaging.
Methods
This prospective observational study evaluated pretherapy CT images from 24 HPV‐positive OPC patients (30 scans, including duplicates). Thirty‐four expert observers (11 radiologists, 12 surgeons, 11 radiation oncologists) assessed iENE and reported radiologic criteria and confidence. Ground‐truth pENE status was confirmed pathologically. Accuracy, sensitivity, specificity, area under the receiver operating characteristic curve, and Brier scores were compared across specialties. Logistic regression determined significant predictors of pENE, whereas Fleiss’ kappa measured interobserver agreement.
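The abstract names these metrics without giving formulas. As an illustration only, the following minimal Python sketch shows how such quantities are conventionally computed for binary labels (1 = pENE present, 0 = absent). It is not the authors' analysis code; all function names are hypothetical, and the toy values in the usage note are invented for demonstration and are not study data.

```python
from collections import Counter


def sensitivity_specificity_accuracy(y_true, y_pred):
    """Return (sensitivity, specificity, accuracy) for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Assumes at least one positive and one negative case are present.
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(pairs)


def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probs)) / len(y_true)


def auc(y_true, scores):
    """AUC as the probability that a positive case outranks a negative one
    (ties count one half); equivalent to the Mann-Whitney U statistic."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of subjects, each a list of category labels
    assigned by the same number of raters."""
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for row in ratings for label in row})
    counts = [Counter(row) for row in ratings]
    # Observed agreement: mean proportion of agreeing rater pairs per subject.
    p_bar = sum(
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ) / n_subjects
    # Chance agreement from the overall category proportions.
    p_e = sum(
        (sum(c[k] for c in counts) / (n_subjects * n_raters)) ** 2
        for k in categories
    )
    # Undefined (division by zero) if every rating falls in one category.
    return (p_bar - p_e) / (1 - p_e)
```

For example, with invented labels `y_true = [1, 1, 0, 0]` and invented confidence values `[0.9, 0.4, 0.2, 0.1]`, `brier_score` returns 0.105; lower Brier scores indicate better-calibrated confidence, which is the sense in which the Results compare specialties.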
Results
Median accuracy was 0.57 (95% CI, 0.39–0.73), with no specialty showing performance beyond chance (median area under the receiver operating characteristic curve, 0.64). Minor differences were noted: surgeons had lower Brier scores (0.26 vs. 0.33, p < .01) and higher sensitivity (0.69 vs. 0.48) compared with radiologists and radiation oncologists. Predictive signs included indistinct capsular contour and nodal necrosis. Interobserver agreement was weak (κ < 0.6).
Conclusions
Diagnostic performance for iENE on CT in HPV‐positive OPC remains poor across specialties, with high variability and low accuracy. These findings highlight the need for automated systems or improved imaging methods to enhance iENE assessments.