The Development and Application of ROC Method without Gold Standard in Psychological Research
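DOI: 10.12677/AP.2022.127311. Supported by the National Social Science Fund of China.
Authors: Liu Yuqing (刘雨晴), Li Huiling (李慧玲), Yu Chenghao (余城昊), Zhou Qiang (周强)*: Wenzhou Medical University, Wenzhou, Zhejiang
Keywords: ROC Analysis; Diagnostic Test; Accuracy Assessment; Bayesian; Gold Standard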
Abstract: Traditional ROC (receiver operating characteristic) analysis compares dichotomized measurement results against a “gold standard” and uses the ROC curve and its indices to evaluate the accuracy of a measurement tool. Psychological research, however, often lacks a gold standard. Unlike the traditional approach, ROC analysis without a gold standard based on Bayesian theory (BROC analysis) does not depend on one, which frees psychological research from this constraint and offers a new direction for evaluating the accuracy of psychological indicators. This paper introduces BROC analysis and outlines its value in psychology, including the selection of cutoff values for questionnaire measures and the quantification of measurement-tool accuracy. It then demonstrates the procedure in psychological research through a simulated worked example, and finally discusses the prospects and limitations of ROC methods, BROC analysis in particular.
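As a point of reference for the traditional procedure summarized above, the following minimal Python sketch computes ROC points, the area under the curve, and a Youden-index cutoff for questionnaire scores judged against a binary gold standard. It is not the BROC method, which replaces the gold standard with a Bayesian latent-class model. The scores, labels, and function names are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (cutoff, sensitivity, specificity)


def roc_points(scores: List[float], labels: List[int]) -> List[Point]:
    """Sensitivity and specificity at each observed cutoff.

    A case is classified positive when score >= cutoff; labels are the
    gold-standard status (1 = condition present, 0 = absent).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    points = []
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
        points.append((cutoff, tp / n_pos, tn / n_neg))
    return points


def auc(points: List[Point]) -> float:
    """Trapezoidal area under the ROC curve (x = 1 - specificity, y = sensitivity)."""
    xy = sorted({(1 - spec, sens) for _, sens, spec in points}
                | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(xy, xy[1:]))


def youden_cutoff(points: List[Point]) -> Point:
    """Cutoff maximizing Youden's J = sensitivity + specificity - 1."""
    return max(points, key=lambda p: p[1] + p[2] - 1)


# Hypothetical questionnaire scores and gold-standard diagnoses (illustration only).
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
labels = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

pts = roc_points(scores, labels)
best = youden_cutoff(pts)
print(f"AUC = {auc(pts):.3f}")
print(f"Youden-optimal cutoff = {best[0]} (sens = {best[1]:.2f}, spec = {best[2]:.2f})")
```

Every quantity in this sketch depends on `labels`, the gold-standard classification. When no such classification exists, none of these quantities can be computed directly, and that is the gap BROC analysis is meant to fill.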
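Cite this article: Liu, Y., Li, H., Yu, C., & Zhou, Q. (2022). The Development and Application of ROC Method without Gold Standard in Psychological Research. Advances in Psychology (心理学进展), 12(7), 2612-2622. https://doi.org/10.12677/AP.2022.127311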
