Using machine learning to link electronic health records in cancer registries: On the tradeoff between linkage quality and manual effort
- Publication type:
- Journal article
- Metadata:
-
- Authors
- Philipp Röchner
- Franz Rothlauf
- Author URL
- https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=fis-test-1&SrcAuth=WosAPI&KeyUT=WOS:001204450900001&DestLinkType=FullRecord&DestApp=WOS_CPL
- DOI
- 10.1016/j.ijmedinf.2024.105387
- eISSN
- 1872-8243
- External identifiers
- Clarivate Analytics Document Solution ID: OA1Q6
- PubMed Identifier: 38428200
- ISSN
- 1386-5056
- Journal
- International Journal of Medical Informatics
- Keywords
- Record linkage
- Data matching
- Cancer registry
- Electronic health records
- Machine learning
- Data quality
- Article number
- ARTN 105387
- Publication date
- 2024
- Status
- Published
- Title
- Using machine learning to link electronic health records in cancer registries: On the tradeoff between linkage quality and manual effort
- Sub types
- Article
- Journal volume
- 185
Data source: Web of Science (Lite)
- Other metadata sources:
-
- Authors
- Philipp Röchner
- Franz Rothlauf
- DOI
- 10.1016/j.ijmedinf.2024.105387
- ISSN
- 1386-5056
- Journal
- International Journal of Medical Informatics
- Language
- en
- Article number
- 105387
- Pagination
- 105387 - 105387
- Publication date
- 2024
- Status
- Published
- Publisher
- Elsevier BV
- Publisher URL
- http://dx.doi.org/10.1016/j.ijmedinf.2024.105387
- Date of data capture
- 2024
- Title
- Using machine learning to link electronic health records in cancer registries: On the tradeoff between linkage quality and manual effort
- Journal volume
- 185
Data source: Crossref
- Abstract
- Background: Cancer registries link a large number of electronic health records reported by medical institutions to already registered records of the matching individual and tumor. Records are automatically linked using deterministic and probabilistic approaches; machine learning is rarely used. Records that cannot be matched automatically with sufficient accuracy are typically processed manually. For application, it is important to know how well record linkage approaches match real-world records and how much manual effort is required to achieve the desired linkage quality. We study the task of linking reported records to the matching registered tumor in cancer registries. Methods: We compare the tradeoff between linkage quality and manual effort of five machine learning methods (logistic regression, random forest, gradient boosting, neural network, and a stacked method) to a deterministic baseline. The record linkage methods are compared in a two-class setting (no-match/match) and a three-class setting (no-match/undecided/match). A cancer registry collected and linked the dataset consisting of categorical variables matching 145,755 reported records with 33,289 registered tumors. Results: In the two-class setting, the gradient boosting, neural network, and stacked models have higher accuracy and F1 score (accuracy: 0.968-0.978, F1 score: 0.983-0.988) than the deterministic baseline (accuracy: 0.964, F1 score: 0.980) when the same records are manually processed (0.89% of all records). In the three-class setting, these three machine learning methods can automatically process all reported records and still have higher accuracy and F1 score than the deterministic baseline. The linkage quality of the machine learning methods studied, except for the neural network, increases as the number of manually processed records increases. Conclusion: Machine learning methods can significantly improve linkage quality and reduce the manual effort required by medical coders to match tumor records in cancer registries compared to a deterministic baseline. Our results help cancer registries estimate how linkage quality increases as more records are manually processed.
- Addresses
- Cancer Registry, Institute for Digital Health Data Rhineland-Palatinate, Große Bleiche 46, Mainz, 55116, Germany; Information Systems and Business Administration, Johannes Gutenberg University, Jakob-Welder-Weg 9, Mainz, 55128, Germany. Electronic address: roechner@uni-mainz.de.
- Authors
- Philipp Röchner
- Franz Rothlauf
- DOI
- 10.1016/j.ijmedinf.2024.105387
- eISSN
- 1872-8243
- External identifiers
- PubMed Identifier: 38428200
- Open access
- false
- ISSN
- 1386-5056
- Journal
- International Journal of Medical Informatics
- Keywords
- Humans
- Neoplasms
- Medical Record Linkage
- Registries
- Databases, Factual
- Electronic Health Records
- Language
- eng
- Medium
- Print-Electronic
- Online publication date
- 2024
- Pagination
- 105387
- Publication date
- 2024
- Status
- Published
- Date of data capture
- 2024
- Title
- Using machine learning to link electronic health records in cancer registries: On the tradeoff between linkage quality and manual effort.
- Sub types
- Journal Article
- Journal volume
- 185
Data source: Europe PubMed Central
- Abstract
- BACKGROUND: Cancer registries link a large number of electronic health records reported by medical institutions to already registered records of the matching individual and tumor. Records are automatically linked using deterministic and probabilistic approaches; machine learning is rarely used. Records that cannot be matched automatically with sufficient accuracy are typically processed manually. For application, it is important to know how well record linkage approaches match real-world records and how much manual effort is required to achieve the desired linkage quality. We study the task of linking reported records to the matching registered tumor in cancer registries. METHODS: We compare the tradeoff between linkage quality and manual effort of five machine learning methods (logistic regression, random forest, gradient boosting, neural network, and a stacked method) to a deterministic baseline. The record linkage methods are compared in a two-class setting (no-match/match) and a three-class setting (no-match/undecided/match). A cancer registry collected and linked the dataset consisting of categorical variables matching 145,755 reported records with 33,289 registered tumors. RESULTS: In the two-class setting, the gradient boosting, neural network, and stacked models have higher accuracy and F1 score (accuracy: 0.968-0.978, F1 score: 0.983-0.988) than the deterministic baseline (accuracy: 0.964, F1 score: 0.980) when the same records are manually processed (0.89% of all records). In the three-class setting, these three machine learning methods can automatically process all reported records and still have higher accuracy and F1 score than the deterministic baseline. The linkage quality of the machine learning methods studied, except for the neural network, increases as the number of manually processed records increases. CONCLUSION: Machine learning methods can significantly improve linkage quality and reduce the manual effort required by medical coders to match tumor records in cancer registries compared to a deterministic baseline. Our results help cancer registries estimate how linkage quality increases as more records are manually processed.
- Date of acceptance
- 2024
- Authors
- Philipp Röchner
- Franz Rothlauf
- Author URL
- https://www.ncbi.nlm.nih.gov/pubmed/38428200
- DOI
- 10.1016/j.ijmedinf.2024.105387
- eISSN
- 1872-8243
- Journal
- Int J Med Inform
- Keywords
- Cancer registry
- Data matching
- Data quality
- Electronic health records
- Machine learning
- Record linkage
- Humans
- Electronic Health Records
- Medical Record Linkage
- Neoplasms
- Registries
- Databases, Factual
- Language
- eng
- Country
- Ireland
- Pagination
- 105387
- PII
- S1386-5056(24)00050-9
- Publication date
- 2024
- Status
- Published
- Date the record was made public
- 2024
- Title
- Using machine learning to link electronic health records in cancer registries: On the tradeoff between linkage quality and manual effort.
- Sub types
- Journal Article
- Journal volume
- 185
Data source: PubMed
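Illustrative note on the abstract: the three-class setting (no-match/undecided/match) and the tradeoff between manual effort and linkage quality can be sketched roughly as follows. This is a minimal sketch and not the authors' implementation. The function names, the two thresholds `lower` and `upper`, the toy probabilities, and the assumption that manually processed records are always resolved correctly are all hypothetical and introduced here only for illustration.

```python
from dataclasses import dataclass


@dataclass
class LinkageResult:
    manual_share: float  # fraction of record pairs sent to manual review
    accuracy: float
    f1: float


def classify(prob: float, lower: float, upper: float) -> str:
    """Map a match probability to 'no-match', 'undecided', or 'match'."""
    if prob >= upper:
        return "match"
    if prob <= lower:
        return "no-match"
    return "undecided"


def evaluate(probs, labels, lower, upper) -> LinkageResult:
    """Compute manual effort and linkage quality for one threshold pair.

    Assumption (hypothetical): every 'undecided' pair is reviewed manually
    and the reviewer always assigns the true label.
    """
    tp = fp = fn = tn = manual = 0
    for prob, is_match in zip(probs, labels):
        decision = classify(prob, lower, upper)
        if decision == "undecided":
            manual += 1
            predicted = is_match  # manual review assumed to be correct
        else:
            predicted = decision == "match"
        if predicted and is_match:
            tp += 1
        elif predicted and not is_match:
            fp += 1
        elif not predicted and is_match:
            fn += 1
        else:
            tn += 1
    n = len(labels)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return LinkageResult(manual / n, (tp + tn) / n, f1)


# Toy match probabilities from some classifier, with true match labels.
probs = [0.05, 0.20, 0.45, 0.55, 0.70, 0.95]
labels = [False, False, True, False, True, True]

# Two-class setting: lower == upper, so no pair is left undecided.
two_class = evaluate(probs, labels, lower=0.5, upper=0.5)
# Three-class setting: pairs scored between the thresholds go to manual review.
three_class = evaluate(probs, labels, lower=0.3, upper=0.6)

print(two_class)
print(three_class)
```

Widening the gap between `lower` and `upper` sends more pairs to manual review. Under the stated assumption this can only keep or raise accuracy, which mirrors the kind of tradeoff the abstract describes.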
- Relationships:
- Owned by