
dc.contributor.author Finlayson, Samuel G.
dc.contributor.author LePendu, Paea
dc.contributor.author Shah, Nigam H.
dc.coverage.spatial California
dc.date.accessioned 2014-09-11T13:00:37Z
dc.date.available 2014-09-11T13:00:37Z
dc.date.issued 2014-09-16
dc.identifier doi:10.5061/dryad.jp917
dc.identifier.citation Finlayson SG, LePendu P, Shah NH (2014) Building the graph of medicine from millions of clinical narratives. Scientific Data 1: 140032.
dc.identifier.uri http://hdl.handle.net/10255/dryad.65907
dc.description Electronic health records (EHRs) represent a rich and relatively untapped resource for characterizing the true nature of clinical practice and for quantifying the degree of inter-relatedness of medical entities such as drugs, diseases, procedures and devices. We provide a unique set of co-occurrence matrices, quantifying the pairwise mentions of 3 million terms mapped onto 1 million clinical concepts, calculated from the raw text of 20 million clinical notes spanning 19 years of data. Co-frequencies were computed by means of a parallelized annotation, hashing, and counting pipeline that was applied over clinical notes from Stanford Hospitals and Clinics. The co-occurrence matrix quantifies the relatedness among medical concepts, which can serve as the basis for many statistical tests and can be used to directly compute Bayesian conditional probabilities, association rules, and a range of test statistics such as relative risks and odds ratios. This dataset can be leveraged to quantitatively assess comorbidity, drug-drug, and drug-disease patterns for a range of clinical, epidemiological, and financial applications.
dc.relation.haspart doi:10.5061/dryad.jp917/5
dc.relation.haspart doi:10.5061/dryad.jp917/6
dc.relation.haspart doi:10.5061/dryad.jp917/7
dc.relation.haspart doi:10.5061/dryad.jp917/8
dc.relation.isreferencedby doi:10.1038/sdata.2014.32
dc.relation.isreferencedby PMID:25977789
dc.subject electronic health records
dc.subject biomedical informatics
dc.subject data mining
dc.subject natural language processing
dc.title Data from: Building the graph of medicine from millions of clinical narratives
dc.type Article
prism.publicationName Scientific Data
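
The description above says the co-frequency counts can be turned into conditional probabilities, relative risks and odds ratios. As a rough illustration only, the sketch below shows that arithmetic for one pair of concepts. All numbers are invented, the function names are hypothetical, and nothing here assumes the actual file layout or counting unit of the archives, which is documented in README.txt.

```python
# Minimal sketch: association statistics from co-occurrence counts.
# ASSUMPTIONS (not taken from the dataset): the counts below are made up,
# and "n_total" stands for whatever unit the counts were taken over.

def contingency(n_ab, n_a, n_b, n_total):
    """Build the 2x2 table cells from a pair count and two singleton counts."""
    a = n_ab                        # A and B together
    b = n_a - n_ab                  # A without B
    c = n_b - n_ab                  # B without A
    d = n_total - n_a - n_b + n_ab  # neither A nor B
    if min(a, b, c, d) < 0:
        raise ValueError("counts are inconsistent")
    return a, b, c, d

def conditional_probability(n_ab, n_a):
    """Estimate P(B | A) as count(A and B) / count(A)."""
    return n_ab / n_a

def relative_risk(a, b, c, d):
    """P(B | A) divided by P(B | not A)."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Odds of B given A divided by odds of B given not A."""
    return (a * d) / (b * c)

if __name__ == "__main__":
    # Hypothetical counts for a drug concept A and a disease concept B.
    n_ab, n_a, n_b, n_total = 30, 100, 200, 1000
    cells = contingency(n_ab, n_a, n_b, n_total)
    print("P(B|A)        =", round(conditional_probability(n_ab, n_a), 4))
    print("relative risk =", round(relative_risk(*cells), 4))
    print("odds ratio    =", round(odds_ratio(*cells), 4))
```

The same four cells feed most pairwise association measures, so in practice one would build the table once per concept pair and guard against zero cells (for example with a small continuity correction) before dividing.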

Files in this package

Content in the Dryad Digital Repository is offered "as is." By downloading files, you agree to the Dryad Terms of Service. To the extent possible under law, the authors have waived all copyright and related or neighboring rights to this data (CC0; Open Data).

Title 1_Cofrequency_Counts.tar.gz
Downloaded 10463 times
Description See README.txt
Download 1_Cofrequency_Counts.tar.gz (7.595 Gb)
Download README.txt (4.462 Kb)
Title 2_Singleton_Frequency_Counts.tar.gz
Downloaded 173 times
Description See README.txt provided with "1_Cofrequency_Counts.tar.gz"
Download 2_Singleton_Frequency_Counts.tar.gz (3.006 Mb)
Title 3_ID_Mappings.tar.gz
Downloaded 209 times
Description See README.txt provided with "1_Cofrequency_Counts.tar.gz"
Download 3_ID_Mappings.tar.gz (51.86 Mb)
Title 4_Scripts.tar.gz
Downloaded 189 times
Description See README.txt provided with "1_Cofrequency_Counts.tar.gz"
Download 4_Scripts.tar.gz (760 bytes)
