-
tips for predictive modeling project: data clean up will take about ___% of the time
80% - do not take shortcuts here
-
tips for predictive modeling project: try _____ things first
simple
-
tips for predictive modeling project: don't get fooled by AUC, examine...
precision-recall, calibration, net reclassification
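
A minimal sketch of why AUC alone can mislead. The cohort is made up for illustration (10 positives among 1,000 patients), and the helper names `auroc` and `precision_recall` are hypothetical, not from any library. AUROC is computed as the fraction of positive/negative pairs that the score ranks correctly.

```python
def auroc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def precision_recall(y_true, scores, threshold):
    """Precision and recall when every score >= threshold is flagged."""
    tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= threshold)
    fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= threshold)
    fn = sum(1 for t, s in zip(y_true, scores) if t == 1 and s < threshold)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up imbalanced cohort: 10 positives, 990 negatives.
# 90 of the negatives get a higher score than every positive.
y_true = [1] * 10 + [0] * 90 + [0] * 900
scores = [0.9] * 10 + [0.95] * 90 + [0.1] * 900

auc = auroc(y_true, scores)                                # 900/990, about 0.91
precision, recall = precision_recall(y_true, scores, 0.5)  # 0.1 and 1.0
print(f"AUROC={auc:.2f} precision={precision:.2f} recall={recall:.2f}")
```

In this toy cohort the AUROC looks strong, yet only 1 in 10 flagged patients is a true positive, which is what the precision-recall view exposes.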
-
tips for predictive modeling project: ask whether more ____ and/or ____ will increase performance, and whether ____ from different models are correlated.
-
tips for predictive modeling project: don't get attached to...
one model
-
tips for predictive modeling project: think about model deployment (2)
- ease of applying the model
- cost of taking action
-
learning algorithms with bigger search spaces have ____ bias, but ____ variance.
lower, higher
-
shrinking the search space reduces the ____, at the cost of increasing ____
variance, bias
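
A rough simulation of the trade-off, under stated assumptions: the true function `x*x`, the noise level, the test point `x0=0.9`, and both learners are chosen purely for illustration. A constant predictor stands in for a very small search space and 1-nearest-neighbor for a very large one.

```python
import random


def true_f(x):
    # Assumed ground-truth function for the simulation.
    return x * x


def sample_training_set(rng, n=20, noise_sd=0.3):
    xs = [rng.random() for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def predict_constant(xs, ys, x0):
    # Tiny search space: always predict the training mean, ignoring x0.
    return sum(ys) / len(ys)


def predict_1nn(xs, ys, x0):
    # Large search space: copy the label of the nearest training point.
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x0))
    return ys[nearest]


def bias_sq_and_variance(predict, x0=0.9, n_sets=500, seed=0):
    """Refit on many simulated training sets; summarize predictions at x0."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_sets):
        xs, ys = sample_training_set(rng)
        preds.append(predict(xs, ys, x0))
    mean_pred = sum(preds) / n_sets
    bias_sq = (mean_pred - true_f(x0)) ** 2
    variance = sum((p - mean_pred) ** 2 for p in preds) / n_sets
    return bias_sq, variance


const_bias_sq, const_var = bias_sq_and_variance(predict_constant)
nn_bias_sq, nn_var = bias_sq_and_variance(predict_1nn)
print(f"constant: bias^2={const_bias_sq:.3f} variance={const_var:.3f}")
print(f"1-NN:     bias^2={nn_bias_sq:.3f} variance={nn_var:.3f}")
```

The constant predictor should show the larger squared bias and the 1-NN predictor the larger variance. Exact numbers depend on the seed and the assumed noise.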
-
ways to evaluate predictive models
- AUROC
- AUPRC
- train, test, validation sets
-
error metrics: regression (2)
- mean-squared error
- absolute error
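
A small sketch of both regression metrics on made-up numbers. The function names are illustrative, not a library API.

```python
def mean_squared_error(y_true, y_pred):
    # Average of squared residuals; penalizes large misses heavily.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    # Average of absolute residuals; less sensitive to outliers.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 4.0]

mse = mean_squared_error(y_true, y_pred)   # residuals 1, 0, -2, 3 -> 14/4 = 3.5
mae = mean_absolute_error(y_true, y_pred)  # (1 + 0 + 2 + 3)/4 = 1.5
print(mse, mae)
```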
-
error metrics: classification (class labels) (3)
- misclassification error
- Cohen's kappa
- sensitivity, specificity, etc.
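
A sketch of the class-label metrics computed from a 2x2 confusion matrix. The labels are made up, `classification_metrics` is a hypothetical helper, and the code assumes binary 0/1 labels with both classes present.

```python
def classification_metrics(y_true, y_pred):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + fp + tn + fn

    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the marginal label frequencies.
    p_chance = ((tp + fp) / n) * ((tp + fn) / n) + ((tn + fn) / n) * ((tn + fp) / n)
    return {
        "misclassification_error": 1.0 - accuracy,
        "cohens_kappa": (accuracy - p_chance) / (1.0 - p_chance),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here TP=3, FN=1, FP=1, TN=5, so the error is 0.2, sensitivity is 0.75, specificity is 5/6, and kappa is (0.8 - 0.52) / 0.48, roughly 0.58.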
-
error metrics: classification (probability) (2)
- discrimination
- calibration
-
error metrics: discrimination
- distinguish those who will die from those who will survive
- two bins: dead or alive
-
error metrics: calibration
- fine-grained accuracies at different levels of risk
- e.g. what fraction of patients placed by the score into the 35-40% survival bin actually survived?
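
A sketch of a calibration check: group patients by predicted probability and compare the mean prediction in each bin with the observed outcome rate. The data, the bin width (0.2 rather than the 5-point bins in the example above), and the helper name `calibration_table` are assumptions for illustration.

```python
def calibration_table(y_true, probs, n_bins=5):
    """Return rows of (bin_low, bin_high, count, mean_predicted, observed_rate)."""
    bins = [[0, 0.0, 0.0] for _ in range(n_bins)]  # count, sum of probs, sum of outcomes
    for t, p in zip(y_true, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx][0] += 1
        bins[idx][1] += p
        bins[idx][2] += t

    rows = []
    for i, (count, sum_p, sum_y) in enumerate(bins):
        if count == 0:
            continue  # skip empty bins
        rows.append((i / n_bins, (i + 1) / n_bins, count, sum_p / count, sum_y / count))
    return rows


# Made-up predicted survival probabilities and outcomes (1 = survived).
probs = [0.10, 0.15, 0.30, 0.35, 0.38, 0.36, 0.70, 0.90]
survived = [0, 0, 0, 1, 0, 1, 1, 1]

rows = calibration_table(survived, probs)
for low, high, count, mean_pred, observed in rows:
    print(f"{low:.1f}-{high:.1f}: n={count} predicted={mean_pred:.2f} observed={observed:.2f}")
```

In the 0.2-0.4 bin the mean predicted survival is about 0.35 but half of those patients survived, which is the kind of gap a calibration check is meant to surface.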
-
field test for a predictive model (questions to ask) (6)
- what would the intervention be?
- who dispenses the intervention?
- what is the threshold for action - i.e. what level of false positives is acceptable in the predictions?
- what is the outcome we track - consult rates, time between AD setup and death
- what is the performance measure of the prediction - physician agreement, useful consults, actual accuracy
- what would be the mechanics of dispensing the intervention - what is the capacity to intervene
-
goals of data mining clinical text (7)
- biosurveillance
- automatic terminology management
- decision support
- automatic deidentification
- document coding
- cohort building
- discover new knowledge (text-mining)
-
biosurveillance
monitor next outbreak
-
automatic terminology management
add disease terms that are used but not in your list
-
decision support
recommend treatments
-
automatic deidentification
NLP challenges
-
document coding
query and reporting
-
cohort building
enable clinical research
-
discover new knowledge (text-mining)
make discoveries using informatics tools
-
key issues with data mining using EHR data (4)
- haiku of acronyms: ungrammatical, misspellings, concatenations, acronyms, abbreviations
- high variance in quality: clear communication (radiology reports) vs. documentation (progress notes)
- lot of copy-pasting: institution-specific template use
- ridiculous amount of agony in getting access: fear, misunderstanding, and confusion around security, privacy, de-identification, and anonymization
-
common sources of bias in EHR data (6)
- insurance level - patients without coverage are less likely to seek professional care
- misdiagnosis, miscoding of drugs
- incomplete record keeping
- miscoding of diagnosis
- loss to follow-up
- incomplete/false record linkage
-
Things to worry about with data mining in EHR data (4)
- repeated observations
- irregular time intervals
- large number of (sparse) features
- timing - ordering of events is crucial, different questions require different time scales
-
learning health care system
- using data to directly impact point-of-care
- i.e. physicians look for evidence from other patients in the EHR and apply it to the patient's case
- allows researchers to do studies on the fly
- not adopted yet at the point of care
- STRIDE should be used for research, not patient care - but something like this would be a good application of the learning health system
-
what are national efforts to enable reuse of electronic health data for research?
- OMOP (Observational Medical Outcomes Partnership)
- OHDSI (Observational Health Data Sciences and Informatics)
- i2b2 (Informatics for Integrating Biology and the Bedside)