quals: clinical data mining part V

  1. tips for predictive modeling project: data cleanup will take about ___% of the time
    80% - do not take short cuts here
  2. tips for predictive modeling project: try _____ things first
    simple
  3. tips for predictive modeling project: don't get fooled by AUC, examine...
    precision-recall, calibration, net reclassification
  4. tips for predictive modeling project:  ask whether more ____ and/or ____ will increase performance, and whether ____ from different models are correlated.
    • data
    • features
    • errors
  5. tips for predictive modeling project: don't get attached to...
    one model
  6. tips for predictive modeling project:  think about model deployment (2)
    • ease of applying the model
    • cost of taking action
  7. learning algorithms with bigger search spaces have ____ bias, but ____ variance.
    • less bias
    • more variance
  8. shrinking the search space reduces the ____, at the cost of increasing ____
    • variance
    • bias
  9. ways to evaluate predictive models
    • AUROC
    • AUPRC
    • train, test, validation sets
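The AUROC above has a useful rank interpretation: it is the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one. A minimal stdlib-only sketch (not from the card; the function name and example scores are my own, via the Mann-Whitney U formulation):

```python
def auroc(scores, labels):
    """Rank-based AUROC; tied scores receive the average of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            j += 1
        avg = (i + j + 1) / 2.0          # average 1-based rank of the tie group
        for k in range(i, j):
            ranks[order[k]] = avg
        i = j
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

Here one positive (0.35) is outranked by one negative (0.4), so 3 of the 4 positive/negative pairs are ordered correctly: 0.75.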
  10. error metrics: regression (2)
    • mean-squared error
    • absolute error
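Both regression metrics average the residuals, differing only in how large errors are weighted. A minimal stdlib-only sketch (not from the card; values are made up for illustration):

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared residuals; penalizes large errors heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Average of absolute residuals; more robust to outliers than MSE."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 2.0]
y_pred = [2.5, 5.0, 4.0]
print(mean_squared_error(y_true, y_pred))   # (0.25 + 0 + 4) / 3
print(mean_absolute_error(y_true, y_pred))  # (0.5 + 0 + 2) / 3
```

Note how the single large residual (2.0) dominates the MSE but not the MAE.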
  11. error metrics: classification (class labels) (3)
    • misclassification error
    • Cohen's kappa
    • sensitivity, specificity, etc.
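A minimal stdlib-only sketch of the three class-label metrics above, for binary labels coded 0/1 (not from the card; the example labels are made up). Cohen's kappa corrects raw agreement for the agreement expected by chance:

```python
def misclassification_error(y_true, y_pred):
    """Fraction of predictions that disagree with the true label."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn), tn / (tn + fp)

def cohens_kappa(y_true, y_pred):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    labels = set(y_true) | set(y_pred)
    p_e = sum((list(y_true).count(l) / n) * (list(y_pred).count(l) / n)
              for l in labels)
    return (p_o - p_e) / (1 - p_e)

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
print(misclassification_error(y_true, y_pred))   # 2/6
print(sensitivity_specificity(y_true, y_pred))   # (2/3, 2/3)
print(cohens_kappa(y_true, y_pred))              # 1/3
```

Here raw agreement is 4/6, but a coin-flip predictor with the same marginals would agree half the time, so kappa drops to 1/3.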
  12. error metrics: classification (probability) (2)
    • AUCs
    • calibration
  13. error metrics: discrimination
    • distinguish between those who will die from those who will survive
    • two bins: dead or alive
  14. error metrics: calibration
    • fine-grained accuracies at different levels of risk
    • e.g. what fraction of patients placed by the score into the 35-40% survival bin actually survived?
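The binning check in the card can be sketched directly. A minimal stdlib-only sketch (not from the card; the function name, scores, and outcomes are my own): bin patients by predicted risk, then compare each bin's mean predicted probability with the observed outcome rate.

```python
def calibration_bins(scores, outcomes, n_bins=4):
    """Return ((lo, hi), mean_predicted, observed_fraction) per non-empty bin."""
    binned = {}
    for s, y in zip(scores, outcomes):
        b = min(int(s * n_bins), n_bins - 1)   # clamp s == 1.0 into the top bin
        binned.setdefault(b, []).append((s, y))
    rows = []
    for b in sorted(binned):
        pairs = binned[b]
        mean_pred = sum(s for s, _ in pairs) / len(pairs)
        observed = sum(y for _, y in pairs) / len(pairs)
        rows.append(((b / n_bins, (b + 1) / n_bins), mean_pred, observed))
    return rows

scores   = [0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.90, 0.95]
outcomes = [0,    0,    0,    1,    1,    0,    1,    1]
for (lo, hi), mean_pred, observed in calibration_bins(scores, outcomes):
    print(f"risk {lo:.2f}-{hi:.2f}: predicted {mean_pred:.2f}, observed {observed:.2f}")
```

A well-calibrated score shows observed fractions close to the predicted means in every bin; a model can have a high AUC and still fail this check.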
  15. error metrics: underfit
    • high bias
    • low variance
  16. error metrics: overfit
    • low bias
    • high variance
  17. field test for a predictive model (question to ask) (6)
    • what would the intervention be?
    • who dispenses the intervention?
    • what is the threshold for action - i.e. what level of false positives is acceptable in the predictions?
    • what is the outcome we track - consult rates, time between AD setup and death
    • what is the performance measure of the prediction - physician agreement, useful consults, actual accuracy
    • what would be the mechanics of dispensing the intervention - what is the capacity to intervene
  18. goals of data mining clinical text (7)
    • biosurveillance
    • automatic terminology management
    • decision support
    • automatic deidentification
    • document coding
    • cohort building
    • discover new knowledge (text-mining)
  19. biosurveillance
    monitor next outbreak
  20. automatic terminology management
    add disease terms that are used but not in your list
  21. decision support
    recommend treatments
  22. automatic deidentification
    NLP challenges
  23. document coding
    query and reporting
  24. cohort building
    enable clinical research
  25. discover new knowledge (text-mining)
    make discoveries using informatics tools
  26. key issues with data mining using EHR data (4)
    • haiku of acronyms: ungrammatical, misspellings, concatenations, acronyms, abbreviations
    • high variance in quality: clear communication (radiology reports) vs. documentation (progress notes)
    • lot of copy-pasting: institution-specific template use
    • ridiculous amount of agony in getting access: fear, misunderstanding, and confusion around security, privacy, de-identification, and anonymization
  27. common sources of bias in EHR data (6)
    • insurance level - patients without coverage are less likely to seek professional care
    • misdiagnosis, miscoding of drugs
    • incomplete record keeping
    • miscoding of diagnosis
    • loss to follow-up
    • incomplete/false record linkage
  28. Things to worry about with data mining in EHR data (4)
    • repeated observations
    • irregular time intervals
    • large number of (sparse) features
    • timing - ordering of events is crucial, different questions require different time scales
  29. learning health care system
    • using data to directly impact point-of-care
    • i.e. a physician looks for evidence from similar patients in the EHR and applies it to the current patient's case
    • allows researchers to do studies on the fly
    • not adopted yet at the point of care
    • STRIDE should be used for research, not patient care - but something like this would be a good application of the learning health system
  30. what are national efforts to enable reuse of electronic health data for research?
    • OMOP (observational medical outcomes partnership)
    • OHDSI (Observational Health Data Sciences and Informatics)
    • I2B2
Author
tulipyoursweety