Text & Web Mining B - Exam III

  1. establish the corpus
    step in the text mining process in which documents are collected, transformed and organized
  2. create the term-document matrix
    step in the text mining process in which you count how many times each term of interest occurs in each of the documents gathered
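A minimal sketch of counting terms per document, to make the card above concrete. The toy corpus, the term list, and the helper name `term_document_matrix` are made up for illustration; real pipelines usually add stemming and stop-word removal first.

```python
from collections import Counter

# Hypothetical toy corpus and term list (not from the course material).
docs = {
    "d1": "text mining finds patterns in text",
    "d2": "web mining finds patterns in web data",
}
terms = ["text", "web", "mining", "patterns"]


def term_document_matrix(docs, terms):
    """Return {doc_id: [occurrences of each term in that document]}."""
    matrix = {}
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())  # naive whitespace tokenizer
        matrix[doc_id] = [counts[t] for t in terms]
    return matrix


tdm = term_document_matrix(docs, terms)
for doc_id, row in tdm.items():
    print(doc_id, row)
```

Each row is one document and each column one term, matching the "documents by terms" layout used in the SVD card.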
  3. extract the knowledge - output is clustering models & visualization
    step in the text mining process in which something is done with the information gathered
  4. log frequencies - use logs
    binary frequencies - use 1's & 0's
    inverse document frequencies - specificity & frequency of words (this is the most common)
    common normalization methods for representing the indices in a term-document matrix
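The three normalizations can be sketched as small functions. This assumes one common textbook formulation (log frequency as 1 + log(count), and inverse document frequency as the log frequency weighted by log(N / n), where N is the number of documents and n the number containing the term); other sources define these slightly differently, and the function names are illustrative.

```python
import math


def log_frequency(count):
    """Dampen raw counts: 1 + ln(count) for count > 0, else 0."""
    return 1 + math.log(count) if count > 0 else 0.0


def binary_frequency(count):
    """Record only whether the term occurs: 1 or 0."""
    return 1 if count > 0 else 0


def inverse_document_frequency(count, n_docs, docs_with_term):
    """Weight the log frequency by how rare the term is across documents."""
    if count == 0:
        return 0.0
    return (1 + math.log(count)) * math.log(n_docs / docs_with_term)
```

Under this formulation a term that appears in every document gets a weight of 0, because log(N / N) = 0, while a term found in few documents is weighted up.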
  5. singular value decomposition (SVD)
    reduces the overall dimensionality of the input matrix (number of input documents by number of extracted terms) to a lower dimensional space, where each consecutive dimension represents the largest degree of variability possible
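A rough sketch of the reduction using NumPy's `numpy.linalg.svd`. The 4-document by 3-term matrix and the choice of k = 2 are assumptions for illustration only.

```python
import numpy as np

# Hypothetical count matrix: rows are documents, columns are terms.
A = np.array([
    [2.0, 0.0, 1.0],
    [0.0, 2.0, 1.0],
    [1.0, 1.0, 0.0],
    [3.0, 0.0, 2.0],
])

# Singular values in s come back sorted from largest to smallest,
# so the first dimensions capture the most variability.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2  # number of dimensions to keep (an arbitrary choice here)
docs_reduced = U[:, :k] * s[:k]       # each document in k-dimensional space
A_approx = docs_reduced @ Vt[:k, :]   # rank-k approximation of A

print(docs_reduced.shape)
```

Keeping only the first k dimensions gives each document a short coordinate vector while dropping the directions with the least variability.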
  6. classification
    clustering
    association
    trend analysis
    four main categories of knowledge extraction
  7. knowledge-engineering & machine-learning
    two main approaches to classification
  8. classification
    categorization of a given data instance into a predetermined set of categories
  9. clustering
    an unsupervised process whereby objects are classified into "natural" groups
  10. scatter/gather & query-specific clustering
    two popular clustering methods
  11. association
    process of identifying the frequent sets that go together quantified by two basic measures of support and confidence
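Support and confidence can be computed directly from a list of item sets. A small sketch, assuming the standard definitions (support = share of transactions containing the set; confidence of A -> B = support(A and B) / support(A)); the transactions are invented for illustration.

```python
# Hypothetical transactions: each is the set of terms found in one document.
transactions = [
    {"text", "mining", "web"},
    {"text", "mining"},
    {"web", "mining"},
    {"text", "web"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated probability of consequent given antecedent."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # antecedent never occurs, so the rule has no evidence
    both = support(set(antecedent) | set(consequent), transactions)
    return both / base


print(support({"text", "mining"}, transactions))
print(confidence({"text"}, {"mining"}, transactions))
```

Here "text" and "mining" occur together in 2 of 4 transactions, and "mining" appears in 2 of the 3 transactions that contain "text".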
  12. trend analysis
    analysis in which different collections lead to different concept distributions for the same set of concepts
  13. web mining (or web data mining)
    the process of discovering intrinsic relationships from web data (text, linkage, or usage)
  14. the web
    ____ is one of the largest data and text repositories
  15. web content mining
    web structure mining
    web usage mining
    three main areas of web mining
  16. web content mining
    refers to the extraction of useful information from Web pages (often unstructured textual content)
  17. web crawlers
    used to read through the content of a Web site automatically
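One core step of a crawler is pulling the links out of a page so they can be visited next. A minimal sketch using only the Python standard library; the HTML snippet, the URLs, and the `LinkExtractor` class name are hypothetical, and a real crawler would also fetch pages, track visited URLs, and respect robots.txt.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


# Hypothetical page content; no network access is performed.
page = '<p><a href="/about">About</a> <a href="https://example.org/x">X</a></p>'
extractor = LinkExtractor("https://example.com/index.html")
extractor.feed(page)
print(extractor.links)
```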
  18. authoritative pages
    pages with multiple links leading to them; often respected companies or services such as the Mayo Clinic or the Department of Education - usually analyzed by their incoming links
  19. hubs
    pages that exist primarily to redirect users to other pages (often for a fee) - usually analyzed by their outgoing links
  20. hyperlink-induced topic search (HITS)
    algorithm that analyzes a page's incoming and outgoing links to score its potential as a hub or authority
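A simplified sketch of the HITS iteration: a page's authority score is the sum of the hub scores of pages linking to it, and its hub score is the sum of the authority scores of pages it links to, with both normalized each round. The link graph, page names, and iteration count are assumptions for illustration, and the query-specific subgraph step of the full algorithm is omitted.

```python
def hits(links, iterations=20):
    """Return (hub, authority) score dicts for a {page: [linked pages]} graph."""
    out = {p: set(targets) for p, targets in links.items()}
    pages = set(out) | {t for targets in out.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: total hub score of the pages that link to this page.
        auth = {p: sum(hub[q] for q in pages if p in out.get(q, ()))
                for p in pages}
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub: total authority score of the pages this page links to.
        hub = {p: sum(auth[t] for t in out.get(p, ())) for p in pages}
        norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth


# Hypothetical link graph.
links = {
    "portal": ["clinic", "school"],
    "directory": ["clinic"],
    "clinic": [],
    "school": [],
}
hub, auth = hits(links)
print(max(auth, key=auth.get), max(hub, key=hub.get))
```

In this toy graph "clinic" receives the most incoming links and ends up the top authority, while "portal" points to the most authorities and ends up the top hub.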
  21. web structure mining
    the process of extracting useful information from the links embedded in Web documents; used to identify authoritative pages and hubs
  22. web usage mining
    the extraction of useful information from data generated through Web page visits and transactions
  23. automatically generated data - logs etc.
    user profiles
    metadata
    types of data generated through web page visits
  24. clickstream analysis
    the process of using Web-available data to better understand users
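One simple form of clickstream analysis is counting which page-to-page moves users make most often. A small sketch, assuming the click log has already been reduced to time-ordered (user, page) pairs; the log entries are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical click log in time order: (user id, page visited).
clicks = [
    ("u1", "/home"),
    ("u1", "/products"),
    ("u2", "/home"),
    ("u1", "/cart"),
    ("u2", "/products"),
]

# Rebuild each user's path through the site.
paths = defaultdict(list)
for user, page in clicks:
    paths[user].append(page)

# Count consecutive page-to-page transitions across all users.
transitions = Counter()
for pages in paths.values():
    transitions.update(zip(pages, pages[1:]))

print(transitions.most_common(1))
```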
  25. 1. determine the lifetime value of clients
    2. design cross-marketing strategies across products
    3. evaluate promotional campaigns
    4. target electronic ads & coupons at user groups based on user access patterns
    5. predict user behavior
    6. present dynamic information to users based on their interests & profiles
    applications of Web mining
Author
mjweston
Description
Big Data