Text & Web Mining B - Exam III

establish the corpus

step in the text mining process in which documents are collected, transformed and organized
create the term-document matrix

step in the text mining process in which it is determined how many times the terms you're looking for exist in the documents that you've gathered
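
As a rough illustration of this step, the sketch below counts how often each term of interest occurs in each document. The three-document corpus, the term list, and the function name `term_document_matrix` are invented for this example; real pipelines usually add stemming, stop-word removal, and better tokenization.

```python
from collections import Counter

# Hypothetical corpus, invented for illustration.
documents = {
    "doc1": "text mining finds patterns in text",
    "doc2": "web mining finds patterns in web data",
    "doc3": "clustering groups similar documents",
}
# Terms of interest (also an assumption of this sketch).
terms = ["text", "web", "mining", "patterns"]

def term_document_matrix(docs, terms):
    """Return {doc_name: [raw count of each term in that document]}."""
    matrix = {}
    for name, body in docs.items():
        # Naive tokenization: lowercase and split on whitespace.
        counts = Counter(body.lower().split())
        matrix[name] = [counts[t] for t in terms]
    return matrix

tdm = term_document_matrix(documents, terms)
for name, row in tdm.items():
    print(name, row)
```

Each row is one document and each column is one term, matching the documents-by-terms layout described in the SVD card.
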
extract the knowledge - output is clustering models & visualization

step in the text mining process in which something is done with the information gathered
log frequencies - use logs
binary frequencies - use 1's & 0's
inverse document frequencies - specificity & frequency of words (this is the most common)

common normalization methods for representing the indices in a term-document matrix
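
The three methods can be sketched as small functions applied to a raw count `tf`. The formulas below are one common formulation, not necessarily the exact ones used in the course; the function names and the parameters `n_docs` (number of documents) and `doc_freq` (number of documents containing the term) are assumptions of this sketch.

```python
import math

def log_frequency(tf):
    # Dampen raw counts: 1 + log(tf) for tf > 0, otherwise 0.
    return 1 + math.log(tf) if tf > 0 else 0.0

def binary_frequency(tf):
    # Record only whether the term occurs at all.
    return 1 if tf > 0 else 0

def inverse_document_frequency(tf, n_docs, doc_freq):
    # Log-dampened count weighted by how rare the term is across documents.
    if tf == 0 or doc_freq == 0:
        return 0.0
    return (1 + math.log(tf)) * math.log(n_docs / doc_freq)

# A term that appears 3 times in a document and in 2 of 10 documents overall.
print(log_frequency(3), binary_frequency(3), inverse_document_frequency(3, 10, 2))
```

A term that occurs in every document gets an inverse document frequency of zero, which is the sense in which this weighting rewards specificity.
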
singular value decomposition (SVD)

reduces the overall dimensionality of the input matrix (number of input documents by number of extracted terms) to a lower-dimensional space, where each consecutive dimension represents the largest possible degree of remaining variability (between words and documents)
classification
clustering
association
trend analysis

four main categories of knowledge extraction
knowledge-engineering & machine-learning

two main approaches to classification
classification

categorization of a given data instance into a predetermined set of categories
clustering

an unsupervised process whereby objects are classified into "natural" groups
scatter/gather & query-specific clustering

two popular clustering methods
association

process of identifying the frequent sets that go together, quantified by two basic measures: support and confidence
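
The two measures can be sketched as follows, using the usual definitions: support is the fraction of records containing an itemset, and confidence is the support of the combined itemset divided by the support of the antecedent. The `transactions` data (sets of concepts per document) and the function names are hypothetical.

```python
# Hypothetical "transactions": the set of concepts found in each document.
transactions = [
    {"text", "mining", "corpus"},
    {"text", "mining"},
    {"web", "mining"},
    {"text", "corpus"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = support(set(antecedent) | set(consequent), transactions)
    base = support(antecedent, transactions)
    return both / base if base else 0.0

print(support({"text", "mining"}, transactions))
print(confidence({"text"}, {"mining"}, transactions))
```
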
trend analysis

analysis in which different collections lead to different concept distributions for the same set of concepts
web mining (or web data mining)

the process of discovering intrinsic relationships from web data (text, linkage, or usage)
the web

____ is one of the largest data and text repositories
web content mining
web structure mining
web usage mining

three main areas of web mining
web content mining

refers to the extraction of useful information from Web pages (often unstructured textual content)
web crawlers

used to read through the content of a Web site automatically
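
A minimal sketch of the idea, under simplifying assumptions: instead of fetching pages over the network, it walks a hypothetical in-memory `site` dictionary mapping URLs to HTML, following links breadth-first. The class `LinkExtractor` and the function `crawl` are illustrative names; a real crawler would also handle fetching, robots.txt, politeness delays, and errors.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))

# Hypothetical site held in memory (URL -> HTML), invented for illustration.
site = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "https://example.com/a": '<a href="/b">B</a>',
    "https://example.com/b": '<a href="/">home</a>',
}

def crawl(site, start):
    """Visit pages breadth-first from start; return URLs in visit order."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        url = queue.popleft()
        order.append(url)
        parser = LinkExtractor(url)
        parser.feed(site.get(url, ""))
        for link in parser.links:
            # Only follow links to known pages not yet queued.
            if link in site and link not in seen:
                seen.add(link)
                queue.append(link)
    return order

print(crawl(site, "https://example.com/"))
```
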
authoritative pages

pages with multiple links leading to them; often respected companies or services such as the Mayo Clinic or the Department of Education - usually analyzed by their incoming links
hubs

pages that exist primarily to redirect users to other pages (often for a fee) - usually analyzed by their outgoing links
hyperlink-induced topic search (HITS)

algorithm that analyzes a page's incoming and outgoing links to score its potential as an authority or a hub
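
A simplified sketch of the iterative scoring behind HITS: a page's authority score is the sum of the hub scores of pages linking to it, and its hub score is the sum of the authority scores of pages it links to, with normalization after each round. The link graph `links`, the page names, and the fixed iteration count are assumptions made for illustration.

```python
import math

def hits(graph, iterations=20):
    """graph maps each page to the list of pages it links to."""
    pages = set(graph) | {p for targets in graph.values() for p in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of pages linking in.
        auth = {p: 0.0 for p in pages}
        for src, targets in graph.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub update: sum the authority scores of pages linked to.
        hub = {p: sum(auth[d] for d in graph.get(p, [])) for p in pages}
        # Normalize both vectors so the scores stay bounded.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

# Hypothetical link graph, invented for illustration.
links = {
    "portal": ["clinic", "agency"],
    "directory": ["clinic", "agency"],
    "blog": ["clinic"],
    "clinic": [],
    "agency": [],
}
hub, auth = hits(links)
print(max(auth, key=auth.get), max(hub.values()))
```

In this toy graph the page with the most incoming links ends up with the top authority score, while pages that only link outward score as hubs.
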
web structure mining

the process of extracting useful information from the links embedded in Web documents used to identify authoritative pages and hubs
web usage mining

the extraction of useful information from data generated through Web page visits and transactions
automatically generated data - logs etc.
user profiles
metadata

types of data generated through web page visits
clickstream analysis

the process of analyzing the record of pages users visit and links they click (the clickstream) to better understand users
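
One simple form of this analysis, sketched under assumptions: order each user's clicks by time and count how often one page is followed by another. The `clicks` records (user, timestamp in seconds, page) and the function name `transitions` are hypothetical; real Web logs need parsing and session splitting first.

```python
from collections import Counter, defaultdict

# Hypothetical click records: (user, timestamp_seconds, page).
clicks = [
    ("u1", 0, "/home"), ("u1", 30, "/products"), ("u1", 90, "/cart"),
    ("u2", 10, "/home"), ("u2", 50, "/products"),
]

def transitions(clicks):
    """Count page-to-page moves within each user's time-ordered path."""
    paths = defaultdict(list)
    for user, ts, page in sorted(clicks, key=lambda c: (c[0], c[1])):
        paths[user].append(page)
    counts = Counter()
    for pages in paths.values():
        # Pair each page with the page visited immediately after it.
        counts.update(zip(pages, pages[1:]))
    return counts

counts = transitions(clicks)
for (src, dst), n in counts.most_common():
    print(src, "->", dst, n)
```
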
1. determine the lifetime value of clients
2. design cross-marketing strategies across products
3. evaluate promotional campaigns
4. target electronic ads & coupons at user groups based on user access patterns
5. predict user behavior
6. present dynamic information to users based on their interests & profiles

applications of Web mining

Author

mjweston

242028

Description

Big Data

Updated

2013-10-22T03:07:34Z
