@HaomingJiang 2016-08-03T08:08:39.000000Z

Chp7 Extended Association Analysis

Introduction to Data Mining: Notes

- 7.1 Categorical Attribute
- 7.2 Continuous Attributes

7.1 Categorical Attribute

Introduce a new “item” for each distinct attribute-value pair

Example: replace the Browser Type attribute with one binary item per value:
- Browser Type = Internet Explorer
- Browser Type = Mozilla
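
The transformation can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `binarize` helper, the `sessions` records, and the `Buy` attribute values are assumptions made up for the example.

```python
def binarize(records):
    """Turn each record (a dict of attribute -> value) into a set of
    'attribute=value' items, one item per distinct attribute-value pair."""
    return [{f"{attr}={value}" for attr, value in rec.items()} for rec in records]


# Hypothetical web-session records; names and values are illustrative only.
sessions = [
    {"Browser Type": "Internet Explorer", "Buy": "No"},
    {"Browser Type": "Mozilla", "Buy": "Yes"},
    {"Browser Type": "Mozilla", "Buy": "No"},
]

transactions = binarize(sessions)

# The item universe is the union of all items seen in the transactions.
items = sorted(set().union(*transactions))
print(items)
```

Each record becomes an ordinary market-basket transaction, so any standard frequent itemset algorithm can be applied to `transactions` unchanged.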

Potential issues:
- What if the attribute has many possible values?
  - Example: the attribute country has more than 200 possible values.
  - Many of the attribute values may have very low support.
  - Potential solution: aggregate the low-support attribute values.
- What if the distribution of attribute values is highly skewed?
  - Example: 95% of the visitors have Buy = No.
  - Most of the items will be associated with the (Buy = No) item.
  - Potential solution: drop the highly frequent items.
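
Both potential solutions can be sketched as simple preprocessing steps on the binarized transactions. The function names, the `Other` bucket, the thresholds, and the sample data below are assumptions chosen for illustration; in practice the thresholds would be tuned to the data set.

```python
from collections import Counter


def support(transactions):
    """Fraction of transactions that contain each item."""
    n = len(transactions)
    counts = Counter(item for t in transactions for item in t)
    return {item: c / n for item, c in counts.items()}


def aggregate_rare(transactions, attr, min_support):
    """Replace low-support 'attr=value' items with one 'attr=Other' item."""
    sup = support(transactions)
    prefix = attr + "="
    rare = {i for i, s in sup.items() if i.startswith(prefix) and s < min_support}
    return [{prefix + "Other" if i in rare else i for i in t} for t in transactions]


def drop_frequent(transactions, max_support):
    """Remove items whose support exceeds max_support."""
    sup = support(transactions)
    common = {i for i, s in sup.items() if s > max_support}
    return [t - common for t in transactions]


# Hypothetical data: a skewed Buy attribute and two rare countries.
data = [
    {"Country=US", "Buy=No"},
    {"Country=US", "Buy=No"},
    {"Country=US", "Buy=No"},
    {"Country=Fiji", "Buy=No"},
    {"Country=Malta", "Buy=Yes"},
]

cleaned = drop_frequent(aggregate_rare(data, "Country", 0.3), 0.7)
print(cleaned)
```

Here `Country=Fiji` and `Country=Malta` (support 0.2 each) are merged into `Country=Other`, and `Buy=No` (support 0.8) is dropped because it would otherwise appear in most rules.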

7.2 Continuous Attributes

7.2.1 Discretization-based

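It is hard to choose good discretization intervals: intervals that are too wide can lower the confidence of the resulting rules, while intervals that are too narrow can leave them without enough support.
One option is to consider all possible intervals, but this is computationally expensive and generates many redundant rules.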

(PS: A more advanced approach was proposed by Srikant & Agrawal.)
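
To make the cost of "all possible intervals" concrete, the sketch below uses equal-width base intervals and then enumerates every interval formed by merging adjacent ones. Equal-width binning, the helper names, and the `Age` attribute are assumptions for illustration; this is not the Srikant & Agrawal method.

```python
def equal_width_bins(lo, hi, k):
    """Split [lo, hi) into k equal-width base intervals."""
    width = (hi - lo) / k
    return [(lo + i * width, lo + (i + 1) * width) for i in range(k)]


def all_merged_intervals(bins):
    """Every interval obtained by merging one or more adjacent base intervals."""
    return [
        (bins[i][0], bins[j][1])
        for i in range(len(bins))
        for j in range(i, len(bins))
    ]


def to_item(attr, value, bins):
    """Map a continuous value to the item of the base interval containing it."""
    for lo, hi in bins:
        if lo <= value < hi:
            return f"{attr} in [{lo:g}, {hi:g})"
    return None  # value falls outside the discretized range


bins = equal_width_bins(0, 60, 4)      # hypothetical Age range, 4 base bins
candidates = all_merged_intervals(bins)

print(to_item("Age", 23, bins))
print(len(candidates))                 # k * (k + 1) / 2 candidate intervals
```

With k base intervals there are k(k+1)/2 candidate intervals, so the item count grows quadratically. Every wide interval also contains narrower ones, which is where the redundant rules come from.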

7.2.2 Statistics-based

- Withhold the target variable from the rest of the data.
- Apply an existing frequent itemset generation algorithm to the remaining attributes.
- For each frequent itemset, compute descriptive statistics of the target variable over the transactions that contain it. The frequent itemset becomes a rule by introducing the target variable as the rule consequent.
- Apply a statistical test to determine whether the rule is interesting (e.g. a t-test on the mean of the target variable).
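
The last two steps can be sketched as follows, assuming a frequent itemset has already been found by some other routine. The code compares the mean of the target variable in transactions covered by the itemset against the mean in the remaining transactions, using an unequal-variance two-sample statistic. The record layout, the `Age` target, the sample values, and the `delta` parameter (a minimum difference of interest) are all assumptions; turning the statistic into a p-value is left out.

```python
from math import sqrt
from statistics import mean, variance


def rule_statistic(records, itemset, target, delta=0.0):
    """Compare the target's mean inside and outside the itemset's cover.

    Returns (mean_covered, mean_rest, z), where
    z = (mean_covered - mean_rest - delta) / sqrt(s1/n1 + s2/n2)
    and s1, s2 are sample variances.
    """
    covered = [r[target] for r in records if itemset <= r["items"]]
    rest = [r[target] for r in records if not itemset <= r["items"]]
    mu1, mu2 = mean(covered), mean(rest)
    s1, s2 = variance(covered), variance(rest)
    z = (mu1 - mu2 - delta) / sqrt(s1 / len(covered) + s2 / len(rest))
    return mu1, mu2, z


# Hypothetical records: the target variable Age is kept apart from the items.
records = [
    {"items": {"Browse Online=Yes", "Buy=Yes"}, "Age": 22},
    {"items": {"Browse Online=Yes", "Buy=Yes"}, "Age": 24},
    {"items": {"Browse Online=Yes", "Buy=Yes"}, "Age": 26},
    {"items": {"Browse Online=No"}, "Age": 38},
    {"items": {"Browse Online=No", "Buy=Yes"}, "Age": 40},
    {"items": {"Browse Online=No"}, "Age": 42},
]

itemset = {"Browse Online=Yes", "Buy=Yes"}
mu1, mu2, z = rule_statistic(records, itemset, "Age")
print(mu1, mu2, round(z, 3))
```

A large absolute value of `z` suggests that the rule `{Browse Online=Yes, Buy=Yes} -> Age: mean = mu1` describes a group whose mean differs from the rest of the data, which is what makes the rule interesting.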