@HaomingJiang 2016-08-03T08:08:39.000000Z

# Chapter 7: Extended Association Analysis

Notes on *Introduction to Data Mining*

## 7.1 Categorical Attributes

Introduce a new “item” for each distinct attribute-value pair

Example: replace the Browser Type attribute with one binary item per value:
• Browser Type = Internet Explorer
• Browser Type = Mozilla
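
The attribute-to-item mapping can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the helper name `to_items` and the two sample records are hypothetical and only serve to show the shape of the transformation.

```python
def to_items(record):
    """Turn each attribute-value pair of a record into one 'item' string."""
    return {f"{attr} = {value}" for attr, value in record.items()}


# Hypothetical web-log records with two categorical attributes.
records = [
    {"Browser Type": "Internet Explorer", "Buy": "No"},
    {"Browser Type": "Mozilla", "Buy": "Yes"},
]

# Each record becomes a transaction (a set of items) that a standard
# frequent itemset algorithm can consume.
transactions = [to_items(r) for r in records]
```

Representing a transaction as a set keeps the items unordered and makes subset checks (is this itemset contained in the transaction?) straightforward.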

Potential issues:

What if an attribute has many possible values? For example, the attribute country has more than 200 possible values, so many of the attribute values may have very low support. Potential solution: aggregate the low-support attribute values.
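
One possible way to aggregate low-support values is to fold every value below a support threshold into a single catch-all category. The sketch below assumes that approach; the function name `aggregate_rare`, the `"Other"` label, the threshold, and the sample data are all made up for illustration.

```python
from collections import Counter


def aggregate_rare(values, min_support, other="Other"):
    """Replace values whose relative frequency is below min_support with `other`."""
    counts = Counter(values)
    n = len(values)
    return [v if counts[v] / n >= min_support else other for v in values]


# Hypothetical country column: two common values and one rare value.
countries = ["US"] * 6 + ["CN"] * 3 + ["IS"]
merged = aggregate_rare(countries, min_support=0.2)
```

A concept hierarchy (for example, grouping countries by region) would be another reasonable way to aggregate; the catch-all category is simply the shortest to write down.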

What if the distribution of attribute values is highly skewed? For example, if 95% of the visitors have Buy = No, most of the items will be associated with the (Buy = No) item. Potential solution: drop the highly frequent items.
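
Dropping highly frequent items can be sketched as a filter applied before itemset generation. The function name `drop_frequent_items`, the `max_support` cutoff of 0.9, and the toy transactions are assumptions chosen to mirror the Buy = No example.

```python
from collections import Counter


def drop_frequent_items(transactions, max_support):
    """Remove items whose support exceeds max_support from every transaction."""
    n = len(transactions)
    counts = Counter(item for t in transactions for item in t)
    too_frequent = {item for item, c in counts.items() if c / n > max_support}
    return [t - too_frequent for t in transactions], too_frequent


# Hypothetical data: 19 of 20 visitors (95%) have Buy = No.
transactions = (
    [{"Buy = No", "Browser Type = Mozilla"}] * 10
    + [{"Buy = No", "Browser Type = Internet Explorer"}] * 9
    + [{"Buy = Yes", "Browser Type = Mozilla"}]
)
filtered, dropped = drop_frequent_items(transactions, max_support=0.9)
```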

## 7.2 Continuous Attributes

### 7.2.1 Discretization-based

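It is hard to choose the discretization intervals: intervals that are too wide may not have enough confidence, while intervals that are too narrow may not have enough support.
We could consider all possible intervals, but that is computationally expensive and generates many redundant rules.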
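
To see why trying all possible intervals gets expensive, consider merging adjacent base bins: with k base bins there are k(k+1)/2 candidate intervals per attribute, and many of them are nested inside one another, which is where the redundant rules come from. The sketch below enumerates them; the bin edges for a hypothetical Age attribute are an assumption.

```python
def all_intervals(edges):
    """Every interval obtained by merging one or more adjacent base bins."""
    return [
        (edges[i], edges[j])
        for i in range(len(edges) - 1)
        for j in range(i + 1, len(edges))
    ]


# Hypothetical Age attribute split into 4 equal-width base bins.
edges = [0, 20, 40, 60, 80]
intervals = all_intervals(edges)

# Number of candidate intervals for k base bins is k * (k + 1) / 2.
k = len(edges) - 1
expected_count = k * (k + 1) // 2
```

Each candidate interval becomes its own item, so the item count grows quadratically in the number of base bins before itemset generation even starts.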

(PS: Srikant & Agrawal proposed a more advanced approach.)

### 7.2.2 Statistics-based

• Withhold the target variable from the rest of the data.
• Apply an existing frequent itemset generation algorithm to the rest of the data.
• For each frequent itemset, compute the descriptive statistics of the corresponding target variable. (A frequent itemset becomes a rule by introducing the target variable as the rule consequent.)
• Apply a statistical test to determine the interestingness of the rule (e.g. a t-test for the mean of the target variable).
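
The last two steps can be sketched as follows, assuming the frequent itemsets have already been generated. The sketch compares the mean of the target variable for transactions covered by an itemset against the mean for the remaining transactions, using an unpooled two-sample statistic (the same form as Welch's t statistic). The function name `mean_difference_statistic`, the `delta` parameter, the Age target, and the data are all hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def mean_difference_statistic(covered, not_covered, delta=0.0):
    """(mean1 - mean2 - delta) / sqrt(s1^2/n1 + s2^2/n2), using sample variances."""
    m1, m2 = mean(covered), mean(not_covered)
    se = sqrt(
        variance(covered) / len(covered)
        + variance(not_covered) / len(not_covered)
    )
    return (m1 - m2 - delta) / se


# Hypothetical rows: (items of the transaction, withheld target value Age).
rows = [
    ({"Browser Type = Mozilla", "Buy = Yes"}, 20),
    ({"Browser Type = Mozilla", "Buy = No"}, 22),
    ({"Browser Type = Mozilla", "Buy = No"}, 24),
    ({"Browser Type = Internet Explorer", "Buy = No"}, 40),
    ({"Browser Type = Internet Explorer", "Buy = No"}, 42),
    ({"Browser Type = Internet Explorer", "Buy = Yes"}, 44),
]

itemset = {"Browser Type = Mozilla"}  # assumed to be a frequent itemset
covered = [age for items, age in rows if itemset <= items]
not_covered = [age for items, age in rows if not itemset <= items]

# Descriptive statistic for the rule {Browser Type = Mozilla} -> Age: mean = ...
rule_mean = mean(covered)
stat = mean_difference_statistic(covered, not_covered)
```

A statistic that is large in absolute value suggests the covered transactions differ from the rest on the target variable, so the rule may be interesting. Turning the statistic into a p-value requires a reference distribution (normal for large samples, Student's t otherwise), which is left out here.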
