@nanmeng
2016-05-19T08:43:41.000000Z
notes
Probabilistic_Graphical_Models
Why factors?
* Fundamental building block for defining distributions in high-dimensional spaces
* Set of basic operations for manipulating these probability distributions
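An example of a Bayesian network:
Calculation rule: the chain rule for Bayesian networks, $P(X_1, \dots, X_n) = \prod_i P(X_i \mid \mathrm{Pa}(X_i))$, where $\mathrm{Pa}(X_i)$ denotes the parents of $X_i$ in the graph.
An illustration of calculating the joint distribution:
(how to calculate it with concrete values)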
Bayesian Network:
- A directed acyclic graph (DAG)
- A CPD (conditional probability distribution) for each node, given its parents
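The chain rule and the "DAG plus one CPD per node" definition can be sketched in a few lines of Python. This is only a minimal sketch: the three-node network (`D`, `I`, `G`), the binary values, and every number in `cpds` are made up for illustration and are not taken from the pictures in these notes.

```python
from itertools import product

# Hypothetical network: D -> G <- I. All names and numbers are illustrative.
parents = {"D": (), "I": (), "G": ("D", "I")}

# cpds[node][parent_values][value] = P(node = value | parents = parent_values)
cpds = {
    "D": {(): {0: 0.6, 1: 0.4}},
    "I": {(): {0: 0.7, 1: 0.3}},
    "G": {
        (0, 0): {0: 0.3, 1: 0.7},
        (0, 1): {0: 0.1, 1: 0.9},
        (1, 0): {0: 0.7, 1: 0.3},
        (1, 1): {0: 0.4, 1: 0.6},
    },
}


def joint(assignment):
    """Chain rule: multiply P(node | parents) over all nodes."""
    p = 1.0
    for node, pa in parents.items():
        pa_values = tuple(assignment[q] for q in pa)
        p *= cpds[node][pa_values][assignment[node]]
    return p


# The joint must sum to 1 over all assignments.
total = sum(joint(dict(zip("DIG", v))) for v in product((0, 1), repeat=3))
print(joint({"D": 0, "I": 1, "G": 1}), total)
```

Each factor in the product is a lookup in one CPD, so the full joint table never has to be stored explicitly.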
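As shown below, when computing with a Bayesian network, a summation only needs to be applied to the ''part'' of the equation that involves the summed variable; the factors that do not depend on that variable can be moved outside the sum.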
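As a small worked case, assume a hypothetical chain $A \to B \to C$ (not necessarily the network in the picture). Computing the marginal of $C$ requires summing out $A$ and $B$, but the sum over $A$ only touches the factors that mention $A$:

$$
P(C) = \sum_{B} \sum_{A} P(A)\, P(B \mid A)\, P(C \mid B) = \sum_{B} P(C \mid B) \sum_{A} P(A)\, P(B \mid A)
$$

The inner sum yields $P(B)$, so the computation never has to build the full joint table over $A$, $B$, and $C$.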
An example of Genetic Inheritance
- Causal reasoning (top down)
- Evidential reasoning (bottom up)
- Intercausal reasoning (information flows between two causes of a common effect)
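Intercausal reasoning is the hardest of the three to see intuitively. An illustration:
Once the student's poor grade has been observed, learning that the student aces the SAT increases the probability that the student is intelligent, and therefore also increases the probability that the course is difficult.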
An example of an active trail:
When nothing is observed, a trail is active if it has no v-structures like $X_{i-1} \to X_i \leftarrow X_{i+1}$.
Note: a v-structure is a structure in which two nodes point to the same node (like $X \to W \leftarrow Y$).
The rules for when information can flow, with and without evidence $Z$, are shown below.
(The final line of the table in the picture above reads: $W$ and all of its descendants are not in $Z$ | either $W$ or one of its descendants is in $Z$.)
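The rules above can be sketched as a check on a single trail. This is a minimal sketch: the helper names `descendants` and `is_active` are hypothetical, the edge set is assumed to be the usual Student network (`D -> G`, `I -> G`, `I -> S`, `G -> L`), and the endpoints of the trail are assumed to be unobserved.

```python
def descendants(node, children):
    """All nodes reachable from `node` by following directed edges."""
    seen, stack = set(), list(children.get(node, ()))
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(children.get(n, ()))
    return seen


def is_active(trail, edges, observed):
    """Check whether `trail` (a list of nodes) is active given `observed`."""
    children = {}
    for u, v in edges:
        children.setdefault(u, []).append(v)
    # Look at every consecutive triple (prev, mid, nxt) along the trail.
    for prev, mid, nxt in zip(trail, trail[1:], trail[2:]):
        v_structure = (prev, mid) in edges and (nxt, mid) in edges
        if v_structure:
            # Active only if mid or one of its descendants is observed.
            if not ({mid} | descendants(mid, children)) & observed:
                return False
        elif mid in observed:
            # Chain or common parent: observing the middle node blocks it.
            return False
    return True


# Assumed Student-style network; names are illustrative.
edges = {("D", "G"), ("I", "G"), ("I", "S"), ("G", "L")}
trail = ["D", "G", "I", "S"]
print(is_active(trail, edges, set()))       # blocked by the v-structure at G
print(is_active(trail, edges, {"L"}))       # a descendant of G is observed
print(is_active(trail, edges, {"G", "I"}))  # blocked at the common parent I
```

Checking one trail is not a full d-separation test; for that, every trail between the two nodes would have to be inactive.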
- Types of three-variable structures: chain (aka causal trail or evidential trail), common parent (aka common cause), v-structure (aka common effect)
- Property: A variable is independent of its non-descendants given its parents
- Property: A variable is independent of all other variables in the network given its Markov blanket, which consists of its parents, its children, and its co-parents
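The Markov blanket in a Bayesian network can be read directly off the edge list. The function name `markov_blanket` and the example edges below are assumptions made for illustration only.

```python
def markov_blanket(node, edges):
    """Parents, children, and co-parents of `node` in a DAG given as (u, v) edges."""
    parents = {u for u, v in edges if v == node}
    children = {v for u, v in edges if u == node}
    # Co-parents: other parents of this node's children.
    co_parents = {u for u, v in edges if v in children and u != node}
    return parents | children | co_parents


# Illustrative DAG: D -> G <- I, I -> S, G -> L
edges = {("D", "G"), ("I", "G"), ("I", "S"), ("G", "L")}
print(sorted(markov_blanket("I", edges)))  # ['D', 'G', 'S']
print(sorted(markov_blanket("G", edges)))  # ['D', 'I', 'L']
```

For `I`, the co-parent `D` enters the blanket only because `D` and `I` share the child `G`.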
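Materials: Probabilistic Graphical Models 10-708 Recitation 1 Handout
Basics of conditional independence and notation
The definition of conditional independence: $X$ and $Y$ are conditionally independent given $Z$, written $(X \perp Y \mid Z)$, if $P(X, Y \mid Z) = P(X \mid Z)\, P(Y \mid Z)$.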
An example of conditional independence:
If we have not been told whether the coin is fair, then seeing heads on the first toss raises the probability of heads on the second toss, so the two tosses are dependent. Once we are told that the coin is fair, the two tosses are independent of each other: they are conditionally independent given the coin's bias.
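The coin example can be checked numerically. The sketch below assumes, purely for illustration, that the coin is either fair (heads with probability 0.5) or biased (heads with probability 0.9), each with prior probability 0.5; none of these numbers come from the notes.

```python
# Hypothetical setup: the coin is "fair" or "biased" with equal prior probability.
prior = {"fair": 0.5, "biased": 0.5}
p_head = {"fair": 0.5, "biased": 0.9}

# Marginal probability of heads on one toss, coin type unknown.
p_h1 = sum(prior[c] * p_head[c] for c in prior)

# Probability of heads on both tosses (tosses are independent given the coin type).
p_h1_h2 = sum(prior[c] * p_head[c] ** 2 for c in prior)

# Coin type unknown: the first head makes a second head more likely.
p_h2_given_h1 = p_h1_h2 / p_h1

# Coin type known to be fair: the first toss carries no information.
p_h2_given_h1_fair = p_head["fair"] ** 2 / p_head["fair"]

print(p_h1, p_h2_given_h1, p_h2_given_h1_fair)
```

With the coin type unknown, the first head shifts belief toward the biased coin, which is why the second head becomes more likely; conditioning on the coin type removes that path.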
Recap:
A new question:
- Theorem: If $P$ factorizes over $G$, and $\text{d-sep}_G(X, Y \mid Z)$, then $P$ satisfies $(X \perp Y \mid Z)$
- Red line: all the non-descendants of Letter
- The descendants of Letter are Job and Happy.
Example:
G1 is an I-map of P1, while G2 is an I-map of both P1 and P2.
I-Maps
• I-Map: A graph G is an I-map for a distribution P if $I(G) \subseteq I(P)$
• Minimal I-Map: A graph G is a minimal I-map for a distribution P if you cannot remove any edges from G and have it still be an I-map for P
• Perfect I-Map: A graph G is a perfect I-map for a distribution P if $I(G) = I(P)$
• I-Equivalence: Two graphs G1 and G2 are I-equivalent if $I(G_1) = I(G_2)$
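For two binary variables, the I-map condition $I(G) \subseteq I(P)$ can be checked by brute force. The sketch below makes several assumptions: the numbers in `p1` and `p2` are illustrative, the helper names are hypothetical, `g1` stands for the graph with no edge (it asserts that the two variables are independent), and `g2` stands for the graph with one edge (it asserts nothing).

```python
def independent(joint, tol=1e-9):
    """True if the two binary variables in joint[(x, y)] are independent."""
    px = {x: sum(joint[(x, y)] for y in (0, 1)) for x in (0, 1)}
    py = {y: sum(joint[(x, y)] for x in (0, 1)) for y in (0, 1)}
    return all(
        abs(joint[(x, y)] - px[x] * py[y]) < tol
        for x in (0, 1)
        for y in (0, 1)
    )


def is_i_map(asserts_independence, joint):
    """G is an I-map of P if every independence asserted by G holds in P."""
    return (not asserts_independence) or independent(joint)


# Illustrative joint distributions over two binary variables.
p1 = {(0, 0): 0.08, (0, 1): 0.32, (1, 0): 0.12, (1, 1): 0.48}  # factorizes
p2 = {(0, 0): 0.4, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.1}      # does not

g1, g2 = True, False  # g1 asserts X is independent of Y; g2 asserts nothing
for name, p in (("P1", p1), ("P2", p2)):
    print(name, is_i_map(g1, p), is_i_map(g2, p))
```

A graph that asserts no independencies is trivially an I-map of every distribution, which is why the minimal and perfect variants are the more informative notions.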
An example from the picture: given its parents, a node is independent of its non-descendants.
Thus, the first equation in the picture is equal to the second equation.
- What independence assumption does the Naive Bayes model make?
Given the class variable, each observed variable is independent of the other observed variables. In other words, the features are conditionally independent of one another given the class.
green: prior probabilities of two classes
blue: odds ratio
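Assuming the picture shows the usual two-class Naive Bayes comparison (the symbols $c^1$, $c^2$, and $x_1, \dots, x_n$ are notation chosen here, not recovered from the image), the two colored parts correspond to the two groups of factors in the posterior ratio:

$$
\frac{P(C = c^1 \mid x_1, \dots, x_n)}{P(C = c^2 \mid x_1, \dots, x_n)} = \frac{P(C = c^1)}{P(C = c^2)} \prod_{i=1}^{n} \frac{P(x_i \mid C = c^1)}{P(x_i \mid C = c^2)}
$$

The first fraction is the ratio of the class priors, and each term of the product is the odds ratio contributed by one observed feature. The product form follows directly from the conditional independence assumption.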
An example of Bernoulli Naive Bayes for text
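A minimal Bernoulli Naive Bayes sketch for text, written from scratch. The toy documents, the labels, the function names, and the use of Laplace (add-one) smoothing are all assumptions for illustration; they are not taken from the original example.

```python
import math


def train(docs, labels, vocab):
    """Estimate class priors and per-word Bernoulli parameters (Laplace-smoothed)."""
    prior, theta = {}, {}
    for c in sorted(set(labels)):
        class_docs = [set(d.split()) for d, y in zip(docs, labels) if y == c]
        prior[c] = len(class_docs) / len(docs)
        theta[c] = {
            w: (sum(w in d for d in class_docs) + 1) / (len(class_docs) + 2)
            for w in vocab
        }
    return prior, theta


def predict(doc, prior, theta, vocab):
    """Return the class with the highest log posterior (up to a constant)."""
    words = set(doc.split())
    scores = {}
    for c in prior:
        score = math.log(prior[c])
        for w in vocab:
            p = theta[c][w]
            # Bernoulli model: absent words contribute (1 - p) as well.
            score += math.log(p if w in words else 1.0 - p)
        scores[c] = score
    return max(scores, key=scores.get)


# Toy data, made up for illustration.
docs = ["cheap pills now", "cheap offer", "meeting at noon", "lunch meeting now"]
labels = ["spam", "spam", "ham", "ham"]
vocab = sorted({w for d in docs for w in d.split()})

prior, theta = train(docs, labels, vocab)
print(predict("cheap pills", prior, theta, vocab))   # spam
print(predict("meeting noon", prior, theta, vocab))  # ham
```

Unlike the multinomial variant, the Bernoulli model scores every vocabulary word for every document, so the absence of a word is also evidence.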
- Markov Random Fields
- 3.1 Independencies in MRFs
- Two variables $X$ and $Y$ are independent given the observed variables if there is no active trail between them; a trail is active if it doesn't contain any observed variables.
- Property: A variable is independent of all other variables in the network given its Markov blanket, which consists of its direct neighbors in the graph.
- 3.2 Parameterization of MRFs
- Markov random fields are parameterized by a set of factors defined over cliques in the graph; factors are not distributions as they do not have to sum to 1.
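- The joint probability distribution of the variables in an MRF can be written in factorized form as a normalized product of factors, i.e. $P(X_1, \dots, X_n) = \frac{1}{Z} \prod_{i} \phi_i(D_i)$, where $D_i$ is the set of variables in the ith clique, and $Z = \sum_{X_1, \dots, X_n} \prod_{i} \phi_i(D_i)$ is the partition function.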
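The normalized product of factors can be sketched for a tiny chain-shaped MRF `A - B - C` with two pairwise cliques. The factor values and the function names below are made up for illustration; the factors deliberately do not sum to 1.

```python
from itertools import product


# Hypothetical clique factors over binary variables; they are not distributions.
def phi_ab(a, b):
    return 10.0 if a == b else 1.0


def phi_bc(b, c):
    return 5.0 if b == c else 1.0


def unnormalized(a, b, c):
    """Product of all clique factors for one full assignment."""
    return phi_ab(a, b) * phi_bc(b, c)


# Partition function: sum of the factor product over every assignment.
states = list(product((0, 1), repeat=3))
Z = sum(unnormalized(*s) for s in states)


def prob(a, b, c):
    """Joint probability as the normalized product of factors."""
    return unnormalized(a, b, c) / Z


print(Z, prob(0, 0, 0), sum(prob(*s) for s in states))
```

Computing `Z` by enumeration is exponential in the number of variables, so this only works for very small models.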
- Memory: The amount of memory required for SMO is linear in the training set size (which allows SMO to handle very large training sets)
- Speed: SMO is fastest for linear SVMs and sparse data sets.
- Relevant materials: http://research.microsoft.com/pubs/69644/tr-98-14.pdf