Intelligent Computation Research Center

XIV. Bayesian networks (Sections 1-3)
Autumn 2012
Instructor: Wang Xiaolong
Harbin Institute of Technology, Shenzhen Graduate School
Intelligent Computation Research Center (HITSGS ICRC)

Outline
- Syntax
- Semantics
- Parameterized distributions

Bayesian networks
A simple, graphical notation for conditional independence assertions, and hence for compact specification of full joint distributions.
Syntax:
- a set of nodes, one per variable
- a directed, acyclic graph (a link means "directly influences")
- a conditional distribution for each node given its parents: $P(X_i \mid Parents(X_i))$
In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over $X_i$ for each combination of parent values.

Example
The topology of the network encodes conditional independence assertions:
- Weather is independent of the other variables
- Toothache and Catch are conditionally independent given Cavity

Example
I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar?
Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
Network topology reflects "causal" knowledge:
- A burglar can set the alarm off
- An earthquake can set the alarm off
- The alarm can cause Mary to call
- The alarm can cause John to call

Example contd.
[Figure: the burglary network with its CPTs]

Compactness
- A CPT for Boolean $X_i$ with $k$ Boolean parents has $2^k$ rows for the combinations of parent values
- Each row requires one number $p$ for $X_i = true$ (the number for $X_i = false$ is just $1 - p$)
- If each variable has no more than $k$ parents, the complete network requires $O(n \cdot 2^k)$ numbers
- I.e., the size grows linearly with $n$, vs. $O(2^n)$ for the full joint distribution
- For the burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. $2^5 - 1 = 31$)

Semantics
The semantics of Bayesian networks:
- A representation of the joint probability distribution (numerical semantics)
- An encoding of a collection of conditional independence statements (topological semantics)

Numerical semantics
The full joint distribution is defined as the product of the local conditional distributions:
$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid parents(X_i))$

Topological semantics
1. Each node is conditionally independent of its non-descendants given its parents

Markov blanket
2. Each node is conditionally independent of all others given its Markov blanket: parents + children + children's parents
Theorem: Topological semantics $\Leftrightarrow$ Numerical semantics

Constructing Bayesian networks
1. Choose an ordering of variables $X_1, \ldots, X_n$
2. For $i = 1$ to $n$:
   - add $X_i$ to the network
   - select parents from $X_1, \ldots, X_{i-1}$ such that $P(X_i \mid Parents(X_i)) = P(X_i \mid X_1, \ldots, X_{i-1})$
This choice of parents guarantees:
$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \ldots, X_{i-1})$ (chain rule)
$= \prod_{i=1}^{n} P(X_i \mid Parents(X_i))$ (by construction)

Example
Suppose we choose the ordering M, J, A, B, E
- $P(J \mid M) = P(J)$? No
- $P(A \mid J, M) = P(A \mid J)$? $P(A \mid J, M) = P(A)$? No
- $P(B \mid A, J, M) = P(B \mid A)$? Yes
- $P(B \mid A, J, M) = P(B)$? No
- $P(E \mid B, A, J, M) = P(E \mid A)$? No
- $P(E \mid B, A, J, M) = P(E \mid A, B)$? Yes

Example contd.
- Deciding conditional independence is hard in noncausal directions
- (Causal models and conditional independence seem hardwired for humans!)
- The network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed

Compact conditional distributions
- A CPT grows exponentially with the number of parents ($O(2^k)$)
- A CPT becomes infinite with a continuous-valued parent or child
- Solution: canonical distributions that are defined compactly
- Deterministic nodes are the simplest case

Compact conditional distributions contd.
Noisy-OR distributions model multiple noninteracting causes:
1) Parents $U_1, \ldots, U_k$ include all causes (can add a leak node)
2) Independent failure probability $q_i$ for each cause alone
The number of parameters is linear in the number of parents ($O(k)$)

Hybrid (discrete+continuous) networks
Discrete (Subsidy? and Buys?); continuous (Harvest and Cost)
Option 1: discretization (possibly large errors, large CPTs)
Option 2: finitely parameterized canonical families
1) Continuous variable, discrete+continuous parents (e.g., Cost)
2) Discrete variable, continuous parents (e.g., Buys?)

Continuous child variables
- Need one conditional density function for the child variable given continuous parents, for each possible assignment to discrete parents
- Most common is the linear Gaussian model: mean Cost varies linearly with Harvest, variance is fixed
- Linear variation is unreasonable over the full range, but works OK if the likely range of Harvest is narrow

Continuous child variables contd.
- All-continuous network with LG distributions $\Rightarrow$ full joint distribution is a multivariate Gaussian
- Discrete+continuous LG network is a conditional Gaussian network, i.e., a multivariate Gaussian over all continuous variables for each combination of discrete variable values

Discrete variable with continuous parents
- Probability of Buys? given Cost should be a "soft" threshold
- Probit distribution uses the integral of the Gaussian

Why the probit?
1. It's sort of the right shape
2. Can view as a hard threshold whose location is subject to noise

Discrete variable contd.
- Sigmoid (or logit) distribution, also used in neural networks
- Sigmoid has a similar shape to probit but much longer tails

Summary
- Bayesian networks provide a natural representation for (causally induced) conditional independence
- Topology + CPTs = compact representation of the joint distribution
- Generally easy for domain experts to construct
- Canonical distributions (e.g., noisy-OR) = compact representation of CPTs
- Continuous variables $\Rightarrow$ parameterized distributions (e.g., linear Gaussian)
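
Worked sketch: numerical semantics on the burglary network
The numerical semantics slide defines the full joint distribution as the product of the local conditional distributions. The Python sketch below applies that definition to the burglary network. The CPT values are an assumption: they are the numbers commonly quoted for this textbook example, since the figure with the actual tables is not reproduced in the text. The names PARENTS, CPT, and joint_probability are hypothetical and chosen only for illustration.

```python
from itertools import product

# Network structure: each variable maps to the tuple of its parents.
PARENTS = {
    "Burglary": (),
    "Earthquake": (),
    "Alarm": ("Burglary", "Earthquake"),
    "JohnCalls": ("Alarm",),
    "MaryCalls": ("Alarm",),
}

# P(X = True | parent values), keyed by the tuple of parent values.
# ASSUMPTION: illustrative numbers, not taken from the slides.
CPT = {
    "Burglary": {(): 0.001},
    "Earthquake": {(): 0.002},
    "Alarm": {
        (True, True): 0.95,
        (True, False): 0.94,
        (False, True): 0.29,
        (False, False): 0.001,
    },
    "JohnCalls": {(True,): 0.90, (False,): 0.05},
    "MaryCalls": {(True,): 0.70, (False,): 0.01},
}


def joint_probability(assignment):
    """Product over all nodes of P(x_i | parents(X_i)) for a full assignment."""
    prob = 1.0
    for var, parents in PARENTS.items():
        p_true = CPT[var][tuple(assignment[p] for p in parents)]
        prob *= p_true if assignment[var] else 1.0 - p_true
    return prob


# Both neighbors call, the alarm rings, but there is no burglary or earthquake.
example = {
    "Burglary": False,
    "Earthquake": False,
    "Alarm": True,
    "JohnCalls": True,
    "MaryCalls": True,
}
print(joint_probability(example))
```

Because every row of every CPT is a proper distribution, summing joint_probability over all 32 assignments gives 1, which is a quick sanity check on the tables.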
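
Worked sketch: counting parameters for two variable orderings
The compactness and construction slides give 10 numbers for the causal burglary network, 13 for the ordering M, J, A, B, E, and 31 for the full joint distribution. The sketch below reproduces that arithmetic, assuming all variables are Boolean. The parent sets for the noncausal ordering are read off the Yes/No answers in the example; the names CAUSAL, NONCAUSAL, and cpt_numbers are hypothetical.

```python
def cpt_numbers(parents):
    """Independent numbers needed: 2**k for each Boolean node with k Boolean parents."""
    return sum(2 ** len(ps) for ps in parents.values())


def full_joint_numbers(n):
    """Independent numbers in a full joint table over n Boolean variables."""
    return 2 ** n - 1


# Causal ordering: B, E, A, J, M.
CAUSAL = {"B": (), "E": (), "A": ("B", "E"), "J": ("A",), "M": ("A",)}

# Noncausal ordering M, J, A, B, E, with parents chosen as in the example.
NONCAUSAL = {"M": (), "J": ("M",), "A": ("M", "J"), "B": ("A",), "E": ("A", "B")}

print(cpt_numbers(CAUSAL), cpt_numbers(NONCAUSAL), full_joint_numbers(5))
```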
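
Worked sketch: Markov blanket
The Markov blanket of a node is defined above as its parents, its children, and its children's parents. The sketch below computes it from a parent map for the burglary network; the function name markov_blanket and the dictionary layout are assumptions made for illustration.

```python
PARENTS = {
    "Burglary": (),
    "Earthquake": (),
    "Alarm": ("Burglary", "Earthquake"),
    "JohnCalls": ("Alarm",),
    "MaryCalls": ("Alarm",),
}


def markov_blanket(node, parents):
    """Return parents + children + children's other parents of `node`."""
    children = [x for x, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)
    for child in children:
        blanket |= set(parents[child])
    blanket.discard(node)  # a node is not part of its own blanket
    return blanket


print(sorted(markov_blanket("Burglary", PARENTS)))
```

For Burglary the blanket contains Alarm (its child) and Earthquake (the child's other parent), which is why learning about an earthquake matters once the alarm state is known.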
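
Worked sketch: expanding a noisy-OR node into a full CPT
The noisy-OR slide states that each cause has an independent failure probability $q_i$, so $k$ numbers suffice. Under the standard noisy-OR reading, the effect is absent only if every active cause fails, so $P(X = false)$ is the product of the $q_i$ over the causes that are present. The sketch below expands $k$ failure probabilities into all $2^k$ CPT rows. The cause names and the values 0.6, 0.2, 0.1 are hypothetical, and no leak node is modeled.

```python
from itertools import product
from math import prod


def noisy_or_cpt(failure_probs):
    """Map each tuple of parent truth values to P(X = True | parents).

    failure_probs: dict from cause name to q_i, the probability that the
    cause alone fails to produce the effect.
    """
    names = list(failure_probs)
    cpt = {}
    for values in product([False, True], repeat=len(names)):
        # Probability that every active cause fails (empty product is 1).
        p_all_fail = prod(failure_probs[n] for n, on in zip(names, values) if on)
        cpt[values] = 1.0 - p_all_fail
    return cpt


# ASSUMPTION: illustrative causes and failure probabilities.
q = {"CauseA": 0.6, "CauseB": 0.2, "CauseC": 0.1}
table = noisy_or_cpt(q)
print(len(q), "parameters expand to", len(table), "rows")
```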
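
Worked sketch: linear Gaussian, probit, and sigmoid
The hybrid-network slides describe a linear Gaussian density for Cost given Harvest and a soft threshold (probit or sigmoid) for Buys? given Cost. The sketch below is one possible parameterization, not necessarily the one on the original slides: the symbols a, b, sigma, and mu, the factor 2 in the sigmoid exponent, and all function names are assumptions.

```python
import math


def linear_gaussian_density(c, h, a, b, sigma):
    """Density of Cost = c given Harvest = h: mean a*h + b, fixed std dev sigma."""
    z = (c - (a * h + b)) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def standard_normal_cdf(x):
    """Integral of the standard Gaussian from minus infinity to x."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def probit_buys(cost, mu, sigma):
    """P(Buys? = true | Cost): soft threshold near mu, decreasing in cost."""
    return standard_normal_cdf((mu - cost) / sigma)


def sigmoid_buys(cost, mu, sigma):
    """Logistic alternative with the same midpoint mu."""
    return 1.0 / (1.0 + math.exp(-2.0 * (mu - cost) / sigma))


# ASSUMPTION: hypothetical threshold location and width.
mu, sigma = 6.0, 1.0
for cost in (6.0, 8.0, 10.0):
    print(cost, probit_buys(cost, mu, sigma), sigmoid_buys(cost, mu, sigma))
```

Both curves cross 0.5 at cost = mu; far above the threshold the sigmoid stays noticeably larger than the probit, which is the "longer tails" remark in the slides.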