A-Hierarchical-Approach-to-POMDP-Planning-and-ExecutionPOMDP规划与执行一个分层的方法.ppt_163文库

资源描述

1、Hierarchical Methods forPlanning under UncertaintyThesis ProposalJoelle PineauThesis Committee:Sebastian Thrun,ChairMatthew MasonAndrew MooreCraig Boutilier,U.of TorontoJoelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyIntegrating robots in living environmentsThe robots

2、 role:-Social interaction-Mobile manipulation-Intelligent reminding-Remote-operation-Data collection/monitoringJoelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyA broad perspectiveGOAL=Selecting appropriate actionsUSER+WORLD+ROBOTACTIONS OBSERVATIONSBeliefstateSTATEJoel

3、le PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyCause#1:Non-deterministic effects of actionsCause#2:Partial and noisy sensor informationCause#3:Inaccurate model of the world and the userWhy is this a difficult problem?UNCERTAINTYJoelle PineauThesis Proposal:Hierarchical M

4、ethods for Planning under UncertaintyCause#1:Non-deterministic effects of actionsCause#2:Partial and noisy sensor informationCause#3:Inaccurate model of the world and the userWhy is this a difficult problem?UNCERTAINTYA solution:Partially Observable MarkovDecision Processes(POMDPs)S3o1,o2S1o1,o2S2o1

5、,o2a1a2Joelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyThe truth about POMDPs Bad news:Finding an optimal POMDP action selection policy is computationally intractable for complex problems.Joelle PineauThesis Proposal:Hierarchical Methods for Planning under Uncertainty

6、The truth about POMDPs Bad news:Finding an optimal POMDP action selection policy is computationally intractable for complex problems.Good news:Many real-world decision-making problems exhibit structure inherent to the problem domain.By leveraging structure in the problem domain,I propose an algorith

7、m that makes POMDPs tractable,even for large domains.Joelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyHow is it done?Use a“Divide-and-conquer”approach:We decompose a large monolithic problem into a collection of loosely-related smaller problems.DialoguemanagerHealthman

8、agerSocialmanagerRemindingmanagerJoelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyThesis statementDecision-making under uncertaintycan be made tractable for complex problemsby exploiting hierarchical structurein the problem domain.Joelle PineauThesis Proposal:Hierarchi

9、cal Methods for Planning under UncertaintyOutline Problem motivationPartially observable Markov decision processes The hierarchical POMDP algorithm Proposed researchJoelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyPOMDPs within the family of Markov modelsMarkov ChainHi

10、dden Markov Model(HMM)Markov Decision Process(MDP)Partially Observable MDP(POMDP)Uncertainty in sensor input?nonoControlproblem?yesyesJoelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyPOMDP parameters:Initial belief:b0(s)=Pr(so=s)Observation probabilities:O(s,a,o)=Pr(o|

11、s,a)Transition probabilities:T(s,a,s)=Pr(s|s,a)Rewards:R(s,a)HMMWhat are POMDPs?Components:Set of states:sSSet of actions:aASet of observations:oO 0.50.51MDPS2Pr(o1)=0.9Pr(o2)=0.1S1Pr(o1)=0.5Pr(o2)=0.5a1a2S3Pr(o1)=0.2Pr(o2)=0.8Joelle PineauThesis Proposal:Hierarchical Methods for Planning under Unce

12、rtaintyA POMDP example:The tiger problemS1“tiger-left”Pr(o=growl-left)=0.85Pr(o=growl-right)=0.15S2“tiger-right”Pr(o=growl-left)=0.15Pr(o=growl-right)=0.85Actions=listen,open-left,open-rightReward Function:R(a=listen)=-1R(a=open-right,s=tiger-left)=10R(a=open-left,s=tiger-left)=-100Joelle PineauThes

13、is Proposal:Hierarchical Methods for Planning under UncertaintyWhat can we do with POMDPs?1)State tracking:After an action,what is the state of the world,st?2)Computing a policy:Which action,aj,should the controller apply next?Very hard!Not so hard.bt-1?at-1otRobot:St-1stWorld:Control layer:.?Joelle

14、 PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyThe tiger problem:State trackingS1“tiger-left”S2“tiger-right”Belief vectorb0BeliefJoelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyThe tiger problem:State trackingS1“tiger-left”S2“tiger-right”Bel

15、ief vectorb0Beliefobs=growl-leftaction=listenJoelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyThe tiger problem:State trackingb1obs=growl-leftS1“tiger-left”S2“tiger-right”Belief vectorBeliefb0action=listen baoPsbassPasoPsbSsjjiiij,|,|,|01Joelle PineauThesis Proposal:Hi

16、erarchical Methods for Planning under UncertaintyPolicy OptimizationWhich action,aj,should the controller apply next?In MDPs:Policy is a mapping from state to action,:si aj In POMDPs:Policy is a mapping from belief to action,:b ajRecursively calculate expected long-term reward for each state/belief:

17、Find the action that maximizes the expected reward:)(),|Pr(),(max)(1jNjijiaisVassasRsV)(),|Pr(),(maxarg)(1jNjijiaisVassasRsJoelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyThe tiger problem:Optimal policyBelief vector:open-leftopen-rightlistenS1“tiger-left”S2“tiger-rig

18、ht”Optimal policy:Joelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyFinite-horizon POMDPs are in worse-case doubly exponential:Infinite-horizon undiscounted stochastic POMDPs are EXPTIME-hard,and may not be decidable(|n|).POMDPComplexity(per step ofvalue iteration)MDPre

19、cursiveupper-boundTimeSpaceComplexity of policy optimizationnOAS|2|nOA|12|OnAS|1|OnA|2AS|S|nJoelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyThe essence of the problem How can we find good policies for complex POMDPs?Is there a principled way to provide near-optimal po

20、licies in reasonable time?Joelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyOutline Problem motivation Partially observable Markov decision processesThe hierarchical POMDP algorithm Proposed researchJoelle PineauThesis Proposal:Hierarchical Methods for Planning under Un

21、certaintyA hierarchical approach to POMDP planningKey Idea:Exploit hierarchical structure in the problem domain to break a problem into many“related”POMDPs.What type of structure?Action set partitioningActInvestigateHealthMoveNavigateCheckPulseAskWhereLeft Right Forward BackwardCheckMedssubtaskabstr

22、actactionJoelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyAssumptionsEach POMDP controller has a subset of Ao.Each POMDP controller has full state set S0,observation set O0.Each controller includes discriminative reward information.We are given the action set partition

23、ing graph.We are given a full POMDP model of the problem:So,Ao,Oo,Mo.Joelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyThe tiger problem:An action hierarchyPinvestigate=S0,Ainvestigate,O0,MinvestigateAinvestigate=listen,open-rightactopen-leftinvestigateopen-rightlistenJ

24、oelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyOptimizing the“investigate”controllerS1“tiger-left”S2“tiger-right”Locally optimal policy:Belief vector:open-rightlistenJoelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyThe tiger problem:An a

25、ction hierarchyPact=S0,Aact,O0,MactAact=open-left,investigateactopen-leftinvestigateopen-rightlistenBut.R(s,a=investigate)is not defined!Joelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyModeling abstract actionsInsight:Use the local policy of corresponding low-level co

26、ntroller.General form:R(si,ak)=R(si,Policy(controllerk,si)Example:R(s=tiger-left,ak=investigate)=open-right listen open-lefttiger-left 10-1 -100tiger-right -100-1 10Policy(investigate,s=tiger-left)=open-rightJoelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyOptimizing t

27、he“act”controllerS1“tiger-left”S2“tiger-right”Locally optimal policy:investigateBelief vector:open-leftJoelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyThe complete hierarchical policyS1“tiger-left”S2“tiger-right”Hierarchical policy:Belief vector:open-leftopen-rightlis

28、tenJoelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyThe complete hierarchical policyS1“tiger-left”S2“tiger-right”Hierarchical policy:open-leftopen-rightlistenOptimal policy:Belief vector:Joelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyRe

29、sults for larger simulation domainsPOMDPH-POMDPMDPNavigation Problem:|S|=11,|A|=6,|O|=6 CPU Time(secs):1119.932.840.000654 Average Reward:12.512.20.0Dialogue Problem:|S|=20,|A|=30,|O|=27 CPU Time(secs):24hrs77.996.46 Average Reward:64.4353.33%Correct actions:93.280.0Joelle PineauThesis Proposal:Hier

30、archical Methods for Planning under UncertaintyRelated work on hierarchical methodsHierarchical HMMsFine et al.,2019Hierarchical MDPsDayan&Hinton,1993;Dietterich,2019;McGovern et al.,2019;Parr&Russell,2019;Singh,1992.Loosely-coupled MDPsBoutilier et al.,2019;Dean&Lin,2019;Meuleau et al.2019;Singh&Co

31、hn,2019;Wang&Mahadevan,2019.Factored state POMDPsBoutilier et al.,2019;Boutilier&Poole,2019;Hansen&Feng,2000.Hierarchical POMDPsCastanon,2019;Hernandez-Gardiol&Mahadevan,2019;Theocharous et al.,2019;Wiering&Schmidhuber,2019.Joelle PineauThesis Proposal:Hierarchical Methods for Planning under Uncerta

32、intyOutline Problem motivation Partially observable Markov decision processes The hierarchical POMDP algorithmProposed researchJoelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyProposed research1)Algorithmic design2)Algorithmic analysis3)Model learning4)System developme

33、nt and application Joelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyResearch block#1:Algorithmic designGoal 1.1:Developing/implementing hierarchical POMDP algorithm.Goal 1.2:Extending H-POMDP for factorized state representation.Goal 1.3:Using state/observation abstract

34、ion.Goal 1.4:Planning for controllers with no local reward information.Joelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyAssumption#2:“Each POMDP controller has full state set S0,and observation set O0.”Can we reduce the number of states/observations,|S|and|O|?Goal 1.3:

35、State/observation abstractionJoelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyAssumption#2:“Each POMDP controller has full state set S0,and observation set O0.”Can we reduce the number of states/observations,|S|and|O|?Yes!Each controller only needs subset of state/obse

36、rvation features.What is the computational speed-up?Goal 1.3:State/observation abstractionNavigateLeftRight Forward BackwardInvestigateHealthCheckPulseCheckMeds POMDP recursiveupper-boundTime complexity:nOAS|2|12|OnASJoelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyGoa

37、l 1.4:Local controller reward informationAssumption#3:“Each controller includes some amount of discriminative reward information.”Can we relax this assumption?Joelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyGoal 1.4:Local controller reward informationAssumption#3:“Eac

38、h controller includes some amount of discriminative reward information.”Can we relax this assumption?Possibly.Use reward shaping to select policy-invariant reward function.What is the benefit?H-POMDP could solve problems with sparse reward functions.Joelle PineauThesis Proposal:Hierarchical Methods

39、for Planning under UncertaintyResearch block#2:Algorithmic analysisGoal 2.1:Evaluating performance of the H-POMDP algorithm.Goal 2.2:Quantifying the loss due to the hierarchy.Goal 2.3:Comparing different possible decompositions of a problem.Joelle PineauThesis Proposal:Hierarchical Methods for Plann

40、ing under UncertaintyGoal 2.1:Performance evaluationHow does the hierarchical POMDP algorithm compare to:Exact value function methods Sondik,1971;Monahan,1982;Littman,2019;Cassandra et al,2019.Policy search methods Hansen,2019;Kearns et al.,2019;Ng&Jordan,2000;Baxter&Bartlett,2000.Value approximatio

41、n methods Parr&Russell,2019;Thrun,2000.Belief approximation methods Nourbakhsh,2019;Koenig&Simmons,2019;Hauskrecht,2000;Roy&Thrun,2000.Memory-based methods McCallum,2019.Consider problems from POMDP literature and dialogue management domain.Joelle PineauThesis Proposal:Hierarchical Methods for Plann

42、ing under UncertaintyGoal 2.2:Quantifying the lossThe hierarchical POMDP planning algorithm provides an approximately-optimal policy.How“near-optimal”is the policy?Subject to some(very restrictive)conditions:“The value function of top-level controlleris an upper-bound on the valueof the approximatio

43、n.”Can we loosen the restrictions?Tighten the bound?Find a lower-bound?AtopA1.Vtop(b)Vactual(b)Joelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyGoal 2.3:Comparing different decompositionAssumption#4:“We are given an action set partitioning graph.”What makes a good hier

44、archical action decomposition?Comparing decompositions is the first step towards automatic decomposition.ManufactureExamineInspectReplacea1a2ManufactureReplaceExamine Inspecta1a2a3Joelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyResearch block#3:Model learningGoal 3.1:

45、Automatically generating good action hierarchies.Assumption#4:“We are given an action set partitioning graph.”Can we automatically generate a good hierarchical decomposition?Maybe.It is being done for hierarchical MDPs.Goal 3.2:Including parameter learning.Assumption#5:“We are given a full POMDP mod

46、el of the problem.”Can we introduce parameter learning?Yes!Maximum-likelihood parameter optimization(Baum-Welch)can be used for POMDPs.Joelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyTouchscreen inputSpeech utteranceResearch block#4:System development and applicationG

47、oal 4.1:Building an extensive dialogue manager Touchscreen messageSpeech utteranceDialogue ManagerRemindermessageRobot sensor readingsMotion commandStatus informationFacemail operationsRobot moduleReminding moduleTeleoperation moduleUserRemote-controlcommandJoelle PineauThesis Proposal:Hierarchical

48、Methods for Planning under UncertaintyAn implemented scenarioPhysiotherapyPatientroomRobothomeProblem size:|S|=288,|A|=14,|O|=15State Features:RobotLocation,UserLocation,UserStatus,ReminderGoal,UserMotionGoal,UserSpeechGoalTest subjects:3 elderly residents in assisted living facilityJoelle PineauThe

49、sis Proposal:Hierarchical Methods for Planning under UncertaintyContributions Algorithmic contribution:A novel POMDP algorithm based on hierarchical structure.Enables use of POMDPs for much larger problems.Application contribution:Application of POMDPs to dialogue management is novel.Allows design o

50、f robust robot behavioural managers.Joelle PineauThesis Proposal:Hierarchical Methods for Planning under UncertaintyResearch schedule1)Algorithmic design/implementation2)Algorithmic analysis3)Model learning4)System development and application5)Thesis writingfall 01spring/summer 02spring/summer/fall

展开阅读全文