AutoTutor: An Intelligent Tutoring System with Mixed Initiative Dialog

Art Graesser
University of Memphis
Department of Psychology & the Institute for Intelligent Systems

Supported on grants from the NSF, ONR, ARI, IDA, IES, US Census Bureau, and CHI Systems

Interdisciplinary Approach
  Computer Science
  Psychology
  Computational Linguistics
  Education

Overview
• Brief comments on my research on question asking and answering
• Primary focus is on AutoTutor, a collaborative reasoning and question answering system

Overview of my Research on Questions
• Psychological Models
    Question asking (PREG; ONR, NSF, ARI)
    Question answering (QUEST; ONR)
• Computer Artifacts
    Tutor (AutoTutor, Why/AutoTutor, Think Like a Commander; NSF, ONR, ARI, CHI Systems)
    Survey question critiquer (QUAID; US Census, NSF)
    Point & Query software (P&Q; ONR)
    Query-based information retrieval (HURA Advisor; IDA)

AutoTutor
  Collaborative reasoning and question answering in tutorial dialog

Think Like a Commander
  Vignettes
    1 Trouble in McLouth
    2 Save the Shrine
    3 The Recon Fight
    4 A Shift In Forces
    5 The Attack Begins
    6 The Bigger Picture
    7 Looking Deep
    8 Before the Attack
    9 Meanwhile Back at the Ranch

  Keep Focus on Mission? Higher's Intent?
  Model a Thinking Enemy?
  Consider Effects of Terrain?
  Use All Assets Available?
  Consider Timing?
  See the Bigger Picture?
  Visualize the Battlefield
    Accurately? - Realistic Space-Time Forecast
    Dynamically? - Entities Change Over Time
    Proactively? - What Can I Make Enemy Do
  Consider Contingencies and Remain Flexible?

What does AutoTutor do?
  Asks questions and presents problems: Why? How? What-if? What is the difference?
  Evaluates meaning and correctness of the learner's answers (LSA and computational linguistics)
  Gives feedback on answers
  Face displays emotions + some gestures
  Hints
  Prompts for specific information
  Adds information that is missed
  Corrects some bugs and misconceptions
  Answers student questions
  Holds mixed-initiative dialog in natural language

Pedagogical Design Goals
  Simulate normal human tutors and ideal tutors
  Active construction of student knowledge rather than information delivery system
  Collaborative answering of deep reasoning questions
  Approximate evaluation of student knowledge rather than detailed student modeling
  A discourse prosthesis

Feasibility of Natural Language Dialog in Tutoring
• Learners are forgiving when the tutor's dialog acts are imperfect.
• They are even more forgiving when the bar is set low during instructions.
• There are learning gains.
• Learning is not correlated with liking.

                        Low Expected Precision    High Expected Precision
  Low Common Ground     YES                       MAYBE
  High Common Ground    MAYBE                     NO

DEMO

Human Tutors
• Analyze hundreds of hours of human tutors
    Research methods in college students
    Basic algebra in 7th grade
    Typical unskilled cross-age tutors
• Studies from the Memphis labs
    Graesser & Person studies
• Studies from other labs
    Chi, Evens, McArthur

Characteristics of students that we wish were better
• Student question asking
• Comprehension calibration
• Self-regulated learning, monitoring, and error correction
• Precise, symbolic articulation of knowledge
• Global integration of knowledge
    Distant anaphoric reference
    Analogical reasoning
    Application of principles to a practical problem

Pedagogical strategies not used by unskilled tutors
• Socratic method (Collins, Stevens)
• Modeling-scaffolding-fading (Rogoff)
• Reciprocal training (Brown, Palincsar)
• Anchored Learning (Bransford, Vye, CTGV)
• Error diagnosis & repair (Anderson, van Lehn, Lesgold)
• Building on prerequisites (Gagne)
• Cascade techniques (van Lehn, Schank)
• Sophisticated motivational techniques (Lepper)

What can AutoTutor (and most human tutors) handle?

                                                Correct Information            Errors and Misconceptions
  Tutor expects and Student expresses           Yes                            Yes, Tutor corrects
  Tutor expects but Student does not express    Yes, Tutor helps to fill in    Yes, Not manifested
  Tutor does not expect but Student expresses   No (or rarely), Tutor ignores  No (or rarely), Tutor ignores

AutoTutor
  [Diagram labels: Language Extraction, Speech act classifier, Latent Semantic Analysis, Curriculum Script, Talking head with gestures, Dialog Management, Problem selection]

Managing One AutoTutor Turn
• Short feedback on the student's previous turn
• Advance the dialog by one or more dialog moves that are connected by discourse markers
• End turn with a signal that transfers the floor to the student
    Question
    Prompting hand gesture
    Head/gaze signal

Dialog Moves
  Positive immediate feedback: “Yeah” “Right!”
  Neutral immediate feedback: “Okay” “Uh huh”
  Negative immediate feedback: “No” “Not quite”
  Pump for more information: “What else?”
  Hint: “How does tossing the pumpkin affect horizontal velocity?”
  Prompt for specific information: “Vertical acceleration does not affect horizontal _____.”
  Assert: “Vertical acceleration does not affect horizontal velocity.”
  Correct: “Air resistance is negligible”
  Repeat: “So, once again, how does tossing the pumpkin affect horizontal velocity?”
  Summarize: “So to recap, <succinct summary>.”

Discourse Marker Relations

Sample dialog
  Tutor: This might help. If the runner speeds up will the pumpkin land behind the runner, next to the runner, or in front of the runner?
  Student: The pumpkin will land behind the runner.
  Tutor: Right. What is the horizontal force applied to the pumpkin after it leaves the runner's hands?
  Student: There is no horizontal force applied to the pumpkin after it leaves the runner's hands.
  Tutor: Right.

  expectation(1): 0.836
  expectation(2): 0.851
  expectation(3): 0.584
  misconception(1): 0.371
  misconception(2): 0.605

  Student: As zero force is acting on it, it will have constant velocity hence it will land on his head.
    .51  Expectation-1: The pumpkin has the same horizontal velocity as the runner.
    .66  Expectation-2: Zero force is needed to keep an object going with constant velocity.
    .44  Expectation-3: Vertical forces on the pumpkin do not affect its horizontal velocity.

  Student: The horizontal velocity of the pumpkin is the same as the runner.
    .99  Expectation-1: The pumpkin has the same horizontal velocity as the runner.
    .66  Expectation-2: Zero force is needed to keep an object going with constant velocity.
    .87  Expectation-3: Vertical forces on the pumpkin do not affect its horizontal velocity.

How does Why/AutoTutor select the next expectation?
• Don't select expectations that the student has covered
    cosine(student answers, expectation) > threshold
• Frontier learning, zone of proximal development
    Select highest sub-threshold expectation
• Coherence
    Select next expectation that has highest overlap with previously covered expectation
• Pivotal expectations

How does AutoTutor know which dialog move to deliver?
  Dialog Advancer Network (DAN) for mixed-initiative dialog
  15 Fuzzy production rules
    Quality of the student's assertion(s) in preceding turn
    Student ability level
    Topic coverage
    Student verbosity (initiative)
  Hint-Prompt-Assertion cycles for expected good answers

Dialog Advancer Network

Hint-Prompt-Assertion Cycles to Cover Good Expectations
  Cycle fleshes out one expectation at a time
  Exit cycle when: cos(S, E) > T
    S = student input
    E = expectation
    T = threshold
  [Diagram: repeating Hint, Prompt, Assertion cycle]

Who is delivering the answer?
  STUDENT PROVIDES INFORMATION
    Pump
    Hint
    Prompt
    Assertion
  TUTOR PROVIDES INFORMATION

Correlations between dialog moves and student ability

Question Taxonomy
  QUESTION CATEGORY: GENERIC QUESTION FRAMES AND EXAMPLES
  1. Verification: Is X true or false? Did an event occur? Does a state exist?
  2. Disjunctive: Is X, Y, or Z the case?
  3. Concept completion: Who? What? When? Where?
  4. Feature specification: What qualitative properties does entity X have?
  5. Quantification: What is the value of a quantitative variable? How much? How many?
  6. Definition questions: What does X mean?
  7. Example questions: What is an example or instance of a category?
  8. Comparison: How is X similar to Y? How is X different from Y?
  9. Interpretation: What concept/claim can be inferred from a static or active data pattern?
  10. Causal antecedent: What state or event causally led to an event or state? Why did an event occur? Why does a state exist? How did an event occur? How did a state come to exist?
  11. Causal consequence: What are the consequences of an event or state? What if X occurred? What if X did not occur?
  12. Goal orientation: What are the motives or goals behind an agent's action? Why did an agent do some action?
  13. Instrumental/procedural: What plan or instrument allows an agent to accomplish a goal? How did agent do some action?
  14. Enablement: What object or resource allows an agent to accomplish a goal?
  15. Expectation: Why did some expected event not occur? Why does some expected state not exist?
  16. Judgmental: What value does the answerer place on an idea or advice? What do you think of X? How would you rate X?

Speech Act Classifier
  Assertions
  Questions (16 categories)
  Directives
  Metacognitive expressions (“I'm lost”)
  Metacommunicative expressions (“Could you say that again?”)
  Short Responses
  95% Accuracy on tutee contributions

A New Query-based Information Retrieval System
(Louwerse, Olney, Mathews, Marineau, HiteMitchell, Graesser, 2003)
  [Diagram labels: Input context: Text and Screen; Select Highest Matching Document; Syntactic Parser; Lexicons; Surface cues; Frozen expressions; Word particles of question category; Input speech act; Classify speech act (QUEST's 16 question categories, assertion, directive, other); Augment retrieval cues; Search documents via LSA]

Evaluations of AutoTutor

LEARNING GAINS (effect sizes)
  .42       Unskilled human tutors (Cohen, Kulik, & Kulik, 1982)
  .75       AutoTutor (7 experiments) (Graesser, Hu, Person)
  1.00      Intelligent tutoring systems
              PACT (Anderson, Corbett, Koedinger)
              Andes, Atlas (VanLehn)
  2.00 (?)  Skilled human tutors

Learning Gains (Effect Sizes)

Spring 2002 Evaluations
Conceptual Physics (VanLehn & Graesser, 2002)
  Four conditions
    1. Human tutors
    2. Why/Atlas
    3. Why/AutoTutor
    4. Read control
  86 College Students

Measures in Spring Evaluation
• Multiple Choice Test
    Pretest and posttest (40 multiple choice questions in each)
• Essays graded by 6 physics experts
    4 pretest and 4 posttest essays
    Expectations versus misconceptions
    Holistic grades
• Generic principles and misconceptions (fine-grained)
• Learner perceptions
• Time on Tasks

Effect Sizes on Learning Gains
(pretest to posttest, no differences among tutoring conditions)

Fall 2002 Evaluations
Conceptual Physics (Graesser, Moreno, et al., 2003)
  Three tutoring conditions
    1. Why/AutoTutor
    2. Read textbook control
    3. Read nothing
  63 subjects

Multiple Choice Scores

2002-3 Evaluations
Computer Literacy (Graesser, Hu, et al., 2003)
  2 Tutoring Conditions
    1. AutoTutor
    2. Read nothing
  4 Media Conditions
    1. Print
    2. Speech
    3. Speech+Head
    4. Speech+Head+Print
  96 subjects

Deep Reasoning Questions

LATENT SEMANTIC ANALYSIS

Signal Detection Analyses
  Recall, Precision, and F-measure

What Expectations are LSA-worthy?
  Compute correlation between:
    (a) Experts' ratings of whether essay answers have expectation E
    (b) Maximum LSA cosine between E and all possible combinations of sentences in essay
  A high correlation means the expectation is LSA-worthy

Expectations and Correlations (expert ratings, LSA)
• After the release, the only force on the balls is the force of the moon's gravity (r = .71)
• A larger object will experience a smaller acceleration for the same force (r = .12)
• Force equals mass times acceleration (r = .67)
• The boxes are in free fall (r = .21)

OTHER EMPIRICAL EVALUATIONS

Assessment of Dialogue Management
Bystander Turing test
Participants rate whether particular dialog moves in conversations were generated by AutoTutor or by skilled
human tutors.

                               Bystander says          Bystander says
                               Computer said it        Human said it
  Reality: Computer said it    Hit .51                 Miss .49
  Reality: Human said it       False alarm .53         CR .47

ASL Model 501 Eye Tracker
Percentage of Time Allocated to Interface Components
  QUESTION 4%
  ANSWER 7%
  OFF 20%
  TALKING HEAD 40%
  DISPLAY 29% (MAINLY KEYBOARD)

What Conversational Agents Facilitate Learning?
Correlation matrix for DVs

              Like    Comp    Cred    Quality   Sync
  Comp        .50*
  Cred        .51*    .33*
  Quality     .54*    .59*    .49*
  Sync        .56*    .54*    .31*    .53*
  Learning    .03     .07     .02     .04       .03

AutoTutor Collaborations
• University of Pittsburgh (VanLehn)
    ONR, physics intelligent tutoring systems, Why2
• University of Illinois, Chicago (Wiley, Goldman)
    NSF/ROLE, plate tectonics, eye tracking, critical stance
• Old Dominion and Northern Illinois University (McNamara, Magliano, Millis, Wiemer-Hastings)
    IERI, science text comprehension
• MIT Media Lab
    NSF/ROLE, Learning Companion, emotion sensors (Picard, Reilly)
    BEAT, gesture, emotion and speech generator (Cassell, Bickmore)
• CHI Systems (Zachary, Ryder)
    Army SBIR, Think Like a Commander
• Institute for Defense Analyses (Fletcher, Toth, Foster)
    ONR/OSD, Human Use Regulatory Affairs Advisor, research ethics, web site with agent

Collaboration with MIT Media Lab
• Affect Computing Lab
    Frustration
    Anger
    Confusion
    Eureka highs
    Contemplation flow experience
• Inferring emotions from sensors
    Blue Eyes
    Mouse-glove pressure and sweat
    Butt
• Dialog moves sensitive to emotions

Forthcoming AutoTutor developments
• Language and discourse enhancements
    Weave in deeper semantic processing components
    Natural language generation for prompts
• Improved animated conversational agent
• 3-d simulation for enhancing the articulation of explanations
• Improve authoring tools
• Evolution of the content of curriculum scripts through tutoring experience

The Long-term Vision
  Future human-computer interfaces will be conversational: Just like people talking face to face.
  Avatars will tutor and mentor learners on the web: students, soldiers, citizens, customers, elderly, special populations, low and high literacy, low and high motivation.
  Learning modules will be accessed and available throughout the globe: a 24 by 7 virtual university.
  Learning tailored to learners' abilities, talents, interests, motivation, unique histories.

Proposed NSF Project will Augment AutoTutor
• AutoTutor enhanced with more problem solving and complex decision making with Franklin's Intelligent Distribution Agent
• Courseware from MIT, Carnegie Mellon, Pittsburgh, Wisconsin, Illinois, military, and corporations (Merlot, Concord Consortium)
• SCORM learning software standards established in the military
• University of Colorado speech recognition
• FedEx Institute as ADL Co-Lab with DoD and Bureau of Labor, for expansion to business and industry

“AutoTutor, Atlas, and Why2 are perhaps the most sophisticated tutorial dialogue projects in the intelligent tutoring systems community” (James Lester, AI Magazine, 2001)

Are there properties of the expectations that correlate with LSA-worthiness?
• Number of words: .07
• Number of content words: .01
• Vector length of expectation: .05
• Number of glossary terms: -.03
• Number of infrequent words: .23
• Number of negations: -.29*
• Number of relative terms, symbols, quantifiers, deictic expressions: .06

Challenges in use of LSA in AutoTutor
• Widely acknowledged limitations of LSA
    Negation
    Word order
    Structural composition
    Size of description
• “I thought I said that already!” coverage imperfection
• Need for larger corpus of misconceptions
    Expectation d' of LSA with experts = .79
    Misconception d' of LSA with experts = .57
• Coordinating LSA with symbolic systems
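
Illustration (not from the original slides): a minimal sketch of why negation and word order are hard for a cosine-based coverage test. It rests on loud assumptions: plain word-count vectors stand in for LSA vectors (real LSA compares vectors in a reduced space learned from a corpus), and the helper names tokenize, cosine, and is_covered, as well as the 0.7 threshold, are hypothetical and are not AutoTutor's code or settings.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(text_a, text_b):
    """Cosine between word-count vectors (an assumed stand-in for LSA vectors)."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def is_covered(student_input, expectation, threshold=0.7):
    """Coverage test of the form cos(S, E) > T; the 0.7 default is hypothetical."""
    return cosine(student_input, expectation) > threshold


expectation = "Vertical forces on the pumpkin do not affect its horizontal velocity."
# Same words as the expectation, with "vertical" and "horizontal" swapped.
reordered = "Horizontal forces on the pumpkin do not affect its vertical velocity."
# The expectation with "not" removed, so the meaning is reversed.
negated = "Vertical forces on the pumpkin do affect its horizontal velocity."

# Word order: identical word counts give the maximum cosine.
print(round(cosine(expectation, reordered), 3))
# Negation: removing one word out of eleven lowers the cosine only slightly.
print(round(cosine(expectation, negated), 3))
print(is_covered(negated, expectation))
```

In this sketch both altered sentences pass the coverage test even though their meaning differs from the expectation, which is the kind of case where a symbolic check for negation or word order could complement the cosine match.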