MIT Media Lab, Perceptual Computing / Learning and Common Sense, Technical Report (Dec.)

The "Inverse Hollywood Problem": From video to scripts and storyboards via causal analysis

Matthew Brand
The Media Lab, MIT
Ames Street, Cambridge, MA, USA
brand@media.mit.edu
www.media.mit.edu/~brand

Appears in Proceedings of AAAI, Providence, RI. © AAAI. All rights reserved.

Abstract

We address the problem of visually detecting causal events and fitting them together into a coherent story of the action witnessed by the camera. We show that this can be done by reasoning about the motions and collisions of surfaces, using high-level causal constraints derived from psychological studies of infant visual behavior. These constraints are naive forms of basic physical laws governing substantiality, contiguity, momentum, and acceleration. We describe two implementations. One system parses instructional videos, extracting plans of action and key frames suitable for storyboarding. Since learning will play a role in making such systems robust, we introduce a new framework for coupling hidden Markov models and demonstrate its use in a second system that segments stereo video into actions in near real-time. Rather than attempt accurate low-level vision, both systems use high-level causal analysis to integrate fast but sloppy pixel-based representations over time. The output is suitable for summary, indexing, and automated editing.

Introduction

A useful result from a vision system would be an answer to the question, "What is happening?" This is a question about causality: What are the events, and how do earlier ones cause or enable later ones? We are exploring the hypothesis that causal perception rests on inference about the motions and collisions of surfaces (and proceeds independently of processes such as recognition, reconstruction, and static segmentation). In this paper we present two computational models of this process, one heuristic and one probabilistic and trainable, that incorporate psychological models of causal event perception in infants. These systems use causal landmarks to segment video into actions, and higher-level causal constraints to ensure that actions are consistent over time. Each system takes a video sequence of manipulative action as input and outputs a plan-of-action and selected frames showing key events: the "gist" of the video, useful for summary, indexing, reasoning, and automated editing. Gisting may be thought of as the "inverse Hollywood problem": begin with a movie, end with a script and storyboard.

Related vision work

Early approaches to action understanding emphasized reconstruction followed by analysis; lately attention is turning to applying causal constraints directly to motion traces. Kuniyoshi & Inoue and Ikeuchi & Suehiro described systems that recognize actions in assembly tasks with simple geometric objects, e.g., blocks. These systems were intended as front ends for robotic pick-and-place mimicry and emphasized scene geometry, taking somewhat ad hoc approaches to causality and action.

Presently there is a growing literature in gesture recognition from motions (Essa), with an emphasis on classification rather than interpretation of structured activity. Siskind & Morris blurs this distinction somewhat by using Markov models to classify short sequences of individual motions as throwing, dropping, lifting, and pushing gestures, given relative velocity profiles between an arm and an object. Mann, Jepson & Siskind present a system that analyzes kinematic and dynamic relations between objects on a frame-by-frame basis. The program finds minimal systems of Newtonian equations that are consistent with each frame, but these are not necessarily consistent over time, nor do they mark causal events.

All of these systems require both a priori knowledge of the scene (e.g., hand-segmentation of event boundaries or objects) and limited scenes (e.g., white/black backgrounds, specific camera views, and constraints on the shapes and colors of objects). In contrast, the methods described in this paper emphasize continuous action parsing, integration of information over time, constraints derived from psychological experiment, meaningful output, and general vision (e.g., the background may be cluttered and objects may be textured, irregular, and flexible).

Psychology of motion causality

Vision sciences traditionally take high-level vision to be concerned with static properties of objects, typically their identities, categories, and shapes. The relationships between these properties and visual features are correlational, leading to many proposals for how brains and computers may compute optimal discriminators for various sets of images.

Arguably, causal dynamic properties of objects and scenes are more informative, more universal, and more easily computed. These properties (substantiality, solidity, contiguity, inertia, and conservation of momentum) are governed by simple physical laws at human scales and are thus consistent across most of visual experience. The fact that these properties are causal suggests that a small number of qualitative rules may provide satisfactory psychological and computational accounts of much of visual understanding.

Indeed, there is a growing body of psychological evidence showing that infants are fluent perceivers of lawful causality and violations thereof. Spelke and Van de Valle found that infants aged [...] to [...] months will detect a wide range of apparent violations of the causality of motion (Spelke & Van de Valle). They propose that three basic principles are active in motion understanding by late infancy:

The principle of contact equates physical connectedness with causal connectedness: No action at a distance; no contact without action.

The principle of cohesion equates object integrity with individuality: No splitting; no fusing. This guarantees that individuality boundaries remain stable over time unless a series of causal events combines two objects into one (e.g., via attachment) or splits one into two (e.g., via detachment).

[...] could extract key events from "how-to" videos of the sort that demonstrate procedures for assembling furniture, installing CD-ROMs, etc. The input is a video of an object being assembled or disassembled. The output is a script describing the actions of the "repairman" plus key frames that highlight important causal events.

From visual events to causal events

The gister reasons about changes in the integrity and motions of a single foreground blob: a connected map of image pixels that change due primarily to motion. The blob is obtained from a real-time vision system developed by Wren et al. Discontinuities in the blob's visual behavior signal changes of causality. For example, if the blob has a boundary discontinuity such as sudden swelling at one point, there is an apparent violation of the cohesion constraint, explicable via the contact constraint: an agent has attached an object and set it in motion, causing its pixels to join the blob. Cohesion is violated because the agent fuses with the object. Many visual discontinuity events have causal significance, including:

visual event                 disrupted causality    explanatory causality
appearance                   contact                animacy
disappearance                contact                animacy
inflation                    cohesion               contact
deflation                    cohesion               contact
flash                        cohesion               contact
acceleration discontinuity   contact, continuity    animacy
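
The mapping from visual discontinuity events to causal constraints listed above can be illustrated with a minimal Python sketch. Only the event table is taken from the text. The detector is an assumption made for illustration: it labels events from the foreground blob's pixel area alone, using a hypothetical threshold `jump_ratio`, whereas the gister described here reasons about blob boundaries and motions obtained from the vision system of Wren et al. The names `CausalReading`, `classify_area_change`, and `find_causal_events` are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class CausalReading:
    disrupted: Tuple[str, ...]  # constraint(s) apparently violated
    explanatory: str            # constraint that explains the violation


# Transcribed from the table of visual discontinuity events in the text.
EVENT_TABLE = {
    "appearance": CausalReading(("contact",), "animacy"),
    "disappearance": CausalReading(("contact",), "animacy"),
    "inflation": CausalReading(("cohesion",), "contact"),
    "deflation": CausalReading(("cohesion",), "contact"),
    "flash": CausalReading(("cohesion",), "contact"),
    "acceleration discontinuity": CausalReading(("contact", "continuity"), "animacy"),
}


def classify_area_change(prev_area: float, area: float,
                         jump_ratio: float = 1.5) -> Optional[str]:
    """Hypothetical detector: label a frame-to-frame change in blob area.

    jump_ratio is an illustrative threshold, not a value from the paper.
    """
    if prev_area == 0 and area > 0:
        return "appearance"
    if prev_area > 0 and area == 0:
        return "disappearance"
    if prev_area == 0:
        return None  # no blob in either frame
    ratio = area / prev_area
    if ratio >= jump_ratio:
        return "inflation"   # sudden swelling, e.g. an object joins the blob
    if ratio <= 1.0 / jump_ratio:
        return "deflation"   # sudden shrinking, e.g. an object is released
    return None


def find_causal_events(areas: Sequence[float]) -> List[Tuple[int, str, CausalReading]]:
    """Return (frame index, visual event, causal reading) for each discontinuity."""
    events = []
    for t in range(1, len(areas)):
        name = classify_area_change(areas[t - 1], areas[t])
        if name is not None:
            events.append((t, name, EVENT_TABLE[name]))
    return events


if __name__ == "__main__":
    # Synthetic blob areas (pixels per frame), invented for this example.
    for t, name, reading in find_causal_events([0, 100, 105, 180, 175, 90, 0]):
        print(t, name, "disrupts", reading.disrupted, "explained by", reading.explanatory)
```

Keeping the table as data, separate from the detector, means a different detector (for example one based on boundary shape or acceleration) could emit the same event names and reuse the same causal readings.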