Robot Vision (Chapter 6)

Introduction
- Computer vision: endowing machines with the means to "see".
- The task: create an image of a scene and extract features from it.
- This is a very difficult problem for machines:
  - Several different scenes can produce identical images.
  - Images can be noisy.
  - The image cannot be directly inverted to reconstruct the scene.

Human Vision (1)-(3)
[Figures]

Steering an Automobile
- The ALVINN system (Pomerleau 1991, 1993) uses an artificial neural network.
- Input: a 30×32 TV image (960 input nodes).
- 5 hidden nodes; 30 output nodes, one per steering direction.
- Training regime: modified "on-the-fly". A human driver drives the car, and his actual steering angles are taken as correct labels for the corresponding input images.
- Shifted and rotated versions of the images were also used for training.
- ALVINN has driven for 120 consecutive kilometers at speeds up to 100 km/h.

Steering an Automobile: ALVINN
[Figure]
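To make the architecture concrete, here is a minimal NumPy sketch of a feedforward network with the layer sizes given on the slide (960-5-30). The sigmoid activation, the random weights, and the reading of the 30 outputs as scores for discrete steering directions are illustrative assumptions, not ALVINN's actual details.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes from the slide: 30x32 image -> 960 inputs, 5 hidden, 30 outputs.
W1 = rng.normal(scale=0.1, size=(5, 960))   # input -> hidden (assumed init)
W2 = rng.normal(scale=0.1, size=(30, 5))    # hidden -> output

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def steer(frame_30x32):
    # Flatten the 30x32 camera frame into the 960-element input vector.
    x = frame_30x32.reshape(960)
    h = sigmoid(W1 @ x)          # 5 hidden units
    y = sigmoid(W2 @ h)          # 30 output units, one per steering bin
    return int(np.argmax(y))     # most active steering direction

frame = rng.random((30, 32))     # stand-in for one TV frame
print("steering bin:", steer(frame))
```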
Two Stages of Robot Vision (1)
- Finding objects in the scene:
  - Looking for "edges" in the image.
    - Edge: a part of the image across which the image intensity, or some other property of the image, changes abruptly.
  - Attempting to segment the image into regions.
    - Region: a part of the image in which the image intensity, or some other property of the image, changes only gradually.

Two Stages of Robot Vision (2)
- Image processing stage:
  - Transforms the original image into one that is more amenable to the scene analysis stage.
  - Involves various filtering operations that help reduce noise, accentuate edges, and find regions.
- Scene analysis stage:
  - Attempts to create an iconic or a feature-based description of the original scene, providing the task-specific information.

Two Stages of Robot Vision (3)
- The scene analysis stage produces task-specific information.
- If only the disposition of the blocks is important, an appropriate iconic model might be (C B A FLOOR).
- If it is important to determine whether there is another block on top of the block labeled C, an adequate description will include the value of a feature such as CLEAR_C.
Averaging (1)
- The original image can be represented as an m×n array of numbers, where the numbers represent the light intensities at corresponding points in the image.
- Certain irregularities in the image can be smoothed by an averaging operation.
- The averaging operation involves sliding an averaging window all over the image array.

Averaging (2)
- The smoothing operation thickens broad lines and eliminates thin lines and small details.
- The averaging window is centered at each pixel, and the weighted sum of all the pixel values within the window is computed. This sum then replaces the original value at that pixel.

Averaging (3)
- A common function used for smoothing is a two-dimensional Gaussian.
- Convolving an image with a Gaussian is equivalent to finding the solution to a diffusion equation whose initial condition is given by the image intensity field.

Averaging (4)
[Figure]
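A minimal sketch of the sliding-window operation described above, using an explicitly constructed Gaussian kernel and plain NumPy. The kernel size and σ are arbitrary choices for illustration, and borders are left untouched for simplicity.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    # Sample a 2-D Gaussian on a size x size grid and normalize it so
    # the window weights sum to 1.
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def smooth(image, kernel):
    # Slide the averaging window over the image; at each pixel the
    # weighted sum of the pixels under the window replaces the
    # original value.
    h = kernel.shape[0] // 2
    out = image.astype(float).copy()
    for i in range(h, image.shape[0] - h):
        for j in range(h, image.shape[1] - h):
            out[i, j] = (kernel * image[i-h:i+h+1, j-h:j+h+1]).sum()
    return out

img = np.random.default_rng(0).random((32, 32))
blurred = smooth(img, gaussian_kernel(5, sigma=1.0))
```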
Edge Enhancement (1)
- Edge: any boundary between parts of the image with markedly different values of some property.
- Edges are often related to important object properties.
- Edges in the image occur at places where the second derivative of the image intensity passes through zero.

Edge Enhancement (2)
[Figure]

Combining Edge Enhancement with Averaging (1)
- Edge enhancement alone would tend to emphasize noise elements along with enhancing edges.
- To be less sensitive to noise, both operations are needed: first averaging, then edge enhancement.
- We can convolve a one-dimensional image with the second derivative of a Gaussian curve to combine both operations.

Combining Edge Enhancement with Averaging (2)
- The Laplacian is a second-derivative-type operation that enhances edges of any orientation.
- The Laplacian of the two-dimensional Gaussian function looks like an upside-down hat, and is often called a sombrero function.
- The entire averaging/edge-finding operation can be achieved by convolving the image with the sombrero function (called Laplacian filtering), as sketched below.
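A minimal sketch of Laplacian filtering as just described: the sombrero (Laplacian-of-Gaussian) kernel is sampled from its closed form, the image is convolved with it in one step, and edges are read off where the response crosses zero. The kernel size and σ are illustrative choices, and scipy is assumed to be available for the convolution.

```python
import numpy as np
from scipy.ndimage import convolve

def sombrero(size=9, sigma=1.4):
    # Laplacian of a 2-D Gaussian: the upside-down-hat "sombrero"
    # kernel, sampled on a size x size grid.
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = (xx**2 + yy**2) / (2 * sigma**2)
    k = -(1 - r2) * np.exp(-r2) / (np.pi * sigma**4)
    return k - k.mean()   # zero-sum kernel: flat areas give zero response

def edges(image, size=9, sigma=1.4):
    # Smoothing and second differentiation in one convolution, then
    # mark the zero crossings, which is where edges occur.
    response = convolve(image.astype(float), sombrero(size, sigma))
    sign = response > 0
    horizontal = sign[:, 1:] != sign[:, :-1]   # sign change left-right
    vertical = sign[1:, :] != sign[:-1, :]     # sign change up-down
    crossings = np.zeros(image.shape, dtype=bool)
    crossings[:, 1:] |= horizontal
    crossings[1:, :] |= vertical
    return crossings
```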
6.4.4 Finding Regions
- Another method for processing an image is to find "regions".
- Finding regions and finding outlines are two views of the same segmentation problem.

A Region of the Image
- A region is homogeneous. Homogeneity can mean, for example:
  - The difference in intensity values of pixels in the region is no more than some threshold ε, or
  - A polynomial surface of degree k can be fitted to the intensity values of pixels in the region with largest error less than ε.
- For no two adjacent regions is it the case that the union of all the pixels in these two regions satisfies the homogeneity property.
- Each region corresponds to a world object or a meaningful part of one.

Split-and-Merge Method
1. The algorithm begins with just one candidate region: the whole image.
2. Until no more splits need be made, every candidate region that does not satisfy the homogeneity property is split into four equal-sized candidate regions.
3. Adjacent candidate regions are merged if the union of their pixels satisfies the homogeneity property.
A sketch of this procedure is given below.
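A compact sketch of split-and-merge, assuming a square image whose side is a power of two and the simplest homogeneity test from the slide (intensity spread within a region at most some threshold, here called EPS; the value is an assumption). The greedy pixel-label merge phase is a simplification; a real implementation would maintain a region adjacency structure.

```python
import numpy as np

EPS = 10  # assumed homogeneity threshold

def homogeneous(pixels):
    # Homogeneity property from the slide: intensity spread <= EPS.
    return pixels.max() - pixels.min() <= EPS

def split(img, x, y, size, leaves):
    # Phase 2: split every non-homogeneous candidate region into four
    # equal-sized candidate regions until no more splits are needed.
    if size == 1 or homogeneous(img[y:y+size, x:x+size]):
        leaves.append((x, y, size))
        return
    h = size // 2
    for dx, dy in ((0, 0), (h, 0), (0, h), (h, h)):
        split(img, x + dx, y + dy, h, leaves)

def merge(img, leaves):
    # Phase 3: greedily merge adjacent regions whose pixel union is
    # still homogeneous; regions are tracked as a pixel-label map.
    labels = np.zeros(img.shape, dtype=int)
    for i, (x, y, s) in enumerate(leaves):
        labels[y:y+s, x:x+s] = i
    merged = True
    while merged:
        merged = False
        pairs = set(zip(labels[:, :-1].ravel(), labels[:, 1:].ravel()))
        pairs |= set(zip(labels[:-1, :].ravel(), labels[1:, :].ravel()))
        for a, b in pairs:
            if a != b and homogeneous(img[(labels == a) | (labels == b)]):
                labels[labels == b] = a
                merged = True
                break
    return labels

img = (np.random.default_rng(0).random((16, 16)) * 255).astype(int)
leaves = []
split(img, 0, 0, 16, leaves)
regions = merge(img, leaves)
```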
Regions Found by Split-and-Merge for a Grid-World Scene (from Fig. 6.12)
[Figure]

"Cleaning Up" the Regions Found by Split-and-Merge
- Eliminate very small regions (some of which are transitions between larger regions).
- Straighten bounding lines.
- Take into account the known shapes of objects likely to be in the scene.
6.4.5 Using Image Attributes Other Than Intensity
- Image attributes other than intensity homogeneity can be used, e.g., visual texture:
  - Fine-grained variation of the surface reflectivity of objects.
  - Examples: a field of grass, a section of carpet, foliage in trees, the fur of animals.
  - The reflectivity variations of such objects cause similar fine-grained structure in image intensity.

Methods for Analyzing Texture
- Structural methods:
  - Represent regions in the image by a tessellation (tiling) of primitive "texels": small shapes comprising black and white parts.
- Statistical methods:
  - Based on the idea that image texture is best described by a probability distribution for the intensity values over regions of the image.
  - Example: an image of a grassy field in which the blades of grass are oriented vertically fits a probability distribution that peaks for thin, vertically oriented regions of high intensity, separated by regions of low intensity.
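As one concrete statistical texture measure, the sketch below estimates a probability distribution over local gradient orientations in a region; a vertically striped, grass-like patch concentrates its mass in a single bin. This particular statistic is only an illustration of the idea, not the specific distribution the slide describes.

```python
import numpy as np

def orientation_histogram(region, bins=8):
    # Statistical texture description: estimate a probability
    # distribution over local gradient orientations in the region.
    gy, gx = np.gradient(region.astype(float))
    angles = np.arctan2(gy, gx) % np.pi      # orientation, mod 180 degrees
    weights = np.hypot(gx, gy)               # weight by edge strength
    hist, _ = np.histogram(angles, bins=bins, range=(0, np.pi),
                           weights=weights)
    return hist / hist.sum()                 # normalize to a distribution

# Vertical stripes (grass-like): intensity varies only horizontally,
# so gradients point horizontally and one orientation bin dominates.
stripes = np.tile((np.arange(32) % 4 < 2).astype(float), (32, 1))
print(orientation_histogram(stripes).round(2))
```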
Other Attributes
- If we had a direct way to measure the range from the camera to objects in the scene, we could produce a "range image" and look for abrupt range differences.
  - Range image: each pixel value represents the distance from the corresponding point in the scene to the camera.
- Motion and color can be used similarly.

6.5 Scene Analysis (1)
- Scene analysis: extracting from the image the needed information about the scene.
- It requires either additional images (as in stereo vision) or general information about the kinds of scenes, since the scene-to-image transformation is many-to-one.
- The required knowledge may be very general or quite specific, explicit or implicit.

6.5 Scene Analysis (2)
- Knowledge of surface reflectivity characteristics together with the shading of intensity in the image gives information about the shape of smooth objects in the scene.
- Iconic scene analysis: builds a model of the scene or parts of the scene.
- Feature-based scene analysis: extracts the features of the scene needed by the task ("task-oriented" or "purposive" vision).
6.5.1 Interpreting Lines and Curves in the Image
- Interpreting a line drawing: associating scene properties with the components of the line drawing.
- Trihedral vertex polyhedra: the scene is assumed to contain only planar surfaces, such that no more than three surfaces intersect in a point.

Three Kinds of Edges in Trihedral Vertex Polyhedra (1/2)
- There are only three kinds of ways in which two planes can intersect in a scene edge.
- Occlude:
  - One kind of edge is formed by two planes, with one of them occluding the other.
  - Labeled in Fig. 6.15 with arrows (→).
  - The arrowhead points along the edge such that the surface doing the occluding is to the right of the arrow.

Three Kinds of Edges in Trihedral Vertex Polyhedra (2/2)
- Blade:
  - Two planes intersect such that both planes are visible in the scene, and the two surfaces form a convex edge.
  - Labeled with pluses (+).
- Fold:
  - The edge is concave.
  - Labeled with minuses (−).

Labels for Lines at Junctions
[Figure]

Line-Labeling Scene Analysis (1/2)
1. Label all of the junctions in the image as V, W, Y, or T junctions according to the shape of the junctions in the image.

Line-Labeling Scene Analysis (2/2)
2. Assign +, −, or arrow labels to the lines in the image.
- An image line that connects two junctions must have a consistent labeling.
- If there is no consistent labeling, there must have been some error in converting the image into a line drawing, or the scene must not have been one of trihedral polyhedra.
- This is a constraint satisfaction problem (see the sketch below).
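A toy brute-force version of this constraint satisfaction problem. The junction dictionary below is a made-up placeholder, not the real V/W/Y/T catalog; the '>' and '<' labels stand in for the two arrow directions. The point is only the mechanics: each line gets one label, each junction constrains the tuple of labels on its incident lines, and a labeling is accepted only if it is consistent everywhere.

```python
from itertools import product

# One label per line: '+' (convex), '-' (concave), '>' or '<' (occluding).
LABELS = ['+', '-', '>', '<']

# Hypothetical junction dictionary: for each junction, the label tuples
# (ordered over its incident lines) that are physically realizable.
# The real tables for V, W, Y, T junctions come from the catalog in the text.
ALLOWED = {
    'j1': {('+', '+'), ('>', '<')},           # a two-line junction
    'j2': {('+', '+', '+'), ('-', '-', '-')}  # a three-line junction
}
# Which lines meet at each junction, in the order used by ALLOWED.
INCIDENT = {'j1': ('a', 'b'), 'j2': ('a', 'b', 'c')}

def consistent_labelings(lines):
    # Brute-force constraint satisfaction: try every assignment of
    # labels to lines; keep those consistent at every junction.
    for combo in product(LABELS, repeat=len(lines)):
        assign = dict(zip(lines, combo))
        if all(tuple(assign[l] for l in INCIDENT[j]) in ALLOWED[j]
               for j in ALLOWED):
            yield assign

for solution in consistent_labelings(['a', 'b', 'c']):
    print(solution)   # here only {'a': '+', 'b': '+', 'c': '+'} survives
```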
6.5.2 Model-Based Vision (1/2)
- If we knew that the scene contained a parallelepiped (as in Figure 6.15), we could attempt to fit the projection of a parallelepiped to components of an image of this scene.
- Generalized cylinders can serve as building blocks for model construction.
- Each cylinder has 9 parameters.

6.5.2 Model-Based Vision (2/2)
- An example: a rough scene reconstruction of a human figure.
- Hierarchical representation: each cylinder in the model can be articulated into a set of smaller cylinders.
6.6 Stereo Vision and Depth Information
- Depth information can be obtained using stereo vision, which is based on triangulation calculations using two (or more) images.
- Some depth information can be extracted from a single image:
  - The analysis of texture in the image can indicate that some elements of the scene are closer than others.
  - More precise depth information: if we know that a perceived object is on the floor and we know the camera's height above the floor, we can calculate the distance to the object.

Depth Calculation from a Single Image
[Figure]

Stereo Vision
- Stereo vision uses triangulation.
- Two lenses whose centers are separated by a baseline, b.
- Each lens creates an image point of a scene point at distance d.
- These image points lie at angles from the lens centers (α and β, say).
- Assumptions: the optical axes are parallel, the image planes are coplanar, and the scene point is in the same plane as that formed by the two parallel optical axes.

Triangulation in Stereo Vision
[Figure]
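Under the stated assumptions the triangulation reduces to a single formula. Taking α and β (the symbol names are restored here for illustration) as the angles between each ray to the scene point and its camera's optical axis, with the scene point between the two axes, the two horizontal offsets satisfy d·tan α + d·tan β = b:

```python
import math

def stereo_depth(b, alpha, beta):
    # Triangulation with parallel optical axes and coplanar image
    # planes: the offsets of the scene point from the two axes sum
    # to the baseline, d*tan(alpha) + d*tan(beta) = b, so:
    return b / (math.tan(alpha) + math.tan(beta))

# Example: 0.1 m baseline, rays at 5 and 7 degrees -> depth of about 0.48 m.
print(stereo_depth(0.1, math.radians(5), math.radians(7)))
```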
The Main Complication in Stereo Vision
- In scenes containing more than one point, it must be established which pair of points in the two images corresponds to the same scene point.
- For a pixel in one image, we must be able to identify the corresponding pixel in the other image: the correspondence problem.

Techniques for the Correspondence Problem
- Geometric analysis reveals that we need only search along one dimension (the epipolar line).
- One-dimensional searches can be implemented by cross-correlation of the two image intensity profiles along corresponding epipolar lines (see the sketch below).
- We do not have to find correspondences between individual pairs of image points; we can instead find them between pairs of larger image components, such as lines.
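A minimal sketch of that one-dimensional search: for each window on the left scan line, slide it along the corresponding right epipolar line and keep the offset with the highest normalized cross-correlation. The window size and the exhaustive search range are illustrative choices; real systems restrict the disparity range.

```python
import numpy as np

def match_epipolar_lines(left_profile, right_profile, window=5):
    # left_profile / right_profile: 1-D intensity arrays sampled along
    # corresponding epipolar lines in the two images.
    half = window // 2
    n = len(left_profile)
    disparities = np.zeros(n, dtype=int)
    for i in range(half, n - half):
        patch = left_profile[i-half:i+half+1].astype(float)
        patch -= patch.mean()
        best_score, best_j = -np.inf, i
        for j in range(half, n - half):
            cand = right_profile[j-half:j+half+1].astype(float)
            cand -= cand.mean()
            denom = np.linalg.norm(patch) * np.linalg.norm(cand)
            score = patch @ cand / denom if denom else -np.inf
            if score > best_score:          # keep the best-correlated offset
                best_score, best_j = score, j
        disparities[i] = i - best_j         # disparity of the best match
    return disparities
```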
Assignments
- Pages 111-112: Ex. 6.2 and Ex. 6.3.