Appendix 1: Original foreign-language text

Source: the 21st century literature the applied undergraduate electronic communication series of practical teaching plan, The information and communication engineering specialty in English, ch02_1.PDF, 120-124. Ed: HanDing ZhaoJuMin, etc.

Text A: An Introduction to Digital Image Processing

1. Introduction

Digital image processing remains a challenging domain of programming for several reasons. First, the issue of digital image processing appeared relatively late in computer history: it had to wait for the arrival of the first graphical operating systems to become a real concern. Secondly, digital
image processing requires the most careful optimizations, especially for real-time applications. Comparing image processing with audio processing is a good way to fix ideas. Consider the memory bandwidth needed to examine the pixels of a 320x240, 32-bit bitmap 30 times a second: about 10 MB/sec. At the same quality standard, real-time processing of a stereo audio wave needs 44100 (samples per second) x 2 (bytes per sample per channel) x 2 (channels) = 176 KB/sec, which is about 50 times less. Obviously we will not be able to use the same techniques for both audio and image signal processing.
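
The bandwidth comparison above can be checked with a few lines of arithmetic. The following Python sketch is only an illustration: the function names are hypothetical, and it assumes that every pixel of every frame is read exactly once.

```python
def image_bandwidth(width, height, bits_per_pixel, frames_per_second):
    """Bytes per second needed to read every pixel of every frame once."""
    bytes_per_pixel = bits_per_pixel // 8
    return width * height * bytes_per_pixel * frames_per_second


def audio_bandwidth(samples_per_second, bytes_per_sample, channels):
    """Bytes per second of an uncompressed audio stream."""
    return samples_per_second * bytes_per_sample * channels


# Figures taken from the text: a 320x240, 32-bit bitmap at 30 frames
# per second, and 16-bit stereo audio sampled at 44100 Hz.
image_rate = image_bandwidth(320, 240, 32, 30)
audio_rate = audio_bandwidth(44100, 2, 2)

print(f"image: {image_rate / 1e6:.1f} MB/sec")
print(f"audio: {audio_rate / 1e3:.1f} KB/sec")
print(f"ratio: {image_rate / audio_rate:.0f}")
```

With these figures the image stream comes to about 9.2 MB/sec against 176.4 KB/sec for audio, a ratio of roughly 50.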
Finally, digital image processing is by definition a two-dimensional domain, which complicates the design of digital filters.

We will explore some of the existing methods used to deal with digital images, starting with a very basic approach to color interpretation. Matrix convolution and digital filters follow as a more advanced level of interpretation. Finally, we will give an overview of some applications of image processing.

The aim of this document is to give the reader a brief overview of the existing techniques in digital image processing. We will neither penetrate deep
into theory nor go into the coding itself; we will concentrate instead on the algorithms themselves, the methods. In any case, this document should be used as a source of ideas only, and not as a source of code.

2. A simple approach to image processing

(1) The color data: Vector representation

Bitmaps

The original and basic way of representing a digital color image in a computer's memory is obviously a bitmap. A bitmap consists of rows of pixels, a contraction of the words “Picture Element”. Each pixel has a particular value which determines its apparent color. This value is specified by three
numbers giving the decomposition of the color into the three primary colors Red, Green and Blue. Any color visible to the human eye can be represented this way. The contribution of each primary color is quantified by a number between 0 and 255. For example, white is coded as R = 255, G = 255, B = 255; black is (R,G,B) = (0,0,0); and, say, bright pink is (255,0,255). In other words, an image is an enormous two-dimensional array of color values, pixels, each of them coded on 3 bytes representing the three primary colors. This allows the image to contain a total of 256 x 256 x 256 = 16.8 million different colors. This technique is also known as RGB encoding, and is specifically adapted to human vision. With cameras or other measuring instruments we are capable of “seeing” thousands of other “colors”, in which case the RGB encoding is inappropriate.

The range of
0-255 was agreed on for two good reasons. The first is that the human eye is not sensitive enough to distinguish more than 256 levels of intensity (1/256 = 0.39%) for a color. That is to say, an image presented to a human observer will not be improved by using more than 256 levels of gray (256 shades of gray between black and white). Therefore 256 levels give sufficient quality. The second reason for the value of 255 is obviously that it is convenient for computer storage: a byte, which is the computer's memory unit, can code up to 256 values.

As opposed to the audio signal, which is coded in the time domain, the image signal is coded in a two-dimensional spatial domain. The raw image data is much more straightforward and easy to analyze than the temporal-domain data of the audio signal. This is why we will be able to build many effects and filters for images without transforming the source data, while this would have been totally impossible for an audio signal. This first part deals with the simple effects and filters you can compute without transforming the source data, just by analyzing the raw image signal as it is.

The standard dimensions, also called resolution, for
a bitmap are about 500 rows by 500 columns. This is the resolution encountered in standard analog television and standard computer applications. You can easily calculate the memory space a bitmap of this size requires: 500 x 500 pixels, each coded on three bytes, makes 750 KB. This might not seem enormous compared to the size of hard drives, but if you must process images in real time things get tougher. Rendering images fluidly demands a minimum of 30 images per second, and the required bandwidth, 10 MB/sec or more, is enormous. We will see later that the limitation of data access and transfer in RAM is of crucial importance in image processing, and sometimes turns out to be much more important than the limitation of CPU computing, which may seem quite different from what one is used to in optimization issues. Note that with modern compression techniques such as JPEG 2000, the total size of the image can easily be reduced by a factor of 50 without losing much quality, but this is another topic.

Vector representation of colors

As we have seen, in a bitmap, colors are coded on three bytes representing their decomposition on the three primary colors.
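
The three-byte coding recalled above, and the memory figures that follow from it, can be illustrated with a short Python sketch. The function names are hypothetical and chosen for this illustration only; the pixel values and the 500 x 500 bitmap size are those quoted in the text.

```python
def encode_pixel(red, green, blue):
    """Code one pixel on three bytes, one per primary color."""
    for component in (red, green, blue):
        if not 0 <= component <= 255:
            raise ValueError("each component must lie between 0 and 255")
    return bytes((red, green, blue))


def bitmap_size(rows, columns, bytes_per_pixel=3):
    """Memory needed for an uncompressed bitmap, in bytes."""
    return rows * columns * bytes_per_pixel


white = encode_pixel(255, 255, 255)
black = encode_pixel(0, 0, 0)
bright_pink = encode_pixel(255, 0, 255)

distinct_colors = 256 ** 3           # every combination of three bytes
frame_bytes = bitmap_size(500, 500)  # one 500 x 500 bitmap
stream_bytes = frame_bytes * 30      # 30 such bitmaps per second

print(distinct_colors, frame_bytes, stream_bytes)
```

Each pixel is thus a point (R, G, B) with integer coordinates between 0 and 255, which is what makes the geometric reading of colors possible.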
It is natural for a mathematician to interpret colors as vectors in a three-dimensional space where each axis stands for one of the primary colors. We can therefore benefit from most geometric concepts when dealing with colors, such as norms, scalar product, projection, rotation or distance. This will be really useful for some of the filters we will see soon. Figure 1 illustrates this new interpretation:

Figure 1

(2) Immediate application to filters

Edge Detection

From what we have said before, we can quantify the difference between two colors by computing the geometric distance between the vectors representing those two colors. Let us consider two colors C1 = (R1,G1,B1) and C2 = (R2,G2,B2); the distance between the two colors is given by the formula:

$$D(C_1, C_2) = \sqrt{(R_1 - R_2)^2 + (G_1 - G_2)^2 + (B_1 - B_2)^2}$$

This leads us to our first filter: edge detection. The aim of edge detection is to determine the edges of shapes in a picture and to draw a result bitmap where edges are, for example, white on a black background. The idea is very simple: we go through the image pixel by pixel and compare the color of each pixel to its right neighbor and to its bottom
neighbor. If one of these comparisons shows too big a difference, the pixel studied is part of an edge and should be turned white; otherwise it is kept black. We compare each pixel with both its bottom and its right neighbor because images are two-dimensional. Indeed, if you imagine an image made only of alternating horizontal stripes of red and blue, the algorithm would not see the edges of those stripes if it only compared each pixel to its right neighbor. Thus the two comparisons for each pixel are necessary.

This algorithm was tested on several source images of different types and gives fairly good results. It is mainly limited in speed by frequent memory access. The two square roots can be removed easily by squaring the comparison; however, the color extractions cannot be improved very easily. If we consider that the longest operations are the get
pixel and put pixel functions, we obtain a complexity of 4*N*M, where N is the number of rows and M the number of columns. This is not fast enough to be computed in real time. For a 300x300x32 image I get about 26 transforms per second on an Athlon XP 1600+. Quite slow indeed.

Here are the results of the algorithm on an example image:

A few words about the results of this algorithm. Notice that the quality of the results depends on the sharpness of the source image. If the source image has very sharp edges, the result will be nearly perfect. However, if you have a very blurry source you might want to pass it through a sharpness filter first, which we will study later. Another remark: you can also compare each pixel with its second or third nearest neighbors on the right and on the bottom instead of the nearest neighbors. The edges will be thicker but also more
exact, depending on the sharpness of the source image. Finally, we will see later on that there is another way to do edge detection, with matrix convolution.

Color extraction

The other immediate application of pixel comparison is color extraction. Instead of comparing each pixel with its neighbors, we are going to compare it with a given color. This algorithm will try to detect all the objects in the image that have this color. This is quite useful in robotics, for example: it enables you to search streaming images for a particular color. You can then make your robot go and get a red ball, for example. We will call the reference color, the one we are looking for in the image, C0 = (R0,G0,B0). Once again, even though the square root can easily be removed, doing so does not really improve the speed of the algorithm. What really slows down the whole loop is the NxM get pixel accesses to memory and the put pixel accesses.
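
The two pixel-comparison filters of this part, edge detection and color extraction, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the image is a list of rows of (R, G, B) tuples rather than a bitmap accessed through get pixel and put pixel functions, the function names and the `threshold` parameter are hypothetical, and pixels on the last row or column are compared with themselves because the text does not say how borders are handled. Distances are compared in squared form, so no square root is computed.

```python
WHITE = (255, 255, 255)
BLACK = (0, 0, 0)


def squared_distance(c1, c2):
    """Squared geometric distance between two (R, G, B) colors."""
    return sum((a - b) ** 2 for a, b in zip(c1, c2))


def extract_color(image, reference, threshold):
    """White where a pixel is within `threshold` of `reference`, else black."""
    limit = threshold ** 2
    return [[WHITE if squared_distance(pixel, reference) <= limit else BLACK
             for pixel in row]
            for row in image]


def detect_edges(image, threshold):
    """White where a pixel differs too much from its right or bottom neighbor."""
    limit = threshold ** 2
    rows, columns = len(image), len(image[0])
    result = [[BLACK] * columns for _ in range(rows)]
    for y in range(rows):
        for x in range(columns):
            pixel = image[y][x]
            # Assumption: border pixels have no neighbor, so compare with self.
            right = image[y][x + 1] if x + 1 < columns else pixel
            below = image[y + 1][x] if y + 1 < rows else pixel
            if (squared_distance(pixel, right) > limit
                    or squared_distance(pixel, below) > limit):
                result[y][x] = WHITE
    return result


RED, BLUE = (255, 0, 0), (0, 0, 255)
stripes = [[RED, RED], [RED, RED], [BLUE, BLUE]]  # horizontal stripes

edges = detect_edges(stripes, threshold=100)
red_mask = extract_color(stripes, reference=RED, threshold=30)
```

On the striped test image, only the comparison with the bottom neighbor finds the boundary between the red and blue stripes, which is why both comparisons are needed.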
These memory accesses determine the complexity of this algorithm: 2 x N x M, where N and M are respectively the numbers of rows and columns in the bitmap. The effective speed measured on my computer is about 40 transforms per second on a 300x300x32 source bitmap.

3. JPEG image compression theory

(I) JPEG compression is carried out in four steps:

(1) Color mode conversion and sampling

The RGB color system is the most common way of representing color. JPEG uses the YCbCr color system. To compress a full-color image with the basic JPEG method, the RGB image data must first be converted to YCbCr color
model data. Y represents brightness, while Cb and Cr represent hue and saturation. The conversion is carried out with the following formulas:

Y = 0.2990 R + 0.5870 G + 0.1140 B
Cb = -0.1687 R - 0.3313 G + 0.5000 B + 128
Cr = 0.5000 R - 0.4187 G - 0.0813 B + 128

The human eye is more sensitive to low-frequency data than to high-frequency data; in fact, it is much more sensitive to changes in brightness than to changes in color, i.e. the Y component of the data is more important. Since the Cb and Cr components are comparatively less important, only part of their data needs to be kept, which increases the compression ratio. JPEG usually uses one of two sampling methods, YUV411 and YUV422, whose names give the sampling ratio of the three components Y, Cb and Cr.

(2) DCT transformation

DCT stands for Discrete Cosine Transform. It converts a group of light-intensity data into frequency data, in order to show how the intensity changes. If the high-frequency data are modified and the result is transformed back to the original form, the data obviously differ somewhat from the original, but the human eye does not easily notice the difference. For compression, the original image data is divided into data units, each an 8 * 8 matrix. JPEG treats the luminance matrices, the chrominance (Cb) matrix and the saturation (Cr) matrix together as a basic unit called the MCU. Each MCU contains no more than 10 matrices. For example, with the ratio of
rows and columns both being 4:2:2 sampling, each MCU will contain four luminance matrices, one chrominance matrix and one saturation matrix. Once the image data has been divided into 8 * 8 matrices, 128 must be subtracted from each value; the values are then substituted into the DCT formula to carry out the transform. The image data values must be reduced by 128 because the DCT formula accepts values in the range -128 to +127.

(3) Quantization

After the image data has been converted to frequency coefficients, it still has to go through a quantization step before entering the coding phase. The quantization phase requires two 8 * 8 matrices of data: one handles the brightness frequency coefficients, the other the color frequency coefficients. Each frequency coefficient is divided by the corresponding value of the quantization matrix, and the quotient is rounded to the nearest whole number.
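
The level shift and the quantization step described above can be sketched in Python as follows. This is a minimal illustration, not a codec implementation: the function names are hypothetical, the uniform table of 16 is an arbitrary placeholder rather than a real JPEG quantization table, the coefficient values are made up, and rounding ties upward is an assumption.

```python
import math


def level_shift(block):
    """Subtract 128 from each sample so that 0..255 becomes -128..127."""
    return [[value - 128 for value in row] for row in block]


def quantize(coefficients, table):
    """Divide each coefficient by its table entry and round to nearest."""
    return [[math.floor(c / q + 0.5) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coefficients, table)]


# Placeholder 8 x 8 quantization table (assumption: all entries 16).
table = [[16] * 8 for _ in range(8)]

# Made-up frequency coefficients: a few low-frequency values, zeros elsewhere.
coefficients = [[0.0] * 8 for _ in range(8)]
coefficients[0][0] = -415.4
coefficients[0][1] = 30.2
coefficients[1][0] = -61.1
coefficients[1][1] = 4.9

quantized = quantize(coefficients, table)
shifted = level_shift([[0, 128, 255]])
```

Small coefficients such as 4.9 become 0 after division and rounding, which is where the size reduction of this step comes from.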
This completes the quantization. After quantization, the frequency coefficients have been converted from floating-point numbers to integers, which facilitates the final encoding. However, after the quantization phase all the data retain only an integer approximation, so some data content is lost once again.

(4) Coding

Huffman encoding has no patent issues and has become the most commonly used JPEG encoding. Huffman coding is usually carried out over a complete MCU. During coding, the DC value and the 63 AC values of each matrix use different Huffman code tables, and brightness and chroma also require different Huffman code tables, so a total of four code tables is needed to complete the JPEG coding.

DC coding

The DC values are coded with differential pulse code modulation: within the same image component, the difference between each DC value and the previous DC value is what gets encoded. The main reason for coding the difference is that, in a continuous-tone image, the difference is mostly smaller than the original value, so fewer bits are needed to encode the difference than to encode the original value. For example, a difference of 5 has the binary representation 101; if the difference is -5, it is first changed to the positive integer 5 and then converted into its ones' complement binary number. The ones' complement is obtained by changing each bit that is 0 to 1 and each bit that is 1 to 0. A difference of 5 should retain 3 bits; the following table lists the number of bits to be retained for each range of difference values. An additional Huffman code value is added in front of the difference; for example, for a brightness difference of 5 (101), whose number of bits is three, the Huffman code value should be 100, the tw