Wednesday 25 September 2013

Tutorial: Getting started with Computer Vision Part 1: The basics, how could a computer see.

/!\ Warning: The following is meant for a person who knows programming and wants to start with computer vision without reading through the intimidating things!

Wouldn't it be cool if computers could actually see like humans? I mean, to a human this question seems absurd, but we must understand that this trivial task of seeing isn't really trivial for a computer.

Let us look at an example.

Eye
The above is a bad sketch of the eye :D, but it pretty much sums up how we see. It's simple: light enters through the cornea (the clear front layer of the eye), passes through the pupil, and lands on the retina. The retina is where the magic happens. As the light hits an individual receptor on the retina, it sparks off an electrochemical reaction. This information about light gets collected and transmitted to the brain for further processing.

Now let's see how a computer would get this data.


This is a bad sketch (again!) of a webcam. Light enters through a series of lenses and reaches the shutter. The shutter just acts like a gate: if it opens, light hits the sensor, otherwise it doesn't. The sensor does the same job as the retina. But unlike the retina, it's not organic; it's made of silicon. :D

So, what does this magical sensor give us in terms of data? Numbers. Yeah. Numbers. Think of an image as a grid of pixels. Each pixel has its own color.


Essentially, the pixels that you can see in the zoomed-in version are what make up an image.

Well, what makes a pixel? How do you devise a system that could produce any color we can see? What are the components of a color? Well, the answer is: most colors can be produced by mixing three colors: Red, Green and Blue (RGB). Each color element of RGB occupies 1 byte of memory, i.e. it can take 256 different values (0 to 255). So (R=255, G=0, B=0) would produce a pure red. Click Here to try making your own colors using RGB values.
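To make the one-byte-per-channel idea concrete, here is a minimal Python sketch. Python is just my pick here, and the helper names `make_color` and `to_hex` are made up for illustration. It checks that each channel fits in one byte and prints the color as the hex code that many color pickers show.

```python
def make_color(r, g, b):
    """Return an (R, G, B) tuple after checking each channel fits in one byte."""
    for name, value in (("R", r), ("G", g), ("B", b)):
        if not 0 <= value <= 255:
            raise ValueError(f"{name}={value} does not fit in one byte (0-255)")
    return (r, g, b)


def to_hex(color):
    """Format an (R, G, B) tuple as a web-style hex string, e.g. '#ff0000'."""
    r, g, b = color
    return f"#{r:02x}{g:02x}{b:02x}"


red = make_color(255, 0, 0)
print(red, to_hex(red))   # (255, 0, 0) #ff0000

# Three channels with 256 values each give this many possible colors.
print(256 ** 3)           # 16777216
```

Try swapping in your own values. Anything outside 0 to 255 raises an error, because it would not fit in a single byte.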

So let's answer the question we asked earlier, how could a computer see? Well, to a computer an image looks something like this:


(53,121,32)  (129,32,78)  (123,98,211)
(13,151,31)  (9,32,78)    (12,91,1)
(63,21,32)   (10,39,11)   (83,63,255)
(9,32,78)    (129,32,78)  (123,98,211)

And these are just 12 pixels! An actual image has thousands, often millions, of pixels.
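Here is a rough sketch of that idea in plain Python: an image is just a grid (rows and columns) of RGB triples. The tiny 2x2 image and the names `image` and `count_pixels` are made up for illustration, and the 640x480 frame size is only an assumed example, not the size of any particular webcam.

```python
# A made-up 2x2 image: each inner list is a row, each tuple is one (R, G, B) pixel.
image = [
    [(255, 0, 0), (0, 255, 0)],      # red,  green
    [(0, 0, 255), (255, 255, 255)],  # blue, white
]


def count_pixels(img):
    """Count the pixels in a grid stored as a list of rows."""
    return sum(len(row) for row in img)


height = len(image)
width = len(image[0])
print(width, height, count_pixels(image))   # 2 2 4

# Read one pixel: row 1, column 0 (counting from zero) is the blue one.
r, g, b = image[1][0]
print(r, g, b)                               # 0 0 255

# Assumed example: a 640x480 frame at 3 bytes per pixel.
frame_pixels = 640 * 480
print(frame_pixels, frame_pixels * 3)        # 307200 921600
```

So even a small frame like that is hundreds of thousands of pixels, and close to a megabyte of raw numbers.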

Well, that is the end of this tutorial. If you wanna start coding, check out Part 2!
