Wednesday 25 September 2013

Tutorial: Getting started with Computer Vision Part 2: Coding, let's put everything into practice! (Visual Basic or C#)

So in the last tutorial we understood how a computer sees an image. Now let's ask a computer to do all this stuff.

The code here is in Visual Basic; you can find the C# version at the end.

Aim: 

To write a program that loads an image containing a yellow (maybe green! I am color blind, btw) ball from a file and draws a rectangle around the ball to show that it found it.

The Method:

Let's say we have these images:




(Right click and save them, if you are following along this tutorial)

Our aim is to draw a box around the ball. But how can we isolate the ball from the clutter? How could numbers reveal the position of the ball? Simple: we use color. The ball has a greenish-yellow color (again, I am colorblind), so the pixels that belong to it have greenish-yellow values. In RGB this is around (134, 106, 32), which looks like this:


So let's start! If you are not using my images, make sure you have worked out the average color of the ball/object you are trying to detect; otherwise, stick with the values I found.

Go ahead, start a new Windows Forms project and add a picture box to it (I'll assume you named it pbDisplay). This PictureBox will serve as a display for the images.

In the Form_Load event type (yeah type, don't copy!) out the following:


Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        ' Replace <filename> with the path to your image
        Dim bmImage As New Bitmap("<filename>")

        pbDisplay.Image = bmImage
End Sub

bmImage is a Bitmap object. It is essentially a grid of pixels: an image. The constructor used here takes a string, the path of the image file you want to load into memory.

Once we have our bitmap, we simply show it in the picturebox.

Add the path to your file and launch your program; you should see the image loaded onto your form. Congrats!

The Method: The real thing :D

Well, now that the easy part is done, we'll try to detect the ball.

The basic algorithm is to iterate over every pixel and determine whether that pixel is green or not.

To determine whether a specific RGB value is green or not, we shall write a function as follows:


' Average color of the ball and the allowed deviation per channel
Dim BALL As Color = Color.FromArgb(134, 106, 32)
Const THRESH As Integer = 15
Function isPixelGreen(ByVal pixel As Color) As Boolean
        ' True only if R, G and B are each within THRESH of the ball color
        Return (pixel.R <= BALL.R + THRESH AndAlso pixel.R >= BALL.R - THRESH) AndAlso
               (pixel.G <= BALL.G + THRESH AndAlso pixel.G >= BALL.G - THRESH) AndAlso
               (pixel.B <= BALL.B + THRESH AndAlso pixel.B >= BALL.B - THRESH)
End Function

BALL holds the average color of the ball (remember, a color has R, G and B components). As the actual pixels of the ball will not exactly match the average, we keep a THRESH, which acts as a tolerance: each component may differ from the average by up to THRESH.

If the color provided (pixel) is in the range, we return True; otherwise False.

Now all we need to do is iterate through every pixel and check whether it belongs to the ball. We can write a simple nested For loop right after the declaration of the bmImage object:
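To make the range check concrete, here is a small console sketch that runs the same function on two sample colors. The two sample pixels are made up for illustration (they are not taken from the images), and the module name is hypothetical.

```vbnet
Imports System
Imports System.Drawing

Module ThresholdSketch
    ' Same average color and threshold as in the tutorial
    Private ReadOnly BALL As Color = Color.FromArgb(134, 106, 32)
    Private Const THRESH As Integer = 15

    ' True only if R, G and B are each within THRESH of the ball color
    Function isPixelGreen(ByVal pixel As Color) As Boolean
        Return (pixel.R <= BALL.R + THRESH AndAlso pixel.R >= BALL.R - THRESH) AndAlso
               (pixel.G <= BALL.G + THRESH AndAlso pixel.G >= BALL.G - THRESH) AndAlso
               (pixel.B <= BALL.B + THRESH AndAlso pixel.B >= BALL.B - THRESH)
    End Function

    Sub Main()
        ' Made-up sample pixels, for illustration only
        Dim nearBall As Color = Color.FromArgb(140, 100, 40) ' off by 6, 6 and 8
        Dim tooBlue As Color = Color.FromArgb(134, 106, 60)  ' blue is off by 28

        Console.WriteLine(isPixelGreen(nearBall)) ' True: every channel is within 15
        Console.WriteLine(isPixelGreen(tooBlue))  ' False: blue is outside 17..47
    End Sub
End Module
```

Note that a single channel out of range is enough to reject a pixel, which is why a slightly larger threshold can change the result a lot.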


        For x = 0 To bmImage.Width - 1
            For y = 0 To bmImage.Height - 1
                If isPixelGreen(bmImage.GetPixel(x, y)) Then
                    bmImage.SetPixel(x, y, Color.White)
                Else
                    bmImage.SetPixel(x, y, Color.Black)
                End If
            Next
        Next

Here we iterate through all the x's and y's of the image. Using the GetPixel method, we extract each pixel's color and pass it to our function. If the function returns True, we color the pixel white; otherwise black (using the SetPixel method).

Run the program and you should see a black and white image. Whoa! The image doesn't look right? No problem, try increasing the threshold to about 25. This makes the check less strict. Is the image better? It should look something like this:


So we see the ball is white, but there is noise that is white too! For a beginner tutorial this is fine; as we progress, we will be able to do this almost perfectly using more advanced algorithms, but for now this is okay.

Drawing the Box!

Now we need to define a rectangle around the ball. For that we need a start position and an end position.

We will fetch that by modifying our for loop:


        ' Point is a structure and can never be Nothing, so a flag marks the first match
        Dim firstPass As Boolean = True
        Dim startPos As Point
        Dim endPos As Point
        For x = 0 To bmImage.Width - 1
            For y = 0 To bmImage.Height - 1
                If isPixelGreen(bmImage.GetPixel(x, y)) Then
                    If firstPass Then
                        firstPass = False
                        startPos = New Point(x, y)
                    Else
                        endPos = New Point(x, y)
                    End If
                End If
            Next
        Next
        Dim rect As New Rectangle(startPos, New Size(endPos.X - startPos.X, endPos.Y - startPos.Y))

Here we define our start and end positions and set the start position to the first green pixel and the end as the last green pixel.

Then we define a rectangle using the start and end positions. Finally we need to draw a rectangle and this can be done by:


        Using g As Graphics = Graphics.FromImage(bmImage)
            g.DrawRectangle(Pens.Blue, rect)
        End Using

Click launch and play with the threshold and color; you should see something like this (threshold: 23):


Well, this isn't impressive, but it's a start. It is indeed difficult for a computer to draw conclusions from images. (If you try the other images, the noise is so much that the rectangle is way off!!)
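One reason the rectangle can be off is that the first and last matching pixels are not necessarily opposite corners of the ball. As a rough sketch of one possible variation (not the method used in this tutorial, and no cure for noise), we could track the smallest and largest x and y among the matching pixels instead. FindBoundingBox and the tiny synthetic test image below are hypothetical, made up for this illustration.

```vbnet
Imports System
Imports System.Drawing

Module BoundingBoxSketch
    Private ReadOnly BALL As Color = Color.FromArgb(134, 106, 32)
    Private Const THRESH As Integer = 23

    ' Same range check as in the tutorial
    Function isPixelGreen(ByVal pixel As Color) As Boolean
        Return (pixel.R <= BALL.R + THRESH AndAlso pixel.R >= BALL.R - THRESH) AndAlso
               (pixel.G <= BALL.G + THRESH AndAlso pixel.G >= BALL.G - THRESH) AndAlso
               (pixel.B <= BALL.B + THRESH AndAlso pixel.B >= BALL.B - THRESH)
    End Function

    ' Hypothetical helper: the smallest rectangle whose corners span every
    ' matching pixel, or Rectangle.Empty if no pixel matches.
    Function FindBoundingBox(ByVal bm As Bitmap) As Rectangle
        Dim minX As Integer = Integer.MaxValue
        Dim minY As Integer = Integer.MaxValue
        Dim maxX As Integer = -1
        Dim maxY As Integer = -1
        For x As Integer = 0 To bm.Width - 1
            For y As Integer = 0 To bm.Height - 1
                If isPixelGreen(bm.GetPixel(x, y)) Then
                    minX = Math.Min(minX, x)
                    minY = Math.Min(minY, y)
                    maxX = Math.Max(maxX, x)
                    maxY = Math.Max(maxY, y)
                End If
            Next
        Next
        If maxX < 0 Then Return Rectangle.Empty
        Return Rectangle.FromLTRB(minX, minY, maxX, maxY)
    End Function

    Sub Main()
        ' Synthetic 10x10 image: black background with two ball-colored pixels
        Using bm As New Bitmap(10, 10)
            Using g As Graphics = Graphics.FromImage(bm)
                g.Clear(Color.Black)
            End Using
            bm.SetPixel(2, 7, BALL) ' first match in the column-by-column scan
            bm.SetPixel(6, 3, BALL) ' last match in the scan

            Dim r As Rectangle = FindBoundingBox(bm)
            Console.WriteLine("{0},{1},{2},{3}", r.X, r.Y, r.Width, r.Height) ' 2,3,4,4
        End Using
    End Sub
End Module
```

With the first/last approach, the same two pixels would give a start of (2, 7) and a height of 3 - 7 = -4, so the rectangle would not be a valid box. The min/max version still includes every noisy pixel, so it only helps once the mask is reasonably clean.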

But don't be sad. I guess we learnt a lot and as we progress through this series, we will use more advanced algorithms to get amazing results.

Stay tuned for the next part: improving the results using blob detection.

Till then,
Happy Coding

Oh, here is the C# version :D


using System;
using System.Drawing;
using System.Windows.Forms;

// Assumes the standard designer-generated partial class, which creates pbDisplay
public partial class Form1 : Form
{

 private void Form1_Load(object sender, EventArgs e)
 {
  Bitmap bmImage = new Bitmap("<filename>");

  // Remember the first and the last matching pixel
  bool firstPass = true;
  Point startPos = default(Point);
  Point endPos = default(Point);
  for (int x = 0; x <= bmImage.Width - 1; x++) {
   for (int y = 0; y <= bmImage.Height - 1; y++) {
    if (isPixelGreen(bmImage.GetPixel(x, y))) {
     if (firstPass) {
      firstPass = false;
      startPos = new Point(x, y);
     } else {
      endPos = new Point(x, y);
     }
    }
   }
  }
  Rectangle rect = new Rectangle(startPos, new Size(endPos.X - startPos.X, endPos.Y - startPos.Y));

  using (Graphics g = Graphics.FromImage(bmImage)) {
   g.DrawRectangle(Pens.Blue, rect);
  }

  pbDisplay.Image = bmImage;
 }
 Color BALL = Color.FromArgb(134, 106, 32);
 const int THRESH = 23;
 public bool isPixelGreen(Color pixel)
 {
  // True only if R, G and B are each within THRESH of the ball color
  return (pixel.R <= BALL.R + THRESH && pixel.R >= BALL.R - THRESH)
      && (pixel.G <= BALL.G + THRESH && pixel.G >= BALL.G - THRESH)
      && (pixel.B <= BALL.B + THRESH && pixel.B >= BALL.B - THRESH);
 }
 public Form1()
 {
  InitializeComponent(); // designer-generated
  Load += Form1_Load;
 }
}

Tutorial: Getting started with Computer Vision Part 1: The basics, how could a computer see.

/!\ Warning: The following is meant for a person who knows programming and wants to start with computer vision without reading through the intimidating things!

Wouldn't it be cool if computers could actually see like humans? To a human this question seems absurd, but we must understand that this seemingly trivial task of seeing isn't really trivial for a computer.

Let us look at an example.

Eye
The above is a bad sketch of the eye. :D But it pretty much sums up how we see. It's simple: light passes through the cornea (the clear front layer) and the pupil, then reaches the retina. The retina is where the magic happens. As the light hits an individual receptor on the retina, it sparks off an electrochemical reaction. This information about light gets collected and transmitted to the brain for further processing.

Now let's see how a computer would get this data.


This is a bad sketch (again!) of a webcam. Light enters through a series of lenses and reaches the shutter. Then it hits the sensor, which does the same job as the retina. But unlike the retina, it's not organic; it's made of silicon. :D The shutter just acts like a gate: if it opens, light hits the sensor; otherwise it doesn't.

So so so so, what does this magical sensor give us in terms of data? Numbers. Yeah. Numbers. Think of an image as a grid of pixels. Each pixel has its own color.


Essentially, the pixels that you can see in the zoomed-in version are what make up an image.

Well, what makes a pixel? How do you devise a system that could produce any color in the visible spectrum? What are the components of a color? Well, the answer is: most colors can be produced by mixing red, green and blue (RGB). Each RGB component occupies 1 byte of memory, i.e. it can take 256 different values (0 to 255). So (R=255, G=0, B=0) would produce a red color. Click Here to try making your own colors using RGB values.
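If you want to poke at this from code, here is a minimal console sketch using the .NET Color structure. The module name is hypothetical; the point is just that three one-byte components are enough to describe a color.

```vbnet
Imports System
Imports System.Drawing

Module RgbSketch
    Sub Main()
        ' Full red, no green, no blue
        Dim red As Color = Color.FromArgb(255, 0, 0)
        Console.WriteLine("R={0} G={1} B={2}", red.R, red.G, red.B) ' R=255 G=0 B=0

        ' Three components with 256 possible values each
        Dim combinations As Integer = 256 * 256 * 256
        Console.WriteLine(combinations) ' 16777216

        ' Mixing full red and full green gives yellow
        Dim mixed As Color = Color.FromArgb(255, 255, 0)
        Console.WriteLine(mixed.ToArgb() = Color.Yellow.ToArgb()) ' True
    End Sub
End Module
```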

So let's answer the question we asked earlier, how could a computer see? Well, to a computer an image looks something like this:


53,121,32     129,32,78     123,981,211
13,151,31     9,32,78       12,91,1
63,21,32      10,39,11      83,63,255
9,32,78       129,32,78     123,981,211

And these are just 12 pixels! An actual image has thousands of pixels.
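For the curious, here is a rough sketch of how you could print an image as numbers yourself. It builds a tiny 2x2 bitmap in memory with made-up colors (so no file is needed) and writes out the R,G,B values of each pixel. PixelToText is a hypothetical helper name.

```vbnet
Imports System
Imports System.Drawing

Module PixelDumpSketch
    ' Hypothetical helper: one pixel written as "R,G,B"
    Function PixelToText(ByVal c As Color) As String
        Return String.Format("{0},{1},{2}", c.R, c.G, c.B)
    End Function

    Sub Main()
        ' A 2x2 image with made-up colors, for illustration only
        Using bm As New Bitmap(2, 2)
            bm.SetPixel(0, 0, Color.FromArgb(53, 121, 32))
            bm.SetPixel(1, 0, Color.FromArgb(129, 32, 78))
            bm.SetPixel(0, 1, Color.FromArgb(13, 151, 31))
            bm.SetPixel(1, 1, Color.FromArgb(9, 32, 78))

            ' Print one text line per row of pixels
            For y As Integer = 0 To bm.Height - 1
                Dim line As String = ""
                For x As Integer = 0 To bm.Width - 1
                    If x > 0 Then line &= "   "
                    line &= PixelToText(bm.GetPixel(x, y))
                Next
                Console.WriteLine(line)
            Next
        End Using
    End Sub
End Module
```

The two lines printed are "53,121,32   129,32,78" and "13,151,31   9,32,78": the whole image, as far as the computer is concerned.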

Well, that is the end of this tutorial. If you wanna start coding, check out Part 2!