
Introduction to Computer Vision with Python and OpenCV3



From service robotics to self-driving cars and Apple’s face recognition technology, computer vision has contributed substantially to the many technological marvels that exist today. The plethora of use cases and jobs in this problem space make learning computer vision a very profitable skill for developers interested in AI and robotics.

However, the sheer amount of information and theoretical knowledge behind this science can make getting started quite daunting for beginners looking to build computer vision software. Fortunately, the OpenCV library exists, and provides numerous pre-implemented functions that make developing real-time computer vision software very simple.

In this article, we will cover the necessary environment setup, briefly introduce some theoretical concepts, and then go on to build a sample computer vision application in Python 3, using the OpenCV3 library.

Side Tip

Image processing and computer vision are often confused. While the two are related, they are not the same: in simple terms, image processing is one of the many steps that occur in computer vision. Computer vision draws an inference or extracts information from visual input, while image processing applies an operation or transformation to an image and outputs another image.

Key points:
· Computer vision: takes visual data as input and outputs an inference or some knowledge about it.
· Image processing: takes an image as input and outputs a modified image.
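The distinction can be sketched in a few lines of NumPy. This is purely illustrative; the image and both function names are made up for this example:

```python
import numpy as np

# a toy 4x4 grayscale "image": a dark half and a bright half
image = np.array([[10, 10, 200, 200],
                  [10, 10, 200, 200],
                  [10, 10, 200, 200],
                  [10, 10, 200, 200]], dtype=np.uint8)

def invert(img):
    """Image processing: image in, image out."""
    return 255 - img

def is_mostly_bright(img):
    """Computer vision (toy example): image in, inference out."""
    return bool(img.mean() > 127)

processed = invert(image)            # still an image, same shape as the input
inference = is_mostly_bright(image)  # a piece of knowledge about the image
```

The first function hands back another image; the second hands back a conclusion about the image, which is the essential difference the two bullet points describe.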

Getting Started (Installation and Environment Setup)

[This article assumes you already have Python installed.]

We need to install the OpenCV3 library into our Python modules. There are numerous ways to do this, some more stressful than others. I personally spent painful hours installing OpenCV countless times until I discovered Conda/Miniconda. Conda is a package and environment management system that makes it simple to install multiple packages and set up isolated Python environments for safe, sandboxed prototyping. I use the Miniconda variant because it is lightweight and contains only the essentials.

First, visit the Miniconda site and follow the instructions to download and install the Python 3 version of the software that corresponds to your OS.

Next, create a Python 3 environment to work in safely and install the NumPy and OpenCV3 libraries with all their dependencies. Run the command below in the command line or terminal (depending on your OS) to do this:

conda create -n vision_env python=3 numpy opencv3 -c https://conda.anaconda.org/menpo
 


[Screenshot: output of the command, showing the dependencies to be installed for NumPy and OpenCV3]

Syntax:
conda create -n [env_name] python=[version] [package] -c [channel to search for packages]

· Windows
Type activate [env_name] in your command prompt to use the newly created environment.

· Linux / macOS
Type source activate [env_name] in your terminal to use the newly created environment.
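Once the environment is active, a quick sanity check (assuming the install above succeeded) is to import the packages from the command line:

```shell
# print the installed OpenCV version; an ImportError here means the install failed
python -c "import cv2, numpy; print(cv2.__version__)"
```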

Once we activate the environment, let's go ahead and install imutils using pip (a Python package with convenience functions for quick image manipulation). To do this, while in your activated environment, run:

pip install imutils

Let’s build something

We are all set up now and can build our first application. To introduce you to the OpenCV3 library, we will build a simple application that detects faces. The application will start a video feed using the laptop’s webcam and identify all faces it detects with a box.

To detect faces, we will be using something called a Haar Cascade. This finds its origins in machine learning, and is basically a classifier that classifies objects based on certain features. For our application, we will use an already-trained Haar Cascade Classifier that was designed to detect the faces of humans. It is entirely possible to train your own Haar Cascade Classifier for any object of your choice. (I may upload a subsequent tutorial to show you how this is done.)
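To give a feel for what the classifier is built on, here is a rough sketch of a single Haar-like feature: the difference between pixel sums of adjacent rectangles, computed in constant time from an integral image. This is only a conceptual illustration, not OpenCV's implementation; the patch values and region choice are invented for the example:

```python
import numpy as np

# toy 6x6 grayscale patch: a dark band above a bright band, loosely
# resembling the contrast pattern (e.g. eyes vs cheeks) a face detector exploits
patch = np.vstack([np.full((3, 6), 40), np.full((3, 6), 220)]).astype(np.float64)

# integral image: cumulative sums let us get any rectangle's pixel sum in O(1)
integral = patch.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of pixels in rows r0..r1, cols c0..c1 (inclusive) via the integral image."""
    total = ii[r1, c1]
    if r0 > 0:
        total -= ii[r0 - 1, c1]
    if c0 > 0:
        total -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total

# a two-rectangle Haar-like feature: bottom half minus top half;
# a large response suggests the dark-above-bright pattern is present here
feature = rect_sum(integral, 3, 0, 5, 5) - rect_sum(integral, 0, 0, 2, 5)
```

A trained cascade evaluates thousands of such features at many positions and scales, discarding non-face regions early, which is what makes real-time detection feasible.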

Open any text editor and create a new Python file called face_detect.py:

# import the necessary packages
import cv2
import numpy as np
import imutils

# load the Haar cascade classifier XML file attached to this tutorial
face_classifer = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

# create an object holding the resource the video will come from
cap = cv2.VideoCapture(0)
 
The VideoCapture object takes a single argument: the index of the device to capture video from, or the name of a video file to read (if you want to process an existing video on your computer). In our case we pass 0 because the built-in webcam is the first default video-capturing device on a laptop. If you have a second camera connected to your computer, you may use 1 (and so on).

To load the Haar Cascade classifier, copy the attached XML file into the same folder as your Python file, or replace 'haarcascade_frontalface_default.xml' in the code with the absolute file path to the classifier.

while True:
    # capture a frame
    _, frame = cap.read()

    # resize the frame to a width of 700 pixels
    frame = imutils.resize(frame, width=700)

    # operations on the frame happen here
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

In the code block above, we repeatedly capture video frames with the read() function and store each one in the frame variable. We also resize each frame and convert it to grayscale for faster processing, storing the result in the gray variable.
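imutils.resize preserves the frame's aspect ratio: roughly, it derives the new height from the requested width before resizing. A simplified sketch of that calculation (not imutils' actual source; the function name here is invented):

```python
def scaled_dimensions(orig_w, orig_h, width=None, height=None):
    """Compute (w, h) preserving aspect ratio, given a target width or height."""
    if width is not None:
        ratio = width / float(orig_w)
        return width, int(orig_h * ratio)
    if height is not None:
        ratio = height / float(orig_h)
        return int(orig_w * ratio), height
    return orig_w, orig_h

# a 1280x720 webcam frame resized to width=700 keeps its 16:9 shape
print(scaled_dimensions(1280, 720, width=700))
```

This is why we only pass width=700 in the loop above: the matching height is computed for us, and faces are not distorted before detection.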

    faces = face_classifer.detectMultiScale(gray)

    # detectMultiScale returns an empty tuple or a NumPy array,
    # so check the length rather than comparing against []
    if len(faces) > 0:
        for (x, y, w, h) in faces:
            cv2.rectangle(gray, (x, y), (x + w, y + h), (255, 255, 255), 2)

    # print the number of faces detected
    print('{} faces detected'.format(len(faces)))

    # display the frame in a window
    cv2.imshow('Face Detection', gray)

    # wait 1 millisecond each loop; the program can be quit by pressing 'q'
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# release the device or file held for video capture and close all windows
cap.release()
cv2.destroyAllWindows()

The code block above detects the faces in our footage. The detectMultiScale() function returns an array with one row per detected object of interest, each in the form [x, y, w, h], describing a rectangular boundary in the image where that object can be found.
x – the x-coordinate of the top left corner
y – the y-coordinate of the top left corner
w – the width of the rectangle
h – the height of the rectangle

If any faces are detected, we draw rectangles on the frame with the cv2.rectangle(a,b,c,d,e) function by supplying the following arguments using the information stored in the faces array:
a – the frame to draw on
b – the coordinates of the upper left vertex of the rectangle we want to draw (x,y)
c – the coordinates of the opposite vertex from b, i.e. the lower right vertex of the rectangle (x+width, y+height)
d – the color of the rectangle; note that OpenCV orders color channels as BGR, not RGB (on our grayscale frame, (255,255,255) simply draws in white)
e – the thickness of the border line
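For intuition, here is roughly what drawing that outline amounts to on a single-channel image, written in plain NumPy. This is a simplified sketch, not OpenCV's code, and the helper name is invented:

```python
import numpy as np

def draw_rect_border(img, top_left, bottom_right, value=255, thickness=2):
    """Draw a rectangle outline on a grayscale image: roughly what
    cv2.rectangle does for a single-channel frame (simplified sketch)."""
    x0, y0 = top_left
    x1, y1 = bottom_right
    img[y0:y0 + thickness, x0:x1] = value   # top edge
    img[y1 - thickness:y1, x0:x1] = value   # bottom edge
    img[y0:y1, x0:x0 + thickness] = value   # left edge
    img[y0:y1, x1 - thickness:x1] = value   # right edge
    return img

canvas = np.zeros((20, 20), dtype=np.uint8)          # blank 20x20 "frame"
draw_rect_border(canvas, (4, 4), (16, 16))           # white box, interior untouched
```

Note that only the border pixels are written; the interior of the rectangle (the detected face) is left as-is.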

Putting it all together
This is the final code:

# import the necessary packages
import cv2
import numpy as np
import imutils

# load the Haar cascade classifier XML file attached to this tutorial
face_classifer = cv2.CascadeClassifier('/Users/benedictquartey/Desktop/PythonTests/haarcascade_frontalface_default.xml')

# create an object holding the resource the video will come from
cap = cv2.VideoCapture(0)

while True:
    # capture a frame
    _, frame = cap.read()

    # resize the frame to a width of 700 pixels
    frame = imutils.resize(frame, width=700)

    # operations on the frame happen here
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_classifer.detectMultiScale(gray)

    # detectMultiScale returns an empty tuple or a NumPy array,
    # so check the length rather than comparing against []
    if len(faces) > 0:
        for (x, y, w, h) in faces:
            cv2.rectangle(gray, (x, y), (x + w, y + h), (255, 255, 255), 2)

    # print the number of faces detected
    print('{} faces detected'.format(len(faces)))

    # display the frame in a window
    cv2.imshow('Face Detection', gray)

    # wait 1 millisecond each loop; the program can be quit by pressing 'q'
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# release the device or file held for video capture and close all windows
cap.release()
cv2.destroyAllWindows()

Conclusion

In this tutorial, we learned how to set up a Python 3 environment and install the OpenCV3 library using Miniconda. We also wrote our first computer vision program, a face detector, using the library. Simple as it seems, it touches all the major parts of an OpenCV application:
· Capturing and reading video
· Performing operations on video frames
· Displaying results in a window
· Releasing resources at the end of the program

Moving forward, have a look at the tutorials page of the OpenCV documentation: https://docs.opencv.org/3.0-beta/doc/py_tutorials/py_tutorials.html. It has plenty of sample code to build on your understanding.


Benedict is an autodidact: physics enthusiast, dreamer, inventor, artist, and roboticist. He is on a constant journey of self-improvement and knowledge.

