How To Monitor Social Distancing Using Python And Object Detection

60 VIEWS

·

Given that COVID-19 is showing no signs of slowing down in many regions of the world, it’s more important than ever to maintain “social distancing” aka physical distancing between persons not of the same household in both indoor and outdoor settings. Several national health agencies including the Center for Disease Control (CDC)  have recognized that the virus can be spread if “an infected person coughs, sneezes, or talks, and droplets from their mouth or nose are launched into the air and land in the mouths or noses of people nearby.”

To minimize the likelihood of coming in contact with these droplets, it is recommended that any two persons should maintain a physical distance of 1.8 meters apart (approximately 6 feet). Different governing bodies have different rules for safe social distancing and different penalties for breaking them.

We don’t advocate putting in place surveillance to enforce social distancing. Rather, as an interesting exercise, we will use the fact that machine learning and object detection have come a long way in being able to recognize objects in an image. Let’s understand how Python can be used to monitor social distancing.

Crowded Beach

Python can be used to detect people’s faces in a photo or video loop, and then estimate their distance from each other. While the method we’ll use is not the most accurate, it’s cheap, fast and good enough to learn something about the efficacy of computer vision when it comes to social distancing. Also, this approach is simple enough to be extended to an android compatible solution, so it can be used in the field.

Social Distancing, Python, And Trigonometry

Let’s start by outlining the process to address our use case:

  1. Pre-Process images taken by a video camera to detect the contours of the components of the images (using Python)
  2. Classify each contour as a face or not (also using Python)
  3. Approximate the distance between faces detected by the camera (do you remember your trigonometry?)

To accomplish the Python parts, we’ll use the OpenCV library, which implements most of the state-of-the-art algorithms for computer vision. It’s written in C++ but has bindings for both Python and Java. It also has an Android SDK that could help extend the solution for mobile devices. Such a mobile application might be particularly useful for live, on demand assessments of social distancing using Python.

Steps 1 and 2 in our process are fairly straightforward because OpenCV provides methods to get results in a single line of code. The following code creates a function faces_dist, which takes care of detecting all the faces in an image:

def faces_dist(classifier, ref_width, ref_pix):
   ratio_px_cm = ref_width / ref_pix
   cap = cv2.VideoCapture(0)
   while True:       
       _, img = cap.read()
       gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
       # detect all the faces on the image
       faces = classifier.detectMultiScale( gray, 1.1, 4)
       annotate_faces( img, faces, ratio_px_cm)
 
       k = cv2.waitKey(30) & 0xff
       # use ESC to terminate
       if k==27:
           break
   # release the camera
   cap.release()

After starting the camera in video mode, we enter into an infinite loop to process each image streamed from the video. To simplify identification, the routine transforms the image to grayscale, and then, using the classifier passed as an argument, we get a list of tuples that contains the X, Y coordinates for each face detected along with the corresponding Width and Height which gives us a rectangle.

Next, we call the annotate_faces function, which is in charge of drawing the detected rectangles and calculating the euclidean distance between the objects detected:

def annotate_faces( img, faces, ratio_px_cm ):
   points = []
   for (x, y, w, h) in faces:       
       center = (x+(int(w/2)), y+(int(h/2)))
       cv2.circle( img, center, 2, (0,255,0),2)
       for p in points:
           ed = euclidean_dist( p, center ) * ratio_px_cm
           color = (0,255,0)
           if ed < MIN_DIST:
               color = (0,0,255)
           # draw a rectangle over each detected face
           cv2.rectangle( img, (x, y), (x+w, y+h), color, 2)           
           # put the distance as text over the face's rectangle
           cv2.putText( img, "%scm" % (ed),
               (x, h -10), cv2.FONT_HERSHEY_SIMPLEX,
               0.5, color, 2)
           # draw a line between the faces detected
           cv2.line( img, center, p, color, 5)
       points.append( center )
 
   cv2.imshow('img', img )

Object Detection Using Python

Now that we’re well on our way to solving the problem, let’s step back and review Python’s object detection capabilities in general, and human face detection in particular. We’re using a classifier to do human face detection. In the simplest sense, a classifier can be thought of as a function that chooses a category for a given object. In our case, the classifier is provided by the OpenCV library (of course, you could write your own classifier, but that’s a subject for another time). OpenCV has already been pre-trained against an extensive dataset of images featuring human faces. As a result, it can be very reliable in detecting areas of an image that correspond to human faces. But because real world use cases can differ, OpenCV provides two models for detection:

  • Local Binary Pattern (LBP) divides the image into small quadrants (3×3 pixels) and checks if the quadrants surrounding the center are darker or not. If they are, it assigns them to 1; if not 0.  Given this information, the algorithm checks a feature vector against the pre-trained ones and returns if an area is a face or not. This classifier is fast, but prone to error with higher false positives.
  • Haar Classifier is based on the features in adjacent rectangular sections of the image whose intensities are computed and compared. The Haar-like classifier is slower than LBP, but typically far more accurate.
Social distancing using Python
Image from youtube.com/watch?v=uv39mFl5OGA

Our use case for object detection is quite specific since it pertains to social distancing: we need to detect faces, which both classifiers are suited for. But we also need to build an area that represents the most prominent part of the face. That’s because in order to calculate the distance between faces, we first need to approximate the distance the faces are from the camera, which will be based on a comparison of the widths of the facial areas detected.

As a result, we have chosen the Haar Classifier since in our tests the rectangle detected for the face was better than that approximated by LBP.

Approximating Distances

Now that we can reliably detect the faces in each image, it’s time to tackle step 3: calculating the distance between the faces. That is, to approximate the distance between the centroid of each rectangle drawn. To accomplish that, we have to calculate the ratio between pixels and cm measured from a known distance for a known object reference. The reference_pixels function calculates this value which is used in the faces_dist function described earlier:

def reference_pixels(image_path, ref_distance, ref_width):
   # open reference image
   image = cv2.imread( image_path )
   edged = get_edged( image )
   # detect all contours over the gray image
   allContours = cv2.findContours( edged.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE )
   allContours = imutils.grab_contours( allContours )
   markerContour = get_marker( allContours )
   # use the marker width to calculate the focal length of the camera
   pixels = (cv2.minAreaRect( markerContour ))[1][0]
   return pixels

We can use the classic Euclidean distance formula between each centroid, and then transform it to measurement units (cm) using the ratio calculated:

D = square_root( (point1_x  + point2_x ) ^2 +  (point1_y  + point2_y ) ^2 )

With this simple formula our annotate_faces function adds the approximation to a line drawn from each centroid. If the approximation is lower than the minimal distance required the line will be red.

Object Detection Programmed For Social Distancing

The complete source code for this example is available in my Github repository. There, you’ll find the code in which we pass three arguments to our Python script:

  • The path of the reference image
  • The reference distance in centimeters
  • The reference width in centimeters

Given these parameters, our code will calculate the focal length of the camera, and then use detect_faces.py to determine the approximate distances between the faces the camera detects. To stop the infinite cycle, just press the ESC key. The GitHub repo also contains reference images and two datasets for OpenCV’s classifiers.

classifier = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
#classifier = cv2.CascadeClassifier('lbpcascade_frontalface_improved.xml')
 
IMG_SRC = sys.argv[1]
REF_DISTANCE = float(sys.argv[2])
REF_WIDTH = float(sys.argv[3])
FL = largest_marker_focal_len( IMG_SRC, REF_DISTANCE, REF_WIDTH )
faces_dist(classifier, REF_WIDTH, REF_WIDTH, FL)

To try out the code, start by downloading the Social Distancing runtime environment, which contains a recent version of Python and all the packages you need to run my code, including OpenCV. 

NOTE: the simplest way to install the environment is to first install the ActiveState Platform’s command line interface (CLI), the State Tool. 

If you’re on Windows, you can use Powershell to install the State Tool:

IEX(New-Object Net.WebClient).downloadString('https://platform.activestate.com/dl/cli/install.ps1')

If you’re on Linux, run:

sh <(curl -q https://platform.activestate.com/dl/cli/install.sh)

Now run the following to automatically create a virtual environment, and then download and install the Social Distancing runtime environment into it:

state activate Pizza-Team/Social-Distancing

To learn more about working with the State Tool, refer to the documentation.

Conclusions

Of course, this is not a production-ready way to ensure social distancing and keep you safe from close talkers. You can improve this solution using more accurate approaches to detect faces (as neural networks created either with OpenCV or Tensorflow), or else using more than a single camera (similar to how kinect works). But for our purposes of object detection, we have a working solution with relatively minimal effort.


Nicolas Bohorquez is a Developer and Entrepreneur from Bogotá, Colombia, he has been involved with technology in several languages, teams, and projects in a variety of roles in Latin America and United States. Currently he is doing the Master in Data Sciente for Complex Economic Systems in Torino, Italy. Nicolas is a regular contributor at Fixate IO.


Discussion

Click on a tab to select how you'd like to leave your comment

Leave a Comment

Your email address will not be published. Required fields are marked *

%d bloggers like this: