Computer vision has advanced rapidly, and the last decade saw a surge in vision techniques, from image recognition to self-driving cars. These techniques provide state-of-the-art solutions for tasks like face detection, face recognition, object detection, image classification, and image segmentation.
These models cannot be used in their raw form; they need to be deployed on an edge device or in the cloud and wrapped in a user interface. Since not every individual has access to the cloud, and maintaining it is an expensive affair, these models should be ported to edge devices so that anyone can use them.
In this blog I will summarise how a face recognition model is deployed on an edge device and how it differs from a model in the cloud.
What are Edge Devices?
Model on Cloud vs model on Edge Device
Face recognition is the task of identifying individuals from their faces. Development of this technology began in the second half of the 20th century. There have been many advancements, from hand-crafted Haar-Cascade features to DeepFace (a deep learning approach to face recognition). Researchers are actively working on improving the accuracy of these models.
While training a model to higher accuracy is one challenge, hosting these models in the cloud and accessing them is another: you need separate infrastructure to serve and maintain your model.
Instead of maintaining the model in the cloud, you can deploy it on the devices connected to the camera. This requires model pruning and optimisation so that the model's size and computation are reduced to fit the resources available on the edge device.
What are Edge devices?
‘Edge’ refers to having a compute unit close to where data is gathered: data is processed as close as possible to its original source. This architecture requires effective use of resources that may not be continuously connected to the cloud. Examples include embedded devices, IoT devices, smartphones, tablets, laptops, and sensors.
Why Edge devices?
Imagine a security camera that must continuously monitor people and alert the police if there is any malicious activity, streaming live video to central servers. The camera has to make a crucial decision based on a person's activity, and the consequences can be disastrous if it waits for the central servers to process the data and respond. Although algorithms like YOLO have sped up object detection, the latency lies in the part of the system where the camera has to send large volumes of video to the central server, receive the response, and only then act. Hence, basic processing, such as deciding when to take action, needs to happen on the device itself.
The goal of edge computing is to minimise the latency and cost incurred in sending data to the cloud.
The following are the advantages of using an edge device:
Privacy: Avoid sending all raw data to be stored and processed on cloud servers.
Real-time responsiveness: Reaction time can be a critical factor.
Reliability: The system keeps working even when disconnected from cloud servers.
Model deployed on cloud
The camera first sends frames to the CPU, which forwards the data to the cloud. There the images are fed to the CNN model, and the results are sent back to the processing unit.
Model deployed on the Edge Device
Face Recognition on an edge device
Face recognition (FR) is a technique used for verification or identification of a person’s identity by analysing and relating patterns based on the person’s facial features.
The user first registers their face in the database; when they reappear, the system detects and recognises them.
The FR system consists of three phases:
Face detection is where you localise faces in an input image or video. Here you detect the face along with its facial key points.
Once the face and its key points are detected, the key points are used to align the face so that perspective distortion in the image can be corrected.
The perspective-corrected image is fed as input to the face recognition model, which matches it against the images in the database.
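As a concrete illustration of the alignment step, here is a minimal numpy sketch that computes the in-plane rotation needed to make the eye line horizontal. Real systems estimate a full similarity or perspective transform from all detected key points; the eye coordinates below are made-up values.

```python
import numpy as np

def eye_alignment_matrix(left_eye, right_eye):
    """Return a 2x2 rotation matrix that makes the eye line horizontal."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = np.arctan2(dy, dx)            # roll of the face in radians
    c, s = np.cos(-angle), np.sin(-angle) # rotate by the opposite angle
    return np.array([[c, -s], [s, c]])

# Hypothetical eye pixel coordinates from a face detector.
left, right = np.array([38.0, 52.0]), np.array([74.0, 60.0])
R = eye_alignment_matrix(left, right)
aligned = R @ (right - left)  # eye-to-eye vector after alignment
# aligned[1] is ~0: the eyes now lie on a horizontal line
```

In a full pipeline, the same rotation (plus scaling and translation) would be applied to the whole face crop before it is passed to the recognition model.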
How is matching done?
A 512-d vector is generated for each face at registration time. This vector is unique to each registration and is stored in the database. When a person appears in the frame, a 512-d vector is generated and compared with the vectors in the database. The comparison metric is the cosine distance between the two vectors.
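A minimal sketch of this matching step in Python with numpy, written in terms of cosine similarity (1 minus the cosine distance). The 0.5 similarity threshold is an assumed value; a real system would tune it on validation data.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match(query, database, threshold=0.5):
    """Compare a query embedding against all registered embeddings.

    Returns (name, score) for the closest match, or (None, score)
    if no registered face is similar enough."""
    best_name, best_score = None, -1.0
    for name, vec in database.items():
        score = cosine_similarity(query, vec)
        if score > best_score:
            best_name, best_score = name, score
    return (best_name if best_score >= threshold else None), best_score

# Toy 512-d embeddings standing in for the model's output.
rng = np.random.default_rng(42)
database = {"alice": rng.standard_normal(512), "bob": rng.standard_normal(512)}
query = database["alice"] + 0.1 * rng.standard_normal(512)  # noisy re-appearance
name, score = match(query, database)
```

A scan over the database is fine for a handful of identities; larger galleries typically use an approximate nearest-neighbour index instead.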
How is the 512-d vector generated?
The model is trained as a classifier, i.e. with each person as a single class. Once training is done, the last fully connected (FC) layer is removed, so the forward pass outputs the 512-d vector from the penultimate layer.
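A toy numpy sketch of this idea (the layer sizes and random weights are illustrative only, not a real face recognition network): training uses the full classifier, while deployment stops at the penultimate layer to obtain the embedding.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy classifier: 128-d input -> 512-d penultimate layer -> 10 class scores
# (one class per registered identity during training).
W1, b1 = rng.standard_normal((512, 128)) * 0.1, np.zeros(512)
W2, b2 = rng.standard_normal((10, 512)) * 0.1, np.zeros(10)

def forward_train(x):
    """Full forward pass used during training (classification)."""
    h = np.maximum(W1 @ x + b1, 0.0)  # 512-d penultimate activation
    return W2 @ h + b2                # per-identity class scores

def embed(x):
    """Deployment-time pass: stop before the final FC layer."""
    return np.maximum(W1 @ x + b1, 0.0)

x = rng.standard_normal(128)
embedding = embed(x)        # 512-d vector stored or matched in the database
scores = forward_train(x)   # only needed while training the classifier
```

The point is that `embed` and `forward_train` share the same weights up to the penultimate layer; dropping `W2` at deployment is what turns the classifier into an embedding generator.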
Optimising Models for edge devices
Optimisation is a key aspect of porting your model to an edge device because of resource constraints: on a server you may have an RTX 2080, whereas NVIDIA's Jetson Nano has a 128-core NVIDIA Maxwell™ architecture-based GPU. Porting these models with a minimal drop in accuracy is a challenge.
The main optimisation Techniques are:
Choosing a backbone network with a minimal number of parameters that still gives good accuracy.
Weight pruning, or model pruning: a set of methods to increase the sparsity (the number of zero-valued elements) of a network's weights by removing the filters that contribute least to the network, thereby decreasing its size.
Quantising the weights from FP32 or FP16 to INT8, which improves the latency of your network.
There are other techniques that can be used to optimise your network such as decreasing the spatial dimension of your network or introducing sparsity to your network.
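A minimal numpy sketch of the pruning and quantisation ideas above. In practice you would rely on framework tooling (e.g. TensorRT or a framework's quantisation APIs); the weight values here are made up, and the quantisation shown is simple symmetric per-tensor quantisation.

```python
import numpy as np

def magnitude_prune(w, sparsity=0.5):
    """Zero out the smallest-magnitude weights (illustrative pruning)."""
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    # Threshold at the k-th smallest absolute value.
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= thresh, 0.0, w)

def quantize_int8(w):
    """Symmetric linear quantisation of FP32 weights to INT8."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale  # dequantise with q * scale

w = np.array([0.02, -0.8, 0.5, -0.01, 0.3, 0.9], dtype=np.float32)
pruned = magnitude_prune(w, sparsity=0.5)  # half the weights become zero
q, scale = quantize_int8(w)                # 4x smaller than FP32 storage
```

Sparse weights compress well and can skip multiplications, while INT8 storage is a quarter of FP32 and maps onto fast integer arithmetic on edge hardware; both trade a small amount of accuracy for these savings.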
Some applications of face recognition on edge devices:
Finding missing people
Tracking school attendance
Validating identity at ATMs