CenterNet is an anchor-free object detection architecture. A key advantage of this design is that it replaces classical NMS (Non-Maximum Suppression) in post-processing with a simpler algorithm that fits naturally into the CNN flow, enabling much faster inference. See Fig. 1.
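That NMS replacement amounts to a local-maximum check on the predicted keypoint heatmap: a 3x3 max-pool followed by an equality test keeps only peak cells, with no pairwise IoU suppression. A minimal NumPy sketch of the idea (the function name, threshold and toy heatmap are my own illustration, not code from the paper, which uses a max-pooling layer):

```python
import numpy as np

def extract_peaks(heatmap, threshold=0.3):
    """Keep only cells that equal the maximum of their 3x3 neighborhood.

    This mimics CenterNet's NMS replacement: a 3x3 max-pool plus an
    equality check, so no pairwise box suppression is needed.
    """
    h, w = heatmap.shape
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    # local 3x3 maximum for every cell, via the 9 shifted views
    local_max = np.max(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)],
        axis=0,
    )
    keep = (heatmap == local_max) & (heatmap >= threshold)
    return np.argwhere(keep)

heatmap = np.zeros((5, 5))
heatmap[1, 1] = 0.9   # a strong peak
heatmap[1, 2] = 0.6   # suppressed: not a local maximum
heatmap[3, 4] = 0.5   # a second, isolated peak
print(extract_peaks(heatmap))  # peaks at (1, 1) and (3, 4)
```

Note how the response next to the strongest peak is dropped without ever computing box overlaps.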
In this post we’ll discuss the YOLO detection network and its versions 1, 2 and especially 3.
In 2016, Redmon, Divvala, Girshick and Farhadi revolutionized object detection with a paper titled You Only Look Once: Unified, Real-Time Object Detection. In it they introduced a new approach to object detection: feature extraction and object localization were unified into a single monolithic block, and the localization and classification heads were merged as well. Their single-stage architecture, named YOLO (You Only Look Once), achieves very fast inference. The frame rate for 448x448-pixel images was 45 fps…
Object detection in images is a common task in computer vision. Its use cases range from missile guidance to automated production lines, console games and cleaning robots. Object detection algorithms existed in computer vision long before their migration to deep learning:
In their 2001 paper Rapid Object Detection Using a Boosted Cascade of Simple Features, Viola and Jones present a method that slides rectangular hand-crafted filters over the image to extract features correlated with facial structure. The extracted features are fed to a classifier which learns which features are more significant than others, but does not…
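The rectangular filters mentioned above are Haar-like features, which Viola and Jones evaluate in constant time using an integral image. Here is a minimal sketch of that idea (the function names and the two-rectangle "edge" feature layout are illustrative, not code from the paper):

```python
import numpy as np

def integral_image(img):
    # cumulative sum over both axes: any rectangle sum then costs 4 lookups
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    # sum of img[r0:r1, c0:c1] recovered from the integral image
    total = ii[r1 - 1, c1 - 1]
    if r0 > 0:
        total -= ii[r0 - 1, c1 - 1]
    if c0 > 0:
        total -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total

def two_rect_feature(ii, r, c, h, w):
    # Haar-like "edge" feature: left half minus right half (w must be even)
    left = rect_sum(ii, r, c, r + h, c + w // 2)
    right = rect_sum(ii, r, c + w // 2, r + h, c + w)
    return left - right

# toy image whose right half is bright: the edge feature fires strongly
img = np.zeros((4, 4))
img[:, 2:] = 1.0
ii = integral_image(img)
print(two_rect_feature(ii, 0, 0, 4, 4))  # -8.0
```

The integral image is what makes evaluating thousands of such filters per window cheap enough for real-time detection.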
In this post we will explore the mechanism of neural network training, but I’ll do my best to avoid rigorous mathematical discussions and keep it intuitive.
Consider the following task: you receive an image, and want an algorithm that returns (predicts) the correct number of people in the image.
We start by assuming that there is, indeed, some mathematical function out there that maps the collection of all possible images to the integer values describing the number of people in each image. …
The perceptron is the most basic form of a neural network and therefore a good place to start. Perceptrons are binary classifiers: given data representing some object as input, a perceptron determines to which of two classes the object belongs. It is called ‘binary’ because it has only two classes to choose from. These classes can be, for example, ‘cat’ and ‘dog’, but in the more general case they can also be ‘cat’ and ‘not a cat’.
Let’s build an example that will remain with us throughout this journey:
Let’s say you’re a movie enthusiast and you want…
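The perceptron described above can be sketched in a few lines: the classic learning rule nudges the weight vector toward each misclassified point until the two classes are separated. A toy sketch (function names and the AND-style toy data are my own illustration, not from the post):

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    """Classic perceptron rule. Labels y are +1 / -1; the bias is
    folded in as an extra constant input column."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            if yi * (xi @ w) <= 0:      # misclassified (or on the boundary)
                w += lr * yi * xi       # move the boundary toward the point
    return w

def predict(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.where(Xb @ w >= 0, 1, -1)

# linearly separable toy data (AND-like labels)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1, -1, -1, 1])
w = train_perceptron(X, y)
print(predict(w, X))  # [-1 -1 -1  1]
```

Because the data is linearly separable, the rule is guaranteed to converge to a separating line in a finite number of updates.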
At the end of the first post I wrote that we’d learn about a basic neural network next, but I later realized that another section was needed first: one that discusses artificial intelligence from a mathematical perspective. That is the subject of this short section, which will be followed by a post about perceptrons.
If you’ve heard about neural networks and are not sure what they are, if you’re interested in artificial intelligence and how it ‘learns by itself’ to perform complex tasks in autonomous driving, healthcare, product quality assurance etc., or if you are already in the field and want to learn more, then I hope this blog is for you!
A quick bio: