This article will help you demystify denoising with an autoencoder in a few minutes!

Autoencoders have limited practical uses, but they can denoise images quite successfully when the network is trained on noisy inputs with the clean images as targets. We can generate noisy images by adding Gaussian noise to the training images and then clipping the values to lie between 0 and 1.
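Here is a minimal sketch of that noise step, assuming the images are NumPy float arrays already scaled to [0, 1]. The function name `add_gaussian_noise` and the `noise_std` value are my own illustrative choices, not something fixed by the method.

```python
import numpy as np

def add_gaussian_noise(images, noise_std=0.1, seed=0):
    """Return a noisy copy of `images` (floats assumed to be in [0, 1])."""
    rng = np.random.default_rng(seed)
    # Add zero-mean Gaussian noise to every pixel.
    noisy = images + rng.normal(loc=0.0, scale=noise_std, size=images.shape)
    # Clip so the result is still a valid image in [0, 1].
    return np.clip(noisy, 0.0, 1.0)

# Toy "dataset": two 4x4 grey images (hypothetical stand-in for real data).
clean = np.full((2, 4, 4), 0.5)
noisy = add_gaussian_noise(clean, noise_std=0.2)
```

During training, `noisy` would be the network input and `clean` the reconstruction target.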

“Denoising auto-encoder forces the hidden layer to extract more robust features and restrict it from merely learning the identity. Autoencoder reconstructs the input from a corrupted version of it.”

A denoising auto-encoder does two things:

You Only Look Once (YOLO) is a real-time object detection algorithm that avoids spending too much time generating region proposals. Instead of locating objects perfectly, it prioritises speed and recognition.

Architectures like Faster R-CNN are accurate, but the model itself is quite complex, with multiple outputs that are each a potential source of error. Once trained, they're still not fast enough to run in real time.

Consider a self-driving car that sees this image of a street. It's essential for a self-driving car to be able to detect the location of objects all around it, such as pedestrians, cars, and traffic…

This article will simplify the Kalman filter for you. Hopefully, you'll demystify all the cryptic things you find on Wikipedia when you google Kalman filters.

So let’s get started!

To understand the Kalman filter, we need to start with the basics. In Kalman filters, the distribution is given by what's called a Gaussian.

What is a Gaussian though?

A Gaussian is a continuous function over the space of locations, and the area underneath it sums to 1.

Gaussian in graph

The Gaussian is defined by two parameters: the mean, often abbreviated with the Greek letter mu (μ), and the width of the Gaussian, called the variance (sigma squared, σ²). …
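To make those two parameters concrete, here is a small sketch of the one-dimensional Gaussian density, plus a numerical check that the area under it is (approximately) 1. The helper names `gaussian_pdf` and `area_under_curve`, and the choice of integrating over ten standard deviations either side of the mean, are assumptions made for illustration.

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """Density of a 1-D Gaussian with mean `mu` and variance `sigma2`."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * sigma2)
    return coeff * math.exp(-0.5 * (x - mu) ** 2 / sigma2)

def area_under_curve(mu, sigma2, n_sigmas=10.0, steps=20000):
    """Trapezoid-rule estimate of the area within +/- n_sigmas of the mean."""
    sigma = math.sqrt(sigma2)
    lo = mu - n_sigmas * sigma
    hi = mu + n_sigmas * sigma
    dx = (hi - lo) / steps
    # End points get half weight in the trapezoid rule.
    total = 0.5 * (gaussian_pdf(lo, mu, sigma2) + gaussian_pdf(hi, mu, sigma2))
    for i in range(1, steps):
        total += gaussian_pdf(lo + i * dx, mu, sigma2)
    return total * dx

peak = gaussian_pdf(10.0, mu=10.0, sigma2=4.0)  # the density is highest at the mean
area = area_under_curve(mu=10.0, sigma2=4.0)    # should come out very close to 1
```

A larger variance gives a wider, flatter curve; a smaller variance gives a narrower, taller one, but the area stays the same.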

Facial key-points are relevant for a variety of tasks, such as face filters, emotion recognition, pose recognition, and so on. So if you're into these kinds of projects, keep reading!

In this project, facial key-points (also called facial landmarks) are the small magenta dots shown on each of the faces in the image below. In each training and test image, there is a single face and 68 key-points, with coordinates (x, y), for that face. These key-points mark important areas of the face: the eyes, corners of the mouth, the nose, etc.

Magenta dots showing key-points

Dataset used: We'll be using the YouTube Faces Dataset, which includes videos…

Classify beer bottles on the go!

I came across a beer label classifier competition and thought, why not give ORB, SURF and SIFT a try to see which one performs best?

So here's my attempt at classifying beer labels. I have to admit that scraping for the dataset introduced me to hundreds of beer companies I never knew existed.

Gif Courtesy: Google

Okay, let's jump right into coding!


You'll need a "query" set (test images) and a "database" set (training images). …

Installation & Verification in 10 minutes!

I know it is truly bothersome to install CUDA separately. It is really troublesome for a user who is not so familiar with Linux to set the path of CUDA.

Before proceeding, I suggest you back up your data, because you never know when things might go south given the complex nature of the Linux graphics stack.

Install nvidia-driver-410 from the graphics-drivers PPA

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-driver-410

Now reboot to apply the changes.
Run "nvidia-smi" (without the quotes), which should report the correct driver version, 410.xx, like this:

Image for post

Install CUDA 10.0 using the local .deb

Notice that installing…

Let’s learn by connecting theory to code!

As per the Deep Learning Book, an autoencoder is a neural network that is trained to attempt to copy its input to its output. Internally, it has a hidden layer that describes a code used to represent the input. The network may be viewed as consisting of two parts: an encoder function "h=f(x)" and a decoder that produces a reconstruction "r=g(h)".
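To connect that definition to code, here is a rough NumPy sketch of the two parts. The layer sizes, the tanh activation, and the random untrained weights are all assumptions made for illustration; a real autoencoder would learn these weights by minimising the reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, code_dim = 8, 3  # hypothetical sizes; the code is smaller than the input

# Randomly initialised (untrained) parameters for the encoder and the decoder.
W_enc = rng.normal(scale=0.1, size=(input_dim, code_dim))
b_enc = np.zeros(code_dim)
W_dec = rng.normal(scale=0.1, size=(code_dim, input_dim))
b_dec = np.zeros(input_dim)

def f(x):
    """Encoder: maps the input x to the hidden code h = f(x)."""
    return np.tanh(x @ W_enc + b_enc)

def g(h):
    """Decoder: maps the code h to the reconstruction r = g(h)."""
    return h @ W_dec + b_dec

x = rng.random((5, input_dim))  # a batch of 5 toy inputs
h = f(x)                        # hidden code, shape (5, 3)
r = g(h)                        # reconstruction, same shape as x
loss = np.mean((r - x) ** 2)    # mean squared reconstruction error
```

Training would adjust `W_enc`, `b_enc`, `W_dec` and `b_dec` to push `loss` down, so that `r` gets as close to `x` as the small code allows.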

Gif Courtesy: Google

Okay okay, I know what you’re thinking! Just another post with no proper explanation? No! That’s not how we’ll proceed here. Let’s take a breath & connect our theoretical knowledge to…

This article will simplify data loading for you. Hopefully, you'll learn how to easily build a custom dataloader and be able to implement one for any type of dataset that comes your way.

Image: Google

Custom data loading startled me too when I first started my computer vision journey. Now that I think about it, a few missing details here and there can make a big difference in understanding, which naturally shows up in the code. So I decided to write a short post summarizing everything I've learned, and if it helps you even in the tiniest way, I'll consider this post a win.

So, let’s get into it then.

I'll be building a multilabel classifier in PyTorch. The dataset used in this is from


Using File Transfer on Linux | Towards AI

The SCP (secure copy) command in Linux can be confusing; let me make it a bit easier for you.

In this blog post, I'm going to show you how you can use the SCP command to copy your files or directories from your local machine to a remote server. So let's say I have a remote server (shown below):

Image for post

This one is my local machine,

because Attention Is All You Need, literally!

“One important property of human perception is that one does not tend to process a whole scene in its entirety at once. Instead, humans focus attention selectively on parts of the visual space to acquire information when and where it is needed and combine information from different fixations over time to build up an internal representation of the scene, guiding future eye movements and decision making” — Recurrent Models of Visual Attention,2014

Gif Courtesy: Google

In this post, I will show you how attention is implemented. The main focus will be on implementing attention in isolation…

Garima Nishad

A Machine Learning Research scholar who loves to moonlight as a blogger.
