DeepViFT - Image Embedding Component

Overview

Embeddings are one of the most useful and interesting components of deep learning (and perhaps of all of computer science, in my opinion). As a byproduct of training a neural network on some task, we force it to "learn" a series of transformations that turn the input data into a vector. These intermediate vectors must map inputs that share similar target outputs to (roughly) similar points, and this has some desirable effects: the vectors end up capturing things like meaning, context, arbitrary features, the number of objects, etc. You don't have to understand all the math, but it is important to understand at a high level what is going on here.

The post below summarizes the concept a bit better:

Embedding means converting data to a feature representation where certain properties can be represented by notions of distance. Example, a model trained on speech signals for speaker identification, may allow you to convert a speech snippet to a vector of numbers, such that another snippet from the same speaker will have a small distance (e.g. Euclidean distance) from the original vector. Alternately, a different embedding function, might allow you to convert the speech signal on the basis of the word that is said in the signal. So you will get small Euclidean distance between the encoded representations of two speech signals if the same word is spoken in those snippets. Yet again, you might simply want to learn an embedding, that represents the "mood" of the speech signal e.g. "happy" vs "sad" vs "angry" etc. A small distance between encoded representations of two speech signals will then imply similar mood and vice versa.

Or for instance, word2vec embeddings "project" a word in a space in which Euclidean distances between these words represent semantic similarity (again embedding ~ gives you a vector of numbers for a given word). So if you take the word2vec representation of "Berlin", subtract "Germany" from it, and add the result to "France", you get a vector which is very close in Euclidean space to the embedding for "Paris"!

Similarly, in applications where you need to classify into hundreds of thousands or millions of classes, e.g. face recognition, one common way is to use "metric learning" techniques (often Siamese CNNs with so-called contrastive or triplet loss), which at test time allows you to use Nearest Neighbor techniques on the vector representation of faces! -Dr. Zia, Microsoft
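
To make the vector arithmetic concrete, here is a minimal sketch using made-up 2-D vectors. Real word2vec vectors have hundreds of dimensions; the numbers below are invented purely so the arithmetic is easy to follow, and `toy` and `nearest_word` are hypothetical names rather than part of any library. The analogy is written in its usual form: a capital minus its country, plus another country.

```python
import numpy as np

# Toy 2-D "embeddings", invented for illustration only (NOT real word2vec output).
# Axis 0 loosely encodes "which country", axis 1 encodes "country vs. capital".
toy = {
    "Germany": np.array([1.0, 0.0]),
    "Berlin":  np.array([1.0, 1.0]),
    "France":  np.array([3.0, 0.0]),
    "Paris":   np.array([3.0, 1.0]),
    "Madrid":  np.array([5.0, 1.0]),
}


def nearest_word(query, vocab, exclude=()):
    """Return the word whose vector has the smallest Euclidean distance to query."""
    candidates = {w: v for w, v in vocab.items() if w not in exclude}
    return min(candidates, key=lambda w: np.linalg.norm(candidates[w] - query))


# Take the "capital of" offset from one pair and apply it to another country.
query = toy["Berlin"] - toy["Germany"] + toy["France"]
answer = nearest_word(query, toy, exclude=("Berlin", "Germany", "France"))
print(answer)
```

The words used to build the query are excluded from the search, which is the common convention when evaluating analogies like this.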

In this project we will try using the geometric properties of image embeddings to navigate the catalog space more intelligently.

Technical Details

Obtaining Images

Web scraping…

Turning Images into Vectors

Here is a script to get image embeddings from a pre-trained neural network. Prerequisites:

a Bash terminal (it might also work under the Windows Subsystem for Linux)
a GPU (optional but recommended)

"""
Author: Nick Knowles (nknowles@richrelevance.com)
Description: Simple script to grab InceptionV3 imagenet embeddings.

MAIN_DIR should be set to a folder of many image files, where the name of each file is
the product ID corresponding to the image.


MAIN_DIR structure:
-./
--images/
----product123.jpeg
----product234.jpeg
---- .... etc ...

"""


from keras.applications.inception_v3 import InceptionV3
from keras.preprocessing import image
from keras.applications.inception_v3 import preprocess_input
from keras.models import Model
import pickle
import numpy as np
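import os

# Folder containing the product images (see the layout in the docstring above).
# "./images/" is only a placeholder taken from that layout; set it to your own folder.
MAIN_DIR = "./images/"

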
def save_embeddings(imgs, save_dir="./", id_to_idx=None, delimiter='\x01'):
    """ Dump embeddings into a text file where each row is a vector. We also save the
        mapping from product_id->row index, because product_ids are not stored in the
        img_embeddings.txt file (just the vectors).

    Parameters
    ----------
    imgs : numpy.ndarray [num_products, embedding_dim]
        The embeddings, one row per product.
    save_dir : str
        Directory that img_embeddings.txt and indexes.txt are written to.
    id_to_idx : Dict[str, int]
        Corresponds to the embeddings stored in imgs,
        i.e. keys are external product ids, values are row indices in imgs.
    delimiter : str
        Separator between product id and row index in indexes.txt.

    Returns
    -------
    Nothing

    """

    # Create the output directory if it does not already exist.
    # Output files already present in it will be overwritten below.
    os.makedirs(save_dir, exist_ok=True)


    img_emb_fn = os.path.join(save_dir, 'img_embeddings.txt')
    np.savetxt(img_emb_fn, imgs)
    with open(os.path.join(save_dir, 'indexes.txt'), 'w') as f:
        f.writelines('{}{}{}\n'.format(prod, delimiter, idx)
                     for prod, idx in id_to_idx.items())

def embed_images(model):
    """
    Loops over the image files in MAIN_DIR, loading and preprocessing them one by
    one, then pushes the whole batch through the neural network in a single call.

    Returns:
    feats : (np.ndarray) - matrix of image embeddings. Each row corresponds to a product vector
    id2idx : (dict) - a dictionary mapping product ID -> row index of that product's image
                      embedding in 'feats'.
    ids : (list of strings) - a list of product IDs. The idea is your image file names contain the
                              product ID, and then they get extracted in this function. Not used
                              currently, but might be useful to have later.

    """
    id2idx = dict()
    images = []
    ids = []

    for img_name in os.listdir(MAIN_DIR):
        # os.listdir returns bare file names; the name minus its extension is the product ID
        prod_id = os.path.splitext(img_name)[0]
        try:
            img = image.load_img(os.path.join(MAIN_DIR, img_name), target_size=(299, 299))
            x = image.img_to_array(img)
            x = np.expand_dims(x, axis=0)
            x = preprocess_input(x)
        except Exception as exc:
            # Skip files that cannot be loaded as images, but report which ones and why
            print('skipping {}: {}'.format(img_name, exc))
            continue

        # Use the current row count rather than the loop counter, so that skipped
        # files do not shift the product ID -> row mapping out of line with 'images'.
        id2idx[prod_id] = len(images)
        images.append(x)
        ids.append(prod_id)

    images = np.concatenate(images, axis=0) # list -> np.ndarray
    feats = model.predict(images) # model computes all embeddings at once

    return feats, id2idx, ids

if __name__ == "__main__":

    base_model = InceptionV3(weights='imagenet')
    model = Model(inputs=base_model.input, outputs=base_model.get_layer('avg_pool').output)

    feats, id2idx, ids = embed_images(model)

    # sanity check --make sure num rows == num products
    #print(feats.shape)
    #print(len(id2idx))

    save_embeddings(feats, id_to_idx=id2idx)

Steps to run:

cd into your project folder and create a virtual environment

$: python3.6 -m venv ./some_env_name
$: source ./some_env_name/bin/activate
(some_env_name) $: pip install -r requirements.txt

set up the image folder (see above)
set the MAIN_DIR variable in the script to your image folder
run the script

$: python embed_images.py
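
Once the script has run, the two output files can be read back into memory. The sketch below is one way to do that, assuming the defaults used by save_embeddings above (whitespace-separated vectors in img_embeddings.txt and '\x01'-separated "product_id, row index" pairs in indexes.txt). The function name `load_embeddings` is hypothetical, and the demo writes two tiny made-up rows to a temporary folder so it can run without the real output.

```python
import os
import tempfile

import numpy as np


def load_embeddings(save_dir="./", delimiter="\x01"):
    """Read img_embeddings.txt and indexes.txt back as (matrix, product_id -> row dict)."""
    # ndmin=2 keeps the result 2-D even if the file holds a single vector
    feats = np.loadtxt(os.path.join(save_dir, "img_embeddings.txt"), ndmin=2)
    id_to_idx = {}
    with open(os.path.join(save_dir, "indexes.txt")) as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            # Split on the LAST delimiter in case a product ID contains it
            prod, idx = line.rsplit(delimiter, 1)
            id_to_idx[prod] = int(idx)
    return feats, id_to_idx


# Round-trip demo with made-up data written in the same format as the script
demo_dir = tempfile.mkdtemp()
np.savetxt(os.path.join(demo_dir, "img_embeddings.txt"),
           np.array([[0.5, 1.5], [2.0, 3.0]]))
with open(os.path.join(demo_dir, "indexes.txt"), "w") as f:
    for prod, idx in [("product123", 0), ("product234", 1)]:
        f.write("{}{}{}\n".format(prod, "\x01", idx))

feats, id_to_idx = load_embeddings(demo_dir)
print(feats.shape, id_to_idx)
```

Looking up a product's vector is then `feats[id_to_idx[product_id]]`.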