The Neural Network model lets us go from images to vectors (embeddings).
The Nearest Neighbors model lets us ground those vectors (input data + noise)
back to images in our actual dataset.
A TL;DR oversimplification of what is going on in a Nearest Neighbors model (a brute-force version is sketched right after this list):
1. index all the vectors {id1: vector1, id2: vector2, ..., idN: vectorN}
2. take a new input vector and compute the cosine distance between it and every other indexed vector
   (this is an oversimplification: real libraries do clever things to make it crazy fast)
3. return the indexes of the top K closest vectors
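Here is a minimal brute-force sketch of those three steps in numpy, just to make the idea concrete. Real libraries like nmslib build approximate index structures (e.g. HNSW graphs) so they never have to scan every vector; all names and sizes below are purely illustrative.

import numpy as np

def brute_force_knn(index_vectors, query_vector, k=5):
    """Return the positions of the k vectors closest to query_vector by cosine distance."""
    # normalize so a dot product equals cosine similarity
    index_norm = index_vectors / np.linalg.norm(index_vectors, axis=1, keepdims=True)
    query_norm = query_vector / np.linalg.norm(query_vector)
    # cosine distance = 1 - cosine similarity
    distances = 1.0 - index_norm @ query_norm
    # positions of the k smallest distances, sorted closest-first
    top_k = np.argsort(distances)[:k]
    return top_k, distances[top_k]

# toy usage: 1000 random "embeddings" and one of them as the query
vectors = np.random.rand(1000, 2048)
ids, dists = brute_force_knn(vectors, vectors[0], k=5)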
For this project, we will index the embedding vector for each product image. Then at runtime,
we will compute the masked and unmasked image embedding vectors and query the Nearest Neighbors model
to get the top K similar product images.
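At runtime that looks roughly like the sketch below; embed_image and mask_center are hypothetical stand-ins for the project's real embedding and masking code, not functions that exist in the repo.

import numpy as np

def embed_image(image):
    # placeholder: the real version runs the image through the network and returns its embedding
    return np.random.rand(2048)

def mask_center(image):
    # placeholder: zero out the central region of the image
    masked = image.copy()
    h, w = image.shape[:2]
    masked[h // 4: 3 * h // 4, w // 4: 3 * w // 4] = 0.0
    return masked

product_image = np.random.rand(299, 299, 3)            # dummy image tensor
unmasked_emb = embed_image(product_image)              # embedding of the full image
masked_emb = embed_image(mask_center(product_image))   # embedding of the masked image
# both vectors are then used as queries against the indexed product embeddings
# to pull back the top K most similar products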
The cool thing about embedding spaces (go read that module if you haven't yet) is that you can do
algebra and geometry in them that holds in the data (pixel) space. No one really knows for sure
why this works, but a good intuition is that the neural network is trying to figure out what patterns
in the data are important (shapes, objects, scenery, etc.) in order to perform well on some task,
and a numerically efficient way to do that is to route similar patterns in the data space
through similar neural pathways in the network. So if you chop off the network at some layer,
you will get this clustering effect where similar inputs have similar vectors. As you get into
the deeper layers, more and more filters have been applied, and the clusters become more abstract.
For example, if we chop off the first layer we might find that all red images get a similar vector.
But if we chop off the last layer, we would find that images containing a similar set of objects
(cars, animals, etc.), regardless of pixel positions, get clustered together.
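As a concrete sketch of what "chopping off the network at some layer" can look like, here is one way to do it with Keras' pretrained InceptionV3. The 'mixed5' layer name is an arbitrary mid-network choice and this is not the exact code used to compute the project's embeddings.

import numpy as np
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input
from tensorflow.keras.models import Model

# full ImageNet-pretrained network, average-pooled so the output is a flat embedding vector
base = InceptionV3(weights='imagenet', include_top=False, pooling='avg')

# "chop off" the network at an earlier layer instead ('mixed5' is an arbitrary mid-network layer)
early = Model(inputs=base.input, outputs=base.get_layer('mixed5').output)

image = preprocess_input(np.random.rand(1, 299, 299, 3) * 255.0)  # dummy batch of one image
deep_embedding = base.predict(image)    # abstract, object-level features (a flat vector)
early_features = early.predict(image)   # lower-level, more spatially tied patterns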
So the hope is that if we mask regions of a product image in pixel space, we will get an embedding
that is far from the original image, but still in a relevant neighborhood. We can push this further
by interpolating (fancy word for walking in a direction defined by 2 end points) along a difference vector.
For example:
if we take a bunch of image embeddings of chairs and average them, then take a bunch of embeddings
for couches and average them, the difference between these two centroids (fancy word for
averaged vectors) gives us a direction. If we add that direction (scaled down to something
reasonable first by multiplying by 0.1 to 0.9) to the embedding of an image of a loveseat,
it will push us towards the embeddings for larger loveseats and couches in a similar style.
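A minimal numpy sketch of the centroid-difference idea (chair_embs, couch_embs, and loveseat_emb are stand-ins for whatever embeddings you actually have on hand):

import numpy as np

chair_embs = np.random.rand(50, 2048)    # embeddings of a bunch of chair images
couch_embs = np.random.rand(50, 2048)    # embeddings of a bunch of couch images
loveseat_emb = np.random.rand(2048)      # embedding of the seed loveseat image

# centroids are just the averaged vectors of each group
chair_centroid = chair_embs.mean(axis=0)
couch_centroid = couch_embs.mean(axis=0)

# the difference between the centroids defines a "chair -> couch" direction
direction = couch_centroid - chair_centroid

# walk the seed embedding part of the way along that direction
queries = [loveseat_emb + alpha * direction for alpha in (0.1, 0.5, 0.9)]
# each of these shifted vectors would then be used as a query to the nearest neighbors index

The actual indexing and querying script used for this project is below.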
"""
Author: Nick Knowles (nknowles@richrelevance.com)
Description: Simple script to grab InceptionV3 imagenet embeddings.
prereqs: have embeddings computation set up
"""importnumpyasnpimportpickleimportcloudpickleimportmathimporttimeitimportfunctoolsimportnmslibdefANN(embeddings_dict=None):"""
Args
embeddings_dict : {prod_id: embedding vector}
Returns
ANN : approximate nearest neighbors model
"""index_time_params={'M':30,'efConstruction':100,'post':0}prods_embedded=np.array(list(embeddings_dict.values()))index=nmslib.init(method='hnsw',space='cosinesimil')index.addDataPointBatch(prods_embedded)index.createIndex(index_time_params)returnindexdefget_neighbors(model,k,seed_embedding):ids,distances=model.knnQuery(seed_embedding,k=k)returnids,distancesif__name__=="__main__":# example seed
seed_id='V_126H754'# build an approximate nearest neighbors model object
embs_dict=# a dict of id -> embedding mappings
knn_model=ANN(embeddingss_dict=embs_dict)# get the recs (neighbors)
seed_emb=embs_dict[seed_id]ids,distance=get_neighbors(knn_model,k=20,seed_embedding=seed_emb)
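One thing to watch out for: the ids that knnQuery hands back are positions in the array passed to addDataPointBatch, not the product ids themselves, so you need to keep the key order around to translate back. A small sketch, assuming the same embs_dict as in the script (and Python 3.7+ dict ordering):

    # map nmslib's positional ids back to product ids
    prod_ids = list(embs_dict.keys())   # same order as list(embs_dict.values()) used to build the index
    recs = [(prod_ids[i], dist) for i, dist in zip(ids, distances)]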
Steps to run:
1. cd into your project folder and create a virtual environment