API Reference

Product Quantization (PQ)

class nanopq.PQ(M, Ks=256, verbose=True)

Pure python implementation of Product Quantization (PQ) [Jegou11].

For the indexing phase of database vectors, a D-dim input vector is divided into M D/M-dim sub-vectors. Each sub-vector is quantized into a small integer via Ks codewords. For the querying phase, given a new D-dim query vector, the distance beween the query and the database PQ-codes are efficiently approximated via Asymmetric Distance.

All vectors must be np.ndarray with np.float32

[Jegou11]
  1. Jegou et al., “Product Quantization for Nearest Neighbor Search”, IEEE TPAMI 2011
Parameters:
  • M (int) – The number of sub-space
  • Ks (int) – The number of codewords for each subspace (typically 256, so that each sub-vector is quantized into 256 bits = 1 byte = uint8)
  • verbose (bool) – Verbose flag
M

The number of sub-space

Type:int
Ks

The number of codewords for each subspace

Type:int
verbose

Verbose flag

Type:bool
code_dtype

dtype of PQ-code. Either np.uint{8, 16, 32}

Type:object
codewords

shape=(M, Ks, Ds) with dtype=np.float32. codewords[m][ks] means ks-th codeword (Ds-dim) for m-th subspace

Type:np.ndarray
Ds

The dim of each sub-vector, i.e., Ds=D/M

Type:int
fit(vecs, iter=20, seed=123)

Given training vectors, run k-means for each sub-space and create codewords for each sub-space.

This function should be run once first of all.

Parameters:
  • vecs (np.ndarray) – Training vectors with shape=(N, D) and dtype=np.float32.
  • iter (int) – The number of iteration for k-means
  • seed (int) – The seed for random process
Returns:

self

Return type:

object

encode(vecs)

Encode input vectors into PQ-codes.

Parameters:vecs (np.ndarray) – Input vectors with shape=(N, D) and dtype=np.float32.
Returns:PQ codes with shape=(N, M) and dtype=self.code_dtype
Return type:np.ndarray
decode(codes)

Given PQ-codes, reconstruct original D-dimensional vectors approximately by fetching the codewords.

Parameters:codes (np.ndarray) – PQ-cdoes with shape=(N, M) and dtype=self.code_dtype. Each row is a PQ-code
Returns:Reconstructed vectors with shape=(N, D) and dtype=np.float32
Return type:np.ndarray
dtable(query)

Compute a distance table for a query vector. The distances are computed by comparing each sub-vector of the query to the codewords for each sub-subspace. dtable[m][ks] contains the squared Euclidean distance between the m-th sub-vector of the query and the ks-th codeword for the m-th sub-space (self.codewords[m][ks]).

Parameters:query (np.ndarray) – Input vector with shape=(D, ) and dtype=np.float32
Returns:Distance table. which contains dtable with shape=(M, Ks) and dtype=np.float32
Return type:nanopq.DistanceTable

Distance Table

class nanopq.DistanceTable(dtable)

Distance table from query to codeworkds. Given a query vector, a PQ/OPQ instance compute this DistanceTable class using PQ.dtable() or OPQ.dtable(). The Asymmetric Distance from query to each database codes can be computed by DistanceTable.adist().

Parameters:dtable (np.ndarray) – Distance table with shape=(M, Ks) and dtype=np.float32 computed by PQ.dtable() or OPQ.dtable()
dtable

Distance table with shape=(M, Ks) and dtype=np.float32. Note that dtable[m][ks] contains the squared Euclidean distance between (1) m-th sub-vector of query and (2) ks-th codeword for m-th subspace.

Type:np.ndarray
adist(codes)

Given PQ-codes, compute Asymmetric Distances between the query (self.dtable) and the PQ-codes.

Parameters:codes (np.ndarray) – PQ codes with shape=(N, M) and dtype=pq.code_dtype where pq is a pq instance that creates the codes
Returns:Asymmetric Distances with shape=(N, ) and dtype=np.float32
Return type:np.ndarray

Optimized Product Quantization (OPQ)

class nanopq.OPQ(M, Ks=256, verbose=True)

Pure python implementation of Optimized Product Quantization (OPQ) [Ge14].

OPQ is a simple extension of PQ. The best rotation matrix R is prepared using training vectors. Each input vector is rotated via R, then quantized into PQ-codes in the same manner as the original PQ.

[Ge14]
  1. Ge et al., “Optimized Product Quantization”, IEEE TPAMI 2014
Parameters:
  • M (int) – The number of sub-spaces
  • Ks (int) – The number of codewords for each subspace (typically 256, so that each sub-vector is quantized into 256 bits = 1 byte = uint8)
  • verbose (bool) – Verbose flag
R

Rotation matrix with the shape=(D, D) and dtype=np.float32

Type:np.ndarray
M

The number of sub-space

Type:int
Ks

The number of codewords for each subspace

Type:int
verbose

Verbose flag

Type:bool
code_dtype

dtype of PQ-code. Either np.uint{8, 16, 32}

Type:object
codewords

shape=(M, Ks, Ds) with dtype=np.float32. codewords[m][ks] means ks-th codeword (Ds-dim) for m-th subspace

Type:np.ndarray
Ds

The dim of each sub-vector, i.e., Ds=D/M

Type:int
fit(vecs, pq_iter=20, rotation_iter=10, seed=123)

Given training vectors, this function alternatively trains (a) codewords and (b) a rotation matrix. The procedure of training codewords is same as PQ.fit(). The rotation matrix is computed so as to minimize the quantization error given codewords (Orthogonal Procrustes problem)

This function is a translation from the original MATLAB implementation to that of python http://kaiminghe.com/cvpr13/index.html

If you find the error message is messy, please turn off the verbose flag, then you can see the reduction of error for each iteration clearly

Parameters:
  • vecs – (np.ndarray): Training vectors with shape=(N, D) and dtype=np.float32.
  • pq_iter (int) – The number of iteration for k-means
  • rotation_iter (int) – The number of iteration for leraning rotation
  • seed (int) – The seed for random process
Returns:

self

Return type:

object

rotate(vecs)

Rotate input vector(s) by the rotation matrix.`

Parameters:vecs (np.ndarray) – Input vector(s) with dtype=np.float32. The shape can be a single vector (D, ) or several vectors (N, D)
Returns:Rotated vectors with the same shape and dtype to the input vecs.
Return type:np.ndarray
encode(vecs)

Rotate input vectors by OPQ.rotate(), then encode them via PQ.encode().

Parameters:vecs (np.ndarray) – Input vectors with shape=(N, D) and dtype=np.float32.
Returns:PQ codes with shape=(N, M) and dtype=self.code_dtype
Return type:np.ndarray
decode(codes)

Given PQ-codes, reconstruct original D-dimensional vectors via PQ.decode(), and applying an inverse-rotation.

Parameters:codes (np.ndarray) – PQ-cdoes with shape=(N, M) and dtype=self.code_dtype. Each row is a PQ-code
Returns:Reconstructed vectors with shape=(N, D) and dtype=np.float32
Return type:np.ndarray
dtable(query)

Compute a distance table for a query vector. The query is first rotated by OPQ.rotate(), then DistanceTable is computed by PQ.dtable().

Parameters:query (np.ndarray) – Input vector with shape=(D, ) and dtype=np.float32
Returns:Distance table. which contains dtable with shape=(M, Ks) and dtype=np.float32
Return type:nanopq.DistanceTable

Convert Functions to/from Faiss

nanopq.nanopq_to_faiss(pq_nanopq)

Convert a nanopq.PQ instance to faiss.IndexPQ. To use this function, faiss module needs to be installed.

Parameters:pq_nanopq (nanopq.PQ) – An input PQ instance.
Returns:A converted PQ instance, with the same codewords to the input.
Return type:faiss.IndexPQ
nanopq.faiss_to_nanopq(pq_faiss)

Convert a faiss.IndexPQ instance to nanopq.PQ. To use this function, faiss module needs to be installed.

Parameters:pq_faiss (faiss.IndexPQ) – An input PQ instance.
Returns:
  • nanopq.PQ: A converted PQ instance, with the same codewords to the input.
  • np.ndarray: Stored PQ codes in the input IndexPQ, with the shape=(N, M). This will be empty if codes are not stored
Return type:tuple