API Reference
Product Quantization (PQ)
- class nanopq.PQ(M, Ks=256, metric='l2', verbose=True)
Pure python implementation of Product Quantization (PQ) [Jegou11].
For the indexing phase of database vectors, a D-dim input vector is divided into M D/M-dim sub-vectors. Each sub-vector is quantized into a small integer via Ks codewords. For the querying phase, given a new D-dim query vector, the distance beween the query and the database PQ-codes are efficiently approximated via Asymmetric Distance.
All vectors must be np.ndarray with np.float32
[Jegou11]Jegou et al., “Product Quantization for Nearest Neighbor Search”, IEEE TPAMI 2011
- Parameters:
M (int) – The number of sub-space
Ks (int) – The number of codewords for each subspace (typically 256, so that each sub-vector is quantized into 8 bits = 1 byte = uint8)
metric (str) – Type of metric used among vectors (either ‘l2’ or ‘dot’) Note that even for ‘dot’, kmeans and encoding are performed in the Euclidean space.
verbose (bool) – Verbose flag
- M
The number of sub-space
- Type:
int
- Ks
The number of codewords for each subspace
- Type:
int
- metric
Type of metric used among vectors
- Type:
str
- verbose
Verbose flag
- Type:
bool
- code_dtype
dtype of PQ-code. Either np.uint{8, 16, 32}
- Type:
object
- codewords
shape=(M, Ks, Ds) with dtype=np.float32. codewords[m][ks] means ks-th codeword (Ds-dim) for m-th subspace
- Type:
np.ndarray
- Ds
The dim of each sub-vector, i.e., Ds=D/M
- Type:
int
- fit(vecs, iter=20, seed=123, minit='points')
Given training vectors, run k-means for each sub-space and create codewords for each sub-space.
This function should be run once first of all.
- Parameters:
vecs (np.ndarray) – Training vectors with shape=(N, D) and dtype=np.float32.
iter (int) – The number of iteration for k-means
seed (int) – The seed for random process
minit (str) – The method for initialization of centroids for k-means (either ‘random’, ‘++’, ‘points’, ‘matrix’)
- Returns:
self
- Return type:
object
- encode(vecs)
Encode input vectors into PQ-codes.
- Parameters:
vecs (np.ndarray) – Input vectors with shape=(N, D) and dtype=np.float32.
- Returns:
PQ codes with shape=(N, M) and dtype=self.code_dtype
- Return type:
np.ndarray
- decode(codes)
Given PQ-codes, reconstruct original D-dimensional vectors approximately by fetching the codewords.
- Parameters:
codes (np.ndarray) – PQ-cdoes with shape=(N, M) and dtype=self.code_dtype. Each row is a PQ-code
- Returns:
Reconstructed vectors with shape=(N, D) and dtype=np.float32
- Return type:
np.ndarray
- dtable(query)
Compute a distance table for a query vector. The distances are computed by comparing each sub-vector of the query to the codewords for each sub-subspace. dtable[m][ks] contains the squared Euclidean distance between the m-th sub-vector of the query and the ks-th codeword for the m-th sub-space (self.codewords[m][ks]).
- Parameters:
query (np.ndarray) – Input vector with shape=(D, ) and dtype=np.float32
- Returns:
Distance table. which contains dtable with shape=(M, Ks) and dtype=np.float32
- Return type:
Distance Table
- class nanopq.DistanceTable(dtable, metric='l2')
Distance table from query to codewords. Given a query vector, a PQ/OPQ instance compute this DistanceTable class using
PQ.dtable()
orOPQ.dtable()
. The Asymmetric Distance from query to each database codes can be computed byDistanceTable.adist()
.- Parameters:
dtable (np.ndarray) – Distance table with shape=(M, Ks) and dtype=np.float32 computed by
PQ.dtable()
orOPQ.dtable()
metric (str) – metric type to calculate distance
- dtable
Distance table with shape=(M, Ks) and dtype=np.float32. Note that dtable[m][ks] contains the squared Euclidean distance between (1) m-th sub-vector of query and (2) ks-th codeword for m-th subspace.
- Type:
np.ndarray
- adist(codes)
Given PQ-codes, compute Asymmetric Distances between the query (self.dtable) and the PQ-codes.
- Parameters:
codes (np.ndarray) – PQ codes with shape=(N, M) and dtype=pq.code_dtype where pq is a pq instance that creates the codes
- Returns:
Asymmetric Distances with shape=(N, ) and dtype=np.float32
- Return type:
np.ndarray
Optimized Product Quantization (OPQ)
- class nanopq.OPQ(M, Ks=256, metric='l2', verbose=True)
Pure python implementation of Optimized Product Quantization (OPQ) [Ge14].
OPQ is a simple extension of PQ. The best rotation matrix R is prepared using training vectors. Each input vector is rotated via R, then quantized into PQ-codes in the same manner as the original PQ.
[Ge14]Ge et al., “Optimized Product Quantization”, IEEE TPAMI 2014
- Parameters:
M (int) – The number of sub-spaces
Ks (int) – The number of codewords for each subspace (typically 256, so that each sub-vector is quantized into 8 bits = 1 byte = uint8)
verbose (bool) – Verbose flag
- R
Rotation matrix with the shape=(D, D) and dtype=np.float32
- Type:
np.ndarray
- property M
The number of sub-space
- Type:
int
- property Ks
The number of codewords for each subspace
- Type:
int
- property verbose
Verbose flag
- Type:
bool
- property code_dtype
dtype of PQ-code. Either np.uint{8, 16, 32}
- Type:
object
- property codewords
shape=(M, Ks, Ds) with dtype=np.float32. codewords[m][ks] means ks-th codeword (Ds-dim) for m-th subspace
- Type:
np.ndarray
- property Ds
The dim of each sub-vector, i.e., Ds=D/M
- Type:
int
- eigenvalue_allocation(vecs)
Given training vectors, this function learns a rotation matrix. The rotation matrix is computed so as to minimize the distortion bound of PQ, assuming a multivariate Gaussian distribution.
This function is a translation from the original MATLAB implementation to that of python http://kaiminghe.com/cvpr13/index.html
- Parameters:
vecs – (np.ndarray): Training vectors with shape=(N, D) and dtype=np.float32.
- Returns:
(np.ndarray) rotation matrix of shape=(D, D) with dtype=np.float32.
- Return type:
R
- fit(vecs, parametric_init=False, pq_iter=20, rotation_iter=10, seed=123, minit='points')
Given training vectors, this function alternatively trains (a) codewords and (b) a rotation matrix. The procedure of training codewords is same as
PQ.fit()
. The rotation matrix is computed so as to minimize the quantization error given codewords (Orthogonal Procrustes problem)This function is a translation from the original MATLAB implementation to that of python http://kaiminghe.com/cvpr13/index.html
If you find the error message is messy, please turn off the verbose flag, then you can see the reduction of error for each iteration clearly
- Parameters:
vecs (np.ndarray) – Training vectors with shape=(N, D) and dtype=np.float32.
parametric_init (bool) – Whether to initialize rotation using parametric assumption.
pq_iter (int) – The number of iteration for k-means
rotation_iter (int) – The number of iteration for learning rotation
seed (int) – The seed for random process
minit (str) – The method for initialization of centroids for k-means (either ‘random’, ‘++’, ‘points’, ‘matrix’)
- Returns:
self
- Return type:
object
- rotate(vecs)
Rotate input vector(s) by the rotation matrix.`
- Parameters:
vecs (np.ndarray) – Input vector(s) with dtype=np.float32. The shape can be a single vector (D, ) or several vectors (N, D)
- Returns:
Rotated vectors with the same shape and dtype to the input vecs.
- Return type:
np.ndarray
- encode(vecs)
Rotate input vectors by
OPQ.rotate()
, then encode them viaPQ.encode()
.- Parameters:
vecs (np.ndarray) – Input vectors with shape=(N, D) and dtype=np.float32.
- Returns:
PQ codes with shape=(N, M) and dtype=self.code_dtype
- Return type:
np.ndarray
- decode(codes)
Given PQ-codes, reconstruct original D-dimensional vectors via
PQ.decode()
, and applying an inverse-rotation.- Parameters:
codes (np.ndarray) – PQ-cdoes with shape=(N, M) and dtype=self.code_dtype. Each row is a PQ-code
- Returns:
Reconstructed vectors with shape=(N, D) and dtype=np.float32
- Return type:
np.ndarray
- dtable(query)
Compute a distance table for a query vector. The query is first rotated by
OPQ.rotate()
, then DistanceTable is computed byPQ.dtable()
.- Parameters:
query (np.ndarray) – Input vector with shape=(D, ) and dtype=np.float32
- Returns:
Distance table. which contains dtable with shape=(M, Ks) and dtype=np.float32
- Return type:
Convert Functions to/from Faiss
- nanopq.nanopq_to_faiss(pq_nanopq)
Convert a
nanopq.PQ
instance to faiss.IndexPQ. To use this function, faiss module needs to be installed.- Parameters:
pq_nanopq (nanopq.PQ) – An input PQ instance.
- Returns:
A converted PQ instance, with the same codewords to the input.
- Return type:
faiss.IndexPQ
- nanopq.faiss_to_nanopq(pq_faiss)
Convert a faiss.IndexPQ or a faiss.IndexPreTransform instance to
nanopq.OPQ
. To use this function, faiss module needs to be installed.- Parameters:
pq_faiss (Union[faiss.IndexPQ, faiss.IndexPreTransform]) – An input PQ or OPQ instance.
- Returns:
Union[nanopq.PQ, nanopq.OPQ]: A converted PQ or OPQ instance, with the same codewords to the input.
np.ndarray: Stored PQ codes in the input IndexPQ, with the shape=(N, M). This will be empty if codes are not stored
- Return type:
tuple