API Reference¶
Product Quantization (PQ)¶
-
class
nanopq.
PQ
(M, Ks=256, metric='l2', verbose=True)¶ Pure python implementation of Product Quantization (PQ) [Jegou11].
For the indexing phase of database vectors, a D-dim input vector is divided into M D/M-dim sub-vectors. Each sub-vector is quantized into a small integer via Ks codewords. For the querying phase, given a new D-dim query vector, the distance beween the query and the database PQ-codes are efficiently approximated via Asymmetric Distance.
All vectors must be np.ndarray with np.float32
[Jegou11] - Jegou et al., “Product Quantization for Nearest Neighbor Search”, IEEE TPAMI 2011
Parameters: - M (int) – The number of sub-space
- Ks (int) – The number of codewords for each subspace (typically 256, so that each sub-vector is quantized into 8 bits = 1 byte = uint8)
- metric (str) – Type of metric used among vectors (either ‘l2’ or ‘dot’) Note that even for ‘dot’, kmeans and encoding are performed in the Euclidean space.
- verbose (bool) – Verbose flag
-
M
¶ The number of sub-space
Type: int
-
Ks
¶ The number of codewords for each subspace
Type: int
-
metric
¶ Type of metric used among vectors
Type: str
-
verbose
¶ Verbose flag
Type: bool
-
code_dtype
¶ dtype of PQ-code. Either np.uint{8, 16, 32}
Type: object
-
codewords
¶ shape=(M, Ks, Ds) with dtype=np.float32. codewords[m][ks] means ks-th codeword (Ds-dim) for m-th subspace
Type: np.ndarray
-
Ds
¶ The dim of each sub-vector, i.e., Ds=D/M
Type: int
-
fit
(vecs, iter=20, seed=123, minit='points')¶ Given training vectors, run k-means for each sub-space and create codewords for each sub-space.
This function should be run once first of all.
Parameters: - vecs (np.ndarray) – Training vectors with shape=(N, D) and dtype=np.float32.
- iter (int) – The number of iteration for k-means
- seed (int) – The seed for random process
- minit (str) – The method for initialization of centroids for k-means (either ‘random’, ‘++’, ‘points’, ‘matrix’)
Returns: self
Return type: object
-
encode
(vecs)¶ Encode input vectors into PQ-codes.
Parameters: vecs (np.ndarray) – Input vectors with shape=(N, D) and dtype=np.float32. Returns: PQ codes with shape=(N, M) and dtype=self.code_dtype Return type: np.ndarray
-
decode
(codes)¶ Given PQ-codes, reconstruct original D-dimensional vectors approximately by fetching the codewords.
Parameters: codes (np.ndarray) – PQ-cdoes with shape=(N, M) and dtype=self.code_dtype. Each row is a PQ-code Returns: Reconstructed vectors with shape=(N, D) and dtype=np.float32 Return type: np.ndarray
-
dtable
(query)¶ Compute a distance table for a query vector. The distances are computed by comparing each sub-vector of the query to the codewords for each sub-subspace. dtable[m][ks] contains the squared Euclidean distance between the m-th sub-vector of the query and the ks-th codeword for the m-th sub-space (self.codewords[m][ks]).
Parameters: query (np.ndarray) – Input vector with shape=(D, ) and dtype=np.float32 Returns: Distance table. which contains dtable with shape=(M, Ks) and dtype=np.float32 Return type: nanopq.DistanceTable
Distance Table¶
-
class
nanopq.
DistanceTable
(dtable, metric='l2')¶ Distance table from query to codewords. Given a query vector, a PQ/OPQ instance compute this DistanceTable class using
PQ.dtable()
orOPQ.dtable()
. The Asymmetric Distance from query to each database codes can be computed byDistanceTable.adist()
.Parameters: - dtable (np.ndarray) – Distance table with shape=(M, Ks) and dtype=np.float32
computed by
PQ.dtable()
orOPQ.dtable()
- metric (str) – metric type to calculate distance
-
dtable
¶ Distance table with shape=(M, Ks) and dtype=np.float32. Note that dtable[m][ks] contains the squared Euclidean distance between (1) m-th sub-vector of query and (2) ks-th codeword for m-th subspace.
Type: np.ndarray
-
adist
(codes)¶ Given PQ-codes, compute Asymmetric Distances between the query (self.dtable) and the PQ-codes.
Parameters: codes (np.ndarray) – PQ codes with shape=(N, M) and dtype=pq.code_dtype where pq is a pq instance that creates the codes Returns: Asymmetric Distances with shape=(N, ) and dtype=np.float32 Return type: np.ndarray
- dtable (np.ndarray) – Distance table with shape=(M, Ks) and dtype=np.float32
computed by
Optimized Product Quantization (OPQ)¶
-
class
nanopq.
OPQ
(M, Ks=256, metric='l2', verbose=True)¶ Pure python implementation of Optimized Product Quantization (OPQ) [Ge14].
OPQ is a simple extension of PQ. The best rotation matrix R is prepared using training vectors. Each input vector is rotated via R, then quantized into PQ-codes in the same manner as the original PQ.
[Ge14] - Ge et al., “Optimized Product Quantization”, IEEE TPAMI 2014
Parameters: - M (int) – The number of sub-spaces
- Ks (int) – The number of codewords for each subspace (typically 256, so that each sub-vector is quantized into 8 bits = 1 byte = uint8)
- verbose (bool) – Verbose flag
-
R
¶ Rotation matrix with the shape=(D, D) and dtype=np.float32
Type: np.ndarray
-
M
¶ The number of sub-space
Type: int
-
Ks
¶ The number of codewords for each subspace
Type: int
-
verbose
¶ Verbose flag
Type: bool
-
code_dtype
¶ dtype of PQ-code. Either np.uint{8, 16, 32}
Type: object
-
codewords
¶ shape=(M, Ks, Ds) with dtype=np.float32. codewords[m][ks] means ks-th codeword (Ds-dim) for m-th subspace
Type: np.ndarray
-
Ds
¶ The dim of each sub-vector, i.e., Ds=D/M
Type: int
-
eigenvalue_allocation
(vecs)¶ Given training vectors, this function learns a rotation matrix. The rotation matrix is computed so as to minimize the distortion bound of PQ, assuming a multivariate Gaussian distribution.
This function is a translation from the original MATLAB implementation to that of python http://kaiminghe.com/cvpr13/index.html
Parameters: vecs – (np.ndarray): Training vectors with shape=(N, D) and dtype=np.float32. Returns: (np.ndarray) rotation matrix of shape=(D, D) with dtype=np.float32. Return type: R
-
fit
(vecs, parametric_init=False, pq_iter=20, rotation_iter=10, seed=123, minit='points')¶ Given training vectors, this function alternatively trains (a) codewords and (b) a rotation matrix. The procedure of training codewords is same as
PQ.fit()
. The rotation matrix is computed so as to minimize the quantization error given codewords (Orthogonal Procrustes problem)This function is a translation from the original MATLAB implementation to that of python http://kaiminghe.com/cvpr13/index.html
If you find the error message is messy, please turn off the verbose flag, then you can see the reduction of error for each iteration clearly
Parameters: - vecs (np.ndarray) – Training vectors with shape=(N, D) and dtype=np.float32.
- parametric_init (bool) – Whether to initialize rotation using parametric assumption.
- pq_iter (int) – The number of iteration for k-means
- rotation_iter (int) – The number of iteration for learning rotation
- seed (int) – The seed for random process
- minit (str) – The method for initialization of centroids for k-means (either ‘random’, ‘++’, ‘points’, ‘matrix’)
Returns: self
Return type: object
-
rotate
(vecs)¶ Rotate input vector(s) by the rotation matrix.`
Parameters: vecs (np.ndarray) – Input vector(s) with dtype=np.float32. The shape can be a single vector (D, ) or several vectors (N, D) Returns: Rotated vectors with the same shape and dtype to the input vecs. Return type: np.ndarray
-
encode
(vecs)¶ Rotate input vectors by
OPQ.rotate()
, then encode them viaPQ.encode()
.Parameters: vecs (np.ndarray) – Input vectors with shape=(N, D) and dtype=np.float32. Returns: PQ codes with shape=(N, M) and dtype=self.code_dtype Return type: np.ndarray
-
decode
(codes)¶ Given PQ-codes, reconstruct original D-dimensional vectors via
PQ.decode()
, and applying an inverse-rotation.Parameters: codes (np.ndarray) – PQ-cdoes with shape=(N, M) and dtype=self.code_dtype. Each row is a PQ-code Returns: Reconstructed vectors with shape=(N, D) and dtype=np.float32 Return type: np.ndarray
-
dtable
(query)¶ Compute a distance table for a query vector. The query is first rotated by
OPQ.rotate()
, then DistanceTable is computed byPQ.dtable()
.Parameters: query (np.ndarray) – Input vector with shape=(D, ) and dtype=np.float32 Returns: Distance table. which contains dtable with shape=(M, Ks) and dtype=np.float32 Return type: nanopq.DistanceTable
Convert Functions to/from Faiss¶
-
nanopq.
nanopq_to_faiss
(pq_nanopq)¶ Convert a
nanopq.PQ
instance to faiss.IndexPQ. To use this function, faiss module needs to be installed.Parameters: pq_nanopq (nanopq.PQ) – An input PQ instance. Returns: A converted PQ instance, with the same codewords to the input. Return type: faiss.IndexPQ
-
nanopq.
faiss_to_nanopq
(pq_faiss)¶ Convert a faiss.IndexPQ or a faiss.IndexPreTransform instance to
nanopq.OPQ
. To use this function, faiss module needs to be installed.Parameters: pq_faiss (Union[faiss.IndexPQ, faiss.IndexPreTransform]) – An input PQ or OPQ instance. Returns: - Union[nanopq.PQ, nanopq.OPQ]: A converted PQ or OPQ instance, with the same codewords to the input.
- np.ndarray: Stored PQ codes in the input IndexPQ, with the shape=(N, M). This will be empty if codes are not stored
Return type: tuple