Commit graph

6 commits

Author SHA1 Message Date
Benjamin Trent
f478f849e3
Allow reading vectors where dim is in the file (#130138)
This allows the configuration to set the dimension to `-1` so that files
which store the `dim` in the file itself can be read.

Additionally, `num_queries` can be set to `0` to easily skip the search
phase.
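The commit does not say which file layout carries the `dim`. As a hedged illustration only, here is a sketch that assumes an `.fvecs`-style layout, where each record is a little-endian int32 dimension followed by that many float32 values. `FvecsSketch` and `read` are hypothetical names and are not the tool's actual API. A configured dimension of `-1` means "take whatever the file says". Any other value must match each record.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, assuming an .fvecs-style layout:
// [int32 dim][dim x float32] repeated, little-endian.
final class FvecsSketch {
    static List<float[]> read(ByteBuffer buf, int configuredDim, int maxVectors) {
        buf.order(ByteOrder.LITTLE_ENDIAN);
        List<float[]> vectors = new ArrayList<>();
        while (buf.remaining() >= Integer.BYTES && vectors.size() < maxVectors) {
            int fileDim = buf.getInt(); // the dimension stored in the file itself
            // -1 means "trust the file"; otherwise the config and file must agree
            if (configuredDim != -1 && configuredDim != fileDim) {
                throw new IllegalArgumentException(
                    "configured dim " + configuredDim + " != file dim " + fileDim);
            }
            float[] v = new float[fileDim];
            for (int i = 0; i < fileDim; i++) {
                v[i] = buf.getFloat(); // a truncated record throws BufferUnderflowException
            }
            vectors.add(v);
        }
        return vectors;
    }
}
```

Reading the dimension per record keeps the reader independent of the config. The trade-off is a few extra bytes per vector in the file.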
2025-06-27 08:38:21 +10:00
Benjamin Trent
2407358fe0
Adding profiling option to checkVec task (#129502)
Adds simple profiling to checkVec.

```
DO_PROFILING=true ./gradlew :qa:vector:checkVec --args=/path/to/config.json
```
2025-06-17 05:14:03 +10:00
John Wagster
be703a034f
Switch IVF Writer to ES Logger (#129224)
Updates the IVF writer to use the ES logger instead of InfoStream, and fixes native access warnings.
2025-06-11 17:36:47 -05:00
John Wagster
47d4b983af
IVF Hierarchical KMeans Flush & Merge (#128675)
Adds hierarchical k-means as a clustering algorithm to better partition the vector space when running IVF on flush and merge.
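To make the idea concrete, here is a minimal sketch of one common form of hierarchical k-means, a bisecting variant. It runs plain k-means with k = 2 and recursively splits every cluster that is still larger than a target size. This is an assumption for illustration. The branching factor, seeding, stopping rule, and sampling in the actual Elasticsearch IVF code are not described here and may differ. `HierarchicalKMeansSketch` and `partition` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the Elasticsearch implementation: bisecting-style
// hierarchical k-means that splits clusters until each has at most maxSize points.
final class HierarchicalKMeansSketch {

    static double sqDist(float[] a, float[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    // Plain Lloyd's k-means with deterministic, evenly spaced seeds; returns a cluster id per point.
    static int[] kMeans(List<float[]> pts, int k, int iters) {
        int n = pts.size(), dim = pts.get(0).length;
        float[][] centroids = new float[k][];
        for (int j = 0; j < k; j++) centroids[j] = pts.get(j * n / k).clone();
        int[] assign = new int[n];
        for (int it = 0; it < iters; it++) {
            // assignment step: nearest centroid by squared Euclidean distance
            for (int i = 0; i < n; i++) {
                int best = 0;
                double bestD = Double.MAX_VALUE;
                for (int j = 0; j < k; j++) {
                    double d = sqDist(pts.get(i), centroids[j]);
                    if (d < bestD) { bestD = d; best = j; }
                }
                assign[i] = best;
            }
            // update step: move each non-empty centroid to the mean of its points
            float[][] sums = new float[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assign[i]]++;
                for (int d = 0; d < dim; d++) sums[assign[i]][d] += pts.get(i)[d];
            }
            for (int j = 0; j < k; j++) {
                if (counts[j] == 0) continue;
                for (int d = 0; d < dim; d++) centroids[j][d] = sums[j][d] / counts[j];
            }
        }
        return assign;
    }

    // Returns the leaf partitions produced by recursive 2-way splitting.
    static List<List<float[]>> partition(List<float[]> pts, int maxSize) {
        List<List<float[]>> out = new ArrayList<>();
        split(pts, maxSize, out);
        return out;
    }

    private static void split(List<float[]> pts, int maxSize, List<List<float[]>> out) {
        if (pts.size() <= maxSize) { out.add(pts); return; }
        int[] assign = kMeans(pts, 2, 10);
        List<float[]> left = new ArrayList<>(), right = new ArrayList<>();
        for (int i = 0; i < pts.size(); i++) (assign[i] == 0 ? left : right).add(pts.get(i));
        // degenerate split (e.g. identical points): keep as one leaf instead of recursing forever
        if (left.isEmpty() || right.isEmpty()) { out.add(pts); return; }
        split(left, maxSize, out);
        split(right, maxSize, out);
    }
}
```

Splitting recursively keeps each k-means run small, which is the usual motivation for hierarchical clustering over one large flat k-means.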
2025-06-10 15:19:27 -05:00
Benjamin Trent
57ef140d2f
Correct index path validation (#129144)
All we care about is whether reindex is true or false. We shouldn't worry
about force merge, because if reindex is true, we will create the
directory, and if it's false, we won't.
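The rule above can be sketched as a small check. The class and method names are hypothetical. The sketch assumes that "we won't create the directory" means it must already exist when `reindex` is false. `forceMerge` is accepted but deliberately not consulted, which is the point of the fix.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the validation rule; names are illustrative only.
final class IndexPathValidation {
    static void validate(Path indexPath, boolean reindex, boolean forceMerge) {
        // forceMerge is intentionally ignored: it does not decide whether the directory must exist
        if (reindex) {
            return; // the index directory will be created while (re)indexing
        }
        if (!Files.isDirectory(indexPath)) {
            throw new IllegalArgumentException(
                "reindex is false, so the index directory must already exist: " + indexPath);
        }
    }
}
```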
2025-06-09 23:52:16 +10:00
Benjamin Trent
155c0da00a
Vector test tools (#128934)
This adds some testing tools for verifying vector recall and latency
directly, without having to spin up an entire ES node or run a Rally
track.
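For reference, recall is typically measured as recall@k: the fraction of the exact top-k neighbors that the approximate search also returned, averaged over queries. The sketch below shows that standard definition. The class and method names are hypothetical, and it does not claim to be how the tool computes or reports recall.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the standard recall@k definition.
final class RecallSketch {
    // Fraction of the exact top-k ids that also appear in the approximate results.
    // Assumes approxTopK contains distinct ids (duplicates would be double-counted).
    static double recallAtK(int[] exactTopK, int[] approxTopK) {
        Set<Integer> truth = new HashSet<>();
        for (int id : exactTopK) truth.add(id);
        int hits = 0;
        for (int id : approxTopK) if (truth.contains(id)) hits++;
        return (double) hits / exactTopK.length;
    }

    // Mean recall@k over all queries; exact[q] and approx[q] belong to the same query.
    static double meanRecall(int[][] exact, int[][] approx) {
        double total = 0;
        for (int q = 0; q < exact.length; q++) total += recallAtK(exact[q], approx[q]);
        return total / exact.length;
    }
}
```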

It's pretty bare-bones and takes inspiration from lucene-util, but I
wanted access to our own formats and tooling to make our lives easier.

Here is an example config file. It builds the initial index and runs
queries at `num_candidates` 50, then runs them again at `num_candidates`
100 (without reindexing, and reusing the cached nearest neighbors).

```
[{
  "doc_vectors" : "path",
  "query_vectors" : "path",
  "num_docs" : 10000,
  "num_queries" : 10,
  "index_type" : "hnsw",
  "num_candidates" : 50,
  "k" : 10,
  "hnsw_m" : 16,
  "hnsw_ef_construction" : 200,
  "index_threads" : 4,
  "reindex" : true,
  "force_merge" : false,
  "vector_space" : "maximum_inner_product",
  "dimensions" : 768
},
{
  "doc_vectors" : "path",
  "query_vectors" : "path",
  "num_docs" : 10000,
  "num_queries" : 10,
  "index_type" : "hnsw",
  "num_candidates" : 100,
  "k" : 10,
  "hnsw_m" : 16,
  "hnsw_ef_construction" : 200,
  "vector_space" : "maximum_inner_product",
  "dimensions" : 768
}
]
```
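The second run reuses "cached nearest neighbors", meaning the exact ground truth that recall is scored against. As a hedged sketch of what such ground truth could look like for `"vector_space" : "maximum_inner_product"`, the code below computes the exact top-k by brute-force inner product with a size-k min-heap and memoizes the result under a cache key. The key format, class name, and in-memory map are assumptions for illustration. The real tool may cache differently, for example on disk.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch: brute-force exact top-k by inner product, memoized per cache key.
final class ExactKnnSketch {
    private static final Map<String, int[][]> CACHE = new HashMap<>();

    static float dot(float[] a, float[] b) {
        float s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    // Exact top-k doc ids for one query, highest inner product first.
    static int[] topK(float[][] docs, float[] query, int k) {
        float[] scores = new float[docs.length];
        for (int d = 0; d < docs.length; d++) scores[d] = dot(docs[d], query);
        // min-heap on score: the weakest of the current best k sits at the head
        PriorityQueue<Integer> heap = new PriorityQueue<>((x, y) -> Float.compare(scores[x], scores[y]));
        for (int d = 0; d < docs.length; d++) {
            heap.offer(d);
            if (heap.size() > k) heap.poll(); // drop the lowest-scoring candidate
        }
        int[] result = new int[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) result[i] = heap.poll();
        return result;
    }

    // Computes ground truth once per key; later runs with the same key reuse it.
    static int[][] groundTruth(String cacheKey, float[][] docs, float[][] queries, int k) {
        return CACHE.computeIfAbsent(cacheKey, key -> {
            int[][] out = new int[queries.length][];
            for (int q = 0; q < queries.length; q++) out[q] = topK(docs, queries[q], k);
            return out;
        });
    }
}
```

Because the ground truth depends only on the documents, queries, `k`, and the similarity, but not on `num_candidates`, changing `num_candidates` between runs can safely reuse it.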

To execute:

```
./gradlew :qa:vector:checkVec --args="/Path/to/knn_tester_config.json"
```

Running `./gradlew :qa:vector:checkVecHelp` gives some guidance on how
to use it, and also shows a way to run it directly via java
(useful to bypass the gradlew guff).
2025-06-07 02:07:32 +10:00