Hi. I've been trying out bayard and it's great so far. One thing I've noticed, though, is that faceted search doesn't seem to be working. I'm using this docker image. It is tagged as v0.4.0 and was pushed 2 months ago. According to CHANGES.md, faceted search was implemented in v0.3.0, so I would assume it's already available in the docker image.
Perhaps my understanding of faceted search is wrong. Here's a minimal example:
data.jsonl (inspired by the tantivy examples):
{"_id": "1", "name": "Cat", "category": ["/Felidae/Felinae/Felis"]}
{"_id": "2", "name": "Canada lynx", "category": ["/Felidae/Felinae/Lynx"]}
{"_id": "3", "name": "Cheetah", "category": ["/Felidae/Felinae/Acinonyx"]}
{"_id": "4", "name": "Tiger", "category": ["/Felidae/Pantherinae/Panthera"]}
{"_id": "5", "name": "Lion", "category": ["/Felidae/Pantherinae/Panthera"]}
{"_id": "6", "name": "Jaguar", "category": ["/Felidae/Pantherinae/Panthera"]}
{"_id": "7", "name": "Sunda clouded leopard", "category": ["/Felidae/Pantherinae/Neofelis"]}
{"_id": "8", "name": "Fossa", "category": ["/Eupleridae/Cryptoprocta"]}
schema.json:
[
  {
    "name": "_id",
    "type": "text",
    "options": {
      "indexing": {
        "record": "basic",
        "tokenizer": "raw"
      },
      "stored": true
    }
  },
  {
    "name": "name",
    "type": "text",
    "options": {
      "indexing": {
        "record": "position",
        "tokenizer": "en_stem"
      },
      "stored": false
    }
  },
  {
    "name": "category",
    "type": "hierarchical_facet"
  }
]
Then, through the web api, I request the following:
curl -X GET 'http://localhost:8000/index/search?query=cat&from=0&limit=10&facet_field=category&facet_prefix=/Felidae/Felinae'
which results in
{
  "count": 1,
  "docs": [
    {
      "fields": {
        "_id": [
          "1"
        ],
        "category": [
          "/Felidae/Felinae/Felis"
        ]
      },
      "score": 2.016771
    }
  ],
  "facet": {
    "category": {
      "/Felidae/Felinae/Felis": 1
    }
  }
}
This is what I expect because I'm searching in the correct category. However, searching in a different category will yield the same document:
curl -X GET 'http://localhost:8000/index/search?query=cat&from=0&limit=10&facet_field=category&facet_prefix=/Eupleridae'
{
  "count": 1,
  "docs": [
    {
      "fields": {
        "_id": [
          "1"
        ],
        "category": [
          "/Felidae/Felinae/Felis"
        ]
      },
      "score": 2.016771
    }
  ],
  "facet": {
    "category": {}
  }
}
I would expect 0 documents to be returned, since no element has the name "cat" in the category "/Eupleridae".
I also noticed that "facet" is filled differently but I'm not sure how to interpret that.
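To make the two readings of facet_prefix concrete, here is a small Python sketch over the same eight documents. It is only a model of the behaviour, not bayard's or tantivy's actual code: the function names (search_counting, search_filtering, matches, under) are made up for illustration, and the word match on "name" is a crude stand-in for the real en_stem query. Under reading A the prefix only decides which facet counts are reported; under reading B it also restricts which documents are returned.

```python
import json
from collections import Counter

# The eight documents from data.jsonl above.
DATA_JSONL = """\
{"_id": "1", "name": "Cat", "category": ["/Felidae/Felinae/Felis"]}
{"_id": "2", "name": "Canada lynx", "category": ["/Felidae/Felinae/Lynx"]}
{"_id": "3", "name": "Cheetah", "category": ["/Felidae/Felinae/Acinonyx"]}
{"_id": "4", "name": "Tiger", "category": ["/Felidae/Pantherinae/Panthera"]}
{"_id": "5", "name": "Lion", "category": ["/Felidae/Pantherinae/Panthera"]}
{"_id": "6", "name": "Jaguar", "category": ["/Felidae/Pantherinae/Panthera"]}
{"_id": "7", "name": "Sunda clouded leopard", "category": ["/Felidae/Pantherinae/Neofelis"]}
{"_id": "8", "name": "Fossa", "category": ["/Eupleridae/Cryptoprocta"]}
"""
DOCS = [json.loads(line) for line in DATA_JSONL.splitlines()]


def matches(doc, query):
    # Hypothetical stand-in for the full-text query: whole-word,
    # case-insensitive match on the "name" field (no stemming).
    return query.lower() in doc["name"].lower().split()


def under(path, prefix):
    # True if the facet path lies strictly below the prefix.
    return path.startswith(prefix.rstrip("/") + "/")


def search_counting(query, prefix):
    """Reading A: the prefix only limits which facet counts are reported."""
    hits = [d for d in DOCS if matches(d, query)]
    # Counts are grouped by the direct children of the prefix.
    depth = len(prefix.strip("/").split("/")) + 1
    counts = Counter()
    for doc in hits:
        for path in doc["category"]:
            if under(path, prefix):
                child = "/" + "/".join(path.strip("/").split("/")[:depth])
                counts[child] += 1
    return [d["_id"] for d in hits], dict(counts)


def search_filtering(query, prefix):
    """Reading B: the prefix also filters the returned documents."""
    return [
        d["_id"]
        for d in DOCS
        if matches(d, query) and any(under(p, prefix) for p in d["category"])
    ]


print(search_counting("cat", "/Felidae/Felinae"))
print(search_counting("cat", "/Eupleridae"))
print(search_filtering("cat", "/Eupleridae"))
```

In this model, reading A gives document 1 with an empty facet map for "/Eupleridae", which is the same shape as the second response above, while reading B gives no documents, which is what I expected. So the output I'm seeing looks consistent with facet_prefix being used for counting only, but I can't tell from the outside whether that is intended.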
This is just a minimal example. I've also tried it with more data and queried for terms that do exist in a category, but elements outside that category were still returned. Am I misunderstanding faceted search, using bayard incorrectly, relying on an unreleased feature, or is this indeed a bug?