Introduced in: #88439
* [ML] add text_similarity nlp task documentation
* Apply suggestions from code review
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
* Update docs/reference/ml/trained-models/apis/infer-trained-model.asciidoc
* Update docs/reference/ml/ml-shared.asciidoc
This commit adds initial windowing support for text_classification tasks.
Specifically, a user can now set a non-negative `span` that controls tokenization windowing when sub-sequences are
created.
The default value, `span: -1`, indicates that no windowing should take place.
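The windowing behavior can be sketched roughly as follows. This is an illustration only, not the code added by this commit: it assumes `span` is the number of tokens shared between consecutive sub-sequences, and the function name `window_tokens` and the `max_len` parameter are hypothetical.

```python
def window_tokens(tokens, max_len, span=-1):
    """Split tokens into sub-sequences of at most max_len tokens.

    Assumption: `span` is the number of tokens that consecutive
    sub-sequences share. span == -1 means no windowing at all.
    """
    if span == -1:
        # Default: no windowing, the input stays a single sequence.
        return [list(tokens)]
    if span < 0 or span >= max_len:
        raise ValueError("span must be -1 or in the range [0, max_len)")

    stride = max_len - span  # how far each new window advances
    windows = []
    start = 0
    while True:
        windows.append(list(tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the input
        start += stride
    return windows


if __name__ == "__main__":
    for window in window_tokens(list(range(10)), max_len=4, span=2):
        print(window)
```

With a larger `span`, neighboring windows share more context at the cost of producing more sub-sequences for the same input.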
This commit adds support for MPNet-based models.
MPNet models differ from BERT-style models in that:
- Special tokens are different
- Input to the model doesn't require token positions.
To configure an MPNet tokenizer for your PyTorch MPNet-based model:
```
"tokenization": {
  "mpnet": {...}
}
```
The options provided to `mpnet` are the same as the previously supported `bert` configuration.
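Because the option set is shared, an existing `bert` tokenization block can in principle be reused by changing only the top-level key. A minimal sketch of that idea, where the helper name `to_mpnet_tokenization` is hypothetical and the option values shown (`do_lower_case`, `max_sequence_length`) are illustrative assumptions rather than values taken from this commit:

```python
import json


def to_mpnet_tokenization(bert_tokenization):
    """Return a tokenization block keyed by "mpnet" with the same options.

    Hypothetical helper: it only renames the top-level key and copies
    the options, leaving the original dict untouched.
    """
    options = dict(bert_tokenization["bert"])  # shallow copy of the options
    return {"mpnet": options}


if __name__ == "__main__":
    # Illustrative option values, assumed for this example only.
    bert_block = {"bert": {"do_lower_case": True, "max_sequence_length": 512}}
    body = {"tokenization": to_mpnet_tokenization(bert_block)}
    print(json.dumps(body, indent=2))
```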