* Remove `es-test-dir` book-scoped variable
* Remove `plugins-examples-dir` book-scoped variable
* Remove `:dependencies-dir:` and `:xes-repo-dir:` book-scoped variables
- In `index.asciidoc`, two variables (`:dependencies-dir:` and `:xes-repo-dir:`) were removed.
- In `sql/index.asciidoc`, the `:sql-tests:` path was updated to a fuller path.
- In `esql/index.asciidoc`, the `:esql-tests:` path was updated in the same way.
* Replace `es-repo-dir` with `es-ref-dir`
* Move `:include-xpack: true` to the few files that use it, and remove it from index.asciidoc
Certain NLP models, such as multilingual-e5-large, require a prefix
string to be applied to the input text. For asymmetric tasks such as
information retrieval, the prefix can differ between ingesting the
data and searching it. For example, a text embedding model can
have one prefix applied when the model is evaluated as part of a
knn search and a different prefix when ingesting documents.
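A minimal sketch of how such prefixes might be configured when creating the model; the `prefix_strings` object with `ingest` and `search` keys, and the surrounding request shape, are assumptions for illustration rather than the authoritative API:

```
PUT _ml/trained_models/multilingual-e5-large
{
  "input": { "field_names": ["text_field"] },
  "inference_config": { "text_embedding": {} },
  "prefix_strings": {
    "ingest": "passage: ",
    "search": "query: "
  }
}
```

With a configuration like this, documents would be embedded as `passage: <text>` at ingest time, while knn queries would be embedded as `query: <text>`.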
* Added platform architecture field to TrainedModelMetadata and users of TrainedModelMetadata
* Added TransportVersions guarding for TrainedModelMetadata
* Prevent platform-specific models from being deployed on the wrong architecture
* Added logic to only verify node architectures for models which are platform specific
* Handle null platform architecture
* Added logging for the detection of heterogeneous platform architectures among ML nodes and refactoring to support this
* Added platform architecture field to TrainedModelConfig
* Stop platform-specific models when a rebalance occurs and the cluster has a heterogeneous architecture among ML nodes
* Added logic to TransportPutTrainedModelAction to return a warning response header when the model is platform-specific and cannot be deployed on the cluster at that time due to heterogeneous architectures among ML nodes
* Added MlPlatformArchitecturesUtilTests
* Updated Create Trained Models API docs to describe the new `platform_architecture` optional field (see the sketch after this list).
* Updated/incremented InferenceIndexConstants
* Added special override to make models with linux-x86_64 in the model ID to be platform specific
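A minimal sketch of how the new field might appear in a create trained model request; the model ID and the surrounding fields here are illustrative assumptions:

```
PUT _ml/trained_models/my-model-linux-x86_64
{
  "input": { "field_names": ["text_field"] },
  "inference_config": { "text_embedding": {} },
  "platform_architecture": "linux-x86_64"
}
```

Under the special override above, the `linux-x86_64` substring in the model ID alone would also mark the model as platform specific.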
Many multilingual and newer models use a tokenization scheme similar to
SentencePiece. This PR adds support for one of those tokenization
schemes, XLMRoBERTa.
The main changes are:
- Support for `xlm_roberta` tokenization configuration
- Adding `scores` to the stored vocabulary document, requiring that scores be the same size as the vocabulary
- Adding a new flat text file to resources that is the SPM char normalizer.
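A sketch of what the corresponding tokenization configuration might look like; the `max_sequence_length` option is an assumption carried over from the other tokenizer types:

```
"tokenization": {
  "xlm_roberta": {
    "max_sequence_length": 512
  }
}
```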
Introduced in: #88439
* [ML] add text_similarity nlp task documentation
* Apply suggestions from code review
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
* Update docs/reference/ml/trained-models/apis/infer-trained-model.asciidoc
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
* Apply suggestions from code review
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
* Update docs/reference/ml/ml-shared.asciidoc
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
This commit adds initial windowing support for text_classification tasks.
Specifically, a user can now provide a non-negative span indicating the tokenization windowing span to use when creating
sub-sequences.
The default value of span: -1 indicates that no windowing should take place.
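A minimal sketch of how the option might be supplied inside the task's tokenization settings; pairing span with "truncate": "none" is an assumption about how windowing is enabled:

```
"tokenization": {
  "bert": {
    "truncate": "none",
    "span": 16
  }
}
```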
This commit adds support for MPNet based models.
MPNet models differ from BERT style models in that:
- Special tokens are different
- Input to the model doesn't require token positions.
To configure an MPNet tokenizer for your pytorch MPNet based model:
```
"tokenization": {
  "mpnet": {...}
}
```
The options provided to `mpnet` are the same as the previously supported `bert` configuration.
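For instance, a configuration mirroring those options might look like the following; the specific options shown, `do_lower_case` and `max_sequence_length`, are assumptions carried over from `bert`:

```
"tokenization": {
  "mpnet": {
    "do_lower_case": false,
    "max_sequence_length": 512
  }
}
```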