elasticsearch


Author	SHA1	Message	Date
Max Hniebergall	3a4113801c	[NLP] Support the different mask tokens used by NLP models for Fill Mask (#97453) Add mask_token field to fill_mask of _ml/trained_models. This change will enable users and Kibana to get the particular mask tokens needed for deployed models by adding a mask_token field to the GET _ml/trained_models API, as an enhancement to support kibana#159577.	2023-07-11 14:42:44 -04:00
István Zoltán Szabó	8d5b803bff	[DOCS] Adds API docs for bert_ja text embedding tokenizer option (#96873)	2023-06-26 11:36:08 +02:00
Benjamin Trent	14ca8fee20	[ML] add support for xlm_roberta tokenized models (#94089) Many multi-lingual and newer models use a tokenization scheme similar to sentence-piece. This PR adds support for one of those tokenization schemes, XLMRoBERTa. The main changes are: - Support for xlm_roberta tokenization configuration - Adding `scores` to the stored vocabulary document, requiring that scores be the same size as the vocabulary - Adding a new flat text file to resources that is the spm char normalizer.	2023-06-13 08:40:55 -04:00
David Kyle	6de8469a51	[ML] Include model definition install status for Pytorch models (#95271) Adds a new include flag definition_status to the GET trained models API. When present, the trained model configuration returned in the response will have the new boolean field fully_defined if the full model definition exists.	2023-04-17 18:12:26 +01:00
David Kyle	7d90c519ef	[ML] Add embedding_size to text embedding config (#95176)	2023-04-17 11:49:35 +01:00
Benjamin Trent	9ce59bb7a9	[ML] add text_similarity nlp task documentation (#88994) Introduced in: #88439 * [ML] add text_similarity nlp task documentation * Apply suggestions from code review * Update docs/reference/ml/trained-models/apis/infer-trained-model.asciidoc * Update docs/reference/ml/ml-shared.asciidoc Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>	2022-08-02 12:17:14 -04:00
Benjamin Trent	258d2b71e2	[ML] add roberta/bart docs (#85001) Adds a roberta section to the NLP tokenization documentation.	2022-03-17 12:14:57 -04:00
Benjamin Trent	45deac4c96	[ML] add windowing support for text_classification (#83989) This commit adds initial windowing support for text_classification tasks. Specifically, a user can now specify a non-negative span, which sets the tokenization windowing span used when creating sub-sequences. The default value, span: -1, indicates that no windowing should take place.	2022-03-01 08:29:12 -05:00
Benjamin Trent	9dc8aea1cb	[ML] adds new mpnet tokenization for nlp models (#82234) This commit adds support for MPNet based models. MPNet models differ from BERT style models in that: - Special tokens are different - Input to the model doesn't require token positions. To configure an MPNet tokenizer for your pytorch MPNet based model: ``` "tokenization": { "mpnet": {...} } ``` The options provided to `mpnet` are the same as the previously supported `bert` configuration.	2022-01-05 12:56:47 -05:00
Lisa Cawley	429bdd9afc	[DOCS] Move trained model APIs out of dataframe analytics (#81315)	2021-12-03 09:21:09 -08:00

Renamed from docs/reference/ml/df-analytics/apis/get-trained-models.asciidoc

10 commits