87 commits

Author SHA1 Message Date
Alexander Reelsen
c7ac9e7073
[DOCS] http -> https, remove outdated plugin docs (#60380)
Plugin discovery documentation contained information about installing
Elasticsearch 2.0 and installing an Oracle JDK, both of which are no
longer valid.

Because the instructions used cleartext HTTP to install
packages, this commit also replaces HTTP links with HTTPS where possible.

In addition a few community links have been removed, as they do not seem
to exist anymore.
2020-07-31 15:58:38 -04:00
James Rodewig
441c3a21b1
[DOCS] Update my-index examples (#60132)
Changes the following example index names to `my-index-000001` for consistency:

* `my-index`
* `my_index`
* `myindex`
2020-07-27 14:46:39 -04:00
James Rodewig
80b674fb25
[DOCS] Reformat snippets to use two-space indents (#59973) 2020-07-21 12:24:26 -04:00
Shahzad
24e5da7851
Update regex file for es user agent node processor (#59697) 2020-07-17 16:54:34 +02:00
James Rodewig
2be9db01c8
[DOCS] Replace datatype with data type (#58972) 2020-07-07 13:52:10 -04:00
David Kyle
bf245e4c07
Make Inference processor field_map and inference_config optional (#58868)
Relaxes the requirement that the inference ingest processor must have a
field_map and inference_config defined even if they are empty.
2020-07-03 08:36:57 +01:00
István Zoltán Szabó
d0042fb791
[DOCS] Updates results_field description in the inference processor docs (#58554) 2020-06-29 11:28:17 +02:00
Jake Landis
5088ab151a
Update hh to HH in date processor example (#58089) (#58142)
Co-authored-by: Leaf-Lin <39002973+Leaf-Lin@users.noreply.github.com>
2020-06-15 17:03:42 -05:00
bellengao
efc4c9a210
Add ignore_empty_value parameter in set ingest processor (#57030) 2020-06-15 07:26:57 -05:00
Jake Landis
f5910664b7
Ensure Joni warnings are logged at debug (#57302)
When Joni, the regex engine that powers grok, emits a warning, it
does so by default to System.err. System.err logs are all bucketed
together in the server log at WARN level. When Joni emits a warning,
it can be extremely verbose, logging a message for each execution
against that pattern. For ingest node, that means for every document
that is run through Grok. Fortunately, Joni provides a callback
hook to push these warnings to a custom location.

This commit implements Joni's callback hook to push the Joni warnings
to the Elasticsearch server logger (logger.org.elasticsearch.ingest.common.GrokProcessor)
at debug level. Generally these warnings indicate a possible issue with
the regular expression, and upon creation the Grok processor will
do a "test run" of the expression and log the result (if any) at WARN
level. This WARN level log should only occur on pipeline creation, which
happens much less frequently than once per document.

Additionally, the documentation is updated with instructions for how
to set the logger to debug level.
2020-06-09 13:33:27 -05:00
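The logger named in the commit above can be raised to debug level through the cluster settings API. The following is a minimal sketch that only assembles such a request. The helper name `grok_debug_logging_request` is hypothetical, and using a `transient` setting is an assumption made for illustration.

```python
import json

# Logger name taken from the commit message above.
GROK_LOGGER = "logger.org.elasticsearch.ingest.common.GrokProcessor"


def grok_debug_logging_request(level="debug"):
    """Build (method, path, body) for a cluster settings update.

    Hypothetical helper: it only assembles the request and sends nothing.
    Using a transient setting is an assumption made for this sketch.
    """
    body = {"transient": {GROK_LOGGER: level}}
    return "PUT", "/_cluster/settings", json.dumps(body)


method, path, body = grok_debug_logging_request()
print(method, path)
print(body)
```

The printed method, path, and body can be pasted into any HTTP client pointed at the cluster.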
Lisa Cawley
8b9293b3bf
[DOCS] Replace docdir attribute with es-repo-dir (#57489) 2020-06-01 15:55:05 -07:00
Adam Locke
d77388f919
[DOCS] Add links to flattened datatype (#56794)
* Changes for #52239.

* Incorporating review feedback from Julie T. Also single-sourcing nested options in the Mapping page and referencing them in the Nested page.

* Moving tip after the introduction and clarifying limits.

* Update docs/reference/mapping.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

* Update docs/reference/mapping/types/nested.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2020-05-19 13:40:26 -04:00
István Zoltán Szabó
ca2f98382f
[DOCS] Changes feature importance links to point to the new page (#55531)
* [DOCS] Changes feature importance links to point to the new page.

* [DOCS] Fixes line breaks.
2020-04-28 09:02:14 +02:00
Benjamin Trent
c1afda4a23
[ML] adding prediction_field_type to inference config (#55128)
Data frame analytics dynamically determines the classification field type. This field type then dictates the encoded JSON that is written to Elasticsearch. 

Inference needs to know about this field type so that it may provide the EXACT SAME predicted values as analytics. 

This adds a new field, `prediction_field_type`, which indicates the desired type. Options are: `string` (DEFAULT), `number`, `boolean` (where close_to(1.0) == true, false otherwise).

Analytics provides the default `prediction_field_type` when the model is created from the process.
2020-04-15 08:32:48 -04:00
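The three `prediction_field_type` options described in the commit above can be pictured with a small sketch. This is not the Elasticsearch implementation. The function name `coerce_prediction` is hypothetical, and the tolerance behind `close_to(1.0)` is not stated in the commit message, so the defaults of `math.isclose` are an assumption.

```python
import math


def coerce_prediction(value, prediction_field_type="string"):
    """Coerce a raw prediction to the configured output type.

    Illustrative only: the tolerance behind close_to(1.0) is not given in
    the commit message, so the math.isclose defaults are an assumption.
    """
    if prediction_field_type == "string":
        return str(value)
    if prediction_field_type == "number":
        return float(value)
    if prediction_field_type == "boolean":
        # True only when the value is (approximately) 1.0, False otherwise.
        return math.isclose(float(value), 1.0)
    raise ValueError(f"unsupported prediction_field_type: {prediction_field_type}")


print(coerce_prediction(1.0, "boolean"))
print(coerce_prediction("2", "number"))
```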
István Zoltán Szabó
a0662399c7
[DOCS] Makes PUT inference API docs collapsible (#54653)
Co-authored-by: lcawl <lcawley@elastic.co>
2020-04-03 09:45:42 +02:00
Benjamin Trent
4e1ff31c3c
[ML] add new inference_config field to trained model config (#54421)
A new field called `inference_config` is now added to the trained model config object. This new field allows for default inference settings from analytics or some external model builder. 

The inference processor can still override whatever is set as the default in the trained model config.
2020-04-02 10:34:17 -04:00
lcawl
2641a39fd5
[DOCS] Fixes shared attribute for feature importance 2020-04-01 14:46:38 -07:00
István Zoltán Szabó
a65e95e093
[DOCS] Adds feature importance mapping subsection to inference processor docs (#54190) 2020-03-26 09:22:12 +01:00
bellengao
8ffe5d1f94
Support array for all string ingest processors 2020-03-17 15:22:30 -05:00
Benjamin Trent
970f726c1f
[ML] renaming inference processor field field_mappings to new name field_map (#53433)
This renames the `inference` processor configuration field `field_mappings` to `field_map`. 

`field_mappings` is now deprecated.
2020-03-12 12:49:25 -04:00
Benjamin Trent
4e1f029b04
[ML][Inference] adds new default_field_map field to trained models (#53294)
Adds a new `default_field_map` field to trained model config objects. 

This allows the model creator to supply a field map if it knows that there should be some map for inference to work directly against the training data.

The use case internally is having analytics jobs supply a field mapping for multi-field fields. This allows us to use the model "out of the box" on data where we trained on `foo.keyword` but the `_source` only references `foo`.
2020-03-11 12:23:56 -04:00
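The `foo` versus `foo.keyword` case from the commit above can be sketched as a simple renaming step. The helper `apply_field_map` is hypothetical, and the direction of the map (document field name to the name the model was trained on) is an assumption of this sketch rather than something the commit message spells out.

```python
def apply_field_map(source, field_map):
    """Return a copy of `source` with model-facing field names added.

    `field_map` maps a document field name to the name the model was
    trained on. The direction of the mapping is an assumption here.
    """
    model_input = dict(source)
    for doc_field, model_field in field_map.items():
        if doc_field in source:
            model_input[model_field] = source[doc_field]
    return model_input


# The model was trained on `foo.keyword`, but `_source` only has `foo`.
print(apply_field_map({"foo": "bar"}, {"foo": "foo.keyword"}))
```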
David Pilato
e51b8a51aa
[DOCS] Fix typo in CSV processor docs (#52649)
Corrects an example array in a snippet of the CSV processor docs.
2020-02-25 08:47:58 -05:00
Benjamin Trent
20f54272f0
[ML] Adds feature importance option to inference processor (#52218)
This adds machine learning model feature importance calculations to the inference processor. 

The new flag in the configuration matches the analytics parameter name: `num_top_feature_importance_values`
Example:
```
"inference": {
   "field_mappings": {},
   "model_id": "my_model",
   "inference_config": {
      "regression": {
         "num_top_feature_importance_values": 3
      }
   }
}
```

This will write to the document as follows:
```
"inference" : {
   "feature_importance" : { 
      "FlightTimeMin" : -76.90955548511226,
      "FlightDelayType" : 114.13514762158526,
      "DistanceMiles" : 13.731580450792187
   },
   "predicted_value" : 108.33165831875137,
   "model_id" : "my_model"
}
```

This is done through calculating the [SHAP values](https://arxiv.org/abs/1802.03888). 

It requires that models have populated `number_samples` for each tree node. This is not available to models that were created before 7.7. 

Additionally, if the inference config is requesting feature_importance, and not all nodes have been upgraded yet, it will not allow the pipeline to be created. This is a safeguard for mixed-version environments where only some ingest nodes have been upgraded.

NOTE: the algorithm is a Java port of the one laid out in ml-cpp: https://github.com/elastic/ml-cpp/blob/master/lib/maths/CTreeShapFeatureImportance.cc

usability blocked by: https://github.com/elastic/ml-cpp/pull/991
2020-02-21 16:36:21 -05:00
Yang Wang
5c9f79534f
Expose more authentication info to ingest pipeline (#51305)
The changes add more granularity for identifying the data ingestion user.
The ingest pipeline can now be configured to record authentication realm and
type. It can also record API key name and ID when one is in use.
This improves traceability when data are being ingested from multiple agents
and will become more relevant with the incoming support of required
pipelines (#46847)

Resolves: #49106
2020-02-10 13:56:07 +11:00
Przemko Robakowski
5560135542
Add empty_value parameter to CSV processor (#51567)
* Add empty_value parameter to CSV processor

This change adds the `empty_value` parameter to the CSV processor.
This value is used to fill empty fields. Fields will be skipped
if this parameter is omitted. This behavior is the same for both
quoted and unquoted fields.

* docs updated

* Fix compilation problem

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-02-05 22:36:00 +01:00
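The fill-or-skip behavior of `empty_value` described in the commit above can be sketched with Python's standard `csv` module. This is not the processor's code. The function `csv_to_fields` and its `target_fields` argument are hypothetical names used for illustration.

```python
import csv

_UNSET = object()  # sentinel: distinguishes "omitted" from an explicit None


def csv_to_fields(line, target_fields, empty_value=_UNSET):
    """Split one CSV line into named fields.

    Empty fields are filled with empty_value when it is given and skipped
    when it is omitted, for quoted and unquoted fields alike.
    """
    values = next(csv.reader([line]))
    doc = {}
    for name, value in zip(target_fields, values):
        if value == "":
            if empty_value is _UNSET:
                continue  # parameter omitted: leave the field out
            value = empty_value
        doc[name] = value
    return doc


line = 'a,,""'  # one unquoted and one quoted empty field
print(csv_to_fields(line, ["x", "y", "z"]))
print(csv_to_fields(line, ["x", "y", "z"], empty_value="N/A"))
```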
David Kyle
34743bcd6f
[ML] Remove stray field from inference docs (#51870)
model_info_field is not a valid option
2020-02-05 10:49:36 +00:00
Florian Kelbert
bd52041f92
[DOCS] Remove unneeded comma from CSV processor example (#51859) 2020-02-04 09:23:43 -05:00
István Zoltán Szabó
4e0e6e83e0
[DOCS] Fixes indentation in inference processor code snippet (#51252) 2020-01-21 16:21:17 +01:00
Martijn van Groningen
2b2935fd52
Add pipeline name to ingest metadata (#50467)
This commit adds the name of the current pipeline to ingest metadata.
This pipeline name is accessible under the following key: '_ingest.pipeline'.

Example usage in pipeline:
PUT /_ingest/pipeline/2
{
    "processors": [
        {
            "set": {
                "field": "pipeline_name",
                "value": "{{_ingest.pipeline}}"
            }
        }
    ]
}

Closes #42106
2020-01-15 16:17:05 +01:00
Igor Motov
7f81467378
Geo: Switch generated GeoJson type names to camel case (#50285) (#50400)
Switches generated GeoJson type names to camel case
to conform to the standard.

Closes #49568
2019-12-20 04:47:42 -10:00
István Zoltán Szabó
b8cae37374
[DOCS] Adds inference processor documentation (#50204)
Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
2019-12-19 12:19:44 +01:00
Igor Motov
a26e4d1e5e
Geo: Switch generated WKT to upper case (#50285)
Switches generated WKT to upper case to
conform to the standard recommendation.

Relates #49568
2019-12-18 07:28:56 -10:00
Przemko Robakowski
64e1a774fc
CSV ingest processor (#49509)
* CSV Processor for Ingest

This change adds a new ingest processor that breaks a line from a CSV file into separate fields.
By default it conforms to RFC 4180 but can be tweaked.

Closes #49113
2019-12-11 14:52:04 +01:00
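As a rough picture of what RFC 4180 style parsing with tweakable settings means for the commit above, the sketch below uses Python's standard `csv` module. The function `parse_csv_line` and its `separator` and `quote` parameters are names chosen for this illustration, not a description of the processor's internals.

```python
import csv


def parse_csv_line(line, separator=",", quote='"'):
    """Split a single CSV line into a list of raw field values."""
    return next(csv.reader([line], delimiter=separator, quotechar=quote))


# Quoted fields may contain the separator; a doubled quote is an escaped quote.
print(parse_csv_line('1,"Doe, Jane","said ""hi"""'))
# The same line shape with a different separator and quote character.
print(parse_csv_line("1;'Doe; Jane'", separator=";", quote="'"))
```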
Przemko Robakowski
c57032f622
Allow list of IPs in geoip ingest processor (#49573)
* Allow list of IPs in geoip ingest processor

This change lets you use an array of IPs in addition to a string in the geoip processor source field.
It will set an array containing geoip data for each element in the source, unless the first_only
option is enabled, in which case only the first one found will be returned.

Closes #46193
2019-12-06 21:57:06 +01:00
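The string, array, and `first_only` cases from the commit above can be sketched as follows. The `database` dict stands in for a real GeoIP database and holds made-up data, `geoip_lookup` is a hypothetical name, and representing an address with no match as `None` in the array case is an assumption.

```python
def geoip_lookup(source, database, first_only=False):
    """Resolve a string or a list of IPs against a lookup table.

    `database` is a plain dict standing in for a real GeoIP database
    (hypothetical data). Keeping None for addresses with no match in the
    array case is an assumption of this sketch.
    """
    if isinstance(source, str):
        return database.get(source)
    results = [database.get(ip) for ip in source]
    if first_only:
        # Return only the first address that resolved, if any.
        return next((r for r in results if r is not None), None)
    return results


db = {"192.0.2.1": {"country_iso_code": "XX"}}  # made-up entry
print(geoip_lookup(["198.51.100.7", "192.0.2.1"], db))
print(geoip_lookup(["198.51.100.7", "192.0.2.1"], db, first_only=True))
```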
Alexander Reelsen
062f9f03bf
Docs: Fix & test more grok processor documentation (#49447)
The documentation contained a small error, as bytes and duration were not
properly converted to a number and thus remained a string.

The documentation is now also properly tested by providing a full blown
simulate pipeline example.
2019-12-03 11:47:27 +01:00
James Rodewig
37baa50815
[DOCS] Explicitly document enrich target_field includes match_field (#49407)
When the enrich processor appends enrich data to an incoming document,
it adds a `target_field` to contain the enrich data.

This `target_field` contains both the `match_field` AND `enrich_fields`
specified in the enrich policy.

Previously, this was reflected in the documented example but not
explicitly stated. This adds several explicit statements to the docs.
2019-12-02 09:12:21 -05:00
Martijn van Groningen
88aea2107d
Add templating support to pipeline processor. (#49030)
This commit adds templating support to the pipeline processor's `name` option.

Closes #39955
2019-11-27 13:45:11 +01:00
Martijn van Groningen
4013e814e8
Add templating support to enrich processor (#49093)
Adds support for templating to `field` and `target_field` options.
2019-11-27 07:52:42 +01:00
James Rodewig
4ccd3a2b3f
[DOCS] Correct required file ext for user agent ingest processor (#48688)
For the user agent ingest processor, custom regex files must end
with the `.yml` file extension.

This corrects the docs which said the `.yaml` extension was required.
2019-10-30 11:10:35 -04:00
Dan Hermann
fcc18dc19b
Add option to split processor for preserving trailing empty fields (#48664) 2019-10-30 07:23:47 -05:00
James Rodewig
25d3add88a
[DOCS] Remove duplicate links for ingest processor overview (#48394) 2019-10-23 10:54:53 -05:00
Alexander Reelsen
fd65eec64c
update ingest-user-agent regexes.yml (#47807)
These new regexes are from:
154eba17f5/regexes.yaml
2019-10-18 16:14:44 +02:00
Martijn van Groningen
ddf3bc25d8
Change how max_matches affects target_field option. (#47982)
Prior to this change the `target_field` would always be a json array
field in the document being ingested. This was to take into account that
multiple enrich documents could be inserted into the `target_field`.

However, the default `max_matches` is `1`, meaning that by default
only a single enrich document would be added to the `target_field` json
array field.

This commit changes this: if `max_matches` is set to `1`, the single
document is added as a json object to the `target_field`, and
if it is configured to a higher value, the enrich documents are
added as a json array (even if only a single enrich document happens to
match).
2019-10-14 21:04:47 +02:00
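The object-versus-array rule from the commit above fits in a few lines. The sketch below is illustrative only. `build_target_field` is a hypothetical name, the sample enrich document is made up, and returning `None` when nothing matches is an assumption.

```python
def build_target_field(matches, max_matches=1):
    """Shape the value written to target_field.

    `matches` is the list of matching enrich documents. With max_matches
    of 1 the single document is returned as an object; with a higher
    value a list is returned, even when it holds only one document.
    Returning None when nothing matches is an assumption of this sketch.
    """
    limited = matches[:max_matches]
    if max_matches == 1:
        return limited[0] if limited else None
    return limited


doc = {"email": "a@example.com", "city": "Anytown"}  # made-up enrich document
print(build_target_field([doc]))
print(build_target_field([doc], max_matches=5))
```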
Martijn van Groningen
e06598ba56
Merge remote-tracking branch 'es/master' into enrich 2019-10-14 10:17:18 +02:00
Alan Woodward
566e1b7d33
Remove type field from DocWriteRequest and associated Response objects (#47671)
This commit removes the type field from index, update and delete requests, and their
associated responses.

Relates to #41059
2019-10-11 10:23:55 +01:00
James Rodewig
17eef81f83
[DOCS] Add docs for geo_match enrich policy type (#47745) 2019-10-09 08:39:11 -04:00
Martijn van Groningen
f676d9730d
Merge remote-tracking branch 'es/master' into enrich 2019-09-27 13:51:17 +02:00
Alan Woodward
c1f99e2d75
Remove _type from SearchHit (#46942)
This commit removes the `_type` field from all search hit responses.

Relates to #41059
2019-09-23 19:14:54 +01:00
Martijn van Groningen
afc16ba518
Merge remote-tracking branch 'es/master' into enrich 2019-09-23 09:34:53 +02:00
Alan Woodward
7c90801aff
Remove types from Get/MultiGet (#46587)
This commit removes types from the ShardGetService, and propagates this API change
up through the Transport and Rest actions for Get and MultiGet.

Relates to #41059
2019-09-20 14:22:57 +01:00