elasticsearch

mirror of https://github.com/elastic/elasticsearch.git synced 2025-04-25 07:37:19 -04:00

Author	SHA1	Message	Date
Larisa Motova	7e801e0410	[ES|QL] Add a standard deviation function (#116531) Uses Welford's online algorithm, as well as the parallel version, to calculate standard deviation.	2024-11-22 12:33:46 -10:00
Nik Everett	893dfd3c9a	ESQL: Make WEIGHTED_AVG not preview (#117356) It's not PREVIEW.	2024-11-22 16:28:06 +00:00
Iván Cea Fontenla	bc69827e1e	ESQL: WEIGHTED_AVG aggregation tests and docs (#111449)	2024-07-31 00:42:23 +10:00
Iván Cea Fontenla	735d80dffd	ESQL: Add COUNT and COUNT_DISTINCT aggregation tests (#111409)	2024-07-30 03:07:15 +10:00
Iván Cea Fontenla	826d49448b	ESQL: Added Median and MedianAbsoluteDeviation aggregations tests and kibana docs (#111231)	2024-07-26 22:11:01 +10:00
Iván Cea Fontenla	595d907f61	ESQL: SpatialCentroid aggregation tests and docs (#111236)	2024-07-26 10:41:18 +02:00
Iván Cea Fontenla	101775b93d	Added Sum aggregation tests and docs (#110984) - Added SUM() agg tests (which autogenerate docs) - Converted non-finite doubles to nulls in aggregator. The complete set of tests depends on https://github.com/elastic/elasticsearch/issues/110437, as commented in code. After completion, the test can be uncommented and everything should work fine	2024-07-22 21:43:58 +10:00
Iván Cea Fontenla	0e68117935	Added Percentile aggregation tests and Kibana docs (#111050) - Added Percentile aggregation tests and autogen docs - Added a new "appendix" section to FunctionInfo. Existing Percentile docs had a final, long section with info, and we need this to keep it. We have a "detailedDescription" attribute already, but it's right after the description, and it would make it harder to read the important bits of the function (types, examples...). So I'm not reusing it.	2024-07-19 14:28:11 +02:00
Iván Cea Fontenla	5d3512fb33	ESQL: Fix Max doubles bug with negatives and add tests for Max and Min (#110586) `MAX()` currently doesn't work with doubles smaller than `Double.MIN_VALUE` (note that `Double.MIN_VALUE` returns the smallest non-zero positive, not the smallest double). This PR adds tests for Max and Min, and fixes the bug (detected by the tests). Also, as the tests now generate the docs, replaced the old docs with the generated ones, and updated the Max and Min examples.	2024-07-09 21:05:00 +10:00
Iván Cea Fontenla	38cd0b333e	ESQL: AVG aggregation tests and ignore complex surrogates (#110579) Some work around aggregation tests, with AVG as an example: - Added tests and autogenerated docs for AVG - As AVG uses "complex" surrogates (a combination of functions), we can't trivially execute them without a complete plan. As I'm not sure it's worth it for most aggregations, I'm skipping those cases for now, so as to avoid blocking other aggs tests. The bad side effect of skipping those tests is that most tests in AvgTests are actually ignored (74 of 100)	2024-07-09 12:01:46 +02:00
Fang Xing	8abc8857f2	[ES|QL] weighted_avg (#109993) * weighted_avg	2024-07-02 18:29:02 -04:00
Iván Cea Fontenla	c89ee3b648	ESQL: Renamed TopList to Top (#110347) Rename TopList aggregation to Top, after internal discussions	2024-07-02 03:52:24 +10:00
Iván Cea Fontenla	fc0313f429	ESQL: Add aggregations testing base and docs (#110042) - Added a new `AbstractAggregationTestCase` base class for tests that shares most of the code of function tests, adapted for aggregations, including both testing and docs generation. - Reused the `AbstractFunctionTestCase` class to also let us test evaluators if the aggregation is foldable - Added a `TopListTests` example - This includes the docs for Top_list _(Also added a missing include of Ip_prefix docs)_ - Adapted Kibana docs to use `type: "agg"` (@drewdaemon) The current tests are very basic: consume a page, generate an output, all in Single aggregation mode (no intermediates, no grouping). More complex testing will be added in future PRs. Initial PR of https://github.com/elastic/elasticsearch/issues/109917	2024-06-27 21:21:55 +10:00
Bogdan Pintea	a21242054b	ESQL: Document BUCKET as a grouping function (#107864) This adds the documentation for BUCKET as a grouping function and the addition of the "direct" invocation mode providing a span (in addition to the auto mode).	2024-04-25 12:38:12 -04:00
Craig Taverner	d915b964ba	Rename ST_CENTROID to ST_CENTROID_AGG (#107226) * Rename ST_CENTROID to ST_CENTROID_AGG In order to allow development of a scalar ST_CENTROID function. * Fix table alignment	2024-04-10 17:56:45 +02:00
Craig Taverner	2380492fac	ESQL: Support ST_CONTAINS and ST_WITHIN (#106503) * WIP Started adding ST_CONTAINS * Add generated evaluators * Reduced warnings and use correct evaluators * Refactored tests to remove duplicate code, and fixed Contains/multi-components * Gradle build disallows using getDeclaredField * Fixed cases where rectangles cross the dateline * Fixed meta function tests * Added ST_WITHIN to support inverting ST_CONTAINS If the ST_CONTAINS is called with the constant on the left, we either have to create a lot more Evaluators to cover that case, or we have to invert it to ST_WITHIN. This inversion was a much easier option. * Simplify inversion logic * Add comment on choice of surrogate approach * Add unit tests and missing fold() function * Simple code cleanup * Add integration tests for literals * Add more integration tests based on actual data * Generated documentation files * Add documentation * Fixed failing function count test * Add tests that push-to-source works for ST_CONTAINS and ST_WITHIN * Test more combinations of WITHIN/CONTAINS and literal on right and left This also verifies that the re-writing of CONTAINS to WITHIN or vice versa occurs when the literal is on the left. * test that physical planning also handles doc-values from STATS * Added more tests for WITHIN/CONTAINS together with CENTROID This should test the doc-values for points. * Add cartesian_point tests * Add cartesian_shape tests * Disable Lucene-push-down for CARTESIAN data This is a limitation in Lucene, which we could address as a performance optimization in a future PR, but since it probably requires Lucene changes, it cannot be done in this work. * Fix doc links * Added test data and tests for cartesian multi-polygons Testing INTERSECTS, CONTAINS and WITHIN with multi-polygon fields * Use required features for spatial points, shapes and centroid * 8.13.0 is not yet a historical version This needs to be reverted as soon as 8.13.0 is released * Added st_intersects and st_contains_within 'features' * Code review updates * Re-enable lucene push-down * Added more required_features * Fix point contains non-point * Fix point contains point * Re-enable lucene push-down in tests too Forgot to change the physical planner unit tests after re-enabling lucene push-down * Generate automatic docs * Use generated examples docs * Generated examples use '-result' prefix (singular) * Mark spatial functions as preview/experimental	2024-04-02 10:31:00 +02:00
Nik Everett	fa00e6176f	ESQL: Values aggregation function (#106065) This creates the `VALUES` aggregation function which buffers all field values it receives and emits them as a multivalued field. It can use a significant amount of memory and will circuit break if it uses too much memory, but it's really useful for putting together self-join-like behavior. It sort of functions as a stop-gap measure until we have more self-join style things. In the future we'll have spill-to-disk for aggregations and, likely, some kind of self-join command for aggregations at least so this will be able to grow beyond memory. But for now, memory it is. Example: ``` FROM employees | EVAL first_letter = SUBSTRING(first_name, 0, 1) | STATS first_name=VALUES(first_name) BY first_letter | SORT first_letter ; first_name:keyword | first_letter:keyword [Anneke, Alejandro, Anoosh, Amabile, Arumugam] | A [Bezalel, Berni, Bojan, Basil, Brendon, Berhard, Breannda] | B [Chirstian, Cristinel, Claudi, Charlene] | C [Duangkaew, Divier, Domenick, Danel] | D ``` I made this work for everything but `geo_point` and `cartesian_point` because I'm not 100% sure how to integrate with those. We can grab those in a follow up. Closes #103600	2024-03-21 12:52:04 -04:00
Craig Taverner	eb1c490264	ESQL: Support reading points from doc-values for STATS (#104218) * Add initial structure for ST_CENTROID * Revert "Revert stab at implementing forStats for doc-values vs source" This reverts commit `cfc4341bf4`. * Refined csv-spec tests with st_centroid * Spotless disagrees with intellij * Fixes after reverting fieldmapper code to test GeoPointFieldMapper * Get GeoPointFieldMapperTests working again after enabling doc-values reading * Simplify after rebase on main In particular, field-mappers that do not need to know about fields can have simpler calls. * Support local physical planning of forStats attributes for spatial aggregations * Get st_centroid aggregation working on doc-values We changed it to produce BytesRef, so we don't (yet) need any doc-values types. * Create both DocValues and SourceValues versions of st_centroid * Support physical planning of DocValues and SourceValues SpatialCentroid * Improve test for physical planning of DocValues in SpatialCentroid * Fixed show functions for st_centroid * More st_centroid tests with mv_expand To test single and multi-value centroids * Fix st_centroid from point literals The blocks contained BytesRef byte[] with multiple values, and we were ignoring the offsets when decoding, so decoding the first value over and over instead of decoding the subsequent values. * Teach CsvTests to handle spatial types alternative loading from doc-values Spatial GEO_POINT and CARTESIAN_POINT load from doc-values in some cases. If the physical planner has planned for this, we need the CsvTests to also take that into account, changing the type of the point field from BytesRefBlock to LongBlock. * Fixed failing NodeSubclassTests Required making the new constructor public and enabling Set as a valid parameter in the test framework. * More complex st_centroid tests and fixed bug with multiple aggs When there were multiple aggregations in the same STATS, we were inadvertently re-ordering them, causing the wrong Blocks to be fed to the wrong aggregator in the coordinator node. * Update docs/changelog/104218.yaml * Fix automatically generated changelog file * Fixed failing test The nodes can now sometimes be Set, which is also a Collection, but not a List, and therefore never can be a subset of the children. * More tests covering more combinations including MV_EXPAND and grouping * Added cartesian st_centroid with grouping test We could not add MV_EXPAND tests since the cartesian data does not have multi-value columns, but the geo_point tests are sufficient for this since they share the same code. * Reduce flaky tests by sorting results * Reduce flaky tests by sorting results * Added tests for stats on stats to ensure planner coped * Add unit tests to ensure doc-values in query planning complex cases * Some minor updates from code review * Fixes after rebase on main * Get correct error message on unsupported geo_shape for st_centroid * Refined point vs shape differences after merging main * Added basic docs * Delete docs/changelog/104218.yaml * Revert "Delete docs/changelog/104218.yaml" This reverts commit `4bc596a442`. * Fixed broken docs tag link * Simplify BlockReaderSupport in MapperTestCase from code review * Moved spatial aggregations into a sub-package * Added some more code review updates, including nested tests * Get nested functions working, if only from source values for now * Code review update * Code review update * Added second location column to airports for wider testing * Use second location in tests, including nulls Includes a test fix for loading and converting nulls to encoded longs. * Fixed bug supporting multi spatial aggregations in the local node The local physical planner only marked a single field for stats loading, but marked all spatial aggregations for stats loading, which led to only one aggregation getting the right data, while the rest would get the wrong data. * Renamed forStats to fieldExtractPreference for clarity Now the planner decides whether to load data from doc-values. To remove the confusion of preferDocValues==false in the non-spatial cases, we use an ENUM with the default value of NONE, to make it clear we're leaving the choice up to the field type in all non-spatial cases. * EsqlSpecIT was failing on very high precision centroids on different computers This was not reproducible on the development machine, but CI machines were sufficiently different to lead to very tiny precision changes over very large Kahan summations. We fixed this by reducing the need for precision checks in clustered integration tests. * Delete docs/changelog/104218.yaml * Revert "Delete docs/changelog/104218.yaml" This reverts commit `12c6980881`. * Fixed changelog entry	2024-01-23 16:04:45 +01:00
AlexB	931dcae41d	Add improvements to the ES|QL docs (#101195) Content and structural improvements to the ES|QL docs --------- Co-authored-by: Alexandros Batsakis <abatsakis@splunk.com> Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>	2023-10-23 07:45:42 -07:00
Abdon Pijpelink	8ac4ba751e	Restructure ES|QL docs (#100806) * Break out 'Limitations' into separate page * Add REST API docs * Restructure commands, functions, and operators refs * Add placeholder for getting started guide * Group 'Syntax', 'Metafields', and 'MV fields' under 'Language' * Add placeholder for Kibana page * Add link from landing page * Apply uniform formatting to ACOS, CASE, and DATE_PARSE function refs * Reword default LIMIT * Add support for COUNT() Move 'Commands' and 'Functions and operators' to individual pages --------- Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2023-10-17 17:36:14 +02:00

20 commits
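Commit 7e801e0410 (#116531) says the standard deviation function uses Welford's online algorithm plus its parallel version. The following is a minimal Java sketch of that technique. It is not the Elasticsearch code. The class and method names (`WelfordStdDev`, `add`, `merge`, `stdDev`) are hypothetical.

The sketch makes two assumptions:

- The "parallel version" is taken to be the pairwise merge formula of Chan, Golub and LeVeque. That formula combines partial states from different shards or threads.
- The sketch returns the population standard deviation. The commit message does not say whether ES|QL uses the population or the sample variant.

```java
// Hypothetical sketch of Welford's online variance with a pairwise (Chan et al.) merge.
// Not the Elasticsearch implementation; names are illustrative only.
final class WelfordStdDev {
    private long count;
    private double mean;
    private double m2; // running sum of squared deviations from the current mean

    /** Online step: fold one value into the running state. */
    public void add(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean); // uses the updated mean, as in Welford's recurrence
    }

    /** Parallel step: fold another partial state (e.g. from another shard) into this one. */
    public void merge(WelfordStdDev other) {
        if (other.count == 0) {
            return;
        }
        if (count == 0) {
            count = other.count;
            mean = other.mean;
            m2 = other.m2;
            return;
        }
        long n = count + other.count;
        double delta = other.mean - mean;
        mean += delta * other.count / n;
        m2 += other.m2 + delta * delta * ((double) count * other.count / n);
        count = n;
    }

    /** Population standard deviation (assumption); NaN when no values were seen. */
    public double stdDev() {
        return count == 0 ? Double.NaN : Math.sqrt(m2 / count);
    }

    public static void main(String[] args) {
        WelfordStdDev left = new WelfordStdDev();
        WelfordStdDev right = new WelfordStdDev();
        for (double v : new double[] {2, 4, 4, 4}) left.add(v);
        for (double v : new double[] {5, 5, 7, 9}) right.add(v);
        left.merge(right);
        System.out.println(left.stdDev()); // approximately 2.0
    }
}
```

Merging two half-datasets gives, up to rounding, the same result as streaming all eight values through `add`. This property is what lets partial aggregation states be combined on a coordinating node.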
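Commit 5d3512fb33 (#110586) explains that `MAX()` failed for doubles below `Double.MIN_VALUE`, which is the smallest positive double rather than the most negative one. The following Java sketch shows how that class of bug arises when a running maximum is seeded with `Double.MIN_VALUE`.

The commit does not say how the real aggregator was fixed. Seeding with `Double.NEGATIVE_INFINITY` is an assumption here, and `MaxSeedDemo` is a hypothetical name.

```java
// Hypothetical demonstration of the Double.MIN_VALUE seeding bug for a running max.
final class MaxSeedDemo {
    // Buggy: Double.MIN_VALUE is about 4.9e-324, a tiny POSITIVE number,
    // so zero and every negative input lose to the seed.
    static double buggyMax(double[] values) {
        double max = Double.MIN_VALUE;
        for (double v : values) {
            max = Math.max(max, v);
        }
        return max;
    }

    // Assumed fix: seed with negative infinity so any finite input replaces it.
    static double fixedMax(double[] values) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            max = Math.max(max, v);
        }
        return max;
    }

    public static void main(String[] args) {
        double[] negatives = {-3.5, -1.25, -7.0};
        System.out.println(buggyMax(negatives)); // prints 4.9E-324
        System.out.println(fixedMax(negatives)); // prints -1.25
    }
}
```

The sketch does not cover empty input, which ES|QL presumably reports as null rather than as negative infinity.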
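Commit eb1c490264 (#104218) attributes tiny cross-machine differences in high-precision centroids to "very large Kahan summations". The following Java sketch illustrates compensated (Kahan) summation feeding a per-axis mean. It assumes the centroid is simply the mean x and mean y of the points, each accumulated with a Kahan sum.

The actual ST_CENTROID_AGG internals are not described in the commit. `KahanCentroid` and `KahanSum` are hypothetical names.

```java
// Hypothetical sketch: a point centroid built from Kahan-compensated sums.
final class KahanCentroid {
    // Compensated sum: `compensation` carries the low-order bits lost in each addition.
    static final class KahanSum {
        private double sum;
        private double compensation;

        void add(double value) {
            double y = value - compensation; // re-apply previously lost bits
            double t = sum + y;              // low-order bits of y may be lost here
            compensation = (t - sum) - y;    // recover what was lost (negated)
            sum = t;
        }

        double value() {
            return sum;
        }
    }

    private final KahanSum xSum = new KahanSum();
    private final KahanSum ySum = new KahanSum();
    private long count;

    void addPoint(double x, double y) {
        xSum.add(x);
        ySum.add(y);
        count++;
    }

    /** Returns {meanX, meanY}, or null when no points were added. */
    double[] centroid() {
        return count == 0 ? null : new double[] {xSum.value() / count, ySum.value() / count};
    }

    public static void main(String[] args) {
        KahanCentroid c = new KahanCentroid();
        c.addPoint(0, 0);
        c.addPoint(2, 4);
        double[] xy = c.centroid();
        System.out.println(xy[0] + ", " + xy[1]); // prints 1.0, 2.0
    }
}
```

Compensation keeps the error of each sum close to a single rounding, but not bit-identical across every ordering of inputs. This is consistent with the commit's decision to relax precision checks in clustered integration tests.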