93 commits

Author SHA1 Message Date
David Pilato
995e796eab [doc] Fix cross link with ICU plugin
Doc bug introduced with #15695
2015-12-30 12:07:33 +01:00
David Pilato
3076377fdb Remove ICU Plugin in reference guide
This documentation now lives in the plugins documentation at https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-icu.html.

We don't need a copy in the analysis reference guide.
2015-12-29 11:23:28 +01:00
socurites
485915bbe7 comma(,) was duplicated
deleted it.
2015-12-24 14:31:26 +01:00
socurites
25d23091e2 Edge NGram: "side" setting was deprecated
2015-12-24 14:26:24 +01:00
Jason Tedor
d9a24961c5 Fix minor issues in delimited payload token filter docs
This commit addresses a few minor issues in the delimited payload token
filter docs:
  - the provided example reversed the payloads associated with the
    tokens "the" and "fox"
  - two additional typos in the same sentence
    - "per default" -> "by default"
    - "default int to" -> "default into"
  - adds two serial commas
2015-12-16 13:00:20 -05:00
tomoya yokota
82d26c852a property name is not right
`ignore_script` is not right. `ignored_script` is right.

See org.elasticsearch.index.analysis.CJKBigramFilterFactory
2015-11-26 14:22:23 +09:00
Clinton Gormley
98028419a5 Merge pull request #14610 from yokotaso/patch-1
Update snowball document page.
2015-11-17 14:17:30 +01:00
Jason O'Donnell
42fb690a1c Fixing typo 2015-10-26 16:46:36 -04:00
Adrien Grand
d3aa3565db Deprecate index.analysis.analyzer.default_index in favor of index.analysis.analyzer.default.
Close #11861
2015-10-12 22:19:16 +02:00
Clinton Gormley
1f76f49003 Update compound-word-tokenfilter.asciidoc
Improved the docs for compound word token filter.

Closes #13670
Closes #13595
2015-09-21 11:22:14 +02:00
Robert Muir
f216d92d19 Upgrade to lucene 5.4-snapshot r1701068 2015-09-03 15:13:33 -04:00
Robert Muir
0d3e3f81fc Lithuanian analysis 2015-09-01 08:52:10 -04:00
xuzha
fb2be6d6a1 The name "position_offset_gap" is confusing because Lucene has three
similar sounding things:

* Analyzer#getPositionIncrementGap
* Analyzer#getOffsetGap
* IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS and
* FieldType#storeTermVectorOffsets

Rename position_offset_gap to position_increment_gap
closes #13056
2015-08-26 14:56:35 -07:00
Nik Everett
4b9664beeb Mapping: Default position_offset_gap to 100
This is much more fiddly than you'd expect it to be because of the way
position_offset_gap is applied in StringFieldMapper. Instead of setting
the default to 100 it's simpler to make sure that all the analyzers default
to 100 and that StringFieldMapper doesn't override the default unless the
user specifies something different. Unless the index was created before
2.1, in which case the old default of 0 has to take.

Also, position_offset_gap values less than 0 aren't allowed at all.

New tests test that:
1. the new default doesn't match phrases across values with reasonably low
slop (5)
2. the new default doesn't match phrases across values with reasonably high
slop (50)
3. you can override the value and phrases work as you'd expect
4. if you leave the value undefined in the mapping and define it on a
custom analyzer, the value from the custom analyzer shines through

Closes #7268
2015-08-25 14:21:50 -04:00
Clinton Gormley
2b512f1f29 Docs: Use "js" instead of "json" and "sh" instead of "shell" for source highlighting 2015-07-14 18:14:09 +02:00
Britta Weber
eeeb29f900 spell correct and add single quotes 2015-05-26 11:41:19 +02:00
Britta Weber
37782c1745 analyzers: custom analyzers names and aliases must not start with _
closes #9596
2015-05-26 11:38:15 +02:00
Clinton Gormley
3a69b65e88 Docs: Fixed the backslash escaping on the pattern analyzer docs
Closes #11099
2015-05-15 18:40:16 +02:00
Ryan Ernst
ba68d354c4 Merge pull request #10934 from mattweber/custom_analyzer_pos_offset_gap
document and test custom analyzer position offset gap
2015-05-04 08:56:50 -07:00
Matt Weber
63c4a214db document and test custom analyzer position offset gap 2015-05-04 08:53:45 -07:00
Robert Muir
4b3672b7df Add migration note for hunspell dictionaries 2015-05-04 10:00:05 -04:00
Clinton Gormley
cf177c32d4 Docs: Fixed pattern-capture token filter example
Closes #10690
2015-04-25 19:27:55 +02:00
Benoit Delbosc
4a94e1f14b Docs: Warning about the conflict with the Standard Tokenizer
The examples given require a specific Tokenizer to work.

Closes: 10645
2015-04-23 21:16:30 +02:00
Glen Smith
3d5fbfb997 Docs: Update pattern-replace-charfilter.asciidoc
Remove invalid trailing comma from json

Closes #9477
2015-01-29 20:24:08 +01:00
Lee Hinman
2f6527f491 [DOCS] Update documentation for max_token_length
In 1.4 the behavior is different due to
https://issues.apache.org/jira/browse/LUCENE-5897
2015-01-27 13:52:14 -07:00
David Haney
395960feef Docs: Updated standard token filter docs to indicate true behavior: doing nothing
Closes #9300
2015-01-15 21:33:29 +01:00
Tomoya Hirano
15d46988dc Fix typo in sample json
Fixes #9253
2015-01-12 15:58:16 +00:00
Michael McCandless
dfb6d6081c Core: upgrade to current Lucene 5.0.0 snapshot
Elasticsearch no longer unlocks the Lucene index on startup (this was
dangerous, and could possibly lead to corruption).

Added the new serbian_normalization TokenFilter from Lucene.

NoLockFactory is no longer supported (index.store.fs.fs_lock = none),
and if you have a typo in your fs_lock you'll now hit a StoreException
instead of silently using NoLockFactory.

Closes #8588
2014-11-24 05:08:42 -05:00
Robert Muir
610ce078fb Upgrade master to lucene 5.0 snapshot
This has a lot of improvements in lucene, particularly around memory usage, merging, safety, compressed bitsets, etc.

On the elasticsearch side, summary of the larger changes:

    API changes: postings API became a "pull" rather than "push", collector API became per-segment, etc.
    packaging changes: add lucene-backwards-codecs.jar as a dependency.
    improvements to boolean filtering: especially ensuring it will not be slow for SparseBitSet.
    use generic BitSet api in plumbing so that concrete bitset type is an implementation detail.
    use generic BitDocIdSetFilter api for dedicated bitset cache, so there is type safety.
    changes to support atomic commits
    implement Accountable.getChildResources (detailed memory usage API) for fielddata, etc
    change handling of IndexFormatTooOld/New, since they no longer extend CorruptIndexException

Closes #8347.

Squashed commit of the following:

commit d90d53f5f2
Author: Simon Willnauer <simonw@apache.org>
Date:   Wed Nov 5 21:35:28 2014 +0100

    Make default codec/postings/docvalues format constants

commit cb66c22c71
Merge: d4e2f6d ad4ff43
Author: Robert Muir <rmuir@apache.org>
Date:   Wed Nov 5 11:41:13 2014 -0500

    Merge branch 'master' into enhancement/lucene_5_0_upgrade

commit d4e2f6dfe7
Merge: 4e5445c 4111d93
Author: Robert Muir <rmuir@apache.org>
Date:   Wed Nov 5 06:26:32 2014 -0500

    Merge branch 'master' into enhancement/lucene_5_0_upgrade

commit 4e5445c775
Author: Robert Muir <rmuir@apache.org>
Date:   Tue Nov 4 16:19:19 2014 -0500

    FixedBitSet -> BitSet

commit 9887ea73e8
Merge: 1bf8894 fc84666
Author: Robert Muir <rmuir@apache.org>
Date:   Tue Nov 4 15:26:25 2014 -0500

    Merge branch 'master' into enhancement/lucene_5_0_upgrade

commit 1bf8894430
Author: Robert Muir <rmuir@apache.org>
Date:   Tue Nov 4 15:22:51 2014 -0500

    remove nocommit

commit a9c2a2259f
Author: Robert Muir <rmuir@apache.org>
Date:   Tue Nov 4 13:48:43 2014 -0500

    turn jenkins red again

commit 067baaaa4d
Author: Robert Muir <rmuir@apache.org>
Date:   Tue Nov 4 13:18:21 2014 -0500

    unzip from stream

commit 82b6fba33d
Merge: b2214bb 6523cd9
Author: Robert Muir <rmuir@apache.org>
Date:   Tue Nov 4 13:10:59 2014 -0500

    Merge branch 'master' into enhancement/lucene_5_0_upgrade

commit b2214bb093
Author: Robert Muir <rmuir@apache.org>
Date:   Tue Nov 4 13:09:53 2014 -0500

    go back to my URL until we can figure out what is up with jenkins

commit e7d6141722
Author: Robert Muir <rmuir@apache.org>
Date:   Tue Nov 4 10:52:54 2014 -0500

    try this jenkins

commit 337a3c7704
Author: Simon Willnauer <simonw@apache.org>
Date:   Tue Nov 4 16:17:49 2014 +0100

    Rename temp-files under lock to prevent metadata reads while renaming

commit 77d5ba80d0
Author: Robert Muir <rmuir@apache.org>
Date:   Tue Nov 4 10:07:11 2014 -0500

    continue to treat too-old/too-new as corruption for now

commit 98d0fd2f48
Author: Robert Muir <rmuir@apache.org>
Date:   Tue Nov 4 09:24:21 2014 -0500

    fix last nocommit

commit 643fceed66
Author: Simon Willnauer <simonw@apache.org>
Date:   Tue Nov 4 14:46:17 2014 +0100

    remove NoSuchDirectoryException

commit 2e43c4feba
Merge: 93826e4 8163107
Author: Simon Willnauer <simonw@apache.org>
Date:   Tue Nov 4 14:38:00 2014 +0100

    Merge branch 'master' into enhancement/lucene_5_0_upgrade

commit 93826e4d56
Merge: 7f10129 44e24d3
Author: Simon Willnauer <simonw@apache.org>
Date:   Tue Nov 4 12:54:27 2014 +0100

    Merge branch 'master' into enhancement/lucene_5_0_upgrade

    Conflicts:
    	src/main/java/org/elasticsearch/index/store/DistributorDirectory.java
    	src/main/java/org/elasticsearch/index/store/Store.java
    	src/main/java/org/elasticsearch/indices/recovery/RecoveryStatus.java
    	src/test/java/org/elasticsearch/index/store/DistributorDirectoryTest.java
    	src/test/java/org/elasticsearch/index/store/StoreTest.java
    	src/test/java/org/elasticsearch/indices/recovery/RecoveryStatusTests.java

commit 7f10129364
Author: Adrien Grand <jpountz@gmail.com>
Date:   Tue Nov 4 11:32:24 2014 +0100

    Fix TopHitsAggregator to not ignore the top-level/leaf collector split.

commit 042fadc860
Author: Adrien Grand <jpountz@gmail.com>
Date:   Tue Nov 4 11:31:20 2014 +0100

    Remove MatchDocIdSet in favor of DocValuesDocIdSet.

commit 7d877581ff
Author: Adrien Grand <jpountz@gmail.com>
Date:   Tue Nov 4 11:10:08 2014 +0100

    Make the and filter use the cost API.

    Lucene 5 ensured that cost() can safely be used, and this will have the benefit
    that the order in which filters are specified is not important anymore (only
    for slow random-access filters in practice).

commit 78f1718aa2
Author: Robert Muir <rmuir@apache.org>
Date:   Mon Nov 3 23:55:17 2014 -0500

    fix previous eclipse import braindamage

commit 186c40e925
Author: Robert Muir <rmuir@apache.org>
Date:   Mon Nov 3 22:32:34 2014 -0500

    allow child queries to exhaust iterators again

commit b0b1271305
Author: Ryan Ernst <ryan@iernst.net>
Date:   Mon Nov 3 14:50:44 2014 -0800

    Fix nocommit for mapping output.  index_options will not be printed if
    the field is not indexed.

commit ba223eb85e
Author: Ryan Ernst <ryan@iernst.net>
Date:   Mon Nov 3 14:07:26 2014 -0800

    Remove no commit for chinese analyzer provider.  We should have a
    separate issue to address not using this provider on new indexes.

commit ca554b03c4
Author: Ryan Ernst <ryan@iernst.net>
Date:   Mon Nov 3 13:41:59 2014 -0800

    Fix stop tests

commit de67c4653e
Author: Ryan Ernst <ryan@iernst.net>
Date:   Mon Nov 3 12:51:17 2014 -0800

    Remove analysis nocommits, switching over to Lucene43*Filters for
    backcompat

commit 50cae9bec7
Author: Robert Muir <rmuir@apache.org>
Date:   Mon Nov 3 15:32:25 2014 -0500

    add ram accounting and TODO lazy-loading (its no worse than master, can be a followup improvement) for suggesters

commit 7a7f0122f1
Author: Robert Muir <rmuir@apache.org>
Date:   Mon Nov 3 15:11:26 2014 -0500

    bump lucene version

commit cd0cae5c35
Merge: 446bc09 3c72073
Author: Robert Muir <rmuir@apache.org>
Date:   Mon Nov 3 14:49:05 2014 -0500

    Merge branch 'master' into enhancement/lucene_5_0_upgrade

commit 446bc09b4e
Author: Robert Muir <rmuir@apache.org>
Date:   Mon Nov 3 14:46:30 2014 -0500

    remove hack

commit a19d85a968
Author: Robert Muir <rmuir@apache.org>
Date:   Mon Nov 3 12:53:11 2014 -0500

    dont create exceptions with circular references on corruption (will open a PR for this)

commit 0beefb9e82
Author: Robert Muir <rmuir@apache.org>
Date:   Mon Nov 3 11:47:14 2014 -0500

    temporarily add craptastic detector for this horrible bug

commit e9f2d298bf
Author: Robert Muir <rmuir@apache.org>
Date:   Mon Nov 3 10:56:01 2014 -0500

    add nocommit

commit e97f1d50a9
Merge: c57a3c8 f1f50ac
Author: Robert Muir <rmuir@apache.org>
Date:   Mon Nov 3 10:12:12 2014 -0500

    Merge branch 'master' into enhancement/lucene_5_0_upgrade

commit c57a3c8341
Author: Robert Muir <rmuir@apache.org>
Date:   Mon Nov 3 10:11:46 2014 -0500

    fix nocommit

commit dd0e77e4ec
Author: Robert Muir <rmuir@apache.org>
Date:   Mon Nov 3 09:54:09 2014 -0500

    nocommit -> TODO, this is in many more places in the codebase, bigger issue

commit 3cc3bf56d7
Author: Ryan Ernst <ryan@iernst.net>
Date:   Sat Nov 1 23:59:17 2014 -0700

    Remove nocommit and awaitsfix for edge ngram filter test.

commit 89f1152451
Author: Ryan Ernst <ryan@iernst.net>
Date:   Sat Nov 1 23:57:44 2014 -0700

    Fix EdgeNGramTokenFilter logic for version <= 4.3, and fixed instanceof
    checks in corresponding tests to correctly check for reverse filter when
    applicable.

commit 112df869cd
Author: Robert Muir <rmuir@apache.org>
Date:   Sun Nov 2 00:08:30 2014 -0400

    execute geo disjoint query/filter as intersects

commit e5061273cc
Author: Robert Muir <rmuir@apache.org>
Date:   Sat Nov 1 22:58:59 2014 -0400

    remove chinese analyzer from docs

commit ea1af11b89
Author: Robert Muir <rmuir@apache.org>
Date:   Sat Nov 1 22:29:00 2014 -0400

    fix ram accounting bug

commit 53c0a42c6a
Merge: e3bcd3c 6011a18
Author: Robert Muir <rmuir@apache.org>
Date:   Sat Nov 1 22:16:29 2014 -0400

    Merge branch 'master' into enhancement/lucene_5_0_upgrade

commit e3bcd3cc07
Author: Robert Muir <rmuir@apache.org>
Date:   Sat Nov 1 22:15:01 2014 -0400

    fix url-email back compat (thanks ryan)

commit 91d6b096a9
Author: Robert Muir <rmuir@apache.org>
Date:   Sat Nov 1 22:11:26 2014 -0400

    bump lucene version

commit d2bb9568df
Author: Robert Muir <rmuir@apache.org>
Date:   Sat Nov 1 20:33:07 2014 -0400

    remove nocommit

commit 1d049c471e
Author: Robert Muir <rmuir@apache.org>
Date:   Sat Nov 1 20:28:58 2014 -0400

    fix eclipse to group org/com imports together: without this, it's madness

commit 09d8c1585e
Author: Robert Muir <rmuir@apache.org>
Date:   Sat Nov 1 14:27:41 2014 -0400

    remove nocommit, if you don't like it, print assembly and tell me how it can be better

commit 8a6a294313
Author: Adrien Grand <jpountz@gmail.com>
Date:   Fri Oct 31 20:01:55 2014 +0100

    Remove deprecated usage of DocIdSets.newDocIDSet.

commit 601bee6054
Author: Robert Muir <rmuir@apache.org>
Date:   Fri Oct 31 14:13:18 2014 -0400

    maybe one of these zillions of annotations will stop thread leaks

commit 9d3f69abc7
Author: Robert Muir <rmuir@apache.org>
Date:   Fri Oct 31 14:05:39 2014 -0400

    fix some analysis nocommits

commit 312e3a29c7
Author: Adrien Grand <jpountz@gmail.com>
Date:   Fri Oct 31 18:28:45 2014 +0100

    Remove XConstantScoreQuery/XFilteredQuery/ApplyAcceptedDocsFilter.

commit 5a0cb9f8e1
Author: Adrien Grand <jpountz@gmail.com>
Date:   Fri Oct 31 17:06:45 2014 +0100

    Fix misleading documentation of DocIdSets.toCacheable.

commit 8b4ef2b5b4
Author: Adrien Grand <jpountz@gmail.com>
Date:   Fri Oct 31 17:05:59 2014 +0100

    Fix CustomRandomAccessFilterStrategy to override the right method.

commit d7a9a407a6
Author: Adrien Grand <jpountz@gmail.com>
Date:   Fri Oct 31 16:21:35 2014 +0100

    Better handle the special case when there is a single SHOULD clause.

commit 648ad389f0
Author: Adrien Grand <jpountz@gmail.com>
Date:   Fri Oct 31 15:53:38 2014 +0100

    Cut over XBooleanFilter to BitDocIdSet.Builder.

    The idea is similar to what happened to Lucene's BooleanFilter.

    Yet XBooleanFilter is a bit more sophisticated and I had to slightly
    change the way it is implemented in order to make it work. The main difference
    with before is that slow filters are now applied lazily, so eg. if you have 3
    MUST clauses, two with a fast iterator and the third with a slow iterator, the
    previous implementation used to apply the fast iterators first and then only
    check the slow filter for bits which were set in the bit set. Now we are
    computing a bit set based on the fast must clauses and then basically returning
    a BitsFilteredDocIdSet.wrap(bitset, slowClause).

    Other than that, BooleanFilter still uses the bitset optimizations when or-ing
    and and-ing filters.

    Another improvement is that BooleanFilter is now aware of the cost API.

commit b2dad312b4
Author: Robert Muir <rmuir@apache.org>
Date:   Fri Oct 31 10:18:53 2014 -0400

    clear nocommit

commit 4851d2091e
Author: Simon Willnauer <simonw@apache.org>
Date:   Fri Oct 31 15:15:16 2014 +0100

    cut over to RoaringDocIdSet

commit ca6aec24a9
Author: Simon Willnauer <simonw@apache.org>
Date:   Fri Oct 31 14:57:30 2014 +0100

    make nocommit more explicit

commit d0742ee2cb
Author: Robert Muir <rmuir@apache.org>
Date:   Fri Oct 31 09:55:24 2014 -0400

    fix standardtokenizer nocommit

commit 7d6faccaff
Author: Simon Willnauer <simonw@apache.org>
Date:   Fri Oct 31 14:54:08 2014 +0100

    fix compilation

commit a038a405c1
Author: Simon Willnauer <simonw@apache.org>
Date:   Fri Oct 31 14:53:43 2014 +0100

    fix compilation

commit 30c9e307b1
Author: Simon Willnauer <simonw@apache.org>
Date:   Fri Oct 31 14:52:35 2014 +0100

    fix compilation

commit e5139bc5a0
Author: Robert Muir <rmuir@apache.org>
Date:   Fri Oct 31 09:52:16 2014 -0400

    clear nocommit here

commit 85dd2cedf7
Author: Simon Willnauer <simonw@apache.org>
Date:   Fri Oct 31 14:46:17 2014 +0100

    fix CompletionPostingsFormatTest

commit c0f3781f61
Author: Robert Muir <rmuir@apache.org>
Date:   Fri Oct 31 09:38:00 2014 -0400

    add tests for these analyzers

commit 51f9999b4a
Author: Simon Willnauer <simonw@apache.org>
Date:   Fri Oct 31 14:10:26 2014 +0100

    remove nocommit - this is not an issue

commit fd1388fa03
Author: Martijn van Groningen <martijn.v.groningen@gmail.com>
Date:   Fri Oct 31 14:07:01 2014 +0100

    Remove redundant null check

commit 3d6dd51b09
Author: Martijn van Groningen <martijn.v.groningen@gmail.com>
Date:   Fri Oct 31 14:01:37 2014 +0100

    Removed the work around to prevent p/c error when invoking #iterator() twice, because the custom query filter wrapper now doesn't transform the result to a cache doc id set any more.

    I think transforming to a cacheable doc id set in CustomQueryWrappingFilter isn't needed at all, because we use the DocIdSet only once, so the transformation just slowed things down.

commit 821832a537
Author: Simon Willnauer <simonw@apache.org>
Date:   Fri Oct 31 13:54:33 2014 +0100

    one more nocommit

commit 77eb9ea4c4
Author: Martijn van Groningen <martijn.v.groningen@gmail.com>
Date:   Fri Oct 31 13:52:29 2014 +0100

    Remove cast

commit a400573c03
Author: Simon Willnauer <simonw@apache.org>
Date:   Fri Oct 31 13:49:24 2014 +0100

    fix stop filter

commit 51746087cf
Author: Simon Willnauer <simonw@apache.org>
Date:   Fri Oct 31 13:21:36 2014 +0100

    fix changed semantics of FBS.nextSetBit to check for NO_MORE_DOCS

commit 8d0a4e2511
Author: Robert Muir <rmuir@apache.org>
Date:   Fri Oct 31 08:13:44 2014 -0400

    do the bogus cast differently

commit 46a5cc5732
Author: Simon Willnauer <simonw@apache.org>
Date:   Fri Oct 31 13:00:16 2014 +0100

    I hate it but P/C now passes

commit 580c0c2f82
Merge: a9d3c00 1645434
Author: Robert Muir <rmuir@apache.org>
Date:   Fri Oct 31 06:54:31 2014 -0400

    fix nocommit/classcast

commit a9d3c004d6
Author: Adrien Grand <jpountz@gmail.com>
Date:   Fri Oct 31 08:49:31 2014 +0100

    Update TODO.

commit aa75af0b40
Author: Robert Muir <rmuir@apache.org>
Date:   Thu Oct 30 19:18:25 2014 -0400

    clear obsolete nocommits from lucene bump

commit d438534cf4
Author: Robert Muir <rmuir@apache.org>
Date:   Thu Oct 30 18:53:20 2014 -0400

    throw classcastexception when ES abuses regular filtercache for nested docs

commit 2c751f3a8f
Author: Robert Muir <rmuir@apache.org>
Date:   Thu Oct 30 18:31:34 2014 -0400

    bump lucene revision, fix tests

commit d6ef7f6304
Author: Simon Willnauer <simonw@apache.org>
Date:   Thu Oct 30 22:37:58 2014 +0100

    fix merge problems

commit de9d361f88
Merge: 41f6aab f6b37a3
Author: Simon Willnauer <simonw@apache.org>
Date:   Thu Oct 30 22:28:59 2014 +0100

    Merge branch 'master' into enhancement/lucene_5_0_upgrade

    Conflicts:
    	pom.xml
    	src/main/java/org/elasticsearch/Version.java
    	src/main/java/org/elasticsearch/gateway/local/state/meta/MetaDataStateFormat.java

commit 41f6aab388
Author: Simon Willnauer <simonw@apache.org>
Date:   Thu Oct 30 17:48:46 2014 +0100

    fix potential NPE

commit c4428b12e1
Author: Simon Willnauer <simonw@apache.org>
Date:   Thu Oct 30 17:38:46 2014 +0100

    don't advance iterator in a match(doc) method

commit 28ab948e99
Author: Simon Willnauer <simonw@apache.org>
Date:   Thu Oct 30 17:34:58 2014 +0100

    don't advance iterator in a match(doc) method

commit eb0f33f663
Author: Simon Willnauer <simonw@apache.org>
Date:   Thu Oct 30 16:55:54 2014 +0100

    fix GeoUtilsTest

commit 7f711fe3ea
Author: Simon Willnauer <simonw@apache.org>
Date:   Thu Oct 30 16:43:16 2014 +0100

    Use a dedicated default index option if field type is not indexed by default

commit 78e3f37ab7
Author: Robert Muir <rmuir@apache.org>
Date:   Thu Oct 30 10:56:14 2014 -0400

    disable this test with AwaitsFix to reduce noise

commit 9a590f563c
Author: Simon Willnauer <simonw@apache.org>
Date:   Thu Oct 30 09:38:49 2014 +0100

    fix lucene version

commit abe3ca1d8b
Author: Simon Willnauer <simonw@apache.org>
Date:   Thu Oct 30 09:35:05 2014 +0100

    fix AnalyzingCompletionLookupProvider to work with new codec API

commit 464293b245
Author: Robert Muir <rmuir@apache.org>
Date:   Thu Oct 30 00:26:00 2014 -0400

    don't try to write stuff to tests class directory

commit 031cc6c19f
Author: Robert Muir <rmuir@apache.org>
Date:   Thu Oct 30 00:12:36 2014 -0400

    AwaitsFix these known issues to reduce noise

commit 4600d51891
Author: Robert Muir <rmuir@apache.org>
Date:   Thu Oct 30 00:06:53 2014 -0400

    openbitset lives on

commit 8492bae056
Author: Robert Muir <rmuir@apache.org>
Date:   Wed Oct 29 23:42:54 2014 -0400

    fixes for filter tests

commit 31f24ce4ef
Author: Robert Muir <rmuir@apache.org>
Date:   Wed Oct 29 23:12:38 2014 -0400

    don't use fieldcache

commit 8480789942
Author: Robert Muir <rmuir@apache.org>
Date:   Wed Oct 29 23:04:29 2014 -0400

    ancient index no longer supported

commit 02e78dc7eb
Author: Simon Willnauer <simonw@apache.org>
Date:   Wed Oct 29 23:37:02 2014 +0100

    fix more tests

commit ff746c6df2
Author: Simon Willnauer <simonw@apache.org>
Date:   Wed Oct 29 23:08:19 2014 +0100

    fix all mapper

commit e4fb84b517
Author: Simon Willnauer <simonw@apache.org>
Date:   Wed Oct 29 22:55:54 2014 +0100

    fix distributor tests and cut over to FileStore API

commit 20c850e2cf
Author: Simon Willnauer <simonw@apache.org>
Date:   Wed Oct 29 22:42:18 2014 +0100

    use DOCS_ONLY if index=true and current options == null

commit 44169c1084
Author: Simon Willnauer <simonw@apache.org>
Date:   Wed Oct 29 22:33:36 2014 +0100

    Fix index=yes|no settings in mappers

commit a3c5f77987
Author: Simon Willnauer <simonw@apache.org>
Date:   Wed Oct 29 21:51:41 2014 +0100

    fix several field mappers conversion from setIndexed to indexOptions

commit df84d73690
Author: Simon Willnauer <simonw@apache.org>
Date:   Wed Oct 29 21:33:35 2014 +0100

    fix SourceFieldMapper to be not indexed

commit b2bf01d12a
Author: Simon Willnauer <simonw@apache.org>
Date:   Wed Oct 29 21:23:08 2014 +0100

    Cut over to .liv files in store and corruption tests

commit 619004df43
Author: Simon Willnauer <simonw@apache.org>
Date:   Wed Oct 29 17:05:52 2014 +0100

    fix more tests

commit b7ed653a8b
Author: Simon Willnauer <simonw@apache.org>
Date:   Wed Oct 29 16:19:08 2014 +0100

    [STORE] Add dedicated method to write temporary files

    Recovery writes temporary files which might not end up in the
    right distributor directories today. This commit adds a dedicated
    API that allows specifying the target file name in order to create the
    temporary file in the correct directory.

commit 7d574659f6
Author: Robert Muir <rmuir@apache.org>
Date:   Wed Oct 29 10:28:49 2014 -0400

    add some leniency to temporary bogus method

commit f97022ea7c
Author: Robert Muir <rmuir@apache.org>
Date:   Wed Oct 29 10:24:17 2014 -0400

    fix MultiCollector bug

commit b760533128
Author: Simon Willnauer <simonw@apache.org>
Date:   Wed Oct 29 14:56:08 2014 +0100

    CheckIndex is now closeable we need to close it

commit 9dae9fb6d6
Author: Simon Willnauer <simonw@apache.org>
Date:   Wed Oct 29 14:45:11 2014 +0100

    s/Lucene51/Lucene50

commit 7aea9b8685
Author: Simon Willnauer <simonw@apache.org>
Date:   Wed Oct 29 14:42:30 2014 +0100

    fix BloomFilterPostingsFormat

commit 16fea6fe84
Author: Simon Willnauer <simonw@apache.org>
Date:   Wed Oct 29 14:41:16 2014 +0100

    fix some codec format issues

commit 3d77aa97dd
Author: Simon Willnauer <simonw@apache.org>
Date:   Wed Oct 29 14:30:43 2014 +0100

    fix CodecTests

commit 6ef823b1fd
Author: Simon Willnauer <simonw@apache.org>
Date:   Wed Oct 29 14:26:47 2014 +0100

    make it compile

commit 9991eee1fe
Author: Robert Muir <rmuir@apache.org>
Date:   Wed Oct 29 09:12:43 2014 -0400

    add an ugly hack for TopHitsAggregator for now

commit 03e768a01f
Author: Simon Willnauer <simonw@apache.org>
Date:   Wed Oct 29 14:01:02 2014 +0100

    cut over ES090PostingsFormat

commit 463d281faa
Merge: 0f8740a 8eac79c
Author: Robert Muir <rmuir@apache.org>
Date:   Wed Oct 29 08:30:36 2014 -0400

    Merge branch 'master' into enhancement/lucene_5_0_upgrade

commit 0f8740a782
Author: Robert Muir <rmuir@apache.org>
Date:   Wed Oct 29 01:00:15 2014 -0400

    fix/hack remaining filter and analysis issues

commit df53448856
Author: Robert Muir <rmuir@apache.org>
Date:   Tue Oct 28 23:11:47 2014 -0400

    fix ngrams / openbitset usage

commit 11f5dc3b98
Author: Robert Muir <rmuir@apache.org>
Date:   Tue Oct 28 22:42:44 2014 -0400

    hack over sort comparators

commit 4ebdc75435
Author: Robert Muir <rmuir@apache.org>
Date:   Tue Oct 28 21:27:07 2014 -0400

    compiler errors < 100

commit 2d60c9e29d
Author: Robert Muir <rmuir@apache.org>
Date:   Tue Oct 28 03:13:08 2014 -0400

    clear some nocommits around ram usage

commit aaf47fe6c0
Author: Robert Muir <rmuir@apache.org>
Date:   Mon Oct 27 12:27:34 2014 -0400

    migrate fieldinfo handling

commit ef6ed6d15d
Author: Robert Muir <rmuir@apache.org>
Date:   Mon Oct 27 12:07:13 2014 -0400

    more simple fixes

commit f475e1048a
Author: Robert Muir <rmuir@apache.org>
Date:   Mon Oct 27 11:58:21 2014 -0400

    more fielddata ram accounting fixes

commit 16b4239eaa
Author: Simon Willnauer <simonw@apache.org>
Date:   Mon Oct 27 16:47:32 2014 +0100

    add missing file

commit 5b542fa2a6
Author: Simon Willnauer <simonw@apache.org>
Date:   Mon Oct 27 16:43:29 2014 +0100

    cut over completion posting formats - still some nocommits

commit ecdea49404
Author: Robert Muir <rmuir@apache.org>
Date:   Mon Oct 27 11:21:09 2014 -0400

    fielddata accountable fixes

commit d43da26571
Author: Simon Willnauer <simonw@apache.org>
Date:   Mon Oct 27 16:19:53 2014 +0100

    cut over BloomFilterPostings to new API

commit 29b192ba62
Author: Robert Muir <rmuir@apache.org>
Date:   Mon Oct 27 10:22:51 2014 -0400

    fix more analyzers

commit 74b4a0c528
Author: Robert Muir <rmuir@apache.org>
Date:   Mon Oct 27 09:54:25 2014 -0400

    fix tests

commit 554084ccb4
Author: Simon Willnauer <simonw@apache.org>
Date:   Mon Oct 27 14:51:48 2014 +0100

    maintain suppressed exceptions on CorruptIndexException

commit cf882d9112
Author: Simon Willnauer <simonw@apache.org>
Date:   Mon Oct 27 14:47:17 2014 +0100

    commitOnClose=false

commit ebb2a9189a
Author: Simon Willnauer <simonw@apache.org>
Date:   Mon Oct 27 14:46:06 2014 +0100

    cut over indexwriter closing in InternalEngine

commit cd21b3d470
Author: Simon Willnauer <simonw@apache.org>
Date:   Mon Oct 27 14:38:10 2014 +0100

    fix constant

commit f93f900c4a
Author: Robert Muir <rmuir@apache.org>
Date:   Mon Oct 27 09:50:49 2014 -0400

    fix test

commit a9a752940b
Author: Martijn van Groningen <martijn.v.groningen@gmail.com>
Date:   Mon Oct 27 09:26:18 2014 +0100

    Be explicit about the index options

commit d9ee815bab
Author: Simon Willnauer <simonw@apache.org>
Date:   Sun Oct 26 20:03:44 2014 +0100

    cut over store and directory

commit b3f5c8e390
Author: Robert Muir <rmuir@apache.org>
Date:   Sun Oct 26 13:08:39 2014 -0400

    more test fixes

commit 8842f2684e
Author: Robert Muir <rmuir@apache.org>
Date:   Sun Oct 26 12:14:52 2014 -0400

    tests manual labor

commit c43de5aec3
Author: Robert Muir <rmuir@apache.org>
Date:   Sun Oct 26 11:04:13 2014 -0400

    BytesRef -> BytesRefBuilder

commit 020c0d087a
Author: Martijn van Groningen <martijn.v.groningen@gmail.com>
Date:   Sun Oct 26 15:53:37 2014 +0100

    Moved over to BitSetFilter

commit 48dd1b909e
Author: Martijn van Groningen <martijn.v.groningen@gmail.com>
Date:   Sun Oct 26 15:53:11 2014 +0100

    Left over Collector api change in ScanContext

commit 6ec248ef63
Author: Martijn van Groningen <martijn.v.groningen@gmail.com>
Date:   Sun Oct 26 15:47:40 2014 +0100

    Moved indexed() over to indexOptions != null or indexOptions == null

commit 9937aebfd8
Author: Martijn van Groningen <martijn.v.groningen@gmail.com>
Date:   Sun Oct 26 13:26:31 2014 +0100

    Fixed many compile errors. Mainly around the breaking Collector api change in 5.0.

commit fec32c4abc
Author: Robert Muir <rmuir@apache.org>
Date:   Sat Oct 25 11:22:17 2014 -0400

    more easy fixes

commit dab22531d8
Author: Robert Muir <rmuir@apache.org>
Date:   Sat Oct 25 09:33:41 2014 -0400

    more progress

commit 414767e9a9
Author: Robert Muir <rmuir@apache.org>
Date:   Sat Oct 25 06:33:17 2014 -0400

    more progress

commit ad9d969fdd
Author: Robert Muir <rmuir@apache.org>
Date:   Fri Oct 24 14:28:01 2014 -0400

    current state of fun

commit 464475eecb
Author: Robert Muir <rmuir@apache.org>
Date:   Fri Oct 24 11:42:41 2014 -0400

    bump to 5.0 snapshot
2014-11-05 15:48:51 -05:00
Aarni Koskela
6011a18381 Docs: Add mention of hyphenation_patterns_path
Refs ElasticSearch's HyphenationCompoundWordTokenFilterFactory.java.

Closes #8305
2014-11-01 15:47:53 +01:00
Jun Ohtani
533c1084ec Docs: add the predefined language-specific stopword lists to stop-tokenfilter.asciidoc 2014-10-16 13:20:38 +09:00
sp836490
517caa0c6f Update cjk-bigram-tokenfilter.asciidoc 2014-10-15 11:54:19 +09:00
HenrikOssipoff
1445dd2308 Remove comma in JSON
Closes #7827
2014-09-28 11:08:09 +02:00
Clinton Gormley
cb00d4a542 Docs: Removed all the added/deprecated tags from 1.x 2014-09-26 21:04:42 +02:00
Clinton Gormley
091578d117 Update stemmer-tokenfilter.asciidoc
Change the `minimal_english` link to a publicly accessible URL
2014-09-25 20:29:12 +02:00
Sergii Golubev
059d9f757a Docs: bad text wrapping
On the page http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-synonym-tokenfilter.html

even on a huge monitor the text is wrapped as follows:
```
mapping:
ipod, i-pod, i pod => ipod, i-pod, i pod
mapping:
ipod, i-pod, i pod => ipod
```

So one might think that "mapping:" is not part of the comment but part of the syntax. But the lines are less than 80 chars, so perhaps the problem is in the page layout, and there may be other pages in the reference where the text is also wrapped in an undesirable way.

Closes #7739
2014-09-25 19:43:23 +02:00
Nik Everett
7bcd09a134 [docs] fix typo in language analyzer docs 2014-09-04 09:33:00 +02:00
Robert Muir
395744b0d2 [Analysis] Add missing docs for latvian analysis 2014-09-02 19:22:59 -04:00
Robert Muir
5c7cefa292 Analysis: Add keep_types for filtering by token type 2014-08-15 09:28:12 -04:00
Nik Everett
34426eb8c2 Docs: Fix syntax on lang-analyzer
Some of the language analyzer documentation contained invalid json.

Closes #7098
2014-07-30 20:17:27 +02:00
Simon Willnauer
5bfea56457 [DOCS] move all coming tags to added in master 2014-07-23 16:37:19 +02:00
Clinton Gormley
6e70edb0a4 Analysis: Improve Hunspell error messages
The Hunspell service would throw a confusing error message if more than
one affix file was present.  This commit distinguishes between the two
error cases: where there are no affix files and when there are too many
affix files.

Also implements lazy dictionary loading, which was used in the tests
but not implemented.

Closes #6850
2014-07-14 12:13:32 +02:00
Clinton Gormley
e4baa56f4b Docs: Language analyzers
Clarified the use of stem_exclusion and the keyword_marker
token filter

Closes #6613
2014-07-07 10:06:18 +02:00
Clinton Gormley
54790eea10 Update lang-analyzer.asciidoc
Clarified the use of the `stem_exclusion` token filter.

Closes #6613
2014-07-04 17:50:43 +02:00
Jun Ohtani
0c6a859357 Docs: fixed ICU plugin documentation
add ICU Normalization CharFilter to docs

Closes #6711
2014-07-03 15:21:51 +02:00
Mikhail Korobov
955473f475 Docs: unescape regexes in Pattern Tokenizer docs
Currently regexes in Pattern Tokenizer docs are escaped (it seems according to Java rules). I think it is better not to escape them because JSON escaping should be automatic in client libraries, and string escaping depends on a client language used. The default pattern is `\W+`, not `\\W+`.

Closes #6615
2014-07-03 13:34:13 +02:00
Robert Muir
2935b751e9 Fix doc formatting. Norwegian stemmers and Scandinavian normalizers
were missing commas between entries.
2014-07-03 07:08:33 -04:00
Robert Muir
b9a09c2b06 Analysis: Add additional Analyzers, Tokenizers, and TokenFilters from Lucene
Add `irish` analyzer
Add `sorani` analyzer (Kurdish)

Add `classic` tokenizer: specific to english text and tries to recognize hostnames, companies, acronyms, etc.
Add `thai` tokenizer: segments thai text into words.

Add `classic` tokenfilter: cleans up acronyms and possessives from classic tokenizer
Add `apostrophe` tokenfilter: removes text after apostrophe and the apostrophe itself
Add `german_normalization` tokenfilter: umlaut/sharp S normalization
Add `hindi_normalization` tokenfilter: accounts for hindi spelling differences
Add `indic_normalization` tokenfilter: accounts for different unicode representations in Indian languages
Add `sorani_normalization` tokenfilter: normalizes kurdish text
Add `scandinavian_normalization` tokenfilter: normalizes Norwegian, Danish, Swedish text
Add `scandinavian_folding` tokenfilter: much more aggressive form of `scandinavian_normalization`
Add additional languages to stemmer tokenfilter: `galician`, `minimal_galician`, `irish`, `sorani`, `light_nynorsk`, `minimal_nynorsk`

Add support for accessing the default Thai stopword set "_thai_"

Fix some bugs and broken links in documentation.

Closes #5935
2014-07-03 05:47:49 -04:00
Clinton Gormley
cf059378d1 Docs: Updated stop token filter docs 2014-06-21 18:42:38 +02:00
Clinton Gormley
69350dc426 Update stemmer-override-tokenfilter.asciidoc 2014-06-18 11:34:20 +02:00