min_gram to max_gram characters long (clamped to the word length). Words shorter than min_gram are skipped.
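To make the clamping and skipping rules concrete, here is a minimal Python sketch. The function name `edge_ngrams` is illustrative, not part of the tokenizer's API; this only models gram emission for a single word under the stated assumptions.

```python
def edge_ngrams(word, min_gram, max_gram):
    """Emit edge n-grams from min_gram to max_gram characters,
    clamped to the word length; short words yield nothing."""
    if len(word) < min_gram:
        return []  # word shorter than min_gram: skipped entirely
    upper = min(max_gram, len(word))  # clamp to the word length
    return [word[:n] for n in range(min_gram, upper + 1)]

print(edge_ngrams("hello", 2, 4))  # ['he', 'hel', 'hell']
print(edge_ngrams("hi", 3, 5))     # []
```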
Token Chars
By default, the edge ngram tokenizer treats letters and digits as token content and everything else (spaces, punctuation, symbols) as word delimiters. You can customize this with token_chars, which accepts a comma-separated list of character classes: letter, digit, whitespace, punctuation, symbol. Character classification uses Unicode general categories, matching Elasticsearch's behavior.
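As a rough sketch of that classification (not the tokenizer's actual implementation), each of the five classes can be mapped to Unicode general-category prefixes via Python's standard `unicodedata` module. The `char_class` helper below is hypothetical; whitespace is checked with `str.isspace()` as an approximation, since characters like tab and newline fall in category Cc rather than Zs.

```python
import unicodedata

# Illustrative mapping from token_chars classes to Unicode
# general-category prefixes: L* = letters, Nd = decimal digits,
# P* = punctuation, S* = symbols.
CLASS_PREFIXES = {
    "letter": ("L",),
    "digit": ("Nd",),
    "punctuation": ("P",),
    "symbol": ("S",),
}

def char_class(ch):
    """Return the token_chars class for a character, or None
    if it belongs to none of the five classes."""
    if ch.isspace():
        return "whitespace"
    cat = unicodedata.category(ch)
    for name, prefixes in CLASS_PREFIXES.items():
        if cat.startswith(prefixes):
            return name
    return None

print(char_class("a"))  # letter
print(char_class("7"))  # digit
print(char_class("-"))  # punctuation (hyphen is category Pd)
```

Under this view, configuring token_chars as letter,digit,punctuation would treat hyphens as token content rather than delimiters.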
For example, including punctuation keeps hyphens as part of words:
Expected Response