Elasticsearch Server (Third Edition)

Basic queries

Elasticsearch has extensive search and data analysis capabilities that are exposed in the form of queries, filters, aggregations, and so on. In this section, we will concentrate on the basic queries provided by Elasticsearch. By basic queries we mean the ones that don't combine other queries but run on their own.

The term query

The term query is one of the simplest queries in Elasticsearch. It matches documents that have a given term in a given field: the exact term, not an analyzed one. The simplest term query is as follows:

{
  "query" : {
    "term" : {
      "title" : "crime"
    }
  }
}

It will match the documents that have the term crime in the title field. Remember that the text passed to the term query is not analyzed, so you need to provide the exact term as it is stored in the index. Note that in our input data the title field has the value Crime and Punishment (with upper case letters), but we search for crime, because the Crime term becomes crime after analysis during indexing.
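To make the role of analysis more concrete, the following Python sketch mimics what happens to the title at indexing time and why the lowercased term is the one that matches. The toy_analyze function is a hypothetical stand-in for a real analyzer (it only lowercases and splits on whitespace), so treat this as an illustration rather than Elasticsearch's actual behavior.

```python
def toy_analyze(text):
    # Hypothetical stand-in for an analyzer: lowercase, then split on whitespace.
    return text.lower().split()

def term_matches(indexed_terms, term):
    # A term query compares the given term as-is, without analyzing it.
    return term in indexed_terms

indexed = toy_analyze("Crime and Punishment")
print(indexed)                         # ['crime', 'and', 'punishment']
print(term_matches(indexed, "crime"))  # True
print(term_matches(indexed, "Crime"))  # False: the indexed term is lowercased
```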

In addition to the term we want to find, we can also include the boost attribute in our term query, which will affect the importance of the given term. We will talk more about boosts in the Introduction to Apache Lucene scoring section of Chapter 6, Make Your Search Better. For now, we just need to remember that it changes the importance of the given part of the query.

For example, to change our previous query and give our term query a boost of 10.0, send the following query:

{
  "query" : {
    "term" : {
      "title" : {
        "value" : "crime",
        "boost" : 10.0
      }
    }
  }
}

As you can see, the query changed a bit. Instead of a simple term value, we nested a new JSON object which contains the value property and the boost property. The value of the value property should contain the term we are interested in and the boost property is the boost value we want to use.
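If the query bodies are built in application code, the two shapes of the term query can be produced by one small function. The following Python sketch uses a hypothetical helper named term_query (it is not part of any Elasticsearch client) that emits the short form when no boost is given and the nested object form otherwise.

```python
import json

def term_query(field, value, boost=None):
    # Short form when no boost is given, nested object form otherwise.
    if boost is None:
        body = value
    else:
        body = {"value": value, "boost": boost}
    return {"query": {"term": {field: body}}}

print(json.dumps(term_query("title", "crime")))
print(json.dumps(term_query("title", "crime", boost=10.0), indent=2))
```

The resulting dictionaries serialize to the same JSON as the two term queries shown in this section.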

The terms query

The terms query is an extension to the term query. It allows us to match documents that have certain terms in their contents instead of a single term. The term query allowed us to match a single, not analyzed term and the terms query allows us to match multiple of those. For example, let's say that we want to get all the documents that have the terms novel or book in the tags field. To achieve this, we will run the following query:

{
  "query" : {
    "terms" : {
      "tags" : [ "novel", "book" ]
    }
  }
}

The preceding query returns all the documents that have one or both of the searched terms in the tags field. This is a key point to remember – the terms query will find documents having any of the provided terms.

The match all query

The match all query is one of the simplest queries available in Elasticsearch. It allows us to match all of the documents in the index. If we want to get all the documents from our index, we just run the following query:

{
  "query" : {
    "match_all" : {}
  }
}

We can also include boost in the query, which will be given to all the documents matched by it. For example, if we want to add a boost of 2.0 to all the documents in our match all query, we will send the following query to Elasticsearch:

{
  "query" : {
    "match_all" : {
      "boost" : 2.0
    }
  }
}

The type query

The type query is a very simple query that allows us to find all the documents of a certain type. For example, if we would like to search for all the documents with the book type in our library index, we will run the following query:

{
  "query" : {
    "type" : {
      "value" : "book"
    }
  }
}

The exists query

The exists query allows us to find all the documents that have a value in the defined field. For example, to find the documents that have a value in the tags field, we will run the following query:

{
  "query" : {
    "exists" : {
      "field" : "tags"
    }
  }
}

The missing query

Opposite to the exists query, the missing query returns the documents that have a null value or no value at all in a given field. For example, to find all the documents that don't have a value in the tags field, we will run the following query:

{
  "query" : {
    "missing" : {
      "field" : "tags"
    }
  }
}

The common terms query

The common terms query is a modern Elasticsearch solution for improving query relevance and precision with common words when we are not using stop words (http://en.wikipedia.org/wiki/Stop_words). For example, a crime and punishment query results in three term queries and each of them has a cost in terms of performance. However, the and term is a very common one and its impact on the document score will be very low. The solution is the common terms query, which divides the query into two groups. The first group is the one with important terms, which are the ones that have lower frequency. The second group is the one with less important terms, which are the ones with high frequency. The query for the first group is executed first and Elasticsearch calculates the score for all of the terms from that group. This way the low frequency terms, which are usually the ones that have more importance, are always taken into consideration. Then Elasticsearch executes the second query for the second group of terms, but calculates the score only for the documents matched by the first query. This way the score is only calculated for the relevant documents and thus higher performance can be achieved.

An example of the common terms query is as follows:

{
 "query" : {
  "common" : { 
   "title" : {
    "query" : "crime and punishment",
    "cutoff_frequency" : 0.001
   }
  }
 }
}

The query can take the following parameters:

query: The actual query contents.
cutoff_frequency: The percentage (0.001 means 0.1%) or an absolute value (when property is set to a value equal to or larger than 1). High and low frequency groups are constructed using this value. Setting this parameter to 0.001 means that the low frequency terms group will be constructed for terms having a frequency of 0.1% and lower.
low_freq_operator: This can be set to or or and, but defaults to or. It specifies the Boolean operator used for constructing queries in the low frequency term group. If we want all the terms to be present in a document for it to be considered a match, we should set this parameter to and.
high_freq_operator: This can be set to or or and, but defaults to or. It specifies the Boolean operator used for constructing queries in the high frequency term group. If we want all the terms to be present in a document for it to be considered a match, we should set this parameter to and.
minimum_should_match: Instead of using low_freq_operator and high_freq_operator, we can use minimum_should_match. Just like with the other queries, it allows us to specify the minimum number of terms that should be found in a document for it to be considered a match. We can also specify high_freq and low_freq inside the minimum_should_match object, which allows us to define the different number of terms that need to be matched for the high and low frequency terms.
boost: The boost given to the score of the documents.
analyzer: The name of the analyzer that will be used to analyze the query text, which defaults to the default analyzer.
disable_coord: Defaults to false and allows us to enable or disable the score factor computation that is based on the fraction of all the query terms that a document contains. Set it to true for less precise scoring, but slightly faster queries.
Note
Unlike the term and terms queries, the common terms query is analyzed by Elasticsearch.
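The way cutoff_frequency separates the two groups can be sketched in a few lines of Python. The document frequencies and the index size below are made up for illustration, and the split_by_frequency function is a hypothetical simplification of what Elasticsearch does internally.

```python
def split_by_frequency(terms, doc_freq, total_docs, cutoff_frequency):
    # Values below 1 are treated as a fraction of all documents,
    # values of 1 or more as an absolute document count.
    if cutoff_frequency < 1:
        limit = cutoff_frequency * total_docs
    else:
        limit = cutoff_frequency
    low = [t for t in terms if doc_freq.get(t, 0) <= limit]
    high = [t for t in terms if doc_freq.get(t, 0) > limit]
    return low, high

# Hypothetical document frequencies for an index of 100,000 documents.
doc_freq = {"crime": 40, "and": 90000, "punishment": 25}
low, high = split_by_frequency(["crime", "and", "punishment"], doc_freq, 100000, 0.001)
print(low)   # ['crime', 'punishment']
print(high)  # ['and']
```

With these numbers, only the and term ends up in the high frequency group, so it would be scored only for the documents already matched by crime or punishment.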

The match query

The match query takes the value given in the query parameter, analyzes it, and constructs the appropriate query out of it. When using a match query, Elasticsearch will choose the proper analyzer for the field we choose, so you can be sure that the terms passed to the match query will be processed by the same analyzer that was used during indexing. Remember that the match query (and the multi_match query) doesn't support Lucene query syntax; however, it fits perfectly as a query handler for your search box. The simplest match (and the default) query will look like the following:

{
  "query" : {
    "match" : {
      "title" : "crime and punishment"
    }
  }
}

The preceding query will match all the documents that have the terms crime, and, or punishment in the title field. However, the previous query is only the simplest one; there are multiple types of match query which we will discuss now.

The Boolean match query

The Boolean match query is a query which analyzes the provided text and makes a Boolean query out of it. This is also the default type for the match query. There are a few parameters which allow us to control the behavior of the Boolean match queries:

operator: This parameter can take the value of or or and, and controls which Boolean operator is used to connect the created Boolean clauses. The default value is or. If we want all the terms in our query to be matched, we should use the and Boolean operator.
analyzer: This specifies the name of the analyzer that will be used to analyze the query text and defaults to the default analyzer.
fuzziness: Providing the value of this parameter allows us to construct fuzzy queries. The accepted values depend on the field type. For numeric fields, it should be set to a numeric value; for date based fields, it can be set to a millisecond or time value, such as 2h; and for text fields, it can be set to 0, 1, or 2 (the edit distance in the Levenshtein algorithm, see https://en.wikipedia.org/wiki/Levenshtein_distance) or to AUTO (which lets Elasticsearch control how fuzzy queries are constructed and is the preferred value). Finally, for text fields, it can also be set to values from 0.0 to 1.0, which results in the edit distance being calculated as term length minus 1.0 multiplied by the provided fuzziness value. In general, the higher the fuzziness, the more difference between terms will be allowed.
prefix_length: This allows control over the behavior of the fuzzy query. For more information on the value of this parameter, refer to The fuzzy query section in this chapter.
max_expansions: This allows control over the behavior of the fuzzy query. For more information on the value of this parameter, refer to The fuzzy query section in this chapter.
zero_terms_query: This allows us to specify the behavior of the query when all the terms are removed by the analyzer (for example, because of stop words). It can be set to none or all, with none as the default. When set to none, no documents will be returned when the analyzer removes all the query terms. If set to all, all the documents will be returned.
cutoff_frequency: It allows dividing the query into two groups: one with high frequency terms and one with low frequency terms. Refer to the description of the common terms query to see how this parameter can be used.
lenient: When set to true (by default it is false), it allows us to ignore the exceptions caused by data incompatibility, such as trying to query numeric fields using string value.

The parameters should be wrapped in the name of the field we are running the query against. So if we want to run a sample Boolean match query against the title field, we send a query as follows:

{
  "query" : {
    "match" : {
      "title" : {
        "query" : "crime and punishment",
        "operator" : "and"
      }
    }
  }
}
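The difference between the or and and operators can be illustrated with a small Python sketch. The toy_analyze function is again a hypothetical stand-in for a real analyzer, and the titles other than Crime and Punishment are made-up sample data.

```python
def toy_analyze(text):
    # Hypothetical stand-in for an analyzer: lowercase, then split on whitespace.
    return text.lower().split()

def boolean_match(field_text, query_text, operator="or"):
    doc_terms = set(toy_analyze(field_text))
    query_terms = toy_analyze(query_text)
    hits = [term in doc_terms for term in query_terms]
    # "and" requires every query term, "or" requires at least one.
    return all(hits) if operator == "and" else any(hits)

query = "crime and punishment"
print(boolean_match("Crime and Punishment", query, "and"))  # True
print(boolean_match("War and Peace", query, "or"))          # True (shares "and")
print(boolean_match("War and Peace", query, "and"))         # False
```

The second call shows why the default or operator can return loosely related documents: a single shared common word is enough for a match.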

The phrase match query

A phrase match query is similar to the Boolean match query, but, instead of constructing Boolean clauses from the analyzed text, it constructs a phrase query. You may wonder what a phrase is when it comes to Lucene and Elasticsearch: it is two or more terms positioned one after another in order. The following parameters are available:

slop: An integer value that defines how many unknown words can be put between the terms in the text query for a match to be considered a phrase. The default value of this parameter is 0, which means that no additional words are allowed.
analyzer: This specifies the name of the analyzer that will be used to analyze the query text and defaults to the default analyzer.

A sample phrase match query against the title field looks like the following code:

{
  "query" : {
    "match_phrase" : {
      "title" : {
        "query" : "crime punishment",
        "slop" : 1
      }
    }
  }
}

Note that we removed the and term from our query, but because the slop is set to 1, it will still match our document because we allowed one term to be present between our terms.
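The effect of slop can be sketched in Python as follows. This is a deliberately simplified, in-order check written for illustration: the real Lucene phrase matching also allows terms to be reordered at a higher slop cost, which the hypothetical phrase_matches function below does not attempt.

```python
def phrase_matches(doc_tokens, phrase_tokens, slop=0):
    # Simplified, in-order check: each phrase term must follow the previous one,
    # and the total number of skipped positions must not exceed slop.
    if not phrase_tokens:
        return False

    def search(start, index, remaining):
        if index == len(phrase_tokens):
            return True
        for pos in range(start, len(doc_tokens)):
            skipped = pos - start
            if skipped > remaining:
                break
            if doc_tokens[pos] == phrase_tokens[index]:
                if search(pos + 1, index + 1, remaining - skipped):
                    return True
        return False

    # Try every occurrence of the first phrase term as a starting point.
    for first in range(len(doc_tokens)):
        if doc_tokens[first] == phrase_tokens[0]:
            if search(first + 1, 1, slop):
                return True
    return False

title = ["crime", "and", "punishment"]
print(phrase_matches(title, ["crime", "punishment"], slop=0))  # False
print(phrase_matches(title, ["crime", "punishment"], slop=1))  # True
```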

The match phrase prefix query

The last type of the match query is the match phrase prefix query. This query is almost the same as the phrase match query, but in addition, it allows prefix matches on the last term in the query text. Also, in addition to the parameters exposed by the match phrase query, it exposes an additional one – the max_expansions parameter, which controls how many prefixes the last term will be rewritten to. Our example query changed to the match_phrase_prefix query will look as follows:

{
  "query" : {
    "match_phrase_prefix" : {
      "title" : {
        "query" : "crime punishm",
        "slop" : 1,
        "max_expansions" : 20
      }
    }
  }
}

Note that we didn't provide the full crime and punishment phrase, but only crime punishm and still the query would match our document. This is because we used the match_phrase_prefix query combined with slop set to 1.

The multi match query

The multi match query is the same as the match query, but instead of running against a single field, it can be run against multiple fields with the use of the fields parameter. Of course, all the parameters you use with the match query can be used with the multi match query. So if we would like to modify our match query to be run against the title and otitle fields, we will run the following query:

{
  "query" : {
    "multi_match" : {
      "query" : "crime punishment",
      "fields" : [ "title^10", "otitle" ]
    }
  }
}

As shown in the preceding example, the nice thing about the multi match query is that the fields defined in it support boosting, so we can increase or decrease the importance of matches on certain fields.

However, this is not the only difference when it comes to comparison with the match query. We can also control how the query is run internally by using the type property and setting it to one of the following values:

best_fields: This is the default behavior. It finds documents having matches in any of the defined fields, but sets the document score to the score of the best matching field. It is the most useful type when searching for multiple words and wanting to boost documents that have those words in the same field.
most_fields: This value finds documents that match any field and sets the score of the document to the combined score from all the matched fields.
cross_fields: This value treats the query as if all the terms were in one, big field, thus returning documents matching any field.
phrase: This value uses the match_phrase query on each field and sets the score of the document to the score combined from all the fields.
phrase_prefix: This value uses the match_phrase_prefix query on each field and sets the score of the document to the score combined from all the fields.

In addition to the parameters mentioned in the match query and type, the multi match query exposes some additional ones allowing more control over its behavior:

tie_breaker: This allows us to specify the balance between the minimum and the maximum scoring query items and the value can be from 0.0 to 1.0. When used, the score of the document is equal to the score of the best scoring element plus the tie_breaker multiplied by the scores of all the other matching fields in the document. So, when set to 0.0, Elasticsearch will only use the score of the highest scoring matching element. You can read more about it in The dis_max query section in this chapter.
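The tie_breaker arithmetic described above can be written down as a short Python sketch. The per-field scores are hypothetical values chosen for illustration and combined_score is not an Elasticsearch function.

```python
def combined_score(field_scores, tie_breaker=0.0):
    # Best field score plus tie_breaker times the sum of the remaining scores.
    if not field_scores:
        return 0.0
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie_breaker * others

scores = [2.0, 1.0, 0.5]  # hypothetical per-field scores
print(combined_score(scores, 0.0))  # 2.0
print(combined_score(scores, 0.5))  # 2.75
print(combined_score(scores, 1.0))  # 3.5
```

A tie_breaker of 0.0 keeps only the best field, while 1.0 simply sums the scores of all the matching fields.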

The query string query

In comparison to the other queries available, the query string query supports full Apache Lucene query syntax, which we discussed earlier in the Lucene query syntax section of Chapter 1, Getting Started with Elasticsearch Cluster. It uses a query parser to construct an actual query using the provided text. An example query string query will look like the following code:

{
  "query" : {
    "query_string" : {
      "query" : "title:crime^10 +title:punishment -otitle:cat +author:(+Fyodor +dostoevsky)",
      "default_field" : "title"
    }
  }
}

Because we are familiar with the basics of the Lucene query syntax, we can discuss how the preceding query works. As you can see, we wanted to get the documents that may have the term crime in the title field and such documents should be boosted with the value of 10. Next, we wanted only the documents that have the term punishment in the title field and we didn't want documents with the term cat in the otitle field. Finally, we told Lucene that we only wanted the documents that had the fyodor and dostoevsky terms in the author field.

Similar to most of the queries in Elasticsearch, the query string query provides quite a few parameters that allow us to control the query behavior and the list of parameters for this query is rather extensive:

query: This specifies the query text.
default_field: This specifies the default field the query will be executed against. It defaults to the index.query.default_field property, which is by default set to _all.
default_operator: This specifies the default logical operator (or or and) used when no operator is specified. The default value of this parameter is or.
analyzer: This specifies the name of the analyzer used to analyze the query provided in the query parameter.
allow_leading_wildcard: This specifies if a wildcard character is allowed as the first character of a term. It defaults to true.
lowercase_expand_terms: This specifies if the terms that are a result of query rewrite should be lowercased. It defaults to true, which means that the rewritten terms will be lowercased.
enable_position_increments: This specifies if position increments should be turned on in the result query. It defaults to true.
fuzzy_max_expansions: This specifies the maximum number of terms into which fuzzy query will be expanded, if fuzzy query is used. It defaults to 50.
fuzzy_prefix_length: This specifies the prefix length for the generated fuzzy queries and defaults to 0. To learn more about it, look at the fuzzy query description.
phrase_slop: This specifies the phrase slop and defaults to 0. To learn more about it, look at the phrase match query description.
boost: This specifies the boost value which will be used and defaults to 1.0.
analyze_wildcard: This specifies if the terms generated by the wildcard query should be analyzed. It defaults to false, which means that those terms won't be analyzed.
auto_generate_phrase_queries: This specifies if the phrase queries will be automatically generated from the query. It defaults to false, which means that the phrase queries won't be automatically generated.
minimum_should_match: This controls how many of the generated Boolean should clauses should be matched against a document for the document to be considered a hit. The value can be provided as a percentage; for example, 50%, which would mean that at least 50 percent of the given terms should match. It can also be provided as an integer value, such as 2, which means that at least 2 terms must match.
fuzziness: This controls the behavior of the generated fuzzy query. Refer to the match query description for more information.
max_determined_states: This defaults to 10000 and sets the number of states that the automaton can have for handling regular expression queries. It is used to disallow very expensive queries using regular expressions.
locale: This sets the locale that should be used for the conversion of string values. By default, it is set to ROOT.
time_zone: This sets the time zone that should be used by range queries that are run on date based fields.
lenient: This can take the value of true or false. If set to true, format-based failures will be ignored. By default, it is set to false.
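To illustrate how a minimum_should_match value translates into a number of required clauses, here is a small Python sketch. It only covers the two forms mentioned above (a positive integer and a positive percentage), and rounding the percentage down is an assumption of this sketch that should be checked against the Elasticsearch documentation for your version.

```python
def required_matches(minimum_should_match, clause_count):
    # Percentages are converted to a clause count (rounded down in this sketch);
    # plain integers are used as given.
    text = str(minimum_should_match)
    if text.endswith("%"):
        return int(text[:-1]) * clause_count // 100
    return int(text)

print(required_matches("50%", 4))  # 2
print(required_matches("50%", 3))  # 1
print(required_matches(2, 5))      # 2
```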

Note that Elasticsearch can rewrite the query string query and, because of that, Elasticsearch allows us to pass additional parameters that control the rewrite method. However, for more details about this process, go to the Understanding the querying process section in this chapter.

Running the query string query against multiple fields

It is possible to run the query string query against multiple fields. In order to do that, one needs to provide the fields parameter in the query body, which should hold the array of the field names. There are two methods of running the query string query against multiple fields: the default method uses the Boolean query to make queries and the other method can use the dis_max query.

In order to use the dis_max query, one should add the use_dis_max property in the query body and set it to true. An example query will look like the following code:

{
 "query" : {
  "query_string" : {
   "query" : "crime punishment",
   "fields" : [ "title", "otitle" ],
   "use_dis_max" : true
  }
 }
}

The simple query string query

The simple query string query uses one of the newest query parsers in Lucene - the SimpleQueryParser (https://lucene.apache.org/core/5_4_0/queryparser/org/apache/lucene/queryparser/simple/SimpleQueryParser.html). Similar to the query string query, it accepts Lucene query syntax as the query; however, unlike it, it never throws an exception when a parsing error happens. Instead of throwing an exception, it discards the invalid parts of the query and runs the rest.

An example simple query string query will look like the following code:

{
 "query" : {
  "simple_query_string" : {
   "query" : "crime punishment",
   "default_operator" : "or"
  }
 }
}

The query supports parameters such as query, fields, default_operator, analyzer, lowercase_expanded_terms, locale, lenient, and minimum_should_match. As the fields parameter suggests, it can also be run against multiple fields.

The identifiers query

This is a simple query that filters the returned documents to only those with the provided identifiers. It works on the internal _uid field, so it doesn't require the _id field to be enabled. The simplest version of such a query will look like the following:

{
  "query" : {
    "ids" : {
      "values" : [ "1", "2", "3" ]
    }
  }
}

This query will only return those documents that have one of the identifiers present in the values array. We can complicate the identifiers query a bit and also limit the documents on the basis of their type. For example, if we want to only include documents of the book type, we will send the following query:

{
 "query" : {
  "ids" : {
   "type" : "book",
   "values" : [ "1", "2", "3" ]
  }
 }
}

As you can see, we've added the type property to our query and we've set its value to the type we are interested in.

The prefix query

This query is similar to the term query in its configuration and to the multi term query when looking into its logic. The prefix query allows us to match documents that have the value in a certain field that starts with a given prefix. For example, if we want to find all the documents that have values starting with cri in the title field, we will run the following query:

{
  "query" : {
    "prefix" : {
      "title" : "cri"
    }
  }
}

Similar to the term query, you can also include the boost attribute in your prefix query, which will affect the importance of the given prefix. For example, if we would like to change our previous query and give our query a boost of 3.0, we will send the following query:

{
  "query" : {
    "prefix" : {
      "title" : {
        "value" : "cri",
        "boost" : 3.0
      }
    }
  }
}

Note

The prefix query is rewritten by Elasticsearch and, because of that, Elasticsearch allows us to pass an additional parameter that controls the rewrite method. For more details about that process, refer to the Understanding the querying process section in this chapter.

The fuzzy query

The fuzzy query allows us to find documents that have values similar to the ones we've provided in the query. The similarity of terms is calculated on the basis of the edit distance between the terms we provide in the query and the terms in the searched documents. This query can be expensive when it comes to CPU resources, but can help us when we need fuzzy matching; for example, when users make spelling mistakes. In our example, let's assume that instead of crime, our user enters crme into the search box and we would like to run the simplest form of fuzzy query. Such a query will look like this:

{
  "query" : {
    "fuzzy" : {
      "title" : "crme"
    }
  }
}

The response for such a query will be as follows:

{
  "took" : 81,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.5,
    "hits" : [ {
      "_index" : "library",
      "_type" : "book",
      "_id" : "4",
      "_score" : 0.5,
      "_source" : {
        "title" : "Crime and Punishment",
        "otitle" : "Преступлéние и наказáние",
        "author" : "Fyodor Dostoevsky",
        "year" : 1886,
        "characters" : [ "Raskolnikov", "Sofia Semyonovna Marmeladova" ],
        "tags" : [ ],
        "copies" : 0,
        "available" : true
      }
    } ]
  }
}

Even though we made a typo, Elasticsearch managed to find the documents we were interested in.
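To see why crme is close enough to crime, we can compute the edit distance between the two terms ourselves. The following Python sketch implements the plain Levenshtein distance; it is meant only as an illustration, and Lucene's own implementation may differ in details such as how transposed characters are counted.

```python
def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance, one row at a time.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

print(edit_distance("crme", "crime"))   # 1
print(edit_distance("crime", "crime"))  # 0
```

A single inserted letter separates the two terms, so an edit distance of 1 is enough for the typo to match.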

We can control the fuzzy query behavior by using the following parameters:

value: This specifies the actual query.
boost: This specifies the boost value for the query. It defaults to 1.0.
fuzziness: This controls the behavior of the generated fuzzy query. Refer to the match query description for more information.
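prefix_length: This is the length of the initial prefix that the matching terms must share with the query term, that is, the number of leading characters that are not subject to fuzzy matching. It defaults to 0.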
max_expansions: This specifies the maximum number of terms the query will be expanded to. The default value is 50.

The parameters should be wrapped in the name of the field we are running the query against. So if we would like to modify the previous query and add additional parameters, the query will look like the following code:

{
 "query" : {
  "fuzzy" : {
   "title" : {
    "value" : "crme",
    "fuzziness" : 2
   }
  }
 }
}

The wildcard query

The wildcard query allows us to use the * and ? wildcards in the values we search for. Apart from that, the wildcard query is very similar to the term query in terms of its body. To match all the documents with terms matching cr?me (with ? matching any single character), we would send the following query:

{
 "query" : {
  "wildcard" : {
   "title" : "cr?me"
  }
 }
}

It will match the documents that have terms matching cr?me in the title field. You can also include the boost attribute in your wildcard query, which will affect the importance of each term that matches the given value. For example, if we would like to change our previous query and give our wildcard query a boost of 20.0, we will send the following query:

{
 "query" : {
  "wildcard" : {
   "title" : {
    "value" : "cr?me",
    "boost" : 20.0
   }
  }
 }
}

Note

Wildcard queries are not performance oriented and should be avoided if possible; especially avoid leading wildcards (terms starting with wildcards). The wildcard query is rewritten by Elasticsearch and, because of that, Elasticsearch allows us to pass an additional parameter that controls the rewrite method. For more details about this process, refer to the Understanding the querying process section in this chapter. Also remember that the wildcard query is not analyzed.
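The meaning of the two wildcards can be tried out locally with Python's fnmatch module, which gives * and ? the same meaning as described in this section. The list of terms is made up for illustration, and the sketch says nothing about how Elasticsearch executes the query internally.

```python
from fnmatch import fnmatchcase

def wildcard_matches(term, pattern):
    # fnmatchcase gives * and ? the meaning described above; note that it also
    # treats [ ] as a character class, which this sketch does not rely on.
    return fnmatchcase(term, pattern)

terms = ["crime", "creme", "crme", "crimes"]
print([t for t in terms if wildcard_matches(t, "cr?me")])  # ['crime', 'creme']
print([t for t in terms if wildcard_matches(t, "cr*")])    # all four terms
```

Note that crme is not matched by cr?me, because ? stands for exactly one character.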

The range query

The range query allows us to find documents that have a field value within a certain range. It works for numerical fields as well as for string-based and date based fields (each case just maps to a different Apache Lucene query). The range query should be run against a single field and the query parameters should be wrapped in the field name. The following parameters are supported:

gte: The query will match documents with the value greater than or equal to the one provided with this parameter
gt: The query will match documents with the value greater than the one provided with this parameter
lte: The query will match documents with the value lower than or equal to the one provided with this parameter
lt: The query will match documents with the value lower than the one provided with this parameter

So for example, if we want to find all the books that have the value from 1700 to 1900 in the year field, we will run the following query:

{
 "query" : {
  "range" : {
   "year" : {
    "gte" : 1700,
    "lte" : 1900
   }
  }
 }
}
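The semantics of the four range parameters can be summarized in a short Python sketch. The in_range function is a hypothetical helper written for this illustration; it simply maps each parameter to the comparison it stands for.

```python
import operator

# Map each range parameter to the comparison it stands for.
BOUNDS = {"gte": operator.ge, "gt": operator.gt, "lte": operator.le, "lt": operator.lt}

def in_range(value, **bounds):
    # Every bound that was provided must hold for the value to match.
    return all(BOUNDS[name](value, limit) for name, limit in bounds.items())

print(in_range(1886, gte=1700, lte=1900))  # True
print(in_range(1900, gte=1700, lt=1900))   # False: lt excludes the boundary
print(in_range(1900, gte=1700, lte=1900))  # True: lte includes it
```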

The regular expression query

The regular expression query allows us to use regular expressions as the query text. Remember that the performance of such queries depends on the chosen regular expression. If our regular expression matches many terms, the query will be slow. The general rule is that the more terms matched by the regular expression, the slower the query will be.

An example regular expression query looks like this:

{
 "query" : {
  "regexp" : {
   "title" : {
    "value" : "cr.m[ae]",
    "boost" : 10.0
   }
  }
 }
}

The preceding query will result in Elasticsearch rewriting the query. The number of term queries in the rewritten query depends on how many terms in our index match the given regular expression. The boost parameter seen in the query specifies the boost value for the generated queries.

The full regular expression syntax accepted by Elasticsearch can be found at https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html#regexp-syntax.
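To get a feeling for which terms the cr.m[ae] expression would select, we can test it against a few candidate terms in Python. This is only an approximation: the Lucene regular expression syntax differs from Python's re module, and the sketch relies on this particular pattern using constructs common to both. The whole term has to match the expression, which re.fullmatch emulates. The list of terms is made up for illustration.

```python
import re

def regexp_matches(term, pattern):
    # fullmatch requires the whole term to match, not just a part of it.
    return re.fullmatch(pattern, term) is not None

terms = ["crime", "crima", "crimes", "creme", "cream"]
print([t for t in terms if regexp_matches(t, "cr.m[ae]")])  # ['crime', 'crima', 'creme']
```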

The more like this query

One of the queries that got a major rework in Elasticsearch 2.0, the more like this query allows us to retrieve documents that are similar (or not similar) to the provided text or to the provided documents.

Elasticsearch supports a few parameters to define how the more like this query should work:

fields: An array of fields that the query should be run against. It defaults to the _all field.
like: This parameter comes in two flavors: it allows us to provide either a text that the returned documents should be similar to or an array of documents that the returned documents should be similar to.
unlike: This is similar to the like parameter, but it allows us to define text or documents that the returned documents should not be similar to.
min_term_freq: The minimum term frequency (for the terms in the documents) below which terms will be ignored. It defaults to 2.
max_query_terms: The maximum number of terms that will be included in any generated query. It defaults to 25. The higher value may mean higher precision, but lower performance.
stop_words: An array of words that will be ignored when comparing documents and the query. It is empty by default.
min_doc_freq: The minimum number of documents in which the term has to be present in order not to be ignored. It defaults to 5, which means that a term needs to be present in at least five documents.
max_doc_freq: The maximum number of documents in which the term may be present in order not to be ignored. By default, it is unbounded (set to 0).
min_word_len: The minimum length of a single word below which a word will be ignored. It defaults to 0.
max_word_len: The maximum length of a single word above which it will be ignored. It defaults to unbounded (which means setting the value to 0).
boost_terms: The boost value that will be used when boosting each term. It defaults to 0.
boost: The boost value that will be used when boosting the query. It defaults to 1.
include: This specifies if the input documents should be included in the results returned by the query. It defaults to false, which means that the input documents won't be included.
minimum_should_match: This controls the number of terms that need to be matched in the resulting documents. By default, it is set to 30%.
analyzer: The name of the analyzer that will be used to analyze the text we provided.

An example for a more like this query looks like this:

{
 "query" : {
  "more_like_this" : {
   "fields" : [ "title", "otitle" ],
   "like" : "crime and punishment",
   "min_term_freq" : 1,
   "min_doc_freq" : 1
  }
 }
}

As we said earlier, the like property can also be used to show which documents the results should be similar to. For example, the following is the query that will use the like property to point to a given document (note that the following query won't return documents on our example data):

{
  "query" : {
    "more_like_this" : {
      "fields" : [ "title", "otitle" ],
      "min_term_freq" : 1,
      "min_doc_freq" : 1,
      "like" : [
        {
          "_index" : "library",
          "_type" : "book",
          "_id" : "4"
        }
      ]
    }
  }
}

We can also mix the documents and text together:

{
  "query" : {
    "more_like_this" : {
      "fields" : [ "title", "otitle" ],
      "min_term_freq" : 1,
      "min_doc_freq" : 1,
      "like" : [
        {
          "_index" : "library",
          "_type" : "book",
          "_id" : "4"
        },
        "crime and punishment"
      ]
    }
  }
}
