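Every document has metadata associated with it, such as the _index and _id metadata fields. The behavior of some of these metadata fields can be customized when a mapping is created. For example, suppose we create the following index and document: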



PUT test
{
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "message": {
        "type": "text"
      }
    }
  }
}

PUT test/_doc/1
{
  "id": "1234",
  "message": "This is so lovely"
}

The last command above returns the following response:



{
  "_index" : "test",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

From the response above, we can see that the _index field is test and the _id is 1. These are the identity metadata fields of the document:

Identity metadata fields

_index	The index to which the document belongs.
_id	The document's ID.

Document source metadata fields

_source	The original JSON representing the body of the document.
_size	The size of the _source field in bytes, provided by the mapper-size plugin.

To obtain the _size metadata, you must install the mapper-size plugin listed in the table above and restart Elasticsearch. Let's try the following exercise:



DELETE test

PUT test
{
  "mappings": {
    "_size": {
      "enabled": true
    },
    "properties": {
      "id": {
        "type": "keyword"
      },
      "message": {
        "type": "text"
      }
    }
  }
}

PUT test/_doc/1
{
  "id": "1234",
  "message": "This is so lovely"
}

Above, we enabled _size in the mappings. After indexing a document, we can run the following search:



GET test/_search?filter_path=**.hits
{
  "query": {
    "range": {
      "_size": {
        "gte": 10
      }
    }
  },
  "fields": [
    "_size"
  ]
}

The command above returns:



{
  "hits" : {
    "hits" : [
      {
        "_index" : "test",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "id" : "1234",
          "message" : "This is so lovely"
        },
        "fields" : {
          "_size" : [
            53
          ]
        }
      }
    ]
  }
}

In this response we can see both the _size metadata and the original document data carried in _source.

Doc count metadata field

_doc_count	A custom field used for storing document counts when a document represents pre-aggregated data.

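Bucket aggregations always return a field named doc_count, showing the number of documents that were aggregated and partitioned in each bucket. Computing doc_count is very simple: it is incremented by 1 for every document collected in a bucket.

While this simple approach works well when computing aggregations over individual documents, it fails to accurately represent documents that store pre-aggregated data (such as histogram or aggregate_metric_double fields), because one summary field may represent multiple documents.

To allow correct document counts when working with pre-aggregated data, a metadata field type named _doc_count was introduced. _doc_count must always be a positive integer representing the number of documents aggregated in a single summary field.

When the _doc_count field is added to a document, all bucket aggregations respect its value and increment the bucket doc_count by the value of the field. If a document does not contain a _doc_count field, _doc_count = 1 is implied by default.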

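Important:

A _doc_count field can only store a single positive integer per document. Nested arrays are not allowed.

If a document contains no _doc_count field, aggregators increment by 1, which is the default behavior.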
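To make the counting rule concrete, here is a minimal Python sketch that mimics how a terms-style bucket count would honor _doc_count. It is only an illustration of the rule described above, not Elasticsearch's implementation; the function name terms_doc_counts, the tag field, and the sample documents are all hypothetical.

```python
from collections import defaultdict


def terms_doc_counts(docs, field):
    """Count documents per term, honoring an optional _doc_count on each document."""
    buckets = defaultdict(int)
    for doc in docs:
        # A missing _doc_count is treated as 1, the default described above.
        count = doc.get("_doc_count", 1)
        if not isinstance(count, int) or count < 1:
            raise ValueError("_doc_count must be a positive integer")
        buckets[doc[field]] += count
    return dict(buckets)


# Hypothetical documents: two carry pre-aggregated counts, one does not.
docs = [
    {"tag": "a", "_doc_count": 45},
    {"tag": "b", "_doc_count": 62},
    {"tag": "a"},  # no _doc_count, so it counts as 1
]
counts = terms_doc_counts(docs, "tag")
print(counts)  # {'a': 46, 'b': 62}
```

The bucket for a ends up at 46 rather than 2 because the pre-aggregated document contributes its full _doc_count.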

Example

The following create index API request creates a new index with the following field mappings:



PUT my_index
{
  "mappings" : {
    "properties" : {
      "my_histogram" : {
        "type" : "histogram"
      },
      "my_text" : {
        "type" : "keyword"
      }
    }
  }
}

The following index API requests store pre-aggregated data for two histograms: histogram_1 and histogram_2.



PUT my_index/_doc/1
{
  "my_text" : "histogram_1",
  "my_histogram" : {
    "values" : [0.1, 0.2, 0.3, 0.4, 0.5],
    "counts" : [3, 7, 23, 12, 6]
  },
  "_doc_count": 45
}

PUT my_index/_doc/2
{
  "my_text" : "histogram_2",
  "my_histogram" : {
    "values" : [0.1, 0.25, 0.35, 0.4, 0.45, 0.5],
    "counts" : [8, 17, 8, 7, 6, 2]
  },
  "_doc_count": 62
}

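Note how _doc_count is defined above. It must be a positive integer storing the number of documents that were aggregated to produce each histogram.

If we run the following terms aggregation on my_index: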



GET my_index/_search?filter_path=aggregations
{
  "size": 0,
  "aggs": {
    "histogram_titles": {
      "terms": {
        "field": "my_text"
      }
    }
  }
}

We get the following response:



{
  "aggregations" : {
    "histogram_titles" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "histogram_2",
          "doc_count" : 62
        },
        {
          "key" : "histogram_1",
          "doc_count" : 45
        }
      ]
    }
  }
}

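Indexing metadata fields

_field_names	All fields in the document that contain non-null values.
_ignored	All fields in the document that were ignored at index time because of ignore_malformed.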

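The _field_names field used to index the names of every field in a document that contains any value other than null. The exists query used this field to find documents that have or do not have any non-null value for a particular field.

Now the _field_names field only indexes the names of fields that have both doc_values and norms disabled. For fields that have doc_values or norms enabled, the exists query is still available, but it does not use the _field_names field.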
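The rule above boils down to a simple predicate. The following Python sketch only restates that rule; it is not how Elasticsearch implements it. The function name indexed_in_field_names and the three sample fields with their settings are hypothetical and chosen for illustration.

```python
def indexed_in_field_names(doc_values: bool, norms: bool) -> bool:
    """Return True when a field's name would be recorded in _field_names
    under the rule stated above (both doc_values and norms disabled)."""
    return not doc_values and not norms


# Hypothetical fields with explicitly chosen settings, for illustration only.
fields = {
    "status": {"doc_values": True, "norms": False},
    "body": {"doc_values": False, "norms": False},
    "title": {"doc_values": False, "norms": True},
}
recorded = sorted(name for name, s in fields.items() if indexed_in_field_names(**s))
print(recorded)  # ['body']
```

Only body qualifies, because it is the only field with both settings turned off.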

Let's reuse the earlier example to demonstrate:



DELETE test

PUT test
{
  "mappings": {
    "_size": {
      "enabled": true
    },
    "properties": {
      "id": {
        "type": "keyword",
        "doc_values": true
      },
      "message": {
        "type": "text",
        "norms": false
      }
    }
  }
}

PUT test/_doc/1
{
  "id": "1234",
  "message": "This is so lovely"
}

We now search for the document above. Because doc_values is enabled for the id field, its name is not indexed in _field_names, so the following query:

GET test/_search?filter_path=**.hits
{
  "query": {
    "match": {
      "_field_names": "id"
    }
  }
}

returns no hits:

{
  "hits" : {
    "hits" : [ ]
  }
}

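Next, we run the same query against the message field, which has norms disabled. The request is followed by its response: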



GET test/_search?filter_path=**.hits
{
  "query": {
    "match": {
      "_field_names": "message"
    }
  }
}



{
  "hits" : {
    "hits" : [
      {
        "_index" : "test",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "id" : "1234",
          "message" : "This is so lovely"
        }
      }
    ]
  }
}

We can also use the following exists queries:



GET test/_search?filter_path=**.hits
{
  "query": {
    "exists": {
      "field": "message"
    }
  }
}

GET test/_search?filter_path=**.hits
{
  "query": {
    "exists": {
      "field": "id"
    }
  }
}

Both commands above return:



{
  "hits" : {
    "hits" : [
      {
        "_index" : "test",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "id" : "1234",
          "message" : "This is so lovely"
        }
      }
    ]
  }
}

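Routing metadata field

_routing	A custom routing value that routes a document to a particular shard.

A document is routed to a particular shard in an index using the following formula: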



routing_factor = num_routing_shards / num_primary_shards
shard_num = (hash(_routing) % num_routing_shards) / routing_factor

num_routing_shards is the value of the index.number_of_routing_shards index setting. num_primary_shards is the value of the index.number_of_shards index setting.
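The formula can be sketched in Python as follows. This is a rough illustration under stated assumptions, not Elasticsearch's code: CRC32 is used purely as a stand-in hash (the real hash function differs, so the shard numbers will not match a real cluster), the divisions are taken as integer divisions, and the sketch assumes num_routing_shards is a multiple of num_primary_shards. The function name shard_for is hypothetical.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int, num_routing_shards: int) -> int:
    """Map a routing value to a shard number using the formula above."""
    if num_routing_shards % num_primary_shards != 0:
        # Assumption of this sketch: the routing shard count divides evenly.
        raise ValueError("num_routing_shards must be a multiple of num_primary_shards")
    routing_factor = num_routing_shards // num_primary_shards
    # Stand-in hash for illustration only; Elasticsearch uses a different function.
    hashed = zlib.crc32(routing.encode("utf-8"))
    return (hashed % num_routing_shards) // routing_factor


# The same routing value always lands on the same shard.
for value in ("user1", "user2", "1"):
    print(value, "->", shard_for(value, num_primary_shards=4, num_routing_shards=16))
```

Whatever hash is used, the result always falls in the range 0 to num_primary_shards - 1, and every document sharing a routing value ends up on the same shard.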

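The default _routing value is the document's _id. Custom routing patterns can be implemented by specifying a custom routing value per document. For example: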



PUT my-index-000001/_doc/1?routing=user1&refresh=true
{
  "title": "This is a document"
}

GET my-index-000001/_doc/1?routing=user1

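Above, we use user1 as the routing value instead of the document's _id. The same routing value must be supplied when we want to update, get, or delete the document.

The value of the _routing field is accessible in queries: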



GET my-index-000001/_search?filter_path=**.hits
{
  "query": {
    "terms": {
      "_routing": [ "user1" ]
    }
  }
}

The result of the command above is:



{
  "hits" : {
    "hits" : [
      {
        "_index" : "my-index-000001",
        "_id" : "1",
        "_score" : 1.0,
        "_routing" : "user1",
        "_source" : {
          "title" : "This is a document"
        }
      }
    ]
  }
}

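Searching with custom routing

Custom routing can reduce the impact of searches. Instead of fanning a search request out to every shard in an index, the request can be sent only to the shards that match the specific routing value (or values):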



GET my-index-000001/_search?routing=user1,user2
{
  "query": {
    "match": {
      "title": "document"
    }
  }
}

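This search request is executed only on the shards associated with the user1 and user2 routing values.

Making a routing value required

When using custom routing, it is important to provide the routing value whenever indexing, getting, deleting, or updating a document.

Forgetting the routing value can lead to a document being indexed on more than one shard. As a safeguard, the _routing field can be configured to make a custom routing value required for all CRUD operations: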



PUT my-index-000002
{
  "mappings": {
    "_routing": {
      "required": true
    }
  }
}

PUT my-index-000002/_doc/1
{
  "text": "No routing value provided"
}

In the index above, we define _routing as required. Because the second command does not specify a routing parameter in the request, it throws a routing_missing_exception error.

Other metadata fields

_meta	Application-specific metadata.
_tier	The current data tier preference of the index to which the document belongs.

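For the _meta metadata, see my other article, “Elasticsearch：添加 metadata 到 mapping 中”.

_tier field

When performing queries across multiple indices, it is sometimes desirable to target indices held on nodes of a given data tier (data_hot, data_warm, data_cold, or data_frozen). The _tier field allows matching on the tier_preference setting of the index a document was indexed into. The preferred value is accessible in certain queries: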



PUT index_1/_doc/1
{
  "text": "Document in index 1"
}

PUT index_2/_doc/2?refresh=true
{
  "text": "Document in index 2"
}

GET index_1,index_2/_search
{
  "query": {
    "terms": {
      "_tier": ["data_hot", "data_warm"]
    }
  }
}

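The query above matches documents in indices whose preferred tier is data_hot or data_warm.

Typically a query will use a terms query to list the tiers of interest, but you can use the _tier field in any query that is rewritten to a term query, such as the match, query_string, term, terms, or simple_query_string query, as well as prefix and wildcard queries. However, it does not support regexp and fuzzy queries.

The tier_preference setting of an index is a comma-delimited list of tier names in order of preference: the preferred tier for hosting the index is listed first, followed by potentially many fallback options. Query matching only considers the first preference (the first value of the list).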
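The "first preference only" rule is easy to get wrong, so here is a small Python sketch of it. It only models the matching rule described above and is not Elasticsearch code; the function name matches_tier and the sample preference strings are hypothetical.

```python
def matches_tier(tier_preference: str, wanted_tiers) -> bool:
    """Return True if the first entry of a comma-delimited tier_preference
    list is one of the tiers being queried for."""
    # Only the first (most preferred) tier takes part in matching.
    first_preference = tier_preference.split(",")[0].strip()
    return first_preference in set(wanted_tiers)


# Hypothetical preference lists for three indices.
print(matches_tier("data_hot", ["data_hot", "data_warm"]))            # True
print(matches_tier("data_warm,data_hot", ["data_hot"]))               # False
print(matches_tier("data_cold,data_warm,data_hot", ["data_cold"]))    # True
```

The second call returns False even though data_hot appears in the list, because data_hot is only a fallback there and not the first preference.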

Elasticsearch: Metadata fields - an introduction to metadata fields