ElasticSearch 学习笔记 - 11.桶聚合

秦可卿
• 阅读 7223

1、源数据


DELETE my-index

PUT my-index

PUT my-index/person/1
{
  "name":"张三",
  "age":27,
  "gender":"男",
  "salary":15000,
  "dep":"bigdata"
}

PUT my-index/person/2
{
  "name":"李四",
  "age":26,
  "gender":"女",
  "salary":15000,
  "dep":"bigdata"
}

PUT my-index/person/3
{
  "name":"王五",
  "age":26,
  "gender":"男",
  "salary":17000,
  "dep":"AI"
}
PUT my-index/person/4
{
  "name":"刘六",
  "age":27,
  "gender":"女",
  "salary":18000,
  "dep":"AI"
}

PUT my-index/person/5
{
  "name":"程裕强",
  "age":31,
  "gender":"男",
  "salary":20000,
  "dep":"bigdata"
}
PUT my-index/person/6
{
  "name":"hadron",
  "age":30,
  "gender":"男",
  "salary":20000,
  "dep":"AI"
}

2、Terms Aggregation

根据薪资水平进行分组,统计每个薪资水平的人数

GET /my-index/person/_search
{
  "size": 0,
  "aggs": {
    "group_count": {
      "terms": {
        "field": "salary"
      }
    }
  }
}

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_count": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 15000,
          "doc_count": 2
        },
        {
          "key": 20000,
          "doc_count": 2
        },
        {
          "key": 17000,
          "doc_count": 1
        },
        {
          "key": 18000,
          "doc_count": 1
        }
      ]
    }
  }
}

统计上面每个分组的平均年龄

GET /my-index/person/_search
{
  "size": 0,
  "aggs": {
    "group_count": {
      "terms": {
        "field": "salary"
      }
      , "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_count": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 15000,
          "doc_count": 2,
          "avg_age": {
            "value": 26.5
          }
        },
        {
          "key": 20000,
          "doc_count": 2,
          "avg_age": {
            "value": 30.5
          }
        },
        {
          "key": 17000,
          "doc_count": 1,
          "avg_age": {
            "value": 26
          }
        },
        {
          "key": 18000,
          "doc_count": 1,
          "avg_age": {
            "value": 27
          }
        }
      ]
    }
  }
}

统计每个部门的人数

GET my-index/_search
{
  "size": 0, 
  "aggs": {
    "group_count": {
      "terms": {"field": "dep"}
    }
  }
}

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [dep] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "my-index",
        "node": "fQDwpdT2RfSfPr8ttHQCkA",
        "reason": {
          "type": "illegal_argument_exception",
          "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [dep] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
        }
      }
    ],
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [dep] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [dep] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
      }
    }
  },
  "status": 400
}

根据错误提示”Fielddata is disabled on text fields by default.
Set fielddata=true on [dep] in order to load fielddata in memory by uninverting the inverted index.
Note that this can however use significant memory. Alternatively use a keyword field instead.”可知,需要开启fielddata参数。只需要设置某个字段"fielddata": true即可。
此外,根据官方文档提示se the my_field.keyword field for aggregations, sorting, or in scripts,可以尝试my_field.keyword格式用于聚合操作。

GET my-index/_search
{
  "size": 0, 
  "aggs": {
    "group_count": {
      "terms": {"field": "dep.keyword"}
    }
  }
}

2、Filter Aggregation

计算男人的平均年龄

也就是统计gender字段包含关键字“男”的文档的age平均值。

GET my-index/_search
{
  "size": 0, 
  "aggs": {
    "group_count": {
      "filter": {
        "term":{"gender": "男"}
      },
      "aggs":{
        "avg_age":{
          "avg":{"field": "age"}
        }
      }
    }
  }
}

3、Filters Aggregation

GET my-index/_search
{
  "size": 0, 
  "aggs": {
    "group_count": {
      "filters":{
        "filters": [
          {"match":{"gender": "男"}},
          {"match":{"gender": "女"}}
        ]
      },
      "aggs":{
        "avg_age":{
            "avg":{"field": "age"}
        }
      }
    }
  }
}

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_count": {
      "buckets": [
        {
          "doc_count": 4,
          "avg_age": {
            "value": 28.5
          }
        },
        {
          "doc_count": 2,
          "avg_age": {
            "value": 26.5
          }
        }
      ]
    }
  }
}

4、Range Aggregation

from..to区间范围是[from,to),也就是说包含from点,不包含to点
【例子】查询薪资在[0,10000),[10000,20000),[2000,+无穷大)三个范围的员工数

GET my-index/_search
{
  "size": 0, 
  "aggs": {
    "group_count": {
      "range": {
        "field": "salary",
        "ranges": [
            {"to": 10000},
            {"from": 10000,"to":20000},  
            {"from": 20000}
        ]
      }
    }
  }
}

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_count": {
      "buckets": [
        {
          "key": "*-10000.0",
          "to": 10000,
          "doc_count": 0
        },
        {
          "key": "10000.0-20000.0",
          "from": 10000,
          "to": 20000,
          "doc_count": 4
        },
        {
          "key": "20000.0-*",
          "from": 20000,
          "doc_count": 2
        }
      ]
    }
  }
}

5、Date Range聚合

专用于日期值的范围聚合。
这种聚合和正常范围聚合的主要区别在于,起始和结束值可以在日期数学表达式中表示,并且还可以指定返回起始和结束响应字段的日期格式。
请注意,此聚合包含from值并排除每个范围的值。

【例子】计算一年前之前发表的博文数和从一年前以来发表的博文总数

GET website/_search
{
  "size": 0, 
  "aggs": {
    "group_count": {
      "range": {
        "field": "postdate",
        "format":"yyyy-MM-dd",
        "ranges": [
            {"to": "now-12M/M"},
            {"from": "now-12M/M"}
        ]
      }
    }
  }
}



{
  "took": 29,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 9,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group_count": {
      "buckets": [
        {
          "key": "*-2017-10-01",
          "to": 1506816000000,
          "to_as_string": "2017-10-01",
          "doc_count": 8
        },
        {
          "key": "2017-10-01-*",
          "from": 1506816000000,
          "from_as_string": "2017-10-01",
          "doc_count": 1
        }
      ]
    }
  }
}

6、Missing聚合

基于字段数据的单桶集合,创建当前文档集上下文中缺少字段值(实际上缺少字段或设置了配置的NULL值)的所有文档的桶。
此聚合器通常会与其他字段数据存储桶聚合器(如范围)一起使用,以返回由于缺少字段数据值而无法放置在其他存储桶中的所有文档的信息。

GET my-index/_search
{
  "size": 0, 
  "aggs": {
    "noDep_count": {
      "missing": {"field": "salary"}
    }
  }
}

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 9,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "noDep_count": {
      "doc_count": 3
    }
  }
}
点赞
收藏
评论区
推荐文章
Cindy-wallys Cindy-wallys
3年前
DR-AP40X9-A-Qualcomm-IPQ4019/IPQ4029-2.4G&5G/industrial wireless AP
DRAP40X9A isa2x22.4G&5GhighpowerRadioAPRouterwhichincluding1x DR40X9(IPQ4019/IPQ4029)Board;2x2.4G&5GdipoleAntennas;1xPSU(DCpowerorPoE);1xEnclosure. Dipole antenna ■ 2x2.4G&5GAntenna■ 5dBi2.4GHz/5Ghz■SMAPluginterface ■Plasticrodofblack ■RoHScomplianceEnclosure■ Dimension:132x119x34mm■ Aluminumalloymaterial■ SupportWallysDR40X9
Wesley13 Wesley13
4年前
MySQL基础(一)
一、连接MySQL数据库1连接:2mysqlhhostuuserp34常见错误:5ERROR2002(HY000):Can'tconnecttolocalMySQLserverthroughsocket'/tmp/my
Wesley13 Wesley13
4年前
Unity XLua 官方案例学习
1\.Helloworld1usingUnityEngine;2usingXLua;34publicclassHelloworld:MonoBehaviour{5//Usethisforinitialization
Stella981 Stella981
4年前
Eth2存款合约发布!如何质押你的 ETH 成为验证者呢?
!(https://oscimg.oschina.net/oscnet/3e74a40be34b4e2d8432b3ca743f1797.png)!(https://oscimg.oschina.net/oscnet/f6c23c1814124a579f1e32a490ff11b3.jpg)作者:RyanSe
Stella981 Stella981
4年前
OKHttp源码学习
1.HttpURLConnection1publicclassHttpURLConnectionGetAndPost{2privateStringurlAddress"xxxx";34publicvoiddoGet(Stringmethod,Strings)
Stella981 Stella981
4年前
Elastic日报 第288期 (2018
1\.个更好的Elasticsearch基准测试技巧。http://t.cn/R16rZXS长按此QRCode,阅读原文!(https://oscimg.oschina.net/oscnet/0be2b3c94914422d8de5cef6c34e5fd6.png)2\.ElasticSearch的搭建与数据统计。
Wesley13 Wesley13
4年前
Java学习之集合(HashSet)
HashSet的实例1importjava.util.HashSet;2importjava.util.Iterator;34publicclassHashSetDemo{56publicstaticvoidmain(Stringargs){
Stella981 Stella981
4年前
Android蓝牙连接汽车OBD设备
//设备连接public class BluetoothConnect implements Runnable {    private static final UUID CONNECT_UUID  UUID.fromString("0000110100001000800000805F9B34FB");
Wesley13 Wesley13
4年前
JAVA的接口
用法:1interfaceA{2publicstaticfinalinti10;34publicvoidrunLoad();5}67publicclassDemoimplementsA{8/
Stella981 Stella981
4年前
MyBaitis 源码浅读
Mybatis架构!(https://oscimg.oschina.net/oscnet/8a1557a89e7661d34b40e72708d98274455.jpg)!(https://oscimg.oschina.net/oscnet/3892d1dbc40146b9261a738dee49082b67d.jpg)怎么看源码下
Stella981 Stella981
4年前
OS X Mavericks 10.9.5 (13F34) bt下载地址
OSXMavericks10.9.5(13F34),源http://bitsnoop.com/osxmavericks109513f34dmgq68447977.html磁力链magnet:?xturn:btih:4c887e73cd37228d8dc0746315501edc289acc51&dnOS%20X%2