
Python elasticsearch DSL query, filter and aggregation operations

Editor: Python

Elasticsearch basic concepts

Index: the logical space in which Elasticsearch stores data, similar to a database in a relational database. An index can be spread over one or more shards, and each shard can have multiple replicas.

Document: an entity stored in Elasticsearch, similar to a row in a relational table.

A document consists of multiple fields. Fields with the same name in different documents must have the same type. A field in a document can be repeated, that is, one field can hold multiple values (multivalued).

Document type: to make querying easier, an index can contain several kinds of document, each called a document type. This is similar to a table in a relational database. Note, however, that fields with the same name in different document types must have the same type.

Mapping: similar to a schema definition in a relational database. It stores the mapping information for each field, and different document types have different mappings.
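
As a rough illustration of the concepts above, the sketch below models a mapping and two documents as plain Python dicts. The field names (title, tags, views) and the check_document helper are made up for this example; a real cluster performs this kind of validation itself, with its own field types.

```python
# Hypothetical mapping: field name -> expected Python type
# (a stand-in for Elasticsearch field types).
mapping = {"title": str, "tags": str, "views": int}

def check_document(doc, mapping):
    """Return the names of mapped fields whose values do not match the mapping."""
    bad = []
    for field, value in doc.items():
        # A field may hold several values (multivalued); normalize to a list.
        values = value if isinstance(value, list) else [value]
        expected = mapping.get(field)
        # Fields missing from the mapping are skipped in this sketch.
        if expected is not None and not all(isinstance(v, expected) for v in values):
            bad.append(field)
    return bad

doc_ok = {"title": "hello", "tags": ["search", "python"], "views": 3}
doc_bad = {"title": "hello", "views": "three"}  # same field name, different type

print(check_document(doc_ok, mapping))
print(check_document(doc_bad, mapping))
```

The first document passes even though tags holds two values, while the second is flagged because views does not have the type the mapping expects.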

The table below compares Elasticsearch terms with their relational database counterparts:

Relational database      Elasticsearch
-----------------------  ---------------------
Database                 Index
Table                    Type
Row                      Document
Column                   Field
Schema                   Mapping
Index                    Everything is indexed
SQL                      Query DSL
SELECT * FROM table…     GET http://…
UPDATE table SET         PUT http://…
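
To make the SQL versus Query DSL rows more concrete, here is a sketch of how a simple SQL query might map onto a search request. The bank index and the lastname field are assumptions for illustration, and the correspondence is only approximate: a match query performs analyzed full-text matching, not strict equality.

```python
import json

# SQL (for comparison): SELECT * FROM bank WHERE lastname = 'Holmes' LIMIT 10
method = "GET"
path = "/bank/_search"  # the index name takes the place of the database/table
body = {
    "query": {"match": {"lastname": "Holmes"}},  # roughly the WHERE clause
    "size": 10,                                   # roughly LIMIT
}

print(method, path)
print(json.dumps(body, indent=2))
```

Nothing is sent to a cluster here; the block only prints the request that would be issued.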

A brief introduction to the Python Elasticsearch DSL

Connect to Elasticsearch:

import elasticsearch
es = elasticsearch.Elasticsearch([{'host': '127.0.0.1', 'port': 9200}])

First, a basic search. q is the text to search for (surrounding spaces in q do not affect the results), size is the number of hits to return, from_ is the starting offset, and filter_path limits which parts of the response are returned. In the second example below, only _id and _type are returned.

res_3 = es.search(index="bank", q="Holmes", size=1, from_=1)
res_4 = es.search(index="bank", q=" 39225 5686 ", size=1000, filter_path=['hits.hits._id', 'hits.hits._type'])
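
The keyword arguments above correspond to URL parameters of the search endpoint; the Python client spells from as from_ because from is a reserved word in Python. The sketch below only builds the query string with the standard library and does not contact a cluster. The /bank/_search path is assumed from the index used in the example.

```python
from urllib.parse import urlencode

params = {
    "q": "Holmes",
    "size": 1,
    "from": 1,  # passed as from_=1 in the Python client
    "filter_path": "hits.hits._id,hits.hits._type",  # a list in Python, comma-separated here
}
# Keep commas readable; other characters are percent-encoded as needed.
query_string = urlencode(params, safe=",")
print("/bank/_search?" + query_string)
```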

Query all data in a specified index:

Here, index specifies the index to search. A string names a single index; a list names several indexes, such as index=["bank", "banner", "country"]; a wildcard pattern matches every index that fits it, such as index=["apple*"], which matches all indexes whose names start with apple.

search can also be restricted to a specific doc-type.

from elasticsearch_dsl import Search
# Build the search, then execute it to get a response object
s = Search(using=es, index="index-test")
response = s.execute()
print(response.to_dict())

Query by a specific field. Multiple query conditions can be chained:

s = Search(using=es, index="index-test").query("match", sip="192.168.1.1")
s = s.query("match", dip="192.168.1.2")
response = s.execute()

Multi-field query:

from elasticsearch_dsl.query import MultiMatch, Match
multi_match = MultiMatch(query='hello', fields=['title', 'content'])
s = Search(using=es, index="index-test").query(multi_match)
response = s.execute()
print(response.to_dict())

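You can also use a Q() object to query multiple fields: fields is a list of field names, and query is the value to search for.

from elasticsearch_dsl import Q
q = Q("multi_match", query="hello", fields=['title', 'content'])
# Start from a fresh Search so the query is not applied to an earlier response
s = Search(using=es, index="index-test").query(q)
response = s.execute()
print(response.to_dict())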

The first argument of Q() is the query type, which can also be bool.

q = Q('bool', must=[Q('match', title='hello'), Q('match', content='world')])
s = Search(using=es, index="index-test").query(q)
response = s.execute()
print(response.to_dict())

Q() objects can also be combined with operators, which is another way of writing bool queries like the one above.

# | combines clauses as "should" (OR)
q = Q("match", title='python') | Q("match", title='django')
print(q.to_dict())
# {"bool": {"should": [...]}}
# & combines clauses as "must" (AND)
q = Q("match", title='python') & Q("match", title='django')
print(q.to_dict())
# {"bool": {"must": [...]}}
# ~ negates a clause ("must_not")
q = ~Q("match", title="python")
print(q.to_dict())
# {"bool": {"must_not": [...]}}
# Run a combined query against an index
response = Search(using=es, index="index-test").query(q).execute()
print(response.to_dict())
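
For readers who want to see the shape of the bool queries abbreviated as [...] above, the following sketch builds them as plain dicts. The helpers match, any_of, all_of, and none_of are hypothetical names for this example, not part of elasticsearch_dsl, and the library's own serialization may differ in details.

```python
def match(field, value):
    """Build a match clause as a plain dict (hypothetical helper)."""
    return {"match": {field: value}}

def any_of(*clauses):
    # Corresponds to Q(...) | Q(...)
    return {"bool": {"should": list(clauses)}}

def all_of(*clauses):
    # Corresponds to Q(...) & Q(...)
    return {"bool": {"must": list(clauses)}}

def none_of(*clauses):
    # Corresponds to ~Q(...)
    return {"bool": {"must_not": list(clauses)}}

python_q = match("title", "python")
django_q = match("title", "django")

print(any_of(python_q, django_q))
print(all_of(python_q, django_q))
print(none_of(python_q))
```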

Filters. The example below uses a range filter: range is the filter type, timestamp is the name of the field to filter on, gte means greater than or equal to, and lt means less than. Set the bounds as needed.

On the difference between term and match: term is an exact match against the indexed terms and does not analyze the query value, while match analyzes the query text (word segmentation) and returns a relevance score. For example, on an analyzed text field whose indexed terms are lowercase, a term query containing uppercase letters returns nothing (no hit), whereas match finds the same results regardless of case.

import time
s = Search(using=es, index="index-test")
# Range filter combined with a match query
s = s.filter("range", timestamp={"gte": 0, "lt": time.time()}).query("match", country="in")
# Terms filter: keep documents whose balance_num is one of the listed values
res_3 = s.filter("terms", balance_num=["39225", "5686"]).execute()
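
The term versus match distinction described above can be modeled with a toy index. This is a deliberately simplified sketch: the analyze function only lowercases and splits on whitespace, which is a rough stand-in for a real analyzer, and the sample text is made up.

```python
def analyze(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()

# Tokens stored in the toy index for one text field value.
indexed_tokens = set(analyze("Sherlock Holmes"))

def term_query(value):
    # term: the value is compared to the indexed tokens as-is, with no analysis.
    return value in indexed_tokens

def match_query(text):
    # match: the query text is analyzed first, then any resulting token may hit.
    return any(token in indexed_tokens for token in analyze(text))

print(term_query("holmes"))   # True: matches the lowercase indexed token
print(term_query("Holmes"))   # False: uppercase H is not in the index
print(match_query("Holmes"))  # True: the query is lowercased before lookup
```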

Equivalent ways to write a filter:

s = Search()
s = s.filter('terms', tags=['search', 'python'])
print(s.to_dict())
# {'query': {'bool': {'filter': [{'terms': {'tags': ['search', 'python']}}]}}}
# The same filter written as an explicit bool query
s = Search().query('bool', filter=[Q('terms', tags=['search', 'python'])])
print(s.to_dict())
# {'query': {'bool': {'filter': [{'terms': {'tags': ['search', 'python']}}]}}}
# exclude() negates the filter
s = Search().exclude('terms', tags=['search', 'python'])
# or, equivalently
s = Search().query('bool', filter=[~Q('terms', tags=['search', 'python'])])
print(s.to_dict())
# {'query': {'bool': {'filter': [{'bool': {'must_not': [{'terms': {'tags': ['search', 'python']}}]}}]}}}

Aggregations can be combined with queries, filters, and other operations. They are added through aggs.

bucket defines a grouping. Its first argument is the name of the group, which you choose yourself; the second is the aggregation type; the third specifies the field.

metric works the same way. Metric types include sum, avg, max, min, and so on. Note that two types return several of these values at once: stats and extended_stats. The latter also returns values such as variance.

from elasticsearch_dsl import A
# The examples are independent; within one search, reusing a bucket name replaces the earlier one
# example 1
s.aggs.bucket("per_country", "terms", field="timestamp").metric("sum_click", "stats", field="click").metric("sum_request", "stats", field="request")
# example 2
s.aggs.bucket("per_age", "terms", field="click.keyword").metric("sum_click", "stats", field="click")
# example 3
s.aggs.metric("sum_age", "extended_stats", field="impression")
# example 4
s.aggs.bucket("per_age", "terms", field="country.keyword")
# example 5: a range aggregation, which buckets by interval
a = A("range", field="account_number", ranges=[{"to": 10}, {"from": 11, "to": 21}])
# Attach the aggregation to the search; the bucket name here is arbitrary
s.aggs.bucket("per_account_range", a)
res = s.execute()

Finally, execute() still has to be called. Note that s.aggs operations modify the search in place and should not be assigned to a variable in place of the search (writing res = s.aggs, for example, is wrong). The aggregation results are returned in res after execute().
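
To show what a terms bucket with a stats metric computes, the sketch below performs the same grouping in plain Python on a few made-up documents. It is only a model of the idea, not how Elasticsearch implements aggregations; the country and click field names mirror the examples above.

```python
from collections import defaultdict

# Made-up sample documents.
docs = [
    {"country": "in", "click": 3},
    {"country": "in", "click": 5},
    {"country": "us", "click": 2},
]

def stats(values):
    """Mimic the keys returned by a stats metric."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
        "sum": sum(values),
    }

# "terms" bucket: one group per distinct value of the field.
buckets = defaultdict(list)
for doc in docs:
    buckets[doc["country"]].append(doc["click"])

# "stats" metric: computed separately inside each bucket.
per_country = {key: stats(clicks) for key, clicks in buckets.items()}
print(per_country["in"])
```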

Sort

s = Search().sort(
    'category',
    '-title',
    {"lines": {"order": "asc", "mode": "avg"}}
)
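
In the sort call above, a plain field name sorts ascending, a leading "-" sorts descending, and a dict is passed through as written. The sketch below expands that shorthand into request-body form; sort_spec is a hypothetical helper written for this example, not a library function.

```python
def sort_spec(*keys):
    """Expand shorthand sort keys into request-body form (hypothetical helper)."""
    spec = []
    for key in keys:
        if isinstance(key, str) and key.startswith("-"):
            # A leading "-" means descending order on that field.
            spec.append({key[1:]: {"order": "desc"}})
        else:
            # Plain field names and explicit dicts pass through unchanged.
            spec.append(key)
    return spec

print(sort_spec("category", "-title", {"lines": {"order": "asc", "mode": "avg"}}))
```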

Pagination

s = s[10:20]
# {"from": 10, "size": 10}
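
The slice notation maps onto the from and size parameters by simple arithmetic, as this small sketch shows. The function name slice_to_paging is made up for illustration.

```python
def slice_to_paging(start, stop):
    """Translate a Python slice [start:stop] into from/size values."""
    return {"from": start, "size": stop - start}

print(slice_to_paging(10, 20))  # {'from': 10, 'size': 10}
```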

Some additional methods, for those who are interested:

s = Search()
# Set extra request-body properties with `.extra()`
s = s.extra(explain=True)
# Set request parameters with `.params()`
# (search_type="count" was removed in Elasticsearch 5.0, so routing is used here instead)
s = s.params(routing="42")
# To restrict the returned fields, use the `source()` method
# only return the selected fields
s = s.source(['title', 'body'])
# don't return any fields, just the metadata
s = s.source(False)
# explicitly include/exclude fields
# (recent elasticsearch_dsl releases spell these includes/excludes; older ones used include/exclude)
s = s.source(includes=["title"], excludes=["user.*"])
# reset the field selection
s = s.source(None)
# Build a Search from a dict
s = Search.from_dict({"query": {"match": {"title": "python"}}})
# Update an existing Search in place from a dict
s.update_from_dict({"query": {"match": {"title": "python"}}, "size": 42})