Index: the logical area Elasticsearch uses to store data, similar to the database concept in a relational database. An index can be spread over one or more shards, and each shard can have multiple replicas.
Document: the entity data stored in Elasticsearch, similar to a row in a relational database table.
A document consists of multiple fields. Fields with the same name in different documents must have the same type. A field can be repeated within a document, that is, one field can hold multiple values (multivalued).
Document type: to make querying easier, one index can hold many kinds of documents, called document types. This is similar to the table concept in a relational database. Note, however, that fields with the same name in different document types must have the same type.
Mapping: similar to a schema definition in a relational database. It stores the mapping information for fields, and different document types have different mappings.
The table below compares some Elasticsearch terms with their relational database counterparts:
Relational database | Elasticsearch
Database | Index
Table | Type
Row | Document
Column | Field
Schema | Mapping
Index | Everything is indexed
SQL | Query DSL
SELECT * FROM table… | GET http://…
UPDATE table SET | PUT http://…
Connect to Elasticsearch:
import elasticsearch
es = elasticsearch.Elasticsearch([{'host': '127.0.0.1', 'port': 9200}])
Start with search. q is the search text (surrounding spaces in q do not affect the results), size sets the number of hits to return, from_ sets the starting offset, and filter_path restricts which parts of the response are returned. In the second example below, only _id and _type are returned.
res_3 = es.search(index="bank", q="Holmes", size=1, from_=1)
res_4 = es.search(index="bank", q=" 39225 5686 ", size=1000, filter_path=['hits.hits._id', 'hits.hits._type'])
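For orientation, these keyword arguments correspond to URL parameters of the search endpoint. The sketch below only builds the request path with the standard library and sends nothing. The helper name build_search_path is made up for illustration. It assumes that the client's from_ argument travels as the from parameter and that filter_path is sent as a comma-separated list.

```python
from urllib.parse import urlencode


def build_search_path(index, **params):
    """Build the path and query string for a URI search (illustrative helper)."""
    # The Python client spells the argument `from_` because `from` is a keyword.
    if "from_" in params:
        params["from"] = params.pop("from_")
    # filter_path is sent as one comma-separated value.
    if "filter_path" in params:
        params["filter_path"] = ",".join(params["filter_path"])
    # Keep commas readable instead of percent-encoding them.
    return "/{}/_search?{}".format(index, urlencode(params, safe=","))


path = build_search_path("bank", q="Holmes", size=1, from_=1)
print(path)
```

Seeing the raw parameters can help when comparing a client call against a request made with curl or a browser.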
Query all data in the specified index:
Here, index specifies the index. A string names a single index; a list names multiple indexes, such as index=["bank", "banner", "country"]; a wildcard pattern selects every index that matches, such as index=["apple*"], which means all indexes whose names start with apple.
search can also be given a specific doc_type.
from elasticsearch_dsl import Search
response = Search(using=es, index="index-test").execute()
print(response.to_dict())
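The three ways of specifying index described above (a single name, a list of names, a wildcard pattern) can be pictured with a small local sketch. Elasticsearch resolves index names on the server, so this is only an illustration: resolve_indices and the index names are hypothetical, and fnmatch accepts more pattern syntax than Elasticsearch does.

```python
from fnmatch import fnmatchcase


def resolve_indices(index, existing):
    """Return the names in `existing` selected by a name, or a list of names/patterns."""
    patterns = [index] if isinstance(index, str) else list(index)
    return [name for name in existing
            if any(fnmatchcase(name, pattern) for pattern in patterns)]


# Hypothetical index names, used only for this illustration.
existing = ["bank", "banner", "country", "apple-2019", "apple-2020"]
print(resolve_indices("bank", existing))
print(resolve_indices(["bank", "country"], existing))
print(resolve_indices(["apple*"], existing))
```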
Query by a specific field. Multiple query conditions can be chained:
s = Search(using=es, index="index-test").query("match", sip="192.168.1.1")
s = s.query("match", dip="192.168.1.2")
response = s.execute()
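Chaining conditions this way means a document has to satisfy all of them. As a rough sketch of the kind of request body this corresponds to, the same query can be written as plain dictionaries. The helper names match and all_of are made up here, and the exact body produced by elasticsearch_dsl may be laid out differently.

```python
import json


def match(field, value):
    """Build a match clause for one field."""
    return {"match": {field: value}}


def all_of(*clauses):
    """Combine clauses so that every one of them must match."""
    return {"bool": {"must": list(clauses)}}


body = {"query": all_of(match("sip", "192.168.1.1"), match("dip", "192.168.1.2"))}
print(json.dumps(body, indent=2))
```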
Multi-field query:
from elasticsearch_dsl.query import MultiMatch, Match
multi_match = MultiMatch(query='hello', fields=['title', 'content'])
s = Search(using=es, index="index-test").query(multi_match)
response = s.execute()
print(response.to_dict())
You can also use a Q() object to query multiple fields. fields is a list, and query is the value to search for.
from elasticsearch_dsl import Q
q = Q("multi_match", query="hello", fields=['title', 'content'])
response = s.query(q).execute()
print(response.to_dict())
The first argument to Q() is the query type, which can also be bool.
q = Q('bool', must=[Q('match', title='hello'), Q('match', content='world')])
response = s.query(q).execute()
print(response.to_dict())
Q() objects can also be combined with operators, which is an equivalent way of writing the bool query above.
q = Q("match", title='python') | Q("match", title='django')
print(q.to_dict())
# {"bool": {"should": [...]}}
response = s.query(q).execute()
q = Q("match", title='python') & Q("match", title='django')
print(q.to_dict())
# {"bool": {"must": [...]}}
response = s.query(q).execute()
q = ~Q("match", title="python")
print(q.to_dict())
# {"bool": {"must_not": [...]}}
response = s.query(q).execute()
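To see why |, & and ~ map to should, must and must_not, here is a minimal stand-in built on Python's operator overloading. It is not the elasticsearch_dsl implementation: the Clause class and the match helper are invented for this sketch, and the real library may merge or simplify nested clauses.

```python
class Clause:
    """A tiny stand-in for a query object (not the elasticsearch_dsl class)."""

    def __init__(self, body):
        self.body = body

    def __or__(self, other):
        # a | b: at least one of the clauses should match
        return Clause({"bool": {"should": [self.body, other.body]}})

    def __and__(self, other):
        # a & b: both clauses must match
        return Clause({"bool": {"must": [self.body, other.body]}})

    def __invert__(self):
        # ~a: the clause must not match
        return Clause({"bool": {"must_not": [self.body]}})

    def to_dict(self):
        return self.body


def match(field, value):
    """Build a match clause for one field."""
    return Clause({"match": {field: value}})


python, django = match("title", "python"), match("title", "django")
print((python | django).to_dict())
print((python & django).to_dict())
print((~python).to_dict())
```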
Filtering. The example below uses a range filter: range is the filter type, timestamp is the name of the field to filter on, gte means greater than or equal to, and lt means less than. Set the bounds as needed.
On the difference between term and match: term looks for the exact value, whereas match analyzes the query text (word segmentation), matches more loosely, and scores the hits by relevance. (On an analyzed text field, a term query only hits when the value is given in lowercase; if it contains uppercase letters it returns nothing, that is, no hit. match finds the document regardless of case, and the result is the same either way.)
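The term versus match behaviour described above can be imitated locally. This is a deliberately rough sketch: it assumes the field is analyzed text whose analyzer lowercases and splits on whitespace (real analyzers do more), and the sample value "Sherlock Holmes" and the function names are made up.

```python
def analyze(text):
    """Very rough stand-in for an analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


# Index time: the field value is analyzed and only the tokens are kept.
indexed_tokens = set(analyze("Sherlock Holmes"))


def term_hit(value):
    """term: the value is looked up as-is, without analysis."""
    return value in indexed_tokens


def match_hit(text):
    """match: the query text is analyzed first, then any token may hit."""
    return any(token in indexed_tokens for token in analyze(text))


print(term_hit("Holmes"))   # uppercase letter, looked up as-is
print(term_hit("holmes"))   # lowercase, same form as the stored token
print(match_hit("HOLMES"))  # analyzed before lookup
```

Under these assumptions only the first lookup misses, because the stored token is lowercase and term does not analyze its input.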
import time
# Range filter combined with a match query
s = s.filter("range", timestamp={"gte": 0, "lt": time.time()}).query("match", country="in")
# Terms filter
res_3 = s.filter("terms", balance_num=["39225", "5686"]).execute()
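The bounds used by the range filter can be spelled out as a plain predicate. This sketch only mirrors the meaning of gte, gt, lte and lt for numbers; in_range is a hypothetical helper and does not touch Elasticsearch.

```python
import time


def in_range(value, gte=None, gt=None, lte=None, lt=None):
    """Return True if value satisfies every bound that was given."""
    if gte is not None and not value >= gte:
        return False
    if gt is not None and not value > gt:
        return False
    if lte is not None and not value <= lte:
        return False
    if lt is not None and not value < lt:
        return False
    return True


now = time.time()
bounds = {"gte": 0, "lt": now}
print(in_range(now - 60, **bounds))  # a timestamp from one minute ago
print(in_range(now, **bounds))       # the upper bound itself is excluded by lt
```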
Other ways to write filters:
s = Search()
s = s.filter('terms', tags=['search', 'python'])
print(s.to_dict())
# {'query': {'bool': {'filter': [{'terms': {'tags': ['search', 'python']}}]}}}
s = s.query('bool', filter=[Q('terms', tags=['search', 'python'])])
print(s.to_dict())
# {'query': {'bool': {'filter': [{'terms': {'tags': ['search', 'python']}}]}}}
s = s.exclude('terms', tags=['search', 'python'])
# or
s = s.query('bool', filter=[~Q('terms', tags=['search', 'python'])])
print(s.to_dict())
# {'query': {'bool': {'filter': [{'bool': {'must_not': [{'terms': {'tags': ['search', 'python']}}]}}]}}}
Aggregations can be stacked on top of queries, filters, and other operations. They are added through aggs.
bucket is a grouping. The first argument is the name of the group, which you choose yourself; the second is the aggregation type; the third specifies the field.
metric works the same way. The available metric types include sum, avg, max, min, and so on. Note that two types return several of these values at once, stats and extended_stats; the latter also returns values such as the variance.
# example 1
s.aggs.bucket("per_country", "terms", field="timestamp").metric("sum_click", "stats", field="click").metric("sum_request", "stats", field="request")
# example 2
s.aggs.bucket("per_age", "terms", field="click.keyword").metric("sum_click", "stats", field="click")
# example 3
s.aggs.metric("sum_age", "extended_stats", field="impression")
# example 4
s.aggs.bucket("per_age", "terms", field="country.keyword")
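# example 5: a range aggregation, which buckets by interval
from elasticsearch_dsl import A
a = A("range", field="account_number", ranges=[{"to": 10}, {"from": 11, "to": 21}])
s.aggs.bucket("per_account_range", a)  # the bucket name is arbitrary
res = s.execute()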
Finally, execute() still has to be called. Note that s.aggs modifies the search in place, so the result of an s.aggs call should not be assigned to a variable (for example, res = s.aggs... is wrong). The aggregation results are returned in res, under res.aggregations.
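What a terms bucket with a nested stats metric computes can be reproduced over a handful of in-memory records. The sketch below is pure Python and only shows the shape of the calculation (group by one field, then count, min, max, avg and sum of another). The documents and function names are made up, and the layout of a real Elasticsearch response differs.

```python
from collections import defaultdict


def stats(values):
    """Summarise numbers the way a stats metric does: count, min, max, avg, sum."""
    values = list(values)
    total = sum(values)
    return {"count": len(values), "min": min(values), "max": max(values),
            "avg": total / len(values), "sum": total}


def terms_with_stats(docs, group_field, value_field):
    """Group docs by group_field, then compute stats of value_field per group."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc[value_field])
    return {key: stats(values) for key, values in groups.items()}


docs = [  # made-up documents
    {"country": "in", "click": 3},
    {"country": "in", "click": 5},
    {"country": "us", "click": 10},
]
result = terms_with_stats(docs, "country", "click")
print(result["in"])
print(result["us"])
```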
Sort
s = Search().sort(
'category',
'-title',
{"lines" : {"order" : "asc", "mode" : "avg"}}
)
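In the sort call above, a leading minus on a field name asks for descending order, and a dict gives full control over one sort clause. The sketch below expands that shorthand by hand. normalize_sort is a hypothetical helper written for this illustration; it only roughly mirrors what the library does and skips any special cases the library may handle.

```python
def normalize_sort(*keys):
    """Expand shorthand sort keys into explicit sort clauses."""
    clauses = []
    for key in keys:
        if isinstance(key, str) and key.startswith("-"):
            # A leading minus means descending order on that field.
            clauses.append({key[1:]: {"order": "desc"}})
        else:
            # Plain field names and dict clauses are passed through unchanged.
            clauses.append(key)
    return clauses


print(normalize_sort(
    "category",
    "-title",
    {"lines": {"order": "asc", "mode": "avg"}},
))
```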
Pagination
s = s[10:20]
# {"from": 10, "size": 10}
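The slice-to-parameters arithmetic shown above is simple enough to write out. In this sketch, page_params converts slice bounds into from and size, and page_slice turns a 1-based page number into slice bounds. Both helper names are made up for illustration.

```python
def page_params(start, stop):
    """Translate a slice [start:stop] into from/size parameters."""
    if start < 0 or stop < start:
        raise ValueError("expected 0 <= start <= stop")
    return {"from": start, "size": stop - start}


def page_slice(page, per_page):
    """Return the (start, stop) slice bounds for a 1-based page number."""
    start = (page - 1) * per_page
    return start, start + per_page


print(page_params(10, 20))
print(page_params(*page_slice(page=2, per_page=10)))
```

Both calls describe the same page, so `s[10:20]` is the second page when each page holds ten hits.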
Some additional methods, for those who are interested:
s = Search()
# Set extra request body properties with the `.extra()` method
s = s.extra(explain=True)
# Set parameters using `.params()`
s = s.params(search_type="count")
# To restrict the returned fields, use the `source()` method
# only return the selected fields
s = s.source(['title', 'body'])
# don't return any fields, just the metadata
s = s.source(False)
# explicitly include/exclude fields
s = s.source(include=["title"], exclude=["user.*"])
# reset the field selection
s = s.source(None)
# Build a Search object from a dict
s = Search.from_dict({"query": {"match": {"title": "python"}}})
# Update an existing Search in place from a dict
s.update_from_dict({"query": {"match": {"title": "python"}}, "size": 42})