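1. Deleting Documents
Elasticsearch offers several ways to delete documents. The two most common are the Delete API, which removes one document by ID, and the Delete By Query API, which removes every document that matches a query.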
1.1 Using the Delete API
When you know a document's _index and _id, the Delete API is the most direct option:
$ curl -X DELETE "localhost:9200/customers/_doc/1"
{"_index":"customers","_id":"1","_version":3,"result":"deleted","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":20,"_primary_term":1}
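This command deletes the document with ID 1 from the customers index. If the document exists, Elasticsearch returns a JSON response whose result field is "deleted"; if it does not, the request returns HTTP 404 with "result": "not_found".
✅ Pros: simple, and well suited to deleting a small number of documents with known IDs.
❌ Cons: a poor fit for deleting many documents or deleting by condition, since each call removes exactly one document.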
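The same call can be sketched in Python using only the standard library. This is a minimal illustration, not the official client: the helper names build_delete_request and was_deleted are hypothetical, and BASE_URL simply mirrors the localhost address from the curl example. The request is built but not sent, so the snippet runs without a cluster.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Assumed base URL, matching the curl example above.
BASE_URL = "http://localhost:9200"


def build_delete_request(index: str, doc_id: str) -> Request:
    """Build (but do not send) a DELETE request for a single document."""
    # Percent-encode both parts so characters such as "/" stay inside
    # their own path segment.
    url = f"{BASE_URL}/{quote(index, safe='')}/_doc/{quote(doc_id, safe='')}"
    return Request(url, method="DELETE")


def was_deleted(response_body: str) -> bool:
    """Return True when the response reports result == "deleted"."""
    return json.loads(response_body).get("result") == "deleted"


req = build_delete_request("customers", "1")
print(req.get_method(), req.full_url)
```

To actually send the request, pass it to urllib.request.urlopen. A missing document produces an HTTP 404, which urlopen raises as urllib.error.HTTPError, so callers that treat "already gone" as success need to catch that case.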
1.2 Using the Delete By Query API
When you need to delete multiple documents that match a condition, the Delete By Query API is the better fit:
$ curl -X POST "localhost:9200/customers/_delete_by_query" -H 'Content-Type: application/json' -d'
{
  "query": {
    "range": {
      "last_purchase_date": {
        "lt": "now-1y"
      }
    }
  }
}'
{"took":258,"timed_out":false,"total":4,"deleted":4,"batches":1,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0,"failures":[]}
This command deletes every document in the customers index whose last_purchase_date is more than one year in the past.
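Because the response carries several counters, it can help to check them in code rather than by eye. The following sketch parses the sample response shown above; summarize_delete_by_query is a hypothetical helper, and treating "deleted equals total, no failures, no timeout" as complete is an assumption about what a caller would want, not a rule defined by Elasticsearch.

```python
import json

# The sample response from the example above, copied verbatim.
SAMPLE_RESPONSE = (
    '{"took":258,"timed_out":false,"total":4,"deleted":4,"batches":1,'
    '"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},'
    '"throttled_millis":0,"requests_per_second":-1.0,'
    '"throttled_until_millis":0,"failures":[]}'
)


def summarize_delete_by_query(response_body: str) -> dict:
    """Extract the fields that tell whether the deletion fully succeeded."""
    resp = json.loads(response_body)
    return {
        "matched": resp["total"],
        "deleted": resp["deleted"],
        "version_conflicts": resp["version_conflicts"],
        # Assumed definition of "complete" for this sketch.
        "complete": (
            not resp["timed_out"]
            and not resp["failures"]
            and resp["deleted"] == resp["total"]
        ),
    }


print(summarize_delete_by_query(SAMPLE_RESPONSE))
```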
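⚠️ Caveats:
- The operation is not atomic: if it fails partway through, the documents deleted up to that point stay deleted.
- On large datasets it can be resource-intensive, so run it during off-peak hours.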
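✅ Tip: cap the number of documents a single request deletes with the max_docs parameter (older versions call it size, which is deprecated in recent releases), and consider requests_per_second to throttle the deletion and reduce load on the cluster.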
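A capped, throttled request might be assembled as follows. This is a sketch under assumptions: build_delete_by_query is a hypothetical helper, and it uses max_docs (the name recent Elasticsearch releases use for the document cap) together with the requests_per_second and scroll_size query parameters. Check the parameter names against the documentation for your cluster version before relying on them.

```python
import json
from urllib.parse import urlencode


def build_delete_by_query(index, field, older_than, max_docs=None,
                          requests_per_second=None, scroll_size=None):
    """Return (path, json_body) for a range-based Delete By Query request."""
    params = {}
    if requests_per_second is not None:
        # Throttles the operation to roughly this many deletions per second.
        params["requests_per_second"] = requests_per_second
    if scroll_size is not None:
        # Number of documents fetched and deleted per batch.
        params["scroll_size"] = scroll_size

    path = f"/{index}/_delete_by_query"
    if params:
        path += "?" + urlencode(params)

    body = {"query": {"range": {field: {"lt": older_than}}}}
    if max_docs is not None:
        # Upper bound on how many documents this request may delete.
        body["max_docs"] = max_docs
    return path, json.dumps(body)


path, body = build_delete_by_query(
    "customers", "last_purchase_date", "now-1y",
    max_docs=1000, requests_per_second=500,
)
print("POST", path)
print(body)
```

Running the request repeatedly with a modest max_docs until deleted reaches 0 is one way to spread a large cleanup over time.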
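2. Bulk Deletion
When you need to delete a large number of documents by ID, the Bulk API is much more efficient than issuing one Delete request per document.
2.1 Bulk Deletion with the Python Client
The following example uses Python with the elasticsearch-py client: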
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["http://localhost:9200"])

def generate_actions(inactive_customer_ids):
    """Yield one bulk delete action per customer ID."""
    for customer_id in inactive_customer_ids:
        yield {
            "_op_type": "delete",
            "_index": "customers",
            "_id": customer_id,
        }

inactive_customer_ids = ["3", "5", "8"]
# helpers.bulk returns (number of successful actions, list of errors);
# by default it raises BulkIndexError if any action fails.
response = helpers.bulk(es, generate_actions(inactive_customer_ids))
print(f"Deleted {response[0]} documents")
Output:
$ python3 bulk-removal.py
Deleted 3 documents
✅ Advantages:
- Fewer network round trips
- Higher deletion throughput, suited to large-scale operations
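The round-trip saving comes from the wire format: the Bulk API accepts newline-delimited JSON, one action per line, in a single POST to /_bulk with Content-Type application/x-ndjson. The sketch below builds such a body by hand to show what the client helper sends on your behalf; build_bulk_delete_body is a hypothetical name used only for illustration.

```python
import json


def build_bulk_delete_body(index, doc_ids):
    """Build the NDJSON body for deleting the given IDs in one bulk request."""
    # A delete action is a single line; unlike index or update actions,
    # it is not followed by a source document line.
    lines = [
        json.dumps({"delete": {"_index": index, "_id": doc_id}})
        for doc_id in doc_ids
    ]
    # Every line, including the last, must end with a newline.
    return "".join(line + "\n" for line in lines)


body = build_bulk_delete_body("customers", ["3", "5", "8"])
print(body, end="")
```

Three documents are removed with one HTTP request instead of three, which is where the efficiency gain over repeated Delete API calls comes from.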
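3. Index-Level Deletion
When you need to remove an entire index or a large share of its data, index-level operations are more efficient.
3.1 Deleting an Entire Index
Deleting the whole index is the fastest approach: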
$ curl -X DELETE "localhost:9200/customers"
{"acknowledged":true}
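⚠️ Warning: this operation is irreversible. Once the index is deleted, its data cannot be recovered unless you have a snapshot or another backup.
✅ Use cases: removing log indices, test data, and similar disposable data.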
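Since the deletion cannot be undone, scripts that issue it may want a client-side guard against wildcard or multi-index targets. The following is a minimal sketch of such a guard; index_delete_path is a hypothetical helper, and the set of rejected patterns is an assumption chosen for illustration. Elasticsearch also has a cluster setting, action.destructive_requires_name, intended to enforce a similar restriction on the server side; check its default for your version.

```python
def index_delete_path(index_name: str) -> str:
    """Return the path for DELETE /<index>, rejecting multi-index targets."""
    if not index_name:
        raise ValueError("index name must not be empty")
    # "_all", wildcards, and comma-separated lists can each match many indices.
    if index_name == "_all" or "*" in index_name or "," in index_name:
        raise ValueError(f"refusing to delete multi-index target: {index_name!r}")
    return f"/{index_name}"


print("DELETE", index_delete_path("customers"))
```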
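3.2 Zero-Downtime Index Rebuilds with Aliases
If you need to remove part of the data without affecting service availability, you can combine an alias with a reindex. The approach assumes that clients read through the alias rather than the index name: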
- Create the alias
$ curl -X POST "localhost:9200/_aliases" -H 'Content-Type: application/json' -d'
{
  "actions": [
    { "add": { "index": "customers", "alias": "current_customers" }}
  ]
}'
{"acknowledged":true,"errors":false}
- Create the new index
$ curl -X PUT "localhost:9200/customers_v2" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "email": { "type": "keyword" },
      "name": { "type": "text" }
    }
  }
}'
{"acknowledged":true,"shards_acknowledged":true,"index":"customers_v2"}
- Reindex, excluding inactive users
$ curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "customers",
    "query": {
      "bool": {
        "must_not": {
          "term": { "status": "inactive" }
        }
      }
    }
  },
  "dest": {
    "index": "customers_v2"
  }
}'
{"took":251,"timed_out":false,"total":7,"updated":0,"created":7,"deleted":0,"batches":1,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0,"failures":[]}
- Point the alias at the new index
$ curl -X POST "localhost:9200/_aliases" -H 'Content-Type: application/json' -d'
{
  "actions": [
    { "remove": { "index": "customers", "alias": "current_customers" }},
    { "add": { "index": "customers_v2", "alias": "current_customers" }}
  ]
}'
{"acknowledged":true,"errors":false}
✅ Advantages:
- Zero downtime
- The reindex step can also clean, transform, or remap the data
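The final switch is safe because the remove and add actions travel in one _aliases request, which Elasticsearch applies atomically, so the alias never points at zero indices. A script driving these steps might verify the reindex result before swapping. The sketch below shows one way to do that; build_alias_swap and reindex_succeeded are hypothetical helpers, and the success criterion (no timeout, no failures, created plus updated equals total) is an assumption for illustration.

```python
import json

# The sample reindex response from the steps above, copied verbatim.
REINDEX_RESPONSE = (
    '{"took":251,"timed_out":false,"total":7,"updated":0,"created":7,'
    '"deleted":0,"batches":1,"version_conflicts":0,"noops":0,'
    '"retries":{"bulk":0,"search":0},"throttled_millis":0,'
    '"requests_per_second":-1.0,"throttled_until_millis":0,"failures":[]}'
)


def reindex_succeeded(response_body: str) -> bool:
    """Assumed success check: every matched document was written."""
    resp = json.loads(response_body)
    return (
        not resp["timed_out"]
        and not resp["failures"]
        and resp["created"] + resp["updated"] == resp["total"]
    )


def build_alias_swap(alias: str, old_index: str, new_index: str) -> dict:
    """Build the _aliases body that moves an alias in a single request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if reindex_succeeded(REINDEX_RESPONSE):
    swap = build_alias_swap("current_customers", "customers", "customers_v2")
    print("POST /_aliases")
    print(json.dumps(swap, indent=2))
```

Note that documents written to the old index after the reindex starts are not copied automatically, so writes may need to be paused or replayed before the swap.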
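4. Summary
This article covered several ways to delete data from Elasticsearch:
- Delete API: for deleting a single document with a known ID
- Delete By Query API: for deleting documents that match a condition, keeping in mind that it is not atomic and can be resource-intensive
- Bulk API: for efficiently deleting many documents at once
- Index-level operations: deleting an entire index, or rebuilding an index behind an alias with zero downtime
Choosing the right method for each case keeps the data manageable and makes efficient use of cluster performance and storage.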