[Elasticsearch] Retrieving Large Result Sets (2) - Scroll vs Search After

이 가을 2020. 7. 9. 17:03

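Elasticsearch offers two ways to retrieve a large number of documents.

One is Scroll, and the other is Search After.

Both are meant for retrieving large result sets, but they work differently, so it is recommended to choose between them according to your purpose and situation.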

Scroll

While a search request returns a single “page” of results, the scroll API can be used to retrieve large numbers of results (or even all results) from a single search request, in much the same way as you would use a cursor on a traditional database.

Scrolling is not intended for real time user requests, but rather for processing large amounts of data, e.g. in order to reindex the contents of one index into a new index with a different configuration.
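As a rough illustration of the cursor-like flow described above, here is a minimal Python sketch of a scroll loop. The `send(method, path, body)` callable is a hypothetical stand-in for whatever HTTP client you use, and the index name, page size, and `1m` keep-alive are assumptions chosen for the example. The request shapes follow the 7.x REST API as I understand it.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical transport: send(method, path, body) -> parsed JSON response.
Send = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def scroll_all(send: Send, index: str, query: Dict[str, Any],
               page_size: int = 1000,
               keep_alive: str = "1m") -> Iterator[Dict[str, Any]]:
    """Yield every hit matching `query` by walking a scroll."""
    # The first request opens the scroll context and returns the first page.
    # Sorting by _doc is the cheapest order when the order does not matter.
    resp = send("POST", f"/{index}/_search?scroll={keep_alive}",
                {"size": page_size, "query": query, "sort": ["_doc"]})
    scroll_id = resp.get("_scroll_id")
    try:
        while resp["hits"]["hits"]:
            yield from resp["hits"]["hits"]
            # Each follow-up request sends the latest scroll id and
            # renews the keep-alive of the search context.
            resp = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Release the search context instead of waiting for it to expire.
        if scroll_id is not None:
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})
```

The scroll keeps a search context open on the server between requests, which is why the sketch clears it explicitly once iteration ends.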

Search After

Pagination of results can be done by using the from and size but the cost becomes prohibitive when the deep pagination is reached. The index.max_result_window which defaults to 10,000 is a safeguard, search requests take heap memory and time proportional to from + size. The Scroll api is recommended for efficient deep scrolling but scroll contexts are costly and it is not recommended to use it for real time user requests. The search_after parameter circumvents this problem by providing a live cursor. The idea is to use the results from the previous page to help the retrieval of the next page.
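The idea of using the previous page to fetch the next one can be sketched as follows. This is a minimal Python example, not a definitive implementation: `send(method, path, body)` is a hypothetical stand-in for your HTTP client, and the sort fields in the usage note (`date`, `tie_breaker_id`) are assumed names. The sort should end with a field that is unique per document so that no hit is skipped or repeated between pages.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical transport: send(method, path, body) -> parsed JSON response.
Send = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def search_after_all(send: Send, index: str, query: Dict[str, Any],
                     sort: List[Dict[str, str]],
                     page_size: int = 1000) -> Iterator[Dict[str, Any]]:
    """Yield every hit matching `query`, paging with search_after."""
    body: Dict[str, Any] = {"size": page_size, "query": query, "sort": sort}
    while True:
        resp = send("POST", f"/{index}/_search", body)
        hits = resp["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # The sort values of the last hit become the cursor for the
        # next page; `from` is left unset (it defaults to 0).
        body = {**body, "search_after": hits[-1]["sort"]}
```

A call might look like `search_after_all(send, "my-index", {"match_all": {}}, [{"date": "asc"}, {"tie_breaker_id": "asc"}])`. Unlike a scroll, no search context is held on the server, so each page reflects the index as it is at request time.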

 

References

https://www.elastic.co/guide/en/elasticsearch/reference/7.8/search-request-body.html#request-body-search-search-after