Solr TTL – Auto-Purging Solr Documents

In this blog, we will learn about auto-purging, why TTL (Time-To-Live) matters, and how to remove documents from a collection automatically.

Why we need to remove documents:

– Data stored in a collection keeps accumulating over time, which can cause the storage to run out of space

– TTL solves this problem by removing documents older than the TTL value

How it works

Every document that is indexed into a Solr collection gets an event time and an expiration date added to it.

The TTL can be set by the user, based on company policy or on the available storage space. Let’s say we want to remove documents older than 7 days: we then set the TTL value to 7 days (+7DAYS).

When a new document is indexed in Solr, an expiration date is added to it based on the TTL value

Event time – when the document gets indexed in Solr

Expire date – event time + TTL

Once a document reaches its expiration date, it is cleaned up by the auto-purging mechanism in Solr

This can be validated with the command below

curl -k --negotiate -u : "https://hostname.url:8995/solr/<Collection_name>/select?q=*%3A*&wt=json&ident=true&rows=1&sort=evtTime+asc"

Output:

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":30414,
    "params":{
      "q":"*:*",
      "doAs":"solr/hostname.url@REALM",
      "ident":"true",
      "sort":"evtTime asc",
      "rows":"1",
      "_forwardedCount":"1",
      "wt":"json"}},
  "response":{"numFound":1315516995,"start":0,"docs":[
      {
        "id":"8f1bed1e-ab2d-42f5-8552-525b5e27fadf-0",
        "access":"open",
        "enforcer":"ranger-acl",
        "agent":"hdfs",
        "repo":"cm_hdfs",
        "evtTime":"2022-09-28T14:40:45.465Z",
        "seq_num":1,
        "event_count":1,
        "event_dur_ms":1,
        "_ttl_":"+7DAYS",
        "_expire_at_":"2022-10-05T15:14:18.452Z",
        "_version_":128719283791284}]
  }}

In the above example, we can confirm:

TTL - 7 days
Event creation date - 2022-09-28
Expire date - 2022-10-05
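
The date arithmetic can be checked by hand. Solr's expiration processor evaluates the TTL relative to the time the document is indexed, which is why _expire_at_ in the output is about half an hour later than evtTime plus seven days. Below is a minimal sketch of the calculation, assuming GNU date is available; the function name expire_at and the index time passed to it are hypothetical, with the index time chosen to sit exactly 7 days before the _expire_at_ value shown above.

```shell
#!/usr/bin/env bash
# Add a TTL in whole days to a UTC timestamp and print the result in ISO 8601.
# Requires GNU date (the -d option); BSD/macOS date uses different flags.
expire_at() {
  local indexed_at="$1" ttl_days="$2"
  local epoch
  # Convert the timestamp to seconds since the epoch first, then add the TTL.
  epoch=$(date -u -d "$indexed_at" +%s) || return 1
  date -u -d "@$((epoch + ttl_days * 86400))" +%Y-%m-%dT%H:%M:%SZ
}

# Hypothetical index time, 7 days before the _expire_at_ in the sample output.
expire_at "2022-09-28 15:14:18 UTC" 7
# prints 2022-10-05T15:14:18Z
```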

So the document will be deleted automatically once it reaches its expiration date 🙂
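
To see whether any expired documents are still waiting to be purged, one option is to count the documents whose _expire_at_ is already in the past. This is only a sketch: the helper name expired_query_url is made up for illustration, the host and collection name are placeholders, and it assumes the expiration field is called _expire_at_ as in the output above.

```shell
#!/usr/bin/env bash
# Build a select URL that counts documents whose expiration date has passed.
# The query is q=_expire_at_:[* TO NOW], URL-encoded; rows=0 returns only numFound.
expired_query_url() {
  local solr_base="$1" collection="$2"
  printf '%s/%s/select?q=_expire_at_%%3A%%5B*+TO+NOW%%5D&rows=0&wt=json\n' \
    "$solr_base" "$collection"
}

# Placeholder host and collection name; replace with real values.
expired_query_url "https://hostname.url:8995/solr" "collection1"

# Against a live cluster, the URL can be passed to the same curl call used earlier:
#   curl -k --negotiate -u : "$(expired_query_url "https://hostname.url:8995/solr" "collection1")"
```

A numFound of 0, or a number that drops after the next purge cycle, suggests auto-purging is working.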


To update the TTL of a Solr collection, we can use the steps below

solrctl instancedir --get collection1 /tmp/collection1

Open the “solrconfig.xml” file

vi /tmp/collection1/conf/solrconfig.xml

Update the TTL value to the desired value

<str name="fieldName">_ttl_</str><str name="value">+7DAYS</str>
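
If you prefer not to edit the file by hand, the same change can be scripted. The sketch below is one possible approach, not part of solrctl: the function name set_ttl_days is hypothetical, and it assumes the TTL is the only <str name="value">+NDAYS</str> entry in solrconfig.xml.

```shell
#!/usr/bin/env bash
# Set the TTL (in days) in a local copy of solrconfig.xml.
# Assumes the TTL is the only <str name="value">+<N>DAYS</str> entry in the file.
set_ttl_days() {
  local config_file="$1" days="$2" tmp
  case "$days" in
    ''|*[!0-9]*) echo "days must be a whole number" >&2; return 1 ;;
  esac
  tmp=$(mktemp) || return 1
  # Write the edited copy to a temp file, then copy it back over the original,
  # so no stray backup file ends up in the conf directory that gets uploaded.
  sed -E "s#(<str name=\"value\">)\+[0-9]+DAYS(</str>)#\1+${days}DAYS\2#" \
    "$config_file" > "$tmp" && cat "$tmp" > "$config_file"
  local status=$?
  rm -f "$tmp"
  return "$status"
}

# Example: change the TTL to 14 days in the downloaded config, if it exists.
CONFIG=/tmp/collection1/conf/solrconfig.xml
if [ -f "$CONFIG" ]; then
  set_ttl_days "$CONFIG" 14
fi
```

It is worth checking the file after running it, since a config with several +NDAYS values would have all of them rewritten.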

Upload the new config to Solr

solrctl --jaas <Solr process>/jaas.conf instancedir --update collection1 /tmp/collection1

Once the config is updated, we need to reload the collection for the change to take effect

solrctl collection --reload collection1
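
After the reload, we can check that newly indexed documents carry the new TTL by looking at the most recent document. Since the expiration date is stamped at index time, documents indexed before the change most likely keep the _expire_at_ they already have. The sketch below is an assumption-laden example: json_field is a hypothetical helper that pulls a string field out of the JSON response with sed, and the host and collection name are placeholders.

```shell
#!/usr/bin/env bash
# Print the value of a string field from a Solr JSON response read on stdin.
# Simple sed-based extraction; assumes the field appears as "name":"value".
json_field() {
  sed -n "s/.*\"$1\":\"\([^\"]*\)\".*/\1/p"
}

# Newest document first, returning only the fields we care about.
# Placeholder host and collection name; replace with real values.
URL='https://hostname.url:8995/solr/collection1/select?q=*%3A*&wt=json&rows=1&sort=evtTime+desc&fl=evtTime,_ttl_,_expire_at_'

# Against a live cluster:
#   curl -k --negotiate -u : "$URL" | json_field _ttl_

# Offline demonstration using a line from the sample output shown earlier:
printf '        "_ttl_":"+7DAYS",\n' | json_field _ttl_
# prints +7DAYS
```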

Good luck with your learning! Use the comments for any questions.
