Solr TTL – Auto-Purging Solr Documents
In this blog, we will learn about auto-purging, the importance of TTL (Time-To-Live), and how to remove documents automatically from a collection.
Why we need to remove documents:
– Data stored in a collection keeps accumulating over time, which can cause storage to run out of space
– TTL solves this problem by removing documents older than the TTL value
How it works
Every document indexed into a Solr collection gets an event time and an expiration date.
The TTL is set by the user, based on company policy or available storage space. For example, to remove documents older than 7 days, set the TTL to 7 days (+7DAYS).
When a new document is indexed, Solr adds an expiration date to it based on the TTL value:
Event time – when the document is indexed in Solr
Expire date – event time + TTL
Once a document reaches its expiration date, the auto-purging mechanism in Solr cleans it up.
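To make the "event time + TTL" arithmetic concrete, here is a minimal Python sketch. It is only an illustration, not how Solr computes the value internally: the helper names parse_ttl and expire_at are made up for this example, and the parser only understands simple expressions such as +7DAYS or +12HOURS, while Solr date math supports more.

```python
import re
from datetime import datetime, timedelta, timezone

# Units this sketch understands (assumption: Solr date math supports more,
# such as MONTHS and YEARS, which are deliberately left out here).
_UNITS = {"SECOND": "seconds", "MINUTE": "minutes", "HOUR": "hours", "DAY": "days"}


def parse_ttl(ttl: str) -> timedelta:
    """Turn a simple TTL expression such as '+7DAYS' into a timedelta."""
    match = re.fullmatch(r"\+(\d+)(SECOND|MINUTE|HOUR|DAY)S?", ttl)
    if match is None:
        raise ValueError(f"unsupported TTL expression: {ttl!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(**{_UNITS[unit]: amount})


def expire_at(event_time: datetime, ttl: str) -> datetime:
    """Expire date = event time + TTL."""
    return event_time + parse_ttl(ttl)


# Hypothetical indexing time, chosen only for illustration.
indexed = datetime(2022, 9, 28, 15, 14, 18, tzinfo=timezone.utc)
print(expire_at(indexed, "+7DAYS").isoformat())  # 2022-10-05T15:14:18+00:00
```

With a TTL of +7DAYS, a document indexed on 2022-09-28 becomes eligible for purging on 2022-10-05.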
The TTL and expiration date of a document can be checked with the command below, which returns the oldest document in the collection (sorted by evtTime in ascending order):
curl -k --negotiate -u : "https://hostname.url:8995/solr/<Collection_name>/select?q=*%3A*&wt=json&indent=true&rows=1&sort=evtTime+asc"
Output:
{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":30414,
    "params":{
      "q":"*:*",
      "doAs":"solr/hostname.url@REALM",
      "indent":"true",
      "sort":"evtTime asc",
      "rows":"1",
      "_forwardedCount":"1",
      "wt":"json"}},
  "response":{"numFound":1315516995,"start":0,"docs":[
      {
        "id":"8f1bed1e-ab2d-42f5-8552-525b5e27fadf-0",
        "access":"open",
        "enforcer":"ranger-acl",
        "agent":"hdfs",
        "repo":"cm_hdfs",
        "evtTime":"2022-09-28T14:40:45.465Z",
        "seq_num":1,
        "event_count":1,
        "event_dur_ms":1,
        "_ttl_":"+7DAYS",
        "_expire_at_":"2022-10-05T15:14:18.452Z",
        "_version_":128719283791284}]
  }}
From the example above, we can confirm:
– TTL: 7 days
– Event creation date: 2022-09-28
– Expire date: 2022-10-05
So the document will be deleted automatically once it reaches its expiration date 🙂
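If you want to run this check from a script instead of reading the JSON by eye, a rough Python sketch could look like the one below. The helper names parse_solr_time and summarize_oldest are made up for this example, and the sample string is a trimmed copy of the response shown above with only the three fields that matter. Fetching the response (Kerberos authentication, TLS) is left out on purpose.

```python
import json
from datetime import datetime, timezone

# Timestamp layout used by the evtTime and _expire_at_ fields in the output above.
SOLR_TS = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_solr_time(value: str) -> datetime:
    """Parse a Solr UTC timestamp such as '2022-09-28T14:40:45.465Z'."""
    return datetime.strptime(value, SOLR_TS).replace(tzinfo=timezone.utc)


def summarize_oldest(response_text: str) -> dict:
    """Pull the TTL, event date and expire date out of a /select response."""
    doc = json.loads(response_text)["response"]["docs"][0]
    event = parse_solr_time(doc["evtTime"]).date()
    expires = parse_solr_time(doc["_expire_at_"]).date()
    return {
        "ttl": doc["_ttl_"],
        "event_date": event.isoformat(),
        "expire_date": expires.isoformat(),
        "days_kept": (expires - event).days,
    }


# Trimmed copy of the response above (other fields removed for brevity).
sample = (
    '{"response": {"docs": [{"evtTime": "2022-09-28T14:40:45.465Z", '
    '"_ttl_": "+7DAYS", "_expire_at_": "2022-10-05T15:14:18.452Z"}]}}'
)
print(summarize_oldest(sample))
```

For the sample document, days_kept comes out as 7, which matches the +7DAYS TTL.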
A separate post covers validating the TTL and checking the latest document and its expiration date.
To update the TTL of a Solr collection, follow the steps below. First, download the current config of the collection to a local directory
solrctl instancedir --get collection1 /tmp/collection1
Open the “solrconfig.xml” file
vi /tmp/collection1/conf/solrconfig.xml
Set the TTL to the desired value
<str name="fieldName">_ttl_</str><str name="value">+7DAYS</str>
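If you would rather script this edit than make it in vi, the change itself can be sketched in Python as below. Treat it as an illustration only: the set_ttl helper is made up for this example, and the wrapping config and processor elements in the sample (including the processor class name) are assumptions about how a solrconfig.xml might look around the two str elements shown above, so check them against your own file.

```python
import xml.etree.ElementTree as ET


def set_ttl(config_xml: str, new_ttl: str) -> str:
    """Return config_xml with the default value of the _ttl_ field replaced."""
    root = ET.fromstring(config_xml)
    changed = 0
    for parent in root.iter():
        # Only touch blocks whose fieldName entry is _ttl_.
        field_names = [el.text for el in parent.findall("str[@name='fieldName']")]
        if "_ttl_" not in field_names:
            continue
        for value in parent.findall("str[@name='value']"):
            value.text = new_ttl
            changed += 1
    if changed == 0:
        raise ValueError("no _ttl_ default value found in the config")
    return ET.tostring(root, encoding="unicode")


# Assumed surrounding structure; only the two <str> elements come from the post.
sample = """<config>
  <processor class="solr.DefaultValueUpdateProcessorFactory">
    <str name="fieldName">_ttl_</str>
    <str name="value">+7DAYS</str>
  </processor>
</config>"""

print(set_ttl(sample, "+30DAYS"))
```

One caveat: ElementTree drops XML comments when it parses a file, so for a real solrconfig.xml the manual edit keeps the rest of the file untouched, while this approach does not.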
Upload the new config to Solr
solrctl --jaas <Solr process>/jaas.conf instancedir --update collection1 /tmp/collection1
Once the config is updated, reload the collection for the change to take effect
solrctl collection --reload collection1
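As a recap, the three solrctl calls can be collected in a small Python helper that only prints them, so you can review the commands before running anything. This is a sketch: the function name ttl_update_commands and its parameters are made up for this example, and the jaas.conf location is passed through exactly as the placeholder used above. The manual edit of solrconfig.xml still happens between the first and second command.

```python
import shlex


def ttl_update_commands(config_name, collection, workdir, jaas_conf):
    """Build the solrctl calls from the steps above, in order, as argv lists."""
    return [
        # 1. Download the current config into a local directory.
        ["solrctl", "instancedir", "--get", config_name, workdir],
        # (edit <workdir>/conf/solrconfig.xml by hand before the next step)
        # 2. Upload the edited config back to Solr.
        ["solrctl", "--jaas", jaas_conf, "instancedir", "--update", config_name, workdir],
        # 3. Reload the collection so the new TTL takes effect.
        ["solrctl", "collection", "--reload", collection],
    ]


commands = ttl_update_commands(
    config_name="collection1",
    collection="collection1",
    workdir="/tmp/collection1",
    jaas_conf="<Solr process>/jaas.conf",  # placeholder path, as in the post
)
for argv in commands:
    print(shlex.join(argv))
```

Keeping each command as a list of arguments makes it easy to hand them to subprocess.run later without worrying about shell quoting.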
Good luck with your learning! Use the comments for any questions.