solr commit optimize - yaokun123/php-wiki GitHub Wiki

Solr soft commits and hard commits

Solr 4.0 added soft commits, which speed up indexing. The details are as follows:

A commit operation makes index changes visible to new search requests. A hard commit also calls fsync on the index files to ensure they have been flushed to stable storage and no data loss will result from a power failure.

A soft commit is much faster since it only makes index changes visible and does not fsync index files or write a new index descriptor. If the JVM crashes or there is a loss of power, changes that occurred after the last hard commit will be lost. Search collections that have near-real-time requirements (that want index changes to be quickly visible to searches) will want to soft commit often but hard commit less frequently.

An optimize is like a hard commit except that it forces all of the index segments to be merged into a single segment first. Depending on the use cases, this operation should be performed infrequently (like nightly), if at all, since it is very expensive and involves reading and re-writing the entire index. Segments are normally merged over time anyway (as determined by the merge policy), and optimize just forces these merges to occur immediately.

Example:

<commit/>
<optimize/>
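
The difference between visibility and durability described above can be illustrated with a toy model. This is only a sketch for building intuition, not how Solr is implemented: the class name `ToyIndex` and its methods are hypothetical, and it follows the simplified rule stated above that changes made after the last hard commit are lost on a crash.

```python
class ToyIndex:
    """Toy model of commit semantics (hypothetical, not Solr's implementation)."""

    def __init__(self):
        self.durable = []   # documents persisted by a hard commit
        self.visible = []   # documents the current searcher can see
        self.pending = []   # documents added since the last commit of any kind

    def add(self, doc):
        # Added documents are not searchable until some commit happens.
        self.pending.append(doc)

    def soft_commit(self):
        # Make pending changes visible without persisting them.
        self.visible.extend(self.pending)
        self.pending.clear()

    def hard_commit(self):
        # Make pending changes visible and persist everything visible.
        self.soft_commit()
        self.durable = list(self.visible)

    def crash(self):
        # After a crash only durably committed documents remain.
        self.pending.clear()
        self.visible = list(self.durable)


if __name__ == "__main__":
    index = ToyIndex()
    index.add("doc1")
    index.hard_commit()
    index.add("doc2")
    index.soft_commit()
    print(index.visible)  # doc2 is searchable but not yet durable
    index.crash()
    print(index.visible)  # only doc1 survives the crash
```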

Optional attributes for "commit" and "optimize"

  • waitFlush = "true" | "false" (default is true): block until index changes are flushed to disk. In Solr 1.4 and later (perhaps earlier as well) this attribute has no effect, and it is slated for removal in Solr 4.0.

  • waitSearcher = "true" | "false" (default is true): block until a new searcher is opened and registered as the main query searcher, making the changes visible.

  • softCommit = "true" | "false" (default is false): perform a soft commit, which refreshes the view of the index more cheaply but without on-disk guarantees. Available since Solr 4.0.

Optional attributes for "commit"

  • expungeDeletes = "true" | "false" (default is false): merge away segments that contain deleted documents. Available since Solr 1.4.

Optional attributes for "optimize"

  • maxSegments = N (default is 1): optimize down to at most this number of segments. Available since Solr 1.3.

Example of "commit" and "optimize" with optional attributes

<commit waitSearcher="false"/>
<commit waitSearcher="false" expungeDeletes="true"/>
<optimize waitSearcher="false"/>
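
When these messages are generated by a program, a small helper can keep the attribute formatting consistent. The following is a minimal sketch: the function name `update_command` is hypothetical, and it only builds the XML string; it does not validate attribute names or send anything to Solr.

```python
from xml.sax.saxutils import quoteattr


def update_command(name, **attrs):
    """Build a <commit/> or <optimize/> update message as a string."""
    if name not in ("commit", "optimize"):
        raise ValueError("name must be 'commit' or 'optimize'")
    parts = [name]
    for key, value in attrs.items():
        # Python booleans are written as lowercase "true"/"false",
        # matching the attribute values listed above.
        text = str(value).lower() if isinstance(value, bool) else str(value)
        parts.append(f"{key}={quoteattr(text)}")
    return "<" + " ".join(parts) + "/>"


if __name__ == "__main__":
    print(update_command("commit", waitSearcher=False, expungeDeletes=True))
    print(update_command("optimize", maxSegments=10))
```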

https://blog.csdn.net/htw2012/article/details/17136781

Chinese summary

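Summary: Solr's near-real-time (NRT) search means that documents become searchable almost immediately after they are indexed.

Solr does not block update operations because of the current commit, and it does not wait for background merges to finish; it searches the index directly and returns data (see the original article).

With NRT you can configure soft commits. A standard (hard) commit is expensive, whereas a soft commit makes changes searchable in near real time. Data is not lost as long as hard commits still run periodically to persist the changes.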

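Commits and Optimizing

A commit makes index changes visible to new search requests. The usual hard commit ensures, in a transactional manner, that the data is up to date, and it calls fsync so that the data is persisted. A soft commit is faster because it does not call fsync, which means data may be lost if the JVM crashes. With NRT, Solr can issue soft commits often and hard commits less frequently.

An optimize is much like a hard commit, except that it forces all index segments to be merged into one. It is rarely used, because it rewrites the entire index. Normally segments are merged automatically according to the configuration, and calling optimize only forces those merges to happen right away.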

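For soft commits, the following two parameters are commonly used:

Parameter   Description
maxDocs     Integer; the number of documents added since the last commit that triggers a new commit
maxTime     Long; the number of milliseconds after which pending documents are committed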

Auto commit

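autoCommit accepts the same two parameters, maxDocs and maxTime.

Typically, autoCommit is set to run every 1 to 10 minutes and autoSoftCommit every second. New documents then become searchable within about one second, and if something goes wrong, the only data lost is what was added after the last hard commit.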

<autoSoftCommit> 
    <maxTime>1000</maxTime> 
</autoSoftCommit>

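The snippet above configures automatic soft commits. In practice, setting maxTime works better than maxDocs, especially when the index volume is large. It is also generally recommended to disable autoSoftCommit for batch indexing requests.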
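
To make the recommended intervals concrete, the sketch below renders both elements with a maxTime value. The function name `auto_commit_config` is hypothetical, and the 5-minute hard-commit interval is just one value picked from the 1 to 10 minute range mentioned above; the output is a fragment to adapt, not a complete configuration file.

```python
def auto_commit_config(hard_ms, soft_ms):
    """Render autoCommit and autoSoftCommit elements, maxTime in milliseconds."""
    template = "<{tag}>\n    <maxTime>{ms}</maxTime>\n</{tag}>"
    return "\n".join(
        template.format(tag=tag, ms=int(ms))
        for tag, ms in (("autoCommit", hard_ms), ("autoSoftCommit", soft_ms))
    )


# Hard commit every 5 minutes, soft commit every second.
print(auto_commit_config(hard_ms=5 * 60 * 1000, soft_ms=1000))
```

With these values, a crash can lose at most roughly the last five minutes of additions, while new documents become searchable about one second after they are added.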

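Other parameters

Parameter        Default           Description
waitSearcher     true (boolean)    Whether to block until a new searcher is opened and registered as the main query searcher
softCommit       false (boolean)   Whether to perform a soft commit
expungeDeletes   false (boolean)   commit only; whether to merge away segments that contain deleted documents
maxSegments      1 (integer)       optimize only; the maximum number of segments to optimize down to

The following commands use these attributes: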

<commit waitSearcher="false"/>
<commit waitSearcher="false" expungeDeletes="true"/>
<optimize waitSearcher="false"/>

Using commit parameters in a URL

The following URL adds a test document and commits it, so that the document becomes visible immediately:

http://localhost:8983/solr/core0/update?stream.body=<add><doc> 
<field name="id">testdoc</field></doc></add>&commit=true

Next, you might use the following URL to optimize the index:

http://localhost:8983/solr/core0/update?stream.body=<optimize/>

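You can add more parameters, for example to optimize down to 10 segments without waiting for the operation to finish. Note that, according to the attribute list above, waitFlush has no effect in Solr 1.4 and later; waitSearcher=false is the parameter that avoids blocking until the new searcher is registered: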

http://localhost:8983/solr/core0/update?optimize=true&maxSegments=10&waitFlush=false
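
The URLs above contain characters such as `<`, `>`, quotes and spaces, which need percent-encoding when the request is built by a program rather than typed into a browser. A minimal sketch, assuming the same host and core as above; the helper name `update_url` is hypothetical, and the code only builds the URL without sending a request.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Same host and core as in the example URLs above.
BASE = "http://localhost:8983/solr/core0/update"
ADD_DOC = '<add><doc><field name="id">testdoc</field></doc></add>'


def update_url(params, base=BASE):
    """Return an update URL with percent-encoded query parameters."""
    return base + "?" + urlencode(params)


add_url = update_url({"stream.body": ADD_DOC, "commit": "true"})
optimize_url = update_url(
    {"optimize": "true", "maxSegments": 10, "waitSearcher": "false"}
)

# Decoding the query string gives back the original XML body.
assert parse_qs(urlsplit(add_url).query)["stream.body"] == [ADD_DOC]
print(add_url)
print(optimize_url)
```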

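Changing the default commitWithin behavior

The commitWithin parameter causes documents to be committed within a specified period of time, so it is often used for NRT search. By default this commit is a soft commit. In a master/slave setup this can mean that new documents are not replicated to the slaves, because only a hard commit triggers replication; a soft commit does not. If you need replication, commitWithin has to be configured to perform hard commits, for example: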

<commitWithin>
  <softCommit>false</softCommit>
</commitWithin>
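
On the client side, commitWithin can be set per request, for instance as an attribute of the `<add>` message with a value in milliseconds. The sketch below builds such a message; the function name `add_message` and the 10-second window are assumptions chosen for illustration, and nothing is sent to Solr.

```python
import xml.etree.ElementTree as ET


def add_message(fields, commit_within_ms):
    """Build an <add> message asking Solr to commit within the given time."""
    add = ET.Element("add", {"commitWithin": str(int(commit_within_ms))})
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        # ElementTree escapes special characters in names and values.
        field = ET.SubElement(doc, "field", {"name": name})
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Ask for the test document to be committed within 10 seconds.
print(add_message({"id": "testdoc"}, commit_within_ms=10000))
```

Whether that commit is soft or hard is decided by the server-side commitWithin setting shown above, not by the client.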