solr optimize - yaokun123/php-wiki GitHub Wiki
As most of you know, Solr has two concepts for submitting index changes: commit and optimize. Let's take a closer look at both:
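When you send index updates to Solr, the changes only take effect after you run a commit. That does not mean you have to commit after every submission, though: if the changes are not urgent, you can send several batches and then issue a single commit.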
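A minimal sketch of that batching pattern in Python, using only the standard library. It assumes a Solr instance on the default port 8983 with a hypothetical core named `mycore`, and uses Solr's JSON update format (a JSON array of documents POSTed to `/solr/<core>/update`). Only the last request carries `commit=true`. The requests are built here but not sent.

```python
import json
import urllib.request


def build_update_requests(base_url, core, batches):
    """Build one POST per batch of documents, committing only on the last one.

    Assumes a Solr core reachable at {base_url}/solr/{core} whose /update
    handler accepts a JSON array of documents. Nothing is sent here.
    """
    requests = []
    for i, batch in enumerate(batches):
        url = f"{base_url}/solr/{core}/update"
        # Only the final batch asks Solr to commit; earlier batches are not
        # visible to searches until then (assuming no autoCommit is configured)
        if i == len(batches) - 1:
            url += "?commit=true"
        req = urllib.request.Request(
            url,
            data=json.dumps(batch).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        requests.append(req)
    return requests


# Hypothetical documents and core name, for illustration only
batches = [
    [{"id": "1", "title": "first"}, {"id": "2", "title": "second"}],
    [{"id": "3", "title": "third"}],
]
reqs = build_update_requests("http://localhost:8983", "mycore", batches)
for r in reqs:
    print(r.get_method(), r.full_url)
    # To actually send: urllib.request.urlopen(r)
```

Sending many small batches and committing once avoids opening a new searcher after every batch.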
The `<commit>` operation writes all documents loaded since the last commit to one or more segment files on the disk. Before a commit has been issued, newly indexed content is not visible to searches. The commit operation opens a new searcher, and triggers any event listeners that have been configured.
---------------------
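In other words, commit writes all pending document updates to the index on disk. Until a commit is issued, newly indexed content is not visible to searches.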
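To make the visibility rule concrete, here is a deliberately simplified toy model. The class `ToyIndex` is hypothetical and is not how Solr or Lucene is implemented: added documents sit in a pending buffer, and searches only see what the last commit made visible.

```python
class ToyIndex:
    """Toy model of commit visibility (not Solr's implementation).

    Added documents wait in a pending buffer; searches only see documents
    that were flushed by the last commit, which "opens a new searcher".
    """

    def __init__(self):
        self._pending = {}  # docs added since the last commit
        self._visible = {}  # docs the current searcher can see

    def add(self, doc_id, doc):
        self._pending[doc_id] = doc

    def commit(self):
        # Flush pending docs and make them visible to subsequent searches
        self._visible.update(self._pending)
        self._pending.clear()

    def search(self, doc_id):
        return self._visible.get(doc_id)


idx = ToyIndex()
idx.add("1", {"title": "hello"})
print(idx.search("1"))  # None: not committed yet
idx.commit()
print(idx.search("1"))  # {'title': 'hello'}
```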
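optimize is a bit like defragmenting a hard disk. To speed up searches, it reorganizes the index into fewer segments and removes documents that have been deleted or replaced. Note that Solr has no in-place update operation, only add and delete: when a document is updated, Solr marks the old copy as deleted and indexes a new document in its place, and the deleted copy keeps taking up space until a merge removes it. optimize is the operation that performs this cleanup. Because the new, merged index is written before the old segments are removed, the index first grows and then shrinks during an optimize. optimize effectively builds a brand-new index structure, so you need free disk space of roughly twice the size of the index at commit time.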
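The following toy model illustrates why the index grows and then shrinks. It is a simplification under stated assumptions, not Lucene's actual merge logic: each add or update creates its own one-document segment, sizes are counted in stored documents rather than bytes, and `add_or_update` and `optimize` are hypothetical helper names. An update marks the old copy deleted and appends a new one; optimize writes one merged segment without the deleted copies, and the peak size counts old and new segments together, since both exist until the old ones are removed.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    docs: dict = field(default_factory=dict)   # doc_id -> doc
    deleted: set = field(default_factory=set)  # ids marked deleted, still on disk

    def size(self):
        # Deleted docs still occupy space until a merge rewrites the segment
        return len(self.docs)


def add_or_update(segments, doc_id, doc):
    """No in-place update: mark any live old copy deleted, then add a new one."""
    for seg in segments:
        if doc_id in seg.docs and doc_id not in seg.deleted:
            seg.deleted.add(doc_id)
    segments.append(Segment(docs={doc_id: doc}))


def optimize(segments):
    """Merge all segments into one, dropping deleted docs.

    Returns (new_segments, peak_size). The merged segment is written before
    the old ones are removed, so the peak size is old size + merged size.
    """
    merged = Segment()
    for seg in segments:
        for doc_id, doc in seg.docs.items():
            if doc_id not in seg.deleted:
                merged.docs[doc_id] = doc
    old_size = sum(s.size() for s in segments)
    return [merged], old_size + merged.size()


segs = []
for i in range(4):
    add_or_update(segs, str(i), {"v": 1})
add_or_update(segs, "0", {"v": 2})  # replaces doc 0
add_or_update(segs, "1", {"v": 2})  # replaces doc 1
before = sum(s.size() for s in segs)  # 6 stored copies, 2 of them deleted
segs, peak = optimize(segs)
after = sum(s.size() for s in segs)
print(before, peak, after)  # 6 10 4
```

With few deletions, the peak is close to twice the index size, which is where the "reserve about 2x disk space" advice comes from.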
The `<optimize>` operation requests Solr to merge internal data structures in order to improve search performance. For a large index, optimization will take some time to complete, but by merging many small segment files into a larger one, search performance will improve. If you are using Solr’s replication mechanism to distribute searches across many systems, be aware that after an optimize, a complete index will need to be transferred. In contrast, post-commit transfers are usually much smaller.
---------------------
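So optimize merges internal data structures to improve search performance. On a large index it takes a while, but merging many small segments into one large segment makes searches faster. Be aware that if you use Solr replication, the complete index has to be transferred to the replicas after an optimize; by comparison, the transfer after an ordinary commit is usually much smaller.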
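A simplified model of why an optimize forces a full transfer. It relies on the fact that Lucene index files are write-once, and assumes (as a simplification) that a replica downloads exactly the files it does not already hold by name; Solr's real replication also checks sizes and checksums. The file listings and the helper name `bytes_to_fetch` are hypothetical.

```python
def bytes_to_fetch(source, replica):
    """Sum the sizes of index files the replica lacks (simplified model).

    source, replica: dicts mapping index file name -> size in bytes.
    Because segment files are write-once, a file the replica already holds
    under the same name is assumed unchanged and is skipped.
    """
    return sum(size for name, size in source.items() if name not in replica)


# Hypothetical file listings: file name -> size in bytes
replica = {"_0.cfs": 400, "_1.cfs": 300, "_2.cfs": 300, "segments_3": 1}

# After a commit: one small new segment plus a new segments_N file
after_commit = {"_0.cfs": 400, "_1.cfs": 300, "_2.cfs": 300,
                "_3.cfs": 20, "segments_4": 1}

# After an optimize: everything rewritten into a single new segment
after_optimize = {"_4.cfs": 1010, "segments_5": 1}

print(bytes_to_fetch(after_commit, replica))    # 21
print(bytes_to_fetch(after_optimize, replica))  # 1011
```

After the commit only the new segment and `segments_N` file move; after the optimize every file name is new, so the whole index is copied.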
Reference: http://xiaofeng.iteye.com/blog/1299148
He also explains a few of the optional attributes:
Optional Attribute | Description |
---|---|
waitSearcher | Default is true. Blocks until a new searcher is opened and registered as the main query searcher, making the changes visible. |
expungeDeletes | (commit only) Default is false. Merges segments that have more than 10% deleted docs, expunging them in the process. |
maxSegments | (optimize only) Default is 1. Merges the segments down to no more than this number of segments. |
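These attributes are set on the XML update messages that Solr's `/update` handler accepts, for example `<commit waitSearcher="false"/>`. The sketch below builds such messages with `xml.etree.ElementTree`, using the attribute names Solr parses (`waitSearcher`, `expungeDeletes`, `maxSegments`), and rejects an attribute when the table restricts it to the other operation. `update_message` is a hypothetical helper; it only builds the XML string, and sending it (for instance as a `text/xml` POST to `/solr/<core>/update`) is left out.

```python
import xml.etree.ElementTree as ET

# Optional attributes per operation, following the table
# (waitSearcher applies to both operations)
ALLOWED = {
    "commit": {"waitSearcher", "expungeDeletes"},
    "optimize": {"waitSearcher", "maxSegments"},
}


def update_message(op, **attrs):
    """Build an XML update message such as <commit waitSearcher="false" />.

    Raises ValueError for unknown operations or attributes the table
    does not list for that operation.
    """
    if op not in ALLOWED:
        raise ValueError(f"unknown operation: {op}")
    unknown = set(attrs) - ALLOWED[op]
    if unknown:
        raise ValueError(f"{op} does not take {sorted(unknown)}")
    elem = ET.Element(op)
    for name, value in attrs.items():
        # Write booleans as lowercase true/false, as in Solr's examples
        elem.set(name, str(value).lower() if isinstance(value, bool) else str(value))
    return ET.tostring(elem, encoding="unicode")


print(update_message("commit", waitSearcher=False, expungeDeletes=True))
print(update_message("optimize", maxSegments=4))
```

Passing `maxSegments` greater than 1 asks for a partial optimize, which is cheaper than merging all the way down to a single segment.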