Class IndexWriterConfig
Holds all the configuration that is used to create an IndexWriter. Once IndexWriter has been created with this object, changes to this object will not affect the IndexWriter instance. For that, use LiveIndexWriterConfig that is returned from IndexWriter.getConfig().
All setter methods return IndexWriterConfig to allow chaining settings conveniently,
for example:
IndexWriterConfig conf = new IndexWriterConfig(analyzer); conf.setter1().setter2();
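A fuller sketch of the same chaining pattern (the index path and buffer size are illustrative only):

  import java.nio.file.Paths;
  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.index.IndexWriterConfig;
  import org.apache.lucene.index.IndexWriterConfig.OpenMode;
  import org.apache.lucene.store.FSDirectory;

  // Build the config once; after the writer is created, later changes must go
  // through writer.getConfig(), which returns a LiveIndexWriterConfig.
  IndexWriterConfig conf = new IndexWriterConfig(new StandardAnalyzer())
      .setOpenMode(OpenMode.CREATE_OR_APPEND)
      .setRAMBufferSizeMB(64.0);
  try (IndexWriter writer = new IndexWriter(FSDirectory.open(Paths.get("/tmp/example-index")), conf)) {
      // add or update documents here
  }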
- Since:
- 3.1
-
Nested Class Summary
Nested Classes
- static enum IndexWriterConfig.OpenMode: Specifies the open mode for IndexWriter.
-
Field Summary
Fields
- static final boolean DEFAULT_COMMIT_ON_CLOSE: Default value for whether calls to IndexWriter.close() include a commit.
- static final int DEFAULT_MAX_BUFFERED_DELETE_TERMS: Disabled by default (because IndexWriter flushes by RAM usage by default).
- static final int DEFAULT_MAX_BUFFERED_DOCS: Disabled by default (because IndexWriter flushes by RAM usage by default).
- static final long DEFAULT_MAX_FULL_FLUSH_MERGE_WAIT_MILLIS: Default value for time to wait for merges on commit or getReader (when using a MergePolicy that implements MergePolicy.findFullFlushMerges(org.apache.lucene.index.MergeTrigger, org.apache.lucene.index.SegmentInfos, org.apache.lucene.index.MergePolicy.MergeContext)).
- static final double DEFAULT_RAM_BUFFER_SIZE_MB: Default value is 16 MB (which means flush when buffered docs consume approximately 16 MB RAM).
- static final int DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB: Default value is 1945.
- static final boolean DEFAULT_READER_POOLING: Default setting (true) for setReaderPooling(boolean).
- static final boolean DEFAULT_USE_COMPOUND_FILE_SYSTEM: Default value for compound file system for newly written segments (set to true).
- static final int DISABLE_AUTO_FLUSH: Denotes a flush trigger is disabled.
Fields inherited from class org.apache.lucene.index.LiveIndexWriterConfig
checkPendingFlushOnUpdate, codec, commit, commitOnClose, createdVersionMajor, delPolicy, eventListener, flushPolicy, indexSort, indexSortFields, infoStream, leafSorter, maxFullFlushMergeWaitMillis, mergePolicy, mergeScheduler, openMode, parentField, perThreadHardLimitMB, readerPooling, similarity, softDeletesField, useCompoundFile -
Constructor Summary
Constructors
- IndexWriterConfig(): Creates a new config, using StandardAnalyzer as the analyzer.
- IndexWriterConfig(Analyzer analyzer): Creates a new config with the provided Analyzer.
-
Method Summary
Methods
- Analyzer getAnalyzer(): Returns the default analyzer to use for indexing documents.
- Codec getCodec(): Returns the current Codec.
- IndexCommit getIndexCommit(): Returns the IndexCommit as specified in setIndexCommit(IndexCommit) or the default, null, which specifies to open the latest index commit point.
- IndexDeletionPolicy getIndexDeletionPolicy(): Returns the IndexDeletionPolicy specified in setIndexDeletionPolicy(IndexDeletionPolicy) or the default KeepOnlyLastCommitDeletionPolicy.
- InfoStream getInfoStream(): Returns the InfoStream used for debugging.
- int getMaxBufferedDocs(): Returns the number of buffered added documents that will trigger a flush if enabled.
- IndexWriter.IndexReaderWarmer getMergedSegmentWarmer(): Returns the current merged segment warmer.
- MergePolicy getMergePolicy(): Returns the current MergePolicy in use by this writer.
- MergeScheduler getMergeScheduler(): Returns the MergeScheduler that was set by setMergeScheduler(MergeScheduler).
- IndexWriterConfig.OpenMode getOpenMode(): Returns the IndexWriterConfig.OpenMode set by setOpenMode(OpenMode).
- double getRAMBufferSizeMB(): Returns the value set by LiveIndexWriterConfig.setRAMBufferSizeMB(double) if enabled.
- int getRAMPerThreadHardLimitMB(): Returns the max amount of memory each DocumentsWriterPerThread can consume until forcefully flushed.
- boolean getReaderPooling(): Returns true if IndexWriter should pool readers even if DirectoryReader.open(IndexWriter) has not been called.
- Similarity getSimilarity(): Expert: returns the Similarity implementation used by this IndexWriter.
- IndexWriterConfig setCheckPendingFlushUpdate(boolean checkPendingFlushOnUpdate): Expert: sets if indexing threads check for pending flushes on update in order to help flush indexing buffers to disk.
- IndexWriterConfig setCodec(Codec codec): Set the Codec.
- IndexWriterConfig setCommitOnClose(boolean commitOnClose): Sets if calls to IndexWriter.close() should first commit before closing.
- IndexWriterConfig setIndexCommit(IndexCommit commit): Expert: allows to open a certain commit point.
- IndexWriterConfig setIndexCreatedVersionMajor(int indexCreatedVersionMajor): Expert: set the compatibility version to use for this index.
- IndexWriterConfig setIndexDeletionPolicy(IndexDeletionPolicy delPolicy): Expert: allows an optional IndexDeletionPolicy implementation to be specified.
- IndexWriterConfig setIndexSort(Sort sort): Set the Sort order to use for all (flushed and merged) segments.
- IndexWriterConfig setIndexWriterEventListener(IndexWriterEventListener eventListener): Set the event listener to record key events in IndexWriter.
- IndexWriterConfig setInfoStream(PrintStream printStream): Convenience method that uses PrintStreamInfoStream.
- IndexWriterConfig setInfoStream(InfoStream infoStream): Information about merges, deletes and a message when maxFieldLength is reached will be printed to this.
- IndexWriterConfig setLeafSorter(Comparator<LeafReader> leafSorter): Set the comparator for sorting leaf readers.
- IndexWriterConfig setMaxBufferedDocs(int maxBufferedDocs): Determines the minimal number of documents required before the buffered in-memory documents are flushed as a new segment.
- IndexWriterConfig setMaxFullFlushMergeWaitMillis(long maxFullFlushMergeWaitMillis): Expert: sets the amount of time to wait for merges (during IndexWriter.commit() or IndexWriter.getReader(boolean, boolean)) returned by MergePolicy.findFullFlushMerges(...).
- IndexWriterConfig setMergedSegmentWarmer(IndexWriter.IndexReaderWarmer mergeSegmentWarmer): Set the merged segment warmer.
- IndexWriterConfig setMergePolicy(MergePolicy mergePolicy): Expert: MergePolicy is invoked whenever there are changes to the segments in the index.
- IndexWriterConfig setMergeScheduler(MergeScheduler mergeScheduler): Expert: sets the merge scheduler used by this writer.
- IndexWriterConfig setOpenMode(IndexWriterConfig.OpenMode openMode): Specifies the IndexWriterConfig.OpenMode of the index.
- IndexWriterConfig setParentField(String parentField): Sets the parent document field.
- IndexWriterConfig setRAMBufferSizeMB(double ramBufferSizeMB): Determines the amount of RAM that may be used for buffering added documents and deletions before they are flushed to the Directory.
- IndexWriterConfig setRAMPerThreadHardLimitMB(int perThreadHardLimitMB): Expert: sets the maximum memory consumption per thread, triggering a forced flush if exceeded.
- IndexWriterConfig setReaderPooling(boolean readerPooling): By default, IndexWriter does not pool the SegmentReaders it must open for deletions and merging, unless a near-real-time reader has been obtained by calling DirectoryReader.open(IndexWriter); this method lets you enable pooling without getting a near-real-time reader.
- IndexWriterConfig setSimilarity(Similarity similarity): Expert: set the Similarity implementation used by this IndexWriter.
- IndexWriterConfig setSoftDeletesField(String softDeletesField): Sets the soft deletes field.
- IndexWriterConfig setUseCompoundFile(boolean useCompoundFile): Sets if the IndexWriter should pack newly written segments in a compound file.
- String toString()
Methods inherited from class org.apache.lucene.index.LiveIndexWriterConfig
getCommitOnClose, getIndexCreatedVersionMajor, getIndexSort, getIndexSortFields, getIndexWriterEventListener, getLeafSorter, getMaxFullFlushMergeWaitMillis, getParentField, getSoftDeletesField, getUseCompoundFile, isCheckPendingFlushOnUpdate
-
Field Details
-
DISABLE_AUTO_FLUSH
public static final int DISABLE_AUTO_FLUSH
Denotes a flush trigger is disabled.
-
DEFAULT_MAX_BUFFERED_DELETE_TERMS
public static final int DEFAULT_MAX_BUFFERED_DELETE_TERMS
Disabled by default (because IndexWriter flushes by RAM usage by default).
-
DEFAULT_MAX_BUFFERED_DOCS
public static final int DEFAULT_MAX_BUFFERED_DOCS
Disabled by default (because IndexWriter flushes by RAM usage by default).
-
DEFAULT_RAM_BUFFER_SIZE_MB
public static final double DEFAULT_RAM_BUFFER_SIZE_MB
Default value is 16 MB (which means flush when buffered docs consume approximately 16 MB RAM).
-
DEFAULT_READER_POOLING
public static final boolean DEFAULT_READER_POOLING
Default setting (true) for setReaderPooling(boolean).
-
DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB
public static final int DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB
Default value is 1945. Change using setRAMPerThreadHardLimitMB(int).
-
DEFAULT_USE_COMPOUND_FILE_SYSTEM
public static final boolean DEFAULT_USE_COMPOUND_FILE_SYSTEM
Default value for compound file system for newly written segments (set to true). For batch indexing with very large RAM buffers, use false.
-
DEFAULT_COMMIT_ON_CLOSE
public static final boolean DEFAULT_COMMIT_ON_CLOSE
Default value for whether calls to IndexWriter.close() include a commit.
-
DEFAULT_MAX_FULL_FLUSH_MERGE_WAIT_MILLIS
public static final long DEFAULT_MAX_FULL_FLUSH_MERGE_WAIT_MILLIS
Default value for time to wait for merges on commit or getReader (when using a MergePolicy that implements MergePolicy.findFullFlushMerges(org.apache.lucene.index.MergeTrigger, org.apache.lucene.index.SegmentInfos, org.apache.lucene.index.MergePolicy.MergeContext)).
-
-
Constructor Details
-
IndexWriterConfig
public IndexWriterConfig()
Creates a new config, using StandardAnalyzer as the analyzer. By default, TieredMergePolicy is used for merging; note that TieredMergePolicy is free to select non-contiguous merges, which means docIDs may not remain monotonic over time. If this is a problem you should switch to LogByteSizeMergePolicy or LogDocMergePolicy. -
IndexWriterConfig
public IndexWriterConfig(Analyzer analyzer)
Creates a new config with the provided Analyzer. By default, TieredMergePolicy is used for merging; note that TieredMergePolicy is free to select non-contiguous merges, which means docIDs may not remain monotonic over time. If this is a problem you should switch to LogByteSizeMergePolicy or LogDocMergePolicy.
-
-
Method Details
-
setOpenMode
Specifies the IndexWriterConfig.OpenMode of the index.
Only takes effect when IndexWriter is first created.
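For example, a sketch that rebuilds an index from scratch (dir and analyzer are assumed to exist already):

  // CREATE wipes any existing index, APPEND requires one to exist,
  // and CREATE_OR_APPEND (the default) appends if present, otherwise creates.
  IndexWriterConfig conf = new IndexWriterConfig(analyzer)
      .setOpenMode(IndexWriterConfig.OpenMode.CREATE);
  IndexWriter writer = new IndexWriter(dir, conf);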
-
getOpenMode
Description copied from class: LiveIndexWriterConfig
Returns the IndexWriterConfig.OpenMode set by setOpenMode(OpenMode).
- Overrides:
getOpenMode in class LiveIndexWriterConfig
-
setIndexCreatedVersionMajor
Expert: set the compatibility version to use for this index. In case the index is created, it will use the given major version for compatibility. It is sometimes useful to set the previous major version for compatibility because IndexWriter.addIndexes(org.apache.lucene.store.Directory...) only accepts indices that have been written with the same major version as the current index. If the index already exists, then this value is ignored. Default value is the major of the latest version.
NOTE: Changing the creation version reduces backward compatibility guarantees. For instance an index created with Lucene 8 with a compatibility version of 7 can't be read with Lucene 9, because Lucene only supports reading indices created with the current or previous major release.
- Parameters:
indexCreatedVersionMajor - the major version to use for compatibility
-
setIndexDeletionPolicy
Expert: allows an optional IndexDeletionPolicy implementation to be specified. You can use this to control when prior commits are deleted from the index. The default policy is KeepOnlyLastCommitDeletionPolicy, which removes all prior commits as soon as a new commit is done (this matches behavior before 2.2). Creating your own policy can allow you to explicitly keep previous "point in time" commits alive in the index for some time, to allow readers to refresh to the new commit without having the old commit deleted out from under them. This is necessary on filesystems like NFS that do not support "delete on last close" semantics, which Lucene's "point in time" search normally relies on.
NOTE: the deletion policy must not be null.
Only takes effect when IndexWriter is first created.
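A minimal sketch of keeping a commit pinned during a backup by wrapping the default policy in SnapshotDeletionPolicy (dir and analyzer are assumed to exist, and at least one commit must exist before snapshot() is called):

  SnapshotDeletionPolicy snapshotter =
      new SnapshotDeletionPolicy(new KeepOnlyLastCommitDeletionPolicy());
  IndexWriterConfig conf = new IndexWriterConfig(analyzer)
      .setIndexDeletionPolicy(snapshotter);
  IndexWriter writer = new IndexWriter(dir, conf);

  IndexCommit pinned = snapshotter.snapshot();     // the files of this commit won't be deleted
  try {
      // copy the files returned by pinned.getFileNames() to backup storage
  } finally {
      snapshotter.release(pinned);                 // let the deletion policy reclaim them again
  }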
-
getIndexDeletionPolicy
Description copied from class: LiveIndexWriterConfig
Returns the IndexDeletionPolicy specified in setIndexDeletionPolicy(IndexDeletionPolicy) or the default KeepOnlyLastCommitDeletionPolicy.
- Overrides:
getIndexDeletionPolicy in class LiveIndexWriterConfig
-
setIndexCommit
Expert: allows to open a certain commit point. The default is null, which opens the latest commit point. This can also be used to open IndexWriter from a near-real-time reader, if you pass the reader's DirectoryReader.getIndexCommit().
Only takes effect when IndexWriter is first created.
-
getIndexCommit
Description copied from class: LiveIndexWriterConfig
Returns the IndexCommit as specified in setIndexCommit(IndexCommit) or the default, null, which specifies to open the latest index commit point.
- Overrides:
getIndexCommit in class LiveIndexWriterConfig
-
setSimilarity
Expert: set the Similarity implementation used by this IndexWriter.
NOTE: the similarity must not be null.
Only takes effect when IndexWriter is first created.
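BM25Similarity is already the default in recent Lucene versions, so this setter mostly matters when supplying custom parameters or a different Similarity implementation; a sketch with illustrative values:

  // k1 and b are the usual BM25 parameters; 1.2f and 0.75f are the stock defaults.
  conf.setSimilarity(new BM25Similarity(1.2f, 0.75f));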
-
getSimilarity
Description copied from class: LiveIndexWriterConfig
Expert: returns the Similarity implementation used by this IndexWriter.
- Overrides:
getSimilarity in class LiveIndexWriterConfig
-
setMergeScheduler
Expert: sets the merge scheduler used by this writer. The default is ConcurrentMergeScheduler.
NOTE: the merge scheduler must not be null.
Only takes effect when IndexWriter is first created.
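A sketch that caps background merge concurrency (the counts are illustrative):

  ConcurrentMergeScheduler cms = new ConcurrentMergeScheduler();
  // allow at most 6 merges pending overall, with at most 2 running concurrently
  cms.setMaxMergesAndThreads(6, 2);
  conf.setMergeScheduler(cms);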
-
getMergeScheduler
Description copied from class: LiveIndexWriterConfig
Returns the MergeScheduler that was set by setMergeScheduler(MergeScheduler).
- Overrides:
getMergeScheduler in class LiveIndexWriterConfig
-
setCodec
Set the Codec.
Only takes effect when IndexWriter is first created.
-
getCodec
Description copied from class: LiveIndexWriterConfig
Returns the current Codec.
- Overrides:
getCodec in class LiveIndexWriterConfig
-
getMergePolicy
Description copied from class: LiveIndexWriterConfig
Returns the current MergePolicy in use by this writer.
- Overrides:
getMergePolicy in class LiveIndexWriterConfig
-
setReaderPooling
By default, IndexWriter does not pool the SegmentReaders it must open for deletions and merging, unless a near-real-time reader has been obtained by calling DirectoryReader.open(IndexWriter). This method lets you enable pooling without getting a near-real-time reader.
NOTE: if you set this to false, IndexWriter will still pool readers once DirectoryReader.open(IndexWriter) is called.
Only takes effect when IndexWriter is first created.
-
getReaderPooling
public boolean getReaderPooling()
Description copied from class: LiveIndexWriterConfig
Returns true if IndexWriter should pool readers even if DirectoryReader.open(IndexWriter) has not been called.
- Overrides:
getReaderPooling in class LiveIndexWriterConfig
-
setRAMPerThreadHardLimitMB
Expert: Sets the maximum memory consumption per thread, triggering a forced flush if exceeded. A DocumentsWriterPerThread is forcefully flushed once it exceeds this limit even if getRAMBufferSizeMB() has not been exceeded. This is a safety limit to prevent a DocumentsWriterPerThread from exhausting its address space due to its internal 32-bit signed integer based memory addressing. The given value must be less than 2 GB (2048 MB).
-
getRAMPerThreadHardLimitMB
public int getRAMPerThreadHardLimitMB()
Description copied from class: LiveIndexWriterConfig
Returns the max amount of memory each DocumentsWriterPerThread can consume until forcefully flushed.
- Overrides:
getRAMPerThreadHardLimitMB in class LiveIndexWriterConfig
-
getInfoStream
Description copied from class: LiveIndexWriterConfig
Returns the InfoStream used for debugging.
- Overrides:
getInfoStream in class LiveIndexWriterConfig
-
getAnalyzer
Description copied from class: LiveIndexWriterConfig
Returns the default analyzer to use for indexing documents.
- Overrides:
getAnalyzer in class LiveIndexWriterConfig
-
getMaxBufferedDocs
public int getMaxBufferedDocs()
Description copied from class: LiveIndexWriterConfig
Returns the number of buffered added documents that will trigger a flush if enabled.
- Overrides:
getMaxBufferedDocs in class LiveIndexWriterConfig
-
getMergedSegmentWarmer
Description copied from class: LiveIndexWriterConfig
Returns the current merged segment warmer. See IndexWriter.IndexReaderWarmer.
- Overrides:
getMergedSegmentWarmer in class LiveIndexWriterConfig
-
getRAMBufferSizeMB
public double getRAMBufferSizeMB()
Description copied from class: LiveIndexWriterConfig
Returns the value set by LiveIndexWriterConfig.setRAMBufferSizeMB(double) if enabled.
- Overrides:
getRAMBufferSizeMB in class LiveIndexWriterConfig
-
setInfoStream
Information about merges, deletes and a message when maxFieldLength is reached will be printed to this. Must not be null, but InfoStream.NO_OUTPUT may be used to suppress output. -
setInfoStream
Convenience method that uses PrintStreamInfoStream. Must not be null. -
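For instance, while debugging you might dump diagnostics to standard output (a sketch; the PrintStream overload wraps it in a PrintStreamInfoStream for you):

  conf.setInfoStream(System.out);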
setMergePolicy
Description copied from class: LiveIndexWriterConfig
Expert: MergePolicy is invoked whenever there are changes to the segments in the index. Its role is to select which merges to do, if any, and return a MergePolicy.MergeSpecification describing the merges. It also selects merges to do for forceMerge.
Takes effect on subsequent merge selections. Any merges in flight or any merges already registered by the previous MergePolicy are not affected.
- Overrides:
setMergePolicy in class LiveIndexWriterConfig
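A sketch tuning TieredMergePolicy, the default policy (the numbers are illustrative only):

  TieredMergePolicy tmp = new TieredMergePolicy();
  tmp.setMaxMergedSegmentMB(5 * 1024);   // cap merged segments at roughly 5 GB
  tmp.setSegmentsPerTier(10);            // allow up to 10 segments per tier before merging
  conf.setMergePolicy(tmp);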
-
setMaxBufferedDocs
Description copied from class: LiveIndexWriterConfig
Determines the minimal number of documents required before the buffered in-memory documents are flushed as a new segment. Large values generally give faster indexing.
When this is set, the writer will flush every maxBufferedDocs added documents. Pass in DISABLE_AUTO_FLUSH to prevent triggering a flush due to number of buffered documents. Note that if flushing by RAM usage is also enabled, then the flush will be triggered by whichever comes first.
Disabled by default (writer flushes by RAM usage).
Takes effect immediately, but only the next time a document is added, updated or deleted.
- Overrides:
setMaxBufferedDocs in class LiveIndexWriterConfig
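For example, a sketch that flushes strictly by document count instead of RAM usage (the count is illustrative):

  conf.setMaxBufferedDocs(10_000)                                  // flush every 10,000 buffered docs
      .setRAMBufferSizeMB(IndexWriterConfig.DISABLE_AUTO_FLUSH);   // turn off the RAM-based trigger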
-
setMergedSegmentWarmer
Description copied from class: LiveIndexWriterConfig
Set the merged segment warmer. See IndexWriter.IndexReaderWarmer.
Takes effect on the next merge.
- Overrides:
setMergedSegmentWarmer in class LiveIndexWriterConfig
-
setRAMBufferSizeMB
Description copied from class: LiveIndexWriterConfig
Determines the amount of RAM that may be used for buffering added documents and deletions before they are flushed to the Directory. Generally for faster indexing performance it's best to flush by RAM usage instead of document count and to use as large a RAM buffer as you can.
When this is set, the writer will flush whenever buffered documents and deletions use this much RAM. Pass in DISABLE_AUTO_FLUSH to prevent triggering a flush due to RAM usage. Note that if flushing by document count is also enabled, then the flush will be triggered by whichever comes first.
The maximum RAM limit is inherently determined by the JVM's available memory. Yet, an IndexWriter session can consume a significantly larger amount of memory than the given RAM limit since this limit is just an indicator of when to flush memory-resident documents to the Directory. Flushes are likely to happen concurrently while other threads are adding documents to the writer. For application stability the available memory in the JVM should be significantly larger than the RAM buffer used for indexing.
NOTE: the accounting of RAM usage for pending deletions is only approximate. Specifically, if you delete by Query, Lucene currently has no way to measure the RAM usage of individual Queries, so the accounting will under-estimate and you should compensate by either calling commit() or refresh() periodically yourself.
NOTE: It's not guaranteed that all memory-resident documents are flushed once this limit is exceeded. Depending on the configured FlushPolicy, only a subset of the buffered documents are flushed and therefore only part of the RAM buffer is released.
The default value is DEFAULT_RAM_BUFFER_SIZE_MB.
Takes effect immediately, but only the next time a document is added, updated or deleted.
- Overrides:
setRAMBufferSizeMB in class LiveIndexWriterConfig
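Conversely, a sketch that flushes purely by RAM usage, which is the default mode (256 MB is illustrative):

  conf.setRAMBufferSizeMB(256.0)                                   // flush when buffered state reaches ~256 MB
      .setMaxBufferedDocs(IndexWriterConfig.DISABLE_AUTO_FLUSH);   // doc-count trigger stays disabled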
-
setUseCompoundFile
Description copied from class: LiveIndexWriterConfig
Sets if the IndexWriter should pack newly written segments in a compound file. Default is true.
Use false for batch indexing with very large RAM buffer settings.
Note: To control compound file usage during segment merges see MergePolicy.setNoCFSRatio(double) and MergePolicy.setMaxCFSSegmentSizeMB(double). This setting only applies to newly created segments.
- Overrides:
setUseCompoundFile in class LiveIndexWriterConfig
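For a one-off bulk load this might look like the following sketch (the buffer size is illustrative):

  conf.setRAMBufferSizeMB(1024.0)     // large RAM buffer for batch indexing
      .setUseCompoundFile(false);     // write segments as individual files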
-
setCommitOnClose
Sets if calls to IndexWriter.close() should first commit before closing. Use true to match behavior of Lucene 4.x. -
setMaxFullFlushMergeWaitMillis
Expert: sets the amount of time to wait for merges (during IndexWriter.commit() or IndexWriter.getReader(boolean, boolean)) returned by MergePolicy.findFullFlushMerges(...). If this time is reached, we proceed with the commit based on segments merged up to that point. The merges are not aborted, and will still run to completion independent of the commit or getReader call, like natural segment merges. The default is 500L.
Note: Which segments would get merged depends on the implementation of MergePolicy.findFullFlushMerges(MergeTrigger, SegmentInfos, MergePolicy.MergeContext).
Note: Set to 0 to disable merging on full flush.
Note: If SerialMergeScheduler is used and a non-zero timeout is configured, full-flush merges will always wait for the merge to finish without honoring the configured timeout. -
setIndexSort
Set the Sort order to use for all (flushed and merged) segments. -
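A sketch sorting all segments by a numeric doc-values field (the field name is hypothetical; every document must supply a matching NumericDocValuesField):

  conf.setIndexSort(new Sort(new SortField("timestamp", SortField.Type.LONG)));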
setLeafSorter
Set the comparator for sorting leaf readers. A DirectoryReader opened from an IndexWriter with this configuration will have its leaf readers sorted with the provided leaf sorter.
- Parameters:
leafSorter - a comparator for sorting leaf readers
- Returns:
- IndexWriterConfig with leafSorter set.
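A sketch that orders leaves with the largest segments first (purely illustrative; real deployments often sort by a per-segment value such as a maximum timestamp; assumes java.util.Comparator and org.apache.lucene.index.LeafReader are imported):

  conf.setLeafSorter(Comparator.comparingInt(LeafReader::maxDoc).reversed());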
-
toString
- Overrides:
toString in class LiveIndexWriterConfig
-
setCheckPendingFlushUpdate
Description copied from class: LiveIndexWriterConfig
Expert: sets if indexing threads check for pending flushes on update in order to help flush indexing buffers to disk. As a consequence, threads calling DirectoryReader.openIfChanged(DirectoryReader, IndexWriter) or IndexWriter.flush() will be the only threads writing segments to disk unless flushes are falling behind. If indexing is stalled due to too many pending flushes, indexing threads will help write pending segment flushes to disk.
- Overrides:
setCheckPendingFlushUpdate in class LiveIndexWriterConfig
-
setSoftDeletesField
Sets the soft deletes field. A soft deletes field in Lucene is a doc-values field that marks a document as soft-deleted if the document has at least one value in that field. If a document is marked as soft-deleted, the document is treated as if it has been hard-deleted through the IndexWriter API (IndexWriter.deleteDocuments(Term...)). Merges will reclaim soft-deleted as well as hard-deleted documents, and index readers obtained from the IndexWriter will reflect all deleted documents in their live docs. If soft-deletes are used, documents must be indexed via IndexWriter.softUpdateDocument(Term, Iterable, Field...). Deletes are applied via IndexWriter.updateDocValues(Term, Field...).
Soft deletes allow documents to be retained across merges if the merge policy modifies the live docs of a merge reader. SoftDeletesRetentionMergePolicy, for instance, allows specifying an arbitrary query to mark all documents that should survive the merge. This can be used, for example, to keep all document modifications for a certain time interval or the last N operations if some kind of sequence ID is available in the index.
Currently there is no API support to un-delete a soft-deleted document. In order to un-delete a document, it must be re-indexed using IndexWriter.softUpdateDocument(Term, Iterable, Field...).
The default value for this is null, which disables soft-deletes. If soft-deletes are enabled, documents can still be hard-deleted. Hard-deleted documents won't be considered soft-deleted even if they have a value in the soft-deletes field.
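A minimal sketch of soft-deleting the previous version of a document (field and term names are hypothetical):

  conf.setSoftDeletesField("__soft_deletes");
  IndexWriter writer = new IndexWriter(dir, conf);

  Document doc = new Document();
  doc.add(new StringField("id", "42", Field.Store.YES));
  // The old copy of doc "42" gets a value in "__soft_deletes" instead of being hard-deleted.
  writer.softUpdateDocument(new Term("id", "42"), doc,
      new NumericDocValuesField("__soft_deletes", 1));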
-
setIndexWriterEventListener
Set the event listener to record key events in IndexWriter. -
setParentField
Sets the parent document field. If this optional property is set, IndexWriter will add an internal field to every root document added to the index writer. A document is considered a parent document if it's the last document in a document block indexed via IndexWriter.addDocuments(Iterable) or IndexWriter.updateDocuments(Term, Iterable) and their relatives. Additionally, all individual documents added via the single-document methods (IndexWriter.addDocument(Iterable) etc.) are also considered parent documents. This property is optional for all indices that don't use document blocks in combination with index sorting. In order to maintain the API guarantee that the document order of a block is not altered by the IndexWriter, a marker for parent documents is required.
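A sketch combining a parent field with index sorting and a document block (field names and the child/parent Documents are hypothetical):

  conf.setParentField("_parent")
      .setIndexSort(new Sort(new SortField("price", SortField.Type.LONG)));
  IndexWriter writer = new IndexWriter(dir, conf);
  // Children come first; the last document in the block is the parent.
  writer.addDocuments(java.util.List.of(childDoc1, childDoc2, parentDoc));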
-