Package org.apache.lucene.codecs.bloom
Class BloomFilteringPostingsFormat
java.lang.Object
org.apache.lucene.codecs.PostingsFormat
org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat
- All Implemented Interfaces:
NamedSPILoader.NamedSPI
A PostingsFormat useful for low doc-frequency fields such as primary keys. Bloom filters
are maintained in a ".blm" file, which offers "fast-fail" for reads in segments known to have no
record of the key. A delegate PostingsFormat of your choice records all other postings data.
A BloomFilterFactory can be passed to tailor Bloom filter settings on a
per-field basis. The default configuration is DefaultBloomFilterFactory, which allocates a
~8 MB bitset and hashes values using MurmurHash64. This should be suitable for most
purposes.
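The "fast-fail" behaviour described above can be illustrated with a small, self-contained sketch. It is not Lucene code: ToyBloomFilter, ToySegment, the FNV-1a style hash, and the chosen sizes are all hypothetical stand-ins for the real FuzzySet, delegate PostingsFormat, and Murmur hash. The point is only that a negative answer from the filter lets a lookup skip the delegate entirely, while a positive answer still has to be confirmed by the delegate.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Toy Bloom filter for illustration only; not Lucene's FuzzySet. */
class ToyBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    ToyBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Seeded FNV-1a style hash (an assumption; a stand-in for the Murmur hash).
    private int position(String key, int seed) {
        long h = 0xcbf29ce484222325L ^ seed;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return (int) Math.floorMod(h, (long) numBits);
    }

    void add(String key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(key, i));
        }
    }

    /** false means "definitely absent"; true means "possibly present". */
    boolean mightContain(String key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }
}

/** Toy "segment": a Bloom filter in front of a delegate key-to-doc map. */
class ToySegment {
    private final ToyBloomFilter filter = new ToyBloomFilter(1 << 16, 3);
    private final Map<String, Integer> delegate = new HashMap<>();
    int delegateLookups = 0; // how often the (expensive) delegate was consulted

    void index(String key, int docId) {
        filter.add(key);
        delegate.put(key, docId);
    }

    /** Returns the doc id for the key, or -1 if this segment has no record of it. */
    int lookup(String key) {
        if (!filter.mightContain(key)) {
            return -1; // fast-fail: the delegate is never touched
        }
        delegateLookups++;
        Integer doc = delegate.get(key);
        return doc == null ? -1 : doc; // the filter may report false positives
    }
}
```

In this sketch, looking up a key that was never indexed returns -1 and, unless the filter happens to report a false positive, leaves delegateLookups unchanged. Keys that were indexed are always found, because a Bloom filter never produces false negatives.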
The format of the blm file is as follows:
- BloomFilter (.blm) --> Header, DelegatePostingsFormatName, NumFilteredFields, Filter^NumFilteredFields, Footer
- Filter --> FieldNumber, FuzzySet
- FuzzySet --> See FuzzySet.serialize(DataOutput)
- Header --> IndexHeader
- DelegatePostingsFormatName --> String. The name of a ServiceProvider-registered PostingsFormat
- NumFilteredFields --> Uint32
- FieldNumber --> Uint32. The number of the field in this segment
- Footer --> CodecFooter
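The ordering in the grammar above can be sketched with plain java.io streams. This is a layout illustration only and is not byte-compatible with a real .blm file: BlmLayoutSketch and its Filter class are hypothetical, the IndexHeader and CodecFooter are omitted, the string and integer encodings are those of DataOutputStream rather than Lucene's DataOutput, and each FuzzySet is treated as an opaque byte array.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

/** Hypothetical sketch of the .blm field ordering; not Lucene's encoding. */
class BlmLayoutSketch {

    /** One Filter entry: FieldNumber followed by an opaque FuzzySet payload. */
    static final class Filter {
        final int fieldNumber;
        final byte[] fuzzySetBytes;

        Filter(int fieldNumber, byte[] fuzzySetBytes) {
            this.fieldNumber = fieldNumber;
            this.fuzzySetBytes = fuzzySetBytes;
        }
    }

    static byte[] write(String delegatePostingsFormatName, List<Filter> filters) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            // Header (IndexHeader) would come first; omitted in this sketch.
            out.writeUTF(delegatePostingsFormatName); // DelegatePostingsFormatName
            out.writeInt(filters.size());             // NumFilteredFields
            for (Filter f : filters) {                // Filter, repeated NumFilteredFields times
                out.writeInt(f.fieldNumber);          // FieldNumber
                out.write(f.fuzzySetBytes);           // FuzzySet (opaque here)
            }
            // Footer (CodecFooter) would come last; omitted in this sketch.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

The superscript in Filter^NumFilteredFields corresponds to the loop: the count is written once, followed by that many (FieldNumber, FuzzySet) pairs.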
- WARNING: This API is experimental and might change in incompatible ways in the next release.
-
Field Summary
Fields inherited from class org.apache.lucene.codecs.PostingsFormat
EMPTY
Constructor Summary
- BloomFilteringPostingsFormat(PostingsFormat delegatePostingsFormat): Creates Bloom filters for a selection of fields created in the index.
- BloomFilteringPostingsFormat(PostingsFormat delegatePostingsFormat, BloomFilterFactory bloomFilterFactory): Creates Bloom filters for a selection of fields created in the index.
Method Summary
- fieldsConsumer(SegmentWriteState state)
- fieldsProducer(SegmentReadState state)
- toString()
Methods inherited from class org.apache.lucene.codecs.PostingsFormat
availablePostingsFormats, forName, getName, reloadPostingsFormats
-
Field Details
-
BLOOM_CODEC_NAME
- See Also:
-
VERSION_START
public static final int VERSION_START
- See Also:
-
VERSION_CURRENT
public static final int VERSION_CURRENT
- See Also:
-
-
Constructor Details
-
BloomFilteringPostingsFormat
public BloomFilteringPostingsFormat(PostingsFormat delegatePostingsFormat, BloomFilterFactory bloomFilterFactory)
Creates Bloom filters for a selection of fields created in the index. The filters are recorded as a set of bitsets held as a segment summary in an additional "blm" file. This PostingsFormat delegates to a delegate PostingsFormat of your choice for encoding all other postings data.
- Parameters:
delegatePostingsFormat - The PostingsFormat that records all the non-Bloom-filter data, i.e. postings info.
bloomFilterFactory - The BloomFilterFactory responsible for sizing Bloom filters appropriately.
-
BloomFilteringPostingsFormat
Creates Bloom filters for a selection of fields created in the index. The filters are recorded as a set of bitsets held as a segment summary in an additional "blm" file. This PostingsFormat delegates to a delegate PostingsFormat of your choice for encoding all other postings data. This constructor defaults to the DefaultBloomFilterFactory for configuring per-field Bloom filters.
- Parameters:
delegatePostingsFormat - The PostingsFormat that records all the non-Bloom-filter data, i.e. postings info.
-
BloomFilteringPostingsFormat
public BloomFilteringPostingsFormat()
-
-
Method Details
-
fieldsConsumer
- Specified by:
fieldsConsumer in class PostingsFormat
- Throws:
IOException
-
fieldsProducer
- Specified by:
fieldsProducer in class PostingsFormat
- Throws:
IOException
-
toString
- Overrides:
toString in class PostingsFormat
-