Class Operations
- WARNING: This API is experimental and might change in incompatible ways in the next release.
- 
Field Summary
Modifier and Type, Field, Description:
- static final int DEFAULT_DETERMINIZE_WORK_LIMIT: Default maximum effort that determinize(org.apache.lucene.util.automaton.Automaton, int) should spend before giving up and throwing TooComplexToDeterminizeException.
- 
Method Summary
Modifier and Type, Method, Description:
- static Automaton complement(Automaton a, int determinizeWorkLimit): Returns a (deterministic) automaton that accepts the complement of the language of the given automaton.
- static Automaton concatenate(List<Automaton> list): Returns an automaton that accepts the concatenation of the languages of the given automata.
- static Automaton concatenate(Automaton a1, Automaton a2): Deprecated. Use concatenate(List) instead.
- static Automaton determinize(Automaton a, int workLimit): Determinizes the given automaton.
- static String getCommonPrefix: Returns the longest string that is a prefix of all accepted strings and visits each state at most once.
- static BytesRef getCommonPrefixBytesRef: Returns the longest BytesRef that is a prefix of all accepted strings and visits each state at most once.
- static BytesRef getCommonSuffixBytesRef: Returns the longest BytesRef that is a suffix of all accepted strings.
- static IntsRef getSingleton: If this automaton accepts a single input, return it.
- static boolean hasDeadStates: Returns true if this automaton has any states that cannot be reached from the initial state or cannot reach an accept state.
- static boolean hasDeadStatesFromInitial: Returns true if there are dead states reachable from an initial state.
- static boolean hasDeadStatesToAccept: Returns true if there are dead states that reach an accept state.
- static Automaton intersection(Automaton a1, Automaton a2): Returns an automaton that accepts the intersection of the languages of the given automata.
- static boolean isEmpty: Returns true if the given automaton accepts no strings.
- static boolean isTotal: Returns true if the given automaton accepts all strings.
- static boolean isTotal: Returns true if the given automaton accepts all strings for the specified min/max range of the alphabet.
- static Automaton minus: Returns a (deterministic) automaton that accepts the intersection of the language of a1 and the complement of the language of a2.
- static Automaton optional: Returns an automaton that accepts the union of the empty string and the language of the given automaton.
- static Automaton removeDeadStates: Removes transitions to dead states (a state is "dead" if it is not reachable from the initial state or no accept state is reachable from it).
- static Automaton repeat: Returns an automaton that accepts the Kleene star (zero or more concatenated repetitions) of the language of the given automaton.
- static Automaton repeat: Returns an automaton that accepts min or more concatenated repetitions of the language of the given automaton.
- static Automaton repeat: Returns an automaton that accepts between min and max (including both) concatenated repetitions of the language of the given automaton.
- static Automaton reverse: Returns an automaton accepting the reverse language.
- static boolean run: Returns true if the given string is accepted by the automaton.
- static boolean run: Returns true if the given string (expressed as Unicode code points) is accepted by the automaton.
- static int[] topoSortStates: Returns the topological sort of all states reachable from the initial state.
- static Automaton union(Collection<Automaton> list): Returns an automaton that accepts the union of the languages of the given automata.
- static Automaton union: Deprecated. Use union(Collection) instead.
- 
Field Details
- 
DEFAULT_DETERMINIZE_WORK_LIMIT
public static final int DEFAULT_DETERMINIZE_WORK_LIMIT
Default maximum effort that determinize(org.apache.lucene.util.automaton.Automaton, int) should spend before giving up and throwing TooComplexToDeterminizeException.
- 
Method Details
- 
concatenate
Deprecated. Use concatenate(List) instead.
Returns an automaton that accepts the concatenation of the languages of the given automata.
Complexity: linear in total number of states.
- 
concatenate
Returns an automaton that accepts the concatenation of the languages of the given automata.
Complexity: linear in total number of states.
- Parameters:
- list - List of automata to be joined
 
- 
optional
Returns an automaton that accepts the union of the empty string and the language of the given automaton. This may create a dead state.
Complexity: linear in number of states.
- 
repeat
Returns an automaton that accepts the Kleene star (zero or more concatenated repetitions) of the language of the given automaton. Never modifies the input automaton language.
Complexity: linear in number of states.
- 
repeat
Returns an automaton that accepts min or more concatenated repetitions of the language of the given automaton.
Complexity: linear in number of states and in min.
- 
repeat
Returns an automaton that accepts between min and max (including both) concatenated repetitions of the language of the given automaton.
Complexity: linear in number of states and in min and max.
- 
complement
Returns a (deterministic) automaton that accepts the complement of the language of the given automaton.
Complexity: linear in number of states if already deterministic and exponential otherwise.
- Parameters:
- determinizeWorkLimit - maximum effort to spend determinizing the automaton. Set higher to allow more complex queries and lower to prevent memory exhaustion. DEFAULT_DETERMINIZE_WORK_LIMIT is a good starting default.
 
- 
minus
Returns a (deterministic) automaton that accepts the intersection of the language of a1 and the complement of the language of a2. As a side effect, the automata may be determinized, if not already deterministic.
Complexity: quadratic in number of states if a2 is already deterministic, and exponential in the number of a2's states otherwise.
- Parameters:
- a1 - the initial automaton
- a2 - the automaton to subtract
- determinizeWorkLimit - maximum effort to spend determinizing the automaton. Set higher to allow more complex queries and lower to prevent memory exhaustion. DEFAULT_DETERMINIZE_WORK_LIMIT is a good starting default.
 
- 
intersection
Returns an automaton that accepts the intersection of the languages of the given automata. Never modifies the input automata languages.
Complexity: quadratic in number of states.
- 
hasDeadStates
Returns true if this automaton has any states that cannot be reached from the initial state or cannot reach an accept state. Cost is O(numTransitions+numStates).
- 
hasDeadStatesFromInitial
Returns true if there are dead states reachable from an initial state.
- 
hasDeadStatesToAccept
Returns true if there are dead states that reach an accept state.
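The definition of a dead state used above (not reachable from the initial state, or unable to reach an accept state) can be checked with two graph traversals, which is consistent with the stated O(numTransitions+numStates) cost. The following is a minimal sketch in plain Java, not Lucene's implementation: the class `DeadStatesSketch`, its method names, and the `int[][]` edge encoding are all hypothetical, and state 0 is assumed to be the initial state.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collection;
import java.util.Deque;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; this is not the Lucene Operations class.
class DeadStatesSketch {

    /** Returns every state reachable from the given start states by following edges. */
    static BitSet reachable(List<List<Integer>> edges, Collection<Integer> starts) {
        BitSet seen = new BitSet(edges.size());
        Deque<Integer> work = new ArrayDeque<>();
        for (int s : starts) {
            if (!seen.get(s)) {
                seen.set(s);
                work.push(s);
            }
        }
        while (!work.isEmpty()) {
            int s = work.pop();
            for (int t : edges.get(s)) {
                if (!seen.get(t)) {
                    seen.set(t);
                    work.push(t);
                }
            }
        }
        return seen;
    }

    /**
     * A state is live if it is reachable from state 0 (assumed initial)
     * and some accept state is reachable from it.
     * Each transition is encoded as {source, destination}; labels are ignored.
     */
    static BitSet liveStates(int numStates, int[][] transitions, Set<Integer> accept) {
        if (numStates == 0) {
            return new BitSet();
        }
        List<List<Integer>> forward = new ArrayList<>();
        List<List<Integer>> backward = new ArrayList<>();
        for (int i = 0; i < numStates; i++) {
            forward.add(new ArrayList<>());
            backward.add(new ArrayList<>());
        }
        for (int[] t : transitions) {
            forward.get(t[0]).add(t[1]);
            backward.get(t[1]).add(t[0]);
        }
        BitSet live = reachable(forward, List.of(0));  // reachable from the initial state
        live.and(reachable(backward, accept));         // and able to reach an accept state
        return live;
    }

    static boolean hasDeadStates(int numStates, int[][] transitions, Set<Integer> accept) {
        return liveStates(numStates, transitions, accept).cardinality() < numStates;
    }

    public static void main(String[] args) {
        // State 2 cannot reach the accept state; state 3 is unreachable from state 0.
        int[][] transitions = {{0, 1}, {0, 2}, {3, 1}};
        System.out.println(liveStates(4, transitions, Set.of(1)));    // prints {0, 1}
        System.out.println(hasDeadStates(4, transitions, Set.of(1))); // prints true
    }
}
```

Each traversal visits every state and transition at most once, so the total work is linear in the size of the automaton.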
- 
union
Deprecated. Use union(Collection) instead.
Returns an automaton that accepts the union of the languages of the given automata.
Complexity: linear in number of states.
- 
union
Returns an automaton that accepts the union of the languages of the given automata.
Complexity: linear in number of states.
- Parameters:
- list - List of automata to be unioned.
 
- 
determinize
Determinizes the given automaton.
Worst case complexity: exponential in number of states.
- Parameters:
- workLimit - Maximum amount of "work" that the powerset construction will spend before throwing TooComplexToDeterminizeException. Higher numbers allow this operation to consume more memory and CPU but allow more complex automata. Use DEFAULT_DETERMINIZE_WORK_LIMIT as a decent default if you don't otherwise know what to specify.
- Throws:
- TooComplexToDeterminizeException - if determinizing requires more than workLimit "effort"
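To illustrate how a work limit can bound the powerset construction, here is a minimal sketch in plain Java. It is not Lucene's implementation: `DeterminizeSketch`, `TooComplexException`, and `exampleNfa` are hypothetical names, the NFA has no epsilon transitions, and the way effort is charged (one unit per NFA state in each subset processed) is an assumption made for this example only.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration; this is not the Lucene Operations class.
class DeterminizeSketch {

    /** Hypothetical stand-in for TooComplexToDeterminizeException. */
    static class TooComplexException extends RuntimeException {
        TooComplexException(String message) {
            super(message);
        }
    }

    /**
     * Subset (powerset) construction. nfa.get(s) maps an input symbol to the set of
     * states reachable from s on that symbol; state 0 is assumed to be initial.
     * Returns the DFA transition table, keyed by the subset each DFA state stands for.
     */
    static Map<Set<Integer>, Map<Character, Set<Integer>>> determinize(
            List<Map<Character, Set<Integer>>> nfa, int workLimit) {
        Map<Set<Integer>, Map<Character, Set<Integer>>> dfa = new LinkedHashMap<>();
        Set<Set<Integer>> seen = new HashSet<>();
        Deque<Set<Integer>> worklist = new ArrayDeque<>();
        Set<Integer> start = new TreeSet<>(List.of(0));
        seen.add(start);
        worklist.push(start);
        long effort = 0;
        while (!worklist.isEmpty()) {
            Set<Integer> subset = worklist.pop();
            effort += subset.size();  // assumed cost model: one unit per NFA state visited
            if (effort > workLimit) {
                throw new TooComplexException("effort " + effort + " exceeds limit " + workLimit);
            }
            // Merge the outgoing transitions of every NFA state in this subset.
            Map<Character, Set<Integer>> out = new TreeMap<>();
            for (int s : subset) {
                for (Map.Entry<Character, Set<Integer>> e : nfa.get(s).entrySet()) {
                    out.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).addAll(e.getValue());
                }
            }
            dfa.put(subset, out);
            for (Set<Integer> target : out.values()) {
                if (seen.add(target)) {
                    worklist.push(target);
                }
            }
        }
        return dfa;
    }

    /** NFA for "second-to-last symbol is a" over {a, b}; state 2 would be the accept state. */
    static List<Map<Character, Set<Integer>>> exampleNfa() {
        Map<Character, Set<Integer>> s0 = Map.of('a', Set.of(0, 1), 'b', Set.of(0));
        Map<Character, Set<Integer>> s1 = Map.of('a', Set.of(2), 'b', Set.of(2));
        Map<Character, Set<Integer>> s2 = Map.of();
        return List.of(s0, s1, s2);
    }

    public static void main(String[] args) {
        System.out.println(determinize(exampleNfa(), 100).size()); // prints 4
        try {
            determinize(exampleNfa(), 3);
        } catch (TooComplexException e) {
            System.out.println("gave up: " + e.getMessage());
        }
    }
}
```

The limit turns a potentially exponential blow-up into a bounded failure: a caller can catch the exception and reject the input instead of exhausting memory.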
 
- 
isEmpty
Returns true if the given automaton accepts no strings.
- 
isTotal
Returns true if the given automaton accepts all strings.
The automaton must be deterministic, or this method may return false.
Complexity: linear in number of states and transitions.
- 
isTotal
Returns true if the given automaton accepts all strings for the specified min/max range of the alphabet.
The automaton must be deterministic, or this method may return false.
Complexity: linear in number of states and transitions.
- 
run
Returns true if the given string is accepted by the automaton. The input automaton must be deterministic.
Complexity: linear in the length of the string.
Note: for full performance, use the RunAutomaton class.
- 
run
Returns true if the given string (expressed as Unicode code points) is accepted by the automaton. The input automaton must be deterministic.
Complexity: linear in the length of the string.
Note: for full performance, use the RunAutomaton class.
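The linear-time claim follows from the determinism requirement: with at most one transition per state and code point, matching is a single pass over the input. A minimal sketch in plain Java, assuming a hypothetical map-based transition table with state 0 as the initial state (`RunSketch` and its encoding are illustrative and are not how Lucene stores automata):

```java
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; this is not the Lucene Operations class.
class RunSketch {

    /**
     * Runs a deterministic automaton over the code points of the input.
     * transitions.get(state).get(codePoint) gives the next state, if any.
     * State 0 is assumed to be the initial state.
     */
    static boolean run(Map<Integer, Map<Integer, Integer>> transitions,
                       Set<Integer> accept, String input) {
        int state = 0;
        for (int i = 0; i < input.length(); ) {
            int codePoint = input.codePointAt(i);
            i += Character.charCount(codePoint);  // step over surrogate pairs correctly
            Map<Integer, Integer> out = transitions.get(state);
            Integer next = (out == null) ? null : out.get(codePoint);
            if (next == null) {
                return false;  // no transition for this code point: reject
            }
            state = next;
        }
        return accept.contains(state);
    }

    public static void main(String[] args) {
        // Accepts an 'a' followed by one or more 'b'.
        Map<Integer, Map<Integer, Integer>> transitions = Map.of(
                0, Map.of((int) 'a', 1),
                1, Map.of((int) 'b', 2),
                2, Map.of((int) 'b', 2));
        Set<Integer> accept = Set.of(2);
        System.out.println(run(transitions, accept, "abb")); // prints true
        System.out.println(run(transitions, accept, "a"));   // prints false
    }
}
```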
- 
removeDeadStates
Removes transitions to dead states (a state is "dead" if it is not reachable from the initial state or no accept state is reachable from it).
- 
getCommonPrefix
Returns the longest string that is a prefix of all accepted strings and visits each state at most once. The automaton must not have dead states. If this automaton has already been converted to UTF-8 (e.g. using UTF32ToUTF8) then you should use getCommonPrefixBytesRef(org.apache.lucene.util.automaton.Automaton) instead.
- Returns:
- common prefix, which can be an empty (length 0) String (never null)
- Throws:
- IllegalArgumentException - if the automaton has dead states reachable from the initial state.
 
- 
getCommonPrefixBytesRef
Returns the longest BytesRef that is a prefix of all accepted strings and visits each state at most once.
- Returns:
- common prefix, which can be an empty (length 0) BytesRef (never null), and might include a UTF-8 fragment of a full Unicode character
 
- 
getSingleton
If this automaton accepts a single input, return it. Else, return null. The automaton must be deterministic.
- 
getCommonSuffixBytesRef
Returns the longest BytesRef that is a suffix of all accepted strings.
Worst case complexity: quadratic in the number of states+transitions.
- Returns:
- common suffix, which can be an empty (length 0) BytesRef (never null)
 
- 
reverse
Returns an automaton accepting the reverse language.
- 
topoSortStates
Returns the topological sort of all states reachable from the initial state. This method assumes that the automaton does not contain cycles, and will throw an IllegalArgumentException if a cycle is detected. The CPU cost is O(numTransitions), and the implementation is non-recursive, so it will not exhaust the Java stack for automata matching long strings. If there are dead states in the automaton, they will be removed from the returned array.
Note: This method uses a deque to iterate over the states, which could consume a lot of heap space for some automata. Specifically, deep automata (those with a large number of transitions from the initial state to the final state) may use a lot of memory. Memory consumption is O(N), where N is the depth of the automaton (the maximum number of transitions from the initial state to any state). However, because this method detects cycles, it will never attempt to use infinite RAM.
- Parameters:
- a - the Automaton to be sorted
- Returns:
- the topologically sorted array of state ids
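The combination described above (non-recursive traversal, a deque whose size tracks the depth of the automaton, and cycle detection) can be sketched as an iterative depth-first search with three visit states. This is a minimal plain-Java illustration, not Lucene's code: `TopoSortSketch` and the adjacency-array encoding are hypothetical, state 0 is assumed to be initial, and unlike the documented method this sketch only omits unreachable states and does not filter states that cannot reach an accept state.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration; this is not the Lucene Operations class.
class TopoSortSketch {

    /**
     * Topologically sorts the states reachable from state 0 (assumed initial).
     * adj[s] lists the destination states of s. Throws IllegalArgumentException on a cycle.
     */
    static int[] topoSort(int[][] adj) {
        int n = adj.length;
        if (n == 0) {
            return new int[0];
        }
        int[] color = new int[n];     // 0 = unvisited, 1 = on the current DFS path, 2 = finished
        int[] nextEdge = new int[n];  // index of the next outgoing edge to explore per state
        List<Integer> postorder = new ArrayList<>();
        Deque<Integer> path = new ArrayDeque<>();  // holds only the current path, so O(depth)
        color[0] = 1;
        path.push(0);
        while (!path.isEmpty()) {
            int s = path.peek();
            if (nextEdge[s] < adj[s].length) {
                int t = adj[s][nextEdge[s]++];
                if (color[t] == 1) {
                    throw new IllegalArgumentException("cycle detected through state " + t);
                }
                if (color[t] == 0) {
                    color[t] = 1;
                    path.push(t);
                }
            } else {
                color[s] = 2;  // all successors done: s is finished
                path.pop();
                postorder.add(s);
            }
        }
        Collections.reverse(postorder);  // reverse postorder is a topological order
        return postorder.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        int[][] adj = {{1, 2}, {3}, {3}, {}};
        System.out.println(java.util.Arrays.toString(topoSort(adj))); // prints [0, 2, 1, 3]
    }
}
```

Because a state already on the current path is reported as a cycle rather than pushed again, the deque can never grow beyond the number of states.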
 
 