Package com.univocity.parsers.common
Class NormalizedString
java.lang.Object
com.univocity.parsers.common.NormalizedString
- All Implemented Interfaces:
Serializable, CharSequence, Comparable<NormalizedString>
public final class NormalizedString
extends Object
implements Serializable, Comparable<NormalizedString>, CharSequence
A NormalizedString represents text in a normalized form: strings that differ only in character case or surrounding whitespace are considered the same.
It is used to represent groups of fields whose names users may write with different character case or whitespace.
Where character case or surrounding whitespace is relevant, the NormalizedString's isLiteral() method returns true, meaning the exact character case and surrounding whitespace are required to match it.
Invoking valueOf(String) with a String surrounded by single quotes creates a literal NormalizedString. Use literalValueOf(String) to obtain the same NormalizedString without having to add the single quotes.
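The matching rules above can be illustrated with a small self-contained sketch. NormalizedSketch is a hypothetical class, not part of univocity-parsers. It makes three assumptions: "normalized" means trimmed and lower-cased (using Locale.ROOT), single quotes are detected on the raw input, and any comparison involving a literal requires the original text to match exactly. The library's actual internals may differ.

```java
import java.util.Locale;

// Hypothetical, simplified stand-in for NormalizedString; NOT the library's code.
// Assumption: "normalized" means trimmed and lower-cased.
final class NormalizedSketch {
    private final String original;   // text as written (surrounding quotes removed)
    private final String normalized; // trimmed, lower-cased form
    private final boolean literal;   // true if case/whitespace must match exactly

    private NormalizedSketch(String original, boolean literal) {
        this.original = original;
        this.normalized = original.trim().toLowerCase(Locale.ROOT);
        this.literal = literal;
    }

    // Mirrors the documented valueOf(String) rule: single-quoted input becomes a literal.
    static NormalizedSketch valueOf(String s) {
        if (s.length() >= 2 && s.startsWith("'") && s.endsWith("'")) {
            return new NormalizedSketch(s.substring(1, s.length() - 1), true);
        }
        return new NormalizedSketch(s, false);
    }

    // Mirrors the documented literalValueOf(String): a literal without quotes.
    static NormalizedSketch literalValueOf(String s) {
        return new NormalizedSketch(s, true);
    }

    boolean isLiteral() {
        return literal;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof NormalizedSketch)) return false;
        NormalizedSketch other = (NormalizedSketch) o;
        // Assumption: if either side is literal, the original text must match exactly.
        if (literal || other.literal) return original.equals(other.original);
        return normalized.equals(other.normalized);
    }

    // Hashing the normalized form guarantees equal instances share a hash code.
    @Override
    public int hashCode() {
        return normalized.hashCode();
    }

    @Override
    public String toString() {
        return original;
    }
}
```

In this sketch, equality is not transitive when literals and non-literals are mixed. The literal 'Name' equals Name, and Name equals NAME, but 'Name' does not equal NAME.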
-
Field Summary
Fields (Modifier and Type, Field):
- private final int hashCode
- private final boolean literal
- private final String normalized
- private final String original
- private static final long serialVersionUID
- private static final StringCache<NormalizedString> stringCache
-
Constructor Summary
Constructors -
Method Summary
Methods (Modifier and Type, Method, Description):
- char charAt(int index)
- int compareTo(NormalizedString)
- int compareTo(String o): Compares a NormalizedString against a String lexicographically.
- boolean equals(Object)
- static StringCache<NormalizedString> getCache(): Returns the internal string cache, allowing users to adjust its size limit or clear it when appropriate.
- private static <T extends Collection<String>> T getCollection(T out, NormalizedString... args)
- private static <T extends Collection<NormalizedString>> T getCollection(T out, String... args)
- private static <T extends Collection<NormalizedString>> T getCollection(T out, Collection<String> args)
- private static <T extends Collection<String>> T getStringCollection(T out, Collection<NormalizedString> args)
- int hashCode()
- static boolean identifyLiterals(NormalizedString[] strings): Analyzes a group of NormalizedString instances to identify any whose normalized content would clash.
- static boolean identifyLiterals(NormalizedString[] strings, boolean lowercaseIdentifiers, boolean uppercaseIdentifiers): Analyzes a group of NormalizedString instances to identify any whose normalized content would clash.
- boolean isLiteral()
- int length()
- static NormalizedString literalValueOf(String string): Creates a literal NormalizedString, meaning it will only match another String or NormalizedString that has exactly the same content, including character case and surrounding whitespace.
- private String normalize
- private static boolean shouldBeLiteral(String string, boolean lowercaseIdentifiers, boolean uppercaseIdentifiers)
- CharSequence subSequence(int start, int end)
- static String[] toArray(NormalizedString... args): Converts multiple normalized strings into an array of String.
- static NormalizedString[] toArray(String... args): Converts multiple plain strings into an array of NormalizedString.
- static NormalizedString[] toArray(Collection<String> args): Converts a collection of plain strings into an array of NormalizedString.
- static ArrayList<NormalizedString> toArrayList(String... args): Converts multiple plain strings into an ArrayList of NormalizedString.
- static ArrayList<NormalizedString> toArrayList(Collection<String> args): Converts multiple plain strings into an ArrayList of NormalizedString.
- static ArrayList<String> toArrayListOfStrings(NormalizedString... args): Converts multiple normalized strings into an ArrayList of String.
- static ArrayList<String> toArrayListOfStrings(Collection<NormalizedString> args): Converts multiple normalized strings into an ArrayList of String.
- static HashSet<NormalizedString> toHashSet(String... args): Converts multiple plain strings into a HashSet of NormalizedString.
- static HashSet<NormalizedString> toHashSet(Collection<String> args): Converts multiple plain strings into a HashSet of NormalizedString.
- static HashSet<String> toHashSetOfStrings(NormalizedString... args): Converts multiple normalized strings into a HashSet of String.
- static HashSet<String> toHashSetOfStrings(Collection<NormalizedString> args): Converts multiple normalized strings into a HashSet of String.
- static NormalizedString[] toIdentifierGroupArray(NormalizedString[] strings): Analyzes a group of NormalizedString instances to identify any whose normalized content would clash.
- static NormalizedString[] toIdentifierGroupArray(String[] strings): Analyzes a group of String instances to identify any whose normalized content would clash.
- static LinkedHashSet<NormalizedString> toLinkedHashSet(String... args): Converts multiple plain strings into a LinkedHashSet of NormalizedString.
- static LinkedHashSet<NormalizedString> toLinkedHashSet(Collection<String> args): Converts multiple plain strings into a LinkedHashSet of NormalizedString.
- static LinkedHashSet<String> toLinkedHashSetOfStrings(NormalizedString... args): Converts multiple normalized strings into a LinkedHashSet of String.
- static LinkedHashSet<String> toLinkedHashSetOfStrings(Collection<NormalizedString> args): Converts multiple normalized strings into a LinkedHashSet of String.
- NormalizedString toLiteral(): Returns the literal representation of this NormalizedString, meaning it will only match another String or NormalizedString that has exactly the same content, including character case and surrounding whitespace.
- String toString()
- static String[] toStringArray(Collection<NormalizedString> args): Converts a collection of normalized strings into an array of String.
- static TreeSet<NormalizedString> toTreeSet(String... args): Converts multiple plain strings into a TreeSet of NormalizedString.
- static TreeSet<NormalizedString> toTreeSet(Collection<String> args): Converts multiple plain strings into a TreeSet of NormalizedString.
- static TreeSet<String> toTreeSetOfStrings(NormalizedString... args): Converts multiple normalized strings into a TreeSet of String.
- static TreeSet<String> toTreeSetOfStrings(Collection<NormalizedString> args): Converts multiple normalized strings into a TreeSet of String.
- static NormalizedString[] toUniqueArray(String... args): Converts multiple plain strings into an array of NormalizedString, ensuring no duplicate NormalizedString elements exist, even if their original Strings are different.
- static String valueOf(NormalizedString string): Converts a NormalizedString back to its original String representation.
- static NormalizedString valueOf(Object o): Creates a non-literal NormalizedString, meaning it will match another String or NormalizedString regardless of differences in character case and surrounding whitespace.
- static NormalizedString valueOf(String string): Creates a non-literal NormalizedString, meaning it will match another String or NormalizedString regardless of differences in character case and surrounding whitespace.
Methods inherited from class java.lang.Object:
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.CharSequence:
chars, codePoints, isEmpty
-
Field Details
-
serialVersionUID
private static final long serialVersionUID
-
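stringCache
private static final StringCache<NormalizedString> stringCache
-
original
private final String original
-
normalized
private final String normalized
-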
literal
private final boolean literal -
hashCode
private final int hashCode
-
-
Constructor Details
-
NormalizedString
-
-
Method Details
-
normalize
-
isLiteral
public boolean isLiteral() -
equals
-
hashCode
public int hashCode() -
length
public int length()
- Specified by:
length in interface CharSequence
-
charAt
public char charAt(int index)
- Specified by:
charAt in interface CharSequence
-
subSequence
- Specified by:
subSequence in interface CharSequence
-
compareTo
- Specified by:
compareTo in interface Comparable<NormalizedString>
-
compareTo
Compares a NormalizedString against a String lexicographically.
- Parameters:
o - a plain String
- Returns:
the result of String.compareTo(String). If this NormalizedString is a literal, its original string is compared against the argument as-is. If this NormalizedString is not a literal, the normalized content of both strings is compared (i.e. differences in surrounding whitespace and character case are ignored).
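The two comparison modes can be sketched as a single static helper. CompareSketch.compare is hypothetical, not a library method, and it assumes that normalization means trimming and lower-casing (Locale.ROOT).

```java
import java.util.Locale;

// Hypothetical helper illustrating the documented compareTo(String) rule.
final class CompareSketch {
    // Assumed normalization: trim surrounding whitespace, then lower-case.
    static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    // Literal: compare the original text as-is.
    // Non-literal: compare the normalized forms of both strings.
    static int compare(String thisOriginal, boolean thisIsLiteral, String other) {
        if (thisIsLiteral) {
            return thisOriginal.compareTo(other);
        }
        return normalize(thisOriginal).compareTo(normalize(other));
    }
}
```

For example, compare(" Name", false, "NAME ") returns 0. The literal comparison compare("Name", true, "NAME") is positive because 'a' sorts after 'A'.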
-
toString
- Specified by:
toString in interface CharSequence
- Overrides:
toString in class Object
-
literalValueOf
Creates a literal NormalizedString, meaning it will only match another String or NormalizedString that has exactly the same content, including character case and surrounding whitespace.
- Parameters:
string - the input String
- Returns:
the literal NormalizedString version of the given string.
-
valueOf
Creates a non-literal NormalizedString, meaning it will match another String or NormalizedString regardless of differences in character case and surrounding whitespace. If the input value is enclosed in single quotes, a literal NormalizedString is returned, as described in literalValueOf(String).
- Parameters:
o - the input object whose String representation will be used
- Returns:
the NormalizedString of the given object.
-
valueOf
Creates a non-literal NormalizedString, meaning it will match another String or NormalizedString regardless of differences in character case and surrounding whitespace. If the input string is enclosed in single quotes, a literal NormalizedString is returned, as described in literalValueOf(String).
- Parameters:
string - the input string
- Returns:
the NormalizedString of the given string.
-
valueOf
Converts a NormalizedString back to its original String representation.
- Parameters:
string - the normalized string
- Returns:
the original string used to create the given normalized representation.
-
toArray
Converts a collection of plain strings into an array of NormalizedString.
- Parameters:
args - the strings to convert to NormalizedString
- Returns:
the NormalizedString representations of all input strings.
-
toStringArray
Converts a collection of normalized strings into an array of String.
- Parameters:
args - the normalized strings to convert back to String
- Returns:
the String representations of all normalized strings.
-
toUniqueArray
Converts multiple plain strings into an array of NormalizedString, ensuring no duplicate NormalizedString elements exist, even if their original Strings are different.
- Parameters:
args - the strings to convert to NormalizedString
- Returns:
the NormalizedString representations of all input strings.
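One way to get the deduplication that toUniqueArray describes is to key each input by its normalized form. UniqueSketch is hypothetical and returns plain Strings rather than NormalizedString instances. Keeping the first spelling encountered is an assumption of this sketch, because the documentation does not say which original survives.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical deduplication by normalized form; NOT the library's implementation.
final class UniqueSketch {
    // Keeps the first spelling seen for each normalized key, preserving input order.
    static String[] toUnique(String... args) {
        Map<String, String> firstByKey = new LinkedHashMap<>();
        for (String s : args) {
            // Assumed normalization: trim, then lower-case.
            firstByKey.putIfAbsent(s.trim().toLowerCase(Locale.ROOT), s);
        }
        return firstByKey.values().toArray(new String[0]);
    }
}
```

With the inputs "Name", " name", "Age" and "NAME", only "Name" and "Age" remain.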
-
toArray
Converts multiple plain strings into an array of NormalizedString.
- Parameters:
args - the strings to convert to NormalizedString
- Returns:
the NormalizedString representations of all input strings.
-
toArray
Converts multiple normalized strings into an array of String.
- Parameters:
args - the normalized strings to convert to String
- Returns:
the String representations of all input strings.
-
getCollection
-
getCollection
private static <T extends Collection<NormalizedString>> T getCollection(T out, Collection<String> args) -
getCollection
-
getStringCollection
private static <T extends Collection<String>> T getStringCollection(T out, Collection<NormalizedString> args) -
toArrayList
Converts multiple plain strings into an ArrayList of NormalizedString.
- Parameters:
args - the strings to convert to NormalizedString
- Returns:
the NormalizedString representations of all input strings.
-
toArrayList
Converts multiple plain strings into an ArrayList of NormalizedString.
- Parameters:
args - the strings to convert to NormalizedString
- Returns:
the NormalizedString representations of all input strings.
-
toArrayListOfStrings
Converts multiple normalized strings into an ArrayList of String.
- Parameters:
args - the normalized strings to convert to String
- Returns:
the original Strings of all input normalized strings.
-
toArrayListOfStrings
Converts multiple normalized strings into an ArrayList of String.
- Parameters:
args - the normalized strings to convert to String
- Returns:
the original Strings of all input normalized strings.
-
toTreeSet
Converts multiple plain strings into a TreeSet of NormalizedString.
- Parameters:
args - the strings to convert to NormalizedString
- Returns:
the NormalizedString representations of all input strings.
-
toTreeSet
Converts multiple plain strings into a TreeSet of NormalizedString.
- Parameters:
args - the strings to convert to NormalizedString
- Returns:
the NormalizedString representations of all input strings.
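A TreeSet of NormalizedString orders and deduplicates its elements through compareTo. The sketch below approximates that behavior with a plain TreeSet<String> and a comparator on the trimmed, lower-cased text. The normalization rule is an assumption, and TreeSetSketch is hypothetical, not the library's implementation.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical approximation of a TreeSet ordered by normalized content.
final class TreeSetSketch {
    // Assumed normalization: trim surrounding whitespace, then lower-case.
    static final Comparator<String> NORMALIZED_ORDER =
            Comparator.comparing((String s) -> s.trim().toLowerCase(Locale.ROOT));

    // Variants that normalize to the same text compare as equal, so only one is kept;
    // TreeSet retains the first element added and ignores later duplicates.
    static TreeSet<String> toTreeSet(String... args) {
        TreeSet<String> set = new TreeSet<>(NORMALIZED_ORDER);
        set.addAll(Arrays.asList(args));
        return set;
    }
}
```

For the inputs "b", " A", "a " and "B", the set holds two elements, " A" followed by "b".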
-
toTreeSetOfStrings
Converts multiple normalized strings into a TreeSet of String.
- Parameters:
args - the normalized strings to convert to String
- Returns:
the original Strings of all input normalized strings.
-
toTreeSetOfStrings
Converts multiple normalized strings into a TreeSet of String.
- Parameters:
args - the normalized strings to convert to String
- Returns:
the original Strings of all input normalized strings.
-
toHashSet
Converts multiple plain strings into a HashSet of NormalizedString.
- Parameters:
args - the strings to convert to NormalizedString
- Returns:
the NormalizedString representations of all input strings.
-
toHashSet
Converts multiple plain strings into a HashSet of NormalizedString.
- Parameters:
args - the strings to convert to NormalizedString
- Returns:
the NormalizedString representations of all input strings.
-
toHashSetOfStrings
Converts multiple normalized strings into a HashSet of String.
- Parameters:
args - the normalized strings to convert to String
- Returns:
the original Strings of all input normalized strings.
-
toHashSetOfStrings
Converts multiple normalized strings into a HashSet of String.
- Parameters:
args - the normalized strings to convert to String
- Returns:
the original Strings of all input normalized strings.
-
toLinkedHashSet
Converts multiple plain strings into a LinkedHashSet of NormalizedString.
- Parameters:
args - the strings to convert to NormalizedString
- Returns:
the NormalizedString representations of all input strings.
-
toLinkedHashSet
Converts multiple plain strings into a LinkedHashSet of NormalizedString.
- Parameters:
args - the strings to convert to NormalizedString
- Returns:
the NormalizedString representations of all input strings.
-
toLinkedHashSetOfStrings
Converts multiple normalized strings into a LinkedHashSet of String.
- Parameters:
args - the normalized strings to convert to String
- Returns:
the original Strings of all input normalized strings.
-
toLinkedHashSetOfStrings
Converts multiple normalized strings into a LinkedHashSet of String.
- Parameters:
args - the normalized strings to convert to String
- Returns:
the original Strings of all input normalized strings.
-
toLiteral
Returns the literal representation of this NormalizedString, meaning it will only match another String or NormalizedString that has exactly the same content, including character case and surrounding whitespace.
- Returns:
the literal representation of the current NormalizedString
-
toIdentifierGroupArray
Analyzes a group of NormalizedString instances to identify any whose normalized content would clash. Clashing entries are converted to their literal counterparts (using toLiteral()), so that they can be told apart.
- Parameters:
strings - a group of identifiers that may contain ambiguous entries if character case and surrounding whitespace are not considered. This array will be modified.
- Returns:
the input array, with NormalizedString literals in the positions where clashes would otherwise occur.
-
toIdentifierGroupArray
Analyzes a group of String instances to identify any whose normalized content would clash. Clashing entries are converted to their literal counterparts (using toLiteral()), so that they can be told apart.
- Parameters:
strings - a group of identifiers that may contain ambiguous entries if character case and surrounding whitespace are not considered.
- Returns:
a NormalizedString array with literals in the positions where clashes would otherwise occur.
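The clash detection described above can be sketched by counting how many inputs share each normalized form. Any entry whose form occurs more than once would then be turned into a literal. ClashSketch.findClashes is hypothetical: it only reports which positions clash and does not build NormalizedString instances. It also assumes normalization means trimming and lower-casing.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical clash detector; NOT the library's toIdentifierGroupArray.
final class ClashSketch {
    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT); // assumed normalization
    }

    // Returns, for each input, whether it must become a literal because
    // at least one other entry shares its normalized form.
    static boolean[] findClashes(String[] names) {
        Map<String, Integer> counts = new HashMap<>();
        for (String n : names) {
            counts.merge(normalize(n), 1, Integer::sum);
        }
        boolean[] literal = new boolean[names.length];
        for (int i = 0; i < names.length; i++) {
            literal[i] = counts.get(normalize(names[i])) > 1;
        }
        return literal;
    }
}
```

For {"id", "ID", "name"}, the sketch reports that positions 0 and 1 clash and position 2 does not.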
-
identifyLiterals
Analyzes a group of NormalizedString instances to identify any whose normalized content would clash. Clashing entries are converted to their literal counterparts (using toLiteral()), so that they can be told apart.
- Parameters:
strings - a group of identifiers that may contain ambiguous entries if character case and surrounding whitespace are not considered. This array will be modified.
- Returns:
true if any entry was converted to a literal, otherwise false
-
identifyLiterals
public static boolean identifyLiterals(NormalizedString[] strings, boolean lowercaseIdentifiers, boolean uppercaseIdentifiers)
Analyzes a group of NormalizedString instances to identify any whose normalized content would clash. Clashing entries are converted to their literal counterparts (using toLiteral()), so that they can be told apart.
- Parameters:
strings - a group of identifiers that may contain ambiguous entries if character case and surrounding whitespace are not considered. This array will be modified.
lowercaseIdentifiers - flag indicating that identifiers are stored in lower case (for compatibility with databases). If a string contains an uppercase character, it must become a literal.
uppercaseIdentifiers - flag indicating that identifiers are stored in upper case (for compatibility with databases). If a string contains a lowercase character, it must become a literal.
- Returns:
true if any entry was converted to a literal, otherwise false
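The documented effect of the lowercaseIdentifiers and uppercaseIdentifiers flags can be sketched as a per-character check. CaseRuleSketch.shouldBeLiteral borrows the name of the private shouldBeLiteral method. It is a hypothetical reconstruction based only on the parameter descriptions above, not that method's actual body.

```java
// Hypothetical reading of the lowercaseIdentifiers / uppercaseIdentifiers flags.
final class CaseRuleSketch {
    // lowercaseIdentifiers: any uppercase letter forces a literal.
    // uppercaseIdentifiers: any lowercase letter forces a literal.
    static boolean shouldBeLiteral(String s, boolean lowercaseIdentifiers, boolean uppercaseIdentifiers) {
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (lowercaseIdentifiers && Character.isUpperCase(ch)) return true;
            if (uppercaseIdentifiers && Character.isLowerCase(ch)) return true;
        }
        return false;
    }
}
```

With lowercase identifiers, "user_id" stays non-literal, but "userId" must become a literal because its mixed case would be lost by a lower-casing database.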
-
shouldBeLiteral
private static boolean shouldBeLiteral(String string, boolean lowercaseIdentifiers, boolean uppercaseIdentifiers) -
getCache
Returns the internal string cache, allowing users to adjust its size limit or clear it when appropriate.
- Returns:
the string cache used to store NormalizedString instances associated with their original String.
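The documentation only states that the cache has a size limit and can be cleared. The sketch below shows one common way to build such a cache: a least-recently-used policy on top of LinkedHashMap. BoundedCacheSketch is hypothetical and is not univocity's StringCache. The LRU eviction policy is an assumption, not documented behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical size-limited cache keyed by the original String; NOT StringCache.
final class BoundedCacheSketch<V> {
    private final int maxSize;
    private final Map<String, V> entries;

    BoundedCacheSketch(int maxSize) {
        this.maxSize = maxSize;
        // accessOrder = true: iteration order runs from least to most recently used.
        this.entries = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                // Evict the least recently used entry once the limit is exceeded.
                return size() > BoundedCacheSketch.this.maxSize;
            }
        };
    }

    // Returns the cached value, creating it with the factory on a miss.
    V get(String key, Function<String, V> factory) {
        return entries.computeIfAbsent(key, factory);
    }

    int size() {
        return entries.size();
    }

    void clear() {
        entries.clear();
    }
}
```

Access order makes a repeated lookup keep an entry alive. With a limit of 2, looking up "a", "b", "a" and then "c" evicts "b" rather than "a".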
-