Package org.apache.uima.cas.impl
Class CasCompare
java.lang.Object
org.apache.uima.cas.impl.CasCompare
Used by tests for Binary Compressed de/serialization code.
Used by test app: XmiCompare.
Compare 2 CASes, with perhaps different type systems.
If the type systems are different, construct a type mapper and use that
to selectively ignore types or features not in other type system
The Mapper is from CAS1 -> CAS2
When computing the things to compare from CAS1, filter to remove
feature structures not reachable via indexes or refs
The index definitions are not compared.
The indexes are used to locate the FSs to be compared.
Reports are produced to System.out and System.err as a side effect
System.out: status messages, type system comparison
System.err: mismatch comparison information
Usage:
Use the static compareCASes method for default comparisons
Use the multi-step approach for more complex comparisons:
- Make an instance of this class, passing in the two CASes.
- Set any additional configuration
cc.compareAll(true) - continue comparing if mismatch found
cc.compardIds(true) - compare ids (require ids to be ==)
- Do any transformations needed on the CASes to account for known but allowed differences:
-- These are transformations done on the CAS Feature Structures outside of this routine
-- example: for certain type:feature string values, normalize to the same canonical value
-- example: for certain type:feature string arrays, where the order is not important, sort them
-- example: for certain type:feature FSArrays, where the order is not important, sort them
--- using the sortFSArray method
- Do any configuration to specify congruence sets for String values
-- example: addStringCongruenceSet( type, feature, set-of-strings, -1 or int index if array)
-- these are specific to type / feature specs
-- range can be string or string array - if string array, the spec includes the index or -1
to indicate all indexes
How it works
Prepare arrays of all the FSs in the CAS
- for each of 2 CASes to be compared
- 2 arrays:
-- all FSs in any index in any view
-- the above, plus all FSs reachable via references
-- but omit some types: only of interest when reached via ref,
e.g. String/Int/Float/Boolean arrays
The comparison of FSs is done, one FS at a time.
- in order to determine the right FSs to compare with each other, the FSs for each CAS
are sorted.
The sort and the CAS compare both use a Compare method.
- sorting skips items not in the other type system, including features
- (only possible if comparing two CASes with different type systems, of course)
Compare
- used for two purposes:
a) sorting FSs belonging to one CAS
- can be used by caller to pre-sort any array values where the
compare should be for set equality (in other words, ignore the order)
b) comparing a FS in one CAS with a FS in the other CAS
sort keys, in order:
1) type
2) if primitive array: sort based on
- size
- iterating thru all array items
3) All the features, considered in an order where non-refs are sorted before refs.
comparing values:
primitives - value comparison
refs - compare the ref'd FS, while recording reference paths
- stop when reach a compare point where the pair being compared has been seen
- stop at any point if the two FSs compare unequal
- at the stop point, if compare is equal, check the reference paths, and
report unequal reference paths (different cycle lengths, or different total lengths,
see the Prev data structure)
Context information, reused across compares:
prevCompare - if a particular pair of FSs compared equal
-- used to speed up comparison
-- used to stop recursive loops of references
prev1, prev2 - reset for each top level FS compare
- not reset for inner FS compares of fs-reference values)
holds information about the reference path for chains of FS references
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionprivate static class
private static class
hold info about previous compares, to break cycles in references The comparison records cycles and can distinguish different cyclic graphs.private static class
key for StringCongruenceSet -
Field Summary
FieldsModifier and TypeFieldDescriptionprivate static final String
private final CASImpl
private final CASImpl
private boolean
private static final boolean
private static final boolean
private static final boolean
private static boolean
private boolean
if true, continues comparison and reporting after finding the first miscompareprivate boolean
private boolean
private boolean
private boolean
private boolean
private final PositiveIntSet
private final Int2ObjHashMap
<ArrayList<CommonList>, ArrayList<CommonList>> key = _id, value = arraylist holding well-formed list with this node in itprivate int
private int
private int
private final StringBuilder
private final Obj2IntIdentityHashMap
<CommonList> a map from list nodes which might be removed, to their place in the fss array list The index is 1 more, to avoid colliding with the 0 value, used for missing valueprivate final PositiveIntSet
set of list elements determined to be members of a non-linear structureprivate final CasCompare.Prev
private final CasCompare.Prev
This is used - to speed up the compare - avoid comparing the same things multiple times, instead just use previous result - when doing the comparison to break recursion if asked to compare the same two things while comparing them.private static final CommonList
private int
private final Map
<CasCompare.ScsKey, String[]> private final TypeSystemImpl
private final TypeSystemImpl
private final Map
<TypeImpl, CasCompare.FeatLists> private final CasTypeSystemMapper
private static int
-
Constructor Summary
ConstructorsConstructorDescriptionCasCompare
(CASImpl c1, CASImpl c2) Make an instance of this class to set up a compare operation, and optionally use to configure the compare. -
Method Summary
Modifier and TypeMethodDescriptionvoid
addStringCongruenceSet
(String typeName, String featureBaseName, String[] set_of_strings_that_are_equivalent, int index) Add a set of strings that should be considered equal when doing string comparisons.private boolean
addSuccessors
(CommonList node, ArrayList<CommonList> al) walk down list, adding successors, looking for loops - each element is added to the array list, and also to the map from id -> array list - if loop found, stop and return false - before adding element, see if already in map from id -> array list -- if so, couple the array listsvoid
Many times some customation needs to be applied to both CASs being compared.void
Before comparing, you can adjust specific features of specific types, arbitrarily.private void
applyToTypeFeature_inner
(CASImpl cas, String typeName, String featureBaseName, Consumer2<TOP, Feature> c) void
canonicalizeString
(String typeName, String featureBaseName, String[] items_to_change, String canonical_value) Before comparing, you can, for a selected type and feature which has a string value belonging to one of a set of strings, change the value to another (fixed) string which will of course compare equal.private void
void
compareAll
(boolean v) Continues the comparison after a miscompare (or not).private int
compareAllArrayElements
(TOP fs1, TOP fs2, int len, IntUnaryOperator c, TypeImpl callerTi, FeatureImpl callerFi) boolean
This does the actual comparison operation of the previously specified CASesstatic boolean
compareCASes
(CASImpl c1, CASImpl c2) Compare 2 CASes, with perhaps different type systems.private int
compareFeature
(TOP fs1, TOP fs2, TypeImpl ti1, FeatureImpl fi1) private int
compareFss
(TOP fs1, TOP fs2, TypeImpl callerTi, FeatureImpl callerFi) To work with Java sort, must implement the comparator contract: - compare(x, y) must be opposite compare(y, x) - compare(x, y) invalid input: '<' 0 invalid input: '&'invalid input: '&' compare(y, z) invalid input: '<' 0 implies compare(x, z) invalid input: '<' 0 - compare(x, y) == 0 implies compare(x, z) same as compare(y, z) for any z Inner part of compareRefs; that other method adds: null-check type-mapping skips loop determination If not in a sort context, a miscompare generates messaging information.private int
compareFssArray
(TOP fs1, TOP fs2, TypeImpl callerTi, FeatureImpl callerFi) void
compareIds
(boolean v) Normally, compares ignore the Feature Structure ID when comparing.static StringBuilder
compareNumberOfFSsByType
(CAS cas1, CAS cas2) Counts and compares the number of Feature Structures, by type, and generates a reportprivate int
compareRefResult
(TOP rfs1, TOP rfs2) Returning because recursion detected a loop.private int
compareRefs
(TOP rfs1, TOP rfs2, TypeImpl callerTi, FeatureImpl callerFi) Two uses cases supported: - comparing for sorting (within on type system) -- goal is to be able to compare two CASes --- ordering must guarantee that the equal FSs appear in the --- same order - comparing two FSs (maybe from different CASes) -- supporting missing types and features -- happens when the two type systems are different -- the missing types and features are ignored in the comparison Different reference chains This compare routine may be called recursively - use case: FS(a) has slot which is ref to FS(b) which has slot which is ref to FS(c) -- the type of a, b, c may all be different.private int
compareSlot
(TOP fs1, TOP fs2, FeatureImpl fi1, FeatureImpl fi2, TypeImpl ti1) private int
compareStringsWithNull
(String s1, String s2, TypeImpl t, FeatureImpl f, int index) private CasCompare.FeatLists
called during sort phaseprivate void
private void
convert_to_array
(ArrayList<CommonList> al, ArrayList<TOP> fss, CASImpl view, TypeSystemImpl tsi) Convert an array list to a uima array (int, float, fs, string) - add to fss - go thru fss and null out list elementsprivate void
couple_array_lists
(ArrayList<CommonList> a1, ArrayList<CommonList> a2, CommonList commonNode) merge a2 to follow a1, starting from position where commonNode is in a2void
The compare can find FeatureStructures to compare either from - being in some index in some view, or - being referenced through some chain which starts with the above.void
The compare can find FeatureStructures to compare either from - being in some index in some view, or - being referenced through some chain which starts with the above.void
excludeRootTypesFromIndexes
(Set<String> excluded_typeNames) The compare can find FeatureStructures to compare either from - being in some index in some view, or - being referenced through some chain which starts with the above.void
includeOnlyTheseTypesFromIndexes
(List<String> includedTypeNames) The compare can find FeatureStructures to compare either from - being in some index in some view, or - being referenced through some chain which starts with the above.private boolean
isTypeInTgt
(TOP fs) private void
mismatchFs
(TOP fs1, TOP fs2, String msg, TypeImpl callerTi, FeatureImpl callerFi) private void
mismatchFs
(TOP fs1, TOP fs2, Feature fi, Feature fi2) private void
mismatchFs
(TOP fs1, TOP fs2, TypeImpl callerTi, FeatureImpl callerFi) private void
private void
private String
static void
call this to show progress of the compare - useful for long comparesprivate void
called to sort all the FSs before doing the equality comparessort_dedup_FSArray
(String typeName, String featureBaseName) sort_dedup_FSArray
(TOP fs, Feature feat) This is an optional pre-compare operation.private int
sortCompare
(TOP scFs1, TOP scFs2) Used for sorting within one type system, for two instances of the same type Uses field isSrcCas (boolean) to differentiate when being used to sort for srcCas vs tgtCas When sorting where type mapping is happening between source and target CASs, skip compares for features which are not in the opposite CAS.sortFSArray
(String typeName, String featureBaseName) sortFSArray
(FSArray<?> fsArray) This is an optional pre-compare operation.sortStringArray
(String typeName, String featureBaseName) sortStringArray
(StringArray stringArray) This is an optional pre-compare operation.type_feature_to_runnable
(String typeName, String featureBaseName, BiFunction<TOP, Feature, Runnable> c) Before comparing, you can create pending values for specific types / features, and return a list of runnables, which when run, plug in those pending values.type_feature_to_runnable
(CASImpl cas, String typeName, String featureBaseName, BiFunction<TOP, Feature, Runnable> c)
-
Field Details
-
IS_DEBUG_STOP_ON_MISCOMPARE
private static final boolean IS_DEBUG_STOP_ON_MISCOMPARE- See Also:
-
IS_MEAS_LIST_2_ARRAY
private static final boolean IS_MEAS_LIST_2_ARRAY- See Also:
-
BLANKS_89
-
IS_SHOW_PROGRESS
private static boolean IS_SHOW_PROGRESS -
IS_CANONICAL_EMPTY_LISTS
private static final boolean IS_CANONICAL_EMPTY_LISTS- See Also:
-
removed_list_marker
-
map_e_to_a_list
key = _id, value = arraylist holding well-formed list with this node in it -
non_linear_list_elements
set of list elements determined to be members of a non-linear structure -
node_indexes
a map from list nodes which might be removed, to their place in the fss array list The index is 1 more, to avoid colliding with the 0 value, used for missing value -
list_successor_seen
-
type2featLists
-
c1
-
c2
-
ts1
-
ts2
-
isCompareAll
private boolean isCompareAllif true, continues comparison and reporting after finding the first miscompare -
isCompareIds
private boolean isCompareIds -
leafErrorReported
-
excludedRootNames
-
includedTypeNames
-
prevCompare
This is used - to speed up the compare - avoid comparing the same things multiple times, instead just use previous result - when doing the comparison to break recursion if asked to compare the same two things while comparing them. value is the result of previous comparison. -
prevReport
-
prev1
-
prev2
-
isSrcCas
private boolean isSrcCas -
mismatchSb
-
inSortContext
private boolean inSortContext -
isSkipMismatch
private boolean isSkipMismatch -
isTypeMapping
private boolean isTypeMapping -
typeMapper
-
c1FoundFSs
-
c2FoundFSs
-
stringCongruenceSets
-
isUsingStringCongruenceSets
private boolean isUsingStringCongruenceSets -
maxId1
private int maxId1 -
maxId2
private int maxId2 -
miscompare_index
private int miscompare_index -
s1maxLen
private int s1maxLen -
working_on
private static int working_on
-
-
Constructor Details
-
CasCompare
Make an instance of this class to set up a compare operation, and optionally use to configure the compare.- Parameters:
c1
- one CAS to comparec2
- the other CAS to compare
-
-
Method Details
-
compareCASes
Compare 2 CASes, with perhaps different type systems. - using default configuration.- Parameters:
c1
- CAS to comparec2
- CAS to compare- Returns:
- true if equal (for types / features in both)
-
compareAll
public void compareAll(boolean v) Continues the comparison after a miscompare (or not). This is useful when you want to see all of the miscompares.- Parameters:
v
- defaults to false, set to true to continue the comparison after a miscompare
-
compareIds
public void compareIds(boolean v) Normally, compares ignore the Feature Structure ID when comparing.- Parameters:
v
- defaults to false, set to true to include the Feature Structure ID in the compare.
-
applyToBoth
Many times some customation needs to be applied to both CASs being compared. This routine does that- Parameters:
c
- the customization to be applied to both CASs
-
applyToTypeFeature
Before comparing, you can adjust specific features of specific types, arbitrarily. This routine applies the adjustments to both CASs.- Parameters:
typeName
- the fully qualified name of the typefeatureBaseName
- the short feature name to adjustc
- a function to do the adjustment
-
applyToTypeFeature_inner
-
type_feature_to_runnable
public List<Runnable> type_feature_to_runnable(String typeName, String featureBaseName, BiFunction<TOP, Feature, Runnable> c) Before comparing, you can create pending values for specific types / features, and return a list of runnables, which when run, plug in those pending values.- Parameters:
typeName
- the typefeatureBaseName
- the feature of the typec
- the code to run for this type and feature- Returns:
- a list of runnables, for both CASs
-
type_feature_to_runnable
-
canonicalizeString
public void canonicalizeString(String typeName, String featureBaseName, String[] items_to_change, String canonical_value) Before comparing, you can, for a selected type and feature which has a string value belonging to one of a set of strings, change the value to another (fixed) string which will of course compare equal. Use this to ignore selected string-valued features having particular values.- Parameters:
typeName
- the fully qualified type namefeatureBaseName
- the featureitems_to_change
- an array of strings to change if matched to one of thesecanonical_value
- the new value
-
sortFSArray
-
sort_dedup_FSArray
-
sortStringArray
-
excludeRootTypesFromIndexes
The compare can find FeatureStructures to compare either from - being in some index in some view, or - being referenced through some chain which starts with the above. It sometimes helps to exclude miscompares of FeatureStructure like StringArrays which (for some reason) are indexed, in favor of finding these only via refs. You can exclude these from being found via indexes by setting types here. They could still be found via refs from other Feature Structures. Calling this disables any includeOnlyTheseTypesFromIndexes call;- Parameters:
excluded_typeNames
- type names to exclude
-
excludeCollectionsTypesFromIndexes
public void excludeCollectionsTypesFromIndexes()The compare can find FeatureStructures to compare either from - being in some index in some view, or - being referenced through some chain which starts with the above. It sometimes helps to exclude miscompares of FeatureStructure like StringArrays which (for some reason) are indexed, in favor of finding these only via refs. Call this to exclude the array types: boolean, byte, short, integer, long, float, double, string and fs arrays from being found via indexes. They could still be found via refs from other Feature Structures. Calling this disables any includeOnlyTheseTypesFromIndexes call; -
excludeListTypesFromIndexes
public void excludeListTypesFromIndexes()The compare can find FeatureStructures to compare either from - being in some index in some view, or - being referenced through some chain which starts with the above. It sometimes helps to exclude miscompares of List FeatureStructures like StringLists which (for some reason) are indexed, in favor of finding these only via refs. Call this to exclude the list types non-empty Float/Integer/String list elements from being found in the index. They could still be found via refs from other Feature Structures. Calling this disables any includeOnlyTheseTypesFromIndexes call; -
includeOnlyTheseTypesFromIndexes
The compare can find FeatureStructures to compare either from - being in some index in some view, or - being referenced through some chain which starts with the above. It sometimes helps to exclude all types except for a few selected ones which are indexed, in favor of finding these only via refs. Calling this disables any excludeXXXTypesFromIndexes calls;- Parameters:
includedTypeNames
- fully qualified type names to include when finding Feature Structures to compare via the indexes.
-
addStringCongruenceSet
public void addStringCongruenceSet(String typeName, String featureBaseName, String[] set_of_strings_that_are_equivalent, int index) Add a set of strings that should be considered equal when doing string comparisons. This is conditioned on the typename and feature name- Parameters:
typeName
- the fully qualified type namefeatureBaseName
- the feature short nameset_of_strings_that_are_equivalent
- a set of strings that should compare equal, if testing the type / featureindex
- if the item being compared is a reference to a string array, which index should be compared. Use -1 if not applicable.
-
showProgress
public static void showProgress()call this to show progress of the compare - useful for long compares -
compareCASes
public boolean compareCASes()This does the actual comparison operation of the previously specified CASes- Returns:
- true if compare is OK
-
sortFSArray
This is an optional pre-compare operation. Somtimes, when comparing FSArrays, the order of the elements is not significant, and the compare should be done ignoring order differences. This is accomplished by sorting the elements, before the compare is done, using this method. The sort order is not significant; it just needs to be the same order for otherwise equal FSArrays. Use this routine to accomplish the sort, on particular FSArrays you designate. Call it for each one you want to sort. During the sort, links are followed. The sorting is done in a clone of the array, and the original array is not updated. Instead, a Runnable is returned, which may be invoked later to update the original array with the sorted copy. This allows sorting to be done on the original item values (in case the links refer back to the originals)- Parameters:
fsArray
- the array to be sorted- Returns:
- a runnable, which (when invoked) updates the original array with the sorted result.
-
sort_dedup_FSArray
This is an optional pre-compare operation. It is identical to the method above, except that after sorting, it removes duplicates.- Parameters:
fs
- the feature structure having the fsarray featurefeat
- the feature having the fsarray- Returns:
- a runnable, which (when invoked) updates the original array with the sorted result.
-
sortStringArray
This is an optional pre-compare operation. Somtimes, when comparing StringArrays, the order of the elements is not significant, and the compare should be done ignoring order differences. This is accomplished by sorting the elements, before the compare is done, using this method. Use this routine to accomplish the sort, on particular StringArrays you designate. Call it for each one you want to sort. The sorting is done in a clone of the array, and the original array is not updated. Instead, a Runnable is returned, which may be invoked later to update the original array with the sorted copy. This allows sorting to be done while keeping the original values until a later time- Parameters:
stringArray
- the array to be sorted- Returns:
- null or a runnable, which (when invoked) updates the original array with the sorted result. callers should insure the runnable is garbage collected after use
-
convert_linear_lists_to_arrays
-
convert_to_array
private void convert_to_array(ArrayList<CommonList> al, ArrayList<TOP> fss, CASImpl view, TypeSystemImpl tsi) Convert an array list to a uima array (int, float, fs, string) - add to fss - go thru fss and null out list elements- Parameters:
al
- -fss
- -
-
addSuccessors
walk down list, adding successors, looking for loops - each element is added to the array list, and also to the map from id -> array list - if loop found, stop and return false - before adding element, see if already in map from id -> array list -- if so, couple the array lists- Parameters:
node
- -al
- -- Returns:
- false if loop found
-
couple_array_lists
private void couple_array_lists(ArrayList<CommonList> a1, ArrayList<CommonList> a2, CommonList commonNode) merge a2 to follow a1, starting from position where commonNode is in a2- Parameters:
a1
- -a2
- -commonNode
- -
-
move_to_non_linear
-
clearPrevFss
private void clearPrevFss() -
compareFss
To work with Java sort, must implement the comparator contract: - compare(x, y) must be opposite compare(y, x) - compare(x, y) invalid input: '<' 0 invalid input: '&'invalid input: '&' compare(y, z) invalid input: '<' 0 implies compare(x, z) invalid input: '<' 0 - compare(x, y) == 0 implies compare(x, z) same as compare(y, z) for any z Inner part of compareRefs; that other method adds: null-check type-mapping skips loop determination If not in a sort context, a miscompare generates messaging information.- Parameters:
callerTi
- - the type of another FS referencing this one, or null, used in congruence set testingcallerFi
- - the feature of the another FS referencing this one, or null, used in congruence set testing- Returns:
- the compare result
-
compareFeature
-
computeFeatLists
called during sort phase- Parameters:
ti
- - type being sorted- Returns:
- the feature lists for that type
-
compareFssArray
-
compareSlot
-
compareRefs
Two uses cases supported: - comparing for sorting (within on type system) -- goal is to be able to compare two CASes --- ordering must guarantee that the equal FSs appear in the --- same order - comparing two FSs (maybe from different CASes) -- supporting missing types and features -- happens when the two type systems are different -- the missing types and features are ignored in the comparison Different reference chains This compare routine may be called recursively - use case: FS(a) has slot which is ref to FS(b) which has slot which is ref to FS(c) -- the type of a, b, c may all be different. Two reference chains for the two arguments may have different structures - Difference in two ways: -- length of unique (same fs_id) FSs -- length of loop (if one exists at the end reached so far) IMPORTANT: the following 2 chains have different lengths, but this won't be discovered if the recursive descent stops too soon: - a -> b -> c ( -> b ) - c -> b ( -> c) At the 2nd recursion, we have b vs b, but haven't explored the chain deeply enough to know the first one has length 3, and the 2nd length 2. Meaning of comparision of two refs: - recursively defined - captures notion of reference chain differences -- means if two refs compare 0, the result may still be non-0 if the reference chains to get to these are different -- first compare on length of unique FSs -- if ==, compare on length of loop - if comparing (use case 2, two different type systems) with type not existing in other type system, skip (treat as 0). If comparing two FSs in 1 CAS, where there is type mapping, if the mapping to the other CAS is null, change the value of the FS to null to match the sort order the other CAS will haveand that mapping is to null (because the type is missing), use null for the argument(s). Complexities: the type rfs1 may not be in the target type system. For this case - treat rfs2 == null as "equal", rfs2 != null as not equal (always gt) Is assymetrical (?) - same logic isn't applied for reverse case.- Parameters:
rfs1
- -rfs2
- -fi
- -- Returns:
- -
-
compareRefResult
Returning because recursion detected a loop.- Parameters:
rfs1
- -rfs2
- -- Returns:
- - -1 if ref chain 1 length invalid input: '<' ref chain 2 length or is the same length but loop length 1 invalid input: '<' 2 1 if ref chain 1 length > ref chain 2 length or is the same length but loop length 1 > 2 0 if ref chain lengths are the same and loop length is the same Exception: if one of the items is a canonical "empty" list element, and the other is a non-canonical one - treat as equal.
-
compareAllArrayElements
private int compareAllArrayElements(TOP fs1, TOP fs2, int len, IntUnaryOperator c, TypeImpl callerTi, FeatureImpl callerFi) -
compareStringsWithNull
-
mismatchFsDisplay
private void mismatchFsDisplay() -
mismatchFs
-
mismatchFs
-
mismatchFs
-
sort
called to sort all the FSs before doing the equality compares -
sortCompare
Used for sorting within one type system, for two instances of the same type Uses field isSrcCas (boolean) to differentiate when being used to sort for srcCas vs tgtCas When sorting where type mapping is happening between source and target CASs, skip compares for features which are not in the opposite CAS.- Parameters:
scFs1
- -scFs2
- -- Returns:
- -
-
isTypeInTgt
-
ps
-
compareNumberOfFSsByType
Counts and compares the number of Feature Structures, by type, and generates a report- Parameters:
cas1
- first CAS to comparecas2
- second CAS to compare- Returns:
- a StringBuilder with a report
-