Package org.codehaus.jackson.sym
Class BytesToNameCanonicalizer
- java.lang.Object
  - org.codehaus.jackson.sym.BytesToNameCanonicalizer
public final class BytesToNameCanonicalizer extends java.lang.Object
A caching symbol table implementation used for canonicalizing JSON field names (as Names which are constructed directly from a byte-based input source). Complications arise from trying to do efficient reuse and merging of symbol tables, to be able to make use of the usually shared vocabulary of subsequent parsing runs.
- Author:
- Tatu Saloranta
Field Summary
- protected int _collCount: Total number of Names in collision buckets (included in _count along with primary entries).
- protected int _collEnd: Index of the first unused collision bucket entry (== size of the used portion of collision list): less than or equal to 0xFF (255), since max number of entries is 255 (8-bit, minus 0 used as 'empty' marker).
- protected org.codehaus.jackson.sym.BytesToNameCanonicalizer.Bucket[] _collList: Array of heads of collision bucket chains; sized dynamically.
- protected int _count: Total number of Names in the symbol table; only used for child tables.
- protected boolean _intern: Whether canonical symbol Strings are to be intern()ed before being added to the table.
- protected int _longestCollisionList: We need to keep track of the longest collision list; this is needed both to indicate problems with attacks and to allow flushing for other cases.
- protected int[] _mainHash: Array of size 2^N; each entry combines 24 bits of hash (0 indicates an 'empty' slot) and an 8-bit collision bucket index (0 indicates an empty collision bucket chain; otherwise subtract one to get the index).
- protected int _mainHashMask: Mask used to truncate 32-bit hash value to current hash array size; essentially, hash array size - 1 (since hash array sizes are 2^N).
- protected Name[] _mainNames: Array that contains Name instances matching entries in _mainHash.
- protected BytesToNameCanonicalizer _parent: Reference to the root symbol table, for child tables, so that they can merge table information back as necessary.
- protected java.util.concurrent.atomic.AtomicReference<org.codehaus.jackson.sym.BytesToNameCanonicalizer.TableInfo> _tableInfo: Member that is only used by the root table instance: root passes immutable state into child instances, and children may return new state if they add entries to the table.
- protected static int DEFAULT_TABLE_SIZE
- protected static int MAX_TABLE_SIZE: Let's not expand symbol tables past some maximum size; this should protect against OOMEs caused by large documents with unique (~= random) names.
-
Method Summary
- Name addName(java.lang.String symbolStr, int[] quads, int qlen)
- Name addName(java.lang.String symbolStr, int q1, int q2)
- int bucketCount()
- int calcHash(int firstQuad)
- int calcHash(int[] quads, int qlen)
- int calcHash(int firstQuad, int secondQuad)
- protected static int[] calcQuads(byte[] wordBytes)
- int collisionCount(): Method mostly needed by unit tests; calculates number of entries that are in collision list.
- static BytesToNameCanonicalizer createRoot(): Factory method to call to create a symbol table instance with a randomized seed value.
- protected static BytesToNameCanonicalizer createRoot(int hashSeed): Factory method that should only be called from unit tests, where seed value should remain the same.
- Name findName(int firstQuad): Finds and returns name matching the specified symbol, if such name already exists in the table.
- Name findName(int[] quads, int qlen): Finds and returns name matching the specified symbol, if such name already exists in the table; or if not, creates name object, adds to the table, and returns it.
- Name findName(int firstQuad, int secondQuad): Finds and returns name matching the specified symbol, if such name already exists in the table.
- static Name getEmptyName()
- int hashSeed()
- BytesToNameCanonicalizer makeChild(boolean canonicalize, boolean intern): Factory method used to create actual symbol table instance to use for parsing.
- int maxCollisionLength(): Method mostly needed by unit tests; calculates length of the longest collision chain.
- boolean maybeDirty(): Method called to quickly check whether a child symbol table may have gotten additional entries.
- void release(): Method called by the using code to indicate it is done with this instance.
- protected void reportTooManyCollisions(int maxLen)
- int size()
-
Field Detail
-
DEFAULT_TABLE_SIZE
protected static final int DEFAULT_TABLE_SIZE
- See Also:
- Constant Field Values
-
MAX_TABLE_SIZE
protected static final int MAX_TABLE_SIZE
Let's not expand symbol tables past some maximum size; this should protect against OOMEs caused by large documents with unique (~= random) names.
- See Also:
- Constant Field Values
-
_parent
protected final BytesToNameCanonicalizer _parent
Reference to the root symbol table, for child tables, so that they can merge table information back as necessary.
-
_tableInfo
protected final java.util.concurrent.atomic.AtomicReference<org.codehaus.jackson.sym.BytesToNameCanonicalizer.TableInfo> _tableInfo
Member that is only used by the root table instance: root passes immutable state into child instances, and children may return new state if they add entries to the table. Child tables do NOT use the reference.
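The root/child hand-off described for _tableInfo can be sketched with a plain AtomicReference. This is a minimal illustration under stated assumptions, not the class's actual code: TableState, SharedRoot and mergeChild are hypothetical names standing in for the private TableInfo machinery, and "publish the child's state only if it holds more entries" is an assumed merge policy.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical immutable snapshot of shared table state (stand-in for TableInfo).
final class TableState {
    final int count;       // number of entries in this snapshot
    final String[] names;  // illustrative payload

    TableState(int count, String[] names) {
        this.count = count;
        this.names = names;
    }
}

// Hypothetical root: children read snapshots, released children offer new ones.
final class SharedRoot {
    private final AtomicReference<TableState> state =
            new AtomicReference<>(new TableState(0, new String[0]));

    // A child starts from whatever immutable snapshot is current.
    TableState snapshot() {
        return state.get();
    }

    // Lock-free merge: replace the snapshot only if the child's is larger,
    // retrying if another thread published in between.
    void mergeChild(TableState childState) {
        while (true) {
            TableState current = state.get();
            if (childState.count <= current.count) {
                return; // root already holds at least as many entries
            }
            if (state.compareAndSet(current, childState)) {
                return;
            }
        }
    }
}
```

Because snapshots are never mutated after publication, readers need no locking; compareAndSet only arbitrates between children released concurrently.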
-
_intern
protected final boolean _intern
Whether canonical symbol Strings are to be intern()ed before being added to the table.
-
_count
protected int _count
Total number of Names in the symbol table; only used for child tables.
-
_longestCollisionList
protected int _longestCollisionList
We need to keep track of the longest collision list; this is needed both to indicate problems with attacks and to allow flushing for other cases.
- Since:
- 1.9.9
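The _longestCollisionList counter can be maintained incrementally: update it whenever a collision chain grows and report once a threshold is crossed. The sketch below is hypothetical; CollisionTracker, onChainGrew and the maxAllowed threshold are illustrative names, and it assumes that reporting (compare reportTooManyCollisions(int)) means throwing an exception.

```java
// Hypothetical tracker for the longest collision chain seen so far.
final class CollisionTracker {
    private final int maxAllowed;     // assumed threshold, not the library's value
    private int longestCollisionList; // longest chain length observed

    CollisionTracker(int maxAllowed) {
        this.maxAllowed = maxAllowed;
    }

    // Call after appending to a chain; newLength is that chain's new length.
    void onChainGrew(int newLength) {
        if (newLength > longestCollisionList) {
            longestCollisionList = newLength;
            if (longestCollisionList > maxAllowed) {
                reportTooManyCollisions(longestCollisionList);
            }
        }
    }

    int longest() {
        return longestCollisionList;
    }

    // Plays the role of reportTooManyCollisions(int maxLen): fail loudly,
    // since very long chains suggest a hash-collision attack.
    private void reportTooManyCollisions(int maxLen) {
        throw new IllegalStateException(
                "Longest collision chain (" + maxLen + ") exceeds " + maxAllowed);
    }
}
```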
-
_mainHashMask
protected int _mainHashMask
Mask used to truncate 32-bit hash value to current hash array size; essentially, hash array size - 1 (since hash array sizes are 2^N).
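Because hash array sizes are powers of two, masking with size - 1 gives the same result as taking the hash modulo the size (with a non-negative result), but without a division. A small sketch; HashMask and indexFor are hypothetical names.

```java
// Hypothetical helper showing how a 2^N-sized table turns a hash into a slot.
final class HashMask {
    static int indexFor(int hash, int tableSize) {
        if (tableSize <= 0 || Integer.bitCount(tableSize) != 1) {
            throw new IllegalArgumentException("table size must be 2^N: " + tableSize);
        }
        int mask = tableSize - 1; // e.g. 64 -> 0b111111
        return hash & mask;       // keeps the low N bits; never negative
    }
}
```

For example, indexFor(0x12345678, 64) keeps the low six bits of 0x78 and yields 56.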
-
_mainHash
protected int[] _mainHash
Array of size 2^N; each entry combines 24 bits of hash (0 indicates an 'empty' slot) and an 8-bit collision bucket index (0 indicates an empty collision bucket chain; otherwise subtract one to get the index).
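A minimal sketch of how one such int could be encoded and decoded. It assumes the hash bits occupy the upper 24 bits and the collision index the lower 8; the exact bit layout is not stated on this page, and SlotCodec and its methods are hypothetical. Storing index + 1 is what lets 0 mean "no chain", which is also why at most 255 collision buckets can be addressed (compare _collEnd).

```java
// Hypothetical encoding of one _mainHash-style slot:
// bits 31..8 = low 24 bits of the hash, bits 7..0 = collision bucket index + 1.
final class SlotCodec {
    // A slot with only the hash set; low 8 bits are 0 (no collision chain).
    // A slot value of 0 means "empty", so the kept hash bits must not all be zero.
    static int withHash(int hash) {
        return hash << 8;
    }

    // Compare only the 24 hash bits that survive the shift.
    static boolean hashMatches(int slot, int hash) {
        return ((slot ^ (hash << 8)) & 0xFFFFFF00) == 0;
    }

    static int withCollisionBucket(int slot, int bucketIndex) {
        if (bucketIndex < 0 || bucketIndex > 254) {
            throw new IllegalArgumentException("bucket index out of 8-bit range: " + bucketIndex);
        }
        return (slot & 0xFFFFFF00) | (bucketIndex + 1); // store index + 1 so 0 means "none"
    }

    // Returns the collision bucket index, or -1 if the slot has no chain.
    static int collisionBucket(int slot) {
        int stored = slot & 0xFF;
        return stored == 0 ? -1 : stored - 1;
    }
}
```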
-
_mainNames
protected Name[] _mainNames
Array that contains Name instances matching entries in _mainHash. Contains nulls for unused entries.
-
_collList
protected org.codehaus.jackson.sym.BytesToNameCanonicalizer.Bucket[] _collList
Array of heads of collision bucket chains; sized dynamically.
-
_collCount
protected int _collCount
Total number of Names in collision buckets (included in _count along with primary entries).
-
_collEnd
protected int _collEnd
Index of the first unused collision bucket entry (== size of the used portion of collision list): less than or equal to 0xFF (255), since max number of entries is 255 (8-bit, minus 0 used as 'empty' marker)
-
Method Detail
-
createRoot
public static BytesToNameCanonicalizer createRoot()
Factory method to call to create a symbol table instance with a randomized seed value.
-
createRoot
protected static BytesToNameCanonicalizer createRoot(int hashSeed)
Factory method that should only be called from unit tests, where seed value should remain the same.
-
makeChild
public BytesToNameCanonicalizer makeChild(boolean canonicalize, boolean intern)
Factory method used to create actual symbol table instance to use for parsing.
- Parameters:
intern
- Whether canonical symbol Strings should be interned or not
-
release
public void release()
Method called by the using code to indicate it is done with this instance. This lets the instance merge accumulated changes into its parent (if need be), safely and efficiently, without the calling code having to know about the parent.
-
size
public int size()
-
bucketCount
public int bucketCount()
- Since:
- 1.9.9
-
maybeDirty
public boolean maybeDirty()
Method called to quickly check whether a child symbol table may have gotten additional entries. Used to decide whether a child table should be merged into the shared table.
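The maybeDirty()/release() contract can be illustrated with a deliberately simplified pair of classes: a child starts from the parent's names, remembers whether it added anything, and pushes its state back on release only when it did. ParentTable and ChildTable are hypothetical; the real class works on hash arrays rather than a Set, and its merge rules may differ from the "keep the larger vocabulary" policy assumed here.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical parent: holds the shared vocabulary and hands out children.
final class ParentTable {
    private Set<String> shared = new HashSet<>();
    private int merges; // number of merge() calls received from dirty children

    synchronized ChildTable makeChild() {
        return new ChildTable(this, shared);
    }

    // Keep whichever vocabulary is larger (an assumed policy).
    synchronized void merge(Set<String> childNames) {
        if (childNames.size() > shared.size()) {
            shared = new HashSet<>(childNames);
        }
        merges++;
    }

    synchronized int size() { return shared.size(); }

    synchronized int mergeCount() { return merges; }
}

// Hypothetical child: used by a single parser, so it needs no locking itself.
final class ChildTable {
    private final ParentTable parent;
    private final Set<String> names;
    private boolean dirty;

    ChildTable(ParentTable parent, Set<String> snapshot) {
        this.parent = parent;
        this.names = new HashSet<>(snapshot); // copied eagerly here for simplicity
    }

    void add(String name) {
        if (names.add(name)) {
            dirty = true; // only genuinely new names make the child dirty
        }
    }

    boolean maybeDirty() { return dirty; }

    // Merge back only if something was added; releasing again without
    // new additions does nothing.
    void release() {
        if (dirty) {
            parent.merge(names);
            dirty = false;
        }
    }
}
```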
-
hashSeed
public int hashSeed()
- Since:
- 1.9.9
-
collisionCount
public int collisionCount()
Method mostly needed by unit tests; calculates number of entries that are in collision list. Value can be at most (size() - 1), but should usually be much lower, ideally 0.
- Since:
- 1.9.9
-
maxCollisionLength
public int maxCollisionLength()
Method mostly needed by unit tests; calculates length of the longest collision chain. This should typically be a low number, but may be up to size() - 1 in the pathological case.
- Since:
- 1.9.9
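Both collisionCount() and maxCollisionLength() reduce to simple passes over the collision chains. A hypothetical sketch that models each chain as a List (CollisionStats is an illustrative name; the real class walks its internal Bucket chains instead):

```java
import java.util.List;

// Hypothetical statistics helper; each inner list is one collision chain.
final class CollisionStats {
    // Total number of entries living in collision chains.
    static int collisionCount(List<List<String>> chains) {
        int total = 0;
        for (List<String> chain : chains) {
            total += chain.size();
        }
        return total;
    }

    // Length of the longest single chain (0 if there are none).
    static int maxCollisionLength(List<List<String>> chains) {
        int max = 0;
        for (List<String> chain : chains) {
            max = Math.max(max, chain.size());
        }
        return max;
    }
}
```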
-
getEmptyName
public static Name getEmptyName()
-
findName
public Name findName(int firstQuad)
Finds and returns name matching the specified symbol, if such name already exists in the table. If not, will return null. Note: separate methods to optimize common case of short element/attribute names (4 or fewer ASCII characters).
- Parameters:
firstQuad
- int32 containing first 4 bytes of the name; if the whole name is less than 4 bytes, padded with zero bytes in front (zero MSBs, i.e. right aligned)
- Returns:
- Name matching the symbol passed, or null if no such name exists in the table
-
findName
public Name findName(int firstQuad, int secondQuad)
Finds and returns name matching the specified symbol, if such name already exists in the table. If not, will return null. Note: separate methods to optimize common case of relatively short element/attribute names (8 or fewer ASCII characters).
- Parameters:
firstQuad
- int32 containing first 4 bytes of the name.
secondQuad
- int32 containing bytes 5 through 8 of the name; if less than 8 bytes, padded with up to 3 zero bytes in front (zero MSBs, i.e. right aligned)
- Returns:
- Name matching the symbol passed, or null if no such name exists in the table
-
findName
public Name findName(int[] quads, int qlen)
Finds and returns name matching the specified symbol, if such name already exists in the table; or if not, creates name object, adds to the table, and returns it. Note: this is the general purpose method that can be called for names of any length. However, if the name is less than 9 bytes long, it is preferable to call the version optimized for short names.
- Parameters:
quads
- Array of int32s, each of which contains 4 bytes of encoded name
qlen
- Number of int32s, starting from index 0, in quads parameter
- Returns:
- Name matching the symbol passed (or constructed for it)
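Following the advice to prefer the short-name versions, a caller that has already packed a name into qlen quads can pick the cheapest overload. NameLookup and LookupDispatch below are hypothetical stand-ins so that the sketch compiles on its own; they mirror the shape of the three findName variants, with String in place of Name.

```java
// Hypothetical interface with the same three shapes as findName.
interface NameLookup {
    String findName(int q1);
    String findName(int q1, int q2);
    String findName(int[] quads, int qlen);
}

final class LookupDispatch {
    // One quad covers up to 4 bytes, two quads up to 8 bytes;
    // anything longer goes through the general-purpose variant.
    static String lookup(NameLookup table, int[] quads, int qlen) {
        switch (qlen) {
            case 1:
                return table.findName(quads[0]);
            case 2:
                return table.findName(quads[0], quads[1]);
            default:
                return table.findName(quads, qlen);
        }
    }
}
```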
-
addName
public Name addName(java.lang.String symbolStr, int q1, int q2)
-
addName
public Name addName(java.lang.String symbolStr, int[] quads, int qlen)
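The addName methods carry no description here. The sketch below shows, in simplified form, the primary-slot-plus-collision-chain layout that the fields above describe, with lookup and insertion kept separate as the findName/addName pairing suggests. MiniSymbolTable is hypothetical: it has no resizing, no 8-bit slot packing and an illustrative hash, and it stores String instead of Name.

```java
import java.util.Arrays;

// Hypothetical, simplified symbol table: one primary entry per slot plus a
// collision chain per slot for names whose primary slot is already taken.
final class MiniSymbolTable {
    private static final class Entry {
        final int hash;
        final int[] quads;
        final String name;
        Entry next; // next entry in the same collision chain

        Entry(int hash, int[] quads, String name) {
            this.hash = hash;
            this.quads = quads;
            this.name = name;
        }

        // Cheap hash check first, then an exact quad-by-quad comparison.
        boolean matches(int h, int[] q, int qlen) {
            if (hash != h || quads.length != qlen) {
                return false;
            }
            for (int i = 0; i < qlen; i++) {
                if (quads[i] != q[i]) {
                    return false;
                }
            }
            return true;
        }
    }

    private final Entry[] primary;    // roughly the role of _mainHash/_mainNames
    private final Entry[] collisions; // chain heads, roughly the role of _collList
    private final int mask;           // table size - 1, like _mainHashMask

    MiniSymbolTable(int size) {
        if (size <= 0 || Integer.bitCount(size) != 1) {
            throw new IllegalArgumentException("size must be a power of two: " + size);
        }
        primary = new Entry[size];
        collisions = new Entry[size];
        mask = size - 1;
    }

    // Illustrative hash only; not the library's calcHash().
    static int hash(int[] quads, int qlen) {
        int h = 0;
        for (int i = 0; i < qlen; i++) {
            h = h * 31 + quads[i];
        }
        return h ^ (h >>> 16);
    }

    // Returns the stored name, or null if it is not in the table.
    String find(int[] quads, int qlen) {
        int h = hash(quads, qlen);
        int ix = h & mask;
        Entry e = primary[ix];
        if (e == null) {
            return null; // empty primary slot means no chain either
        }
        if (e.matches(h, quads, qlen)) {
            return e.name;
        }
        for (Entry c = collisions[ix]; c != null; c = c.next) {
            if (c.matches(h, quads, qlen)) {
                return c.name;
            }
        }
        return null;
    }

    // Adds the name unless present; returns the canonical (first stored) instance.
    String add(String name, int[] quads, int qlen) {
        String existing = find(quads, qlen);
        if (existing != null) {
            return existing;
        }
        int h = hash(quads, qlen);
        int ix = h & mask;
        Entry fresh = new Entry(h, Arrays.copyOf(quads, qlen), name);
        if (primary[ix] == null) {
            primary[ix] = fresh;
        } else {
            fresh.next = collisions[ix]; // push onto the head of the chain
            collisions[ix] = fresh;
        }
        return name;
    }
}
```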
-
calcHash
public final int calcHash(int firstQuad)
-
calcHash
public final int calcHash(int firstQuad, int secondQuad)
-
calcHash
public final int calcHash(int[] quads, int qlen)
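The seed reported by hashSeed() (randomized by createRoot(), fixed by createRoot(int) for tests) is presumably mixed into these hashes so that colliding names are hard to predict in advance. The mixing steps below are illustrative only and are not claimed to match calcHash; SeededQuadHash is a hypothetical class with the same three overload shapes.

```java
// Hypothetical seeded hashing over quads; constants and steps are illustrative.
final class SeededQuadHash {
    private final int seed;

    SeededQuadHash(int seed) {
        this.seed = seed;
    }

    int calcHash(int firstQuad) {
        int h = firstQuad ^ seed; // seed changes every bucket assignment
        h += (h >>> 15);          // fold high bits downwards
        h ^= (h >>> 9);
        return h;
    }

    int calcHash(int firstQuad, int secondQuad) {
        int h = firstQuad;
        h ^= (h >>> 15);
        h += secondQuad * 33;     // mix in the second quad
        h ^= seed;
        h += (h >>> 7);
        return h;
    }

    // Only the first qlen entries of quads take part in the hash.
    int calcHash(int[] quads, int qlen) {
        if (qlen < 1 || qlen > quads.length) {
            throw new IllegalArgumentException("bad qlen: " + qlen);
        }
        int h = quads[0] ^ seed;
        for (int i = 1; i < qlen; i++) {
            h = (h * 31) + quads[i];
            h ^= (h >>> 16);
        }
        return h;
    }
}
```

With a fixed seed the result is reproducible, which is what the test-only createRoot(int) overload relies on; with a random seed an attacker cannot precompute a set of colliding field names.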
-
calcQuads
protected static int[] calcQuads(byte[] wordBytes)
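calcQuads has no description here. The following is a sketch of the likely behaviour, consistent with the quad layout documented for findName: bytes are packed four at a time, most significant byte first, and a final partial quad is right aligned (zero-padded in its high bytes). QuadPacking is a hypothetical class, not the library method.

```java
// Hypothetical packing of a UTF-8 byte sequence into int32 "quads".
final class QuadPacking {
    static int[] calcQuads(byte[] wordBytes) {
        int blen = wordBytes.length;
        int[] result = new int[(blen + 3) / 4]; // round up to whole quads
        for (int i = 0; i < blen; i += 4) {
            int x = wordBytes[i] & 0xFF;         // mask to avoid sign extension
            int end = Math.min(i + 4, blen);
            for (int j = i + 1; j < end; j++) {
                x = (x << 8) | (wordBytes[j] & 0xFF); // shift earlier bytes up
            }
            result[i >> 2] = x;                  // short last quad stays right aligned
        }
        return result;
    }
}
```

For example, the ASCII bytes of "userId" pack into 0x75736572 ("user") followed by 0x4964 ("Id", right aligned).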
-
reportTooManyCollisions
protected void reportTooManyCollisions(int maxLen)
- Since:
- 1.9.9