Class SmileConstants

    • Field Detail

      • MAX_SHORT_VALUE_STRING_BYTES

        public static final int MAX_SHORT_VALUE_STRING_BYTES
        The encoding has special "short" forms for String values whose UTF-8 representation is 64 bytes or fewer.
        See Also:
        Constant Field Values
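
        As an illustration of the limit described above, the following sketch checks whether a String value would qualify for a short form by measuring its UTF-8 length. The class name, the helper methods, and the literal value 64 are assumptions made for this example (the value is taken from the description, not read from SmileConstants).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ShortValueCheck {
    // Assumed to match the documented 64-byte limit; not read from SmileConstants.
    static final int MAX_SHORT_VALUE_STRING_BYTES = 64;

    /** True if the UTF-8 form of the text fits within the short-form limit. */
    static boolean fitsShortForm(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length <= MAX_SHORT_VALUE_STRING_BYTES;
    }

    /** Builds a String of count copies of c (helper for the demo only). */
    static String repeat(char c, int count) {
        char[] chars = new char[count];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        // U+00E9 takes 2 bytes in UTF-8, so 32 copies use 64 bytes and 33 use 66.
        System.out.println(fitsShortForm(repeat('\u00e9', 32)));
        System.out.println(fitsShortForm(repeat('\u00e9', 33)));
    }
}
```

        Note that the limit is expressed in encoded bytes, not in Java chars, so the number of characters that fit depends on the text.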
      • MAX_SHORT_NAME_ASCII_BYTES

        public static final int MAX_SHORT_NAME_ASCII_BYTES
        The encoding has special "short" forms for ASCII field names that can be represented by 64 bytes or fewer.
        See Also:
        Constant Field Values
      • MAX_SHORT_NAME_UNICODE_BYTES

        public static final int MAX_SHORT_NAME_UNICODE_BYTES
        The maximum byte length for short non-ASCII names is slightly lower, because byte values 0xF8 and above have to be reserved (although one more is gained, as values 0 and 1 are not valid).
        See Also:
        Constant Field Values
      • MAX_SHARED_NAMES

        public static final int MAX_SHARED_NAMES
        The longest back reference used for field names is 10 bits, so there is no point in keeping many more names around than 10 bits can address.
        See Also:
        Constant Field Values
      • MAX_SHARED_STRING_VALUES

        public static final int MAX_SHARED_STRING_VALUES
        The longest back reference used for short shared String values is 10 bits, so there are up to (1 << 10) values to keep track of.
        See Also:
        Constant Field Values
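
        The bookkeeping implied by a 10-bit back reference can be sketched as a table that holds at most (1 << 10) entries. The class below is hypothetical, and the choice to clear the table once it is full is an assumption made for the example, not something stated on this page.

```java
import java.util.HashMap;
import java.util.Map;

final class SharedValueTable {
    // 10-bit back references can address at most this many entries.
    static final int MAX_SHARED_STRING_VALUES = 1 << 10;

    private final Map<String, Integer> indexByValue = new HashMap<>();

    /**
     * Returns the back-reference index if the value was seen before;
     * otherwise records the value and returns -1.
     */
    int lookupOrAdd(String value) {
        Integer seen = indexByValue.get(value);
        if (seen != null) {
            return seen;
        }
        if (indexByValue.size() == MAX_SHARED_STRING_VALUES) {
            indexByValue.clear(); // assumed policy: start over once the table is full
        }
        indexByValue.put(value, indexByValue.size());
        return -1;
    }

    public static void main(String[] args) {
        SharedValueTable table = new SharedValueTable();
        System.out.println(table.lookupOrAdd("status")); // first sighting
        System.out.println(table.lookupOrAdd("status")); // now a back reference
    }
}
```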
      • MAX_SHARED_STRING_LENGTH_BYTES

        public static final int MAX_SHARED_STRING_LENGTH_BYTES
        Whereas names of any length can be referenced, only text values that are considered "tiny" or "short" (ones encoded with a length prefix) are candidates for sharing; this value is therefore the maximum length of Strings that can be encoded as such.
        See Also:
        Constant Field Values
      • MIN_BUFFER_FOR_POSSIBLE_SHORT_STRING

        public static final int MIN_BUFFER_FOR_POSSIBLE_SHORT_STRING
        To keep the encoding logic tight and simple, the encoder can always require that the output buffer has this amount of space available before encoding a possibly short String (3 bytes per char, since the longest UTF-8 encoding of a Java char is 3 bytes). Two extra bytes need to be reserved as well: the first for the token indicator, and the second for the terminating null byte (in case it is not a short String after all).
        See Also:
        Constant Field Values
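
        The worst-case reasoning above can be sketched as follows: each Java char needs at most 3 bytes of UTF-8, and two more bytes are reserved. The class and method names are hypothetical, and the formula follows the description rather than the actual constant value.

```java
import java.nio.charset.StandardCharsets;

final class ShortStringBufferCheck {
    /** Worst-case space per the description: 3 bytes per char plus 2 extra bytes. */
    static int worstCaseBytes(int charCount) {
        return 3 * charCount + 2;
    }

    /** True if the buffer has enough room left to encode without further checks. */
    static boolean hasRoom(int bufferLength, int writeOffset, int charCount) {
        return bufferLength - writeOffset >= worstCaseBytes(charCount);
    }

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // A character outside the BMP is 2 Java chars but only 4 UTF-8 bytes,
        // so the 3-bytes-per-char bound still holds.
        String euro = "\u20ac";
        String emoji = "\ud83d\ude00";
        System.out.println(euro.length() + " char(s), " + utf8Length(euro) + " bytes");
        System.out.println(emoji.length() + " char(s), " + utf8Length(emoji) + " bytes");
    }
}
```

        Reserving for the worst case up front means the encoding loop itself does not need a bounds check per character.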
      • INT_MARKER_END_OF_STRING

        public static final int INT_MARKER_END_OF_STRING
        A byte marker is needed to denote the end of variable-length Strings. Although the null byte is commonly used, it is avoided here since it can not be embedded in Web Sockets content (similarly, 0xFF can not). There are multiple candidate byte values that UTF-8 can not contain; 0xFC is chosen to allow reasonable ordering (highest values meaning most significant framing function, with 0xFF being end-of-content, and so on).
        See Also:
        Constant Field Values
      • BYTE_MARKER_END_OF_STRING

        public static final byte BYTE_MARKER_END_OF_STRING
        See Also:
        Constant Field Values
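
        Because 0xFC never occurs in valid UTF-8, a reader can find the end of a variable-length String by scanning for it. The sketch below writes and reads such a terminated String; the class and method names are hypothetical, and the constants are redeclared locally with the 0xFC value given in the description.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class EndOfStringMarker {
    static final int INT_MARKER_END_OF_STRING = 0xFC;
    static final byte BYTE_MARKER_END_OF_STRING = (byte) INT_MARKER_END_OF_STRING;

    /** Returns the UTF-8 bytes of the text followed by the end marker. */
    static byte[] writeTerminated(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] out = Arrays.copyOf(utf8, utf8.length + 1);
        out[utf8.length] = BYTE_MARKER_END_OF_STRING;
        return out;
    }

    /** Reads text starting at 'start' up to (not including) the end marker. */
    static String readTerminated(byte[] data, int start) {
        int end = start;
        while (end < data.length && data[end] != BYTE_MARKER_END_OF_STRING) {
            end++;
        }
        if (end == data.length) {
            throw new IllegalArgumentException("missing end-of-string marker");
        }
        return new String(data, start, end - start, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] encoded = writeTerminated("h\u00e9llo");
        System.out.println(readTerminated(encoded, 0));
    }
}
```

        The byte-typed constant allows direct comparison against array elements, while the int-typed one suits code that has already masked a byte with 0xFF.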
      • BYTE_MARKER_END_OF_CONTENT

        public static final byte BYTE_MARKER_END_OF_CONTENT
        In addition, a marker can be used to allow simple framing: splitting physical data (such as a file) into distinct logical sections, such as JSON documents. 0xFF makes sense here since it is also used as the end marker for Web Sockets.
        See Also:
        Constant Field Values
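
        The framing idea can be sketched as a scan that splits a byte sequence at each 0xFF. This is a hypothetical helper, and it assumes the content holds no raw binary values, since those could legitimately contain 0xFF.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ContentFraming {
    static final byte BYTE_MARKER_END_OF_CONTENT = (byte) 0xFF;

    /** Splits data into sections at each end-of-content marker. */
    static List<byte[]> splitSections(byte[] data) {
        List<byte[]> sections = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == BYTE_MARKER_END_OF_CONTENT) {
                sections.add(Arrays.copyOfRange(data, start, i));
                start = i + 1;
            }
        }
        // Trailing bytes without a closing marker form a final section.
        if (start < data.length) {
            sections.add(Arrays.copyOfRange(data, start, data.length));
        }
        return sections;
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, (byte) 0xFF, 3, (byte) 0xFF};
        System.out.println(splitSections(data).size() + " sections");
    }
}
```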
      • HEADER_BYTE_1

        public static final byte HEADER_BYTE_1
        First byte of data header
        See Also:
        Constant Field Values
      • HEADER_BYTE_2

        public static final byte HEADER_BYTE_2
        Second byte of data header
        See Also:
        Constant Field Values
      • HEADER_BYTE_3

        public static final byte HEADER_BYTE_3
        Third byte of data header
        See Also:
        Constant Field Values
      • HEADER_VERSION_0

        public static final int HEADER_VERSION_0
        Current version consists of four zero bits (nibble)
        See Also:
        Constant Field Values
      • HEADER_BYTE_4

        public static final byte HEADER_BYTE_4
        Fourth byte of data header; contains version nibble, may have flags
        See Also:
        Constant Field Values
      • HEADER_BIT_HAS_SHARED_NAMES

        public static final int HEADER_BIT_HAS_SHARED_NAMES
        Bit that indicates whether encoded content may have shared names (back references to recently encoded field names). If no header is available, content must be processed as if this bit were set to true. If (and only if) the header exists and the bit value is 0, the parser can omit storing seen names, as it is guaranteed that no back references exist.
        See Also:
        Constant Field Values
      • HEADER_BIT_HAS_SHARED_STRING_VALUES

        public static final int HEADER_BIT_HAS_SHARED_STRING_VALUES
        Bit that indicates whether encoded content may have shared String values (back references to recently encoded "short" String values, where short is defined as 64 bytes or less). If no header is available, the bit can be assumed to be 0 (false). If the header exists and the bit value is 1, the parser has to store up to 1024 most recently seen distinct short String values.
        See Also:
        Constant Field Values
      • HEADER_BIT_HAS_RAW_BINARY

        public static final int HEADER_BIT_HAS_RAW_BINARY
        Bit that indicates whether encoded content may contain raw (unquoted) binary values. If no header is available, the bit can be assumed to be 0 (false). If the header exists and the bit value is 1, the parser can not assume that specific byte values always have their default meaning (specifically, the content end marker 0xFF and the header signature can be contained in binary values).

        Note that this bit being true does not automatically mean that such raw binary content indeed exists, just that it may exist. This is because the header is written before any binary data may be written.

        See Also:
        Constant Field Values
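
        The header rules above, including the defaults that apply when no header is present, can be sketched as follows. The class is hypothetical, and all literal values are assumptions for the example: the signature bytes ':', ')' and '\n' and the bit values 0x01, 0x02 and 0x04 follow the published Smile format specification, and the version is assumed to occupy the high nibble of the fourth byte. None of them are read from this page.

```java
final class SmileHeaderSketch {
    // Assumed values, see the note above.
    static final byte HEADER_BYTE_1 = (byte) ':';
    static final byte HEADER_BYTE_2 = (byte) ')';
    static final byte HEADER_BYTE_3 = (byte) '\n';
    static final int HEADER_BIT_HAS_SHARED_NAMES = 0x01;
    static final int HEADER_BIT_HAS_SHARED_STRING_VALUES = 0x02;
    static final int HEADER_BIT_HAS_RAW_BINARY = 0x04;

    static boolean hasHeader(byte[] data) {
        return data.length >= 4
                && data[0] == HEADER_BYTE_1
                && data[1] == HEADER_BYTE_2
                && data[2] == HEADER_BYTE_3;
    }

    /** Version nibble of the fourth header byte; call only if hasHeader() is true. */
    static int version(byte[] data) {
        return (data[3] >> 4) & 0x0F;
    }

    /** Without a header, shared names must be assumed to be possible. */
    static boolean mayHaveSharedNames(byte[] data) {
        return !hasHeader(data) || (data[3] & HEADER_BIT_HAS_SHARED_NAMES) != 0;
    }

    /** Without a header, shared String values can be assumed absent. */
    static boolean mayHaveSharedStringValues(byte[] data) {
        return hasHeader(data) && (data[3] & HEADER_BIT_HAS_SHARED_STRING_VALUES) != 0;
    }

    /** Without a header, raw binary can be assumed absent. */
    static boolean mayHaveRawBinary(byte[] data) {
        return hasHeader(data) && (data[3] & HEADER_BIT_HAS_RAW_BINARY) != 0;
    }

    public static void main(String[] args) {
        byte[] doc = {':', ')', '\n', 0x03};
        System.out.println("version " + version(doc)
                + ", shared names: " + mayHaveSharedNames(doc)
                + ", shared values: " + mayHaveSharedStringValues(doc)
                + ", raw binary: " + mayHaveRawBinary(doc));
    }
}
```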
      • TOKEN_PREFIX_SHARED_STRING_SHORT

        public static final int TOKEN_PREFIX_SHARED_STRING_SHORT
        See Also:
        Constant Field Values
      • TOKEN_PREFIX_TINY_ASCII

        public static final int TOKEN_PREFIX_TINY_ASCII
        See Also:
        Constant Field Values
      • TOKEN_PREFIX_SMALL_ASCII

        public static final int TOKEN_PREFIX_SMALL_ASCII
        See Also:
        Constant Field Values
      • TOKEN_PREFIX_TINY_UNICODE

        public static final int TOKEN_PREFIX_TINY_UNICODE
        See Also:
        Constant Field Values
      • TOKEN_PREFIX_SHORT_UNICODE

        public static final int TOKEN_PREFIX_SHORT_UNICODE
        See Also:
        Constant Field Values
      • TOKEN_PREFIX_MISC_OTHER

        public static final int TOKEN_PREFIX_MISC_OTHER
        See Also:
        Constant Field Values
      • TOKEN_LITERAL_EMPTY_STRING

        public static final byte TOKEN_LITERAL_EMPTY_STRING
        See Also:
        Constant Field Values
      • TOKEN_LITERAL_START_ARRAY

        public static final byte TOKEN_LITERAL_START_ARRAY
        See Also:
        Constant Field Values
      • TOKEN_LITERAL_END_ARRAY

        public static final byte TOKEN_LITERAL_END_ARRAY
        See Also:
        Constant Field Values
      • TOKEN_LITERAL_START_OBJECT

        public static final byte TOKEN_LITERAL_START_OBJECT
        See Also:
        Constant Field Values
      • TOKEN_LITERAL_END_OBJECT

        public static final byte TOKEN_LITERAL_END_OBJECT
        See Also:
        Constant Field Values
      • TOKEN_MISC_INTEGER

        public static final int TOKEN_MISC_INTEGER
        Type (for misc, other) used for regular integral types (byte/short/int/long)
        See Also:
        Constant Field Values
      • TOKEN_MISC_FP

        public static final int TOKEN_MISC_FP
        Type (for misc, other) used for regular floating-point types (float, double)
        See Also:
        Constant Field Values
      • TOKEN_MISC_LONG_TEXT_ASCII

        public static final int TOKEN_MISC_LONG_TEXT_ASCII
        Type (for misc, other) used for variable-length UTF-8 encoded text, when it is known to contain only ASCII chars. Note: the 2 LSB are reserved for future use and must be zeroes for now.
        See Also:
        Constant Field Values
      • TOKEN_MISC_LONG_TEXT_UNICODE

        public static final int TOKEN_MISC_LONG_TEXT_UNICODE
        Type (for misc, other) used for variable-length UTF-8 encoded text, when it is NOT known to contain only ASCII chars (which means it MAY have multi-byte characters). Note: the 2 LSB are reserved for future use and must be zeroes for now.
        See Also:
        Constant Field Values
      • TOKEN_MISC_BINARY_7BIT

        public static final int TOKEN_MISC_BINARY_7BIT
        Type (for misc, other) used for "safe" binary data (encoded by using only the 7 LSB of each byte, giving an 8/7 expansion ratio). This is usually done to ensure that certain bytes (like 0xFF) are never included in encoded data. Note: the 2 LSB are reserved for future use and must be zeroes for now.
        See Also:
        Constant Field Values
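
        The 8/7 expansion can be illustrated with a generic bit-packing sketch that spreads input bits across output bytes using only the 7 low bits of each. This is a hypothetical illustration of the idea, not necessarily the exact bit layout the format uses, and the decoder assumes the original length is known.

```java
final class SevenBitSketch {
    /** Packs raw bytes into bytes whose high bit is always clear. */
    static byte[] encode(byte[] raw) {
        byte[] out = new byte[(raw.length * 8 + 6) / 7];
        int acc = 0;  // pending bits, most significant first
        int bits = 0; // number of pending bits in acc
        int o = 0;
        for (byte b : raw) {
            acc = (acc << 8) | (b & 0xFF);
            bits += 8;
            while (bits >= 7) {
                bits -= 7;
                out[o++] = (byte) ((acc >> bits) & 0x7F);
            }
            acc &= (1 << bits) - 1; // keep only the bits not yet written
        }
        if (bits > 0) {
            out[o] = (byte) ((acc << (7 - bits)) & 0x7F); // pad last group with zeroes
        }
        return out;
    }

    /** Reverses encode(); rawLength is the number of original bytes. */
    static byte[] decode(byte[] encoded, int rawLength) {
        byte[] out = new byte[rawLength];
        int acc = 0;
        int bits = 0;
        int o = 0;
        for (byte b : encoded) {
            acc = (acc << 7) | (b & 0x7F);
            bits += 7;
            if (bits >= 8 && o < rawLength) {
                bits -= 8;
                out[o++] = (byte) (acc >> bits);
                acc &= (1 << bits) - 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xFF, 0, (byte) 0x80, 0x7F, 1, 2, (byte) 0xFE};
        System.out.println(raw.length + " raw bytes -> " + encode(raw).length + " encoded bytes");
    }
}
```

        Since every output byte stays below 0x80, marker values such as 0xFF can not appear in the encoded data.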
      • TOKEN_MISC_SHARED_STRING_LONG

        public static final int TOKEN_MISC_SHARED_STRING_LONG
        Type (for misc, other) used for shared String values where the index does not fit in the "short" reference range (which is 0 - 30). If so, the 2 LSB from here and the full following byte are used to get a 10-bit index.
        See Also:
        Constant Field Values
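
        Combining the 2 LSB of the token byte with the following byte into a 10-bit index can be sketched as below. The class name is hypothetical, and the token value 0xEC is an assumption for the example; the sketch also does not check whether the resulting bytes collide with marker values such as 0xFF.

```java
final class LongSharedReference {
    // Assumed token value with the 2 LSB clear; not read from this page.
    static final int TOKEN_MISC_SHARED_STRING_LONG = 0xEC;

    /** Splits a 10-bit index into a token byte and a following byte. */
    static int[] encode(int index) {
        if (index < 0 || index >= (1 << 10)) {
            throw new IllegalArgumentException("index does not fit in 10 bits: " + index);
        }
        int tokenByte = TOKEN_MISC_SHARED_STRING_LONG | (index >> 8); // top 2 bits
        int nextByte = index & 0xFF;                                  // low 8 bits
        return new int[] {tokenByte, nextByte};
    }

    /** Rebuilds the index from the token byte's 2 LSB and the following byte. */
    static int decode(int tokenByte, int nextByte) {
        return ((tokenByte & 0x03) << 8) | (nextByte & 0xFF);
    }

    public static void main(String[] args) {
        int[] pair = encode(700);
        System.out.println(decode(pair[0], pair[1]));
    }
}
```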
      • TOKEN_MISC_BINARY_RAW

        public static final int TOKEN_MISC_BINARY_RAW
        Raw binary data marker is specifically chosen as separate from other types, since it can have significant impact on framing (or rather fast scanning based on structure and framing markers).
        See Also:
        Constant Field Values
      • TOKEN_MISC_FLOAT_32

        public static final int TOKEN_MISC_FLOAT_32
        Numeric subtype (2 LSB) for TOKEN_MISC_FP, indicating 32-bit IEEE single precision floating point number.
        See Also:
        Constant Field Values
      • TOKEN_MISC_FLOAT_64

        public static final int TOKEN_MISC_FLOAT_64
        Numeric subtype (2 LSB) for TOKEN_MISC_FP, indicating 64-bit IEEE double precision floating point number.
        See Also:
        Constant Field Values
      • TOKEN_MISC_FLOAT_BIG

        public static final int TOKEN_MISC_FLOAT_BIG
        Numeric subtype (2 LSB) for TOKEN_MISC_FP, indicating BigDecimal type.
        See Also:
        Constant Field Values
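
        The way the 2 LSB select a numeric subtype of TOKEN_MISC_FP can be sketched as a mask and a switch. The class is hypothetical, and the literal values (0x28 for the type, and 0, 1 and 2 for the subtypes) are assumptions for the example rather than values taken from this page.

```java
final class FloatSubtypeSketch {
    // Assumed values, see the note above.
    static final int TOKEN_MISC_FP = 0x28;
    static final int TOKEN_MISC_FLOAT_32 = 0x00;
    static final int TOKEN_MISC_FLOAT_64 = 0x01;
    static final int TOKEN_MISC_FLOAT_BIG = 0x02;

    /** Describes the floating-point subtype carried in the 2 LSB of the token. */
    static String describe(int tokenByte) {
        if ((tokenByte & 0xFC) != TOKEN_MISC_FP) {
            return "not a floating-point token";
        }
        switch (tokenByte & 0x03) {
            case TOKEN_MISC_FLOAT_32:
                return "float";
            case TOKEN_MISC_FLOAT_64:
                return "double";
            case TOKEN_MISC_FLOAT_BIG:
                return "BigDecimal";
            default:
                return "reserved";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(TOKEN_MISC_FP | TOKEN_MISC_FLOAT_64));
    }
}
```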
      • TOKEN_KEY_EMPTY_STRING

        public static final byte TOKEN_KEY_EMPTY_STRING
        The same code is used for an empty key as for an empty String value.
        See Also:
        Constant Field Values
      • TOKEN_PREFIX_KEY_SHARED_LONG

        public static final int TOKEN_PREFIX_KEY_SHARED_LONG
        See Also:
        Constant Field Values
      • TOKEN_PREFIX_KEY_SHARED_SHORT

        public static final int TOKEN_PREFIX_KEY_SHARED_SHORT
        See Also:
        Constant Field Values
      • TOKEN_PREFIX_KEY_UNICODE

        public static final int TOKEN_PREFIX_KEY_UNICODE
        See Also:
        Constant Field Values
      • sUtf8UnitLengths

        public static final int[] sUtf8UnitLengths
        Additionally, UTF-8 decoding info can be combined into a similar data table. Values indicate "byte length - 1", meaning -1 is used for invalid bytes, 0 for single-byte codes, 1 for 2-byte codes and 2 for 3-byte codes.
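
        A table of this shape can be built from the lead-byte bit patterns of UTF-8, as in the sketch below. The class is hypothetical and only covers the cases listed in the description; lead bytes of 4-byte sequences are marked -1 here as a simplification, since the description does not cover them.

```java
final class Utf8LengthTable {
    /** Builds a 256-entry table holding "byte length - 1" per lead byte, or -1. */
    static int[] build() {
        int[] table = new int[256]; // entries 0..127 stay 0: single-byte codes
        for (int c = 128; c < 256; c++) {
            if ((c & 0xE0) == 0xC0) {
                table[c] = 1;  // 110xxxxx: 2-byte code
            } else if ((c & 0xF0) == 0xE0) {
                table[c] = 2;  // 1110xxxx: 3-byte code
            } else {
                table[c] = -1; // continuation bytes and everything else
            }
        }
        return table;
    }

    /** Total sequence length for a lead byte, or -1 if the byte can not start one. */
    static int sequenceLength(int[] table, byte lead) {
        int code = table[lead & 0xFF];
        return code < 0 ? -1 : code + 1;
    }

    public static void main(String[] args) {
        int[] table = build();
        System.out.println(sequenceLength(table, (byte) 0xE2));
    }
}
```

        Indexing with (b & 0xFF) matters because Java bytes are signed; without the mask, lead bytes of multi-byte sequences would produce negative indexes.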
    • Constructor Detail

      • SmileConstants

        public SmileConstants()