Package org.apache.uima.cas.impl
Class CASSerializer
java.lang.Object
org.apache.uima.cas.impl.CASSerializer
- All Implemented Interfaces:
Serializable
This object has two purposes:
- It can hold a collection of individually Java-serializable objects representing a CAS, plus the list of feature structures (FSs) indexed in the CAS.
- It has special methods (versions of addCAS) that perform a custom binary serialization (without compression) of a CAS, plus the lists of its indexed FSs.
One use of this class follows this form:
1) Create an instance of this class.
2) Add a CAS to it (via the addCAS methods).
3) Pass the instance as the argument to anObjectOutputStream.writeObject(anInstanceOfThisClass).
In UIMA this is done in the SerializationUtils class; it appears to be used for Vinci service adapters.
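The three steps above can be sketched without a UIMA dependency. The example below is only an illustration under stated assumptions: HeapHolderSketch and its add method are hypothetical stand-ins for CASSerializer and addCAS, and the array contents are made up. Only the ObjectOutputStream and ObjectInputStream calls are real JDK API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for CASSerializer: a Serializable holder of heap arrays. */
final class HeapHolderSketch implements Serializable {
  private static final long serialVersionUID = 1L;

  // Field names mirror two of the public fields listed on this page.
  public int[] heapArray;
  public String[] stringTable;

  /** Stand-in for step 2 (addCAS): copy the data to be shipped into the holder. */
  void add(int[] heap, String[] strings) {
    heapArray = heap.clone();
    stringTable = strings.clone();
  }

  /** Step 3: pass the holder to ObjectOutputStream.writeObject. */
  static byte[] toBytes(HeapHolderSketch holder) {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
      out.writeObject(holder);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buffer.toByteArray();
  }

  /** Receiving side: read the holder back with ObjectInputStream.readObject. */
  static HeapHolderSketch fromBytes(byte[] bytes) {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      return (HeapHolderSketch) in.readObject();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    HeapHolderSketch holder = new HeapHolderSketch();                // step 1
    holder.add(new int[] {0, 7, 42}, new String[] {null, "token"});  // step 2
    HeapHolderSketch copy = fromBytes(toBytes(holder));              // step 3, then read back
    System.out.println(copy.heapArray.length + " heap cells, "
        + copy.stringTable.length + " strings");
  }
}
```

With the real class, step 2 would be one of the addCAS overloads documented below, and the receiving side would feed the deserialized arrays back into a CAS.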
There are also custom serialization methods that serialize to output streams.
The serialized data is in one of two formats:
- normal Java object serialization
- custom binary serialization
The custom binary serialization comes in two forms:
- full: the entire CAS
- delta: only the differences from a previous "mark" are serialized
This class produces only the uncompressed forms of custom binary serialization.
This class is for internal use. Some of the serialized formats are readable by the C++
implementation and are used to transfer CASes efficiently between Java frameworks and other frameworks.
Others are used with Vinci to communicate with remote annotators.
To serialize the type definition and index specifications for a CAS
- See Also:
-
Nested Class Summary
Nested Classes -
Field Summary
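Fields
Modifier and Type, Field
- byte[] byteHeapArray
- int[] fsIndex
- int[] heapArray
- int[] heapMetaData
- long[] longHeapArray
- (package private) static final long serialVersionUID
- short[] shortHeapArray
- String[] stringTable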
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type, Method, Description
- void addCAS(cas): Add the CAS to be serialized.
- void addCAS(cas, addMetaData): Add the CAS to be serialized.
- void addCAS(CASImpl cas, OutputStream ostream): Serializes the CAS data and writes it to the output stream.
- void addCAS(CASImpl cas, OutputStream ostream, boolean includeTsi)
- void addCAS(CASImpl cas, OutputStream ostream, Marker trackingMark): Serializes only new and modified FS and index operations made after the tracking mark is created.
- void addNoMetaData(CASImpl casImpl): Serialize CAS data without heap-internal meta data.
- (package private) void addTsiCAS(CASImpl cas, OutputStream ostream)
- private static int convertArrayIndexToAuxHeapAddr(BinaryCasSerDes bcsd, int index, TOP fs, Obj2IntIdentityHashMap<TOP> fs2auxOffset): The offset in the modeled heaps:
- private static int convertArrayIndexToMainHeapAddr(int index, TOP fs, Obj2IntIdentityHashMap<TOP> fs2addr)
- private void copyHeapsToArrays
- (package private) byte[] getByteArray()
- (package private) int[] getFSIndex()
- (package private) int[] getHeapArray()
- (package private) int[] getHeapMetadata()
- (package private) long[] getLongArray()
- (package private) short[] getShortArray()
- (package private) String[] getStringTable()
- private void outputStringHeap(DataOutputStream dos, CASImpl cas, StringHeapDeserializationHelper shdh, BinaryCasSerDes bcsd)
- (package private) static void scanModifications(BinaryCasSerDes bcsd, CommonSerDesSequential csds, CASImpl.FsChange[] fssModified, Obj2IntIdentityHashMap<TOP> fs2auxOffset, List<CASSerializer.AddrPlusValue> chgMainAvs, List<CASSerializer.AddrPlusValue> chgByteAvs, List<CASSerializer.AddrPlusValue> chgShortAvs, List<CASSerializer.AddrPlusValue> chgLongAvs): Scan the v3 fsChange info and produce v2 style info into chgXxxAddr, chgXxxValue. A prescan approach is needed in order to write the number of modifications before the values are written (which unfortunately were written to the same stream in V2).
- private void writeMods(List<CASSerializer.AddrPlusValue> avs, DataOutputStream dos, Consumer_T_withIOException<CASSerializer.AddrPlusValue> writeValue)
-
Field Details
-
serialVersionUID
static final long serialVersionUID
-
heapArray
public int[] heapArray -
heapMetaData
public int[] heapMetaData -
stringTable
-
fsIndex
public int[] fsIndex -
byteHeapArray
public byte[] byteHeapArray -
shortHeapArray
public short[] shortHeapArray -
longHeapArray
public long[] longHeapArray
-
-
Constructor Details
-
CASSerializer
public CASSerializer()
Constructor for CASSerializer.
-
-
Method Details
-
addNoMetaData
Serialize CAS data without heap-internal meta data. Currently used for serialization to C++.
- Parameters:
casImpl
- The CAS to be serialized.
-
addCAS
Add the CAS to be serialized. Note that the implementation class is required here; the interface is not enough.
- Parameters:
cas
- The CAS to be serialized.
-
addCAS
Add the CAS to be serialized.
- Parameters:
cas
- The CAS to be serialized.
addMetaData
- true to include metadata
-
outputStringHeap
private void outputStringHeap(DataOutputStream dos, CASImpl cas, StringHeapDeserializationHelper shdh, BinaryCasSerDes bcsd) throws IOException
- Throws:
IOException
-
addTsiCAS
-
addCAS
Serializes the CAS data and writes it to the output stream.
Blob Format
Element Size  Number of  Description
(bytes)       Elements
------------  ---------  ----------------------------------------
4             1          Blob key = "UIMA" in utf-8
4             1          Version (currently = 1)
4             1          size of 32-bit FS Heap array = s32H
4             s32H       32-bit FS heap array
4             1          size of 16-bit string Heap array = sSH
2             sSH        16-bit string heap array
4             1          size of string Ref Heap array = sSRH
4             2*sSRH     string ref offsets and lengths
4             1          size of FS index array = sFSI
4             sFSI       FS index array
4             1          size of 8-bit Heap array = s8H
1             s8H        8-bit Heap array
4             1          size of 16-bit Heap array = s16H
2             s16H       16-bit Heap array
4             1          size of 64-bit Heap array = s64H
8             s64H       64-bit Heap array
- Parameters:
cas
- The CAS to be serialized.
ostream
- The output stream.
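To make the full blob layout concrete, here is a minimal sketch that writes the listed elements in order with a DataOutputStream. It is not the CASSerializer implementation: FullBlobSketch and its parameter names are hypothetical, the output is in DataOutputStream's big-endian order, and no alignment padding is written because the layout lists none. The real serializer may handle byte order and padding differently.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical writer that follows the full (non-delta) blob layout element by element. */
final class FullBlobSketch {

  /**
   * @param stringRefPairs offset/length values, two ints per string reference (2*sSRH ints)
   */
  static byte[] write(int[] fsHeap, char[] stringHeap, int[] stringRefPairs, int[] fsIndex,
                      byte[] byteHeap, short[] shortHeap, long[] longHeap) {
    if (stringRefPairs.length % 2 != 0) {
      throw new IllegalArgumentException("string refs must come in offset/length pairs");
    }
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (DataOutputStream dos = new DataOutputStream(buffer)) {
      dos.write("UIMA".getBytes(StandardCharsets.UTF_8)); // 4-byte blob key
      dos.writeInt(1);                                    // version

      dos.writeInt(fsHeap.length);                        // s32H
      for (int cell : fsHeap) dos.writeInt(cell);

      dos.writeInt(stringHeap.length);                    // sSH
      for (char c : stringHeap) dos.writeChar(c);         // 2 bytes per element

      dos.writeInt(stringRefPairs.length / 2);            // sSRH
      for (int v : stringRefPairs) dos.writeInt(v);       // 2*sSRH ints

      dos.writeInt(fsIndex.length);                       // sFSI
      for (int v : fsIndex) dos.writeInt(v);

      dos.writeInt(byteHeap.length);                      // s8H
      dos.write(byteHeap);

      dos.writeInt(shortHeap.length);                     // s16H
      for (short s : shortHeap) dos.writeShort(s);

      dos.writeInt(longHeap.length);                      // s64H
      for (long l : longHeap) dos.writeLong(l);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buffer.toByteArray();
  }

  public static void main(String[] args) {
    byte[] blob = write(new int[] {0, 7, 42}, "token".toCharArray(), new int[] {0, 5},
        new int[] {1, 2}, new byte[] {1, 2}, new short[] {3}, new long[] {4L});
    System.out.println("blob length in bytes: " + blob.length);
  }
}
```

Each variable-length section is preceded by its element count, so a reader can walk the blob sequentially without any other index.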
-
addCAS
-
addCAS
Serializes only new and modified FS and index operations made after the tracking mark is created. Serializes CAS data in the binary delta format described below and writes it to the output stream.
Element Size  Number of  Description
(bytes)       Elements
------------  ---------  ----------------------------------------------------------------
4             1          Blob key = "UIMA" in utf-8 (byte order flag)
4             1          Version (1 = complete cas, 2 = delta cas)
4             1          size of 32-bit heap array = s32H
4             s32H       32-bit FS heap array (new elements)
4             1          size of 16-bit string Heap array = sSH
2             sSH        16-bit string heap array (new strings)
4             1          size of string Ref Heap array = sSRH
4             2*sSRH     string ref offsets and lengths (for new strings)
4             1          number of modified, preexisting 32-bit FS heap elements = sM32H
4             2*sM32H    32-bit heap offset and value (preexisting cells modified)
4             1          size of FS index array = sFSI
4             sFSI       FS index array in Delta format
4             1          size of 8-bit Heap array = s8H
1             s8H        8-bit Heap array (new elements)
4             1          size of 16-bit Heap array = s16H
2             s16H       16-bit Heap array (new elements)
4             1          size of 64-bit Heap array = s64H
8             s64H       64-bit Heap array (new elements)
4             1          number of modified, preexisting 8-bit heap elements = sM8H
4             sM8H       8-bit heap offsets (preexisting cells modified)
1             sM8H       8-bit heap values (preexisting cells modified)
4             1          number of modified, preexisting 16-bit heap elements = sM16H
4             sM16H      16-bit heap offsets (preexisting cells modified)
2             sM16H      16-bit heap values (preexisting cells modified)
4             1          number of modified, preexisting 64-bit heap elements = sM64H
4             sM64H      64-bit heap offsets (preexisting cells modified)
8             sM64H      64-bit heap values (preexisting cells modified)
- Parameters:
cas
- The CAS to be serialized.
ostream
- The output stream.
trackingMark
- The mark after which new and modified FSs and index operations are serialized.
-
writeMods
private void writeMods(List<CASSerializer.AddrPlusValue> avs, DataOutputStream dos, Consumer_T_withIOException<CASSerializer.AddrPlusValue> writeValue) throws IOException
- Throws:
IOException
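The delta format lists each modified 8-, 16- and 64-bit section as a count, then the offsets, then the values. The sketch below shows one way a helper with the shape of writeMods could emit such a section. It is an assumption, not the actual method body: AddrValue and IOConsumer are hypothetical stand-ins for CASSerializer.AddrPlusValue and Consumer_T_withIOException, and no padding is written.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class DeltaModsSketch {

  /** Hypothetical stand-in for CASSerializer.AddrPlusValue. */
  static final class AddrValue {
    final int addr;
    final long value;

    AddrValue(int addr, long value) {
      this.addr = addr;
      this.value = value;
    }
  }

  /** Hypothetical stand-in for Consumer_T_withIOException. */
  @FunctionalInterface
  interface IOConsumer<T> {
    void accept(T t) throws IOException;
  }

  /** Writes the count, then every offset, then every value at the width chosen by the callback. */
  static void writeMods(List<AddrValue> avs, DataOutputStream dos,
                        IOConsumer<AddrValue> writeValue) throws IOException {
    dos.writeInt(avs.size());                        // count is known because the list was built first
    for (AddrValue av : avs) dos.writeInt(av.addr);  // 4-byte offsets
    for (AddrValue av : avs) writeValue.accept(av);  // 1-, 2- or 8-byte values
  }

  /** Writes one modified 8-bit section followed by one modified 64-bit section. */
  static byte[] demo() {
    List<AddrValue> byteMods = new ArrayList<>();
    byteMods.add(new AddrValue(3, 0x7F));
    byteMods.add(new AddrValue(9, 0x01));
    List<AddrValue> longMods = new ArrayList<>();
    longMods.add(new AddrValue(2, 1L << 40));

    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (DataOutputStream dos = new DataOutputStream(buffer)) {
      writeMods(byteMods, dos, av -> dos.writeByte((int) av.value));
      writeMods(longMods, dos, av -> dos.writeLong(av.value));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buffer.toByteArray();
  }

  public static void main(String[] args) {
    System.out.println("bytes written: " + demo().length);
  }
}
```

Collecting the modifications into a list before writing is what lets the count go first. Passing the value writer as a callback lets the three element widths share one routine.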
-
convertArrayIndexToAuxHeapAddr
private static int convertArrayIndexToAuxHeapAddr(BinaryCasSerDes bcsd, int index, TOP fs, Obj2IntIdentityHashMap<TOP> fs2auxOffset)
The offset in the modeled heaps:
- Parameters:
index
- the 0-based index into the array
fs
- the feature structure representing the array
- Returns:
- the addr into an aux array or main heap
-
convertArrayIndexToMainHeapAddr
private static int convertArrayIndexToMainHeapAddr(int index, TOP fs, Obj2IntIdentityHashMap<TOP> fs2addr) -
scanModifications
static void scanModifications(BinaryCasSerDes bcsd, CommonSerDesSequential csds, CASImpl.FsChange[] fssModified, Obj2IntIdentityHashMap<TOP> fs2auxOffset, List<CASSerializer.AddrPlusValue> chgMainAvs, List<CASSerializer.AddrPlusValue> chgByteAvs, List<CASSerializer.AddrPlusValue> chgShortAvs, List<CASSerializer.AddrPlusValue> chgLongAvs)
Scan the v3 fsChange info and produce v2 style info into chgXxxAddr, chgXxxValue. A prescan approach is needed in order to write the number of modifications before the values are written (which unfortunately were written to the same stream in V2).
- Parameters:
bcsd
- holds the model needed for v2 aux arrays
cas
- the cas to use for the delta serialization
chgMainHeapAddr
- an ordered collection of changed addresses as an array for the main heap
chgByteAddr
- an ordered collection of changed addresses as an array for the aux byte heap
chgShortAddr
- an ordered collection of changed addresses as an array for the aux short heap
chgLongAddr
- an ordered collection of changed addresses as an array for the aux long heap
chgMainHeapValue
- corresponding values
-
getHeapMetadata
int[] getHeapMetadata() -
getHeapArray
int[] getHeapArray() -
getStringTable
String[] getStringTable() -
getFSIndex
int[] getFSIndex() -
getByteArray
byte[] getByteArray() -
getShortArray
short[] getShortArray() -
getLongArray
long[] getLongArray() -
copyHeapsToArrays
-