Class TrimPrefixAndSuffixEncoder

java.lang.Object
morfologik.stemming.TrimPrefixAndSuffixEncoder
All Implemented Interfaces:
ISequenceEncoder

public class TrimPrefixAndSuffixEncoder extends Object implements ISequenceEncoder
Encodes dst relative to src by trimming a prefix and a suffix from src so that what remains matches the beginning of dst, then appending the rest of dst. The output code is (bytes):
 {P}{K}{suffix}
 
where (P - 'A') bytes are trimmed from the start of src, (K - 'A') bytes are trimmed from the end of src, and suffix is then appended to the resulting byte sequence.

Examples:

 src: abc
 dst: abcd
 encoded: AAd
 
 src: abc
 dst: xyz
 encoded: ADxyz
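The examples can be reproduced with the following minimal sketch on plain byte arrays. It is an illustration, not the library's implementation: it assumes the encoder keeps the longest run of src bytes that equals a prefix of dst (one reasonable way to choose the trims), and it ignores the single-byte limit on P and K. The class name EncodeSketch is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of morfologik.
final class EncodeSketch {
    // Returns {P}{K}{suffix} for dst relative to src.
    // Assumption: keep the longest run of src bytes that equals a prefix of dst.
    // Assumption: trim counts are small enough for (count + 'A') to fit one byte.
    static byte[] encode(byte[] src, byte[] dst) {
        int bestStart = 0;
        int bestLen = 0;
        for (int start = 0; start < src.length; start++) {
            // Count how many bytes of src, starting at 'start', match the start of dst.
            int len = 0;
            while (start + len < src.length && len < dst.length
                    && src[start + len] == dst[len]) {
                len++;
            }
            if (len > bestLen) {
                bestLen = len;
                bestStart = start;
            }
        }
        int trimPrefix = bestStart;                          // P - 'A'
        int trimSuffix = src.length - (bestStart + bestLen); // K - 'A'
        byte[] out = new byte[2 + dst.length - bestLen];
        out[0] = (byte) ('A' + trimPrefix);
        out[1] = (byte) ('A' + trimSuffix);
        // The suffix is the part of dst not covered by the kept src bytes.
        System.arraycopy(dst, bestLen, out, 2, dst.length - bestLen);
        return out;
    }

    // Convenience wrapper for ASCII strings.
    static String encode(String src, String dst) {
        byte[] code = encode(src.getBytes(StandardCharsets.US_ASCII),
                             dst.getBytes(StandardCharsets.US_ASCII));
        return new String(code, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(encode("abc", "abcd")); // AAd
        System.out.println(encode("abc", "xyz"));  // ADxyz
        System.out.println(encode("xabc", "abd")); // BBd
    }
}
```

In the last call, the kept run is "ab" (starting at offset 1 of src), so one byte is trimmed from each end of src and only "d" has to be stored literally.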
 
  • Field Details

    • REMOVE_EVERYTHING

      private static final int REMOVE_EVERYTHING
      Maximum encodable single-byte code.
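      The limit comes from storing each trim count as a single byte, count + 'A'. The sketch below assumes (this is not stated above) that the byte arithmetic wraps modulo 256, so exactly 256 counts, 0 through 255, can be told apart; counts above 190 wrap to byte values below 'A'. CodeByteSketch, toCode and fromCode are hypothetical names.

```java
// Hypothetical helpers; assumes code bytes wrap modulo 256.
final class CodeByteSketch {
    // Encode a trim count as one byte: count + 'A', truncated to 8 bits.
    static byte toCode(int count) {
        if (count < 0 || count > 255) {
            throw new IllegalArgumentException("count does not fit in one code byte: " + count);
        }
        return (byte) (count + 'A');
    }

    // Recover the count: subtract 'A' and keep the low 8 bits.
    static int fromCode(byte code) {
        return (code - 'A') & 0xFF;
    }

    public static void main(String[] args) {
        System.out.println((char) toCode(3));      // D
        System.out.println(fromCode(toCode(200))); // 200
        System.out.println(fromCode(toCode(255))); // 255
    }
}
```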
  • Constructor Details

    • TrimPrefixAndSuffixEncoder

      public TrimPrefixAndSuffixEncoder()
  • Method Details

    • encode

      public ByteBuffer encode(ByteBuffer reuse, ByteBuffer source, ByteBuffer target)
      Description copied from interface: ISequenceEncoder
      Encodes target relative to source, optionally reusing the provided ByteBuffer.
      Specified by:
      encode in interface ISequenceEncoder
      Parameters:
      reuse - Reuses the provided ByteBuffer or allocates a new one if there is not enough remaining space.
      source - The source byte sequence.
      target - The target byte sequence to encode relative to source.
      Returns:
      The ByteBuffer with the encoded target.
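      The reuse parameter describes a reuse-or-allocate policy. The sketch below shows one way to implement that policy while writing the {P}{K}{suffix} layout from the class description. clearOrAllocate and writeCode are hypothetical names, not library methods, and the trim counts are assumed to be already computed and small enough for one code byte.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the reuse-or-allocate policy; not the library's code.
final class ReuseSketch {
    // Returns reuse (cleared) if it can hold 'needed' bytes, otherwise a fresh buffer.
    static ByteBuffer clearOrAllocate(ByteBuffer reuse, int needed) {
        if (reuse != null) {
            reuse.clear(); // position = 0, limit = capacity
            if (reuse.remaining() >= needed) {
                return reuse;
            }
        }
        return ByteBuffer.allocate(needed);
    }

    // Writes an already-computed {P}{K}{suffix} code into a possibly reused buffer.
    static ByteBuffer writeCode(ByteBuffer reuse, int trimPrefix, int trimSuffix, byte[] suffix) {
        ByteBuffer out = clearOrAllocate(reuse, 2 + suffix.length);
        out.put((byte) ('A' + trimPrefix));
        out.put((byte) ('A' + trimSuffix));
        out.put(suffix);
        out.flip(); // position = 0, limit = number of bytes written
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(8);
        ByteBuffer first = writeCode(buf, 0, 3, new byte[] {'x', 'y', 'z'});
        System.out.println(first == buf);    // true: 5 bytes fit in 8
        ByteBuffer second = writeCode(first, 0, 0, new byte[10]);
        System.out.println(second == first); // false: 12 bytes do not fit
    }
}
```

      Because the returned buffer may be a different object from reuse, a caller should always continue with the return value, and should consume its contents before the next call overwrites them.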
    • prefixBytes

      public int prefixBytes()
      Description copied from interface: ISequenceEncoder
      The number of prefix bytes of the encoded form that should be ignored (needed for separator lookup). This is a workaround for GH-85; it should be replaced by knowing in advance whether the dictionary contains tags, in which case the separator can be scanned for right-to-left.
      Specified by:
      prefixBytes in interface ISequenceEncoder
      See Also:
      • "https://github.com/morfologik/morfologik-stemming/issues/85"
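      For this encoder, the encoded form starts with the two code bytes P and K, which hold counts rather than text and can therefore coincide with the separator byte. The sketch below skips the first prefixBytes() bytes (assumed here to be 2, matching the two code bytes) before scanning. The helper name, the single-byte '+' separator, and the contrived entry layout {P}{K}{suffix}{separator}{tag} are assumptions for illustration.

```java
// Hypothetical helper; not part of morfologik.
final class SeparatorScanSketch {
    // Index of the first separator byte at or after prefixBytes, or -1 if there is none.
    // The first prefixBytes bytes are skipped because they hold counts, not text.
    static int findSeparator(byte[] encoded, int prefixBytes, byte separator) {
        for (int i = prefixBytes; i < encoded.length; i++) {
            if (encoded[i] == separator) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte sep = '+';
        // Contrived entry: P happens to equal the separator byte.
        byte[] entry = {'+', 'A', 'x', 'y', '+', 't'};
        System.out.println(findSeparator(entry, 0, sep)); // 0: wrongly hits the P byte
        System.out.println(findSeparator(entry, 2, sep)); // 4: the real separator
    }
}
```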
    • decode

      public ByteBuffer decode(ByteBuffer reuse, ByteBuffer source, ByteBuffer encoded)
      Description copied from interface: ISequenceEncoder
      Decodes encoded relative to source, optionally reusing the provided ByteBuffer.
      Specified by:
      decode in interface ISequenceEncoder
      Parameters:
      reuse - Reuses the provided ByteBuffer or allocates a new one if there is not enough remaining space.
      source - The source byte sequence.
      encoded - The previously encoded byte sequence.
      Returns:
      The ByteBuffer with the decoded target.
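      A minimal decoding sketch for the {P}{K}{suffix} format, written against the same (reuse, source, encoded) parameter list. It is not the library's implementation: it assumes well-formed input, leaves the positions of source and encoded unchanged, and treats every code byte as a plain count (wrapping modulo 256), so any special meaning of a reserved code such as REMOVE_EVERYTHING is not modeled. DecodeSketch and its helpers bytesOf and text are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of decoding the {P}{K}{suffix} format; not the library's code.
final class DecodeSketch {
    // Assumes at least two code bytes and trims that fit inside source.
    static ByteBuffer decode(ByteBuffer reuse, ByteBuffer source, ByteBuffer encoded) {
        int e = encoded.position();
        int trimPrefix = (encoded.get(e) - 'A') & 0xFF;     // P - 'A'
        int trimSuffix = (encoded.get(e + 1) - 'A') & 0xFF; // K - 'A'
        int keep = source.remaining() - trimPrefix - trimSuffix;
        int suffixLen = encoded.remaining() - 2;

        // Reuse the caller's buffer when it has room (after clear(), all capacity is remaining).
        ByteBuffer out;
        if (reuse != null && reuse.capacity() >= keep + suffixLen) {
            out = reuse;
            out.clear();
        } else {
            out = ByteBuffer.allocate(keep + suffixLen);
        }

        // Copy the part of source that survives both trims.
        ByteBuffer kept = source.duplicate();
        kept.position(source.position() + trimPrefix);
        kept.limit(source.position() + trimPrefix + keep);
        out.put(kept);

        // Append the literal suffix stored after the two code bytes.
        ByteBuffer suffix = encoded.duplicate();
        suffix.position(e + 2);
        out.put(suffix);

        out.flip();
        return out;
    }

    static ByteBuffer bytesOf(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.US_ASCII));
    }

    static String text(ByteBuffer buf) {
        byte[] bytes = new byte[buf.remaining()];
        buf.duplicate().get(bytes);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(text(decode(null, bytesOf("abc"), bytesOf("AAd"))));   // abcd
        System.out.println(text(decode(null, bytesOf("abc"), bytesOf("ADxyz")))); // xyz
    }
}
```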
    • toString

      public String toString()
      Overrides:
      toString in class Object