com.lowagie.text.pdf

Class BidiOrder

public final class BidiOrder extends Object

Reference implementation of the Unicode 3.0 Bidi algorithm.

This implementation is not optimized for performance. It is intended as a reference implementation that closely follows the specification of the Bidirectional Algorithm in The Unicode Standard version 3.0.

Input:
There are two levels of input to the algorithm, since clients may prefer to supply some information from out-of-band sources rather than relying on the default behavior.

  1. unicode type array
  2. unicode type array, with externally supplied base line direction

Output:
Output is separated into several stages as well, to better enable clients to evaluate various aspects of implementation conformance.

  1. levels array over entire paragraph
  2. reordering array over entire paragraph
  3. levels array over line
  4. reordering array over line
Note that for conformance, algorithms are only required to generate correct reordering and character directionality (odd or even levels) over a line. Generating identical level arrays over a line is not required. Bidi explicit format codes (LRE, RLE, LRO, RLO, PDF) and BN can be assigned arbitrary levels and positions as long as the other text matches.

As the algorithm is defined to operate on a single paragraph at a time, this implementation is written to handle single paragraphs. Thus rule P1 is presumed by this implementation-- the data provided to the implementation is assumed to be a single paragraph, and either contains no 'B' codes, or a single 'B' code at the end of the input. 'B' is allowed as input to illustrate how the algorithm assigns it a level.

Also note that rules L3 and L4 depend on the rendering engine that uses the result of the bidi algorithm. This implementation assumes that the rendering engine expects combining marks in visual order (e.g. to the left of their base character in RTL runs) and that it adjust the glyphs used to render mirrored characters that are in RTL runs so that they render appropriately.

Author: Doug Felt

Field Summary
static byteAL
Right-to-Left Arabic
static byteAN
Arabic Number
static char[]baseTypes
static byteB
Paragraph Separator
static byteBN
Boundary Neutral
static byteCS
Common Number Separator
byte[]embeddings
static byteEN
European Number
static byteES
European Number Separator
static byteET
European Number Terminator
byte[]initialTypes
static byteL
Left-to-right
static byteLRE
Left-to-Right Embedding
static byteLRO
Left-to-Right Override
static byteNSM
Non-Spacing Mark
static byteON
Other Neutrals
byteparagraphEmbeddingLevel
static bytePDF
Pop Directional Format
byte[]resultLevels
byte[]resultTypes
static byte[]rtypes
static byteR
Right-to-Left
static byteRLE
Right-to-Left Embedding
static byteRLO
Right-to-Left Override
static byteS
Segment Separator
inttextLength
static byteTYPE_MAX
Maximum bidi type value.
static byteTYPE_MIN
Minimum bidi type value.
static byteWS
Whitespace
Constructor Summary
BidiOrder(byte[] types)
Initialize using an array of direction types.
BidiOrder(byte[] types, byte paragraphEmbeddingLevel)
Initialize using an array of direction types and an externally supplied paragraph embedding level.
BidiOrder(char[] text, int offset, int length, byte paragraphEmbeddingLevel)
Method Summary
static int[]computeMultilineReordering(byte[] levels, int[] linebreaks)
Return multiline reordering array for a given level array.
static int[]computeReordering(byte[] levels)
Return reordering array for a given level array.
voiddetermineExplicitEmbeddingLevels()
Process embedding format codes.
voiddetermineParagraphEmbeddingLevel()
1) determining the paragraph level.
intfindRunLimit(int index, int limit, byte[] validSet)
Return the limit of the run starting at index that includes only resultTypes in validSet.
intfindRunStart(int index, byte[] validSet)
Return the start of the run including index that includes only resultTypes in validSet.
bytegetBaseLevel()
Return the base level of the paragraph.
static bytegetDirection(char c)
byte[]getLevels()
byte[]getLevels(int[] linebreaks)
Return levels array breaking lines at offsets in linebreaks.
int[]getReordering(int[] linebreaks)
Return reordering array breaking lines at offsets in linebreaks.
static booleanisWhitespace(byte biditype)
Return true if the type is considered a whitespace type for the line break rules.
static byte[]processEmbeddings(byte[] resultTypes, byte paragraphEmbeddingLevel)
2) determining explicit levels Rules X1 - X8 The interaction of these rules makes handling them a bit complex.
intreinsertExplicitCodes(int textLength)
Reinsert levels information for explicit codes.
intremoveExplicitCodes()
Rules X9.
voidresolveImplicitLevels(int start, int limit, byte level, byte sor, byte eor)
7) resolving implicit embedding levels Rules I1, I2.
voidresolveNeutralTypes(int start, int limit, byte level, byte sor, byte eor)
6) resolving neutral types Rules N1-N2.
voidresolveWeakTypes(int start, int limit, byte level, byte sor, byte eor)
3) resolving weak types Rules W1-W7.
voidrunAlgorithm()
The algorithm.
voidsetLevels(int start, int limit, byte newLevel)
Set resultLevels from start up to (but not including) limit to newLevel.
voidsetTypes(int start, int limit, byte newType)
Set resultTypes from start up to (but not including) limit to newType.
static bytetypeForLevel(int level)
Return the strong type (L or R) corresponding to the level.
static voidvalidateLineBreaks(int[] linebreaks, int textLength)
Throw exception if line breaks array is invalid.
static voidvalidateParagraphEmbeddingLevel(byte paragraphEmbeddingLevel)
Throw exception if paragraph embedding level is invalid.
static voidvalidateTypes(byte[] types)
Throw exception if type array is invalid.

Field Detail

AL

public static final byte AL
Right-to-Left Arabic

AN

public static final byte AN
Arabic Number

baseTypes

private static char[] baseTypes

B

public static final byte B
Paragraph Separator

BN

public static final byte BN
Boundary Neutral

CS

public static final byte CS
Common Number Separator

embeddings

private byte[] embeddings

EN

public static final byte EN
European Number

ES

public static final byte ES
European Number Separator

ET

public static final byte ET
European Number Terminator

initialTypes

private byte[] initialTypes

L

public static final byte L
Left-to-right

LRE

public static final byte LRE
Left-to-Right Embedding

LRO

public static final byte LRO
Left-to-Right Override

NSM

public static final byte NSM
Non-Spacing Mark

ON

public static final byte ON
Other Neutrals

paragraphEmbeddingLevel

private byte paragraphEmbeddingLevel

PDF

public static final byte PDF
Pop Directional Format

resultLevels

private byte[] resultLevels

resultTypes

private byte[] resultTypes

rtypes

private static final byte[] rtypes

R

public static final byte R
Right-to-Left

RLE

public static final byte RLE
Right-to-Left Embedding

RLO

public static final byte RLO
Right-to-Left Override

S

public static final byte S
Segment Separator

textLength

private int textLength

TYPE_MAX

public static final byte TYPE_MAX
Maximum bidi type value.

TYPE_MIN

public static final byte TYPE_MIN
Minimum bidi type value.

WS

public static final byte WS
Whitespace

Constructor Detail

BidiOrder

public BidiOrder(byte[] types)
Initialize using an array of direction types. Types range from TYPE_MIN to TYPE_MAX inclusive and represent the direction codes of the characters in the text.

Parameters: types the types array

BidiOrder

public BidiOrder(byte[] types, byte paragraphEmbeddingLevel)
Initialize using an array of direction types and an externally supplied paragraph embedding level. The embedding level may be -1, 0, or 1. -1 means to apply the default algorithm (rules P2 and P3), 0 is for LTR paragraphs, and 1 is for RTL paragraphs.

Parameters: types the types array paragraphEmbeddingLevel the externally supplied paragraph embedding level.

BidiOrder

public BidiOrder(char[] text, int offset, int length, byte paragraphEmbeddingLevel)

Method Detail

computeMultilineReordering

private static int[] computeMultilineReordering(byte[] levels, int[] linebreaks)
Return multiline reordering array for a given level array. Reordering does not occur across a line break.

computeReordering

private static int[] computeReordering(byte[] levels)
Return reordering array for a given level array. This reorders a single line. The reordering is a visual to logical map. For example, the leftmost char is string.charAt(order[0]). Rule L2.

determineExplicitEmbeddingLevels

private void determineExplicitEmbeddingLevels()
Process embedding format codes.

Calls processEmbeddings to generate an embedding array from the explicit format codes. The embedding overrides in the array are then applied to the result types, and the result levels are initialized.

See Also: BidiOrder

determineParagraphEmbeddingLevel

private void determineParagraphEmbeddingLevel()
1) determining the paragraph level.

Rules P2, P3.

At the end of this function, the member variable paragraphEmbeddingLevel is set to either 0 or 1.

findRunLimit

private int findRunLimit(int index, int limit, byte[] validSet)
Return the limit of the run starting at index that includes only resultTypes in validSet. This checks the value at index, and will return index if that value is not in validSet.

findRunStart

private int findRunStart(int index, byte[] validSet)
Return the start of the run including index that includes only resultTypes in validSet. This assumes the value at index is valid, and does not check it.

getBaseLevel

public byte getBaseLevel()
Return the base level of the paragraph.

getDirection

public static final byte getDirection(char c)

getLevels

public byte[] getLevels()

getLevels

public byte[] getLevels(int[] linebreaks)
Return levels array breaking lines at offsets in linebreaks.
Rule L1.

The returned levels array contains the resolved level for each bidi code passed to the constructor.

The linebreaks array must include at least one value. The values must be in strictly increasing order (no duplicates) between 1 and the length of the text, inclusive. The last value must be the length of the text.

Parameters: linebreaks the offsets at which to break the paragraph

Returns: the resolved levels of the text

getReordering

public int[] getReordering(int[] linebreaks)
Return reordering array breaking lines at offsets in linebreaks.

The reordering array maps from a visual index to a logical index. Lines are concatenated from left to right. So for example, the fifth character from the left on the third line is

 getReordering(linebreaks)[linebreaks[1] + 4]
(linebreaks[1] is the position after the last character of the second line, which is also the index of the first character on the third line, and adding four gets the fifth character from the left).

The linebreaks array must include at least one value. The values must be in strictly increasing order (no duplicates) between 1 and the length of the text, inclusive. The last value must be the length of the text.

Parameters: linebreaks the offsets at which to break the paragraph.

isWhitespace

private static boolean isWhitespace(byte biditype)
Return true if the type is considered a whitespace type for the line break rules.

processEmbeddings

private static byte[] processEmbeddings(byte[] resultTypes, byte paragraphEmbeddingLevel)
2) determining explicit levels Rules X1 - X8 The interaction of these rules makes handling them a bit complex. This examines resultTypes but does not modify it. It returns embedding and override information in the result array. The low 7 bits are the level, the high bit is set if the level is an override, and clear if it is an embedding.

reinsertExplicitCodes

private int reinsertExplicitCodes(int textLength)
Reinsert levels information for explicit codes. This is for ease of relating the level information to the original input data. Note that the levels assigned to these codes are arbitrary, they're chosen so as to avoid breaking level runs.

Parameters: textLength the length of the data after compression

Returns: the length of the data (original length of types array supplied to constructor)

removeExplicitCodes

private int removeExplicitCodes()
Rules X9. Remove explicit codes so that they may be ignored during the remainder of the main portion of the algorithm. The length of the resulting text is returned.

Returns: the length of the data excluding explicit codes and BN.

resolveImplicitLevels

private void resolveImplicitLevels(int start, int limit, byte level, byte sor, byte eor)
7) resolving implicit embedding levels Rules I1, I2.

resolveNeutralTypes

private void resolveNeutralTypes(int start, int limit, byte level, byte sor, byte eor)
6) resolving neutral types Rules N1-N2.

resolveWeakTypes

private void resolveWeakTypes(int start, int limit, byte level, byte sor, byte eor)
3) resolving weak types Rules W1-W7. Note that some weak types (EN, AN) remain after this processing is complete.

runAlgorithm

private void runAlgorithm()
The algorithm. Does not include line-based processing (Rules L1, L2). These are applied later in the line-based phase of the algorithm.

setLevels

private void setLevels(int start, int limit, byte newLevel)
Set resultLevels from start up to (but not including) limit to newLevel.

setTypes

private void setTypes(int start, int limit, byte newType)
Set resultTypes from start up to (but not including) limit to newType.

typeForLevel

private static byte typeForLevel(int level)
Return the strong type (L or R) corresponding to the level.

validateLineBreaks

private static void validateLineBreaks(int[] linebreaks, int textLength)
Throw exception if line breaks array is invalid.

validateParagraphEmbeddingLevel

private static void validateParagraphEmbeddingLevel(byte paragraphEmbeddingLevel)
Throw exception if paragraph embedding level is invalid. Special allowance for -1 so that default processing can still be performed when using this API.

validateTypes

private static void validateTypes(byte[] types)
Throw exception if type array is invalid.