Package org.owasp.encoder
Class URIEncoder
- java.lang.Object
  - org.owasp.encoder.Encoder
    - org.owasp.encoder.URIEncoder
class URIEncoder extends Encoder
URIEncoder: an encoder for URI-based contexts.
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | URIEncoder.Mode | Encoding mode of operation for URI encoding.
-
Field Summary
Fields
Modifier and Type | Field | Description
private long | _highMask | The bit-mask of characters that do not need to be escaped, for characters with code points in the range 64 to 127.
private long | _lowMask | The bit-mask of characters that do not need to be escaped, for characters with code points in the range 0 to 63.
private URIEncoder.Mode | _mode | The encoding mode for this encoder, used primarily by toString().
(package private) static int | CHARS_0_TO_9 | Number of characters in the range '0' to '9'.
(package private) static int | CHARS_A_TO_Z | Number of characters in the range 'a' to 'z'.
(package private) static char | INVALID_REPLACEMENT_CHARACTER | The character to use when replacing an invalid character.
(package private) static int | LONG_BITS | Number of bits in a long.
(package private) static int | MAX_ENCODED_CHAR_LENGTH | Maximum number of characters required to encode a single input character.
(package private) static int | MAX_UTF8_2_BYTE | Maximum code-point value that can be encoded in 2 UTF-8 bytes.
(package private) static int | PERCENT_ENCODED_LENGTH | Number of characters used to '%'-encode a single byte value.
(package private) static long | RESERVED_MASK_HIGH | RFC 3986 reserved characters, second 64 code points (64 to 127).
(package private) static long | RESERVED_MASK_LOW | RFC 3986 reserved characters, first 64 code points (0 to 63).
(package private) static char[] | UHEX | RFC 3986: "The uppercase hexadecimal digits 'A' through 'F' are equivalent to the lowercase digits 'a' through 'f', respectively."
(package private) static long | UNRESERVED_MASK_HIGH | RFC 3986 unreserved characters, second 64 code points (64 to 127).
(package private) static long | UNRESERVED_MASK_LOW | RFC 3986 unreserved characters, first 64 code points (0 to 63).
(package private) static int | UTF8_2_BYTE_FIRST_MSB | When the encoded output requires 2 bytes, these are the high bits of the first byte.
(package private) static int | UTF8_3_BYTE_FIRST_MSB | When the encoded output requires 3 bytes, these are the high bits of the first byte.
(package private) static int | UTF8_4_BYTE_FIRST_MSB | When the encoded output requires 4 bytes, these are the high bits of the first byte.
(package private) static int | UTF8_BYTE_MSB | For every byte after the first in a 2- to 4-byte encoded sequence, these are the high bits of that byte.
(package private) static int | UTF8_MASK | Mask with the lower 6 bits set to one.
(package private) static int | UTF8_SHIFT | UTF-8 encodes 6 bits of the code point in each continuation byte.
-
Constructor Summary
Constructors
Constructor | Description
URIEncoder() | Constructor equivalent to URIEncoder(Mode.FULL_URI).
URIEncoder(URIEncoder.Mode mode) | Constructs a URIEncoder that uses the specified encoding mode.
-
Method Summary
Methods
Modifier and Type | Method | Description
protected java.nio.charset.CoderResult | encodeArrays(java.nio.CharBuffer input, java.nio.CharBuffer output, boolean endOfInput) | The core encoding loop used when both the input and output buffers are array-backed.
protected int | firstEncodedOffset(java.lang.String input, int off, int len) | Scans the input string for the index of the first character that requires encoding.
protected int | maxEncodedLength(int n) | Returns the maximum encoded length (in chars) of an input sequence of n characters.
java.lang.String | toString() |
Methods inherited from class org.owasp.encoder.Encoder
encode, encodeBuffers, overflow, underflow
-
Field Detail
-
CHARS_0_TO_9
static final int CHARS_0_TO_9
Number of characters in the range '0' to '9'.
- See Also:
- Constant Field Values
-
CHARS_A_TO_Z
static final int CHARS_A_TO_Z
Number of characters in the range 'a' to 'z'.
- See Also:
- Constant Field Values
-
LONG_BITS
static final int LONG_BITS
Number of bits in a long.
- See Also:
- Constant Field Values
-
MAX_ENCODED_CHAR_LENGTH
static final int MAX_ENCODED_CHAR_LENGTH
Maximum number of characters required to encode a single input character.
- See Also:
- Constant Field Values
-
PERCENT_ENCODED_LENGTH
static final int PERCENT_ENCODED_LENGTH
Number of characters used to '%'-encode a single byte value.
- See Also:
- Constant Field Values
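The worst-case bound behind MAX_ENCODED_CHAR_LENGTH follows from PERCENT_ENCODED_LENGTH: each UTF-8 byte becomes three characters ('%' plus two hex digits). A char in the Basic Multilingual Plane needs at most 3 UTF-8 bytes (9 output chars), while a supplementary code point needs 4 bytes but occupies two Java chars (12 output chars, or 6 per input char). The sketch below checks that arithmetic with the JDK's UTF-8 encoder. It assumes every byte is percent-encoded and does not read the library's actual constant values; the class name WorstCaseLength is illustrative.

```java
import java.nio.charset.StandardCharsets;

final class WorstCaseLength {
    // Characters needed to percent-encode one byte: '%' plus two hex digits.
    static final int PERCENT_ENCODED_LENGTH = 3;

    /** Output chars per input char if every UTF-8 byte of s is percent-encoded. */
    static double encodedCharsPerInputChar(String s) {
        int bytes = s.getBytes(StandardCharsets.UTF_8).length;
        return (double) (bytes * PERCENT_ENCODED_LENGTH) / s.length();
    }

    public static void main(String[] args) {
        System.out.println(encodedCharsPerInputChar("A"));            // 1 byte  -> 3.0
        System.out.println(encodedCharsPerInputChar("\u00E9"));       // 2 bytes -> 6.0
        System.out.println(encodedCharsPerInputChar("\u20AC"));       // 3 bytes -> 9.0
        System.out.println(encodedCharsPerInputChar("\uD83D\uDE00")); // 4 bytes over 2 chars -> 6.0
    }
}
```

The 3-byte BMP case is the largest ratio, which is why a per-character bound of 9 would be safe under these assumptions.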
-
MAX_UTF8_2_BYTE
static final int MAX_UTF8_2_BYTE
Maximum code-point value that can be encoded in 2 UTF-8 bytes.
- See Also:
- Constant Field Values
-
UTF8_2_BYTE_FIRST_MSB
static final int UTF8_2_BYTE_FIRST_MSB
When the encoded output requires 2 bytes, these are the high bits of the first byte.
- See Also:
- Constant Field Values
-
UTF8_3_BYTE_FIRST_MSB
static final int UTF8_3_BYTE_FIRST_MSB
When the encoded output requires 3 bytes, these are the high bits of the first byte.
- See Also:
- Constant Field Values
-
UTF8_4_BYTE_FIRST_MSB
static final int UTF8_4_BYTE_FIRST_MSB
When the encoded output requires 4 bytes, these are the high bits of the first byte.
- See Also:
- Constant Field Values
-
UTF8_BYTE_MSB
static final int UTF8_BYTE_MSB
For every byte after the first in a 2- to 4-byte encoded sequence, these are the high bits of that byte.
- See Also:
- Constant Field Values
-
UTF8_SHIFT
static final int UTF8_SHIFT
UTF-8 encodes 6 bits of the code point in each continuation byte.
- See Also:
- Constant Field Values
-
UTF8_MASK
static final int UTF8_MASK
Mask with the lower 6 bits set to one.
- See Also:
- Constant Field Values
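The UTF-8 constants above combine in the usual way: the lead byte carries a length marker in its high bits, and each continuation byte carries 6 payload bits under a 10xxxxxx prefix. This sketch assumes the constants hold the standard UTF-8 values shown (this page lists their names but not their values), and the class Utf8Sketch is hypothetical. It expects a valid Unicode scalar value, not a lone surrogate.

```java
import java.util.Arrays;

final class Utf8Sketch {
    // Standard UTF-8 bit patterns; assumed, not confirmed, to match the URIEncoder constants.
    static final int MAX_UTF8_2_BYTE = 0x7FF;
    static final int UTF8_2_BYTE_FIRST_MSB = 0xC0; // 110xxxxx
    static final int UTF8_3_BYTE_FIRST_MSB = 0xE0; // 1110xxxx
    static final int UTF8_4_BYTE_FIRST_MSB = 0xF0; // 11110xxx
    static final int UTF8_BYTE_MSB = 0x80;         // 10xxxxxx continuation byte
    static final int UTF8_SHIFT = 6;               // payload bits per continuation byte
    static final int UTF8_MASK = 0x3F;             // lower 6 bits set

    /** Encodes one code point into UTF-8 bytes, returned as ints in 0..255. */
    static int[] encode(int cp) {
        if (cp < 0x80) {
            return new int[] {cp};
        } else if (cp <= MAX_UTF8_2_BYTE) {
            return new int[] {
                UTF8_2_BYTE_FIRST_MSB | (cp >>> UTF8_SHIFT),
                UTF8_BYTE_MSB | (cp & UTF8_MASK)};
        } else if (cp < 0x10000) {
            return new int[] {
                UTF8_3_BYTE_FIRST_MSB | (cp >>> (2 * UTF8_SHIFT)),
                UTF8_BYTE_MSB | ((cp >>> UTF8_SHIFT) & UTF8_MASK),
                UTF8_BYTE_MSB | (cp & UTF8_MASK)};
        } else {
            return new int[] {
                UTF8_4_BYTE_FIRST_MSB | (cp >>> (3 * UTF8_SHIFT)),
                UTF8_BYTE_MSB | ((cp >>> (2 * UTF8_SHIFT)) & UTF8_MASK),
                UTF8_BYTE_MSB | ((cp >>> UTF8_SHIFT) & UTF8_MASK),
                UTF8_BYTE_MSB | (cp & UTF8_MASK)};
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode(0xE9)));   // [195, 169]
        System.out.println(Arrays.toString(encode(0x20AC))); // [226, 130, 172]
    }
}
```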
-
INVALID_REPLACEMENT_CHARACTER
static final char INVALID_REPLACEMENT_CHARACTER
The character to use when replacing an invalid character.
- See Also:
- Constant Field Values
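Lone surrogates have no UTF-8 encoding, which is a common reason for an encoder to need a replacement character. The sketch below assumes that is the role of INVALID_REPLACEMENT_CHARACTER and uses '-' as a hypothetical stand-in value; SurrogateCheck and sanitize are illustrative names, not library API.

```java
final class SurrogateCheck {
    // Hypothetical replacement; the real value of INVALID_REPLACEMENT_CHARACTER is not shown on this page.
    static final char REPLACEMENT = '-';

    /** Replaces unpaired surrogates, which cannot be UTF-8 encoded, with REPLACEMENT. */
    static String sanitize(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)
                    && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                out.append(c).append(s.charAt(++i)); // valid pair: keep both halves
            } else if (Character.isSurrogate(c)) {
                out.append(REPLACEMENT);             // lone high or low surrogate
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```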
-
UHEX
static final char[] UHEX
RFC 3986: "The uppercase hexadecimal digits 'A' through 'F' are equivalent to the lowercase digits 'a' through 'f', respectively. If two URIs differ only in the case of hexadecimal digits used in percent-encoded octets, they are equivalent. For consistency, URI producers and normalizers should use uppercase hexadecimal digits for all percent-encodings."
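Following that recommendation, a percent-escape emits '%' and then the high and low nibbles of the byte, each looked up in an uppercase digit table. A minimal sketch, with PercentEscape and appendEscaped as hypothetical names:

```java
final class PercentEscape {
    // Uppercase hex digits, as RFC 3986 recommends for percent-encodings.
    static final char[] UHEX = "0123456789ABCDEF".toCharArray();

    /** Appends the percent-encoding of one byte value (0..255), e.g. 0x2F -> "%2F". */
    static void appendEscaped(StringBuilder out, int b) {
        out.append('%')
           .append(UHEX[(b >>> 4) & 0xF]) // high nibble
           .append(UHEX[b & 0xF]);        // low nibble
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        appendEscaped(sb, 0xE2);
        appendEscaped(sb, 0x82);
        appendEscaped(sb, 0xAC);
        System.out.println(sb); // %E2%82%AC (the UTF-8 bytes of the euro sign)
    }
}
```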
-
UNRESERVED_MASK_LOW
static final long UNRESERVED_MASK_LOW
RFC 3986 unreserved characters, first 64 code points (0 to 63).
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
- See Also:
- Constant Field Values
-
UNRESERVED_MASK_HIGH
static final long UNRESERVED_MASK_HIGH
RFC 3986 unreserved characters, second 64 code points (64 to 127).
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
- See Also:
- Constant Field Values
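The two unreserved masks can be read as one 128-bit set split across two longs: bit c of the low word for code points 0 to 63, and bit (c - 64) of the high word for 64 to 127. The sketch below builds such masks from the grammar and tests membership with a shift. It is an assumed construction, since this page does not show how the library computes its constants; UnreservedMask and buildMask are illustrative names.

```java
final class UnreservedMask {
    static final String UNRESERVED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";

    // Code points 0..63: one bit per code point.
    static final long UNRESERVED_LOW = buildMask(UNRESERVED, 0);
    // Code points 64..127, stored shifted down by 64.
    static final long UNRESERVED_HIGH = buildMask(UNRESERVED, 64);

    /** Sets bit (c - base) for every char c of chars with base <= c < base + 64. */
    static long buildMask(String chars, int base) {
        long mask = 0L;
        for (char c : chars.toCharArray()) {
            if (c >= base && c < base + Long.SIZE) {
                mask |= 1L << (c - base);
            }
        }
        return mask;
    }

    /** True for ALPHA / DIGIT / "-" / "." / "_" / "~"; nothing at or above 128 is unreserved. */
    static boolean isUnreserved(char c) {
        if (c < 64) {
            return (UNRESERVED_LOW & (1L << c)) != 0;
        } else if (c < 128) {
            return (UNRESERVED_HIGH & (1L << (c - 64))) != 0;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isUnreserved('~') + " " + isUnreserved('/')); // true false
    }
}
```

Only the digits, '-' and '.' fall in the low word (12 bits); the letters, '_' and '~' fill 54 bits of the high word.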
-
RESERVED_MASK_LOW
static final long RESERVED_MASK_LOW
RFC 3986 reserved characters, first 64 code points (0 to 63).
reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
- See Also:
- Constant Field Values
-
RESERVED_MASK_HIGH
static final long RESERVED_MASK_HIGH
RFC 3986 reserved characters, second 64 code points (64 to 127).
- See Also:
- Constant Field Values
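The reserved set does not split evenly across the two words: every sub-delim and four gen-delims (":", "/", "?", "#") sit below 64, while "@" (64), "[" (91) and "]" (93) fall in the high word. The sketch also shows how safe-character sets for different modes might be combined with bitwise OR. The idea that a full-URI mode passes reserved characters through while a component mode escapes them is an assumption, and ReservedMask, FULL_URI_SAFE and COMPONENT_SAFE are hypothetical names.

```java
final class ReservedMask {
    static final String GEN_DELIMS = ":/?#[]@";
    static final String SUB_DELIMS = "!$&'()*+,;=";
    static final String UNRESERVED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";

    /** Returns {low, high}: bit c of low for c < 64, bit (c - 64) of high for 64 <= c < 128. */
    static long[] masks(String chars) {
        long low = 0L, high = 0L;
        for (char c : chars.toCharArray()) {
            if (c < 64) low |= 1L << c;
            else if (c < 128) high |= 1L << (c - 64);
        }
        return new long[] {low, high};
    }

    static final long[] RESERVED = masks(GEN_DELIMS + SUB_DELIMS);
    static final long[] UNRES = masks(UNRESERVED);

    // Hypothetical mode sets: "full URI" keeps reserved delimiters, "component" keeps only unreserved.
    static final long[] FULL_URI_SAFE = {UNRES[0] | RESERVED[0], UNRES[1] | RESERVED[1]};
    static final long[] COMPONENT_SAFE = UNRES;

    /** True if c may be emitted unescaped under the given set; non-ASCII never is. */
    static boolean isSafe(long[] m, char c) {
        if (c < 64) return (m[0] & (1L << c)) != 0;
        if (c < 128) return (m[1] & (1L << (c - 64))) != 0;
        return false;
    }

    public static void main(String[] args) {
        System.out.println(Long.bitCount(RESERVED[0]) + " " + Long.bitCount(RESERVED[1])); // 15 3
        System.out.println(isSafe(FULL_URI_SAFE, '/') + " " + isSafe(COMPONENT_SAFE, '/'));  // true false
    }
}
```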
-
_lowMask
private final long _lowMask
The bit-mask of characters that do not need to be escaped, for characters with code-points in the range 0 to 63.
-
_highMask
private final long _highMask
The bit-mask of characters that do not need to be escaped, for characters with code points in the range 64 to 127.
-
_mode
private final URIEncoder.Mode _mode
The encoding mode for this encoder, used primarily by toString().
-
-
Constructor Detail
-
URIEncoder
URIEncoder()
Constructor equivalent to URIEncoder(Mode.FULL_URI).
-
URIEncoder
URIEncoder(URIEncoder.Mode mode)
Constructs a URIEncoder that uses the specified encoding mode.
- Parameters:
mode - the encoding mode for this encoder.
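A no-argument constructor documented as equivalent to URIEncoder(Mode.FULL_URI) is normally written in Java as constructor chaining with this(...). The sketch below shows that pattern on a hypothetical ModeSketch class; it assumes a second mode named COMPONENT (only FULL_URI appears on this page) and an invented toString format.

```java
import java.util.Objects;

final class ModeSketch {
    // Hypothetical modes: FULL_URI is named on this page; COMPONENT is an assumed counterpart.
    enum Mode { FULL_URI, COMPONENT }

    private final Mode _mode;

    ModeSketch() {
        this(Mode.FULL_URI); // the no-arg form delegates to the one-arg constructor
    }

    ModeSketch(Mode mode) {
        this._mode = Objects.requireNonNull(mode, "mode");
    }

    @Override
    public String toString() {
        return "URIEncoder(" + _mode + ")"; // hypothetical format
    }
}
```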
-
-
Method Detail
-
maxEncodedLength
protected int maxEncodedLength(int n)
Description copied from class: Encoder
Returns the maximum encoded length (in chars) of an input sequence of n characters.
- Specified by:
maxEncodedLength in class Encoder
- Parameters:
n - the number of characters of input
- Returns:
the worst-case number of characters required to encode n input characters
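Each percent-encoded byte takes 3 chars, and a single Java char encodes to at most 3 UTF-8 bytes (a surrogate pair yields 4 bytes for 2 chars), so 9 chars per input char is a safe per-character bound. This sketch assumes maxEncodedLength reduces to that multiplication (the page does not show the implementation) and uses Math.multiplyExact so a huge n fails loudly instead of wrapping to a negative size. MaxLength is an illustrative name.

```java
final class MaxLength {
    // '%' plus two hex digits per byte.
    static final int PERCENT_ENCODED_LENGTH = 3;
    // A BMP char needs at most 3 UTF-8 bytes; a surrogate pair needs 4 bytes for 2 chars.
    static final int MAX_UTF8_BYTES_PER_CHAR = 3;
    // Assumed per-char bound: 3 * 3 = 9.
    static final int MAX_ENCODED_CHAR_LENGTH = PERCENT_ENCODED_LENGTH * MAX_UTF8_BYTES_PER_CHAR;

    /** Worst-case encoded length for n input chars; throws ArithmeticException on int overflow. */
    static int maxEncodedLength(int n) {
        return Math.multiplyExact(n, MAX_ENCODED_CHAR_LENGTH);
    }

    public static void main(String[] args) {
        System.out.println(maxEncodedLength(4)); // 36
    }
}
```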
-
firstEncodedOffset
protected int firstEncodedOffset(java.lang.String input, int off, int len)
Description copied from class: Encoder
Scans the input string for the index of the first character that requires encoding. If no character in the checked range requires encoding, off+len is returned. This method is used by the Encode.forXYZ methods to return input strings unchanged when possible.
- Specified by:
firstEncodedOffset in class Encoder
- Parameters:
input - the input to check for encoding
off - the offset of the first character to check
len - the number of characters to check
- Returns:
the index of the first character to encode, or off+len if no characters in the input require encoding.
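The scan can be sketched as a loop that stops at the first character outside the pass-through set and otherwise returns off+len, letting a caller return the input string unchanged. The pass-through test here (RFC 3986 unreserved characters only) is an assumption; the real encoder consults its mode-specific masks. FirstEncoded and needsEncoding are hypothetical names.

```java
final class FirstEncoded {
    /** Returns the index of the first char in input[off, off+len) that needs encoding, or off+len. */
    static int firstEncodedOffset(String input, int off, int len) {
        final int end = off + len;
        for (int i = off; i < end; i++) {
            if (!isUnreserved(input.charAt(i))) {
                return i;
            }
        }
        return end;
    }

    // Assumption: component-style rules, where only RFC 3986 unreserved chars pass through.
    static boolean isUnreserved(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
            || c == '-' || c == '.' || c == '_' || c == '~';
    }

    /** Lets a caller skip encoding entirely when the whole string passes through. */
    static boolean needsEncoding(String s) {
        return firstEncodedOffset(s, 0, s.length()) != s.length();
    }
}
```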
-
encodeArrays
protected java.nio.charset.CoderResult encodeArrays(java.nio.CharBuffer input, java.nio.CharBuffer output, boolean endOfInput)
Description copied from class: Encoder
The core encoding loop used when both the input and output buffers are array-backed. The loop is expected to fetch the arrays and interact with them directly for performance.
- Overrides:
encodeArrays in class Encoder
- Parameters:
input - the input buffer.
output - the output buffer.
endOfInput - when true, this is the last input to encode.
- Returns:
UNDERFLOW or OVERFLOW
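A simplified array-backed loop is sketched below. It reads and writes the backing arrays through arrayOffset() and position(), stops with OVERFLOW before writing a partial escape, and returns UNDERFLOW either when input runs out or when a high surrogate at the end of a non-final buffer must wait for its low half. Assumptions: only unreserved characters pass through, lone surrogates become a hypothetical '-' replacement, and bytes come from String.getBytes for brevity rather than hand-rolled UTF-8. ArrayLoop is an illustrative name, not the library's code.

```java
import java.nio.CharBuffer;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

final class ArrayLoop {
    private static final char[] UHEX = "0123456789ABCDEF".toCharArray();
    private static final char REPLACEMENT = '-'; // hypothetical replacement for lone surrogates

    // Assumption: only RFC 3986 unreserved characters pass through unescaped.
    static boolean isUnreserved(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
            || c == '-' || c == '.' || c == '_' || c == '~';
    }

    /** Percent-encodes every UTF-8 byte of s with uppercase hex digits. */
    static String percentEncode(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append('%').append(UHEX[(b >>> 4) & 0xF]).append(UHEX[b & 0xF]);
        }
        return sb.toString();
    }

    /** Returns OVERFLOW when output is full, UNDERFLOW when input is exhausted or needs more chars. */
    static CoderResult encodeArrays(CharBuffer input, CharBuffer output, boolean endOfInput) {
        final char[] in = input.array();
        final char[] out = output.array();
        final int inBase = input.arrayOffset();
        final int outBase = output.arrayOffset();
        int i = inBase + input.position();
        final int iEnd = inBase + input.limit();
        int j = outBase + output.position();
        final int jEnd = outBase + output.limit();
        try {
            while (i < iEnd) {
                final char c = in[i];
                int consumed = 1;
                final String chunk; // output for the char(s) starting at in[i]
                if (isUnreserved(c)) {
                    chunk = String.valueOf(c);
                } else if (Character.isHighSurrogate(c) && i + 1 == iEnd && !endOfInput) {
                    return CoderResult.UNDERFLOW; // low half may arrive in the next buffer
                } else if (Character.isHighSurrogate(c) && i + 1 < iEnd
                        && Character.isLowSurrogate(in[i + 1])) {
                    chunk = percentEncode(new String(in, i, 2)); // supplementary code point
                    consumed = 2;
                } else if (Character.isSurrogate(c)) {
                    chunk = String.valueOf(REPLACEMENT); // lone surrogate has no UTF-8 form
                } else {
                    chunk = percentEncode(String.valueOf(c));
                }
                if (jEnd - j < chunk.length()) {
                    return CoderResult.OVERFLOW; // leave c unconsumed; never write half an escape
                }
                chunk.getChars(0, chunk.length(), out, j);
                j += chunk.length();
                i += consumed;
            }
            return CoderResult.UNDERFLOW;
        } finally {
            // Publish progress back to both buffers on every exit path.
            input.position(i - inBase);
            output.position(j - outBase);
        }
    }
}
```

Note that the input buffer must be array-backed: CharBuffer.wrap(char[]) qualifies, while CharBuffer.wrap(CharSequence) returns a read-only buffer whose array() is unavailable.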
-
toString
public java.lang.String toString()
- Overrides:
toString in class java.lang.Object