Class CharacterUtils


  • public final class CharacterUtils
    extends java.lang.Object
    Utility class for writing tokenizers and token filters.
    • Constructor Summary

      Constructors 
      Modifier Constructor Description
      private CharacterUtils()  
    • Method Summary

      All Methods Static Methods Concrete Methods 
      Modifier and Type Method Description
      static boolean fill​(CharacterUtils.CharacterBuffer buffer, java.io.Reader reader)
      Convenience method which calls fill(buffer, reader, buffer.buffer.length).
      static boolean fill​(CharacterUtils.CharacterBuffer buffer, java.io.Reader reader, int numChars)
      Fills the CharacterUtils.CharacterBuffer with characters read from the given Reader.
      static CharacterUtils.CharacterBuffer newCharacterBuffer​(int bufferSize)
      Creates a new CharacterUtils.CharacterBuffer and allocates a char[] of the given bufferSize.
      (package private) static int readFully​(java.io.Reader reader, char[] dest, int offset, int len)  
      static int toChars​(int[] src, int srcOff, int srcLen, char[] dest, int destOff)
      Converts a sequence of Unicode code points to a sequence of Java characters.
      static int toCodePoints​(char[] src, int srcOff, int srcLen, int[] dest, int destOff)
      Converts a sequence of Java characters to a sequence of Unicode code points.
      static void toLowerCase​(char[] buffer, int offset, int limit)
      Converts each Unicode code point to lower case via Character.toLowerCase(int), starting at the given offset.
      static void toUpperCase​(char[] buffer, int offset, int limit)
      Converts each Unicode code point to upper case via Character.toUpperCase(int), starting at the given offset.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • CharacterUtils

        private CharacterUtils()
    • Method Detail

      • toLowerCase

        public static void toLowerCase​(char[] buffer,
                                       int offset,
                                       int limit)
        Converts each Unicode code point to lower case via Character.toLowerCase(int), starting at the given offset.
        Parameters:
        buffer - the char buffer to lower case
        offset - the offset to start at
        limit - the end of the range in the buffer to lower case
      • toUpperCase

        public static void toUpperCase​(char[] buffer,
                                       int offset,
                                       int limit)
        Converts each Unicode code point to upper case via Character.toUpperCase(int), starting at the given offset.
        Parameters:
        buffer - the char buffer to upper case
        offset - the offset to start at
        limit - the end of the range in the buffer to upper case
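        The code-point-wise case conversion described for toLowerCase and toUpperCase can be illustrated with a minimal JDK-only sketch. The class CaseSketch and its methods lowerInPlace and upperInPlace are hypothetical names introduced for illustration, not part of this API. The sketch assumes that limit is an exclusive end index and that a case mapping never changes the number of chars a code point occupies.

```java
// Hypothetical illustration; not the library's implementation.
final class CaseSketch {

    // Lower-cases every code point in buffer[offset, limit) in place.
    // Assumes limit is exclusive and that the mapped code point needs
    // the same number of chars as the original one.
    static void lowerInPlace(char[] buffer, int offset, int limit) {
        for (int i = offset; i < limit; ) {
            int cp = Character.codePointAt(buffer, i, limit);
            // toChars writes 1 or 2 chars at index i and returns that count
            i += Character.toChars(Character.toLowerCase(cp), buffer, i);
        }
    }

    // Upper-cases every code point in buffer[offset, limit) in place.
    static void upperInPlace(char[] buffer, int offset, int limit) {
        for (int i = offset; i < limit; ) {
            int cp = Character.codePointAt(buffer, i, limit);
            i += Character.toChars(Character.toUpperCase(cp), buffer, i);
        }
    }

    public static void main(String[] args) {
        char[] buf = "HeLLo".toCharArray();
        lowerInPlace(buf, 2, buf.length); // leaves the first two chars untouched
        System.out.println(new String(buf)); // prints "Hello"
    }
}
```

        Working on code points rather than single chars matters for supplementary characters, which occupy a surrogate pair in a char[] and would be left unchanged by Character.toLowerCase(char).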
      • toCodePoints

        public static int toCodePoints​(char[] src,
                                       int srcOff,
                                       int srcLen,
                                       int[] dest,
                                       int destOff)
        Converts a sequence of Java characters to a sequence of Unicode code points.
        Returns:
        the number of code points written to the destination buffer
      • toChars

        public static int toChars​(int[] src,
                                  int srcOff,
                                  int srcLen,
                                  char[] dest,
                                  int destOff)
        Converts a sequence of Unicode code points to a sequence of Java characters.
        Returns:
        the number of chars written to the destination buffer
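        The two conversions documented for toCodePoints and toChars are inverses of each other. The following JDK-only sketch shows one way such a round trip could work. CodePointSketch, charsToCodePoints, and codePointsToChars are hypothetical names chosen for illustration, and the sketch assumes that srcLen counts elements of the source array starting at srcOff.

```java
// Hypothetical illustration; not the library's implementation.
final class CodePointSketch {

    // Decodes src[srcOff, srcOff + srcLen) into code points written to dest
    // starting at destOff. Returns the number of code points written.
    static int charsToCodePoints(char[] src, int srcOff, int srcLen, int[] dest, int destOff) {
        int end = srcOff + srcLen;
        int written = 0;
        for (int i = srcOff; i < end; ) {
            int cp = Character.codePointAt(src, i, end);
            dest[destOff + written++] = cp;
            i += Character.charCount(cp); // 1 for BMP, 2 for a surrogate pair
        }
        return written;
    }

    // Encodes srcLen code points from src, starting at srcOff, into chars
    // written to dest starting at destOff. Returns the number of chars written.
    static int codePointsToChars(int[] src, int srcOff, int srcLen, char[] dest, int destOff) {
        int written = 0;
        for (int i = 0; i < srcLen; i++) {
            written += Character.toChars(src[srcOff + i], dest, destOff + written);
        }
        return written;
    }

    public static void main(String[] args) {
        char[] chars = "a\uD801\uDC00b".toCharArray(); // 4 chars, 3 code points
        int[] cps = new int[chars.length];
        int n = charsToCodePoints(chars, 0, chars.length, cps, 0);
        System.out.println(n); // prints 3
    }
}
```

        Sizing the int[] destination to the number of source chars is always sufficient, because a code point never needs fewer than one char. In the opposite direction the char[] destination may need up to twice as many elements as there are code points.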
      • fill

        public static boolean fill​(CharacterUtils.CharacterBuffer buffer,
                                   java.io.Reader reader,
                                   int numChars)
                            throws java.io.IOException
        Fills the CharacterUtils.CharacterBuffer with characters read from the given Reader. This method tries to read numChars characters into the CharacterUtils.CharacterBuffer. Each call to fill starts filling the buffer at offset 0 and reads up to numChars characters. Because a code point can span two Java chars, this method may fill only numChars - 1 characters so that it does not split a surrogate pair, even if characters remain in the Reader.

        This method guarantees that the given CharacterUtils.CharacterBuffer will never contain a high surrogate character as the last element in the buffer unless it is the last available character in the reader. In other words, high and low surrogate pairs are always preserved across buffer borders.

        A return value of false means that this call exhausted the reader. Some characters may still have been read, which can be verified by checking whether buffer.getLength() > 0.

        Parameters:
        buffer - the buffer to fill.
        reader - the reader to read characters from.
        numChars - the number of chars to read
        Returns:
        false if and only if reader.read returned -1 while trying to fill the buffer
        Throws:
        java.io.IOException - if the reader throws an IOException.
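        The contract described above (fill from offset 0, never end a full buffer on a high surrogate, return false once the reader is exhausted) can be mimicked with a small JDK-only sketch. ChunkedReadSketch and all of its members are hypothetical stand-ins written for illustration. They are not CharacterUtils.CharacterBuffer, and the way the trailing high surrogate is carried over to the next call is an assumption about one possible approach. The sketch also assumes a buffer size of at least 2.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the library's implementation.
final class ChunkedReadSketch {
    private final char[] buf;   // size is assumed to be >= 2
    private int length;         // number of valid chars in buf
    private char pendingHigh;   // high surrogate held back by the previous call
    private boolean hasPending;

    ChunkedReadSketch(int size) {
        this.buf = new char[size];
    }

    // Returns false if and only if the reader was exhausted during this call.
    boolean fill(Reader reader) throws IOException {
        int off = 0;
        if (hasPending) {               // re-insert the surrogate held back earlier
            buf[off++] = pendingHigh;
            hasPending = false;
        }
        int read = 0;
        while (off + read < buf.length) {
            int n = reader.read(buf, off + read, buf.length - off - read);
            if (n == -1) break;         // end of stream
            read += n;
        }
        length = off + read;
        boolean full = length == buf.length;
        if (full && Character.isHighSurrogate(buf[length - 1])) {
            pendingHigh = buf[--length]; // expose one char less, keep the pair together
            hasPending = true;
        }
        return full;
    }

    // Splits text into chunks of at most size chars without breaking surrogate pairs.
    static List<String> chunks(String text, int size) {
        ChunkedReadSketch sketch = new ChunkedReadSketch(size);
        List<String> out = new ArrayList<>();
        try (Reader reader = new StringReader(text)) {
            boolean more;
            do {
                more = sketch.fill(reader);
                // even when fill returns false, some chars may have been read
                if (sketch.length > 0) out.add(new String(sketch.buf, 0, sketch.length));
            } while (more);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(chunks("abcdefg", 3)); // prints [abc, def, g]
    }
}
```

        The caller's loop checks the buffer length after every call, including the one that returns false, which mirrors the advice to verify buffer.getLength() > 0 when the reader is exhausted.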
      • fill

        public static boolean fill​(CharacterUtils.CharacterBuffer buffer,
                                   java.io.Reader reader)
                            throws java.io.IOException
        Convenience method which calls fill(buffer, reader, buffer.buffer.length).
        Throws:
        java.io.IOException
      • readFully

        static int readFully​(java.io.Reader reader,
                             char[] dest,
                             int offset,
                             int len)
                      throws java.io.IOException
        Throws:
        java.io.IOException