Class FileEncoding


  • public class FileEncoding
    extends java.lang.Object
    Tries to guess the character encoding of a byte sequence. The original code is taken from https://github.com/file/file/blob/master/src/encoding.c
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private java.lang.String code  
      private java.lang.String codeMime  
      private static char[] EBCDIC_1047_TO_8859  
      private static char[] EBCDIC_TO_ASCII  
      private static byte F  
      private static byte I  
      private static byte T  
      private byte[] text_chars  
      private java.lang.String type  
      private static byte X  
    • Constructor Summary

      Constructors 
      Constructor Description
      FileEncoding()  
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      private byte[] fromEbcdic(byte[] buf, int nbytes)  
      java.lang.String getCode()  
      java.lang.String getCodeMime()  
      java.lang.String getType()  
      boolean guessFileEncoding(byte[] buf)
      Tries to determine whether the text is in a character encoding we can identify.
      private boolean looksAscii(byte[] buf, int nbytes)  
      private boolean looksExtended(byte[] buf, int nbytes)  
      private boolean looksLatin1(byte[] buf, int nbytes)  
      private int looksUcs16(byte[] buf, int nbytes)  
      private boolean looksUtf7(byte[] buf, int nbytes)  
      protected int looksUtf8(byte[] buf, int nbytes)  
      private boolean looksUtf8WithBOM(byte[] buf, int nbytes)  
      private int unsignedByte(byte value)  
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • type

        private java.lang.String type
      • code

        private java.lang.String code
      • codeMime

        private java.lang.String codeMime
      • text_chars

        private byte[] text_chars
      • EBCDIC_TO_ASCII

        private static final char[] EBCDIC_TO_ASCII
      • EBCDIC_1047_TO_8859

        private static final char[] EBCDIC_1047_TO_8859
    • Constructor Detail

      • FileEncoding

        public FileEncoding()
    • Method Detail

      • getCodeMime

        public java.lang.String getCodeMime()
      • getType

        public java.lang.String getType()
      • getCode

        public java.lang.String getCode()
      • guessFileEncoding

        public boolean guessFileEncoding(byte[] buf)
        Tries to determine whether the text is in a character encoding we can identify. EBCDIC input is also identified, by first converting it to ISO-8859-1.
        Returns:
        true if an encoding could be guessed.
      • looksAscii

        private boolean looksAscii(byte[] buf,
                                   int nbytes)
      • looksLatin1

        private boolean looksLatin1(byte[] buf,
                                    int nbytes)
      • looksExtended

        private boolean looksExtended(byte[] buf,
                                      int nbytes)
      • looksUtf8

        protected int looksUtf8(byte[] buf,
                                int nbytes)
      • looksUtf8WithBOM

        private boolean looksUtf8WithBOM(byte[] buf,
                                         int nbytes)
      • looksUtf7

        private boolean looksUtf7(byte[] buf,
                                  int nbytes)
      • looksUcs16

        private int looksUcs16(byte[] buf,
                               int nbytes)
      • fromEbcdic

        private byte[] fromEbcdic(byte[] buf,
                                  int nbytes)
      • unsignedByte

        private int unsignedByte(byte value)
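
    Example (illustrative only)

    The private helpers above carry no descriptions, so the following is a minimal, hypothetical sketch of the kind of byte-level checks that names such as looksUtf8WithBOM and looksUtf8 suggest. It is not the FileEncoding source. The class name EncodingSketch, its method names, and the return codes (-1 for malformed input, 1 for pure 7-bit input, 2 when a multi-byte UTF-8 sequence was seen) are assumptions made for this example, loosely modeled on the upstream encoding.c. The sketch is deliberately simplified: it treats a truncated trailing sequence as malformed and does not reject overlong forms. Masking with 0xFF is the usual way to read a Java byte as an unsigned value, which is presumably the job of unsignedByte.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch; names and return codes are assumptions, not the FileEncoding API. */
final class EncodingSketch {

    /** True if the buffer starts with the UTF-8 byte order mark EF BB BF. */
    static boolean hasUtf8Bom(byte[] buf) {
        return buf.length >= 3
                && (buf[0] & 0xFF) == 0xEF
                && (buf[1] & 0xFF) == 0xBB
                && (buf[2] & 0xFF) == 0xBF;
    }

    /**
     * Classifies a buffer by UTF-8 structure only.
     * Returns -1 for malformed UTF-8, 1 for pure 7-bit input,
     * and 2 if at least one well-formed multi-byte sequence was seen.
     */
    static int classifyUtf8(byte[] buf) {
        boolean sawMultiByte = false;
        int i = 0;
        while (i < buf.length) {
            int b = buf[i] & 0xFF; // read the byte as unsigned
            int following;         // number of continuation bytes expected
            if (b < 0x80) {
                following = 0;
            } else if ((b & 0xE0) == 0xC0) {
                following = 1;
            } else if ((b & 0xF0) == 0xE0) {
                following = 2;
            } else if ((b & 0xF8) == 0xF0) {
                following = 3;
            } else {
                return -1; // stray continuation byte or invalid lead byte
            }
            for (int k = 1; k <= following; k++) {
                if (i + k >= buf.length) {
                    return -1; // sequence cut off at the end of the buffer
                }
                if ((buf[i + k] & 0xC0) != 0x80) {
                    return -1; // continuation bytes must look like 10xxxxxx
                }
            }
            if (following > 0) {
                sawMultiByte = true;
            }
            i += following + 1;
        }
        return sawMultiByte ? 2 : 1;
    }

    public static void main(String[] args) {
        byte[] ascii = "hello".getBytes(StandardCharsets.US_ASCII);
        byte[] utf8 = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "h\u00e9llo".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(classifyUtf8(ascii));
        System.out.println(classifyUtf8(utf8));
        System.out.println(classifyUtf8(latin1));
    }
}
```

    With the actual class, the documented entry point is guessFileEncoding(byte[] buf). When it returns true, the detected values are presumably available through getCode(), getCodeMime(), and getType().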