Class CheckIndex

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable

    public final class CheckIndex
    extends java.lang.Object
    implements java.io.Closeable
    Basic tool and API to check the health of an index and write a new segments file that removes references to problematic segments.

    As this tool checks every byte in the index, on a large index it can take quite a long time to run.

    • Field Detail

      • infoStream

        private java.io.PrintStream infoStream
      • writeLock

        private Lock writeLock
      • closed

        private volatile boolean closed
      • doSlowChecks

        private boolean doSlowChecks
      • failFast

        private boolean failFast
      • verbose

        private boolean verbose
      • checksumsOnly

        private boolean checksumsOnly
      • assertsOn

        private static boolean assertsOn
    • Constructor Detail

      • CheckIndex

        public CheckIndex​(Directory dir)
                   throws java.io.IOException
        Create a new CheckIndex on the directory.
        Throws:
        java.io.IOException
      • CheckIndex

        public CheckIndex​(Directory dir,
                          Lock writeLock)
        Expert: create a CheckIndex on the directory with the specified lock. This should not be used except in unit tests. It exists only to support special tests (such as TestIndexWriterExceptions*) that would otherwise be more complicated to debug if they had to close the writer for each check.
    • Method Detail

      • ensureOpen

        private void ensureOpen()
      • close

        public void close()
                   throws java.io.IOException
        Specified by:
        close in interface java.lang.AutoCloseable
        Specified by:
        close in interface java.io.Closeable
        Throws:
        java.io.IOException
      • setDoSlowChecks

        public void setDoSlowChecks​(boolean v)
        If true, additional slow checks are performed. This will likely drastically increase the time it takes to run CheckIndex!
      • setFailFast

        public void setFailFast​(boolean v)
        If true, just throw the original exception immediately when corruption is detected, rather than continuing to iterate to other segments looking for more corruption.
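
The difference between the two modes can be illustrated with a small, self-contained sketch. It does not use Lucene: `FailFastSketch`, `runChecks`, and the `Runnable` checks are hypothetical stand-ins for the per-segment checks, and the real CheckIndex internals may be organised differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not part of Lucene. */
final class FailFastSketch {

    /**
     * Runs each check in order. With failFast the first exception propagates
     * unchanged; otherwise the failure is recorded and the loop moves on to
     * the next check.
     */
    static List<RuntimeException> runChecks(List<Runnable> checks, boolean failFast) {
        List<RuntimeException> failures = new ArrayList<>();
        for (Runnable check : checks) {
            try {
                check.run();
            } catch (RuntimeException e) {
                if (failFast) {
                    throw e; // rethrow the original exception immediately
                }
                failures.add(e); // keep going to look for more problems
            }
        }
        return failures;
    }
}
```

With `failFast` set to false the caller gets every failure back at the cost of visiting all checks; with it set to true the caller sees only the first problem, but sees it immediately and with its original stack trace.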
      • setChecksumsOnly

        public void setChecksumsOnly​(boolean v)
        If true, only validate physical integrity for all files. Note that the returned nested status objects (e.g. storedFieldStatus) will be null.
      • setInfoStream

        public void setInfoStream​(java.io.PrintStream out,
                                  boolean verbose)
        Sets the stream to which messages are written. If out is null, no messages are printed. If verbose is true, more details are printed.
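
The null-stream and verbose behaviour described above might look roughly like the following sketch. `InfoStreamSketch` and its `report` method are hypothetical and written against the standard library only; they are not the CheckIndex implementation.

```java
import java.io.PrintStream;

/** Hypothetical illustration only; not part of Lucene. */
final class InfoStreamSketch {
    private PrintStream infoStream; // null means "print nothing"
    private boolean verbose;

    void setInfoStream(PrintStream out, boolean verbose) {
        this.infoStream = out;
        this.verbose = verbose;
    }

    /** Prints the summary if a stream is set, and the details only in verbose mode. */
    void report(String summary, String details) {
        if (infoStream == null) {
            return; // no stream configured: stay silent
        }
        infoStream.println(summary);
        if (verbose) {
            infoStream.println(details);
        }
    }
}
```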
      • msg

        private static void msg​(java.io.PrintStream out,
                                java.lang.String msg)
      • checkIndex

        public CheckIndex.Status checkIndex()
                                     throws java.io.IOException
        Returns a CheckIndex.Status instance detailing the state of the index.

        As this method checks every byte in the index, on a large index it can take quite a long time to run.

        WARNING: make sure you only call this when the index is not opened by any writer.

        Throws:
        java.io.IOException
      • checkIndex

        public CheckIndex.Status checkIndex​(java.util.List<java.lang.String> onlySegments)
                                     throws java.io.IOException
        Returns a CheckIndex.Status instance detailing the state of the index.

        As this method checks every byte in the specified segments, on a large index it can take quite a long time to run.

        Parameters:
        onlySegments - list of specific segment names to check

        Throws:
        java.io.IOException
      • testLiveDocs

        public static CheckIndex.Status.LiveDocStatus testLiveDocs​(CodecReader reader,
                                                                   java.io.PrintStream infoStream,
                                                                   boolean failFast)
                                                            throws java.io.IOException
        Test live docs.
        Throws:
        java.io.IOException
      • testFieldInfos

        public static CheckIndex.Status.FieldInfoStatus testFieldInfos​(CodecReader reader,
                                                                       java.io.PrintStream infoStream,
                                                                       boolean failFast)
                                                                throws java.io.IOException
        Test field infos.
        Throws:
        java.io.IOException
      • testFieldNorms

        public static CheckIndex.Status.FieldNormStatus testFieldNorms​(CodecReader reader,
                                                                       java.io.PrintStream infoStream,
                                                                       boolean failFast)
                                                                throws java.io.IOException
        Test field norms.
        Throws:
        java.io.IOException
      • getDocsFromTermRange

        private static long getDocsFromTermRange​(java.lang.String field,
                                                 int maxDoc,
                                                 TermsEnum termsEnum,
                                                 FixedBitSet docsSeen,
                                                 BytesRef minTerm,
                                                 BytesRef maxTerm,
                                                 boolean isIntersect)
                                          throws java.io.IOException
        Visits all terms in the range minTerm (inclusive) to maxTerm (exclusive), marking all doc IDs encountered into docsSeen, and returning the total number of terms visited.
        Throws:
        java.io.IOException
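
The half-open range walk described for getDocsFromTermRange can be sketched with standard collections. This is an analogy, not the Lucene code: a `TreeMap` of term to doc IDs stands in for `TermsEnum`, `java.util.BitSet` stands in for `FixedBitSet`, and the class and method names are hypothetical.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; not part of Lucene. */
final class TermRangeSketch {

    /**
     * Visits every term in [minTerm, maxTerm), sets a bit in docsSeen for each
     * doc ID attached to a visited term, and returns the number of terms visited.
     */
    static long docsFromTermRange(TreeMap<String, int[]> postings,
                                  String minTerm, String maxTerm, BitSet docsSeen) {
        long termCount = 0;
        // subMap(min, true, max, false) yields the inclusive/exclusive range.
        for (Map.Entry<String, int[]> e
                : postings.subMap(minTerm, true, maxTerm, false).entrySet()) {
            for (int docId : e.getValue()) {
                docsSeen.set(docId);
            }
            termCount++;
        }
        return termCount;
    }
}
```

Note that the return value counts terms, not documents; the documents are reported through the bit set.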
      • checkSingleTermRange

        private static boolean checkSingleTermRange​(java.lang.String field,
                                                    int maxDoc,
                                                    Terms terms,
                                                    BytesRef minTerm,
                                                    BytesRef maxTerm,
                                                    FixedBitSet normalDocs,
                                                    FixedBitSet intersectDocs)
                                             throws java.io.IOException
        Tests Terms.intersect on this range and validates that it returns the same doc IDs as a non-intersect TermsEnum. Returns true if any fake terms were seen.
        Throws:
        java.io.IOException
      • checkFields

        private static CheckIndex.Status.TermIndexStatus checkFields​(Fields fields,
                                                                     Bits liveDocs,
                                                                     int maxDoc,
                                                                     FieldInfos fieldInfos,
                                                                     NormsProducer normsProducer,
                                                                     boolean doPrint,
                                                                     boolean isVectors,
                                                                     java.io.PrintStream infoStream,
                                                                     boolean verbose,
                                                                     boolean doSlowChecks)
                                                              throws java.io.IOException
        Checks that the Fields API is consistent with itself.
        Throws:
        java.io.IOException
      • checkImpacts

        static void checkImpacts​(Impacts impacts,
                                 int lastTarget)
      • testPostings

        public static CheckIndex.Status.TermIndexStatus testPostings​(CodecReader reader,
                                                                     java.io.PrintStream infoStream,
                                                                     boolean verbose,
                                                                     boolean doSlowChecks,
                                                                     boolean failFast)
                                                              throws java.io.IOException
        Test the term index.
        Throws:
        java.io.IOException
      • testPoints

        public static CheckIndex.Status.PointsStatus testPoints​(CodecReader reader,
                                                                java.io.PrintStream infoStream,
                                                                boolean failFast)
                                                         throws java.io.IOException
        Test the points index.
        Throws:
        java.io.IOException
      • testStoredFields

        public static CheckIndex.Status.StoredFieldStatus testStoredFields​(CodecReader reader,
                                                                           java.io.PrintStream infoStream,
                                                                           boolean failFast)
                                                                    throws java.io.IOException
        Test stored fields.
        Throws:
        java.io.IOException
      • testDocValues

        public static CheckIndex.Status.DocValuesStatus testDocValues​(CodecReader reader,
                                                                      java.io.PrintStream infoStream,
                                                                      boolean failFast)
                                                               throws java.io.IOException
        Test doc values.
        Throws:
        java.io.IOException
      • checkBinaryDocValues

        private static void checkBinaryDocValues​(java.lang.String fieldName,
                                                 int maxDoc,
                                                 BinaryDocValues bdv,
                                                 BinaryDocValues bdv2)
                                          throws java.io.IOException
        Throws:
        java.io.IOException
      • checkSortedDocValues

        private static void checkSortedDocValues​(java.lang.String fieldName,
                                                 int maxDoc,
                                                 SortedDocValues dv,
                                                 SortedDocValues dv2)
                                          throws java.io.IOException
        Throws:
        java.io.IOException
      • checkSortedSetDocValues

        private static void checkSortedSetDocValues​(java.lang.String fieldName,
                                                    int maxDoc,
                                                    SortedSetDocValues dv,
                                                    SortedSetDocValues dv2)
                                             throws java.io.IOException
        Throws:
        java.io.IOException
      • checkSortedNumericDocValues

        private static void checkSortedNumericDocValues​(java.lang.String fieldName,
                                                        int maxDoc,
                                                        SortedNumericDocValues ndv,
                                                        SortedNumericDocValues ndv2)
                                                 throws java.io.IOException
        Throws:
        java.io.IOException
      • checkNumericDocValues

        private static void checkNumericDocValues​(java.lang.String fieldName,
                                                  NumericDocValues ndv,
                                                  NumericDocValues ndv2)
                                           throws java.io.IOException
        Throws:
        java.io.IOException
      • testTermVectors

        public static CheckIndex.Status.TermVectorStatus testTermVectors​(CodecReader reader,
                                                                         java.io.PrintStream infoStream,
                                                                         boolean verbose,
                                                                         boolean doSlowChecks,
                                                                         boolean failFast)
                                                                  throws java.io.IOException
        Test term vectors.
        Throws:
        java.io.IOException
      • exorciseIndex

        public void exorciseIndex​(CheckIndex.Status result)
                           throws java.io.IOException
        Repairs the index using previously returned result from checkIndex(). Note that this does not remove any of the unreferenced files after it's done; you must separately open an IndexWriter, which deletes unreferenced files when it's created.

        WARNING: this writes a new segments file into the index, effectively removing all documents in broken segments from the index. BE CAREFUL.

        Throws:
        java.io.IOException
      • testAsserts

        private static boolean testAsserts()
      • assertsOn

        public static boolean assertsOn()
        Check whether asserts are enabled or not.
        Returns:
        true iff asserts are enabled
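
A common Java idiom for detecting whether assertions are enabled is to hide a side effect inside an `assert` statement. The sketch below shows that idiom under the assumption that this is what the testAsserts/assertsOn pair does; the page itself does not confirm it, and `AssertProbe` and `mark` are hypothetical names.

```java
/** Hypothetical illustration only; not part of Lucene. */
final class AssertProbe {
    private static boolean enabled;

    // Only invoked when the JVM actually evaluates assert statements.
    private static boolean mark() {
        enabled = true;
        return true;
    }

    /** Returns true if and only if assertions are enabled for this class. */
    static boolean assertsOn() {
        assert mark(); // skipped entirely when assertions are disabled
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("asserts enabled: " + assertsOn());
    }
}
```

Running the class with `java -ea` and without it should print different values, which is why the command line for this tool passes an `-ea` option.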
      • main

        public static void main​(java.lang.String[] args)
                         throws java.io.IOException,
                                java.lang.InterruptedException
        Command-line interface to check and exorcise corrupt segments from an index.

        Run it like this:

            java -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex pathToIndex [-exorcise] [-verbose] [-segment X] [-segment Y]
            
        • -exorcise: actually write a new segments_N file, removing any problematic segments. *LOSES DATA*
        • -segment X: only check the specified segment(s). This can be specified multiple times to check more than one segment, e.g. -segment _2 -segment _a. You can't use this with the -exorcise option.

        WARNING: -exorcise should only be used on an emergency basis as it will cause documents (perhaps many) to be permanently removed from the index. Always make a backup copy of your index before running this! Do not run this tool on an index that is actively being written to. You have been warned!

        Run without -exorcise, this tool will open the index, report version information and report any exceptions it hits and what action it would take if -exorcise were specified. With -exorcise, this tool will remove any segments that have issues and write a new segments_N file. This means all documents contained in the affected segments will be removed.

        This tool exits with exit code 1 if the index cannot be opened or has any corruption, else 0.

        Throws:
        java.io.IOException
        java.lang.InterruptedException
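
When the tool is driven from another program, the documented command line and exit code can be wrapped in a small helper. The sketch below uses only the standard library. `CheckIndexCommand`, `build`, and `runAndCheckClean` are hypothetical names, and the `classpath` argument is an assumption: it must point at wherever the Lucene jars live on your system.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Lucene; assembles the documented command line. */
final class CheckIndexCommand {

    /** Builds the argument vector for running CheckIndex in a child JVM. */
    static List<String> build(String classpath, String pathToIndex,
                              boolean exorcise, boolean verbose, List<String> segments) {
        if (exorcise && !segments.isEmpty()) {
            // The documentation says -segment cannot be combined with -exorcise.
            throw new IllegalArgumentException("-segment cannot be used with -exorcise");
        }
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(classpath);                    // assumption: location of the Lucene jars
        cmd.add("-ea:org.apache.lucene...");   // enable assertions in Lucene packages
        cmd.add("org.apache.lucene.index.CheckIndex");
        cmd.add(pathToIndex);
        if (exorcise) {
            cmd.add("-exorcise");
        }
        if (verbose) {
            cmd.add("-verbose");
        }
        for (String segment : segments) {
            cmd.add("-segment");
            cmd.add(segment);
        }
        return cmd;
    }

    /** Runs the command; exit code 0 means the index opened and no corruption was found. */
    static boolean runAndCheckClean(List<String> cmd) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        return process.waitFor() == 0;
    }
}
```

Keeping `-exorcise` behind an explicit boolean makes it harder to pass the destructive flag by accident; a read-only run first, followed by a backup, matches the warning above.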
      • doMain

        private static int doMain​(java.lang.String[] args)
                           throws java.io.IOException,
                                  java.lang.InterruptedException
        Throws:
        java.io.IOException
        java.lang.InterruptedException
      • parseOptions

        public static CheckIndex.Options parseOptions​(java.lang.String[] args)
        Parse command-line arguments into an Options object.
        Parameters:
        args - The command line arguments
        Returns:
        An Options struct
        Throws:
        java.lang.IllegalArgumentException - if any of the CLI args are invalid
      • doCheck

        public int doCheck​(CheckIndex.Options opts)
                    throws java.io.IOException,
                           java.lang.InterruptedException
        Perform the index check.
        Parameters:
        opts - The options to use for this check
        Returns:
        0 iff the index is clean, 1 otherwise
        Throws:
        java.io.IOException
        java.lang.InterruptedException
      • checkSoftDeletes

        private static void checkSoftDeletes​(java.lang.String softDeletesField,
                                             SegmentCommitInfo info,
                                             SegmentReader reader,
                                             java.io.PrintStream infoStream,
                                             boolean failFast)
                                      throws java.io.IOException
        Throws:
        java.io.IOException
      • nsToSec

        private static double nsToSec​(long ns)