Class SimilarityBase

    • Field Detail

      • LOG_2

        private static final double LOG_2
        For log2(double). Precomputed for efficiency reasons.
      • discountOverlaps

        protected boolean discountOverlaps
        True if overlap tokens (tokens with a position increment of zero) are discounted from the document's length.
      • LENGTH_TABLE

        private static final float[] LENGTH_TABLE
        Cache of decoded bytes.
    • Constructor Detail

      • SimilarityBase

        public SimilarityBase()
        Sole constructor. (For invocation by subclass constructors, typically implicit.)
    • Method Detail

      • setDiscountOverlaps

        public void setDiscountOverlaps(boolean v)
        Determines whether overlap tokens (tokens with a position increment of zero) are ignored when computing the norm. By default this is true, meaning overlap tokens do not count when computing norms.
        See Also:
        computeNorm(org.apache.lucene.index.FieldInvertState)
      • getDiscountOverlaps

        public boolean getDiscountOverlaps()
        Returns true if overlap tokens are discounted from the document's length.
        See Also:
        setDiscountOverlaps(boolean)
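
        The effect of the flag on the document length can be sketched as follows. This is a minimal illustration, not the library's code: the class OverlapLengthSketch, the method effectiveLength and its parameter names are assumptions made for this example, and the step that encodes the length into the norm is left out.

```java
// Hypothetical helper, not part of Lucene: it shows only how the overlap
// discount changes the length that is later encoded into the norm.
final class OverlapLengthSketch {

    /**
     * @param totalTokens      number of tokens indexed for the field (assumed input)
     * @param overlapTokens    number of those tokens with a position increment of zero
     * @param discountOverlaps whether overlap tokens are left out of the length
     * @return the document length used for the norm, before any encoding
     */
    static int effectiveLength(int totalTokens, int overlapTokens, boolean discountOverlaps) {
        return discountOverlaps ? totalTokens - overlapTokens : totalTokens;
    }
}
```

        With 10 tokens of which 3 are overlaps (for example, synonyms injected at the same position), the length is 7 when overlaps are discounted and 10 otherwise.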
      • scorer

        public final Similarity.SimScorer scorer(float boost,
                                                 CollectionStatistics collectionStats,
                                                 TermStatistics... termStats)
        Description copied from class: Similarity
        Compute any collection-level weight (e.g. IDF, average document length, etc) needed for scoring a query.
        Specified by:
        scorer in class Similarity
        Parameters:
        boost - a multiplicative factor to apply to the produced scores
        collectionStats - collection-level statistics, such as the number of tokens in the collection.
        termStats - term-level statistics, such as the document frequency of a term across the collection.
        Returns:
        SimScorer object with the information this Similarity needs to score a query.
      • newStats

        protected BasicStats newStats(java.lang.String field,
                                      double boost)
        Factory method to return a custom stats object.
      • fillBasicStats

        protected void fillBasicStats(BasicStats stats,
                                      CollectionStatistics collectionStats,
                                      TermStatistics termStats)
        Fills all member fields defined in BasicStats in stats. Subclasses can override this method to fill additional stats.
      • score

        protected abstract double score(BasicStats stats,
                                        double freq,
                                        double docLen)
        Scores a document.

        Subclasses must apply their scoring formula in this method.

        Parameters:
        stats - the corpus-level statistics.
        freq - the term frequency.
        docLen - the document length.
        Returns:
        the score.
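
        The shape of a scoring formula with this signature can be sketched without the library. Everything below is hypothetical: StatsSketch is a stand-in for the corpus-level statistics (it is not BasicStats), and the formula, a length-normalized term frequency multiplied by a base-2 inverse document frequency, is chosen only for illustration and is not the formula of any particular Similarity.

```java
// Hypothetical stand-in for corpus-level statistics; NOT Lucene's BasicStats.
final class StatsSketch {
    final long numberOfDocuments;   // documents in the collection
    final long docFreq;             // documents containing the term
    final double avgFieldLength;    // average document length

    StatsSketch(long numberOfDocuments, long docFreq, double avgFieldLength) {
        this.numberOfDocuments = numberOfDocuments;
        this.docFreq = docFreq;
        this.avgFieldLength = avgFieldLength;
    }
}

// Illustrative formula only, mirroring the score(stats, freq, docLen) shape.
final class ScoreSketch {

    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    static double score(StatsSketch stats, double freq, double docLen) {
        // Term frequency scaled so that longer-than-average documents score lower.
        double normalizedTf = freq * stats.avgFieldLength / docLen;
        // Rarer terms (smaller docFreq) receive a larger weight.
        double idf = log2(1.0 + (double) stats.numberOfDocuments / stats.docFreq);
        return normalizedTf * idf;
    }
}
```

        A real subclass would read the same kind of values from the stats object it is given and return the result of its own formula.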
      • explain

        protected void explain(java.util.List<Explanation> subExpls,
                               BasicStats stats,
                               double freq,
                               double docLen)
        Subclasses should implement this method to explain the score. The enclosing explanation already contains the score, the name of the class and the doc id, as well as the term frequency and its explanation; subclasses can add further clauses to subExpls to explain details of their scoring formulae.

        The default implementation does nothing.

        Parameters:
        subExpls - the list of details of the explanation to extend
        stats - the corpus-level statistics.
        freq - the term frequency.
        docLen - the document length.
      • explain

        protected Explanation explain(BasicStats stats,
                                      Explanation freq,
                                      double docLen)
        Explains the score. The implementation here provides a basic explanation in the format "score(name-of-similarity, doc=doc-id, freq=term-frequency), computed from:" and attaches the score (computed via the score(BasicStats, double, double) method) and the explanation for the term frequency. Subclasses content with this format may add additional details in explain(List, BasicStats, double, double).
        Parameters:
        stats - the corpus-level statistics.
        freq - the term frequency and its explanation.
        docLen - the document length.
        Returns:
        the explanation.
      • toString

        public abstract java.lang.String toString()
        Subclasses must override this method to return the name of the Similarity and preferably the values of parameters (if any) as well.
        Overrides:
        toString in class java.lang.Object
      • computeNorm

        public final long computeNorm(FieldInvertState state)
        Encodes the document length in the same way as BM25Similarity.
        Specified by:
        computeNorm in class Similarity
        Parameters:
        state - current processing state for this field
        Returns:
        computed norm value
      • log2

        public static double log2(double x)
        Returns the base-2 logarithm of x.
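
        A base-2 logarithm with a precomputed divisor, as the LOG_2 field description suggests, can be sketched like this. The class name Log2Sketch is an assumption made for the example; the change-of-base identity log2(x) = ln(x) / ln(2) is standard.

```java
// Minimal sketch: base-2 logarithm via the change-of-base identity.
final class Log2Sketch {

    // ln(2), computed once instead of on every call.
    private static final double LOG_2 = Math.log(2);

    static double log2(double x) {
        return Math.log(x) / LOG_2;
    }
}
```

        For example, Log2Sketch.log2(8.0) is approximately 3, up to floating-point rounding.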