Class MurmurHash3
- java.lang.Object
-
- org.apache.commons.codec.digest.MurmurHash3
-
public final class MurmurHash3 extends java.lang.Object
Implementation of the MurmurHash3 32-bit and 128-bit hash functions.
MurmurHash is a non-cryptographic hash function suitable for general hash-based lookup. The name comes from two basic operations, multiply (MU) and rotate (R), used in its inner loop. Unlike cryptographic hash functions, it is not specifically designed to be difficult to reverse by an adversary, making it unsuitable for cryptographic purposes.
This contains a Java port of the 32-bit hash function MurmurHash3_x86_32 and the 128-bit hash function MurmurHash3_x64_128 from Austin Appleby's original C++ code in SMHasher.
This is public domain code with no copyrights. From the home page of SMHasher:
"All MurmurHash versions are public domain software, and the author disclaims all copyright to their code."
Original adaptation from Apache Hive. That adaptation contains hash64 methods that are not part of the original MurmurHash3 code. Using these methods is not recommended; they will be removed in a future release. To obtain a 64-bit hash, use half of the bits from the hash128x64 methods, applied to the input data converted to bytes.
- Since:
- 1.13
- See Also:
- MurmurHash, Original MurmurHash3 C++ code, Apache Hive Murmur3
-
-
Nested Class Summary
Nested Classes (modifier and type, class, description):
static class MurmurHash3.IncrementalHash32: Deprecated. Use IncrementalHash32x86.
static class MurmurHash3.IncrementalHash32x86: Generates 32-bit hash from input bytes.
-
Field Summary
Fields (modifier and type, field, description):
private static long C1
private static int C1_32
private static long C2
private static int C2_32
static int DEFAULT_SEED: A default seed to use for the murmur hash algorithm.
(package private) static int INTEGER_BYTES: TODO Replace on Java 8 with Integer.BYTES.
(package private) static int LONG_BYTES: TODO Replace on Java 8 with Long.BYTES.
private static int M
private static int M_32
private static int N_32
private static int N1
private static int N2
static long NULL_HASHCODE: Deprecated. This is not used internally and will be removed in a future release.
private static int R1
private static int R1_32
private static int R2
private static int R2_32
private static int R3
(package private) static int SHORT_BYTES: TODO Replace on Java 8 with Short.BYTES.
-
Constructor Summary
Constructors (modifier, constructor, description):
private MurmurHash3(): No instance methods.
-
Method Summary
Methods (modifier and type, method, description):
private static int fmix32(int hash): Performs the final avalanche mix step of the 32-bit hash function MurmurHash3_x86_32.
private static long fmix64(long hash): Performs the final avalanche mix step of the 64-bit hash function MurmurHash3_x64_128.
private static int getLittleEndianInt(byte[] data, int index): Gets the little-endian int from 4 bytes starting at the specified index.
private static long getLittleEndianLong(byte[] data, int index): Gets the little-endian long from 8 bytes starting at the specified index.
static long[] hash128(byte[] data): Generates 128-bit hash from the byte array with a default seed.
static long[] hash128(byte[] data, int offset, int length, int seed): Deprecated.
static long[] hash128(java.lang.String data): Deprecated. Use hash128x64(byte[]) using the bytes returned from String.getBytes(java.nio.charset.Charset).
static long[] hash128x64(byte[] data): Generates 128-bit hash from the byte array with a seed of zero.
static long[] hash128x64(byte[] data, int offset, int length, int seed): Generates 128-bit hash from the byte array with the given offset, length and seed.
private static long[] hash128x64Internal(byte[] data, int offset, int length, long seed): Generates 128-bit hash from the byte array with the given offset, length and seed.
static int hash32(byte[] data): Deprecated.
static int hash32(byte[] data, int length): Deprecated.
static int hash32(byte[] data, int length, int seed): Deprecated.
static int hash32(byte[] data, int offset, int length, int seed): Deprecated.
static int hash32(long data): Generates 32-bit hash from a long with a default seed value.
static int hash32(long data, int seed): Generates 32-bit hash from a long with the given seed.
static int hash32(long data1, long data2): Generates 32-bit hash from two longs with a default seed value.
static int hash32(long data1, long data2, int seed): Generates 32-bit hash from two longs with the given seed.
static int hash32(java.lang.String data): Deprecated. Use hash32x86(byte[], int, int, int) with the bytes returned from String.getBytes(java.nio.charset.Charset).
static int hash32x86(byte[] data): Generates 32-bit hash from the byte array with a seed of zero.
static int hash32x86(byte[] data, int offset, int length, int seed): Generates 32-bit hash from the byte array with the given offset, length and seed.
static long hash64(byte[] data): Deprecated. Not part of the MurmurHash3 implementation.
static long hash64(byte[] data, int offset, int length): Deprecated. Not part of the MurmurHash3 implementation.
static long hash64(byte[] data, int offset, int length, int seed): Deprecated. Not part of the MurmurHash3 implementation.
static long hash64(int data): Deprecated. Not part of the MurmurHash3 implementation.
static long hash64(long data): Deprecated. Not part of the MurmurHash3 implementation.
static long hash64(short data): Deprecated. Not part of the MurmurHash3 implementation.
private static int mix32(int k, int hash): Performs the intermediate mix step of the 32-bit hash function MurmurHash3_x86_32.
-
-
-
Field Detail
-
NULL_HASHCODE
@Deprecated public static final long NULL_HASHCODE
Deprecated. This is not used internally and will be removed in a future release.
A random number to use for a hash code.
- See Also:
- Constant Field Values
-
DEFAULT_SEED
public static final int DEFAULT_SEED
A default seed to use for the murmur hash algorithm. Has the value 104729.
- See Also:
- Constant Field Values
-
LONG_BYTES
static final int LONG_BYTES
TODO Replace on Java 8 with Long.BYTES.
- See Also:
- Constant Field Values
-
INTEGER_BYTES
static final int INTEGER_BYTES
TODO Replace on Java 8 with Integer.BYTES.
- See Also:
- Constant Field Values
-
SHORT_BYTES
static final int SHORT_BYTES
TODO Replace on Java 8 with Short.BYTES.
- See Also:
- Constant Field Values
-
C1_32
private static final int C1_32
- See Also:
- Constant Field Values
-
C2_32
private static final int C2_32
- See Also:
- Constant Field Values
-
R1_32
private static final int R1_32
- See Also:
- Constant Field Values
-
R2_32
private static final int R2_32
- See Also:
- Constant Field Values
-
M_32
private static final int M_32
- See Also:
- Constant Field Values
-
N_32
private static final int N_32
- See Also:
- Constant Field Values
-
C1
private static final long C1
- See Also:
- Constant Field Values
-
C2
private static final long C2
- See Also:
- Constant Field Values
-
R1
private static final int R1
- See Also:
- Constant Field Values
-
R2
private static final int R2
- See Also:
- Constant Field Values
-
R3
private static final int R3
- See Also:
- Constant Field Values
-
M
private static final int M
- See Also:
- Constant Field Values
-
N1
private static final int N1
- See Also:
- Constant Field Values
-
N2
private static final int N2
- See Also:
- Constant Field Values
-
-
Method Detail
-
hash32
public static int hash32(long data1, long data2)
Generates 32-bit hash from two longs with a default seed value. This is a helper method that will produce the same result as:
    int offset = 0;
    int seed = 104729;
    int hash = MurmurHash3.hash32x86(ByteBuffer.allocate(16)
                                               .putLong(data1)
                                               .putLong(data2)
                                               .array(), offset, 16, seed);
- Parameters:
data1 - The first long to hash
data2 - The second long to hash
- Returns:
- The 32-bit hash
- See Also:
hash32x86(byte[], int, int, int)
-
hash32
public static int hash32(long data1, long data2, int seed)
Generates 32-bit hash from two longs with the given seed. This is a helper method that will produce the same result as:
    int offset = 0;
    int hash = MurmurHash3.hash32x86(ByteBuffer.allocate(16)
                                               .putLong(data1)
                                               .putLong(data2)
                                               .array(), offset, 16, seed);
- Parameters:
data1 - The first long to hash
data2 - The second long to hash
seed - The initial seed value
- Returns:
- The 32-bit hash
- See Also:
hash32x86(byte[], int, int, int)
-
hash32
public static int hash32(long data)
Generates 32-bit hash from a long with a default seed value. This is a helper method that will produce the same result as:
    int offset = 0;
    int seed = 104729;
    int hash = MurmurHash3.hash32x86(ByteBuffer.allocate(8)
                                               .putLong(data)
                                               .array(), offset, 8, seed);
- Parameters:
data - The long to hash
- Returns:
- The 32-bit hash
- See Also:
hash32x86(byte[], int, int, int)
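The equivalence above depends on byte order: ByteBuffer writes the long in big-endian order by default, while MurmurHash3_x86_32 reads its input as little-endian 4-byte blocks. The following is a minimal sketch, using only the JDK, of which two blocks the hash would therefore see, and of how the same blocks could be obtained without allocating a byte array. The class and method names (LongBlocksSketch, blocksFromBuffer, blocksFromReverse) are hypothetical and not part of MurmurHash3; whether the library computes the blocks this way is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class LongBlocksSketch {

    /** Reads two little-endian 4-byte blocks from the big-endian bytes of the long. */
    public static int[] blocksFromBuffer(long data) {
        // ByteBuffer is big-endian by default, so the most significant byte comes first.
        byte[] bytes = ByteBuffer.allocate(8).putLong(data).array();
        ByteBuffer le = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        return new int[] {le.getInt(0), le.getInt(4)};
    }

    /** Computes the same two blocks arithmetically (hypothetical shortcut). */
    public static int[] blocksFromReverse(long data) {
        long reversed = Long.reverseBytes(data);
        return new int[] {(int) reversed, (int) (reversed >>> 32)};
    }

    public static void main(String[] args) {
        long data = 0x0102030405060708L;
        int[] a = blocksFromBuffer(data);
        int[] b = blocksFromReverse(data);
        System.out.println(Integer.toHexString(a[0]) + " " + Integer.toHexString(a[1]));
        System.out.println(a[0] == b[0] && a[1] == b[1]);
    }
}
```

A caller that already holds a long can rely on the documented equivalence and does not need to perform this conversion.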
-
hash32
public static int hash32(long data, int seed)
Generates 32-bit hash from a long with the given seed. This is a helper method that will produce the same result as:
    int offset = 0;
    int hash = MurmurHash3.hash32x86(ByteBuffer.allocate(8)
                                               .putLong(data)
                                               .array(), offset, 8, seed);
- Parameters:
data - The long to hash
seed - The initial seed value
- Returns:
- The 32-bit hash
- See Also:
hash32x86(byte[], int, int, int)
-
hash32
@Deprecated public static int hash32(byte[] data)
Deprecated. Use hash32x86(byte[], int, int, int). This corrects the processing of trailing bytes.
Generates 32-bit hash from the byte array with a default seed. This is a helper method that will produce the same result as:
    int offset = 0;
    int seed = 104729;
    int hash = MurmurHash3.hash32(data, offset, data.length, seed);
This implementation contains a sign-extension bug in the finalization step of any bytes left over from dividing the length by 4. This manifests if any of these bytes are negative.
- Parameters:
data - The input byte array
- Returns:
- The 32-bit hash
- See Also:
hash32(byte[], int, int, int)
-
hash32
@Deprecated public static int hash32(java.lang.String data)
Deprecated. Use hash32x86(byte[], int, int, int) with the bytes returned from String.getBytes(java.nio.charset.Charset). This corrects the processing of trailing bytes.
Generates 32-bit hash from a string with a default seed.
Before 1.14 the string was converted using default encoding. Since 1.14 the string is converted to bytes using UTF-8 encoding.
This is a helper method that will produce the same result as:
    int offset = 0;
    int seed = 104729;
    byte[] bytes = data.getBytes(StandardCharsets.UTF_8);
    int hash = MurmurHash3.hash32(bytes, offset, bytes.length, seed);
This implementation contains a sign-extension bug in the finalization step of any bytes left over from dividing the length by 4. This manifests if any of these bytes are negative.
- Parameters:
data - The input string
- Returns:
- The 32-bit hash
- See Also:
hash32(byte[], int, int, int)
-
hash32
@Deprecated public static int hash32(byte[] data, int length)
Deprecated. Use hash32x86(byte[], int, int, int). This corrects the processing of trailing bytes.
Generates 32-bit hash from the byte array with the given length and a default seed. This is a helper method that will produce the same result as:
    int offset = 0;
    int seed = 104729;
    int hash = MurmurHash3.hash32(data, offset, length, seed);
This implementation contains a sign-extension bug in the finalization step of any bytes left over from dividing the length by 4. This manifests if any of these bytes are negative.
- Parameters:
data - The input byte array
length - The length of array
- Returns:
- The 32-bit hash
- See Also:
hash32(byte[], int, int, int)
-
hash32
@Deprecated public static int hash32(byte[] data, int length, int seed)
Deprecated. Use hash32x86(byte[], int, int, int). This corrects the processing of trailing bytes.
Generates 32-bit hash from the byte array with the given length and seed. This is a helper method that will produce the same result as:
    int offset = 0;
    int hash = MurmurHash3.hash32(data, offset, length, seed);
This implementation contains a sign-extension bug in the finalization step of any bytes left over from dividing the length by 4. This manifests if any of these bytes are negative.
- Parameters:
data - The input byte array
length - The length of array
seed - The initial seed value
- Returns:
- The 32-bit hash
- See Also:
hash32(byte[], int, int, int)
-
hash32
@Deprecated public static int hash32(byte[] data, int offset, int length, int seed)
Deprecated. Use hash32x86(byte[], int, int, int). This corrects the processing of trailing bytes.
Generates 32-bit hash from the byte array with the given offset, length and seed.
This is an implementation of the 32-bit hash function MurmurHash3_x86_32 from Austin Appleby's original MurmurHash3 C++ code in SMHasher.
This implementation contains a sign-extension bug in the finalization step of any bytes left over from dividing the length by 4. This manifests if any of these bytes are negative.
- Parameters:
data - The input byte array
offset - The offset of data
length - The length of array
seed - The initial seed value
- Returns:
- The 32-bit hash
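To illustrate the sign-extension bug described above, here is a minimal sketch of how 1 to 3 trailing bytes can be combined into a block with and without masking. The source of the deprecated method is not reproduced on this page, so the unmasked form is an assumption about how such a bug arises in Java, where byte is signed. The class and method names (TailSignExtensionSketch, tailWithoutMask, tailWithMask) are hypothetical.

```java
final class TailSignExtensionSketch {

    /** Combines the trailing bytes without masking: a negative byte sign-extends. */
    public static int tailWithoutMask(byte[] data, int index, int remaining) {
        int k1 = 0;
        switch (remaining) {
        case 3:
            k1 ^= data[index + 2] << 16;
            // fall through
        case 2:
            k1 ^= data[index + 1] << 8;
            // fall through
        case 1:
            k1 ^= data[index];
            break;
        default:
            break;
        }
        return k1;
    }

    /** Combines the trailing bytes after masking each one to its unsigned value. */
    public static int tailWithMask(byte[] data, int index, int remaining) {
        int k1 = 0;
        switch (remaining) {
        case 3:
            k1 ^= (data[index + 2] & 0xff) << 16;
            // fall through
        case 2:
            k1 ^= (data[index + 1] & 0xff) << 8;
            // fall through
        case 1:
            k1 ^= data[index] & 0xff;
            break;
        default:
            break;
        }
        return k1;
    }

    public static void main(String[] args) {
        byte[] positive = {0x01, 0x02, 0x03};
        byte[] negative = {(byte) 0x80};
        // Identical for non-negative bytes, different once a byte is negative.
        System.out.println(tailWithoutMask(positive, 0, 3) == tailWithMask(positive, 0, 3));
        System.out.println(tailWithoutMask(negative, 0, 1) == tailWithMask(negative, 0, 1));
    }
}
```

In the unmasked form a negative byte sets all higher bits of the block, so the two forms diverge only for inputs whose trailing bytes are negative.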
-
hash32x86
public static int hash32x86(byte[] data)
Generates 32-bit hash from the byte array with a seed of zero. This is a helper method that will produce the same result as:
    int offset = 0;
    int seed = 0;
    int hash = MurmurHash3.hash32x86(data, offset, data.length, seed);
- Parameters:
data - The input byte array
- Returns:
- The 32-bit hash
- Since:
- 1.14
- See Also:
hash32x86(byte[], int, int, int)
-
hash32x86
public static int hash32x86(byte[] data, int offset, int length, int seed)
Generates 32-bit hash from the byte array with the given offset, length and seed.
This is an implementation of the 32-bit hash function MurmurHash3_x86_32 from Austin Appleby's original MurmurHash3 C++ code in SMHasher.
- Parameters:
data - The input byte array
offset - The offset of data
length - The length of array
seed - The initial seed value
- Returns:
- The 32-bit hash
- Since:
- 1.14
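For orientation, the following is a self-contained sketch of the MurmurHash3_x86_32 steps (block mixing, tail handling, finalization) written from the public-domain reference algorithm. It is not the source of this class: the class name Murmur3x86Sketch and the helper scramble are hypothetical, and the sketch performs no bounds checking.

```java
final class Murmur3x86Sketch {

    private static final int C1 = 0xcc9e2d51;
    private static final int C2 = 0x1b873593;

    /** Hashes data[offset, offset + length) following the MurmurHash3_x86_32 steps. */
    public static int hash32x86(byte[] data, int offset, int length, int seed) {
        int hash = seed;
        int nblocks = length >> 2;

        // Body: mix each complete little-endian 4-byte block into the hash.
        for (int i = 0; i < nblocks; i++) {
            int index = offset + (i << 2);
            int k = (data[index] & 0xff)
                    | (data[index + 1] & 0xff) << 8
                    | (data[index + 2] & 0xff) << 16
                    | (data[index + 3] & 0xff) << 24;
            hash ^= scramble(k);
            hash = Integer.rotateLeft(hash, 13) * 5 + 0xe6546b64;
        }

        // Tail: fold in the 0 to 3 left-over bytes, masked to avoid sign extension.
        int index = offset + (nblocks << 2);
        int k1 = 0;
        for (int i = (length & 3) - 1; i >= 0; i--) {
            k1 = (k1 << 8) | (data[index + i] & 0xff);
        }
        hash ^= scramble(k1); // scramble(0) == 0, so an empty tail changes nothing

        // Finalization: mix in the length, then apply the avalanche step.
        hash ^= length;
        hash ^= hash >>> 16;
        hash *= 0x85ebca6b;
        hash ^= hash >>> 13;
        hash *= 0xc2b2ae35;
        hash ^= hash >>> 16;
        return hash;
    }

    /** Multiply, rotate, multiply: the per-block scrambling of the input. */
    private static int scramble(int k) {
        k *= C1;
        k = Integer.rotateLeft(k, 15);
        k *= C2;
        return k;
    }

    public static void main(String[] args) {
        byte[] empty = new byte[0];
        System.out.println(Integer.toHexString(hash32x86(empty, 0, 0, 0)));
        System.out.println(Integer.toHexString(hash32x86(empty, 0, 0, 1)));
    }
}
```

With an empty input the result is just the finalization of the seed, which makes the zero-length case a convenient sanity check when comparing implementations.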
-
hash64
@Deprecated public static long hash64(long data)
Deprecated. Not part of the MurmurHash3 implementation. Use half of the hash bytes from hash128x64(byte[]) with the bytes from the long.
Generates 64-bit hash from a long with a default seed.
This is not part of the original MurmurHash3 C++ implementation.
This is a Murmur3-like 64-bit variant. The method does not produce the same result as either half of the hash bytes from hash128x64(byte[]) with the same byte data from the long. This method will be removed in a future release.
Note: The sign-extension bug in hash64(byte[], int, int, int) does not affect this result as the default seed is positive.
This is a helper method that will produce the same result as:
    int offset = 0;
    int seed = 104729;
    long hash = MurmurHash3.hash64(ByteBuffer.allocate(8)
                                             .putLong(data)
                                             .array(), offset, 8, seed);
- Parameters:
data - The long to hash
- Returns:
- The 64-bit hash
- See Also:
hash64(byte[], int, int, int)
-
hash64
@Deprecated public static long hash64(int data)
Deprecated. Not part of the MurmurHash3 implementation. Use half of the hash bytes from hash128x64(byte[]) with the bytes from the int.
Generates 64-bit hash from an int with a default seed.
This is not part of the original MurmurHash3 C++ implementation.
This is a Murmur3-like 64-bit variant. The method does not produce the same result as either half of the hash bytes from hash128x64(byte[]) with the same byte data from the int. This method will be removed in a future release.
Note: The sign-extension bug in hash64(byte[], int, int, int) does not affect this result as the default seed is positive.
This is a helper method that will produce the same result as:
    int offset = 0;
    int seed = 104729;
    long hash = MurmurHash3.hash64(ByteBuffer.allocate(4)
                                             .putInt(data)
                                             .array(), offset, 4, seed);
- Parameters:
data - The int to hash
- Returns:
- The 64-bit hash
- See Also:
hash64(byte[], int, int, int)
-
hash64
@Deprecated public static long hash64(short data)
Deprecated. Not part of the MurmurHash3 implementation. Use half of the hash bytes from hash128x64(byte[]) with the bytes from the short.
Generates 64-bit hash from a short with a default seed.
This is not part of the original MurmurHash3 C++ implementation.
This is a Murmur3-like 64-bit variant. The method does not produce the same result as either half of the hash bytes from hash128x64(byte[]) with the same byte data from the short. This method will be removed in a future release.
Note: The sign-extension bug in hash64(byte[], int, int, int) does not affect this result as the default seed is positive.
This is a helper method that will produce the same result as:
    int offset = 0;
    int seed = 104729;
    long hash = MurmurHash3.hash64(ByteBuffer.allocate(2)
                                             .putShort(data)
                                             .array(), offset, 2, seed);
- Parameters:
data - The short to hash
- Returns:
- The 64-bit hash
- See Also:
hash64(byte[], int, int, int)
-
hash64
@Deprecated public static long hash64(byte[] data)
Deprecated. Not part of the MurmurHash3 implementation. Use half of the hash bytes from hash128x64(byte[]).
Generates 64-bit hash from a byte array with a default seed.
This is not part of the original MurmurHash3 C++ implementation.
This is a Murmur3-like 64-bit variant. The method does not produce the same result as either half of the hash bytes from hash128x64(byte[]) with the same byte data. This method will be removed in a future release.
Note: The sign-extension bug in hash64(byte[], int, int, int) does not affect this result as the default seed is positive.
This is a helper method that will produce the same result as:
    int offset = 0;
    int seed = 104729;
    long hash = MurmurHash3.hash64(data, offset, data.length, seed);
- Parameters:
data - The input byte array
- Returns:
- The 64-bit hash
- See Also:
hash64(byte[], int, int, int)
-
hash64
@Deprecated public static long hash64(byte[] data, int offset, int length)
Deprecated. Not part of the MurmurHash3 implementation. Use half of the hash bytes from hash128x64(byte[], int, int, int).
Generates 64-bit hash from a byte array with the given offset and length and a default seed.
This is not part of the original MurmurHash3 C++ implementation.
This is a Murmur3-like 64-bit variant. The method does not produce the same result as either half of the hash bytes from hash128x64(byte[]) with the same byte data. This method will be removed in a future release.
Note: The sign-extension bug in hash64(byte[], int, int, int) does not affect this result as the default seed is positive.
This is a helper method that will produce the same result as:
    int seed = 104729;
    long hash = MurmurHash3.hash64(data, offset, length, seed);
- Parameters:
data - The input byte array
offset - The offset of data
length - The length of array
- Returns:
- The 64-bit hash
- See Also:
hash64(byte[], int, int, int)
-
hash64
@Deprecated public static long hash64(byte[] data, int offset, int length, int seed)
Deprecated. Not part of the MurmurHash3 implementation. Use half of the hash bytes from hash128x64(byte[], int, int, int).
Generates 64-bit hash from a byte array with the given offset, length and seed.
This is not part of the original MurmurHash3 C++ implementation.
This is a Murmur3-like 64-bit variant. This method will be removed in a future release.
This implementation contains a sign-extension bug in the seed initialization. This manifests if the seed is negative.
This algorithm processes 8-byte chunks of data in a manner similar to the 16-byte chunks of data processed in the MurmurHash3_x64_128 method. However, the hash is not mixed with a hash chunk from the next 8 bytes of data. The method will not return the same value as the first or second 64 bits of the function hash128(byte[], int, int, int).
Use of this method is not advised. Use the first long returned from hash128x64(byte[], int, int, int).
- Parameters:
data - The input byte array
offset - The offset of data
length - The length of array
seed - The initial seed value
- Returns:
- The 64-bit hash
-
hash128
public static long[] hash128(byte[] data)
Generates 128-bit hash from the byte array with a default seed. This is a helper method that will produce the same result as:
    int offset = 0;
    int seed = 104729;
    long[] hash = MurmurHash3.hash128(data, offset, data.length, seed);
Note: The sign-extension bug in hash128(byte[], int, int, int) does not affect this result as the default seed is positive.
- Parameters:
data - The input byte array
- Returns:
- The 128-bit hash (2 longs)
- See Also:
hash128(byte[], int, int, int)
-
hash128x64
public static long[] hash128x64(byte[] data)
Generates 128-bit hash from the byte array with a seed of zero. This is a helper method that will produce the same result as:
    int offset = 0;
    int seed = 0;
    long[] hash = MurmurHash3.hash128x64(data, offset, data.length, seed);
- Parameters:
data - The input byte array
- Returns:
- The 128-bit hash (2 longs)
- Since:
- 1.14
- See Also:
hash128x64(byte[], int, int, int)
-
hash128
@Deprecated public static long[] hash128(java.lang.String data)
Deprecated. Use hash128x64(byte[]) using the bytes returned from String.getBytes(java.nio.charset.Charset).
Generates 128-bit hash from a string with a default seed.
Before 1.14 the string was converted using default encoding. Since 1.14 the string is converted to bytes using UTF-8 encoding.
This is a helper method that will produce the same result as:
    int offset = 0;
    int seed = 104729;
    byte[] bytes = data.getBytes(StandardCharsets.UTF_8);
    long[] hash = MurmurHash3.hash128(bytes, offset, bytes.length, seed);
Note: The sign-extension bug in hash128(byte[], int, int, int) does not affect this result as the default seed is positive.
- Parameters:
data - The input String
- Returns:
- The 128-bit hash (2 longs)
- See Also:
hash128(byte[], int, int, int)
-
hash128
@Deprecated public static long[] hash128(byte[] data, int offset, int length, int seed)
Deprecated. Use hash128x64(byte[], int, int, int). This corrects the seed initialization.
Generates 128-bit hash from the byte array with the given offset, length and seed.
This is an implementation of the 128-bit hash function MurmurHash3_x64_128 from Austin Appleby's original MurmurHash3 C++ code in SMHasher.
This implementation contains a sign-extension bug in the seed initialization. This manifests if the seed is negative.
- Parameters:
data - The input byte array
offset - The first element of array
length - The length of array
seed - The initial seed value
- Returns:
- The 128-bit hash (2 longs)
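The seed bug described above comes down to how a 32-bit int seed is widened to the 64-bit state. The reference C++ function takes an unsigned 32-bit seed, whereas a plain int-to-long conversion in Java sign-extends. A minimal sketch of the two conversions follows; the class and method names (SeedWideningSketch, widenSigned, widenUnsigned) are hypothetical, and it is an assumption that the deprecated and corrected methods differ in exactly this step.

```java
final class SeedWideningSketch {

    /** Plain widening: a negative seed sets the upper 32 bits of the state. */
    public static long widenSigned(int seed) {
        return seed;
    }

    /** Unsigned widening: the upper 32 bits stay clear, as with a uint32_t seed. */
    public static long widenUnsigned(int seed) {
        return seed & 0xffffffffL;
    }

    public static void main(String[] args) {
        int positive = 104729; // the default seed, unaffected by the conversion
        int negative = -1;
        System.out.println(widenSigned(positive) == widenUnsigned(positive));
        System.out.println(Long.toHexString(widenSigned(negative)));
        System.out.println(Long.toHexString(widenUnsigned(negative)));
    }
}
```

This is also why the helper methods that use the positive default seed are documented as unaffected.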
-
hash128x64
public static long[] hash128x64(byte[] data, int offset, int length, int seed)
Generates 128-bit hash from the byte array with the given offset, length and seed.
This is an implementation of the 128-bit hash function MurmurHash3_x64_128 from Austin Appleby's original MurmurHash3 C++ code in SMHasher.
- Parameters:
data - The input byte array
offset - The first element of array
length - The length of array
seed - The initial seed value
- Returns:
- The 128-bit hash (2 longs)
- Since:
- 1.14
-
hash128x64Internal
private static long[] hash128x64Internal(byte[] data, int offset, int length, long seed)
Generates 128-bit hash from the byte array with the given offset, length and seed.
This is an implementation of the 128-bit hash function MurmurHash3_x64_128 from Austin Appleby's original MurmurHash3 C++ code in SMHasher.
- Parameters:
data - The input byte array
offset - The first element of array
length - The length of array
seed - The initial seed value
- Returns:
- The 128-bit hash (2 longs)
-
getLittleEndianLong
private static long getLittleEndianLong(byte[] data, int index)
Gets the little-endian long from 8 bytes starting at the specified index.
- Parameters:
data - The data
index - The index
- Returns:
- The little-endian long
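As an illustration of what a little-endian read means here, the sketch below assembles a long from 8 bytes with the least significant byte first and compares it with the JDK's ByteBuffer in LITTLE_ENDIAN order. The private method's actual body is not shown on this page; the class name LittleEndianSketch and the loop-based approach are assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class LittleEndianSketch {

    /** Assembles a long from 8 bytes, treating data[index] as the least significant byte. */
    public static long getLittleEndianLong(byte[] data, int index) {
        long value = 0;
        for (int i = 7; i >= 0; i--) {
            // Mask with a long constant so negative bytes do not sign-extend.
            value = (value << 8) | (data[index + i] & 0xffL);
        }
        return value;
    }

    /** Reference result using the JDK for comparison. */
    public static long viaByteBuffer(byte[] data, int index) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getLong(index);
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3, 4, 5, 6, 7, 8};
        System.out.println(Long.toHexString(getLittleEndianLong(data, 0)));
        System.out.println(getLittleEndianLong(data, 0) == viaByteBuffer(data, 0));
    }
}
```

The 4-byte variant follows the same pattern with an int accumulator and a 0xff mask.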
-
getLittleEndianInt
private static int getLittleEndianInt(byte[] data, int index)
Gets the little-endian int from 4 bytes starting at the specified index.
- Parameters:
data - The data
index - The index
- Returns:
- The little-endian int
-
mix32
private static int mix32(int k, int hash)
Performs the intermediate mix step of the 32-bit hash function MurmurHash3_x86_32.
- Parameters:
k - The data to add to the hash
hash - The current hash
- Returns:
- The new hash
-
fmix32
private static int fmix32(int hash)
Performs the final avalanche mix step of the 32-bit hash function MurmurHash3_x86_32.
- Parameters:
hash - The current hash
- Returns:
- The final hash
-
fmix64
private static long fmix64(long hash)
Performs the final avalanche mix step of the 64-bit hash function MurmurHash3_x64_128.
- Parameters:
hash - The current hash
- Returns:
- The final hash
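A sketch of the 64-bit avalanche step, written from the public-domain reference algorithm (alternating xor-shifts by 33 bits and multiplications by two odd constants), may help show what "final mix" means. The class name Fmix64Sketch and the helper changedBits are hypothetical; the private method in this class is assumed, not confirmed, to follow the same steps.

```java
final class Fmix64Sketch {

    /** Final avalanche mix: xor-shift and multiply so every input bit affects the output. */
    public static long fmix64(long hash) {
        hash ^= hash >>> 33;
        hash *= 0xff51afd7ed558ccdL;
        hash ^= hash >>> 33;
        hash *= 0xc4ceb9fe1a85ec53L;
        hash ^= hash >>> 33;
        return hash;
    }

    /** Counts the output bits that differ between the mixed values of two inputs. */
    public static int changedBits(long a, long b) {
        return Long.bitCount(fmix64(a) ^ fmix64(b));
    }

    public static void main(String[] args) {
        // Inputs differing in a single bit should differ in many output bits.
        System.out.println(changedBits(1L, 3L));
        System.out.println(Long.toHexString(fmix64(1L)));
    }
}
```

Each step is invertible (xor-shift and multiplication by an odd constant), so distinct inputs always produce distinct outputs, and zero maps to zero.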
-
-