Java: How to use complex objects as keys in Ehcache (and Maps generally)
I've been using the open source Ehcache for some projects I'm working on.
Ehcache uses Java "Map" classes to store the cache in memory. From the memory store, they can be overflowed to disk (if all the allocated memory is used) and/or persisted to disk (so that stopping and starting an application preserves cached objects between runs).
I have some issues with the bumpy spots, omissions and redundancies in the new documentation, but overall this is great stuff. Finding ehcache saved me from writing my own, less-capable cache, and I am grateful to have it.
There is, however, a problem storing and finding cached data with ehcache when using some kinds of objects as keys.
Quick refresher
Maps (and any cache, for that matter) have two parts
(1) a key that's used for finding objects;
(2) the stored objects
So, the basic operations on a Map (or cache) are:
void put (Object key, Object item);
Object get (Object key);
The problem
Specifically, if a cache key is not a String but some other kind of object (say, a custom Class with some fields in it), cache lookups may fail some of the time, even when the stuff you're looking for is right there in the cache.
EXAMPLE 1. The key is a String, and this succeeds:
WhateverObject item1 = new WhateverObject();
Element el = new Element("Item1",item1);
cache.put(el);
cache.get("Item1");
EXAMPLE 2. The key is not a simple String, but this succeeds:
class MyKey {
int a,b;
}
WhateverObject item1 = new WhateverObject();
MyKey key = new MyKey(); key.a = 12; key.b = 22;
Element el = new Element(key,item1);
cache.put(el);
cache.get(key);
EXAMPLE 3. ... but a similar lookup fails when a new MyKey object (containing the same values as the original key) is used for the lookup:
class MyKey {
int a,b;
}
WhateverObject item1 = new WhateverObject();
MyKey key = new MyKey(); key.a = 12; key.b = 22;
Element el = new Element(key,item1);
cache.put(el);
MyKey key2 = new MyKey(); key.a = 12; key.b = 22;
cache.get(key2);
What's going on?
This behaviour is consistent with the contract for Map objects, and may be the desired behavior for some object persistence designs. However, it may be completely undesirable for others. With regard to the storage model and ehcache in particular, the behaviour is sometimes right and sometimes wrong for memory-only caches, but always wrong for caches that persist to disk, if you buy into the presumption that a disk-persisted cache will be accessed in separate runs of an application.
The crux of the problem is in the way Java's Map classes store and index the keys in a Map (and thus, the keys to ehcache objects, which use Maps).
To make searches fast and straightforward to implement (internally, in Java), the keys are stored as integers in a table. The integers are hashes derived from the key objects. That is, they're one-way mappings from something like "ABC" to an integer value like 64578. Integers are fast to search, fast to sort, and fast to compare.
So you've probably guessed where this is going.
Finding an object in a cache or in a Map requires that the caller present a key that reduces to exactly the same hash as was created when put(key,item) was called to add the item to the Map in the first place.
In other words:
To find an object in a Map or Map-based cache, keys with identical content must reduce to exactly the same hash every time they're converted from their usual form into hashes.
Java generates these hashes by calling the hashCode() method that every Object has. And it compares them by calling the equals() method that's a standard method in every Object.
The problem comes from the way that Java makes the hashes.
Strings implement the hashCode() method in a way that is based on the content of the String, and that is repeatable every time the method is called on an identical string.
For example, "ABCDE".hashCode() always returns the integer value 62061635 no matter how many times you call it.
Objects that are not Strings use the default hashCode() method of the Object class. This hashCode() is not based on the content of the object, but on its OID, which is different for every instance of an object, and every run of an application. That's why the code in Example 3 fails: objects 'key' and 'key2' are different objects (even though they're of the same class), so their default hashCode() values will be different.
As well, the class's equals() method doesn't compare object content, but OIDs, so (from Example 3):
key != key2Example 2 works fine, because the object "key" that was used to put the data into the cache is the same object for both the
put and the get... and thus the OID is the same for both operations.Solution
The solution to this problem is to give your key objects a
hashCode() method that return identical hash values for identical content, and an equals() method that returns true when two keys have the same content. With that change to the class that's used as a key, Map class accesses, and by extension, ehcache cache lookups, will work as expected.Here is a sample class that implements the methods using a lazy trick to just make the object's contents into a String, and then returns the hashCode() of that String (because, as mentioned, Strings return the same hash for the same String content every time).
A better, more efficient implementation of hashCode() would generate a return value directly from the fields of the key class, without the String overhead... but you get the idea. This example also implements a toString() method that returns the content of the class as a String. Obviously that won't be needed if you make a hashCode() method that goes right from fields of the class to a distinct integer value.
In this example, the "key" to some data is an object that has two fields, a Date and a double. Date is just a thin wrapper around a
long, and this example uses that long. I did this to save the conversion from long to the formatted date string that would happen if the Date were used directly.
import java.io.Serializable;
import java.util.Date;
public class CacheKey implements Serializable {
private static final long serialVersionUID = 1L;
public Date when;
public double howMuch;
public CacheKey(Date when, double howMuch) {
this.when = when;
this.howMuch = howMuch;
}
/**
* Generate a hashcode based on the content of the object
*/
public int hashCode() {
return this.toString().hashCode();
}
/**
* Make a string for use by hashCode(), including all fields
*/
public String toString() {
return ""+when.getTime()+howMuch;
}
/**
* Compare this object with another. Note that comparisons
* MUST compare the values of the fields of the object that are
* of interest (or whatever else is of interest). You can't compare the
* hashes and get a good result -- hashes are allowed to collide. They just
* reduce the search space tremendously.
*/
public boolean equals(Object o) {
CacheKey ck = (CacheKey) o;
return ( (ck.when == this.when) && (ck.howMuch == this.howMuch));
}
return (this.hashCode() == ((CacheKey) o).hashCode() );
}
}
References:
http://www.oracle.com/technology/pub/articles/maps1.html
http://sourceforge.net/projects/ehcache/
http://java.sun.com/j2se/1.4.2/docs/api/java/lang/Object.html
Postscript:
Notes about the "cheap" method I used to make the hashCode()
Someone suggested that I use Strings as keys.
The problem is that I'm using a self-populating (or pull-through) cache. With this kind of cache, the cache itself calls the method that fetches things not found in the cache, by calling that method with the key value. Thus, the key must encapsulate everything that's needed to fetch the desired data. In the case of my application, that's two Date objects, and two doubles... and there's no way around it. A String just won't work. Well, not directly.
Anyway, carefully overriding hashCode() to generate an integer based on the contents of the key object, and then overriding equals() so that it compares the hashCode of the current Object to the hashCode of the compared Object, is the proper solution.
My example code takes a cheap shortcut, tacking the fields together in a String and returning the hashCode of that String.
It shows the general idea of what needs to happen, and for that matter, I'm not totally opposed to the approach, given the kind of data I'm working with.
I could generate an integer hashCode() by summing the integer values of the fields. However... (1) the Date objects will be clustered very close together so their Date.getTime() methods will return nearly identical values; (2) if using long getTime() values, I'd have to subtract a large constant to get them into integer range...
... and even after all that, both overflows and collisions would remain a strong concern.
Another approach would be to perform a hash operation over, say, the bytes of the values of the fields. For my money, that's not so different from making a String out of the fields and letting its hashCode() method deal with it... String conversions aren't cheap, but the base operation to fetch data takes something like 5000msec, and even with this unoptimized key-handling, cache retrievals take on the order of 25msec (from memory) to 100msec (from disk), so it's a tremendous savings.


0 Comments:
Post a Comment
Links to this post:
Create a Link
<< Home