JavaBeans and SolrJ and Realtime Get Oh My!

A project that I’m working on was previously using AWS DynamoDB for state tracking of processing workloads. We got, very late in the project, the mandate to move away from DynamoDB. Don’t ask 😉

Since we already had Solr in our toolkit, I moved our job tracking objects from being backed by DynamoDB to being backed by Solr. The JobTracker pojo was already using annotations for persisting into DynamoDB, so adding on the Solr @Field annotation for each property was really easy.

Here is our job tracking object JobTracker, ready to be persisted into either DynamoDB or Solr:

@DynamoDBTable(tableName = "JobTracker")
public class JobTracker {

    @DynamoDBHashKey(attributeName = "Id")
    @Field("id")
    private String id;

    @DynamoDBAttribute(attributeName = "JobConfig")
    @Field("jobConfig")
    private String jobConfig;

    @DynamoDBAttribute(attributeName = "Status")
    @Field("status")
    private String status;

    // ...
}

We save the object into Solr via an instance of SolrClient:

jobTrackerSolrClient.addBean(jobTracker);

We load the object back from Solr via:

SolrQuery q = new SolrQuery();
q.set("q", "id:" + objectId);
q.set("fl", "*");
QueryResponse queryResponse = jobTrackerSolrClient.query(q);
List<JobTracker> foundDocuments = queryResponse.getBeans(JobTracker.class);
return foundDocuments.get(0);

However, the first wrinkle that came up was that multiple job processors could do a select against the JobTracker collection for PENDING jobs, get back the same document, and start processing it in parallel. So it was time to enforce that only one job processor could update the status from PENDING to RUNNING. This is easily done via Solr’s optimistic versioning.

The first challenge was simple: I needed to add a Solr-specific annotation to my JobTracker pojo for the _version_ field:

public class JobTracker {

    // ...

    @Field("_version_")
    private Long version;

    public Long getVersion() {
        return version;
    }

    public void setVersion(Long version) {
        this.version = version;
    }
}

If the _version_ sent with the update doesn’t match the one stored in Solr, then Solr returns an HTTP 409 (Conflict) code. To deal with this scenario, I had to wrap the addBean call:

public void save(JobTracker jobTracker)
        throws IOException, SolrServerException, VersionConflictException {
    try {
        jobTrackerSolrClient.addBean(jobTracker);
    } catch (RemoteSolrException rse) {
        // 409 means the _version_ we sent no longer matches what Solr has
        if (rse.code() == 409) {
            throw new VersionConflictException(jobTracker.getId());
        } else {
            throw rse;
        }
    }
}

The VersionConflictException object just signals to my application that the update failed. It’s a very simple class that just stores the objectId of what you tried to persist to Solr. I did debate storing the object as part of it, but couldn’t come up with a good reason.

public class VersionConflictException extends Exception {

    private static final long serialVersionUID = 1618255109313434652L;

    private String objectId;

    public VersionConflictException(String objectId) {
        super();
        this.objectId = objectId;
    }

    public String getObjectId() {
        return objectId;
    }
}
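
To see why this protects the PENDING to RUNNING transition, here is a small in-memory model of the check Solr applies to the _version_ value sent with an update. It is only a sketch, not Solr code: the VersionedStore class, its update method, and the CONFLICT constant are made up for illustration. The rules themselves follow Solr's documented behavior: a value greater than 1 must match the stored version exactly, 1 means the document must already exist, a negative value means it must not exist, and 0 or no value skips the check.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a Solr collection; it models the _version_ check only. */
public class VersionedStore {
    /** Returned where real Solr would answer with HTTP 409. */
    public static final long CONFLICT = -1L;

    private final Map<String, Long> versions = new HashMap<>();
    private long counter = 100L; // Solr assigns its own version numbers; a counter is enough here

    /** Applies the version rules and returns the newly assigned version, or CONFLICT. */
    public synchronized long update(String id, Long sentVersion) {
        Long current = versions.get(id);
        long sent = (sentVersion == null) ? 0L : sentVersion;
        boolean accepted;
        if (sent > 1) {
            accepted = current != null && current == sent; // must match exactly
        } else if (sent == 1) {
            accepted = current != null;                    // document must exist
        } else if (sent < 0) {
            accepted = current == null;                    // document must not exist
        } else {
            accepted = true;                               // 0 or absent: no check
        }
        if (!accepted) {
            return CONFLICT;
        }
        long assigned = ++counter;
        versions.put(id, assigned);
        return assigned;
    }

    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        long pendingVersion = store.update("job-1", null);   // job created as PENDING

        // Two processors both read the PENDING document and try to claim it.
        long a = store.update("job-1", pendingVersion);
        long b = store.update("job-1", pendingVersion);      // stale by now

        System.out.println("A claimed: " + (a != CONFLICT)); // prints "A claimed: true"
        System.out.println("B claimed: " + (b != CONFLICT)); // prints "B claimed: false"
    }
}
```

Whichever processor writes first wins; the other one gets the conflict and can simply move on to the next PENDING job.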

This all looked really great until I realized, while writing my unit test, that my JobTracker pojo is very long lived: once I had saved it, the second save would trigger the VersionConflictException, because the save in Solr, a call to the /update request handler, doesn’t return the new _version_.

JobTracker jobTracker = new JobTracker();
jobTracker.setId("testId" + System.currentTimeMillis());
jobTracker.setStatus(JobStatus.PENDING);
assertNull(jobTracker.getVersion());
mapper.save(jobTracker);
jobTracker.setStatus(JobStatus.RUNNING);
mapper.save(jobTracker); // BOOM, VersionConflictException

Argh. Nothing is ever easy.

Okay, so this is what I get for using Solr as a database. So on to the real time get request handler, which would let me quickly get back the _version_ field. The call is actually quite simple:

SolrQuery q = new SolrQuery();
q.setRequestHandler("/get");
q.set("id", id);
q.set("fl", "*");
QueryResponse response = jobTrackerSolrClient.query(q);

However, that is when I ran into one of the assumptions behind SolrJ, which is that if you are using JavaBeans, then you are working with the standard data structure returned by doing a /select?q=*:* type of query.

I needed the /get request handler to do the real time get query, and its results format is slightly different: with the id parameter, the document comes back under a single doc key instead of in the usual list of documents. So doing response.getBeans() was blowing up. Again, ARGH.

So I did a bit of spelunking in QueryResponse.java and discovered a Java class I hadn’t known about, DocumentObjectBinder. This class basically takes care of mapping from Solr objects like SolrDocument and SolrDocumentList to pojos. I got very lucky: I copied the code out of SolrJ into my project and it just worked:

SolrDocumentList sdl = new SolrDocumentList();
// With the id parameter, /get returns the document under the "doc" key
SolrDocument doc = (SolrDocument) response.getResponse().get("doc");
if (doc != null) {
    sdl.add(doc);
}
if (!sdl.isEmpty()) {
    DocumentObjectBinder dob = new DocumentObjectBinder();
    List<JobTrackingRecord> foundDocuments = dob.getBeans(JobTrackingRecord.class, sdl);
    return foundDocuments.get(0);
} else {
    return null;
}

Now, when I save a long lived pojo, I just do a realtime get to update the version, so I can keep doing saves:

jobTracker.setVersion(loadJobTracker(jobTracker.getId()).getVersion());
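
Put together, the save-then-refresh idea looks roughly like the sketch below. Nothing in it talks to Solr: FakeSolr, Tracker, VersionConflict, and the method names are all hypothetical stand-ins, with FakeSolr mimicking only two behaviors described above (an update rejects a stale version and hands nothing back, and a realtime get returns the current version).

```java
import java.util.HashMap;
import java.util.Map;

public class SaveAndRefresh {

    /** Hypothetical pojo with only the fields this sketch needs. */
    static class Tracker {
        String id;
        String status;
        Long version;

        Tracker(String id, String status) {
            this.id = id;
            this.status = status;
        }
    }

    /** Thrown where the real code would see an HTTP 409. */
    static class VersionConflict extends Exception {
        VersionConflict(String id) {
            super("version conflict for " + id);
        }
    }

    /** In-memory stand-in for Solr: update() hands nothing back, like addBean. */
    static class FakeSolr {
        private final Map<String, Long> versions = new HashMap<>();
        private long counter = 100L;

        void update(Tracker t) throws VersionConflict {
            Long current = versions.get(t.id);
            // A version greater than 1 must match the stored one exactly.
            if (t.version != null && t.version > 1 && !t.version.equals(current)) {
                throw new VersionConflict(t.id);
            }
            versions.put(t.id, ++counter);
        }

        Long realtimeGetVersion(String id) {
            return versions.get(id);
        }
    }

    /** Saves, then copies the freshly assigned version back onto the pojo. */
    static void save(FakeSolr solr, Tracker t) throws VersionConflict {
        solr.update(t);
        t.version = solr.realtimeGetVersion(t.id);
    }

    public static void main(String[] args) throws VersionConflict {
        FakeSolr solr = new FakeSolr();
        Tracker tracker = new Tracker("testId", "PENDING");
        save(solr, tracker);
        tracker.status = "RUNNING";
        save(solr, tracker); // no conflict: the version was refreshed after the first save
        System.out.println("second save ok, version=" + tracker.version); // prints "second save ok, version=102"
    }
}
```

One caveat with this approach: the save and the realtime get are two separate requests, so if another writer updates the document in between, the pojo picks up that writer's version and the next save will overwrite their change without a conflict.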

So far it seems to be working very well.

Update! After writing this blog post, I dug around in the real time get code base, re-read the wiki, and discovered that if my parameter was ids instead of id, then the results are returned in the standard format, and I can use the response.getBeans() method. Oh well. Still glad to learn about DocumentObjectBinder.
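
The difference between the two parameters can be pictured with plain maps standing in for the parsed response body. This is a sketch under assumptions: the RealtimeGetShapes class and extractDocs helper are hypothetical, and the maps only imitate the layout of a /get response (a top-level doc entry for id, a response entry holding a docs list for ids), they are not SolrJ types.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RealtimeGetShapes {

    /** Pulls the documents out of either response layout. */
    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> extractDocs(Map<String, Object> body) {
        List<Map<String, Object>> docs = new ArrayList<>();

        Object single = body.get("doc");          // layout for /get?id=...
        if (single instanceof Map) {
            docs.add((Map<String, Object>) single);
        }

        Object response = body.get("response");   // layout for /get?ids=...
        if (response instanceof Map) {
            Object list = ((Map<String, Object>) response).get("docs");
            if (list instanceof List) {
                docs.addAll((List<Map<String, Object>>) list);
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "job-1");
        doc.put("_version_", 101L);

        // What a lookup by id looks like: one document under "doc".
        Map<String, Object> byId = new HashMap<>();
        byId.put("doc", doc);

        // What a lookup by ids looks like: the same layout a /select query uses.
        Map<String, Object> response = new HashMap<>();
        response.put("numFound", 1);
        response.put("start", 0);
        response.put("docs", Collections.singletonList(doc));
        Map<String, Object> byIds = new HashMap<>();
        byIds.put("response", response);

        System.out.println(extractDocs(byId).size());  // prints 1
        System.out.println(extractDocs(byIds).size()); // prints 1
    }
}
```

Because the ids layout matches what a normal query returns, the stock bean mapping works on it without any extra code.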