When writeBatch to an HBase table throws an OOM (Out Of Memory) exception

Whenever I start to work with a new technology stack I spend the first few years knee-deep in doubt.

Not “why isn’t this thing working?” doubt. That kind of doubt comes after a few years of experience, when I’m sure that what I am trying to do should work.
The first few years are full of “I wonder what this thing is doing?” doubt, or “I wonder if I’m using this thing in the right way?” doubt.


Today, a Spark job (that batched data into an HBase table) threw an “Out of memory” exception after inserting about 30% of the desired payload.
The knee-jerk response was “throw some more hardware at it”, but a bit of tinkering turned up a few discoveries. The most important of these was that the HTable class does some interesting things internally with threads: it appears to create a thread for each HTable.put(), and those threads are not cleaned up until you call the HTable.close() method.
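
The failing write path was roughly of this shape (a hypothetical sketch, not the actual job code; tableConnection and tableName are assumed to be the same fields used in the fix below, and imports from org.apache.hadoop.hbase.client are omitted as in the original snippet):

	// Hypothetical sketch of the leaky pattern described above: the table handle
	// is never closed, so its internal threads and buffers accumulate batch after
	// batch until the JVM runs out of memory.
	public void writeBatchLeaky(List<Put> puts) throws IOException {
		Table table = tableConnection.getTable(tableName); // never closed
		table.put(puts);
	}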


Today’s solution was to get a BufferedMutator for the HBase table and to explicitly call .flush() and .close() after every batch.

	public void writeBatch(List<Put> puts) throws IOException {
		// A fresh BufferedMutator per batch; try-with-resources guarantees close()
		// (and the flush it implies) even if mutate() throws.
		try (BufferedMutator bufferedMutator = tableConnection.getBufferedMutator(tableName)) {
			bufferedMutator.mutate(puts);
			bufferedMutator.flush();
		}
	}

This is probably not the correct way to do this, but it delivered the expected result (leave a comment if you can see something glaringly wrong).
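
One refinement worth trying (untested here, and purely a sketch assuming the same tableConnection and tableName fields) would be to give the mutator an explicit write buffer size via BufferedMutatorParams and let close() do the final flush:

	// Hypothetical variant: size the mutator's write buffer explicitly so the client
	// flushes to HBase whenever the buffer fills, then rely on close() for the final flush.
	BufferedMutatorParams params = new BufferedMutatorParams(tableName)
			.writeBufferSize(8 * 1024 * 1024); // 8 MB; tune for the payload
	try (BufferedMutator mutator = tableConnection.getBufferedMutator(params)) {
		mutator.mutate(puts);
	}

Whether that actually behaves better under this workload is something I have not verified.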

