Wednesday, June 28, 2006

Issues with Java ArrayList handling streaming data

In my master's project, "index processing for complex event detection", I created several data structures to temporarily store and process streaming data.
Only once did I use a data structure implementation provided by JDK 1.4.2, and that is where the problems occurred. My prototype system has a module that defines a time window and keeps the simple events falling inside this window in memory for later processing. The information to store is very simple, so I just used ArrayList (the Java documentation describes it as the fastest list implementation).
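
To make the setup concrete, here is a minimal sketch of such a window buffer. It is not the code from my prototype: the names SimpleEvent and TimeWindowBuffer, the millisecond timestamps, and the assumption that events arrive in timestamp order are all made up for illustration, and the sketch uses generics, which JDK 1.4.2 does not have.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event type: just a timestamp and a payload.
final class SimpleEvent {
    final long timestampMillis;
    final String payload;

    SimpleEvent(long timestampMillis, String payload) {
        this.timestampMillis = timestampMillis;
        this.payload = payload;
    }
}

// Hypothetical buffer that keeps only the events inside the current time window.
// It is written for a single thread; nothing here is synchronized.
final class TimeWindowBuffer {
    private final long windowMillis;
    private final List<SimpleEvent> events = new ArrayList<SimpleEvent>();

    TimeWindowBuffer(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Appends the event, then drops every event that is older than
    // (newest timestamp - window length). Assumes in-order arrival.
    void add(SimpleEvent e) {
        events.add(e);
        long cutoff = e.timestampMillis - windowMillis;
        int expired = 0;
        while (expired < events.size()
                && events.get(expired).timestampMillis < cutoff) {
            expired++;
        }
        events.subList(0, expired).clear();
    }

    int size() {
        return events.size();
    }

    // Returns a copy so later processing does not see further changes.
    List<SimpleEvent> snapshot() {
        return new ArrayList<SimpleEvent>(events);
    }
}
```

With a window of 1000 ms, adding events at 0, 500 and 1200 ms leaves the last two in the buffer, because the cutoff after the third add is 200 ms.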

While evaluating the prototype, an ArrayIndexOutOfBoundsException frequently appeared at run time. I checked my code and could not find anything wrong with it. According to the Java API Specification, ArrayList is supposed to grow automatically, so why was there such an error?
When I decreased the generation rate of the streaming data, the exception disappeared. When I gradually increased the rate again, the exception came back, which suggests that something goes wrong during the automatic growth of the ArrayList.

Finally, I found out that the implementation of ArrayList is not synchronized. However, I'm still not sure whether this is what causes the problem described above during size growth.
There are two possible solutions. One is to call the ensureCapacity() method to declare a minimum capacity for the ArrayList, so that automatic resize operations are avoided when handling large amounts of streaming data. The other is to wrap the unsynchronized ArrayList in a synchronized one, for example with Collections.synchronizedList().
For simplicity, I only tried the first alternative, giving the list a suitable initial capacity. The problem has not occurred since, even when handling a more intensive data stream.
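
For reference, the two alternatives could look roughly like the sketch below. The class and method names (WindowListOptions, presized, synchronizedWrapper, fillConcurrently) are invented for this example, the element type is simplified to String, and the code uses generics, so it would need adapting for JDK 1.4.2. Note that reserving capacity only reduces how often the backing array is resized; it does not make ArrayList safe for use from several threads, whereas the wrapper from Collections.synchronizedList() serializes each individual call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class WindowListOptions {

    // Alternative 1: reserve room up front so add() rarely has to grow
    // the backing array. The list itself is still unsynchronized.
    static List<String> presized(int expectedEvents) {
        ArrayList<String> list = new ArrayList<String>();
        list.ensureCapacity(expectedEvents);
        return list;
    }

    // Alternative 2: wrap the list so that every single method call is
    // synchronized on the wrapper.
    static List<String> synchronizedWrapper() {
        return Collections.synchronizedList(new ArrayList<String>());
    }

    // Hypothetical helper: several producer threads append to the same
    // list, then the final size is returned. Only meaningful for a
    // thread-safe list such as the synchronized wrapper.
    static int fillConcurrently(final List<String> list,
                                int threads, final int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < perThread; i++) {
                        list.add("event-" + id + "-" + i);
                    }
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return list.size();
    }

    public static void main(String[] args) {
        List<String> safe = synchronizedWrapper();
        System.out.println("elements stored: " + fillConcurrently(safe, 4, 10000));
    }
}
```

One caveat with the wrapper: iterating over the list still has to be done inside a synchronized block on the wrapper object, as the documentation of Collections.synchronizedList() points out.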
