2011-12-21

Android: Using dynamically loaded classes for fast loading of data

In an aeronautical application I need to load airport data, such as name, location and altitude. There are about 40000 airports in the database, but not all need to be loaded at the same time. Small aeroplanes, like the ones targeted by my application, need only a few thousand. The rest are outside their reach.
Even with this limited number, loading the data from file turned out to be time-consuming. This article discusses the problem encountered, and how I used dynamically loaded classes to achieve a reasonable performance.

All the airports are stored in a single CSV file. From that I generate smaller data files bases on the airports locations. An airport entry looks like this:D: AAXX # Airport designator N: Rothera Point # Name of airport C: AQ # Country T: Rothera Point # Location (city or town) X: -68.12699 # Longitude Y: -67.56694 # Latitude Z: 567 # Elevation except for the comments. Each entry is followed by an empty line. A first, straight-forward implementation of the inner reading loop:
while (null != (s = f.readLine())) { if (0 == s.length()) { new Airport(designator, name, region, city, lat, lon, alt); } else { switch (s.charAt(0)) { case 'D': designator = s.substring(3); break; case 'N': name = s.substring(3); break; case 'C': region = s.substring(3); break; case 'T': city = s.substring(3); break; case 'X': lon = Double .valueOf(s.substring(3)); break; case 'Y': lat = Double .valueOf(s.substring(3)); break; case 'Z': alt = Integer.valueOf(s.substring(3)); break; } } } The Airport() constructor adds the airport being created to the application-wide list of airports. On my Android phone it took 20 seconds to read the 2800 airports of Brazil, 140 airports per second. This is way to slow.

Using Debug.startMethodTracing("airports"); Airports.load(); Debug.stopMethodTracing(); and
adb pull /mnt/sdcard/airports.trace /tmp
traceview /tmp/airports.trace
revelaed that most of the time was spend in Double.valueOf() and Integer.valueOf(), so I wrote my own conversion routines. Here the double routine, except that this version reads from a byte array instead of a String:
static public double atof(byte[] s, int offset) { int factor = 1; int mult = 1; int val = 0; while (s[offset] == ' ') offset++; if (s[offset] == '-') { factor = -1; offset++; } for (int n = offset; ; n++) { byte ch = s[n]; if ('.' == ch) mult = 10; else if (ch >= 48 && ch <= 57) { val = 10*val + (ch - 48); factor *= mult; } else break; } return (double)val / factor; }
static public double atof(byte[] s, int offset) { int factor = 1; int mult = 1; int val = 0; while (s[offset] == ' ') offset++; if (s[offset] == '-') { factor = -1; offset++; } for (int n = offset; ; n++) { byte ch = s[n]; if ('.' == ch) mult = 10; else if (ch >= 48 && ch <= 57) { val = 10*val + (ch - 48); factor *= mult; } else break; } return (double)val / factor; } The integer routine is even simpler.

Using these routines decreased the time from 20 seconds to 4 seconds, 700 airports per second. Faster, but not fast enough. Another profiling revealed that now almost all time was spen in f.readLine(), so that routine had to go. Instead, I read the whole file into a byte[] buffer. This was acceptable, as no file is more than 250 kB. The main loop now changed to:

while (get < put) { byte ch = buffer[get++]; if (ch == 10) { new Airport(designator, name, region, city, lat, lon, alt); } else { switch (ch) { case (int)'D': start = (get += 2); while (buffer[get] != 10) get++; designator = bytes_to_string(buffer, start, get); get++; break; case (int)'N': ... case (int)'C': ... case (int)'T': ... case (int)'X': start = (get += 2); while (buffer[get] != 10) get++; lon = Util.atof(buffer,start); get++; break; case (int)'Y': ... case (int)'Z': ... } } } where bytes_to_string() does the obvious things. This loop really should be much faster, but to my surprise execution time doubled to 8 seconds. Profiling showed that the string creation was the perpetrator, so I tried several versions of the String constructor, but in vain.

Time for some lateral thinking. If reading data from a file in Java takes such a long time, then perhaps I shouldn't do that. There are several alternatives:

  1. Rewrite the code in C++, using NDK, the Native Development Kit.
  2. Not create any strings, but instead store the strings as byte arrays, converting on demand.
  3. Put the data into a resource file.
  4. Put the data into a .class file that is dynamically loaded.
Alternative 1 was not an obvious winner, as the string creation would still need to happen in Java. Alternative 2 felt too ugly and complicated. Of the remaining two, I chose alternative 4, the dynamically loaded class:

public class Airports_US_AK { private static void load1() { new Airport("00AK", "Lowell Field", "US-AK", "Anchor Point", 59.94920, -151.69600, 450); new Airport("01A", "Purkeypile", "US-AK", "Purkeypile", 62.94360, -152.27000, 1950); ...; } private static void load2() { new Airport("Z93", "Copper Center 2", "US-AK", "Copper Center", 61.94120, -145.29401, 1150); new Airport("ZNC", "Nyac", "US-AK", "Nyac", 60.98070, -159.99400, 460); ...; } public static void load() { load1(); load2(); } } The data is split into several functions to avoid the limit of 65535 bytes per function.

To find out how to compile these files, since they should not be part of the project, I did "File > Export > General > Ant Buildfiles" followed by ant -v build-project. Removing all the lint from the command line, I got
javac -encoding UTF-8 -d bin/classes -cp  bin/classes airports/*.java
This version "reads" about 3000 airports per second, which seems reasonable. While there is still room for some minor improvements, that seems roughly the speed limit.

No comments:

Post a Comment