Java and GIS, Part 1: Intro to GIS (From www.java.net)
by Sue Spielman and Tom Whitehill
If you've been paying attention to the recent groundswell, chances are that you've heard the terms GIS, geospatial, GPS, and a host of other acronyms being thrown around. But do you really understand what they mean -- the types of applications that use this stuff, and what it means to the way we look at our development? Naturally, neither of us did at first, but we've been working (well actually, mostly playing) in this space for a couple of years now, and have only recently realized the huge potential it presents us as Java programmers.
This series will serve a couple of purposes: Part 1 will get you up to speed on what all this GIS stuff means, explaining some of the terminology and how it relates to what we build. Part 2 will talk about some of the open source geospatial systems and how the pieces fit together, and Part 3 will be a walkthrough of putting together a Mobile-GIS-enabled application.
Sound like fun? It should, because this is pretty cool stuff.
Let's start with the basics. The Geographic Information Systems (GIS) space comes with its own vocabulary. It's important to understand what some of these things mean, because without a firm understanding of the semantics, it is very difficult to figure out what's going on.
As some of you might know, this technology has been around for about 30 years now, but hasn't really hit the mainstream, primarily because it's been very difficult and cumbersome to use. It's just recently that it's become widespread, because the GIS databases and the software are making it attainable to even us mortal folk. What that means is that we can start using Java to interface with the GIS environment.
"Geospatial" refers to a class of data that has a geographic or spatial nature.
The Global Positioning System (commonly known as GPS) is a collection of twenty-four satellites, developed by the U.S. Department of Defense, that orbit the earth at an altitude of 20,200 kilometers. These satellites transmit signals that allow a GPS receiver anywhere on earth to calculate its own location. The Global Positioning System is used in navigation, mapping, surveying, and other applications where precise positioning is necessary. GPS handheld devices have been available for a while now, but it is only recently that GPS has been built into mobile devices such as PDAs and smart phones.
Map This! -- GIS Analysis
GIS analysis is a fancy term for looking at patterns in geographic data and at the relationships between features. How do we approach understanding what these patterns and relationships are? Simple. We ask a question such as, "Show me all of the customers within five miles of a given retail store." Or, "Show me the population density map of Colorado." Or even, "Show me all of the crimes that took place in a given area," like the map shown in Figure 1.
Figure 1. Crime locations in a given area
The answers to all of these questions can be displayed graphically on a map, returned as values in a table, or rendered as a chart, based on some type of query.
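To make the first of those questions concrete, here is a minimal sketch of a radius query in plain Java. It assumes that store and customer locations are available as latitude/longitude pairs in degrees, and it uses the haversine formula with a mean Earth radius of 3,958.8 miles. The names RadiusQuery, distanceMiles, and within are hypothetical and do not come from any GIS product.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of any GIS library.
class RadiusQuery {
    static final double EARTH_RADIUS_MILES = 3958.8; // mean Earth radius (assumed spherical Earth)

    // Great-circle distance in miles between two lat/lon points, using the haversine formula.
    static double distanceMiles(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(a));
    }

    // Returns the customers (each a {lat, lon} pair) that lie within radiusMiles of the store.
    static List<double[]> within(double storeLat, double storeLon,
                                 double radiusMiles, List<double[]> customers) {
        List<double[]> hits = new ArrayList<>();
        for (double[] c : customers) {
            if (distanceMiles(storeLat, storeLon, c[0], c[1]) <= radiusMiles) {
                hits.add(c);
            }
        }
        return hits;
    }
}
```

A linear scan like this is fine for a handful of points. A GIS database would typically rely on a spatial index to answer the same question over a large table.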
Features are displayed in any number of ways. Data can be discrete, continuous, or summarized by area. A discrete feature can be a location or a line. At any given spot, the feature is either there or it's not there. For example, a business' location is a discrete point, whereas a stream or river is a discrete linear feature. And a color-coded map showing land parcels is an example of a discrete area, as shown in Figure 2.
When we're talking about continuous features, we're talking about something that can be described or measured anywhere. For example, if you have a map showing temperature readings, elevation data, or average rainfall, you would have continuous data that flows from quantity to quantity. This type of map is shown in Figure 3.
A feature summarized by area represents a closed two-dimensional shape defined by its boundaries. Usually this area is measured in square units. So if you wanted to look at areas by ZIP code, or at all of the businesses within a particular country, that would be an example of a summarized area.
Being able to do interesting queries on summarized areas becomes a function of how well your GIS database is populated. If you have the data captured in the data table, then it becomes possible to do something like, "Find all of the homeowners who make more than $75,000 a year in Lowell, MA." What we just did in the last query example was actually create three different layers. So now you are reasonably asking, "What's a layer?"
What You Layer Is What You Get
A layer is a set of vector data organized by some subject. Think of each layer as a film transparency (remember those?), only a digital version. When you put one layer on top of another, you form a map that contains new information. This concept is shown in Figure 4.
Figure 4. How layers make up a map
Taking our previous query as an example, the three layers that we have are: homeowners as discrete points, income level as discrete points, and Lowell, MA as a summarized area. If we lay the transparencies on top of one another, we'd get all homeowners who make more than $75K who live in Lowell.
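The transparency idea can be sketched in a few lines of Java. This is a deliberate simplification: each layer is modeled as nothing more than the set of record IDs it contains, so stacking layers becomes a set intersection. The class name LayerOverlay and the record IDs are hypothetical; a real GIS overlays geometry, not just IDs.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: a "layer" here is just the set of record IDs that appear on it.
class LayerOverlay {
    // Stacks the layers and keeps only the records that are present on every one of them.
    static Set<String> overlay(List<Set<String>> layers) {
        if (layers.isEmpty()) {
            return new TreeSet<>();
        }
        Set<String> result = new TreeSet<>(layers.get(0));
        for (Set<String> layer : layers.subList(1, layers.size())) {
            result.retainAll(layer); // set intersection with the next layer
        }
        return result;
    }
}
```

With one set for homeowners, one for incomes over $75K, and one for records located in Lowell, the result holds only the records common to all three.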
Still, this simple concept of layering has a number of issues that need to be dealt with. For instance, how are the various layers represented? Are they represented as vector data, as a raster image, or perhaps in a shapefile? And what are vectors and rasters? Also, what are the coordinate systems used for each layer?
Vectors, Rasters, and Shapefiles
In the GIS world, there are two main models used to represent geographic features. The first is a vector and the second is a raster. When you are working with a vector model, you are basically working with x/y coordinates. Each feature is defined as a row in a table. It's possible to have points, lines, or areas represented in a vector model. If you are defining a line (like a river or a road), then it would be represented by a series of coordinate pairs. For other types of features (like land parcels, for instance) that can be represented by closed polygons, you would define the borders as a series of coordinate points.
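As a rough illustration of the vector model, the sketch below stores a line and a polygon border as arrays of {x, y} coordinate pairs and runs two simple calculations on them. It assumes planar coordinates, and the class and method names are hypothetical. The point-in-polygon check uses the standard ray-casting technique, which is one common approach and not necessarily what any particular GIS product does.

```java
// Hypothetical sketch of vector features stored as series of {x, y} coordinate pairs.
class VectorFeatures {
    // Length of a line feature (a road or river) in the units of its coordinates.
    static double lineLength(double[][] line) {
        double total = 0;
        for (int i = 1; i < line.length; i++) {
            total += Math.hypot(line[i][0] - line[i - 1][0], line[i][1] - line[i - 1][1]);
        }
        return total;
    }

    // Ray casting: counts how many polygon edges a horizontal ray from (x, y) crosses.
    // An odd number of crossings means the point is inside the border.
    static boolean contains(double[][] ring, double x, double y) {
        boolean inside = false;
        for (int i = 0, j = ring.length - 1; i < ring.length; j = i++) {
            double xi = ring[i][0], yi = ring[i][1];
            double xj = ring[j][0], yj = ring[j][1];
            boolean crosses = (yi > y) != (yj > y)
                    && x < (xj - xi) * (y - yi) / (yj - yi) + xi;
            if (crosses) {
                inside = !inside;
            }
        }
        return inside;
    }
}
```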
When dealing with a raster model, features are represented as rows and columns of cells. Each cell has an attribute value, as well as location coordinates. The coordinates are contained in the ordering of the matrix. Usually, each layer represents one attribute. For example, you might have a raster model to display the burn area of a forest fire, while a vector model would be a more appropriate way to display highways. Keep in mind that you can represent a feature in either model, but usually, discrete features and data summarized by area are more likely to be a vector model, while something like a continuous category (such as temperature readings or elevations) makes more sense presented in a raster model.
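The point that a cell's coordinates are implied by its position in the matrix can be sketched as follows. The example assumes square cells, an origin at the top-left corner of the grid, and one integer attribute per cell. RasterLayer and its members are hypothetical names, not an API from any GIS library.

```java
// Hypothetical sketch of a single-attribute raster layer.
class RasterLayer {
    final double originX;  // x coordinate of the grid's left edge
    final double originY;  // y coordinate of the grid's top edge
    final double cellSize; // width and height of one square cell, in map units
    final int[][] cells;   // cells[row][col]; row 0 is the top row

    RasterLayer(double originX, double originY, double cellSize, int[][] cells) {
        this.originX = originX;
        this.originY = originY;
        this.cellSize = cellSize;
        this.cells = cells;
    }

    // Looks up the attribute value of the cell that covers map position (x, y).
    int valueAt(double x, double y) {
        int col = (int) Math.floor((x - originX) / cellSize);
        int row = (int) Math.floor((originY - y) / cellSize);
        if (row < 0 || row >= cells.length || col < 0 || col >= cells[row].length) {
            throw new IllegalArgumentException("Point is outside the raster");
        }
        return cells[row][col];
    }

    // Recovers the map coordinates of a cell's center purely from its row and column.
    double[] cellCenter(int row, int col) {
        return new double[] {
            originX + (col + 0.5) * cellSize,
            originY - (row + 0.5) * cellSize
        };
    }
}
```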
This leads us to a shapefile, which is a vector file format for storing the location, shape, and attributes of a feature. This information is stored in a set of related files and contains one feature class. A feature class is a collection of spatial data with the same shape type (point, line, or polygon).
Projecting and Coordinating
The last topic on the tip of the iceberg of our exploration of GIS has to do with map projections and coordinate systems. While in theory this seems like it shouldn't be that difficult to grasp, personally we both find it to be pretty confusing. All of the data layers that are being used on the same map need to have the same projection and coordinate system. Otherwise, you'll have a pretty funky-looking map, because the layers will not overlay one another.
What exactly is a map projection? It's a mathematical model that transforms the locations of features on the Earth's curved surface to locations on a two-dimensional surface. There are a couple of different types of map projections (which we won't cover in detail here), but they all have one thing in common: distortion. You can see how projections and their distortions differ in Figure 5. A projection will distort distance, area, shape, direction, or a combination of any of these.
Figure 5. Projections and distortions
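To see a projection and its distortion in code, here is a minimal sketch of one specific projection, the spherical Mercator. It is chosen only because its formulas are short; the article does not single it out. The sketch assumes a spherical Earth with a radius of 6,378,137 meters, and the class and method names are hypothetical. Mercator preserves shape and direction locally but stretches distance and area more and more as you move away from the equator.

```java
// Hypothetical sketch of the spherical Mercator projection.
class SphericalMercator {
    static final double EARTH_RADIUS_METERS = 6378137.0; // assumed sphere radius

    // Transforms longitude/latitude in degrees to planar {x, y} in meters.
    static double[] project(double lonDeg, double latDeg) {
        double lambda = Math.toRadians(lonDeg);
        double phi = Math.toRadians(latDeg);
        double x = EARTH_RADIUS_METERS * lambda;
        double y = EARTH_RADIUS_METERS * Math.log(Math.tan(Math.PI / 4 + phi / 2));
        return new double[] { x, y };
    }

    // How much the projection stretches distances at a given latitude (1.0 means no stretch).
    static double scaleFactor(double latDeg) {
        return 1.0 / Math.cos(Math.toRadians(latDeg));
    }
}
```

At the equator the scale factor is 1, while at 60 degrees latitude it is about 2, so a distance drawn on the map there appears roughly twice as long as it is on the ground.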
A coordinate system is the reference framework used for a set of points, lines, or surfaces, together with a set of rules used to define the positions of points in space. This can be either in two (x,y) or three (x,y,z) dimensions. Are all of those linear algebra classes coming back to you yet?
Before we wrap up the GIS basics here, let's talk about one more topic: attributes. Your maps and features will only be as good as your data and its attributes. Most of the time, you will probably be getting data supplied for you, but depending on the types of analysis you'll be doing, you might need to add your own attributes to the data. For example, if you want to categorize similar activities together, you might have a separate table column for each. This could be presented as a numeric value (e.g., "Return me all values that are in category 1"), or a text value (e.g., "Find me all values that are in the BUSINESS category").
Another frequent attribute is a rank, in which features are rated on a specified scale. For example, if roadways are ranked by safety, you can query, "Find me all the roads ranked 4 in safety," and know that these should be the first in line for road improvements. These are just a few samples of the types of attributes that can be applied to geographic features.
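Attribute queries like these amount to filtering rows of a table by a column value. The sketch below assumes an in-memory table keyed by feature name, with each row holding its attributes as column/value pairs. The AttributeQuery class and the column names are hypothetical, and a real GIS would run the equivalent query against its database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a feature table as feature name -> (column name -> value).
class AttributeQuery {
    // Returns the names of all features whose value in the given column equals `wanted`.
    static List<String> select(Map<String, Map<String, Object>> table,
                               String column, Object wanted) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Map<String, Object>> row : table.entrySet()) {
            if (wanted.equals(row.getValue().get(column))) {
                matches.add(row.getKey());
            }
        }
        return matches;
    }
}
```

The same method covers both kinds of attribute: passing a number matches a numeric category or rank, and passing a string such as "BUSINESS" matches a text category.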
Wrapping Up Part 1
We've just spent a fair amount of time talking (or actually, writing) about GIS, but our intention is to lay the groundwork for talking about how this fits into Java development. This is a relatively new area for Java development. Of the 238 JSRs out there (at last count), there are zero in the GIS space. That should tell you that we're at the very beginning of the curve here. The types of applications that we'll be able to produce using GIS integration will be just astounding.
In Parts 2 and 3 of this series, we'll look at both the server and a mobile client. By using some of the cool features available in MIDP 2.0, we'll build a MIDlet that will connect to a server using network APIs and then, using a simple HTTP tunnel, we'll pass information to a servlet that will talk with a MapObjects client. In this series, we are making use of ESRI MapObjects for Java so that you can see what a production application might look like. If you are planning on doing any type of work in this area, the many products available from ESRI are worth your time to investigate. ESRI is the front runner in this space, and there'd be a lot of catching up to do if you wanted to start somewhere else.
Now that we've got the basics down, stay tuned for Part 2 and Part 3, where we'll start getting into some server-side and J2ME coding.
Tom Whitehill is the co-founder of Mobilogics and has been working with Java since 1995.
All figure graphics provided by ESRI.