XML FAQ¶
Q: What parser does GeoTools use?¶
We have a range of tools to help you with parsing and encoding XML. As a general purpose library GeoTools does not have the luxury of choosing a parser to work with.
The following parsers are based on W3C APIs (SAX, DOM, and XML Transform). You can choose the most appropriate solution for the system you want to use GeoTools in.
Technology |
SAX |
DOM |
Encode |
Support |
Notes |
---|---|---|---|---|---|
SAX |
sax |
dom |
Filter GML SLD |
Hard to trace through, parse not easily extended |
|
DOM |
dom |
Filter GML SLD |
Forgiving and easy to trace through and debug, memory limitation for GIS data |
||
Transform |
XML |
Filter GML2 SLD |
Easy to trace through and debug, difficult to configure for specific data |
||
JABX |
sax |
dom |
XML |
n/a |
Fast but not suitable for dynamic data, precompiled |
Pull |
sax |
dom |
n/a |
Should combine the ease of DOM with the streaming performance of XDO and GTXML |
|
XDO |
sax |
dom |
XML |
Filter GML SLD WMS WFS1.0 XSD |
Proof of concept of schema assisted parsing allowing streaming into Java Objects. Code is fast and well tested but is hard to trace through |
GTXML |
sax |
dom |
XML |
Filter GML SLD WMS WFS1.0 WFS1.1 WPS XSD |
Schema assisted parsing backed by Eclipse XSD to represent schema. Easier to trace through but still not straight forward Allows streaming for large GIS data volumes. |
The XDO and GTXML solutions are configuration based. You set up a configuration matching the data you wish to work with and the technology will make use of any XML schema available to assist in parsing or encoding your content correctly.
The JAXB library from Java is a little different in that you start from an XML schema and generate a parser or encoder class for use. This solution performs well but cannot be used for dynamic content such as GML.
Q: What is SAX?¶
SAX is a W3C technology from the early days of XML. SAX Parsers work using callbacks, they pass control between several hard coded implementations. For basic use you create your own SAX Parser (say responding to a new Geometry being parsed) and pass control off to the GeoTools implementations and wait for it to call you.
Allows streaming for large content.
Rather tricky to set up
Reuse of a SAX parser is possible, but is very tricky to reuse
As a result SAX parsers are rather “brittle” and difficult to maintain are currently hard coded to pass control between themselves, making support for new specifications tricky.
Example: while parsing a Filter control will be need to be handed over to a Geometry parser if the expression contains a literal Geometry.
The later Schema Assisted parsers are an attempt to let mere mortals create a tree of handlers on the fly (they use the schema document to do a bunch of the grunt work) that is hard coded for the SAX Parsers.
Q: What is DOM?¶
The W3C Document Object Model (DOM) is most popularly used to represent the the contents of a web page as a series of Nodes. These notes form a tree structure and can be used to represent the contents of an XML document in memory.
As a result the GeoTools DOM parsers are functions that can wander through an in memory DOM doing their best to extract content.
Delegation is hard coded in much the same way as with the SAX parsers.
accessible easy to understand technology
very nice for quick examples
obvious how to use the parser
solution does not scale to large content as steaming not available
can only handle direct use of GML
additional coding is always required to parse your own content
Q: What is XML Transform?¶
Traditional XML generation, traverse your data structure and call methods to generate as you go.
very fast, so fast we use it for GeoServer even though it is hard to maintain
scalable (does not load features into memory)
not open ended
need to carefully provide hints before use
may revisit data several times (for bounding box and then content)
Q: Schema Assisted -did you make that up?¶
Yes we did.
This idea of “Schema Assisted” parsers is a GeoTools specific piece of technology. Then general idea is to makes use of XML Schema information to minimize the amount of code you need to write.
A parser is supplied a configuration of bindings; each binding maps an XML elements or XML attributes to Java class.
While this sounds similar to other XML parsing technologies we do have a couple of key differences:
taking special care to pay attention to the schema at runtime (so we can parse new documents using the “best” binding available rather than fail)
Ensure that data is not loaded into memory; allowing us to “stream” the XML document through an application.
We are on our third generation schema assisted parser.
Q: XDO?¶
XML Data Objects (XDO) is the third generation of our schema assisted parser idea (where the SAX bindings are referenced by the XMLSchema rather then directly hard coded). This is fast scalable solution that supports reading and writing.
fast and proven solution for geospatial data
ability to handle MASSIVE content like FeatureCollections
how to create new bindings is not obvious
Q: GTXML?¶
GeoTools XML (GTXML) is the fourth generation schema assisted parser, using the XML Schema data structure (rather then hard coding) to figure out what binding to call. The XSD is used to hold our representation of the schema at runtime.
schema aware allowing use of new content without additional coding
code generator for making custom bindings
streaming content for MASSIVE content like feature collections
support for content generation
hard to debug and trace through the parsing or encoding process
code generator available to jump start the development of bindings
examples how how to use Eclipse Modeling Objects (EMF) based bindings to work directly from the schema
Q: Why doesn’t GeoTools use JAXB?¶
JAXB is a set of Java technologies (now included as part of Java 6) that are able to generate a parser form an XML schema.
A couple of groups have used JAXB have bind things such as OGC Filter. In general works well; however it does have trouble responding to content that is negotiated dynamically … such as GML.
Q: For WFS why does Parser return a Map?¶
This shows up as the following error:
ClassCastException java.util.HashMap cannot be cast to FeatureCollection
To understand this error please remember that the GML returned by a WFS GetFeatures request is a normal XML file, with a reference to an XML schema at the top.
For a WFS GetFeature response the schema reference is usually a DescribeFeatureType call that returns an XML Schema.
If this schema is incorrectly configured (common with MapServer) or cannot be reached (common with restricted
environments) our parser will give up guessing what is a Feature and just return the values in a HashMap
.
Note
If you are using the GML utility class it will perform a bit of analysis and create an ad-hoc FeatureType in order to return you Features.
looking at the HashMaps returned
building a feature type that matches that kind of contents
building features that match that FeatureType
This is similar to the approach taken by OGR; OGR figures out where the “geometry” is; and then goes up two levels and assumes those things are features.
Both of these approaches are strictly a work around for a common problem of misconfigured WFS servers.
Here is how to review the configuration of your WFS Server:
If you are having a problem with a
HashMap
being returned when you expect a Feature, you should check GetCapabilities responses for remote server e.g. with browser:http://{URL}?SERVICE=WFS&VERSION=1.1.0&REQUEST=GetCapabilities
After that, ensure that
<ows:Operation>
elements contain URLs that actually workIn particular check that
DescribeFeatureType
responds with the expected XML Schema
Q: I am in a restricted environment, how to configure SchemaLocator?¶
You can configure the Parser with a SchemaLocator
(we use this internally to force the parser
to a copy of the GML schema included in the GeoTools jars; rather than force the Parser to download
the GML schema each time).
To customize how XML Schemas are located in a restricted environment (such as web portal for multiple WFS services that require authorization, or require the use of an HTTP proxy for schema requests).
Create custom SchemaLocator we start with configuration like so:
GMLConfiguration configuration = new GMLConfiguration() { public void configureContext(final MutablePicoContainer container) { super.configureContext(container); String username = "geotools user"; String password = "support osgeo"; Boolean useProxy = true; XSDSchemaLocator locator = new CachingSchemaLocator(username, password, useProxy); QName key = new QName("mycustom", "schemaLocator"); container.registerComponentInstance(key, locator); } };
In the above code, CachingSchemaLocator is a custom XSDSchemaLocator:
public class CachingSchemaLocator implements XSDSchemaLocator { public XSDSchema locateSchema(XSDSchema schema, String namespaceURI, String rawSchemaLocationURI, String resolvedSchemaLocationURI) { ... Implementation ... } }
Set up a configuration for use with the Parser:
Parser parser = new Parser(configuration); parser.setValidating(false); parser.setFailOnValidationError(false); parser.setStrict(false); FeatureCollection<SimpleFeatureType, SimpleFeature> features = parser.parse(... WFS response InputStream Here ...);