Shapefile Plugin

Allows the GeoTools library to work with ESRI shapefiles.

References

Maven:

<dependency>
  <groupId>org.geotools</groupId>
  <artifactId>gt-shapefile</artifactId>
  <version>${geotools.version}</version>
</dependency>

Connection Parameters

The following connection parameters are available:

Parameter             Description
url                   A URL of the file, ending in shp or shp.gz (or in dbf or dbf.gz)
namespace             Optional: URI to use for the FeatureType
create spatial index  Optional: use Boolean.TRUE to create a spatial index if one is missing; if false, no index is created (defaults to true)
charset               Optional: Charset used to decode strings in the DBF file
timezone              Optional: Timezone used to parse dates in the DBF file
memory mapped buffer  Optional: memory map the files (not recommended for large files under Windows; defaults to false)
cache memory maps     Optional: when memory mapping, cache and reuse memory maps (defaults to true)
enable spatial index  Optional: if false, the spatial index is not used even if available (and is not created if missing)

This information is also in the javadocs.

Internally gt-shapefile provides two implementations at this time: one for simple access and another that supports the use of an index. The factory sorts out which one is appropriate when you use DataStoreFinder or FileDataStoreFinder.
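
As a rough illustration of the connection parameters listed above, the following sketch builds a parameter map in plain Java. The class name ShapefileParams and the chosen values are assumptions made for the example; only the keys come from the parameter list.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.TimeZone;

// Hypothetical helper, not part of GeoTools.
final class ShapefileParams {

    /** Builds a parameter map using the keys from the connection parameter list. */
    static Map<String, Object> forFile(File shp) {
        Map<String, Object> params = new HashMap<>();
        try {
            params.put("url", shp.toURI().toURL());
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Cannot convert to URL: " + shp, e);
        }
        // Example values only; every key except "url" is optional.
        params.put("charset", StandardCharsets.UTF_8);
        params.put("timezone", TimeZone.getTimeZone("UTC"));
        params.put("memory mapped buffer", Boolean.FALSE);
        params.put("create spatial index", Boolean.TRUE);
        return params;
    }

    public static void main(String[] args) {
        Map<String, Object> params = forFile(new File("example.shp"));
        // With GeoTools on the classpath, this map would be handed to
        // DataStoreFinder.getDataStore(params).
        params.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

The map is deliberately typed as Map<String, Object> because the values mix URL, Charset, TimeZone and Boolean instances.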

Shapefile

A Shapefile is a common file format which contains numerous features of the same type. Each shapefile has a single feature type.

The classic three files:

  • filename.shp: shapes

  • filename.shx: shapes to attributes index

  • filename.dbf: attributes

Basic metadata:

  • filename.prj: projection

Open source extensions:

  • filename.qix: quadtree spatial index

  • filename.fix: feature id index

  • filename.sld: Styled Layer Descriptor style XML object

ESRI proprietary extensions (ignored by GeoTools):

  • filename.sbn: attribute index

  • filename.sbx: spatial index

  • filename.lyr: ArcMap-only style object

  • filename.avl: ArcView style object

  • filename.shp.xml: FGDC metadata

This style of file format (from the dawn of time) is referred to as “sidecar” files. At a minimum, the file filename.shp and its sidecar file filename.dbf are needed.

If the DataStore is used for reading only, the files may be gzip-ped and marked by the additional filename extension .gz.

If the shp or shp.gz file is missing, features are furnished without geometries, so only a dbf or a dbf.gz file needs to be present. The given URL may end in shp, shp.gz, dbf or dbf.gz.
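
To check which of these files actually accompany a shapefile before opening it, something along the following lines may help. This is a plain-Java sketch that does not use GeoTools; the class name SidecarFiles and the selection of extensions are assumptions for the example, and gzip-ped variants are not considered.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of GeoTools.
final class SidecarFiles {

    // The classic files, the projection file and the open source index files.
    private static final String[] EXTENSIONS = {"shp", "shx", "dbf", "prj", "qix", "fix"};

    /** Returns the extensions of the files that exist next to the given file. */
    static List<String> present(Path anyMember) {
        String name = anyMember.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot < 0 ? name : name.substring(0, dot);
        List<String> found = new ArrayList<>();
        for (String ext : EXTENSIONS) {
            if (Files.exists(anyMember.resolveSibling(base + "." + ext))) {
                found.add(ext);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Path shp = Paths.get(args.length > 0 ? args[0] : "example.shp");
        System.out.println("Found: " + present(shp));
    }
}
```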

Access

Working with an Existing Shapefile:

        File file = new File("example.shp");
        Map<String, Object> map = new HashMap<>();
        map.put("url", file.toURI().toURL());

        DataStore dataStore = DataStoreFinder.getDataStore(map);
        String typeName = dataStore.getTypeNames()[0];

        FeatureSource<SimpleFeatureType, SimpleFeature> source = dataStore.getFeatureSource(typeName);
        Filter filter = Filter.INCLUDE; // ECQL.toFilter("BBOX(THE_GEOM, 10,20,30,40)")

        FeatureCollection<SimpleFeatureType, SimpleFeature> collection = source.getFeatures(filter);
        try (FeatureIterator<SimpleFeature> features = collection.features()) {
            while (features.hasNext()) {
                SimpleFeature feature = features.next();
                System.out.print(feature.getID());
                System.out.print(": ");
                System.out.println(feature.getDefaultGeometryProperty().getValue());
            }
        }

Creating

Here is a quick example:

        FileDataStoreFactorySpi factory = new ShapefileDataStoreFactory();

        File file = new File("my.shp");
        Map<String, ?> map = Collections.singletonMap("url", file.toURI().toURL());

        DataStore myData = factory.createNewDataStore(map);
        SimpleFeatureType featureType =
                DataUtilities.createType("my", "geom:Point,name:String,age:Integer,description:String");
        myData.createSchema(featureType);

The featureType above was created the quick way; in your application you may wish to use a SimpleFeatureTypeBuilder instead.

Supports:

  • attribute names must be 10 characters or fewer, or you will get a warning

  • a single geometry column named the_geom (stored in the SHP file)

    • LineString, MultiLineString

    • Polygon, MultiPolygon

    • Point, MultiPoint

    Geometries can also contain a measure (M) value or Z & M values.

  • “simple” attributes (stored in the DBF file)

    • String max length of 255

    • Integer

    • Double

    • Boolean

    • Date - TimeStamp interpretation that is just the date

Limitations:

  • Only works with MultiLineString, MultiPolygon or MultiPoint. GIS data often travels in herds - so being restricted to the plural form is not a great limitation.

  • Only works with fixed length strings (you will find the FeatureType has a restriction to help you check this, and warnings will be produced if your content ends up trimmed).

  • Only supports a single GeometryAttribute

  • Shapefile does not support plain Geometry (i.e. mixed LineString, Point and Polygon all in the same file).

  • The shapefile maximum size is limited to 2GB (its sidecar DBF file is often limited to 2GB as well, although some systems are able to read 4GB or more)

  • Dates do not support the storage of time by default. If you must store time stamps and do not need interoperability then you can enable the storage of time in date columns by setting the system property org.geotools.shapefile.datetime to “true”. Almost no other program will be able to read these files.
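
The system property mentioned in the last item can be set in code or on the command line (-Dorg.geotools.shapefile.datetime=true). The sketch below shows the in-code form; the class name ShapefileDateTime is an assumption, and setting the property before any shapefile code runs is a cautious guess about when the library reads it.

```java
// Hypothetical helper, not part of GeoTools.
final class ShapefileDateTime {

    static final String PROPERTY = "org.geotools.shapefile.datetime";

    /** Opts in to storing time stamps in date columns (not interoperable). */
    static void enable() {
        System.setProperty(PROPERTY, "true");
    }

    public static void main(String[] args) {
        // Call this early, before any shapefile is opened or written.
        enable();
        System.out.println(PROPERTY + " = " + System.getProperty(PROPERTY));
    }
}
```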

Dumping almost anything into a shapefile

If the feature collection you want to turn into a shapefile does not fit the limitations of the format, you can still create shapefiles from it with little effort by leaving all the structural bridging work to the ShapefileDumper class.

In particular, given one or more feature collections, the dumper will:

  • Reduce attribute names to the length DBF accepts, making sure there are no conflicts (counters are added at the end of the attribute name to handle this).

  • Fan out multiple geometry types into parallel shapefiles, named after the original feature type, plus the geometry type as a suffix.

  • Fan out multiple shapefiles in case the maximum size is reached.
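
The first step can be pictured with a small plain-Java sketch. It is not the ShapefileDumper implementation, only one possible way to shorten names and resolve conflicts with a counter; the class name DbfNames and the 10-character limit used here are assumptions of the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only; ShapefileDumper may resolve conflicts differently.
final class DbfNames {

    static final int MAX_LENGTH = 10; // assumed DBF field name limit

    /** Shortens names to MAX_LENGTH, appending a counter when a name is already taken. */
    static List<String> shorten(List<String> names) {
        Set<String> used = new HashSet<>();
        List<String> result = new ArrayList<>();
        for (String name : names) {
            String candidate = name.length() > MAX_LENGTH ? name.substring(0, MAX_LENGTH) : name;
            int counter = 1;
            // Set.add returns false when the candidate is already in use.
            while (!used.add(candidate)) {
                String suffix = Integer.toString(counter++);
                int keep = Math.min(name.length(), MAX_LENGTH - suffix.length());
                candidate = name.substring(0, keep) + suffix;
            }
            result.add(candidate);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(shorten(Arrays.asList("population_2010", "population_2020", "name")));
    }
}
```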

Example usage:

        ShapefileDumper dumper = new ShapefileDumper(new File("./target/demo"));
        // optional, set a target charset (ISO-8859-1 is the default)
        dumper.setCharset(Charset.forName("ISO-8859-15"));
        // split when shp or dbf reaches 100MB
        int maxSize = 100 * 1024 * 1024;
        dumper.setMaxShpSize(maxSize);
        dumper.setMaxDbfSize(maxSize);
        // actually dump data
        SimpleFeatureCollection fc = getFeatureCollection();
        dumper.dump(fc);

Force Projection

If you run the above code, and then load the result in a GIS application like ArcMap it will complain that the projection is unknown.

You can “force” the projection using the following code:

// "shape" is the ShapefileDataStore you are writing to
CoordinateReferenceSystem crs = CRS.decode("EPSG:4326");
shape.forceSchemaCRS(crs);

This is only a problem if you did not specify the CoordinateReferenceSystem as part of your FeatureType’s GeometryAttribute, or if a prj file has not been provided.

Character Sets

If you are working with Arabic, Chinese or Korean character sets you will need to make use of the charset connection parameter when setting up your shapefile. The codes used here are the same as those defined for the Java Charset class. You can provide either a Charset or a String; a String will be converted to a Charset.
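
The following plain-Java sketch shows why the charset matters and how a value given either as a Charset or as a String can be normalized. The class name CharsetParam is an assumption, and the conversion shown is only an illustration of the idea, not the code GeoTools uses.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of GeoTools.
final class CharsetParam {

    /** Accepts either a Charset or a charset name. */
    static Charset resolve(Object value) {
        if (value instanceof Charset) {
            return (Charset) value;
        }
        return Charset.forName(String.valueOf(value));
    }

    public static void main(String[] args) {
        // "Seoul" written in Hangul, encoded as UTF-8 bytes.
        byte[] raw = "\uC11C\uC6B8".getBytes(StandardCharsets.UTF_8);
        // Decoding with the matching charset restores the text.
        System.out.println(new String(raw, resolve("UTF-8")));
        // Decoding with the wrong charset produces unreadable characters.
        System.out.println(new String(raw, resolve(StandardCharsets.ISO_8859_1)));
    }
}
```

Names such as EUC-KR or GBK follow the same pattern, although which charsets are available depends on the Java runtime in use.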

Thanks to the University of Seoul for providing and testing this functionality.

Timezone

The store will build dates using the default timezone. If you need to work with meteorological data, the timezone normally has to be forced to “UTC” instead.
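
To see why the timezone matters, consider a date stored as plain year, month and day. The sketch below uses java.time rather than GeoTools, and the class name DbfDateZones is an assumption; it only shows that the same stored date maps to different instants depending on the zone used to interpret it.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Illustrative only, not part of GeoTools.
final class DbfDateZones {

    /** Interprets a YYYYMMDD date as midnight in the given zone. */
    static Instant startOfDay(String yyyymmdd, ZoneId zone) {
        LocalDate date = LocalDate.parse(yyyymmdd, DateTimeFormatter.BASIC_ISO_DATE);
        return date.atStartOfDay(zone).toInstant();
    }

    public static void main(String[] args) {
        Instant utc = startOfDay("20240115", ZoneId.of("UTC"));
        Instant seoul = startOfDay("20240115", ZoneId.of("Asia/Seoul"));
        System.out.println("UTC:        " + utc);
        System.out.println("Asia/Seoul: " + seoul);
        System.out.println("Difference: " + Duration.between(seoul, utc));
    }
}
```

When passing the timezone connection parameter, a value such as TimeZone.getTimeZone("UTC") is the likely form, though the exact value type should be confirmed in the javadocs.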

Reading PRJ

You can use the CRS utility class to read the PRJ file if required. The contents of the file are in “well known text”:

CoordinateReferenceSystem crs = CRS.parseWKT(wkt);
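
The wkt string has to come from somewhere; a minimal plain-Java sketch for loading it from the prj sidecar might look like this. The class name PrjFiles is an assumption, as is reading the file as UTF-8 (prj files are normally plain ASCII).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of GeoTools.
final class PrjFiles {

    /** Returns the path of the .prj sidecar that belongs to the given .shp file. */
    static Path prjFor(Path shp) {
        String name = shp.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot < 0 ? name : name.substring(0, dot);
        return shp.resolveSibling(base + ".prj");
    }

    /** Reads the well known text stored in the .prj sidecar. */
    static String readWkt(Path shp) {
        try {
            byte[] bytes = Files.readAllBytes(prjFor(shp));
            return new String(bytes, StandardCharsets.UTF_8).trim();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path shp = Paths.get(args.length > 0 ? args[0] : "example.shp");
        if (!Files.exists(prjFor(shp))) {
            System.out.println("No prj file next to " + shp);
            return;
        }
        String wkt = readWkt(shp);
        // With GeoTools on the classpath: CRS.parseWKT(wkt)
        System.out.println(wkt);
    }
}
```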

Reading DBF

A shapefile actually comprises a core shp file and a number of “sidecar” files. One of the sidecar files is a dbf file used to record attributes. This is the original DBF file format introduced by “dBase III”, one of the grandfather databases.

        File file = new File("my.shp");
        FileDataStore myData = FileDataStoreFinder.getDataStore(file);
        SimpleFeatureSource source = myData.getFeatureSource();
        SimpleFeatureType schema = source.getSchema();

        Query query = new Query(schema.getTypeName());
        query.setMaxFeatures(1);

        FeatureCollection<SimpleFeatureType, SimpleFeature> collection = source.getFeatures(query);
        try (FeatureIterator<SimpleFeature> features = collection.features()) {
            while (features.hasNext()) {
                SimpleFeature feature = features.next();
                System.out.println(feature.getID() + ": ");
                for (Property attribute : feature.getProperties()) {
                    System.out.println("\t" + attribute.getName() + ":" + attribute.getValue());
                }
            }
        }

The GeoTools library includes just enough DBF file format support to get out of bed in the morning; indeed you should consider these facilities an internal detail of our shapefile reading code.

Thanks to Larry Reeder from the user list for supplying the following code example:

// Here's an example that should work (warning, I haven't
// tried to compile this).  The example assumes the first field has a
// character data type and the second has a numeric data type:

FileInputStream fis = new FileInputStream("yourfile.dbf");
DbaseFileReader dbfReader =
        new DbaseFileReader(fis.getChannel(), false, Charset.forName("ISO-8859-1"));

while (dbfReader.hasNext()) {
    final Object[] fields = dbfReader.readEntry();

    String field1 = (String) fields[0];
    // Numeric columns may come back as, for example, Integer, Long or Double
    // depending on the column definition, so cast to Number rather than Integer.
    Number field2 = (Number) fields[1];

    System.out.println("DBF field 1 value is: " + field1);
    System.out.println("DBF field 2 value is: " + field2);
}

dbfReader.close();
fis.close();
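
For orientation only, here is a plain-Java sketch of the information held in a DBF header: the record count and the field descriptors. It assumes the classic dBASE III layout (little-endian record count at byte 4, header length at byte 8, 32-byte field descriptors starting at byte 32 and terminated by 0x0D). The class name DbfHeaderSketch is hypothetical, and none of this is GeoTools code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Illustrative only; assumes the dBASE III header layout.
final class DbfHeaderSketch {

    final int recordCount;
    final List<String> fields = new ArrayList<>(); // formatted as NAME:TYPE(length)

    DbfHeaderSketch(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        recordCount = buf.getInt(4); // bytes 4-7: number of records
        int headerLength = Math.min(buf.getShort(8) & 0xFFFF, data.length); // bytes 8-9
        // Each field descriptor is 32 bytes; the list ends with a 0x0D byte.
        for (int pos = 32; pos + 32 <= headerLength && data[pos] != 0x0D; pos += 32) {
            String name = new String(data, pos, 11, StandardCharsets.US_ASCII);
            int end = name.indexOf('\0');
            if (end >= 0) {
                name = name.substring(0, end);
            }
            char type = (char) data[pos + 11]; // C, N, F, D, L, ...
            int length = data[pos + 16] & 0xFF;
            fields.add(name + ":" + type + "(" + length + ")");
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: DbfHeaderSketch <file.dbf>");
            return;
        }
        // Reads the whole file for simplicity; only the header is inspected.
        DbfHeaderSketch header = new DbfHeaderSketch(Files.readAllBytes(Paths.get(args[0])));
        System.out.println(header.recordCount + " records, fields " + header.fields);
    }
}
```

For real work the DataStore API shown earlier remains the better choice, since it also handles charsets, dates and deleted records.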