Implementing FeatureSource
Now with the setup out of the way we can get to work.
CSVDataStore
The first step is to create a basic DataStore that only supports feature extraction. We will read data from a CSV file into the GeoTools feature model.
To implement a DataStore we will subclass ContentDataStore. This is a helpful base class for making new kinds of content available to GeoTools. The GeoTools library works with an interaction model very similar to a database - with transactions and locks. ContentDataStore is going to handle all of this for us - as long as we can teach it how to access our content.
ContentDataStore requires us to implement the following two methods:
createTypeNames()
createFeatureSource(ContentEntry entry)
The class ContentEntry is a bit of a scratch pad used to keep track of things for each type.
Our initial implementation will result in a read-only datastore for accessing CSV content:
Set up a org.geotools.tutorial.csv package in src/main/java.
To begin, create the file CSVDataStore extending ContentDataStore:

package org.geotools.tutorial.csv;

import com.csvreader.CsvReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Collections;
import java.util.List;
import org.geotools.api.data.Query;
import org.geotools.api.feature.type.Name;
import org.geotools.data.store.ContentDataStore;
import org.geotools.data.store.ContentEntry;
import org.geotools.data.store.ContentFeatureSource;
import org.geotools.feature.NameImpl;

/**
 * DataStore for Comma Separated Value (CSV) files.
 *
 * @author Jody Garnett (Boundless)
 */
public class CSVDataStore extends ContentDataStore {
We are going to be working with a single CSV file
    File file;

    public CSVDataStore(File file) {
        this.file = file;
    }
Add the reader
    /**
     * Allow read access to file; for our package visible "friends". Please close the reader when
     * done.
     *
     * @return CsvReader for file
     */
    CsvReader read() throws IOException {
        Reader reader = new FileReader(file);
        CsvReader csvReader = new CsvReader(reader);
        return csvReader;
    }
Listing TypeNames
A DataStore may provide access to several different data products. The method createTypeNames provides a list of the information being published.
After all that lead-in you will be disappointed to note that our list will be a single value - the name of the CSV file.

    protected List<Name> createTypeNames() throws IOException {
        String name = file.getName();
        name = name.substring(0, name.lastIndexOf('.'));

        Name typeName = new NameImpl(name);
        return Collections.singletonList(typeName);
    }
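The substring call in createTypeNames assumes the file name contains a dot. The following is a minimal sketch of a more defensive variant, written in plain Java with no GeoTools dependency. The class and method names (TypeNameSketch, baseName) are hypothetical and are not part of the tutorial or of GeoTools.

```java
import java.io.File;

/** Hypothetical helper (not part of GeoTools): derive a type name from a file name. */
class TypeNameSketch {

    /** Strip the last extension; names without an extension are returned unchanged. */
    static String baseName(File file) {
        String name = file.getName();
        int dot = name.lastIndexOf('.');
        // lastIndexOf returns -1 when there is no dot, and substring(0, -1) would throw.
        // A leading dot (index 0) is treated as part of the name rather than an extension.
        return dot > 0 ? name.substring(0, dot) : name;
    }

    public static void main(String[] args) {
        System.out.println(baseName(new File("data/locations.csv")));
        System.out.println(baseName(new File("README")));
    }
}
```

In this tutorial the factory only accepts paths ending in .csv, so the simpler code is adequate; the guard matters only if a CSVDataStore is constructed directly with an arbitrary file.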
Next we have the createFeatureSource method.
This is used to create a FeatureSource which is used by client code to access content. There is no cache for FeatureSource instances as they are managed directly by client code. Don’t worry, that is not as terrible as it sounds: we do track all the information and resources made available to a FeatureSource in a ContentEntry data structure. You can think of the FeatureSource instances sent out into the wild as lightweight wrappers around ContentEntry.
It is worth talking a little bit about ContentEntry, which is passed into this method as a parameter. ContentEntry is used as a scratchpad holding all recorded information about the content we are working with. ContentEntry also contains a back pointer to the ContentDataStore in case your implementation of FeatureSource needs to phone home.
Implement createFeatureSource. Technically the ContentEntry is provided as a “parameter object” holding the type name requested by the user, and any other context known to the DataStore.

    @Override
    protected ContentFeatureSource createFeatureSource(ContentEntry entry) throws IOException {
        return new CSVFeatureSource(entry, Query.ALL);
    }
CSVFeatureSource
Next we can create the CSVFeatureSource mentioned above. This class is responsible for providing access to the contents of our CSVDataStore.
Note
The distinction between DataStore and FeatureSource can be difficult to demonstrate as our example consists of a single file. If it helps, DataStore is an object representing the file, service or database. FeatureSource meanwhile represents the contents, data product, or table being published.
Create the file CSVFeatureSource.

package org.geotools.tutorial.csv;

import com.csvreader.CsvReader;
import java.io.IOException;
import org.geotools.api.data.FeatureReader;
import org.geotools.api.data.Query;
import org.geotools.api.feature.simple.SimpleFeature;
import org.geotools.api.feature.simple.SimpleFeatureType;
import org.geotools.api.filter.Filter;
import org.geotools.data.store.ContentEntry;
import org.geotools.data.store.ContentFeatureSource;
import org.geotools.feature.simple.SimpleFeatureTypeBuilder;
import org.geotools.geometry.jts.ReferencedEnvelope;
import org.geotools.referencing.crs.DefaultGeographicCRS;
import org.locationtech.jts.geom.Point;

/**
 * Read-only access to CSV File.
 *
 * @author Jody Garnett (Boundless)
 */
public class CSVFeatureSource extends ContentFeatureSource {

    public CSVFeatureSource(ContentEntry entry, Query query) {
        super(entry, query);
    }
To assist others we can type narrow our getDataStore() method to explicitly return a CSVDataStore. In addition to being accurate, this prevents a lot of casts, resulting in more readable code.

    /** Access parent CSVDataStore. */
    public CSVDataStore getDataStore() {
        return (CSVDataStore) super.getDataStore();
    }
The method getReaderInternal(Query) is used to provide streaming access to our data - reading one feature at a time. The CSVFeatureReader returned is similar to an iterator, and is implemented in the next section.

    protected FeatureReader<SimpleFeatureType, SimpleFeature> getReaderInternal(Query query)
            throws IOException {
        return new CSVFeatureReader(getState(), query);
    }
Note
The DataStore interface provides a wide range of functionality for client code to access feature content.
Here at the implementation level we provide a single implementation of getReaderInternal. This method is used by the super class ContentFeatureSource to access our content. All the additional functionality, from filtering to transaction independence, is implemented using a combination of wrappers and post-processing.
ContentFeatureSource supports two common optimizations out of the box.
You are required to implement the abstract method getCountInternal(Query) using any tips or tricks available to return a count of available features. If there is no quick way to generate this information, returning -1 indicates that the Query must be handled feature by feature.
For CSV files we can check to see if the Query includes all features - in which case we can skip over the header and quickly count the number of records in our file. This is much faster than building each feature one at a time.

    protected int getCountInternal(Query query) throws IOException {
        if (query.getFilter() == Filter.INCLUDE) {
            CsvReader reader = getDataStore().read();
            try {
                boolean connect = reader.readHeaders();
                if (!connect) {
                    throw new IOException("Unable to connect");
                }
                int count = 0;
                while (reader.readRecord()) {
                    count += 1;
                }
                return count;
            } finally {
                reader.close();
            }
        }
        return -1; // feature by feature scan required to count records
    }
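The header-skipping count can be illustrated in isolation. The sketch below is a simplified stand-in that uses only the JDK rather than the CsvReader library used by the tutorial; the class name CountSketch and the method countRecords are hypothetical. It counts physical lines, so unlike a real CSV parser it would miscount quoted fields that contain line breaks.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical stand-in for the "skip the header, count the rest" optimization. */
class CountSketch {

    /**
     * Count the non-empty lines that follow the header line.
     * The caller remains responsible for closing the supplied reader.
     */
    static long countRecords(Reader source) {
        BufferedReader reader = new BufferedReader(source);
        return reader.lines()
                .skip(1) // the first line is the header, not a record
                .filter(line -> !line.isEmpty())
                .count();
    }

    public static void main(String[] args) {
        String csv = "LAT,LON,CITY\n46.06,11.11,Trento\n45.07,7.68,Turin\n";
        System.out.println(countRecords(new StringReader(csv)));
    }
}
```

The design point is the same as in getCountInternal: when no filter has to be evaluated, nothing needs to be converted into a feature, so the count costs one pass over the raw text.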
The second optimization requires an implementation of getBoundsInternal(Query), making use of any spatial index or header that records the data bounds. This value is used when rendering to determine the clipping area.

    /**
     * Implementation that generates the total bounds (many file formats record this information in
     * the header)
     */
    protected ReferencedEnvelope getBoundsInternal(Query query) throws IOException {
        return null; // feature by feature scan required to establish bounds
    }
The next bit of work involves declaring what kind of information we have available.
In database terms the schema for a table is defined by the columns and the order they are declared in.
The FeatureType generated here is based on the CSV header, along with a few educated guesses to recognize LAT and LON columns as comprising a single Location.

    protected SimpleFeatureType buildFeatureType() throws IOException {
        SimpleFeatureTypeBuilder builder = new SimpleFeatureTypeBuilder();
        builder.setName(entry.getName());

        // read headers
        CsvReader reader = getDataStore().read();
        try {
            boolean success = reader.readHeaders();
            if (!success) {
                throw new IOException("Header of CSV file not available");
            }

            // we are going to hard code a point location
            // columns like lat and lon will be gathered into a
            // Point called Location
            builder.setCRS(DefaultGeographicCRS.WGS84); // <- Coordinate reference system
            builder.add("Location", Point.class);

            for (String column : reader.getHeaders()) {
                if ("lat".equalsIgnoreCase(column)) {
                    continue; // skip as it is part of Location
                }
                if ("lon".equalsIgnoreCase(column)) {
                    continue; // skip as it is part of Location
                }
                builder.add(column, String.class);
            }

            // build the type (it is immutable and cannot be modified)
            return builder.buildFeatureType();
        } finally {
            reader.close();
        }
    }
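To make the header-to-schema mapping concrete, here is a small sketch in plain Java that reproduces only the naming logic: a leading Location attribute, followed by every column that is not lat or lon. It does not use SimpleFeatureTypeBuilder, and the names SchemaSketch and attributeNames are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of how CSV headers map to attribute names. */
class SchemaSketch {

    /** Returns the attribute names in schema order for the given header row. */
    static List<String> attributeNames(String[] headers) {
        List<String> names = new ArrayList<>();
        names.add("Location"); // the geometry attribute is always declared first
        for (String column : headers) {
            if ("lat".equalsIgnoreCase(column) || "lon".equalsIgnoreCase(column)) {
                continue; // folded into Location
            }
            names.add(column);
        }
        return names;
    }

    public static void main(String[] args) {
        String[] headers = {"LAT", "LON", "CITY", "NUMBER"};
        System.out.println(attributeNames(headers));
        System.out.println(attributeNames(new String[] {"CITY"}));
        System.out.println(Arrays.toString(headers));
    }
}
```

Note that, as in buildFeatureType, a Location attribute is declared even when the header has no lat or lon column; a production implementation would probably want to detect that case.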
CSVFeatureReader
FeatureReader is similar to the Java Iterator construct, with the addition of FeatureType (and IOExceptions).
The class ContentState is available to store any state required. Out of the box ContentState provides a cache of FeatureType, count and bounds. You are encouraged to create your own subclass of ContentState to track additional state - examples include security credentials or a database connection.
Note
Sub-classing ContentState is a key improvement made for ContentDataStore. In our earlier base class we noticed many developers creating HashMaps to cache individual results in an effort to improve performance. Inevitably there would be difficulty keeping these caches in sync. Breaking out an object to handle the state required for data access is a vast improvement.
FeatureReader interface:
FeatureReader.getFeatureType()
FeatureReader.next()
FeatureReader.hasNext()
FeatureReader.close()
To implement our FeatureReader, we will need to do several things: open a File and read through it line by line, parsing Features as we go. Because this class actually does some work, we are going to include a few more comments in the code to keep our heads on straight.
Create the class CSVFeatureReader as follows:

/*
 *    GeoTools Sample code and Tutorials by Open Source Geospatial Foundation, and others
 *    https://docs.geotools.org
 *
 *    To the extent possible under law, the author(s) have dedicated all copyright
 *    and related and neighboring rights to this software to the public domain worldwide.
 *    This software is distributed without any warranty.
 *
 *    You should have received a copy of the CC0 Public Domain Dedication along with this
 *    software. If not, see <http://creativecommons.org/publicdomain/zero/1.0/>.
 */
package org.geotools.tutorial.csv;

import com.csvreader.CsvReader;
import java.io.IOException;
import java.util.NoSuchElementException;
import org.geotools.api.data.FeatureReader;
import org.geotools.api.data.Query;
import org.geotools.api.feature.IllegalAttributeException;
import org.geotools.api.feature.simple.SimpleFeature;
import org.geotools.api.feature.simple.SimpleFeatureType;
import org.geotools.data.store.ContentState;
import org.geotools.feature.simple.SimpleFeatureBuilder;
import org.geotools.geometry.jts.JTSFactoryFinder;
import org.locationtech.jts.geom.Coordinate;
import org.locationtech.jts.geom.GeometryFactory;

public class CSVFeatureReader implements FeatureReader<SimpleFeatureType, SimpleFeature> {

    /** State used when reading file */
    protected ContentState state;

    /**
     * Current row number - used in the generation of FeatureId. TODO: Subclass ContentState to
     * track row
     */
    private int row;

    protected CsvReader reader;

    /** Utility class used to build features */
    protected SimpleFeatureBuilder builder;

    /** Factory class for geometry creation */
    private GeometryFactory geometryFactory;

    public CSVFeatureReader(ContentState contentState, Query query) throws IOException {
        this.state = contentState;
        CSVDataStore csv = (CSVDataStore) contentState.getEntry().getDataStore();
        reader = csv.read(); // this may throw an IOException if it could not connect
        boolean header = reader.readHeaders();
        if (!header) {
            throw new IOException("Unable to read csv header");
        }
        builder = new SimpleFeatureBuilder(state.getFeatureType());
        geometryFactory = JTSFactoryFinder.getGeometryFactory(null);
        row = 0;
    }

    /** Access FeatureType (documenting available attributes) */
    public SimpleFeatureType getFeatureType() {
        return state.getFeatureType();
    }
}
Implement the iterator next() and hasNext() methods using a field to hold the value to return next.

    /** The next feature */
    private SimpleFeature next;

    /**
     * Access the next feature (if available).
     *
     * @return SimpleFeature read from CSV file
     * @throws IOException If problem encountered reading file
     * @throws IllegalAttributeException for invalid data
     * @throws NoSuchElementException If hasNext() indicates no more features are available
     */
    public SimpleFeature next()
            throws IOException, IllegalArgumentException, NoSuchElementException {
        SimpleFeature feature;
        if (next != null) {
            feature = next;
            next = null;
        } else {
            feature = readFeature();
        }
        if (feature == null) {
            // honour the documented contract rather than handing back null
            throw new NoSuchElementException("No more features are available");
        }
        return feature;
    }

    /**
     * Check if additional content is available.
     *
     * @return <code>true</code> if additional content is available
     */
    public boolean hasNext() throws IOException {
        if (next != null) {
            return true;
        } else {
            next = readFeature(); // read next feature so we can check
            return next != null;
        }
    }
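The look-ahead technique used by next() and hasNext() is independent of GeoTools and can be shown on its own. The sketch below wraps any java.util.Iterator and buffers one element in a field, so that hasNext() may be called repeatedly without skipping data. The class name LookAheadSketch is hypothetical, and the sketch assumes the source never yields null, since null is used to mean "nothing buffered".

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical one-element look-ahead wrapper; null marks an empty buffer. */
class LookAheadSketch<T> {
    private final Iterator<T> source;
    private T next; // buffered element, or null when nothing is buffered

    LookAheadSketch(Iterator<T> source) {
        this.source = source;
    }

    /** Plays the role of readFeature(): returns null once the source is exhausted. */
    private T read() {
        return source.hasNext() ? source.next() : null;
    }

    boolean hasNext() {
        if (next == null) {
            next = read(); // read ahead so we can answer the question
        }
        return next != null;
    }

    T next() {
        if (!hasNext()) {
            throw new NoSuchElementException("No more elements are available");
        }
        T result = next;
        next = null; // hand the buffered element over exactly once
        return result;
    }

    public static void main(String[] args) {
        LookAheadSketch<String> rows = new LookAheadSketch<>(Arrays.asList("a", "b").iterator());
        while (rows.hasNext()) {
            System.out.println(rows.next());
        }
    }
}
```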
Note
The next() and hasNext() methods are allowed to throw IOExceptions, making these methods easy to implement. Most client code will use this implementation behind a FeatureIterator wrapper that converts any problems to a RuntimeException. A classic ease of implementation vs ease of use trade-off.
Using the CsvReader library to parse the content saves a lot of work - and lets us focus on building features. The utility class SimpleFeatureBuilder gathers up state, employing a FeatureFactory on your behalf to construct each feature.

    /** Read a line of content from CsvReader and parse into values */
    SimpleFeature readFeature() throws IOException {
        if (reader == null) {
            throw new IOException("FeatureReader is closed; no additional features can be read");
        }
        boolean read = reader.readRecord(); // read the "next" record
        if (!read) {
            close(); // automatic close to be nice
            return null; // no additional features are available
        }
        Coordinate coordinate = new Coordinate();
        for (String column : reader.getHeaders()) {
            String value = reader.get(column);
            if ("lat".equalsIgnoreCase(column)) {
                coordinate.y = Double.valueOf(value.trim());
            } else if ("lon".equalsIgnoreCase(column)) {
                coordinate.x = Double.valueOf(value.trim());
            } else {
                builder.set(column, value);
            }
        }
        builder.set("Location", geometryFactory.createPoint(coordinate));

        return this.buildFeature();
    }

    /** Build feature using the current row number to generate FeatureId */
    protected SimpleFeature buildFeature() {
        row += 1;
        return builder.buildFeature(state.getEntry().getTypeName() + "." + row);
    }
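Two details of readFeature() and buildFeature() are easy to get wrong: latitude maps to y and longitude to x, and the identifier is the type name followed by a dot and a 1-based row number. The sketch below isolates both in plain Java without JTS or GeoTools; RowSketch, toXY and featureId are hypothetical names introduced only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of axis order and identifier generation for one CSV row. */
class RowSketch {

    /** Returns {x, y}, that is {lon, lat}; a missing column leaves its slot as NaN. */
    static double[] toXY(Map<String, String> row) {
        double x = Double.NaN;
        double y = Double.NaN;
        for (Map.Entry<String, String> cell : row.entrySet()) {
            if ("lat".equalsIgnoreCase(cell.getKey())) {
                y = Double.parseDouble(cell.getValue().trim()); // latitude is the y ordinate
            } else if ("lon".equalsIgnoreCase(cell.getKey())) {
                x = Double.parseDouble(cell.getValue().trim()); // longitude is the x ordinate
            }
        }
        return new double[] {x, y};
    }

    /** Prefix the type name ahead of the native identifier (here the row number). */
    static String featureId(String typeName, int row) {
        return typeName + "." + row;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("LAT", " 46.06");
        row.put("LON", "11.11 ");
        row.put("CITY", "Trento");
        double[] xy = toXY(row);
        System.out.println(featureId("locations", 1) + " x=" + xy[0] + " y=" + xy[1]);
    }
}
```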
Note
A key API contract is the construction of a unique FeatureID for each feature in the system. Our convention has been to prefix the typeName ahead of any native identifier (in this case row number). Each FeatureID being unique is a consequence of following the OGC Feature Model used for Web Feature Server. The identifiers created here are employed in the generation of XML documents and need to follow the restrictions on XML identifiers.
Finally we can close() the CSVFeatureReader when it is no longer used, returning any system resources (in this case an open file handle).

    /** Close the FeatureReader when not in use. */
    public void close() throws IOException {
        if (reader != null) {
            reader.close();
            reader = null;
        }
        builder = null;
        geometryFactory = null;
        next = null;
    }
Note
The ContentState is not closed or disposed - as several threads may be making concurrent use of the CSVDataStore.
CSVDataStoreFactory
Now that we have implemented accessing and reading content what could possibly be left?
This is GeoTools, so we need to wire our new creation into the Factory Service Provider (SPI) plug-in system so that applications can smoothly integrate it.
To make your DataStore truly independent and pluggable, you must create a class implementing the DataStoreFactorySpi interface.
This allows the Service Provider Interface mechanism to dynamically plug in your new DataStore with no implementation knowledge. Code that uses the DataStoreFinder can just add the new DataStore to the classpath and it will work!
The DataStoreFactorySpi provides information on the Parameters required for construction.
DataStoreFactoryFinder provides the ability to create DataStores representing existing information and the ability to create new physical storage.
Implementing DataStoreFactorySpi:
The “no argument” constructor is required as it will be used by the Factory Service Provider (SPI) plug-in system.
getImplementationHints() is used to report on any “Hints” used for configuration by our factory. As an example, our factory could allow people to specify a specific FeatureFactory to use when creating a feature for each line.
Create CSVDataStoreFactory as follows:

/*
 *    GeoTools Sample code and Tutorials by Open Source Geospatial Foundation, and others
 *    https://docs.geotools.org
 *
 *    To the extent possible under law, the author(s) have dedicated all copyright
 *    and related and neighboring rights to this software to the public domain worldwide.
 *    This software is distributed without any warranty.
 *
 *    You should have received a copy of the CC0 Public Domain Dedication along with this
 *    software. If not, see <http://creativecommons.org/publicdomain/zero/1.0/>.
 */
package org.geotools.tutorial.csv;

import java.awt.RenderingHints.Key;
import java.io.File;
import java.io.IOException;
import java.util.Collections;
import java.util.Map;
import org.geotools.api.data.DataStore;
import org.geotools.api.data.DataStoreFactorySpi;
import org.geotools.util.KVP;

/** Provide access to CSV Files. */
public class CSVDataStoreFactory implements DataStoreFactorySpi {

    /**
     * Public "no argument" constructor called by Factory Service Provider (SPI) entry listed in
     * META-INF/services/org.geotools.api.data.DataStoreFactorySpi
     */
    public CSVDataStoreFactory() {}

    /** No implementation hints required at this time */
    public Map<Key, ?> getImplementationHints() {
        return Collections.emptyMap();
    }
We have a couple of methods to describe the DataStore.
The isAvailable method is interesting in that it can become a performance bottleneck if not implemented efficiently. DataStoreFactorySpi factories are all called when a user attempts to connect; only the factories marked as available are shortlisted for further interaction.

    public String getDisplayName() {
        return "CSV";
    }

    public String getDescription() {
        return "Comma delimited text file.";
    }

    /** Confirm DataStore availability, null if unknown */
    Boolean isAvailable = null;

    /**
     * Test to see if this DataStore is available, for example if it has all the appropriate
     * libraries to construct an instance.
     *
     * <p>This method is used for interactive applications, so as to not advertise support for
     * formats that will not function.
     *
     * @return {@code true} if and only if this factory is available to create DataStores.
     */
    public synchronized boolean isAvailable() {
        if (isAvailable == null) {
            try {
                // only the side effect matters: this throws if the library is missing
                Class.forName("com.csvreader.CsvReader");
                isAvailable = true;
            } catch (ClassNotFoundException e) {
                isAvailable = false;
            }
        }
        return isAvailable;
    }
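The reason isAvailable caches its answer is that the class lookup would otherwise run on every connection attempt. The sketch below shows the same check-once pattern in plain Java; AvailabilitySketch is a hypothetical class, and the class names passed to it in main are placeholders chosen so the example runs without the CsvReader library on the classpath.

```java
/** Hypothetical illustration of a cached, thread-safe availability check. */
class AvailabilitySketch {
    private final String requiredClass;
    private Boolean available; // null means "not checked yet"

    AvailabilitySketch(String requiredClass) {
        this.requiredClass = requiredClass;
    }

    /** Looks the class up on first use only; later calls return the cached answer. */
    synchronized boolean isAvailable() {
        if (available == null) {
            try {
                Class.forName(requiredClass);
                available = true;
            } catch (ClassNotFoundException e) {
                available = false;
            }
        }
        return available;
    }

    public static void main(String[] args) {
        System.out.println(new AvailabilitySketch("java.util.ArrayList").isAvailable());
        System.out.println(new AvailabilitySketch("com.example.missing.NoSuchClass").isAvailable());
    }
}
```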
The user is expected to supply a Map of connection parameters when creating a datastore.
The allowable connection parameters are described using Param[]. Each Param describes a key used to store the value in the map, and the expected Java type for the value. Additional fields indicate if the value is required and if a default value is available.
This array of parameters forms an API contract used to drive the creation of user interfaces.
The API contract is open ended (we cannot hope to guess all the options needed in the future). The helper class KVP provides an easy to use implementation of Map<String,Object>. The keys used here are formally defined as static constants - complete with javadoc describing their use. If several authors agree on a new hint it will be added to these static constants.

    /** Parameter description of information required to connect */
    public static final Param FILE_PARAM =
            new Param(
                    "file",
                    File.class,
                    "Comma separated value file",
                    true,
                    null,
                    new KVP(Param.EXT, "csv"));

    public Param[] getParametersInfo() {
        return new Param[] {FILE_PARAM};
    }
Note
Does anything really use this?
The gt-swing module is able to construct a user interface based on these Param descriptions. The uDig and GeoServer projects have similar auto-generated screens.
Next we have some code to check if a set of provided connection parameters can actually be used.

    /**
     * Works for csv file.
     *
     * @param params connection parameters
     * @return true for connection parameters indicating a csv file
     */
    public boolean canProcess(Map<String, ?> params) {
        try {
            File file = (File) FILE_PARAM.lookUp(params);
            if (file != null) {
                return file.getPath().toLowerCase().endsWith(".csv");
            }
        } catch (IOException e) {
            // ignore as we are expected to return true or false
        }
        return false;
    }
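The decision canProcess makes can be sketched without GeoTools: find the value stored under the "file" key and accept it only if it is a File whose path ends in .csv, ignoring case. The class CanProcessSketch below is hypothetical, and it is stricter than the tutorial code in one respect: it reads the map directly and makes no attempt at type conversion, whereas the tutorial delegates the lookup to FILE_PARAM.lookUp.

```java
import java.io.File;
import java.util.Collections;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for the canProcess decision, using only the JDK. */
class CanProcessSketch {

    /** True only when the "file" entry is a File with a .csv extension. */
    static boolean canProcess(Map<String, ?> params) {
        if (params == null) {
            return false;
        }
        Object value = params.get("file");
        if (value instanceof File) {
            String path = ((File) value).getPath();
            return path.toLowerCase(Locale.ROOT).endsWith(".csv");
        }
        return false; // missing key or unexpected type: answer false rather than throw
    }

    public static void main(String[] args) {
        System.out.println(canProcess(Collections.singletonMap("file", new File("cities.CSV"))));
        System.out.println(canProcess(Collections.singletonMap("file", new File("cities.shp"))));
    }
}
```

Like the original, the method never throws for bad input: callers probe every registered factory with the same parameters, so a plain true or false is the useful answer.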
Armed with a map of connection parameters we can now create a DataStore for an existing CSV file.
Here is the code that finally calls our CSVDataStore constructor:

    public DataStore createDataStore(Map<String, ?> params) throws IOException {
        File file = (File) FILE_PARAM.lookUp(params);
        return new CSVDataStore(file);
    }
How about creating a DataStore for a new CSV file?
Since initially our DataStore is read-only we will just throw an UnsupportedOperationException at this time.

    public DataStore createNewDataStore(Map<String, ?> params) throws IOException {
        throw new UnsupportedOperationException("CSV Datastore is read only");
    }
The Factory Service Provider (SPI) system operates by looking at the META-INF/services folder and checking for implementations of DataStoreFactorySpi.
To “register” our CSVDataStoreFactory please create the following in src/main/resources/:
META-INF/services/org.geotools.api.data.DataStoreFactorySpi
This file requires the fully qualified class name of the factory that implements the DataStoreFactorySpi interface.
Fill in the following content for your org.geotools.api.data.DataStoreFactorySpi file:
org.geotools.tutorial.csv.CSVDataStoreFactory
That is it. In the next section we will try out your new DataStore.