Introducing CSVDataStore
In our initial feature tutorial we provided a code snippet to read in a comma-separated value (CSV) file and produce a feature collection.
In this tutorial we will build a CSV DataStore, and in the process explore several aspects of how DataStores work and how best to make use of them.
If you would like to follow along with this workshop, start a new Java project in your favorite IDE, and ensure GeoTools is on your CLASSPATH (using Maven or by downloading the jars).
Note
Terminology
DataStore borrows most of its concepts (and some of its syntax) from the Open Geospatial Consortium (OGC) Web Feature Service (WFS) Specification:
Feature - atomic unit of geographic information
FeatureType - keeps track of what attributes each Feature can hold
FeatureId - a unique id associated with each Feature (must start with a non-numeric character)
FID - same as FeatureId
Schema - same as FeatureType (familiar to database developers)
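To make the relationships between these terms concrete, here is a minimal plain-Java sketch. The classes TerminologySketch, FeatureType and Feature are hypothetical stand-ins, not the GeoTools API (GeoTools provides SimpleFeatureType and SimpleFeature for this). The FID check only looks at the first character, a simplification of the real identifier rules, and prefixing the FID with the type name is an assumption made so that row-based ids never start with a digit.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical plain-Java model of the terms above; not the GeoTools feature model. */
final class TerminologySketch {

    /** FeatureType (Schema): a name plus the attribute names each Feature can hold. */
    static final class FeatureType {
        final String name;
        final List<String> attributeNames;

        FeatureType(String name, List<String> attributeNames) {
            this.name = name;
            this.attributeNames = List.copyOf(attributeNames);
        }
    }

    /** Feature: one FID plus one value per attribute declared by its FeatureType. */
    static final class Feature {
        final String fid;
        final FeatureType type;
        final Map<String, Object> values = new LinkedHashMap<>();

        Feature(String fid, FeatureType type, List<?> attributeValues) {
            if (!isValidFid(fid)) {
                throw new IllegalArgumentException("FID must not start with a digit: " + fid);
            }
            if (attributeValues.size() != type.attributeNames.size()) {
                throw new IllegalArgumentException(
                        "expected " + type.attributeNames.size() + " values, got " + attributeValues.size());
            }
            this.fid = fid;
            this.type = type;
            // Pair each value with the attribute name at the same position.
            for (int i = 0; i < attributeValues.size(); i++) {
                values.put(type.attributeNames.get(i), attributeValues.get(i));
            }
        }
    }

    /** Simplified FID rule: non-empty and the first character is not a digit. */
    static boolean isValidFid(String fid) {
        return !fid.isEmpty() && !Character.isDigit(fid.charAt(0));
    }

    public static void main(String[] args) {
        FeatureType locations = new FeatureType("locations", List.of("CITY", "NUMBER", "YEAR"));
        Feature trento = new Feature("locations.1", locations, List.of("Trento", 140, 2002));
        System.out.println(trento.fid + " " + trento.values);
        // prints locations.1 {CITY=Trento, NUMBER=140, YEAR=2002}
    }
}
```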
Here is the sample locations.csv file:
LAT, LON, CITY, NUMBER, YEAR
46.066667, 11.116667, Trento, 140, 2002
44.9441, -93.0852, St Paul, 125, 2003
13.752222, 100.493889, Bangkok, 150, 2004
45.420833, -75.69, Ottawa, 200, 2004
44.9801, -93.251867, Minneapolis, 350, 2005
46.519833, 6.6335, Lausanne, 560, 2006
48.428611, -123.365556, Victoria, 721, 2007
-33.925278, 18.423889, Cape Town, 550, 2008
-33.859972, 151.211111, Sydney, 436, 2009
41.383333, 2.183333, Barcelona, 914, 2010
39.739167, -104.984722, Denver, 869, 2011
52.95, -1.133333, Nottingham, 800, 2013
45.52, -122.681944, Portland, 840, 2014
37.5667,129.681944,Seoul,473,2015
50.733992,7.099814,Bonn,700,2016
42.3601, -71.0589, Boston, 800, 2017
The first line of our CSV file is a header that provides the column names:
LAT, LON, CITY, NUMBER, YEAR
Each column name is treated as a simple String. More complicated formats have the option of isolating names into different namespaces.
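Because the sample header has a space after each comma, splitting on commas alone would yield names such as " LON", so each name needs trimming. Here is a minimal sketch using plain string handling; HeaderSketch is a hypothetical name, and unlike a real CSV library it does not handle quoted names that contain commas.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: turn the CSV header line into a list of column names. */
final class HeaderSketch {

    /** Splits on commas and trims each name, so "LAT, LON" yields "LAT" and "LON". */
    static List<String> parseHeader(String headerLine) {
        List<String> names = new ArrayList<>();
        for (String raw : headerLine.split(",")) {
            names.add(raw.trim());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(parseHeader("LAT, LON, CITY, NUMBER, YEAR"));
        // prints [LAT, LON, CITY, NUMBER, YEAR]
    }
}
```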
Each subsequent line is used to capture a single feature of information suitable for mapping.
46.066667, 11.116667, Trento, 140, 2002
In our example the LAT and LON values locate a point at latitude 46.066667 and longitude 11.116667, the CITY is Trento, and the NUMBER 140 and YEAR 2002 capture details of the GRASS users conference (one of the earliest Free and Open Source Software for Geospatial (FOSS4G) events).
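Each data row can be turned into typed values the same way. This sketch (RecordSketch and Location are hypothetical names) assumes the fixed column order of locations.csv and the common convention of storing longitude as x and latitude as y; it prints the result in WKT-style POINT (x y) form.

```java
/** Sketch: convert one data row into typed values, keeping LON as x and LAT as y. */
final class RecordSketch {

    /** Hypothetical holder for one parsed row of locations.csv. */
    static final class Location {
        final double x; // longitude
        final double y; // latitude
        final String city;
        final int number;
        final int year;

        Location(double x, double y, String city, int number, int year) {
            this.x = x;
            this.y = y;
            this.city = city;
            this.number = number;
            this.year = year;
        }
    }

    /** Assumes the column order LAT, LON, CITY, NUMBER, YEAR used by the sample file. */
    static Location parseRecord(String line) {
        String[] v = line.split(",");
        if (v.length != 5) {
            throw new IllegalArgumentException("expected 5 columns but got " + v.length + ": " + line);
        }
        double lat = Double.parseDouble(v[0].trim());
        double lon = Double.parseDouble(v[1].trim());
        return new Location(lon, lat, v[2].trim(),
                Integer.parseInt(v[3].trim()), Integer.parseInt(v[4].trim()));
    }

    public static void main(String[] args) {
        Location trento = parseRecord("46.066667, 11.116667, Trento, 140, 2002");
        System.out.println("POINT (" + trento.x + " " + trento.y + ") " + trento.city);
        // prints POINT (11.116667 46.066667) Trento
    }
}
```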
Approach to Parsing CSV
Here is our strategy for representing GeoTools concepts with a CSV file.
FeatureID or FID - uniquely defines a Feature. We will use the row number in our CSV file.
FeatureType Name - same as the name of the .csv file (e.g. "locations" for locations.csv).
DataStore - we will create a CSVDataStore to access all the FeatureTypes (.csv files) in a directory.
FeatureType or Schema - we will represent the names of the columns in our CSV (and if possible their types).
Geometry - initially we will try to recognize several columns and map them into Point x and y ordinates. This technique is used to handle content from websites such as GeoNames. We can also look at parsing a column using the Well-Known-Text (WKT) representation of a Geometry.
CoordinateReferenceSystem - look for a prj sidecar file (e.g. locations.prj for locations.csv).
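The naming conventions in this strategy can be sketched with plain java.io.File. NamingSketch and its methods are hypothetical helpers; numbering rows from 1 and prefixing the row number with the type name (which also keeps the FID from starting with a digit) are assumptions. The sketch only locates the .prj sidecar; reading and parsing its contents into a CoordinateReferenceSystem is left to GeoTools.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Sketch of the CSV naming conventions; all helper names are hypothetical. */
final class NamingSketch {

    /** FeatureType name: the file name without its ".csv" extension. */
    static String typeName(File csvFile) {
        String name = csvFile.getName();
        return name.toLowerCase(Locale.ROOT).endsWith(".csv")
                ? name.substring(0, name.length() - ".csv".length())
                : name;
    }

    /** FeatureID: the row number, prefixed with the type name (assumed convention). */
    static String featureId(String typeName, int row) {
        return typeName + "." + row;
    }

    /** CoordinateReferenceSystem: the ".prj" sidecar beside the CSV file (it may not exist). */
    static File prjSidecar(File csvFile) {
        return new File(csvFile.getParentFile(), typeName(csvFile) + ".prj");
    }

    /** DataStore: one FeatureType per ".csv" file in the directory, sorted for stable output. */
    static List<String> typeNames(File directory) {
        File[] files = directory.listFiles(
                (dir, name) -> name.toLowerCase(Locale.ROOT).endsWith(".csv"));
        List<String> names = new ArrayList<>();
        if (files != null) { // null if the directory does not exist or cannot be read
            for (File f : files) {
                names.add(typeName(f));
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        File csv = new File("data", "locations.csv");
        String type = typeName(csv);
        System.out.println(type);                      // locations
        System.out.println(featureId(type, 1));        // locations.1
        System.out.println(prjSidecar(csv).getName()); // locations.prj
    }
}
```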
JavaCSV Reader
Rather than go through the joy of parsing a CSV file by hand, we are going to make use of a library to read CSV files.
The JavaCSV project looks nice and simple and is available in Maven as net.sourceforge.javacsv:javacsv:
http://opencsv.sourceforge.net (Apache 2.0)
For our purposes a key benefit of this implementation is streaming - it will read one line at a time and avoid loading the entire file into memory.
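To illustrate what streaming buys us, here is a JDK-only sketch that uses BufferedReader rather than JavaCSV (StreamingSketch is a hypothetical name). Only the current line is held in memory, so memory use does not grow with the file size. Unlike a real CSV parser, it treats every line as one record and ignores quoted fields that span lines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Sketch: read records one line at a time instead of loading the whole file. */
final class StreamingSketch {

    /** Counts non-blank data rows after the header; only one line is in memory at a time. */
    static int countRecords(Reader source) throws IOException {
        int count = 0;
        try (BufferedReader in = new BufferedReader(source)) {
            String header = in.readLine(); // consume the header row
            if (header == null) {
                return 0; // empty input
            }
            String line;
            while ((line = in.readLine()) != null) {
                if (!line.isBlank()) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String csv = "LAT, LON, CITY, NUMBER, YEAR\n"
                + "46.066667, 11.116667, Trento, 140, 2002\n"
                + "44.9441, -93.0852, St Paul, 125, 2003\n";
        System.out.println(countRecords(new StringReader(csv))); // prints 2
    }
}
```

In a real file you would pass a FileReader instead of a StringReader; the loop stays the same.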
References:
Comparison of Java CSV libraries (Robert Bor)
Time to create a new project making use of this library.
Using Eclipse: create a Maven Project with group org.geotools.tutorial and name csv.
Using Maven:
mvn archetype:generate -DgroupId=org.geotools.tutorial -DartifactId=csv -Dversion=1.0-SNAPSHOT -DarchetypeGroupId=org.apache.maven.archetypes -DarchetypeArtifactId=maven-archetype-quickstart
Fill in the project details, paying careful attention to the geotools.version property you wish to use. You can choose a stable release (recommended) or use 32-SNAPSHOT for access to the latest nightly build.
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.geotools.tutorial</groupId>
  <artifactId>csv</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>
  <name>CSV DataStore</name>
  <description>CSV DataStore tutorial</description>
  <url>http://docs.geotools.org/latest/userguide/tutorial/datastore</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <geotools.version>32-SNAPSHOT</geotools.version>
  </properties>
</project>
Add the following dependencies:
<dependencies>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.13.1</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.geotools</groupId>
    <artifactId>gt-main</artifactId>
    <version>${geotools.version}</version>
  </dependency>
  <dependency>
    <groupId>org.geotools</groupId>
    <artifactId>gt-cql</artifactId>
    <version>${geotools.version}</version>
  </dependency>
  <dependency>
    <groupId>org.geotools</groupId>
    <artifactId>gt-epsg-hsql</artifactId>
    <version>${geotools.version}</version>
  </dependency>
  <dependency>
    <groupId>net.sourceforge.javacsv</groupId>
    <artifactId>javacsv</artifactId>
    <version>2.0</version>
  </dependency>
</dependencies>
Available from these repositories:
<repositories>
  <repository>
    <id>osgeo</id>
    <name>OSGeo Release Repository</name>
    <url>https://repo.osgeo.org/repository/release/</url>
    <snapshots><enabled>false</enabled></snapshots>
    <releases><enabled>true</enabled></releases>
  </repository>
  <repository>
    <id>osgeo-snapshot</id>
    <name>OSGeo Snapshot Repository</name>
    <url>https://repo.osgeo.org/repository/snapshot/</url>
    <snapshots><enabled>true</enabled></snapshots>
    <releases><enabled>false</enabled></releases>
  </repository>
</repositories>
Finally we get to switch to Java 11:
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <configuration>
        <source>11</source>
        <target>11</target>
      </configuration>
    </plugin>
  </plugins>
</build>
You can check your work against the completed pom.xml.
Create a directory src/test/resources and in there create the package org.geotools.tutorial.csv (that is, the folder src/test/resources/org/geotools/tutorial/csv). Then add locations.csv to this package.
package: org.geotools.tutorial.csv
file: locations.csv
LAT, LON, CITY, NUMBER, YEAR
46.066667, 11.116667, Trento, 140, 2002
44.9441, -93.0852, St Paul, 125, 2003
13.752222, 100.493889, Bangkok, 150, 2004
45.420833, -75.69, Ottawa, 200, 2004
44.9801, -93.251867, Minneapolis, 350, 2005
46.519833, 6.6335, Lausanne, 560, 2006
48.428611, -123.365556, Victoria, 721, 2007
-33.925278, 18.423889, Cape Town, 550, 2008
-33.859972, 151.211111, Sydney, 436, 2009
41.383333, 2.183333, Barcelona, 914, 2010
39.739167, -104.984722, Denver, 869, 2011
52.95, -1.133333, Nottingham, 800, 2013
45.52, -122.681944, Portland, 840, 2014
37.5667,129.681944,Seoul,473,2015
50.733992,7.099814,Bonn,700,2016
42.3601, -71.0589, Boston, 800, 2017
Download locations.csv.
Below is a JUnit4 test case to confirm JavaCSV is available and can read our file. Create a directory src/test/java and in there create the package org.geotools.tutorial.csv. Then add CSVTest.java
to the package:
/*
 * GeoTools Sample code and Tutorials by Open Source Geospatial Foundation, and others
 * https://docs.geotools.org
 *
 * To the extent possible under law, the author(s) have dedicated all copyright
 * and related and neighboring rights to this software to the public domain worldwide.
 * This software is distributed without any warranty.
 *
 * You should have received a copy of the CC0 Public Domain Dedication along with this
 * software. If not, see <http://creativecommons.org/publicdomain/zero/1.0/>.
 */
package org.geotools.tutorial.csv;

import static org.junit.Assert.assertTrue;

import com.csvreader.CsvReader;
import java.io.File;
import java.io.FileReader;
import java.io.Serializable;
import java.net.URL;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import org.geotools.api.data.DataStore;
import org.geotools.api.data.DataStoreFinder;
import org.geotools.api.data.FeatureReader;
import org.geotools.api.data.Query;
import org.geotools.api.data.SimpleFeatureSource;
import org.geotools.api.data.Transaction;
import org.geotools.api.feature.Property;
import org.geotools.api.feature.simple.SimpleFeature;
import org.geotools.api.feature.simple.SimpleFeatureType;
import org.geotools.api.feature.type.AttributeDescriptor;
import org.geotools.api.feature.type.GeometryDescriptor;
import org.geotools.api.filter.Filter;
import org.geotools.api.filter.FilterFactory;
import org.geotools.api.filter.identity.FeatureId;
import org.geotools.data.DataUtilities;
import org.geotools.data.simple.SimpleFeatureCollection;
import org.geotools.data.simple.SimpleFeatureIterator;
import org.geotools.factory.CommonFactoryFinder;
import org.geotools.feature.DefaultFeatureCollection;
import org.geotools.filter.text.cql2.CQL;
import org.geotools.referencing.CRS;
import org.junit.Test;
import org.locationtech.jts.geom.Geometry;

public class CSVTest {

    @Test
    public void test() throws Exception {
        List<String> cities = new ArrayList<>();
        URL url = CSVTest.class.getResource("locations.csv");
        File file = new File(url.toURI());
        try (FileReader reader = new FileReader(file)) {
            CsvReader locations = new CsvReader(reader);
            locations.readHeaders();
            while (locations.readRecord()) {
                cities.add(locations.get("CITY"));
            }
        }
        assertTrue(cities.contains("Victoria"));
    }
}