Introducing CSVDataStore

In our initial feature tutorial we provided a code snippet that reads a comma-separated value (CSV) file and produces a feature collection.

In this tutorial we will build a CSV DataStore and, in the process, explore several aspects of how DataStores work and how best to make use of them.

If you would like to follow along with this workshop, start a new Java project in your favorite IDE and ensure GeoTools is on your CLASSPATH (either by using Maven or by downloading the jars).

Note

Terminology

DataStore borrows most of its concepts (and some of its syntax) from the Open Geospatial Consortium (OGC) Web Feature Server Specification:

  • Feature - atomic unit of geographic information

  • FeatureType - keeps track of what attributes each Feature can hold

  • FeatureId - a unique id associated with each Feature (must start with a non-numeric character)

  • FID - same as FeatureId

  • Schema - same as FeatureType (familiar to database developers)

Here is the sample locations.csv file:

LAT, LON, CITY, NUMBER, YEAR
46.066667, 11.116667, Trento, 140, 2002
44.9441, -93.0852, St Paul, 125, 2003
13.752222, 100.493889, Bangkok, 150, 2004
45.420833, -75.69, Ottawa, 200, 2004
44.9801, -93.251867, Minneapolis, 350, 2005
46.519833, 6.6335, Lausanne, 560, 2006
48.428611, -123.365556, Victoria, 721, 2007
-33.925278, 18.423889, Cape Town, 550, 2008
-33.859972, 151.211111, Sydney, 436, 2009
41.383333, 2.183333, Barcelona, 914, 2010
39.739167, -104.984722, Denver, 869, 2011
52.95, -1.133333, Nottingham, 800, 2013
45.52, -122.681944, Portland, 840, 2014
37.5667,129.681944,Seoul,473,2015
50.733992,7.099814,Bonn,700,2016
42.3601, -71.0589, Boston, 800, 2017

The first line of our CSV file is a header that provides the column names:

LAT, LON, CITY, NUMBER, YEAR

Each column name is treated as a simple String. More complicated formats have the option of isolating names into different name spaces.
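As a rough illustration of treating the header as plain strings, the sketch below splits the header line and records the position of each column name. The class and method names (CsvHeader, columnIndex) are hypothetical and are not part of GeoTools or JavaCSV; the code also assumes the header contains no quoted commas.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of GeoTools or JavaCSV.
class CsvHeader {

    /** Maps each trimmed column name to its zero-based position in the header line. */
    static Map<String, Integer> columnIndex(String headerLine) {
        Map<String, Integer> index = new LinkedHashMap<>(); // keeps column order
        String[] names = headerLine.split(",");
        for (int i = 0; i < names.length; i++) {
            index.put(names[i].trim(), i); // drop the space that follows each comma
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, Integer> index = columnIndex("LAT, LON, CITY, NUMBER, YEAR");
        System.out.println(index); // prints {LAT=0, LON=1, CITY=2, NUMBER=3, YEAR=4}
    }
}
```

Trimming matters here because the sample file puts a space after each comma on most lines.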

Each subsequent line is used to capture a single feature of information suitable for mapping.

46.066667, 11.116667, Trento, 140, 2002

In our example the LAT and LON values represent POINT(46.066667, 11.116667), while the CITY Trento, the NUMBER 140 and the YEAR 2002 capture details of the GRASS users conference (one of the earliest Free and Open Source Software for Geomatics (FOSS4G) events).
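To make the row-to-feature idea concrete, here is a minimal sketch that turns one data line into a typed value. The Location class is hypothetical and stands in for a real GeoTools feature; it assumes the column order LAT, LON, CITY, NUMBER, YEAR and does not handle quoted values.

```java
// Hypothetical value class for illustration only; the tutorial builds GeoTools features instead.
final class Location {
    final double lat;
    final double lon;
    final String city;
    final int number;
    final int year;

    Location(double lat, double lon, String city, int number, int year) {
        this.lat = lat;
        this.lon = lon;
        this.city = city;
        this.number = number;
        this.year = year;
    }

    /** Parses one data row, assuming the column order LAT, LON, CITY, NUMBER, YEAR. */
    static Location parse(String row) {
        String[] cells = row.split(",");
        if (cells.length != 5) {
            throw new IllegalArgumentException("Expected 5 columns: " + row);
        }
        return new Location(
                Double.parseDouble(cells[0].trim()),
                Double.parseDouble(cells[1].trim()),
                cells[2].trim(),
                Integer.parseInt(cells[3].trim()),
                Integer.parseInt(cells[4].trim()));
    }

    public static void main(String[] args) {
        Location trento = parse("46.066667, 11.116667, Trento, 140, 2002");
        System.out.println(trento.city + " " + trento.year
                + " at lat=" + trento.lat + ", lon=" + trento.lon);
    }
}
```

The two ordinates become the point location, and the remaining cells become ordinary attributes.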

Approach to Parsing CSV

Here is our strategy for representing GeoTools concepts with a CSV file.

  • FeatureID or FID - uniquely defines a Feature.

    We will use the row number in our CSV file.

  • FeatureType Name

    Same as the name of the .csv file (i.e. “locations” for locations.csv).

  • DataStore

    We will create a CSVDataStore to access all the FeatureTypes (.csv files) in a directory

  • FeatureType or Schema

    We will represent the names of the columns in our CSV (and if possible their types).

  • Geometry

    Initially we will try to recognize several columns and map them into Point x and y ordinates. This technique is used to handle content from websites such as geonames.

    We can also look at parsing a column using the Well-Known-Text representation of a Geometry.

  • CoordinateReferenceSystem

    Look for a prj sidecar file (i.e. locations.prj for locations.csv).
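The naming conventions in this strategy can be sketched with plain java.io.File handling. The helper names below (CsvNaming, typeName, featureId, prjSidecar) are hypothetical, and the "name.row" shape of the FeatureId is an assumption made for illustration; only the conventions themselves come from the list above.

```java
import java.io.File;

// Hypothetical helpers for illustration; only the naming conventions come from the tutorial.
class CsvNaming {

    /** FeatureType name: the file name without its extension. */
    static String typeName(File csv) {
        String name = csv.getName();
        int dot = name.lastIndexOf('.');
        return dot > 0 ? name.substring(0, dot) : name;
    }

    /**
     * FeatureId built from the type name and the row number (assumed format).
     * Prefixing the row number keeps the id from starting with a digit,
     * as long as the file name itself does not begin with one.
     */
    static String featureId(File csv, int row) {
        return typeName(csv) + "." + row;
    }

    /** Sidecar file, next to the CSV, that would hold the coordinate reference system. */
    static File prjSidecar(File csv) {
        return new File(csv.getParentFile(), typeName(csv) + ".prj");
    }

    public static void main(String[] args) {
        File csv = new File("data", "locations.csv");
        System.out.println(typeName(csv));        // prints locations
        System.out.println(featureId(csv, 1));    // prints locations.1
        System.out.println(prjSidecar(csv).getName()); // prints locations.prj
    }
}
```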

JavaCSV Reader

Rather than go through the joy of parsing a CSV file by hand, we are going to make use of a library to read CSV files.

The JavaCSV project looks nice and simple and is available from Maven (the dependency is added to the pom.xml in the steps below).

For our purposes a key benefit of this implementation is streaming - it will read one line at a time and avoid loading the entire file into memory.
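The streaming idea can be illustrated without any library. The sketch below uses the JDK's BufferedReader rather than JavaCSV, so it shows the technique and not the JavaCSV API; the class name StreamingCities is hypothetical, and the code assumes CITY is the third column and that no value contains a quoted comma.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of streaming with the JDK only; JavaCSV is not used here.
class StreamingCities {

    /** Reads rows one at a time; only the current raw line is buffered, never the whole file. */
    static List<String> cities(Reader source) {
        List<String> cities = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            if (reader.readLine() == null) { // consume the header row
                return cities;
            }
            String line;
            while ((line = reader.readLine()) != null) {
                String[] cells = line.split(",");
                if (cells.length > 2) {
                    cities.add(cells[2].trim()); // CITY is assumed to be the third column
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return cities;
    }

    public static void main(String[] args) {
        String sample = "LAT, LON, CITY, NUMBER, YEAR\n"
                + "46.066667, 11.116667, Trento, 140, 2002\n"
                + "37.5667,129.681944,Seoul,473,2015\n";
        System.out.println(cities(new StringReader(sample))); // prints [Trento, Seoul]
    }
}
```

Passing a FileReader instead of the StringReader processes a file on disk in the same line-by-line fashion.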

Time to create a new project making use of this library:

  1. Create a new project:

    • Using Eclipse: New ‣ Project to create a Maven Project with group org.geotools.tutorial and name csv.

    • Using Maven: mvn archetype:generate -DgroupId=org.geotools.tutorial -DartifactId=csv -Dversion=1.0-SNAPSHOT -DarchetypeGroupId=org.apache.maven.archetypes -DarchetypeArtifactId=maven-archetype-quickstart

  2. Fill in the project details, paying careful attention to the geotools.version property you wish to use. You can choose a stable release (recommended) or use 32-SNAPSHOT for access to the latest nightly build.

    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
     <modelVersion>4.0.0</modelVersion>
    
     <groupId>org.geotools.tutorial</groupId>
     <artifactId>csv</artifactId>
     <version>0.0.1-SNAPSHOT</version>
     <packaging>jar</packaging>
    
     <name>CSV DataStore</name>
     <description>CSV DataStore tutorial</description>
     <url>http://docs.geotools.org/latest/userguide/tutorial/datastore</url>
    
      <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <geotools.version>32-SNAPSHOT</geotools.version>
      </properties>
    
    </project>
    
  3. Add the following dependencies:

      <dependencies>
        <dependency>
          <groupId>junit</groupId>
          <artifactId>junit</artifactId>
          <version>4.13.1</version>
          <scope>test</scope>
        </dependency>
        <dependency>
          <groupId>org.geotools</groupId>
          <artifactId>gt-main</artifactId>
          <version>${geotools.version}</version>
        </dependency>
        <dependency>
          <groupId>org.geotools</groupId>
          <artifactId>gt-cql</artifactId>
          <version>${geotools.version}</version>
        </dependency>
        <dependency>
          <groupId>org.geotools</groupId>
          <artifactId>gt-epsg-hsql</artifactId>
          <version>${geotools.version}</version>
        </dependency>
        <dependency>
          <groupId>net.sourceforge.javacsv</groupId>
          <artifactId>javacsv</artifactId>
          <version>2.0</version>
        </dependency>
      </dependencies>
    
  4. The GeoTools artifacts are available from these repositories:

      <repositories>
        <repository>
          <id>osgeo</id>
          <name>OSGeo Release Repository</name>
          <url>https://repo.osgeo.org/repository/release/</url>
          <snapshots><enabled>false</enabled></snapshots>
          <releases><enabled>true</enabled></releases>
        </repository>
        <repository>
          <id>osgeo-snapshot</id>
          <name>OSGeo Snapshot Repository</name>
          <url>https://repo.osgeo.org/repository/snapshot/</url>
          <snapshots><enabled>true</enabled></snapshots>
          <releases><enabled>false</enabled></releases>
        </repository>
      </repositories>
    
  5. Finally, configure the compiler plugin to use Java 11:

      <build>
        <plugins>
          <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
              <source>11</source>
              <target>11</target>
            </configuration>
          </plugin>
        </plugins>
      </build>
    
  6. You can check against the completed pom.xml

  7. Create a directory src/test/resources and in there create package org.geotools.tutorial.csv. Then add locations.csv to this package.

    • package: org.geotools.tutorial.csv

    • file: locations.csv

    LAT, LON, CITY, NUMBER, YEAR
    46.066667, 11.116667, Trento, 140, 2002
    44.9441, -93.0852, St Paul, 125, 2003
    13.752222, 100.493889, Bangkok, 150, 2004
    45.420833, -75.69, Ottawa, 200, 2004
    44.9801, -93.251867, Minneapolis, 350, 2005
    46.519833, 6.6335, Lausanne, 560, 2006
    48.428611, -123.365556, Victoria, 721, 2007
    -33.925278, 18.423889, Cape Town, 550, 2008
    -33.859972, 151.211111, Sydney, 436, 2009
    41.383333, 2.183333, Barcelona, 914, 2010
    39.739167, -104.984722, Denver, 869, 2011
    52.95, -1.133333, Nottingham, 800, 2013
    45.52, -122.681944, Portland, 840, 2014
    37.5667,129.681944,Seoul,473,2015
    50.733992,7.099814,Bonn,700,2016
    42.3601, -71.0589, Boston, 800, 2017
    

    Download locations.csv.

  8. Below is a JUnit 4 test case to confirm that JavaCSV is available and can read our file. Create a directory src/test/java and in there create package org.geotools.tutorial.csv. Then add CSVTest.java to the package:

    /*
     *    GeoTools Sample code and Tutorials by Open Source Geospatial Foundation, and others
     *    https://docs.geotools.org
     *
     *    To the extent possible under law, the author(s) have dedicated all copyright
     *    and related and neighboring rights to this software to the public domain worldwide.
     *    This software is distributed without any warranty.
     *
     *    You should have received a copy of the CC0 Public Domain Dedication along with this
     *    software. If not, see <http://creativecommons.org/publicdomain/zero/1.0/>.
     */
    package org.geotools.tutorial.csv;
    
    import static org.junit.Assert.assertTrue;
    
    import com.csvreader.CsvReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.Serializable;
    import java.net.URL;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;
    import org.geotools.api.data.DataStore;
    import org.geotools.api.data.DataStoreFinder;
    import org.geotools.api.data.FeatureReader;
    import org.geotools.api.data.Query;
    import org.geotools.api.data.SimpleFeatureSource;
    import org.geotools.api.data.Transaction;
    import org.geotools.api.feature.Property;
    import org.geotools.api.feature.simple.SimpleFeature;
    import org.geotools.api.feature.simple.SimpleFeatureType;
    import org.geotools.api.feature.type.AttributeDescriptor;
    import org.geotools.api.feature.type.GeometryDescriptor;
    import org.geotools.api.filter.Filter;
    import org.geotools.api.filter.FilterFactory;
    import org.geotools.api.filter.identity.FeatureId;
    import org.geotools.data.DataUtilities;
    import org.geotools.data.simple.SimpleFeatureCollection;
    import org.geotools.data.simple.SimpleFeatureIterator;
    import org.geotools.factory.CommonFactoryFinder;
    import org.geotools.feature.DefaultFeatureCollection;
    import org.geotools.filter.text.cql2.CQL;
    import org.geotools.referencing.CRS;
    import org.junit.Test;
    import org.locationtech.jts.geom.Geometry;
    
    public class CSVTest {
    
        @Test
        public void test() throws Exception {
            List<String> cities = new ArrayList<>();
            // locations.csv is resolved relative to this class's package on the test classpath
            URL url = CSVTest.class.getResource("locations.csv");
            File file = new File(url.toURI());
            try (FileReader reader = new FileReader(file)) {
                CsvReader locations = new CsvReader(reader);
                // consume the first line so columns can be looked up by name
                locations.readHeaders();
                // stream the remaining records one at a time
                while (locations.readRecord()) {
                    cities.add(locations.get("CITY"));
                }
            }
            assertTrue(cities.contains("Victoria"));
        }
    
    }