Class GeoParquetDialect
- Object
  - SQLDialect
    - BasicSQLDialect
      - DuckDBDialect
        - GeoParquetDialect
public class GeoParquetDialect extends DuckDBDialect
SQL Dialect for GeoParquet format. This dialect extends the base DuckDB dialect with GeoParquet-specific functionality:
- Parsing and utilizing GeoParquet metadata from the "geo" field
- Setting up appropriate SQL views for GeoParquet files
- Optimizing spatial operations and bounds computations
- Handling both local and remote (HTTP, S3) GeoParquet data access
The dialect extracts and uses the GeoParquet specification metadata to provide improved performance for operations like bounds computation and feature access. It supports both standard GeoParquet format (1.1.0) and development versions (1.2.0-dev).
The dialect uses several performance optimizations:
- Extracting bounds from GeoParquet metadata rather than computing them
- Creating SQL views for consistent access to partitioned datasets
- Using DuckDB's spatial functions for efficient querying
- Maintaining a cache of metadata to avoid repeated parsing
It works in conjunction with GeoParquetViewManager to handle Hive-partitioned datasets, exposing each partition as a separate feature type.
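As a rough illustration of how Hive partitions could map to feature type names, the sketch below derives one flat name per partition directory. The class PartitionNames and the naming scheme (joining key=value segments with underscores) are assumptions made for illustration; they are not taken from GeoParquetViewManager.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the GeoParquet module.
final class PartitionNames {

    /** Turns a Hive partition path such as "theme=buildings/type=building" into a flat name. */
    static String toTypeName(String partitionPath) {
        List<String> parts = new ArrayList<>();
        for (String segment : partitionPath.split("/")) {
            if (segment.isEmpty()) {
                continue; // skip leading or doubled slashes
            }
            // A Hive segment has the form key=value; keep both halves in the name.
            parts.add(segment.replace('=', '_'));
        }
        return String.join("_", parts);
    }

    public static void main(String[] args) {
        System.out.println(toTypeName("theme=buildings/type=building"));
    }
}
```

Under this assumed scheme, every distinct partition path yields a distinct feature type name.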
Field Summary
-
Fields inherited from class SQLDialect
BASE_DBMS_CAPABILITIES, dataStore, forceLongitudeFirst, UNWRAPPER_NOT_FOUND, uwMap
Constructor Summary
Constructors
- GeoParquetDialect(JDBCDataStore dataStore): Creates a new GeoParquetDialect.
-
Method Summary
Modifier and Type, Method, Description
- CoordinateReferenceSystem createCRS(int srid, Connection cx): Override to use the GeoParquetMetadata provided axis order on a per-FeatureType basis.
- FilterToSQL createFilterToSQL(): Creates a specialized filter-to-SQL converter for GeoParquet.
- void encodeGeometryColumn(GeometryDescriptor gatt, String prefix, int srid, Hints hints, StringBuffer sql): Encodes a geometry column for a SQL query with awareness of geometry types.
- void ensureViewExists(String viewName): Ensures that a database view exists for the specified feature type.
- SimpleFeatureType fixGeometryTypes(SimpleFeatureType schema): Creates a new feature type with more specific geometry types based on GeoParquet metadata, with results cached for performance.
- List<String> getDatabaseInitSql(): Provides SQL statements to initialize the DuckDB database for GeoParquet access.
- Integer getGeometrySRID(String schemaName, String tableName, String columnName, Connection cx): Gets the SRID (Spatial Reference ID) for a geometry column.
- GeoparquetDatasetMetadata getGeoparquetMetadata(String typeName): Gets the GeoParquet metadata for a feature type.
- List<ReferencedEnvelope> getOptimizedBounds(String schema, SimpleFeatureType featureType, Connection cx): Returns optimized bounds for a feature type by using GeoParquet metadata.
- PrimaryKeyFinder getPrimaryKeyFinder(): Provides a PrimaryKeyFinder that identifies the 'id' column as the primary key.
- List<String> getTypeNames(): Returns a list of all available feature type names.
- void initialize(GeoParquetConfig config): Registers SQL views for GeoParquet data partitions.
- GeoparquetDatasetMetadata loadGeoparquetMetadata(String viewName, Connection cx): Loads GeoParquet metadata for a specific view.
-
Methods inherited from class DuckDBDialect
addSupportedHints, applyLimitOffset, decodeGeometryEnvelope, decodeGeometryValue, encodeColumnName, encodeGeometryColumnGeneralized, encodeGeometryColumnInternal, encodeGeometryColumnSimplified, encodeGeometryEnvelope, encodeGeometryValue, encodePostColumnCreateTable, encodePrimaryKey, escapeName, getMapping, getNameEscape, includeTable, isConcreteGeometry, isLimitOffsetSupported, optimizedBounds, parseWKB, parseWKB, registerClassToSqlMappings, registerSqlTypeToClassMappings, setScreenMapEnabled, setSimplifyEnabled
-
Methods inherited from class BasicSQLDialect
encodeValue, onDelete, onInsert, onSelect, onUpdate
-
Methods inherited from class SQLDialect
applyHintsOnVirtualTables, canGroupOnGeometry, canSimplifyPoints, convertValue, createIndex, decodeGeometryValue, dropIndex, encodeColumnAlias, encodeColumnName, encodeColumnType, encodeCreateTable, encodeJoin, encodeNextSequenceValue, encodePostCreateTable, encodePostSelect, encodeSchemaName, encodeTableAlias, encodeTableName, getAggregateConverter, getDefaultVarcharSize, getDesiredTablesType, getGeometryDimension, getGeometryTypeName, getIndexes, getLastAutoGeneratedValue, getLastAutoGeneratedValue, getMapping, getNextAutoGeneratedValue, getNextSequenceValue, getPkColumnValue, getPrimaryKey, getRestrictions, getResultTypes, getSequenceForColumn, getSQLType, handleSelectHints, handleUserDefinedType, initializeConnection, isAggregatedSortSupported, isArray, isAutoCommitQuery, isGroupBySupported, lookupGeneratedValuesPostInsert, ne, postCreateAttribute, postCreateFeatureType, postCreateTable, postDropTable, preDropTable, registerAggregateFunctions, registerSqlTypeNameToClassMappings, registerSqlTypeToSqlTypeNameOverrides, splitFilter, supportsSchemaForIndex, unwrapConnection
Constructor Detail
-
GeoParquetDialect
public GeoParquetDialect(JDBCDataStore dataStore)
Creates a new GeoParquetDialect.
- Parameters:
dataStore - The JDBC datastore this dialect will work with
Method Detail
-
ensureViewExists
public void ensureViewExists(String viewName) throws IOException
Ensures that a database view exists for the specified feature type.
This method is called before any operation that requires access to a feature type's schema or data, implementing the lazy initialization pattern. If the view already exists, this method has no effect.
- Parameters:
viewName - The name of the view/feature type to ensure exists
- Throws:
IOException - If there is an error creating the view
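The lazy initialization described above can be sketched as follows. This is a minimal illustration, not the dialect's implementation: LazyViewRegistry, its fields, and the example URI are hypothetical, and a list of SQL strings stands in for a JDBC connection. The generated statement uses DuckDB's read_parquet table function.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the view bookkeeping; names are illustrative only.
final class LazyViewRegistry {
    private final Map<String, String> pendingUris = new HashMap<>();
    private final Set<String> created = new HashSet<>();
    final List<String> executedSql = new ArrayList<>(); // stands in for a JDBC connection

    synchronized void register(String viewName, String uri) {
        pendingUris.put(viewName, uri);
    }

    /** Creates the view on first use; later calls for the same name do nothing. */
    synchronized void ensureViewExists(String viewName) {
        if (created.contains(viewName)) {
            return; // view already created: no effect
        }
        String uri = pendingUris.get(viewName);
        if (uri == null) {
            throw new IllegalArgumentException("Unknown view: " + viewName);
        }
        String quotedUri = "'" + uri.replace("'", "''") + "'";
        executedSql.add("CREATE OR REPLACE VIEW \"" + viewName
                + "\" AS SELECT * FROM read_parquet(" + quotedUri + ")");
        created.add(viewName);
    }

    public static void main(String[] args) {
        LazyViewRegistry registry = new LazyViewRegistry();
        registry.register("buildings", "s3://example-bucket/buildings/*.parquet");
        registry.ensureViewExists("buildings");
        registry.ensureViewExists("buildings"); // second call is a no-op
        System.out.println(registry.executedSql);
    }
}
```

Deferring the CREATE VIEW statement until a feature type is first used avoids touching remote files for partitions that are never queried.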
-
getTypeNames
public List<String> getTypeNames() throws IOException
Returns a list of all available feature type names.
This method queries the view manager to get the names of all registered views, which correspond to available feature types in the GeoParquet dataset.
- Returns:
A list of feature type names
- Throws:
IOException - If there is an error retrieving the names
-
createFilterToSQL
public FilterToSQL createFilterToSQL()
Creates a specialized filter-to-SQL converter for GeoParquet.
- Overrides:
createFilterToSQL in class DuckDBDialect
- Returns:
A new GeoParquetFilterToSQL instance
-
getDatabaseInitSql
public List<String> getDatabaseInitSql()
Provides SQL statements to initialize the DuckDB database for GeoParquet access.
This installs and loads required DuckDB extensions:
- httpfs - For HTTP/S3 access to remote GeoParquet files
- parquet - For reading Parquet file format
- Overrides:
getDatabaseInitSql in class DuckDBDialect
- Returns:
List of SQL statements to initialize the database
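For orientation, the statements for the two extensions named above would look roughly like this in DuckDB's INSTALL/LOAD syntax. The class InitSqlSketch is hypothetical, the exact statements and their order are assumptions, and anything contributed by the parent DuckDBDialect is omitted.

```java
import java.util.List;

// Hypothetical sketch; the dialect's actual statement list may differ.
final class InitSqlSketch {

    /** Returns INSTALL/LOAD statements for the httpfs and parquet extensions. */
    static List<String> databaseInitSql() {
        return List.of(
                "INSTALL httpfs", "LOAD httpfs",    // remote access over HTTP(S) and S3
                "INSTALL parquet", "LOAD parquet"); // Parquet reader
    }

    public static void main(String[] args) {
        databaseInitSql().forEach(System.out::println);
    }
}
```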
-
initialize
public void initialize(GeoParquetConfig config) throws IOException
Registers SQL views for GeoParquet data partitions.
This method is called by GeoParquetDataStoreFactory#setupDataStore(JDBCDataStore, Map) to initialize the dialect with the provided configuration. It:
- Clears any cached metadata
- Initializes the view manager with the new configuration
- Parameters:
config - The GeoParquet configuration
- Throws:
IOException - If there's an error registering the views
-
getGeoparquetMetadata
public GeoparquetDatasetMetadata getGeoparquetMetadata(String typeName) throws IOException
Gets the GeoParquet metadata for a feature type.
This is a convenience method that creates a connection and delegates to getGeoparquetMetadata(String, Connection) if the metadata for typeName is not already cached.
- Parameters:
typeName - The feature type to get metadata for
- Returns:
The GeoParquet metadata for the feature type
- Throws:
IOException - If there is an error retrieving the metadata
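The check-cache-then-load behavior can be illustrated with a small sketch. MetadataCacheSketch is hypothetical: plain strings stand in for GeoparquetDatasetMetadata, and the loader function stands in for opening a connection and reading the metadata.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical cache; String values stand in for the real metadata objects.
final class MetadataCacheSketch {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> loader;
    final AtomicInteger loads = new AtomicInteger(); // counts cache misses

    MetadataCacheSketch(Function<String, String> loader) {
        this.loader = loader;
    }

    /** Returns cached metadata, invoking the loader only on the first request per type. */
    String get(String typeName) {
        return cache.computeIfAbsent(typeName, name -> {
            loads.incrementAndGet();
            return loader.apply(name);
        });
    }

    /** Drops all cached entries, as a re-initialization would. */
    void clear() {
        cache.clear();
    }

    public static void main(String[] args) {
        MetadataCacheSketch cache = new MetadataCacheSketch(name -> "metadata for " + name);
        cache.get("buildings");
        cache.get("buildings");
        System.out.println("loads = " + cache.loads.get());
    }
}
```

ConcurrentHashMap.computeIfAbsent runs the loader at most once per key even under concurrent access, which is one simple way to get a thread-safe cache.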
-
loadGeoparquetMetadata
public GeoparquetDatasetMetadata loadGeoparquetMetadata(String viewName, Connection cx)
Loads GeoParquet metadata for a specific view.
This method:
- Retrieves the URI for the view
- Queries the Parquet key-value metadata to extract the 'geo' field
- Parses the metadata for each file in the dataset
- Parameters:
viewName - The name of the view to load metadata for
cx - Database connection to use for querying
- Returns:
The combined dataset metadata
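A query of the kind described in the second step might look like the sketch below. It assumes DuckDB's parquet_kv_metadata table function (which exposes file_name, key and value columns, the latter two as BLOBs) and the decode function for converting a BLOB to text. The class GeoMetadataQuery is hypothetical, and the SQL actually issued by the dialect may differ.

```java
// Hypothetical helper that only builds the SQL text; it does not execute it.
final class GeoMetadataQuery {

    /** Builds a query returning the 'geo' key-value entry for each file matched by the URI. */
    static String geoMetadataSql(String uri) {
        String escaped = uri.replace("'", "''"); // escape quotes for a SQL string literal
        return "SELECT file_name, decode(value) AS geo"
                + " FROM parquet_kv_metadata('" + escaped + "')"
                + " WHERE decode(key) = 'geo'";
    }

    public static void main(String[] args) {
        System.out.println(geoMetadataSql("s3://example-bucket/buildings/*.parquet"));
    }
}
```

Each returned row would carry one file's 'geo' JSON document, which can then be parsed and merged into dataset-level metadata.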
-
getPrimaryKeyFinder
public PrimaryKeyFinder getPrimaryKeyFinder()
Provides a PrimaryKeyFinder that identifies the 'id' column as the primary key.
This is a helper for GeoParquetDataStoreFactory to establish the feature ID column in GeoParquet datasets. It always identifies the 'id' column as a String primary key, which is a common convention in GeoParquet files.
- Returns:
A PrimaryKeyFinder for GeoParquet datasets
-
getOptimizedBounds
public List<ReferencedEnvelope> getOptimizedBounds(String schema, SimpleFeatureType featureType, Connection cx) throws SQLException, IOException
Returns optimized bounds for a feature type by using GeoParquet metadata.
This method uses a multi-stage approach to efficiently determine dataset bounds:
- First tries to extract bounds from the GeoParquet 'geo' metadata field
- If 'geo' metadata is not available, checks for a 'bbox' column and uses aggregate functions on its components (common in datasets like OvertureMaps)
- Finally falls back to the generic DuckDB bounds computation using spatial functions
Each stage is more computationally expensive than the one before it, so the stages are tried in order of increasing cost.
- Overrides:
getOptimizedBounds in class DuckDBDialect
- Parameters:
schema - The database schema (unused in GeoParquet)
featureType - The feature type to get bounds for
cx - Database connection to use for querying
- Returns:
A list containing a single ReferencedEnvelope representing the dataset bounds
- Throws:
SQLException - If there's an error executing SQL
IOException - If there's an error accessing the data
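The staged fallback can be sketched as a list of strategies tried in order until one yields bounds. BoundsFallback is a hypothetical class, double arrays of {minX, minY, maxX, maxY} stand in for ReferencedEnvelope, and the aggregate query assumes a struct column named bbox with xmin, ymin, xmax and ymax fields, as found in OvertureMaps data.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch of the cheapest-first strategy chain.
final class BoundsFallback {

    /** Returns the first bounds produced; later (costlier) strategies are never invoked. */
    static Optional<double[]> firstAvailable(List<Supplier<Optional<double[]>>> strategies) {
        for (Supplier<Optional<double[]>> strategy : strategies) {
            Optional<double[]> bounds = strategy.get();
            if (bounds.isPresent()) {
                return bounds;
            }
        }
        return Optional.empty();
    }

    /** Second-stage query: aggregates the components of an assumed 'bbox' struct column. */
    static String bboxAggregateSql(String viewName) {
        return "SELECT min(bbox.xmin), min(bbox.ymin), max(bbox.xmax), max(bbox.ymax) FROM \""
                + viewName + "\"";
    }

    public static void main(String[] args) {
        System.out.println(bboxAggregateSql("buildings"));
    }
}
```

Because the strategies are suppliers, an expensive full scan is only executed when both cheaper sources come back empty.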
-
getGeometrySRID
public Integer getGeometrySRID(String schemaName, String tableName, String columnName, Connection cx)
Gets the SRID (Spatial Reference ID) for a geometry column.
This method attempts to extract the SRID from the GeoParquet metadata's CRS information:
- First tries to get the CRS from the GeoParquet metadata for the specified column
- If available, extracts the SRID from the CRS definition using the PROJJSON representation
- Falls back to trying the primary geometry column if the specific column CRS is not found
- Falls back to EPSG:4326 (WGS84) if the CRS information is not available or does not contain an SRID
The CRS information is extracted from the GeoParquet 'geo' metadata field, which follows the PROJJSON v0.7 schema as defined by the OGC GeoParquet specification. This includes proper handling of CRS identifiers with authority and code properties.
The implementation supports strongly-typed CRS objects, converting between the PROJJSON format used in GeoParquet files and GeoTools CoordinateReferenceSystem objects.
- Overrides:
getGeometrySRID in class DuckDBDialect
- Parameters:
schemaName - The database schema (unused in GeoParquet)
tableName - The table/view name
columnName - The geometry column name
cx - Database connection
- Returns:
The SRID of the geometry column (from metadata or 4326 as default)
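The fallback order above reduces to a short lookup chain. In this sketch SridFallback is hypothetical, and a map from column name to EPSG code stands in for the CRS entries parsed from the 'geo' metadata; PROJJSON parsing itself is not shown.

```java
import java.util.Map;

// Hypothetical sketch of the lookup order; not the dialect's code.
final class SridFallback {
    static final int DEFAULT_SRID = 4326; // EPSG:4326 (WGS84)

    /**
     * Resolves an SRID for the requested column, then the primary geometry column,
     * then the WGS84 default.
     */
    static int resolve(Map<String, Integer> sridByColumn, String column, String primaryColumn) {
        Integer srid = column == null ? null : sridByColumn.get(column);
        if (srid == null && primaryColumn != null) {
            srid = sridByColumn.get(primaryColumn);
        }
        return srid != null ? srid : DEFAULT_SRID;
    }

    public static void main(String[] args) {
        Map<String, Integer> srids = Map.of("geometry", 3857);
        System.out.println(resolve(srids, "centroid", "geometry"));
    }
}
```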
-
createCRS
public CoordinateReferenceSystem createCRS(int srid, Connection cx) throws SQLException
Overrides the default behavior to use the axis order provided by GeoParquetMetadata on a per-FeatureType basis. By contrast, SQLDialect.createCRS(int, java.sql.Connection) uses the SQLDialect.forceLongitudeFirst flag as a constant.
- Overrides:
createCRS in class SQLDialect
- Throws:
SQLException
-
encodeGeometryColumn
public void encodeGeometryColumn(GeometryDescriptor gatt, String prefix, int srid, Hints hints, StringBuffer sql)
Encodes a geometry column for a SQL query with awareness of geometry types.
This overridden method enhances the base DuckDB dialect implementation by checking whether multi-geometry encoding should be enforced for the current feature type. It uses the CURRENT_TYPENAME thread-local variable to determine the appropriate behavior based on the GeoParquet metadata.
For example, if the geometry column is a MultiPolygon according to the GeoParquet metadata, this method adds ST_Multi() to the SQL encoding to ensure proper handling of collection geometries. The thread-local is needed because JDBCDataStore calls this method without providing the full feature type context.
- Overrides:
encodeGeometryColumn in class DuckDBDialect
- Parameters:
gatt - The geometry descriptor to encode
prefix - Column prefix to use
srid - The spatial reference ID
hints - Rendering hints that may affect encoding
sql - The SQL buffer to append to
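The thread-local technique can be sketched as follows. MultiGeometryEncoding is a hypothetical class: the set of type names requiring multi-geometries stands in for the GeoParquet metadata lookup, and wrapping the column in ST_AsWKB is an assumption about the base encoding rather than what DuckDBDialect is known to emit.

```java
import java.util.Set;

// Hypothetical sketch; only the ST_Multi wrapping mirrors the description above.
final class MultiGeometryEncoding {
    /** Carries the feature type name to a method whose signature cannot receive it. */
    static final ThreadLocal<String> CURRENT_TYPENAME = new ThreadLocal<>();

    private final Set<String> multiGeometryTypes;

    MultiGeometryEncoding(Set<String> multiGeometryTypes) {
        this.multiGeometryTypes = multiGeometryTypes;
    }

    /** Appends the column expression, wrapped in ST_Multi when the current type requires it. */
    void encodeGeometryColumn(String column, StringBuilder sql) {
        String expr = "\"" + column + "\"";
        String typeName = CURRENT_TYPENAME.get();
        if (typeName != null && multiGeometryTypes.contains(typeName)) {
            expr = "ST_Multi(" + expr + ")";
        }
        sql.append("ST_AsWKB(").append(expr).append(")");
    }

    public static void main(String[] args) {
        MultiGeometryEncoding encoding = new MultiGeometryEncoding(Set.of("buildings"));
        CURRENT_TYPENAME.set("buildings");
        try {
            StringBuilder sql = new StringBuilder("SELECT ");
            encoding.encodeGeometryColumn("geometry", sql);
            System.out.println(sql);
        } finally {
            CURRENT_TYPENAME.remove(); // avoid leaking state to pooled threads
        }
    }
}
```

Clearing the thread-local in a finally block matters when threads are reused, since a stale type name would otherwise affect unrelated queries.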
-
fixGeometryTypes
public SimpleFeatureType fixGeometryTypes(SimpleFeatureType schema) throws IOException
Creates a new feature type with more specific geometry types based on GeoParquet metadata, with results cached for performance.
This method processes a feature type to enhance its geometry descriptors with more specific geometry types derived from the GeoParquet metadata. It:
- Ensures the database view exists for the feature type
- Delegates to the GeoParquetViewManager to check for a cached version of the enhanced schema
- If no cached version exists, creates a new schema with correct geometry type bindings
- Caches the result for future use
This is essential because DuckDB only reports a generic GEOMETRY type, while the GeoParquet metadata contains information about the actual geometry types (Point, LineString, etc.).
The caching mechanism improves performance by avoiding repeated metadata lookups and feature type construction while maintaining thread safety through the GeoParquetViewManager.
- Parameters:
schema - The original feature type with generic geometry types
- Returns:
A new feature type with more specific geometry types, either freshly built or from cache
- Throws:
IOException - If there is an error accessing the GeoParquet metadata
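One plausible way to pick a specific binding from the metadata is sketched below. It reads the geometry_types list defined by the GeoParquet specification and returns a type name. The class GeometryTypeNarrowing and its rules (promote to the Multi variant when single and multi forms are mixed, fall back to the generic type otherwise) are assumptions for illustration, not the module's documented logic.

```java
import java.util.List;

// Hypothetical narrowing rules; the real mapping to geometry classes may differ.
final class GeometryTypeNarrowing {

    /** Chooses the most specific geometry type name covering all listed types. */
    static String narrow(List<String> geometryTypes) {
        if (geometryTypes.isEmpty()) {
            return "Geometry"; // an empty list means the types are not known
        }
        String base = null;
        boolean multi = false;
        for (String raw : geometryTypes) {
            // Drop the optional " Z" suffix that marks 3D coordinates.
            String type = raw.endsWith(" Z") ? raw.substring(0, raw.length() - 2) : raw;
            if (type.startsWith("Multi")) {
                multi = true;
                type = type.substring("Multi".length());
            }
            if (base == null) {
                base = type;
            } else if (!base.equals(type)) {
                return "Geometry"; // unrelated types: keep the generic binding
            }
        }
        return multi ? "Multi" + base : base;
    }

    public static void main(String[] args) {
        System.out.println(narrow(List.of("Polygon", "MultiPolygon")));
    }
}
```

Under these assumed rules, a column declaring both Polygon and MultiPolygon is bound to MultiPolygon, which is consistent with wrapping values in ST_Multi() at query time.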