GeoParquetDialect

public class GeoParquetDialect extends DuckDBDialect

SQL Dialect for GeoParquet format.

This dialect extends the base DuckDB dialect with GeoParquet-specific functionality:

- Parsing and utilizing GeoParquet metadata from the "geo" field
- Setting up appropriate SQL views for GeoParquet files
- Optimizing spatial operations and bounds computations
- Handling both local and remote (HTTP, S3) GeoParquet data access

The dialect extracts the metadata defined by the GeoParquet specification and uses it to speed up operations such as bounds computation and feature access. It supports both the standard GeoParquet format (1.1.0) and development versions (1.2.0-dev).

The dialect uses several performance optimizations:

- Extracting bounds from GeoParquet metadata rather than computing them
- Creating SQL views for consistent access to partitioned datasets
- Using DuckDB's spatial functions for efficient querying
- Maintaining a cache of metadata to avoid repeated parsing

It works in conjunction with GeoParquetViewManager to handle Hive-partitioned datasets, exposing each partition as a separate feature type.

Field Summary

Fields inherited from class SQLDialect
BASE_DBMS_CAPABILITIES, dataStore, forceLongitudeFirst, UNWRAPPER_NOT_FOUND, uwMap

Constructor Summary

Constructors

| Constructor | Description |
| --- | --- |
| GeoParquetDialect(JDBCDataStore dataStore) | Creates a new GeoParquetDialect. |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| CoordinateReferenceSystem | createCRS(int srid, Connection cx) | Uses the axis order provided by GeoParquetMetadata on a per-FeatureType basis. |
| FilterToSQL | createFilterToSQL() | Creates a specialized filter-to-SQL converter for GeoParquet. |
| void | encodeGeometryColumn(GeometryDescriptor gatt, String prefix, int srid, Hints hints, StringBuffer sql) | Encodes a geometry column for a SQL query with awareness of geometry types. |
| void | ensureViewExists(String viewName) | Ensures that a database view exists for the specified feature type. |
| SimpleFeatureType | fixGeometryTypes(SimpleFeatureType schema) | Creates a new feature type with more specific geometry types based on GeoParquet metadata, with results cached for performance. |
| List<String> | getDatabaseInitSql() | Provides SQL statements to initialize the DuckDB database for GeoParquet access. |
| Integer | getGeometrySRID(String schemaName, String tableName, String columnName, Connection cx) | Gets the SRID (Spatial Reference ID) for a geometry column. |
| GeoparquetDatasetMetadata | getGeoparquetMetadata(String typeName) | Gets the GeoParquet metadata for a feature type. |
| List<ReferencedEnvelope> | getOptimizedBounds(String schema, SimpleFeatureType featureType, Connection cx) | Returns optimized bounds for a feature type by using GeoParquet metadata. |
| PrimaryKeyFinder | getPrimaryKeyFinder() | Provides a PrimaryKeyFinder that identifies the 'id' column as the primary key. |
| List<String> | getTypeNames() | Returns a list of all available feature type names. |
| void | initialize(GeoParquetConfig config) | Registers SQL views for GeoParquet data partitions. |
| GeoparquetDatasetMetadata | loadGeoparquetMetadata(String viewName, Connection cx) | Loads GeoParquet metadata for a specific view. |

Methods inherited from class DuckDBDialect
addSupportedHints, applyLimitOffset, decodeGeometryEnvelope, decodeGeometryValue, encodeColumnName, encodeGeometryColumnGeneralized, encodeGeometryColumnInternal, encodeGeometryColumnSimplified, encodeGeometryEnvelope, encodeGeometryValue, encodePostColumnCreateTable, encodePrimaryKey, escapeName, getMapping, getNameEscape, includeTable, isConcreteGeometry, isLimitOffsetSupported, optimizedBounds, parseWKB, parseWKB, registerClassToSqlMappings, registerSqlTypeToClassMappings, setScreenMapEnabled, setSimplifyEnabled

Methods inherited from class BasicSQLDialect
encodeValue, onDelete, onInsert, onSelect, onUpdate

Methods inherited from class SQLDialect
applyHintsOnVirtualTables, canGroupOnGeometry, canSimplifyPoints, convertValue, createIndex, decodeGeometryValue, dropIndex, encodeColumnAlias, encodeColumnName, encodeColumnType, encodeCreateTable, encodeJoin, encodeNextSequenceValue, encodePostCreateTable, encodePostSelect, encodeSchemaName, encodeTableAlias, encodeTableName, getAggregateConverter, getDefaultVarcharSize, getDesiredTablesType, getGeometryDimension, getGeometryTypeName, getIndexes, getLastAutoGeneratedValue, getLastAutoGeneratedValue, getMapping, getNextAutoGeneratedValue, getNextSequenceValue, getPkColumnValue, getPrimaryKey, getRestrictions, getResultTypes, getSequenceForColumn, getSQLType, handleSelectHints, handleUserDefinedType, initializeConnection, isAggregatedSortSupported, isArray, isAutoCommitQuery, isGroupBySupported, lookupGeneratedValuesPostInsert, ne, postCreateAttribute, postCreateFeatureType, postCreateTable, postDropTable, preDropTable, registerAggregateFunctions, registerSqlTypeNameToClassMappings, registerSqlTypeToSqlTypeNameOverrides, splitFilter, supportsSchemaForIndex, unwrapConnection

Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- GeoParquetDialect
  
  public GeoParquetDialect(JDBCDataStore dataStore)
  
  Creates a new GeoParquetDialect.
  
  Parameters:
  
  dataStore - The JDBC datastore this dialect will work with

Method Details
- ensureViewExists
  
  public void ensureViewExists(String viewName) throws IOException
  
  Ensures that a database view exists for the specified feature type.
  This method is called before any operations that require access to a feature type's schema or data, implementing the lazy initialization pattern. If the view already exists, this method has no effect.
  
  Parameters:
  
  viewName - The name of the view/feature type to ensure exists
  
  Throws:
  
  IOException - If there is an error creating the view
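
The lazy initialization described for ensureViewExists can be sketched as follows. This is an illustration under assumptions, not the dialect's actual code: LazyViewRegistry, register, and executedSql are hypothetical names, the SQL is collected in a list instead of being executed against DuckDB, and an unchecked exception stands in for the IOException of the real method. The only DuckDB feature it relies on is the read_parquet table function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a view manager; not the GeoParquetViewManager API. */
final class LazyViewRegistry {
    // View name -> Parquet URI (or glob) registered at initialization time.
    private final Map<String, String> registered = new ConcurrentHashMap<>();
    // Names of views whose CREATE VIEW statement has already been issued.
    private final Set<String> created = ConcurrentHashMap.newKeySet();
    // Stands in for executing SQL against a DuckDB connection.
    final List<String> executedSql = new ArrayList<>();

    void register(String viewName, String uri) {
        registered.put(viewName, uri);
    }

    /** Creates the view on first use; later calls for the same name have no effect. */
    synchronized void ensureViewExists(String viewName) {
        String uri = registered.get(viewName);
        if (uri == null) {
            throw new IllegalArgumentException("Unknown view: " + viewName);
        }
        if (created.add(viewName)) {
            // Single quotes in the URI are doubled so the literal stays well-formed.
            executedSql.add("CREATE OR REPLACE VIEW \"" + viewName
                    + "\" AS SELECT * FROM read_parquet('" + uri.replace("'", "''") + "')");
        }
    }
}
```

Calling ensureViewExists twice for the same registered name issues one CREATE statement, which is the "no effect if the view already exists" behavior described above.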
- getTypeNames
  
  public List<String> getTypeNames() throws IOException
  
  Returns a list of all available feature type names.
  This method queries the view manager to get the names of all registered views, which correspond to available feature types in the GeoParquet dataset.
  
  Returns:
  
  A list of feature type names
  
  Throws:
  
  IOException - If there is an error retrieving the names
- createFilterToSQL
  
  public FilterToSQL createFilterToSQL()
  
  Creates a specialized filter-to-SQL converter for GeoParquet.
  
  Overrides:
  
  createFilterToSQL in class DuckDBDialect
  
  Returns:
  
  A new GeoParquetFilterToSQL instance
- getDatabaseInitSql
  
  public List<String> getDatabaseInitSql()

  Provides SQL statements to initialize the DuckDB database for GeoParquet access.
  This installs and loads the required DuckDB extensions:

  - httpfs - For HTTP/S3 access to remote GeoParquet files
  - parquet - For reading the Parquet file format

  Overrides:
  
  getDatabaseInitSql in class DuckDBDialect
  
  Returns:
  
  List of SQL statements to initialize the database
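
As a rough illustration of what such an initialization list might contain, the sketch below appends INSTALL and LOAD statements for the two extensions named above to whatever the base dialect contributes. InitSqlSketch and baseInitSql are hypothetical, and the base statements shown (the spatial extension) are an assumption about DuckDBDialect rather than something this page states.

```java
import java.util.ArrayList;
import java.util.List;

final class InitSqlSketch {
    /** Assumed contribution of the base dialect (hypothetical). */
    static List<String> baseInitSql() {
        return List.of("INSTALL spatial", "LOAD spatial");
    }

    /** Base statements followed by INSTALL/LOAD pairs for httpfs and parquet. */
    static List<String> getDatabaseInitSql() {
        List<String> sql = new ArrayList<>(baseInitSql());
        for (String extension : List.of("httpfs", "parquet")) {
            sql.add("INSTALL " + extension); // downloads the extension if it is missing
            sql.add("LOAD " + extension);    // makes it available in the current session
        }
        return List.copyOf(sql);
    }
}
```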
- initialize
  
  public void initialize(GeoParquetConfig config) throws IOException

  Registers SQL views for GeoParquet data partitions.
  This method is called by GeoParquetDataStoreFactory.setupDataStore(JDBCDataStore, Map) to initialize the dialect with the provided configuration. It:

  - Clears any cached metadata
  - Initializes the view manager with the new configuration

  Parameters:
  
  config - The GeoParquet configuration
  
  Throws:
  
  IOException - If there's an error registering the views
- getGeoparquetMetadata
  
  public GeoparquetDatasetMetadata getGeoparquetMetadata(String typeName) throws IOException
  
  Gets the GeoParquet metadata for a feature type.
  This is a convenience method. If the metadata for typeName is not already cached, it creates a connection and delegates to getGeoparquetMetadata(String, Connection).
  
  Parameters:
  
  typeName - The feature type to get metadata for
  
  Returns:
  
  The GeoParquet metadata for the feature type
  
  Throws:
  
  IOException - If there is an error retrieving the metadata
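
The cache-then-load behavior described for getGeoparquetMetadata, together with the cache clearing performed by initialize, might look roughly like the sketch below. MetadataCache is a hypothetical class; the loader function stands in for opening a connection and reading the 'geo' metadata, and the real implementation may manage its cache differently.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical per-type metadata cache; M is whatever metadata type is stored. */
final class MetadataCache<M> {
    private final Map<String, M> cache = new ConcurrentHashMap<>();
    private final Function<String, M> loader;

    MetadataCache(Function<String, M> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, invoking the loader only on the first request per type. */
    M get(String typeName) {
        return cache.computeIfAbsent(typeName, loader);
    }

    /** Drops all entries, as a re-initialization with a new configuration would. */
    void clear() {
        cache.clear();
    }
}
```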
- loadGeoparquetMetadata
  
  public GeoparquetDatasetMetadata loadGeoparquetMetadata(String viewName, Connection cx)

  Loads GeoParquet metadata for a specific view.
  This method:

  - Retrieves the URI for the view
  - Queries the Parquet key-value metadata to extract the 'geo' field
  - Parses the metadata for each file in the dataset

  Parameters:
  
  viewName - The name of the view to load metadata for
  
  cx - Database connection to use for querying
  
  Returns:
  
  The combined dataset metadata
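
One way the key-value metadata query could be expressed is through DuckDB's parquet_kv_metadata table function, which returns one row per file and key with key and value as BLOBs. The sketch below only builds the SQL text; GeoMetadataQuery and its methods are hypothetical, and whether the dialect issues exactly this query is an assumption.

```java
final class GeoMetadataQuery {
    /** Quotes a string as a SQL literal, doubling embedded single quotes. */
    static String quoteLiteral(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /**
     * Builds a query returning, for each file matched by the URI or glob,
     * the file name and the JSON text stored under the 'geo' key.
     */
    static String forUri(String uri) {
        return "SELECT file_name, decode(value) AS geo"
                + " FROM parquet_kv_metadata(" + quoteLiteral(uri) + ")"
                + " WHERE decode(key) = 'geo'";
    }
}
```

Each returned row would then be parsed as JSON and merged into the dataset-level metadata.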
- getPrimaryKeyFinder
  
  public PrimaryKeyFinder getPrimaryKeyFinder()
  
  Provides a PrimaryKeyFinder that identifies the 'id' column as the primary key.
  This is a helper for GeoParquetDataStoreFactory to establish the feature ID column in GeoParquet datasets. It always identifies the 'id' column as a String primary key, which is a common convention in GeoParquet files.
  
  Returns:
  
  A PrimaryKeyFinder for GeoParquet datasets
- getOptimizedBounds
  
  public List<ReferencedEnvelope> getOptimizedBounds(String schema, SimpleFeatureType featureType, Connection cx) throws SQLException, IOException

  Returns optimized bounds for a feature type by using GeoParquet metadata.
  This method uses a multi-stage approach to efficiently determine dataset bounds:

  - First tries to extract bounds from the GeoParquet 'geo' metadata field
  - If 'geo' metadata is not available, checks for a 'bbox' column and uses aggregate functions on its components (common in datasets like OvertureMaps)
  - Finally falls back to the generic DuckDB bounds computation using spatial functions

  Each stage is more computationally expensive than the previous one, so they are tried in order of increasing cost.

  Overrides:
  
  getOptimizedBounds in class DuckDBDialect
  
  Parameters:
  
  schema - The database schema (unused in GeoParquet)
  
  featureType - The feature type to get bounds for
  
  cx - Database connection to use for querying
  
  Returns:
  
  A list containing a single ReferencedEnvelope representing the dataset bounds
  
  Throws:
  
  SQLException - If there's an error executing SQL
  
  IOException - If there's an error accessing the data
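
The staged fallback described for getOptimizedBounds can be sketched as a list of suppliers tried in order, stopping at the first one that yields a result. StagedBounds, Envelope, and firstAvailable are hypothetical names. The aggregate query for the second stage assumes a struct column named bbox with xmin, ymin, xmax, and ymax fields, as found in Overture Maps data.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

final class StagedBounds {
    /** Minimal stand-in for ReferencedEnvelope (no CRS). */
    record Envelope(double minX, double minY, double maxX, double maxY) {}

    /** Runs the stages in order and returns the first non-empty result. */
    static Optional<Envelope> firstAvailable(List<Supplier<Optional<Envelope>>> stages) {
        for (Supplier<Optional<Envelope>> stage : stages) {
            Optional<Envelope> result = stage.get();
            if (result.isPresent()) {
                return result; // cheaper stage succeeded; later stages never run
            }
        }
        return Optional.empty();
    }

    /** Aggregate query for the 'bbox' column stage (assumed struct layout). */
    static String bboxAggregateSql(String viewName) {
        String quoted = "\"" + viewName.replace("\"", "\"\"") + "\"";
        return "SELECT min(bbox.xmin), min(bbox.ymin), max(bbox.xmax), max(bbox.ymax) FROM " + quoted;
    }
}
```

Because the stages are suppliers, the expensive spatial computation is only evaluated when both cheaper sources come back empty.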
- getGeometrySRID
  
  public Integer getGeometrySRID(String schemaName, String tableName, String columnName, Connection cx)

  Gets the SRID (Spatial Reference ID) for a geometry column.
  This method attempts to extract the SRID from the GeoParquet metadata's CRS information:

  - First tries to get the CRS from the GeoParquet metadata for the specified column
  - If available, extracts the SRID from the CRS definition using the PROJJSON representation
  - Falls back to trying the primary geometry column if the specific column CRS is not found
  - Falls back to EPSG:4326 (WGS84) if the CRS information is not available or does not contain an SRID

  The CRS information is extracted from the GeoParquet 'geo' metadata field, which follows the PROJJSON v0.7 schema as defined by the OGC GeoParquet specification. This includes proper handling of CRS identifiers with authority and code properties.
  The implementation supports strongly-typed CRS objects, converting between the PROJJSON format used in GeoParquet files and GeoTools CoordinateReferenceSystem objects.

  Overrides:
  
  getGeometrySRID in class DuckDBDialect
  
  Parameters:
  
  schemaName - The database schema (unused in GeoParquet)
  
  tableName - The table/view name
  
  columnName - The geometry column name
  
  cx - Database connection
  
  Returns:
  
  The SRID of the geometry column (from metadata or 4326 as default)
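
The fallback order described for getGeometrySRID might be sketched as follows. SridResolver and CrsId are hypothetical; CrsId models only the authority and code properties of a PROJJSON identifier, and treating non-EPSG authorities as "no SRID" is an assumption of this sketch rather than documented behavior.

```java
import java.util.Map;

final class SridResolver {
    static final int DEFAULT_SRID = 4326; // EPSG:4326 (WGS84)

    /** Minimal stand-in for a PROJJSON "id" object: authority plus code. */
    record CrsId(String authority, int code) {}

    /**
     * Looks up the CRS of the requested column, then of the primary geometry
     * column, and finally falls back to the default SRID.
     */
    static int resolve(Map<String, CrsId> crsByColumn, String column, String primaryColumn) {
        CrsId id = crsByColumn.get(column);
        if (id == null) {
            id = crsByColumn.get(primaryColumn);
        }
        if (id == null || !"EPSG".equalsIgnoreCase(id.authority())) {
            return DEFAULT_SRID; // no usable identifier in the metadata
        }
        return id.code();
    }
}
```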
- createCRS
  
  public CoordinateReferenceSystem createCRS(int srid, Connection cx) throws SQLException
  
  Overrides the base implementation to use the axis order provided by GeoParquetMetadata on a per-FeatureType basis. By contrast, SQLDialect.createCRS(int, java.sql.Connection) applies the SQLDialect.forceLongitudeFirst flag as a constant.
  
  Overrides:
  
  createCRS in class SQLDialect
  
  Throws:
  
  SQLException
- encodeGeometryColumn
  
  public void encodeGeometryColumn(GeometryDescriptor gatt, String prefix, int srid, Hints hints, StringBuffer sql)
  
  Encodes a geometry column for a SQL query with awareness of geometry types.
  This overridden method enhances the base DuckDB dialect implementation by checking if multi-geometry encoding should be enforced for the current feature type. It uses the CURRENT_TYPENAME thread-local variable to determine the appropriate behavior based on the GeoParquet metadata.
  For example, if the geometry column is a MultiPolygon according to the GeoParquet metadata, this method will add ST_Multi() to the SQL encoding to ensure proper handling of collection geometries. This is crucial because the JDBCDataStore calls this method without providing full feature type context.
  
  Overrides:
  
  encodeGeometryColumn in class DuckDBDialect
  
  Parameters:
  
  gatt - The geometry descriptor to encode
  
  prefix - Column prefix to use
  
  srid - The spatial reference ID
  
  hints - Rendering hints that may affect encoding
  
  sql - The SQL buffer to append to
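
A simplified sketch of the thread-local technique described for encodeGeometryColumn is shown below. MultiGeometryEncoding and its constructor argument are hypothetical, and the base dialect's own encoding of the column is omitted; only the decision to wrap the column in ST_Multi() is illustrated.

```java
import java.util.Set;

final class MultiGeometryEncoding {
    /** Carries the feature type name to code that is not passed it directly. */
    static final ThreadLocal<String> CURRENT_TYPENAME = new ThreadLocal<>();

    // Feature types whose metadata calls for multi-geometry encoding (hypothetical input).
    private final Set<String> multiGeometryTypes;

    MultiGeometryEncoding(Set<String> multiGeometryTypes) {
        this.multiGeometryTypes = Set.copyOf(multiGeometryTypes);
    }

    /** Appends the quoted column, wrapped in ST_Multi() when the current type requires it. */
    void encodeGeometryColumn(String columnName, StringBuffer sql) {
        String quoted = "\"" + columnName.replace("\"", "\"\"") + "\"";
        String typeName = CURRENT_TYPENAME.get();
        boolean forceMulti = typeName != null && multiGeometryTypes.contains(typeName);
        sql.append(forceMulti ? "ST_Multi(" + quoted + ")" : quoted);
    }
}
```

In a design like this, the caller would set CURRENT_TYPENAME before building the query and clear it in a finally block so the value does not leak to other work running on the same thread.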
- fixGeometryTypes
  
  public SimpleFeatureType fixGeometryTypes(SimpleFeatureType schema) throws IOException

  Creates a new feature type with more specific geometry types based on GeoParquet metadata, with results cached for performance.
  This method processes a feature type to enhance its geometry descriptors with more specific geometry types derived from the GeoParquet metadata. It:

  - Ensures the database view exists for the feature type
  - Delegates to the GeoParquetViewManager to check for a cached version of the enhanced schema
  - If no cached version exists, creates a new schema with correct geometry type bindings
  - Caches the result for future use

  This is essential because DuckDB only reports a generic GEOMETRY type, while the GeoParquet metadata contains information about the actual geometry types (Point, LineString, etc.).
  The caching mechanism improves performance by avoiding repeated metadata lookups and feature type construction while maintaining thread safety through the GeoParquetViewManager.

  Parameters:
  
  schema - The original feature type with generic geometry types
  
  Returns:
  
  A new feature type with more specific geometry types, either freshly built or from cache
  
  Throws:
  
  IOException - If there is an error accessing the GeoParquet metadata
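
To illustrate how a more specific geometry type might be derived from the metadata, the sketch below reduces a column's geometry_types values (for example "Polygon", "MultiPolygon", optionally with a " Z" suffix) to a single type name. GeometryTypeNarrowing is hypothetical and works on type names rather than geometry classes; the rule that a single type combined with its Multi variant narrows to the Multi variant is an assumption chosen to match the ST_Multi() handling in encodeGeometryColumn.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class GeometryTypeNarrowing {
    /** Returns the most specific type name covering all declared types, or "Geometry". */
    static String narrow(Set<String> geometryTypes) {
        // Ignore the dimensionality suffix: "Polygon Z" is treated as "Polygon".
        Set<String> base = new TreeSet<>();
        for (String type : geometryTypes) {
            base.add(type.endsWith(" Z") ? type.substring(0, type.length() - 2) : type);
        }
        if (base.size() == 1) {
            return base.iterator().next();
        }
        if (base.size() == 2) {
            for (String single : List.of("Point", "LineString", "Polygon")) {
                if (base.contains(single) && base.contains("Multi" + single)) {
                    return "Multi" + single; // singles can be promoted to the Multi variant
                }
            }
        }
        return "Geometry"; // empty or mixed declarations stay generic
    }
}
```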
