Class GeoParquetDialect
- Object
- SQLDialect
- BasicSQLDialect
- DuckDBDialect
- GeoParquetDialect

public class GeoParquetDialect extends DuckDBDialect

SQL Dialect for GeoParquet format.

This dialect extends the base DuckDB dialect with GeoParquet-specific functionality:
- Parsing and utilizing GeoParquet metadata from the "geo" field
- Setting up appropriate SQL views for GeoParquet files
- Optimizing spatial operations and bounds computations
- Handling both local and remote (HTTP, S3) GeoParquet data access

The dialect extracts and uses the GeoParquet specification metadata to provide improved performance for operations like bounds computation and feature access. It supports both standard GeoParquet format (1.1.0) and development versions (1.2.0-dev).

The dialect uses several performance optimizations:
- Extracting bounds from GeoParquet metadata rather than computing them
- Creating SQL views for consistent access to partitioned datasets
- Using DuckDB's spatial functions for efficient querying
- Maintaining a cache of metadata to avoid repeated parsing

It works in conjunction with GeoParquetViewManager to handle Hive-partitioned datasets, exposing each partition as a separate feature type.

Field Summary
Fields inherited from class SQLDialect: BASE_DBMS_CAPABILITIES, dataStore, forceLongitudeFirst, UNWRAPPER_NOT_FOUND, uwMap

Constructor Summary
- GeoParquetDialect(JDBCDataStore dataStore): Creates a new GeoParquetDialect.

Method Summary
- CoordinateReferenceSystem createCRS(int srid, Connection cx): Override to use the GeoParquetMetadata provided axis order on a per-FeatureType basis.
- FilterToSQL createFilterToSQL(): Creates a specialized filter-to-SQL converter for GeoParquet.
- void encodeGeometryColumn(GeometryDescriptor gatt, String prefix, int srid, Hints hints, StringBuffer sql): Encodes a geometry column for a SQL query with awareness of geometry types.
- void ensureViewExists(String viewName): Ensures that a database view exists for the specified feature type.
- SimpleFeatureType fixGeometryTypes(SimpleFeatureType schema): Creates a new feature type with more specific geometry types based on GeoParquet metadata, with results cached for performance.
- List<String> getDatabaseInitSql(): Provides SQL statements to initialize the DuckDB database for GeoParquet access.
- Integer getGeometrySRID(String schemaName, String tableName, String columnName, Connection cx): Gets the SRID (Spatial Reference ID) for a geometry column.
- GeoparquetDatasetMetadata getGeoparquetMetadata(String typeName): Gets the GeoParquet metadata for a feature type.
- List<ReferencedEnvelope> getOptimizedBounds(String schema, SimpleFeatureType featureType, Connection cx): Returns optimized bounds for a feature type by using GeoParquet metadata.
- PrimaryKeyFinder getPrimaryKeyFinder(): Provides a PrimaryKeyFinder that identifies the 'id' column as the primary key.
- List<String> getTypeNames(): Returns a list of all available feature type names.
- void initialize(GeoParquetConfig config): Registers SQL views for GeoParquet data partitions.
- GeoparquetDatasetMetadata loadGeoparquetMetadata(String viewName, Connection cx): Loads GeoParquet metadata for a specific view.

Methods inherited from class DuckDBDialect: addSupportedHints, applyLimitOffset, decodeGeometryEnvelope, decodeGeometryValue, encodeColumnName, encodeGeometryColumnGeneralized, encodeGeometryColumnInternal, encodeGeometryColumnSimplified, encodeGeometryEnvelope, encodeGeometryValue, encodePostColumnCreateTable, encodePrimaryKey, escapeName, getMapping, getNameEscape, includeTable, isConcreteGeometry, isLimitOffsetSupported, optimizedBounds, parseWKB, parseWKB, registerClassToSqlMappings, registerSqlTypeToClassMappings, setScreenMapEnabled, setSimplifyEnabled

Methods inherited from class BasicSQLDialect: encodeValue, onDelete, onInsert, onSelect, onUpdate

Methods inherited from class SQLDialect: applyHintsOnVirtualTables, canGroupOnGeometry, canSimplifyPoints, convertValue, createIndex, decodeGeometryValue, dropIndex, encodeColumnAlias, encodeColumnName, encodeColumnType, encodeCreateTable, encodeJoin, encodeNextSequenceValue, encodePostCreateTable, encodePostSelect, encodeSchemaName, encodeTableAlias, encodeTableName, getAggregateConverter, getDefaultVarcharSize, getDesiredTablesType, getGeometryDimension, getGeometryTypeName, getIndexes, getLastAutoGeneratedValue, getLastAutoGeneratedValue, getMapping, getNextAutoGeneratedValue, getNextSequenceValue, getPkColumnValue, getPrimaryKey, getRestrictions, getResultTypes, getSequenceForColumn, getSQLType, handleSelectHints, handleUserDefinedType, initializeConnection, isAggregatedSortSupported, isArray, isAutoCommitQuery, isGroupBySupported, lookupGeneratedValuesPostInsert, ne, postCreateAttribute, postCreateFeatureType, postCreateTable, postDropTable, preDropTable, registerAggregateFunctions, registerSqlTypeNameToClassMappings, registerSqlTypeToSqlTypeNameOverrides, splitFilter, supportsSchemaForIndex, unwrapConnection

Constructor Detail

GeoParquetDialect
public GeoParquetDialect(JDBCDataStore dataStore)
Creates a new GeoParquetDialect.
- Parameters:
- dataStore - The JDBC datastore this dialect will work with

Method Detail

ensureViewExists
public void ensureViewExists(String viewName) throws IOException
Ensures that a database view exists for the specified feature type.
This method is called before any operations that require access to a feature type's schema or data, implementing the lazy initialization pattern. If the view already exists, this method has no effect.
- Parameters:
- viewName - The name of the view/feature type to ensure exists
- Throws:
- IOException - If there is an error creating the view

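The lazy, idempotent behavior described for ensureViewExists can be sketched as follows. This is a minimal illustration only: LazyViewRegistry and its counter are hypothetical names, not part of the dialect or of GeoParquetViewManager, and the real method would run a CREATE VIEW statement and can throw IOException.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for the dialect's view bookkeeping; not the GeoTools API. */
final class LazyViewRegistry {
    private final Set<String> createdViews = new HashSet<>();
    private int createStatements = 0;

    /** Creates the view on the first call for a name; later calls are no-ops. */
    synchronized void ensureViewExists(String viewName) {
        if (createdViews.add(viewName)) {
            // A real implementation would run a CREATE VIEW statement here.
            createStatements++;
        }
    }

    /** Number of times a view had to be created (for illustration only). */
    synchronized int createStatements() {
        return createStatements;
    }
}
```

Calling ensureViewExists twice with the same name performs the creation step once.
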
getTypeNames
public List<String> getTypeNames() throws IOException
Returns a list of all available feature type names.
This method queries the view manager to get the names of all registered views, which correspond to available feature types in the GeoParquet dataset.
- Returns:
- A list of feature type names
- Throws:
- IOException - If there is an error retrieving the names

createFilterToSQL
public FilterToSQL createFilterToSQL()
Creates a specialized filter-to-SQL converter for GeoParquet.
- Overrides:
- createFilterToSQL in class DuckDBDialect
- Returns:
- A new GeoParquetFilterToSQL instance

getDatabaseInitSql
public List<String> getDatabaseInitSql()
Provides SQL statements to initialize the DuckDB database for GeoParquet access.
This installs and loads required DuckDB extensions:
- httpfs - For HTTP/S3 access to remote GeoParquet files
- parquet - For reading Parquet file format
- Overrides:
- getDatabaseInitSql in class DuckDBDialect
- Returns:
- List of SQL statements to initialize the database

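As a rough idea of what such an initialization list could contain, the sketch below uses DuckDB's INSTALL and LOAD statements for the two extensions named above. The class name InitSqlSketch is hypothetical, and the exact statements returned by the dialect (including anything contributed by DuckDBDialect) are not shown on this page, so treat the list as an assumption.

```java
import java.util.List;

/** Hypothetical sketch; the dialect's actual statement list may differ. */
final class InitSqlSketch {
    /** Returns INSTALL/LOAD statements for the httpfs and parquet extensions. */
    static List<String> databaseInitSql() {
        return List.of(
                "INSTALL httpfs", // download the extension if it is not present
                "LOAD httpfs",    // enable HTTP/S3 access in this database
                "INSTALL parquet",
                "LOAD parquet");  // enable reading Parquet files
    }
}
```
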
initialize
public void initialize(GeoParquetConfig config) throws IOException
Registers SQL views for GeoParquet data partitions.
This method is called by GeoParquetDataStoreFactory#setupDataStore(JDBCDataStore, Map) to initialize the dialect with the provided configuration. It:
- Clears any cached metadata
- Initializes the view manager with the new configuration
- Parameters:
- config - The GeoParquet configuration
- Throws:
- IOException - If there's an error registering the views

getGeoparquetMetadata
public GeoparquetDatasetMetadata getGeoparquetMetadata(String typeName) throws IOException
Gets the GeoParquet metadata for a feature type.
This is a convenience method that creates a connection and delegates to getGeoparquetMetadata(String, Connection) if the metadata for typeName is not already cached.
- Parameters:
- typeName - The feature type to get metadata for
- Returns:
- The GeoParquet metadata for the feature type
- Throws:
- IOException - If there is an error retrieving the metadata

loadGeoparquetMetadata
public GeoparquetDatasetMetadata loadGeoparquetMetadata(String viewName, Connection cx)
Loads GeoParquet metadata for a specific view.
This method:
- Retrieves the URI for the view
- Queries the Parquet key-value metadata to extract the 'geo' field
- Parses the metadata for each file in the dataset
- Parameters:
- viewName - The name of the view to load metadata for
- cx - Database connection to use for querying
- Returns:
- The combined dataset metadata

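One way to read the 'geo' key-value entry with DuckDB is its parquet_kv_metadata table function, which returns one row per file and key, with the key and value as BLOBs. The sketch below only builds such a query string. Whether the dialect uses this function is an assumption, and the class and method names are hypothetical.

```java
/** Hypothetical sketch of a query for the GeoParquet 'geo' metadata entry. */
final class GeoMetadataQuerySketch {
    /**
     * Builds a DuckDB query returning the 'geo' JSON text for each file
     * matched by the given URI or glob. Single quotes are doubled so the
     * URI is safe inside a SQL string literal.
     */
    static String geoMetadataQuery(String uri) {
        String escaped = uri.replace("'", "''");
        return "SELECT file_name, decode(value) AS geo "
                + "FROM parquet_kv_metadata('" + escaped + "') "
                + "WHERE decode(key) = 'geo'";
    }
}
```

Each returned row would then be parsed as JSON and merged into the dataset-level metadata.
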
getPrimaryKeyFinder
public PrimaryKeyFinder getPrimaryKeyFinder()
Provides a PrimaryKeyFinder that identifies the 'id' column as the primary key.
This is a helper for GeoParquetDataStoreFactory to establish the feature ID column in GeoParquet datasets. It always identifies the 'id' column as a String primary key, which is the standard convention for GeoParquet files.
- Returns:
- A PrimaryKeyFinder for GeoParquet datasets

getOptimizedBounds
public List<ReferencedEnvelope> getOptimizedBounds(String schema, SimpleFeatureType featureType, Connection cx) throws SQLException, IOException
Returns optimized bounds for a feature type by using GeoParquet metadata.
This method uses a multi-stage approach to efficiently determine dataset bounds:
- First tries to extract bounds from the GeoParquet 'geo' metadata field
- If 'geo' metadata is not available, checks for a 'bbox' column and uses aggregate functions on its components (common in datasets like OvertureMaps)
- Finally falls back to the generic DuckDB bounds computation using spatial functions
Each stage is more computationally expensive than the one before it, so the stages are tried in order of increasing cost.
- Overrides:
- getOptimizedBounds in class DuckDBDialect
- Parameters:
- schema - The database schema (unused in GeoParquet)
- featureType - The feature type to get bounds for
- cx - Database connection to use for querying
- Returns:
- A list containing a single ReferencedEnvelope representing the dataset bounds
- Throws:
- SQLException - If there's an error executing SQL
- IOException - If there's an error accessing the data

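The staged fallback above can be sketched with plain suppliers. This is an illustration under assumptions: BoundsFallbackSketch is a hypothetical class, envelopes are modeled as double arrays {minX, minY, maxX, maxY} instead of ReferencedEnvelope, and each supplier stands in for one stage (metadata lookup, bbox-column aggregates, full spatial scan).

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical sketch of trying cheap bounds sources before expensive ones. */
final class BoundsFallbackSketch {
    /**
     * Returns the first available envelope, evaluating stages lazily so that
     * a later, more expensive stage runs only when the earlier ones yield nothing.
     */
    static double[] resolveBounds(
            Supplier<Optional<double[]>> fromGeoMetadata,
            Supplier<Optional<double[]>> fromBboxColumn,
            Supplier<double[]> fullScan) {
        Optional<double[]> bounds = fromGeoMetadata.get();
        if (bounds.isPresent()) {
            return bounds.get();
        }
        bounds = fromBboxColumn.get();
        if (bounds.isPresent()) {
            return bounds.get();
        }
        // Last resort: compute bounds from the geometries themselves.
        return fullScan.get();
    }
}
```

Because the stages are suppliers, the full scan is never started when metadata or a bbox column already provides the answer.
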
getGeometrySRID
public Integer getGeometrySRID(String schemaName, String tableName, String columnName, Connection cx)
Gets the SRID (Spatial Reference ID) for a geometry column.
This method attempts to extract the SRID from the GeoParquet metadata's CRS information:
- First tries to get the CRS from the GeoParquet metadata for the specified column
- If available, extracts the SRID from the CRS definition using the PROJJSON representation
- Falls back to trying the primary geometry column if the specific column CRS is not found
- Falls back to EPSG:4326 (WGS84) if the CRS information is not available or doesn't contain SRID
The CRS information is extracted from the GeoParquet 'geo' metadata field, which follows the PROJJSON v0.7 schema as defined by the OGC GeoParquet specification. This includes proper handling of CRS identifiers with authority and code properties.
The implementation supports strongly-typed CRS objects, converting between the PROJJSON format used in GeoParquet files and GeoTools CoordinateReferenceSystem objects.
- Overrides:
- getGeometrySRID in class DuckDBDialect
- Parameters:
- schemaName - The database schema (unused in GeoParquet)
- tableName - The table/view name
- columnName - The geometry column name
- cx - Database connection
- Returns:
- The SRID of the geometry column (from metadata or 4326 as default)

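The lookup order above (requested column, then primary geometry column, then 4326) can be sketched as below. The map of column names to EPSG codes is a simplification: it stands in for codes already read from the PROJJSON "id" object, and SridFallbackSketch is a hypothetical name.

```java
import java.util.Map;

/** Hypothetical sketch of the SRID fallback chain; PROJJSON parsing is omitted. */
final class SridFallbackSketch {
    static final int DEFAULT_SRID = 4326; // WGS84, used when no code is known

    /**
     * Resolves the SRID for a column. The map holds EPSG codes for columns
     * whose CRS carried an identifier; columns without one are absent.
     */
    static int resolveSrid(
            Map<String, Integer> epsgCodeByColumn, String column, String primaryColumn) {
        Integer code = epsgCodeByColumn.get(column);
        if (code == null) {
            // No CRS code for this column: try the primary geometry column.
            code = epsgCodeByColumn.get(primaryColumn);
        }
        return code != null ? code : DEFAULT_SRID;
    }
}
```
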
createCRS
public CoordinateReferenceSystem createCRS(int srid, Connection cx) throws SQLException
Overridden to use the axis order provided by GeoParquetMetadata on a per-FeatureType basis. SQLDialect.createCRS(int, java.sql.Connection) uses the SQLDialect.forceLongitudeFirst flag as a constant.
- Overrides:
- createCRS in class SQLDialect
- Throws:
- SQLException

encodeGeometryColumn
public void encodeGeometryColumn(GeometryDescriptor gatt, String prefix, int srid, Hints hints, StringBuffer sql)
Encodes a geometry column for a SQL query with awareness of geometry types.
This overridden method enhances the base DuckDB dialect implementation by checking if multi-geometry encoding should be enforced for the current feature type. It uses the CURRENT_TYPENAME thread-local variable to determine the appropriate behavior based on the GeoParquet metadata.
For example, if the geometry column is a MultiPolygon according to the GeoParquet metadata, this method will add ST_Multi() to the SQL encoding to ensure proper handling of collection geometries. This is crucial because the JDBCDataStore calls this method without providing full feature type context.
- Overrides:
- encodeGeometryColumn in class DuckDBDialect
- Parameters:
- gatt - The geometry descriptor to encode
- prefix - Column prefix to use
- srid - The spatial reference ID
- hints - Rendering hints that may affect encoding
- sql - The SQL buffer to append to

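A simplified view of the thread-local approach: the caller records the current feature type name before encoding, and the encoder wraps the column in ST_Multi() when that type is known to hold multi-geometries. In this sketch the set of multi-geometry type names, the StringBuilder parameter, and the class name are all assumptions. The real method takes a GeometryDescriptor and a StringBuffer and builds on the base dialect's encoding, which is not reproduced here.

```java
import java.util.Set;

/** Hypothetical sketch of thread-local driven ST_Multi() wrapping. */
final class MultiGeometryEncodingSketch {
    /** Name of the feature type currently being encoded on this thread. */
    static final ThreadLocal<String> CURRENT_TYPENAME = new ThreadLocal<>();

    /** Appends the quoted column, wrapped in ST_Multi() when required. */
    static void encodeGeometryColumn(
            String column, Set<String> multiGeometryTypes, StringBuilder sql) {
        // Double embedded quotes so the identifier is safe to quote.
        String quoted = "\"" + column.replace("\"", "\"\"") + "\"";
        String typeName = CURRENT_TYPENAME.get();
        boolean forceMulti = typeName != null && multiGeometryTypes.contains(typeName);
        sql.append(forceMulti ? "ST_Multi(" + quoted + ")" : quoted);
    }
}
```

Callers would set CURRENT_TYPENAME before building the query and remove it afterwards so the value does not leak to other work on the same thread.
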
fixGeometryTypes
public SimpleFeatureType fixGeometryTypes(SimpleFeatureType schema) throws IOException
Creates a new feature type with more specific geometry types based on GeoParquet metadata, with results cached for performance.
This method processes a feature type to enhance its geometry descriptors with more specific geometry types derived from the GeoParquet metadata. It:
- Ensures the database view exists for the feature type
- Delegates to the GeoParquetViewManager to check for a cached version of the enhanced schema
- If no cached version exists, creates a new schema with correct geometry type bindings
- Caches the result for future use
This is essential because DuckDB only reports a generic GEOMETRY type, while the GeoParquet metadata contains information about the actual geometry types (Point, LineString, etc.).
The caching mechanism improves performance by avoiding repeated metadata lookups and feature type construction while maintaining thread safety through the GeoParquetViewManager.
- Parameters:
- schema - The original feature type with generic geometry types
- Returns:
- A new feature type with more specific geometry types, either freshly built or from cache
- Throws:
- IOException - If there is an error accessing the GeoParquet metadata
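
The check-then-build-then-cache steps can be sketched with ConcurrentHashMap.computeIfAbsent, which runs the builder at most once per key and is safe across threads. SchemaCacheSketch is a hypothetical class. The narrowed schema is reduced to a geometry type name here, and how GeoParquetViewManager actually stores its cache is not shown on this page.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch of caching a narrowed geometry type per feature type. */
final class SchemaCacheSketch {
    private final Map<String, String> narrowedByTypeName = new ConcurrentHashMap<>();

    /**
     * Returns the cached geometry type for the feature type, invoking the
     * metadata lookup only when no entry exists yet.
     */
    String fixGeometryType(String typeName, Function<String, String> lookupGeometryType) {
        return narrowedByTypeName.computeIfAbsent(typeName, lookupGeometryType);
    }
}
```
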