A domain-agnostic R6 class that provides an interface to a folder-based data storage system using DuckDB for in-memory SQL operations and parquet files for efficient on-disk persistence. This class can be inherited by domain-specific database classes.
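The class pairs an in-memory DuckDB connection with a folder of parquet files on disk. Below is a minimal sketch of that pairing using the DBI and duckdb packages. The helper name open_parquet_folder() and the list it returns are hypothetical and are not the class's actual constructor. The way extensions are loaded is also an assumption.

```r
library(DBI)
library(duckdb)

# Hypothetical helper (not part of parquet_db): pairs an in-memory DuckDB
# connection with a folder that will hold the parquet files.
open_parquet_folder <- function(path, read_only = FALSE, extensions = character(0)) {
  # Only create the folder when writes are allowed
  if (!read_only) {
    dir.create(path, showWarnings = FALSE, recursive = TRUE)
  }
  # dbdir = ":memory:" keeps all SQL state in RAM; data persists only as parquet
  con <- DBI::dbConnect(duckdb::duckdb(), dbdir = ":memory:")
  # Assumed behaviour: install and load each requested DuckDB extension
  for (ext in extensions) {
    DBI::dbExecute(con, paste("INSTALL", ext))
    DBI::dbExecute(con, paste("LOAD", ext))
  }
  list(connection = con, path = path, read_only = read_only)
}
```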
Public fields
connection: DBI connection object for an in-memory DuckDB database.
path: Character string. Path to the data folder.
writeopts: Write options for DuckDB parquet output.
read_only: Logical. If TRUE, prevents writes that are not parallel-safe.
Methods
Method new()
Initialize a new parquet_db object
Usage
parquet_db$new(path, read_only = FALSE, extensions = character(0))
Method row_count()
Get the row count of a table (without applying id_run subsetting). Returns 0 if the table does not exist.
Method column_max()
Get the maximum value of a column in a table (without applying id_run subsetting). Returns 0 if the table does not exist.
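Both row_count() and column_max() fall back to 0 when the table does not exist. A minimal sketch of that behaviour is shown here, assuming (hypothetically) that each table is stored as a folder <path>/<table_name> of *.parquet files. The helper names are illustrative only, and id_run subsetting is not modelled.

```r
library(DBI)
library(duckdb)

# Hypothetical helper: returns a read_parquet() expression over every
# *.parquet file under <path>/<table_name>, or NULL if there are none.
parquet_source <- function(con, path, table_name) {
  files <- list.files(file.path(path, table_name), pattern = "\\.parquet$",
                      recursive = TRUE, full.names = TRUE)
  if (length(files) == 0) return(NULL)
  quoted <- as.character(DBI::dbQuoteString(con, files))
  sprintf("read_parquet([%s])", paste(quoted, collapse = ", "))
}

# Row count; 0 when the table folder is missing or holds no parquet files
row_count_sketch <- function(con, path, table_name) {
  src <- parquet_source(con, path, table_name)
  if (is.null(src)) return(0)
  as.numeric(DBI::dbGetQuery(con, paste("SELECT COUNT(*) AS n FROM", src))$n)
}

# Column maximum; 0 when the table does not exist
column_max_sketch <- function(con, path, table_name, column) {
  src <- parquet_source(con, path, table_name)
  if (is.null(src)) return(0)
  col <- as.character(DBI::dbQuoteIdentifier(con, column))
  DBI::dbGetQuery(con, sprintf("SELECT MAX(%s) AS m FROM %s", col, src))$m
}
```

Checking for files in R before querying avoids a DuckDB error from read_parquet() on an empty file list.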
Method fetch()
Fetch data from a table
Arguments
table_name: Character string. Name of the table to query.
cols: SQL column selection string (e.g., "col1, col2" or "*").
where: Character string. Optional WHERE clause for the SQL query.
limit: Integer. Optional limit on the number of rows to return.
map_cols: Vector of columns to be converted from key/value structs to R lists.
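The fetch() arguments map naturally onto the clauses of a single SELECT statement. The sketch below shows one way to assemble that statement. The function names are hypothetical. It also assumes that each map column value arrives as a data frame with key and value columns, which may not match what the class actually receives from DuckDB.

```r
library(DBI)
library(duckdb)

# Hypothetical sketch of fetch(): builds SELECT <cols> FROM <source>
# [WHERE <where>] [LIMIT <limit>]. cols and where are spliced in verbatim,
# so they must come from trusted code, not user input.
fetch_sketch <- function(con, source, cols = "*", where = NULL, limit = NULL) {
  sql <- sprintf("SELECT %s FROM %s", cols, source)
  if (!is.null(where)) sql <- paste(sql, "WHERE", where)
  if (!is.null(limit)) sql <- paste(sql, "LIMIT", as.integer(limit))
  DBI::dbGetQuery(con, sql)
}

# Hypothetical map_cols-style conversion: turns one key/value data frame
# into a named R list (the shape of the input is an assumption)
kv_to_list <- function(kv) {
  stats::setNames(as.list(kv$value), kv$key)
}
```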
Method delete_from()
Delete rows from a table
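Parquet files cannot be edited in place, so deleting rows amounts to rewriting the data without them. The sketch below illustrates that approach for a single parquet file. The function name, its where argument, and the one-file-per-table layout are all assumptions, because the signature of delete_from() is not documented here.

```r
library(DBI)
library(duckdb)

# Hypothetical sketch: rewrite the file, keeping only rows that do NOT match
# the predicate. Assumes a single parquet file per table.
delete_from_sketch <- function(con, file, where) {
  tmp <- paste0(file, ".tmp")
  src <- as.character(DBI::dbQuoteString(con, file))
  dst <- as.character(DBI::dbQuoteString(con, tmp))
  # coalesce() keeps rows where the predicate evaluates to NULL
  sql <- sprintf(
    "COPY (SELECT * FROM read_parquet(%s) WHERE NOT coalesce((%s), false)) TO %s (FORMAT PARQUET)",
    src, where, dst
  )
  DBI::dbExecute(con, sql)
  # Swap the rewritten file into place
  invisible(file.rename(tmp, file))
}
```

Writing to a temporary file first means the original remains intact if the COPY fails.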
Method commit()
Commit data using overwrite, append, or upsert mode. Handles partition, key,
identity, and MAP (list-to-MAP conversion) columns. These four special column
types may be passed as attributes of the x argument. If the table has
previously been written to, these settings are recovered from the parquet
metadata.
Usage
parquet_db$commit(x, table_name, method = c("overwrite", "append", "upsert"))
Arguments
x: If a data.table, the data to commit. If a character string, the name of a table or view held in DuckDB memory.
table_name: Name of the target table.
method: Character string, one of "overwrite", "append", or "upsert". Upsert updates existing rows and inserts new ones. Because it must load the full data into memory to determine which rows to update, it can be expensive.
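A minimal sketch of the three commit modes, using a single parquet file and one key column. This is not the class's implementation: partition, identity, and MAP handling are omitted, and the function name and its key argument are hypothetical. As documented for x, the input may be either a data frame or data.table, or the name of a table or view in DuckDB.

```r
library(DBI)
library(duckdb)

# Hypothetical sketch of overwrite / append / upsert against one parquet file.
commit_sketch <- function(con, x, file, method = c("overwrite", "append", "upsert"),
                          key = NULL) {
  method <- match.arg(method)
  # x may be a data frame (or data.table) or the name of a DuckDB table/view
  if (is.character(x)) {
    incoming <- as.character(DBI::dbQuoteIdentifier(con, x))
  } else {
    DBI::dbWriteTable(con, "incoming_tmp", x, overwrite = TRUE)
    incoming <- "incoming_tmp"
  }
  src <- as.character(DBI::dbQuoteString(con, file))
  query <- if (method == "overwrite" || !file.exists(file)) {
    paste("SELECT * FROM", incoming)
  } else if (method == "append") {
    sprintf("SELECT * FROM read_parquet(%s) UNION ALL SELECT * FROM %s", src, incoming)
  } else {
    # Upsert: keep existing rows whose key is absent from x, then add all of x
    stopifnot(!is.null(key))
    k <- as.character(DBI::dbQuoteIdentifier(con, key))
    sprintf(paste(
      "SELECT * FROM read_parquet(%1$s) AS cur",
      "WHERE NOT EXISTS (SELECT 1 FROM %2$s AS inc WHERE inc.%3$s = cur.%3$s)",
      "UNION ALL SELECT * FROM %2$s"
    ), src, incoming, k)
  }
  # Write to a temporary file, then swap it into place
  tmp <- paste0(file, ".tmp")
  DBI::dbExecute(con, sprintf("COPY (%s) TO %s (FORMAT PARQUET)",
                              query, as.character(DBI::dbQuoteString(con, tmp))))
  file.rename(tmp, file)
  invisible(method)
}
```

UNION ALL matches columns by position, so this sketch assumes x has the same column order as the stored data. The upsert branch reads every existing row, which is consistent with the note above that upsert can be expensive.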