Cron Expressions For DuckDB
Introducing a DuckDB extension that interprets cron expressions to generate scheduled timestamps directly in SQL, built with the Rust croner crate.
Motivation
Cron jobs are a fundamental tool for scheduling tasks, often used for analyzing system behavior or verifying the completeness of data collections. For instance, a task might be scheduled to run daily at 10:30 AM, while another runs every Tuesday at 3 PM. These times might be in different time zones, such as UTC or New York time.
To efficiently compare these scheduled tasks against actual data, it’s beneficial to generate a list of expected timestamps directly within DuckDB. While DuckDB’s generate_series() function can create a series of timestamps, it lacks the expressiveness needed for complex recurring patterns. I want to fill this with a new DuckDB extension that interprets cron expressions, leveraging the Rust crate croner. Get the extension here:
https://github.com/rustyconover/duckdb-cron-extension
This extension introduces a table-returning function cron(), which calculates upcoming or past times that satisfy a given cron expression.
Examples
Basic Usage
The cron() function returns the next timestamp for a given cron expression:
SELECT * FROM cron('*/5 * * * *');
Future Timestamps
To retrieve all future occurrences up to a specific date:
SELECT * FROM cron('*/5 * * * *', start='2024-01-01', until='2024-01-02');
Past Timestamps
To retrieve occurrences within a past date range:
SELECT * FROM cron('*/5 * * * *', start='2023-01-01', until='2023-01-02');
Timezone Handling
Cron expressions can be evaluated in specific time zones:
SELECT * FROM cron('30 10 * * *', timezone='America/New_York');
Changing DuckDB’s timezone to match.
SET timezone='America/New_York';
Second Level Precision
SELECT * FROM cron('*/10 * * * * *');
More complicated cron expressions
Using cron expressions for specific days of the week:
SELECT * FROM cron('0 9 * * MON-FRI');
Function Documentation
cron(VARCHAR, start=TIMESTAMP, until=TIMESTAMP, timezone=VARCHAR)
Parameters:
cron_expression (VARCHAR) - A valid cron expression string
Optional Named Parameters:
start (TIMESTAMP) - The start of the time range to evaluate
until (TIMESTAMP) - The end of the time range to evaluate
timezone (VARCHAR) - The timezone to use for evaluation
Returning
A single column cron, which contains timestamps when the cron pattern is satisfied.
Installation
Clone the GitHub Repo.