DuckDB

Cron Expressions For DuckDB

Introducing a DuckDB extension that interprets cron expressions to generate scheduled timestamps directly in SQL, built with the Rust croner crate.

Motivation

Cron jobs are a fundamental tool for scheduling tasks, often used for analyzing system behavior or verifying the completeness of data collections. For instance, a task might be scheduled to run daily at 10:30 AM, while another runs every Tuesday at 3 PM. These times might be in different time zones, such as UTC or New York time.

To efficiently compare these scheduled tasks against actual data, it’s beneficial to generate a list of expected timestamps directly within DuckDB. While DuckDB’s generate_series() function can create a series of timestamps, it lacks the expressiveness needed for complex recurring patterns. I want to fill this with a new DuckDB extension that interprets cron expressions, leveraging the Rust crate croner. Get the extension here:

https://github.com/rustyconover/duckdb-cron-extension

This extension introduces a table-returning function cron(), which calculates upcoming or past times that satisfy a given cron expression.

Examples

Basic Usage

The cron() function returns the next timestamp for a given cron expression:

SELECT * FROM cron('*/5 * * * *');

Future Timestamps

To retrieve all future occurrences up to a specific date:

SELECT * FROM cron('*/5 * * * *', start='2024-01-01', until='2024-01-02');

Past Timestamps

To retrieve occurrences within a past date range:

SELECT * FROM cron('*/5 * * * *', start='2023-01-01', until='2023-01-02');

Timezone Handling

Cron expressions can be evaluated in specific time zones:

SELECT * FROM cron('30 10 * * *', timezone='America/New_York');

Changing DuckDB’s timezone to match.

SET timezone='America/New_York';

Second Level Precision

SELECT * FROM cron('*/10 * * * * *');

More complicated cron expressions

Using cron expressions for specific days of the week:

SELECT * FROM cron('0 9 * * MON-FRI');

Function Documentation

cron(VARCHAR, start=TIMESTAMP, until=TIMESTAMP, timezone=VARCHAR)

Parameters:

cron_expression (VARCHAR) - A valid cron expression string

Optional Named Parameters:

start (TIMESTAMP) - The start of the time range to evaluate
until (TIMESTAMP) - The end of the time range to evaluate
timezone (VARCHAR) - The timezone to use for evaluation

Returning

A single column cron, which contains timestamps when the cron pattern is satisfied.

Installation

Clone the GitHub Repo.

https://github.com/rustyconover/duckdb-cron-extension

#DuckDB #Extensions #Cron #Scheduling