ID Generation

CDP provides several ID generation strategies for generating or regenerating the IDs of a dataset.

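The following strategies are available: UUIDv4 (--uuid), ULID (--ulid), document hash (--doc-hash), random hash (--random-hash), and Jinja2 expression (--expr).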

Usage

Help

Run cdp id --help for more information.

The --uuid strategy generates a unique ID based on UUIDv4.

cat sample-data/metadata/metadata.jsonl | head -1 | cdp id --uuid | jq '.id'

Returns:

"5bf1d91b-817c-47a2-a6cb-94894c1b42c3"

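For reference, the kind of value this strategy produces can be sketched in Python with the standard uuid module. This is an illustration only, not CDP's implementation; the helper name new_uuid4_id is hypothetical.

```python
import uuid


def new_uuid4_id() -> str:
    """Return a random (version 4) UUID in its canonical 36-character form.

    Hypothetical helper for illustration; not part of CDP.
    """
    return str(uuid.uuid4())


if __name__ == "__main__":
    # Prints a different value on every run.
    print(new_uuid4_id())
```

Because a UUIDv4 is random, rerunning the command on the same document yields a different ID each time.
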
The --ulid strategy generates a unique ID based on ULID.

cat sample-data/metadata/metadata.jsonl | head -1 | cdp id --ulid | jq '.id'

Returns:

"01HMR7V5MMVD3Q1PA5PPSF0FA1"

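A ULID packs a 48-bit millisecond timestamp and 80 bits of randomness into 26 Crockford base32 characters, so IDs sort roughly by creation time. The sketch below follows the public ULID layout using only the standard library; it is not CDP's implementation, and new_ulid is a hypothetical helper.

```python
import os
import time

# Crockford base32 alphabet (no I, L, O, U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_ulid(timestamp_ms=None) -> str:
    """Build a 26-character ULID: 48-bit timestamp followed by 80 random bits.

    Hypothetical helper for illustration; not part of CDP.
    """
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    randomness = int.from_bytes(os.urandom(10), "big")  # 80 bits
    value = (timestamp_ms << 80) | randomness  # 128 bits in total
    chars = []
    for _ in range(26):  # 26 characters x 5 bits each
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


if __name__ == "__main__":
    print(new_ulid())
```

The first 10 characters encode the timestamp, which is why the two ULIDs shown on this page share the prefix 01HMR.
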
The --doc-hash strategy generates unique IDs based on the SHA-256 hash of the document.

cat sample-data/metadata/metadata.jsonl | head -1 | cdp id --doc-hash | jq '.id'

Returns:

"3143643f8520f32f7b04fff2cd524acbe32ef989b2bd6cc89d687743a909bfa6"

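A minimal sketch of content-based hashing, assuming the document is serialized as canonical JSON (sorted keys, compact separators) before hashing. How CDP actually serializes the document, and which fields it includes, is not stated here, so the digest will not necessarily match the output of cdp; doc_hash_id is a hypothetical helper.

```python
import hashlib
import json


def doc_hash_id(doc: dict) -> str:
    """Return the SHA-256 hex digest of a canonical JSON form of the document.

    Assumption: canonical form = sorted keys, compact separators, UTF-8.
    Hypothetical helper for illustration; not part of CDP.
    """
    canonical = json.dumps(
        doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(doc_hash_id({"metadata": {"title": "Animalia (book)"}}))
```

In this sketch the same content always maps to the same ID, regardless of key order, and any change to the content changes the ID.
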
The --random-hash strategy generates unique IDs based on a random SHA-256 hash.

cat sample-data/metadata/metadata.jsonl | head -1 | cdp id --random-hash | jq '.id'

Returns:

"f4d7bc16f4ddbd080b08c4836efa93ed51e85ea1289df6c5851e261893e6ad52"

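One way to obtain a random SHA-256 value is to hash bytes drawn from the operating system's random source. The sketch below assumes a 32-byte random input; what CDP actually feeds into the hash is not documented here, and random_hash_id is a hypothetical helper.

```python
import hashlib
import os


def random_hash_id() -> str:
    """Return the SHA-256 hex digest of 32 random bytes.

    Assumption: 32 bytes from os.urandom as the hash input.
    Hypothetical helper for illustration; not part of CDP.
    """
    return hashlib.sha256(os.urandom(32)).hexdigest()


if __name__ == "__main__":
    # Prints a different value on every run.
    print(random_hash_id())
```

The result has the same 64-character hexadecimal form as a document hash, but it is unrelated to the document content.
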
The --expr strategy generates an ID from a provided Jinja2 expression. The example below combines the ulid() function with the document's metadata.title field.

cat sample-data/metadata/metadata.jsonl | head -1 | cdp id --expr '{{ulid()}}-{{metadata.title}}' | jq '.id'

Returns:

"01HMR8308TMG80XYV9CHNAF0A3-Animalia (book)"