ID Generation
CDP provides several ID generation strategies for generating or regenerating the IDs of a dataset.
The following strategies are available:
- UUID - IDs are generated as UUIDs (version 4)
- ULID - IDs are generated as ULIDs
- Document Hash - IDs are generated from the SHA256 hash of the document
- Random Hash - IDs are generated from a random SHA256 hash
- Expression - IDs are generated from a Jinja2 expression
Usage
Help: run cdp id --help for more information.
UUID
This strategy generates a unique ID based on UUIDv4.
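As a minimal sketch of what a UUIDv4-based ID looks like, the snippet below uses Python's standard uuid module. It is not CDP's own code, and the function name new_uuid_id is hypothetical. A version 4 UUID carries 122 random bits and is rendered as a 36-character string.

```python
import uuid


def new_uuid_id() -> str:
    """Return a random (version 4) UUID as a 36-character string.

    Hypothetical helper for illustration; not part of CDP.
    """
    return str(uuid.uuid4())


if __name__ == "__main__":
    # Each call yields a different value, so no fixed output is shown here.
    print(new_uuid_id())
```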
Returns:
ULID
This strategy generates a unique ID based on ULID.
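For readers unfamiliar with the format, a ULID packs a 48-bit millisecond timestamp and 80 random bits into 26 Crockford Base32 characters, so IDs sort lexicographically by creation time. The following is a hand-written sketch of that layout using only the Python standard library. The function name new_ulid and its timestamp_ms parameter are hypothetical, and CDP may rely on a dedicated ULID library instead.

```python
import os
import time

# Crockford's Base32 alphabet (the letters I, L, O and U are excluded).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_ulid(timestamp_ms=None) -> str:
    """Build a 26-character ULID: 48-bit timestamp followed by 80 random bits.

    Hypothetical helper for illustration; not part of CDP.
    """
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    randomness = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = (timestamp_ms << 80) | randomness           # 128 bits in total
    chars = []
    for _ in range(26):                                  # 26 * 5 bits covers 128 bits
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


if __name__ == "__main__":
    print(new_ulid())
```

Because the timestamp occupies the leading characters, a ULID created later compares greater than one created in an earlier millisecond.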
Returns:
Document Hash
This strategy generates unique IDs based on the document hash (SHA256).
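The idea of a content-derived ID can be sketched as follows. How CDP serializes a document before hashing is not stated here, so this example assumes canonical JSON (sorted keys, compact separators). The function name document_hash_id and the sample document are hypothetical.

```python
import hashlib
import json


def document_hash_id(document: dict) -> str:
    """Return the SHA256 hex digest of a canonical JSON form of the document.

    Assumption: canonical JSON serialization; CDP's actual input to the
    hash may differ.
    """
    canonical = json.dumps(
        document, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    doc = {"metadata": {"title": "Example"}, "text_chunks": ["hello"]}
    print(document_hash_id(doc))  # 64 hexadecimal characters
```

In this sketch the ID is deterministic: hashing the same content again yields the same ID, and any change to the content yields a different one.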
Returns:
Random Hash
This strategy generates unique IDs based on a random hash (SHA256).
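A random SHA256-based ID can be approximated by hashing freshly generated random bytes. This is a sketch under that assumption; the source of randomness CDP uses is not documented here, and the function name random_hash_id is hypothetical.

```python
import hashlib
import os


def random_hash_id() -> str:
    """Return the SHA256 hex digest of 32 random bytes.

    Assumption: the random input comes from os.urandom; CDP's actual
    source of randomness may differ.
    """
    return hashlib.sha256(os.urandom(32)).hexdigest()


if __name__ == "__main__":
    print(random_hash_id())  # 64 hexadecimal characters, different on each run
```

Unlike a document hash, such an ID carries no relation to the document content, so regenerating it always produces a new value.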
Returns:
Expression
Generates an ID from the provided Jinja2 expression. The following variables are available for use in the expression:
- metadata - the metadata for the document
- text_chunks - the text chunks for the document
- id - the existing ID for the document
- embedding - the embedding for the document
- uuid - function that generates a UUID (example usage: {{uuid()}})
- ulid - function that generates a ULID (example usage: {{ulid()}})
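To show how such an expression resolves against the variables listed above, here is a small sketch using the jinja2 package directly. It is not CDP's implementation: the function render_id and the sample document are hypothetical, and the ulid function is left out to keep the example short.

```python
import uuid

from jinja2 import Environment


def render_id(expr: str, document: dict) -> str:
    """Render a Jinja2 expression into an ID string for one document.

    Hypothetical helper for illustration; it exposes the document fields
    and a uuid() function to the template.
    """
    template = Environment().from_string(expr)
    return template.render(
        metadata=document.get("metadata", {}),
        text_chunks=document.get("text_chunks", []),
        id=document.get("id"),
        embedding=document.get("embedding"),
        uuid=lambda: str(uuid.uuid4()),
    )


if __name__ == "__main__":
    doc = {"id": "old-1", "metadata": {"title": "Intro"}, "text_chunks": ["..."]}
    print(render_id("{{ id }}-{{ metadata.title }}", doc))  # prints "old-1-Intro"
```

Combining a generated value with a metadata field, as in the command that follows, keeps IDs unique while leaving them readable.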
cat sample-data/metadata/metadata.jsonl | head -1 | cdp id --expr '{{ulid()}}-{{metadata.title}}' | jq '.id'
Returns: