Chunking
Usage
The following example splits the document into 500-character chunks and prints them to stdout. The -a option also records
the offset of each chunk within the document as the start_index metadata field.
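To illustrate what fixed-size chunking with a start_index offset produces, here is a minimal Python sketch. It is not the cdp implementation: the function name chunk_text, the dictionary layout, and the plain fixed-width split are assumptions made for illustration, and the real chunker may split on separators or overlap chunks.

```python
from typing import Dict, List


def chunk_text(text: str, size: int = 500, add_start_index: bool = True) -> List[Dict]:
    """Split text into consecutive chunks of at most `size` characters.

    Hypothetical helper for illustration only; not part of chroma_dp.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    chunks = []
    for start in range(0, len(text), size):
        # start is the character offset of this chunk in the original text
        metadata = {"start_index": start} if add_start_index else {}
        chunks.append({"text": text[start:start + size], "metadata": metadata})
    return chunks


if __name__ == "__main__":
    # A 1200-character document yields chunks starting at offsets 0, 500 and 1000.
    for chunk in chunk_text("a" * 1200):
        print(chunk["metadata"], len(chunk["text"]))
```

With a 500-character size, only the last chunk can be shorter than 500 characters, and start_index lets a consumer map a chunk back to its position in the source document.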
Alternatively, you can chunk from an input jsonl file:
jsonl format
The jsonl file is expected to contain one chroma_dp.EmbeddableTextResource object per line.
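As a rough sketch of the one-object-per-line layout, the Python snippet below reads a jsonl stream and yields each line as a dictionary. The helper name read_jsonl and the "text" key in the sample data are hypothetical placeholders; the actual fields of EmbeddableTextResource are not listed here, so check the chroma_dp source for the real schema.

```python
import io
import json
from typing import Dict, Iterator, TextIO


def read_jsonl(stream: TextIO) -> Iterator[Dict]:
    """Yield one JSON object per non-empty line of a jsonl stream.

    Hypothetical helper for illustration only; not part of chroma_dp.
    """
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError(f"line {line_number}: expected a JSON object")
        yield record


if __name__ == "__main__":
    # The "text" key is a placeholder, not the confirmed resource schema.
    sample = io.StringIO('{"text": "first document"}\n{"text": "second document"}\n')
    for record in read_jsonl(sample):
        print(record)
```

Reading line by line keeps memory use flat for large inputs, which is the usual reason to prefer jsonl over a single JSON array.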
Help
Run cdp chunk --help for more information.