Chunking
Usage
The following example chunks the document into 500-character chunks and prints them to stdout. The -a option additionally records the offset of each chunk within the document as the start_index metadata field.
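To make the start_index metadata concrete, here is a minimal Python sketch of fixed-size character chunking. It is an illustration only, not how cdp chunks internally (the real splitter may honor separators or overlap); the function name chunk_text and the dictionary layout are assumptions made for this example.

```python
def chunk_text(text, size=500):
    """Split text into consecutive chunks of at most `size` characters.

    Hypothetical helper: each chunk carries its character offset in the
    original text under metadata["start_index"].
    """
    if size <= 0:
        raise ValueError("size must be positive")
    chunks = []
    for start in range(0, len(text), size):
        chunks.append({
            "text": text[start:start + size],
            "metadata": {"start_index": start},
        })
    return chunks


if __name__ == "__main__":
    document = "lorem ipsum " * 100  # 1200 characters
    for chunk in chunk_text(document):
        # Print the offset and length of every chunk.
        print(chunk["metadata"]["start_index"], len(chunk["text"]))
```

With a 1200-character input and a size of 500, the sketch yields three chunks starting at offsets 0, 500 and 1000, the last one holding the remaining 200 characters.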
Alternatively, you can chunk from an input jsonl file:
jsonl format
The jsonl file is expected to contain chroma_dp.EmbeddableTextResource objects, one per line.
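As a rough sketch of the one-object-per-line layout, the Python snippet below reads jsonl input line by line. The read_jsonl helper is hypothetical, and the text and metadata keys in the sample records are placeholders, not the actual chroma_dp.EmbeddableTextResource schema.

```python
import io
import json


def read_jsonl(stream):
    """Yield one parsed JSON object per non-empty line of `stream`."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number} is not valid JSON: {exc}") from exc


# Placeholder records: the field names are assumptions for illustration only.
sample = io.StringIO(
    '{"text": "first document", "metadata": {"source": "a.txt"}}\n'
    '{"text": "second document", "metadata": {"source": "b.txt"}}\n'
)
records = list(read_jsonl(sample))
print(len(records))
```

Reading lazily with a generator keeps memory use flat for large files, since only one line is parsed at a time.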
Help
Run cdp chunk --help for more information.