Skip to content

CLI Naming Conventions

Date: 2024-01-20 Tags: cli, naming, conventions Authors: @tazarov

Context and Problem Statement

We want to be mindful of the naming conventions we use for the CLI. We want to be consistent and clear about the commands we use and the order of the arguments.

Considered Options

  • Extensive use of subcommands
  • Ecosystem centric naming

Decision Outcome

Chosen option: "Ecosystem centric naming", because it will make the CLI more aligned with the ecosystem thus reducing the cognitive load on users.

The use of import and export for Chroma

This is a toolchain in the chroma ecosystem and we want to be very explicit about the use of import and export. They must convey the core idea behind CDP, getting data in and out of Chroma.

The use of embed

A core concept around vector databases is embedding. We want to make the embedding a first class citizen of CDP this is why we choose to have embed as a top level command.

The use of chunk

Another core concept around vector databases and embedding models is context window. Each embedding model has a specific context window and sometimes the embedding models, make a trade-off to truncate data for better usability rather than warning or preventing the user from using the model with an oversized context window. To that end chunking (of mostly text data) has become a core concept in the ecosystem. We want to convey that by elevating the chunk command to a top level command.

The use of ds-get and ds-put

Working with datasets is another core concept in the econsystem. While we currently only plan to support HuggingFace datasets, we want to also put the datasets front and center in CDP CLI. This is why we choose to have ds-get and ds-put as top level commands.

Pros and Cons of the Options

Extensive use of subcommands

  • Good, because it allows for a more natural grouping of commands
  • Good, because it will solve a top-level command sprawl
  • Bad, because it will put a burden on the user to remember and navigate the subcommands

Ecosystem centric naming

  • Good, because it will make the CLI more consistent with the ecosystem
  • Good, because it reduces the cognitive load on users by aligning with expectations set by the ecosystem
  • Bad, because it will increase the number of top-level commands