orichain.knowledge_base

class orichain.knowledge_base.KnowledgeBase(vector_db_type: str | None, **kwds: Any)[source]

Bases: object

Synchronous interface for interacting with vector databases.

This class provides a unified API to communicate with supported vector databases. Currently, Pinecone and ChromaDB are supported.

default_knowledge_base = 'pinecone'
__init__(vector_db_type: str | None, **kwds: Any) None[source]

Initializes the knowledge base.

Args:
  • vector_db_type (str, optional): Type of knowledge base. Default: pinecone

Authentication parameters by provider:

Pinecone:
  • api_key (str): Pinecone API key

  • index_name (str): Pinecone index name

  • namespace (str): Pinecone namespace

ChromaDB:
  • collection_name (str): ChromaDB collection name

  • path (str, optional): Path to the ChromaDB database. Default: /home/ubuntu/projects/chromadb

Raises:
  • ValueError: If the knowledge base type is not supported

  • KeyError: If a required parameter is not found

Warns:
  • UserWarning: If the knowledge base type is not defined; the default (pinecone) is used
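The provider-specific parameters above can be checked before the class is constructed. The following is a minimal sketch, not part of orichain: `REQUIRED_PARAMS`, `build_kb_kwargs`, and `make_knowledge_base` are hypothetical helpers, and the type string "chromadb" is an assumption, since only 'pinecone' is spelled out in this reference.

```python
from typing import Any, Dict

# Required keyword arguments per provider, taken from the parameter lists above.
# The key "chromadb" is an assumed type string; only 'pinecone' is documented.
REQUIRED_PARAMS = {
    "pinecone": ("api_key", "index_name", "namespace"),
    "chromadb": ("collection_name",),
}


def build_kb_kwargs(vector_db_type: str, **params: Any) -> Dict[str, Any]:
    """Validate the documented required parameters and return constructor kwargs."""
    if vector_db_type not in REQUIRED_PARAMS:
        raise ValueError(f"Unsupported knowledge base type: {vector_db_type}")
    missing = [name for name in REQUIRED_PARAMS[vector_db_type] if name not in params]
    if missing:
        raise KeyError(f"Missing required params: {missing}")
    return {"vector_db_type": vector_db_type, **params}


def make_knowledge_base(vector_db_type: str, **params: Any):
    """Construct a KnowledgeBase from validated kwargs (requires orichain)."""
    # Imported lazily so build_kb_kwargs can be used without orichain installed.
    from orichain.knowledge_base import KnowledgeBase

    return KnowledgeBase(**build_kb_kwargs(vector_db_type, **params))


# Placeholder values; replace them with real credentials and names.
pinecone_kwargs = build_kb_kwargs(
    "pinecone", api_key="YOUR_API_KEY", index_name="my-index", namespace="my-namespace"
)
```

Validating up front mirrors the ValueError and KeyError conditions listed under Raises, so misconfiguration surfaces before any client is created.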

__call__(num_of_chunks: int, user_message_vector: List[float | int] | None = None, **kwds: Any) Dict[source]

Retrieves chunks from the knowledge base.

Args:
  • user_message_vector (Optional[List[Union[int, float]]]): Embedding of the text. Defaults to None.

  • num_of_chunks (int): Number of chunks to retrieve

Retrieval Arguments by VectorDB:

Pinecone:
  • vector (List[float]): The query vector. This should be the same length as the dimension of the index being queried. Each query() request can contain only one of the parameters id or vector. [optional]

  • id (str): The unique ID of the vector to be used as a query vector. Each query() request can contain only one of the parameters vector or id. [optional]

  • top_k (int): The number of results to return for each query. Must be an integer greater than or equal to 1.

  • namespace (str): The namespace to fetch vectors from. If not specified, the default namespace is used. [optional]

  • filter (Dict[str, Union[str, float, int, bool, List, dict]]): The filter to apply. You can use vector metadata to limit your search. See https://www.pinecone.io/docs/metadata-filtering/ [optional]

  • include_values (bool): Indicates whether vector values are included in the response. If omitted, the server uses the default value of False. [optional]

  • include_metadata (bool): Indicates whether metadata is included in the response as well as the ids. If omitted, the server uses the default value of False. [optional]

  • sparse_vector (Union[SparseValues, Dict[str, Union[List[float], List[int]]]]): Sparse values of the query vector. Expected to be either a SparseValues object or a dict of the form {'indices': List[int], 'values': List[float]}, where the lists each have the same length. [optional]

ChromaDB:
  • collection_name (str, optional): The name of the collection to get documents from. Defaults to the collection_name set during class instantiation.

  • where (Dict, optional): A Where type dict used to filter results by. E.g. {"$and": [{"color": "red"}, {"price": 4.20}]}. Default: None.

  • where_document (Dict, optional): A WhereDocument type dict used to filter by the documents. E.g. {"$contains" : "hello"}. Default: None.

  • include (List, optional): A list of what to include in the results. Can contain "embeddings", "metadatas", "documents", "distances". Ids are always included. Default: ["metadatas", "documents"]

Returns:

Dict: Result of retrieving the chunks

Raises:
  • ValueError: If user_message_vector is not provided. For Pinecone the vector may be omitted when an id is supplied instead, so the error is raised only when neither is given

  • KeyError: If the required namespace is not found (Pinecone)
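The call signature and the ValueError condition above can be illustrated without a live vector database. In this sketch `StubKnowledgeBase` and `retrieve` are hypothetical names; the stub only echoes its arguments and stands in for a configured KnowledgeBase instance. Treating `id` as the keyword that replaces the vector for Pinecone is an assumption based on the Pinecone argument list.

```python
from typing import Any, Dict, List, Optional


class StubKnowledgeBase:
    """Hypothetical stand-in with the same call signature as KnowledgeBase."""

    def __call__(
        self,
        num_of_chunks: int,
        user_message_vector: Optional[List[float]] = None,
        **kwds: Any,
    ) -> Dict:
        # Echo the arguments so the example runs without a vector database.
        return {
            "num_of_chunks": num_of_chunks,
            "user_message_vector": user_message_vector,
            **kwds,
        }


def retrieve(kb: Any, num_of_chunks: int, vector: Optional[List[float]] = None, **kwds: Any) -> Dict:
    """Check the documented precondition, then call the knowledge base."""
    # Assumption: for Pinecone, passing `id` is the alternative to a query vector.
    if vector is None and "id" not in kwds:
        raise ValueError("Pass a query vector, or an id when querying Pinecone")
    return kb(num_of_chunks=num_of_chunks, user_message_vector=vector, **kwds)


# Provider-specific options such as include_metadata travel through **kwds.
result = retrieve(StubKnowledgeBase(), 3, [0.1, 0.2, 0.3], include_metadata=True)
```

With a real instance, `StubKnowledgeBase()` would be replaced by the configured object and the returned Dict would hold the provider's response.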

fetch(ids: List[str], **kwds: Any) Dict[source]

Fetches chunks from the knowledge base by their ids.

Args:
  • ids (List[str]): List of ids to fetch

Retrieval Arguments by VectorDB:

Pinecone:
  • namespace (str, optional): The namespace to fetch vectors from. If not specified, the default namespace is used.

ChromaDB:
  • collection_name (str, optional): The name of the collection to fetch documents from. Defaults to the collection_name set during class instantiation.

  • limit (int, optional): The number of documents to return. Default: None.

  • offset (int, optional): The offset to start returning results from. Useful for paging results with limit. Default: None.

  • where (Dict, optional): A Where type dict used to filter results by. E.g. {"$and": [{"color": "red"}, {"price": 4.20}]}. Default: None.

  • where_document (Dict, optional): A WhereDocument type dict used to filter by the documents. E.g. {"$contains" : "hello"}. Default: None.

  • include (List, optional): A list of what to include in the results. Can contain "embeddings", "metadatas", "documents", "distances". Ids are always included. Default: ["metadatas", "documents"]

Returns:

Dict: Result of fetching the chunks
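The ChromaDB filter arguments above are plain dicts, so they can be assembled ahead of the call. The sketch below uses a hypothetical helper, `and_filter`, to build a `where` dict in the style of the example given for that parameter; the variable `fetch_kwds` is likewise illustrative.

```python
from typing import Any, Dict


def and_filter(**conditions: Any) -> Dict[str, Any]:
    """Build a `where` dict that requires every condition to match."""
    clauses = [{key: value} for key, value in conditions.items()]
    if not clauses:
        raise ValueError("At least one condition is required")
    # A single condition does not need the "$and" wrapper.
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}


# Keyword arguments for a ChromaDB-backed fetch, using the documented names.
fetch_kwds = {
    "where": and_filter(color="red", price=4.20),
    "where_document": {"$contains": "hello"},
    "include": ["metadatas", "documents"],
}
# With a ChromaDB-backed instance `kb`, these would be passed as:
#   kb.fetch(ids=["id-1", "id-2"], **fetch_kwds)
```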

class orichain.knowledge_base.AsyncKnowledgeBase(vector_db_type: str | None, **kwds: Any)[source]

Bases: object

Asynchronous interface for interacting with vector databases.

This class provides a unified API to communicate with supported vector databases. Currently, Pinecone and ChromaDB are supported.

default_knowledge_base = 'pinecone'
__init__(vector_db_type: str | None, **kwds: Any) None[source]

Initializes the knowledge base.

Args:
  • vector_db_type (str, optional): Type of knowledge base. Default: pinecone

Authentication parameters by provider:

Pinecone:
  • api_key (str): Pinecone API key

  • index_name (str): Pinecone index name

  • namespace (str): Pinecone namespace

ChromaDB:
  • collection_name (str): ChromaDB collection name

  • path (str, optional): Path to the ChromaDB database. Default: /home/ubuntu/projects/chromadb

Raises:
  • ValueError: If the knowledge base type is not supported

  • KeyError: If a required parameter is not found

Warns:
  • UserWarning: If the knowledge base type is not defined; the default (pinecone) is used

async __call__(num_of_chunks: int, user_message_vector: List[float | int] | None = None, **kwds: Any) Dict[source]

Retrieves chunks from the knowledge base.

Args:
  • num_of_chunks (int): Number of chunks to retrieve

  • user_message_vector (Optional[List[Union[int, float]]]): Embedding of text. Defaults to None.

Retrieval Arguments by VectorDB:

Pinecone:
  • vector (List[float]): The query vector. This should be the same length as the dimension of the index being queried. Each query() request can contain only one of the parameters id or vector. [optional]

  • id (str): The unique ID of the vector to be used as a query vector. Each query() request can contain only one of the parameters vector or id. [optional]

  • top_k (int): The number of results to return for each query. Must be an integer greater than or equal to 1.

  • namespace (str): The namespace to fetch vectors from. If not specified, the default namespace is used. [optional]

  • filter (Dict[str, Union[str, float, int, bool, List, dict]]): The filter to apply. You can use vector metadata to limit your search. See https://www.pinecone.io/docs/metadata-filtering/ [optional]

  • include_values (bool): Indicates whether vector values are included in the response. If omitted, the server uses the default value of False. [optional]

  • include_metadata (bool): Indicates whether metadata is included in the response as well as the ids. If omitted, the server uses the default value of False. [optional]

  • sparse_vector (Union[SparseValues, Dict[str, Union[List[float], List[int]]]]): Sparse values of the query vector. Expected to be either a SparseValues object or a dict of the form {'indices': List[int], 'values': List[float]}, where the lists each have the same length. [optional]

ChromaDB:
  • collection_name (str, optional): The name of the collection to get documents from. Defaults to the collection_name set during class instantiation.

  • where (Dict, optional): A Where type dict used to filter results by. E.g. {"$and": [{"color": "red"}, {"price": 4.20}]}. Default: None.

  • where_document (Dict, optional): A WhereDocument type dict used to filter by the documents. E.g. {"$contains" : "hello"}. Default: None.

  • include (List, optional): A list of what to include in the results. Can contain "embeddings", "metadatas", "documents", "distances". Ids are always included. Default: ["metadatas", "documents"]

Returns:

Dict: Result of retrieving the chunks

Raises:
  • ValueError: If user_message_vector is not provided. For Pinecone the vector may be omitted when an id is supplied instead, so the error is raised only when neither is given

  • KeyError: If the required namespace is not found (Pinecone)
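Because the asynchronous __call__ is a coroutine, several retrievals can be awaited together. The sketch below assumes nothing beyond the documented signature: `StubAsyncKnowledgeBase` and `retrieve_many` are hypothetical names, and the stub replaces a configured AsyncKnowledgeBase so the example runs without a vector database.

```python
import asyncio
from typing import Any, Dict, List, Optional


class StubAsyncKnowledgeBase:
    """Hypothetical stand-in with the same call signature as AsyncKnowledgeBase."""

    async def __call__(
        self,
        num_of_chunks: int,
        user_message_vector: Optional[List[float]] = None,
        **kwds: Any,
    ) -> Dict:
        await asyncio.sleep(0)  # yield control, as a real network call would
        return {"num_of_chunks": num_of_chunks, "vector": user_message_vector, **kwds}


async def retrieve_many(kb: Any, vectors: List[List[float]], num_of_chunks: int = 3) -> List[Dict]:
    """Run one retrieval per query vector concurrently; results keep input order."""
    calls = [kb(num_of_chunks=num_of_chunks, user_message_vector=v) for v in vectors]
    return await asyncio.gather(*calls)


results = asyncio.run(
    retrieve_many(StubAsyncKnowledgeBase(), [[0.1, 0.2], [0.3, 0.4]])
)
```

`asyncio.gather` returns results in the order the awaitables were passed, so each result lines up with its query vector.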

async fetch(ids: List[str], **kwds: Any) Dict[source]

Fetches chunks from the knowledge base by their ids.

Args:
  • ids (List[str]): List of ids to fetch

Retrieval Arguments by VectorDB:

Pinecone:
  • namespace (str, optional): The namespace to fetch vectors from. If not specified, the default namespace is used.

ChromaDB:
  • collection_name (str, optional): The name of the collection to fetch documents from. Defaults to the collection_name set during class instantiation.

  • limit (int, optional): The number of documents to return. Default: None.

  • offset (int, optional): The offset to start returning results from. Useful for paging results with limit. Default: None.

  • where (Dict, optional): A Where type dict used to filter results by. E.g. {"$and": [{"color": "red"}, {"price": 4.20}]}. Default: None.

  • where_document (Dict, optional): A WhereDocument type dict used to filter by the documents. E.g. {"$contains" : "hello"}. Default: None.

  • include (List, optional): A list of what to include in the results. Can contain "embeddings", "metadatas", "documents", "distances". Ids are always included. Default: ["metadatas", "documents"]

Returns:

Dict: Result of fetching the chunks
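The limit and offset arguments listed for ChromaDB lend themselves to simple paging. The following sketch is illustrative only: `page_kwargs`, `fetch_page`, and `StubAsyncKnowledgeBase` are hypothetical names, and the stub's slicing is an assumption about how a backend might apply limit and offset, not a description of orichain or ChromaDB internals.

```python
import asyncio
from typing import Any, Dict, List


def page_kwargs(page: int, page_size: int) -> Dict[str, int]:
    """Translate a zero-based page number into a limit/offset pair."""
    if page < 0 or page_size < 1:
        raise ValueError("page must be >= 0 and page_size must be >= 1")
    return {"limit": page_size, "offset": page * page_size}


class StubAsyncKnowledgeBase:
    """Hypothetical stand-in mirroring AsyncKnowledgeBase.fetch(ids, **kwds)."""

    async def fetch(self, ids: List[str], **kwds: Any) -> Dict:
        # Assumed behavior: skip `offset` ids, then return at most `limit` ids.
        start = kwds.get("offset") or 0
        limit = kwds.get("limit")
        end = None if limit is None else start + limit
        return {"ids": ids[start:end]}


async def fetch_page(kb: Any, ids: List[str], page: int, page_size: int) -> Dict:
    """Fetch one page of results using the documented keyword names."""
    return await kb.fetch(ids=ids, **page_kwargs(page, page_size))


second_page = asyncio.run(
    fetch_page(StubAsyncKnowledgeBase(), ["a", "b", "c", "d", "e"], page=1, page_size=2)
)
```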