orichain.knowledge_base
- class orichain.knowledge_base.KnowledgeBase(vector_db_type: str | None, **kwds: Any)
Bases: object
Synchronous interface for interacting with vector databases.
This class provides a unified API to communicate with supported vector databases. Currently, Pinecone and ChromaDB are supported.
- default_knowledge_base = 'pinecone'
- __init__(vector_db_type: str | None, **kwds: Any) -> None
Initializes the knowledge base.
- Args:
vector_db_type (str, optional): Type of knowledge base. Default: pinecone
Authentication parameters by provider:
- Pinecone:
api_key (str): Pinecone API key
index_name (str): Pinecone index name
namespace (str): Pinecone namespace
- ChromaDB:
collection_name (str): ChromaDB collection name
path (str, optional): Path to the ChromaDB database. Default: /home/ubuntu/projects/chromadb
- Raises:
ValueError: If the knowledge base type is not supported
KeyError: If a required parameter is missing
- Warns:
UserWarning: If the knowledge base type is not specified; pinecone is then used as the default
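The parameter requirements above can be sketched as a small pre-flight check. This is a minimal illustration, not orichain's implementation: `REQUIRED_PARAMS` and `check_kb_params` are hypothetical names, and the provider key `"chromadb"` is an assumed spelling (only `'pinecone'` is confirmed by `default_knowledge_base`).

```python
import warnings
from typing import Any, Dict, Optional

# Hypothetical table of required parameters per provider, copied from the
# Args list above. The "chromadb" key spelling is an assumption.
REQUIRED_PARAMS = {
    "pinecone": ("api_key", "index_name", "namespace"),
    "chromadb": ("collection_name",),  # path is optional
}


def check_kb_params(vector_db_type: Optional[str], **kwds: Any) -> Dict[str, Any]:
    """Mirror the documented Raises/Warns behaviour before building a knowledge base."""
    if vector_db_type is None:
        # Documented: a UserWarning is issued and pinecone is used.
        warnings.warn("vector_db_type not defined, defaulting to pinecone", UserWarning)
        vector_db_type = "pinecone"
    if vector_db_type not in REQUIRED_PARAMS:
        raise ValueError(f"Unsupported knowledge base type: {vector_db_type}")
    missing = [name for name in REQUIRED_PARAMS[vector_db_type] if name not in kwds]
    if missing:
        raise KeyError(f"Missing required params: {missing}")
    return {"vector_db_type": vector_db_type, **kwds}
```

Under these assumptions, the returned dict could be unpacked into the constructor, for example `KnowledgeBase(**check_kb_params("chromadb", collection_name="docs"))`.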
- __call__(num_of_chunks: int, user_message_vector: List[float | int] | None = None, **kwds: Any) -> Dict
Retrieves the chunks from the knowledge base.
- Args:
user_message_vector (Optional[List[Union[int, float]]]): Embedding of the text. Defaults to None.
num_of_chunks (int): Number of chunks to retrieve
Retrieval Arguments by VectorDB:
- Pinecone:
vector (List[float]): The query vector. This should be the same length as the dimension of the index being queried. Each query() request can contain only one of the parameters id or vector. [optional]
id (str): The unique ID of the vector to be used as a query vector. Each query() request can contain only one of the parameters vector or id. [optional]
top_k (int): The number of results to return for each query. Must be an integer greater than 1.
namespace (str): The namespace to fetch vectors from. If not specified, the default namespace is used. [optional]
filter (Dict[str, Union[str, float, int, bool, List, dict]]): The filter to apply. You can use vector metadata to limit your search. See https://www.pinecone.io/docs/metadata-filtering/ [optional]
include_values (bool): Indicates whether vector values are included in the response. If omitted the server will use the default value of False [optional]
include_metadata (bool): Indicates whether metadata is included in the response as well as the ids. If omitted the server will use the default value of False [optional]
sparse_vector (Union[SparseValues, Dict[str, Union[List[float], List[int]]]]): Sparse values of the query vector. Expected to be either a SparseValues object or a dict of the form {'indices': List[int], 'values': List[float]}, where the lists each have the same length.
- ChromaDB:
collection_name (str, optional): The name of the collection to get documents from. Defaults to the collection_name set during class instantiation.
where (Dict, optional): A Where type dict used to filter results by. E.g. {"$and": [{"color": "red"}, {"price": 4.20}]}. Default: None.
where_document (Dict, optional): A WhereDocument type dict used to filter by the documents. E.g. {"$contains": "hello"}. Default: None.
include (List, optional): A list of what to include in the results. Can contain "embeddings", "metadatas", "documents", "distances". Ids are always included. Defaults to ["metadatas", "documents", "distances"]. Default: ["metadatas", "documents"]
- Returns:
Dict: Result of retrieving the chunks
- Raises:
ValueError: If user_message_vector is not provided. For Pinecone, this error is raised only when no id is provided either
KeyError: If the required namespace is not found for Pinecone
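The call signature and the provider-specific keywords above can be sketched as follows. This is a minimal illustration under stated assumptions: `retrieve_chunks`, its `provider` argument, and the `EchoKB` stand-in are hypothetical and are not part of orichain; the stand-in only echoes its inputs so the sketch runs without credentials.

```python
from typing import Any, Callable, Dict, List, Optional


def retrieve_chunks(
    kb: Callable[..., Dict],
    query_vector: List[float],
    num_of_chunks: int = 3,
    metadata_filter: Optional[Dict[str, Any]] = None,
    provider: str = "pinecone",
) -> Dict:
    """Call a knowledge base object with provider-specific filter keywords."""
    extra: Dict[str, Any] = {}
    if metadata_filter is not None:
        # Pinecone takes `filter`; ChromaDB takes `where` (see the lists above).
        key = "filter" if provider == "pinecone" else "where"
        extra[key] = metadata_filter
    if provider == "pinecone":
        # Ask Pinecone to return metadata alongside the ids.
        extra["include_metadata"] = True
    return kb(
        num_of_chunks=num_of_chunks,
        user_message_vector=query_vector,
        **extra,
    )


class EchoKB:
    """Stand-in for KnowledgeBase (hypothetical); echoes what it was called with."""

    def __call__(self, num_of_chunks, user_message_vector=None, **kwds):
        return {
            "num_of_chunks": num_of_chunks,
            "dim": len(user_message_vector),
            "kwds": kwds,
        }


result = retrieve_chunks(
    EchoKB(), [0.1, 0.2, 0.3], metadata_filter={"color": "red"}, provider="chromadb"
)
print(result)
```

With a real `KnowledgeBase` instance in place of `EchoKB`, the same helper would forward the keywords to the underlying vector database.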
- fetch(ids: List[str], **kwds: Any) -> Dict
Fetches the chunks based on the ids from the knowledge base.
- Args:
ids (List[str]): List of ids to fetch
Retrieval Arguments by VectorDB:
- Pinecone:
namespace (str, optional): The namespace to fetch vectors from. If not specified, the default namespace is used.
- ChromaDB:
collection_name (str, optional): The name of the collection to fetch documents from. Defaults to the collection_name set during class instantiation.
limit (int, optional): The number of documents to return. Default: None.
offset (int, optional): The offset to start returning results from. Useful for paging results with limit. Default: None.
where (Dict, optional): A Where type dict used to filter results by. E.g. {"$and": [{"color": "red"}, {"price": 4.20}]}. Default: None.
where_document (Dict, optional): A WhereDocument type dict used to filter by the documents. E.g. {"$contains": "hello"}. Default: None.
include (List, optional): A list of what to include in the results. Can contain "embeddings", "metadatas", "documents", "distances". Ids are always included. Defaults to ["metadatas", "documents", "distances"]. Default: ["metadatas", "documents"]
- Returns:
Dict: Result of fetching the chunks
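The `limit` and `offset` keywords described for ChromaDB can be combined into a simple paging loop. The sketch below is illustrative only: `fetch_pages` and the in-memory `ListKB` stand-in are hypothetical, and how a real backend combines `ids` with `limit` and `offset` is assumed here rather than taken from orichain.

```python
from typing import Any, Dict, Iterator, List


def fetch_pages(kb: Any, ids: List[str], page_size: int) -> Iterator[Dict]:
    """Yield successive pages from kb.fetch using limit and offset."""
    offset = 0
    while offset < len(ids):
        # limit/offset are the ChromaDB keywords documented above.
        yield kb.fetch(ids=ids, limit=page_size, offset=offset)
        offset += page_size


class ListKB:
    """In-memory stand-in for KnowledgeBase (hypothetical); not a real backend."""

    def __init__(self, docs: Dict[str, str]) -> None:
        self.docs = docs

    def fetch(self, ids: List[str], **kwds: Any) -> Dict:
        # Keep only known ids, then apply offset/limit as a slice.
        found = [i for i in ids if i in self.docs]
        start = kwds.get("offset") or 0
        limit = kwds.get("limit")
        stop = None if limit is None else start + limit
        page = found[start:stop]
        return {"ids": page, "documents": [self.docs[i] for i in page]}


kb = ListKB({"a": "alpha", "b": "beta", "c": "gamma"})
pages = list(fetch_pages(kb, ["a", "b", "c"], page_size=2))
for page in pages:
    print(page["ids"])
```

Paging keeps each response small when many ids are requested at once; for Pinecone, only the `namespace` keyword is documented, so this loop would not apply there.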
- class orichain.knowledge_base.AsyncKnowledgeBase(vector_db_type: str | None, **kwds: Any)
Bases: object
Asynchronous interface for interacting with vector databases.
This class provides a unified API to communicate with supported vector databases. Currently, Pinecone and ChromaDB are supported.
- default_knowledge_base = 'pinecone'
- __init__(vector_db_type: str | None, **kwds: Any) -> None
Initializes the knowledge base.
- Args:
vector_db_type (str, optional): Type of knowledge base. Default: pinecone
Authentication parameters by provider:
- Pinecone:
api_key (str): Pinecone API key
index_name (str): Pinecone index name
namespace (str): Pinecone namespace
- ChromaDB:
collection_name (str): ChromaDB collection name
path (str, optional): Path to the ChromaDB database. Default: /home/ubuntu/projects/chromadb
- Raises:
ValueError: If the knowledge base type is not supported
KeyError: If a required parameter is missing
- Warns:
UserWarning: If the knowledge base type is not specified; pinecone is then used as the default
- async __call__(num_of_chunks: int, user_message_vector: List[float | int] | None = None, **kwds: Any) -> Dict
Retrieves the chunks from the knowledge base.
- Args:
num_of_chunks (int): Number of chunks to retrieve
user_message_vector (Optional[List[Union[int, float]]]): Embedding of the text. Defaults to None.
Retrieval Arguments by VectorDB:
- Pinecone:
vector (List[float]): The query vector. This should be the same length as the dimension of the index being queried. Each query() request can contain only one of the parameters id or vector. [optional]
id (str): The unique ID of the vector to be used as a query vector. Each query() request can contain only one of the parameters vector or id. [optional]
top_k (int): The number of results to return for each query. Must be an integer greater than 1.
namespace (str): The namespace to fetch vectors from. If not specified, the default namespace is used. [optional]
filter (Dict[str, Union[str, float, int, bool, List, dict]]): The filter to apply. You can use vector metadata to limit your search. See https://www.pinecone.io/docs/metadata-filtering/ [optional]
include_values (bool): Indicates whether vector values are included in the response. If omitted the server will use the default value of False [optional]
include_metadata (bool): Indicates whether metadata is included in the response as well as the ids. If omitted the server will use the default value of False [optional]
sparse_vector (Union[SparseValues, Dict[str, Union[List[float], List[int]]]]): Sparse values of the query vector. Expected to be either a SparseValues object or a dict of the form {'indices': List[int], 'values': List[float]}, where the lists each have the same length.
- ChromaDB:
collection_name (str, optional): The name of the collection to get documents from. Defaults to the collection_name set during class instantiation.
where (Dict, optional): A Where type dict used to filter results by. E.g. {"$and": [{"color": "red"}, {"price": 4.20}]}. Default: None.
where_document (Dict, optional): A WhereDocument type dict used to filter by the documents. E.g. {"$contains": "hello"}. Default: None.
include (List, optional): A list of what to include in the results. Can contain "embeddings", "metadatas", "documents", "distances". Ids are always included. Defaults to ["metadatas", "documents", "distances"]. Default: ["metadatas", "documents"]
- Returns:
Dict: Result of retrieving the chunks
- Raises:
ValueError: If user_message_vector is not provided. For Pinecone, this error is raised only when no id is provided either
KeyError: If the required namespace is not found for Pinecone
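Because `__call__` is a coroutine on the asynchronous class, several retrievals can be awaited concurrently. The sketch below shows one way to do that with `asyncio.gather`; `retrieve_many` and the `FakeAsyncKB` stand-in are hypothetical names used only so the example runs without a vector database.

```python
import asyncio
from typing import Any, Dict, List


class FakeAsyncKB:
    """Stand-in with the same call shape as AsyncKnowledgeBase (hypothetical)."""

    async def __call__(
        self, num_of_chunks: int, user_message_vector=None, **kwds: Any
    ) -> Dict:
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        return {"top": num_of_chunks, "first": user_message_vector[0]}


async def retrieve_many(
    kb: Any, vectors: List[List[float]], num_of_chunks: int
) -> List[Dict]:
    """Run one retrieval per vector concurrently; results keep input order."""
    tasks = [
        kb(num_of_chunks=num_of_chunks, user_message_vector=v) for v in vectors
    ]
    return await asyncio.gather(*tasks)


results = asyncio.run(
    retrieve_many(FakeAsyncKB(), [[1.0, 0.0], [0.0, 1.0]], num_of_chunks=2)
)
print(results)
```

`asyncio.gather` returns results in the order the awaitables were passed, so each result lines up with its query vector.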
- async fetch(ids: List[str], **kwds: Any) -> Dict
Fetches the chunks based on the ids from the knowledge base.
- Args:
ids (List[str]): List of ids to fetch
Retrieval Arguments by VectorDB:
- Pinecone:
namespace (str, optional): The namespace to fetch vectors from. If not specified, the default namespace is used.
- ChromaDB:
collection_name (str, optional): The name of the collection to fetch documents from. Defaults to the collection_name set during class instantiation.
limit (int, optional): The number of documents to return. Default: None.
offset (int, optional): The offset to start returning results from. Useful for paging results with limit. Default: None.
where (Dict, optional): A Where type dict used to filter results by. E.g. {"$and": [{"color": "red"}, {"price": 4.20}]}. Default: None.
where_document (Dict, optional): A WhereDocument type dict used to filter by the documents. E.g. {"$contains": "hello"}. Default: None.
include (List, optional): A list of what to include in the results. Can contain "embeddings", "metadatas", "documents", "distances". Ids are always included. Defaults to ["metadatas", "documents", "distances"]. Default: ["metadatas", "documents"]
- Returns:
Dict: Result of fetching the chunks