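Chroma How-To Guide: Multi-Tenancy

atodo, 2025/8/8, about a 3-minute read

In ChromaDB, the tenant and the database are the core concepts that the enterprise edition (ChromaDB Enterprise) uses to implement multi-tenant isolation. Logical isolation keeps each tenant's data and resources from affecting those of any other tenant.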

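1. Chroma Multi-Tenancy

Core Multi-Tenancy Concepts

Tenant: Chroma defines a tenant as a logical grouping of databases, used to model data isolation between organizations or users. Each tenant can contain multiple databases, which isolates both resources and data at the logical level.
Database: a logical storage unit that belongs to a tenant (for example research_db or production_db). Multiple databases can be created under the same tenant.
Data hierarchy (Tenant → Database → Collection → Data (Documents/Embeddings)): the tenant is the top-level logical unit and manages multiple databases; each database contains multiple collections; each collection stores the actual documents and vectors. Note: the community edition has to simulate this structure manually, whereas the enterprise edition supports it natively.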
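To make the hierarchy concrete, here is a minimal in-memory sketch of the Tenant → Database → Collection → Documents nesting. It does not use Chroma at all; the class names and the `collection()` helper are hypothetical and exist only to illustrate how each level contains the next.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Holds documents keyed by ID (embeddings omitted for brevity)."""
    name: str
    documents: dict = field(default_factory=dict)


@dataclass
class Database:
    """A named group of collections owned by one tenant."""
    name: str
    collections: dict = field(default_factory=dict)


@dataclass
class Tenant:
    """Top-level unit: a named group of databases."""
    name: str
    databases: dict = field(default_factory=dict)

    def collection(self, database: str, collection: str) -> Collection:
        """Walk tenant -> database -> collection, creating levels on demand."""
        db = self.databases.setdefault(database, Database(database))
        return db.collections.setdefault(collection, Collection(collection))


acme = Tenant("acme_inc")
acme.collection("product_db", "docs").documents["id1"] = "doc1"
acme.collection("research_db", "docs").documents["id1"] = "other doc"
```

The same collection name and document ID appear in both databases without clashing, because each lookup is scoped by the levels above it.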

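Core Multi-Tenancy Features

Data isolation
Each tenant's data is kept in physically separate storage (its own SQLite file or storage path), so one tenant's data is never mixed with another's.
Resource allocation
Tenants can be assigned dedicated compute resources (for example their own embedding model instance) to avoid resource contention.
Metadata extension
Tenants can carry custom metadata (for example organization name or access permissions), which makes them easier to manage.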
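The storage-path isolation and custom tenant metadata described above can be sketched with a small standard-library helper. The function names, the ID pattern, and the `_data` directory suffix are assumptions made for illustration (the suffix mirrors the `/chroma/tenant1_data` style paths used in this guide); none of this is a Chroma API.

```python
import re
from pathlib import Path

# Hypothetical naming rule for tenant IDs; adjust to your own conventions.
_TENANT_ID = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{0,62}")


def tenant_storage_path(base_dir: str, tenant_id: str) -> Path:
    """Return a per-tenant directory under base_dir, rejecting unsafe IDs."""
    if not _TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return Path(base_dir) / f"{tenant_id}_data"


def tenant_record(tenant_id: str, base_dir: str, **metadata: str) -> dict:
    """Bundle a tenant's storage path with its custom metadata."""
    return {
        "tenant_id": tenant_id,
        "path": str(tenant_storage_path(base_dir, tenant_id)),
        "metadata": dict(metadata),
    }
```

Validating the ID before building the path matters because the tenant ID becomes part of a filesystem path; an ID such as `../etc` would otherwise escape the base directory.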

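2. Multi-Tenancy Operations

Specify the tenant identifier when initializing the client to isolate tenants:

import chromadb

# Tenant-specific client 1.
# The Python client's keyword is `tenant` (not `tenant_id`). The tenant and its
# database must already exist, for example created through chromadb.AdminClient.
tenant1_client = chromadb.PersistentClient(
    path="/chroma/tenant1_data",
    tenant="org1",  # unique tenant identifier
)

# Tenant-specific client 2
tenant2_client = chromadb.PersistentClient(
    path="/chroma/tenant2_data",
    tenant="org2",  # a different tenant
)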

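Create separate databases under the tenant client:

# Database operations for tenant 1.
# Assumption: with the open-source Python client, databases are created through
# chromadb.AdminClient and then selected on the tenant's client.
admin = chromadb.AdminClient(tenant1_client.get_settings())
admin.create_database(name="marketing_db", tenant="org1")
tenant1_client.set_database("marketing_db")
# Collection operations in tenant 1's database
collection1 = tenant1_client.create_collection(name="customer_data")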

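Specify the tenant and database when connecting remotely:

import chromadb

# Enterprise-edition client connection
remote_client = chromadb.HttpClient(
    host="localhost",
    port=8000,
    tenant="acme_inc",      # tenant name
    database="product_db",  # database name
)

# Create a collection in the specified tenant's database
collection = remote_client.create_collection("docs")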

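Monitor tenant resource usage through the Chroma API:

# NOTE: `monitor_tenant` is kept as written in the original. It is not a method
# of the open-source chromadb Python client, so treat this as pseudocode for
# whatever monitoring interface your deployment exposes.
tenant_usage = client.monitor_tenant(tenant_id="org_abc")
# Print the monitoring information
print(tenant_usage.collection_count, tenant_usage.embedding_volume)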
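Where no monitoring endpoint is available, per-tenant usage can be approximated on the client side. The sketch below assumes every record carries a `tenant_id` metadata field and that the input has the shape of the `metadatas` list returned by a collection's `get()` call; the function name and the `<untagged>` bucket are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def usage_by_tenant(metadatas: Iterable[Optional[dict]]) -> Counter:
    """Count stored records per tenant from their metadata dicts.

    Records with no metadata, or without a tenant_id, land in "<untagged>".
    """
    return Counter((meta or {}).get("tenant_id", "<untagged>") for meta in metadatas)


sample = [
    {"tenant_id": "org_abc"},
    {"tenant_id": "org_abc"},
    {"tenant_id": "org_xyz"},
    None,  # a record stored without metadata
]
usage = usage_by_tenant(sample)
```

This only counts records; it says nothing about storage size or query load, which would need server-side metrics.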

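3. Example Scenario

Scenario: data isolation across multiple companies

import chromadb

# Company A tenant operations
client_companyA = chromadb.HttpClient(
    host="chroma-server",
    port=8000,
    tenant="company_A",
    database="sales_db",  # assumes this database already exists under the tenant
)
# Documents are added to a collection; "sales_docs" is an illustrative name.
companyA_docs = client_companyA.get_or_create_collection("sales_docs")
companyA_docs.add(documents=[...], ids=[...])

# Company B tenant operations (fully isolated)
client_companyB = chromadb.HttpClient(
    host="chroma-server",
    port=8000,
    tenant="company_B",
)
companyB_docs = client_companyB.get_collection("research_data")
results = companyB_docs.query(query_texts=["AI trends"])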

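4. Simulating Multi-Tenancy in the Community Edition

The community edition has to achieve a similar effect through metadata filtering or a client wrapper:

Scheme 1: Inject the tenant identifier into metadata

# Assumes `collection` is an existing Chroma collection.
collection.add(
    documents=["doc1", "doc2"],
    metadatas=[
        {"tenant_id": "acme_inc", "db_name": "product_db"},
        {"tenant_id": "acme_inc", "db_name": "product_db"},
    ],
    ids=["id1", "id2"],
)

# Always filter by tenant when querying. Multiple conditions are combined with
# $and, because a `where` dict with several top-level keys is rejected.
results = collection.query(
    query_texts=["product info"],
    where={"$and": [{"tenant_id": "acme_inc"}, {"db_name": "product_db"}]},
)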
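Because every query in this scheme must carry the tenant filter, it helps to build the filter in one place instead of writing it by hand each time. The helper below is a sketch; `tenant_where` and its `extra` parameter are hypothetical names, and it only constructs the dictionary that would be passed as `where`.

```python
from typing import Optional


def tenant_where(tenant_id: str, db_name: str, extra: Optional[dict] = None) -> dict:
    """Build a `where` filter that always includes the tenant scope.

    Any additional equality conditions in `extra` are appended to the
    same $and list, so they can never replace the tenant clauses.
    """
    clauses = [{"tenant_id": tenant_id}, {"db_name": db_name}]
    if extra:
        clauses.extend({key: value} for key, value in extra.items())
    return {"$and": clauses}


where = tenant_where("acme_inc", "product_db", {"category": "manual"})
```

A call such as `collection.query(query_texts=[...], where=tenant_where("acme_inc", "product_db"))` then cannot accidentally omit the tenant scope.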

Scheme 2: Dynamic client wrapper

import chromadb

class TenantClient:
    def __init__(self, path, tenant, database):
        self.client = chromadb.PersistentClient(path)
        self.collection = self.client.get_or_create_collection(
            name=f"{tenant}_{database}_docs"  # collection name embeds the tenant and database identifiers
        )
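The wrapper idea can be generalized so that one object scopes any number of collections for a tenant and database. The sketch below takes the client as a constructor argument and only assumes it offers `get_or_create_collection(name=...)`; the class and method names are hypothetical.

```python
class ScopedCollections:
    """Prefixes every collection name with tenant and database identifiers."""

    def __init__(self, client, tenant: str, database: str):
        self._client = client
        self._prefix = f"{tenant}_{database}_"

    def full_name(self, name: str) -> str:
        """Return the physical collection name for a logical name."""
        return f"{self._prefix}{name}"

    def get_or_create(self, name: str):
        """Open (or create) the collection inside this tenant/database scope."""
        return self._client.get_or_create_collection(name=self.full_name(name))

    def owns(self, collection_name: str) -> bool:
        """True if a physical collection name falls inside this scope."""
        return collection_name.startswith(self._prefix)
```

One design caveat: with `_` as the separator, tenant `a_b` with database `c` and tenant `a` with database `b_c` produce the same prefix. If tenant or database names may contain underscores, pick a separator that the names themselves cannot contain.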

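5. Multi-Tenancy Design Best Practices

Requirement	Approach
Strong isolation	Enterprise edition: use the `tenant` and `database` parameters directly
Data isolation in the community edition	Metadata injection plus query filtering (`where={"tenant_id": "xxx"}`)
Resource quota control	Requires an external system (for example Kubernetes resource limits)
Cross-tenant data migration	Export the data selected by the metadata filter, then import it into the target tenant's collection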
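The migration row can be sketched as a pure function that rewrites exported records for the target tenant. It assumes the export is a dictionary with `ids`, `documents`, and `metadatas` keys, which is the shape a collection's `get()` call returns; `retag_for_tenant` and `id_prefix` are hypothetical names.

```python
def retag_for_tenant(exported: dict, target_tenant: str, id_prefix: str = "") -> dict:
    """Rewrite exported records so they belong to the target tenant.

    The input is left untouched; a new payload is returned.
    """
    metadatas = [
        {**(meta or {}), "tenant_id": target_tenant}
        for meta in exported["metadatas"]
    ]
    return {
        "ids": [f"{id_prefix}{record_id}" for record_id in exported["ids"]],
        "documents": list(exported["documents"]),
        "metadatas": metadatas,
    }


exported = {
    "ids": ["id1", "id2"],
    "documents": ["doc1", "doc2"],
    "metadatas": [{"tenant_id": "acme_inc", "db_name": "product_db"}, None],
}
payload = retag_for_tenant(exported, "globex", id_prefix="globex-")
```

The payload can then be passed to the target collection as `target.add(**payload)`. Embeddings are not carried over in this sketch, so the target collection would re-embed the documents on import.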

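Notes

Enterprise edition dependency:
Native multi-tenancy is available only in ChromaDB Enterprise and requires deploying the enterprise edition server.
Community edition performance:
The metadata filtering scheme can slow down queries, so make sure fields such as tenant_id are indexed.
Tenant management interface:
The enterprise edition provides an API for managing the tenant and database lifecycle (create, delete, list); with the community edition you have to maintain your own metadata table.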
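A self-maintained metadata table for the create, delete, and list operations could look like the following sketch, which uses SQLite from the standard library. The schema and all function names are assumptions made for illustration; the registry only tracks which tenants and databases exist and does not touch Chroma itself.

```python
import sqlite3


def open_registry(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open the registry and create its tables if they are missing."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE CASCADE
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS tenants (name TEXT PRIMARY KEY);
        CREATE TABLE IF NOT EXISTS databases (
            tenant TEXT NOT NULL REFERENCES tenants(name) ON DELETE CASCADE,
            name   TEXT NOT NULL,
            PRIMARY KEY (tenant, name)
        );
        """
    )
    return conn


def create_database(conn: sqlite3.Connection, tenant: str, name: str) -> None:
    """Register a database, creating the tenant row if needed (idempotent)."""
    conn.execute("INSERT OR IGNORE INTO tenants (name) VALUES (?)", (tenant,))
    conn.execute(
        "INSERT OR IGNORE INTO databases (tenant, name) VALUES (?, ?)",
        (tenant, name),
    )
    conn.commit()


def list_databases(conn: sqlite3.Connection, tenant: str) -> list:
    """Return the tenant's database names in alphabetical order."""
    rows = conn.execute(
        "SELECT name FROM databases WHERE tenant = ? ORDER BY name", (tenant,)
    )
    return [row[0] for row in rows]


def delete_tenant(conn: sqlite3.Connection, tenant: str) -> None:
    """Remove a tenant; its databases are removed by the cascade."""
    conn.execute("DELETE FROM tenants WHERE name = ?", (tenant,))
    conn.commit()
```

Deleting a registry row does not delete the tenant's collections, so a real cleanup routine would also drop the matching collections in Chroma.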

References

Acknowledgments

Chinese-language introduction to the Chroma database