Bug Description
After running the evaluation, I attempted to retrieve the stored results from the vector database using Python, but no data was loaded.
To Reproduce
PydanticDeprecatedSince20: The `__fields__` attribute is deprecated, use `model_fields`
instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration
Guide at https://errors.pydantic.dev/2.9/migration/
warnings.warn(
Generating embeddings: 0%| | 0/7 [00:00<?, ?it/s]
[02/15/25 20:31:10] INFO [_client.py:1038] >> HTTP Request: POST _client.py:1038
https://api.openai.com/v1/embeddings
"HTTP/1.1 200 OK"
Generating embeddings: 100%|##############################| 7/7 [00:01<00:00, 5.34it/s]
Generating embeddings: 100%|##############################| 7/7 [00:01<00:00, 5.33it/s]
Generating embeddings: 0%| | 0/7 [00:00<?, ?it/s]
[02/15/25 20:31:11] INFO [_client.py:1038] >> HTTP Request: POST _client.py:1038
https://api.openai.com/v1/embeddings
"HTTP/1.1 200 OK"
Generating embeddings: 100%|##############################| 7/7 [00:00<00:00, 14.41it/s]
Generating embeddings: 100%|##############################| 7/7 [00:00<00:00, 14.39it/s]
[02/15/25 20:31:13] INFO [evaluator.py:218] >> Evaluation complete. evaluator.py:218
Ingesting VectorDB... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 1/1 0:00:01
Evaluating... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 6/6 0:01:14
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Open the persisted Chroma store written by the evaluation run
vectorstore = Chroma(
    persist_directory="./autorag/project/resources/chroma",
    embedding_function=OpenAIEmbeddings(),
)
print(vectorstore.get())

docs = vectorstore.similarity_search("question....")
print(len(docs))
if docs:
    print(docs[0].page_content)
else:
    print("No documents found.")
{'ids': [], 'embeddings': None, 'documents': [], 'uris': None, 'data': None, 'metadatas': [], 'included': [<IncludeEnum.documents: 'documents'>, <IncludeEnum.metadatas: 'metadatas'>]}
0
No documents found.
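One thing worth checking before anything else: LangChain's `Chroma` wrapper defaults to the collection name `langchain`, while AutoRAG may have written its vectors under a different collection (the name configured in `resources/vectordb.yaml`). If the names don't match, `get()` and `similarity_search()` will return empty results even though the store has data. A minimal sketch that lists the collections actually present in a persistent Chroma store by reading its `chroma.sqlite3` metadata directly — the `collections` table and its `name` column are an assumption about Chroma's internal schema, so verify against your Chroma version:

```python
import sqlite3
from pathlib import Path


def list_chroma_collections(persist_directory: str) -> list[str]:
    """Return collection names recorded in a persistent Chroma store.

    Assumes Chroma's internal schema keeps a `collections` table with a
    `name` column inside chroma.sqlite3 (an assumption -- check your version).
    """
    db_path = Path(persist_directory) / "chroma.sqlite3"
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute("SELECT name FROM collections").fetchall()
    return [name for (name,) in rows]


# If this prints something other than "langchain", pass that name explicitly:
# Chroma(collection_name=..., persist_directory=..., embedding_function=...)
# print(list_chroma_collections("./autorag/project/resources/chroma"))
```

If a non-default collection name shows up, re-running the repro script with `collection_name=` set to it would confirm whether the data was there all along.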
Code where the bug happened
config.yaml
node_lines:
- node_line_name: retrieve_node_line
nodes:
- node_type: retrieval
strategy:
batch_size: 1
metrics: [ retrieval_f1, retrieval_recall, retrieval_precision,
retrieval_ndcg, retrieval_map, retrieval_mrr ]
speed_threshold: 10
top_k: 5
modules:
- module_type: bm25
bm25_tokenizer: [ porter_stemmer, space, gpt2 ]
- module_type: vectordb
vectordb: default
- module_type: hybrid_rrf
weight_range: (4,80)
- module_type: hybrid_cc
normalize_method: [ mm, tmm, z, dbsf ]
weight_range: (0.0, 1.0)
test_weight_size: 101
- node_type: passage_augmenter
strategy:
metrics: [ retrieval_f1, retrieval_recall, retrieval_precision ]
speed_threshold: 5
top_k: 5
embedding_model: openai
modules:
- module_type: pass_passage_augmenter
- module_type: prev_next_augmenter
mode: next
- node_type: passage_reranker
strategy:
metrics: [ retrieval_f1, retrieval_recall, retrieval_precision ]
speed_threshold: 10
top_k: 3
modules:
- module_type: pass_reranker
- module_type: upr
- module_type: rankgpt
- module_type: sentence_transformer_reranker
- module_type: flag_embedding_reranker
- module_type: openvino_reranker
- module_type: flashrank_reranker
- node_type: passage_filter
strategy:
metrics: [ retrieval_f1, retrieval_recall, retrieval_precision ]
speed_threshold: 5
modules:
- module_type: pass_passage_filter
- module_type: similarity_threshold_cutoff
threshold: 0.85
- module_type: similarity_percentile_cutoff
percentile: 0.6
- module_type: threshold_cutoff
threshold: 0.85
- module_type: percentile_cutoff
percentile: 0.6
- node_line_name: post_retrieve_node_line # Arbitrary node line name
nodes:
- node_type: prompt_maker
strategy:
metrics:
- metric_name: bleu
- metric_name: meteor
- metric_name: rouge
- metric_name: sem_score
embedding_model: openai
speed_threshold: 10
generator_modules:
- module_type: llama_index_llm
llm: openai
model: [gpt-4o-mini]
modules:
- module_type: fstring
prompt:
- "Answer to given questions with the following passage: {retrieved_contents} \n\n Question: {query} \n\n Answer:"
- "There is a passages related to user question. Please response carefully to the following question. \n\n Passage: {retrieved_contents} \n\n Question: {query} \n\n Answer the question. Think step by step." # Zero-shot CoT prompt
- "{retrieved_contents} \n\n Read the passage carefully, and answer this question. \n\n Question: {query} \n\n Answer the question. Be concise." # concise prompt
- module_type: long_context_reorder
prompt:
- "Answer to given questions with the following passage: {retrieved_contents} \n\n Question: {query} \n\n Answer:"
- "There is a passages related to user question. Please response carefully to the following question. \n\n Passage: {retrieved_contents} \n\n Question: {query} \n\n Answer the question. Think step by step." # Zero-shot CoT prompt
- "{retrieved_contents} \n\n Read the passage carefully, and answer this question. \n\n Question: {query} \n\n Answer the question. Be concise." # concise prompt
- node_type: generator
strategy:
metrics:
- metric_name: rouge
- embedding_model: openai
metric_name: sem_score
- metric_name: bert_score
speed_threshold: 10
modules:
- module_type: llama_index_llm
llm: [openai]
model: [gpt-4o-mini]
temperature: [0.5, 1.0]
quantization_config:
bits: 4
group_size: 128
dataset: "c4"
model_seqlen: 2048
desc_act: False
device: "cpu"
model_load:
low_cpu_mem_usage: True
torch_dtype: "auto"
trust_remote_code: True
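A side note on the config above: in plain YAML, values like `weight_range: (4,80)` are parsed as strings, not tuples, since YAML has no tuple syntax — AutoRAG presumably converts them itself, but it is easy to verify what the YAML loader actually sees. A quick check with PyYAML (the snippet below is a reduced excerpt of the config, not the full file):

```python
import yaml

# Reduced excerpt of the retrieval node config shown above
snippet = """
modules:
  - module_type: hybrid_rrf
    weight_range: (4,80)
  - module_type: hybrid_cc
    weight_range: (0.0, 1.0)
"""

config = yaml.safe_load(snippet)
for module in config["modules"]:
    value = module["weight_range"]
    # Both values arrive as plain strings, e.g. '(4,80)' of type str
    print(module["module_type"], repr(value), type(value).__name__)
```

If a downstream consumer expected a list here, `weight_range: [4, 80]` would be the YAML-native way to express it.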
Additional context
My project tree:
📦project
 ┣ 📂0
 ┃ ┣ 📂post_retrieve_node_line
 ┃ ┃ ┣ 📂generator
 ┃ ┃ ┃ ┣ 📜0.parquet
 ┃ ┃ ┃ ┣ 📜1.parquet
 ┃ ┃ ┃ ┣ 📜best_0.parquet
 ┃ ┃ ┃ ┗ 📜summary.csv
 ┃ ┃ ┣ 📂prompt_maker
 ┃ ┃ ┃ ┣ 📜0.parquet
 ┃ ┃ ┃ ┣ 📜1.parquet
 ┃ ┃ ┃ ┣ 📜2.parquet
 ┃ ┃ ┃ ┣ 📜3.parquet
 ┃ ┃ ┃ ┣ 📜4.parquet
 ┃ ┃ ┃ ┣ 📜5.parquet
 ┃ ┃ ┃ ┣ 📜best_0.parquet
 ┃ ┃ ┃ ┗ 📜summary.csv
 ┃ ┃ ┗ 📜summary.csv
 ┃ ┣ 📂retrieve_node_line
 ┃ ┃ ┣ 📂passage_augmenter
 ┃ ┃ ┃ ┣ 📜0.parquet
 ┃ ┃ ┃ ┣ 📜1.parquet
 ┃ ┃ ┃ ┣ 📜best_0.parquet
 ┃ ┃ ┃ ┗ 📜summary.csv
 ┃ ┃ ┣ 📂passage_filter
 ┃ ┃ ┃ ┣ 📜0.parquet
 ┃ ┃ ┃ ┣ 📜1.parquet
 ┃ ┃ ┃ ┣ 📜2.parquet
 ┃ ┃ ┃ ┣ 📜3.parquet
 ┃ ┃ ┃ ┣ 📜4.parquet
 ┃ ┃ ┃ ┣ 📜best_3.parquet
 ┃ ┃ ┃ ┗ 📜summary.csv
 ┃ ┃ ┣ 📂passage_reranker
 ┃ ┃ ┃ ┣ 📜0.parquet
 ┃ ┃ ┃ ┣ 📜1.parquet
 ┃ ┃ ┃ ┣ 📜2.parquet
 ┃ ┃ ┃ ┣ 📜3.parquet
 ┃ ┃ ┃ ┣ 📜4.parquet
 ┃ ┃ ┃ ┣ 📜5.parquet
 ┃ ┃ ┃ ┣ 📜6.parquet
 ┃ ┃ ┃ ┣ 📜best_0.parquet
 ┃ ┃ ┃ ┗ 📜summary.csv
 ┃ ┃ ┣ 📂retrieval
 ┃ ┃ ┃ ┣ 📜0.parquet
 ┃ ┃ ┃ ┣ 📜1.parquet
 ┃ ┃ ┃ ┣ 📜2.parquet
 ┃ ┃ ┃ ┣ 📜3.parquet
 ┃ ┃ ┃ ┣ 📜4.parquet
 ┃ ┃ ┃ ┣ 📜5.parquet
 ┃ ┃ ┃ ┣ 📜6.parquet
 ┃ ┃ ┃ ┣ 📜7.parquet
 ┃ ┃ ┃ ┣ 📜8.parquet
 ┃ ┃ ┃ ┣ 📜best_5.parquet
 ┃ ┃ ┃ ┗ 📜summary.csv
 ┃ ┃ ┗ 📜summary.csv
 ┃ ┣ 📜config.yaml
 ┃ ┗ 📜summary.csv
 ┣ 📂data
 ┃ ┣ 📜corpus.parquet
 ┃ ┗ 📜qa.parquet
 ┣ 📂resources
 ┃ ┣ 📂chroma
 ┃ ┃ ┣ 📂0ec9cd05-0d96-4fc7-9a7a-a1abea6c5ce8
 ┃ ┃ ┣ 📂50f4b08f-74bf-4fdc-9097-edbe9649cda4
 ┃ ┃ ┃ ┣ 📜data_level0.bin
 ┃ ┃ ┃ ┣ 📜header.bin
 ┃ ┃ ┃ ┣ 📜length.bin
 ┃ ┃ ┃ ┗ 📜link_lists.bin
 ┃ ┃ ┗ 📜chroma.sqlite3
 ┃ ┣ 📜bm25_gpt2.pkl
 ┃ ┣ 📜bm25_porter_stemmer.pkl
 ┃ ┣ 📜bm25_space.pkl
 ┃ ┗ 📜vectordb.yaml
 ┗ 📜trial.json
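Before suspecting the vector store, it may also help to confirm that the trial actually wrote results — the `summary.csv` and `best_*.parquet` files under each node directory in the tree above should be non-empty. A minimal sketch that walks a trial directory and maps each result file to its row count (pandas is assumed, and `report_trial_outputs` is a hypothetical helper, not part of AutoRAG):

```python
from pathlib import Path

import pandas as pd


def report_trial_outputs(trial_dir: str) -> dict[str, int]:
    """Map each summary.csv / best_*.parquet (relative path) to its row count."""
    root = Path(trial_dir)
    counts: dict[str, int] = {}
    for csv_path in sorted(root.rglob("summary.csv")):
        counts[str(csv_path.relative_to(root))] = len(pd.read_csv(csv_path))
    for pq_path in sorted(root.rglob("best_*.parquet")):
        counts[str(pq_path.relative_to(root))] = len(pd.read_parquet(pq_path))
    return counts


# e.g. report_trial_outputs("./autorag/project/0")
```

If every file reports zero rows, the problem is upstream of the vector store (evaluation or dataset); if they have content, the issue is likely in how the store is being queried.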
I don't know whether this is caused by an error during evaluation, an error in config.yaml, or an error in my Python code. Can you help me find the cause?
cf. I built the dataset myself (by hand) — could that be the source of the error?
Thank you for reading.