Bug Description
After running the evaluation, I attempted to retrieve the stored results from the vector database using Python, but no data was loaded.
To Reproduce
PydanticDeprecatedSince20: The `__fields__` attribute is deprecated, use `model_fields`
instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration
Guide at https://errors.pydantic.dev/2.9/migration/
warnings.warn(
Generating embeddings: 0%| | 0/7 [00:00<?, ?it/s]
[02/15/25 20:31:10] INFO [_client.py:1038] >> HTTP Request: POST _client.py:1038
https://api.openai.com/v1/embeddings
"HTTP/1.1 200 OK"
Generating embeddings: 100%|##############################| 7/7 [00:01<00:00, 5.34it/s]
Generating embeddings: 100%|##############################| 7/7 [00:01<00:00, 5.33it/s]
Generating embeddings: 0%| | 0/7 [00:00<?, ?it/s]
[02/15/25 20:31:11] INFO [_client.py:1038] >> HTTP Request: POST _client.py:1038
https://api.openai.com/v1/embeddings
"HTTP/1.1 200 OK"
Generating embeddings: 100%|##############################| 7/7 [00:00<00:00, 14.41it/s]
Generating embeddings: 100%|##############################| 7/7 [00:00<00:00, 14.39it/s]
[02/15/25 20:31:13] INFO [evaluator.py:218] >> Evaluation complete. evaluator.py:218
Ingesting VectorDB... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 1/1 0:00:01
Evaluating... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 6/6 0:01:14
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Open the persisted Chroma store written by the evaluation run
vectorstore = Chroma(
    persist_directory="./autorag/project/resources/chroma",
    embedding_function=OpenAIEmbeddings(),
)
print(vectorstore.get())

docs = vectorstore.similarity_search("question....")
print(len(docs))
if docs:
    print(docs[0].page_content)
else:
    print("No documents found.")
{'ids': [], 'embeddings': None, 'documents': [], 'uris': None, 'data': None, 'metadatas': [], 'included': [<IncludeEnum.documents: 'documents'>, <IncludeEnum.metadatas: 'metadatas'>]}
0
No documents found.
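One thing worth checking before anything else: LangChain's `Chroma` wrapper defaults to the collection name `langchain`, while AutoRAG may have written its vectors under a different collection (the name configured in `resources/vectordb.yaml`). If the names don't match, `get()` and `similarity_search()` will return empty results even though the store has data. A minimal sketch that lists the collections actually present in a persistent Chroma store by reading its `chroma.sqlite3` metadata directly — the `collections` table and its `name` column are an assumption about Chroma's internal schema, so verify against your Chroma version:

```python
import sqlite3
from pathlib import Path


def list_chroma_collections(persist_directory: str) -> list[str]:
    """Return collection names recorded in a persistent Chroma store.

    Assumes Chroma's internal schema keeps a `collections` table with a
    `name` column inside chroma.sqlite3 (an assumption -- check your version).
    """
    db_path = Path(persist_directory) / "chroma.sqlite3"
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute("SELECT name FROM collections").fetchall()
    return [name for (name,) in rows]


# If this prints something other than "langchain", pass that name explicitly:
# Chroma(collection_name=..., persist_directory=..., embedding_function=...)
# print(list_chroma_collections("./autorag/project/resources/chroma"))
```

If a non-default collection name shows up, re-running the repro script with `collection_name=` set to it would confirm whether the data was there all along.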
Code where the bug happened
config.yaml
node_lines:
- node_line_name: retrieve_node_line
nodes:
- node_type: retrieval
strategy:
batch_size: 1
metrics: [ retrieval_f1, retrieval_recall, retrieval_precision,
retrieval_ndcg, retrieval_map, retrieval_mrr ]
speed_threshold: 10
top_k: 5
modules:
- module_type: bm25
bm25_tokenizer: [ porter_stemmer, space, gpt2 ]
- module_type: vectordb
vectordb: default
- module_type: hybrid_rrf
weight_range: (4,80)
- module_type: hybrid_cc
normalize_method: [ mm, tmm, z, dbsf ]
weight_range: (0.0, 1.0)
test_weight_size: 101
- node_type: passage_augmenter
strategy:
metrics: [ retrieval_f1, retrieval_recall, retrieval_precision ]
speed_threshold: 5
top_k: 5
embedding_model: openai
modules:
- module_type: pass_passage_augmenter
- module_type: prev_next_augmenter
mode: next
- node_type: passage_reranker
strategy:
metrics: [ retrieval_f1, retrieval_recall, retrieval_precision ]
speed_threshold: 10
top_k: 3
modules:
- module_type: pass_reranker
- module_type: upr
- module_type: rankgpt
- module_type: sentence_transformer_reranker
- module_type: flag_embedding_reranker
- module_type: openvino_reranker
- module_type: flashrank_reranker
- node_type: passage_filter
strategy:
metrics: [ retrieval_f1, retrieval_recall, retrieval_precision ]
speed_threshold: 5
modules:
- module_type: pass_passage_filter
- module_type: similarity_threshold_cutoff
threshold: 0.85
- module_type: similarity_percentile_cutoff
percentile: 0.6
- module_type: threshold_cutoff
threshold: 0.85
- module_type: percentile_cutoff
percentile: 0.6
- node_line_name: post_retrieve_node_line # Arbitrary node line name
nodes:
- node_type: prompt_maker
strategy:
metrics:
- metric_name: bleu
- metric_name: meteor
- metric_name: rouge
- metric_name: sem_score
embedding_model: openai
speed_threshold: 10
generator_modules:
- module_type: llama_index_llm
llm: openai
model: [gpt-4o-mini]
modules:
- module_type: fstring
prompt:
- "Answer to given questions with the following passage: {retrieved_contents} \n\n Question: {query} \n\n Answer:"
- "There is a passages related to user question. Please response carefully to the following question. \n\n Passage: {retrieved_contents} \n\n Question: {query} \n\n Answer the question. Think step by step." # Zero-shot CoT prompt
- "{retrieved_contents} \n\n Read the passage carefully, and answer this question. \n\n Question: {query} \n\n Answer the question. Be concise." # concise prompt
- module_type: long_context_reorder
prompt:
- "Answer to given questions with the following passage: {retrieved_contents} \n\n Question: {query} \n\n Answer:"
- "There is a passages related to user question. Please response carefully to the following question. \n\n Passage: {retrieved_contents} \n\n Question: {query} \n\n Answer the question. Think step by step." # Zero-shot CoT prompt
- "{retrieved_contents} \n\n Read the passage carefully, and answer this question. \n\n Question: {query} \n\n Answer the question. Be concise." # concise prompt
- node_type: generator
strategy:
metrics:
- metric_name: rouge
- embedding_model: openai
metric_name: sem_score
- metric_name: bert_score
speed_threshold: 10
modules:
- module_type: llama_index_llm
llm: [openai]
model: [gpt-4o-mini]
temperature: [0.5, 1.0]
quantization_config:
bits: 4
group_size: 128
dataset: "c4"
model_seqlen: 2048
desc_act: False
device: "cpu"
model_load:
low_cpu_mem_usage: True
torch_dtype: "auto"
trust_remote_code: True
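A side note on the config above: in plain YAML, values like `weight_range: (4,80)` are parsed as strings, not tuples, since YAML has no tuple syntax — AutoRAG presumably converts them itself, but it is easy to verify what the YAML loader actually sees. A quick check with PyYAML (the snippet below is a reduced excerpt of the config, not the full file):

```python
import yaml

# Reduced excerpt of the retrieval node config shown above
snippet = """
modules:
  - module_type: hybrid_rrf
    weight_range: (4,80)
  - module_type: hybrid_cc
    weight_range: (0.0, 1.0)
"""

config = yaml.safe_load(snippet)
for module in config["modules"]:
    value = module["weight_range"]
    # Both values arrive as plain strings, e.g. '(4,80)' of type str
    print(module["module_type"], repr(value), type(value).__name__)
```

If a downstream consumer expected a list here, `weight_range: [4, 80]` would be the YAML-native way to express it.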
Additional context
My project tree:
📦project
 ┣ 📂0
 ┃ ┣ 📂post_retrieve_node_line
 ┃ ┃ ┣ 📂generator
 ┃ ┃ ┃ ┣ 📜0.parquet
 ┃ ┃ ┃ ┣ 📜1.parquet
 ┃ ┃ ┃ ┣ 📜best_0.parquet
 ┃ ┃ ┃ ┗ 📜summary.csv
 ┃ ┃ ┣ 📂prompt_maker
 ┃ ┃ ┃ ┣ 📜0.parquet
 ┃ ┃ ┃ ┣ 📜1.parquet
 ┃ ┃ ┃ ┣ 📜2.parquet
 ┃ ┃ ┃ ┣ 📜3.parquet
 ┃ ┃ ┃ ┣ 📜4.parquet
 ┃ ┃ ┃ ┣ 📜5.parquet
 ┃ ┃ ┃ ┣ 📜best_0.parquet
 ┃ ┃ ┃ ┗ 📜summary.csv
 ┃ ┃ ┗ 📜summary.csv
 ┃ ┣ 📂retrieve_node_line
 ┃ ┃ ┣ 📂passage_augmenter
 ┃ ┃ ┃ ┣ 📜0.parquet
 ┃ ┃ ┃ ┣ 📜1.parquet
 ┃ ┃ ┃ ┣ 📜best_0.parquet
 ┃ ┃ ┃ ┗ 📜summary.csv
 ┃ ┃ ┣ 📂passage_filter
 ┃ ┃ ┃ ┣ 📜0.parquet
 ┃ ┃ ┃ ┣ 📜1.parquet
 ┃ ┃ ┃ ┣ 📜2.parquet
 ┃ ┃ ┃ ┣ 📜3.parquet
 ┃ ┃ ┃ ┣ 📜4.parquet
 ┃ ┃ ┃ ┣ 📜best_3.parquet
 ┃ ┃ ┃ ┗ 📜summary.csv
 ┃ ┃ ┣ 📂passage_reranker
 ┃ ┃ ┃ ┣ 📜0.parquet
 ┃ ┃ ┃ ┣ 📜1.parquet
 ┃ ┃ ┃ ┣ 📜2.parquet
 ┃ ┃ ┃ ┣ 📜3.parquet
 ┃ ┃ ┃ ┣ 📜4.parquet
 ┃ ┃ ┃ ┣ 📜5.parquet
 ┃ ┃ ┃ ┣ 📜6.parquet
 ┃ ┃ ┃ ┣ 📜best_0.parquet
 ┃ ┃ ┃ ┗ 📜summary.csv
 ┃ ┃ ┣ 📂retrieval
 ┃ ┃ ┃ ┣ 📜0.parquet
 ┃ ┃ ┃ ┣ 📜1.parquet
 ┃ ┃ ┃ ┣ 📜2.parquet
 ┃ ┃ ┃ ┣ 📜3.parquet
 ┃ ┃ ┃ ┣ 📜4.parquet
 ┃ ┃ ┃ ┣ 📜5.parquet
 ┃ ┃ ┃ ┣ 📜6.parquet
 ┃ ┃ ┃ ┣ 📜7.parquet
 ┃ ┃ ┃ ┣ 📜8.parquet
 ┃ ┃ ┃ ┣ 📜best_5.parquet
 ┃ ┃ ┃ ┗ 📜summary.csv
 ┃ ┃ ┗ 📜summary.csv
 ┃ ┣ 📜config.yaml
 ┃ ┗ 📜summary.csv
 ┣ 📂data
 ┃ ┣ 📜corpus.parquet
 ┃ ┗ 📜qa.parquet
 ┣ 📂resources
 ┃ ┣ 📂chroma
 ┃ ┃ ┣ 📂0ec9cd05-0d96-4fc7-9a7a-a1abea6c5ce8
 ┃ ┃ ┣ 📂50f4b08f-74bf-4fdc-9097-edbe9649cda4
 ┃ ┃ ┃ ┣ 📜data_level0.bin
 ┃ ┃ ┃ ┣ 📜header.bin
 ┃ ┃ ┃ ┣ 📜length.bin
 ┃ ┃ ┃ ┗ 📜link_lists.bin
 ┃ ┃ ┗ 📜chroma.sqlite3
 ┃ ┣ 📜bm25_gpt2.pkl
 ┃ ┣ 📜bm25_porter_stemmer.pkl
 ┃ ┣ 📜bm25_space.pkl
 ┃ ┗ 📜vectordb.yaml
 ┗ 📜trial.json
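Before suspecting the vector store, it may also help to confirm that the trial actually wrote results — the `summary.csv` and `best_*.parquet` files under each node directory in the tree above should be non-empty. A minimal sketch that walks a trial directory and maps each result file to its row count (pandas is assumed, and `report_trial_outputs` is a hypothetical helper, not part of AutoRAG):

```python
from pathlib import Path

import pandas as pd


def report_trial_outputs(trial_dir: str) -> dict[str, int]:
    """Map each summary.csv / best_*.parquet (relative path) to its row count."""
    root = Path(trial_dir)
    counts: dict[str, int] = {}
    for csv_path in sorted(root.rglob("summary.csv")):
        counts[str(csv_path.relative_to(root))] = len(pd.read_csv(csv_path))
    for pq_path in sorted(root.rglob("best_*.parquet")):
        counts[str(pq_path.relative_to(root))] = len(pd.read_parquet(pq_path))
    return counts


# e.g. report_trial_outputs("./autorag/project/0")
```

If every file reports zero rows, the problem is upstream of the vector store (evaluation or dataset); if they have content, the issue is likely in how the store is being queried.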
I don't know whether this is caused by an error during evaluation, an error in config.yaml, or an error in my Python code. Can you help me find the cause?
cf. I built the dataset myself (by hand) — could that be the source of the error?
Thank you for reading.