Is your feature request related to a problem? Please describe.
Semantic chunking splits text into sentences and then merges adjacent sentences with similar embeddings (measured by cosine similarity) into passages.
However, when parsing is done page by page, similarity is only compared between sentences on the same page. A passage can then never span a page break, so the resulting chunks may be worse than if sentences were compared across the whole document.
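To make the problem concrete, here is a minimal sketch of semantic chunking. It is a simplification, not AutoRAG's actual chunker: it takes precomputed sentence vectors (toy 2-D values, no embedding model) and merges each sentence into the current passage when its cosine similarity to the previous sentence meets a fixed threshold. The `threshold` value and the toy data are illustrative assumptions. Running it per page forces a break at the page boundary, while running it on the whole document lets the related sentence `A3.` join `A1. A2.`.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def semantic_chunk(sentences: List[str], vectors: List[Sequence[float]],
                   threshold: float = 0.8) -> List[str]:
    """Merge adjacent sentences whose embeddings are similar into one passage."""
    if not sentences:
        return []
    passages = [[sentences[0]]]
    # Compare each sentence with the one right before it.
    for prev, cur, sent in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev, cur) >= threshold:
            passages[-1].append(sent)  # similar enough: extend current passage
        else:
            passages.append([sent])    # topic shift: start a new passage
    return [" ".join(p) for p in passages]


# Toy document: "A*" sentences share a topic, "B1." does not.
page1 = (["A1.", "A2."], [[1.0, 0.0], [1.0, 0.1]])
page2 = (["A3.", "B1."], [[1.0, 0.05], [0.0, 1.0]])

# Page-by-page: no comparison is ever made between A2. and A3.
per_page = semantic_chunk(*page1) + semantic_chunk(*page2)
# Whole document: the page break is invisible to the chunker.
per_doc = semantic_chunk(page1[0] + page2[0], page1[1] + page2[1])

print(per_page)  # ['A1. A2.', 'A3.', 'B1.']
print(per_doc)   # ['A1. A2. A3.', 'B1.']
```

Real semantic splitters often use a percentile-based breakpoint over a sliding window of sentences rather than a fixed threshold, but the page-boundary effect is the same.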
Describe the solution you'd like
Add an option to run semantic chunking on a per-document basis, i.e. combine the page-level parse results of each document before chunking, even when a per-page parsing module (e.g. clova, table_hybrid_parse) is used.
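One possible shape for this option is sketched below: group page-level parse rows by source document, sort them by page number, and join their texts so the semantic chunker runs once per document instead of once per page. The field names `path`, `page`, and `texts`, the function name `merge_pages`, and the newline separator are hypothetical and not taken from AutoRAG's actual parse output schema.

```python
from collections import defaultdict
from typing import Dict, List


def merge_pages(rows: List[dict], sep: str = "\n") -> Dict[str, str]:
    """Concatenate page-level texts into one text per document, in page order.

    Hypothetical row schema: {"path": str, "page": int, "texts": str}.
    """
    by_doc: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        by_doc[row["path"]].append(row)
    return {
        path: sep.join(r["texts"] for r in sorted(pages, key=lambda r: r["page"]))
        for path, pages in by_doc.items()
    }


rows = [
    {"path": "a.pdf", "page": 2, "texts": "A3. B1."},
    {"path": "a.pdf", "page": 1, "texts": "A1. A2."},
    {"path": "b.pdf", "page": 1, "texts": "C1."},
]
merged = merge_pages(rows)
print(merged)  # {'a.pdf': 'A1. A2.\nA3. B1.', 'b.pdf': 'C1.'}
```

Merging discards the page number of each sentence; if page-level citations are needed, the implementation could also record the character offset where each page starts in the merged text.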