Describe the bug
Hello,
It seems that when the parser module is configured with file_type: all_files, only pdfminer is applied. I have tried both langchain_parse (with parse_method: upstagedocumentparse) and llamaparse, and both appear to use pdfminer exclusively. Even when I set output_format to html, pdfminer still seems to be used. Am I mistaken about something?
Below is the YAML file I configured:
- module_type: langchain_parse
  parse_method: upstagedocumentparse
  split: page
  file_type: all_files
  output_format: html
or
- module_type: llamaparse
  result_type: markdown
  file_type: all_files
  language: ko
I would appreciate your help. Thank you.