Hello,
I hope this message finds you well. Thank you for providing the repository; with it, I was able to run the example.py script successfully.
Furthermore, I have thoroughly reviewed the associated paper, which has given me a solid understanding of the project's context. However, I now have a few queries regarding the practical usage of the repository.
I have managed to work with video and text inputs, but I am unsure how to incorporate the "RT-2" component, which is designed for Vision-Language-Action interaction. I might be overlooking something, and I'd appreciate any guidance or clarification you could provide.
Additionally, while I've been able to obtain results using video and text inputs, I am not sure how to interpret them. Could you explain what these outputs represent and what they imply?
Thank you very much for your assistance, and I look forward to your response.
Best regards,