Is there some kind of branch or tag with the examples?
Any example of how to fine-tune it? (My rough guess at a training loop is sketched just below these questions.)
Are there any schematics of the different values, what they correspond to, and how flexible this is for different robot morphologies?
Is any simulation of the robot used available?
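To make the fine-tuning question concrete, here is my rough guess at a minimal training loop, reusing model.train() from the README; the fake batch, the target_action_tokens tensor and its length, and whether the parameters live on model or model.model are all assumptions on my part:

import torch
from rt2.model import RT2

model = RT2()
# assuming RT2 is an nn.Module; maybe model.model.parameters() is needed instead
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Fake batch; real data would come from a robot demonstration dataset
video = torch.randn(2, 3, 6, 224, 224)
instructions = ['pick up the cup', 'open the drawer']
# assumed integer action-token targets; the length must match the model's output length
target_action_tokens = torch.randint(0, 256, (2, 6))

logits = model.train(video, instructions)  # assumed shape (batch, seq_len, num_tokens)
loss = torch.nn.functional.cross_entropy(
    logits.reshape(-1, logits.shape[-1]),
    target_action_tokens.reshape(-1),
)
loss.backward()
optimizer.step()
optimizer.zero_grad()

Is that roughly the intended way to fine-tune, or is there a dedicated training script?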
For example, my assumptions here seem wrong:
import torch
from rt2.model import RT2
from icecream import ic

# Load the model
model = RT2()

# Example inputs; I assume the video shape is (batch, channels, frames, height, width)
video = torch.randn(2, 3, 6, 224, 224)
instructions = [
    'bring me that apple sitting on the table',
    'please pass the butter'
]

# Get training logits, then evaluation logits
train_logits = model.train(video, instructions)
model.model.eval()
eval_logits = model.eval(video, instructions, cond_scale=3.0)

# Assuming softmax over the last dimension gives the probability of each action token
probabilities = torch.nn.functional.softmax(eval_logits, dim=-1)

# Get the most probable action index at each position
predicted_action_indices = torch.argmax(probabilities, dim=-1)
ic(predicted_action_indices)

# Assuming there is a list of action names corresponding to the indices
actions = ["action1", "action2", "..."]  # Replace with actual action names
predicted_actions = [actions[i] for i in predicted_action_indices.flatten().tolist()]
print(predicted_actions)
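From the RT-2 paper I would expect each predicted token to be a bin index for one action dimension rather than a named action, so I would de-tokenize roughly like this; the 256 bins, the normalized [-1, 1] range, and an 8-dimensional action layout are assumptions of mine, not something I found in this repo:

import torch

NUM_BINS = 256                        # assumed bins per action dimension
ACTION_LOW, ACTION_HIGH = -1.0, 1.0   # assumed normalized action range

def tokens_to_actions(token_indices: torch.Tensor) -> torch.Tensor:
    # Map integer bin indices, e.g. shape (batch, 8 action dims),
    # to continuous values at the centre of each bin.
    bin_width = (ACTION_HIGH - ACTION_LOW) / NUM_BINS
    return ACTION_LOW + (token_indices.float() + 0.5) * bin_width

# continuous_actions = tokens_to_actions(predicted_action_indices)

Is that closer to how the outputs are meant to be interpreted?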
Also, what kind of video input are we setting here with the random values?
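My guess is that the shape is (batch, channels, frames, height, width), so with real camera input I would build the tensor something like this; the 6-frame window, the RGB channel ordering, and the simple /255 normalization are assumptions, please correct me if the model expects something else:

import numpy as np
import torch

# Placeholder for 6 RGB camera frames, each (224, 224, 3) uint8
frames = [np.zeros((224, 224, 3), dtype=np.uint8) for _ in range(6)]

# Stack to (channels, frames, height, width), then add a batch dimension
video = torch.stack(
    [torch.from_numpy(f).permute(2, 0, 1).float() / 255.0 for f in frames],
    dim=1,
)
video = video.unsqueeze(0)  # -> (1, 3, 6, 224, 224)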
Many questions, I hope you can answer them :)