lucataco/apollo-3b 🔢🖼️📝 → 📝

▶️ 162 runs 📅 Dec 2024 ⚙️ Cog 0.13.6

video-analysis video-auto-captioning video-question-answering video-to-text video-understanding

Performance

Typical run time: 4.3s

Cold start (first call): ~59s

Total runs: 162

About

Apollo 3B - An Exploration of Video Understanding in Large Multimodal Models

Example Output

Prompt:

"Describe this video in detail"

Output

The video features a lone astronaut in a white spacesuit, equipped with a helmet and gloves, standing on the moon's surface. The backdrop is dominated by a large, detailed image of the moon, set against a starry space. The astronaut begins to run across the lunar terrain, leaving footprints behind. As he runs, the camera angle shifts to reveal more of the moon's rugged landscape. The astronaut continues his run until he reaches the edge of the frame, where he leaps into the vast expanse of space, floating away from the moon. Throughout the sequence, the moon remains a constant and prominent feature in the background, emphasizing the astronaut's journey into the cosmos. The video captures the astronaut's solitary trek across the moon's surface and his subsequent leap into the unknown, symbolizing humanity's boundless curiosity and spirit of exploration. The astronaut's actions are depicted with precision and grace, highlighting the beauty and isolation of space travel. The video concludes with the astronaut floating freely in space, surrounded by the endless void of space, underscoring the awe-inspiring scale and mystery of the universe. The astronaut's journey serves as a powerful metaphor for human ambition and the quest for knowledge, encapsulating the essence of space exploration. The

Performance Metrics

4.28s Prediction Time

58.54s Total Time

All Input Parameters

{
  "top_p": 0.7,
  "video": "https://replicate.delivery/pbxt/M9jsPwn8UqBPUCM6xiFEGcyfaZWXDwJ6EwcVqLPjKzcmctgn/replicate-prediction-jv0zykaqvhrmc0ckt4vtepaep4.mp4",
  "prompt": "Describe this video in detail",
  "temperature": 0.4,
  "max_new_tokens": 256
}

Input Parameters

top_p: Top-p sampling probability. Type: number. Default: 0.7. Range: 0 - 1.
video (required): Input video file. Type: string.
prompt: Question or prompt about the video. Type: string. Default: "Describe this video in detail".
temperature: Sampling temperature. Type: number. Default: 0.4. Range: 0.1 - 2.
max_new_tokens: Maximum number of tokens to generate. Type: integer. Default: 256. Range: 32 - 1024.
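
The defaults and ranges listed above can be checked on the client side before a prediction is submitted. The following is a minimal sketch, not part of the model's own code: `build_input`, `DEFAULTS`, and `RANGES` are hypothetical names introduced here, and the model may validate its inputs differently on the server.

```python
from typing import Any

# Defaults and ranges copied from the parameter list on this page.
DEFAULTS: dict[str, Any] = {
    "prompt": "Describe this video in detail",
    "top_p": 0.7,
    "temperature": 0.4,
    "max_new_tokens": 256,
}
RANGES: dict[str, tuple[float, float]] = {
    "top_p": (0.0, 1.0),
    "temperature": (0.1, 2.0),
    "max_new_tokens": (32, 1024),
}


def build_input(video: str, **overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: merge overrides into the documented defaults
    and reject values outside the documented ranges."""
    if not video:
        raise ValueError("video is required")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    payload = {**DEFAULTS, **overrides, "video": video}

    if not isinstance(payload["max_new_tokens"], int):
        raise ValueError("max_new_tokens must be an integer")
    for name, (low, high) in RANGES.items():
        if not low <= payload[name] <= high:
            raise ValueError(f"{name}={payload[name]} is outside {low} - {high}")
    return payload
```

Calling `build_input("https://example.com/clip.mp4", max_new_tokens=512)` returns a dictionary with the same keys as the "All Input Parameters" example, with the unspecified fields filled from the defaults. The URL here is a placeholder.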

Output Schema

Output

Type: string

Example Execution Logs

/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(
/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/torch/utils/checkpoint.py:61: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn(
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.

Version Details

Version ID: ed5ec3233f7a4331a2b0d33b8a02e62057a335e1a03107d56db9fac10cf3ce55
Version Created: December 16, 2024
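
Pinning the version ID above keeps results reproducible if the model is updated later. The sketch below assumes the third-party `replicate` Python client is installed and that `REPLICATE_API_TOKEN` is set in the environment; `model_ref` and `describe_video` are hypothetical helper names, and the handling of a non-string return value is a defensive assumption rather than documented behavior of this model.

```python
import os

MODEL = "lucataco/apollo-3b"
# Version ID copied from the "Version Details" section of this page.
VERSION = "ed5ec3233f7a4331a2b0d33b8a02e62057a335e1a03107d56db9fac10cf3ce55"


def model_ref(model: str = MODEL, version: str = VERSION) -> str:
    """Build the 'owner/name:version' reference used to pin a model version."""
    return f"{model}:{version}"


def describe_video(video_url: str, prompt: str = "Describe this video in detail") -> str:
    """Hypothetical helper: run one prediction against the pinned version."""
    if not os.environ.get("REPLICATE_API_TOKEN"):
        raise RuntimeError("set REPLICATE_API_TOKEN before calling the API")

    # Imported lazily so the rest of this module works without the client installed.
    import replicate

    output = replicate.run(
        model_ref(),
        input={"video": video_url, "prompt": prompt},
    )
    # The output schema is a string; join the pieces if an iterator comes back instead.
    return output if isinstance(output, str) else "".join(str(part) for part in output)
```

With a token configured, `describe_video(url)` would be called with the URL of an accessible video file. The first call may take close to a minute because of the cold start reported above.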
