ST-LLM: An Effective Video-LLM Baseline with Spatial-Temporal Sequence Modeling Inside LLM

ST-LLM: An Effective Video-LLM Baseline with Spatial-Temporal Sequence Modeling Inside LLM