SenseTime has open-sourced SenseNova-MARS (8B and 32B), a model designed to understand images and text and to solve tasks step by step.

AI software company SenseTime said it has open-sourced SenseNova-MARS, a multimodal model released in 8B and 32B versions.

The company said the model is designed to handle tasks that involve both images and text,
including searching for information and reasoning through problems in multiple steps. It describes SenseNova-MARS as an agentic visual language model that can plan actions and invoke tools such as image cropping, image search, and text search.

SenseTime said SenseNova-MARS scored an average of 69.74 across several multimodal
search and reasoning benchmarks, compared with 69.06 for Gemini-3-Pro and 67.64 for GPT-5.2. The company
also reported results on specific tests, including MMSearch (a text–image search benchmark), where
the model scored 74.27, tying Gemini-3-Pro (74.27) and exceeding GPT-5.2 (66.08).

On HR-MMSearch, which focuses on high-resolution detail reasoning, SenseTime said the model scored 54.43. As an example, SenseTime said the model can identify a small logo in a photo, look up
information about the company, find a person’s background details, and calculate an answer,
automatically invoking image cropping and search tools along the way.
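
For readers curious what such a crop-then-search workflow might look like in code, here is a minimal Python sketch of a generic agent loop. It is not SenseTime's implementation: the tool functions, the `scripted_planner` stand-in for the model's planning step, and the `photo.jpg` input are all hypothetical, and the tools are stubs that only label their input so the flow of data between steps is visible.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


# Hypothetical stub tools. Real tools would crop pixels or call search services;
# these only wrap their input in a label so the data flow is visible.
def crop_image(arg: str) -> str:
    return f"cropped({arg})"


def image_search(arg: str) -> str:
    return f"image_results({arg})"


def text_search(arg: str) -> str:
    return f"text_results({arg})"


TOOLS: Dict[str, Callable[[str], str]] = {
    "crop_image": crop_image,
    "image_search": image_search,
    "text_search": text_search,
}


@dataclass
class Step:
    tool: str
    argument: str
    observation: str


def scripted_planner(history: List[Step]) -> Tuple[str, str]:
    """Hypothetical stand-in for the model's planning step.

    Returns (tool_name, argument), or ("finish", answer) when done.
    """
    order = ["crop_image", "image_search", "text_search"]
    if len(history) < len(order):
        # Feed the previous observation into the next tool; start from the photo.
        arg = history[-1].observation if history else "photo.jpg"
        return order[len(history)], arg
    return "finish", history[-1].observation


def run_agent(
    planner: Callable[[List[Step]], Tuple[str, str]],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 8,
) -> Tuple[str, List[Step]]:
    """Alternate between planning and tool calls until the planner finishes."""
    history: List[Step] = []
    for _ in range(max_steps):
        tool_name, arg = planner(history)
        if tool_name == "finish":
            return arg, history
        if tool_name not in tools:
            raise ValueError(f"unknown tool: {tool_name}")
        history.append(Step(tool_name, arg, tools[tool_name](arg)))
    raise RuntimeError("planner did not finish within max_steps")


if __name__ == "__main__":
    answer, steps = run_agent(scripted_planner, TOOLS)
    for s in steps:
        print(f"{s.tool}({s.argument}) -> {s.observation}")
    print("answer:", answer)
```

In this sketch the planner is a fixed script; in an agentic model, that decision would come from the model itself, which chooses the next tool based on what earlier steps returned. The `max_steps` cap is a common safeguard against a loop that never finishes.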

SenseTime said more details are available in its technical report, and invited developers and
industry users to test the model.
