
Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI (Kunlun Inc.), specializing in vision-language reasoning.


Skywork Logo

Skywork-R1V4

Welcome to the Skywork-R1V repository! Here, you'll find a series of state-of-the-art multimodal reasoning models with powerful agentic capabilities. From open-source versions with model weights and inference code to our latest closed-source offerings, the Skywork-R1V series delivers exceptional performance across vision understanding, code execution, and deep research tasks.

🔥 News

💥 November 18, 2025: We released Skywork-R1V4-Lite, a lightweight and ultra-fast closed-source multimodal reasoning model that achieves exceptional image understanding capabilities through code execution tools. R1V4-Lite features blazing-fast inference speed and can be integrated with search tools to enable deep research capabilities. Available now on Skywork Platform, and coming soon to OpenRouter—stay tuned!

July 15, 2025: We've released quantized versions of Skywork-R1V3 for efficient inference.

July 9, 2025: We released Skywork-R1V3-38B [🤗 Skywork-R1V3-38B], the latest and most powerful open-source multimodal reasoning model in the Skywork series, pushing the boundaries of multimodal and cross-disciplinary intelligence. Primarily through reinforcement-learning-based post-training, R1V3 significantly enhances multimodal reasoning ability and achieves open-source state-of-the-art (SOTA) performance across multiple multimodal reasoning benchmarks, e.g., 76.0 on MMMU.

April 28, 2025: We released the AWQ-quantized version of Skywork-R1V2 [🤗 Skywork-R1V2-38B-AWQ], supporting inference on a single GPU with more than 30 GB of memory.

April 24, 2025: We released Skywork-R1V2, an advanced open-source multimodal reasoning model that demonstrates strong performance across a range of multimodal reasoning benchmarks including MMMU, MMMU-Pro, MathVista, and OlympiadBench. [🤗 Skywork-R1V2-38B] [📖 R1V2 Report]

April 9, 2025: Our technical report is available on arXiv: [Skywork-R1V: Pioneering Multimodal Reasoning with CoT].

March 26, 2025: We released the AWQ-quantized version of Skywork-R1V [🤗 Skywork-R1V-38B-AWQ], supporting inference on a single GPU with more than 30 GB of memory.

March 18, 2025: We are thrilled to introduce Skywork-R1V, the industry's first open-source multimodal reasoning model with advanced visual chain-of-thought capabilities, pushing the boundaries of AI-driven vision and logical inference! 🚀

📊 Evaluation

Skywork-R1V4-Lite demonstrates state-of-the-art performance on various multimodal tasks, particularly excelling in perception and deep research capabilities.

Comparison of Skywork-R1V4 with Leading Multimodal Models

| Benchmark | Split | Skywork-R1V4 30B (A3B) | Qwen3-VL 30B (A3B) | Qwen3-VL 235B (A22B) | Gemini 2.5 Flash | Gemini 2.5 Pro |
|---|---|---|---|---|---|---|
| Perception | | | | | | |
| HIRbench-4K | FSP | 91.8 | 88.5 | 89.0 | 81.5 | 85.5 |
| | FCP | 73.8 | 68.5 | 77.0 | 74.0 | 82.3 |
| | Overall | 82.8 | 78.5 | 83.0 | 77.5 | 83.9 |
| HIRbench-8K | FSP | 88.8 | 80.3 | 83.0 | 75.8 | 83.0 |
| | FCP | 70.8 | 68.3 | 77.3 | 71.8 | 80.0 |
| | Overall | 79.8 | 74.2 | 80.4 | 73.7 | 81.5 |
| MME-Real | Perception | 73.4 | 70.4 | 74.3 | 62.3 | 73.1 |
| | Reasoning | 56.4 | 47.7 | 52.5 | 51.0 | 58.2 |
| | Overall | 71.4 | 67.7 | 71.6 | 60.9 | 71.3 |
| MME-Real-CN | Perception | 76.3 | 72.6 | 76.0 | 65.8 | 74.5 |
| | Reasoning | 59.4 | 45.0 | 53.8 | 51.3 | 58.3 |
| | Overall | 70.8 | 63.7 | 68.8 | 61.2 | 69.3 |
| MME-Real-Lite | Perception | 63.2 | 58.0 | 60.2 | 50.4 | 59.9 |
| | Reasoning | 53.2 | 46.3 | 50.7 | 49.9 | 55.1 |
| | Overall | 59.3 | 53.2 | 56.5 | 50.2 | 58.3 |
| V* | Attribute | 90.4 | 81.7 | 79.1 | 77.3 | 86.8 |
| | Spatial | 84.2 | 82.9 | 82.9 | 64.4 | 68.4 |
| | Overall | 88.0 | 82.2 | 80.6 | 72.3 | 79.1 |
| TreeBench | Overall | 48.4 | 42.7 | 49.6 | 45.9 | 54.6 |
| Visual Probe | Hard | 42.4 | 30.1 | 42.4 | 28.3 | 33.9 |
| | Medium | 42.9 | 35.8 | 39.1 | 31.3 | 35.4 |
| | Easy | 66.7 | 65.2 | 65.9 | 45.3 | 49.6 |
| Deep Research | | | | | | |
| MMSearch | Overall | 66.1 | 18.7 | 48.0 | 64.9 | 71.9 |
| FVQA | Overall | 67.2 | 53.3 | 54.4 | 60.7 | 72.0 |
| BrowseComp-VL | Overall | 38.4 | 30.0 | 31.6 | 40.8 | 45.4 |

Key Highlights:

  • 🏆 Skywork-R1V4 achieves top performance among 30B-class models across most perception benchmarks
  • 🚀 Outstanding FSP scores on HIRbench-4K (91.8) and HIRbench-8K (88.8), demonstrating exceptional high-resolution image understanding
  • 🔍 Strong deep research capabilities with competitive performance on MMSearch (66.1) and FVQA (67.2)

🚀 How to Use Skywork-R1V4-Lite

Skywork-R1V4-Lite is available as an API service. You can access it through Skywork Platform or OpenRouter (coming soon).

1. Get API Access

Visit Skywork Platform to obtain your API key.
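
If you prefer not to hard-code the key in scripts (the quick-start snippet below does so only for brevity), you can read it from an environment variable. A minimal sketch; the variable name SKYWORK_API_KEY is just an example, not something the platform mandates:

import os

# Read the API key from an environment variable instead of hard-coding it.
# SKYWORK_API_KEY is an arbitrary name chosen for this example.
api_key = os.environ.get("SKYWORK_API_KEY", "your_api_key_here")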

2. Quick Start with Python

import requests
import base64

def image_to_base64(image_path):
    with open(image_path, "rb") as f:
        image_data = f.read()
        return base64.b64encode(image_data).decode("utf-8")

# API configuration
base_url = "https://api.skyworkmodel.ai"
api_key = "your_api_key_here"

# Prepare the request
image_base64 = image_to_base64("path/to/your/image.jpg")
content = [
    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_base64}"}},
    {"type": "text", "text": "What's in this image?"}
]

# Call the API
response = requests.post(
    f"{base_url}/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    },
    json={
        "model": "skywork/r1v4-lite",
        "messages": [{"role": "user", "content": content}],
        "stream": False,
        "enable_search": False  # Set to True for deep research capabilities
    }
)

print(response.json()["choices"][0]["message"]["content"])
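
For token-by-token output, the same endpoint can also be called in streaming mode. The sketch below reuses base_url, api_key, and content from the snippet above and assumes the endpoint follows the common OpenAI-compatible server-sent-events format (data: {...} chunks carrying a delta field); check the platform documentation for the exact schema.

import json

# Streaming variant (a sketch; assumes OpenAI-compatible SSE chunks)
stream_response = requests.post(
    f"{base_url}/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    },
    json={
        "model": "skywork/r1v4-lite",
        "messages": [{"role": "user", "content": content}],
        "stream": True
    },
    stream=True
)

for raw in stream_response.iter_lines():
    if not raw:
        continue
    line = raw.decode("utf-8")
    if line.startswith("data: "):
        line = line[len("data: "):]
    if line.strip() == "[DONE]":
        break
    chunk = json.loads(line)
    # Print incremental content as it arrives
    delta = chunk["choices"][0].get("delta", {}).get("content", "")
    print(delta, end="", flush=True)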

3. Batch Testing with Our Tool Suite

We provide a comprehensive testing toolkit in the r1v4 folder for batch processing and result visualization.

Clone and Setup

git clone https://github.com/SkyworkAI/Skywork-R1V.git
cd Skywork-R1V/r1v4
pip install -r requirements.txt

Prepare Test Cases

Edit test_cases.jsonl with your test cases (one JSON object per line):

{"image": "./demo_image/demo_1.png", "question": "What's in this image?"}
{"image": "", "question": "This is a text-only question"}

Run Batch Tests

# Non-streaming mode (default)
python3 batch_nonstream.py

# Streaming mode
python3 batch_stream.py

# With custom input/output files
python3 batch_nonstream.py input.jsonl output.jsonl

# Using planner model for task planning
python3 batch_planner_nonstream.py

Visualize Results

# Start the web viewer
python3 visual.py

# Then open your browser and enter the result file path (e.g., result_nonstream.jsonl)

Parse Structured Responses

from parse_utils import parse_full_response

# Parse the response to extract reasoning steps, tool calls, and observations
# (response_text is the raw assistant message text returned by the API)
parsed = parse_full_response(response_text)

# Access structured data
for round_data in parsed['rounds']:
    print(f"Round {round_data['round_num']}")
    print(f"Thinking: {round_data['think']}")
    print(f"Tool: {round_data['tool_call']['name']}")

4. Features

  • Code Execution: R1V4-Lite can write and execute Python code for complex tasks
  • Deep Research: Set enable_search=True to integrate web search capabilities (see the sketch after this list)
  • Multi-turn Reasoning: Automatic multi-step reasoning with tool usage
  • Streaming Support: Real-time response streaming for better user experience
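
As referenced in the Deep Research bullet above, enabling search only requires flipping the enable_search flag in the quick-start request. A minimal sketch reusing base_url and api_key from the quick start:

# Same request as the quick start, but with web search enabled for deep research
research_response = requests.post(
    f"{base_url}/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    },
    json={
        "model": "skywork/r1v4-lite",
        "messages": [{"role": "user", "content": [
            {"type": "text", "text": "What are the latest reported results for open-source multimodal reasoning models?"}
        ]}],
        "stream": False,
        "enable_search": True  # enables web-search-backed deep research
    }
)
print(research_response.json()["choices"][0]["message"]["content"])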

License

This code repository is licensed under the MIT License.

✅ Commercial use permitted

✅ Modification allowed

✅ Distribution allowed

❌ No liability

Skywork-R1V4-Lite uses Qwen3-VL-30B-A3B-Instruct as its base model, which is licensed under the Apache 2.0 License.

Acknowledgments

We would like to express our gratitude to the following open-source projects that have been instrumental in our work:

  • MS-SWIFT: A powerful framework for model training and fine-tuning that greatly facilitated our model development process.
  • VLMEvalKit: A comprehensive evaluation toolkit for vision-language models that enabled our extensive benchmarking.

🔮 Future Directions

We are excited to share our vision for the future development of the Skywork-R1V series:

  • Skywork-R1V4-Pro: We are developing a more powerful model with enhanced capabilities across all benchmarks. Stay tuned for the upcoming release!
  • Reinforcement Learning Research: We are actively exploring the application of reinforcement learning techniques to advance multimodal reasoning and agentic capabilities, pushing the boundaries of what's possible in vision-language AI.

❤️ Misc

Star History Chart

Citation

If you use Skywork-R1V in your research, please cite:

@misc{zhang2025skyworkr1v4agenticmultimodalintelligence,
      title={Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch}, 
      author={Yifan Zhang and Liang Hu and Haofeng Sun and Peiyu Wang and Yichen Wei and Shukang Yin and Jiangbo Pei and Wei Shen and Peng Xia and Yi Peng and Tianyidan Xie and Eric Li and Yang Liu and Xuchen Song and Yahui Zhou},
      year={2025},
      eprint={2512.02395},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.02395}, 
}
@misc{shen2025skyworkr1v3technicalreport,
      title={Skywork-R1V3 Technical Report}, 
      author={Wei Shen and Jiangbo Pei and Yi Peng and Xuchen Song and Yang Liu and Jian Peng and Haofeng Sun and Yunzhuo Hao and Peiyu Wang and Jianhao Zhang and Yahui Zhou},
      year={2025},
      eprint={2507.06167},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2507.06167}, 
}
@misc{chris2025skyworkr1v2multimodalhybrid,
      title={Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning}, 
      author={Peiyu Wang and Yichen Wei and Yi Peng and Xiaokun Wang and Weijie Qiu and Wei Shen and Tianyidan Xie and Jiangbo Pei and Jianhao Zhang and Yunzhuo Hao and Xuchen Song and Yang Liu and Yahui Zhou},
      year={2025},
      eprint={2504.16656},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2504.16656}, 
}
@misc{peng2025skyworkr1vpioneeringmultimodal,
      title={Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought}, 
      author={Yi Peng and Peiyu Wang and Xiaokun Wang and Yichen Wei and Jiangbo Pei and Weijie Qiu and Ai Jian and Yunzhuo Hao and Jiachun Pan and Tianyidan Xie and Li Ge and Rongxian Zhuang and Xuchen Song and Yang Liu and Yahui Zhou},
      year={2025},
      eprint={2504.05599},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2504.05599}, 
}
