This development was detailed in a recent paper titled “MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training,” which introduces a model with impressive capabilities in both image recognition and natural language reasoning.
MM1 comes in three sizes: 3 billion, 7 billion, and 30 billion parameters. The researchers used these models to run experiments identifying the key design choices that influence performance.
The research team built MM1 using a Mixture-of-Experts (MoE) architecture with top-2 gating, sketched below. This approach yields strong results on pretraining metrics and high performance on established multimodal benchmarks, and MM1 continues to perform well even after fine-tuning for specific tasks.
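To make the idea concrete, here is a minimal sketch of top-2 gating in a Mixture-of-Experts layer, written in plain NumPy. This is an illustration of the general technique only, not MM1's actual implementation; names such as `router_w` and `moe_top2`, and all shapes and weights, are assumptions for the example.

```python
# Illustrative top-2 gating for a Mixture-of-Experts layer (not MM1's code).
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts = 16, 8
# Each "expert" is stubbed as a random linear map for demonstration.
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
# The router is a learned linear layer in practice; random weights here.
router_w = rng.normal(size=(d_model, n_experts))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_top2(token: np.ndarray) -> np.ndarray:
    """Route one token vector through its top-2 experts and mix the outputs."""
    logits = token @ router_w             # one routing score per expert
    top2 = np.argsort(logits)[-2:]        # indices of the two best experts
    weights = softmax(logits[top2])       # renormalize over the chosen two
    # Only the selected experts run, so per-token compute scales with 2,
    # not with n_experts, even as total parameter count grows.
    return sum(w * (token @ experts[i]) for w, i in zip(weights, top2))

token = rng.normal(size=d_model)
print(moe_top2(token).shape)  # (16,)
```

The appeal of this design is that total model capacity grows with the number of experts while the compute cost per token stays roughly constant, since each token activates only two experts.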
Testing shows that the MM1-3B-Chat and MM1-7B-Chat models outperform most similarly sized competitors. They shine in particular on benchmarks such as VQAv2 (answering questions about images), TextVQA (answering questions that require reading text within images), and ScienceQA (answering science questions).
However, MM1's overall performance has not yet surpassed Google's Gemini or OpenAI's GPT-4. While Apple still has work to do with MM1, this is a significant step for the company in artificial intelligence, and the technology could soon be brought to its devices or integrated into Siri.
Apple also recently acquired DarwinAI, an AI company known for technology that makes neural networks smaller and more efficient.
