A Look at Multimodality in Spring AI

Posted on 2025-4-12 12:55

Author: WeChat article


This article takes a look at multimodality support in Spring AI.

Examples

ChatModel example

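import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.model.Media;
import org.springframework.core.io.ClassPathResource;
import org.springframework.util.MimeTypeUtils;

// Load the image from the classpath and attach it to the user message
var imageResource = new ClassPathResource("/multimodal.test.png");
var userMessage = new UserMessage(
        "Explain what do you see in this picture?", // text content
        new Media(MimeTypeUtils.IMAGE_PNG, imageResource)); // media
// chatModel is an existing ChatModel, e.g. injected by Spring
ChatResponse response = chatModel.call(new Prompt(userMessage));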

ChatClient example

String response = ChatClient.create(chatModel).prompt()
                .user(u -> u.text("Explain what do you see on this picture?")
                                    .media(MimeTypeUtils.IMAGE_PNG, new ClassPathResource("/multimodal.test.png")))
                .call()
                .content();
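
Whichever API is used, each provider integration ultimately has to serialize a Media's MIME type and data into the vendor's request format. The following is a minimal, hypothetical sketch in plain Java (no Spring AI types) of one common encoding: OpenAI-style chat APIs accept inline images as base64 data URLs of the form data:<mime type>;base64,<payload>. The class DataUrlSketch and its toDataUrl method are illustrative names, not Spring AI API, and whether a particular Spring AI module encodes media exactly this way is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, NOT part of Spring AI: turns a media item's MIME type
// and raw bytes into a data URL, the inline form accepted by OpenAI-style
// chat APIs for images.
final class DataUrlSketch {

    private DataUrlSketch() {
    }

    /** Builds "data:<mime type>;base64,<payload>" from raw media bytes. */
    static String toDataUrl(String mimeType, byte[] data) {
        if (mimeType == null || mimeType.isBlank()) {
            throw new IllegalArgumentException("mimeType must not be blank");
        }
        if (data == null) {
            throw new IllegalArgumentException("data must not be null");
        }
        return "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        // Stand-in bytes; a real caller would read the image file instead
        byte[] fakeImage = "not really a png".getBytes(StandardCharsets.UTF_8);
        System.out.println(toDataUrl("image/png", fakeImage));
    }
}
```

In a Spring application the bytes would typically come from the Resource itself, for example via getContentAsByteArray() on the ClassPathResource used above.
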

At the time of writing, the following model integrations support multimodal input:

    Anthropic Claude 3
    AWS Bedrock Converse
    Azure OpenAI (e.g. GPT-4o models)
    Mistral AI (e.g. Mistral Pixtral models)
    Ollama (e.g. LLaVA, BakLLaVA, Llama3.2 models)
    OpenAI (e.g. GPT-4 and GPT-4o models)
    Vertex AI Gemini (e.g. gemini-1.5-pro-001, gemini-1.5-flash-001 models)

Source code

UserMessage


org/springframework/ai/chat/messages/UserMessage.java
public class UserMessage extends AbstractMessage implements MediaContent {

        protected final List<Media> media;

        public UserMessage(String textContent) {
                this(MessageType.USER, textContent, new ArrayList<>(), Map.of());
        }

        public UserMessage(Resource resource) {
                super(MessageType.USER, resource, Map.of());
                this.media = new ArrayList<>();
        }

        public UserMessage(String textContent, List<Media> media) {
                this(MessageType.USER, textContent, media, Map.of());
        }

        public UserMessage(String textContent, Media... media) {
                this(textContent, Arrays.asList(media));
        }

        public UserMessage(String textContent, Collection<Media> mediaList, Map<String, Object> metadata) {
                this(MessageType.USER, textContent, mediaList, metadata);
        }

        public UserMessage(MessageType messageType, String textContent, Collection<Media> media,
                        Map<String, Object> metadata) {
                super(messageType, textContent, metadata);
                Assert.notNull(media, "media data must not be null");
                this.media = new ArrayList<>(media);
        }

        @Override
        public String toString() {
                return "UserMessage{" + "content='" + getText() + '\'' + ", properties=" + this.metadata + ", messageType="
                                + this.messageType + '}';
        }

        @Override
        public List<Media> getMedia() {
                return this.media;
        }

        @Override
        public String getText() {
                return this.textContent;
        }

}

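UserMessage implements the getMedia() method of the MediaContent interface, returning the media attached to the message. Apart from UserMessage(Resource), every constructor ends up in the (MessageType, String, Collection<Media>, Map) constructor, which rejects a null media collection and copies it into a new ArrayList.
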
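
To make that constructor chain concrete, here is a minimal plain-Java sketch that mirrors it with two hypothetical types, SketchMedia and SketchUserMessage (illustrative names, not Spring AI classes). It keeps only the parts discussed above: convenience constructors delegating to one canonical constructor, a null check on the media collection, and a defensive copy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified stand-in for Spring AI's Media
record SketchMedia(String mimeType, byte[] data) {
}

// Hypothetical, simplified stand-in for Spring AI's UserMessage
final class SketchUserMessage {

    private final String text;

    private final List<SketchMedia> media;

    // Text-only message: delegates with an empty media collection
    SketchUserMessage(String text) {
        this(text, List.of());
    }

    // Varargs convenience constructor, like UserMessage(String, Media...)
    SketchUserMessage(String text, SketchMedia... media) {
        this(text, Arrays.asList(media));
    }

    // Canonical constructor: rejects null and copies the collection, so later
    // changes to the caller's collection do not leak into the message
    SketchUserMessage(String text, Collection<SketchMedia> media) {
        this.text = text;
        this.media = new ArrayList<>(Objects.requireNonNull(media, "media data must not be null"));
    }

    String getText() {
        return this.text;
    }

    List<SketchMedia> getMedia() {
        return this.media;
    }

    public static void main(String[] args) {
        var msg = new SketchUserMessage("Explain what you see", new SketchMedia("image/png", new byte[] {1, 2}));
        System.out.println(msg.getText() + " with " + msg.getMedia().size() + " media item(s)");
    }
}
```

One difference worth noting: Spring's Assert.notNull throws IllegalArgumentException, whereas this sketch uses Objects.requireNonNull, which throws NullPointerException.
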
Media


org/springframework/ai/model/Media.java
public class Media {

        private static final String NAME_PREFIX = "media-";

        /**
         * An Id of the media object, usually defined when the model returns a reference to
         * media it has been passed.
         */
        @Nullable
        private String id;

        private final MimeType mimeType;

        private final Object data;

        /**
         * The name of the media object that can be referenced by the AI model.
         * <p>
         * Important security note: This field is vulnerable to prompt injections, as the
         * model might inadvertently interpret it as instructions. It is recommended to
         * specify neutral names.
         *
         * <p>
         * The name must only contain:
         * <ul>
         * <li>Alphanumeric characters
         * <li>Whitespace characters (no more than one in a row)
         * <li>Hyphens
         * <li>Parentheses
         * <li>Square brackets
         * </ul>
         */
        private String name;

        //......
}       

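Media defines four fields: id (optional, usually set when the model returns a reference to media it was given), mimeType, data (the payload, held as an Object) and name (a name the model can refer to; the Javadoc warns it is exposed to prompt injection and should be kept neutral).
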
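
The Javadoc on name spells out a concrete character rule. How Spring AI itself enforces it is not shown in the excerpt above, so the following is a hypothetical validator, MediaNameRules.isValidName, that simply encodes the listed rule; it assumes ASCII letters and digits and treats an empty name as invalid.

```java
import java.util.regex.Pattern;

// Hypothetical validator for the naming rule in Media's Javadoc, NOT Spring AI
// code: only alphanumerics, whitespace (no more than one in a row), hyphens,
// parentheses and square brackets are allowed. Assumes ASCII letters/digits.
final class MediaNameRules {

    // Every character must belong to the allowed set; "+" also rejects ""
    private static final Pattern ALLOWED_CHARS = Pattern.compile("[A-Za-z0-9\\s\\-()\\[\\]]+");

    // Two or more whitespace characters in a row are not allowed
    private static final Pattern DOUBLE_WHITESPACE = Pattern.compile("\\s{2,}");

    private MediaNameRules() {
    }

    static boolean isValidName(String name) {
        return name != null
                && ALLOWED_CHARS.matcher(name).matches()
                && !DOUBLE_WHITESPACE.matcher(name).find();
    }

    public static void main(String[] args) {
        System.out.println(isValidName("photo-1 (front) [v2]"));
        System.out.println(isValidName("ignore previous instructions!"));
    }
}
```

Splitting the check into an allowed-character pattern and a separate double-whitespace search keeps each rule readable and avoids a lookahead that would interact awkwardly with line breaks.
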
Format

        public static class Format {

                // -----------------
                // Document formats
                // -----------------
                /**
                 * Public constant mime type for {@code application/pdf}.
                 */
                public static final MimeType DOC_PDF = MimeType.valueOf("application/pdf");

                /**
                 * Public constant mime type for {@code text/csv}.
                 */
                public static final MimeType DOC_CSV = MimeType.valueOf("text/csv");

                /**
                 * Public constant mime type for {@code application/msword}.
                 */
                public static final MimeType DOC_DOC = MimeType.valueOf("application/msword");

                /**
                 * Public constant mime type for
                 * {@code application/vnd.openxmlformats-officedocument.wordprocessingml.document}.
                 */
                public static final MimeType DOC_DOCX = MimeType
                        .valueOf("application/vnd.openxmlformats-officedocument.wordprocessingml.document");

                /**
                 * Public constant mime type for {@code application/vnd.ms-excel}.
                 */
                public static final MimeType DOC_XLS = MimeType.valueOf("application/vnd.ms-excel");

                /**
                 * Public constant mime type for
                 * {@code application/vnd.openxmlformats-officedocument.spreadsheetml.sheet}.
                 */
                public static final MimeType DOC_XLSX = MimeType
                        .valueOf("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");

                /**
                 * Public constant mime type for {@code text/html}.
                 */
                public static final MimeType DOC_HTML = MimeType.valueOf("text/html");

                /**
                 * Public constant mime type for {@code text/plain}.
                 */
                public static final MimeType DOC_TXT = MimeType.valueOf("text/plain");

                /**
                 * Public constant mime type for {@code text/markdown}.
                 */
                public static final MimeType DOC_MD = MimeType.valueOf("text/markdown");

                // -----------------
                // Video Formats
                // -----------------
                /**
                 * Public constant mime type for {@code video/x-matros}.
                 */
                public static final MimeType VIDEO_MKV = MimeType.valueOf("video/x-matros");

                /**
                 * Public constant mime type for {@code video/quicktime}.
                 */
                public static final MimeType VIDEO_MOV = MimeType.valueOf("video/quicktime");

                /**
                 * Public constant mime type for {@code video/mp4}.
                 */
                public static final MimeType VIDEO_MP4 = MimeType.valueOf("video/mp4");

                /**
                 * Public constant mime type for {@code video/webm}.
                 */
                public static final MimeType VIDEO_WEBM = MimeType.valueOf("video/webm");

                /**
                 * Public constant mime type for {@code video/x-flv}.
                 */
                public static final MimeType VIDEO_FLV = MimeType.valueOf("video/x-flv");

                /**
                 * Public constant mime type for {@code video/mpeg}.
                 */
                public static final MimeType VIDEO_MPEG = MimeType.valueOf("video/mpeg");

                /**
                 * Public constant mime type for {@code video/mpeg}.
                 */
                public static final MimeType VIDEO_MPG = MimeType.valueOf("video/mpeg");

                /**
                 * Public constant mime type for {@code video/x-ms-wmv}.
                 */
                public static final MimeType VIDEO_WMV = MimeType.valueOf("video/x-ms-wmv");

                /**
                 * Public constant mime type for {@code video/3gpp}.
                 */
                public static final MimeType VIDEO_THREE_GP = MimeType.valueOf("video/3gpp");

                // -----------------
                // Image Formats
                // -----------------
                /**
                 * Public constant mime type for {@code image/png}.
                 */
                public static final MimeType IMAGE_PNG = MimeType.valueOf("image/png");

                /**
                 * Public constant mime type for {@code image/jpeg}.
                 */
                public static final MimeType IMAGE_JPEG = MimeType.valueOf("image/jpeg");

                /**
                 * Public constant mime type for {@code image/gif}.
                 */
                public static final MimeType IMAGE_GIF = MimeType.valueOf("image/gif");

                /**
                 * Public constant mime type for {@code image/webp}.
                 */
                public static final MimeType IMAGE_WEBP = MimeType.valueOf("image/webp");

        }

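Format, a static nested class of Media, defines constants for commonly used MIME types, grouped into document, video and image formats. Note that VIDEO_MKV is declared as video/x-matros, while the MIME type commonly used for Matroska video is video/x-matroska.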
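
In practice the MIME type usually has to be derived from a file name before a Media can be built. A minimal sketch, assuming a lookup by file extension is good enough (a real application might inspect the content instead): the hypothetical MediaTypeGuesser.mimeTypeFor maps a subset of the extensions covered by Format to the same MIME type strings and falls back to application/octet-stream.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, NOT Spring AI API: maps a file extension to one of the
// MIME type strings defined in Media.Format. Plain strings keep the sketch
// free of any Spring dependency.
final class MediaTypeGuesser {

    private static final String FALLBACK = "application/octet-stream";

    private static final Map<String, String> BY_EXTENSION = Map.ofEntries(
            Map.entry("pdf", "application/pdf"),
            Map.entry("csv", "text/csv"),
            Map.entry("doc", "application/msword"),
            Map.entry("docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
            Map.entry("xls", "application/vnd.ms-excel"),
            Map.entry("xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"),
            Map.entry("html", "text/html"),
            Map.entry("txt", "text/plain"),
            Map.entry("md", "text/markdown"),
            Map.entry("mov", "video/quicktime"),
            Map.entry("mp4", "video/mp4"),
            Map.entry("webm", "video/webm"),
            Map.entry("mpeg", "video/mpeg"),
            Map.entry("mpg", "video/mpeg"),
            Map.entry("png", "image/png"),
            Map.entry("jpg", "image/jpeg"),
            Map.entry("jpeg", "image/jpeg"),
            Map.entry("gif", "image/gif"),
            Map.entry("webp", "image/webp"));

    private MediaTypeGuesser() {
    }

    static String mimeTypeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return FALLBACK; // no extension to look up
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(ext, FALLBACK);
    }

    public static void main(String[] args) {
        System.out.println(mimeTypeFor("multimodal.test.png"));
    }
}
```

With Spring on the classpath, the returned string can be turned into a MimeType via MimeType.valueOf(...), or one of the Format constants can be used directly when the type is known up front.
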

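Summary

Spring AI models multimodal input through its message types. UserMessage carries a media field of type List<Media>, so a single message can bundle text with images, audio or video, and each Media's MimeType tells the model which kind of content it holds.

Docs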

    multimodality