Mapping Text and Images to a Shared Vector Space
Introduction to Image-Text Semantic Correlation
Mapping text and images to a shared vector space is a crucial task in various applications, including social media analysis, personalized content generation, and image manipulation. This process involves creating a unified feature space where both textual and visual data can be compared and analyzed for semantic correlation.
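The core idea can be illustrated with a minimal NumPy sketch: two modality-specific feature vectors are projected by learned linear maps into one shared space, where cosine similarity becomes meaningful across modalities. The feature dimensions and the random projection weights below are illustrative assumptions; in a real system the projections are trained and the features come from text and image encoders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for encoder outputs: in practice these would come from
# a text encoder (e.g. a transformer) and an image encoder (e.g. a ViT).
text_feats = rng.normal(size=(3, 128))    # 3 captions, 128-d text features
image_feats = rng.normal(size=(3, 512))   # 3 images, 512-d visual features

# Linear projections into a shared 64-d space (random here, learned in practice).
W_text = rng.normal(size=(128, 64))
W_image = rng.normal(size=(512, 64))

def to_shared(feats, W):
    """Project features into the shared space and L2-normalize."""
    z = feats @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)

z_text = to_shared(text_feats, W_text)
z_image = to_shared(image_feats, W_image)

# Cosine similarity between every caption and every image: comparable
# only because both modalities now live in the same space.
similarity = z_text @ z_image.T
print(similarity.shape)  # (3, 3)
```

Once both modalities share a space, retrieval, correlation recognition, and text-guided editing all reduce to operations on these similarity scores.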
Feature Space Mapping in Social Media
In the context of social media platforms like Weibo, recognizing the semantic correlation between images and text is essential for understanding user-generated content. A model that extracts textual-linguistic, visual, and social features and projects them into a unified feature space using a genetic algorithm has shown significant performance improvements. This approach leverages support vector machines to recognize semantic correlations effectively [1].
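A hedged sketch of the genetic-algorithm component: a population of binary masks selects a subset of the concatenated textual-linguistic, visual, and social features, and fitness rewards class separation. The toy data, fitness function, and GA hyperparameters below are illustrative assumptions, not the paper's actual setup (which feeds the selected features to an SVM).

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy concatenated features per post: textual-linguistic + visual +
# social features stacked into one 20-d unified vector.
n, d = 60, 20
X = rng.normal(size=(n, d))
y = rng.integers(0, 2, size=n)
X[:, :5] += y[:, None] * 2.0   # make the first 5 dims informative

def fitness(mask):
    """Class-centroid separation on the selected feature subset."""
    if mask.sum() == 0:
        return 0.0
    Xs = X[:, mask.astype(bool)]
    mu0, mu1 = Xs[y == 0].mean(0), Xs[y == 1].mean(0)
    return float(np.linalg.norm(mu0 - mu1))

# Genetic algorithm over binary feature-selection masks.
pop = rng.integers(0, 2, size=(30, d))
for gen in range(25):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-10:]]          # selection
    children = []
    for _ in range(20):
        a, b = parents[rng.integers(10, size=2)]
        cut = rng.integers(1, d)
        child = np.concatenate([a[:cut], b[cut:]])   # one-point crossover
        flip = rng.random(d) < 0.05                  # mutation
        children.append(np.where(flip, 1 - child, child))
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
```

In the full pipeline, an SVM trained on the surviving features would supply the fitness signal instead of this simple centroid-distance proxy.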
Text-to-Image Personalization
Text-to-image personalization methods benefit from a sophisticated representation of the target concept within the generative process. A novel approach involves a text-conditioning space dependent on both the denoising process timestep and the U-Net layers. This method optimizes a neural mapper to represent the concept compactly and expressively, improving convergence and visual fidelity by introducing a textual bypass [2].
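The shape of such a mapper can be sketched with a tiny NumPy MLP: it takes the denoising timestep and U-Net layer index and emits a token embedding for the concept, so the same concept is represented differently at different points in the generation. The architecture, dimensions, and random weights below are assumptions for illustration; the real mapper is optimized per concept.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical neural mapper: (timestep, U-Net layer) -> token embedding.
# Random weights here; in practice they are optimized for the concept.
D_IN, D_HID, D_OUT = 2, 64, 768
W1 = rng.normal(size=(D_IN, D_HID)) * 0.1
W2 = rng.normal(size=(D_HID, D_OUT)) * 0.1

def mapper(timestep, layer, n_steps=1000, n_layers=16):
    """Return a concept token embedding for this (timestep, layer) pair."""
    x = np.array([timestep / n_steps, layer / n_layers])
    h = np.tanh(x @ W1)
    return h @ W2

e_early = mapper(timestep=900, layer=3)   # early, coarse denoising step
e_late = mapper(timestep=50, layer=3)     # late, fine denoising step
# The concept embedding varies across time (and across layers),
# which is what makes the space-time conditioning expressive.
print(np.allclose(e_early, e_late))  # False
```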
Image Manipulation via Shared Space
Text-guided human image manipulation can be enhanced by learning a shared space that disentangles appearance and spatial structure. This method addresses issues of inaccuracy, ambiguity, and incompleteness in textual descriptions by generating sequential outputs for manual selection and using structured information like poses to identify correct manipulation locations [3].
Diverse Image-to-Image Translation
For tasks requiring diverse outputs from a single input image, embedding images onto a domain-invariant content space and a domain-specific attribute space is effective. This disentangled representation approach, combined with a cross-cycle consistency loss, allows for the generation of diverse and realistic images without paired training data [4, 5].
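The cross-cycle consistency idea can be sketched structurally: encode two images into content and attribute codes, swap attributes to translate across domains, then swap back and ask that the originals are recovered. The linear encoders and decoder below are untrained random stand-ins (assumptions), so the loss is merely measured here, whereas training would minimize it.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical linear stand-ins for the trained networks:
# E_c -> domain-invariant content code, E_a -> domain-specific
# attribute code, G -> decoder from (content, attribute) to image.
D, DC, DA = 32, 8, 4
E_c = rng.normal(size=(D, DC))
E_a = rng.normal(size=(D, DA))
G = rng.normal(size=(DC + DA, D))

def encode(x):
    return x @ E_c, x @ E_a

def decode(c, a):
    return np.concatenate([c, a]) @ G

x_A = rng.normal(size=D)  # image from domain A
x_B = rng.normal(size=D)  # image from domain B

c_A, a_A = encode(x_A)
c_B, a_B = encode(x_B)

# First translation: swap attribute codes across domains.
u = decode(c_A, a_B)
v = decode(c_B, a_A)

# Swap back and decode again; cross-cycle consistency asks that the
# originals are recovered after two swaps.
c_u, a_u = encode(u)
c_v, a_v = encode(v)
x_A_rec = decode(c_u, a_v)
x_B_rec = decode(c_v, a_u)
cycle_loss = np.abs(x_A - x_A_rec).mean() + np.abs(x_B - x_B_rec).mean()
```

Because the swap is performed twice, the loss can be computed without any paired (A, B) training examples, which is what makes the unpaired setting workable.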
Constrained Embedding Space Mapping
A conditional generative method maps low-dimensional embeddings of images and text to a common latent space, extracting semantic relationships between them. This involves a constrained optimization procedure to project the embeddings to a shared manifold, enabling the generation of specific images from text data by learning the conditional probability distribution of the embeddings [7].
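A simplified sketch of the idea: fit a linear map from text embeddings into the image embedding space by least squares, then impose a unit-norm constraint so both modalities lie on the same manifold. The synthetic paired data and the least-squares-plus-normalization procedure are assumptions standing in for the paper's constrained optimization.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy paired low-dimensional embeddings (as if from pretrained encoders):
# text embeddings are a noisy linear function of image embeddings.
n = 50
img = rng.normal(size=(n, 16))
txt = img @ rng.normal(size=(16, 10)) + 0.1 * rng.normal(size=(n, 10))

# Map text embeddings into the common (image) latent space by least
# squares, then constrain both modalities to the unit sphere -- a
# simple stand-in for projecting onto a shared manifold.
W, *_ = np.linalg.lstsq(txt, img, rcond=None)

def to_manifold(z):
    return z / np.linalg.norm(z, axis=1, keepdims=True)

z_img = to_manifold(img)
z_txt = to_manifold(txt @ W)

# Matched image-text pairs should be closer than mismatched ones.
pair_sim = (z_img * z_txt).sum(axis=1).mean()
rand_sim = (z_img * np.roll(z_txt, 1, axis=0)).sum(axis=1).mean()
print(pair_sim > rand_sim)
```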
Text-Driven Manipulation of StyleGAN Imagery
Utilizing the latent spaces of StyleGAN for text-driven image manipulation can be achieved without manual effort by leveraging Contrastive Language-Image Pre-training (CLIP) models. This involves an optimization scheme that modifies latent vectors based on text prompts, plus a latent mapper for faster and more stable manipulation, enabling interactive text-driven editing [9].
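The latent-optimization scheme can be sketched as gradient descent on a latent vector to minimize one minus the cosine similarity between the generated image's embedding and the prompt's text embedding. Here random linear maps stand in for the real StyleGAN generator and CLIP encoder (both assumptions), and a numerical gradient replaces backpropagation.

```python
import numpy as np

rng = np.random.default_rng(5)

# Linear stand-ins (assumptions) for the real networks:
# G: StyleGAN-like generator, latent w -> image features
# E: CLIP-like image encoder, image features -> joint embedding
G = rng.normal(size=(32, 256)) / np.sqrt(32)
E = rng.normal(size=(256, 64)) / np.sqrt(256)
t = rng.normal(size=64)
t /= np.linalg.norm(t)            # CLIP text embedding of the prompt

def clip_loss(w):
    """1 - cosine similarity between image and text embeddings."""
    e = w @ G @ E
    return 1.0 - (e @ t) / np.linalg.norm(e)

# Optimize the latent by gradient descent (numerical gradient here;
# StyleCLIP backpropagates through the real networks instead).
w0 = rng.normal(size=32)
loss0 = clip_loss(w0)
w = w0.copy()
eps, lr = 1e-4, 0.1
for _ in range(300):
    base = clip_loss(w)
    grad = np.array([
        (clip_loss(w + eps * np.eye(32)[i]) - base) / eps
        for i in range(32)
    ])
    w -= lr * grad

print(loss0, clip_loss(w))  # final loss should be lower than the initial one
```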
Conclusion
Mapping text and images to a shared vector space is a multifaceted task with applications ranging from social media analysis to personalized content generation and image manipulation. By leveraging advanced techniques such as feature space mapping, neural mappers, disentangled representations, and CLIP models, researchers have developed robust methods to enhance the semantic correlation and manipulation of image-text data. These advancements pave the way for more intuitive and accurate interactions between textual and visual information.
Sources and full results
Most relevant research papers on this topic
[1] Recognizing semantic correlation in image-text weibo via feature space mapping
Our semantic correlation recognition model, which maps textual-linguistic, visual, and social features into a unified space and classifies them with a support vector machine, significantly improves accuracy compared to traditional support vector machine models.
[2] A Neural Space-Time Representation for Text-to-Image Personalization
Our neural space-time representation for text-to-image personalization improves visual fidelity, controllability, and disk space usage without fine-tuning generative models.
[3] Text-Guided Human Image Manipulation via Image-Text Shared Space
Our method improves human image manipulation accuracy and interactiveness by using structured information, disentangling appearance and spatial structure, and learning an image-text shared space.
[4] DRIT++: Diverse Image-to-Image Translation via Disentangled Representations
Our model can generate diverse and realistic images without paired training data, using content features and attribute vectors from a single input image.
[5] DRIT++: Diverse Image-to-Image Translation via Disentangled Representations
Our approach based on disentangled representations effectively generates diverse and realistic images without paired training images in image-to-image translation tasks.
[6] Faster dimension reduction
This paper presents a method to significantly speed up dimension reduction in linear mappings, allowing more efficient computation and data manipulation in various applications.
[7] Text to image generative model using constrained embedding space mapping
This method enables the generation of specific colored images from text data by mapping low-dimensional embeddings of image and natural language to a common latent space, extracting semantic relationships between them.
[8] Vector Quantized Diffusion Model for Text-to-Image Synthesis
The VQ-Diffusion model for text-to-image generation improves image quality and speed, achieving a better trade-off between quality and speed compared to traditional autoregressive methods.
[9] StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
StyleCLIP, a text-based interface for StyleGAN image manipulation, allows faster and more stable manipulation of images without manual effort.
[10] A robust arbitrary text detection system for natural scene images
Our robust system using Mutual Direction Symmetry, Mutual Magnitude Symmetry, and Gradient Vector Symmetry effectively detects text in natural scene images, regardless of orientation and curves.