Commit 2bcc4c2
feat: Add support for multimodal embeddings in vectorizers (#452)
This PR generalizes the `BaseVectorizer` to be agnostic to any modality
(since it previously exclusively supported text inputs). Building from
the new base, this PR then extends the implementation for some
vectorizers to support multimodal embeddings (renaming them away from
being specifically for text).
## `BaseVectorizer`
The move away from having the `BaseVectorizer` explicitly expect text
inputs means a change in the signature of the `embed` methods away from
`vectorizer.embed(text="lorem ipsum...")` to
`vectorizer.embed(content="lorem ipsum...")`. This is a breaking change
for existing usages of the vectorizers that use the keyword argument,
and the usages will need to be updated to align with the new schema.
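If backwards compatibility for the old keyword is wanted (see Open Topics), one option is a shim that still accepts `text=` but emits a `DeprecationWarning`. The sketch below is only an illustration: `SketchVectorizer` and its toy `_embed` are hypothetical stand-ins, not the `BaseVectorizer` implementation in this PR.

```python
import warnings
from typing import Any, List, Optional


class SketchVectorizer:
    """Hypothetical stand-in for a vectorizer; not the redisvl implementation."""

    def embed(
        self, content: Any = None, text: Optional[str] = None, **kwargs: Any
    ) -> List[float]:
        # Accept the old `text=` keyword, but steer callers toward `content=`.
        if text is not None:
            if content is not None:
                raise TypeError("Pass either 'content' or the deprecated 'text', not both.")
            warnings.warn(
                "The 'text' argument is deprecated; use 'content' instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            content = text
        if content is None:
            raise TypeError("embed() missing required argument: 'content'")
        return self._embed(content, **kwargs)

    def _embed(self, content: Any, **kwargs: Any) -> List[float]:
        # Toy "embedding" so the sketch runs without any model or API key.
        return [float(len(str(content)))]


vectorizer = SketchVectorizer()
new_style = vectorizer.embed(content="lorem ipsum")  # no warning
old_style = vectorizer.embed(text="lorem ipsum")     # works, but warns
```

Positional calls such as `vectorizer.embed("lorem ipsum")` are unaffected by the rename either way, since only the keyword name changes.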
Caching for multimodal embeddings is supported for all vectorizers
introduced in this PR.
## Multimodal Implementations
The following vectorizers have been renamed to no longer be explicitly
text vectorizers, and moved to no longer be defined in the
`vectorize.text` module. Imports and usages for these vectorizers will
need to be updated to avoid errors. The `CustomTextVectorizer` has also
been renamed and moved to be
`redisvl.utils.vectorize.custom.CustomVectorizer`.
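For the class renames, one possible way to keep old names importable during a deprecation window is a thin subclass that warns on construction. This is a minimal sketch, not code from this PR: the `CustomVectorizer` below is a simplified stand-in defined locally so the example runs on its own, and its constructor signature is an assumption.

```python
import warnings
from typing import Any, Callable, List


class CustomVectorizer:
    """Simplified local stand-in for the renamed class (hypothetical signature)."""

    def __init__(self, embed: Callable[[Any], List[float]]):
        self._embed_fn = embed

    def embed(self, content: Any) -> List[float]:
        return self._embed_fn(content)


class CustomTextVectorizer(CustomVectorizer):
    """Deprecated alias kept only so existing code keeps working (sketch)."""

    def __init__(self, *args: Any, **kwargs: Any):
        warnings.warn(
            "CustomTextVectorizer is deprecated; use CustomVectorizer instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)


# Old-style construction still works, but emits a DeprecationWarning.
legacy = CustomTextVectorizer(embed=lambda content: [0.0, 1.0])
embedding = legacy.embed("Hello, world!")
```

Because the alias subclasses the new class, `isinstance(legacy, CustomVectorizer)` holds, so code written against the new name accepts objects built through the old one.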
### VoyageAI
Old: `redisvl.utils.vectorize.text.voyageai.VoyageAITextVectorizer`
New: `redisvl.utils.vectorize.voyageai.VoyageAIVectorizer`
```python
from redisvl.utils.vectorize import VoyageAIVectorizer

# --- Basic usage
vectorizer = VoyageAIVectorizer(
    model="voyage-3-large",
    api_config={"api_key": "your-voyageai-api-key"}  # OR set VOYAGE_API_KEY in your env
)
query_embedding = vectorizer.embed(
    content="your input query text here",
    input_type="query"
)
doc_embeddings = vectorizer.embed_many(
    contents=["your document text", "more document text"],
    input_type="document"
)

# --- Multimodal usage - requires Pillow and voyageai>=0.3.6 (for video)
from PIL import Image
from voyageai.video_utils import Video

vectorizer = VoyageAIVectorizer(
    model="voyage-multimodal-3.5",
    api_config={"api_key": "your-voyageai-api-key"}  # OR set VOYAGE_API_KEY in your env
)

# text
text_embedding = vectorizer.embed(
    content="your input query text here",
    input_type="query"
)

# image
image_embedding = vectorizer.embed_image(
    "path/to/your/image.jpg",
    input_type="query"
)
image_embedding = vectorizer.embed(
    Image.open("path/to/your/image.jpg"),
    input_type="query"
)

# video
video_embedding = vectorizer.embed_video(
    "path/to/your/video.mp4",
    input_type="document"
)
video_embedding = vectorizer.embed(
    Video.from_path("path/to/your/video.mp4", model=vectorizer.model),
    input_type="document"
)
```
### Vertex AI
Old: `redisvl.utils.vectorize.text.vertexai.VertexAITextVectorizer`
New: `redisvl.utils.vectorize.vertexai.VertexAIVectorizer`
```python
from redisvl.utils.vectorize import VertexAIVectorizer

# Basic usage
vectorizer = VertexAIVectorizer(
    model="textembedding-gecko",
    api_config={
        "project_id": "your_gcp_project_id",  # OR set GCP_PROJECT_ID
        "location": "your_gcp_location",  # OR set GCP_LOCATION
    }
)
embedding = vectorizer.embed("Hello, world!")

# Multimodal usage
from vertexai.vision_models import Image, Video

vectorizer = VertexAIVectorizer(
    model="multimodalembedding@001",
    api_config={
        "project_id": "your_gcp_project_id",  # OR set GCP_PROJECT_ID
        "location": "your_gcp_location",  # OR set GCP_LOCATION
    }
)
text_embedding = vectorizer.embed("Hello, world!")
image_embedding = vectorizer.embed(Image.load_from_file("path/to/your/image.jpg"))
image_embedding = vectorizer.embed_image("path/to/your/image.jpg")
video_embedding = vectorizer.embed(Video.load_from_file("path/to/your/video.mp4"))
video_embedding = vectorizer.embed_video("path/to/your/video.mp4")
```
### Amazon Bedrock
Old: `redisvl.utils.vectorize.text.bedrock.BedrockTextVectorizer`
New: `redisvl.utils.vectorize.bedrock.BedrockVectorizer`
```python
from redisvl.utils.vectorize import BedrockVectorizer

# Basic usage
vectorizer = BedrockVectorizer(
    model="amazon.titan-embed-text-v2:0",
    api_config={
        "aws_access_key_id": "your_access_key",
        "aws_secret_access_key": "your_secret_key",
        "aws_region": "us-east-1"
    }
)
embedding = vectorizer.embed("Hello, world!")

# Multimodal usage
from pathlib import Path
from PIL import Image

vectorizer = BedrockVectorizer(
    model="amazon.titan-embed-image-v1:0",
    api_config={
        "aws_access_key_id": "your_access_key",
        "aws_secret_access_key": "your_secret_key",
        "aws_region": "us-east-1"
    }
)
image_embedding = vectorizer.embed(Path("path/to/your/image.jpg"))
image_embedding = vectorizer.embed(Image.open("path/to/other/image.png"))
image_embedding = vectorizer.embed_image("path/to/your/image.jpg")

# Embedding a list of mixed modalities
embeddings = vectorizer.embed_many(
    ["Hello", "world!", Path("path/to/your/image.jpg"), Image.open("path/to/other/image.png")],
    batch_size=2
)
```
### Hugging Face
While the sentence-transformers package does not explicitly allow for
multimodal usage (the package is designed for text-based use-cases),
some officially supported multimodal models can be used without issue
via the `SentenceTransformer` class. This PR removes strict enforcement
of text inputs for the `HFTextVectorizer` to enable these use-cases.
```python
from PIL import Image
from redisvl.utils.vectorize import HFTextVectorizer
vectorizer = HFTextVectorizer(model="sentence-transformers/clip-ViT-L-14")
embeddings1 = vectorizer.embed("Hello, world!")
embeddings2 = vectorizer.embed(Image.open("path/to/your/image.jpg"))
```
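Because CLIP-style models embed text and images into a shared vector space, the two results above can be compared directly. The following is a minimal sketch of that comparison, assuming `embed` returns a flat list of floats; the short vectors are made-up placeholders rather than real model output, and `cosine_similarity` is a helper written for this example, not part of redisvl.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Placeholder vectors standing in for embeddings1 (text) and embeddings2 (image).
text_vec = [0.1, 0.3, 0.5]
image_vec = [0.2, 0.1, 0.4]
score = cosine_similarity(text_vec, image_vec)
```

A higher score suggests the text and the image are closer in the model's embedding space.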
## Open Topics
Since this PR introduces a few breaking changes, do we want to maintain
backwards compatibility (with deprecation warnings) for syntax that is
changing? This includes:
- `vectorizer.embed(text=...)` -> `vectorizer.embed(content=...)`
- `VoyageAITextVectorizer` -> `VoyageAIVectorizer`
- `VertexAITextVectorizer` -> `VertexAIVectorizer`
- `BedrockTextVectorizer` -> `BedrockVectorizer`
- `CustomTextVectorizer` -> `CustomVectorizer`

1 parent 5e1e95c commit 2bcc4c2
File tree: 30 files changed, +3263 -2692 lines changed
- docs
  - api
  - user_guide
- redisvl
  - extensions/cache/embeddings
  - query
  - utils
    - vectorize
      - text
- tests
  - integration
  - unit