
[Bug]: Gemini's embedding model does not work through the proxy #17759

@eliorc

Description

What happened?

Context

I am using the proxy with various models, all of which work as expected, but this is the first time I have tried to use an embedding endpoint.

When I go into my model hub, I can clearly see that the model is available:

(Screenshot: model hub showing the embedding model as available.)

Goal and problem

I tried making the call in a few ways. When interacting with the proxy via the litellm SDK, the call fails with a Method Not Allowed error:

import litellm

response = litellm.embedding(
    model="gemini/gemini-embedding-001",  # the user-facing name from your config.yaml
    input=["Hello world"],
    api_base="https://MY_PROXY",
    api_key="XXXXX",
    task_type="RETRIEVAL_DOCUMENT",
)
Request to litellm:
litellm.embedding(model='gemini/gemini-embedding-001', input=['Hello world'], api_base='https://MY_PROXY', api_key='MY_KEY', task_type='RETRIEVAL_DOCUMENT')
SYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache')['no-cache']: False
12:17:02 - LiteLLM:DEBUG: utils.py:381 - 
12:17:02 - LiteLLM:DEBUG: utils.py:381 - Request to litellm:
12:17:02 - LiteLLM:DEBUG: utils.py:381 - litellm.embedding(model='gemini/gemini-embedding-001', input=['Hello world'], api_base='MY_PROXY', api_key='MY_KEY', task_type='RETRIEVAL_DOCUMENT')
12:17:02 - LiteLLM:DEBUG: utils.py:381 - 
12:17:02 - LiteLLM:WARNING: utils.py:557 - `litellm.set_verbose` is deprecated. Please set `os.environ['LITELLM_LOG'] = 'DEBUG'` for debug logs.
12:17:02 - LiteLLM:DEBUG: litellm_logging.py:510 - self.optional_params: {}
12:17:02 - LiteLLM:DEBUG: utils.py:381 - SYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache')['no-cache']: False
12:17:02 - LiteLLM:DEBUG: litellm_logging.py:510 - self.optional_params: {'task_type': 'RETRIEVAL_DOCUMENT'}
12:17:02 - LiteLLM:DEBUG: litellm_logging.py:1049 - 
POST Request Sent from LiteLLM:
curl -X POST \
https://MYPROXY/models/gemini-embedding-001:batchEmbedContents \
-H 'Content-Type: application/json; charset=utf-8' \
-d '{'requests': [{'model': 'models/gemini-embedding-001', 'content': {'parts': [{'text': 'Hello world'}]}, 'task_type': 'RETRIEVAL_DOCUMENT'}]}'
Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm._turn_on_debug()'.
12:17:02 - LiteLLM:DEBUG: litellm_logging.py:1122 - RAW RESPONSE:
Client error '405 Method Not Allowed' for url 'https://MYPROXY/models/gemini-embedding-001:batchEmbedContents'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/405
12:17:02 - LiteLLM:DEBUG: exception_mapping_utils.py:2358 - Logging Details: logger_fn - None | callable(logger_fn) - False
12:17:02 - LiteLLM:DEBUG: litellm_logging.py:2694 - Logging Details LiteLLM-Failure Call: []
Traceback (most recent call last):
  File "/Users/elior.cohen/Dev/musashi/repo-rag/.venv/lib/python3.12/site-packages/litellm/main.py", line 4577, in embedding
    response = google_batch_embeddings.batch_embeddings(  # type: ignore
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/elior.cohen/Dev/musashi/repo-rag/.venv/lib/python3.12/site-packages/litellm/llms/vertex_ai/gemini_embeddings/batch_embed_content_handler.py", line 117, in batch_embeddings
    response = sync_handler.post(
               ^^^^^^^^^^^^^^^^^^
  File "/Users/elior.cohen/Dev/musashi/repo-rag/.venv/lib/python3.12/site-packages/litellm/llms/custom_httpx/http_handler.py", line 950, in post
    raise e
  File "/Users/elior.cohen/Dev/musashi/repo-rag/.venv/lib/python3.12/site-packages/litellm/llms/custom_httpx/http_handler.py", line 932, in post
    response.raise_for_status()
  File "/Users/elior.cohen/Dev/musashi/repo-rag/.venv/lib/python3.12/site-packages/httpx/_models.py", line 829, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Client error '405 Method Not Allowed' for url 'https://MYPROXY/models/gemini-embedding-001:batchEmbedContents'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/405
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Users/elior.cohen/Dev/musashi/repo-rag/.venv/lib/python3.12/site-packages/litellm/utils.py", line 1382, in wrapper
    raise e
  File "/Users/elior.cohen/Dev/musashi/repo-rag/.venv/lib/python3.12/site-packages/litellm/utils.py", line 1251, in wrapper
    result = original_function(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/elior.cohen/Dev/musashi/repo-rag/.venv/lib/python3.12/site-packages/litellm/main.py", line 5101, in embedding
    raise exception_type(
          ^^^^^^^^^^^^^^^
  File "/Users/elior.cohen/Dev/musashi/repo-rag/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2329, in exception_type
    raise e
  File "/Users/elior.cohen/Dev/musashi/repo-rag/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2298, in exception_type
    raise APIConnectionError(
litellm.exceptions.APIConnectionError: litellm.APIConnectionError: GeminiException - {"detail":"Method Not Allowed"}

When I tried interacting via a direct HTTP request instead, the call worked, but task_type had no effect:

import requests

response = requests.post(
    "https://MYPROXY/embeddings",
    headers={"Authorization": "Bearer MY_KEY"},
    json={
        "model": "gemini/gemini-embedding-001",
        "input": ["I am looking for the most fitting model for my test"],
        "dimensions": 768,
        "task_type": "RETRIEVAL_QUERY",
    },
)

This call succeeds, but the task_type is ignored. How do I know it is ignored? When interacting directly with Gemini's endpoint (not through the proxy), I get different embeddings for task_type=RETRIEVAL_QUERY and task_type=RETRIEVAL_DOCUMENT; through the proxy, the embeddings come back identical.
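For reference, here is a minimal sketch of that check (the proxy URL, key, and the embed helper are illustrative placeholders, not part of the original calls):

import requests

def embed(task_type: str) -> list[float]:
    # Hypothetical helper: one embedding call through the proxy,
    # varying only task_type; URL and key are placeholders.
    resp = requests.post(
        "https://MYPROXY/embeddings",
        headers={"Authorization": "Bearer MY_KEY"},
        json={
            "model": "gemini/gemini-embedding-001",
            "input": ["I am looking for the most fitting model for my test"],
            "dimensions": 768,
            "task_type": task_type,
        },
    )
    resp.raise_for_status()
    return resp.json()["data"][0]["embedding"]

# When calling Gemini directly, these two vectors differ; through the
# proxy they come back identical, which is how I know task_type is dropped.
query_vec = embed("RETRIEVAL_QUERY")
doc_vec = embed("RETRIEVAL_DOCUMENT")
print("identical:", query_vec == doc_vec)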

Summary

My main problem, and what I can't figure out, is how to use Gemini's embedding endpoint through the proxy while utilizing the task_type parameter.

A secondary problem, less important to me but still a bug, is that the litellm SDK cannot make this call at all.
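Judging from the traceback, the gemini/ prefix makes the SDK construct Google's native /models/<model>:batchEmbedContents path on top of the proxy's base URL, which the proxy does not serve. A possible workaround, untested and based on LiteLLM's documented litellm_proxy/ provider prefix for routing SDK calls through a proxy's OpenAI-compatible route (whether task_type survives that hop is exactly the open question above):

import litellm

# Sketch, not verified: the litellm_proxy/ prefix tells the SDK to treat
# api_base as a LiteLLM proxy and POST to its OpenAI-compatible
# /embeddings route instead of the Gemini-native batchEmbedContents path.
response = litellm.embedding(
    model="litellm_proxy/gemini/gemini-embedding-001",
    input=["Hello world"],
    api_base="https://MY_PROXY",
    api_key="XXXXX",
    task_type="RETRIEVAL_DOCUMENT",
)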

Relevant log output

Are you a ML Ops Team?

No

What LiteLLM version are you on?

v1.79.0

Twitter / LinkedIn details

No response
