Description
What happened?
Context
I am using the proxy with various models and all work as expected, but this is the first time I have tried to use an embedding endpoint.
When I go into my model hub, I can clearly see the model is available.
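For reference, the model is registered in the proxy config along these lines (a sketch from memory; the exact entry and the API-key variable name are assumptions, not copied from my actual config):

model_list:
  - model_name: gemini/gemini-embedding-001
    litellm_params:
      model: gemini/gemini-embedding-001
      api_key: os.environ/GEMINI_API_KEY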
Goal and problem
I tried making the call in a few ways. When interacting with the proxy via the litellm SDK, the call fails with a Method Not Allowed error:
import litellm

response = litellm.embedding(
    model="gemini/gemini-embedding-001",  # the user-facing name from your config.yaml
    input=["Hello world"],
    api_base="https://MY_PROXY",
    api_key="XXXXX",
    task_type="RETRIEVAL_DOCUMENT",
)
The debug log shows:
12:17:02 - LiteLLM:DEBUG: utils.py:381 -
12:17:02 - LiteLLM:DEBUG: utils.py:381 - Request to litellm:
12:17:02 - LiteLLM:DEBUG: utils.py:381 - litellm.embedding(model='gemini/gemini-embedding-001', input=['Hello world'], api_base='MY_PROXY', api_key='MY_KEY', task_type='RETRIEVAL_DOCUMENT')
12:17:02 - LiteLLM:DEBUG: utils.py:381 -
12:17:02 - LiteLLM:WARNING: utils.py:557 - `litellm.set_verbose` is deprecated. Please set `os.environ['LITELLM_LOG'] = 'DEBUG'` for debug logs.
12:17:02 - LiteLLM:DEBUG: litellm_logging.py:510 - self.optional_params: {}
12:17:02 - LiteLLM:DEBUG: utils.py:381 - SYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache')['no-cache']: False
12:17:02 - LiteLLM:DEBUG: litellm_logging.py:510 - self.optional_params: {'task_type': 'RETRIEVAL_DOCUMENT'}
12:17:02 - LiteLLM:DEBUG: litellm_logging.py:1049 -
POST Request Sent from LiteLLM:
curl -X POST \
https://MYPROXY/models/gemini-embedding-001:batchEmbedContents \
-H 'Content-Type: application/json; charset=utf-8' \
-d '{'requests': [{'model': 'models/gemini-embedding-001', 'content': {'parts': [{'text': 'Hello world'}]}, 'task_type': 'RETRIEVAL_DOCUMENT'}]}'
Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm._turn_on_debug()'.
12:17:02 - LiteLLM:DEBUG: litellm_logging.py:1122 - RAW RESPONSE:
Client error '405 Method Not Allowed' for url 'https://MYPROXY/models/gemini-embedding-001:batchEmbedContents'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/405
12:17:02 - LiteLLM:DEBUG: exception_mapping_utils.py:2358 - Logging Details: logger_fn - None | callable(logger_fn) - False
12:17:02 - LiteLLM:DEBUG: litellm_logging.py:2694 - Logging Details LiteLLM-Failure Call: []
Traceback (most recent call last):
File "/Users/elior.cohen/Dev/musashi/repo-rag/.venv/lib/python3.12/site-packages/litellm/main.py", line 4577, in embedding
response = google_batch_embeddings.batch_embeddings( # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/elior.cohen/Dev/musashi/repo-rag/.venv/lib/python3.12/site-packages/litellm/llms/vertex_ai/gemini_embeddings/batch_embed_content_handler.py", line 117, in batch_embeddings
response = sync_handler.post(
^^^^^^^^^^^^^^^^^^
File "/Users/elior.cohen/Dev/musashi/repo-rag/.venv/lib/python3.12/site-packages/litellm/llms/custom_httpx/http_handler.py", line 950, in post
raise e
File "/Users/elior.cohen/Dev/musashi/repo-rag/.venv/lib/python3.12/site-packages/litellm/llms/custom_httpx/http_handler.py", line 932, in post
response.raise_for_status()
File "/Users/elior.cohen/Dev/musashi/repo-rag/.venv/lib/python3.12/site-packages/httpx/_models.py", line 829, in raise_for_status
raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Client error '405 Method Not Allowed' for url 'https://MYPROXY/models/gemini-embedding-001:batchEmbedContents'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/405
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/Users/elior.cohen/Dev/musashi/repo-rag/.venv/lib/python3.12/site-packages/litellm/utils.py", line 1382, in wrapper
raise e
File "/Users/elior.cohen/Dev/musashi/repo-rag/.venv/lib/python3.12/site-packages/litellm/utils.py", line 1251, in wrapper
result = original_function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/elior.cohen/Dev/musashi/repo-rag/.venv/lib/python3.12/site-packages/litellm/main.py", line 5101, in embedding
raise exception_type(
^^^^^^^^^^^^^^^
File "/Users/elior.cohen/Dev/musashi/repo-rag/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2329, in exception_type
raise e
File "/Users/elior.cohen/Dev/musashi/repo-rag/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2298, in exception_type
raise APIConnectionError(
litellm.exceptions.APIConnectionError: litellm.APIConnectionError: GeminiException - {"detail":"Method Not Allowed"}
Judging by the curl above, the SDK sends Gemini's native :batchEmbedContents route directly to the proxy's base URL instead of the proxy's OpenAI-compatible /embeddings route, which presumably explains the 405. When I instead made a direct HTTP request to the proxy's /embeddings endpoint, the call worked, but task_type had no effect:
import requests

response = requests.post(
    "https://MYPROXY/embeddings",
    headers={"Authorization": "Bearer MY_KEY"},
    json={
        "model": "gemini/gemini-embedding-001",
        "input": ["I am looking for the most fitting model for my test"],
        "dimensions": 768,
        "task_type": "RETRIEVAL_QUERY",
    },
)
This call succeeds, but task_type is ignored. How do I know it is ignored? When I interact with Gemini's endpoint directly (not through the proxy), I get different embeddings for task_type=RETRIEVAL_QUERY and task_type=RETRIEVAL_DOCUMENT; through the proxy, the embeddings come back identical.
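To make the check concrete, here is a minimal sketch of the comparison I ran (MYPROXY and MY_KEY are placeholders, and the embed helper is mine, not part of litellm):

import requests

def embed(task_type: str) -> list:
    # Call the proxy's OpenAI-compatible /embeddings route with a given task_type.
    resp = requests.post(
        "https://MYPROXY/embeddings",
        headers={"Authorization": "Bearer MY_KEY"},
        json={
            "model": "gemini/gemini-embedding-001",
            "input": ["I am looking for the most fitting model for my test"],
            "dimensions": 768,
            "task_type": task_type,
        },
    )
    resp.raise_for_status()
    return resp.json()["data"][0]["embedding"]

query_vec = embed("RETRIEVAL_QUERY")
doc_vec = embed("RETRIEVAL_DOCUMENT")
# Against Gemini directly these vectors differ; through the proxy they are
# identical, which is how I concluded task_type is being dropped.
print("identical through proxy:", query_vec == doc_vec)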
Summary
My main problem, which I can't figure out, is how to use Gemini's embeddings endpoint through the proxy while still utilizing the task_type parameter.
A secondary problem, less important to me but still a bug, is that the litellm SDK cannot make this call at all; see the sketch below for what I would expect to work.
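If it helps narrow things down, this is roughly what I would expect to work through the proxy with the SDK. It is a sketch under the assumption that the litellm_proxy/ provider prefix applies to embeddings the same way it does to chat; I have not confirmed whether task_type survives the round trip:

import litellm

# Sketch: route the embedding call through the LiteLLM proxy via the
# litellm_proxy/ provider prefix (assumption: it covers embeddings too).
response = litellm.embedding(
    model="litellm_proxy/gemini/gemini-embedding-001",
    input=["Hello world"],
    api_base="https://MY_PROXY",
    api_key="MY_KEY",
    task_type="RETRIEVAL_DOCUMENT",  # the parameter whose pass-through is in question
)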
Relevant log output
See the debug log and traceback above.
Are you a ML Ops Team?
No
What LiteLLM version are you on?
v1.79.0
Twitter / LinkedIn details
No response