uform-gen2-qwen-500m Beta
Image-to-Text • unum
UForm-Gen is a small generative vision-language model designed primarily for image captioning and visual question answering. The model was pre-trained on an internal image captioning dataset and fine-tuned on public instruction datasets: SVIT, LVIS, and VQA datasets.
Usage
Workers - TypeScript
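A minimal sketch of what a Worker calling this model might look like. Several names here are assumptions rather than details confirmed by this page: the binding name `AI`, the model identifier `@cf/unum/uform-gen2-qwen-500m`, and the local `AiBinding` interface, which is only a stand-in for the real binding types. The sketch also assumes the request body carries the raw image bytes.

```typescript
// Local stand-in for the Workers AI binding type (assumption: the binding
// exposes run(model, inputs) and returns a promise).
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

interface Env {
  AI: AiBinding; // assumed binding name
}

// Assumed model identifier; check the model page for the exact string.
const MODEL = "@cf/unum/uform-gen2-qwen-500m";

// Builds the object form of the input from raw image bytes.
function buildInput(bytes: Uint8Array, prompt: string, maxTokens = 512) {
  return {
    image: Array.from(bytes), // integers in the range 0..255
    prompt,
    max_tokens: maxTokens,
  };
}

const worker = {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Treat the request body as the image to describe (assumption of this sketch).
    const bytes = new Uint8Array(await request.arrayBuffer());
    const result = await env.AI.run(MODEL, buildInput(bytes, "Describe this image."));
    return new Response(JSON.stringify(result), {
      headers: { "content-type": "application/json" },
    });
  },
};

export default worker;
```

Passing the image as an array of integers matches the array form of `image` in the parameter list; the string form is left out of this sketch.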
Parameters
Input
- 0 (string): Binary string representing the image contents.
- 1 (object):
  - temperature (number): Controls the randomness of the output; higher values produce more random results.
  - prompt (string): The input text prompt for the model to generate a response.
  - raw (boolean): If true, a chat template is not applied and you must adhere to the specific model's expected formatting.
  - image:
    - 0 (array): An array of integers that represent the image data, constrained to 8-bit unsigned integer values.
      - items (number): A value between 0 and 255.
    - 1 (string): Binary string representing the image contents.
  - max_tokens (integer, default 512): The maximum number of tokens to generate in the response.
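The input parameters can be sketched as TypeScript types with a small validating helper. This is an illustration, not an official type definition: the names `ImageBytes`, `CaptionInput`, `isValidImageArray`, and `makeInput` are hypothetical, and the choice to treat `image` as required and every other field as optional is an assumption, since the list does not say which fields are mandatory.

```typescript
// Either an array of 8-bit unsigned integers or a binary string.
type ImageBytes = number[] | string;

// Object form of the input (assumption: only `image` is required).
interface CaptionInput {
  image: ImageBytes;
  prompt?: string;
  raw?: boolean;
  temperature?: number;
  max_tokens?: number;
}

// True when every item is an integer between 0 and 255 inclusive.
function isValidImageArray(values: number[]): boolean {
  return values.every((v) => Number.isInteger(v) && v >= 0 && v <= 255);
}

// Builds an input object, applying the documented max_tokens default of 512.
function makeInput(
  bytes: number[] | Uint8Array,
  options: Omit<CaptionInput, "image"> = {},
): CaptionInput {
  const image = Array.from(bytes);
  if (!isValidImageArray(image)) {
    throw new RangeError("image items must be integers between 0 and 255");
  }
  return { max_tokens: 512, ...options, image };
}
```

Validating before sending keeps out-of-range values from reaching the model; `makeInput([0, 300])` throws a `RangeError` because 300 does not fit in 8 bits.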
Output
- description (string)
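Because the output is a single `description` string, a caller may want to check a response before using it. The type guard below is a hedged sketch; `CaptionOutput` and `isCaptionOutput` are hypothetical names introduced for illustration.

```typescript
// Shape of the output as listed above.
interface CaptionOutput {
  description: string;
}

// Narrows an unknown value to CaptionOutput when it has a string `description`.
function isCaptionOutput(value: unknown): value is CaptionOutput {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { description?: unknown }).description === "string"
  );
}
```

A caller could use it as `if (isCaptionOutput(result)) { console.log(result.description); }` and handle any other shape as an error.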
API Schemas
The following schemas are based on JSON Schema.