Image To Text

Type: xai.vision.ImageToText

Namespace: xai.vision

Description

Analyze images and generate text using xAI’s Grok vision models. xai, grok, vision, image, ocr, analysis, multimodal

Uses Grok's multimodal models to understand and describe images via the
OpenAI-compatible chat completions endpoint. Can perform OCR, image
analysis, and answer questions about images. Requires an xAI API key.

Use cases:
- Describe image contents
- Answer questions about images
- Extract text from images (OCR)
- Analyze charts and diagrams

Properties

Property	Type	Description	Default
image	`image`	The image to analyze	`{"type":"image","uri":"","asset_id":null,"data"...`
prompt	`str`	The prompt/question about the image	`Describe this image in detail.`
model	`str`	The Grok vision model to use (e.g. grok-2-vision-1212, grok-4).	`grok-2-vision-1212`
temperature	`float`	Sampling temperature for response generation	`0.3`
max_tokens	`int`	Maximum number of tokens to generate	`1024`

Outputs

Output	Type	Description
output	`str`

Browse other nodes in the xai.vision namespace.

Edit this page on GitHub

Image To Text

Description

Properties

Outputs

Related Nodes