Support OCR of image files in File Parser block (#4609)

2026-02-02 21:37:12 -08:00
parent b64c7d4032
commit fcbe7fe84f
7 changed files with 60 additions and 10 deletions
--- a/skyvern/forge/prompts/skyvern/extract-text-from-image.j2
+++ b/skyvern/forge/prompts/skyvern/extract-text-from-image.j2
@@ -0,0 +1,19 @@
+Extract all visible text from this image.
+
+MAKE SURE YOU OUTPUT VALID JSON. No text before or after JSON, no trailing commas, no comments, no unnecessary quotes.
+
+Reply in JSON format with the following keys:
+{
+    "extracted_text": str // All text extracted from the image
+}
+
+TEXT EXTRACTION GUIDELINES:
+- Preserve reading order (top to bottom, left to right)
+- For tables: format as rows separated by newlines, columns separated by " | "
+- For multi-column layouts: extract each column separately, separated by blank lines
+- For forms: format as "Label: Value" on each line
+- Preserve line breaks where they appear meaningful (paragraphs, list items)
+- Include all visible text: headers, body text, labels, captions, watermarks
+- For handwritten text: do your best to transcribe, use [illegible] for unclear parts
+
+If no text is visible in the image, return an empty string for extracted_text.
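
The prompt above asks the model for a single JSON object with one key, `extracted_text`, and an empty string when the image contains no text. As a minimal sketch of how a caller might check a reply against that contract: the helper name `parse_extracted_text` is hypothetical and is not taken from this commit, and the way Skyvern actually consumes the reply is not shown in this part of the diff.

```python
import json


def parse_extracted_text(raw_response: str) -> str:
    """Return the `extracted_text` value from a raw model reply.

    Hypothetical helper for illustration only; not part of the commit.
    Raises ValueError when the reply does not match the shape the prompt
    asks for, so a malformed reply is not mistaken for an image without text.
    """
    # json.JSONDecodeError is a subclass of ValueError, so invalid JSON
    # surfaces as ValueError without extra handling.
    payload = json.loads(raw_response)

    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object at the top level")

    value = payload.get("extracted_text")
    if not isinstance(value, str):
        raise ValueError('expected a string under the "extracted_text" key')

    # An empty string is a valid result: the prompt uses it to signal
    # that no text is visible in the image.
    return value


if __name__ == "__main__":
    reply = '{"extracted_text": "Name: Ada | Total: 42"}'
    print(parse_extracted_text(reply))
```

Raising on a malformed reply, instead of returning an empty string, keeps "the model returned bad JSON" distinguishable from "the image contains no text"; whether that distinction matters depends on how the calling code handles failures.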