Cleanups

2026-06-04 21:03:53 +08:00 · 2026-05-14 16:26:39 -04:00 · 2026-05-14 16:26:39 -04:00 · 9158557600
commit 9158557600
parent 8e1d94ff7e
3 changed files with 10 additions and 24 deletions
--- a/README.md
+++ b/README.md
@@ -46,13 +46,9 @@ Commercial self-hosting of the model weights requires a license — see [Commerc
 |:----------------------------------------------------------------:|:-----------------------------------------------------------------------:|
 |  <img src="static/images/excerpt.png" width="280"/>  |  <img src="static/images/excerpt_text.png" width="280"/> |
-|                               Layout                               |                               Reading Order                                |
+|                               Layout                               |                       Table Recognition                       |
-|:------------------------------------------------------------------:|:--------------------------------------------------------------------------:|
+|:------------------------------------------------------------------:|:-------------------------------------------------------------:|
-| <img src="static/images/excerpt_layout.png" width="280"/> | <img src="static/images/excerpt_reading.png" width="280"/> |
+| <img src="static/images/excerpt_layout.png" width="280"/> | <img src="static/images/scanned_tablerec.png" width="280"/> |
 |                       Table Recognition                       |                       Math / Equations                       |
 |:-------------------------------------------------------------:|:------------------------------------------------------------:|
 | <img src="static/images/scanned_tablerec.png" width="280"/> | <img src="static/images/latex_ocr.png" width="280"/> |
 Surya is named for the [Hindu sun god](https://en.wikipedia.org/wiki/Surya), who has universal vision.
@@ -76,8 +72,6 @@ The Surya code is licensed under Apache 2.0. The model weights use a modified AI
 # Installation
 You'll need python 3.10+ and PyTorch. You may need to install the CPU version of torch first if you're not using a Mac or a GPU machine.  See [here](https://pytorch.org/get-started/locally/) for more details.
 Install with:
 ```shell
@@ -377,7 +371,7 @@ standard quality benchmark for document parsers.
 \* **LightOnOCR 2-1B** uses a different benchmark methodology than the other entries (see their [release notes](https://huggingface.co/lightonai/LightOnOCR-2-1B)); the score is included for context but is not directly comparable.
-Comparison scores from the [olmOCR-bench dataset card](https://huggingface.co/datasets/allenai/olmOCR-bench). Surya OCR 2 is reported as 0.69B params — the on-disk safetensors duplicates the tied embedding + lm_head, so HuggingFace shows ~0.75B; the underlying parameter count is 0.69B.
+Comparison scores from the [olmOCR-bench dataset card](https://huggingface.co/datasets/allenai/olmOCR-bench).
 Surya 2, per-source pass rate on the `default` preset (8,413 tests total):
@@ -430,7 +424,7 @@ RecognitionPredictor defaults to that mode.
 # Training
 Layout, OCR, and table recognition all share a single vision-language model
-(Qwen3.5-style architecture, ~770M params). It's trained on diverse document
+(Qwen3.5-style architecture, ~690M params). It's trained on diverse document
 images to emit either a layout JSON or a full-page HTML output, depending on
 prompt. Text-line detection is a separate small torch model — a modified
 EfficientViT segformer trained from scratch on document line annotations.
@@ -442,7 +436,7 @@ training stack, reach us at hi@datalab.to.
 This work would not have been possible without amazing open source AI work:
- [Qwen3-VL](https://huggingface.co/Qwen) from Alibaba (architecture basis for the Surya 2 VLM)
+- [Qwen3-VL](https://huggingface.co/Qwen) from Alibaba
 - [vllm](https://github.com/vllm-project/vllm) and [llama.cpp](https://github.com/ggerganov/llama.cpp) for inference
 - [Segformer](https://arxiv.org/pdf/2105.15203.pdf) from NVIDIA
 - [EfficientViT](https://github.com/mit-han-lab/efficientvit) from MIT
@@ -461,6 +455,6 @@ If you use surya (or the associated models) in your work or research, please con
  author       = {Vikas Paruchuri and Datalab Team},
  title        = {Surya: A lightweight document OCR and analysis toolkit},
  year         = {2025},
-  howpublished = {\url{https://github.com/VikParuchuri/surya}},
+  howpublished = {\url{https://github.com/datalab-to/surya}},
  note         = {GitHub repository},
 }
--- a/surya/inference/backends/vllm.py
+++ b/surya/inference/backends/vllm.py
@@ -31,6 +31,9 @@ BASELINE_MAX_BATCHED_TOKENS = 8192
 BASELINE_MAX_NUM_SEQS = 32
 GPU_VRAM_GB = {
    "b300": 270,
    "b200": 180,
    "h200": 141,
    "h100": 80,
    "a100-80": 80,
    "a100": 40,
--- a/tests/test_detection.py
+++ b/tests/test_detection.py
@@ -6,14 +6,3 @@ def test_detection(detection_predictor, test_image):
    bboxes = detection_results[0].bboxes
    assert len(bboxes) == 4
 def test_detection_chunking(detection_predictor, test_image_tall):
    detection_results = detection_predictor([test_image_tall])
    assert len(detection_results) == 1
    assert detection_results[0].image_bbox == [0, 0, 4096, 4096]
    bboxes = detection_results[0].bboxes
    assert len(bboxes) >= 3 # Sometimes merges into 3
    assert abs(4000 - bboxes[1].polygon[0][0]) < 50
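The README note removed in this commit explained why HuggingFace shows roughly 0.75B parameters while the model has 0.69B: the tied embedding and `lm_head` matrix is stored under two names on disk. The pure-Python sketch below illustrates that effect on a hypothetical toy state dict. The `Weight` class, tensor names and sizes are illustrative assumptions, not Surya's real weights. Counting one entry per name includes the tied matrix twice, while deduplicating by object identity counts it once, so the gap equals `vocab * dim`.

```python
from dataclasses import dataclass
from math import prod


@dataclass(eq=False)  # eq=False keeps identity-based hashing, like a real tensor object
class Weight:
    shape: tuple[int, ...]

    def numel(self) -> int:
        return prod(self.shape)


def build_tied_state(vocab: int, dim: int, hidden: int) -> dict[str, Weight]:
    """Hypothetical toy state dict in which lm_head reuses the embedding matrix."""
    embed = Weight((vocab, dim))
    return {
        "embed.weight": embed,
        "block.mlp.weight": Weight((hidden, dim)),
        "lm_head.weight": embed,  # tied: the same object stored under a second name
    }


def per_name_count(state: dict[str, Weight]) -> int:
    # Counts every named entry, so a tied matrix is counted once per name.
    return sum(w.numel() for w in state.values())


def unique_count(state: dict[str, Weight]) -> int:
    # Deduplicates by object identity, so a tied matrix is counted only once.
    unique = {id(w): w for w in state.values()}
    return sum(w.numel() for w in unique.values())


state = build_tied_state(vocab=1000, dim=64, hidden=256)
print(per_name_count(state))  # 144384
print(unique_count(state))    # 80384
```

In PyTorch, the same distinction shows up between `Module.parameters()`, which skips shared parameters by default, and `Module.state_dict()`, which lists a tied weight under each of its names.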