Add streamlit app

2026-06-04 21:03:53 +08:00 · 2024-02-09 12:44:01 -08:00 · 272619af3e
commit 272619af3e
parent 9d3e9063e0
39 changed files with 561 additions and 88 deletions
--- a/.gitignore
+++ b/.gitignore
@ -8,6 +8,7 @@ wandb
 notebooks
 results
 data
+slices

 # Byte-compiled / optimized / DLL files
 __pycache__/
--- a/README.md
+++ b/README.md
@ -25,14 +25,16 @@ Surya is named for the [Hindu sun god](https://en.wikipedia.org/wiki/Surya), who

 | Name             |           Text Detection            |                                      OCR |
 |------------------|:-----------------------------------:|-----------------------------------------:|
+| Japanese         | [Image](static/images/japanese.jpg) | [Image](static/images/japanese_text.jpg) |
+| Chinese          | [Image](static/images/chinese.jpg)  |  [Image](static/images/chinese_text.jpg) |
+| Hindi            |  [Image](static/images/hindi.jpg)   |    [Image](static/images/hindi_text.jpg) |
+| Arabic           |  [Image](static/images/arabic.jpg)  |   [Image](static/images/arabic_text.jpg) |
+| Presentation     |   [Image](static/images/pres.png)   |     [Image](static/images/pres_text.jpg) |
+| Scientific Paper |  [Image](static/images/paper.jpg)   |    [Image](static/images/paper_text.jpg) |
+| Scanned Document | [Image](static/images/scanned.png)  |  [Image](static/images/scanned_text.jpg) |
 | New York Times   |   [Image](static/images/nyt.png)    |      [Image](static/images/nyt_text.png) |
-| Japanese         | [Image](static/images/japanese.png) | [Image](static/images/japanese_text.png) |
-| Chinese          | [Image](static/images/chinese.png)  |  [Image](static/images/chinese_text.png) |
-| Hindi            |  [Image](static/images/hindi.png)   |    [Image](static/images/hindi_text.png) |
-| Presentation     |   [Image](static/images/pres.png)   |     [Image](static/images/pres_text.png) |
-| Scientific Paper |  [Image](static/images/paper.png)   |    [Image](static/images/paper_text.png) |
-| Scanned Document | [Image](static/images/scanned.png)  |  [Image](static/images/scanned_text.png) |
-| Scanned Form     |  [Image](static/images/funsd.png)   |                                          |
+| Scanned Form     |  [Image](static/images/funsd.png)   |    [Image](static/images/funsd_text.jpg) |
+| Textbook         | [Image](static/images/textbook.jpg) | [Image](static/images/textbook_text.jpg) |

 # Installation

@ -51,6 +53,15 @@ Model weights will automatically download the first time you run surya.  Note th
 - Inspect the settings in `surya/settings.py`.  You can override any settings with environment variables.
 - Your torch device will be automatically detected, but you can override this.  For example, `TORCH_DEVICE=cuda`. For text detection, the `mps` device has a bug (on the [Apple side](https://github.com/pytorch/pytorch/issues/84936)) that may prevent it from working properly.

+## Interactive App
+
+I've included a Streamlit app that lets you interactively try Surya on images or PDF files.  Run it with:
+
+```
+pip install streamlit
+surya_gui
+```
+
 ## OCR (text recognition)

 You can detect text in an image, pdf, or folder of images/pdfs with the following command.  This will write out a json file with the detected text and bboxes, and optionally save images of the reconstructed page.
@ -78,10 +89,7 @@ The `results.json` file will contain these keys for each page of the input docum

 **Performance tips**

-Setting the `RECOGNITION_BATCH_SIZE` env var properly will make a big difference when using a GPU.  Each batch item will use `40MB` of VRAM, so very high batch sizes are possible.  The default is a batch size `256`, which will use about 10GB of VRAM.
-
-Depending on your CPU core count, `RECOGNITION_BATCH_SIZE` might make a difference there too - the default CPU batch size is `32`.
-
+Setting the `RECOGNITION_BATCH_SIZE` env var properly will make a big difference when using a GPU.  Each batch item will use `40MB` of VRAM, so very high batch sizes are possible.  The default batch size is `256`, which will use about 10GB of VRAM.  Depending on your CPU core count, tuning it may help on CPU too - the default CPU batch size is `32`.

 ### From python

@ -94,20 +102,15 @@ from surya.model.recognition.processor import load_processor as load_rec_process

 image = Image.open(IMAGE_PATH)
 langs = ["en"] # Replace with your languages
-
-det_processor = load_det_processor()
-det_model = load_det_model()
-
-rec_model = load_rec_model()
-rec_processor = load_rec_processor()
+det_processor, det_model = load_det_processor(), load_det_model()
+rec_model, rec_processor = load_rec_model(), load_rec_processor()

 predictions = run_ocr([image], langs, det_model, det_processor, rec_model, rec_processor)
 ```

-
 ## Text line detection

-You can detect text lines in an image, pdf, or folder of images/pdfs with the following command.  This will write out a json file with the detected bboxes, and optionally save images of the pages with the bboxes.
+You can detect text lines in an image, pdf, or folder of images/pdfs with the following command.  This will write out a json file with the detected bboxes.

 ```
 surya_detect DATA_PATH --images
@ -128,12 +131,7 @@ The `results.json` file will contain these keys for each page of the input docum

 **Performance tips**

-Setting the `DETECTOR_BATCH_SIZE` env var properly will make a big difference when using a GPU.  Each batch item will use `280MB` of VRAM, so very high batch sizes are possible.  The default is a batch size `32`, which will use about 9GB of VRAM.
-
-Depending on your CPU core count, `DETECTOR_BATCH_SIZE` might make a difference there too - the default CPU batch size is `2`.
-
-You can adjust `DETECTOR_NMS_THRESHOLD` and `DETECTOR_TEXT_THRESHOLD` if you don't get good results.  Try lowering them to detect more text, and vice versa.
-
+Setting the `DETECTOR_BATCH_SIZE` env var properly will make a big difference when using a GPU.  Each batch item will use `280MB` of VRAM, so very high batch sizes are possible.  The default batch size is `32`, which will use about 9GB of VRAM.  Depending on your CPU core count, tuning it may help on CPU too - the default CPU batch size is `2`.

 ### From python

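The batch-size arithmetic in the two performance tips above can be checked with a short sketch. The per-item figures (`40MB` and `280MB`) come from the README text; the helper names `estimated_vram_gb` and `max_batch_size` are hypothetical and not part of surya. The estimate covers batch items only, so model weights and framework overhead would come on top.

```python
# Per-item VRAM figures quoted in the README text (in MB).
RECOGNITION_MB_PER_ITEM = 40
DETECTOR_MB_PER_ITEM = 280


def estimated_vram_gb(batch_size: int, mb_per_item: int) -> float:
    """Rough VRAM estimate in GB for the batch items alone."""
    return batch_size * mb_per_item / 1024


def max_batch_size(vram_budget_gb: float, mb_per_item: int) -> int:
    """Largest batch size whose estimate fits in the budget (hypothetical helper)."""
    return max(1, int(vram_budget_gb * 1024 // mb_per_item))


print(estimated_vram_gb(256, RECOGNITION_MB_PER_ITEM))  # 10.0
print(estimated_vram_gb(32, DETECTOR_MB_PER_ITEM))      # 8.75
print(max_batch_size(8, DETECTOR_MB_PER_ITEM))          # 29
```

The results line up with the "about 10GB" and "about 9GB" figures quoted for the default batch sizes.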
@ -149,9 +147,20 @@ model, processor = load_model(), load_processor()
 predictions = batch_detection([image], model, processor)
 ```

-## Table and chart detection
+# Limitations
+
+- This is specialized for document OCR.  It will likely not work on photos or other images.
+- It is for printed text, not handwriting (though it may work on some handwriting).
+- The model has trained itself to ignore advertisements.
+- You can find language support for OCR in `surya/languages.py`.  Text detection should work with any language.
+
+## Troubleshooting
+
+If OCR isn't working properly:
+
+- If the lines aren't detected properly, try increasing the resolution of the image if the width is below `896px`, and decreasing it if the width is much larger.  Very wide images don't work well with the detector.
+- You can adjust `DETECTOR_BLANK_THRESHOLD` and `DETECTOR_TEXT_THRESHOLD` if you don't get good results.  `DETECTOR_BLANK_THRESHOLD` controls the space between lines - any prediction below this number will be considered blank space.  `DETECTOR_TEXT_THRESHOLD` controls how text is joined - any number above this is considered text.  `DETECTOR_TEXT_THRESHOLD` should always be higher than `DETECTOR_BLANK_THRESHOLD`, and both should be in the 0-1 range.  Looking at the heatmap from the debug output of the detector can tell you how to adjust these (if you see faint things that look like boxes, lower the thresholds, and if you see bboxes being joined together, raise the thresholds).

-Coming soon.

 # Manual install

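The constraints on the two detector thresholds described in the Troubleshooting text above (both in the 0-1 range, text threshold higher than blank threshold) could be checked before they are exported as environment variables. This is a minimal sketch: `detector_threshold_env` is a hypothetical helper, the example values are placeholders rather than surya's defaults, and it assumes the variables are set before surya's settings are loaded.

```python
import os


def detector_threshold_env(blank: float, text: float) -> dict:
    """Validate the two thresholds and return them as env-var strings."""
    for name, value in (
        ("DETECTOR_BLANK_THRESHOLD", blank),
        ("DETECTOR_TEXT_THRESHOLD", text),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in the 0-1 range, got {value}")
    if text <= blank:
        raise ValueError(
            "DETECTOR_TEXT_THRESHOLD should be higher than DETECTOR_BLANK_THRESHOLD"
        )
    return {
        "DETECTOR_BLANK_THRESHOLD": str(blank),
        "DETECTOR_TEXT_THRESHOLD": str(text),
    }


# Placeholder values for illustration only; set these before importing surya.
os.environ.update(detector_threshold_env(blank=0.3, text=0.6))
```

Lowering both values should pick up fainter text, and raising them should separate boxes that are being joined, as the README text describes.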
@ -162,13 +171,6 @@ If you want to develop surya, you can install it manually:
 - `poetry install` - installs main and dev dependencies
 - `poetry shell` - activates the virtual environment

-# Limitations
-
- This is specialized for document OCR.  It will likely not work on photos or other images.
- It is for printed text, not handwriting (though it may work on some handwriting).
- The model has trained itself to ignore advertisements.
- You can find language support for OCR in `surya/languages.py`.  Text detection should work with any language.
-
 # Benchmarks

 ## OCR
--- a/demo_app.py
+++ b/demo_app.py
@ -1,38 +0,0 @@
-import gradio as gr
-from surya.detection import batch_detection
-from surya.model.detection.segformer import load_model, load_processor
-from surya.postprocessing.heatmap import draw_polys_on_image
-
-model, processor = load_model(), load_processor()
-
-HEADER = """
-# Surya OCR Demo
-
-This demo will let you try surya, a multilingual OCR model.  It supports text detection now, but will support text recognition in the future.
-
-Notes:
- This works best on documents with printed text.
- Set DETECTOR_MODEL_CHECKPOINT=vikp/line_detector_math before running this app if you want better math detection.
-
-Learn more [here](https://github.com/VikParuchuri/surya).
-""".strip()
-
-def text_detection(img):
-    preds = batch_detection([img], model, processor)[0]
-    img = draw_polys_on_image(preds["polygons"], img)
-    return img, preds
-
-
-with gr.Blocks() as app:
-    gr.Markdown(HEADER)
-    with gr.Row():
-        input_image = gr.Image(label="Input Image", type="pil")
-        output_image = gr.Image(label="Output Image", type="pil", interactive=False)
-    text_detection_btn = gr.Button("Run Text Detection")
-
-    json_output = gr.JSON(label="JSON Output")
-    text_detection_btn.click(fn=text_detection, inputs=input_image, outputs=[output_image, json_output], api_name="text_detection")
-
-
-if __name__ == "__main__":
-    app.launch()
--- a/ocr_app.py
+++ b/ocr_app.py
@ -0,0 +1,123 @@
+import io
+
+import pypdfium2
+import streamlit as st
+from surya.detection import batch_detection
+from surya.model.detection.segformer import load_model, load_processor
+from surya.model.recognition.model import load_model as load_rec_model
+from surya.model.recognition.processor import load_processor as load_rec_processor
+from surya.postprocessing.heatmap import draw_polys_on_image
+from surya.ocr import run_ocr
+from surya.postprocessing.text import draw_text_on_image
+from PIL import Image
+from surya.languages import CODE_TO_LANGUAGE
+from surya.input.langs import replace_lang_with_code
+
+
+@st.cache_resource()
+def load_det_cached():
+    return load_model(), load_processor()
+
+
+@st.cache_resource()
+def load_rec_cached():
+    return load_rec_model(), load_rec_processor()
+
+
+def text_detection(img):
+    preds = batch_detection([img], det_model, det_processor)[0]
+    det_img = draw_polys_on_image(preds["polygons"], img.copy())
+    return det_img, preds
+
+
+# Function for OCR
+def ocr(img, langs):
+    replace_lang_with_code(langs)
+    pred = run_ocr([img], [langs], det_model, det_processor, rec_model, rec_processor)[0]
+    rec_img = draw_text_on_image(pred["bboxes"], pred["text_lines"], img.size)
+    return rec_img, pred
+
+
+def open_pdf(pdf_file):
+    stream = io.BytesIO(pdf_file.getvalue())
+    return pypdfium2.PdfDocument(stream)
+
+
+@st.cache_data()
+def get_page_image(pdf_file, page_num, dpi=96):
+    doc = open_pdf(pdf_file)
+    renderer = doc.render(
+        pypdfium2.PdfBitmap.to_pil,
+        page_indices=[page_num - 1],
+        scale=dpi / 72,
+    )
+    png = list(renderer)[0]
+    png_image = png.convert("RGB")
+    return png_image
+
+
+@st.cache_data()
+def page_count(pdf_file):
+    doc = open_pdf(pdf_file)
+    return len(doc)
+
+
+st.set_page_config(layout="wide")
+col1, col2 = st.columns([.5, .5])
+
+det_model, det_processor = load_det_cached()
+rec_model, rec_processor = load_rec_cached()
+
+
+st.markdown("""
+# Surya OCR Demo
+
+This app will let you try surya, a multilingual OCR model. It supports text detection in any language, and text recognition in 90+ languages.
+
+Notes:
+- This works best on documents with printed text.
+- Try to keep the image width around 1024, especially if you have large text.
+- This supports 90+ languages, see [here](https://github.com/VikParuchuri/surya/tree/master/surya/languages.py) for a full list of codes.
+
+Find the project [here](https://github.com/VikParuchuri/surya).
+""")
+
+in_file = st.sidebar.file_uploader("PDF file or image:", type=["pdf", "png", "jpg", "jpeg", "gif", "webp"])
+languages = st.sidebar.multiselect("Languages", sorted(list(CODE_TO_LANGUAGE.values())), default=["English"], max_selections=4)
+
+if in_file is None:
+    st.stop()
+
+filetype = in_file.type
+whole_image = False
+if "pdf" in filetype:
+    num_pages = page_count(in_file)
+    page_number = st.sidebar.number_input(f"Page number out of {num_pages}:", min_value=1, value=1, max_value=num_pages)
+
+    pil_image = get_page_image(in_file, page_number)
+else:
+    pil_image = Image.open(in_file).convert("RGB")
+
+text_det = st.sidebar.button("Run Text Detection")
+text_rec = st.sidebar.button("Run OCR")
+
+# Run Text Detection
+if text_det and pil_image is not None:
+    det_img, preds = text_detection(pil_image)
+    with col1:
+        st.image(det_img, caption="Detected Text", use_column_width=True)
+        st.json(preds, expanded=True)
+
+# Run OCR
+if text_rec and pil_image is not None:
+    rec_img, pred = ocr(pil_image, languages)
+    with col1:
+        st.image(rec_img, caption="OCR Result", use_column_width=True)
+        json_tab, text_tab = st.tabs(["JSON", "Full Text"])
+        with json_tab:
+            st.json(pred, expanded=True)
+        with text_tab:
+            st.text("\n".join(pred["text_lines"]))
+
+with col2:
+    st.image(pil_image, caption="Uploaded Image", use_column_width=True)
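`get_page_image` in the `ocr_app.py` diff above turns a 1-based page number into a 0-based index and renders with `scale=dpi / 72`, since PDF page sizes are measured in points (72 per inch). The sketch below isolates that arithmetic without needing pypdfium2; `render_params` and `rendered_size` are hypothetical helpers, and the computed pixel size is an approximation of what a renderer would produce.

```python
POINTS_PER_INCH = 72  # PDF page dimensions are expressed in points


def render_params(page_num: int, dpi: int = 96) -> tuple[int, float]:
    """Convert a 1-based page number and target DPI into (page_index, scale)."""
    if page_num < 1:
        raise ValueError("page_num is 1-based")
    return page_num - 1, dpi / POINTS_PER_INCH


def rendered_size(width_pt: float, height_pt: float, dpi: int = 96) -> tuple[int, int]:
    """Approximate pixel size of a page rendered at the given DPI."""
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)


page_index, scale = render_params(1)
print(page_index, scale)
print(rendered_size(612, 792))  # US Letter at 96 DPI: (816, 1056)
```

At the default of 96 DPI a US Letter page comes out 816 pixels wide, which is in the neighborhood of the width the app's notes recommend.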
--- a/poetry.lock
+++ b/poetry.lock
@ -110,6 +110,30 @@ files = [
 [package.dependencies]
 frozenlist = ">=1.1.0"

+[[package]]
+name = "altair"
+version = "5.2.0"
+description = "Vega-Altair: A declarative statistical visualization library for Python."
+optional = false
+python-versions = ">=3.8"
+files = [
+    {file = "altair-5.2.0-py3-none-any.whl", hash = "sha256:8c4888ad11db7c39f3f17aa7f4ea985775da389d79ac30a6c22856ab238df399"},
+    {file = "altair-5.2.0.tar.gz", hash = "sha256:2ad7f0c8010ebbc46319cc30febfb8e59ccf84969a201541c207bc3a4fa6cf81"},
+]
+
+[package.dependencies]
+jinja2 = "*"
+jsonschema = ">=3.0"
+numpy = "*"
+packaging = "*"
+pandas = ">=0.25"
+toolz = "*"
+typing-extensions = {version = ">=4.0.1", markers = "python_version < \"3.11\""}
+
+[package.extras]
+dev = ["anywidget", "geopandas", "hatch", "ipython", "m2r", "mypy", "pandas-stubs", "pyarrow (>=11)", "pytest", "pytest-cov", "ruff (>=0.1.3)", "types-jsonschema", "types-setuptools", "vega-datasets", "vegafusion[embed] (>=1.4.0)", "vl-convert-python (>=1.1.0)"]
+doc = ["docutils", "jinja2", "myst-parser", "numpydoc", "pillow (>=9,<10)", "pydata-sphinx-theme (>=0.14.1)", "scipy", "sphinx", "sphinx-copybutton", "sphinx-design", "sphinxext-altair"]
+
 [[package]]
 name = "annotated-types"
 version = "0.6.0"
@ -359,6 +383,28 @@ webencodings = "*"
 [package.extras]
 css = ["tinycss2 (>=1.1.0,<1.3)"]

+[[package]]
+name = "blinker"
+version = "1.7.0"
+description = "Fast, simple object-to-object and broadcast signaling"
+optional = false
+python-versions = ">=3.8"
+files = [
+    {file = "blinker-1.7.0-py3-none-any.whl", hash = "sha256:c3f865d4d54db7abc53758a01601cf343fe55b84c1de4e3fa910e420b438d5b9"},
+    {file = "blinker-1.7.0.tar.gz", hash = "sha256:e6820ff6fa4e4d1d8e2747c2283749c3f547e4fee112b98555cdcdae32996182"},
+]
+
+[[package]]
+name = "cachetools"
+version = "5.3.2"
+description = "Extensible memoizing collections and decorators"
+optional = false
+python-versions = ">=3.7"
+files = [
+    {file = "cachetools-5.3.2-py3-none-any.whl", hash = "sha256:861f35a13a451f94e301ce2bec7cac63e881232ccce7ed67fab9b5df4d3beaa1"},
+    {file = "cachetools-5.3.2.tar.gz", hash = "sha256:086ee420196f7b2ab9ca2db2520aca326318b68fe5ba8bc4d49cca91add450f2"},
+]
+
 [[package]]
 name = "certifi"
 version = "2024.2.2"
@ -533,6 +579,20 @@ files = [
    {file = "charset_normalizer-3.3.2-py3-none-any.whl", hash = "sha256:3e4d1f6587322d2788836a99c69062fbb091331ec940e02d12d179c1d53e25fc"},
 ]

+[[package]]
+name = "click"
+version = "8.1.7"
+description = "Composable command line interface toolkit"
+optional = false
+python-versions = ">=3.7"
+files = [
+    {file = "click-8.1.7-py3-none-any.whl", hash = "sha256:ae74fb96c20a0277a1d615f1e4d73c8414f5a98db8b799a7931d1582f3390c28"},
+    {file = "click-8.1.7.tar.gz", hash = "sha256:ca9853ad459e787e2192211578cc907e7594e294c7ccc834310722b41b9ca6de"},
+]
+
+[package.dependencies]
+colorama = {version = "*", markers = "platform_system == \"Windows\""}
+
 [[package]]
 name = "colorama"
 version = "0.4.6"
@ -873,6 +933,37 @@ smb = ["smbprotocol"]
 ssh = ["paramiko"]
 tqdm = ["tqdm"]

+[[package]]
+name = "gitdb"
+version = "4.0.11"
+description = "Git Object Database"
+optional = false
+python-versions = ">=3.7"
+files = [
+    {file = "gitdb-4.0.11-py3-none-any.whl", hash = "sha256:81a3407ddd2ee8df444cbacea00e2d038e40150acfa3001696fe0dcf1d3adfa4"},
+    {file = "gitdb-4.0.11.tar.gz", hash = "sha256:bf5421126136d6d0af55bc1e7c1af1c397a34f5b7bd79e776cd3e89785c2b04b"},
+]
+
+[package.dependencies]
+smmap = ">=3.0.1,<6"
+
+[[package]]
+name = "gitpython"
+version = "3.1.41"
+description = "GitPython is a Python library used to interact with Git repositories"
+optional = false
+python-versions = ">=3.7"
+files = [
+    {file = "GitPython-3.1.41-py3-none-any.whl", hash = "sha256:c36b6634d069b3f719610175020a9aed919421c87552185b085e04fbbdb10b7c"},
+    {file = "GitPython-3.1.41.tar.gz", hash = "sha256:ed66e624884f76df22c8e16066d567aaa5a37d5b5fa19db2c6df6f7156db9048"},
+]
+
+[package.dependencies]
+gitdb = ">=4.0.1,<5"
+
+[package.extras]
+test = ["black", "coverage[toml]", "ddt (>=1.1.1,!=1.4.3)", "mock", "mypy", "pre-commit", "pytest (>=7.3.1)", "pytest-cov", "pytest-instafail", "pytest-mock", "pytest-sugar", "sumtypes"]
+
 [[package]]
 name = "huggingface-hub"
 version = "0.20.3"
@ -1406,6 +1497,30 @@ files = [
    {file = "jupyterlab_widgets-3.0.9.tar.gz", hash = "sha256:6005a4e974c7beee84060fdfba341a3218495046de8ae3ec64888e5fe19fdb4c"},
 ]

+[[package]]
+name = "markdown-it-py"
+version = "3.0.0"
+description = "Python port of markdown-it. Markdown parsing, done right!"
+optional = false
+python-versions = ">=3.8"
+files = [
+    {file = "markdown-it-py-3.0.0.tar.gz", hash = "sha256:e3f60a94fa066dc52ec76661e37c851cb232d92f9886b15cb560aaada2df8feb"},
+    {file = "markdown_it_py-3.0.0-py3-none-any.whl", hash = "sha256:355216845c60bd96232cd8d8c40e8f9765cc86f46880e43a8fd22dc1a1a8cab1"},
+]
+
+[package.dependencies]
+mdurl = ">=0.1,<1.0"
+
+[package.extras]
+benchmarking = ["psutil", "pytest", "pytest-benchmark"]
+code-style = ["pre-commit (>=3.0,<4.0)"]
+compare = ["commonmark (>=0.9,<1.0)", "markdown (>=3.4,<4.0)", "mistletoe (>=1.0,<2.0)", "mistune (>=2.0,<3.0)", "panflute (>=2.3,<3.0)"]
+linkify = ["linkify-it-py (>=1,<3)"]
+plugins = ["mdit-py-plugins"]
+profiling = ["gprof2dot"]
+rtd = ["jupyter_sphinx", "mdit-py-plugins", "myst-parser", "pyyaml", "sphinx", "sphinx-copybutton", "sphinx-design", "sphinx_book_theme"]
+testing = ["coverage", "pytest", "pytest-cov", "pytest-regressions"]
+
 [[package]]
 name = "markupsafe"
 version = "2.1.5"
@ -1489,6 +1604,17 @@ files = [
 [package.dependencies]
 traitlets = "*"

+[[package]]
+name = "mdurl"
+version = "0.1.2"
+description = "Markdown URL utilities"
+optional = false
+python-versions = ">=3.7"
+files = [
+    {file = "mdurl-0.1.2-py3-none-any.whl", hash = "sha256:84008a41e51615a49fc9966191ff91509e3c40b939176e643fd50a5c2196b8f8"},
+    {file = "mdurl-0.1.2.tar.gz", hash = "sha256:bb413d29f5eea38f31dd4754dd7377d4465116fb207585f97bf925588687c1ba"},
+]
+
 [[package]]
 name = "mistune"
 version = "3.0.2"
@ -2270,6 +2396,26 @@ files = [
 [package.dependencies]
 wcwidth = "*"

+[[package]]
+name = "protobuf"
+version = "4.25.2"
+description = ""
+optional = false
+python-versions = ">=3.8"
+files = [
+    {file = "protobuf-4.25.2-cp310-abi3-win32.whl", hash = "sha256:b50c949608682b12efb0b2717f53256f03636af5f60ac0c1d900df6213910fd6"},
+    {file = "protobuf-4.25.2-cp310-abi3-win_amd64.whl", hash = "sha256:8f62574857ee1de9f770baf04dde4165e30b15ad97ba03ceac65f760ff018ac9"},
+    {file = "protobuf-4.25.2-cp37-abi3-macosx_10_9_universal2.whl", hash = "sha256:2db9f8fa64fbdcdc93767d3cf81e0f2aef176284071507e3ede160811502fd3d"},
+    {file = "protobuf-4.25.2-cp37-abi3-manylinux2014_aarch64.whl", hash = "sha256:10894a2885b7175d3984f2be8d9850712c57d5e7587a2410720af8be56cdaf62"},
+    {file = "protobuf-4.25.2-cp37-abi3-manylinux2014_x86_64.whl", hash = "sha256:fc381d1dd0516343f1440019cedf08a7405f791cd49eef4ae1ea06520bc1c020"},
+    {file = "protobuf-4.25.2-cp38-cp38-win32.whl", hash = "sha256:33a1aeef4b1927431d1be780e87b641e322b88d654203a9e9d93f218ee359e61"},
+    {file = "protobuf-4.25.2-cp38-cp38-win_amd64.whl", hash = "sha256:47f3de503fe7c1245f6f03bea7e8d3ec11c6c4a2ea9ef910e3221c8a15516d62"},
+    {file = "protobuf-4.25.2-cp39-cp39-win32.whl", hash = "sha256:5e5c933b4c30a988b52e0b7c02641760a5ba046edc5e43d3b94a74c9fc57c1b3"},
+    {file = "protobuf-4.25.2-cp39-cp39-win_amd64.whl", hash = "sha256:d66a769b8d687df9024f2985d5137a337f957a0916cf5464d1513eee96a63ff0"},
+    {file = "protobuf-4.25.2-py3-none-any.whl", hash = "sha256:a8b7a98d4ce823303145bf3c1a8bdb0f2f4642a414b196f04ad9853ed0c8f830"},
+    {file = "protobuf-4.25.2.tar.gz", hash = "sha256:fe599e175cb347efc8ee524bcd4b902d11f7262c0e569ececcb89995c15f0a5e"},
+]
+
 [[package]]
 name = "psutil"
 version = "5.9.8"
@ -2518,6 +2664,25 @@ files = [
 pydantic = ">=2.3.0"
 python-dotenv = ">=0.21.0"

+[[package]]
+name = "pydeck"
+version = "0.8.0"
+description = "Widget for deck.gl maps"
+optional = false
+python-versions = ">=3.7"
+files = [
+    {file = "pydeck-0.8.0-py2.py3-none-any.whl", hash = "sha256:a8fa7757c6f24bba033af39db3147cb020eef44012ba7e60d954de187f9ed4d5"},
+    {file = "pydeck-0.8.0.tar.gz", hash = "sha256:07edde833f7cfcef6749124351195aa7dcd24663d4909fd7898dbd0b6fbc01ec"},
+]
+
+[package.dependencies]
+jinja2 = ">=2.10.1"
+numpy = ">=1.16.4"
+
+[package.extras]
+carto = ["pydeck-carto"]
+jupyter = ["ipykernel (>=5.1.2)", "ipython (>=5.8.0)", "ipywidgets (>=7,<8)", "traitlets (>=4.3.2)"]
+
 [[package]]
 name = "pygments"
 version = "2.17.2"
@ -3177,6 +3342,24 @@ files = [
    {file = "rfc3986_validator-0.1.1.tar.gz", hash = "sha256:3d44bde7921b3b9ec3ae4e3adca370438eccebc676456449b145d533b240d055"},
 ]

+[[package]]
+name = "rich"
+version = "13.7.0"
+description = "Render rich text, tables, progress bars, syntax highlighting, markdown and more to the terminal"
+optional = false
+python-versions = ">=3.7.0"
+files = [
+    {file = "rich-13.7.0-py3-none-any.whl", hash = "sha256:6da14c108c4866ee9520bbffa71f6fe3962e193b7da68720583850cd4548e235"},
+    {file = "rich-13.7.0.tar.gz", hash = "sha256:5cb5123b5cf9ee70584244246816e9114227e0b98ad9176eede6ad54bf5403fa"},
+]
+
+[package.dependencies]
+markdown-it-py = ">=2.2.0"
+pygments = ">=2.13.0,<3.0.0"
+
+[package.extras]
+jupyter = ["ipywidgets (>=7.5.1,<9)"]
+
 [[package]]
 name = "rpds-py"
 version = "0.17.1"
@ -3444,6 +3627,17 @@ files = [
    {file = "six-1.16.0.tar.gz", hash = "sha256:1e61c37477a1626458e36f7b1d82aa5c9b094fa4802892072e49de9c60c4c926"},
 ]

+[[package]]
+name = "smmap"
+version = "5.0.1"
+description = "A pure Python implementation of a sliding window memory map manager"
+optional = false
+python-versions = ">=3.7"
+files = [
+    {file = "smmap-5.0.1-py3-none-any.whl", hash = "sha256:e6d8668fa5f93e706934a62d7b4db19c8d9eb8cf2adbb75ef1b675aa332b69da"},
+    {file = "smmap-5.0.1.tar.gz", hash = "sha256:dceeb6c0028fdb6734471eb07c0cd2aae706ccaecab45965ee83f11c8d3b1f62"},
+]
+
 [[package]]
 name = "snakeviz"
 version = "2.2.0"
@ -3499,6 +3693,45 @@ pure-eval = "*"
 [package.extras]
 tests = ["cython", "littleutils", "pygments", "pytest", "typeguard"]

+[[package]]
+name = "streamlit"
+version = "1.31.0"
+description = "A faster way to build and share data apps"
+optional = false
+python-versions = ">=3.8, !=3.9.7"
+files = [
+    {file = "streamlit-1.31.0-py2.py3-none-any.whl", hash = "sha256:4d95c4f5d6881f7adebaec14997fa7024bb38853412d1bba9588074d585563f9"},
+    {file = "streamlit-1.31.0.tar.gz", hash = "sha256:40d71944e30394612481f80a8bc09e7de40d33b7a472989807467a5299e342ca"},
+]
+
+[package.dependencies]
+altair = ">=4.0,<6"
+blinker = ">=1.0.0,<2"
+cachetools = ">=4.0,<6"
+click = ">=7.0,<9"
+gitpython = ">=3.0.7,<3.1.19 || >3.1.19,<4"
+importlib-metadata = ">=1.4,<8"
+numpy = ">=1.19.3,<2"
+packaging = ">=16.8,<24"
+pandas = ">=1.3.0,<3"
+pillow = ">=7.1.0,<11"
+protobuf = ">=3.20,<5"
+pyarrow = ">=7.0"
+pydeck = ">=0.8.0b4,<1"
+python-dateutil = ">=2.7.3,<3"
+requests = ">=2.27,<3"
+rich = ">=10.14.0,<14"
+tenacity = ">=8.1.0,<9"
+toml = ">=0.10.1,<2"
+tornado = ">=6.0.3,<7"
+typing-extensions = ">=4.3.0,<5"
+tzlocal = ">=1.1,<6"
+validators = ">=0.2,<1"
+watchdog = {version = ">=2.1.5", markers = "platform_system != \"Darwin\""}
+
+[package.extras]
+snowflake = ["snowflake-connector-python (>=2.8.0)", "snowflake-snowpark-python (>=0.9.0)"]
+
 [[package]]
 name = "sympy"
 version = "1.12"
@ -3527,6 +3760,20 @@ files = [
 [package.extras]
 widechars = ["wcwidth"]

+[[package]]
+name = "tenacity"
+version = "8.2.3"
+description = "Retry code until it succeeds"
+optional = false
+python-versions = ">=3.7"
+files = [
+    {file = "tenacity-8.2.3-py3-none-any.whl", hash = "sha256:ce510e327a630c9e1beaf17d42e6ffacc88185044ad85cf74c0a8887c6a0f88c"},
+    {file = "tenacity-8.2.3.tar.gz", hash = "sha256:5398ef0d78e63f40007c1fb4c0bff96e1911394d2fa8d194f77619c05ff6cc8a"},
+]
+
+[package.extras]
+doc = ["reno", "sphinx", "tornado (>=4.5)"]
+
 [[package]]
 name = "terminado"
 version = "0.18.0"
@ -3693,6 +3940,17 @@ dev = ["tokenizers[testing]"]
 docs = ["setuptools_rust", "sphinx", "sphinx_rtd_theme"]
 testing = ["black (==22.3)", "datasets", "numpy", "pytest", "requests"]

+[[package]]
+name = "toml"
+version = "0.10.2"
+description = "Python Library for Tom's Obvious, Minimal Language"
+optional = false
+python-versions = ">=2.6, !=3.0.*, !=3.1.*, !=3.2.*"
+files = [
+    {file = "toml-0.10.2-py2.py3-none-any.whl", hash = "sha256:806143ae5bfb6a3c6e736a764057db0e6a0e05e338b5630894a5f779cabb4f9b"},
+    {file = "toml-0.10.2.tar.gz", hash = "sha256:b3bda1d108d5dd99f4a20d24d9c348e91c4db7ab1b749200bded2f839ccbe68f"},
+]
+
 [[package]]
 name = "tomli"
 version = "2.0.1"
@ -3704,6 +3962,17 @@ files = [
    {file = "tomli-2.0.1.tar.gz", hash = "sha256:de526c12914f0c550d15924c62d72abc48d6fe7364aa87328337a31007fe8a4f"},
 ]

+[[package]]
+name = "toolz"
+version = "0.12.1"
+description = "List processing tools and functional utilities"
+optional = false
+python-versions = ">=3.7"
+files = [
+    {file = "toolz-0.12.1-py3-none-any.whl", hash = "sha256:d22731364c07d72eea0a0ad45bafb2c2937ab6fd38a3507bf55eae8744aa7d85"},
+    {file = "toolz-0.12.1.tar.gz", hash = "sha256:ecca342664893f177a13dac0e6b41cbd8ac25a358e5f215316d43e2100224f4d"},
+]
+
 [[package]]
 name = "torch"
 version = "2.2.0"
@ -3941,6 +4210,23 @@ files = [
    {file = "tzdata-2023.4.tar.gz", hash = "sha256:dd54c94f294765522c77399649b4fefd95522479a664a0cec87f41bebc6148c9"},
 ]

+[[package]]
+name = "tzlocal"
+version = "5.2"
+description = "tzinfo object for the local timezone"
+optional = false
+python-versions = ">=3.8"
+files = [
+    {file = "tzlocal-5.2-py3-none-any.whl", hash = "sha256:49816ef2fe65ea8ac19d19aa7a1ae0551c834303d5014c6d5a62e4cbda8047b8"},
+    {file = "tzlocal-5.2.tar.gz", hash = "sha256:8d399205578f1a9342816409cc1e46a93ebd5755e39ea2d85334bea911bf0e6e"},
+]
+
+[package.dependencies]
+tzdata = {version = "*", markers = "platform_system == \"Windows\""}
+
+[package.extras]
+devenv = ["check-manifest", "pytest (>=4.3)", "pytest-cov", "pytest-mock (>=3.3)", "zest.releaser"]
+
 [[package]]
 name = "uri-template"
 version = "1.3.0"
@ -3972,6 +4258,69 @@ h2 = ["h2 (>=4,<5)"]
 socks = ["pysocks (>=1.5.6,!=1.5.7,<2.0)"]
 zstd = ["zstandard (>=0.18.0)"]

+[[package]]
+name = "validators"
+version = "0.22.0"
+description = "Python Data Validation for Humans™"
+optional = false
+python-versions = ">=3.8"
+files = [
+    {file = "validators-0.22.0-py3-none-any.whl", hash = "sha256:61cf7d4a62bbae559f2e54aed3b000cea9ff3e2fdbe463f51179b92c58c9585a"},
+    {file = "validators-0.22.0.tar.gz", hash = "sha256:77b2689b172eeeb600d9605ab86194641670cdb73b60afd577142a9397873370"},
+]
+
+[package.extras]
+docs-offline = ["myst-parser (>=2.0.0)", "pypandoc-binary (>=1.11)", "sphinx (>=7.1.1)"]
+docs-online = ["mkdocs (>=1.5.2)", "mkdocs-git-revision-date-localized-plugin (>=1.2.0)", "mkdocs-material (>=9.2.6)", "mkdocstrings[python] (>=0.22.0)", "pyaml (>=23.7.0)"]
+hooks = ["pre-commit (>=3.3.3)"]
+package = ["build (>=1.0.0)", "twine (>=4.0.2)"]
+runner = ["tox (>=4.11.1)"]
+sast = ["bandit[toml] (>=1.7.5)"]
+testing = ["pytest (>=7.4.0)"]
+tooling = ["black (>=23.7.0)", "pyright (>=1.1.325)", "ruff (>=0.0.287)"]
+tooling-extras = ["pyaml (>=23.7.0)", "pypandoc-binary (>=1.11)", "pytest (>=7.4.0)"]
+
+[[package]]
+name = "watchdog"
+version = "4.0.0"
+description = "Filesystem events monitoring"
+optional = false
+python-versions = ">=3.8"
+files = [
+    {file = "watchdog-4.0.0-cp310-cp310-macosx_10_9_universal2.whl", hash = "sha256:39cb34b1f1afbf23e9562501673e7146777efe95da24fab5707b88f7fb11649b"},
+    {file = "watchdog-4.0.0-cp310-cp310-macosx_10_9_x86_64.whl", hash = "sha256:c522392acc5e962bcac3b22b9592493ffd06d1fc5d755954e6be9f4990de932b"},
+    {file = "watchdog-4.0.0-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:6c47bdd680009b11c9ac382163e05ca43baf4127954c5f6d0250e7d772d2b80c"},
+    {file = "watchdog-4.0.0-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:8350d4055505412a426b6ad8c521bc7d367d1637a762c70fdd93a3a0d595990b"},
+    {file = "watchdog-4.0.0-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:c17d98799f32e3f55f181f19dd2021d762eb38fdd381b4a748b9f5a36738e935"},
+    {file = "watchdog-4.0.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:4986db5e8880b0e6b7cd52ba36255d4793bf5cdc95bd6264806c233173b1ec0b"},
+    {file = "watchdog-4.0.0-cp312-cp312-macosx_10_9_universal2.whl", hash = "sha256:11e12fafb13372e18ca1bbf12d50f593e7280646687463dd47730fd4f4d5d257"},
+    {file = "watchdog-4.0.0-cp312-cp312-macosx_10_9_x86_64.whl", hash = "sha256:5369136a6474678e02426bd984466343924d1df8e2fd94a9b443cb7e3aa20d19"},
+    {file = "watchdog-4.0.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:76ad8484379695f3fe46228962017a7e1337e9acadafed67eb20aabb175df98b"},
+    {file = "watchdog-4.0.0-cp38-cp38-macosx_10_9_universal2.whl", hash = "sha256:45cc09cc4c3b43fb10b59ef4d07318d9a3ecdbff03abd2e36e77b6dd9f9a5c85"},
+    {file = "watchdog-4.0.0-cp38-cp38-macosx_10_9_x86_64.whl", hash = "sha256:eed82cdf79cd7f0232e2fdc1ad05b06a5e102a43e331f7d041e5f0e0a34a51c4"},
+    {file = "watchdog-4.0.0-cp38-cp38-macosx_11_0_arm64.whl", hash = "sha256:ba30a896166f0fee83183cec913298151b73164160d965af2e93a20bbd2ab605"},
+    {file = "watchdog-4.0.0-cp39-cp39-macosx_10_9_universal2.whl", hash = "sha256:d18d7f18a47de6863cd480734613502904611730f8def45fc52a5d97503e5101"},
+    {file = "watchdog-4.0.0-cp39-cp39-macosx_10_9_x86_64.whl", hash = "sha256:2895bf0518361a9728773083908801a376743bcc37dfa252b801af8fd281b1ca"},
+    {file = "watchdog-4.0.0-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:87e9df830022488e235dd601478c15ad73a0389628588ba0b028cb74eb72fed8"},
+    {file = "watchdog-4.0.0-pp310-pypy310_pp73-macosx_10_9_x86_64.whl", hash = "sha256:6e949a8a94186bced05b6508faa61b7adacc911115664ccb1923b9ad1f1ccf7b"},
+    {file = "watchdog-4.0.0-pp38-pypy38_pp73-macosx_10_9_x86_64.whl", hash = "sha256:6a4db54edea37d1058b08947c789a2354ee02972ed5d1e0dca9b0b820f4c7f92"},
+    {file = "watchdog-4.0.0-pp39-pypy39_pp73-macosx_10_9_x86_64.whl", hash = "sha256:d31481ccf4694a8416b681544c23bd271f5a123162ab603c7d7d2dd7dd901a07"},
+    {file = "watchdog-4.0.0-py3-none-manylinux2014_aarch64.whl", hash = "sha256:8fec441f5adcf81dd240a5fe78e3d83767999771630b5ddfc5867827a34fa3d3"},
+    {file = "watchdog-4.0.0-py3-none-manylinux2014_armv7l.whl", hash = "sha256:6a9c71a0b02985b4b0b6d14b875a6c86ddea2fdbebd0c9a720a806a8bbffc69f"},
+    {file = "watchdog-4.0.0-py3-none-manylinux2014_i686.whl", hash = "sha256:557ba04c816d23ce98a06e70af6abaa0485f6d94994ec78a42b05d1c03dcbd50"},
+    {file = "watchdog-4.0.0-py3-none-manylinux2014_ppc64.whl", hash = "sha256:d0f9bd1fd919134d459d8abf954f63886745f4660ef66480b9d753a7c9d40927"},
+    {file = "watchdog-4.0.0-py3-none-manylinux2014_ppc64le.whl", hash = "sha256:f9b2fdca47dc855516b2d66eef3c39f2672cbf7e7a42e7e67ad2cbfcd6ba107d"},
+    {file = "watchdog-4.0.0-py3-none-manylinux2014_s390x.whl", hash = "sha256:73c7a935e62033bd5e8f0da33a4dcb763da2361921a69a5a95aaf6c93aa03a87"},
+    {file = "watchdog-4.0.0-py3-none-manylinux2014_x86_64.whl", hash = "sha256:6a80d5cae8c265842c7419c560b9961561556c4361b297b4c431903f8c33b269"},
+    {file = "watchdog-4.0.0-py3-none-win32.whl", hash = "sha256:8f9a542c979df62098ae9c58b19e03ad3df1c9d8c6895d96c0d51da17b243b1c"},
+    {file = "watchdog-4.0.0-py3-none-win_amd64.whl", hash = "sha256:f970663fa4f7e80401a7b0cbeec00fa801bf0287d93d48368fc3e6fa32716245"},
+    {file = "watchdog-4.0.0-py3-none-win_ia64.whl", hash = "sha256:9a03e16e55465177d416699331b0f3564138f1807ecc5f2de9d55d8f188d08c7"},
+    {file = "watchdog-4.0.0.tar.gz", hash = "sha256:e3e7065cbdabe6183ab82199d7a4f6b3ba0a438c5a512a68559846ccb76a78ec"},
+]
+
+[package.extras]
+watchmedo = ["PyYAML (>=3.10)"]
+
 [[package]]
 name = "wcwidth"
 version = "0.2.13"
@ -4273,5 +4622,5 @@ testing = ["big-O", "jaraco.functools", "jaraco.itertools", "more-itertools", "p

 [metadata]
 lock-version = "2.0"
-python-versions = ">=3.9,<3.13"
-content-hash = "3283801c83fc07a81307276855a4479dd11069bfc9821279cc5cbd42bf7794c6"
+python-versions = ">=3.9,<3.13,!=3.9.7"
+content-hash = "b6abaf81bb850c204b073e638c539f47a0c2bf1cfb46dbce2482265beed73198"
--- a/pyproject.toml
+++ b/pyproject.toml
@ -12,10 +12,13 @@ packages = [
 ]
 include = [
    "detect_text.py",
+    "ocr_text.py",
+    "ocr_app.py",
+    "run_ocr_app.py"
 ]

 [tool.poetry.dependencies]
-python = ">=3.9,<3.13"
+python = ">=3.9,<3.13,!=3.9.7"
 transformers = "4.36.2"
 torch = "^2.1.2"
 pydantic = "^2.5.3"
@ -35,10 +38,12 @@ snakeviz = "^2.2.0"
 datasets = "^2.16.1"
 rapidfuzz = "^3.6.1"
 arabic-reshaper = "^3.0.0"
+streamlit = "^1.31.0"

 [tool.poetry.scripts]
 surya_detect = "detect_text:main"
 surya_ocr = "ocr_text:main"
+surya_gui = "run_ocr_app:run_app"

 [build-system]
 requires = ["poetry-core"]
--- a/run_ocr_app.py
+++ b/run_ocr_app.py
@ -0,0 +1,8 @@
+import subprocess
+import os
+
+
+def run_app():
+    cur_dir = os.path.dirname(os.path.abspath(__file__))
+    ocr_app_path = os.path.join(cur_dir, "ocr_app.py")
+    subprocess.run(["streamlit", "run", ocr_app_path])
--- a/static/images/arabic.jpg
+++ b/static/images/arabic.jpg
--- a/static/images/arabic_text.jpg
+++ b/static/images/arabic_text.jpg
--- a/static/images/chinese.jpg
+++ b/static/images/chinese.jpg
--- a/static/images/chinese.png
+++ b/static/images/chinese.png
--- a/static/images/chinese_text.jpg
+++ b/static/images/chinese_text.jpg
--- a/static/images/chinese_text.png
+++ b/static/images/chinese_text.png
--- a/static/images/funsd_text.jpg
+++ b/static/images/funsd_text.jpg
--- a/static/images/hindi.jpg
+++ b/static/images/hindi.jpg
--- a/static/images/hindi.png
+++ b/static/images/hindi.png
--- a/static/images/hindi_text.jpg
+++ b/static/images/hindi_text.jpg
--- a/static/images/hindi_text.png
+++ b/static/images/hindi_text.png
--- a/static/images/japanese.jpg
+++ b/static/images/japanese.jpg
--- a/static/images/japanese.png
+++ b/static/images/japanese.png
--- a/static/images/japanese_text.jpg
+++ b/static/images/japanese_text.jpg
--- a/static/images/japanese_text.png
+++ b/static/images/japanese_text.png
--- a/static/images/paper.jpg
+++ b/static/images/paper.jpg
--- a/static/images/paper.png
+++ b/static/images/paper.png
--- a/static/images/paper_text.jpg
+++ b/static/images/paper_text.jpg
--- a/static/images/paper_text.png
+++ b/static/images/paper_text.png
--- a/static/images/pres_text.jpg
+++ b/static/images/pres_text.jpg
--- a/static/images/pres_text.png
+++ b/static/images/pres_text.png
--- a/static/images/scanned_text.jpg
+++ b/static/images/scanned_text.jpg
--- a/static/images/scanned_text.png
+++ b/static/images/scanned_text.png
--- a/static/images/textbook.jpg
+++ b/static/images/textbook.jpg
--- a/static/images/textbook_text.jpg
+++ b/static/images/textbook_text.jpg
--- a/surya/input/langs.py
+++ b/surya/input/langs.py
@ -4,8 +4,8 @@ from surya.languages import LANGUAGE_TO_CODE, CODE_TO_LANGUAGE

 def replace_lang_with_code(langs: List[str]):
    for i in range(len(langs)):
-        if langs[i] in LANGUAGE_TO_CODE:
-            langs[i] = LANGUAGE_TO_CODE[langs[i]]
+        if langs[i].title() in LANGUAGE_TO_CODE:
+            langs[i] = LANGUAGE_TO_CODE[langs[i].title()]
        if langs[i] not in CODE_TO_LANGUAGE:
            raise ValueError(f"Language code {langs[i]} not found.")
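
Note on the `.title()` change above: a minimal sketch of the new lookup behaviour, assuming a two-entry stand-in for the tables in `surya/languages.py` (the real tables are much larger, and the stand-in dictionaries here are hypothetical). Language names are now matched case-insensitively via title-casing, while inputs that are already codes pass through unchanged.

```python
from typing import List

# Hypothetical stand-ins for the real tables in surya/languages.py.
LANGUAGE_TO_CODE = {"Bengali": "bn", "English": "en"}
CODE_TO_LANGUAGE = {code: name for name, code in LANGUAGE_TO_CODE.items()}


def replace_lang_with_code(langs: List[str]) -> None:
    # Mirrors the patched function: the list is modified in place.
    for i in range(len(langs)):
        # Title-case the input so "bengali" and "ENGLISH" match the table keys.
        if langs[i].title() in LANGUAGE_TO_CODE:
            langs[i] = LANGUAGE_TO_CODE[langs[i].title()]
        # Anything that is not a known code at this point is rejected.
        if langs[i] not in CODE_TO_LANGUAGE:
            raise ValueError(f"Language code {langs[i]} not found.")


langs = ["bengali", "ENGLISH", "en"]
replace_lang_with_code(langs)
```

After the call, `langs` holds only codes. An unknown name such as `"Klingon"` still raises `ValueError`.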

--- a/surya/input/processing.py
+++ b/surya/input/processing.py
@ -1,4 +1,5 @@
 import os
+import random
 from typing import List

 import numpy as np
--- a/surya/languages.py
+++ b/surya/languages.py
@ -6,7 +6,7 @@ CODE_TO_LANGUAGE = {
    'az': 'Azerbaijani',
    'be': 'Belarusian',
    'bg': 'Bulgarian',
-    'bn': 'Bangla',
+    'bn': 'Bengali',
    'br': 'Breton',
    'bs': 'Bosnian',
    'ca': 'Catalan',
--- a/surya/model/recognition/tokenizer.py
+++ b/surya/model/recognition/tokenizer.py
@ -73,7 +73,7 @@ class Byt5LangTokenizer(ByT5Tokenizer):

        super().__init__()

-    def __call__(self, texts: List[str] | str, langs: List[List[str]] | List[str], pad_token_id: int = 0, **kwargs):
+    def __call__(self, texts: Union[List[str], str], langs: Union[List[List[str]], List[str]], pad_token_id: int = 0, **kwargs):
        tokenized = []
        all_langs = []

--- a/surya/postprocessing/heatmap.py
+++ b/surya/postprocessing/heatmap.py
@ -30,13 +30,35 @@ def clean_contained_boxes(boxes: List[PolygonBox]):
    return new_boxes


+def get_dynamic_thresholds(linemap, text_threshold, low_text, typical_top10_avg=.7):
+    # Find the average intensity of the top 10% of pixels.
+    # Use the top 10% so that pages that are mostly whitespace (e.g. sparse PDFs) do not skew the estimate.
+    flat_map = linemap.flatten()
+    sorted_map = np.sort(flat_map)[::-1]
+    top_10_count = int(np.ceil(len(flat_map) * 0.1))
+    top_10 = sorted_map[:top_10_count]
+    avg_intensity = np.mean(top_10)
+
+    # Adjust thresholds based on normalized intensity
+    scaling_factor = min(1, avg_intensity / typical_top10_avg) ** (1 / 2)
+
+    low_text = max(low_text * scaling_factor, 0.1)
+    text_threshold = max(text_threshold * scaling_factor, 0.15)
+
+    low_text = min(low_text, 0.6)
+    text_threshold = min(text_threshold, 0.8)
+    return text_threshold, low_text
+
+
 def detect_boxes(linemap, text_threshold, low_text):
    # From CRAFT - https://github.com/clovaai/CRAFT-pytorch
    # prepare data
    linemap = linemap.copy()
    img_h, img_w = linemap.shape

-    ret, text_score = cv2.threshold(linemap, low_text, 1, 0)
+    text_threshold, low_text = get_dynamic_thresholds(linemap, text_threshold, low_text)
+
+    ret, text_score = cv2.threshold(linemap, low_text, 1, cv2.THRESH_BINARY)

    text_score_comb = np.clip(text_score, 0, 1)
    label_count, labels, stats, centroids = cv2.connectedComponentsWithStats(text_score_comb.astype(np.uint8), connectivity=4)
@ -96,7 +118,7 @@ def detect_boxes(linemap, text_threshold, low_text):
    return det, labels


-def get_detected_boxes(textmap, text_threshold=settings.DETECTOR_TEXT_THRESHOLD,  low_text=settings.DETECTOR_NMS_THRESHOLD):
+def get_detected_boxes(textmap, text_threshold=settings.DETECTOR_TEXT_THRESHOLD,  low_text=settings.DETECTOR_BLANK_THRESHOLD):
    textmap = textmap.copy()
    textmap = textmap.astype(np.float32)
    boxes, labels = detect_boxes(textmap, text_threshold, low_text)
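
Note on `get_dynamic_thresholds` above: the sketch below restates the same arithmetic in pure Python for a flat list of scores, so the scaling can be checked by hand. It is an illustration only, not the code in this commit; the name `dynamic_thresholds` and the sample inputs are hypothetical, and it assumes a non-empty input.

```python
import math


def dynamic_thresholds(values, text_threshold, low_text, typical_top10_avg=0.7):
    # Average the brightest 10% of scores (assumes `values` is non-empty).
    ranked = sorted(values, reverse=True)
    top_count = math.ceil(len(ranked) * 0.1)
    avg_intensity = sum(ranked[:top_count]) / top_count

    # Scale down only: the ratio is capped at 1, then square-rooted to soften it.
    scaling = math.sqrt(min(1.0, avg_intensity / typical_top10_avg))

    # Clamp to the same floors and ceilings as the diff.
    low_text = min(max(low_text * scaling, 0.1), 0.6)
    text_threshold = min(max(text_threshold * scaling, 0.15), 0.8)
    return text_threshold, low_text


# A confident heatmap: the top 10% averages about 0.9, so the scale stays at 1.
confident = dynamic_thresholds([0.9] * 10 + [0.0] * 90, 0.6, 0.35)

# A faint heatmap: the top 10% averages about 0.175, so the scale is about sqrt(0.25) = 0.5.
faint = dynamic_thresholds([0.175] * 10 + [0.0] * 90, 0.6, 0.35)
```

With the default settings (0.6 and 0.35), the confident map keeps both thresholds unchanged, while the faint map roughly halves them to about 0.3 and 0.175. Because the ratio is capped at 1, the thresholds are never raised above their configured values.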
--- a/surya/postprocessing/text.py
+++ b/surya/postprocessing/text.py
@ -12,7 +12,7 @@ def get_text_size(text, font):
    return width, height


-def draw_text_on_image(bboxes, texts, image_size=(1024, 1024), font_path=settings.RECOGNITION_RENDER_FONT, font_size=18, res_upscale=2):
+def draw_text_on_image(bboxes, texts, image_size=(1024, 1024), font_path=settings.RECOGNITION_RENDER_FONT, max_font_size=60, res_upscale=2):
    new_image_size = (image_size[0] * res_upscale, image_size[1] * res_upscale)
    image = Image.new('RGB', new_image_size, color='white')
    draw = ImageDraw.Draw(image)
@ -23,7 +23,7 @@ def draw_text_on_image(bboxes, texts, image_size=(1024, 1024), font_path=setting
        bbox_height = s_bbox[3] - s_bbox[1]

        # Shrink the text to fit in the bbox if needed
-        box_font_size = font_size
+        box_font_size = min(int(.75 * bbox_height), max_font_size)

        # Download font if it doesn't exist
        if not os.path.exists(font_path):
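
Note on the `max_font_size` change above: the starting font size is now derived from the box height instead of a fixed 18. A small sketch of that one expression, with the helper name `initial_font_size` being hypothetical:

```python
def initial_font_size(bbox_height: int, max_font_size: int = 60) -> int:
    # Start at 75% of the box height, but never exceed the cap.
    return min(int(0.75 * bbox_height), max_font_size)


small = initial_font_size(10)    # int(7.5) truncates to 7
medium = initial_font_size(40)   # 30
large = initial_font_size(200)   # 150 would exceed the cap, so 60
```

This is only the initial size; per the comment in the diff, the text may still be shrunk further to fit the box.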
--- a/surya/settings.py
+++ b/surya/settings.py
@ -44,14 +44,14 @@ class Settings(BaseSettings):

    # Text detection
    DETECTOR_BATCH_SIZE: Optional[int] = None # Defaults to 2 for CPU, 32 otherwise
-    DETECTOR_MODEL_CHECKPOINT: str = "vikp/line_detector"
+    DETECTOR_MODEL_CHECKPOINT: str = "vikp/surya_det"
    DETECTOR_BENCH_DATASET_NAME: str = "vikp/doclaynet_bench"
-    DETECTOR_IMAGE_CHUNK_HEIGHT: int = 1200 # Height at which to slice images vertically
-    DETECTOR_TEXT_THRESHOLD: float = 0.6 # Threshold for text detection
-    DETECTOR_NMS_THRESHOLD: float = 0.35 # Threshold for non-maximum suppression
+    DETECTOR_IMAGE_CHUNK_HEIGHT: int = 1280 # Height at which to slice images vertically
+    DETECTOR_TEXT_THRESHOLD: float = 0.6 # Threshold for text detection (above this is considered text)
+    DETECTOR_BLANK_THRESHOLD: float = 0.35 # Threshold for blank space (below this is considered blank)

    # Text recognition
-    RECOGNITION_MODEL_CHECKPOINT: str = "vikp/text_recognizer_test"
+    RECOGNITION_MODEL_CHECKPOINT: str = "vikp/surya_rec"
    RECOGNITION_MAX_TOKENS: int = 160
    RECOGNITION_BATCH_SIZE: Optional[int] = None # Defaults to 8 for CPU/MPS, 256 otherwise
    RECOGNITION_IMAGE_SIZE: Dict = {"height": 196, "width": 896}