Tarun Menta
|
d4496f8caa
|
Fix bad test - Add real latex image
|
2025-08-29 14:53:41 -04:00 |
|
Tarun Menta
|
e7ec40ecb4
|
Move new model to R2
|
2025-08-29 14:40:14 -04:00 |
|
Tarun Menta
|
cdc7b18af9
|
Merge branch 'table-cell-updates' of https://github.com/VikParuchuri/surya into table-cell-updates
|
2025-08-28 10:42:55 -04:00 |
|
Zach Nussbaum
|
4cdf1080cd
|
fix: ignore on utf16 errors
|
2025-08-28 00:08:51 +00:00 |
|
Zach Nussbaum
|
5d1c369477
|
feat: new tokenizer
|
2025-08-28 00:08:51 +00:00 |
|
Tarun Menta
|
c37c42e72c
|
Merge branch 'vik/new-enc' into table-cell-newenc
|
2025-08-27 10:21:59 -04:00 |
|
Zach Nussbaum
|
bc7ee4895a
|
fix: ignore on utf16 errors
|
2025-08-27 10:45:08 +00:00 |
|
Tarun Menta
|
a4ed5523d0
|
Filter more unwanted tags
|
2025-08-26 15:16:31 -04:00 |
|
Zach Nussbaum
|
a37919feff
|
feat: new tokenizer
|
2025-08-26 13:59:13 +00:00 |
|
Tarun Menta
|
78302facbf
|
Merge branch 'vik/new-enc' into table-cell-updates
|
2025-08-19 16:56:28 -04:00 |
|
Vik Paruchuri
|
8def7db80e
|
Patch in new encoder
|
2025-08-19 14:20:52 -04:00 |
|
Tarun Menta
|
82b88729aa
|
Correct dtype when forcing table rec to CPU
|
2025-08-18 16:29:25 -04:00 |
|
Tarun Menta
|
a9d5a093e5
|
Pin table model to CPU if using MPS. Fixes datalab-to/marker#827
|
2025-08-18 11:01:04 -04:00 |
|
Tarun Menta
|
38a452d2b2
|
Make list of tags to filter an argument to get passed in
Required so that lists are not skipped in tables
|
2025-08-16 15:53:33 -04:00 |
|
Tarun Menta
|
508ad43735
|
Improve behavior of disable_tqdm
|
2025-08-16 12:58:51 -04:00 |
|
Tarun Menta
|
a3efce1830
|
Improve filtering of tags + Increase tags in blacklist
|
2025-08-15 01:06:00 -04:00 |
|
Tarun Menta
|
5497449bfa
|
Merge pull request #429 from datalab-to/dev
Improve model performance on math
|
2025-08-12 19:11:44 -04:00 |
|
Tarun Menta
|
5bb47b2f09
|
Bump model
|
2025-08-12 18:59:39 -04:00 |
|
Vik Paruchuri
|
17b875fd55
|
Merge pull request #424 from datalab-to/dev
Dev
|
2025-08-08 20:04:06 -04:00 |
|
Vik Paruchuri
|
632a5a9621
|
Bump version
|
2025-08-08 18:11:35 -04:00 |
|
Vik Paruchuri
|
9f6d957f57
|
Merge pull request #422 from datalab-to/finetuning
[WIP]: Finetuning Script
|
2025-08-08 18:10:25 -04:00 |
|
Vik Paruchuri
|
685e63c0d6
|
Bump surya checkpoint
|
2025-08-08 18:09:49 -04:00 |
|
Tarun Menta
|
57fb761ac6
|
Bump model
|
2025-08-08 17:36:49 -04:00 |
|
Tarun Menta
|
98010bee5c
|
Update README [skip ci]
|
2025-08-08 16:45:08 -04:00 |
|
Tarun Menta
|
59bc1a781c
|
Fix trailing whitespace
|
2025-08-08 16:33:07 -04:00 |
|
Tarun Menta
|
f97add87a0
|
Update README
|
2025-08-08 16:32:15 -04:00 |
|
Tarun Menta
|
e1bb6306b0
|
Update README
|
2025-08-08 16:29:02 -04:00 |
|
Tarun Menta
|
3689c5aa8c
|
Update README with finetuning details
|
2025-08-08 16:16:46 -04:00 |
|
Tarun Menta
|
68d9c7916f
|
Merge pull request #423 from starikovplusplus/finetuning
Fix tokenizer to correctly tokenize script tokens
|
2025-08-08 15:21:59 -04:00 |
|
Tarun Menta
|
e8fb02dad4
|
Add in language scripts to text inputs
|
2025-08-08 15:19:41 -04:00 |
|
Tarun Menta
|
64f0bd0c8b
|
Improve processing + limit image size
|
2025-08-08 15:11:31 -04:00 |
|
github-actions[bot]
|
1486d0bdca
|
@starikovplusplus has signed the CLA in datalab-to/surya#423
|
2025-08-08 18:29:58 +00:00 |
|
starikov.y.e
|
563054f0b5
|
Fix tokenizer to correctly tokenize script tokens
|
2025-08-08 23:24:24 +05:00 |
|
Tarun Menta
|
4451aa4716
|
Minimal working finetuning
|
2025-08-07 19:33:51 -04:00 |
|
Tarun Menta
|
9f5b2535fe
|
Typo - Fix #419
|
2025-08-07 15:45:34 -04:00 |
|
Zach Nussbaum
|
2fa3a1ee9a
|
Merge pull request #421 from datalab-to/download-progbar
|
2025-08-07 13:51:40 -04:00 |
|
Zach Nussbaum
|
644f5feb13
|
feat: download progress bar for each file
|
2025-08-07 13:11:53 -04:00 |
|
github-actions[bot]
|
30e59deb64
|
@mebriki has signed the CLA in datalab-to/surya#418
|
2025-08-05 10:54:39 +00:00 |
|
Vik Paruchuri
|
b215de26e7
|
Merge pull request #415 from datalab-to/dev
Dev
|
2025-08-04 13:05:59 -04:00 |
|
Vik Paruchuri
|
f47b0cdb96
|
Bump version
|
2025-08-04 13:05:36 -04:00 |
|
Tarun Menta
|
f2b6363482
|
Merge pull request #414 from datalab-to/tag-fix
Fix edge case for empty tags
|
2025-08-04 12:58:31 -04:00 |
|
Tarun Menta
|
1dd9b95a25
|
Fix edge case for empty tags
|
2025-08-04 12:42:32 -04:00 |
|
Vik Paruchuri
|
894dbd1d3c
|
Merge pull request #413 from datalab-to/dev
Bump recognition model
|
2025-08-04 11:04:08 -04:00 |
|
Vik Paruchuri
|
04d2ba9d9b
|
Bump recognition model
|
2025-08-04 11:02:57 -04:00 |
|
Vik Paruchuri
|
b3a1aab4d3
|
Merge pull request #412 from datalab-to/dev
OCR model update
|
2025-08-04 09:44:18 -04:00 |
|
Vik Paruchuri
|
54d55bb0e7
|
Bump version
|
2025-08-04 09:16:46 -04:00 |
|
Vik Paruchuri
|
e55703eff5
|
Merge pull request #411 from datalab-to/foundation-ocr-release
Foundation ocr release
|
2025-08-01 18:42:33 -04:00 |
|
Tarun Menta
|
006becd9f3
|
Merge branch 'dev' into foundation-ocr-release
|
2025-08-01 18:41:31 -04:00 |
|
Tarun Menta
|
fe8545cfc8
|
Better calculation of max image token count
|
2025-08-01 18:39:06 -04:00 |
|
Tarun Menta
|
3212707c49
|
Move checkpoint to S3
|
2025-08-01 17:22:49 -04:00 |
|