Zach Nussbaum
|
d6f3515009
|
feat: new unified tokenizer
|
2025-08-25 14:21:32 +00:00 |
|
Vik Paruchuri
|
e1aa09d3bc
|
Set disable tqdm
|
2025-08-19 17:39:18 -04:00 |
|
Vik Paruchuri
|
a95b6cafe5
|
Fix layout and table rec image bbox
|
2025-08-19 11:33:19 -04:00 |
|
Vik Paruchuri
|
4613f45a5b
|
Prefill fix
|
2025-08-18 08:06:45 -04:00 |
|
Vik Paruchuri
|
053f13cde7
|
Fix padding on tpu
|
2025-08-15 15:52:03 -04:00 |
|
Vik Paruchuri
|
a73eee6648
|
Force bf16
|
2025-08-15 15:38:50 -04:00 |
|
Vik Paruchuri
|
cbe23fae03
|
Tables can have a lot of cells
|
2025-08-15 15:29:50 -04:00 |
|
Vik Paruchuri
|
14e7ee6ed9
|
Avoid truncating layout and table
|
2025-08-15 11:18:31 -04:00 |
|
Vik Paruchuri
|
f2eecf1ad1
|
Properly pad
|
2025-08-12 12:45:56 -04:00 |
|
Vik Paruchuri
|
609caf42c9
|
Fix tensor creation
|
2025-08-12 12:18:30 -04:00 |
|
Vik Paruchuri
|
a511a095b9
|
Pad image embeddings
|
2025-08-12 12:03:42 -04:00 |
|
Vik Paruchuri
|
9e5fa2931b
|
Wire in table structure
|
2025-08-12 09:53:17 -04:00 |
|
Vik Paruchuri
|
fc6657e8a6
|
Use fix-length index
|
2025-08-11 21:33:23 -04:00 |
|
Vik Paruchuri
|
de947006a5
|
Fix text lengths
|
2025-08-11 16:30:27 -04:00 |
|
Vik Paruchuri
|
2748109d33
|
Fix encoder chunking
|
2025-08-11 12:45:49 -04:00 |
|
Vik Paruchuri
|
8367a631a2
|
Accuracy fixes
|
2025-08-11 12:40:09 -04:00 |
|
Vik Paruchuri
|
eee29d4ae7
|
Fix beacon issue
|
2025-08-11 11:59:51 -04:00 |
|
Vik Paruchuri
|
f03b58b4e1
|
Fix table rec
|
2025-08-08 14:34:06 -04:00 |
|
Vik Paruchuri
|
d55f00a49e
|
Integrate table rec predictor
|
2025-08-08 11:11:07 -04:00 |
|
Vik Paruchuri
|
669ce4869d
|
Patch clamp issue
|
2025-08-06 21:52:47 -04:00 |
|
Vik Paruchuri
|
e1df24c93e
|
Cleanup embedding
|
2025-08-06 16:54:38 -04:00 |
|
Vik Paruchuri
|
185b57abd7
|
Cleanup
|
2025-08-06 16:34:49 -04:00 |
|
Vik Paruchuri
|
8d1ef8517c
|
Merge remote-tracking branch 'origin/vik/tpu-layout' into vik/tpu-layout
|
2025-08-06 16:28:12 -04:00 |
|
Vik Paruchuri
|
0600fc5904
|
Enable re-embedding bboxes
|
2025-08-06 16:22:39 -04:00 |
|
Vik Paruchuri
|
523bd6664c
|
Merge branch 'vik/layout' into vik/tpu3
|
2025-08-06 12:44:57 -04:00 |
|
Vik Paruchuri
|
768d8d54a7
|
Move layout
|
2025-08-06 12:44:19 -04:00 |
|
Vik Paruchuri
|
d4461c6d30
|
Fix mark steps
|
2025-08-06 12:06:39 -04:00 |
|
Vik Paruchuri
|
3b30120601
|
Enable compile
|
2025-08-05 11:58:15 -04:00 |
|
Vik Paruchuri
|
2c60d24a81
|
Cleanup debug logs
|
2025-08-05 11:41:07 -04:00 |
|
Vik Paruchuri
|
a1aa1557a6
|
Fix embedding with a static scatter
|
2025-08-05 10:57:42 -04:00 |
|
Vik Paruchuri
|
d9f6e4c52e
|
Fix issues with GPU codepaths
|
2025-08-04 16:27:50 -04:00 |
|
Vik Paruchuri
|
dd7b127d92
|
Add in original codepath
|
2025-08-04 16:03:35 -04:00 |
|
Vik Paruchuri
|
ab9eff4d69
|
Static cache impl
|
2025-08-04 13:57:40 -04:00 |
|
Vik Paruchuri
|
f47b0cdb96
|
Bump version
|
2025-08-04 13:05:36 -04:00 |
|
Vik Paruchuri
|
4fcc094159
|
Fresh tpu start
|
2025-08-04 13:05:06 -04:00 |
|
Tarun Menta
|
f2b6363482
|
Merge pull request #414 from datalab-to/tag-fix
Fix edge case for empty tags
|
2025-08-04 12:58:31 -04:00 |
|
Tarun Menta
|
1dd9b95a25
|
Fix edge case for empty tags
|
2025-08-04 12:42:32 -04:00 |
|
Vik Paruchuri
|
04d2ba9d9b
|
Bump recognition model
|
2025-08-04 11:02:57 -04:00 |
|
Vik Paruchuri
|
54d55bb0e7
|
Bump version
|
2025-08-04 09:16:46 -04:00 |
|
Vik Paruchuri
|
e55703eff5
|
Merge pull request #411 from datalab-to/foundation-ocr-release
Foundation ocr release
|
2025-08-01 18:42:33 -04:00 |
|
Tarun Menta
|
006becd9f3
|
Merge branch 'dev' into foundation-ocr-release
|
2025-08-01 18:41:31 -04:00 |
|
Tarun Menta
|
fe8545cfc8
|
Better calculation of max image token count
|
2025-08-01 18:39:06 -04:00 |
|
Tarun Menta
|
3212707c49
|
Move checkpoint to S3
|
2025-08-01 17:22:49 -04:00 |
|
Tarun Menta
|
729ffc9295
|
Fix max image cache space logic
|
2025-08-01 16:04:52 -04:00 |
|
Tarun Menta
|
bb2d77d729
|
Filter more HTML tags out
|
2025-08-01 11:46:58 -04:00 |
|
Tarun Menta
|
02b6588de8
|
Filter unwanted tags from characters instead of joined text
This allows it to be filtered when appearing in marker as well
|
2025-08-01 10:46:16 -04:00 |
|
Tarun Menta
|
1cf444d752
|
Clean out unwanted formatting tags from OCR
|
2025-07-31 20:33:11 -04:00 |
|
Tarun Menta
|
48b98856bc
|
Optimize decode cache update
|
2025-07-31 19:56:00 -04:00 |
|
Tarun Menta
|
34f1148fd9
|
Cleanup
|
2025-07-31 19:51:25 -04:00 |
|
Tarun Menta
|
de9e9e74d4
|
Allow max tokens and sliding window to be set to custom values
|
2025-07-31 19:49:05 -04:00 |
|