Vik Paruchuri
5b830e2298
Change move to device
2025-08-01 09:51:42 -04:00
Vik Paruchuri
9911a9d928
Batched encoder
2025-07-31 15:36:41 -04:00
Vik Paruchuri
d71a38576c
Fix grid sizes
2025-07-31 05:24:04 -04:00
Vik Paruchuri
5bce56ea26
Misc cleanup
2025-07-30 13:44:51 -04:00
Vik Paruchuri
9a042b127e
Fix batch assignment
2025-07-30 13:19:41 -04:00
Vik Paruchuri
5196fd8e2d
Fix bugs with forward
2025-07-29 19:00:30 -04:00
Vik Paruchuri
26f964574b
Maybe add batch to encoder
2025-07-29 17:44:49 -04:00
Tarun Menta
a41fb6d99f
Merge branch 'vik/tpu' into foundation-update
2025-07-28 17:27:24 -04:00
Tarun Menta
a754f8fed9
Cleanup
2025-07-28 16:59:51 -04:00
Tarun Menta
d681c04bd1
Cleanup
2025-07-28 16:58:11 -04:00
Tarun Menta
92eee41256
Fix topk
2025-07-28 16:57:42 -04:00
Tarun Menta
bb8e5f8935
Allow multi token prediction for OCR
2025-07-28 16:29:03 -04:00
Tarun Menta
e1cab15a9c
Extend functionality of the new cache
...
Cache now supports "ragged" input_ids, where each batch element can have a different
number of "true tokens", with padding. This helps in many scenarios,
including MTP and beacons, where some sequences have shorter predictions than others.
Can be improved further.
2025-07-28 16:24:02 -04:00
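A minimal sketch of what "ragged" rows look like when packed into one padded batch. It assumes left padding and a pad id of 0; both are assumptions, since the commit does not say how the cache lays out padding. Each row keeps its own count of true tokens, and the attention mask marks which positions are real. The helper name `pack_ragged` is hypothetical.

```python
from typing import List, Tuple

PAD_ID = 0  # hypothetical pad token id, chosen for illustration


def pack_ragged(rows: List[List[int]]) -> Tuple[List[List[int]], List[List[int]], List[int]]:
    """Left-pad variable-length rows into one rectangular batch.

    Returns (input_ids, attention_mask, true_token_counts).
    """
    width = max(len(r) for r in rows)
    input_ids, attention_mask, counts = [], [], []
    for r in rows:
        pad = width - len(r)
        input_ids.append([PAD_ID] * pad + r)          # padding goes on the left
        attention_mask.append([0] * pad + [1] * len(r))  # 1 marks a true token
        counts.append(len(r))
    return input_ids, attention_mask, counts


# Row 0 produced 3 tokens this step (e.g. with MTP), row 1 only 1.
ids, mask, counts = pack_ragged([[11, 12, 13], [21]])
print(ids)     # [[11, 12, 13], [0, 0, 21]]
print(mask)    # [[1, 1, 1], [0, 0, 1]]
print(counts)  # [3, 1]
```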
Tarun Menta
5a7347ab57
Bugfix in decode update - Text token counts were wrong
...
We wanted to cap the text token count at a maximum of `text_sliding_window`,
but were clamping it from below (as a minimum) instead, which broke the logic in many
places downstream.
Also removed the dependence on Hugging Face caching.
2025-07-28 15:05:48 -04:00
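The difference between the two clamps, sketched in plain Python. The function names are hypothetical. On tensors, the equivalent would be `torch.clamp(x, min=w)` for the buggy version and `torch.clamp(x, max=w)` for the fix.

```python
def text_token_count_buggy(count: int, text_sliding_window: int) -> int:
    # Clamps from below: the result is never *less* than the window.
    return max(count, text_sliding_window)


def text_token_count_fixed(count: int, text_sliding_window: int) -> int:
    # Caps from above: the result is never *more* than the window.
    return min(count, text_sliding_window)


window = 4
for count in (2, 4, 7):
    print(count, text_token_count_buggy(count, window), text_token_count_fixed(count, window))
# 2 4 2
# 4 4 4
# 7 7 4
```

With the buggy clamp, a short sequence is reported as filling the whole window. This matches the kind of downstream breakage the commit describes.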
Tarun Menta
99693a9abb
Fix speed issues due to topk
...
Do the top-k on GPU before moving to CPU; this avoids an expensive, slow
GPU<->CPU memory transfer of the full logits.
2025-07-28 11:07:23 -04:00
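A framework-free sketch of the reordering: select the top k first, then move only those k values. In PyTorch this corresponds to calling `torch.topk` on the device tensor and only then calling `.cpu()` on the small result. The helper below is hypothetical and uses `heapq` as a stand-in for the GPU kernel.

```python
import heapq
from typing import List, Tuple


def topk_then_transfer(logits: List[float], k: int) -> Tuple[List[Tuple[int, float]], int]:
    """Select the k largest logits first, then 'transfer' only those.

    Returns the (index, value) pairs, highest first, and how many values
    crossed the device boundary.
    """
    # Stands in for torch.topk(logits, k) running on the GPU.
    top = heapq.nlargest(k, enumerate(logits), key=lambda iv: iv[1])
    # Stands in for .cpu(): only k items are copied, not len(logits).
    transferred = list(top)
    return transferred, len(transferred)


vocab_logits = [0.1, 2.5, -1.0, 3.7, 0.9, 2.4]
top, moved = topk_then_transfer(vocab_logits, k=2)
print(top)    # [(3, 3.7), (1, 2.5)]
print(moved)  # 2  (versus 6 if the full logits were copied first)
```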
Tarun Menta
a489a58086
Fix decode attention mask update
...
During decode, if the sliding window is not full, we should update
the attention mask in the last `sliding_window` positions so that it attends
only to valid tokens. This update was not offset by `text_cache_start`,
so we were actually writing into the image cache space.
Simple change to include this offset.
2025-07-26 17:18:03 -04:00
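An index-arithmetic sketch of the fix under an assumed layout: the image cache comes first, followed by a text region of `sliding_window` slots with the valid tokens right-aligned. The helper name and the flat-list mask are illustrative, not the actual cache code. If `text_cache_start` were dropped from `pos`, the loop would overwrite the image positions instead.

```python
from typing import List


def update_text_window_mask(
    mask: List[int], text_cache_start: int, sliding_window: int, n_valid: int
) -> None:
    """Mark only the valid text positions in a not-yet-full sliding window.

    Assumed layout: mask covers [image cache | text cache], the text region
    starts at text_cache_start and holds sliding_window slots, and valid
    tokens sit at the right end of that region (left padding).
    """
    for i in range(sliding_window):
        pos = text_cache_start + i  # the fix: offset into the text region
        mask[pos] = 1 if i >= sliding_window - n_valid else 0


image_len, window = 3, 4
mask = [1] * image_len + [0] * window  # image tokens are always attended
update_text_window_mask(mask, text_cache_start=image_len, sliding_window=window, n_valid=2)
print(mask)  # [1, 1, 1, 0, 0, 1, 1]
```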
Tarun Menta
467e7024d9
Delete unused function
2025-07-26 15:31:21 -04:00
Tarun Menta
fabcb0ed79
Cleanup
2025-07-26 13:40:56 -04:00
Tarun Menta
6251ab2568
Faster static cache implementation
...
The decode update is much faster now. It leverages the fact that flash
attention now supports padding the cache on both the left and the
right.
2025-07-25 19:51:20 -04:00
Vik Paruchuri
2351d34b0d
Add item conv
2025-07-21 15:21:48 -04:00
Vik Paruchuri
ca9137d0c1
Fix issues
2025-07-21 15:17:08 -04:00
Vik Paruchuri
a8d6509685
Default sliding window
2025-07-21 09:49:40 -04:00
Vik Paruchuri
5160f774cf
Keep on cpu for longer
2025-07-15 10:02:57 -04:00
Vik Paruchuri
f8a9cedd1e
Prefill experiments
2025-07-14 20:18:45 -04:00
Vik Paruchuri
d16721362d
Improve prefill
2025-07-11 09:32:06 -04:00
Vik Paruchuri
9bb7fe5fd5
Improve embeddings
2025-07-10 22:56:40 -04:00
Vik Paruchuri
58b3054f6e
Fix
2025-07-10 16:42:07 -04:00
Vik Paruchuri
b256889b0b
Improve prefill and decode speed
2025-07-10 16:21:41 -04:00
Vik Paruchuri
6ae83d8df1
Revert encoder changes
2025-07-10 12:31:18 -04:00
Vik Paruchuri
7e819c3442
Refactor cache for tpu
2025-07-10 12:17:58 -04:00
Vik Paruchuri
8044edaaef
Fix compile issues
2025-07-09 18:46:40 -04:00
Vik Paruchuri
64872755a6
Remove the loop
2025-07-09 18:14:45 -04:00
Vik Paruchuri
402003a346
Cleanup cache
2025-07-09 15:42:05 -04:00
Vik Paruchuri
c23bd234c2
Refactor cache
2025-07-09 11:59:47 -04:00
Vik Paruchuri
2eae380119
Fix graph break
2025-07-08 19:03:49 -04:00
Vik Paruchuri
cd0c46b9b9
Refactor caching
2025-07-08 17:42:57 -04:00
Vik Paruchuri
53d06c0da9
Pad the encoder properly
2025-07-08 16:25:23 -04:00
Vik Paruchuri
dce33261e3
Vectorize, add static shapes
2025-07-08 11:58:40 -04:00
Vik Paruchuri
d2ee4f241b
Work on tpu
2025-07-07 20:07:11 -04:00
Tarun Menta
9ab2cd7753
Static cache on encoder when required
2025-07-03 17:45:44 -04:00
Tarun Menta
344d1834f8
Cleanup
2025-07-03 15:58:48 -04:00
Tarun Menta
ee68baa137
Pad prefill inputs batch size for compiled static shape
...
The cache was already a static shape, but the prefill inputs were not, since prefill can
run at 0.2 times the initial batch size.
2025-07-03 15:57:35 -04:00
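A sketch of the padding step, assuming the prefill rows are already padded to a common sequence length. The function name and pad id are hypothetical. Keeping the batch dimension fixed means a compiled graph (for example, one produced by `torch.compile`) sees the same input shape on every call, rather than potentially recompiling for each new batch size. The example uses 1 of 5 slots, which matches the 0.2 ratio mentioned above.

```python
from typing import List, Tuple

PAD_ID = 0  # hypothetical pad token id


def pad_batch_to_static(
    batch: List[List[int]], static_batch_size: int
) -> Tuple[List[List[int]], List[bool]]:
    """Pad a prefill batch with dummy rows up to a fixed batch size.

    Assumes every row already has the same sequence length. Returns the
    padded batch and a per-row flag saying whether it is real, so outputs
    for the dummy rows can be discarded afterwards.
    """
    if len(batch) > static_batch_size:
        raise ValueError("batch is larger than the compiled static size")
    seq_len = len(batch[0]) if batch else 0
    n_dummy = static_batch_size - len(batch)
    padded = batch + [[PAD_ID] * seq_len for _ in range(n_dummy)]
    is_real = [True] * len(batch) + [False] * n_dummy
    return padded, is_real


# Only 1 of 5 slots needs prefill this step, but the compiled graph expects 5.
padded, is_real = pad_batch_to_static([[7, 8, 9]], static_batch_size=5)
print(len(padded))  # 5
print(is_real)      # [True, False, False, False, False]
```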
Tarun Menta
1995b65783
Cleanup
2025-07-03 15:33:54 -04:00
Tarun Menta
d2a52ce02d
Pin to seq len for static cache
2025-07-03 15:32:16 -04:00
Tarun Menta
4cc0c574cd
Some more fixes when moving from right to left padding
2025-07-03 14:04:11 -04:00
Tarun Menta
2bfb8168bf
Minor comments for SDPA [no ci]
2025-07-03 12:26:43 -04:00
Tarun Menta
33bd9c1bfd
Cleanup
2025-07-03 11:45:14 -04:00
Tarun Menta
6e11b95bef
Expose topk through foundation - Pipe into layout and rec
2025-07-01 19:16:48 -04:00
Tarun Menta
f6371d51d9
Fix math mode for layout
2025-06-30 14:58:14 -04:00
Tarun Menta
c733a9ba86
Make lookahead prediction configurable
2025-06-30 14:54:27 -04:00