Mohd Sanad Zaki Rizvi
@sanad_maker
M.S. Thesis @ U. of Edinburgh | Previously: Google Research India, Microsoft Research India | #nlproc #NLU #AI | Thoughts my own
Our paper, "GCM: A Toolkit for Generating Synthetic Code-mixed Text", has been accepted at the 16th Conference of the European Chapter of the Association for Computational Linguistics @eaclmeeting in the demo paper track.
Definitely true!
Probably not what you want to hear, but docs 😅. Actual real-life examples. Better and more comprehensive kwarg docs. More helpful links to actual code, not just wrapper-of-wrapper-of-wrapper code. Example code of larger apps showing best practices (in the style of torch titan, nanoGPT or…
We know LLMs are poor at MT in low-resource languages (LRLs): curious how to adapt them to perform better? 🚀 Our new paper explores how the scale of MT data and the diversity of tasks/languages in instruction tuning interact to determine LLM MT performance for LRLs💡…
Can GPT-3.5 generate plausible clinical notes? Generated discharge summaries have the correct diseases and procedures 🙂, but they don't tell plausible stories 🙁. Though not authentic, the generated documents can augment ICD coding training! 👉 academic.oup.com/jamia/advance-… 🧵 1/8
The memory in Transformers grows linearly with the sequence length at inference time. In SSMs it is constant, but often at the expense of performance. We introduce Dynamic Memory Compression (DMC) where we retrofit LLMs to compress their KV cache while preserving performance…
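The linear growth mentioned above is easy to see from the KV-cache accounting: every generated token appends one key and one value vector per attention head per layer. A minimal back-of-the-envelope sketch (the model dimensions below are illustrative assumptions, not the paper's setup):

```python
# Illustrative sketch: KV-cache size of a decoder-only Transformer
# grows linearly with sequence length at inference time.
# All model dimensions here are hypothetical examples.

def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=32, head_dim=128,
                   bytes_per_elem=2):  # fp16 -> 2 bytes per element
    # Factor of 2: both keys and values are cached, at every layer,
    # for every past position.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

for seq_len in (1024, 4096, 16384):
    gib = kv_cache_bytes(seq_len) / 2**30
    print(f"{seq_len:>6} tokens -> {gib:.2f} GiB of KV cache")
```

Compressing the cache by a fixed ratio (as DMC aims to do) shrinks the slope of this line; an SSM's recurrent state replaces it with a constant.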
Can open-source LLMs execute *chains of instructions* in a single query? Not so well, we found. However, they can learn this ability by:
- augmenting examples from public SFT mixtures with chains of instructions automatically
- performing *sequential instruction tuning* on them.…
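The augmentation step above can be sketched as stitching single-instruction SFT examples into one chained example; a hypothetical, simplified version (field names and joining text are my assumptions, not the paper's recipe):

```python
# Hypothetical sketch: build a chained-instruction SFT example from
# several single-instruction examples, so the model learns to execute
# instructions in sequence within one query.

def chain_examples(examples):
    # Concatenate the instructions into a single query; the target is
    # the sequence of answers in the same order.
    instruction = " Then, ".join(ex["instruction"] for ex in examples)
    output = "\n".join(ex["output"] for ex in examples)
    return {"instruction": instruction, "output": output}

chained = chain_examples([
    {"instruction": "Translate 'bonjour' to English.", "output": "hello"},
    {"instruction": "Uppercase the result.", "output": "HELLO"},
])
print(chained["instruction"])
```

Sequential instruction tuning would then fine-tune on such chained pairs rather than on the original single-step examples.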