Mohd Sanad Zaki Rizvi
@sanad_maker
M.S. Thesis @ U. of Edinburgh | Previously: Google Research India, Microsoft Research India | #nlproc #NLU #AI | Thoughts my own
Our paper, "GCM: A Toolkit for Generating Synthetic Code-mixed Text", has been accepted at the 16th Conference of the European Chapter of the Association for Computational Linguistics @eaclmeeting in the demo paper track.
Definitely true!
Probably not what you want to hear, but docs 😅. Actual real-life examples. Better and more comprehensive kwarg docs. More helpful links to actual code, not just wrapper-of-wrapper-of-wrapper code. Example code of larger apps showing best practices (in the style of torch titan, nanoGPT or…
We know LLMs are poor at MT in low-resource languages (LRLs): curious how to adapt them to perform better? 🚀 Our new paper explores how the scale of MT data and the diversity of tasks/languages in instruction tuning interact to determine LLM MT performance for LRLs💡…
Can GPT-3.5 generate plausible clinical notes? Generated discharge summaries have the correct diseases and procedures 🙂, but they don't tell plausible stories 🙁. Though not authentic, the generated documents can augment ICD coding training! 👉 academic.oup.com/jamia/advance-… 🧵 1/8
The memory in Transformers grows linearly with the sequence length at inference time. In SSMs it is constant, but often at the expense of performance. We introduce Dynamic Memory Compression (DMC) where we retrofit LLMs to compress their KV cache while preserving performance…
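The linear growth mentioned above is easy to see from the KV-cache accounting: every generated token appends one key and one value vector per attention head per layer. A minimal back-of-the-envelope sketch (the model dimensions below are illustrative assumptions, not the paper's setup):

```python
# Illustrative sketch: KV-cache size of a decoder-only Transformer
# grows linearly with sequence length at inference time.
# All model dimensions here are hypothetical examples.

def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=32, head_dim=128,
                   bytes_per_elem=2):  # fp16 -> 2 bytes per element
    # Factor of 2: both keys and values are cached, at every layer,
    # for every past position.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

for seq_len in (1024, 4096, 16384):
    gib = kv_cache_bytes(seq_len) / 2**30
    print(f"{seq_len:>6} tokens -> {gib:.2f} GiB of KV cache")
```

Compressing the cache by a fixed ratio (as DMC aims to do) shrinks the slope of this line; an SSM's recurrent state replaces it with a constant.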
Can open-source LLMs execute *chains of instructions* in a single query? Not so well, we found. However, they can learn this ability by:
- augmenting examples from public SFT mixtures with chains of instructions automatically
- performing *sequential instruction tuning* on them.…
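The augmentation step above can be sketched as stitching single-instruction SFT examples into one chained example; a hypothetical, simplified version (field names and joining text are my assumptions, not the paper's recipe):

```python
# Hypothetical sketch: build a chained-instruction SFT example from
# several single-instruction examples, so the model learns to execute
# instructions in sequence within one query.

def chain_examples(examples):
    # Concatenate the instructions into a single query; the target is
    # the sequence of answers in the same order.
    instruction = " Then, ".join(ex["instruction"] for ex in examples)
    output = "\n".join(ex["output"] for ex in examples)
    return {"instruction": instruction, "output": output}

chained = chain_examples([
    {"instruction": "Translate 'bonjour' to English.", "output": "hello"},
    {"instruction": "Uppercase the result.", "output": "HELLO"},
])
print(chained["instruction"])
```

Sequential instruction tuning would then fine-tune on such chained pairs rather than on the original single-step examples.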