Shashank Sonkar
@shashank_nlp
In Academic Job Market | NLP+Education | Grad Student @rbaraniuk group | @RiceECE @rice_dsp @OpenStax @IITKanpur
Sometimes you wait years to see whether anyone can replicate your work. Sometimes you discover another paper the next day that pretty much reaches the same conclusion. Further evidence that deep learning still has deep trouble with comprehension: dsp.rice.edu/2022/10/25/a-v…
Claims that Dall-E 2 understands human language do not withstand scrutiny. New experimental work by @EvelinaLeivada @ElliotMurphy91 and myself shows systematic failure in mapping syntax to semantics across a wide range of common linguistic constructions. arxiv.org/abs/2210.12889
Mistakes are key learning opportunities!🧑🎓 Can LLMs help students learn from them through dialog? 💬 While they often struggle to diagnose student errors when generating responses directly, adding a verification step ✅ could make a difference. #EMNLP2024
𝗖𝗮𝗻 𝗟𝗟𝗠𝘀 𝗵𝗲𝗹𝗽 𝘀𝘁𝘂𝗱𝗲𝗻𝘁𝘀 𝗹𝗲𝗮𝗿𝗻 𝗳𝗿𝗼𝗺 𝗺𝗶𝘀𝘁𝗮𝗸𝗲𝘀? Models struggle to spot student errors, but a verification step could help. More below! 🧵(1/9) #EMNLP2024 📰 arxiv.org/abs/2407.09136
🚨"Towards Aligning Language Models with Textual Feedback" has been accepted at #EMNLP2024! We explore if textual feedback can better align LLMs vs. numeric rewards. Our approach, ALT, adapts the Decision Transformer to condition responses on textual feedback. What we find 👇🧵
🚀 New paper on LLM reasoning 🚀 We present MathGAP, a framework for evaluating LLMs on math word problems with arbitrarily complex proof structures--resulting in problems that are challenging even for GPT-4o and OpenAI o1 💥 A thread 🧵 arxiv.org/pdf/2410.13502
🚀✨ OpenStax is proud to announce we have partnered with @GeminiApp to enable our library of resources to be discovered, searched, and available to users 18+ in the U.S.! Read more here: blog.google/products/gemin…
📚 We're also introducing new #Gemini features to help you learn more confidently. For example, Gemini will soon provide trustworthy responses based on textbooks from @OpenStax, a division of @RiceUniversity—including in-line citations and links to relevant peer-reviewed content.
Increasingly evident that LLM-grading is best done formatively and in tandem with human instructors, even when trained with student responses and good rubrics. Long answers are particularly challenging even for frontier models. Good work from @shashank_nlp arxiv.org/abs/2404.14316
AGI is gonna be wild! In the meantime, we have some problems.
Same analysis holds for prepositions of movement like (e) up, (f) down, (g) from, and (h) towards. Unsurprisingly, SDMs also fail for the hardest abstract category of particles, which include (i) on, (j) off, (k) with, and (l) without.