Ramya Ravi
@Ramya_ravi19
Product Marketing Engineer @Intel | MSU Spartan | Nature lover | Opinions are my own
Excited to share my latest @towards_AI article! In this article, I break down ⚙️ What LLM serving frameworks are 🛠️ Why they matter for hosting large language models 📌 Key frameworks you should know 👉 Read here: medium.com/towards-artifi… #AI #LLMs #GenAI #DeepLearning
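For a taste of what a serving framework handles, here's a minimal offline-inference sketch with vLLM, one widely used serving framework; the model and sampling settings are placeholders, and I'm not assuming it's one of the frameworks the article covers:

```python
# Minimal sketch: offline batched inference with vLLM, one popular LLM serving framework.
# Model name and sampling settings are illustrative, not taken from the article.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # small model for a quick local test
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["What does an LLM serving framework do?"], params)
for out in outputs:
    print(out.outputs[0].text)
```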
Building #GenAI apps with @OPEAdev? This article covers orchestration, microservices, and real-time data using Amazon Bedrock and OpenSearch: intel.ly/3HNb7Lc
What makes OPEA work for scalable GenAI apps? With Amazon Bedrock and OpenSearch, it brings orchestration, RAG, and microservices into one integrated stack: intel.com/content/www/us…
LLMs generate. Agents act. But the real magic? When they work together. In this article, I break down: 🔗 How they complement each other 🧠 When to use which — or both 👉 Read more: medium.com/@ramyaravi19/l… #AI #LLMs #AIAgents #ArtificialIntelligence #GenAI
Explore how OPEA, AWS Bedrock, and #OpenSearch simplify building #RAG pipelines, agents & more. Built for #developers who want to move from prototype to production with confidence. Read more: intel.com/content/www/us… #AI #GenAI #AIAgents #AWSBedrock @OpenSearchProj @OPEAdev
Confused about when to fine-tune a model, use adapters like LoRA, or build with RAG? If you're working with foundation models and want to customize smarter, not harder, this guide is for you. Read here: medium.com/@ramyaravi19/l… #FineTuning #AI #MachineLearning #DeepLearning
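As a quick illustration of the adapter route, here's a minimal LoRA sketch with Hugging Face PEFT; the base model and hyperparameters are placeholders, not recommendations from the guide:

```python
# Minimal sketch: attaching a LoRA adapter with Hugging Face PEFT instead of full fine-tuning.
# Base model and hyperparameters are illustrative placeholders.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")
lora_cfg = LoraConfig(
    r=8,                        # low-rank dimension of the adapter matrices
    lora_alpha=16,              # scaling factor applied to the adapter output
    target_modules=["c_attn"],  # attention projection in GPT-2
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only a small fraction of weights are trainable
```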
🚀 Smarter, Faster #AI with Mixture of Experts (#MoE) 🤖 MoE lets AI models scale efficiently by activating only what’s needed. In my Medium post, I cover: 🔹 What MoE is 🔹 How it works 🔹 Why it matters for #GenAI & #LLMs 📖 Dive in: medium.com/@ramyaravi19/w… #DeepLearning
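Here's a toy top-k routing sketch of the "activate only what's needed" idea; it's a simplified illustration, not code from the post, and real MoE layers add load balancing, capacity limits, and batched expert dispatch:

```python
# Toy Mixture-of-Experts routing: a gate picks the top-k experts per token,
# so only a fraction of the parameters run for each input.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_experts)])
        self.top_k = top_k

    def forward(self, x):                        # x: (tokens, dim)
        scores = self.gate(x)                    # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):           # only the selected experts run
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

moe = TinyMoE()
print(moe(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```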
Quantization using torch.export graph mode is easier and more efficient than eager mode. And in @PyTorch 2.7 it's available on Intel GPUs. Learn how to get started: youtube.com/watch?v=nek7u5…
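A rough sketch of the torch.export (PT2E) graph-mode quantization flow; the Intel GPU (XPU) quantizer import path is my assumption based on the PyTorch 2.7 release notes and the x86 analogue, so check the video for the exact API:

```python
import torch
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
# Assumption: PyTorch 2.7 exposes an Intel-GPU (XPU) quantizer mirroring the x86 one;
# verify the exact import path against the release notes / video.
from torch.ao.quantization.quantizer.xpu_inductor_quantizer import (
    XPUInductorQuantizer,
    get_default_xpu_inductor_quantization_config,
)

model = torch.nn.Sequential(torch.nn.Linear(128, 128), torch.nn.ReLU()).eval().to("xpu")
example = (torch.randn(8, 128, device="xpu"),)

# 1. Capture the graph with torch.export (graph mode, no eager-mode module swapping)
exported = torch.export.export_for_training(model, example).module()

# 2. Insert observers, calibrate, then convert to a quantized graph
quantizer = XPUInductorQuantizer()
quantizer.set_global(get_default_xpu_inductor_quantization_config())
prepared = prepare_pt2e(exported, quantizer)
prepared(*example)                 # calibration pass(es)
quantized = convert_pt2e(prepared)

# 3. Compile the quantized graph for the Intel GPU backend
compiled = torch.compile(quantized)
print(compiled(*example).shape)
```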
Ever tried searching for “more ladybugs than flowers”? We did. And the AI nailed it. Fine-tuned LLMs can really deliver when trained on the right datasets. This demo shows what happens when #Qwen3 models are optimized and deployed on Intel hardware. Read the full article:…
Still prompt-and-hope? Agentic LLMs plan, adapt, and act—turning AI into an active problem solver. This blog breaks down how devs are using AutoGen, multi-agent design, and low-code tools to build smarter apps—featuring insights from Microsoft’s Daron Yöndem at Intel AI…
On devices with Intel® Core™ Ultra processors, you can unlock full acceleration using WebNN—thanks to integrated NPUs. Full dev guide → intel.ly/3Gc5yoV
Frameworks like WebLLM & Transformers.js are designed to run LLMs inside the browser using WebGPU, WebNN, and WebAssembly. No server required. No cloud roundtrips. Just fast, local inference. • WebGPU = GPU compute • WebNN = NPU/CPU/GPU inference • Wasm = near-native speed
LLMs in the browser? It’s not sci-fi. You can now run chatbots, summarizers, and other AI tools entirely in-browser—with JavaScript frameworks and hardware-accelerated APIs. Here's how it works
Automated prompt engineering on-device—no fine-tuning, no RAG. This new guide shows how to use #DSPy with Intel #oneAPI and llama.cpp to boost task accuracy from 📉 35% → 📈 78% Run LLMs locally, optimize efficiently. Read the guide → intel.ly/4lBzYRw
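A minimal sketch of what that kind of setup could look like, with DSPy pointed at a local llama.cpp OpenAI-compatible server; the endpoint, model name, examples, and metric are placeholders rather than the guide's actual configuration:

```python
# Minimal sketch: automated prompt optimization with DSPy against a local llama.cpp server.
# Assumes llama.cpp is running its OpenAI-compatible server (e.g. `llama-server -m model.gguf`).
# Endpoint, model name, dataset, and metric are placeholders, not the guide's actual setup.
import dspy

lm = dspy.LM(
    "openai/local-llama",                 # routed through the OpenAI-compatible API
    api_base="http://localhost:8080/v1",  # llama.cpp server address (assumption)
    api_key="not-needed",
)
dspy.configure(lm=lm)

qa = dspy.ChainOfThought("question -> answer")

# A tiny train set and an exact-match style metric drive the prompt search.
trainset = [
    dspy.Example(question="2 + 2 = ?", answer="4").with_inputs("question"),
    dspy.Example(question="Capital of France?", answer="Paris").with_inputs("question"),
]
metric = lambda gold, pred, trace=None: gold.answer.lower() in pred.answer.lower()

optimizer = dspy.BootstrapFewShot(metric=metric)
optimized_qa = optimizer.compile(qa, trainset=trainset)
print(optimized_qa(question="3 + 5 = ?").answer)
```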
Big step forward for long-context #LLM benchmarks. We’re excited to share HELMET — a benchmark co-developed by Intel and @Princeton University to evaluate models across real-world, long-context tasks. Evaluate any LLM. Extensible context lengths. Built for scale on Intel…
With insights from Intel AI DevSummit 2025, Ramya Ravi shares key considerations for fine-tuning and self-hosting large language models. Read the blog: intel.ly/4jvVJAw
Learn how to get started running models from the Hugging Face Hub on Intel Gaudi AI accelerators: youtube.com/watch?v=ibpsVj… #IamIntel
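A minimal sketch of loading a Hub model on a Gaudi device; it assumes the Intel Gaudi PyTorch bridge is installed, the model name is a placeholder, and the video may use the Optimum-Habana library instead:

```python
# Minimal sketch: running a Hugging Face Hub model on an Intel Gaudi accelerator.
# Assumes the Intel Gaudi software stack (habana_frameworks PyTorch bridge) is installed;
# the model name is a placeholder and the video may use the Optimum-Habana library instead.
import torch
import habana_frameworks.torch.core as htcore  # registers the "hpu" device
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to("hpu").eval()

inputs = tokenizer("Intel Gaudi accelerators are", return_tensors="pt").to("hpu")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```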