Chenchen Gu
@chenchenygu
CS @stanford
Prompt caching lowers inference costs but can leak private information from timing differences. Our audits found 7 API providers with potential leakage of user data. Caching can even leak architecture info—OpenAI's embedding model is likely a decoder-only Transformer! 🧵1/9
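Roughly, the audit idea can be sketched like this (hypothetical endpoint, model name, and parameters; a simplified illustration of timing-based cache detection, not the exact procedure from the paper):

```python
import time
import requests

# Hypothetical setup: PROVIDER_URL and API_KEY stand in for any LLM API
# that might cache prompt prefixes (possibly across users).
PROVIDER_URL = "https://api.example.com/v1/chat/completions"
API_KEY = "YOUR_API_KEY"

def time_request(prompt: str) -> float:
    """Send a prompt and return wall-clock latency in seconds."""
    start = time.perf_counter()
    requests.post(
        PROVIDER_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "example-model",
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 1,
        },
        timeout=60,
    )
    return time.perf_counter() - start

# A long shared prefix makes cache hits easier to detect in the timing signal.
long_prefix = "word " * 500

# First request: likely a cache miss (prefix not yet cached).
cold = time_request(long_prefix + "variant A")

# Second request reuses the same prefix: if the provider caches prompts,
# this one should come back noticeably faster.
warm = time_request(long_prefix + "variant B")

print(f"cold: {cold:.2f}s, warm: {warm:.2f}s")
# A consistently large cold-warm gap over many trials suggests prompt caching;
# if the cache is shared across users, timing alone can reveal that someone
# else already sent the same prefix.
```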

I will be presenting a poster on this @icmlconf Tuesday 11 am–1:30 pm! East Exhibition Hall #E-1204 (Had to reupload the original post because I accidentally deleted it)