Diptanu Choudhury

@diptanu

Founder @tensorlake. Past - AI and Distributed Systems at @meta, @hashicorp, @linkedin and @netflix

San Francisco, CA

Joined December 2007

697 Following

3K Followers

Pinned

Diptanu Choudhury@diptanu · May 15

Excited to announce @tensorlake Cloud! 🧵 Tensorlake converts real-world documents into clean, structured data for business workflow automation and for building Agents in mission-critical documents. It's powered by a state-of-the-art document layout understanding model trained…

Tensorlake@tensorlake · May 15

Announcing Tensorlake Cloud: up-leveling document ingestion and workflows for building agentic applications and complex business workflows.

17.0K

Pinned

Diptanu Choudhury@diptanu · Jul 21

Our OCR Models are really good at extracting text and other visual information from any image into structured data! One of our partner energy companies recently tried using it to extract structured data from plates on transformers and it worked surprisingly well. From the very…

Tensorlake@tensorlake · Jul 21

Structured extraction from images powers a lot of real-world agentic use cases, such as validating license plates and driving licenses, or pulling information from invoices captured as images. Our Document Ingestion API allows you to extract data from millions of images without spinning up…

1.0K

Diptanu Choudhury@diptanu · 13 h

Streamable HTTP is the perfect transport for ambient agents using MCPs. Making it easy to resume a broken stream would be important. I can see them being useful for building deep-research-type workflows where the client can offload the long-running task and optionally…

206

Diptanu Choudhury@diptanu · 14 h

The FastMCP docs are a great resource to learn and understand how MCP works and what it is about. If you are new to MCP, start here, and then read the protocol and other official docs. gofastmcp.com/tutorials/mcp

An introduction to the core concepts of the Model Context Protocol (MCP), explaining what it is, why it's useful, and how it works.

172

Diptanu Choudhury@diptanu · Jul 25

Serverless is cool because compute costs scale with business outcomes.

164

Diptanu Choudhury@diptanu · Jul 25

We just shipped a feature to detect Document Structure Hierarchy in @tensorlake to improve chunking for research papers, where content is usually hierarchical. The alternative is to use LLMs and post-process, which is slower and more expensive.

194
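
To illustrate why hierarchy matters for chunking, here is a minimal sketch that splits a document into one chunk per section and records each chunk's heading trail. It assumes the hierarchy is already available as Markdown-style `#` headings; `Chunk` and `chunk_by_headings` are hypothetical names for illustration and are not Tensorlake's API or detection method.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    path: list[str]  # heading trail, e.g. ["Paper", "Methods", "Data"]
    text: str        # body text found directly under that heading


def chunk_by_headings(markdown: str) -> list[Chunk]:
    """Split Markdown-style text into one chunk per heading section."""
    chunks: list[Chunk] = []
    trail: list[str] = []  # headings from the root down to the current section
    body: list[str] = []   # lines collected for the current section

    def flush() -> None:
        # Emit the collected body (if any) under the current heading trail.
        text = "\n".join(body).strip()
        if text:
            chunks.append(Chunk(path=list(trail), text=text))
        body.clear()

    for line in markdown.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("#"):
            level = len(stripped) - len(stripped.lstrip("#"))
            title = stripped[level:].strip()
            flush()
            # Drop headings at this level or deeper, then add the new one.
            del trail[level - 1:]
            trail.append(title)
        else:
            body.append(line)
    flush()
    return chunks


doc = "# Paper\nIntro text.\n## Methods\nWe did X.\n### Data\nRows.\n## Results\nGood."
for chunk in chunk_by_headings(doc):
    print(" > ".join(chunk.path), "|", chunk.text)
```

Keeping the heading trail with each chunk lets a retriever tell "Methods > Data" apart from "Results" even when the body text is similar. The sketch does not handle skipped heading levels specially.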

Diptanu Choudhury Retweeted

Chris@criccomini · Jul 21

The stuff going on in SlateDB's #zerofs Discord channel right now is bonkers. ZeroFS now has:
- NFS server
- NBD server
- Full encryption
Checkpoints (and disk branching) are in progress... 🤯 discord.gg/NNtVmsPU

1.0K

Diptanu Choudhury Retweeted

Tensorlake@tensorlake · Jul 22

Route contracts efficiently and confidently by quickly extracting key information (e.g. the buyer and seller of a property) and detecting the presence of signatures. With a single API call, your workflow just got faster.

434

Diptanu Choudhury@diptanu · Jul 22

This blog from Snowflake is a must-read for AI Engineers. They show that adding document-level global context to chunks makes retrieval better. And, probably to no one's surprise, better retrieval using a well-tuned ingestion and document pre-processing pipeline closes the gap…

216
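
As a rough sketch of the idea of adding document-level context to chunks, the snippet below prepends a document title and summary to each chunk before it would be embedded. The header format and the `add_document_context` helper are assumptions for illustration; the Snowflake post may construct its context differently.

```python
def add_document_context(title: str, summary: str, chunks: list[str]) -> list[str]:
    """Prefix every chunk with the same document-level header."""
    # Assumed header layout; any stable, compact description of the document works.
    header = f"Document: {title}\nSummary: {summary}"
    return [f"{header}\n\n{chunk}" for chunk in chunks]


contextualized = add_document_context(
    title="Q2 Earnings Report",
    summary="Quarterly financial results and outlook.",
    chunks=["Revenue grew year over year.", "Operating margin was flat."],
)
for text in contextualized:
    print(text)
    print("---")
```

Each chunk now carries enough global context to be matched by queries that mention the document but not the chunk's own wording.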

Diptanu Choudhury@diptanu · Jul 20

I read posts every day where people report that full-document retrieval doesn’t work well in real life, but every single time a new model with 1M+ context length comes out I read that retrieval algorithms and document pre-processing techniques are dead. reddit.com/r/Rag/s/jRfpIj…

214

Diptanu Choudhury@diptanu · Jul 19

We found this trick to be really useful for making small LLMs like Qwen3-4B understand concepts in figures better for RAG applications! Mermaid diagrams explain concepts and relationships between components illustrated in flowcharts much better than normal figure summaries!

Tensorlake@tensorlake · Jul 19

Small LLMs struggle to answer questions from research reports with a lot of visual content describing algorithms. We have seen that summarizing figures as Mermaid diagrams helps LLMs like Qwen 3 4B learn about concepts illustrated in diagrams much better than just text summaries!…

455

Diptanu Choudhury@diptanu · Jul 18

One of our customers told me today the biggest lift for structured extraction with @tensorlake is that their engineering team can now tweak the schema they want to extract from documents every week as they evolve their insurance platform. It’s the little things like this that…

1.0K

Diptanu Choudhury@diptanu · Jul 17

A common problem with structured extraction from documents with OpenAI is the limit on the number of properties that can be extracted, along with nesting limits. We just shipped an update to @tensorlake's Document Ingestion API to extract an unlimited number of properties and nesting!…

Tensorlake@tensorlake · Jul 17

Have you wished you could extract more than 100 structured data fields, or Pydantic objects with more than 5 levels of nesting, with OpenAI? Our Document Ingestion API now supports an unlimited number of structured data fields, and any amount of nesting! A single schema is all you need…

263

Diptanu Choudhury@diptanu · Jul 14

Just saw a Waymo make a right turn from the middle lane while my Uber was to its right at an intersection. How do these systems work? Assuming these are hybrid systems with many models + some amount of heuristics involved to decide if it can make a turn legally and safely.

347

Diptanu Choudhury@diptanu · Jul 5

I find it interesting that Polars calls their new cloud “serverless” but they launch the compute infrastructure inside customers' clouds. I am guessing their pilot customers asked them for data residency. This is a challenge at the moment for AI/ML data companies.

360

Diptanu Choudhury@diptanu · Jul 4

Event-driven systems are great for scalability. I have seen them get hard to maintain over time as more events are added to support new features. You have to reason about how the application state changes as various permutations of event sequences are applied. I like LangGraph’s…

Harrison Chase@hwchase17 · Jul 3

This is langgraph. Most people think of langgraph as agent abstractions, but it’s powered by a low-level event-driven framework under the hood. If we exposed that, would people be interested? Or focus more on agent abstractions?

4.0K

Diptanu Choudhury@diptanu · Jul 3

The algorithm really rewards shitposting on emerging topics. Is there an AI agent to automate slop production?

297

Diptanu Choudhury@diptanu · Jun 27

We run 4 different models for parsing documents in @tensorlake's document ingestion pipeline. We implemented dynamic scale outs in our workflow engine to make the bigger and slower model eat up GPU capacity in our cluster to run ingestion faster!

907

Diptanu Choudhury@diptanu · Jun 26

Terraform wouldn’t have replaced Puppet/Chef if cloud-native primitives like ELBs and subnets didn’t become essential. I don’t see Terraform being replaced for what it does well. But the future lies in abstractions that let devs ship workflows (and services) without thinking about…

Mitchell Hashimoto@mitchellh · Jun 25

Terraform is still the best. But I'd like to see someone replace it. The major alternatives aren't interesting to me cause they're too iterative and copycat. I want to see fundamentally new ideas take hold. IaC feels stagnant.

634