
#81 Building AI RAG Systems, Vector Databases and Information Theory

Featured Guest: Dr. Majid Fekri, Co-Founder & CTO, Edge AI Innovations

In this episode, Dr. Majid explores the intersection of edge AI and information theory. He shares his vision of running AI efficiently on small devices while preserving privacy and reducing energy consumption.

Dr. Majid explains how his company developed a unique approach to retrieval augmented generation (RAG) for edge devices by redefining what "relevance" means through information theory. Their solution uses Shannon's information theory as the foundation to compress embedding vectors into a binary format, making similarity operations extremely fast and efficient while maintaining accuracy.

The conversation covers:

  • The vision behind running AI on edge devices instead of large cloud servers

  • How their universal relevance score helps control what context gets passed to language models

  • The mathematical foundations in information theory that make their approach unique

  • Dr. Majid's journey from atmospheric science to AI, and how his PhD research on weather forecasting led to these breakthroughs

  • The real-world applications and benefits of their technology in making RAG systems more accurate and reliable

Dr. Majid also discusses the challenges of deploying AI systems we don't fully understand, the mystery of vector spaces in neural networks, and how their approach provides more mathematical control over AI behavior.


Powered by SundayPyjamas AI Suite

About SundayPyjamas AI Suite:

An all-in-one platform for teams to collaborate with AI on content creation: streamline workflows with real-time editing, AI-powered assistance, secure access, and effortless export.

Learn more: suite.sundaypyjamas.com

Join the Pilot Program


The Vision: AI That Lives on Your Device

Dr. Majid's journey into edge AI began with a simple but powerful vision: AI should be accessible to everyone, not just tech giants with massive GPU clusters.

"My vision was that AI is going to change many things. It's going to change industry, how people work, finances, many aspects of people's lives... But going forward, it makes sense to run AI on small devices because there will be technology... especially large language models would become smaller and smaller, and on the other side, the computational capacity of devices has become better and better even on tiny devices like your phone."

This vision isn't just about accessibility—it's about fundamentally changing our relationship with AI:

"If that's the case, it makes a lot more sense for people to do that, the first thing being preserving your privacy and data and having control over your own AI system. The second thing is, of course, energy consumption. You preserve a lot of energy by putting less pressure on those giant servers."

The Problem: Knowledge Bottlenecks and Computational Limits

The challenge Dr. Majid identified was twofold: language models have limited knowledge despite being trained on vast datasets, and edge devices have severe compute and memory constraints.

"There is a limitation to how much knowledge these language models have. They have been trained on a vast amount of knowledge, but they have not incorporated all of that knowledge. So they have extract of everything, a tiny extract of all different knowledge base."

His solution? Apply retrieval augmented generation (RAG), but with a twist—redefining what "relevance" means through information theory:

"On tiny devices, the problem of space and computation is extreme... You don't have much memory or computation left to do the retrieval... and that's where our product comes in. We have redefined what similarity means in terms of like semantic similarity."

The Innovation: Shannon's Information Theory Applied to Modern AI

What makes Dr. Majid's approach unique is its foundation in Claude Shannon's theory of information:

"Our way of defining relevance is kind of revolutionary. It's unique. It combines information theory for encoding or Shannon's theory of information which compresses the signals of the vectors, gives you a binary, and then on the binary space you can do operations very fast and efficiently because you're just doing XOR, which is the most fundamental function that all computers do."

This approach not only makes operations faster but actually improves accuracy:

"It was getting a 40% higher score at the end for every question answer compared to Euclidean distance or cosine transform that are usually used in all similarity measurements."
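The pipeline the quote describes, compressing vectors into binary codes and comparing them with XOR, can be sketched generically. The sign-based quantization and Hamming distance below are illustrative assumptions, not Moorche's actual encoding:

```python
import numpy as np

def binarize(vectors: np.ndarray) -> np.ndarray:
    """Quantize float vectors to bits by sign, then pack 8 bits per byte."""
    bits = (vectors > 0).astype(np.uint8)
    return np.packbits(bits, axis=-1)

def hamming_scores(qcode: np.ndarray, codes: np.ndarray) -> np.ndarray:
    """XOR the packed codes, then count differing bits (Hamming distance)."""
    xor = np.bitwise_xor(codes, qcode)
    return np.unpackbits(xor, axis=-1).sum(axis=-1)

rng = np.random.default_rng(0)
docs = rng.standard_normal((4, 64))
query = docs[2] + 0.1 * rng.standard_normal(64)  # near-duplicate of doc 2

codes = binarize(docs)
qcode = binarize(query[None, :])[0]
dist = hamming_scores(qcode, codes)
print(dist.argmin())  # → 2 (nearest document by Hamming distance)
```

Because XOR and a population count are single-instruction operations on most CPUs, this comparison is dramatically cheaper than a floating-point dot product over the same dimensions.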

Universal Relevance Scores

Perhaps the most valuable aspect of Dr. Majid's work is the creation of a universal relevance score:

"People who are in the know of RAG development, they know how valuable it is to have a universal score. Because all these rerankers are trying to do it. But if you have the score from the get-go, you just know where to put your thresholds, which parts of the context you want to pass to LLM, which part you want to keep away from the LLM."

This creates powerful security benefits as well:

"Having this universal relevance score helps you to limit that... you put a threshold on the relevance limit and anything below that threshold is never passes the quality test to be passed to LLM. That also gives you leverage to create built-in security measures."
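As a rough illustration of that thresholding idea, here is a minimal gate that drops any retrieved chunk scoring below a cutoff before it can reach the LLM prompt. The chunk texts, scores, and the 0.7 threshold are invented for the example:

```python
def filter_context(chunks, threshold=0.7):
    """Keep only chunks whose relevance score clears the threshold;
    anything below it never reaches the LLM prompt."""
    return [text for text, score in chunks if score >= threshold]

retrieved = [
    ("Shannon entropy measures information content.", 0.91),
    ("Unrelated marketing copy.", 0.12),
    ("XOR is a single-cycle CPU instruction.", 0.78),
]
print(filter_context(retrieved))  # keeps the two chunks scoring >= 0.7
```

With a score that is comparable across queries, the threshold doubles as a built-in security gate: low-relevance (or adversarially injected) content is simply never passed downstream.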

From Weather Forecasting to AI

Fascinatingly, Dr. Majid's breakthrough came from his PhD work in atmospheric science, where he was trying to improve rain and snow forecasts:

"In my PhD defense, one of the problems is comparing forecast to observation—what does the accuracy of a forecast mean? I did a deep dive into that and I found that at the core of it, you just have a binary comparison of two binary signals."

This led him to information theory and a realization:

"When I learned about Shannon's theory, it was like a big turn on my head. It is not just mathematical, it's like a fundamental physics rule as well."
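The binary-signal comparison he alludes to can be made concrete with Shannon's mutual information, which measures in bits how much a forecast tells you about the observation. The toy sequences below are illustrative, not his forecast-verification data:

```python
import numpy as np
from collections import Counter

def mutual_information(x, y):
    """Shannon mutual information (in bits) between two binary sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), c in joint.items():
        pxy = c / n
        mi += pxy * np.log2(pxy / ((px[a] / n) * (py[b] / n)))
    return mi

forecast    = [1, 0, 1, 1, 0, 0, 1, 0]  # 1 = rain predicted
observation = [1, 0, 1, 0, 0, 0, 1, 1]  # 1 = rain observed
print(round(mutual_information(forecast, observation), 3))  # → 0.189
```

A perfect forecast of a balanced binary event would score 1.0 bit; an uninformative one scores 0, giving a single accuracy measure grounded in information theory rather than ad hoc skill scores.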

The Mystery of Neural Networks

Dr. Majid also offers a sobering perspective on our understanding of how language models actually work:

"We make a lot of assumptions about what's happening in the n-dimensional vectorial space of these LLMs or neural networks. But we don't know... the space of vectors, the n-dimensional space is an unknown space."

This has serious implications for AI safety:

"A lot of guardrails and security systems that are being built for AI, they use AI to monitor AI. Basically, they ask politely to AI that you should not do that... but if we understand the vectorial space and we can map them to lower dimensions... this gives us a lot more reliable control over what's happening."
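As a generic illustration of projecting a high-dimensional embedding space down to a few inspectable dimensions, here is PCA via SVD. This is a standard technique offered as a stand-in, not necessarily the mapping Dr. Majid has in mind:

```python
import numpy as np

def pca_project(X, k=2):
    """Project rows of X onto the top-k principal components via SVD."""
    Xc = X - X.mean(axis=0)          # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T             # coordinates in the top-k subspace

rng = np.random.default_rng(1)
embeddings = rng.standard_normal((100, 384))  # stand-in for LLM embeddings
low = pca_project(embeddings, k=2)
print(low.shape)  # → (100, 2)
```

Once vectors live in a space small enough to visualize and measure, thresholds and guardrails can be stated as explicit mathematical constraints rather than as polite instructions to another model.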

What's Next: Making It Available to Everyone

Dr. Majid and his team are preparing to launch their technology, which they've named "Moorche" (Persian for "ant"):

"Ants have a search algorithm that is one of the most efficient search algorithms that is there in nature. You know how the ants go around and find food? It's fascinating. Each ant is the explorer that goes and finds things."

Their approach promises to make retrieval augmented generation both more accurate and more efficient—potentially transforming how we implement AI on the edge.


Dr. Majid's work represents a fascinating convergence of classical information theory and cutting-edge AI. By rethinking how we measure relevance and similarity in vector spaces, the Edge AI Innovations team is making it possible to run sophisticated AI applications on devices with limited resources, all while potentially improving their accuracy and security.


Links:

https://www.edgeaiinnovations.com/

Dr. Majid’s Research Papers: https://scholar.google.ca/citations?user=iCsHW4EAAAAJ&hl=en

Moorche Serverless RAG Demo:

Further Reading about the Episode: https://suite.sundaypyjamas.com/share/23b32843-86dd-42cf-8367-18308bfd5666


Credits:

Designed by the Research Labs Platform

Produced by SundayPyjamas®

Created & Hosted by Prathamesh Patel

Podcast Strategy & Production by Nishi Panchal

Music: Ukiyo - Calling (EP)

Copyright of SundayPyjamas Inc. 2025