Distillative AI
Have we heard enough about Generative AI?
Maybe we have and maybe we haven’t. Distillative AI is like Generative AI in that it makes use of transformer models and diffusion models. But don’t worry, it’s different.
Functional Goals
There are some key aspects of Distillative AI that make it almost the opposite of Generative AI from a functional perspective. The functional goal of Generative AI is to generate content, while the functional goal of Distillative AI is to distill verbose, raw, impure content into processed and refined results. So Distillative AI is functionally the opposite of Generative AI: rather than talking about generating more content, we’re talking about paring it down. And there’s a lot of content out there, trust me.
Functional Planning for Function-First Fashioning
Distillative AI takes an end-to-end approach that considers not just the inference closest to the user but also the underlying model training required to get consistent, accurate results efficiently and reliably, with high availability and performance.
Transfer Learning by Adaptive Distillation
Adaptive distillation is a technique used in machine learning to transfer knowledge from a teacher model to a student model. It involves training the student model on a subset of the input data that it has not seen before, while also providing it with access to the predictions made by the teacher model for those same inputs.
The idea behind adaptive distillation is that the student model can learn from the teacher model’s predictions and use this knowledge to make its own predictions on new, unseen data. By doing so, the student model can improve its performance without having to be trained on a large amount of labeled data.
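To make that concrete, here’s a minimal sketch of a single distillation training step in PyTorch. The `student`, `teacher`, temperature, and blend weight are illustrative assumptions on my part, plain soft-target distillation rather than any particular Distillative AI implementation.

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, x, labels, optimizer,
                      temperature=2.0, alpha=0.5):
    # Illustrative sketch: ordinary soft-target distillation, not a spec.
    teacher.eval()
    with torch.no_grad():                      # teacher only supplies targets
        teacher_logits = teacher(x)

    student_logits = student(x)

    # Hard loss against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    # Soft loss: KL divergence between softened student and teacher outputs.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    loss = alpha * hard_loss + (1 - alpha) * soft_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The student sees both the labels and the teacher’s softened predictions, which is what lets it improve without a mountain of labeled data.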
Attention Is All You Need And All I Need Is The Attention I Want
In traditional attention mechanisms, the model assigns relevance weights based on a fixed set of inputs and outputs. This can be limiting as different parts of the input data may have varying levels of importance for the task at hand.
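For reference, that fixed weighting scheme is the familiar scaled dot-product attention; here’s a textbook sketch (not code from any Distillative AI system):

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, seq_len, d_model). Relevance weights come from a
    # fixed query/key comparison, regardless of which parts actually matter.
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / (d_k ** 0.5)
    weights = F.softmax(scores, dim=-1)
    return weights @ v
```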
Adaptive distillation differs from traditional distillation in that it allows the student model to adaptively choose which inputs to use for learning and how much weight to give the teacher’s predictions versus its own. This lets the student model learn more efficiently and effectively over time, as it can focus more and more on the parts of the input data that are most relevant to its tasks.
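One way to picture the adaptive part: let the student decide, per example, which inputs are worth learning from and how much to trust the teacher. The entropy-based selection and confidence-based blending below are illustrative assumptions of mine, not a prescribed algorithm.

```python
import torch
import torch.nn.functional as F

def adaptive_targets(student_logits, teacher_logits, keep_fraction=0.5):
    # Illustrative: pick the inputs the student is least sure about and
    # blend teacher/student distributions by relative confidence.
    student_probs = F.softmax(student_logits, dim=-1)
    teacher_probs = F.softmax(teacher_logits, dim=-1)

    # Student uncertainty: entropy of its own predictive distribution.
    entropy = -(student_probs * student_probs.clamp_min(1e-9).log()).sum(dim=-1)

    # Keep only the most uncertain examples for this distillation step.
    k = max(1, int(keep_fraction * student_logits.size(0)))
    selected = entropy.topk(k).indices

    # Trust the teacher more where it is more confident than the student.
    teacher_conf = teacher_probs.max(dim=-1).values
    student_conf = student_probs.max(dim=-1).values
    w = (teacher_conf / (teacher_conf + student_conf)).unsqueeze(-1)

    blended = w * teacher_probs + (1 - w) * student_probs
    return selected, blended[selected]
```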
Guardrails in Production, but Not Always in Dev
Safety and reliability are not only about the popular plan of putting some new type of “guardrail” in place to keep a dynamic thing from straying outside its intended behavior zone. These are the same old governance land grabs we see every time a disruptive technology emerges. We saw it with cloud computing, with social media and GDPR, with the Internet, with automobiles, and with everything else. And that’s fine; it’s the cycle of life or whatever, so governance is a necessary evil, or else how would we know what good is, right? The problem with having guardrails in development environments is that the guardrails tend to significantly reduce the velocity of AI development. The “guardrails” are often just exception handlers, such as a chatbot that thinks the best answer to anything is to mention ethics without really doing any ethical analysis of the topic, due to the abnormal termination of the program.
Distillative AI Implements Next-generation Guardrails
Functionally Atomic Neural Networks are an implementation of next-gen guardrails. But what are next-gen guardrails anyway? Next-gen guardrails are behavioral constraints built around the application, where step-by-step execution flows provide consistent and reliable results.
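As a rough illustration of behavioral constraints built around the application, here’s a hypothetical step-by-step execution flow where every step’s output is checked before the next step runs. The step names and checks are made up for illustration; they’re not a Distillative AI API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    name: str
    run: Callable[[str], str]      # transforms the working text
    check: Callable[[str], bool]   # behavioral constraint on the output

def execute_flow(steps: list[Step], text: str) -> str:
    # Run each step in order; stop predictably if a constraint is violated.
    for step in steps:
        result = step.run(text)
        if not step.check(result):
            raise ValueError(f"step '{step.name}' violated its constraint")
        text = result
    return text

# Hypothetical flow: truncate to a summary length, then strip stray bytes.
flow = [
    Step("summarize", run=lambda t: t[:500], check=lambda t: len(t) <= 500),
    Step("sanitize", run=lambda t: t.replace("\x00", ""), check=lambda t: "\x00" not in t),
]
```

The point is that each step either meets its constraint or fails loudly at a known place, rather than wandering outside the intended behavior zone.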
You can have both Client-side Automation & Distillation
DRY: Don’t repeat yourself. If a developer is constantly repeating themselves to get the result they want, then they should stop repeating themselves. There are two approaches: one is to automate the repetition, and the other is to fine-tune the process that creates the output. Distillative AI takes both approaches, but primarily fine-tunes the process that creates the output. This is a holistic approach, and it produces much higher efficiency than client-side automation alone. The efficiency gains improve over time, whereas client-side automation alone is essentially a highly useful and efficient form of technical debt.
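To contrast the two approaches, here’s a deliberately simple sketch: the first function automates the repetition on the client side (re-prompt until the output passes), while the second folds the repeated corrections into training examples so the process itself can be fine-tuned. `call_model`, `is_acceptable`, and the transcript format are hypothetical stand-ins, not real APIs.

```python
# Approach 1: client-side automation -- automate the repetition.
def automate_repetition(call_model, prompt, is_acceptable, max_tries=5):
    # Keep re-asking: useful leverage, but essentially technical debt.
    for _ in range(max_tries):
        output = call_model(prompt)
        if is_acceptable(output):
            return output
        prompt = f"{prompt}\n\nThat was not acceptable. Try again."
    raise RuntimeError("gave up after repeated attempts")

# Approach 2: fine-tune the process that creates the output.
def build_finetune_examples(transcripts):
    # Turn repeated corrections into (prompt, accepted_output) pairs so a
    # later fine-tuning run can bake the fix into the model itself.
    return [
        {"prompt": t["prompt"], "completion": t["accepted_output"]}
        for t in transcripts
        if t.get("accepted_output")
    ]
```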
Responsible Use of Leverage: Technical Debt with Low Interest Rates
If Technical Debt isn’t so bad in some cases, can we call it Technical Credit? I mean, hey, it is what it is (technical debt). Technical debt is best avoided; but if you’re gonna borrow, get good interest rates and perks where you can. That’s how I like to think of client-side automation. It’s good to be able to use a little bit of leverage, but you don’t want to use too much. It’s okay in the short term, but there’s a more efficient and cost-effective long-term solution. This is why Distillative AI combines client-side automation with eventually persistent models. Over time, small, mature models evolve as experts. Mixtures of experts can be trained with ensemble methods, and so on.
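As a closing sketch, a minimal gated mixture of small expert models might look like this in PyTorch; the sizes and the softmax gate are illustrative assumptions, not a prescription for how mature experts actually get combined.

```python
import torch
import torch.nn as nn

class SmallMixtureOfExperts(nn.Module):
    # A few small "mature" experts combined by a learned softmax gate.
    def __init__(self, d_in=64, d_out=16, n_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_in, 32), nn.ReLU(), nn.Linear(32, d_out))
            for _ in range(n_experts)
        )
        self.gate = nn.Linear(d_in, n_experts)

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)                  # (batch, n_experts)
        outputs = torch.stack([e(x) for e in self.experts], dim=-1)    # (batch, d_out, n_experts)
        return (outputs * weights.unsqueeze(1)).sum(dim=-1)

# Usage: moe = SmallMixtureOfExperts(); y = moe(torch.randn(8, 64))
```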