Infinite Machine Learning : Compressed

A few weeks ago, Bittensor co-founder Ala Shaabana appeared on the Infinite Machine Learning podcast with host Prateek Joshi.

Prateek asked all the right questions, so, as with the Based Space Podcast, we decided to edit, transcribe and pare the podcast down into something readable. This transcription leaves a fair bit of the interview out, so refer to the original podcast if you want a deeper dive into topics such as basic network architecture.

Q: Can you speak to the shortcomings of centralized AI? Why should we decentralize it?

A: The way we see it, there are three inherent issues in AI today.

The first is that AI is non-compounding.

There's research being performed every year - researchers will take a paper, tinker with it, make it better, and get better performance - so the research itself is compounding. But the AI itself isn't: we're always training from scratch. The knowledge is not being shared - the models are siloed completely. This is inefficient and wasteful.

The second issue is that - as it stands today - if someone wants to access powerful compute or AI resources, they need to either be working at a big company or an academic institution with lots of funding. So if it's just me, I'm not going to be able to compete.

In many ways, we're still living in the 1970s - but we're at a nexus point. It will no longer be these big computers operating everything but instead a bunch of smaller computers operating together, interconnected, like Bitcoin - which is essentially a gigantic, decentralized computer now. But all it's doing is guessing hashes and not really doing anything useful - so what if it was? What if we applied these computers to AI, how far could we get?

I don't think AGI can be achieved by just one company, one lab or any university - it's going to take everyone's ingenuity, everyone's talents, and everyone's intelligence put together.

Q: Can you explain how the Bittensor network works in practice?


I can explain it by saying that we're basically building a decentralized mixture-of-experts model. A mixture-of-experts model is essentially just a way to expand the capacity of one model by having it contain several sub-models within it. This model will intelligently pick which sub-model to route a specific input to - the one that will perform the right computation on it - to get the right result.
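As a rough illustration of the routing idea described above, here is a minimal sketch of a mixture-of-experts model in plain Python. Everything in it is an assumption made for illustration: the class name, the toy "experts", and the linear gate are hypothetical and are not Bittensor's implementation. The gate scores each sub-model for a given input, and the input is sent to the highest-scoring one.

```python
import math
from typing import Callable, List, Sequence, Tuple

Expert = Callable[[Sequence[float]], float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw gate scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


class MixtureOfExperts:
    """Toy mixture-of-experts: a gate picks one sub-model per input (top-1 routing)."""

    def __init__(self, gate_weights: List[List[float]], experts: List[Expert]):
        # One (hypothetical) gate weight vector per expert.
        assert len(gate_weights) == len(experts)
        self.gate_weights = gate_weights
        self.experts = experts

    def gate(self, x: Sequence[float]) -> List[float]:
        # Linear score per expert, then softmax over the scores.
        scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in self.gate_weights]
        return softmax(scores)

    def forward(self, x: Sequence[float]) -> Tuple[int, float]:
        probs = self.gate(x)
        chosen = max(range(len(probs)), key=probs.__getitem__)
        return chosen, self.experts[chosen](x)


def sum_expert(x: Sequence[float]) -> float:
    """Stand-in sub-model; a real expert would be a neural network."""
    return sum(x)


def max_expert(x: Sequence[float]) -> float:
    """Second stand-in sub-model."""
    return max(x)


moe = MixtureOfExperts(
    gate_weights=[[1.0, 0.0], [0.0, 1.0]],
    experts=[sum_expert, max_expert],
)
chosen_a, output_a = moe.forward([3.0, 1.0])  # gate favours the first expert
chosen_b, output_b = moe.forward([1.0, 3.0])  # gate favours the second expert
```

In a trained model the gate weights are learned together with the experts, and in practice more than one expert is often consulted per input. The top-1 choice here just keeps the sketch short.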

When Jacob and I first started building together, we were working on a decentralized mixture-of-experts model, but without the blockchain. We found it was possible to have several models living on separate computers, training together so they were essentially exchanging information, working in the same modality, and learning from each other.

There are other examples of decentralized AI out there, but there are some key problems with them. First of all, the models will generally be working on the same problem, so it just becomes a competition of who can solve it first. Secondly, there is no incentive. At Bittensor, the users of the system are incentivized and everyone is working on a different problem. There are aspects of these problems that they can share information about, and things they can learn from each other.

In the network, you have many nodes (neurons), and each of them contains one neural network. Right now, all the neurons are focused on one modality - text - and we're not going to move on to others until we've fully solved that problem.

Q: Can you explain TAO?


TAO is our incentive mechanism. It’s the way that we incentivize miners and people to join and contribute, whether that’s compute, or a neural network - whatever it is that you contribute, you get rewarded with TAO.

Seeing as the entire team is made up of AI engineers though, we’re not a crypto company. We’re not crypto developers that are doing AI, we’re AI developers using cryptocurrency to solve a problem.

TAO itself was launched fairly: it was never pre-mined, there was no ICO, and we’re following the Bitcoin curve. That means that there are only ever going to be 21 million TAO, and the only way to obtain TAO is to mine it. You can’t buy it anywhere, you can’t sell it anywhere and you can’t do much with it unless you have already mined it.
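To see why a Bitcoin-style curve leads to a hard cap of 21 million, the sketch below adds up the block rewards of a halving schedule. The interview does not give TAO's block reward or halving interval, so the numbers used are Bitcoin's published parameters (50 coins per block, halving every 210,000 blocks); the function name is hypothetical. Each era issues half as much as the one before, so the total is a geometric series that approaches 50 × 210,000 × 2 = 21,000,000.

```python
def total_supply(initial_reward: float, halving_interval: int, eras: int = 64) -> float:
    """Sum the coins issued over `eras` halving periods of a Bitcoin-style schedule."""
    total = 0.0
    reward = initial_reward
    for _ in range(eras):
        total += reward * halving_interval  # coins issued during this era
        reward /= 2                         # the block reward halves each era
    return total


# Bitcoin's parameters, used as a stand-in because TAO's are not stated in the interview.
BITCOIN_INITIAL_REWARD = 50.0
BITCOIN_HALVING_INTERVAL = 210_000

supply = total_supply(BITCOIN_INITIAL_REWARD, BITCOIN_HALVING_INTERVAL)
first_era = total_supply(BITCOIN_INITIAL_REWARD, BITCOIN_HALVING_INTERVAL, eras=1)
```

Half of the eventual supply is issued in the first era alone, which is why early participation matters so much under this kind of schedule. The real Bitcoin total lands slightly below 21 million because rewards are rounded down to whole satoshis; this idealized version ignores that.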

Q: Can you speak to the energy consumption of the network?


It depends on how you approach the question. To start off, as AI engineers we can't compare ourselves to something like Bitcoin because the energy use there will likely be much higher. It's a larger network and proof-of-work algorithms are much harsher on GPUs than regular training or inference.

Generally speaking, Bitcoin introduced the concept of computers that can vote with power. In a proof-of-stake system, like Ethereum, voting happens with monetary wealth. That sort of consensus mechanism is important because that's how you build trust in a network - that's how you know who is legitimate. Because of that, Bitcoin created a market that was precise in what it needed, which was the validation of transactions.

So we are arguing that we can use the same market mechanism to decentralize AI - and doing this gives us the power to tap into Bitcoin-scale compute. At the same time, it gives us the power to tap into a global hive mind in a borderless, decentralized system. So you end up with a massive, interconnected neural network, in a sense, because you are stitching together the outputs of one neural network to the inputs of another. This creates a giant hive mind of knowledge that is compounding over time.

Q: So if there are only 4096 nodes in the network, doesn't this limit the network size? How will Bittensor grow? Won't more people want to participate?


We capped it at 4096 ourselves, and we're not allowing more than that. The reason is that we're trying to solve this problem in a controlled environment, and we want to make sure it works at that number. We want to get to something close to, or at, the state-of-the-art at 4096 nodes. It's still a nascent network and we need to be able to validate it first. This number will grow - we're going to double it, triple it - but until the point is reached where we can confidently say that the algorithm works, we're going to keep it there.

Q: What is the incentive for someone to join the network? Say, someone with a powerful GPU, but also just the average person.


For the average person, the incentive is that, first of all, mining is a lot of fun. More importantly, there is the opportunity to earn back the cost of the compute they have spent. Compared with Bitcoin, TAO requires much less power, and it's relatively easy on GPUs. I can't speculate on the price, but it's out there, you can go look it up.

More important for us though, are the AI engineers. These are the people who are going to drive the project - they are going to bring the diversity that we need, and they are going to come up with new and interesting architectures - things we haven't seen before. For this group, the incentive will be the quality of logits and embeddings they are going to receive from the network that they can use to train their models. That's really where we're going to shine because we're getting so much diversity in the network.

Q: How will this network be administered and governed?


For the time being, the network is entirely handled by the Opentensor Foundation. We're the ones guiding it in the right direction and making sure it's growing as it should.

In the future, we don't want this sort of control. We don't believe that it should reside within the control of one entity because then we're no different from corporate AI. What we want is something similar to a DAO - though I don't like that word because it's tied to so many negative connotations.

Effectively, we're exploring something like a locking mechanism for TAO. This means that if you have some TAO and you want to participate in network decisions, you can stake some of it to a Foundation Validator in exchange for voting rights. When you do this, you would also earn dividends on your TAO. We would keep a small percentage to keep the Foundation running, but you would earn back the majority, in addition to receiving voting rights, so you can vote on any changes that we make.
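The dividend split described above comes down to simple arithmetic, sketched below. All names and figures are hypothetical: the interview says only that the Foundation would keep "a small percentage", so the 10% cut, the stake sizes and the dividend amount are placeholder assumptions, not announced values.

```python
from typing import Tuple


def split_dividends(
    stake: float,
    total_stake: float,
    validator_dividends: float,
    foundation_cut: float,
) -> Tuple[float, float]:
    """Return (staker_payout, foundation_payout) for one staker.

    `foundation_cut` is the fraction of the staker's dividends that the
    Foundation keeps; its value here is an assumption for illustration.
    """
    if not 0.0 <= foundation_cut < 0.5:
        # The interview says the staker keeps the majority.
        raise ValueError("foundation_cut must leave the majority to the staker")
    share = stake / total_stake             # staker's fraction of the validator's stake
    gross = share * validator_dividends     # dividends attributable to this stake
    foundation_payout = gross * foundation_cut
    return gross - foundation_payout, foundation_payout


# Hypothetical numbers: 100 TAO staked out of 1,000, with 50 TAO of dividends to share.
staker_payout, foundation_payout = split_dividends(
    stake=100.0,
    total_stake=1_000.0,
    validator_dividends=50.0,
    foundation_cut=0.10,
)
```

How voting rights would scale with the amount staked is not described in the interview, so the sketch leaves voting out entirely.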

Q: Where will Bittensor be in 24 months?


There are a few things that I would like to see for Bittensor in 24 months.

The first is conquering the problem of text analysis - we want to be able to say confidently that this mechanism works. We want to create a model that is close to or better than state-of-the-art in text generation or analysis.

Secondly, I'd love to see us moving into other modalities by next year, and at that point, we will want to expand even further. For example, one concept we're working on is the creation of subnetworks. This means you would have - within the same blockchain - a subnetwork for validating text, another for images, another for audio and so on. Another extension would be having multimodal models - things like Stable Diffusion or DALL-E - deployed on the network. We could make them even more powerful.

Finally, there is the decentralization of TAO - which means being able to list legally on an exchange and move forward from there.
