LocalAI uses peer-to-peer technologies to distribute AI workloads across the nodes in your network.
Share complete LocalAI instances across your network for load balancing and redundancy. Perfect for scaling across multiple devices.
Split large model weights across multiple workers. Currently supported with the llama.cpp backend for efficient memory usage.
Pool computational resources from multiple devices, including your friends' machines, to handle larger workloads collaboratively.
Benefits: parallel processing, easy scaling by adding more nodes, fault tolerance, and resource optimization.
Example connection token:
b3RwOgogIGRodDoKICAgIGludGVydmFsOiAzNjAKICAgIGtleTogU1hVZzM1bktxcFI2NjF0emd0Y1pYblFHUktRYXJHbFVYOFVTb2xuWTlQNAogICAgbGVuZ3RoOiA0MwogIGNyeXB0bzoKICAgIGludGVydmFsOiA5MDAwCiAgICBrZXk6IEVKWW5Od3Azam1xVHpoVTRsUWlxZ3dnZFd6cjBRU3JjNU84Nmc3MEs1bm8KICAgIGxlbmd0aDogNDMKcm9vbTogMmlmV01FOHZuRjZjRndaeXYzZG1OOFdWeWVMalpJczRRUTZOdGdWUGdkWApyZW5kZXp2b3VzOiBEazNhUGZyOEhhV1pPVVVYZGZ1clMzRktUMUQ5TW5EanVQRTJwN1BKOG9YCm1kbnM6IDZPN3RHQlNBNmxxWUxVUVpudTZNVnI5dHczMFBrMEo1bEVrU2pQYXBINkoKbWF4X21lc3NhZ2Vfc2l6ZTogMjA5NzE1MjAK
The network token can be used either to share the instance or to join a federation or a worker network. Below are examples of how to start a new instance or a worker with this token.
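The token is not opaque: the example above is base64-encoded YAML describing the p2p network (one-time-pad keys, room, rendezvous point, mDNS name, message size limit). A minimal Python sketch to inspect a token (the exact YAML fields may vary between LocalAI versions):

```python
import base64

# The example connection token from above, split across lines for readability.
token = (
    "b3RwOgogIGRodDoKICAgIGludGVydmFsOiAzNjAKICAgIGtleTogU1hVZzM1bktxcFI2"
    "NjF0emd0Y1pYblFHUktRYXJHbFVYOFVTb2xuWTlQNAogICAgbGVuZ3RoOiA0MwogIGNy"
    "eXB0bzoKICAgIGludGVydmFsOiA5MDAwCiAgICBrZXk6IEVKWW5Od3Azam1xVHpoVTRs"
    "UWlxZ3dnZFd6cjBRU3JjNU84Nmc3MEs1bm8KICAgIGxlbmd0aDogNDMKcm9vbTogMmlm"
    "V01FOHZuRjZjRndaeXYzZG1OOFdWeWVMalpJczRRUTZOdGdWUGdkWApyZW5kZXp2b3Vz"
    "OiBEazNhUGZyOEhhV1pPVVVYZGZ1clMzRktUMUQ5TW5EanVQRTJwN1BKOG9YCm1kbnM6"
    "IDZPN3RHQlNBNmxxWUxVUVpudTZNVnI5dHczMFBrMEo1bEVrU2pQYXBINkoKbWF4X21l"
    "c3NhZ2Vfc2l6ZTogMjA5NzE1MjAK"
)

# Decode the base64 payload to reveal the underlying YAML network description.
decoded = base64.b64decode(token).decode("utf-8")
print(decoded)
```

Anyone holding the token can join the network, so treat it like a shared secret.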
Instance load balancing and sharing
Start LocalAI in federated mode to share your instance, or launch a federated server to distribute requests intelligently across multiple nodes in your network.
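A sketch of the two roles, assuming the CLI flags of recent LocalAI releases (check `local-ai --help` for your version); the token is passed via the `TOKEN` environment variable:

```shell
# Export the connection token shown above (shared secret for the p2p network).
export TOKEN="<your token>"

# On each machine that should share its instance: run LocalAI in federated p2p mode.
local-ai run --p2p --federated

# On the machine that should act as the entry point: run the federated server,
# which balances incoming requests across the instances in the network.
local-ai federated
```

Clients then talk to the federated server's endpoint as if it were a single LocalAI instance.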
Distributed model computation (llama.cpp)
Deploy llama.cpp workers to split model weights across multiple devices. This enables processing larger models by distributing computational load and memory requirements.
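A sketch of a sharded setup, again assuming the CLI of recent LocalAI releases and the `TOKEN` environment variable:

```shell
# Export the connection token on every node (same shared secret everywhere).
export TOKEN="<your token>"

# On each worker node: start a llama.cpp rpc worker that contributes its
# memory and compute to the network.
local-ai worker p2p-llama-cpp-rpc

# On the main node: start LocalAI with p2p enabled so it discovers the
# workers and splits model weights across them.
local-ai run --p2p
```

The main node remains the single API endpoint; the workers only hold and compute their shard of the model.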