Collaborative Machine Learning That Preserves Privacy

To effectively train a machine-learning model to perform a task, such as image classification, thousands, millions, or even billions of example images must be shown to the model. Gathering such massive datasets can be especially difficult when privacy is a concern, as with medical images. Researchers from MIT and the MIT-born startup DynamoFL have made federated learning, a popular solution to this problem, faster and more accurate.

Federated learning is a collaborative method for training a machine-learning model while maintaining the privacy of sensitive user data. Hundreds or thousands of users train their own models on their own devices using their own data. Users then send their models to a central server, which combines them to create a better model, which it then sends back to all users.
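To make the aggregation step concrete, here is a minimal sketch of one federated round in PyTorch. It is not the researchers' actual implementation: the `clients` objects and their `train` method are hypothetical placeholders standing in for whatever local training loop runs on each user's device.

```python
import copy
import torch

def federated_round(global_model, clients):
    """One federated learning round: each client trains a local copy of the
    global model on its own data, and the server averages the weights."""
    local_states = []
    for client in clients:
        local_model = copy.deepcopy(global_model)
        client.train(local_model)            # placeholder: training stays on the client's device
        local_states.append(local_model.state_dict())

    # Server-side aggregation: element-wise average of the clients' weights.
    avg_state = {
        key: torch.stack([state[key].float() for state in local_states]).mean(dim=0)
        for key in local_states[0]
    }
    global_model.load_state_dict(avg_state)  # the combined model is then sent back to every client
    return global_model
```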

A group of hospitals from around the world, for example, could use this method to train a machine-learning model that detects brain tumors in medical images while keeping patient data safe on their local servers. However, there are some disadvantages to federated learning. Transferring a large machine-learning model to and from a central server requires moving a large amount of data, which has high communication costs, especially since the model must be sent back and forth dozens, if not hundreds, of times.

Furthermore, because each user collects their own data, those data do not necessarily follow the same statistical patterns, hampering the combined model’s performance. And that combined model is created by taking an average — it is not tailored to each individual user.

The researchers devised a method for addressing these three federated learning issues at the same time. Their method improves the accuracy of the combined machine-learning model while significantly reducing its size, allowing users and the central server to communicate more quickly. It also ensures that each user receives a model that is more tailored to their specific environment, improving performance.

When compared to other techniques, the researchers were able to reduce the model size by nearly an order of magnitude, resulting in communication costs that were four to six times lower for individual users. Their technique also increased the model’s overall accuracy by about 10%.

“Many papers have addressed one aspect of federated learning, but the challenge was to bring it all together,” the researchers say. “Algorithms that focus only on personalization or communication efficiency are insufficient. We wanted to make sure we could optimize for everything so that this technique could actually be used in the real world.”

 

Reducing the Size of a Model

The FedLTN system developed by the researchers is based on a machine learning concept known as the lottery ticket hypothesis. According to this hypothesis, within very large neural network models, much smaller subnetworks can achieve the same performance. Finding one of these subnetworks is akin to winning the lottery. (LTN is an abbreviation for “lottery ticket network.”)

Neural networks, loosely based on the human brain, are machine-learning models that learn to solve problems using interconnected layers of nodes, or neurons. But finding a winning lottery ticket network is harder than winning a scratch-off: the researchers must use iterative pruning. As long as the model’s accuracy stays above a certain threshold, they remove nodes and the connections between them (similar to pruning branches off a bush) and then test the leaner neural network to see whether its accuracy remains above the threshold.
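As a rough illustration of that pruning loop (not the exact FedLTN procedure), the sketch below uses PyTorch's built-in magnitude pruning; the `evaluate` function, assumed to return validation accuracy, is a placeholder.

```python
import torch
import torch.nn.utils.prune as prune

def iterative_prune(model, evaluate, accuracy_threshold, prune_fraction=0.2):
    """Repeatedly remove the smallest-magnitude weights while the pruned
    model still clears the accuracy threshold."""
    while True:
        # Prune a fraction of the remaining weights in every linear/conv layer.
        for module in model.modules():
            if isinstance(module, (torch.nn.Linear, torch.nn.Conv2d)):
                prune.l1_unstructured(module, name="weight", amount=prune_fraction)

        # Test the leaner network; stop once accuracy falls below the threshold.
        # (A full lottery-ticket setup would also rewind and retrain the
        # surviving weights before each evaluation.)
        if evaluate(model) < accuracy_threshold:  # placeholder evaluation function
            break
    return model
```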

Other methods for federated learning have used this pruning technique to create smaller machine-learning models that could be transferred more efficiently. However, while these methods may be faster, model performance suffers as a result.

 

Using Lottery Ticket Networks to Win Big

When FedLTN was tested in simulations, it resulted in improved performance and lower communication costs across the board. In one experiment, a traditional federated learning approach produced a 45-megabyte model, whereas their technique produced a 5-megabyte model with the same accuracy. In another experiment, a cutting-edge technique required 12,000 megabytes of communication between users and the server to train one model, whereas FedLTN required only 4,500 megabytes.

Even the worst-performing clients saw a performance boost of more than 10% with FedLTN, and the overall model accuracy outperformed the state-of-the-art personalization algorithm by nearly 10%. Now that FedLTN has been developed and refined, the team is working to incorporate the technique into DynamoFL, the MIT-born federated learning startup.

The researchers hope to continue improving the method. For example, they have demonstrated success with labeled datasets, but applying the same techniques to unlabeled data would be more difficult.

This work demonstrates the importance of approaching these issues holistically rather than focusing on individual metrics. Improving one metric can sometimes cause other metrics to decline, so the focus should be on improving several things together, which is critical if the technique is to be used in the real world.

 
