Google Trained a Trillion-Parameter AI Language Model

January 14, 2021 12:30

An anonymous reader quotes a report from VentureBeat: Google researchers developed and benchmarked techniques they claim enabled them to train a language model containing more than a trillion parameters. They say their 1.6-trillion-parameter model, which appears to be the largest of its kind to date, achieved a speedup of up to 4 times over the previously largest Google-developed language model (T5-XXL). As the researchers note in a paper detailing their work, large-scale training is an effective path toward powerful models. Simple architectures, backed by large datasets and parameter counts, surpass far more complicated algorithms. But effective, large-scale training is extremely computationally intensive. That's why the researchers pursued what they call the Switch Transformer, a "sparsely activated" technique that uses only a subset of a model's weights, or the parameters that transform input data within the model, for any given input. In an experiment, the researchers pretrained several different Switch Transformer models using 32 TPU cores on the Colossal Clean Crawled Corpus, a 750GB dataset of text scraped from Reddit, Wikipedia, and other web sources. They tasked the models with predicting missing words in passages where 15% of the words had been masked out, as well as other challenges, like retrieving text to answer a list of increasingly difficult questions. The researchers claim their 1.6-trillion-parameter model with 2,048 experts (Switch-C) exhibited "no training instability at all," in contrast to a smaller model (Switch-XXL) containing 395 billion parameters and 64 experts. However, on one benchmark -- the Stanford Question Answering Dataset (SQuAD) -- Switch-C scored lower (87.7) than Switch-XXL (89.6), which the researchers attribute to the opaque relationship between fine-tuning quality, computational requirements, and the number of parameters. Even so, the Switch Transformer led to gains in a number of downstream tasks.
For example, it enabled a pretraining speedup of more than 7 times while using the same amount of computational resources, according to the researchers, who demonstrated that the large sparse models could be used to create smaller, dense models fine-tuned on tasks that retain 30% of the quality gains of the larger model. In one test where a Switch Transformer model was trained to translate between over 100 different languages, the researchers observed "a universal improvement" across 101 languages, with 91% of the languages benefiting from a speedup of more than 4 times compared with a baseline model. "Though this work has focused on extremely large models, we also find that models with as few as two experts improve performance while easily fitting within memory constraints of commonly available GPUs or TPUs," the researchers wrote in the paper. "We cannot fully preserve the model quality, but compression rates of 10 to 100 times are achievable by distilling our sparse models into dense models while achieving ~30% of the quality gain of the expert model."
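
To give a rough sense of what "sparsely activated" means in the report above, here is a minimal Python sketch of top-1 expert routing, in which a small router scores each expert and only the highest-scoring expert processes a given token. It is an illustration under stated assumptions, not the researchers' implementation: the function names, the toy router weights, and the scaling "experts" are all hypothetical, and scaling the output by the router probability reflects a common mixture-of-experts convention rather than anything stated in the summary.

```python
import math


def softmax(scores):
    """Convert raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical stand-in for an expert network: scales every input value."""
    def expert(token):
        return [scale * x for x in token]
    return expert


def switch_layer(token, router_weights, experts):
    """Send one token to its single highest-scoring expert (top-1 routing)."""
    # One router score per expert: dot product of that expert's row with the token.
    scores = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    probs = softmax(scores)
    chosen = max(range(len(experts)), key=lambda i: probs[i])
    # Only the chosen expert runs; the other experts' weights stay idle.
    expert_out = experts[chosen](token)
    # Assumed convention: scale the output by the router's probability.
    return chosen, [probs[chosen] * y for y in expert_out]


# Toy setup (assumed values): two experts, two-dimensional tokens.
ROUTER_WEIGHTS = [[1.0, 0.0], [0.0, 1.0]]
EXPERTS = [make_expert(1.0), make_expert(10.0)]

if __name__ == "__main__":
    for token in ([2.0, 0.5], [0.1, 3.0]):
        chosen, output = switch_layer(token, ROUTER_WEIGHTS, EXPERTS)
        print(token, "-> expert", chosen, output)
```

In this toy setup each token touches one expert's parameters regardless of how many experts exist, which is the intuition behind growing the parameter count without a matching growth in per-token computation.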

FTC Settlement With Ever Orders Data and AIs Deleted After Facial Recognition Pivot

Slashdot

Author: BeauHD

January 14, 2021 09:02

The maker of a defunct cloud photo storage app that pivoted to selling facial recognition services has been ordered to delete user data and any algorithms trained on it, under the terms of an FTC settlement. TechCrunch reports: The regulator investigated complaints that the Ever app -- which gained earlier notoriety for using dark patterns to spam users' contacts -- had applied facial recognition to users' photographs without properly informing them what it was doing with their selfies. Under the proposed settlement, Ever must delete photos and videos of users who deactivated their accounts and also delete all face embeddings (i.e., data related to facial features which can be used for facial recognition purposes) that it derived from photos of users who did not give express consent to such a use. Moreover, it must delete any facial recognition models or algorithms developed with users' photos or videos. This full suite of deletion requirements -- not just data but anything derived from it or trained on it -- is causing great excitement in legal and tech policy circles, with experts suggesting it could have implications for other facial recognition software trained on data that wasn't lawfully processed. Or, to put it another way, tech giants that surreptitiously harvest data to train AIs could find their algorithms in hot water with the US regulator.