Data is the new soil, and in this fertile new ground, MIT researchers are planting more than just pixels. By using synthetic images to train machine learning models, a team of scientists recently surpassed results obtained from traditional "real-image" training methods.
The approach, StableRep, treats multiple images generated from the same text prompt as positive pairs. This gives the model additional information during training: beyond adding diversity, it tells the vision system which images are alike and which are different. Remarkably, StableRep outperformed top-tier models trained on real images, such as SimCLR and CLIP, on extensive datasets.
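To make the positive-pair idea above concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the authors' implementation: the function name `multi_positive_contrastive_loss`, the `temperature` value, and the toy embeddings are all hypothetical. Each image embedding is compared with every other one in the batch, and the loss is low when the images that score as most similar are the ones generated from the same prompt.

```python
import numpy as np

def multi_positive_contrastive_loss(embeddings, prompt_ids, temperature=0.1):
    """Illustrative multi-positive contrastive loss (assumed form, not the paper's code).

    embeddings: (N, D) array, one row per generated image.
    prompt_ids: (N,) array; images sharing an id came from the same text prompt.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    prompt_ids = np.asarray(prompt_ids)
    n = len(prompt_ids)

    # Cosine similarity between every pair of images, scaled by the temperature.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    logits = z @ z.T / temperature

    # An image is never its own positive or negative, so mask the diagonal.
    self_mask = np.eye(n, dtype=bool)
    logits = np.where(self_mask, -np.inf, logits)

    # Row-wise log-softmax: how strongly each image "points at" every other image.
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Positives: other images generated from the same prompt.
    positives = (prompt_ids[:, None] == prompt_ids[None, :]) & ~self_mask
    n_pos = positives.sum(axis=1)

    # Cross-entropy against a uniform target over each row's positives.
    per_image = -np.where(positives, log_prob, 0.0).sum(axis=1) / np.maximum(n_pos, 1)

    # Average only over images that have at least one positive.
    return per_image[n_pos > 0].mean()

# Toy batch: two prompts, two images each (embeddings are made up for illustration).
toy_embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
loss_matched = multi_positive_contrastive_loss(toy_embeddings, [0, 0, 1, 1])
loss_mismatched = multi_positive_contrastive_loss(toy_embeddings, [0, 1, 0, 1])
print(loss_matched, loss_mismatched)
```

In the toy batch, the grouping where similar embeddings share a prompt yields a much lower loss than the mismatched grouping, which is the signal that would push a model to place same-prompt images close together.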
"While StableRep helps mitigate the challenges of data acquisition in machine learning, it also ushers in a stride towards a new era of AI training techniques. The capacity to produce high-caliber, diverse synthetic Yet, the path ahead isn’t without its potholes. The researchers candidly address several limitations, including the current slow pace of image generation, semantic mismatches between text prompts and the resultant images, potential amplification of biases, and complexities in image attribution, all of which are imperative to address for future advancements. Another issue is that StableRep requires first training the generative model on large-scale real data. The team acknowledges that starting with real data remains a necessity; however, when you have a good generative model, you can repurpose it for new tasks, like training recognition models and visual representations.
The team notes that they haven’t gotten around the need to start with real data; it’s just that once you have a good generative model you can repurpose it for new tasks, like training recognition models and visual representations.
While StableRep offers a good solution by diminishing the dependency on vast real-image collections, it brings to the fore concerns regarding hidden biases within the uncurated data used for these text-to-image models. The choice of text prompts, integral to the image synthesis process, is not entirely free from bias, "indicating the essential role of meticulous text selection or possible human curation," says Fan.
"Using the latest text-to-image models, we’ve gained unprecedented control over image generation, allowing for a diverse range of visuals from a single text input. This surpasses real-world image collection in efficiency and versatility. It proves especially useful in specialized tasks, like balancing image variety in long-tail recognition, presenting a practical supplement to using real images for training," says Fan. "Our work signifies a step forward in visual learning, towards the goal of offering cost-effective training alternatives while highlighting the need for ongoing improvements in data quality and synthesis."
"One dream of generative model learning has long been to be able to generate data useful for discriminative model training," says Google DeepMind researcher and University of Toronto professor of computer science David Fleet, who was not involved in the paper. "While we have seen some signs of life, the dream has been elusive, especially on large-scale complex domains like high-resolution images. This paper provides compelling evidence, for the first time to my knowledge, that the dream is becoming a reality. They show that contrastive learning from massive amounts of synthetic image data can produce representations that outperform those learned from real data at scale, with the potential to improve myriad downstream vision tasks."
Fan is joined by Yonglong Tian PhD ’22 as lead authors of the paper, as well as MIT associate professor of electrical engineering and computer science and CSAIL principal investigator Phillip Isola; Google researcher and OpenAI technical staff member Huiwen Chang; and Google staff research scientist Dilip Krishnan. The team will present StableRep at the 2023 Conference on Neural Information Processing Systems (NeurIPS) in New Orleans.