Stanford researcher discusses UMI gripper and diffusion AI models

25.05.2024 17:30

The Robot Report recently spoke with Ph.D. student Cheng Chi about his research at Stanford University and recent publications about using diffusion AI models for robotics applications. He also discussed the recent universal manipulation interface, or UMI gripper, project, which demonstrates the capabilities of diffusion model robotics.

The UMI gripper was part of his Ph.D. thesis work, and he has open-sourced the gripper design and all of the code so that others can continue to help evolve the AI diffusion policy work.

AI innovation accelerates

How did you get your start in robotics?

Stanford researcher Cheng Chi. | Credit: Huy Ha

I worked in the robotics industry for a while, starting at the autonomous vehicle company Nuro, where I was doing localization and mapping.

And then I applied for my Ph.D. program and ended up with my advisor Shuran Song. We were both at Columbia University when I started my Ph.D., and then last year, she moved to Stanford to become full-time faculty, and I moved [to Stanford] with her.

For my Ph.D. research, I started as a classical robotics researcher, and I started working with machine learning, specifically for perception. Then in early 2022, diffusion models started to work for image generation, that’s when DALL-E 2 came out, and that’s also when Stable Diffusion came out.

I realized the specific ways which diffusion models could be formulated to solve a couple of really big problems for robotics, in terms of end-to-end learning and in the actual representation for robotics.

So, I wrote one of the first papers that brought the diffusion model into robotics, which is called diffusion policy. That’s my paper for my previous project before the UMI project. And I think that’s the foundation of why the UMI gripper works. There’s a paradigm shift happening, my project was one of them, but there are also other robotics research projects that are also starting to work.

A lot has changed in the past few years. Is artificial intelligence innovation is accelerating?

Yes, exactly. I experienced it firsthand in academia. Imitation learning was the dumbest thing possible you could do for machine learning with robotics. It’s like, you teleoperate the robot to collect data, the data is paired with images and the corresponding actions.

In class, we’re taught that people proved that in this paradigm of imitation learning or behavior, cloning doesn’t work. People proved that errors grow exponentially. And that’s why you need reinforcement learning and all the other methods that can address these limitations.

But fortunately, I wasn’t paying too much attention in class. So I just went to the lab and tried it, and it worked surprisingly well. I wrote the code, I applied the diffusion model to this and for my first task; it just worked. I said, “That’s too easy. That’s not worth a paper.”

I kept adding more tasks like online benchmarks, trying to break the algorithm so that I could find a smart angle that I could improve on this dumb idea that would give me a paper, but I just kept adding more and more things, and it just refused to break.

So there are simulation benchmarks online. I used four different benchmarks and just tried to find an angle to break it so that I could write a better paper, but it just didn’t break. Our baseline performance was 50% to 60%. And after applying the diffusion model to that, it was like 95%. So it was a jump in terms of these. And that’s the moment I realized, maybe there’s something big happening here.

The first diffusion policy research at Columbia was to push a T into position on a table. | Credit: Cheng Chi

How did those findings lead to published research?

That summer, I interned at Toyota Research Institute, and that’s where I started doing real-world experiments using a UR5 [cobot] to push a block into a location. It turned out that this worked really well on the first try.

Normally, you need a lot of tuning to get something to work. But this was different. When I tried to perturb the system, it just kept pushing it back to its original place.

And so that paper got published, and I think that’s my proudest work, I made the paper open-source, and I open-sourced all the code because the results were so good, I was worried that people were not going to believe it. As it turned out, it’s not a coincidence, and other people can reproduce my results and also get very good performance.

I realized that now there’s a paradigm shift. Before [this UMI Gripper research], I needed to engineer a separate perception system, planning system, and then a control system. But now I can combine all of them with a single neural network.

The most important thing is that it’s agnostic to tasks. With the same robot, I can just collect a different data set and train a model with a different data set, and it will just do the different tasks.

Obviously, collecting the data set part is painful, as I need to do it 100 to 300 times for one environment to get it to work. But in actuality, it’s maybe one afternoon’s worth of work. Compared to tuning a sim-to-real transfer algorithm takes me a few months, so this is a big improvement.

Submit your presentation idea now.

UMI Gripper training ‘all about the data’

When you’re training the system for the UMI Gripper, you’re just using the vision feedback and nothing else?

Just the cameras and the end effector pose of the robot — that’s it. We had two cameras: one side camera that was mounted onto the table, and the other one on the wrist.

That was the original algorithm at the time, and I could change to another task and use the same algorithm, and it would just work. This was a big, big difference. Previously, we could only afford one or two tasks per paper because it was so time-consuming to set up a new task.

But with this paradigm, I can pump out a new task in a few days. It’s a really big difference. That’s also the moment I realized that the key trend is that it’s all about data now. I realized after training more tasks, that my code hadn’t been changed for a few months.

The only thing that changed was the data, and whenever the robot doesn’t work, it’s not the code, it’s the data. So when I just add more data, it works better.

And that prompted me to think, that we are into this paradigm of other AI fields as well. For example, large language models and vision models started with a small data regime in 2015, but now with a huge amount of internet data, it works like magic.

The algorithm doesn’t change that much. The only thing that changed is the scale of training, and maybe the size of the models, and makes me feel like maybe robotics is about to enter that that regime soon.

Two UR cobots equipped with UMI grippers demonstrate the folding of a shirt. | Credit: Cheng Chi video

Can these different AI models be stacked like Lego building blocks to build more sophisticated systems?

I believe in big models, but I think they might not be the same thing as you imagine, like Lego blocks. I suspect that the way you build AI for robotics will be that you take whatever tasks you want to do, you collect a whole bunch of data for the task, run that through a model, and then you get something you can use.

If you have a whole bunch of these different types of data sets, you can combine them, to train an even bigger model. You can call that a foundation model, and you can adapt it to whatever use case. You’re using data, not building blocks, and not code. That’s my expectation of how this will evolve.

But simultaneously, there’s a there’s a problem here. I think the robotics industry was tailored toward the assumption that robots are precise, repeatable, and predictable. But they’re not adaptable. So the entire robotics industry is geared towards vertical end-use cases optimized for these properties.

Whereas robots powered by AI will have different sets of properties, and they won’t be good at being precise. They won’t be good at being reliable, they won’t be good at being repeatable. But they will be good at generalizing to unseen environments. So you need to find specific use cases where it’s okay if you fail maybe 0.1% of the time.

Safety versus generalization

Robots in industry must be safe 100% of the time. What do you think the solution is to this requirement?

I think if you want to deploy robots in use cases where safety is critical, you either need to have a classical system or a shell that protects the AI system so that it guarantees that when something bad happens, at least there’s a worst-case scenario to make sure that something bad doesn’t actually happen.

Or you design the hardware such that the hardware is [inherently] safe. Hardware is simple. Industrial robots for example don’t rely that much on perception. They have expensive motors, gearboxes, and harmonic drives to make a really precise and very stiff mechanism.

When you have a robot with a camera, it is very easy to implement vision servoing and make adjustments for imprecise robots. So robots don’t have to be precise anymore. Compliance can be built into the robot mechanism itself, and this can make it safer. But all of this depends on finding the verticals and use cases where these properties are acceptable.

The post Stanford researcher discusses UMI gripper and diffusion AI models appeared first on The Robot Report.

Партнёры Smi24.net

Все новости за 24 часа

Life24.pro

Вскрылись жуткие подробности смерти 24-летней звезды «Дома-2»: несколько дней мертвая пролежала в квартире

Без ошибки определяем точный день зачатия

Концерт Тимберлейка в Стамбуле превратился в хаос: Мот рассказал о давке, сломанных заборах и драках

В Москве прошла седьмая премия в области здоровья и красоты THE MEDICAL STARS & BEAUTY AWARDS

Today24.pro

Trump’s threatened 40% tariff on ‘transshipped’ goods tries to target China and its manufacturing strength

OpenAI launches GPT-5, its most powerful AI yet—will it be enough to stay ahead in today’s ruthless AI race?

Report: AC Milan’s Christian Pulisic set to team up with $87 million Manchester United star

Man Utd have agreed deal with AC Milan for £40m star's exit, await player decision - report

News24.pro

ГК «КОРТРОС» — в числе лидеров страны по объему ввода жилья

«Мысли» под прицелом: нейросетевые прорывы России — новый вызов кибербезопасности?

Трамвай «Славянка» получил первые тяговые подстанции

Будущее уже наступило

Game24.pro

Modders are trying their hardest to add an NVMe SSD to the Switch 2, which is both impressive and something I'm not going to do

Mafia: The Old Country получила положительные оценки в Steam

Steam for Chromebooks is getting axed in 2026 instead of exiting its 4-year beta

Находи идеальные места для персонажей-фигурок в «Is This Seat Taken?»

Russia24.pro

Генерал МО Галимуллин получил условный срок за растрату и служебный подлог

«Бежим за Мечту — Ходить»: подростки на протезах пробегут марафон в Екатеринбурге

Без знаний не останутся: дети звёзд поступили в самые престижные вузы Москвы

News-life

При участии спецназа Росгвардии в Москве задержаны 300 нарушителей миграционного законодательства (видео)

ГК «КОРТРОС» — в числе лидеров страны по объему ввода жилья

Водолаз назвал вероятную причину гибели утонувшего в Болгарии режиссера Бутусова

Героическое участие армян в СВО. Часть шестнадцатая

Ru24.net

На Троицкой линии метро совершено 12 млн поездок: новые станции к 2025 году

Землетрясение 6,1 балла в Балыкесире: толчки ощущались в Стамбуле

Водолаз назвал вероятную причину гибели утонувшего в Болгарии режиссера Бутусова

Сергей Собянин. Главное за день

News.tennis

Павлюченкова уступила 94-й ракетке мира на турнире WTA в Цинциннати

Хачанов поднялся до 12-й позиции в мировом рейтинге ATP

Рахимова покинула турнир в Цинциннати, уступив Саккари

Александрова снизилась в рейтинге в борьбе за титул WTA

29ru.net

Причиной гибели Бутусова могло стать мертвое волнение в Черном море

Землетрясение 6,1 балла в Балыкесире: толчки ощущались в Стамбуле

В Перми количество заправок электромобилей выросло на 8 % за год

С наступающим новым городом // Мэрия рассказала и показала достоинства московского транспорта

Музыкальные новости

Poisk-music.ru

Никита Пресняков удалился в Испанию после того, как жена стерла их фото из соцсетей

Ровно 50 лет назад не стало великого композитора Дмитрия Шостаковича, выступавшего в Воронеже

Процесс сошел с рельсов // Верховный суд определил пересмотреть дело о наезде Kia на трамвай

Песня заслуженного работника культуры Крыма Натальи Прокопенко удостоена приза зрительских симпатий на фестивале в Москве

Ria.city

Продвижение Песни в Импульсе Яндекс Музыка.

Добро в каждой чашке: Елизавета Боярская, фонд «Жизнь в Движении» и сеть кофеен «Ягода» запускают авторский кофе для помощи детям с ОВЗ

"Динамо" Карпина упустило победу над "Сочи" в конце матча

«Бежим за Мечту — Ходить»: подростки на протезах пробегут марафон в Екатеринбурге

Rss.plus

Сергей Собянин рассказал, как в Москве выбирают деревья для озеленения города

Ближе Дубая, чище Байкала и намного дешевле Тая: морской курорт в России, где не стыдно провести свой отпуск

Собянин: Около 200 социальных объектов будет построено в Москве в ближайшие годы

Собянин: Около 13 млн деревьев и кустарников высадили в Москве за 13 лет

Auto.russia24.pro

Процесс сошел с рельсов // Верховный суд определил пересмотреть дело о наезде Kia на трамвай

Дилеры по всей России получают Lada Iskra. Пока отгружают только седаны в цвете «Капитан» и универсалы в цвете «Табаско»

Москва: Новая эра зарядных станций для электромобилей с поддержкой инвесторов

Участники АНО «СВОй спорт» организовали открытую тренировку по боксу на стадионе «Москвич» в День физкультурника

Putin.russia24.pro

Путин обратился с приветствием к участникам форума «Машук»

МИД РФ: Киев может устроить провокации перед встречей Путина и Трампа на Аляске

Медиа сообщили о планах Зеленского в ответ на встречу Путина и Трампа

Bloomberg сообщает, что ЕС хочет провести переговоры с Трампом до встречи с Путиным

Health.russia24.pro

Будут ли магнитные бури сегодня, 10 августа 2025 года?

Авторская методика Натальи Бородиной – путь к восстановлению памяти и защите от деменции

«Бежим за Мечту — Ходить»: подростки на протезах пробегут марафон в Екатеринбурге

Zelensky.russia24.pro

NYT: своими заявлениями Зеленский может разозлить Трампа и навлечь на себя

В Киеве сделали заявление о территориальных уступках

В Киеве заявили о тайном приказе Зеленского перед Аляской

В Киеве раскрыли тайный приказ Зеленского перед саммитом США и РФ на Аляске

Sport.russia24.pro

Московское «Торпедо» и «Спартак» из Костромы обменяются голами, «Пюник» возьмет три очка. Экспресс дня 11 августа: прогноз и ставка

Akon в Москве: полная арена и феерия хитов на главном концерте года

День физкультурника отметили спортивными шествиями по всей стране

«Торпедо» и «Спартак» Кострома сразятся за три очка. «Торпедо» Москва — «Спартак» Кострома: прогноз и ставка

Person.russian.city

Собянин поздравил московских строителей с профессиональным праздником

Сергей Собянин поздравил строителей Москвы с профессиональным праздником

Сергей Собянин. Главное за день

Собянин рассказал о московских НКО, помогающих участникам СВО и их семьям

Ecology.russia24.pro

Москва: Новая эра зарядных станций для электромобилей с поддержкой инвесторов

Без морщин. Эксперты рассказали о плюсах изменения климата в Москве

За прошедшие сутки в России ликвидировано 28 лесных пожаров

Специалист указал необходимое количество редкоземельных металлов для производства электромобиля

29ru.net

Овчинский: в новостройке по реновации в районе Алексеевский завершаются работы нулевого цикла

Замена хрусталика: советы офтальмолога о рисках и лазерной корректировке

С наступающим новым городом // Мэрия рассказала и показала достоинства московского транспорта

На Троицкой линии метро совершено 12 млн поездок: новые станции к 2025 году

Severodvinsk.ws

Полицейский погиб при задержании поджигателя релейного шкафа под Архангельском

Чёрный день календаря. 8 августа: Архангельская трагедия. Как ошибка пилота погубила рейс Як-40

Льготные ипотеки на Дальнем Востоке активно получают участники СВО и сотрудники ОПК

В Алтайском крае не будут проводить проверку на предмет чрезмерного роста тарифов на ЖКУ

Sevpoisk.ru

На консервном заводе в Симферополе изъяли 12 тонн небезопасного сырья

Анастасия Гридчина: В Симферополе, в стенах гостеприимного Дома дружбы народов, состоялась торжественная презентация военно-патриотических изданий, увидевших свет в 2025 году благодаря усилиям Медиацентра им. И. Гаспринского

Подросток на Мersedes сбил пешехода на трассе в Керчь

Прогноз погоды в Крыму на 10 августа

103news.com

Сергей Собянин. Главное за день

Землетрясение 6,1 балла в Балыкесире: толчки ощущались в Стамбуле

Замена хрусталика: советы офтальмолога о рисках и лазерной корректировке

В Перми количество заправок электромобилей выросло на 8 % за год

Агрегатор новостей 24СМИ