Natural language processing allows robots to actually be present in the world instead of just blindly following commands.
It is extremely difficult to create robots that mimic human actions. An EE Times article describes these challenges. “From a mechanics perspective, for example, bipedal locomotion (walking on two legs) is an extremely physically demanding task. In response, the human body has evolved and adapted such that the power density of human joints in areas like the knees is very high.” In other words, it is challenging for robots even to stand still.
Despite these obstacles, significant advancements are being made. For example, researchers from Oregon State University recently set a Guinness World Record for the fastest robot to complete a 100-meter dash, clocking in at under 25 seconds. Since 2017, the team has been training the robot, named Cassie, with reinforcement learning, an AI technique that rewards the robot when it performs correctly. The lead researcher acknowledged the record's importance: "[Now we] can make robots move robustly around the world on two legs." While the record is impressive, the human body uses a remarkably complex sensory system to maintain balance and navigate the environment.
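To make the idea of training by reward concrete, here is a minimal toy sketch in Python. It is not the Oregon State team's method: the two actions, their reward values, and the function name train are all hypothetical, chosen only to show how a value estimate can be learned from rewards alone.

```python
import random

# Hypothetical toy setup (an assumption for illustration only):
# two possible "moves" and a fixed reward for each.
REWARDS = {"stumble": 0.0, "balanced_step": 1.0}


def train(steps=500, epsilon=0.2, learning_rate=0.1, seed=0):
    """Learn a value estimate per action from rewards (epsilon-greedy)."""
    rng = random.Random(seed)
    values = {action: 0.0 for action in REWARDS}
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.choice(list(REWARDS))
        else:
            # Exploit: pick the action with the highest estimate so far.
            action = max(values, key=values.get)
        reward = REWARDS[action]
        # Nudge the estimate toward the reward that was just observed.
        values[action] += learning_rate * (reward - values[action])
    return values


if __name__ == "__main__":
    learned = train()
    print("Preferred action:", max(learned, key=learned.get))
```

A real walking robot learns over a vastly larger space of sensor readings and motor commands, usually in simulation, but the loop has the same shape: try an action, observe the reward, and adjust.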
According to Nancy J. Cooke, an Arizona State University professor, the hardest part is building a machine that interacts with people naturally. Re-creating that ability in a robot is still in its infancy. It is currently one of the most challenging problems facing our robot, GARY, and other humanoid robotics projects, like Tesla's Optimus.
However, artificial intelligence (AI) is advancing quickly thanks to exponential growth in three areas: computing power, software development, and data. Together, these make humanoid robots possible.
Natural language processing (NLP), particularly text and text-to-image generation, is the best example of this rapid AI advancement. OpenAI released its GPT-2 text-generation model in February 2019, followed by GPT-3 in June 2020, the text-to-image model DALL-E in January 2021, and DALL-E 2 in April 2022. Each iteration was vastly superior to the previous one.
Midjourney and Stability AI, the company behind Stable Diffusion, are two others pushing these technologies forward. The same thing is happening with text-to-video, with several new apps from Meta, Google, Synthesia, GliaCloud, and others appearing recently.
An AI-generated video called The Crow recently won the Jury Award at the Cannes Short Film Festival. To create the video, computer artist Glenn Marshall used CLIP (Contrastive Language-Image Pre-training), a neural network developed by OpenAI that connects text and images, and fed it the frames of an existing video as an image reference. Marshall then prompted the system to create a video of "a painting of a crow in a desolate landscape."
Of course, developing an NLP application is different from creating a robotics application. Although the two share a reliance on computing power, software, and data, building robots that must interact physically with the outside world adds difficulties beyond those of software automation.
Robots require a brain.
According to AI expert Filip Piekniewski, "robots don't have anything even remotely close to a brain." That is still the case today, but NLP gives robots the first part of the brain they need to communicate with people. After all, one of the critical functions of the human brain is the capacity to perceive and interpret language and translate it into responses and actions appropriate to the context.
NLP might just be the "secret sauce" needed to take robots to the next level: from simple machines programmed to repeat the same task ad nauseam to walking, talking, thinking machines that can not only understand the world around them but also take part in conversations and understand human emotions.