Imagine AI that’s not just smart, but can actually see what you’re doing on your computer or even in real life! That’s the magic of “multimodal AI” – it’s like giving your computer superpowers.
What’s So Special About Multimodal AI?
Think of it like this: regular AI is a bit like a person who can only read. Multimodal AI is like a person who can read, see pictures, hear sounds, and even understand your gestures. This means it can help you in ways we never thought possible before.
Here’s the Wow Factor:
- AI as Your Personal Helper: Imagine you’re cooking a new recipe. This AI can see what ingredients you have, what you’re doing, and guide you through each step. It’s like having a master chef right there with you!
- Learning Made Easy: Struggling with a math problem? This AI can see where you’re getting stuck and offer helpful hints. It’s like having a personal tutor who knows exactly what you need.
- Creativity Unleashed: Want to design a logo but can’t draw? Just describe your idea, and this AI can create it for you! It’s like having a magic wand for your imagination.
Google Gemini and ChatGPT: The Superstars of Multimodal AI
Google Gemini and ChatGPT are leading the way in this exciting new world. They can understand what you’re doing on your screen, offer suggestions, and even help you learn new things.
But Remember…
Even the smartest AI can make mistakes. Think of it like a student who’s still learning. It’s important to double-check the information and use your own judgment.
The Future is Multimodal
This technology is still new, but it’s changing the way we interact with computers. Get ready for a future where AI is not just a tool, but a partner in everything we do!
Want to Learn More?
Google AI Studio is a great place to start exploring the possibilities of multimodal AI. You can try out different tools and see what this amazing technology can do.
As always, I won’t ask you to test or try something I haven’t done myself! So here is today’s short 5-minute Loom showing Google Gemini in Sheets along with the screen-share option, where you can hear and see how the AI assists me via voice and tracks everything I’m working on through the screen share.
Have fun and try it out: https://aistudio.google.com/u/2/live
ChatGPT Plus subscribers have access to Advanced Voice Mode with vision, which allows real-time video input and screen sharing through the ChatGPT mobile app. This means you can interact with ChatGPT using a live camera feed or by sharing your device screen for tasks like troubleshooting or getting real-time assistance. (At $20 per month, I do recommend giving Plus a try.)