Moemate
As evidenced by the slow death of Cortana, it’s clear that the AI assistants of yesteryear aren’t meeting expectations. And so they’re being remade.
Amazon is building a new large language model akin to OpenAI’s GPT-4 to power its Alexa voice assistant. Meanwhile, Google is reportedly planning to “supercharge” Google Assistant with AI that’s more like Bard, its algorithm-powered chatbot.
The paradigm shift hasn’t stood limited to the realm of Big Tech. Startups, too, are beginning to realize their versions of more helpful, valuable AI assistants.
One of the more intriguing ones I’ve stumbled upon is Moemate, an assistant that runs on almost any macOS, Windows, and Linux machine. As an anime-style avatar, Moemate — powered by a combo of models including GPT-4 and Anthropic’s Claude — aims to supply and vocalize the best answer to any user question. (“Moe” is a Japanese word for cuteness, often in anime.)
That’s not especially novel; ChatGPT does this already, as do Bard, Bing Chat, and countless other chatbots. But what sets Moemate apart is its ability to go beyond text prompts and look directly at what’s happening on a PC’s screen.
Sound like a privacy risk? Absolutely. Webaverse, the company behind Moemate, claims it stores much of the assistant’s chat logs and preferences locally on-device. However, its privacy policy also reveals that it reserves the right to use the data it collects, like PC specs and unique identifiers, to comply with legal requests and investigate suspected illegal activities. Fundamentally, giving software like this access to everything you see and do is a considerable risk, even in the best-case scenario.
However, curiosity led me to cross that bridge and install it on the Mac notebook that my work provided, Moemate, which is still in open beta mode.
Moemate is a free (for now) early-access product, and I found it to remain exceptionally well-built. There is great flexibility in what the user can actively choose and control: the avatars and their movements, actions and emotions, Moemate’s voice, and answers. But it doesn’t stop there; there is a way one can create their character models and add them, and also a way to export avatars in .dae format that other Moemate users can import and use.
The user decides on this “personality” of Moemate, selecting from one of the several text-generating models (GPT-4 versus Claude). Regarding the synthetic voices, Moemate chooses between ElevenLabs, Microsoft Azure, or the TTS created in-house. Of them, I decided on ElevenLabs’, which did not sound as robotic as the others.
To “ground” the chosen text-generating model and try to tame the “disturbed” model (as some AI models are, for some reason), Moemate provides each avatar with a biography for the model to input as the conversation begins. Here’s one:
You will implement Nebula, a calm traveler who travels and explores the vast galaxy of knowledge. They have always been perfectly composed and have a genuine explorer look that wins the hearts of everyone. Nebula shies away from politics, avoiding politics, preferring to look at the stars and the universe’s many wonders. They are interested in those around them, making every meeting serene and exciting.
Like the templates, bios can remain written from scratch and edited — an advantage and a disadvantage, in my view. I love the idea of flexibility when using these deep-learning models. However, the case of prompt injection attacks concerns me, where it remains that malicious users try to circumvent safety measures put in place to filter toxic replies through more elaborate language. One can easily picture someone creating what is, in their eyes, a ‘malicious’ bio, exporting it, and sharing the misbehaving avatar with other poor Moemate users.
As for one of the target audiences, Moemate has several Twitch-centric features, but I couldn’t try them. It can bring into focus the chat window and the number of subscribers to your channel. Webaverse promotes Moemate in that it can “speak and interact with users” when there are no chat messages or “handle stream chat by responding to chat messages,” which I wonder how effectively.
The experience isn’t impressive, so do not ask Moemate any complex questions. As for its level one feature, Moemate’s output is limited only by the program’s choice of text-generating model. (Notably, Claude always described Claude and the name provided in the avatar bio – see also example 1). Sometimes, based on the prompt the user gives, it uses the open source tool called Stable Diffusion to generate images when commanded or autonomously. But with the variety of opportunities to order or create images yourself, that seems stale.
It is a huge boost when it comes to screen capture
Moemate can see your screen. It breaks it and gets the gist. You can ask it anything about what you are doing on the screen. You do not need to describe whatever you want to have explained, which frees time and effort.
Regardless of which text-generating model might remain used, Moemate can answer questions about whichever windows are active on the screen —a browser, settings window, or a video game. I’m not entirely sure how the app does this — not every model can take images as input —. Still, it doesn’t seem like Moemate is doing anything but passing the extracted text from the screenshotted portions of the images to the model.
It’s an imperfect system. I have even used Moemate to summarize the recipes and web pages without manually highlighting and trimming the text, as well as a general idea — minimum understanding — of the given topic.
One day, when Claude stood chosen as the text-generating model, I asked Moemate about the macOS System Settings dashboard, which was unexpectedly visible on my laptop. It provided me with a tab-wise description of iOS settings (Wi-Fi, Control Centre, etc.), the importance of all tabs, and some extra information about the tab opened at that time – Privacy & Security.
The experience isn’t awe-inspiring, so do not ask Moemate any complex questions. As for its level one feature, Moemate’s output is limited only by the program’s choice of text-generating model. (Notably, Claude always described Claude and the name provided in the avatar bio – see also example 1). Sometimes, based on the prompt the user gives. It uses the open source tool called Stable Diffusion to generate images when commanded or autonomously. But with the variety of opportunities to order or create images yourself, that seems rather bland.
That is technically a beta or, at best, a two-flavor product with only experimentation on people. Webaverse acknowledges developing automation functionality through browser and terminal connectivity. Such as arranging spreadsheets and even sending emails, to which we are only slightly opposed.
In its broken state, Moemate is fascinating. Multi-modal analysis in a word as a textual and image analysis and other forms is potent. Especially when working within the environment of an assistant on a PC. I am interested in whether the new generation AI. Such as the Windows Copilot, will take the same path as Moemate. And integrate screen comprehension with a text-generating model to increase effectiveness. Or at least reduce the number of steps in a process.
Time will tell. Every once in a while, though, a random app like Moemate feels like a fleeting look at the future. While this one remains riddled with glitches.