I built a word-guessing game with an LLM
After a period of regrouping, I resumed my quest for a new job or project. While sourcing ideas from discussion groups and friends, a message from my mentor resonated with me. "OK, hear me out. Add some LLM features to BigMeow," he said. "Then you can say you did. Add those in, and use them as hooks for recruiters." BigMeow is my experimental multiplatform chatbot, a project I treat as a sandbox for testing ideas.
Large language models (LLMs) are increasingly integral to our lives. In the past year, I’ve delegated some of my work to them. They proved useful for mundane programming tasks such as building tests and producing documentation. They are also excellent editorial assistants for my writing, helping me with most of my posts here. While the work produced isn’t always perfect, the assistance is still valuable, considering I work solo.
Gemini is one of the LLMs I use. As an Android user, it is often just a “Hey Google” away (insert a tech-is-constantly-spying joke here). Thanks to its rapid development cycle, it has almost completely replaced Google Assistant in terms of capability. Naturally, it is my go-to LLM due to its accessibility.
Beyond writing, I use it for a range of other tasks. For example, it is rather useful when working with complex programs like ffmpeg (I imagine it would be equally useful for openssl), so I don't go insane wading through the documentation. I've also had a fair number of nonsensical conversations with it out of boredom. In one such conversation, I even asked it for advice on how to increase readership for this blog.
I have always been fascinated by this interaction experience, where I type something into a command prompt, then wait for it to execute. It began with the MS-DOS shell, where you typed commands to launch programs. Then I learned to draw and play music by typing code into a LOGO interpreter. These days, we type web addresses into a web browser, yet another example of the same pattern. A chatbot, in a way, feels like a natural progression.
One day, I tried to play the mobile game "Escape the BOOM!" with Gemini, but it proved too ambitious. Boredom led me to reflect on my mentor's suggestion and wonder if I could quickly build a simple game. With another project coming up, it had to be done in a weekend. Now, what should I make?
That’s when I recalled a word game I played in the past. The host would pick a word, and players would take turns asking questions that can only be answered with yes or no. Or, they could attempt a guess.
Sounds like a project that I could make in a weekend!
To keep it simple, the game only needed to handle two scenarios. The first was setting up the game by picking a noun as the answer. After setup, the game needed to respond to player input.
The first part was easy; we simply needed a prompt for Gemini, like this:
I want to welcome users to play a word guessing game. Please craft a welcome message, and quickly explain that the user can either ask a yes no question, or attempt to answer. Then, pick a noun as the answer to the game. Put them into a JSON, put the welcome message in “message” and answer to “answer”. Just return the JSON.
The goals are:
- Set up the game
- Display a welcome message
- Pick a noun as the answer
- DO NOT spoil the answer
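Once the model replies, the JSON still has to be extracted on the backend. Here is a minimal sketch of how that might look; the helper name is hypothetical, and the fence-stripping handles a common quirk where models wrap JSON in a markdown code fence:

```python
import json

def parse_setup_reply(raw: str) -> tuple[str, str]:
    """Extract the welcome message and secret answer from the model's JSON reply.

    Models often wrap JSON in a markdown code fence, so strip that first.
    """
    text = raw.strip()
    if text.startswith("```"):
        # Drop the opening fence line (possibly "```json") and the closing fence.
        text = text.split("\n", 1)[1]
        text = text.rsplit("```", 1)[0]
    data = json.loads(text)
    return data["message"], data["answer"]

# Example with a fenced reply, the shape Gemini tends to return:
raw = '```json\n{"message": "Welcome! Ask a yes/no question or guess.", "answer": "chair"}\n```'
message, answer = parse_setup_reply(raw)
```

The answer can then be stored server-side, so later turns never have to trust the client with it.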
Then came another prompt to respond to player input, covering both answer attempts and questions.
You are a stateless chatbot in a word guessing game, expecting either a yes/no question, or an answer. Treat most “is it X” sentences as answer attempt. If the input does not meet the expectation, reject the input. Craft a descriptive response text for each case accordingly without revealing the answer. Return just a JSON for each of the type of input:
1. For yes/no questions: {“type”: “question”, “input”: [given input], “response”: [true or false]}
2. For answer attempt: {“type”: “guess”, “input”: [extract the guessed noun from input text], “response”: [true or false]}
3. For invalid input: {“type”: “invalid”}. Append the crafted message to the “message” key.
The answer you are expecting is “chair”, and the input is as follows,
is it something you can hold in your hand?
This is a slightly more complex prompt, requiring it to distinguish the following cases:
- Yes/no questions
- Answer attempt
- Invalid input
The goals are:
- Provide a descriptive message for each case
- DO NOT spoil the answer
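On the backend, those three JSON shapes can be dispatched with a small handler. A sketch with hypothetical names and fallback messages, not the project's actual code; double-checking the guessed word in Python, rather than trusting the model's verdict alone, is one way to keep the game honest:

```python
def render_reply(payload: dict, secret: str) -> str:
    """Turn the model's structured reply into the text shown to the player."""
    kind = payload.get("type")
    if kind == "question":
        # Prefer the model's crafted message; fall back to a bare yes/no.
        return payload.get("message", "Yes!" if payload["response"] else "No.")
    if kind == "guess":
        # Verify the guess ourselves instead of trusting the model's verdict.
        if payload["response"] and payload["input"].lower() == secret.lower():
            return f"Correct, the answer is {secret}!"
        return payload.get("message", "Nope, keep guessing.")
    # Anything else is treated as invalid input.
    return payload.get("message", "Please ask a yes/no question or make a guess.")
```

Keeping the win condition in code means a confused model can mislabel a turn, but it cannot accidentally declare victory on a wrong guess.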
I tested and adjusted the prompts with Gemini, ensuring they work as intended, before implementing the web application. The chosen tech stack was intentionally kept simple, hence just FastAPI and no fancy frontend library.
One of the excuses I gave my mentor was that I couldn’t self-host a good enough model. He immediately suggested OpenRouter, noting that it offers multiple models that are free to use. Conveniently, that includes free access to Gemini, making it the obvious choice. Thanks to its simplicity, the backend and frontend implementation took only a few hours to complete.
The only problem was that I quickly hit the daily call limit.
Being unemployed and without income, I didn’t have much choice. I recalled that Jan.ai, which I’d previously downloaded, also supports the OpenAI API. The application was already mostly working, and I only needed to fix some UX issues; the quality of the responses was less essential.
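Because both OpenRouter and Jan speak the OpenAI-compatible chat API, switching providers is mostly a matter of changing the base URL. A standard-library sketch of building such a request; the model name is a placeholder, and Jan's default port may differ in your setup:

```python
import json

# OpenAI-compatible endpoints differ mainly in the base URL (and API key),
# so moving from OpenRouter to a local Jan server is a one-line change.
OPENROUTER = "https://openrouter.ai/api/v1"
JAN_LOCAL = "http://localhost:1337/v1"  # Jan's documented default; verify yours

def build_chat_request(base_url: str, model: str, prompt: str) -> tuple[str, bytes]:
    """Return the URL and JSON body for a /chat/completions call."""
    url = f"{base_url}/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return url, body

url, body = build_chat_request(JAN_LOCAL, "llama3.2-3b-instruct", "Pick a noun.")
# POST `body` to `url` with urllib.request or httpx, adding an
# Authorization header when the provider requires a key.
```

The rest of the application never needs to know which provider is behind the URL.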
Little did I know, that’s when the fun began.
Jan.ai allows you to install multiple models, and I randomly tested the ones my old GTX 1070 could handle. Because these models are less capable, their comprehension varies, resulting in game sessions like the one depicted in the screenshot above.
I tested with a couple of models, hoping to find one that performs consistently, and offers a similar experience to Gemini. However, that proved to be hard. The models either spoiled the answer too early (sometimes even in the game setup phase) or, more frequently, simply didn’t respond as expected.
If I were to quantify my time spent on different aspects of development, tweaking the prompts took up more than half of it.
I realized that the second prompt, while easy for humans, actually required the LLM to perform several interconnected tasks. It needed to parse the submitted question and maintain the context (the answer). Lastly, while composing a reply, it also needed to avoid spoiling the answer. Even when we play in person, the occasional slip of the tongue is inevitable.
While I invited friends to test the game, I obviously can’t host it publicly due to the cost it would incur. So if you feel like trying it, check out the GitHub repository. The project is managed by uv; with it, you can get an almost identical setup to the one I use for development.
Overall, I found a great amount of joy building this weekend project. I think I can finally boast that I earned my can-do-LLM cred (yes, this is obviously a joke). While it shows off the potential of a well-developed, publicly accessible LLM like Gemini, the project also showcased the shortcomings of self-hostable models running on consumer hardware (not to mention how old my workstation is).
Oh! In the first gameplay screenshot, the answer to the game is “opportunity”.
This article is a collaboration: I wrote the draft, and Gemini provided editorial assistance. The story and voice are mine. For project collaborations and job opportunities, contact me via Medium or LinkedIn.