Eveline, my wife, embarked on an exciting yet complex challenge: organizing a running dinner for our 27 neighbors. A running dinner means coordinating many homes, each hosting one of the courses, with guests moving from house to house throughout the evening. Defining a route plan that ensures everyone meets new people, respects the constraints, and keeps the evening flowing turned out to be far more daunting than we had expected.
Given the challenge, Eveline decided to see if an AI tool could help. My expectation? That AI would, if properly prompted, swiftly analyze the data, manage the constraints, and deliver an optimal route plan. Reality, however, proved far more intricate.
The AI Experiment
I started with OpenAI, my daily go-to AI engine. After several attempts, however, I realized that OpenAI was not up to the task: it could not produce accurate solutions for our more complex requests.
So I gave several AI engines the same task. I created a single PDF document containing all the information the prompt needed: a list of the 27 houses, the dish assigned to each house, the running dinner rules (such as no overlap), and some constraints. Then I asked each engine to propose a valid route plan. We began with just the list of houses and their assigned dishes (the simple requests), and then added specific requirements, such as a particular house not being allowed to be paired with another given house (the complex requests).
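For readers who want to try this themselves, it helps to make the rules checkable rather than leaving them as prose in a PDF. Below is a minimal Python sketch of a route-plan validator. It rests on assumptions that are not spelled out in our document: three courses (starter, main, dessert), each table made up of one host plus the guests seated there, "no overlap" read as "no two houses share a table more than once", and pairing constraints given as a set of forbidden pairs. The names `check_plan` and `COURSES`, and the small nine-house example, are hypothetical illustrations, not the data we actually used.

```python
from collections import Counter
from itertools import combinations

COURSES = ("starter", "main", "dessert")  # assumed three-course evening


def check_plan(plan, dishes, forbidden):
    """Return a list of rule violations; an empty list means the plan is valid.

    plan:      course -> list of tables; the first house at each table is the host
    dishes:    house -> the course that house cooks and hosts
    forbidden: set of frozenset({a, b}) pairs that must never share a table
    """
    problems = []
    pair_counts = Counter()
    for course in COURSES:
        tables = plan.get(course, [])
        # Every house sits at exactly one table per course.
        seated = Counter(house for table in tables for house in table)
        for house in sorted(dishes):
            if seated[house] != 1:
                problems.append(f"{course}: {house} seated {seated[house]} times")
        # The hosts must be exactly the houses assigned to cook this course.
        hosts = {table[0] for table in tables}
        expected = {h for h, c in dishes.items() if c == course}
        for house in sorted(hosts - expected):
            problems.append(f"{course}: {house} hosts but cooks {dishes.get(house)}")
        for house in sorted(expected - hosts):
            problems.append(f"{course}: {house} should host but does not")
        # Count every pair of houses that shares a table at this course.
        for table in tables:
            for a, b in combinations(table, 2):
                pair_counts[frozenset((a, b))] += 1
    for pair, n in pair_counts.items():
        names = " & ".join(sorted(pair))
        if n > 1:  # "no overlap": two houses share a table at most once
            problems.append(f"{names} meet {n} times")
        if pair in forbidden:  # explicit "never pair these two" constraints
            problems.append(f"forbidden pair {names} meets")
    return problems


# Hypothetical nine-house example: A-C cook the starter, D-F the main, G-I dessert.
example_dishes = {h: c for c, group in zip(COURSES, ("ABC", "DEF", "GHI")) for h in group}
example_plan = {
    "starter": [("A", "D", "G"), ("B", "E", "H"), ("C", "F", "I")],
    "main":    [("D", "B", "I"), ("E", "C", "G"), ("F", "A", "H")],
    "dessert": [("G", "B", "F"), ("H", "D", "C"), ("I", "A", "E")],
}
print(check_plan(example_plan, example_dishes, set()))  # prints []
```

With no forbidden pairs, the nine-house example passes every rule. Adding `frozenset(("A", "D"))` to the forbidden set makes `check_plan` report `forbidden pair A & D meets`, because A and D share the starter table. A checker like this is also a handy way to double-check whatever plan an AI engine proposes.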
The goal was to find an AI tool capable of handling social dynamics, logistical constraints, and many interacting variables at once. The results were enlightening, and a little surprising.
| AI Engine | Rating | Verdict | Description |
|---|---|---|---|
| Gemini Pro | ⭐⭐⭐⭐ | 🟨 | Understood the problem and described the work to be done. Able to provide a valid answer for simple requests but struggled with complex scenarios even after several corrections. |
| OpenAI | ⭐⭐⭐⭐ | 🟨 | Understood the problem, described the work to be done, and worked on solutions. Provided valid answers for basic routes but struggled with advanced requests despite multiple attempts. |
| xAI Grok | ⭐⭐⭐⭐⭐ | 🟩 | Understood the problem, described the work to be done, and iteratively improved solutions. Provided valid answers for both simple and complex requests, though it required several corrections and rework. |
| Mistral | ⭐⭐ | 🟥 | Understood the problem but consistently failed to provide a valid answer, even after several corrections. |
| Copilot (non-Pro) | ⭐ | 🟥 | Struggled to grasp the complexity, provided inadequate solutions, and could not deliver a valid result even after multiple corrections. |
| DeepSeek | ⭐ | 🟥 | Like Copilot, failed to provide valid answers even after several corrections. |
| Llama | ⚫ | ⚫ | Could not be tested in Europe due to regional availability restrictions. |
Key Takeaways
- Iteration Is Crucial: Even the best-performing AI, xAI Grok, needed several rounds of correction and rework before delivering a valid solution. The process was far from "plug and play."
- Elon's AI Victory Lap: Surprisingly, or perhaps not, Elon Musk's xAI Grok came out on top and cracked the dinner code. Maybe it's the result of its vast NVIDIA-powered data centers, or maybe it's just luck. Either way, some might find it amusing (or slightly irritating) that Grok succeeded where the others didn't. Watching Grok work its way through the problem was mesmerizing.
- AI Thinking Power: It's no coincidence that OpenAI and xAI, two of the top performers in this test, are also leaders in AI infrastructure. Both run massive data centers with thousands of NVIDIA chips fueling their models' thinking power. OpenAI's infrastructure is formidable, but xAI seems to have found the right "recipe" for this challenge. In June 2024, xAI reported having 200,000 NVIDIA H100 GPUs.
- Complexity Challenges: Only three AI engines handled the simple requests, and complexity brought significant hurdles: only one of them could adapt and refine its responses effectively.
- Understanding Constraints: Most AI tools understood the problem, but applying the constraints correctly and flexibly proved challenging. xAI's secret weapon seemed to be moving beyond one-shot response generation to a loop: generate a route plan automatically, verify it automatically, then automatically correct the algorithm or the outcome. Most other AI engines were unable to identify their own mistakes and take corrective action.
- Regional Limitations: Accessibility can be a concern, as seen with Llama, which wasn't available for testing in Europe.
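The generate, verify, correct loop that seemed to set Grok apart can be sketched in a few lines of Python. This is a simple hill-climbing local search, our own illustration and not a description of what Grok actually does internally. It generates a random plan in which every house hosts its own course, scores the plan by counting repeated and forbidden meetings, and repairs it by swapping guests between tables, keeping a swap only when the score does not get worse. It assumes equally sized courses (27 houses, 9 per course, so 2 guests per host); `solve`, `cost`, `random_plan`, `max_steps`, and `seed` are hypothetical names.

```python
import random
from itertools import combinations

COURSES = ("starter", "main", "dessert")  # assumed three-course evening


def cost(plan, forbidden):
    """Verify step: count repeated meetings plus meetings of forbidden pairs."""
    seen = {}
    for tables in plan.values():
        for table in tables:
            for a, b in combinations(table, 2):
                pair = frozenset((a, b))
                seen[pair] = seen.get(pair, 0) + 1
    repeats = sum(n - 1 for n in seen.values() if n > 1)
    banned = sum(n for pair, n in seen.items() if pair in forbidden)
    return repeats + banned


def random_plan(dishes, rng):
    """Generate step: each host gets an equal share of randomly shuffled guests."""
    plan = {}
    for course in COURSES:
        hosts = sorted(h for h, c in dishes.items() if c == course)
        guests = sorted(h for h, c in dishes.items() if c != course)
        if not hosts or len(guests) % len(hosts):
            raise ValueError(f"{course}: guests cannot be split evenly across hosts")
        rng.shuffle(guests)
        size = len(guests) // len(hosts)
        plan[course] = [[host] + guests[i * size:(i + 1) * size]
                        for i, host in enumerate(hosts)]
    return plan


def solve(dishes, forbidden, max_steps=20_000, seed=0):
    """Correct step: swap guests between tables while the violation count does not rise."""
    rng = random.Random(seed)
    plan = random_plan(dishes, rng)
    current = cost(plan, forbidden)
    for _ in range(max_steps):
        if current == 0:
            break
        tables = plan[rng.choice(COURSES)]
        t1, t2 = rng.sample(range(len(tables)), 2)
        # Index 0 is the host, so only guests (index >= 1) are ever moved.
        i = rng.randrange(1, len(tables[t1]))
        j = rng.randrange(1, len(tables[t2]))
        tables[t1][i], tables[t2][j] = tables[t2][j], tables[t1][i]
        new = cost(plan, forbidden)
        if new <= current:
            current = new  # keep the swap: no worse than before
        else:
            tables[t1][i], tables[t2][j] = tables[t2][j], tables[t1][i]  # undo it
    return plan, current


# Hypothetical 27-house setup: houses H00..H26, nine per course.
dishes = {f"H{i:02d}": COURSES[i % 3] for i in range(27)}
forbidden = {frozenset(("H00", "H01"))}
plan, remaining = solve(dishes, forbidden)
print("remaining violations:", remaining)
```

A returned score of 0 means the plan satisfies every pairing rule. A non-zero score means the search ran out of steps, and a human (or another seed) has to take over. The hard rules (one seat per house per course, correct hosts) are enforced by construction, so the verify step only needs to score the pairing rules.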
Conclusion
Our AI Running Dinner Challenge revealed that while AI holds great promise for solving complex logistical tasks, it's not yet a magic bullet. The process requires patience, iterative testing, and an understanding of the AI's limitations.
Nonetheless, the experience was valuable, offering insight into how AI can assist with real-world problem-solving, and where it still falls short. Human intuition and adaptability remain essential, even in the era of advanced AI.
Would we use AI again for such tasks? Absolutely, but with the understanding that it's more of a collaborative partner than an all-knowing solution.
And if Elon Musk's AI manages to crack it again, well, let's just hope it invites us to the after-party!
Have you tried using AI for complex event planning? I’d love to hear your experiences!