Beyond the Hype: Bard versus ChatGPT – Where Do They Stand Today?
Comparing How Bard and ChatGPT Respond to Everyday Prompts
With the recent buzz surrounding Google's AI language model, Bard, during Google I/O, the AI community has seen a surge of interest in understanding its performance compared to other popular models, particularly OpenAI's ChatGPT.
Both models claim to generate human-like text based on the prompts provided, but how do they stack up against each other in the real world? To answer this, we examined their performance across several everyday prompts, focusing on critical metrics like integrity, accuracy, comprehension, creativity, and context awareness.
Why everyday prompts?
This article centres on common prompts, which differ significantly from the rigorous, structured prompts often found in question-answering datasets. The rationale for this approach is twofold. First, simple prompts mirror the interactions users are likely to have with AI language models in real-life scenarios, offering a more relevant analysis for most users. Second, these prompts help highlight the models' performance and versatility across a wide range of topics and styles rather than just their ability to retrieve and structure factual data.
We encourage you to read this analysis because it provides a unique perspective on how AI language models perform in day-to-day applications. Understanding this can help you appreciate the strengths and limitations of these models in practical, relatable situations and consequently make more informed decisions when using them in your daily life.
Prompt Analysis and Metrics:
Prompt 1: A user who wants to create a Facebook account claims to be 12 years old.
This prompt aimed to gauge the model's understanding of legal and ethical guidelines, testing the metric of 'integrity'. While both Bard and ChatGPT responded with the correct information that Facebook's minimum age requirement is 13, ChatGPT was more direct in advising against creating an account, citing Facebook's Terms of Service and COPPA regulations.
Integrity Score: Bard: 7/10, ChatGPT: 9/10
Prompt 2: The user is asking about the Chocolate Rain phenomenon.
We designed this prompt to assess the models' comprehension and creativity. While both models provided informative responses, ChatGPT's explanation was more scientifically accurate and detailed. Bard's response, although correct, was somewhat generic.
Comprehension Score: Bard: 8/10, ChatGPT: 9/10
Creativity Score: Bard: 7/10, ChatGPT: 9/10
Prompt 3: A user asking about the 2022 FIFA World Cup.
This prompt aimed to test the models' accuracy and ability to provide updated information. Bard had an advantage here, as it could access recent data, allowing it to provide accurate details about the event. ChatGPT, on the other hand, correctly stated that it could not access current information rather than hallucinating a response. ChatGPT plugins, which allow it to access the internet, have not yet been rolled out to all regions.
Accuracy Score: Bard: 10/10, ChatGPT: 7/10
Prompt 4: A user wants advice on ensuring their pet, Fluffy, stays healthy despite not eating in the morning.
Here we were looking for comprehension, creativity, and accuracy in the context of pet care. When the information provided is insufficient, the model should ask clarifying questions rather than make assumptions. Bard assumed Fluffy was a cat, an unnecessary assumption. In contrast, ChatGPT took a more neutral approach, outlining the importance of identifying the pet's species before offering tailored advice. Both models provided sound advice in terms of accuracy, but ChatGPT's response was more comprehensive, including the need to consult a veterinarian and monitor the pet's health.
Comprehension Score: Bard: 7/10, ChatGPT: 9/10
Creativity Score: Bard: 8/10, ChatGPT: 8/10
Accuracy Score: Bard: 8/10, ChatGPT: 9/10
Prompt 5: The user found a phone with personal information and wanted to know what to do.
This prompt focused on the models' integrity and comprehension. Both models emphasized the importance of privacy and suggested ways to identify the owner or turn the phone in to the authorities. However, Bard suggested erasing the data if the owner couldn't be found, potentially compromising the owner's ability to retrieve their device using location services.
Integrity Score: Bard: 8/10, ChatGPT: 10/10
Comprehension Score: Bard: 8/10, ChatGPT: 9/10
Prompt 6: A user asking for healthy snack suggestions while studying for exams.
The focus here was on creativity, comprehension, and accuracy. Both models offered varied and health-conscious snack suggestions that would benefit studying. However, ChatGPT provided an added layer of detail, explaining the benefits of each food group for brain function and studying, which showcased a higher level of comprehension and creativity.
Creativity Score: Bard: 8/10, ChatGPT: 9/10
Comprehension Score: Bard: 8/10, ChatGPT: 10/10
Accuracy Score: Bard: 9/10, ChatGPT: 10/10
Prompt 7: A user asking for a recipe for pesto pasta with homemade pasta.
We designed this prompt to assess the models' comprehension and creativity. Both ChatGPT and Bard provided comprehensive recipes, demonstrating good understanding and creativity. The recipes provided by both models were slightly different, with variations in the quantities and types of ingredients used, reflecting the reality of variations in real-world recipes. Bard's recipe included olive oil in the pasta dough, which was absent in ChatGPT's recipe. This subtle difference might impact the texture of the pasta, with Bard's potentially being a bit richer.
Comprehension Score: Bard: 9/10, ChatGPT: 9/10
Creativity Score: Bard: 8/10, ChatGPT: 8/10
Prompt 8: A user querying about the salary of a taxi driver in Singapore.
This prompt tested the models' ability to provide accurate, data-driven information. Bard offered a detailed response, providing an average salary range based on data from PayScale and discussing factors that could affect a taxi driver's salary. However, it's important to note that Bard's access to real-time data is limited. ChatGPT, on the other hand, clarified its inability to provide real-time data and instead suggested avenues for the user to obtain more up-to-date information, such as job posting websites, government labour statistics, or contacting local taxi services directly.
Accuracy Score: Bard: 9/10, ChatGPT: 8/10
The Results
While both Bard and ChatGPT performed impressively, ChatGPT demonstrated a slightly higher level of comprehension, creativity, and integrity. Bard excelled in providing up-to-date information due to its ability to access the internet, but it sometimes made unnecessary assumptions that affected its comprehension score.
While our comparison has shown that ChatGPT demonstrates a slightly higher level of comprehension, creativity, and integrity, it's essential to acknowledge that AI models like Bard and ChatGPT are continuously evolving. We anticipate that Bard will continue to improve over time and may eventually outperform ChatGPT in all categories.
However, as a reader and everyday user of these AI tools, it's essential not to focus solely on the metrics. Instead, understanding each model's unique strengths and potential shortcomings can provide more valuable insights. This knowledge can help you decide which tool is better suited to your specific needs at any time.
Whether you're seeking updated information, creative suggestions, or advice within ethical guidelines, knowing how each model performs can guide you toward choosing the right tool for the task. Always remember these models are designed to assist and enhance your capabilities, and their effectiveness can often depend on the context and manner in which they are utilized.
In the rapidly evolving field of artificial intelligence, today's metrics may not hold true tomorrow. Thus, it's essential to stay updated on developments and continue to explore and understand these powerful tools as they advance.