Wednesday, July 3, 2024

Human or AI Test: How Easily Can We Spot the Difference Today?

AI technology has become a digital goldmine for industries. Whether it is blog posts drafted by ChatGPT or realistic images created by DALL-E, AI output is improving so quickly that it is becoming hard for humans to tell it apart from human work.

Recently, Tidio conducted a survey to test how well people differentiate between human and AI content. The experiment tested human instincts by showing respondents AI-generated text, images, music, and artwork.

The results were unexpected: in some groups, more than 87% of respondents mistook an AI-generated image for a real photo of a human.

Among AI and machine learning enthusiasts, around 62% answered more than half of the questions correctly. Among the remaining respondents, over 64% got most answers wrong.

Does this mean AI is ready to replace human work in many fields? Let’s find out in this guide.

Some of the key points this guide covers:

  1. The share of respondents who mistook AI-generated content (pictures, music, text, and artwork) for human-created work.
  2. The cues respondents used to decide whether content was human-made or AI-made.
  3. The risks of not keeping up with AI developments.
  4. How to identify AI-generated content and avoid being misled by it.

Here are the topics we will discuss in this guide:

  1. Quick Overview of AI and Basic Concepts
  2. Key Findings of the Study
  3. Human or AI: Detailed Review of the Test Results
  4. Category I: AI-crafted Text
  5. Category II: AI-made Pictures
  6. Category III: AI-designed Artwork
  7. Category IV: AI-composed Music
  8. Lessons From the Study: Human or AI Game
  9. Wrapping Up

Quick Overview of AI and Basic Concepts

Before we discuss the human or AI test in depth, it helps to refresh a few basics. Here is a quick overview of core AI, machine learning, and deep learning concepts:

  1. Artificial intelligence is the set of techniques that enable a machine to simulate human behavior. As the name implies, it is about making a machine artificially intelligent.
  2. Machine learning is a branch of AI in which machines are fed data so they can learn its features and make predictions.
  3. Deep learning is the branch of machine learning that uses artificial neural networks (ANNs).
  4. Generative adversarial networks (GANs) are a deep learning architecture in which two neural networks (a generator and a discriminator) are trained against each other to produce new, realistic data that resembles the training set.

Most of the image examples in Tidio's research use GANs. The generator first analyzes the image dataset and tries to create a new, random image. The discriminator then examines the features of that image and judges how real it looks.
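
To make this generator-versus-discriminator loop concrete, here is a minimal sketch of adversarial training, assuming PyTorch and small flattened grayscale images. The network sizes, optimizer settings, and variable names are illustrative assumptions, not details from Tidio's study or from StyleGAN.

```python
# Minimal GAN training sketch (illustrative only).
import torch
import torch.nn as nn

LATENT_DIM = 64          # size of the random noise vector fed to the generator
IMAGE_DIM = 28 * 28      # assumes small grayscale images, flattened

# Generator: turns random noise into a fake image.
generator = nn.Sequential(
    nn.Linear(LATENT_DIM, 256), nn.ReLU(),
    nn.Linear(256, IMAGE_DIM), nn.Tanh(),
)

# Discriminator: scores how "real" an image looks (1 = real, 0 = fake).
discriminator = nn.Sequential(
    nn.Linear(IMAGE_DIM, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
loss_fn = nn.BCELoss()

def train_step(real_images):
    """One adversarial round: the discriminator learns to separate real from
    fake images, then the generator learns to fool the discriminator."""
    batch_size = real_images.size(0)
    real_labels = torch.ones(batch_size, 1)
    fake_labels = torch.zeros(batch_size, 1)

    # 1) Train the discriminator on real and generated images.
    fake_images = generator(torch.randn(batch_size, LATENT_DIM)).detach()
    d_loss = loss_fn(discriminator(real_images), real_labels) + \
             loss_fn(discriminator(fake_images), fake_labels)
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # 2) Train the generator so the discriminator rates its output as "real".
    noise = torch.randn(batch_size, LATENT_DIM)
    g_loss = loss_fn(discriminator(generator(noise)), real_labels)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```

Repeating this step over many batches is what gradually pushes the generator's output from random noise toward images that the discriminator, and eventually people, can no longer tell apart from real ones.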

For example, to get AI-generated photos of cats, a massive database of cat images was given to these networks to analyze. The generator then created a random picture of a cat, which the discriminator evaluated to judge whether it looked like a real cat photo. After many rounds of improvement, the networks produced images of this kind:

Almost all of these pictures look quite convincing. But if you look closely, you can see some issues with the backgrounds, tails, and collars.

Know why?

This happens because not all cats in the training images were wearing collars, so the network blended features from many pictures, leaving the collar visibly merged with the fur.

So, this tells us that you can spot the differences, but only if you look attentively.

Surprisingly, Tidio's survey showed that people were better at recognizing fake cat images than fake pictures of humans.

Let's now discuss the criteria respondents used to decide whether content was real or fake.

Key Findings of the Study

The typical reaction from respondents was confusion; many behaved as if they simply could not tell what they were looking at.

Some of the interesting points highlighted by Tidio's study are as follows:

  1. Around 78% of people who were confident about their answers did not answer even half of the questions correctly, whereas most respondents who were unsure scored above average.
  2. Most respondents believed they could easily tell whether they were communicating with a chatbot or a human. However, they answered correctly only 34% of the time, which suggests the difference may soon be impossible to notice.
  3. Male respondents were slightly better at judging the images, whereas female respondents were better at recognizing the human touch in music.
  4. People familiar with AI technology and relevant concepts performed better than those with no idea of these disciplines.
  5. Younger respondents (Gen Z) evaluated the images more accurately than the baby boomers.
  6. AI-modified pictures (real photos with AI filters applied) were harder to recognize than images generated entirely by AI.

Human or AI: Detailed Review of the Test Results

Knowledge of AI technology and familiarity with relevant concepts were the most significant factors affecting respondents' scores in Tidio's research.

The image below represents the score distribution for respondents familiar with AI compared to others.

A large number of respondents felt the test was difficult, and this shows clearly in the results: nearly everybody made a mistake at some stage.

Let’s go through each survey category to check the results and performance.

Category I: AI-crafted Text

If you don't believe that AI can write articles, compose essays, or translate passages, then you should do some quick research before you continue with the survey questions and results.

When it comes to writing, both humans and AI build from the same basic blocks, which makes AI text quite hard to identify.

Tidio's experiment tested how well people can recognize whether a text was written by AI or by a human. The first example in the survey was John Ormsby's translation of “Don Quixote”, which was marked as AI-generated by more than half (66%) of the people because they found it awkward, difficult to follow, and illogical.

On the other hand, an AI translation of the opening lines of “One Hundred Years of Solitude” was identified as human-translated by 65% of the respondents.

However, in the case of contemporary lifestyle articles, most respondents judged correctly. The survey included two test passages on weight loss: one written by a human journalist and one by a commercial AI writing tool.

About 81.5% of people identified the original article as written by a human, while only 36.9% found the AI-composed text convincing enough to call it human work.

In short, many different elements of a text passage can trick people. For instance, something that is well-written and seems natural has a higher chance of being recognized as human.

Category II: AI-made Pictures

AI images seem simple to identify, but the survey results proved otherwise. The survey tested pictures edited with FaceApp, photos taken from ThisCatDoesNotExist and ThisPersonDoesNotExist, and stock images.

An image generated with an older version of StyleGAN was judged incorrectly by 35.7% of the respondents, while a picture generated with the improved StyleGAN2 fooled 68.3% of them.

People familiar with AI technology guessed that the image was fake and that no such person exists in real life. The rest, however, mistook it for a real person.

Over 23% of the people who correctly said it was not a human explained that “it looks unnatural”. But plain guessing and going with gut feeling don't always work, as looks can be deceptive.

Look at this opposite example, which respondents often mislabeled as an AI-generated image.

Around 72% of people said it was AI-generated just because it looked unnatural, when it was in fact a real human.

Although some AI-generated features are quite clearly identifiable, a few things should be kept in mind.

How To Find the Difference Between AI and Human Images

Here's a list of points that can help you differentiate AI from human images:

  1. AI algorithms are trained on raw, unedited images, so heavily photoshopped or transformed pictures are more likely to be the work of real humans.
  2. AI can't fully replicate uncommon and unique elements such as jewelry, clothing, makeup styles, or hair highlights.
  3. Pictures taken by professional photographers are sharp and detailed, non-professional shots are noisier, and AI-created images tend to show neither.
  4. AI images have unusually symmetrical faces and perfectly level eyes (see the code sketch after this list).
  5. A tightly cropped image with no hands, shoulders, or hairstyle visible may well be AI-generated.
  6. Some AI images show messy hair, but that's not always the case.
  7. Skin imperfections alone don't prove an image is genuine.

If you pay attention to these details, you can often gauge images accurately. However, with the speed at which AI is improving, no single cue will ever make you completely correct.
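
To illustrate the symmetry cue from point 4, here is a rough sketch of a mirror-difference check, assuming Pillow and NumPy are installed. The file name and threshold are hypothetical examples, and an unusually symmetrical face is only a hint, not proof, that an image is AI-generated.

```python
# Rough symmetry heuristic for face crops (illustrative only).
import numpy as np
from PIL import Image

def symmetry_score(path: str) -> float:
    """Mean absolute difference between a face crop and its mirror image.
    Lower values mean a more symmetrical face; 0 would be perfect symmetry."""
    img = Image.open(path).convert("L").resize((256, 256))   # grayscale, fixed size
    pixels = np.asarray(img, dtype=np.float32) / 255.0
    mirrored = pixels[:, ::-1]                                # flip left-right
    return float(np.abs(pixels - mirrored).mean())

if __name__ == "__main__":
    score = symmetry_score("face.jpg")     # hypothetical example file
    print(f"symmetry score: {score:.3f}")
    if score < 0.05:                       # illustrative threshold, not validated
        print("Unusually symmetrical face; worth a closer look.")
```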

Category III: AI-designed Artwork

AI-generated artwork is another promising area: it can produce attractive images within a few seconds. Although these drawings aren't displayed in famous art exhibitions, they do well in certain niches.

The artwork above was created with ArtBreeder, which lets you change certain elements, such as the weather conditions or the number of trees.

Both the general survey respondents and the AI enthusiasts largely answered this artwork incorrectly, and many simply couldn't decide whether it was real or AI-generated.

Besides creating pictures from scratch, AI can now also turn real photos into fake paintings using filters.

Have a look at the real picture that was used in the survey.

The resulting image, quite predictably, didn't trick the majority of respondents; some even stated that it was ugly.

How To Determine the Originality of an Artwork

Here are a few points that can help you distinguish whether a drawing was made by a human or by AI.

  1. AI-generated artwork often has swirling patterns that are easy to identify.
  2. AI tends to process shapes and designs somewhat randomly.
  3. AI still struggles to draw realistic scenes with human characters.
  4. AI artwork often lacks fine detail and complexity.
  5. AI artwork is mostly in low resolution.
  6. Human artists express more detail in their paintings.

Category IV: AI-composed Music

The majority of respondents found music the hardest of all four categories to judge. In fact, the results reflected people's preconceptions more than anything else: 71.4% of respondents guessed that an EDM track produced by a human was AI-generated music.

Meanwhile, 61.6% of respondents mistook a song created by AI, trained on a couple of Beatles songs, for a human composition. The pattern behind the incorrect guesses was that people associated chaotic-sounding tracks with humans, and complex or overly polished ones with smart AI technology.

The two music tests used similar tracks with a classical touch, and there were no extra elements that could distract the respondents.

Here are the results:

And here is the example of the real McCoy:

Another example tricked respondents even further: AI-composed music performed by human musicians. Around 40% of people guessed it had been composed by a human simply because it sounded as if it was performed by one.

At the current stage, AI music is far from producing Tony-worthy compositions. However, it can already serve as a soundtrack for movies or video games.

Lessons From the Study: Human or AI Game

People have been wary of AI ever since it emerged. One of the survey respondents even expressed concern after being so easily tricked by it.

To some extent, the concerns are genuine. Anyone with access to easy-to-use AI tools can generate a fake profile picture for social media, or apply smartphone filters to enhance facial features and transform their looks to deceive others.

In particular, combining AI tools with Photoshop-style editing can become a powerful tool for deception.

Look at this picture below:

Approximately 30% of the respondents believed the third picture was real. Surprisingly, only 20% of people could give the right answer.

Here are some key lessons to remember:

  1. AI can spread misinformation.
  2. It is hard to spot AI-altered features, or even AI-generated gender transformations.
  3. AI still has a long way to go, and it can be harder to use than traditional editing techniques.
  4. AI can easily create specific elements such as faces, but because machines "think" differently, it still fails at generating complex figures.

Wrapping Up

The Tidio survey tested how well people can spot AI-generated content across four categories. Almost 1,800 people participated, and most were confused when trying to distinguish human from AI content. However, familiarity with AI concepts helped people identify some differences. We will wrap up by saying that it is not always easy to identify AI content, as AI is gradually becoming more capable and more human-like.