"Are you doing anything fun this weekend?"
"I might be visiting my parents."
"I haven't seen my parents in a few weeks, myself."
"How have they been?"
"They're going on a cruise in July!"
"Wow, are they excited?"
"They won't stop talking about it."
"That's great."
To you and me, the above interaction is easy to track. In fact, I'm fairly certain that you've had conversations just like this in real life. But even in its simplicity, this conversation is laced with nuance, intuition, and assumptions—things that I'm not quite certain we would be able to replicate at a binary level. Let me show you:
Pictured here is a conversation I had with Cleverbot. For anyone not familiar with the AI, I wrote the first line (black), and the chatbot wrote the second line (blue). You can see that non-sequiturs abound in this "conversation." That's because human language is, by nature, contextually interpretive.
Let's deconstruct the above conversation in terms a computer would be able to understand.
"I'm hungry."
This is a simple statement. It's a pretty easy statement to follow up, and the chatbot does (albeit in a sort of creepy manner).
"What would you like to eat, Master?"
The next part of the conversation is something you might take for granted, but it's where the exchange took an immediate turn into a ditch, and it was technically my fault.
"I like sushi."
This is what I meant earlier when I said language is contextually interpretive. I responded with a fact about myself when the chatbot was asking me to name a noun. What I should have said is "I would like to eat sushi," or simply, "Sushi." Since my response didn't actually answer the original question, I introduced a non-sequitur into the conversation, something computers don't handle very well, as evidenced by the chatbot's response:
"Do you have a family?"
Another non-sequitur. It responded to my non-answer to the original question with a new, unrelated question, because it couldn't reconcile my reply within the context of "What would you like to eat?" I may as well have said "I have a collection of Taylor Swift posters." Like any human, I naturally tried to bring the conversation back around.
"What does that have to do with my hunger?"
"You asked the question."
Here, we see that the chatbot was looking for a binary response again: yes or no. Instead, I answered its question with a question that it perceived as unrelated. So it threw out the first thing that it thought made sense. But it didn't. To us.
Here's an example of a conversation where I played by all of the rules:
You'll have to excuse the terrible spelling; Cleverbot learns from other users, and as we all know, humans can be incredibly stupid. But the conversation holds up well. I stated a fact. It responded with a command related to that fact. I responded with a related fact. It asked me a related question. I gave an actual answer to that question. It asked a new, sort-of-related question:
"Are you Japanese?”
The way AI works right now is typically self-contextual within a certain limit. That is to say, AI only talks and responds within the context of the previous thing said to it; it cannot extrapolate answers the way humans can. For example, when you ask me, "What do you want to eat?" and I reply, "I like sushi," you contextually infer that I mean "I would like to eat sushi," even though in reality, that's not what I said at all.
Try noticing this in your day-to-day conversations. Whenever you're talking with somebody, see whether the things you say actually relate to the previous thing said. I'll bet that a lot of the time, it takes some extra inference to have a natural conversation. It might seem like something very simple, but it's in fact a very complex process happening in our heads. To hear a non-sequitur and intuitively understand it, you have to think about everything that has been said so far, think about what was just said, cross-reference that statement against every other statement about the topic at hand, make a reasonable deduction about which two statements relate best, and then respond based on all of that information.
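To make that process concrete, here's a minimal sketch in Python. It's entirely my own illustration, not how Cleverbot or any real chatbot actually works: it scores a new statement against every earlier turn by counting shared words and picks the turn with the most overlap. The function names (words, related_turn) and the sample history are mine.

```python
# My own toy illustration (not Cleverbot's actual logic) of the
# "cross-reference against everything said so far" step: score a new
# statement against each earlier turn by counting shared words, then
# pick the turn it seems most related to.
import string

def words(text):
    """Lower-case a turn and split it into punctuation-free words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return set(cleaned.split())

def related_turn(history, statement):
    """Return the earlier turn that shares the most words with `statement`."""
    target = words(statement)
    return max(history, key=lambda turn: len(target & words(turn)))

history = [
    "I'm hungry.",
    "What would you like to eat, Master?",
    "I like sushi.",
    "Do you have a family?",
]

print(related_turn(history, "What does that have to do with my hunger?"))
```

Run it and the toy matcher picks "What would you like to eat, Master?" (tied with "Do you have a family?" at two shared words each), while the turn a human would pick, "I'm hungry.", scores zero, because "hunger" and "hungry" never match as plain strings. That missing leap is exactly the inference we make without noticing.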
For example, if a chatbot asks:
"What do you want to eat?"
As long as you respond with a noun, or with a sentence that uses the verb "eat" and involves a noun, the response makes sense to the chatbot, even if it's absurd to us (I've sketched that kind of check after the exchange below).
"Rocks."
"Why do you want to eat rocks?"
"I like eating rocks."
"Eating rocks is bad for your teeth."
"My teeth are fine."
"How many teeth do you have?"
Conclusion:
I very much doubt that we'll ever be able to distill contextual conversation to a binary response. After all, everything that computers do comes back to a choice: 1 or 0. Without a direct path to that choice, the logic fails. And conversation is certainly not logical.