The Turing Test
The Turing test is a test of a machine's ability to demonstrate intelligence. It proceeds as follows: a human judge engages in a natural language conversation with one human and one machine, each of which tries to appear human. All participants are placed in isolated locations.
The test is named for the brilliant English logician, mathematician, cryptographer, and computer pioneer Alan Turing. In artificial intelligence (AI), a Turing test is a method of inquiry for determining whether or not a computer is capable of thinking like a human being.
If the judge cannot reliably tell the machine from the human, the machine is said to have passed the test. In order to test the machine's intelligence rather than its ability to render words into audio, the conversation is limited to a text-only channel such as a computer keyboard and screen. The test was introduced by Alan Turing in his 1950 paper 'Computing Machinery and Intelligence,' which opens with the words: 'I propose to consider the question, "Can machines think?"' Since 'thinking' is difficult to define, Turing chooses to 'replace the question by another, which is closely related to it and is expressed in relatively unambiguous words.' Turing's new question is: 'Are there imaginable digital computers which would do well in the imitation game?' This question, Turing believed, is one that can actually be answered. In the remainder of the paper, he argued against all the major objections to the proposition that 'machines can think'. In the years since 1950, the test has proven to be both highly influential and widely criticized, and it is an essential concept in the philosophy of artificial intelligence.
History
Philosophical background
The question of whether it is possible for machines to think has a long history, which is firmly entrenched in the distinction between dualist and materialist views of the mind.
From the perspective of dualism, the mind is non-physical (or, at the very least, has non-physical properties) and, therefore, cannot be explained in purely physical terms. The materialist perspective argues that the mind can be explained physically, and thus leaves open the possibility of minds that are artificially produced. In 1936, philosopher Alfred Ayer considered the standard philosophical question of other minds: how do we know that other people have the same conscious experiences that we do?
In his book Language, Truth and Logic, Ayer suggested a protocol to distinguish between a conscious man and an unconscious machine: 'The only ground I can have for asserting that an object which appears to be conscious is not really a conscious being, but only a dummy or a machine, is that it fails to satisfy one of the empirical tests by which the presence or absence of consciousness is determined.' (This suggestion is very similar to the Turing test, but it is not certain that Ayer's popular philosophical classic was familiar to Turing.)
Alan Turing
Researchers in the United Kingdom had been exploring 'machine intelligence' for up to ten years prior to the founding of the field of AI research in 1956.
The Loebner Prize
The Loebner Prize provides an annual platform for practical Turing tests, with the first competition held in November 1991. It is underwritten by Hugh Loebner; the Cambridge Center for Behavioral Studies in Massachusetts organised the prizes up to and including the 2003 contest. As Loebner described it, one reason the competition was created is to advance the state of AI research, at least in part, because no one had taken steps to implement the Turing test despite 40 years of discussing it. The first Loebner Prize competition in 1991 led to a renewed discussion of the viability of the Turing test and the value of pursuing it, in both the popular press and in academia. The first contest was won by a mindless program with no identifiable intelligence that managed to fool naive interrogators into making the wrong identification. This highlighted several of the shortcomings of the Turing test (discussed below): the winner won, at least in part, because it was able to 'imitate human typing errors'; the unsophisticated interrogators were easily fooled; and some researchers in AI have been led to feel that the test is merely a distraction from more fruitful research. The silver (text only) and gold (audio and visual) prizes have never been won. However, the competition has awarded the bronze medal every year for the computer system that, in the judges' opinions, demonstrates the 'most human' conversational behavior among that year's entries.
Artificial Linguistic Internet Computer Entity (A.L.I.C.E.) has won the bronze award on three occasions in recent times (2000, 2001, 2004). Learning AI Jabberwacky won in 2005 and 2006. Its creators have proposed a personalized variation: the ability to pass the imitation test while attempting specifically to imitate the human player, with whom the machine will have conversed at length before the test. Huma Shah and Kevin Warwick have discussed Jabberwacky's performance in 'Emotion in the Turing Test,' a chapter in Handbook on Synthetic Emotions and Sociable Robotics: New Applications in Affective Computing and Artificial Intelligence. The Loebner Prize tests conversational intelligence; winners are typically chatterbot programs. Early Loebner Prize rules restricted conversations: each entry and hidden human conversed on a single topic, thus the interrogators were restricted to one line of questioning per entity interaction. The restricted-conversation rule was lifted for the 1995 Loebner Prize. Interaction duration between judge and entity has varied in Loebner Prizes. In Loebner 2003, at the University of Surrey, each interrogator was allowed five minutes to interact with an entity, machine or hidden human. Between 2004 and 2007, the interaction time allowed in Loebner Prizes was more than twenty minutes.
In 2008, the interrogation duration allowed was five minutes per pair, because the organiser, Kevin Warwick, and coordinator, Huma Shah, consider this to be the duration for any test, as Turing stated in his 1950 paper: '... making the right identification after five minutes of questioning' (p. 442). They felt Loebner's longer test, implemented in Loebner Prizes 2006 and 2007, was inappropriate for the state of artificial conversation technology. It is ironic that the 2008 winning entry, Elbot, does not mimic a human; its personality is that of a robot, yet Elbot deceived three human judges into believing it was the human during human-parallel comparisons; see Shah & Warwick (2009a), 'Testing Turing's five minutes, parallel-paired imitation game', in the (forthcoming) Kybernetes Turing Test Special Issue. During the 2009 competition, held in Brighton, UK, the communication program restricted judges to 10 minutes for each round: 5 minutes to converse with the human, 5 minutes to converse with the program. This was to test the alternative reading of Turing's prediction that the 5-minute interaction was to be with the computer.
For the 2010 competition, the sponsor has again increased the interaction time between interrogator and system to 25 minutes.
2005 Colloquium on Conversational Systems
In November 2005, the University of Surrey hosted an inaugural one-day meeting of artificial conversational entity developers, attended by winners of practical Turing tests in the Loebner Prize: Robby Garner, Richard Wallace and Rollo Carpenter. Invited speakers included Hugh Loebner (sponsor of the Loebner Prize) and Huma Shah.
AISB 2008 Symposium on the Turing Test
In parallel to the 2008 Loebner Prize, held at the University of Reading, the Society for the Study of Artificial Intelligence and the Simulation of Behaviour (AISB) hosted a one-day symposium to discuss the Turing test, organised by Huma Shah, among others. The speakers included the Royal Institution's Director, Turing's biographer Andrew Hodges, and consciousness scientist Owen Holland. No agreement emerged for a canonical Turing test, though Bringsjord expressed that a sizeable prize would result in the Turing test being passed sooner.
The Alan Turing Year, and Turing100 in 2012
2012 will see a celebration of Turing's life and scientific impact, with a number of major events taking place throughout the year. Most of these will be linked to places with special significance in Turing's life, such as Cambridge, Manchester, and Bletchley Park. The Alan Turing Year is coordinated by the Turing Centenary Advisory Committee (TCAC), representing a range of expertise and organisational involvement in the 2012 celebrations, and supported by numerous academic and scientific organisations. With the aim of taking Turing's idea for a thinking machine, portrayed in Hollywood movies such as Blade Runner, to a wider audience including children, Turing100 has been set up to organise a special Turing test event, celebrating the 100th anniversary of Turing's birth in June 2012, at the place where the mathematician broke codes during the Second World War: Bletchley Park.
The Turing100 team comprises Kevin Warwick (Chair), Huma Shah (coordinator), Ian Bland, Chris Chapman and Marc Allen; supporters include Rory Dunlop, previous Loebner Prize winners, and Fred Roberts.
Versions of the Turing test
The Imitation Game, as described by Alan Turing in 'Computing Machinery and Intelligence': player C, through a series of written questions, attempts to determine which of the other two players is a man and which is a woman. Player A, the man, tries to trick player C into making the wrong decision, while player B tries to help player C. (Figure adapted from Saygin, 2000.)
There are at least three primary versions of the Turing test, two of which are offered in 'Computing Machinery and Intelligence' and one that Saul Traiger describes as the 'Standard Interpretation.' While there is some debate regarding whether the 'Standard Interpretation' is that described by Turing or, instead, based on a misreading of his paper, these three versions are not regarded as equivalent, and their strengths and weaknesses are distinct.
The Imitation Game
Turing's original game, as we have seen, described a simple party game involving three players. Player A is a man, player B is a woman, and player C (who plays the role of the interrogator) is of either sex.
In the Imitation Game, player C is unable to see either player A or player B and can only communicate with them through written notes. By asking questions of player A and player B, player C tries to determine which of the two is the man and which is the woman. Player A's role is to trick the interrogator into making the wrong decision, while player B attempts to assist the interrogator in making the right one. Sterrett refers to this as the 'Original Imitation Game Test.' Turing proposes that the role of player A be filled by a computer. Thus, the computer's task is to pretend to be a woman and attempt to trick the interrogator into making an incorrect evaluation. The success of the computer is determined by comparing the outcome of the game when player A is a computer against the outcome when player A is a man. If, as Turing puts it, 'the interrogator decides wrongly as often when the game is played with the computer as he does when the game is played between a man and a woman', it may be argued that the computer is intelligent.
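Turing's criterion is thus statistical rather than binary: the machine's rate of inducing wrong identifications is compared against the human deceiver's baseline rate. A minimal sketch of that comparison in Python (the error rates below are invented placeholders and the simulated judge is a random stand-in, not a real interrogator):

```python
import random

def run_trials(deceiver_error_rate, trials=10_000):
    """Simulate imitation-game rounds; return how often the
    interrogator misidentifies the players."""
    errors = sum(random.random() < deceiver_error_rate for _ in range(trials))
    return errors / trials

# Assumed, purely illustrative error rates: how often each kind of
# player A induces a wrong identification.
p_error_with_man = 0.30       # baseline game: man vs woman
p_error_with_machine = 0.29   # machine substituted for the man

baseline = run_trials(p_error_with_man)
machine = run_trials(p_error_with_machine)

# Turing's criterion: the machine does well if the interrogator errs
# about as often as in the baseline man-vs-woman game.
print(f"baseline error rate: {baseline:.3f}")
print(f"machine error rate:  {machine:.3f}")
print("machine passes" if machine >= baseline else "machine falls short")
```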
Shah & Warwick, in 'Testing Turing's Five Minutes Parallel-paired Imitation Game' (Kybernetes, April 2010), and in contrast to Sterrett's opinion, posit that Turing did not expect the design of the machine to imitate a woman when compared against a human. The Turing test does not directly test whether the computer behaves intelligently; it tests only whether the computer behaves like a human being. Since human behavior and intelligent behavior are not exactly the same thing, the test can fail to accurately measure intelligence in two ways:
Some human behavior is unintelligent
The Turing test requires that the machine be able to execute all human behaviors, regardless of whether they are intelligent.
It even tests for behaviors that we may not consider intelligent at all, such as the susceptibility to insults, the temptation to lie or, simply, a high frequency of typing errors. If a machine cannot imitate human behavior in detail, it fails the test. This objection was raised by The Economist, in an article entitled 'Artificial Stupidity' published shortly after the first Loebner Prize competition in 1992. The article noted that the first Loebner winner's victory was due, at least in part, to its ability to 'imitate human typing errors.' Turing himself had suggested that programs add errors into their output, so as to be better 'players' of the game.
Some intelligent behavior is inhuman
The Turing test does not test for highly intelligent behaviors, such as the ability to solve difficult problems or come up with original insights. In fact, it specifically requires deception on the part of the machine: if the machine is more intelligent than a human being, it must deliberately avoid appearing too intelligent.
If it were to solve a computational problem that is impossible for any human to solve, then the interrogator would know the program is not human, and the machine would fail the test. Because it cannot measure intelligence that is beyond the ability of humans, the test cannot be used to build or evaluate systems that are more intelligent than humans. Because of this, several test alternatives that would be able to evaluate superintelligent systems have been proposed.
Real intelligence vs simulated intelligence
The Turing test tests only how the subject acts: the external behaviour of the machine. In this regard, it assumes a behaviourist or functionalist definition of intelligence. The example of ELIZA suggested that a machine passing the test may be able to simulate human conversational behavior by following a simple (but large) list of mechanical rules, without thinking or having a mind at all. John Searle argued that external behavior cannot be used to determine if a machine is 'actually' thinking or merely 'simulating thinking.'
His argument is intended to show that, even if the Turing test is a good operational definition of intelligence, it may not indicate that the machine has a mind, consciousness, or intentionality. (Intentionality is a philosophical term for the power of thoughts to be 'about' something.) Turing anticipated this line of criticism in his original paper, writing that the mysteries of consciousness do not necessarily need to be solved before we can answer the question of whether machines can think.
Naivete of interrogators and the anthropomorphic fallacy
The Turing test assumes that the interrogator is sophisticated enough to determine the difference between the behaviour of a machine and the behaviour of a human being, though critics argue that this is not a skill most people have. The precise skills and knowledge required by the interrogator are not specified by Turing in his description of the test, but he did use the term 'average interrogator': '[an] average interrogator would not have more than 70 per cent chance of making the right identification after five minutes of questioning' (Turing, 1950: p. 442).
Shah & Warwick (2009c) show that experts are fooled, and that interrogator strategy, 'power' vs 'solidarity', affects correct identification, the latter being more successful (in Hidden Interlocutor Misidentification in Practical Turing Tests, submitted to a journal, November 2009). Chatterbot programs such as ELIZA have repeatedly fooled unsuspecting people into believing that they are communicating with human beings. In these cases, the 'interrogator' is not even aware of the possibility that they are interacting with a computer. To successfully appear human, there is no need for the machine to have any intelligence whatsoever; only a superficial resemblance to human behaviour is required. Most would agree that a 'true' Turing test has not been passed in 'uninformed' situations like these. Early Loebner Prize competitions used 'unsophisticated' interrogators who were easily fooled by the machines. Since 2004, the Loebner Prize organizers have deployed philosophers, computer scientists, and journalists among the interrogators. However, even some of these experts have been deceived by the machines. Michael Shermer points out that human beings consistently choose to consider non-human objects as human whenever they are allowed the chance, a mistake called the anthropomorphic fallacy: they talk to their cars, ascribe desire and intentions to natural forces (e.g., 'nature abhors a vacuum'), and worship the sun as a human-like being with intelligence. If the Turing test is applied to religious objects, Shermer argues, then inanimate statues, rocks, and places have consistently passed the test throughout history. This human tendency towards anthropomorphism effectively lowers the bar for the Turing test, unless interrogators are specifically trained to avoid it.
Impracticality and irrelevance: the Turing test and AI research
Mainstream AI researchers argue that trying to pass the Turing test is merely a distraction from more fruitful research. Indeed, the Turing test is not an active focus of much academic or commercial effort; as Stuart Russell and Peter Norvig write: 'AI researchers have devoted little attention to passing the Turing test.' There are several reasons. First, there are easier ways to test their programs. Most current research in AI-related fields is aimed at modest and specific goals. In order to test the intelligence of the programs that solve these problems, AI researchers simply give them the task directly, rather than going through the roundabout method of posing the question in a chat room populated with computers and people. Second, creating life-like simulations of human beings is a difficult problem on its own that does not need to be solved to achieve the basic goals of AI research. Believable human characters may be interesting in a work of art, a game, or a sophisticated user interface, but they are not part of the science of creating intelligent machines, that is, machines that solve problems using intelligence. Russell and Norvig suggest an analogy with the history of flight: planes are tested by how well they fly, not by comparing them to birds. 'Aeronautical engineering texts,' they write, 'do not define the goal of their field as making machines that fly so exactly like pigeons that they can fool other pigeons.'
Turing, for his part, never intended his test to be used as a practical, day-to-day measure of the intelligence of AI programs; he wanted to provide a clear and understandable example to aid in the discussion of the philosophy of artificial intelligence. As such, it is not surprising that the Turing test has had so little influence on AI research; the philosophy of AI, writes John McCarthy, 'is unlikely to have any more effect on the practice of AI research than philosophy of science generally has on the practice of science.'
Predictions
Turing predicted that machines would eventually be able to pass the test; in fact, he estimated that by the year 2000, machines with 10^9 bits (about 119.2 MiB, or approximately 120 MB) of memory would be able to fool thirty percent of human judges in a five-minute test.
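For reference, the memory figure converts as follows (a quick sanity check of the arithmetic, nothing more):

```python
bits = 10**9
bytes_ = bits // 8            # 125,000,000 bytes
mib = bytes_ / 2**20          # binary megabytes (MiB)
print(f"{mib:.1f} MiB")       # -> 119.2 MiB, i.e. roughly 120 megabytes
```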
He also predicted that people would then no longer consider the phrase 'thinking machine' contradictory. He further predicted that machine learning would be an important part of building powerful machines, a claim considered plausible by contemporary researchers in artificial intelligence. In a paper submitted to the 19th Midwest Artificial Intelligence and Cognitive Science Conference, Dr. Shane T. Mueller predicted that a modified Turing test called a 'Cognitive Decathlon' could be accomplished within five years. By extrapolating an exponential growth of technology over several decades, Ray Kurzweil predicted that Turing test-capable computers would be manufactured in the near future. In 1990, he set the year around 2020. By 2005, he had revised his estimate to 2029. The Long Bet Project is a wager of $20,000 between Mitch Kapor (pessimist) and Kurzweil (optimist) about whether a computer will pass a Turing test by the year 2029.
The bet specifies the conditions in some detail.
Variations of the Turing test
Numerous other versions of the Turing test, including those expounded above, have been mooted through the years.
Reverse Turing test and CAPTCHA
A modification of the Turing test wherein the objective of one or more of the roles has been reversed between machines and humans is termed a reverse Turing test. An example is implied in the work of psychoanalyst Wilfred Bion, who was particularly fascinated by the 'storm' that resulted from the encounter of one mind by another. Carrying this idea forward, R. D. Hinshelwood described the mind as a 'mind recognizing apparatus,' noting that this might be some sort of 'supplement' to the Turing test.
The challenge would be for the computer to be able to determine if it were interacting with a human or another computer. This is an extension of the original question that Turing attempted to answer, but would, perhaps, offer a high enough standard to define a machine that could 'think' in a way that we typically define as characteristically human. CAPTCHA is a form of reverse Turing test. Before being allowed to perform some action on a website, the user is presented with alphanumeric characters in a distorted graphic image and asked to type them out. This is intended to prevent automated systems from being used to abuse the site.
The rationale is that software sufficiently sophisticated to read and reproduce the distorted image accurately does not exist (or is not available to the average user), so any system able to do so is likely to be a human. Software that can reverse CAPTCHA with some accuracy by analyzing patterns in the generating engine is being actively developed.
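To make the mechanism concrete, here is a minimal sketch of CAPTCHA-style image generation using the Python Pillow library. It is illustrative only: real CAPTCHA engines use much stronger distortions, and every parameter below is an arbitrary choice rather than any standard.

```python
import random
import string

from PIL import Image, ImageDraw, ImageFilter, ImageFont

def make_captcha(length=5):
    """Render a random alphanumeric string into a noisy, blurred image.
    Returns the image and the expected answer."""
    answer = "".join(random.choices(string.ascii_uppercase + string.digits, k=length))
    img = Image.new("RGB", (40 * length, 60), "white")
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()

    # Draw each character at a jittered position to break up clean lines.
    for i, ch in enumerate(answer):
        x = 10 + 35 * i + random.randint(-4, 4)
        y = 20 + random.randint(-8, 8)
        draw.text((x, y), ch, fill="black", font=font)

    # Sprinkle noise pixels and apply a mild blur so naive OCR struggles.
    for _ in range(300):
        draw.point((random.randrange(img.width), random.randrange(img.height)), fill="gray")
    img = img.filter(ImageFilter.GaussianBlur(radius=1))
    return img, answer

img, answer = make_captcha()
img.save("captcha.png")
print("expected answer:", answer)
```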
'Fly on the wall' Turing test
The 'fly on the wall' variation of the Turing test changes the original test parameters in three ways. First, parties A and B communicate with each other rather than with party C, who plays the role of a detached observer (a 'fly on the wall') rather than of an interrogator or other participant in the conversation. Second, party A and party B may each be either a human or a computer of the type being tested. Third, it is specified that party C must not be informed as to the identity (human versus computer) of either participant in the conversation. Party C's task is to determine which of four possible participant combinations (human A/human B, human A/computer B, computer A/human B, computer A/computer B) generated the conversation. At its most rigorous, the test is conducted in numerous iterations, in each of which the identity of each participant is determined at random (e.g., using a fair-coin toss) and independently of the determination of the other participant's identity, and in each of which a new human observer is used (to prevent the discernment abilities of party C from improving through conscious or unconscious learning over time).
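Since the observer faces four equally likely combinations, pure guessing succeeds only 25% of the time, so a useful observer must beat that baseline. A small simulation sketch of the randomized assignment (the observer here is a random placeholder and should therefore hover at chance):

```python
import random

COMBOS = ["human/human", "human/computer", "computer/human", "computer/computer"]

def run_iteration():
    """One 'fly on the wall' round: assign each party independently at
    random, then let a placeholder observer guess the combination."""
    a = random.choice(["human", "computer"])   # fair, independent draws
    b = random.choice(["human", "computer"])
    truth = f"{a}/{b}"
    guess = random.choice(COMBOS)              # stand-in for a real judge
    return guess == truth

trials = 10_000
accuracy = sum(run_iteration() for _ in range(trials)) / trials
print(f"observer accuracy: {accuracy:.3f} (chance level is 0.25)")
```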
Alan Turing and the Imitation Game
Alan Turing, in his 1950 paper, proposed a test called 'The Imitation Game' that might finally settle the issue of machine intelligence. The first version of the game he explained involved no computer intelligence whatsoever. Imagine three rooms, each connected via computer screen and keyboard to the others. In one room sits a man, in the second a woman, and in the third sits a person; call him or her the 'judge'.
The judge's job is to decide which of the two people talking to him through the computer is the man. The woman will attempt to help the judge, offering whatever evidence she can (the computer terminals are used so that physical clues cannot be used) to support the right identification. The man's job is to trick the judge, so he will attempt to deceive him, and counteract his opponent's claims, in hopes that the judge will erroneously identify him as the woman. What does any of this have to do with machine intelligence? Turing then proposed a modification of the game, in which instead of a man and a woman as contestants, there was a human, of either gender, and a computer at the other terminal. Now the judge's job is to decide which of the contestants is human, and which the machine. Turing proposed that if, under these conditions, a judge were less than 50% accurate, that is, if a judge is as likely to pick either human or computer, then the computer must be a passable simulation of a human being and hence, intelligent. The game has recently been modified so that there is only one contestant, and the judge's job is not to choose between two contestants, but simply to decide whether the single contestant is human or machine.
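The 50% criterion amounts to asking whether the judge performs better than coin-flipping. One way to operationalize that is a significance test on the judge's accuracy; the sketch below uses a normal approximation to the binomial with an illustrative 95% threshold (our choice for the example, not something Turing specified):

```python
from math import sqrt

def distinguishable_from_chance(correct, trials, z_threshold=1.96):
    """Is the judge's accuracy significantly different from coin-flipping?
    Uses a normal approximation to the binomial at roughly the 95% level."""
    p_hat = correct / trials
    se = sqrt(0.25 / trials)           # standard error under p = 0.5
    z = (p_hat - 0.5) / se
    return abs(z) > z_threshold, p_hat, z

significant, p_hat, z = distinguishable_from_chance(correct=54, trials=100)
print(f"accuracy={p_hat:.2f}, z={z:.2f}, distinguishable={significant}")
# If the judge's accuracy is not distinguishable from chance, the machine
# is, by Turing's criterion, a passable simulation of a human.
```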
The dictionary.com entry on the Turing Test is short, but very clearly stated. A longer, point-form review of the imitation game and its modifications, written by Larry Hauser, is also available (a local copy exists if the link fails). Hauser's page may not contain enough detail to explain the test, but it is an excellent reference or study guide and contains some helpful diagrams for understanding the interplay of contestant and judge. The page also makes reference to John Searle's Chinese Room, a thought experiment developed as an attack on the Turing test and similar 'behavioural' intelligence tests. We will discuss the Chinese Room in the next section.
Natural Language Processing (NLP)
Partly out of an attempt to pass Turing's test, and partly just for the fun of it, there arose, largely in the 1970s, a group of programs that tried to cross the first human-computer barrier: language. These programs, often fairly simple in design, employed small databases of (usually English) language combined with a series of rules for forming intelligent sentences. While most were woefully inadequate, some grew to tremendous popularity.
Perhaps the most famous such program was Joseph Weizenbaum's ELIZA. Written in 1966, it was one of the first and remained for quite a while one of the most convincing. ELIZA simulates a Rogerian psychotherapist (the Rogerian therapist is empathic but passive, asking leading questions but doing very little talking: 'Tell me more about that,' or 'How does that make you feel?') and does so quite convincingly, for a while. There is no hint of intelligence in ELIZA's code; it simply scans for keywords like 'mother' or 'depressed' and then asks suitable questions from a large database. Failing that, it generates something generic in an attempt to elicit further conversation. Most programs since have relied on similar principles of keyword matching, paired with basic knowledge of sentence structure.
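The keyword-scanning mechanism described above is simple enough to sketch in a few lines of Python; the rules and responses below are invented toy examples in the spirit of ELIZA's script, not Weizenbaum's actual code:

```python
import random

# Keyword -> canned responses, in the spirit of ELIZA's script.
RULES = {
    "mother": ["Tell me more about your mother.", "How do you feel about your family?"],
    "depressed": ["I am sorry to hear you are depressed.", "Why do you think you feel that way?"],
    "always": ["Can you think of a specific example?"],
}
GENERIC = ["Tell me more about that.", "How does that make you feel?", "Please go on."]

def respond(utterance):
    """Scan the input for known keywords; fall back to a generic prompt."""
    words = utterance.lower().split()
    for keyword, replies in RULES.items():
        if keyword in words:
            return random.choice(replies)
    return random.choice(GENERIC)

print(respond("I am depressed about my job"))
print(respond("The weather is nice"))
```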
There is, however, no better way to see what these programs are capable of than to try them yourself. We have compiled a set of links to some of the more famous attempts at NLP. Students are encouraged to interact with these programs in order to get a feeling for their strengths and weaknesses, but many of the pages provided here link to dozens of such programs; don't get lost among the artificial people.
Online Examples of NLP
A series of online demos (many are Java applets, so be sure you are using a Java-capable browser) of some of the more famous NLP programs:
- A good Java applet version. What's more, source code is provided, for any students who may also have an interest in computer programming.
- Jason Hutchens, a Ph.D. student who wrote several criticisms of the Turing test (some of which are mentioned in a later section), has created a few NLP projects of his own. The most popular of these is MegaHAL, based on a previous program that was entered into the Loebner competition (see below) unsuccessfully.
- Another Loebner contest entrant, and winner, for that matter. Three conversations can be chosen from: Sex, Interview Skills, or a 'Mystery Conversation'. In CGI format, so even non-Java browsers should have no trouble.
- This page is a large collection of NLP-style programs, affectionately called Chatterbots. Available are classic bots like ELIZA, as well as more modern attempts.
- Information about the creation of, and opportunities to converse with, the latest Loebner Prize winner.
This page contains several more transcripts, most notably some featuring RACTER, a somewhat tongue-in-cheek attempt at NLP. This page contains some more humourous transcripts, like Parry talking to ELIZA, or RACTER and ELIZA, and even features some of RACTER's poetry. One project bills itself as the world's leading AI research effort, focusing on creating genuine Artificial Intelligence: the technology that enables machines to converse with humans in natural language.
They believe that their groundbreaking approach will help them fully pass the Turing Test by 2011.
The Loebner Prize
Although Turing proposed his test in 1950, it was not until 40 years later, in 1991, that the test was first really implemented. Dr. Hugh Loebner, very much interested in seeing AI succeed, pledged $100,000 to the first entrant that could pass the test. The 1991 contest had some serious problems, though (perhaps the most notable was that the judges were all computer science specialists, and knew exactly what kind of questions might trip up a computer), and it was not until 1995 that the contest was re-opened. Since then, there has been an annual competition, which has yet to find a winner.
While small prizes are given out to the most 'human-like' computer, no program has had the 50% success Turing aimed for.
Validity of the Turing Test
Alan Turing's imitation game has fueled 40 years of controversy, with little sign of slowing. On one side of the argument, human-like interaction is seen as absolutely essential to human-like intelligence. A successful AI is worthless if its intelligence lies trapped in an unresponsive program. Some have even extended the Turing Test.
Stevan Harnad (see below) has proposed the 'Total Turing Test', where instead of language alone, the machine must interact in all areas of human endeavor, and instead of a five-minute conversation, the duration of the test is a lifetime. James Sennett has proposed an extension to the Turing Test (a local copy is available if the link fails) that challenges AI to mimic not only human thought but also personhood as a whole.
To illustrate his points, the author uses Star Trek: The Next Generation's character Data. Opponents of Turing's behavioural criterion of intelligence argue that it is either not sufficient, or perhaps not even relevant at all. What is important, they argue, is that the computer demonstrate cognitive ability, regardless of behaviour. It is not necessary that a program speak in order for it to be intelligent.
There are humans that would fail the Turing test, and unintelligent computers that might pass. The test is neither necessary nor sufficient for intelligence, they argue. In hopes of illuminating the debate, we have assigned two papers that deal with the Turing Test from very different points of view.
The first is a criticism of the test; the second comes to its defense. Jason Hutchens, a Ph.D. student who has twice entered the Loebner contest, has written on what is wrong with it, and with the Turing test in general.