
Conversation-Based Assessments: Real-Time Assessment and Feedback

By Seyma N. Yildirim-Erbasli, Okan Bulut / December 2021

TYPE: HIGHER EDUCATION

Over the last two decades, digital assessments have gained widespread popularity in K-12 and higher education. They have been increasingly used for both summative and formative assessment purposes in e-learning and face-to-face learning environments. However, existing digital assessments follow a non-interactive approach that can demotivate students because there is no interaction between the student and the assessment. To improve students’ motivation, it is essential to enhance the interaction between the instructor and the students [1]. Previous research reports one-on-one tutoring as an extremely effective interactive approach to assessing student learning [2–4]: tutors ask students questions, review their answers, and ask further questions to gain a better understanding of what they know. This type of interaction reveals what students know and can do and what they need to study further. Previous studies found that interactive conversations create an ideal environment for information exchange and thereby improve students’ engagement with the learning environment [5]. While these human-to-human interactions can provide significant insights and evidence for assessment purposes, such interactive tasks require considerable time and effort, so they are neither convenient nor financially feasible for large student populations.

In response to this problem, existing digital assessments can be modified to support both student motivation and learning. One strategy for building interactive opportunities for students is to use conversation-based assessments. A conversation-based assessment aims to improve on digital assessment by simulating a human tutor in order to increase student motivation and learning. Conversation-based systems have been studied as part of intelligent tutoring systems and have proven effective in learning environments [6]. Conversation-based assessments can improve student learning and motivation by providing an opportunity to discuss assessment results and receive personalized feedback efficiently. This article summarizes conversation-based assessments, their types, examples, and advantages, and discusses the differences between conversation-based and conventional digital assessments.

What is Conversation-Based Assessment?

Recent breakthroughs and advances in the fields of computational linguistics, information retrieval, cognitive science, artificial intelligence, and discourse processes give researchers and practitioners the means to build successful conversation (or dialogue) systems [7]. Conversation-based systems open new possibilities for increasing levels of engagement, feedback, and personalization in the learning process, and hence learning outcomes [8]. Early research reveals that conversation-based systems can play a role similar to that of a human tutor in helping learners improve their task performance and skill levels over time, especially for abilities such as problem-solving [9]. Until recently, interactive conversations between students and computers in education were used primarily for tutoring and for delivering instructional materials to students. New efforts are now underway to investigate and harness advanced methods for modeling conversations for measurement and assessment purposes [10]. Recently, researchers have studied the potential to extend conversation-based systems to assessment (i.e., conversation-based assessment; hereafter referred to as CBA). The findings support the use of CBAs to measure student knowledge and skills in a conversational environment (e.g., the English language skills of second-language learners) [11].

The idea behind CBAs is to use automated or adaptive conversations to measure and support student learning of complex concepts: students take an assessment by conversing with a computer agent in natural language. A CBA combines assessment and feedback, assessing students’ knowledge while providing timely feedback to improve their learning. A CBA starts the conversation with a question, examines the student’s response, categorizes it as correct, incorrect, or insufficient, and then guides the student with feedback, a hint, or a follow-up question. In this respect, CBAs build on past studies that revealed benefits in allowing second attempts at open-ended responses [12]. Figure 1 represents a conversation diagram between the student and the chatbot, and a minimal code sketch of this flow follows the figure.

Figure 1. Conversation diagram between students and chatbot in conversation-based assessments.


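To make this branching concrete, the following minimal Python sketch walks through a single CBA turn. Everything in it is illustrative: the question bank, the concept lists, and the substring-matching classifier are assumptions standing in for the natural-language processing that real systems such as AutoTutor or QuizBot employ.

```python
# Minimal sketch of one conversation-based assessment turn.
# The question data and the keyword-based classifier are hypothetical;
# production systems rely on natural-language processing instead.

from dataclasses import dataclass

@dataclass
class Question:
    prompt: str
    expected_concepts: list[str]  # key ideas a complete answer should mention
    hint: str
    follow_up: str

def classify(response: str, question: Question) -> str:
    """Label a response as correct, insufficient, or incorrect by counting
    how many expected concepts it mentions (a stand-in for real NLP)."""
    text = response.lower()
    hits = sum(1 for concept in question.expected_concepts if concept.lower() in text)
    if hits == len(question.expected_concepts):
        return "correct"
    return "insufficient" if hits > 0 else "incorrect"

def next_turn(response: str, question: Question) -> str:
    """Choose the agent's next move: feedback, a hint, or a follow-up question."""
    label = classify(response, question)
    if label == "correct":
        return "Well done! " + question.follow_up               # feedback plus extension
    if label == "insufficient":
        return "You are on the right track. " + question.hint   # scaffold the missing part
    return "Not quite. Let's look at it again: " + question.prompt  # re-ask with support

# Example with a hypothetical science item
q = Question(
    prompt="Why do we see phases of the Moon?",
    expected_concepts=["orbit", "sunlight"],
    hint="Think about how much of the sunlit half of the Moon faces Earth.",
    follow_up="How long does one full cycle of phases take?",
)
print(next_turn("The Moon orbits Earth, so we see different amounts of sunlight on it.", q))
```

In this toy example a fully covered answer earns positive feedback and an extension question, a partial answer triggers a hint, and an unrelated answer leads to the question being re-asked, mirroring the correct/incorrect/insufficient branches in Figure 1.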

Types and Examples of Conversation-Based Systems

Most conversation-based systems use a conversational user interface, such as speech-based or text-based conversational agents (or chatbots), to receive input from users and deliver output to them through natural language processing [8]. Conversation-based systems vary in how accurately they simulate human dialogue mechanisms, but all aim to comprehend natural language, formulate adaptive responses, and implement pedagogical strategies to support student learning (see Table 1). Text-based systems generally facilitate conversations in which students type their questions (or answers) on a keyboard (e.g., QuizBot; see Figures 2 and 3) [10]. Speech-based systems employ embodied conversational agents that can convey emotion and gestures, along with speech technologies such as text-to-speech synthesis that enable voice input and output (e.g., ARIES; see Figure 4) [13].

Table 1. Conversation-based systems by subject, grade, and purpose.

System | Subject focus | Grade | Purpose
AutoTutor [14] | Computer literacy | College | Tutoring
Ms. Lindquist [15] | Algebra | College | Tutoring
Geometry Explanation [16] | Problem-solving in geometry | High school | Tutoring
iSTART [17] | Reading comprehension of science texts | College | Training
CALMsystem [18] | Science | Primary school | Tutoring
MetaTutor [19] | Biology | High school | Tutoring
ARIES [13] | Scientific inquiry | College | Tutoring
EER-Tutor [20] | Introductory database course | Undergraduate | Tutoring
The Request Game [21] | English | College | Tutoring
Affective AutoTutor [22] | Computer literacy | College | Tutoring
Beetle-2 [23] | Basic electricity and electronics | College | Tutoring
DeepTutor [24] | Science | College | Tutoring
KSC-PaL [25] | Computer Science | Undergraduate | Tutoring
Rimac [26] | Physics | High school | Tutoring
QuizBot [10] | Science, safety, and English vocabulary | College | Assessment
ELLA-Math [27] | English and math | Middle school | Assessment

Figure 2. A conversation between the student and chatbot when the student types a correct answer (left) and an incorrect answer (right) in QuizBot [10]. Reprinted with permission.



Figure 3. The conversation architecture of QuizBot with sample responses [10]. Reprinted with permission.



Figure 4. Screenshot of speech-based conversational agents in an ARIES trialogue. (Source: http://ace.autotutor.org/IISAutotutor/index.html)



 

Advantages of Conversation-Based Systems

Student learning gain and performance. Previous studies reported that conversation-based systems produced significant learning gains, with the size of the gains depending on the comparison condition (e.g., reading nothing, reading the textbook, or interacting with other conversation-based systems) and on the subject [28]. Using conversation-based systems improved students’ learning gains by nearly one letter grade compared with reading the textbook for an equivalent amount of time [5, 28, 29]. They were also effective for deep levels of comprehension compared with reading nothing after the pretest or reading the textbook for an amount of time equivalent to that spent interacting with the conversation-based system [7]. Heffernan found a strong positive impact on learning and reported that students who used a conversational agent solved fewer problems but learned as well as or better than students who were simply given the solution [15]. This finding has been characterized as “less is more.”

Researchers compared a conversation-based system (i.e., an interactive chatbot) with a flashcard app that used the same question-selection algorithm, question pool, hints, and explanations across science, safety, and English vocabulary [10]. They discovered that when students used the chatbot, they gave more correct responses to factual-knowledge questions in all three subjects. The agent was substantially more effective in helping students recall and recognize factual knowledge. The researchers suggested that conversation-based systems could be used to measure factual knowledge in other domains, such as biology and history. Furthermore, in comparison with constructed-response items that require students to type their responses, Jackson and his colleagues found that CBA items allowed 41% of students to submit a more thorough response and thereby improve their scores [30]. Researchers also compared CBA items with multiple-choice items by examining how clearly students could explain their answers [31]. The results showed that students who participated in CBAs provided better explanations for their problem-solving steps than those who answered a multiple-choice item and explained their choice.

Student motivation. An interesting study by Ruan et al. showed that, despite the CBA being more time-consuming to practice with than a flashcard app, students found it more beneficial for learning and chose to spend more time with the system [10]. Ruan and her colleagues determined that conversation-based systems are more engaging to use but less efficient in terms of time spent; even so, students may still prefer these systems over conventional digital assessments because they enhance motivation and learning. Studies found that most students appreciated how well the agents asked follow-up questions and provided guidance and feedback to help them comprehend the questions [27]. Ruan et al. suggested that feedback enhances the testing effect in CBAs regardless of whether the attempted answers are correct. Another study revealed that conversation was effective in keeping students motivated and had a strong positive impact on student motivation [15]. In yet another study, students found conversation-based systems to be an engaging and easy way to practice and learn English as a second language [21, 32]. Students who interacted with the conversational agent were more actively engaged in learning activities and outperformed those who did not [32]. Moreover, students expressed an interest in using conversational agents in their other subjects. Researchers have also investigated the emotional states of students interacting with conversation-based systems [22, 33]. Among the different emotional states, they found that engagement was the most frequent and that there was a significant relationship between learning and the affective state of engagement.

When is a Conversation-Based Assessment More Appropriate?

Four elements must be considered to design CBA systems that function better than conventional digital assessments: the type of the system, the subject, the knowledge level of the learner, and the sophistication of the dialogue strategies. In terms of the type of the system, researchers investigated and compared improvements in learning with speech-based and text-based systems [5, 6]. Most of the improvement was due to the dialogue content of what the agent said, not to the speech or the animated facial display. In other words, research revealed that what is said in a conversation is what affects student learning: the medium does not carry the benefit; the message itself does. Regarding the subject, research shows mixed results. Both qualitative (or verbal) and quantitative (or numerical) content have been reported to work well in conversation-based systems. Graesser and his colleagues [28] recommended designing qualitative rather than quantitative content because their system functioned better in qualitative domains. However, other researchers reported that their system was better at reading responses to questions that asked students to write a number than responses to questions that required students to write words [27]. In terms of learner knowledge, CBAs may work better when they are intended for students with low to medium levels of knowledge rather than for students with a high degree of knowledge. When students with a high level of expertise interact with a CBA, both dialogue participants (the chatbot and the student) demand a higher level of precision, which increases the probability of failing to match each other’s expectations [28]. AutoTutor, for example, performed best when the agent and learner shared little or no expertise [28]. In terms of the sophistication of dialogue strategies, Graesser and his colleagues proposed expectation and misconception tailored (EMT) dialogue based on their observation that human tutors rarely use sophisticated tutoring strategies; instead, they guide students through expectations and misconceptions, yet their strategies are still effective [28]. Conversation-based systems that followed EMT dialogue were found to operate successfully.

What Makes Conversation-Based Assessments Better?

Like a conventional digital assessment, a CBA starts the conversation with a question. In a conventional digital assessment, the interaction ends once the student responds to each question. Within a CBA, however, the student’s response is reviewed and classified (e.g., as correct, incorrect, or insufficient) so that the system can guide the student with feedback, a hint, or a follow-up question. CBAs thus help students learn by holding a conversation in natural language and adapting the conversation to student responses [34].

CBAs require not only that students construct responses to questions but also that natural-language processing deliver adaptive follow-up prompts that target specific information [30]. Because conversations are iterative, students can convey their understanding in their own words in adaptable and suitable ways. Unlike conventional digital assessments, a CBA aims to maximize student learning, performance, and motivation. Thus, tasks in CBA systems are designed not only to let students demonstrate their knowledge, skills, and abilities but also to scaffold their learning and provide relevant feedback. CBA systems outperform conventional digital assessment methods because they combine conversation and measurement in a single standardized setting [35]. CBAs can be very powerful when used in parallel with existing teaching methods because these systems can inform teachers where to offer further support when weaknesses are identified. The evidence acquired by a CBA is thus fundamentally different from the evidence gathered by conventional digital items. Teachers could benefit from the additional evidence provided by CBAs, which offers further insight into student knowledge than conventional digital assessments can, such as how students are progressing and which students require additional help. CBAs can also allow test takers to see what they know and where they need to study more. Thus, CBA has the potential to expand on what conventional digital assessments can provide.
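As an illustration of how a follow-up prompt can target specific missing information, here is a small Python sketch. The concept-to-prompt mapping, the example item, and the substring check are assumptions made for the example; an operational CBA would use natural-language processing to decide what a response actually covers.

```python
# Sketch of targeting a follow-up prompt at whatever the student left out.
# The concepts, prompts, and substring matching are hypothetical examples.

def targeted_follow_up(response: str, concept_prompts: dict[str, str]) -> str:
    """Return a follow-up prompt for the first expected concept the student
    did not mention, or a closing remark if nothing is missing."""
    text = response.lower()
    for concept, prompt in concept_prompts.items():
        if concept.lower() not in text:
            return prompt  # probe only the missing idea
    return "Great, your answer covers all of the key points."

# Hypothetical water-cycle item: each expected concept has its own probe.
concept_prompts = {
    "evaporation": "What happens to the water at the surface of the ocean?",
    "condensation": "What do the rising water vapor molecules form in the sky?",
    "precipitation": "How does the water eventually return to the ground?",
}

student_answer = "Water turns into vapor by evaporation and forms clouds by condensation."
print(targeted_follow_up(student_answer, concept_prompts))
# Asks only about precipitation, the concept missing from the answer.
```

The design point is that the follow-up is chosen from what the response omits rather than from a fixed script, which is what allows a CBA to gather evidence a conventional item cannot.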

Are Conversation-Based Assessments Perfect?

We must be aware of the current limitations of designing and using a CBA system, which is still a work in progress and more difficult than anticipated [35]. One of the main issues is that conversational mechanisms in automated environments are unable to handle most student inputs and provide relevant and correct responses [34], and students may become frustrated by such failures (e.g., unresponsiveness to what the student says) [22]. Second, CBAs are typically designed to reserve positive feedback for complete answers. That is, even if a student’s answer is partially correct, the agent will still give negative or neutral feedback, which can confuse, demotivate, and frustrate students [34]. Third, conversational agents can deliver negative feedback when a student’s correct answer is mismatched (or positive feedback when an incorrect answer is matched). For example, even if a student’s initial response is correct, the system may occasionally route responses that contain misspelled words into other discussion pathways. When this happens, students may receive inappropriate or irrelevant follow-up questions, hints, or feedback [27]. Lopez et al. reported that nearly half of the students said the system did not always understand their responses and that they were frequently frustrated [27]. The ideal amount of interaction can be another limitation of CBA systems. Student interviews and surveys showed that many students regarded dialogues with conversational agents as too protracted [26]. Researchers found that students rated conversational agents as unsatisfactory in providing feedback because the agents may spend more time on concepts that students already know and less time on concepts they struggle with [26]. These findings imply that students may become frustrated if they believe the agent is forcing them to engage in lengthy discussions about a concept they already understand rather than tackling a concept for which they need help.

Practical Implications

Because conventional digital assessments lack interaction, it is difficult to make them engaging and motivating; modifying how they are delivered is therefore necessary and inevitable. CBAs have an important role in the future of assessment, and they can become a solution to the engagement and motivation problems that instructors often face. CBAs efficiently leverage content through conversations with students and subsequent interactions that target specific information that may be absent from their initial responses [35]. They can guide learners on what to do next, ask questions, provide hints to elicit extra or missing information, repeat or rephrase questions, hold social interactions, and provide feedback on the quality of responses through the natural flow of conversation. Educators can use CBA systems to motivate students to take assessments through conversation, as well as to assess and scaffold their learning. To integrate these systems into their instruction and assessment practices, educators need to judge the appropriateness of a CBA against the four factors described above (i.e., the type of the system, the subject, the knowledge level of the learner, and the sophistication of the dialogue strategies). For example, for a student cohort with low to medium ability levels, CBAs can provide tailored support to each student and build on each student’s strengths, interests, and abilities to improve engaged and independent learning while assessing that learning. However, a cohort of high-ability students may be dissatisfied with a CBA because its structure may force them to spend time on concepts they already know.

Overall, CBAs can provide both the interactivity and the assistance that are missing from conventional digital assessments and thus improve learning and motivation. They let students interact much as they would in a typical conversation on a particular topic, conveying their knowledge and ideas in their own words [35]. Richer interaction opportunities with CBAs can advance the personalization and improvement of learning and motivation. Such advanced assessment systems also provide a cost-effective option for courses with many students. There is no doubt that CBA systems have the potential to become increasingly popular in educational assessment.

References

[1] Goel, A. K., and Polepeddi, L. Jill Watson: A virtual teaching assistant for online education. Georgia Institute of Technology. School of Interactive Computing Technical Reports. 2016. 

[2] Chi, M. T., Roy, M., and Hausmann, R. G. M. Observing tutorial dialogues collaboratively: Insights about human tutoring effectiveness from vicarious learning. Cognitive Science 32, 2 (2008), 301–341.

[3] Corbett, A. Cognitive computer tutors: Solving the two-sigma problem. In M. Bauer, P. Gmytrasiewicz, and J. Vassileva (Eds.), Proceedings of the 8th International Conference on User Modeling. Springer, Berlin, Heidelberg, 2001, 137–147.

[4] Fletcher, J. D. Evidence for learning from technology-assisted instruction. In J. H. F. O’Neil and R. Perez (Eds.), Technology Applications in Education: A Learning View. Erlbaum, Hillsdale, NJ, 2003, 79–99.

[5] Graesser, A. C., Jeon, M., and Dufty, D. Agent technologies designed to facilitate interactive knowledge construction. Discourse Processes 45, 4–5 (2008), 298–322.

[6] Graesser, A. C., Lu, S., Jackson, G. T., Mitchell, H. H., Ventura, M., Olney, A., and  Louwerse, M. M. AutoTutor: A tutor with dialogue in natural language. Behavior Research Methods, Instruments, & Computers 36, 2 (2004), 180–192.

[7] Graesser, A. C., Li, H., and Forsyth, C. Learning by communicating in natural language with conversational agents. Current Directions in Psychological Science 23, 5 (2014), 374–380.

[8] Maedche, A., Legner, C., Benlian, A., Berger, B., Gimpel, H., Hess, T., Hinz, O., Morana, S., and  Söllner, M. AI-based digital assistants: opportunities, threats, and research perspectives. Business & Information Systems Engineering 61, 4 (2019), 1–29.

[9] Winkler, R., Söllner, M., Neuweiler, M. L., Conti Rossini, F., and Leimeister, J. M. Alexa, can you help us solve this problem? How conversations with smart personal assistant tutors increase task group outcomes. In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems. ACM, New York, 2019, 1–6.

[10] Ruan, S., Jiang, L., Xu, J., Tham, B. J. K., Qiu, Z., Zhu, Y., Murnane, E. L., Brunskill, E., and  Landay, J. A. Quizbot: A dialogue-based adaptive learning system for factual knowledge. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. ACM, New York, 2019, 1–13.

[11] So, Y., Zapata-Rivera, D., Cho, Y., Luce, C., and Battistini, L. Using trialogues to measure English language skills. Educational Technology & Society 18, 2 (2015), 21–32.

[12] Attali, Y., and Powers, D. Effect of immediate feedback and revision on psychometric properties of open-ended GRE subject test items.  ETS Research Report Series. 2008.

[13] Cai, Z., Graesser, A. C., Millis, K. K., Halpern, D. F., Wallace, P. S., Moldovan, C., and Forsyth, C. ARIES: An intelligent tutoring system assisted by conversational agents. In Artificial Intelligence in Education. IOS Press, 2009, 796–796.

[14] Graesser, A. C., Wiemer-Hastings, K., Wiemer-Hastings, P., Kreuz, R., and Tutoring Research Group. AutoTutor: A simulation of a human tutor. Journal of Cognitive Systems Research 1, 1 (1999), 35–51.

[15] Heffernan, N. T. Web-based evaluations showing both cognitive and motivational benefits of the Ms. Lindquist tutor. In Artificial intelligence in Education. IOS Press, 2003, 115–122.

[16] Aleven, V., Popescu, O., and  Koedinger, K. R. Towards tutorial dialog to support self-explanation: Adding natural language understanding to a cognitive tutor. In J. D. Moore, C. L. Redfield and W. L. Johnson (Eds.), Proceedings of the 10th International Conference on Artificial Intelligence in Education. IOS Press, Amsterdam, 2001.

[17] McNamara, D. S., Levinstein, I. B., and Boonthum, C. iSTART: Interactive strategy training for active reading and thinking. Behavior Research Methods, Instruments, & Computers 36, 2 (2004), 222–233.

[18] Kerly, A., Ellis, R., and Bull, S. CALMsystem: A conversational agent for learner modelling. In R. Ellis, T. Allen, and M. Petridis (Eds.), Applications and Innovations in Intelligent Systems. SGAI 2007. Springer, London, 2008.

[19] Azevedo, R., Witherspoon, A., Graesser, A., McNamara, D., Chauncey, A., Siler, E., Cai, Z., Rus, V., and  Lintean, M. MetaTutor: Analyzing self-regulated learning in a tutoring system for biology. In Artificial Intelligence in Education. IOS Press, 2009.

[20] Weerasinghe, A., Mitrovic, A., and Martin, B. Towards individualized dialogue support for ill-defined domains. International Journal of Artificial Intelligence in Education 19, 4 (2009), 357–379.

[21] Yang, H. C., and Zapata-Rivera, D. Interlanguage pragmatics with a pedagogical agent: The request game. Computer Assisted Language Learning 23, 5 (2010), 395–412.

[22] D’Mello, S., and Graesser, A. AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems 2, 4 (2013), 1–39.

[23] Dzikovska, M., Steinhauser, N., Farrow, E., Moore, J., and Campbell, G. BEETLE II: Deep natural language understanding and automatic feedback generation for intelligent tutoring in basic electricity and electronics. International Journal of Artificial Intelligence in Education 24, 3 (2014), 284–332.

[24] Rus, V., D’Mello, S., Hu, X., and Graesser, A. C. Recent advances in conversational intelligent tutoring systems. AI Magazine 34, 3 (2013), 42–54.

[25] Howard, C., Jordan, P., Di Eugenio, B., and Katz, S. Shifting the load: A peer dialogue agent that encourages its human collaborator to contribute more to problem solving. International Journal of Artificial Intelligence in Education 27, 1 (2017), 101–129.

[26] Katz, S., Albacete, P., Chounta, I. A., Jordan, P., McLaren, B. M., and Zapata-Rivera, D. Linking dialogue with student modelling to create an adaptive tutoring system for conceptual physics. International Journal of Artificial Intelligence in Education 31, 3 (2021), 1–49.

[27] Lopez, A. A., Guzman-Orth, D., Zapata-Rivera, D., Forsyth, C. M., and Luce, C. Examining the accuracy of a conversation-based assessment in interpreting English learners’ written responses. ETS Research Report Series. 2021.

[28] Graesser, A. C., Chipman, P., Haynes, B. C., and Olney, A. AutoTutor: An intelligent tutoring system with mixed-initiative dialogue. IEEE Transactions on Education 48, 4 (2005), 612–618.

[29] Nye, B. D., Graesser, A. C., and Hu, X. AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education 24 (2014), 427–469.

[30] Jackson, G. T., Castellano, K. E., Brockway, D., and Lehman, B. Improving the measurement of cognitive skills through automated conversations. Journal of Research on Technology in Education 50, 3 (2018), 226–240.

[31] Aleven, V., Ogan, A., Popescu, O., Torrey, C., and Koedinger, K. Evaluating the effectiveness of a tutorial dialogue system for self-explanation. In J. C. Lester, R. M. Vicari, and F. Paraguaçu (Eds.), Intelligent Tutoring Systems: Lecture Notes in Computer Science. Springer, Berlin, Heidelberg, 2004.

[32] Hong, Z. W., Chen, Y. L., and Lan, C. H. A courseware to script animated pedagogical agents in instructional material for elementary students in English education. Computer Assisted Language Learning 27, 5 (2014), 379–394.

[33] Craig, S., Graesser, A., Sullins, J., and Gholson, B. Affect and learning: An exploratory look into the role of affect in learning with AutoTutor. Journal of Educational Media 29, 3 (2004), 241–250.

[34] Graesser, A. C. Conversations with AutoTutor help students learn. International Journal of Artificial Intelligence in Education 26, 1 (2016), 124–132.

[35] Jackson, G. T., and Zapata-Rivera, D. Conversation-based assessment. R&D Connections 25 (2015), 1–8.

About the Authors

Seyma N. Yildirim-Erbasli, M.A., is a doctoral candidate in the Measurement, Evaluation, and Data Science program at the University of Alberta in Edmonton, Alberta. Her research interests include psychometrics, digital assessment, educational data mining, and interdisciplinary research in digital assessment and natural language processing.

Okan Bulut is an associate professor in the Measurement, Evaluation, and Data Science program and a researcher at the Centre for Research in Applied Measurement and Evaluation (CRAME) at the University of Alberta. Dr. Bulut teaches courses on psychometrics, educational measurement, and statistical modeling. Also, he gives workshops and seminars on advanced topics, such as data mining, big data modeling, data visualization, and statistical data analysis using software programs like R and SAS. His current research interests include educational data mining, big data modeling, digital assessments, and natural language processing applications in education.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Copyright © ACM 2021 1535-394X/2021/12-3495533 $15.00


