AWS Lex Chatbot in the Classroom

Published on: March 6, 2019
Michael Gillian, Senior Software Engineer

Background and Overview

Many current web applications provide some form of automated help and support for their customers. The value of this is two-fold. By providing an intuitive and high quality automated system, customers can self-serve many common questions without needing to wait for a person to provide the answers. In addition, the people who would otherwise be fulfilling this service now become available to provide more complex support or to perform other useful tasks. It is not necessary to handle all requests to provide value; even a small reduction through automated support can add value and increase customer satisfaction.

The software landscape continues its rapid evolution, with simple, nearly out of the box solutions available to support this level of automated support. Amazon Web Services (AWS) now supports the Lex Chatbot service amongst its offerings. Using the same technology that drives Amazon’s Alexa home automation tool, Lex is a simple, yet extensible tool for providing an automated online chatbot experience for companies who want to leverage the AWS ecosystem. Deploying an AWS Lex Chatbot in your web or mobile application is easy, and extending it to support your specific needs is much easier than in the recent past.

Having a digital assistant in the classroom has the potential to provide significant value for both the teacher and the student. For example, rather than the teacher having to repeat answers continually, or identify resources for students individually, a digital assistant can fill this role quite effectively. The potential to provide significant value assumes that the system's developer understands the types of problems that automation can solve, and that the school has the resources to invest in its implementation. If some small timesavers can be identified, schools with technical resources can leverage a chatbot in their classrooms.

The Initial Chatbot

We implemented a chatbot for the classroom using AWS tools to provide some usable functionality and to demonstrate how AWS simplifies the development process.  As this was a proof of concept project, we defined fairly simple yet meaningful goals. We expect that with this foundation, we will be able to extend the chatbot with new features and capabilities over time. The initial version addresses this problem statement:

“I would like an automated assistant that can perform two tasks in response to student questions: (1) Define a term using the glossary definition from approved course material, and (2) provide a list of related terms, based on the same source.”

You can see from this that as developers, we’ll have several tasks to complete:

  • Parse a textbook so that we can access glossary definitions.
  • Parse a textbook so that we can identify related terms.
  • Provide a front-end for the user to ask questions.
  • Provide a back-end that can identify which questions were asked.
  • Capture the details of the question, including the term that the student asked about.
  • Provide a service that will return the glossary definition.
  • Provide a service that will return related terms.

For this exercise, another developer and I worked for about two weeks to create a functional proof of concept (POC). This POC is intended to show where deep technical skills are required, and what capabilities are readily supported by Amazon. The expectation is that this POC will help to identify the types of AWS capabilities that would truly benefit a teacher and their students.

Terminology

Lex is a natural language processing (NLP) service. When people talk to one another, we expect that they understand not only the words used, but the meaning behind them as well. We understand the reasons that the other person says what they do, and we respond accordingly. Intuitively, we are able to break down the sounds we hear and the text we read into words and meaning. Computers, while rapidly evolving, are not quite there yet, but we can simplify things a bit by limiting the types of questions that can be asked and answered.

Understanding the language of Lex greatly simplifies the implementation. The four key concepts are ‘Intent’, ‘Utterance’, ‘Slot’, and ‘Fulfillment.’

  • Intent – think of intent as the goal of the question being asked. We want to know something, and we ask the chatbot to give us some information. We “Intend” for the chatbot to answer the question. It is very reasonable for a person to ask the same type of question in many different ways. For example, I could say “Hello,” “Hi,” “How are you doing?,” “What’s up?,” etc., and the person listening would understand that I am simply greeting that person. The Intent here is a “Greeting.” We may or may not expect any useful information back from the person (or chatbot), but we would very likely expect some sort of response back, such as “Hello” or “Nice to meet you.” We might ask “What is this student’s grade?” We intend for Lex to respond with the grade.
  • Utterance – think of an utterance as a string of syllables we say or a string of words we write. Lex uses AI/Machine Learning algorithms to try to match your utterances to your intent. Within the AWS Lex console, you can enter literally hundreds of utterances that map to each intent, and the Lex application will use the provided utterances to try to home in on your specific request.
  • Slot – Think of a slot as a specific topic, object, or name that you are asking about. You might say to Lex, “Tell me about the band ‘Fleetwood Mac.’” It would be unreasonable (or even impossible) for you to list every possible utterance for every possible band. Instead, you code your utterance as something like “Tell me about the band {bandName}.” In this case, you would expect Lex to identify the Intent and store “Fleetwood Mac” in a variable. You could then do something interesting with the variable, such as query a database. The variable in this case is called a “Slot.”
  • Fulfillment – Now that the system has all of the needed information (i.e., the chatbot knows the question that is being asked by matching the intent and the slots), the custom program can do whatever processing is necessary to answer the question. Maybe the chatbot has a standard answer. Maybe the chatbot needs to look something up in a database. Maybe the chatbot needs to call an external system. Fulfilling the request is the final step.

Intents can have many slots, depending upon how much information you, as a user, will need to provide for the chatbot to do its work. For example, if you want to order a pizza, you need to specify the Size, the Sauce, the Toppings, etc. Intents do not require slots, but slots allow for simple intents to handle many different questions.
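The four concepts fit together naturally as a configuration structure. As a minimal sketch (this is an illustrative data model, not the actual Lex console schema; the names "define" and "term" anticipate the intent built later in this article):

```python
# A simplified, illustrative model of a Lex intent -- not the real
# Lex configuration format, just the four concepts from the list above.
intent = {
    "name": "define",                        # the Intent: what the user wants
    "utterances": [                          # Utterances: ways to express it
        "define {term}",
        "what is {term}",
        "tell me about {term}",
    ],
    "slots": {                               # Slots: values to capture
        "term": {"required": True, "prompt": "Which term?"},
    },
    "fulfillment": "glossary-lookup-lambda", # Fulfillment: who does the work
}

# Collect the slot names that must be filled before fulfillment can run.
required_slots = [name for name, cfg in intent["slots"].items() if cfg["required"]]
```

Any utterance containing the `{term}` placeholder can match an unbounded set of user inputs, which is what makes a single intent able to handle many questions.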

Information Flow

When people talk, they are sharing information that they think is useful. Sometimes, they ask questions of each other; sometimes, they make statements about what they know or believe. Fundamentally, Lex is trying to answer a question that you ask, and the process starts when you enter your first text into the chatbot input field.  Based on what you enter, Lex tries to determine your intent. If Lex fails, it will tell you and ask for more information. Then, once Lex identifies an intent, Lex looks for slots that have not been filled. If Lex has unfilled slots, Lex prompts you for the missing values. Once Lex has the intent and the slots, it fulfills the request by doing some work and returning a response. Once complete, Lex is ready to do it again for your next question.
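That flow can be pictured as a simple decision loop. The sketch below is purely illustrative (Lex implements this logic internally; the function and message strings are invented for the example):

```python
def next_action(intent, slots):
    """Given the matched intent (or None) and the slot values gathered
    so far, decide the next step in the conversation. Illustration only."""
    if intent is None:
        # No intent matched yet: ask the user to rephrase.
        return ("ElicitIntent", "Sorry, I didn't understand. Can you rephrase?")
    for name, value in slots.items():
        if value is None:
            # An unfilled slot: prompt for the missing value.
            return ("ElicitSlot", f"What {name} would you like?")
    # Intent known and all slots filled: do the work and respond.
    return ("Fulfill", None)

# One conversation, turn by turn:
assert next_action(None, {})[0] == "ElicitIntent"
assert next_action("define", {"subject": "biology", "term": None})[0] == "ElicitSlot"
assert next_action("define", {"subject": "biology", "term": "cell"})[0] == "Fulfill"
```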

The AWS Ecosystem

Writing software has always been tedious, and fraught with errors and missed requirements. Over the years, companies and developers have written standards, frameworks, and libraries that help developers deliver new functionality faster. AWS has evolved from an Infrastructure-as-a-Service to a computing platform that provides many capabilities that can be easily linked to simplify the development process. The developer can focus on the parts that are new and unique, while letting Amazon handle the parts that many of its customers have in common.

In this example, we provide a brief description of the various AWS components that are part of this chatbot solution.  

  • Lex – a configurable and extensible chatbot application. From the AWS Console, you can configure your intents, utterances, slots, and fulfillment actions.
  • Lambda – a serverless computing platform provided by AWS that allows Lex to call lambda functions that execute custom code, beyond what Lex does by default.
  • S3 – an online virtual drive, used for storing data. S3 can be used to hold files before, during, and after processing by other AWS tools.
  • Glue – an Extract, Transform, and Load (ETL) tool that, in this circumstance, presents data stored in files on S3 as a virtual database.
  • Athena – a query tool that can read/write data from a Glue data source.
  • IAM – Identity and Access Management tool for securing access to resources within the AWS ecosystem. Access to files and processes can be restricted to Roles that have the correct permissions, thus ensuring that only the right people and processes can access your mission critical information.
  • Cognito – a simple and secure tool for controlling access to your web and mobile applications. Works with IAM roles to prevent unauthorized access.
  • Kinesis – a high performance and highly available pipeline for sending data between systems. Kinesis prevents locks and bottlenecks from slowing the user’s experience by handling data with guarantees of delivery. It supports additional routing and translating of inbound data through lambda functions and can publish data to multiple destinations, including S3.
  • Sagemaker – an AI/Machine Learning environment for running Python scripts. This environment is very popular in the AI/ML community as it allows you to create Jupyter notebooks that run Python scripts interactively. Much AI/ML work is currently done in Python, and many AI/ML libraries exist that work in this environment.
  • AWS SDK – A suite of programming language developer toolkits for writing custom code for your environment and skillset. In this application, we used the Java, JavaScript, and Python toolkits, but many other SDKs are available.

Parsing Textbooks

For our example, we chose OpenStax textbooks as our data source for a couple of different reasons. Because they are open source, we weren’t worried about licensing requirements. They also come in multiple formats, so that we could explore different ways of parsing the data. We needed to parse these textbooks so that we could transform the data into a format that was easy to query.

For glossary definitions, my partner, Andrew Lasak, parsed an OpenStax coursebook in PDF format to extract the terms and definitions.  He used the Apache PDFBox Java library to parse the file. PDF documents are not well organized for extracting information; PDFBox turns a PDF document into a stream of words and formatting.  Andrew searched the stream for a key word to identify where the glossary terms were located, and then he searched within that section for specific formatting to identify the terms and definitions.  In an effort to address common spelling variations, we used the Apache OpenNLP Java package to perform word stemming. Stemming is the process of reducing a word to a common foundation. For example, the words "Look," "Looks," and "Looking" all have the common stem of "Look." When the user provides a term, the chatbot can then stem the term and look for a term with a matching stem in the glossary. This increases the chances of finding a match, since more than one term can have the same stem. Once Andrew had the terms, stemmed terms, and definitions, he used the Apache Commons CSV Java library to write the data to a CSV file.  Andrew stored the resulting CSV files on S3, so that we could treat them as a database using AWS Glue.
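The project used OpenNLP's stemmer in Java; the idea itself is small enough to show with a toy Python sketch. This naive suffix-stripper is an illustration only, not the Porter-style algorithm a real stemmer implements:

```python
def naive_stem(word):
    """Strip a few common suffixes to approximate a stem.
    Illustration only -- real stemmers (Porter, Snowball) are far subtler."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# "Look", "Looks", and "Looking" all reduce to the same stem, so a
# glossary keyed on stems matches any of the three spellings.
glossary = {naive_stem("Looking"): "to direct one's gaze"}
assert naive_stem("Look") == naive_stem("Looks") == naive_stem("Looking") == "look"
```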

Note: Although the OpenStax PDFs worked for us, I would recommend that you avoid using PDF as your base format. Andrew was able to parse multiple courses using his code, because all of the OpenStax course material had a consistent layout and semantics. If your coursework is created using different styles and semantics, extracting the data will require customization for each source file.

For the related terms, Andrew used a slightly different approach. OpenStax provides coursebook content in multiple formats. Because of the difficulties of parsing PDF, Andrew chose to parse coursebooks in an ePub format. He used a combination of Apache Tika and epublib to turn the coursebook into a stream of words. He identified where a section in the coursebook began and then captured each word in that section. He then created a JSON document that contained a list of the words in each section using the FasterXML/Jackson library. This JSON document is an intermediate step; it contains normalized data that is easier to process for generating a list of related terms.
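The shape of that intermediate JSON is simple: one entry per coursebook section, each holding the words captured from that section. A sketch (the field names and sample data here are assumptions for illustration, not Andrew's actual schema):

```python
import json

# Hypothetical shape of the intermediate document produced by parsing
# the ePub: a list of sections, each with its captured words.
sections = [
    {"section": "4.2 Eukaryotic Cells", "words": ["cell", "nucleus", "organelle"]},
    {"section": "4.3 The Cell Membrane", "words": ["membrane", "lipid", "protein"]},
]

doc = json.dumps({"sections": sections}, indent=2)

# Loading it back yields the per-section word lists the LDA step consumes.
loaded = json.loads(doc)
assert len(loaded["sections"]) == 2
```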

Note: Like the OpenStax PDF documents, the ePub documents need to be in a consistent format; otherwise, each file may require some customization to parse successfully.

My task was to generate lists of related terms from the JSON document created by Andrew.  As the purpose of this project was a proof of concept and technology demonstration, I was looking for readily available Machine Learning tools to include. At Unicon, we worked on another project that demonstrated relationships between objects, and that led me to the Latent Dirichlet Allocation algorithm (LDA). LDA generates a "Topic Model," which is a popular way of saying that terms are related to each other based on context.  SciKit-Learn, a Python Machine Learning library, has an implementation of LDA that I could use out of the box. The LDA algorithm takes in a list of words that are part of a group and for each word in the group, outputs a list of words in the group that are related to it. Since this work would be in Python, I took this opportunity to use a Jupyter notebook and AWS Sagemaker. A Jupyter notebook is effectively a Python script that can be run interactively and in steps. The user can include comments in the notebook alongside code. The notebook can be easily saved and distributed to other computers that know how to process Jupyter notebooks, making collaboration and distribution very simple.  AWS Sagemaker is an application that provides an environment for running Jupyter notebooks with the AWS environment. Setting up a Jupyter notebook server can be complicated; AWS Sagemaker handles all of that complexity automatically, making it very easy to share and run notebooks. The Jupyter notebook I created used the SciKit-Learn library and other standard Python libraries to load the JSON document created by Andrew, run the LDA algorithm, and publish the results as a CSV file. This CSV file was stored on S3 and could be read using AWS Glue, like the glossary file created above.

Creating a Lex Chatbot

I used the AWS Lex Console to create the chatbot. The two obvious intents are for querying the glossary definition and querying the related terms.

The Glossary Intent is named "define" in this example.  Here is a list of sample utterances that the AWS Lex algorithm uses to determine the intent.

[Image: sample utterances for the “define” intent]
 
The wide variety of utterances trains AWS Lex to identify utterances that have not been explicitly listed. You can also see how the term is embedded in the utterance, so that any term can be matched.

Here we display how to configure which slots are defined for a given utterance.

[Image: slot configuration for the “define” intent]

We have two slots identified, “term” and “subject,” but the Spring Boot application, when sending information to AWS Lex, provides the subject automatically after the student initially selects it.

[Image: Lex Console chat history for the “define” dialog]

In this example, the Lex Console chat history shows that we chose the subject “biology” explicitly and then asked the chatbot to provide the glossary definition for the term "cell."  We can see that the response is a JSON object, so that we can perform additional processing and render it in the Spring Boot application.  If you review the list of predefined utterances above, you can see that the utterance I used, "tell me about a cell," is not actually in the list. AWS Lex is smart enough to successfully determine the intent without a perfect match.

I created other intents to provide some static help text and provide a static greeting. I did not spend much energy creating many utterances for these intents; this can be done easily from within the console and does not require coding. Within the chatbot, I rely on two slots, a “term” slot and a “subject” slot. With an intent, a subject, and a term, the chatbot has enough information to query external systems to obtain a response for the intent. 

Lex Lambdas

Each intent can support two lambda functions that perform useful tasks during processing. Neither is required; Lex provides default behaviors. The first lambda, the “DialogHook,” is called once the intent is identified. If a lambda function is identified here, you can provide custom validation logic and return a response to Lex that will ask the user for information if needed. I used this hook to preserve important information from the dialog. Once the user fills slots, I wanted to store that information in the conversation’s “session” and eliminate the need for the user to ever provide it again. I also used this hook to retrieve information from the session when the intent changes.

The user may want to ask for information about a term that is multiple words. Assume for a moment that the subject is Astronomy, and we’d like to get some definitions. A single word term, such as “perihelion,” can be handled easily when querying with the “term” slot. Multiple word terms, such as “asteroid belt,” require special processing. The DialogHook lambda provides a place for that additional processing. I’ll describe that additional processing later.

The second lambda, the “FulfillmentHook,” is called once the intent is determined and the slots are all filled. If this lambda is not defined, you can use the AWS Lex Console to provide some generic responses. With this lambda, you can perform any task that you can code within an AWS Lambda function. I use this lambda to query the AWS Glue data sources that I created after the course material was parsed and stored on S3.

Here you can see how lambdas are associated with the intent for both the DialogHook and the FulfillmentHook.

[Image: lambda functions configured for the intent’s DialogHook and FulfillmentHook]

As part of the incoming parameters for an AWS Lambda function, Lex passes a Lex Request JSON object. When completing the request, the lambda function returns a Lex Response JSON object. As long as your lambda function can read and write JSON, everything in between is possible. Refer to the AWS Lex Developer’s Guide for information on the structure.

The lambda’s responsibility is to:

  1. Verify that all of the slots have been filled. If not, build and return a LexResponse with the slot that needs to be filled.
  2. Call external systems needed to fulfill the intent.
  3. Send data to the configured Kinesis Firehose regarding the fulfilled intent. We can write additional code that will process this data, allowing us to analyze how the chatbot is being used, and to ensure that the chatbot is returning reasonable responses.
  4. Send a LexResponse back to Lex with the result.

Lessons Learned

Most of the development described above was very straightforward. The AWS Athena Java SDK made querying the Glue tables very simple. However, there were a few interesting things I picked up along the way.

Each conversation with the Lex chatbot can maintain some session information. I used these “sessionAttributes” to store information across intents. For example, if I ask for a glossary definition, I will be prompted for a subject and a term, and then I’ll get the fulfillment response with the definition. However, if I then ask for related terms, I don’t want to have to enter the subject and term again. Every time I send a LexResponse, I make sure that I capture the subject and term in sessionAttributes. When I ask a new question without choosing a subject or entering a term, the DialogHook looks for these values in the Lex Request’s sessionAttributes and uses them if they are found, instead of asking for them again.

It gets interesting when the term is multiple words. When I ask for a glossary definition, for example, I will be prompted to enter a term. If I enter a term that is multiple words, Lex does not fill the slot in the LexRequest object sent to the lambda function. When the DialogHook lambda checks the slot, it’s empty. When it looks in the sessionAttributes, it’s empty as well (since this is our first time asking for it). How do we get that information? Fortunately, the answer is in the LexRequest field named “inputTranscript,” which contains all of the text the user submitted. Lex doesn’t know how to handle multi-word slot values, but we can handle them ourselves in the DialogHook lambda function.

When we build a LexResponse, we can use sessionAttributes to pass information to Lex, knowing that Lex will return it with the next LexRequest. When we build a LexResponse with a DialogAction of “ElicitSlot,” we encode the LexResponse as a string and pass it along in its sessionAttributes. (We need to be careful not to recurse endlessly here.) The user is prompted for the term and responds with a term to search, which generates a new LexRequest, including the sessionAttributes. When the DialogHook lambda processes that request, it looks in the slot, which is empty, and then for the term in the sessionAttributes, which is also empty. It can then find the encoded LexResponse in the sessionAttributes, decode it, verify that Lex was eliciting a slot, and determine which slot it was. The DialogHook lambda then uses the inputTranscript field as the value of the slot being elicited. This is convoluted, but it is a valid workaround for a current limitation of Lex, and I would expect it to be addressed in future releases.
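The fallback order can be sketched as follows. This is an illustration in Python, not the project's actual Java code; in particular, the `pendingElicit` session key is a hypothetical name for the encoded-LexResponse attribute described above:

```python
import json

def resolve_term(event):
    """DialogHook sketch: recover a multi-word term that Lex failed to
    place in the slot. Fallback order: slot, session, inputTranscript."""
    slots = event["currentIntent"]["slots"]
    session = event.get("sessionAttributes") or {}

    if slots.get("term"):                 # 1. Lex filled the slot normally.
        return slots["term"]
    if session.get("term"):               # 2. Remembered from an earlier turn.
        return session["term"]

    # 3. If our previous response was eliciting this slot, the raw
    #    inputTranscript is the user's answer, even when it is several words.
    pending = json.loads(session.get("pendingElicit", "{}"))
    if pending.get("slotToElicit") == "term":
        return event["inputTranscript"]
    return None
```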

On a completely different note, I made an architectural decision to have all intents use the same lambda, and one lambda handled both DialogHook and FulfillmentHook integration points. I did so in order to consolidate code, but that may not have been the best solution. I haven’t seen any obvious limitations, but I haven’t seen any best practices yet regarding coding lambdas.

In addition, I was concerned that each time the lambda was called, it might be slow instantiating services required by the intent. However, through testing, I discovered that the lambda uses an executionContext, which keeps objects in memory across calls. Needed services don’t need to be recreated each time. The first time the lambda is called may be slow, but subsequent calls perform very well.
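This is the standard Lambda warm-start pattern: initialize expensive clients at module scope, outside the handler, so warm invocations reuse them. A self-contained sketch, with a counter standing in for an expensive client constructor:

```python
INIT_COUNT = 0

def make_expensive_client():
    """Stand-in for e.g. an Athena or Kinesis client constructor."""
    global INIT_COUNT
    INIT_COUNT += 1
    return object()

# Module scope: runs once per container (the "cold start"), and the
# object then stays in memory across subsequent warm invocations.
CLIENT = make_expensive_client()

def lambda_handler(event, context):
    # Reuses CLIENT; does not pay the construction cost again.
    return {"client_id": id(CLIENT)}

# Two invocations, but the client was constructed only once.
lambda_handler({}, None)
lambda_handler({}, None)
assert INIT_COUNT == 1
```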

The Front End

For testing purposes, you can do just about everything from the AWS Lex Console. In production, you will want some other platform. Out of the box, you can integrate your published Lex chatbot with Facebook, Kik, Twilio, and Slack. For this project, however, I chose to stand up my own web application. My assumption was that it should be fairly easy to integrate, and I was not disappointed.

I chose Spring Boot as my platform, as I can spin up a Spring Boot starter application fairly quickly. Within an hour, I had a working controller and web page. I did not intend for this POC to become a Spring Boot tutorial, so I didn’t worry about configuring Spring Security. I also didn’t intend to demonstrate any JavaScript frameworks, like React or Angular. I was able to find boilerplate code and within a couple of hours had my chatbot running in my webpage. There were three parts:

  1. HTML – I used some sample HTML to create a form and a div to hold the dialog and responses. I also included links to a CSS file and a JavaScript file.
  2. CSS – I used some sample CSS styling.
  3. JavaScript – I used some sample JavaScript that submitted the form field to the Lex chatbot, running on AWS. The sample also handled the callback script. I only needed to update the AWS Cognito information in the script, based on the configuration I created.

I suspect that you can easily configure Spring Security to secure access to the chatbot by user role. I also suspect that you can easily use React or Angular to interact with Lex. The existing JavaScript is fairly simple.

Security

I did need to create an AWS Cognito access code to include in my website. This code allows your website to connect to AWS anonymously, using AWS Roles that you configure, and you can do this from the AWS Console. I did not make any effort to hide the code; for production systems, you will not want the code embedded as plaintext. In addition, you will likely want to restrict the Cognito IAM roles as much as possible. The webpage is generated by Thymeleaf, and Spring controllers have access to environment information. You will need to determine what is best for your environment when passing secured information around your application.

In addition, the lambda function needs to have an IAM role created that has sufficient privileges to access the various resources. In my case, I needed access to S3, Kinesis, Athena, and Glue. Your needs will depend upon what services you choose to use. For this application, I only needed write permissions for Kinesis to S3. Depending upon your policies and needs, you can create very complex roles to manage your application within AWS IAM.

Results

The resulting Spring Boot application integrates with the AWS Lex Chatbot and provides initial functionality that demonstrates the potential for Machine Learning in the classroom.

[Image: the Spring Boot application with the embedded chatbot]

The mocked-up LMS above shows how an automated “Teacher’s Assistant” may be integrated into a course. We can see information about the current course and course material. We can also see a sample dialog, where the student asks the Teacher’s Assistant for their grade, the topic for the day, and information about an interesting term in the course. The Teacher's Assistant may eventually become the driver for the course content, assuming additional Machine Learning techniques are applied. For example, the Teacher's Assistant may be able to recommend relevant content to the student based on identified gaps in their knowledge, identify peers who are having similar issues, or notify the instructor, based on the student's interactions, that the student needs assistance.

Conclusions

The purpose of this exercise was to identify how difficult it would be to deploy some Machine Learning capabilities on the Internet using AWS technology. I was tremendously surprised by how easy it was to get something up and running as quickly as I did, and I identified fairly quickly which areas would require deep technical expertise and which areas were simpler. It would be naive to say that good developers are not required for this type of challenge. Developers are still required to understand the technology and write code. However, developers won’t have to write everything, just the interesting parts.

These are the areas in this project that required experienced developers:

  • Converting course content from PDF or ePub into other formats.
  • Using Machine Learning capabilities within Sagemaker.
  • Creating lambda functions that can call external services.
  • Integrating a chatbot into an existing website seamlessly.

Andrew and I spent about two weeks on this and were very pleased with the results. Although the result is not quite polished enough to deploy to production, it would not be very difficult to make it so. The next step would be to identify more use cases for an automated assistant and evolve it over time. With the infrastructure created and already in place, that becomes a very manageable task.

References

Amazon Web Services (AWS) Lex – a chatbot toolkit for your web and mobile applications - https://aws.amazon.com/lex/

OpenStax – Open Source Textbooks used as references for parsing and querying - https://openstax.org/

Latent Dirichlet Allocation (LDA) – Machine Learning algorithm for creating a topic model - https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation

Spring Boot – open source framework for easily creating stand-alone web applications - http://spring.io/projects/spring-boot

AWS Lex Integration Example – gives code sample demonstrations of integrating a Lex Chatbot into your web application - https://aws.amazon.com/blogs/machine-learning/greetings-visitor-engage-your-web-users-with-amazon-lex/

Apache PDFBox - open source PDF manipulation library in Java - https://pdfbox.apache.org/

Apache OpenNLP - open source Natural Language Processing library in Java - https://opennlp.apache.org/

Apache Commons CSV - open source CSV processing library in Java - https://commons.apache.org/proper/commons-csv/

Apache Tika - a content analysis library in Java - https://tika.apache.org/

epublib - a Github library for reading and writing epub file formats in Java - https://github.com/psiegman/epublib

FasterXML/Jackson - a library for manipulating JSON structures in Java - https://github.com/FasterXML/jackson

SciKit-Learn - Python library implementing Machine Learning algorithms - https://scikit-learn.org/stable/index.html

Jupyter Notebooks - framework for easily sharing and running Python scripts and documentation - https://jupyter.org/

[Image: Michael Gillian]

Michael Gillian

Senior Software Engineer

Michael Gillian has been a Senior Software Engineer with Unicon since 2011.  He has been in the IT Industry since 1991, designing, developing, deploying, sustaining, and managing technical systems and teams.  He has worked across multiple industries, including Education, Wholesale, Healthcare, Manufacturing, and Hospitality.

While at Unicon, Michael has participated in many projects.  Most recently, he has supported California Community College and their efforts to streamline the engagement process for potential students (MyPath).  He has implemented a Drools Rules Engine in support of creating customized engagement paths.  He has implemented a Recommendations Engine using Apache Spark.  He has implemented an Analytics component to capture information as the student navigates the engagement process.  All of these technologies have been built using AWS components, including S3, EC2, ECS, EMR, Kinesis Firehose, RDS, Lambda, et al.  In addition, Michael has supported both uPortal and Sakai for various customers, and has done work on Kaltura for image and video embedding.  Michael has considerable experience developing Spring applications with focus on back-end processes.

Michael has an MBA from the University of Phoenix and a BS in Information and Computer Science from University of California, Irvine.