Safe and responsible AI

Last updated: October 27, 2023

Purpose-driven innovation

At Making Waves Education Foundation, we envision a future where ethically designed and unbiased artificial intelligence (AI) can revolutionize education by providing tailored support to underrepresented students, fostering an inclusive learning environment, and empowering them to achieve their full potential in college and beyond.

Our principles

The approach we use to develop and deploy safe and responsible AI resources is grounded in principles that align with our commitment to educational equity and the well-being of our diverse students.

Purpose and Values Alignment

We ensure that our AI resources align with our mission and values, focusing on educational equity, accessibility, and support for historically underrepresented and underserved students.

Inclusivity and Fairness

We design and develop AI resources that promote inclusivity and fairness, avoiding biases that may discriminate against certain groups of students based on race, gender, socioeconomic background, or any other protected characteristic.

Privacy and Data Protection

We protect the privacy and personal information of students, families, staff, and community members. We implement strong data governance policies and practices to ensure the responsible collection, storage, and use of data.

Transparency and Explainability

We ensure that our AI resources and algorithms are transparent so that community members can understand how the technology impacts students and the education process. We provide clear explanations for AI-driven decisions and recommendations.

Accountability and Responsibility

We establish clear lines of accountability and responsibility for the development, deployment, and oversight of AI resources. This includes assigning roles and responsibilities to specific individuals or teams within our organization and providing appropriate training and support.

Collaboration and Partnership

We collaborate with other educational institutions, organizations, and experts in the AI field to share knowledge, best practices, and resources. We foster partnerships to continuously improve AI resources and promote ethical AI use throughout the education sector.

Continuous Improvement and Monitoring

We regularly review and update our AI resources, policies, and practices to ensure they remain effective, ethical, and relevant. We monitor the impact of AI on students and the education process and make necessary adjustments to address any unintended consequences or emerging ethical concerns.

Empowerment and Agency

We empower students, their families, staff, and community members by providing them with the necessary information, tools, and resources to understand and actively engage with AI resources. We respect the agency of individuals to make informed decisions about their educational journey.

Accessibility and Universal Design

We design AI resources that are accessible to all users, including those with disabilities or special needs, in line with the principles of universal design. We ensure that AI resources do not create or exacerbate existing barriers to education for any student.

Long-term Impact and Sustainability

We consider the long-term impact and sustainability of AI resources on the education sector, the environment, and society at large. We strive to create AI resources that contribute to a more equitable, inclusive, and sustainable future for all students.

AI resource transparency

24/7 chatbot for college and career exploration

Our 24/7 chatbot for college and career exploration works by using an advanced language model from OpenAI called GPT-3.5-Turbo to answer any questions about college and career. Large language models like GPT-3.5-Turbo are developed by training them on massive amounts of text from the internet, helping them learn grammar, facts, and reasoning abilities.

When a person sends a question to our chatbot, the language model processes it and generates a relevant response based on its training. The more information a person provides, the more accurate and helpful the answer will be.

In addition to answering questions, our chatbot also sends “nudges,” or check-in texts, to its users. These messages are tailored to a user’s goals and are written by human experts. They can include reminders, tips, and other useful information related to college and career exploration.

Wave-Maker Success Framework articles

We developed articles based on our Wave-Maker Success Framework by utilizing the knowledge, insights, and experiences of college coaches, financial services coordinators, and Wave-Makers. We also incorporated key references from research, higher education standards, and career readiness frameworks. 

With this information, we partnered with Project Evident to refine the framework and align it with our program priorities. Finally, we used artificial intelligence to generate articles which were then reviewed, edited, and revised by our organization to ensure accuracy and relevance.

Safety standards

We implement stringent safety standards, including employing mitigation tools and best practices for responsible use, while vigilantly monitoring AI resources to prevent misuse. 

Our safety standards align with trust and safety guidelines from OpenAI.

OpenAI Moderation API

Making Waves Education Foundation employs a Moderation API from OpenAI to minimize the occurrence of unsafe content in AI-generated completions through our chatbot. We are in the early stages of developing a custom content filtration system to complement our current Moderation API. 

Adversarial testing

We conduct “red-teaming” on our chatbot to ensure its resilience against adversarial input. We test our product with a broad spectrum of inputs and user behaviors, including both representative sets and those that may attempt to ‘break’ the application. We assess if it strays off-topic or if it can be easily redirected through prompt injections.

Human in the Loop (HITL) approach

We have human reviewers examine AI-generated outputs, including regular examinations of outputs through our chatbot. Our human reviewers are informed about the limitations of the AI models used and have access to all necessary information to verify outputs, including relying on their professional expertise.

Prompt engineering

We use “prompt engineering” on our chatbot to constrain the topic and tone of the AI-generated outputs, reducing the likelihood of producing undesired content. By providing additional context to the mode, we can better steer the AI-generated outputs in the desired direction. 

“Know your customer” (KYC) measures

We require users to register to access our chatbot to reduce the likelihood of misuse. 

Constraints on the amount of text

We limit the amount of text users can send and receive to prevent malicious prompt injection and to reduce the likelihood of misuse. 

Validated materials for outputs

Currently, the outputs from our AI model are generated using novel content. We are in the early stages of “fine-tuning” the model so that it returns outputs from a validated set of materials on the backend, where possible.

Reporting mechanism

We enable users to report improper functionality or concerns about application behavior easily through email. The inbox is monitored by a human who can respond appropriately.

Understanding and communicating limitations

We are aware of the limitations of language models, such as inaccurate information, offensive outputs, bias, and more. We communicate these limitations to our users through a disclosure at sign-up, as well as a micro-course we developed to promote safe and responsible use of AI. We carefully evaluate if the We are aware of the limitations of language models, such as inaccurate information, offensive outputs, bias, and more. We communicate these limitations to our users through a disclosure at sign-up, as well as a micro-course we developed to promote safe and responsible use of AI. We carefully evaluate if the AI models we use are appropriate for our use case and assess its performance across various inputs to identify potential performance drops. 

Content moderation

Making Waves Education Foundation uses the Moderation API from OpenAI to identify content that violates our usage policy and take action, for instance by filtering it.

The OpenAI Moderation API classifies and acts on the following categories

CATEGORYDESCRIPTION
hateContent that expresses, incites, or promotes hate based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste.
hate/threateningHateful content that also includes violence or serious harm towards the targeted group.
self-harmContent that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders.
sexualContent meant to arouse sexual excitement, such as the description of sexual activity, or that promotes sexual services (excluding sex education and wellness).
sexual/minorsSexual content that includes an individual who is under 18 years old.
violenceContent that promotes or glorifies violence or celebrates the suffering or humiliation of others.
violence/graphicViolent content that depicts death, violence, or serious physical injury in extreme graphic detail.

Disallowed usage policy

Making Waves Education Foundation has a policy for disallowed usage of its AI resource to ensure ethical, safe, and responsible use of the technology while preventing potential harm or exploitation of individuals and communities. 

Our disallowed usage policy aligns with trust and safety guidelines from OpenAI.

We prohibit the use of our AI model for the following:

Illegal activity

Child Sexual Abuse Material or any content that exploits or harms children

Generation of hateful, harassing, or violent content

Generation of malware

Activity that has high risk of physical harm, including:

Activity that has high risk of economic harm, including:

Fraudulent or deceptive activity, including:

Adult content, adult industries, and dating apps, including:

Political campaigning or lobbying, by:

Activity that violates people’s privacy, including:

Engaging in the unauthorized practice of law, or offering tailored legal advice without a qualified person reviewing the information

Offering tailored financial advice without a qualified person reviewing the information

Telling someone that they have or do not have a certain health condition, or providing instructions on how to cure or treat a health condition

High risk government decision-making, including:

Performance evaluation

97.3% AI Accuracy – Study Conducted April 2023

Our team conducted a study to evaluate the accuracy of the large language model that we used in production from January 1 to April 11, 2023: the “text-davinci-003” variation of GPT-3 from OpenAI. We analyzed de-identified text message logs from January 1 to April 11, 2023, and found that our AI model produced 854 out of the total 4879 messages. A human reviewer checked these AI-generated responses and found that 831 of them were correct answers in response to users’ requests.

The AI model made a few mistakes, including providing incorrect information (12 instances) having hallucinations where it thought it was a real person (9 instances), and generating factual responses to inappropriate requests prompted by users (2 instances).

However, we have upgraded our AI model and have added improved safety features to address these issues and improve its accuracy. Specifically, the current AI model in production, “GPT-3.5-Turbo,” can admit its mistakes, challenge incorrect premises, reject inappropriate requests, and refer to itself as an AI language model. Additionally, we have developed a micro-course for users to teach safe and responsible use of AI as part of sign-up, which highlights when to use AI, when to consult a trusted person, and when to verify information like deadlines and requirements with a primary source.

There were also 39 cases where the AI model gave irrelevant or incoherent answers, but we didn’t count these against its accuracy because they were in response to incomplete or unclear user prompts.

Overall, after evaluating these results, we determined that our AI model had an accuracy rate of 97.3%. We will continue monitoring the performance of our newly upgraded model and make further improvements as necessary.