Safe and responsible AI
Last updated: October 27, 2023
At Making Waves Education Foundation, we envision a future where ethically designed and unbiased artificial intelligence (AI) can revolutionize education by providing tailored support to underrepresented students, fostering an inclusive learning environment, and empowering them to achieve their full potential in college and beyond.
The approach we use to develop and deploy safe and responsible AI resources is grounded in principles that align with our commitment to educational equity and the well-being of our diverse students.
Purpose and Values Alignment
We ensure that our AI resources align with our mission and values, focusing on educational equity, accessibility, and support for historically underrepresented and underserved students.
Inclusivity and Fairness
We design and develop AI resources that promote inclusivity and fairness, avoiding biases that may discriminate against certain groups of students based on race, gender, socioeconomic background, or any other protected characteristic.
Privacy and Data Protection
We protect the privacy and personal information of students, families, staff, and community members. We implement strong data governance policies and practices to ensure the responsible collection, storage, and use of data.
Transparency and Explainability
We ensure that our AI resources and algorithms are transparent so that community members can understand how the technology impacts students and the education process. We provide clear explanations for AI-driven decisions and recommendations.
Accountability and Responsibility
We establish clear lines of accountability and responsibility for the development, deployment, and oversight of AI resources. This includes assigning roles and responsibilities to specific individuals or teams within our organization and providing appropriate training and support.
Collaboration and Partnership
We collaborate with other educational institutions, organizations, and experts in the AI field to share knowledge, best practices, and resources. We foster partnerships to continuously improve AI resources and promote ethical AI use throughout the education sector.
Continuous Improvement and Monitoring
We regularly review and update our AI resources, policies, and practices to ensure they remain effective, ethical, and relevant. We monitor the impact of AI on students and the education process and make necessary adjustments to address any unintended consequences or emerging ethical concerns.
Empowerment and Agency
We empower students, their families, staff, and community members by providing them with the necessary information, tools, and resources to understand and actively engage with AI resources. We respect the agency of individuals to make informed decisions about their educational journey.
Accessibility and Universal Design
We design AI resources that are accessible to all users, including those with disabilities or special needs, in line with the principles of universal design. We ensure that AI resources do not create or exacerbate existing barriers to education for any student.
Long-term Impact and Sustainability
We consider the long-term impact and sustainability of AI resources on the education sector, the environment, and society at large. We strive to create AI resources that contribute to a more equitable, inclusive, and sustainable future for all students.
AI resource transparency
24/7 chatbot for college and career exploration
Our 24/7 chatbot for college and career exploration works by using an advanced language model from OpenAI called GPT-3.5-Turbo to answer any questions about college and career. Large language models like GPT-3.5-Turbo are developed by training them on massive amounts of text from the internet, helping them learn grammar, facts, and reasoning abilities.
When a person sends a question to our chatbot, the language model processes it and generates a relevant response based on its training. The more information a person provides, the more accurate and helpful the answer will be.
In addition to answering questions, our chatbot also sends “nudges,” or check-in texts, to its users. These messages are tailored to a user’s goals and are written by human experts. They can include reminders, tips, and other useful information related to college and career exploration.
Wave-Maker Success Framework articles
We developed articles based on our Wave-Maker Success Framework by utilizing the knowledge, insights, and experiences of college coaches, financial services coordinators, and Wave-Makers. We also incorporated key references from research, higher education standards, and career readiness frameworks.
With this information, we partnered with Project Evident to refine the framework and align it with our program priorities. Finally, we used artificial intelligence to generate articles which were then reviewed, edited, and revised by our organization to ensure accuracy and relevance.
We implement stringent safety standards, including employing mitigation tools and best practices for responsible use, while vigilantly monitoring AI resources to prevent misuse.
Our safety standards align with trust and safety guidelines from OpenAI.
OpenAI Moderation API
Making Waves Education Foundation employs a Moderation API from OpenAI to minimize the occurrence of unsafe content in AI-generated completions through our chatbot. We are in the early stages of developing a custom content filtration system to complement our current Moderation API.
We conduct “red-teaming” on our chatbot to ensure its resilience against adversarial input. We test our product with a broad spectrum of inputs and user behaviors, including both representative sets and those that may attempt to ‘break’ the application. We assess if it strays off-topic or if it can be easily redirected through prompt injections.
Human in the Loop (HITL) approach
We have human reviewers examine AI-generated outputs, including regular examinations of outputs through our chatbot. Our human reviewers are informed about the limitations of the AI models used and have access to all necessary information to verify outputs, including relying on their professional expertise.
We use “prompt engineering” on our chatbot to constrain the topic and tone of the AI-generated outputs, reducing the likelihood of producing undesired content. By providing additional context to the mode, we can better steer the AI-generated outputs in the desired direction.
“Know your customer” (KYC) measures
We require users to register to access our chatbot to reduce the likelihood of misuse.
Constraints on the amount of text
We limit the amount of text users can send and receive to prevent malicious prompt injection and to reduce the likelihood of misuse.
Validated materials for outputs
Currently, the outputs from our AI model are generated using novel content. We are in the early stages of “fine-tuning” the model so that it returns outputs from a validated set of materials on the backend, where possible.
We enable users to report improper functionality or concerns about application behavior easily through email. The inbox is monitored by a human who can respond appropriately.
Understanding and communicating limitations
We are aware of the limitations of language models, such as inaccurate information, offensive outputs, bias, and more. We communicate these limitations to our users through a disclosure at sign-up, as well as a micro-course we developed to promote safe and responsible use of AI. We carefully evaluate if the We are aware of the limitations of language models, such as inaccurate information, offensive outputs, bias, and more. We communicate these limitations to our users through a disclosure at sign-up, as well as a micro-course we developed to promote safe and responsible use of AI. We carefully evaluate if the AI models we use are appropriate for our use case and assess its performance across various inputs to identify potential performance drops.
Making Waves Education Foundation uses the Moderation API from OpenAI to identify content that violates our usage policy and take action, for instance by filtering it.
The OpenAI Moderation API classifies and acts on the following categories
|hate||Content that expresses, incites, or promotes hate based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste.|
|hate/threatening||Hateful content that also includes violence or serious harm towards the targeted group.|
|self-harm||Content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders.|
|sexual||Content meant to arouse sexual excitement, such as the description of sexual activity, or that promotes sexual services (excluding sex education and wellness).|
|sexual/minors||Sexual content that includes an individual who is under 18 years old.|
|violence||Content that promotes or glorifies violence or celebrates the suffering or humiliation of others.|
|violence/graphic||Violent content that depicts death, violence, or serious physical injury in extreme graphic detail.|
Disallowed usage policy
Making Waves Education Foundation has a policy for disallowed usage of its AI resource to ensure ethical, safe, and responsible use of the technology while preventing potential harm or exploitation of individuals and communities.
Our disallowed usage policy aligns with trust and safety guidelines from OpenAI.
We prohibit the use of our AI model for the following:
- We prohibit the use of our large language model for illegal activity.
Child Sexual Abuse Material or any content that exploits or harms children
- OpenAI, the maker of our large language model, reports CSAM to the National Center for Missing and Exploited Children.
Generation of hateful, harassing, or violent content
- Content that expresses, incites, or promotes hate based on identity
- Content that intends to harass, threaten, or bully an individual
- Content that promotes or glorifies violence or celebrates the suffering or humiliation of others
Generation of malware
- Content that attempts to generate code that is designed to disrupt, damage, or gain unauthorized access to a computer system.
Activity that has high risk of physical harm, including:
- Weapons development
- Military and warfare
- Management or operation of critical infrastructure in energy, transportation, and water
- Content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders
Activity that has high risk of economic harm, including:
- Multi-level marketing
- Payday lending
- Automated determinations of eligibility for credit, employment, educational institutions, or public assistance services
Fraudulent or deceptive activity, including:
- Coordinated inauthentic behavior
- Academic dishonesty
- Astroturfing, such as fake grassroots support or fake review generation
Adult content, adult industries, and dating apps, including:
- Content meant to arouse sexual excitement, such as the description of sexual activity, or that promotes sexual services (excluding sex education and wellness)
- Erotic chat
Political campaigning or lobbying, by:
- Generating high volumes of campaign materials
- Generating campaign materials personalized to or targeted at specific demographics
- Building conversational or interactive systems such as chatbots that provide information about campaigns or engage in political advocacy or lobbying
- Building products for political campaigning or lobbying purposes
Activity that violates people’s privacy, including:
- Tracking or monitoring an individual without their consent
- Facial recognition of private individuals
- Classifying individuals based on protected characteristics
- Using biometrics for identification or assessment
- Unlawful collection or disclosure of personal identifiable information or educational, financial, or other protected records
Engaging in the unauthorized practice of law, or offering tailored legal advice without a qualified person reviewing the information
- Our model is not fine-tuned to provide legal advice. You should not rely on our model as a sole source of legal advice.
Offering tailored financial advice without a qualified person reviewing the information
- Our model is not fine-tuned to provide financial advice. You should not rely on our model as a sole source of financial advice.
Telling someone that they have or do not have a certain health condition, or providing instructions on how to cure or treat a health condition
- Our model is not fine-tuned to provide medical information. You should never use our models to provide diagnostic or treatment services for serious medical conditions.
- Our model should not be used to triage or manage life-threatening issues that need immediate attention.
High risk government decision-making, including:
- Law enforcement and criminal justice
- Migration and asylum
97.3% AI Accuracy – Study Conducted April 2023
Our team conducted a study to evaluate the accuracy of the large language model that we used in production from January 1 to April 11, 2023: the “text-davinci-003” variation of GPT-3 from OpenAI. We analyzed de-identified text message logs from January 1 to April 11, 2023, and found that our AI model produced 854 out of the total 4879 messages. A human reviewer checked these AI-generated responses and found that 831 of them were correct answers in response to users’ requests.
The AI model made a few mistakes, including providing incorrect information (12 instances) having hallucinations where it thought it was a real person (9 instances), and generating factual responses to inappropriate requests prompted by users (2 instances).
However, we have upgraded our AI model and have added improved safety features to address these issues and improve its accuracy. Specifically, the current AI model in production, “GPT-3.5-Turbo,” can admit its mistakes, challenge incorrect premises, reject inappropriate requests, and refer to itself as an AI language model. Additionally, we have developed a micro-course for users to teach safe and responsible use of AI as part of sign-up, which highlights when to use AI, when to consult a trusted person, and when to verify information like deadlines and requirements with a primary source.
There were also 39 cases where the AI model gave irrelevant or incoherent answers, but we didn’t count these against its accuracy because they were in response to incomplete or unclear user prompts.
Overall, after evaluating these results, we determined that our AI model had an accuracy rate of 97.3%. We will continue monitoring the performance of our newly upgraded model and make further improvements as necessary.