ethics in data science case study

Crisis Data

An AI ethics case study.

Ethical questions about data collection, data-sharing, access, use, and privacy.

"Depression please cut to the chase." by darcyadelaide is marked with CC BY 2.0.

In January 2022, Politico published an article about a nonprofit called Crisis Text Line, which offers support via text messages for people who are going through mental health crises. For years, the nonprofit had been collecting a database of messages exchanged, and used the data to triage the incoming calls for help and to train its volunteers to better manage their difficult conversations with people in great distress. In a 2020 report, the nonprofit (which first launched in 2013) stated that “[b]y implementing data science tools and machine learning from day one, [it had] created the largest mental health dataset in the world.” A report section titled “Data Philosophy” added, “we share data to support smarter research, policy, and community organizing. Unlike other large-scale data sets on mental health and crisis, our data has incredible volume, velocity, and variety.”

As Politico reported, in 2018 the nonprofit also launched a for-profit spinoff called Loris.ai, which planned to use Crisis Text Line data (which it said was anonymized) to gain insights that would then be incorporated into customer-support software products. The plan was for a portion of the profits from that software to then be shared with the Crisis Text Line.

The Politico article sparked a great deal of criticism of that data-sharing agreement. Some critics were concerned that the data might still be traceable back to individuals who could then be stigmatized or otherwise harmed by being “outed” as dealing with severe mental health issues. Others argued that even anonymized data should not be used in ways that the people who texted in would not have anticipated—in other words, for purposes distinct from helping them directly. When the organization responded that its data-sharing agreement was disclosed to users (whose first text is answered by an automated reply that reads “By texting further with us, you agree to our Terms” and links to a 50-page agreement), critics questioned whether the mere act of users following through, under such circumstances, could be deemed to be “actual meaningful, emotional, fully understood consent.”

Some of the Crisis Text Line volunteers were greatly concerned by the secondary use of the data collected by the nonprofit, and raised those concerns both internally and externally. Once a petition was organized, demanding an end to the data-sharing agreement, other volunteers expressed shock that they had not even been aware of the for-profit effort.

A few days after the Politico article was published, Crisis Text Line announced that it was ending the data-sharing agreement with Loris.ai. In a subsequent personal blog post responding to the controversy, researcher danah boyd, who had been a founding board member of CTL and had served as its board chair for some time, explained her thinking and her actions regarding the controversial arrangement. “Since my peers are asking for this to be a case study in tech ethics, I am going into significant detail,” she wrote.

Part of it highlights one of the questions that arose early on in the development of the organization: “could we construct our training so that all counselors got to learn from the knowledge developed by those who came before them? This would mean using texter data for a purpose that went beyond the care and support of that individual.” boyd writes,

Yes, the Terms of Service allowed this, but this is not just a legal question; it’s an ethical question. Given the trade-offs, I made a judgment call early on that not only was using texter data to strengthen training of counselors without their explicit consent ethical, but that to not do this would be unethical. Our mission is clear: help people in crisis. To do this, we need to help our counselors better serve texters. We needed to help counselors learn and grow and develop skills with which they can help others.

The post continues, discussing additional challenges related to scaling access to the service, triage of incoming texts, the need for funding, and the desire to facilitate important research. After noting that she struggled with the question of sharing data with the for-profit entity, boyd states that she ultimately did vote in favor of it. She adds, “Knowing what I know now, I would not have.”

The blog post ends with a call for input: “I also have some honest questions,” boyd writes, “for all of you who are frustrated, angry, disappointed, or simply unsure about us.” Among those questions: “What is the best way to balance the implicit consent of users in crisis with other potentially beneficial uses of data which they likely will not have intentionally consented to but which can help them or others?” She also asks, “Is there any structure in which lessons learned from a non-profit service provider can be transferred to a for-profit entity? Also, how might this work with partner organizations, foundations, government agencies, sponsors, or subsidiaries, and are the answers different?”

Discussion questions

Before answering these questions, please review the Markkula Center for Applied Ethics’ Framework for Ethical Decision-Making, which details the ethical lenses discussed below.

  • Who are the stakeholders involved in this case?
  • Consider the case through the lenses of rights, justice, utilitarianism, the common good, virtue, and care ethics; what aspects of the ethical landscape do they highlight?
  • What would you say in answer to the questions posed by danah boyd, quoted above?

7 Real-World Data Ethics Examples You Need to Know in 2024

Data ethics is the branch of ethics that addresses the generation, collection, sharing, and use of data. It considers how data practices respect values like privacy, fairness, and transparency, as well as the balance between individual rights and societal benefits.

Data ethics is concerned with moral obligations and issues related to personally identifiable information (PII) and its potential impacts on individuals and society at large.

Data ethics challenges us to ask questions like, “Is this the right thing to do?” and “Can we do better?”

In this article, we will look at what data ethics is and at the real-world examples of data ethics you need to be aware of. Let’s dive in!

Table of contents

  • Real-world examples of data ethics
  • Reasons why data ethics matter
  • Principles of data ethics
  • 3 Noteworthy examples of unethical data practices
  • Data ethics: Related reads

7 Real-world examples of data ethics in practice

Data ethics can be challenging to grasp in theory, but real-world examples can offer valuable insights. So, here are seven significant instances where data ethics played a pivotal role:

  • Apple’s commitment to privacy
  • IBM’s AI ethics
  • Microsoft’s data governance
  • GDPR and data protection
  • Facebook and Cambridge Analytica scandal
  • Project Nightingale and Google
  • Toronto’s Sidewalk Labs

Let’s understand each one of them in detail.

1. Apple’s Commitment to Privacy

Apple has long positioned itself as a privacy-focused company. Its privacy policy underlines the principles of data minimization, on-device processing, user transparency and control. These principles demonstrate how a global tech giant implements data ethics.

In practice, the company minimizes personal data collection, processes much of that data on the user’s device rather than in the cloud, publishes transparency reports, and gives users significant control over their data. These practices have led to Apple being lauded as a model for privacy, a cornerstone of data ethics.

2. IBM’s AI Ethics

IBM’s commitment to AI ethics is visible in its principles of transparency and explainability, which state that AI systems should be transparent and that their decision-making processes should be explainable.

The company also stresses the importance of removing bias from AI systems to ensure fairness and impartiality. Its AI Ethics Policy reflects this commitment to data ethics.

3. Microsoft’s Data Governance

Microsoft demonstrates its ethical approach to data management through rigorous data governance. Its privacy policy includes principles such as accountability, transparency, and user control.

The company takes responsibility for personal data protection, provides transparent privacy policies, and gives users control over their data, allowing them to view, edit, download, and delete their information.

4. GDPR and Data Protection

The European Union’s General Data Protection Regulation (GDPR) gives individuals strong data protection rights. The law itself is an example of data ethics in practice.

GDPR embodies data ethics by granting individuals the right to know what data is collected about them, why it is collected, and where it is stored. It requires organizations to protect personal data and, in most cases, to notify the supervisory authority of a personal data breach within 72 hours of becoming aware of it.
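
The 72-hour notification window can be tracked with plain date arithmetic. The following is a minimal sketch, assuming the clock starts at the moment the organization becomes aware of the breach; the function names and the sample timestamp are hypothetical and only illustrate the calculation.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the notification window is counted from the moment of awareness.
NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(became_aware_at: datetime) -> datetime:
    """Return the latest time by which the authority should be notified."""
    if became_aware_at.tzinfo is None:
        # Naive timestamps are ambiguous across time zones, so reject them.
        raise ValueError("use a timezone-aware timestamp")
    return became_aware_at + NOTIFICATION_WINDOW


def hours_remaining(became_aware_at: datetime, now: datetime) -> float:
    """Hours left before the deadline (negative once it has passed)."""
    remaining = notification_deadline(became_aware_at) - now
    return remaining.total_seconds() / 3600


# Hypothetical incident: breach discovered on 1 March 2024 at 09:30 UTC.
aware = datetime(2024, 3, 1, 9, 30, tzinfo=timezone.utc)
deadline = notification_deadline(aware)
left_after_one_day = hours_remaining(aware, aware + timedelta(hours=24))
```

Requiring timezone-aware timestamps is a deliberate choice in this sketch: incident logs often mix local and UTC times, and a naive timestamp could shift the computed deadline by several hours.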

Caveat: The remaining examples show what happens when data ethics aren’t followed properly.

5. Facebook and Cambridge Analytica Scandal

On the flip side, the Facebook and Cambridge Analytica scandal showed what happens when data ethics are not followed. The misuse of data impacted millions and led to widespread backlash and regulatory scrutiny.

In 2018, the revelation that Cambridge Analytica had harvested the personal data of millions of Facebook users without their consent resulted in significant fallout. The scandal highlighted the devastating consequences of unethical data practices and led to calls for stricter data regulations worldwide.

6. Project Nightingale and Google

Criticism surrounded Google’s Project Nightingale due to its acquisition of healthcare data from millions of Americans without obtaining their consent, drawing attention to the critical aspect of informed consent in data ethics.

The project faced significant backlash as it became public knowledge that Google was gathering healthcare data from a vast number of individuals without their knowledge or explicit approval. This incident prompted widespread discussions on the ethical considerations of data collection and processing within the healthcare industry, underscoring the essential requirement for transparent and unequivocal consent in all data practices.

7. Toronto’s Sidewalk Labs

Alphabet’s Sidewalk Labs had ambitious plans to create a data-driven “smart city” in Toronto. However, the project faced significant opposition due to worries about data privacy and the possibility of surveillance.

As a result, the project was eventually scrapped, serving as a poignant reminder of the necessity to strike a delicate balance between technological advancement and ethical handling of data. This incident highlighted the growing importance of weighing ethical considerations alongside innovation in such ventures.

Nugget: These examples underline the importance of data ethics in diverse scenarios and the impact they can have on individuals and society. Each case teaches us about ethics—what to do or avoid doing. It shows how important data ethics are in our data-focused world.

The successful implementation of data ethics principles can build trust and promote fairness, while negligence can lead to significant consequences.

5 Reasons why data ethics matter in 2024

To understand why data ethics is important, we need to reflect on the critical role data plays in today’s society. Data is not just information; it is power. It can shape behaviors, influence decisions, and even define the course of our lives. Now, let’s dive into the five reasons why data ethics is so significant.

  • Protection of personal privacy
  • Transparency and trust
  • Equitable decision-making
  • Regulatory compliance
  • Social responsibility

Let’s look at each one of them in detail.

1. Protection of personal privacy

With the vast amounts of data being collected every second, individuals’ privacy is at risk. Data ethics provides a guideline for maintaining the integrity and confidentiality of this data, ensuring that personal information is not misused.

2. Transparency and trust

Companies and institutions that adhere to good data ethics are more transparent about their data practices. This openness builds trust with customers, employees, and stakeholders, which is essential for long-term success.

3. Equitable decision-making

When data is used to drive decisions, whether in healthcare, finance, or law enforcement, ethical considerations help ensure that the algorithms and models don’t perpetuate discrimination or bias.

4. Regulatory compliance

Understanding and adhering to data ethics can help organizations navigate the complex landscape of data regulations. Non-compliance can lead to severe legal and financial repercussions.

5. Social responsibility

In the era of big data, organizations have a social responsibility to handle data ethically. They must balance their pursuit of innovation and profit with respect for individuals’ rights and societal well-being.

In short, data ethics is the moral compass guiding us through the digital age. By following its principles, we can ensure that the vast amounts of data generated every day are used responsibly, equitably, and for the greater good.

What are the principles of data ethics?

As businesses become increasingly data-driven, they must be mindful of the ethical implications of their data practices. Here are six fundamental principles of data ethics that are crucial for modern businesses:

  • Transparency
  • Consent and control
  • Privacy and security
  • Fairness and non-discrimination
  • Data minimization
  • Accountability

Let us look into each of the above principles of data ethics in brief:

1. Transparency

Transparency is about being open and clear about the data collection and processing practices. Businesses should inform customers about what data they are collecting, how they are using it, and with whom it might be shared.

2. Consent and control

Businesses must ensure they obtain explicit consent from individuals before collecting or processing their data. Moreover, individuals should have control over their own data, including the ability to access, modify, or delete it.

3. Privacy and security

Safeguarding personal data is a critical ethical responsibility. Businesses must use secure methods to protect data from unauthorized access, breaches, or leaks, and respect the privacy of the individuals whose data they handle.

4. Fairness and non-discrimination

The use of data and AI should not perpetuate discrimination or bias. Businesses should work to ensure that their algorithms and data practices are fair and equitable to all, and don’t inadvertently reinforce existing societal inequalities.
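
One simple way to check for the kind of disparity described here is to compare outcome rates across groups. The sketch below computes the gap in approval rates between groups, a measure often called the demographic parity difference. It is only one of several fairness measures and is used here as an illustration; the function names and the sample decisions are hypothetical.

```python
from collections import defaultdict


def approval_rates(decisions):
    """Return the share of positive decisions per group.

    `decisions` is an iterable of (group, approved) pairs.
    """
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        if ok:
            approved[group] += 1
    return {group: approved[group] / totals[group] for group in totals}


def parity_gap(decisions):
    """Difference between the highest and lowest group approval rates."""
    rates = approval_rates(decisions)
    return max(rates.values()) - min(rates.values())


# Hypothetical decisions from an automated system for two groups.
sample = [
    ("A", True), ("A", True), ("A", False), ("A", True),
    ("B", True), ("B", False), ("B", False), ("B", False),
]
rates = approval_rates(sample)
gap = parity_gap(sample)
```

A gap close to zero means the groups receive positive decisions at similar rates. A large gap does not prove discrimination on its own, but it flags a decision process that deserves closer review.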

5. Data minimization

Businesses should follow the principle of data minimization, which means gathering only the essential information required for a particular purpose. Excessive data collection can lead to increased risks of data breaches and privacy violations.
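
In code, data minimization often comes down to an explicit allow-list of fields applied before anything is stored. The sketch below assumes a hypothetical order record and allow-list, and additionally replaces the customer’s email with a keyed hash so that records can still be linked without keeping the address itself. The field names, the key, and the helper functions are illustrative assumptions, not a prescribed design.

```python
import hashlib
import hmac

# Hypothetical allow-list: the only fields needed to fulfil an order.
ALLOWED_FIELDS = {"order_id", "postal_code", "items"}


def minimize(record, allowed=ALLOWED_FIELDS):
    """Keep only the fields required for the stated purpose."""
    return {key: value for key, value in record.items() if key in allowed}


def pseudonymize(value: str, key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 hash (HMAC)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical raw record containing more than the purpose requires.
raw = {
    "order_id": 1001,
    "email": "ana@example.com",
    "date_of_birth": "1990-01-01",
    "postal_code": "95053",
    "items": ["book"],
}

stored = minimize(raw)
# Demo key only; a real key would be generated and stored securely.
stored["customer_ref"] = pseudonymize(raw["email"], key=b"demo-key-not-for-production")
```

Pseudonymized values can still be linked back to a person by anyone who holds the key, so a record like `stored` should still be treated as personal data.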

6. Accountability

Businesses must be accountable for their data practices. If something goes wrong, such as a data breach or an AI system making a discriminatory decision, businesses must take responsibility and rectify the issue.

Incorporating these principles into their data practices can help businesses navigate the complex landscape of data ethics. By doing so, they can not only ensure regulatory compliance but also build trust with customers and stakeholders, creating a sustainable and ethical data culture.

Cautionary tales: 3 Noteworthy examples of unethical data practices

The understanding of what constitutes data ethics can often be best illustrated by looking at the instances where these principles were breached. So, let’s examine three major examples of unethical data practices.

  • Equifax data breach
  • Yahoo data breach
  • Uber data breach cover-up

Let’s dive deeper into each one of them.

1. Equifax data breach

In 2017, Equifax, one of the largest credit bureaus in the U.S., suffered a massive data breach that compromised the personal information (including Social Security numbers and driver’s license numbers) of approximately 147 million people.

Equifax was heavily criticized for its inadequate security measures and slow response to the breach. The incident raised serious questions about the ethics of data security and the responsibilities of companies to protect user data.

2. Yahoo data breach

Yahoo suffered two major data breaches in 2013 and 2014; the 2013 breach alone affected all 3 billion of its user accounts. The stolen user information included names, email addresses, dates of birth, security questions and answers, and hashed passwords.

The incident was made worse by Yahoo’s delayed disclosure, as the company did not publicly acknowledge the breach until 2016.

3. Uber data breach cover-up

In 2016, Uber suffered a data breach that exposed the personal data of 57 million users and drivers. Rather than reporting the breach, Uber paid the hackers $100,000 to delete the data and keep the breach quiet.

This cover-up was a clear violation of data ethics, particularly transparency and accountability, and resulted in several lawsuits and damage to Uber’s reputation.

These examples highlight the disastrous consequences of neglecting data ethics. Not only do such breaches violate personal privacy and trust, but they can also lead to significant financial and reputational damage for the companies involved. These cases underline the critical importance of adhering to ethical principles in data handling and usage.

Recap: What have we learnt?

Data ethics is critical in the digital era, encompassing the moral aspects of data generation, collection, and utilization. High-profile instances such as Apple’s privacy commitment and IBM’s AI ethics policy exemplify positive data ethics practices, ensuring privacy, transparency, and equitable decision-making.

Conversely, the Facebook-Cambridge Analytica scandal, Equifax’s data breach, and Google’s Project Nightingale underscore the dire consequences of neglecting ethical considerations, resulting in privacy violations, legal issues, and erosion of trust.

These examples collectively underline the pivotal role of data ethics in maintaining individual privacy, ensuring regulatory compliance, and upholding social responsibility in our increasingly data-centric world.

Data Ethics Case Studies

The Council for Big Data, Ethics, and Society has released three case studies (with more on the way) and has set a deadline of June 1, 2016, for any new submissions to its call for cases.

1) The Ethics of Using Hacked Data: Patreon’s Data Hack and Academic Data Standards by Nathaniel Poor and Roei Davidson: Should researchers utilize hacked datasets that have been released in public forums? This case study discusses the ethical arguments for and against utilizing hacked crowdfunding data for academic research.

2) “It Was A Matter of Life and Death”: A YouTube Engineer’s Decision to Alter Data in the ‘It Gets Better Project’ by Laurie Honda: In this case study, a YouTube engineer contemplates whether to subvert engineering best practices to bypass storage capacity limits on videos created for the It Gets Better Project, which aims to prevent self-harm by LGBTQ youth.

3) No Encore for Encore? Ethical questions for web-based censorship measurement by Arvind Narayanan and Bendert Zevenbergen: This case study examines tricky ethical questions that arise when researchers co-opt Internet-connected devices as vantage points for data collection, without the knowledge or consent of the users of those devices.

Stay tuned for more case studies from the Council, and consider proposing one of your own. For the Council’s purposes, “A robust case study consists of a roughly 1,000-word description of narrative and background describing an actual situation faced by big data scientists or practitioners, along with collateral materials. It should be rich with context and be usable in a variety of instructional situations.” More information, including submission instructions, can be found in the full call for case studies in data ethics.
