gender bias experiment procedure

Using the Implicit Relational Assessment Procedure (IRAP) to Examine Implicit Gender Stereotypes in Science, Technology, Engineering and Maths (STEM)

ORIGINAL ARTICLE
Published: 28 May 2020
Volume 70, pages 459–469 (2020)


Katie Fleming (1),
Mairead Foody (2) &
Carol Murphy (1), ORCID: orcid.org/0000-0001-6313-0409


Women are often subject to gender stereotyping in the fields of science, technology, engineering, and mathematics (STEM). The Implicit Relational Assessment Procedure (IRAP) was used to determine the directionality of any implicit gender-STEM bias detected. In addition, the IRAP was used to explore the possibility of implicit ageism bias, because there is anecdotal evidence of high levels of ageism in STEM areas. Two IRAPs (one with adult pictorial stimuli and one with child pictorial stimuli) were therefore employed to assess implicit gender bias toward STEM in a sample of undergraduates (N = 33). Results indicated a gender-STEM bias in both IRAPs, and in both the directionality was pro-male rather than anti-female. Participant gender was not shown to affect results in either IRAP. Gender bias effects were more pronounced in the Adult-IRAP. The comparison of bias toward older versus younger pictorial stimuli was exploratory, so these findings are preliminary; they may, however, suggest that ageism and potential negative interaction effects between age and gender warrant further research.


Barnes-Holmes, D., Barnes-Holmes, Y., Power, P., Hayden, E., Milne, R., & Stewart, I. (2006). Do you really know what you believe? Developing the Implicit Relational Assessment Procedure (IRAP) as a direct measure of implicit beliefs. The Irish Psychologist, 32(7), 169–177.

Barnes-Holmes, D., Barnes-Holmes, Y., Stewart, I., & Boles, S. (2010a). A sketch of the Implicit Relational Assessment Procedure (IRAP) and the Relational Elaboration and Coherence (REC) model. The Psychological Record, 60(3), 527–542.

Barnes-Holmes, D., Murtagh, L., Barnes-Holmes, Y., & Stewart, I. (2010b). Using the Implicit Association Test and the Implicit Relational Assessment Procedure to measure attitudes toward meat and vegetables in vegetarians and meat-eaters. The Psychological Record, 60(2), 287–305.

Bench, S. W., Lench, H. C., Liew, J., Miner, K., & Flores, S. A. (2015). Gender gaps in overestimation of Math performance. Sex Roles, 72, 536–546. https://doi.org/10.1007/s11199-015-0486-9

Bierbaum, E. G. (1988). Museum, arts, and humanities librarians: Careers, professional development, and continuing education. Journal of Education for Library and Information Science, 29(2), 127–134. https://doi.org/10.2307/40323567

Blažev, M., Karabegović, M., Burušić, J., & Selimbegović, L. (2017). Predicting gender-STEM stereotyped beliefs among boys and girls from prior school achievement and interest in STEM school subjects. Social Psychology of Education, 20, 831–847.

Breiner, J. M., Harkness, S. S., Johnson, C. C., & Koehler, C. M. (2012). What is STEM? A discussion about conceptions of STEM in education and partnerships. School Science and Mathematics, 112(1), 3–11. https://doi.org/10.1111/j.1949-8594.2011.00109.x

Ceci, S. J., & Williams, W. M. (2011). Understanding current causes of women’s underrepresentation in science. Proceedings of the National Academy of Sciences of the United States of America, 108, 3157–3162. https://doi.org/10.1073/pnas.1014871108

Cullen, C., Barnes-Holmes, D., Barnes-Holmes, Y., & Stewart, I. (2009). The Implicit Relational Assessment Procedure (IRAP) and the malleability of ageist attitudes: Evidence for a goal congruity perspective. Journal of Personality & Social Psychology, 101(5), 902–918. https://doi.org/10.1007/BF03395683

Dawson, D. L., Barnes-Holmes, D., Gresswell, D. M., Hart, A. J. P., & Gore, N. J. (2009). Assessing the implicit beliefs of sexual offenders using the Implicit Relational Assessment Procedure: A first study. Sexual Abuse: A Journal of Research & Treatment, 21(1), 57–75. https://doi.org/10.1177/1079063208326928

Farrell, L., & McHugh, L. (2017). Examining gender-STEM bias among STEM and non-STEM students using the implicit relational assessment procedure (IRAP). Journal of Contextual Behavioural Science, 6(1), 80–90. https://doi.org/10.1016/j.jcbs.2017.02.001

Farrell, L., Cochrane, A., & McHugh, L. (2015). Exploring attitudes towards gender and science: The advantages of an IRAP approach versus the IAT. Journal of Contextual Behavioural Science, 4(2), 121–128. https://doi.org/10.1016/j.jcbs.2015.04.002

Finn, M., Barnes-Holmes, D., Hussey, I., & Graddy, J. (2016). Exploring the behavioural dynamics of the implicit relational assessment procedure: The impact of three types of introductory rules. The Psychological Record, 66, 309–321. https://doi.org/10.1007/s40732-016-0173-4

Gibb, S. J., Fergusson, D. M., & Horwood, L. J. (2008). Gender differences in educational achievement to age 25. Australian Journal of Education, 52(1), 63–80. https://doi.org/10.1177/000494410805200105

Greenwald, A. G., McGhee, D. E., & Schwarz, J. L. K. (1998). Measuring individual differences in implicit cognition: The Implicit Association Test. Journal of Personality & Social Psychology, 74, 1464–1480.

Grunspan, D. Z., Eddy, S. L., Brownell, S. E., Wiggins, B. L., Crowe, A. J., & Goodreau, S. M. (2016). Males under-estimate academic performance of their female peers in undergraduate biology classrooms. PLoS One, 11(2), 1–16. https://doi.org/10.1371/journal.pone.0148405

Gunderson, E. A., Ramirez, G., Levine, S., & Beilock, S. L. (2012). The role of parents and teachers in the development of gender-related math attitudes. Sex Roles, 66, 153–166. https://doi.org/10.1007/s11199-011-9996-2

Handelsman, J., Cantor, N., Carnes, M., Denton, D., Fine, E., Grosz, B., et al. (2005). More women in Science. Science, 309(5738), 1190–1191. https://doi.org/10.1126/science.1113252

Helwig, R., Anderson, L., & Tindal, G. (2001). Influence of elementary student gender on teachers’ perceptions of mathematics achievement. Journal of Educational Research, 95(2), 93–102. https://doi.org/10.1080/00220670109596577

Kogan, N., & Mills, M. (1992). Gender influences on age cognitions and preferences: Sociocultural or sociobiological? Psychology & Aging, 7, 98–106. https://doi.org/10.1037/0882-7974.7.1.98. Accessed 16 Nov 2019.

Leslie, S.-J., Cimpian, A., Meyer, M., & Freeland, E. (2015). Expectations of brilliance underlie gender distributions across academic disciplines. Science, 347(6219), 262–265. https://doi.org/10.1126/science.1261375

Maloney, E., & Barnes-Holmes, D. (2016). Exploring the behavioural dynamics of the implicit relational assessment procedure: The role of relational contextual cues versus relational coherence indicators as response options. The Psychological Record, 66(3), 395–403. https://doi.org/10.1007/s40732-016-0173-4

Minear, M., & Park, D. C. (2004). A lifespan database of adult facial stimuli. Behaviour Research Methods, Instruments, & Computers, 36, 630–633. https://doi.org/10.3758/BF03206543; http://utdallas.box.com/v/facedatabase

Moakler, M. W., & Kin, M. M. (2014). College major choice in STEM: Revisiting confidence and demographic factors. Career Development Quarterly, 62, 128–142. https://doi.org/10.1002/j.2161-0045.2014.00075.x

Moss-Racusin, C. A., Dovidio, J. F., Brescoll, V. L., Graham, M. J., & Handelsman, J. (2012). Science faculty’s subtle gender biases favour male students. Proceedings of the National Academy of Sciences of the United States of America, 109(41), 16474–16479. https://doi.org/10.1073/pnas.1211286109

Moss-Racusin, C. A., Sanzari, C., Caluori, N., & Rabasco, H. (2018). Gender bias produces gender gaps in STEM engagement. Sex Roles: A Journal of Research, 79(11–12), 651–670. https://doi.org/10.1007/s11199-018-0902-z

Newall, C., Gonsalkorale, K., Walker, E., Forbes, G. A., Highfield, K., & Sweller, N. (2018). Science education: Adult biases because of the child’s gender and gender stereotypicality. Contemporary Educational Psychology, 55, 30–41. https://doi.org/10.1016/j.cedpsych.2018.08.003

Nolan, J., Murphy, C., & Barnes-Holmes, D. (2013). Implicit relational assessment procedure and body-weight bias: Influence of gender of participants and targets. The Psychological Record, 63(3), 467–488. https://doi.org/10.11133/j.tpr.2013.63.3.005

Nosek, B. A., Smyth, F. L., Hansen, J. J., Devos, T., Lindner, N. M., Ranganath, K. A., et al. (2007). Pervasiveness and correlates of implicit attitudes and stereotypes. European Review of Social Psychology, 18, 36–88. https://doi.org/10.1080/10463280701489053; http://www.projectimplicit.net/nosek/stimuli

Nosek, B. A., Smyth, F. L., Sriram, N., Lindner, N. M., Devos, T., Ayala, A., et al. (2009). National differences in gender-science stereotypes predict national sex differences in science and math achievement. Proceedings of the National Academy of Sciences, 106(26), 10593–10597. https://doi.org/10.1073/pnas.0809921106

O’Brien, L. T., Garcia, D. M., Adams, G., Villalobos, J. G., Hammer, E., & Gilbert, P. (2015). The threat of sexism in a STEM educational setting: The moderating impacts of ethnicity and legitimacy beliefs on test performance. Social Psychology of Education, 18(4), 667–684. https://doi.org/10.1007/s11218-015-9310-1

Reilly, E. D., Rackley, K. R., & Awad, G. H. (2017). Perceptions of male and female STEM aptitude: The moderating effect of benevolent and hostile sexism. Journal of Career Development, 44(2), 159–173. https://doi.org/10.1177/0894845316641514

Reuben, E., Sapienza, P., & Zingales, L. (2014). How stereotypes impair women’s careers in science. Proceedings of the National Academy of Sciences, 111, 4403–4408. https://doi.org/10.1073/pnas.1314788111

Ritzert, T. R., Anderson, L. M., Reilly, E. E., Gorrell, S., Forsyth, J. P., & Anderson, D. A. (2016). Assessment of weight/shape implicit bias related to attractiveness, fear, and disgust. The Psychological Record, 66(3), 405–417. https://doi.org/10.1007/s40732-016-0181-4

Robnett, R. D., & Leaper, C. (2012). Friendship groups, personal motivation, and gender in relation to high school students’ STEM career interest. Journal of Research on Adolescence, 23(4), 652–664.

Sudman, S., & Bradburn, N. M. (1982). Asking questions. San Francisco, CA: Jossey-Bass.

World Economic Forum (2018). Global gender gap report 2018. Geneva: World Economic Forum. Retrieved from http://www3.weforum.org/docs/WEF_GGGR_2018.pdf. Accessed 20 Dec 2019.


Availability of Data and Materials

All data and materials used in this study are available on request from the corresponding author.

The second author is funded by the Marie Skłodowska-Curie Actions COFUND Collaborative Research Fellowships for a Responsive and Innovative Europe (CAROLINE).

Author information

Authors and affiliations.

National University of Ireland Maynooth University, Maynooth, Co Kildare, Ireland

Katie Fleming & Carol Murphy

Dublin City University, Dublin 9, Ireland

Mairead Foody


Corresponding author

Correspondence to Carol Murphy.

Ethics declarations

Conflict of interest.

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Ethical Approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed Consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Fleming, K., Foody, M. & Murphy, C. Using the Implicit Relational Assessment Procedure (IRAP) to Examine Implicit Gender Stereotypes in Science, Technology, Engineering and Maths (STEM). Psychol Rec 70, 459–469 (2020). https://doi.org/10.1007/s40732-020-00401-6


Issue Date: September 2020

DOI: https://doi.org/10.1007/s40732-020-00401-6


Gender-bias

Gender bias in academia: a lifetime problem that needs solutions

Anaïs Llorens, Athina Tzovara, Ludovic Bellier, Ilina Bhaya-Grossman, Aurélie Bidet-Caulet, William K Chang, Zachariah R Cross, Rosa Dominguez-Faus, Adeen Flinker, Yvonne Fonken, Mark A Gorenstein, Chris Holdgraf, Colin W Hoy, Maria V Ivanova, Richard T Jimenez, Julia WY Kam, Celeste Kidd, Enitan Marcelle, Deborah Marciano, Stephanie Martin, Nicholas E Myers, Karita Ojala, Pedro Pinheiro-Chagas, Stephanie K Riès, Ignacio Saez, Ivan Skelin, Katarina Slama, Brooke Staveland, Danielle S Bassett, Elizabeth A Buffalo, Adrienne L Fairhall, Nancy J Kopell, Anna C Nobre, Dylan Riley, Anne-Kristin Solbakk, Joni D Wallis, Xiao-Jing Wang, Shlomit Yuval-Greenberg, Sabine Kastner, Robert T Knight, Nina F Dronkers.


Corresponding authors: Athina Tzovara, University of Bern, Institute for Computer Science, Neubrückstrasse 10, CH-3012 Bern; Anaïs Llorens, University of California, Berkeley, Knight Lab, Helen Wills Neuroscience Institute, 210 Barker Hall, Berkeley, CA 94720

Equal contribution

Co-senior authors of this article

These authors contributed equally and are listed alphabetically

originating project laboratories

Despite increased awareness of the lack of gender equity in academia and a growing number of initiatives to address issues of diversity, change is slow and inequalities remain. A major source of inequity is gender bias, which has a substantial negative impact on the careers, work-life balance, and mental health of underrepresented groups in science. Here, we argue that gender bias is not a single problem but manifests as a collection of distinct issues that impact researchers' lives. We disentangle these facets and propose concrete solutions that can be adopted by individuals, academic institutions, and society.

Despite increased awareness of the lack of gender equity in academia, change is slow and inequalities remain. We disentangle the different aspects of gender bias affecting women researchers throughout their lives. We expose the different issues and discuss potential solutions that can be adopted by individuals, academic institutions, and society.

2. Introduction

The past decades have seen tremendous scientific progress and astonishing technological advances that not long ago seemed like science fiction. Yet such scientific progress stands in stark contrast to progress in improving the participation of underrepresented groups in academia, particularly in the fields of science, technology, engineering, and mathematics (STEM). A report from the National Institutes of Health (NIH) published in 2017 highlights the gender disparities encountered in science: out of 16 NIH directors, only 1 was a woman; in the top 10 research institutes in the USA, the percentage of women with tenure among all professors was at most 26%, and in some cases below 20%. Women occupied 37% of the NIH intramural research program tenure-track body, but only 21% attained tenured status, with women of color occupying only 5% of tenured positions (Addressing gender inequality in the NIH intramural research program). PhD programs in the US show similar trends. According to the Society for Neuroscience, the percentage of women applicants to PhD programs has increased in recent years, from 38% in 2000–2001 to 57% in 2016–2017, with a matriculation rate of 48% for women in 2016–2017. By contrast, women represented only 30% of all faculty in PhD programs.

The statistics are similar in Europe. The European Research Council reported that only 32% of its panel members and 27% of its grantees in the Horizon 2020 program were women (ERC: Equality of opportunity in ERC Competitions). In the Netherlands, 44% of PhDs were awarded to women in 2018, yet only 22% of tenured faculty were women. A similar trend is reported in Switzerland, where close to 40% of fixed-term professorships in 2017 were held by women, but the fraction of women dropped to 25% for tenured positions.
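The gap between early-stage and late-stage representation can be made concrete with simple arithmetic. The sketch below is a minimal illustration, not an analysis from the cited reports. It takes the percentages quoted above (Netherlands: 44% of PhDs vs. 22% of tenured faculty; Switzerland: close to 40% of fixed-term vs. 25% of tenured professorships) and computes how much of women's early-stage share is missing at the later stage. The helper name `relative_drop` is hypothetical, the Swiss 40% is rounded from "close to 40%", and both comparisons are cross-sectional snapshots rather than tracked cohorts.

```python
# Women's share at an earlier and a later career stage, taken from the
# percentages quoted in the text. These are cross-sectional snapshots,
# not a tracked cohort, and the Swiss 40% is rounded from "close to 40%".
SHARES = {
    "Netherlands: PhDs awarded vs. tenured faculty (2018)": (0.44, 0.22),
    "Switzerland: fixed-term vs. tenured professorships (2017)": (0.40, 0.25),
}


def relative_drop(early_share: float, late_share: float) -> float:
    """Hypothetical helper: fraction of the early-stage share of women
    that is no longer present at the later stage."""
    if early_share <= 0:
        raise ValueError("early_share must be positive")
    return 1.0 - late_share / early_share


if __name__ == "__main__":
    for label, (early, late) in SHARES.items():
        print(f"{label}: {early:.0%} -> {late:.0%}, "
              f"relative drop {relative_drop(early, late):.1%}")
```

With these inputs, the Dutch share halves (a 50.0% relative drop) and the Swiss share falls by 37.5%.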

These statistics confirm the gender disparity that exists in higher academic positions, despite almost equal representation across disciplines at earlier career stages (see Gruber et al., 2020, for a thorough investigation of gender disparities in psychological science). A putative cause of this phenomenon is gender bias, i.e., prejudice based on gender (encompassing both the identity and the expression of that gender). Gender bias can be explicit or implicit. Explicit bias is a conscious and intentional evaluation of a particular entity with some degree of favor or disfavor (Eagly and Chaiken, 1998). Implicit bias reflects an automatic judgment of the entity without the individual’s awareness (Greenwald and Banaji, 1995). These types of bias emerge from different sources such as stereotypes, prejudice, and discrimination (Fiske, 1998), which reflect general expectations about members of a given social group. Gender stereotypes are broadly shared and reflect differences between women and men in their perspective and manner of behavior. Importantly, gender stereotypes also affect the way men and women define themselves and are treated by others, which in turn contributes to and perpetuates such stereotypes (see Ellemers, 2018, for review). Gender bias affects all women, and has an even greater impact on women whose gender intersects with other identities that are often discriminated against, including but not limited to race and ethnicity (see Quick Take: Women in Academia), socio-economic status, religion, gender expression, gender identity, sexual orientation, or disabilities (Armstrong and Jovanovic, 2015). Moreover, gender stereotypes have been shown to influence the enrollment of women in STEM in many countries (Miller et al., 2015; Hanson et al., 2017). As such, properly tackling this issue requires both structural and cultural change.
Many of the biases and solutions presented in this article can apply to, and be amplified in, other minority groups (see our discussion of intersectionality), but a comprehensive assessment of those issues is beyond the scope of this paper. Pervasive gender biases do not start at the academic level; they are deeply rooted in many societies and appear early in life, affecting young girls’ career aspirations and lifetime educational achievements (Makarova et al., 2019). For instance, in many cultures there is a long-standing stereotype that boys are better at math than girls (Else-Quest et al., 2010), which in turn affects young girls’ performance on math tests (Spencer et al., 1999), despite the absence of any intrinsic or biological difference (Kersey et al., 2019; Shapiro and Williams, 2012). Parents’ and teachers’ expectations can also show biases that influence children’s attitudes and performance in math (Gunderson et al., 2012). This gender stereotyping through interactions with parents, educators, peers, and the media has a negative effect on girls’ interest and confidence in their performance in STEM subjects, potentially reducing interest in research careers in STEM later in life (Cheryan et al., 2015, 2017).

Here, we focus on gender bias at the university level, which forms a further bottleneck for gender equity in STEM. The women-to-men ratio progressively decreases with advancing degrees and career stages. Despite remarkable progress over the last three decades in mitigating gender bias (Eagly, 2018), equity is still far from being reached in academia. Multiple studies have systematically documented bias in every aspect of academia (Fernandes et al., 2020), including citations of journal articles and innovations (Dworkin et al., 2020b; Hofstra et al., 2020), publication rates (West et al., 2013), patent applications (Jensen et al., 2018), hiring decisions (Nielsen, 2016), research grant applications (Burns et al., 2019), evaluations of conference abstracts (Knobloch-Westerwick et al., 2013), symposium speaker invitations (Schroeder et al., 2013), postdoctoral employment (Sheltzer and Smith, 2014), prestigious science awards (Lunnemann et al., 2019), and tenure decisions (Weisshaar, 2017). These forms of bias are intertwined, and they evolve and accumulate along the career path (see Figure 1). Their combination can lead many women to gradually abandon scientific careers, so that the number of women decreases as career stages progress.

Figure 1. Accumulation of the different facets of gender bias throughout a woman researcher’s career, organized according to when each begins to have an impact. Each line represents one aspect of gender bias and spans the career stages in which it is prevalent. The dot marks the time at which a given aspect peaks.

Given the prevalent and deep-rooted nature of gender bias in academia, we aim to unravel its different forms, evaluate how they manifest over the career span, and provide suggestions toward resolving gender disparity. We explain how pervasive gender bias affects different components, dimensions, and roles of academics, and how these barriers to women’s advancement differ across each stage of career development. Our goal is to assemble information on the different facets of gender bias in a digestible format for the neuroscientific community. We aim to launch a discussion of the multifaceted and deeply rooted issues surrounding gender bias in academia and, in particular, in the field of neuroscience. We discuss problems faced by women in science, which often take place behind closed doors, providing information and increasing awareness of central issues for academics and institutions seeking a balanced and fair environment. We also recommend both tested and untested concrete solutions to help mitigate the negative consequences of bias along three axes: the individual level (i.e., actions we can take as colleagues, friends, or mentors), the institutional level (i.e., policies and regulations), and the societal level (i.e., legislative action concerning society at large).

Changes in society and culture are often slow and difficult to implement, but without ongoing awareness, gender equality cannot be achieved. Solutions to the problem of gender bias have been difficult to achieve for many reasons, and some may be more tenable in certain circumstances than others. Here, we present exemplary policies from progressive institutions that have been effective in alleviating gender bias, mostly in STEM and specifically in neuroscience. We also describe quantitative tracking tools (Table 1) that help identify and mitigate bias. As several manifestations of bias do not yet have concrete solutions with demonstrated results, we also propose some untested suggestions that may prove useful and that future research could address (Table 2). We hope that this article will continue the conversation toward resolving gender bias and bring us closer to tangible results.

Table 1. Tools and Resources for Addressing Gender Bias in Academia

These tools are based on probabilistic algorithms that may not always provide accurate estimates, especially in cases with missing or uninformative data (e.g., initials instead of full surnames, rare surnames, etc.). See original publications for limitations.

Table 2. Summary of the different actions suggested throughout the manuscript to mitigate gender bias, by section and level of responsibility. Each action is classified by its current status (tested/recommended, tested/debated, or implemented) and supported by examples of highlighted advocates. Note that many solutions for the individual are difficult to quantify and so are left blank.
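The note to Table 1 says that such tracking tools rely on probabilistic algorithms that can fail on initials or rare names. The sketch below shows why, assuming a name-frequency approach (one common strategy, not necessarily the one used by any tool in Table 1). The `NAME_COUNTS` table holds invented counts, and the hypothetical `infer_gender` function returns `None` instead of guessing whenever the evidence is missing or ambiguous. Any such inference labels names, not people, and cannot capture gender identity.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy reference data: (women, men) counts per first name.
# Real tools draw on large name registries; these numbers are invented.
NAME_COUNTS: Dict[str, Tuple[int, int]] = {
    "maria": (9800, 200),
    "james": (150, 9850),
    "alex": (4200, 5800),
}


def infer_gender(first_name: str, threshold: float = 0.9,
                 min_count: int = 50) -> Optional[str]:
    """Return "woman", "man", or None when the name is uninformative."""
    name = first_name.strip().lower().rstrip(".")
    if len(name) <= 1:
        return None  # an initial such as "M." carries no information
    counts = NAME_COUNTS.get(name)
    if counts is None:
        return None  # rare or unseen name: no estimate possible
    women, men = counts
    total = women + men
    if total < min_count:
        return None  # too few observations for a stable estimate
    p_woman = women / total
    if p_woman >= threshold:
        return "woman"
    if 1.0 - p_woman >= threshold:
        return "man"
    return None  # ambiguous name: neither probability reaches the threshold


for name in ["Maria", "James", "Alex", "M.", "Zephyrine"]:
    print(name, "->", infer_gender(name))
```

Returning `None` rather than a forced guess keeps uninformative records visible as missing data, which matches the limitation described in the table note.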

3. Gender biases are amplified through career stages

Though gender stereotypes are already strongly shaped in childhood (Makarova et al., 2019), college or university study is a further bottleneck to gender equity. Even in their first year beyond high school, women are 1.5 times more likely than men to leave the STEM higher-education pipeline (Ellis et al., 2016). In more advanced university degrees and career stages, the women-to-men ratio progressively decreases, a pattern referred to as the “scissors effect.” In most countries, the effect begins at the start of the university years, when equal numbers of women and men are enrolled. The gap widens (like an opening pair of scissors) by the end of the postdoctoral career stage (European Commission report 2015, GARCIA Project). In the United States, the gender gap continues to grow between the postdoctoral and associate professor years, with women transitioning to principal investigator positions at about a 20% lower rate than men (Lerchenmueller and Sorenson, 2018). Similar data have been reported for other agencies and countries, highlighting the widening gender gap across career stages (Burns et al., 2019; McAllister et al., 2016; Pohlhaus et al., 2011). Although the percentage of women among undergraduates, graduate students, and postdoctoral researchers has increased in the past few decades, women remain largely underrepresented in STEM faculty positions (Beede et al., 2011; Field of degree: Women-NSF). Possible factors contributing to the increasing gender gap as careers progress are reviewed in the following sections, where we disentangle the various aspects contributing to each factor and propose concrete solutions to close the gender gap.
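To see how a per-person difference in transition rates reshapes the composition of the next career stage, here is a minimal sketch. It assumes, hypothetically, an equal pool of women and men postdocs and applies the roughly 20% lower transition rate reported by Lerchenmueller and Sorenson (2018). The three-stage loop further assumes that the same gap recurs at every transition, which the cited study does not claim. The function name is hypothetical.

```python
def women_share_after(women_share_before: float, relative_rate: float) -> float:
    """Women's share among those who make a career transition.

    women_share_before: women's share of the pool before the transition.
    relative_rate: women's per-person transition rate divided by men's
    (0.8 means a 20% lower rate). Men's absolute rate cancels out, so
    only the ratio matters.
    """
    women = women_share_before * relative_rate
    men = 1.0 - women_share_before
    return women / (women + men)


# Hypothetical starting point: equal numbers of women and men postdocs.
share = 0.5
for stage in range(1, 4):
    # Assumes (hypothetically) the same 20% gap at every transition.
    share = women_share_after(share, 0.8)
    print(f"after transition {stage}: women are {share:.1%} of the group")
```

Equivalently, each transition multiplies the women-to-men odds by 0.8, so after k transitions the share is 0.8^k / (1 + 0.8^k): about 44.4%, 39.0%, and 33.9% for k = 1, 2, 3. A modest per-step gap compounds, which is the arithmetic behind the widening "scissors."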

4. Gender bias hinders scientific productivity, authorship and peer-review

Women are systematically underrepresented as first and last authors in peer-reviewed publications relative to the proportion of women scientists in the field (Dworkin et al., 2020b; West et al., 2013). The discrepancy is particularly evident for senior author positions, as well as single-authored papers and commissioned editorials, i.e., positions typically reflective of senior roles (Holman et al., 2018; Schrouff et al., 2019; West et al., 2013). Moreover, the steady increase of women in STEM over the past decades has been accompanied by an overall increase in gender differences in productivity. This difference in productivity between men and women is mostly explained by a higher dropout rate among women than among men, while the yearly difference in productivity between genders is relatively small (Huang et al., 2020). Furthermore, a study of peer review across 145 journals in various fields reported that women submit fewer papers than men (Squazzoni et al., 2021). The underrepresentation of women increases with the impact factor of the journal (Bendels et al., 2018). Neuroscience is no exception: women authors, including senior women, are less likely to submit to high-profile journals. In 2016, only around 20% of neuroscience papers sent to Nature had a woman as corresponding author (Promoting diversity in neuroscience, 2018). Even when women do submit to such journals, they face gender bias. Several studies in which the identity of the authors was experimentally manipulated demonstrated that conference abstracts, papers, and fellowship applications were rated as having higher merit when they were supposedly written by men. These effects were even stronger in scientific fields viewed as more “masculine” (Knobloch-Westerwick et al., 2013; Krawczyk and Smyk, 2016). Furthermore, a recent study of 9,000 editors and 43,000 reviewers from Frontiers journals demonstrated that women are underrepresented as editors and peer reviewers (Helmer et al., 2017).
Additionally, all editors, whether men or women, display a same-gender preference (homophily), which currently favors men, in part because there are more men in the field (Murray et al., 2019).
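The claim that homophily currently favors men because men outnumber women among editors can be checked with a simple mixture model. Assume, hypothetically, that each editor picks a same-gender reviewer with probability h and otherwise draws at random from a pool in which a fraction p are women. If a fraction f of editors are women, the expected share of women reviewers is f·h + (1 - h)·p, which falls below p exactly when f < p. The model, the function name, and the numbers below are illustrative assumptions, not estimates from Murray et al. (2019) or Helmer et al. (2017).

```python
def expected_women_reviewer_share(f_women_editors: float,
                                  p_women_pool: float,
                                  homophily: float) -> float:
    """Expected share of women among invited reviewers (hypothetical model).

    With probability `homophily`, each editor invites a reviewer of their
    own gender; otherwise they draw at random from a pool in which a
    fraction `p_women_pool` are women.
    """
    from_women_editors = homophily + (1 - homophily) * p_women_pool
    from_men_editors = (1 - homophily) * p_women_pool
    return (f_women_editors * from_women_editors
            + (1 - f_women_editors) * from_men_editors)


# Hypothetical numbers: 30% of editors are women, 40% of the eligible
# reviewer pool are women, and editors pick same-gender reviewers 20% of the time.
print(f"{expected_women_reviewer_share(0.30, 0.40, 0.20):.1%}")  # prints 38.0%
```

Under these assumptions, homophily pulls the share of women reviewers below their 40% share of the pool; if women made up 40% of editors as well, the two effects would cancel.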

In addition to publications, a screening of approximately 2.7 million US patent applications indicated that there was also discrimination in the patent review process, resulting in relatively few approved patent applications registered by women inventors (Jensen et al., 2018). Many of these effects were larger in fields with a generally higher representation of women, such as the life sciences, than in technology areas (Hunt et al., 2013; Sugimoto et al., 2015; Whittington and Smith-Doerr, 2008). Though gender bias in authorship has been explicitly acknowledged for years (Women in neuroscience: a numbers game, 2006), it has changed minimally over the last decade (Bendels et al., 2018; Holman et al., 2018; 2018). Although the publication gap is decreasing, it is wrong to assume that proportional representation will be reached anytime soon without further active interventions (Bendels et al., 2018). In some disciplines, such as math, computer science, and surgery, gender parity in publications is unlikely to be reached in this century, given the current slow rate at which the representation of women is increasing (Holman et al., 2018). Other fields, such as psychology, have seen relatively greater increases in publications by men authors over time, further widening the gender gap (Ceci et al., 2014). Given that publishing, particularly in high-profile journals, is critical for hiring decisions and career advancement, this inequality in authorship will continue to contribute to the growing gender disparity across academic ranks (Fairhall and Marder, 2020).
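The projection that parity will not be reached this century is, at heart, an extrapolation from current rates of change. The sketch below uses straight-line extrapolation, a deliberate simplification since the cited studies may fit different models, to compute how many years a field would need to reach 50% women authors. The function name and the inputs are hypothetical, not values from Holman et al. (2018).

```python
import math


def years_to_parity(current_share: float, annual_increase: float,
                    target: float = 0.5) -> float:
    """Years until `current_share` reaches `target`, assuming it keeps
    rising by `annual_increase` (a proportion, e.g. 0.002 = 0.2
    percentage points) every year. Returns math.inf if the share is
    not increasing."""
    if current_share >= target:
        return 0.0
    if annual_increase <= 0:
        return math.inf
    return (target - current_share) / annual_increase


# Hypothetical field: 20% women authors, gaining 0.2 percentage points per year.
print(round(years_to_parity(0.20, 0.002)))  # 150 years under these assumptions
```

A field starting at 20% and gaining 0.2 points per year would need about 150 years, so even a steady rise can leave parity out of reach this century.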

Suggestions for decreasing gender bias at an individual level:

Increasing awareness of gender bias in authorship among all scientists, editors, and reviewers could help mitigate this issue. All scientists could seek out education on gender bias and proactively consider how to adjust their own behavior to ensure equity in their reviews.

Suggestions for decreasing gender bias at the institutional level:

Alternatives to single-blind review are needed to increase the transparency of the peer review process (Barroga, 2020; Lee et al., 2013). One proposed solution to mitigate gender bias in the review process is the adoption of double-blind review, which hides the authors’ names (Rodgers, 2017). Double-blind review has been introduced in several fields, such as ecology and computational sciences, and has been successful in reducing biases due to geographic location or university reputation (Bernard, 2018; Budden et al., 2008; Mulligan et al., 2013; Snodgrass, 2006; Tomkins et al., 2017). It is also standard practice in the top journals in sociology, political science, and history, and has been introduced in some neuroscience journals such as eNeuro. However, the efficacy of double-blind review in reducing gender bias is still unclear. An early study found that introducing double-blind peer review significantly increased the number of first-authored papers by women (Budden et al., 2008), whereas later studies found no effect on gender bias in review (Cox and Montgomerie, 2019; Tomkins et al., 2017). It is possible that more recent blind reviews were compromised by the use of preprint servers that list authors’ full names. Another proposed solution is open peer review, as currently implemented in Frontiers journals, where the names of the authors, the editor, and the reviewers are made public upon publication. A final alternative would be a hybrid peer review system that combines open discussion between scientists and their peers while preserving the reviewers’ anonymity (Bravo et al., 2019; Lee et al., 2013). Such a system could consist of a pre- or post-publication discussion platform that allows referees, editors, and authors to interact and provide feedback on a paper.

Importantly, academic journals need to pay attention to potential sources of gender bias in order to identify ways to mitigate them. One way to encourage review and editorial panels to improve accountability and transparency is to make demographic information about authors and reviewers publicly accessible ( Murray et al., 2019 ). This is already implemented by PEERE, a European protocol designed to provide an equitable way to obtain more data on the peer review process ( Table 1 ; Squazzoni et al., 2017 ). Moreover, an increasing number of publishing groups are publicly releasing statements in support of diversity among authors, citations, and/or referees ( Sweet, 2021 ; 2018). As a recent example, Cell Press is encouraging authors to evaluate their citation lists for biases and to ensure diversity among their research participants, authors, and collaborators ( Sweet, 2021 ). Similarly, eLife publishes a twice-yearly report on actions taken to improve transparency and to promote equity, diversity, and inclusion in the publishing process and on its editorial board. Such initiatives set a positive example that could be followed by more publishers across all academic fields.

5. Gender differences in the number of citations

Citation metrics have emerged as a critical index of productivity in the biological and cognitive sciences. Citation counts influence hiring and tenure decisions, grant awards, speaker invitations, and career recognition. For example, a study of 149,000 astronomy publications showed that papers with a woman as lead author received 10% fewer citations on average than similar papers with a man as lead author ( Caplar et al., 2017 ). In top neuroscience journals, the gap is even larger: papers with women as both first and last author receive 30% fewer citations than expected given the number of such papers in the field ( Dworkin et al., 2020b ).

Furthermore, recent research reveals that contemporary citation practices skew these metrics in favor of men, undervaluing women-led research of equivalent quality and potential impact. In particular, men undercite women scientists relative to men scientists, and men’s rates of self-citation are higher than those of women ( Dworkin et al., 2020b ; King et al., 2017 ). Additionally, men are more likely to use promotional language, such as positive words (e.g., “unprecedented” or “excellent”), in the title or abstract, which in turn leads to more citations and an inflated h-index ( Cameron et al., 2016 ; Kelly and Jennions, 2006 ; Lerchenmueller et al., 2019 ; Woolston, 2020 ). Citation bias may also be exacerbated by social media platforms such as Twitter. A recent randomized controlled trial demonstrated that papers that were tweeted received more citations after one year than papers that were not ( Luc et al., 2021 ). Women academics have disproportionately fewer Twitter followers, “likes”, and retweets than men academics, even after controlling for social media activity levels and professional rank ( Zhu et al., 2019 ).

Suggestions at the individual level:

At the individual level, all authors should be more aware of which articles they cite in their work. In particular, articles that already have a high number of citations tend to be regarded as “seminal”, which attracts further citations and can amplify biases that may not reflect quality. When several references could support the same point, authors should seek to balance citations across genders according to a chosen model of research ethics. In the distributional model, citations would be distributed in proportion to the gender percentages in the field, whereas in the diversity model, citations would be distributed so as to proactively counteract a history of inequality ( Dworkin et al., 2020a , 2020b ). In practice, efforts to diversify one’s reference list can be supported by existing algorithmic tools that predict the gender of the first and last author of each reference using databases that store the probability of a name being carried by a woman ( Zhou et al., 2020 ). Such a tool already exists in neuroscience ( Table 1 ), and we recommend its wide implementation across academic fields.
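To make the probabilistic approach concrete, the following is a minimal Python sketch, not the tool listed in Table 1. It assumes a hypothetical lookup table `P_WOMAN` mapping given names to the probability that the name is carried by a woman (the values are invented for illustration; real tools query large name databases), treats the first- and last-author inferences as independent, and returns the expected number of references in each first/last-author category (MM, MW, WM, WW).

```python
from dataclasses import dataclass

# Hypothetical name -> P(name is carried by a woman) table.
# These values are invented for illustration only.
P_WOMAN: dict[str, float] = {
    "maria": 0.99,
    "john": 0.01,
    "alex": 0.30,
    "li": 0.55,
}


@dataclass
class Reference:
    first_author: str  # given name of the first author
    last_author: str   # given name of the last author


def p_woman(name: str, default: float = 0.5) -> float:
    """Probability that a given name belongs to a woman; unknown names get `default`."""
    return P_WOMAN.get(name.lower(), default)


def expected_category_counts(refs: list[Reference]) -> dict[str, float]:
    """Expected number of references per first/last-author gender category.

    Keys are 'MM', 'MW', 'WM', 'WW' (first letter = first author,
    second letter = last author). The two inferences are treated as independent.
    """
    counts = {"MM": 0.0, "MW": 0.0, "WM": 0.0, "WW": 0.0}
    for ref in refs:
        pf = p_woman(ref.first_author)
        pl = p_woman(ref.last_author)
        counts["MM"] += (1 - pf) * (1 - pl)
        counts["MW"] += (1 - pf) * pl
        counts["WM"] += pf * (1 - pl)
        counts["WW"] += pf * pl
    return counts


if __name__ == "__main__":
    refs = [Reference("Maria", "John"), Reference("John", "John"), Reference("Alex", "Li")]
    print(expected_category_counts(refs))
```

Because each reference spreads its probability mass across all four categories, the expected counts always sum to the number of references. Falling back to an uninformative 0.5 for names missing from the table is only one possible design choice.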

Suggestions at the institutional level:

One proposed solution is to increase diversity in review and editorial panels ( Murray et al., 2019 ), as implemented by Progress in Neurobiology and eLife, among other journals. As a notable example, 80% of the associate editors on the editorial board of Progress in Neurobiology are women. This can help mitigate bias but may not be sufficient, as women may also be biased against other women. Another option is to develop alternative citation metrics that account for the influence of self-citation and gender bias. Examples of such metrics include the m-index, which is the h-index adjusted for career age, and the m(Q)-index, which adjusts for career age and excludes self-citations ( Cameron et al., 2016 ).
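As an illustration of how career age and self-citations can be factored into a citation metric, the sketch below computes an h-index, an m-index (h-index divided by career age in years), and a variant that first subtracts self-citations from each paper's count. The self-citation variant follows the description above and is an assumption; it is not necessarily the exact m(Q) formula of Cameron et al. (2016).

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def m_index(citations: list[int], career_years: float) -> float:
    """h-index divided by career age (years since first publication)."""
    if career_years <= 0:
        raise ValueError("career_years must be positive")
    return h_index(citations) / career_years


def m_index_excluding_self(total: list[int], self_cites: list[int], career_years: float) -> float:
    """m-index computed after removing self-citations from each paper's count.

    Assumed variant for illustration; not necessarily Cameron et al.'s exact m(Q).
    """
    if len(total) != len(self_cites):
        raise ValueError("total and self_cites must have the same length")
    external = [t - s for t, s in zip(total, self_cites)]
    return m_index(external, career_years)


if __name__ == "__main__":
    cites = [10, 8, 5, 4, 3]
    selfs = [3, 4, 2, 1, 0]
    print(h_index(cites), m_index(cites, 8), m_index_excluding_self(cites, selfs, 8))
```

For citation counts [10, 8, 5, 4, 3] over an 8-year career, h = 4 and m = 0.5; subtracting the self-citations [3, 4, 2, 1, 0] lowers h to 3 and m to 0.375, illustrating how heavy self-citation can inflate the unadjusted metric.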

We also suggest that journal editors incorporate existing quantitative tools that analyze the gender ratio of a reference list by probabilistically inferring the gender of the cited authors (see Table 1 ). Journals could then require authors either to correct any apparent imbalance or to provide a detailed justification for deviating from the expected distribution. We also recommend implementing additional algorithmic tools in journal submission systems to identify under-cited articles by women authors in a subfield, or to notify authors of citation biases in their submissions. Lastly, journal editors could consider raising limits on the number of citations to accelerate the diversification of reference lists. As an example, Neuron modified its guidelines to exclude reference sections from the maximum character limit for research article submissions.
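A journal-side check of this kind could compare the observed gender composition of a reference list against expected proportions for the field. The sketch below assumes the category counts have already been estimated (for example, by a name-based tool) and that `expected_props` holds field base rates; the base rates and the 5-percentage-point tolerance used here are hypothetical placeholders, not published figures.

```python
CATEGORIES = ("MM", "MW", "WM", "WW")


def deviation_report(
    observed: dict[str, float],
    expected_props: dict[str, float],
    tolerance: float = 0.05,
) -> dict[str, tuple[float, float, bool]]:
    """Compare observed category shares with expected shares.

    Returns, per category: (observed share, observed - expected, flagged),
    where `flagged` is True when the absolute difference exceeds `tolerance`.
    """
    if abs(sum(expected_props[c] for c in CATEGORIES) - 1.0) > 1e-6:
        raise ValueError("expected proportions must sum to 1")
    total = sum(observed.get(c, 0.0) for c in CATEGORIES)
    if total <= 0:
        raise ValueError("reference list is empty")
    report = {}
    for c in CATEGORIES:
        share = observed.get(c, 0.0) / total
        diff = share - expected_props[c]
        report[c] = (share, diff, abs(diff) > tolerance)
    return report


if __name__ == "__main__":
    # Hypothetical reference-list counts and hypothetical field base rates.
    observed = {"MM": 30, "MW": 4, "WM": 5, "WW": 1}
    expected = {"MM": 0.58, "MW": 0.12, "WM": 0.20, "WW": 0.10}
    for cat, (share, diff, flag) in deviation_report(observed, expected).items():
        print(f"{cat}: observed {share:.1%}, difference {diff:+.1%}, flagged={flag}")
```

A flagged category would then be the trigger for asking authors either to revise the list or to justify the deviation; the choice of tolerance is a policy decision rather than a statistical one.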

6. Scientific funding and awards are heavily biased

Funding is crucial to a researcher’s scientific progression and career advancement, including gaining tenure and broad professional recognition ( Charlesworth and Banaji, 2019 ; Duch et al., 2012 ). While the funding landscape is slowly evolving towards gender parity, women still face substantial challenges as they compete for limited resources. Some funding agencies collect data on the distribution of funding across genders. For instance, the percentage of NIH research grants awarded to women has been steadily growing over the past two decades: increasing from 23% in 1998 to 34% in 2019 (NIH Data Book—Data by Gender, 2020), with similar patterns observed for the National Science Foundation (NSF), the United States Department of Agriculture, and the European Research Council (ERC) ( Charlesworth and Banaji, 2019 ; ERC consolidator grants 2019 - statistics, 2019). However, despite this positive trend, progress still needs to be made as women scientists typically hold fewer grants and receive smaller awards compared to men scientists (National Institutes of Health, 2020; 2019).

Interestingly, while women receive the majority (54%) of NIH research career grants awarded at early career stages, the percentage of grants awarded to women progressively drops for grant types associated with later career stages (research project grants: 34%; research center grants: 26%; NIH, 2020). Similar data have been reported for other agencies and countries, highlighting a widening gender gap across career stages: women are awarded fewer large grants and are less likely than men to have them renewed ( Burns et al., 2019 ; McAllister et al., 2016 ). Factors contributing to this growing gender gap might include publication and citation practices, family circumstances, and other barriers resulting from implicit and explicit gender stereotypes ( Pohlhaus et al., 2011 ). Moreover, the percentage of women submitting research grant proposals as PI is lower than expected given their representation in every field except engineering ( Rissler et al., 2020 ).

The funding gap is also apparent in the amounts awarded, with men typically requesting more funds ( Waisbren et al., 2008 ) and obtaining larger grants than women (National Institutes of Health, 2020). A recent study found a median gender disparity in NIH funding of $39k per year awarded to first-time principal investigators, although no significant gender differences were found in performance measures (i.e., median number of articles published per year, median number of citations per article, and number of areas of research expertise in published articles prior to the first NIH grant; Oliveira et al., 2019 ). The differences were even more pronounced for investigators at prominent U.S. universities (median gender difference of $82k). Although the gender gap is smaller for R01 awards (median difference of $16k), men receive more of them, even after controlling for other performance measures (National Institutes of Health, 2020; Pohlhaus et al., 2011 ). Furthermore, NIH data show that the largest differences in funding amounts occur for research center grants (average difference of $476k), again highlighting the increasing disparity at later career stages.

Although the proportion of women who receive career awards for their scientific contributions has steadily increased over the past decades, women still receive substantially fewer prizes than men, and prizes of lower monetary value ( Ma et al., 2019 ). Across 13 major STEM disciplines, only 17% of professional award winners were women ( Lincoln et al., 2012 ). This proportion is lower than expected given women’s overall representation in STEM fields (38% of junior faculty and 27% of senior faculty), likely indicating review bias in which women’s professional efforts and accomplishments do not receive the same recognition. The disparity is even more dramatic for prestigious awards. For instance, women represent only 21% of Kavli Prize winners, 14% of recipients of the National Medal of Science, 3% of Nobel laureates in Chemistry, 3% of Fields Medal recipients in Mathematics, and 1% of Nobel laureates in Physics ( Charlesworth and Banaji, 2019 ; RAISE Project 2018). The year 2020 was unique for the Nobel Prizes, with two women winning the prize in Chemistry and one woman the prize in Physics. Despite this positive step, gender equity is still lacking, and active efforts must continue to ensure that women keep being represented among prestigious awards in the years to come. Gender bias in distinguished recognition perpetuates the falsehood that only men can aspire to the highest levels of academic achievement, sending a harmful message to younger generations of aspiring scientists. Furthermore, disparities in funding and recognition tend to snowball: grant funding drives scientific productivity, which in turn drives promotions; promotions drive increases in salary and stature; and stature drives recognition. Gender bias at each of these successive steps further hampers the advancement of women in their academic careers.

Suggestions at the individual level:

The process of applying for certain career transition awards across scientific disciplines, such as NIH K awards or the Burroughs-Wellcome career award, forces both the applicant and the mentor to envision the candidate in the role of a faculty member, something that can have a profound effect on the candidate’s internal model of self and the attitude of the mentor.

Suggestions at the institutional level:

Solutions could emerge directly from funding agencies in all scientific disciplines if they commit to actively monitoring gender differences and ensuring gender equity in application rates, success rates, and amounts awarded. To ensure fairer funding, we suggest that agencies introduce gender targets for grant applicants, success rates, and amounts awarded. These could consist of a defined percentage of women researchers, or of funding allocated to them, at different career stages. Crucially, funding agencies should hold themselves accountable for attracting more female applicants by changing the procedures used in their competitions to create more equitable outcomes ( Niederle, 2017 ; Niederle and Vesterlund, 2011 ). Further, it has been shown that setting a target representation of women leads to increased numbers of applications by women; this brings stronger candidates into the competition with little reverse discrimination (i.e., discrimination in favor of women) ( Niederle et al., 2013 ). Importantly, in contrast to some affirmative action approaches, this approach preserves the performance and quality of the competition ( Balafoutas and Sutter, 2012 ).

This step could be enhanced by alerting committees to potential gender bias (to which both male and female reviewers are susceptible) and even by prefacing grant reviews with bias training. In addition, women are particularly underrepresented as leaders of large projects and/or international collaborations, and correcting this imbalance could help establish overall gender equity in research funding. Finally, the Canadian Institutes of Health Research have successfully increased the number of female grant recipients by creating funding mechanisms that award grants based on the merit of the scientific proposal rather than the merit of the principal investigator ( Witteman et al., 2019 ).

Moreover, monitoring implicit bias by making demographic information about former grantees accessible to funding committees could help pinpoint disparities and distribute resources more equitably ( Choudhury and Aggarwal, 2020 ). To reduce bias in the amount of funding requested, we suggest that submission portals implement artificial intelligence tools that provide researchers with recommended funding amounts given their career stage and type of research. This suggestion follows the findings of Bowles and colleagues, who showed that women ask for as much as men when ambiguity about the bargaining range is reduced ( Bowles et al., 2005 ).

Importantly, department chairs and deans must commit to an equitable distribution of institutional resources across genders. Additionally (but not as an alternative), department chairs could actively encourage, support and provide the means (for example through release time, workshops, etc.) to all faculty members to pursue applications for career awards and large grants such as program projects and center grant funding (see Gender Equity Guidelines for Department Chairs).

7. Teaching evaluations reflect biases and gender-role expectations

Gender biases are ubiquitous in the classroom, affecting both students and their professors ( Fan et al., 2019 ). At the student level, what professors include in their course syllabi shapes students’ knowledge and perception of academia. Women are under-cited as well as under-assigned in syllabi: 82% of assigned readings in graduate training in international relations across 42 U.S. universities are written exclusively by men ( Colgan, 2017 ), and only 15 of the 200 most frequently assigned works in the “politics” section of the Open Syllabus Project are authored by at least one woman ( Sumner, 2018 ).

At the professor level, large-scale studies have found that women instructors receive lower scores on student evaluations than men, and that gender bias can be so substantial that more effective instructors are rated lower than less effective ones ( Mengel et al., 2018 ). These findings have been substantiated by experimental studies in which the perceived gender of the instructor in online courses was manipulated: instructors received lower ratings from both male and female students when they were believed to be women ( Khazan et al., 2019 ; MacNell et al., 2015 ). Students of all genders perceive men as more knowledgeable and as having stronger leadership skills than their women counterparts ( Boring, 2017 ), even when there are no actual differences in what students have learned. This bias towards masculine traits in student evaluations of teaching (SETs) can have an important impact on the careers of women scientists, as SETs are commonly used to measure teaching effectiveness in promotion and tenure decisions. Beyond biased perceptions of women as teachers, women also tend to have higher teaching loads than men and less time for research ( Misra et al., 2011 ), which can negatively impact their research productivity.

We propose the use of existing tools ( Reinholz and Shah, 2018 ; see Table 1 ) that can help faculty build their syllabi and bibliographies in a more gender-balanced way ( Sumner, 2018 ). In particular, faculty could provide historical examples of successful women scientists to reinforce female role models, ensure that the resources they give their students are gender balanced ( Table 1 ), and use more inclusive language (e.g., ‘folks’ instead of ‘guys’; Bigler and Leaper, 2015 ).

Improving the fairness and objectivity of teaching evaluations is critical to balancing the odds of promotion across genders. A study conducted at the University of California, Berkeley, suggested abandoning SETs as the principal measure of teaching effectiveness and instead implementing other types of assessment, such as observing teaching and examining teaching materials and portfolios ( Stark and Freishtat, 2014 ). The phrasing of SETs also needs improvement: simple changes to the language used (e.g., explicitly asking students to be aware of their biases) had a positive impact on the assessment of women professors ( Peterson et al., 2019 ). Prefacing SETs with counter-stereotypical content could further decrease bias during the evaluation itself ( Blair et al., 2001 ).

8. Academic hiring, tenure decisions and promotions favor men

Evaluation criteria for hiring and promotion commonly used in academia are also susceptible to gender bias. These biases appear across positions at every level, from lab manager positions ( Moss-Racusin et al., 2012 ) to postdoctoral fellowships ( Sheltzer and Smith, 2014 ) and tenure-track positions ( Steinpreis et al., 1999 ).

Strikingly, although experimental and observational data in STEM fields report favorability toward women in hiring decisions compared with equally qualified men, women remain heavily underrepresented in tenure-track positions (National Research Council et al., 2010; Williams and Ceci, 2015 ). This discrepancy has multiple potential sources related to different dimensions of gender bias. Gender biases in recruitment can occur even before applicants are evaluated ( Nielsen, 2016 ). In neuroscience, and in STEM in general, most department or unit leaders are men ( Gupta et al., 2005 ; McCullough, 2019 ). Consequently, men are more likely than women to define the unit’s strategic research foci and/or teaching needs, draft the job profile, and outline the announcement, thereby determining the focus of the search. Defining a profile broadly or narrowly directly affects the number and quality of eligible candidates. Narrow profiles can be used to legitimize the selection of a specific candidate ( van den Brink, 2010 ) and often penalize women, as men’s social networks include a higher proportion of scientific leaders ( Greguletz et al., 2019 ; James et al., 2019 ). Some academic institutions’ practice of limiting open recruitment presents an additional barrier for women. A study in Denmark showed that, at the University of Aarhus, about 20% of associate and full professor positions were filled via a closed recruitment procedure ( Nielsen, 2015 ); such procedures are likely to propagate bias, as closed recruitment frequently results in a single applicant ( Nielsen, 2015 ).

The evaluation and selection phase of the hiring process contributes to the persistence of gender imbalance ( Rivera, 2017 ). Since men continue to be overrepresented among tenured and tenure-track faculty, evaluation committees and interview panels tend to have a skewed gender composition ( Sheltzer and Smith, 2014 ). Gender bias during hiring is amplified by the role of “elite” male faculty, who employ fewer women in their labs and have a disproportionate effect on training the next generation of faculty; these processes, in turn, affect hiring at high-ranking research universities ( Sheltzer and Smith, 2014 ). Moreover, studies performed in Italian and Spanish academic institutions across several scientific fields show that when promotion committees are composed exclusively of men, women are less likely to be promoted ( De Paola and Scoppa, 2015 ). Each additional woman on a 7-member promotion committee increased the number of women promoted to full professor by 14% ( Zinovyeva and Bagues, 2011 ). Another important factor in reducing gender bias in committee decisions is committee members’ awareness of implicit bias. Indeed, a recent study conducted in France across scientific disciplines showed that committee members who believe that women face external barriers in their performance and evaluation are less biased towards selecting men ( Régner et al., 2019 ).

The biases that affect search criteria also influence the evaluation of the applicant’s curriculum vitae. When faculty believe the applicant to be a man, they tend to evaluate the CV more favorably and are more likely to hire the applicant ( Moss-Racusin et al., 2012 ; Steinpreis et al., 1999 ) than when faculty believe the applicant to be a woman. Consequently, only women with extraordinary applications tend to be considered, narrowing the pool of potential women candidates to be interviewed.

Another source of bias during hiring comes from recommendation letters. Their content and quality significantly differ based on the gender of the applicant ( Dutt et al., 2016 ; Madera et al., 2009 ; Schmader et al., 2007 ). For example, letters in support of women are typically shorter, raise more doubts, include fewer ‘standout’ adjectives (e.g., superb, brilliant) and more ‘endeavor’ adjectives (e.g., hardworking and diligent), regardless of the gender of the recommender. Altogether, subtle gender biases throughout the academic hiring process, from job posting to evaluation, increase the risk of creating self-reinforcing cycles of gender inequality ( van den Brink et al., 2010 ; Nielsen, 2015 ).

We recommend that individuals writing job announcements be made aware of both explicit and implicit gender bias. Individuals evaluating applications should also be trained in gender equity, gender bias, and bias mitigation ( Bergman et al., 2013 ).

Bias awareness workshops could help scientists to improve job advertisements, and assess applications more objectively ( Carnes et al., 2015 ; Schrouff et al., 2019 ). This approach is already in place in some academic institutions (e.g., in the University of California system) and could be more widely adopted and made mandatory for all academic members. The University of Wisconsin-Madison has successfully increased diversity by implementing workshops for faculty search committees that raise awareness about unconscious bias and provide evidence-based solutions to counter the problem ( Fine et al., 2014 ). These types of workshops can be broadly implemented across institutions and fields. Finally, numerous studies show that reminding evaluators of their internal biases at the evaluation stage of the hiring process reduces the impact of bias ( Carnes et al., 2015 ; Devine et al., 2017 ; Smith et al., 2015 ; Valantine et al., 2014 ).

Efforts should also be made to increase diversity in search committees. Increasing the representation of women is necessary for reducing bias ( Schrouff et al., 2019 ; Smith et al., 2004 ), despite not being sufficient on its own (see Discussion ). At the same time, institutions should ensure that women in underrepresented departments are not overloaded with administrative obligations, time-consuming committees, or other assigned tasks that do not enhance promotion prospects ( Babcock et al., 2017 ). To increase diversity in search committees without overworking women, we propose that members of search committees be compensated with reduced teaching or other administrative duties. Importantly, we highlight the strong need for male allies as part of search committees (see Discussion ).

Some academic institutions have already introduced mediators from equity committees into the hiring and promotion procedure. For example, in Switzerland, such mediators are required to actively provide input in faculty hiring and to monitor gender balance (Gender Monitoring_Egalité_EPFL). Although non-academic advisors cannot judge the quality of scientific work, their input on the fairness of the hiring process can be valuable.

Each institution must commit to policies and action plans that set quantifiable goals for women in different position categories. Ideally, the number of women reaching the interview stage should match the gender ratio of a given academic field. Concrete recruitment strategies to achieve these goals could be developed, for example, by mandating the regular submission of reports on gender ratios with quantifiable measures ( Bergman et al., 2013 ). As an example, if no women candidates apply, the University of California at Berkeley requires the position to be re-announced more broadly. Institutions can also be required to be more explicit and transparent about how merit is evaluated. All of the above measures can be reinforced with central incentives, such as funding allocations, to motivate departments to implement the necessary steps and hire more women ( Bergman et al., 2013 ). Another way to reach a larger and more diverse pool of potential candidates would be to develop a curated and regularly updated list of underrepresented minority mentees who could become targets for job searches and awards (as is already the case for conference speakers; Table 1 ).

Importantly, we believe that hiring committees need to recognize forms of contribution to the STEM community that are not directly tied to scientific productivity. Such contributions include outreach, knowledge dissemination, and faculty service; on average, women make significantly more of these contributions than men, taking time from more traditional forms of research ( Guarino and Borden, 2017 ). The practice of science is evolving, and additional qualification criteria for hiring decisions should be adopted to acknowledge the broader range of roles and responsibilities of contemporary scientists ( Moher et al., 2018 ). In addition to building towards gender equity, recognizing and incentivizing these contributions to our academic communities will benefit all scientists regardless of gender.

Suggestions at the societal level:

When legally possible (as in Sweden, Germany, and Switzerland), any organization, including academic institutions can set policies on gender equity, set goals for gender ratios in different position categories, and develop recruitment strategies to achieve these goals ( Nielsen et al., 2017 ; Schrouff et al., 2019 ; Exploring quotas in academia; Des quotas pour promouvoir l’égalité des chances dans la recherche).

9. Gender bias in negotiation outcomes

Negotiations are important for building a successful career, as they can lead to better starting salaries and start-up packages, salary increases, better working conditions, and greater allocation of personnel, lab space, and other resources. On average, men initiate negotiations more often than women ( Babcock et al., 2006 ; Small et al., 2007 ). Moreover, when women do negotiate, they obtain less: they are less likely than men to receive the raise they asked for and may incur a social cost for advocating for themselves ( Bowles et al., 2007 ; Mazei et al., 2015 ).

Importantly, negotiations might be affected by perceived gender stereotypes, as gender roles influence both parties to a negotiation regardless of their gender ( Kray et al., 2001 , 2014 ). In accordance with Role Congruity Theory ( Eagly and Karau, 2002 ), women are often reluctant to negotiate because initiating negotiations is perceived as stereotypically male behavior. Moreover, expressions of emotions commonly associated with leadership, such as anger and pride ( Brescoll, 2016 ), are more widely tolerated, and even appreciated, when they come from men than from women ( Brescoll and Uhlmann, 2008 ). The expression of gender roles is, however, a complex phenomenon. On the one hand, women may lose social capital (i.e., the professional connections that yield productive benefits) when voicing their opinions, especially when these go against the group’s opinion. On the other hand, women who described themselves as displaying so-called "masculine" personality traits (i.e., a competitive mindset and willingness to take risks) have been reported to have a 4.3% greater chance of getting positions and to be more likely to take up positions offering 10% higher wages than those displaying so-called "feminine" personality traits (i.e., gentle, friendly, and affectionate) ( Drydakis et al., 2018 ). This deep-seated implicit bias, held by all genders, has non-trivial consequences for women’s careers in academia.

Transparency is a key element of equity in negotiations. We propose that institutions provide access to everyone's salary, as well as to the range of possible salaries at each academic level. Gender differences in economic outcomes tend to be smaller when negotiators first receive information about the bargaining range ( Mazei et al., 2015 ). This approach could be complemented by providing faculty with information about ranges of research budgets and salaries, and by constructing a rational, rather than ad hoc, process for determining how resources are allocated.

Removing stereotypes on both sides of a negotiation can improve women’s performance ( Kray and Kennedy, 2017 ). Supportive academic supervisors have been shown to play an important role in improving women’s negotiation effectiveness ( Fiset and Saffie-Robertson, 2020 ). Institutions could also offer negotiation courses for mentees eager to develop these skills. For instance, several online services, highlighted in Table 1 , offer training materials on negotiation strategies, as well as materials for companies wanting to improve their gender representation. These workshops provide techniques for negotiation and conflict resolution.

10. Gender inequalities are present in conferences

Conferences and meetings are crucial avenues for scientists to communicate new discoveries, form research collaborations, communicate with funding agencies, and attract new members to their labs and programs ( Calisi and a Working Group of Mothers in Science, 2018 ). Similarly, invitations to give seminars at other institutions increase scientists’ visibility and expand their academic networks. However, equally qualified women scientists are often given fewer opportunities than men to speak at conferences and seminars. For instance, nearly half of the conferences in neuroscience have fewer women speakers than the base rate of women working in the field of the conference (Conference Watch at a glance ∣ biaswatchneuro, How scientists are fighting against gender bias in conference speaker lineups). Given that conference presentations are an important indicator of the impact and significance of one’s research, this form of gender bias has negative implications for women during hiring and promotion. Inviting women speakers and providing them with resources that allow them to attend the conference contributes to their professional development and increases their visibility. It also presents women researchers as leaders to young scientists in the audience, a visibility that is especially important for boosting the confidence of young women researchers. Moreover, women in the conference audience generally remain less visible, as they ask fewer questions than men. This is due to both internal factors (e.g., being unsure whether their question is appropriate) and structural factors (e.g., when the first question is asked by a man, women are less likely to follow up) ( Carter et al., 2018 ).
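One simple way to quantify whether a speaker lineup falls short of a field's base rate is a binomial tail probability: if speakers were drawn independently, each being a woman with probability `base_rate`, how likely would a lineup with this many women or fewer be? The sketch below is an illustrative simplification, not the method used by biaswatchneuro, and the lineup in the usage line is hypothetical.

```python
from math import comb


def prob_at_most_k_women(n_speakers: int, n_women: int, base_rate: float) -> float:
    """P(X <= n_women) for X ~ Binomial(n_speakers, base_rate).

    Assumes each speaker slot is an independent draw in which a woman is
    chosen with probability `base_rate` (a deliberate simplification).
    """
    if not 0 <= n_women <= n_speakers:
        raise ValueError("need 0 <= n_women <= n_speakers")
    if not 0.0 <= base_rate <= 1.0:
        raise ValueError("base_rate must be in [0, 1]")
    return sum(
        comb(n_speakers, k) * base_rate**k * (1 - base_rate) ** (n_speakers - k)
        for k in range(n_women + 1)
    )


if __name__ == "__main__":
    # Hypothetical lineup: 20 speakers, 3 of them women, field base rate 35%.
    print(f"{prob_at_most_k_women(20, 3, 0.35):.4f}")
```

A small tail probability indicates that the lineup would be unlikely if speakers simply reflected the base rate; it does not by itself identify why the imbalance arose.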

Women’s experience at conferences is further undermined by unprofessional and inappropriate behavior ( Parsons, 2015 ) (see Section 11 below on sexual harassment). Some scientists may avoid conferences altogether because they feel unsafe ( Richey et al., 2015 ). Specifically, sexual and gender harassment and micro-aggressions primarily target women and are a commonly reported form of harassment at conferences ( Marts, 2017 ). Finally, disrespectful and unprofessional questions and feedback during poster sessions and talks may discourage women from presenting their work ( Biggs et al., 2018 ).

We recommend that invited participants take proactive steps to promote gender equity. They could ask the organizers what measures are in place to ensure that the symposium or conference will not be a male-dominated event, and could decline to speak at conferences with an imbalanced speaker lineup. For instance, attendees can track a conference’s history of gender balance in speaker selection and compare it with the base rates of women in the relevant subfields, as is already possible in neuroscience ( Table 1 ). We believe that scientists of all genders and levels of seniority should take personal responsibility for ensuring professional conduct by speaking out against harassment and other biased behaviors.
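Comparing a speaker lineup with the base rate of women in a field, as described above for neuroscience, is at heart a simple proportion comparison. The sketch below illustrates that comparison under explicit assumptions: the lineups, the 30% base rate, and the helper names (`SpeakerLineup`, `women_share`, `below_base_rate`, `expected_women`) are all hypothetical, and this is not the methodology of BiasWatchNeuro or any other monitoring service.

```python
from dataclasses import dataclass


@dataclass
class SpeakerLineup:
    """A conference speaker lineup (hypothetical illustrative data, not real records)."""
    name: str
    women_speakers: int
    total_speakers: int


def women_share(lineup: SpeakerLineup) -> float:
    """Return the fraction of speakers in the lineup who are women."""
    if lineup.total_speakers <= 0:
        raise ValueError("lineup must have at least one speaker")
    return lineup.women_speakers / lineup.total_speakers


def below_base_rate(lineup: SpeakerLineup, base_rate: float) -> bool:
    """True if the lineup's share of women falls below the field's base rate."""
    return women_share(lineup) < base_rate


def expected_women(lineup: SpeakerLineup, base_rate: float) -> float:
    """Number of women speakers the lineup would have if it matched the base rate."""
    return base_rate * lineup.total_speakers


if __name__ == "__main__":
    # Hypothetical base rate and lineups, chosen only to illustrate the comparison.
    base_rate = 0.30
    lineups = [
        SpeakerLineup("Meeting A", women_speakers=3, total_speakers=20),
        SpeakerLineup("Meeting B", women_speakers=7, total_speakers=20),
    ]
    for lu in lineups:
        status = "below" if below_base_rate(lu, base_rate) else "at or above"
        print(f"{lu.name}: {women_share(lu):.0%} women, {status} the base rate "
              f"(about {expected_women(lu, base_rate):.0f} expected)")
```

This deliberately simple check ignores sampling noise: with small lineups a single speaker shifts the proportion considerably, so a more careful assessment would look at a conference’s history across many years rather than a single program.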

Conferences can strive to ensure that symposia include gender-balanced speakers and chairs, at least in a ratio that matches the demographics of the field. Conference, seminar, and symposium organizers should keep a list of women speakers they can invite. They can search beyond their personal and professional networks by consulting resources such as the directory compiled by Jennifer Glass and Minda Monteagudo, which lists searchable databases of highly qualified women by subfield ( Table 1 ). As a notable example, proposals for symposia at the Federation of European Neuroscience Societies (FENS) Forum are required to include both men and women speakers or to provide a justification for single-gender symposia.

We also propose that organizers consider existing tools to mitigate their own bias. Gender balance at neuroscience conferences has been publicly monitored through the website BiasWatchNeuro ( Table 1 ). Such measures could be implemented in many academic fields. In the context of conferences, unlike citations, diversity must come from the top: the organizations hosting a conference should strive for a committee that is well trained regarding bias. The Organization for Human Brain Mapping (OHBM) has introduced an ‘Affirmative Attention’ approach to Council elections, in which the ballot for at least some open positions includes only women candidates, so that the gender distribution of the Council remains equitable no matter which candidates are elected ( Tzovara et al., 2021 ). Conference organizers can also offer programs that raise awareness of gender bias. For example, the annual meetings of several major societies, such as the Society for Neuroscience, OHBM, and FENS, include educational courses, workshops, and informational sessions on gender bias (Seeds of Change within OHBM: Three Years of Work Addressing Inclusivity and Diversity). Another example is the ‘Power Hour’ instituted by the Gordon Research Conferences, a forum for conversations about diversity, inclusivity, and related topics (The GRC Power Hour™).

However, in workshops about gender bias, often only highly successful women are represented on panels discussing bias and women’s careers in academia. In these settings, we believe it is important to avoid promoting survivorship bias, which emphasizes positive outcomes without addressing the barriers and challenges that must be overcome for such success to become more common among women scientists. Moreover, men are rarely invited as speakers at these events and are usually absent from the audience, which leaves them less aware of the issues around gender bias and therefore less effective as allies. We suggest that panel speakers and topics be chosen more inclusively, so that panels represent the full spectrum of diversity in the community.

It has been proposed that every conference adopt a mandatory, inclusive code of conduct stating what is and is not appropriate behavior for attendees ( Favaro et al., 2016 ). Conference organizers should have clear plans of action in place in case harassment occurs, including anonymous reporting and the removal of confirmed harassers from the conference ( Marts, 2017 ; Parsons, 2015 ). The code of conduct should also describe respectful ways to provide constructive scientific feedback ( Favaro et al., 2016 ), a practice that should be adopted across all contexts within academia. Lastly, all attendees should feel concerned about, and responsible for, maintaining a respectful environment during conferences. Because it can be hard to intervene as things unfold in real time, we suggest that conference organizers provide a specific contact to whom attendees can report unethical or inappropriate incidents.

11. Sexual harassment is a major obstacle encompassing all career stages

A recent comprehensive report on sexual harassment by the National Academies of Sciences, Engineering, and Medicine, funded by the NIH, found that rates of sexual harassment are as high as 58% for academic faculty and staff and between 20% and 50% for students. Most of the sexual harassment experienced by women in academia consists of sexist hostility. These unacceptable rates are higher than in any other work environment except the military ( Johnson and Smith, 2018 ). The consequences of harassment are far-reaching, and reducing these high rates will require widespread efforts if we are to achieve gender parity in the scientific workplace.

Sexual harassment falls into four main categories: micro-aggression (i.e., comments or actions that express prejudiced attitudes), sexual coercion, unwanted sexual attention, and gender harassment (see National Academies of Sciences, Engineering, and Medicine, 2018 for a detailed review). Harassment consists of actions that create a hostile and inequitable environment for members of a specific group. It is not limited to the extreme form of physical assault; it also includes endorsing beliefs that someone’s intelligence is inferior to another’s, or making demeaning jokes that target one gender group.

Unfortunately, all types of sexual harassment are common and lead to negative outcomes for the people who experience them. In addition to the 58% of academic faculty and staff who experienced sexual harassment, 38% of women trainees and 23% of men trainees experienced sexual harassment from faculty ( Johnson and Smith, 2018 ). Even higher figures are found in specific fields; a recent study reports that 75% of undergraduate women majoring in physics experienced sexual harassment ( Aycock et al., 2019 ). While peer-to-peer harassment is also prevalent, trainees experience worse professional outcomes when the harassment is perpetrated by faculty at their own university. These numbers may underestimate the problem, as trainees might not feel comfortable speaking up when their career development, and sometimes even their legal status in a country, depends on the person harassing them. In another study of 474 scientists, 30% of women reported feeling unsafe at work, compared to 2% of men ( Clancy et al., 2017 ). The rates were even higher for women of color: almost 50% of women scientists of color reported feeling unsafe at work ( Clancy et al., 2017 ). These experiences are chronically stressful and have been linked to higher levels of depression and anxiety and to generally impaired psychological well-being ( Lim and Cortina, 2005 ; Parker and Griffin, 2002 ). People who have experienced sexual harassment report higher rates of absenteeism, tardiness, and use of sick leave (measured on scales on which respondents rated the desirability, frequency, likelihood, and ease of engaging in these behaviors), as well as unfavorable job behaviors (e.g., making excuses to get out of work, neglecting tasks not evaluated in performance appraisals) ( Schneider et al., 1997 ). Finally, and not surprisingly, individuals who experience sexual harassment are more likely to leave their jobs.
All of these statistics demonstrate that sexual harassment is alarmingly common and reduces the scientific productivity and well-being of the people who are harmed. Yet when this behavior is reported, whistle-blowers may face retaliation, or the perpetrators may face no repercussions. Moreover, even policies intended to ‘protect’ victims of harassment can have substantial negative consequences, which fall more heavily on women than on men. These include reluctance to hold one-to-one meetings with women or to include them in social events, and reluctance to hire women for positions that involve close one-on-one contact ( Atwater et al., 2019 ).

Collegial behavior that does not propagate harassment or micro-aggressions should be the bare minimum expectation in any lab or academic institution. Individuals at all levels should take personal responsibility for promoting a respectful and professional environment, avoiding unwelcome behavior, and denouncing it when they witness it. Beyond individual responsibility, it is essential that organizational leaders convey an unequivocal anti-harassment message ( Buchanan et al., 2014 ).

Sexual harassment cannot be tolerated and must be firmly sanctioned by institutions. Although some initiatives for combating harassment exist, there is to date no evidence that current policies have succeeded in reducing it (ACD Working Group on Changing the Culture to End Sexual Harassment). To counter this ineffectiveness, the NIH has recently recommended that sexual harassment be treated as equivalent to scientific misconduct, with similar mechanisms for reporting, investigation, and adjudication.

Researchers found guilty of sexual harassment could be barred from applying for new grants for a number of years set by the relevant regulatory bodies, similar to the penalties for scientific misconduct. Examples of such bodies in the USA include the Department of Health and Human Services (HHS), its Office of Research Integrity (ORI), and the NIH. Importantly, the committees that investigate and adjudicate harassment should be independent of institutional leadership ( Greider et al., 2019 ).

One solution often proposed to combat sexual harassment is anti-harassment training, which requires students and staff to attend workshops detailing sexual harassment policies and what constitutes unwelcome behavior. This approach has been widely recommended and is currently implemented in several institutions, despite its debatable effectiveness in reducing harassment. Indeed, some approaches can have the opposite effect: after training, men may become less likely to judge a situation as harassment, and the training may reinforce gender stereotypes ( Roehling and Huang, 2018 ). Moreover, empirical studies have shown that training employees to recognize what constitutes harassment can be followed by decreases in the share of women managers ( Dobbin and Kalev, 2019 ). By contrast, training managers to recognize signs of harassment and to intervene is followed by increases in the share of women managers ( Dobbin and Kalev, 2019 ). This apparent discrepancy may reflect gender differences in the perception of harassment, with women being more likely to believe victims. Departments need to design their sexual harassment training carefully, as studies have reported that training design is essential and must be adapted to the target population ( Dobbin and Kalev, 2019 ). Interventions that position trainees as allies, such as bystander intervention training (Bringing in the Bystander®), have shown positive effects on sexual harassment prevention in academic and military settings ( Buchanan et al., 2014 ; Cares et al., 2015 ; Katz and Moore, 2013 ; Potter and Moynihan, 2011 ). For instance, Potter et al. (2019) are developing video games to teach college students bystander intervention skills for situations involving sexual harassment and stalking.

One example of a novel but as yet untested approach is the ‘Respect is Part of Research’ initiative by graduate students in the University of California Berkeley Physics Department. During these trainings, participants discuss case studies in small groups with a facilitator, addressing what is wrong with the behavior of the actors in each example, separating intent from impact, and exploring ways to resolve the situation. Providing trainees with the tools to handle difficult situations and creating a supportive community has the potential to shift academic culture significantly towards more respectful behavior. However, the initiative’s long-term effectiveness in combating harassment remains to be tested.

Adopting clear anti-harassment policies in codes of conduct (Why and How to Develop an Event Code of Conduct), both at conferences and in individual labs, can also help reduce harassment. Enforcing a code of conduct is challenging, and future efforts should focus on drafting clear policies for different scenarios.

To lower the rates of sexual harassment, all members of the scientific community, and of the wider community, need to make widespread changes. Learning to recognize sexual harassment should be an ongoing goal for every nation, starting with education in schools. We recommend that all organizations develop programs charged with reducing the prevalence of sexual violence, sexual harassment, and stalking through prevention, advocacy, training, and healing (for example, see the Path to Care center at the University of California Berkeley). This approach is distinct from, and complementary to, official university legal procedures (e.g., Title IX in the USA): whereas Title IX officers adjudicate gender discrimination disputes, the university program we envision would be dedicated to serving survivors of sexual harassment, preventing new cases, and training the university-wide community.

12. Encompassing all sectors: family planning in academia

Gender inequity exists in the division of household labor. Women typically shoulder most of the burden of childcare and household maintenance, even among dual-career partners ( Chopra and Zambelli, 2017 ). Women have increasingly joined the paid labor force, increasing their total work time, but men have not correspondingly increased the time they spend on unpaid household work. The COVID-19 pandemic provides the most recent evidence of the impact of gender inequality in the labor market ( Alon et al., 2020 ). During the lockdown, women scientists submitted fewer manuscripts and started fewer research projects than men ( Viglione, 2020 ), consistent with an additional and disproportionate childcare burden. Because most studies consider households composed of one man and one woman, further work is needed to evaluate the relations between gender and labor in single-parent and same-gender-parent homes.

Although academia offers perks for single-parent, same-gender-parent, and different-gender-parent families, such as flexible hours and additional time to tenure, other working conditions can become barriers to family planning. Career stages where funding and mobility are critical, such as the transitions between graduate school, postgraduate training, and tenure-track positions, often coincide with the time when researchers may wish to start a family (see Figure 1 ). However, pregnancy, childbirth, nursing, parental leave, and early childcare demand considerable time, physical and mental resources, and money, which together constitute a competitive disadvantage in a scientific career. Indeed, parental leave negatively affects the productivity metrics of early-career scientists who are parents ( Chapman et al., 2019 ), with a stronger effect for women ( Morgan et al., 2021 ). This in turn reduces their chances of obtaining grant funding (e.g., several funding calls are restricted to a certain number of years post-degree under funding agency policies).

Women with children are reluctant to attend conferences because of the lack of childcare support ( Calisi and a Working Group of Mothers in Science, 2018 ). Conferences in distant locations add another layer of complexity, as transoceanic flights often mean a longer stay away from home. Adequate facilities such as lactation rooms are rarely provided, nor is there support for a traveling caregiver to look after an infant while the scientist attends the meeting. This limited mobility reduces parents’ opportunities for international collaborations and funding, which are common criteria for promotion and evaluation.

Importantly, women face even stronger discrimination when they are part of non-traditional family structures: single mothers experience greater work-family strain than partnered mothers ( Baxter and Alexander, 2008 ). Studies of single-mother doctoral students have shown that they fear being judged in their departments and often feel excluded from university life by academic schedules ( AmiriRad, 2016 ). Although LGBTQ+ parents face challenges similar to those of cisgender and heterosexual parents ( King et al., 2013 ), LGBTQ+ individuals may have fewer health or retirement benefits and face unequal treatment in academia ( Cech and Waidzunas, 2021 ; Thompson and Parry, 2017 ). Future studies should address the particular challenges and biases faced by single-parent and LGBTQ+ families and their potential impact on academic achievement.

Beyond academia, most societies are not built to assist families in which both parents pursue demanding careers. For instance, in some countries, such as Germany, the public school day often ends in the early afternoon, and it can be hard to find public preschool or after-school childcare. Moreover, working mothers often feel stigmatized, as they risk being looked down upon in more “traditional” societies for choosing to work instead of staying at home with their children.

Parents should not have to choose between having a family and an academic career. Evaluation of academic progress should take into account delays caused by parenthood and childcare responsibilities. Individuals should also examine their own possible tendencies to judge or exclude academics with young children, and be prepared to support initiatives that encourage these colleagues’ participation in gatherings, conferences, and other professional activities.

Institutions need to adopt official extensions of graduate, postdoctoral, and tenure timelines for childbirth and parenthood. To address the financial difficulties of academic families, we suggest several measures. First, job security can be improved by creating longer-term contracts where possible, and by providing bridge funds at the department or university level to support trainees during gaps in funding ( Stewart and Valian, 2018 ). Both universities and funding institutions should put measures in place to prevent gaps in funding during parental leave ( Powell, 2019 ). Special provisions for parenthood can be built into calls for proposals and funding mechanisms. A few funding organizations recognize childbirth in their policies as a valid reason to extend the eligibility window (from one year for NIH K awards to 18 months for ERC grants, or a 2-year extension of post-PhD limits per child for the Emmy Noether Program of the German Research Foundation), or subtract time spent on parental leave (“Research Project for Young Talent” proposed by the Research Council of Norway, 2–7 years post-PhD). Finally, efforts should be made to ease the return to work after maternity leave, for example by providing lactation rooms.

Solutions can be found to support couples in which both partners are in academia ( Schiebinger et al., 2008 ). By enabling couple hiring for tenure-track positions, institutions can help women pursue their academic careers. Critically, universities should ensure access to affordable, on-site childcare, as this both improves outcomes for the children enrolled and increases women’s participation in the workforce ( Morrissey, 2017 ; Gault, 2016 ).

Specific funding should be allocated for parents to travel to conferences and for sabbaticals. Conferences, universities, and funding agencies can reserve part of their budgets to create travel funds for parents. Compared to a decade ago, more conferences now offer nursing rooms ( Cardel et al., 2020 ; Hope et al., 2019 ; Langin, 2018 ) and other forms of on-site childcare, which should be accessible to all parents ( Cardel et al., 2020 ; Langin, 2018 ). However, unfamiliar caregivers are not always a viable option, and parents will likely feel most comfortable knowing that their child is cared for by a primary caregiver. To address these issues, some conferences, such as FENS and OHBM, offer childcare grants, which can cover either the travel expenses of a trusted caregiver (spouse, partner, or nanny) accompanying the parent and child, or the costs of leaving the child at home ( Calisi and a Working Group of Mothers in Science, 2018 ; Langin, 2018 ; Tzovara et al., 2021 ).

These issues require a broad reshaping of society, which still relies on parental roles and family patterns that are increasingly obsolete. Laws in all countries should mandate official extensions of timelines to accommodate pregnancy, childbirth, and parenthood, as longer parental leave results in fewer women leaving the workforce ( Jones and Wilcher, 2019 ). For instance, the total paid period of parental leave in Norway is between 46 and 59 weeks, with maternal and paternal quotas of 15 weeks each and a joint period of 16 weeks. A potential downside of offering the same extensions to both parents is that, as previous research has shown, this can put mothers at a disadvantage, because fathers are more likely to increase their research productivity during this period ( Antecol et al., 2016 ). It is therefore important for parents to share childcare duties equally and to use the allocated time to bond with their child.
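As a quick arithmetic check on the Norwegian figures quoted above, the three stated components sum exactly to the lower bound of the 46–59-week range. This assumes the quoted quotas correspond to that 46-week total; how the 59-week upper bound is composed is not stated in the text and is not reconstructed here.

$$\underbrace{15}_{\text{maternal quota}} + \underbrace{15}_{\text{paternal quota}} + \underbrace{16}_{\text{joint period}} = 46\ \text{weeks}$$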

13. Not all gender biases are the same: Intersectionality

Discussions of plans to combat gender bias in academia are incomplete without attention to the unique struggles of women who hold additional identities that are subject to discrimination. Barriers faced by all women in academia are compounded for those who belong to additional underrepresented groups (e.g., based on, but not limited to, race, ethnicity, first-generation status, religion, socioeconomic status, gender expression, gender identity, sexual orientation, and disability), which interact with and amplify gender bias ( Armstrong and Jovanovic, 2015 ). For instance, the gender wage gap has been shown to be wider for transgender women ( Schilt and Wiswall, 2008 ) and for Black women ( Guillory, 2001 ). Women of color faculty are the least likely of all demographic groups to receive tenure, despite comparable productivity ( Armstrong and Jovanovic, 2017 ). Successful interventions must therefore account for these supra-additive effects and take an intersectional approach.

Across all career stages and aspects of academia, institutions could develop interventions and programs that take into account the specific needs of overlapping identities. For instance, Flores (2011) proposed that financial awards or targeted mentoring programs could help underrepresented women overcome the practical and psychological burdens associated with intersecting identities. Policies to increase the representation of the Latino community in STEM include offering mentoring and educational programs in other languages for women whose native language is not English ( Flores, 2011 ). A first step in developing such programs can be interviews and focus groups with underrepresented minority women, to gather feedback on structural inequalities that can be addressed at the institutional level. Intersectional approaches can include targeted networking events, mentorship pipelines, and funding initiatives, as well as rigorous data collection to assess their efficacy. As with any intervention, special care must be taken not to overburden the individuals experiencing discrimination with additional tasks and administrative work.

14. Discussion

In this article, we review empirical evidence demonstrating pervasive gender bias throughout all stages and venues of academic life. Studies have shown that women are less likely than men to be hired or to receive tenure, despite equal performance. They receive less grant funding and fewer prestigious awards. Women have lower rates of accepted publications, presentations, and patents, and are less likely to be first or last author on publications or to submit to high-impact journals. Studies document a prevailing perception that work by men has higher merit than that of women, a perception reflected in the disparity between citations of men and women authors in research papers and in assigned classroom readings. Positions on review panels with the power to hire, promote, approve funding, or decide policy are still largely offered to men, whose own biases (unconscious or otherwise) may impede the advancement of women academics. Women’s salaries are lower than men’s, and women bear the greater burden of childcare, restricting their opportunities to conduct research or attend conferences. Finally, women continue to experience sexual harassment and hostility at an alarming rate, not only in their work environment but also at conferences and other academic venues.

Apart from the ethical issues this evidence raises, a large proportion of the highly trained and talented individuals who are essential for advancing research and educational practice are not progressing in their academic careers, largely because of the rectifiable problem of gender bias. Here, we gather, explore, and suggest actions at the individual, institutional, and societal levels aimed at mitigating the effects of gender bias. Implementing some of the proposed recommendations will not be trivial, as new regulations and controls may themselves require monitoring for bias, and we cannot predict the outcomes of the proposed suggestions. However, openly and explicitly acknowledging gender bias (to which people of all genders are susceptible) is an essential starting point for restoring balance to the academic environment. Given these complexities, institutions should seek the advice and guidance of social science experts and of the affected groups to ensure optimal solutions.

Diversity is essential to delivering excellence in science, as it increases cognitive diversity, which in turn leads to novel solutions ( Page, 2008 ) and innovations ( Hofstra et al., 2020 ), as well as improved problem-solving ( Hong and Page, 2004 ) and scientific discovery ( Nielsen et al., 2018 ). Beyond its contribution to science, diversity also helps reduce stereotypes ( Miller et al., 2015 ). For these changes to succeed, mindsets must shift, and our proposed solutions are a step in that direction. However, many challenges must first be understood and overcome, and a few important aspects of gender bias must be addressed.

The fight for gender equity needs diverse role models and strong allies

First, we need to amplify the voices of under-represented scientists and mentors as role models in order to encourage diversity. One of the main reasons for leaving science is a lack of mentoring, which affects more women than men trainees, as women are less likely to be mentored ( Preston, 2004 ). Consistent with this reported gender bias in academic mentoring, experimental evidence showed that women and men science faculty were less likely to offer mentoring to a trainee when the application materials were assigned a female rather than a male name ( Moss-Racusin et al., 2012 ). The same study found that female applicants were rated as less competent than male applicants with identical applications. To overcome this bias against women trainees, mentors must make an intentional effort to ensure that mentoring is offered equally to women and men. Awareness of implicit bias is an important first step in overcoming these barriers and helps mentors support women mentees equally. For instance, mentors could actively encourage women mentees to submit to higher impact factor journals and to apply for funding opportunities and large grants, nominate them for awards, invite them to speak at conferences and seminars, and introduce them to potential collaborators. All scientists should consider gender equity when building a team of principal investigators for collaborative work, particularly on larger or more prestigious projects. Having encouraging mentors and role models with whom students and scientists can identify will positively shift their perception of themselves ( Morgenroth et al., 2015 ) and mitigate imposter syndrome ( Abdelaal, 2020 ). This type of support can make the academic career path more inclusive and accessible, irrespective of race, ethnicity, sex, sexual orientation, or gender.

Second, everyone needs to be on board, irrespective of gender or career stage. This is particularly critical because men still hold most positions of power in STEM and can use those positions to change the system from within. This can be challenging, as several persistent misconceptions impede progress ( Johnson and Smith, 2018 ). One might argue that giving more opportunities to women necessarily means a loss of privileges for men. However, in some STEM domains the situation is not a zero-sum game: many countries suffer from an overall shortage of STEM workers, so adding women to the workforce would improve overall industry performance. In addition, gender equity brings many benefits: organizations with more female leaders offer employees more generous policies ( Ingram and Simons, 1995 ), which produce better business results ( Berdahl, 2007 ). Some men might feel that gender equity “is not their fight”. The answer to this concern is two-fold. First, gender equity is a moral imperative, and the voices and actions of all are needed. Second, gender equity is also men’s fight. Gendered roles affect not only women but men as well: many still believe that “child caregiving/domestic work is not a male job” and that “a man needs to be the family breadwinner”, beliefs that can have a strong impact on mental health ( King et al., 2020 ). This position reflects a “fixed mindset” about gender roles, which leads men to rationalize the status quo, i.e., to engage in system justification about gender inequality ( Kray et al., 2017 ). A more fundamental antidote to gender bias is to promote growth mindsets (e.g., “things can change, there is no reason why men and women can’t occupy the same social roles”; Dweck, 2016 ).

Notably, the concern and interest around women’s underrepresentation in STEM has not been matched by a similar concern about men’s underrepresentation in healthcare, early education, and domestic roles ( Block et al., 2019 ; Croft et al., 2015 ; Meeussen et al., 2020 ). However, gender experts now point to men and men’s representation as a key component in advancing women’s place in society ( Block et al., 2019 ; Croft et al., 2021 ). Gender equity will benefit men by freeing them from societal biases. In turn, a change in men’s aspirations and careers will likely benefit overall gender equality: men who take on non-traditional roles can enable women and girls to envision themselves in less traditional, complementary roles ( Block et al., 2018 ; Croft et al., 2014 ). As more men move into roles in health care, education, and domestic work, more STEM roles can be occupied by women. To quote one of our reviewers: “As long as there is stagnation in men's roles, there will be an upper limit on the amount of change that can be achieved for women's roles as well”.

Importantly, once the fight for gender equity becomes a universal cause, the overload of academic work weighing on women should be alleviated. Several institutes and funding agencies try to improve equity by asking women to take on administrative obligations such as serving on hiring committees and conference panels. However, because women are fewer in number, the same women end up managing substantial extra work. Beyond these administrative burdens, they are also often asked to participate in initiatives aimed at promoting diversity. This work also affects women disproportionately, and women of color even more so (Nair, 2014). It may seem natural that individuals facing discrimination would have the strongest interest in, and possibly the most knowledge about, how to resolve it. However, leaving the work of promoting diversity to those directly affected by the lack of diversity and inclusivity can contribute to further injustices. This work therefore needs to be shared with advocates from the non-minority category.

When implementing some of the proposed solutions, it is important to consider complexities that might emerge from “positive discrimination”, where the “best” candidate might be overlooked in favor of a candidate who meets another requirement (e.g., ethnicity, first-generation status, religion, socioeconomic status, gender expression, gender identity, sexual orientation, or disability; STEM Women, 2019). At the same time, failing to dismantle structural conditions of inequality means that existing disadvantages prevail. Institutions should carefully weigh these complexities and include affected minorities in policy development to ensure optimal solutions.

Challenges and major open questions in addressing gender bias

Improving gender equity in science poses challenges at several levels. First of all, despite an abundance of research, there is a lack of systematic and validated metrics to assess gender bias and to evaluate the efficacy of initiatives to improve gender equity. Without standardized data collection and metrics that objectively measure gender bias, it is often impossible to draw solid conclusions about the extent of its presence or its origin. Yet appropriate measures can only be deployed once the source of bias has been properly established, so that proposed actions target the real cause. Part of the difficulty is that, despite its far-reaching consequences, bias can be subtle and challenging to detect. Moreover, detecting implicit bias, and then reductions in it, in a controlled setting does not necessarily translate to changes in real-life situations (Forscher et al., 2019). It is crucial that advocacy goals do not bias the presentation of scientific evidence for and against different interventions and policy changes (Eagly, 2016, 2018). In addition, gender data are not systematically gathered and reported by organizations such as universities, conferences, funding agencies, and award and hiring committees. Moving forward, we encourage institutions to gather and report data on gender representation among their members and to collaborate with social scientists who can provide valuable expertise. Importantly, we encourage all scientific bodies to be more transparent about the successes and failures of the interventions they have used in the past to address bias.

Notably, for many of the issues raised in this article, no straightforward solutions exist. Despite an increasing number of actions taken over the past decades to mitigate gender bias in the workplace, thorough evaluations of their impact on diversity are often lacking, because their short- and long-term effects are hard to quantify in the real world (Paluck and Green, 2009; Paluck et al., 2021). Quantifying long-term impacts is especially important because some evidence suggests that gender bias persists even after gender representation becomes balanced, paradoxically perpetuated by members who believe that gender bias has been overcome (Begeny et al., 2020). Not all of the potential solutions presented here will necessarily work, but several of them are worth considering (see Table 2 for an overview of tested vs. proposed actions).

For instance, diversity training is often recommended as a tool to mitigate gender bias. Its admirable goal is to raise awareness of the implicit and explicit biases that every human being carries. Although it is an intuitive way to tackle bias, the effectiveness of diversity training is currently debated. Some studies, especially in the corporate sector, have reported modest or no effects of training, with potential adverse effects for certain minority groups (Dobbin and Kalev, 2013; Dobbin et al., 2011; Kalev et al., 2006), while other studies have shown encouraging results (in the corporate sector: Anand and Winters, 2008; in academia: Carnes et al., 2015). Multiple factors influence the effectiveness of diversity training (Roberson et al., 2013). Among them, the design of the training itself (its format, its length, and, most importantly, whether men are depicted as allies rather than oppressors) and the way it is assigned (e.g., voluntarily, in person) may positively influence the outcome of these initiatives (Bezrukova et al., 2016; Kalev and Dobbin, 2020). Genuine motivation, support and commitment from superiors, social accountability, and transparency also play important roles (Chang et al., 2019; Dobbin and Kalev, 2020). Lastly, because diversity training alone is not effective at changing behavior (Kalev et al., 2006), other actions and concrete changes at the institutional and societal levels are needed (Dobbin and Kalev, 2020; Paluck et al., 2021).

Combining several actions is required for successful outcomes. For instance, increasing the representation of women across scientific bodies (e.g., hiring committees, review panels, mentorship) and career stages can help reduce bias, but on its own it is not enough. Research with hundreds of thousands of participants across multiple countries has shown that increasing the enrollment of women students in higher education can reduce gender stereotypes, whereas increasing the employment of women as researchers reduces only explicit, not implicit, stereotypes (Miller et al., 2015). Explicit gender stereotypes persist more strongly in male-dominated disciplines, but implicit stereotypes remain even in disciplines where women are well represented (Smyth and Nosek, 2015). Gender stereotypes are also held by women, who can themselves be biased against women. Increasing the representation of women is therefore a necessary but not sufficient condition for addressing gender bias.

A second major challenge in improving gender equity is that scientific fields differ in their gender imbalances across career stages. Some fields, such as psychology, typically have a more balanced gender ratio than male-dominated fields such as engineering. Future efforts should implement initiatives that cater to the needs of each subfield and should also test whether initiatives generalize across fields.

Last, one major open question is that of governance. To date, there is a lack of governance models for monitoring gender bias and for deciding whether a given solution is sufficient or well implemented. Importantly, the decision about whether a solution is successful often relies on arbitrary metrics and does not take into account the experiences of the women who are targets of bias. We invite scholars to develop better governance models and oversight committees for monitoring gender bias in an inclusive and objective way.

Conclusions

Gender bias is a complex assortment of problems, encompassing all career stages. Concrete actions are required to address each of the facets of gender bias, and need to be initiated by every academic entity, from individuals to departments to conferences and professional organizations. These actions, in combination with strong role models and a diverse pool of allies, will make it possible to shift the culture and bring positive change. The time for action is now.

Acknowledgements

We thank Susan Fiske, Vinitha Rangarajan, Kristina R. Olson, and Sapna Cheryan for their help in editing and improving the manuscript and Luisa Reis Castro for valuable discussions. A.T. is supported by the Interfaculty Research Cooperation “Decoding Sleep: From Neurons to Health and Mind” of the University of Bern, and the Swiss National Science Foundation (#320030_188737 and P300PA_174451). E.A.B. is supported by NIH grant OD-010425, A.L.F. by the Simons Collaboration for the Global Brain, S.K. by NIH (RO1MH64043, RO1EY017699, 21560-685 Silvio O. Conte Center), the James S. McDonnell Foundation and the Overdeck Family Foundation. C.K. is supported by the Jacobs Foundation; N.J.K. by NIH/NIGMS P01-GM118629 and NIH/NIMH P50-MH109429; J.J.L. by U19NS107609-01. A.C.N. is funded by a Wellcome Trust Senior Investigator Award (ACN) 104571/Z/14/Z; James S. McDonnell Foundation Understanding Human Cognition Collaborative Award 220020448; and by the NIHR Oxford Health Biomedical Research Centre. The Wellcome Centre for Integrative Neuroimaging is supported by core funding from the Wellcome Trust (203139/Z/16/Z). S.K.R. is supported by NIH/NIDCD 1R21DC016985. A.K.S. is supported by a research grant from the Research Council of Norway (RCN; project number 240389) and through the RCN’s Centres of Excellence scheme (project number 262762 RITMO). J.D.W. is supported by NIMH R01-MH121448 and NIMH R01-MH117763, W.-J.W. by the US Office of Naval Research (ONR) grant N00014-17-1-2041, US National Institutes of Health (NIH) grant 062349, and the Simons Collaboration on the Global Brain program grant 543057SPI. M.V.I.’s and N.F.D.’s contributions are supported by NIH/NIDCD R01-DC016345; D.S.B. is supported by NSF CAREER PHY-1554488. R.T.K. is supported by NINDS NS21135, NIMH CONTE Center PO-MH109429, Brain Initiative U19 NS1076, and Brain Initiative U01 NS108916.

References*

* The gender proportions in the reference list of this manuscript have been checked with cleanBib (https://github.com/dalejn/cleanBib) to evaluate its gender ratio. Our reference list contains 42.4% woman(first)/woman(last), 12.2% man/woman, 22% woman/man, and 18.5% man/man references, with the remaining 4.88% of unknown categorization. It is important to note the limits of classifying gender identity using names, pronouns, and other signifiers scraped from online databases, and that this methodology cannot account for intersex, non-binary, or transgender people.
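The tallying step behind percentages like these can be sketched as follows. This is a minimal illustration, not cleanBib's actual code: it assumes each reference's first and last author have already been labeled "W" (woman), "M" (man), or None (unclassifiable) by some upstream step, and the helper names `author_pair_category` and `category_percentages` are hypothetical. It only counts first/last-author categories and converts the counts to percentages; the name-based inference step, with all the limits noted above, is outside its scope.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical per-author label: "W" (woman), "M" (man), or None when the
# name could not be classified. Labeling is assumed to happen upstream.
Label = Optional[str]

_NAMES = {"W": "woman", "M": "man"}


def author_pair_category(first: Label, last: Label) -> str:
    """Map the labels of a reference's first and last authors to a category."""
    if first is None or last is None:
        return "unknown"
    # Raises KeyError for labels other than "W"/"M", surfacing bad input early.
    return f"{_NAMES[first]}/{_NAMES[last]}"


def category_percentages(pairs: Iterable[Tuple[Label, Label]]) -> Dict[str, float]:
    """Percentage of references falling into each first/last-author category."""
    counts = Counter(author_pair_category(first, last) for first, last in pairs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: 100.0 * n / total for category, n in counts.items()}


if __name__ == "__main__":
    # Eight made-up references, for illustration only.
    sample = [("W", "W"), ("W", "W"), ("M", "W"), ("W", "M"),
              ("M", "M"), ("M", "M"), (None, "W"), ("W", "W")]
    for category, pct in category_percentages(sample).items():
        print(f"{category}: {pct:.1f}%")
```

On the eight made-up references, this reports 37.5% woman/woman, 25.0% man/man, and 12.5% each for man/woman, woman/man, and unknown. Treating any missing author label as "unknown" mirrors the footnote's separate unknown category; real tools may handle uncertain names differently.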

Abdelaal G (2020). Coping with imposter syndrome in academia and research. Biochem. 42, 62–64.
Alon TM, Doepke M, Olmstead-Rumsey J, and Tertilt M (2020). The Impact of COVID-19 on Gender Equality. Working Paper.
AmiriRad MB (2016). Experiences of Single-Mother Doctoral Students as They Navigate Between the Educational System, Societal Expectations, and Parenting Their Children: A Phenomenological Approach (Lulu Press, Inc.).
Anand R, and Winters M-F (2008). A Retrospective View of Corporate Diversity Training From 1964 to the Present. AMLE 7, 356–372.
Antecol H, Bedard K, and Stearns J (2016). Equal but Inequitable: Who Benefits from Gender-Neutral Tenure Clock Stopping Policies? In IZA Discussion Papers (No. 9904). Institute of Labor Economics (IZA).
Armstrong MA, and Jovanovic J (2015). Starting at The Crossroads: Intersectional approaches to institutionally supporting underrepresented minority women STEM faculty. J. Women Minor. Sci. Eng 21, 141–157.
Armstrong MA, and Jovanovic J (2017). The intersectional matrix: Rethinking institutional change for URM women in STEM. J. Divers. High. Educ 10, 216–231.
Atwater LE, Tringale AM, Sturm RE, Taylor SN, and Braddy PW (2019). Looking Ahead: How What We Know About Sexual Harassment Now Informs Us of the Future. Organ. Dyn 48, 100677.
Aycock LM, Hazari Z, Brewe E, Clancy KBH, Hodapp T, and Goertzen RM (2019). Sexual harassment reported by undergraduate female physicists. Physical Review Physics Education Research 15, 010121.
Babcock L, Gelfand M, Small D, and Stayn H (2006). Gender Differences in the Propensity to Initiate Negotiations. In Social Psychology and Economics, Murnighan J, ed. (Mahwah, NJ, US: Lawrence Erlbaum Associates Publishers), pp. 239–259.
Babcock L, Recalde MP, Vesterlund L, and Weingart L (2017). Gender Differences in Accepting and Receiving Requests for Tasks with Low Promotability. Am. Econ. Rev 107, 714–747.
Balafoutas L, and Sutter M (2012). Affirmative action policies promote women and do not harm efficiency in the laboratory. Science 335, 579–582.
Barroga E (2020). Innovative Strategies for Peer Review. J. Korean Med. Sci 35, e138.
Baxter J, and Alexander M (2008). Mothers’ work–to–family strain in single and couple parent families: The role of job characteristics and supports. Aust. J. Soc. Issues 43, 195–214.
Beede DN, Julian TA, Langdon D, McKittrick G, Khan B, and Doms ME (2011). Women in STEM: A gender gap to innovation (Economics and Statistics Administration Issue Brief).
Begeny CT, Ryan MK, Moss-Racusin CA, and Ravetz G (2020). In some professions, women have become well represented, yet gender bias persists-Perpetuated by those who think it is not happening. Sci Adv 6, eaba7814.
Bendels MHK, Müller R, Brueggmann D, and Groneberg DA (2018). Gender disparities in high-quality research revealed by Nature Index journals. PLoS One 13, e0189136.
Berdahl JL (2007). The sexual harassment of uppity women. J. Appl. Psychol 92, 425–437.
Bergman S, Rustad ML, and a Nordic reference group (2013). The Nordic region - a step closer to gender balance in research? Joint Nordic strategies and measures to promote gender balance among researchers in academia. In Nordic Council of Ministers, TemaNord 2013:544.
Bernard C (2018). Editorial: Gender Bias in Publishing: Double-Blind Reviewing as a Solution? eNeuro 5.
Bezrukova K, Spell CS, Perry JL, and Jehn KA (2016). A meta-analytical integration of over 40 years of research on diversity training evaluation. Psychol. Bull 142, 1227–1274.
Biggs J, Hawley PH, and Biernat M (2018). The Academic Conference as a Chilly Climate for Women: Effects of Gender Representation on Experiences of Sexism, Coping Responses, and Career Intentions. Sex Roles 78, 394–408.
Bigler RS, and Leaper C (2015). Gendered language: psychological principles, evolving practices, and inclusive policies. Policy Insights from the Behavioral and Brain Sciences 2, 187–194.
Blair IV, Ma JE, and Lenton AP (2001). Imagining stereotypes away: the moderation of implicit stereotypes through mental imagery. J. Pers. Soc. Psychol 81, 828–841.
Block K, Croft A, and Schmader T (2018). Worth Less?: Why Men (and Women) Devalue Care-Oriented Careers. Front. Psychol 9, 1353.
Block K, Croft A, De Souza L, and Schmader T (2019). Do people care if men don’t care about caring? The asymmetry in support for changing gender roles. J. Exp. Soc. Psychol 83, 112–131.
Boring A (2017). Gender biases in student evaluations of teaching. J. Public Econ 145, 27–41.
Bowles H, Babcock L, and Lai L (2007). Social Incentives for Gender Differences in the Propensity to Initiate Negotiations: Sometimes It Does Hurt to Ask. Organ. Behav. Hum. Decis. Process 103, 84–103.
Bowles HR, Babcock L, and McGinn KL (2005). Constraints and triggers: situational mechanics of gender in negotiation. J. Pers. Soc. Psychol 89, 951–965.
Bravo G, Grimaldo F, López-Iñesta E, Mehmani B, and Squazzoni F (2019). The effect of publishing peer review reports on referee behavior in five scholarly journals. Nat. Commun 10, 322.
Brescoll VL (2016). Leading with their hearts? How gender stereotypes of emotion lead to biased evaluations of female leaders. Leadersh. Q 27, 415–428.
Brescoll VL, and Uhlmann EL (2008). Can an Angry Woman Get Ahead?: Status Conferral, Gender, and Expression of Emotion in the Workplace. Psychol. Sci.
van den Brink M (2010). Behind the Scenes of Science: Gender Practices in the Recruitment and Selection of Professors in the Netherlands (Amsterdam University Press).
van den Brink M, Benschop Y, and Jansen W (2010). Transparency in academic recruitment: A problematic tool for gender equality? Organ. Stud 31, 1459–1483.
Buchanan NT, Settles IH, Hall AT, and O’Connor RC (2014). A review of organizational strategies for reducing sexual harassment: Insights from the U.S. military. J. Soc. Issues 70, 687–702.
Budden AE, Tregenza T, Aarssen LW, Koricheva J, Leimu R, and Lortie CJ (2008). Double-blind review favours increased representation of female authors. Trends Ecol. Evol 23, 4–6.
Burns KEA, Straus SE, Liu K, Rizvi L, and Guyatt G (2019). Gender differences in grant and personnel award funding rates at the Canadian Institutes of Health Research based on research content area: A retrospective analysis. PLoS Med. 16, e1002935.
Calisi RM, and a Working Group of Mothers in Science (2018). Opinion: How to tackle the childcare-conference conundrum. Proc. Natl. Acad. Sci. U. S. A 115, 2845–2849.
Cameron EZ, White AM, and Gray ME (2016). Solving the Productivity and Impact Puzzle: Do Men Outperform Women, or are Metrics Biased? Bioscience 66, 245–252.
Caplar N, Tacchella S, and Birrer S (2017). Quantitative evaluation of gender bias in astronomical publications from citation counts. Nature Astronomy 1, 1–5.
Cardel MI, Dhurandhar E, Yarar-Fisher C, Foster M, Hidalgo B, McClure LA, Pagoto S, Brown N, Pekmezi D, Sharafeldin N, et al. (2020). Turning Chutes into Ladders for Women Faculty: A Review and Roadmap for Equity in Academia. Journal of Women’s Health.
Cares AC, Banyard VL, Moynihan MM, Williams LM, Potter SJ, and Stapleton JG (2015). Changing attitudes about being a bystander to violence: translating an in-person sexual violence prevention program to a new campus. Violence Against Women 21, 165–187.
Carnes M, Devine PG, Baier Manwell L, Byars-Winston A, Fine E, Ford CE, Forscher P, Isaac C, Kaatz A, Magua W, et al. (2015). The effect of an intervention to break the gender bias habit for faculty at one institution: a cluster randomized, controlled trial. Acad. Med 90, 221–230.
Carter AJ, Croft A, Lukas D, and Sandstrom GM (2018). Women’s visibility in academic seminars: Women ask fewer questions than men. PLoS One 13, 0202743.
Cech EA, and Waidzunas TJ (2021). Systemic inequalities for LGBTQ professionals in STEM. Sci Adv 7.
Ceci SJ, Ginther DK, Kahn S, and Williams WM (2014). Women in Academic Science: A Changing Landscape. Psychol. Sci. Public Interest 15, 75–141.
Chang EH, Milkman KL, Gromet DM, Rebele RW, Massey C, Duckworth AL, and Grant AM (2019). The mixed effects of online diversity training. Proc. Natl. Acad. Sci. U. S. A 116, 7778–7783.
Chapman C, Bicca-Marques JC, Calvignac-Spencer S, Fan P-F, Fashing P, Gogarten J, Guo S, Hemingway C, Leendertz F, Li B, et al. (2019). Games academics play and their consequences: How authorship, h-index and journal impact factors are shaping the future of academia. Proceedings of the Royal Society B: Biological Sciences.
Charlesworth TES, and Banaji MR (2019). Gender in Science, Technology, Engineering, and Mathematics: Issues, Causes, Solutions. J. Neurosci 39, 7228–7243.
Cheryan S, Master A, and Meltzoff AN (2015). Cultural stereotypes as gatekeepers: Increasing girls’ interest in computer science and engineering by diversifying stereotypes. Front. Psychol 6.
Cheryan S, Ziegler SA, Montoya AK, and Jiang L (2017). Why are some STEM fields more gender balanced than others? Psychol. Bull 143, 1–35.
Chopra D, and Zambelli E (2017). No Time to Rest: Women’s Lived Experiences of Balancing Paid Work and Unpaid Care Work. Institute of Development Studies.
Choudhury S, and Aggarwal NK (2020). Reporting Grantee Demographics for Diversity, Equity, and Inclusion in Neuroscience. J. Neurosci 40, 7780–7781.
Clancy KBH, Lee KMN, Rodgers EM, and Richey C (2017). Double jeopardy in astronomy and planetary science: Women of color face greater risks of gendered and racial harassment. Journal of Geophysical Research: Planets 122, 1610–1623.
Colgan J (2017). Gender Bias in International Relations Graduate Education? New Evidence from Syllabi. PS Polit. Sci. Polit 50, 456–460.
Cox AR, and Montgomerie R (2019). The cases for and against double-blind reviews. PeerJ 7, e6702.
Croft A, Schmader T, Block K, and Baron AS (2014). The Second Shift Reflected in the Second Generation: Do Parents’ Gender Roles at Home Predict Children’s Aspirations? Psychol. Sci 25, 1418–1428.
Croft A, Schmader T, and Block K (2015). An underexamined inequality: cultural and psychological barriers to men’s engagement with communal roles. Pers. Soc. Psychol. Rev 19, 343–370.
Croft A, Atkinson C, and May AM (2021). Promoting Gender Equality by Supporting Men’s Emotional Flexibility. Policy Insights from the Behavioral and Brain Sciences 8, 42–49.
De Paola M, and Scoppa V (2015). Gender discrimination and evaluators’ gender: Evidence from Italian academia. Economica 82, 162–188.
Devine PG, Forscher PS, Cox WTL, Kaatz A, Sheridan J, and Carnes M (2017). A Gender Bias Habit-Breaking Intervention Led to Increased Hiring of Female Faculty in STEMM Departments. J. Exp. Soc. Psychol 73, 211–215.
Dobbin F, and Kalev A (2013). The Origins and Effects of Corporate Diversity Programs. Oxford Handbooks Online.
Dobbin F, and Kalev A (2019). The promise and peril of sexual harassment programs. Proc. Natl. Acad. Sci. U. S. A 116, 12255–12260.
Dobbin F, and Kalev A (2020). Why Sexual Harassment Programs Backfire And what to do about it. Harv. Bus. Rev 98, 45–52.
Dobbin F, Kim S, and Kalev A (2011). You Can’t Always Get What You Need: Organizational Determinants of Diversity Programs. Am. Sociol. Rev 76, 386–411.
Drydakis N, Sidiropoulou K, Bozani V, Selmanovic S, and Patnaik S (2018). Masculine vs feminine personality traits and women’s employment outcomes in Britain: A field experiment. Int. J. Manpow 39, 621–630.
Duch J, Zeng XHT, Sales-Pardo M, Radicchi F, Otis S, Woodruff TK, and Nunes Amaral LA (2012). The possible role of resource requirements and academic career-choice risk on gender differences in publication rate and impact. PLoS One 7, e51332.
Dutt K, Pfaff DL, Bernstein AF, Dillard JS, and Block CJ (2016). Gender differences in recommendation letters for postdoctoral fellowships in geoscience. Nat. Geosci 9, 805–808.
Dweck C (2016). What Having a “Growth Mindset” Actually Means. Harvard Business Review, 13, 213–226.
Dworkin J, Zurn P, and Bassett DS (2020a). (In)citing Action to Realize an Equitable Future. Neuron 106, 890–894.
Dworkin JD, Linn KA, Teich EG, Zurn P, Shinohara RT, and Bassett DS (2020b). The extent and drivers of gender imbalance in neuroscience reference lists. Nat. Neurosci 23, 918–926.
Eagly AH (2016). When passionate advocates meet research on diversity, does the honest broker stand a chance? J. Soc. Issues 72, 199–222.
Eagly AH (2018). The shaping of science by ideology: How feminism inspired, led, and constrained scientific understanding of sex and gender. J. Soc. Issues 74, 871–888.
Eagly AH, and Chaiken S (1998). Attitude structure and function. In The Handbook of Social Psychology, Vols. 1–2, Gilbert DT, ed. (New York, NY, US: McGraw-Hill).
Eagly AH, and Karau SJ (2002). Role congruity theory of prejudice toward female leaders. Psychol. Rev 109, 573–598.
Eckerson E, Talbourdet L, Reichlin L, Sykes M, Noll E, and Gault B (2016). Child Care for Parents in College: A State-by-State Assessment. Institute for Women's Policy Research.
Ellemers N (2018). Gender Stereotypes. Annu. Rev. Psychol 69, 275–298.
Ellis J, Fosdick BK, and Rasmussen C (2016). Women 1.5 Times More Likely to Leave STEM Pipeline after Calculus Compared to Men: Lack of Mathematical Confidence a Potential Culprit. PLoS One 11, e0157447.
Else-Quest NM, Hyde JS, and Linn MC (2010). Cross-national patterns of gender differences in mathematics: a meta-analysis. Psychol. Bull 136, 103–127.
Fairhall AL, and Marder E (2020). Acknowledging female voices. Nat. Neurosci 23, 904–905.
Fan Y, Shepherd LJ, Slavich E, Waters D, Stone M, Abel R, and Johnston EL (2019). Gender and cultural bias in student evaluations: Why representation matters. PLoS One 14, e0209749.
Favaro B, Oester S, Cigliano JA, Cornick LA, Hind EJ, Parsons ECM, and Woodbury TJ (2016). Your Science Conference Should Have a Code of Conduct. Frontiers in Marine Science 3.
Fernandes JD, Sarabipour S, Smith CT, Niemi NM, Jadavji NM, Kozik AJ, Holehouse AS, Pejaver V, Symmons O, Bisson Filho AW, et al. (2020). A survey-based analysis of the academic job market. Elife 9, 54097.
Fine E, Sheridan J, Carnes M, Handelsman J, Pribbenow C, Savoy J, and Wendt A (2014). Minimizing the influence of gender bias on the faculty search process. In Gender Transformation in the Academy (Emerald Group Publishing Limited), pp. 267–289.
Fiset J, and Saffie-Robertson MC (2020). The impact of gender and perceived academic supervisory support on new faculty negotiation success. High. Educ. Q 74, 240–256.
Fiske ST (1998). Stereotyping, prejudice, and discrimination. In The Handbook of Social Psychology, Vols. 1–2, Gilbert DT, ed. (New York, NY, US: McGraw-Hill).
Flores GM (2011). Latino/as in the hard sciences: Increasing Latina/o participation in science, technology, engineering and math (STEM) related fields. Latino Studies 9, 327–335.
Greenwald AG, and Banaji MR (1995). Implicit social cognition: attitudes, self-esteem, and stereotypes. Psychol. Rev 102, 4–27.
Greguletz E, Diehl M-R, and Kreutzer K (2019). Why women build less effective networks than men: The role of structural exclusion and personal hesitation. Hum. Relat 72, 1234–1261.
Greider CW, Sheltzer JM, Cantalupo NC, Copeland WB, Dasgupta N, Hopkins N, Jansen JM, Joshua-Tor L, McDowell GS, Metcalf JL, et al. (2019). Increasing gender diversity in the STEM research workforce. Science 366, 692–695.
Gruber J, Mendle J, Lindquist KA, Schmader T, Clark LA, Bliss-Moreau E, Akinola M, Atlas L, Barch DM, Barrett LF, et al. (2020). The Future of Women in Psychological Science. Perspect. Psychol. Sci 1745691620952789.
Guarino CM, and Borden VMH (2017). Faculty Service Loads and Gender: Are Women Taking Care of the Academic Family? Res. High. Educ 58, 672–694.
Gunderson EA, Ramirez G, Levine SC, and Beilock SL (2012). The role of parents and teachers in the development of gender-related math attitudes. Sex Roles 66, 153–166.
Gupta N, Kemelgor C, Fuchs S, and Etzkowitz H (2005). Triple burden on women in science: A cross-cultural analysis. Curr. Sci 89, 1382–1386.
Hanson SL, Sykes M, and Pena LB (2017). Gender Equity in Science: The Global Context. International Journal of Social Science Studies 6, 33–47.
Helmer M, Schottdorf M, Neef A, and Battaglia D (2017). Gender bias in scholarly peer review. Elife 6, 21718.
Hofstra B, Kulkarni VV, Munoz-Najar Galvez S, He B, Jurafsky D, and McFarland DA (2020). The Diversity-Innovation Paradox in Science. Proc. Natl. Acad. Sci. U. S. A 117, 9284–9291.
Holman L, Stuart-Fox D, and Hauser CE (2018). The gender gap in science: How long until women are equally represented? PLoS Biol. 16, e2004956.
Hong L, and Page SE (2004). Groups of diverse problem solvers can outperform groups of high-ability problem solvers. Proceedings of the National Academy of Sciences 101, 16385–16389.
Hope J, Lemanski C, Bastia T, Moeller NI, and Williams G (2019). Childcare and Academia - an intervention. International Development Planning Review.
Huang J, Gates AJ, Sinatra R, and Barabási A-L (2020). Historical comparison of gender inequality in scientific careers across countries and disciplines. Proc. Natl. Acad. Sci. U. S. A 117, 4609–4616.
Hunt J, Garant J-P, Herman H, and Munroe DJ (2013). Why are women underrepresented amongst patentees? Res. Policy 42, 831–843.
Ingram P, and Simons T (1995). Institutional and Resource Dependence Determinants of Responsiveness to Work-Family Issues. Acad. Manage. J 38, 1466–1482.
James A, Chisnall R, and Plank MJ (2019). Gender and societies: a grassroots approach to women in science. R Soc Open Sci 6, 190633.
Jensen K, Kovács B, and Sorenson O (2018). Gender differences in obtaining and maintaining patent rights. Nat. Biotechnol 36, 307–309.
Johnson BW, and Smith DG (2018). How Men Can Become Better Allies to Women. Harv. Bus. Rev.
Jones K, and Wilcher B (2019). Reducing Maternal Labor Market Detachment: A Role for Paid Family Leave (American University, Department of Economics).
Kalev A, and Dobbin F (2020). Does Diversity Training Increase Corporate Diversity? Regulation Backlash and Regulatory Accountability.
Kalev A, Dobbin F, and Kelly E (2006). Best Practices or Best Guesses? Assessing the Efficacy of Corporate Affirmative Action and Diversity Policies. Am. Sociol. Rev 71, 589–617.
Katz J, and Moore J (2013). Bystander education training for campus sexual assault prevention: an initial meta-analysis. Violence Vict. 28, 1054–1067.
Kelly CD, and Jennions MD (2006). The h index and career assessment by numbers. Trends Ecol. Evol 21, 167–170.
Kersey AJ, Csumitta KD, and Cantlon JF (2019). Gender similarities in the brain during mathematics development. NPJ Sci Learn 4, 19.
Khazan E, Borden J, Johnson S, and Greenhaw L (2019). Examining gender bias in student evaluation of teaching for graduate teaching assistants. NACTA Journal 2020.
King EB, Huffman AH, and Peddie CI (2013). LGBT Parents and the Workplace. In LGBT-Parent Families: Innovations in Research and Implications for Practice, Goldberg AE, and Allen KR, eds. (Springer), pp. 225–237.
King MM, Bergstrom CT, Correll SJ, Jacquet J, and West JD (2017). Men set their own cites high: Gender and self-citation across fields and over time. Socius 3, 237802311773890.
King TL, Shields M, Byars S, Kavanagh AM, Craig L, and Milner A (2020). Breadwinners and Losers: Does the Mental Health of Mothers, Fathers, and Children Vary by Household Employment Arrangements? Evidence From 7 Waves of Data From the Longitudinal Study of Australian Children. Am. J. Epidemiol 189, 1512–1520.
Knobloch-Westerwick S, Glynn CJ, and Huge M (2013). The Matilda Effect in Science Communication: An Experiment on Gender Bias in Publication Quality Perceptions and Collaboration Interest. Sci. Commun 35, 603–625.
Krawczyk M, and Smyk M (2016). Author’s gender affects rating of academic articles: Evidence from an incentivized, deception-free laboratory experiment. Eur. Econ. Rev 90, 326–335.
Kray LJ, and Kennedy JA (2017). Changing the Narrative: Women as Negotiators—and Leaders. Calif. Manage. Rev 60, 70–87.
Kray LJ, Thompson L, and Galinsky A (2001). Battle of the sexes: gender stereotype confirmation and reactance in negotiations. J. Pers. Soc. Psychol 80, 942–958.
Kray LJ, Kennedy JA, and Van Zant AB (2014). Not competent enough to know the difference? Gender stereotypes about women’s ease of being misled predict negotiator deception. Organ. Behav. Hum. Decis. Process 125, 61–72.
Kray LJ, Howland L, Russell AG, and Jackman LM (2017). The effects of implicit gender role theories on gender system justification: Fixed beliefs strengthen masculinity to preserve the status quo. J. Pers. Soc. Psychol 112, 98–115.
Langin K (2018). Are conferences providing enough child care support? We decided to find out. Science | AAAS.
Lee CJ, Sugimoto CR, Zhang G, and Cronin B (2013). Bias in peer review. J. Am. Soc. Inf. Sci. Technol 64, 2–17.
Lerchenmueller MJ, and Sorenson O (2018). The gender gap in early career transitions in the life sciences. Res. Policy 47, 1007–1017.
Lerchenmueller MJ, Sorenson O, and Jena AB (2019). Gender differences in how scientists present the importance of their research: observational study. BMJ 367, l6573.
Lim S, and Cortina L (2005). Interpersonal Mistreatment in the Workplace: The Interface and Impact of General Incivility and Sexual Harassment. J. Appl. Psychol 90, 483–496.
Lincoln AE, Pincus S, Koster JB, and Leboy PS (2012). The matilda effect in science: awards and prizes in the US, 1990s and 2000s. Soc. Stud. Sci 42, 307–320.
Luc JGY, Archer MA, Arora RC, Bender EM, Blitz A, Cooke DT, Hlci TN, Kidane B, Ouzounian M, Varghese TK, Jr, et al. (2021). Does Tweeting Improve Citations? One-Year Results From the TSSMN Prospective Randomized Trial. Ann. Thorac. Surg 111, 296–300.
Lunnemann P, Jensen MH, and Jauffred L (2019). Gender bias in Nobel prizes. Palgrave Commun. 5, 1–4.
Ma Y, Oliveira DFM, Woodruff TK, and Uzzi B (2019). Women who win prizes get less money and prestige. Nature 565, 287–288.
MacNell L, Driscoll A, and Hunt AN (2015). What’s in a Name: Exposing Gender Bias in Student Ratings of Teaching. Innovative Higher Education 40, 291–303.
Madera JM, Hebl MR, and Martin RC (2009). Gender and letters of recommendation for academia: agentic and communal differences. J. Appl. Psychol 94, 1591–1599.
Makarova E, Aeschlimann B, and Herzog W (2019). The gender gap in STEM Fields: The impact of the gender stereotype of math and science on secondary students’ career aspirations. Frontiers in Education 4.
Marts S (2017). Open Secrets and Missing Stairs: Sexual and Gender-Based Harassment at Scientific Meetings.
Mazei J, Hüffmeier J, Freund PA, Stuhlmacher AF, Bilke L, and Hertel G (2015). A meta-analysis on gender differences in negotiation outcomes and their moderators. Psychol. Bull 141, 85–104.
McAllister D, Juillerat J, and Hunter J (2016). Funding: What stops women getting more grants? Nature 529, 466.
McCullough L (2019). Proportions of women in STEM leadership in the academy in the USA. Educ. Sci 10, 1.
Meeussen L, Van Laar C, and Van Grootel S (2020). How to foster male engagement in traditionally female communal roles and occupations: Insights from research on gender norms and precarious manhood. Soc. Issues Policy Rev 14, 297–328.
Mengel F, Sauermann J, and Zölitz U (2018). Gender Bias in Teaching Evaluations. J. Eur. Econ. Assoc 17, 535–566.
Miller DI, Eagly AH, and Linn MC (2015). Women’s representation in science predicts national gender-science stereotypes: Evidence from 66 nations. J. Educ. Psychol 107, 631–644.
Misra J, Lundquist JH, Holmes E, and Agiomavritis S (2011). The ivory ceiling of service work. Academe 97, 22–26.
Moher D, Naudet F, Cristea IA, Miedema F, Ioannidis JPA, and Goodman SN (2018). Assessing scientists for hiring, promotion, and tenure. PLoS Biol. 16, e2004089.
Morgan AC, Way SF, Hoefer MJD, Larremore DB, Galesic M, and Clauset A (2021). The unequal impact of parenthood in academia. Sci Adv 7.
Morgenroth T, Ryan MK, and Peters K (2015). The Motivational Theory of Role Modeling: How Role Models Influence Role Aspirants’ Goals. Rev. Gen. Psychol 19, 465–483.
Morrissey T (2017). Child care and parent labor force participation: a review of the research literature. Rev Econ Household. 15, 1–24.
Moss-Racusin CA, Dovidio JF, Brescoll VL, Graham MJ, and Handelsman J (2012). Science faculty’s subtle gender biases favor male students. Proc. Natl. Acad. Sci. U. S. A 109, 16474–16479.
Mulligan A, Hall L, and Raphael E (2013). Peer review in a changing world: An international study measuring the attitudes of researchers. J. Am. Soc. Inf. Sci. Technol 64, 132–161.
Murray D, Siler K, Larivière V, Chan WM, Collings AM, Raymond J, and Sugimoto CR (2019). Author-Reviewer Homophily in Peer Review. bioRxiv 400515.
Nair S (2014). Women of Color Faculty and the “Burden” of Diversity. International Feminist Journal of Politics 16, 497–500.
Niederle M (2017). A Gender Agenda: A Progress Report on Competitiveness. Am. Econ. Rev 107, 115–119.
Niederle M, and Vesterlund L (2011). Gender and Competition. Annu. Rev. Econom 3, 601–630.
Niederle M, Segal C, and Vesterlund L (2013). How Costly Is Diversity? Affirmative Action in Light of Gender Differences in Competitiveness. Manage. Sci 59, 1–16.
Nielsen MW (2015). Make academic job advertisements fair to all. Nature 525, 427.
Nielsen MW (2016). Limits to meritocracy? Gender in academic recruitment and promotion processes. Sci. Public Policy 43, 386–399.
Nielsen MW, Alegria S, Börjeson L, Etzkowitz H, Falk-Krzesinski HJ, Joshi A, Leahey E, Smith-Doerr L, Woolley AW, and Schiebinger L (2017). Opinion: Gender diversity leads to better science. Proceedings of the National Academy of Sciences 114, 1740–1742.
Nielsen MW, Bloch CW, and Schiebinger L (2018). Making gender diversity work for scientific discovery and innovation. Nature Human Behaviour 2, 726–734.
Oliveira DFM, Ma Y, Woodruff TK, and Uzzi B (2019). Comparison of National Institutes of Health Grant Amounts to First-Time Male and Female Principal Investigators. JAMA 321, 898–900.
Page S (2008). The Difference: How the Power of Diversity Creates Better Groups, Firms, Schools, and Societies (Princeton University Press).
Paluck EL, and Green DP (2009). Prejudice Reduction: What Works? A Review and Assessment of Research and Practice. Annual Review of Psychology 60, 339–367.
Paluck EL, Porat R, Clark CS, and Green DP (2021). Prejudice Reduction: Progress and Challenges. Annu. Rev. Psychol 72, 533–560.
Parker SK, and Griffin MA (2002). What is so bad about a little name-calling? Negative consequences of gender harassment for overperformance demands and distress. J. Occup. Health Psychol 7, 195–210.
Parsons ECM (2015). So you think you want to run an environmental conservation meeting? Advice on the slings and arrows of outrageous fortune that accompany academic conference planning. Journal of Environmental Studies and Sciences 5, 735–744.
Peterson DAM, Biederman LA, Andersen D, Ditonto TM, and Roe K (2019). Mitigating gender bias in student evaluations of teaching. PLoS One 14, e0216241.
Pohlhaus JR, Jiang H, Wagner RM, Schaffer WT, and Pinn VW (2011). Sex differences in application, success, and funding rates for NIH extramural programs. Acad. Med 86, 759–767.
Potter SJ, and Moynihan MM (2011). Bringing in the Bystander in-person prevention program to a U.S. military installation: results from a pilot study. Mil. Med 176, 870–875.
Potter SJ, Flanagan M, Seidman M, Hodges H, and Stapleton JG (2019). Developing and Piloting Videogames to Increase College and University Students’ Awareness and Efficacy of the Bystander Role in Incidents of Sexual Violence. Games Health J 8, 24–34.
Powell K (2019). Why scientist-mums in the United States need better parental-support policies. Nature 569, 149–151.
Preston AE (2004). Plugging the Leaks in the Scientific Workforce. Issues Sci. Technol 20, 69–74.
Régner I, Thinus-Blanc C, Netter A, Schmader T, and Huguet P (2019). Committees with implicit biases promote fewer women when they do not believe gender bias exists. Nat Hum Behav 3, 1171–1179.
Reinholz DL, and Shah N (2018). Equity analytics: A methodological approach for quantifying participation patterns in mathematics classroom discourse. J. Res. Math. Educ 49, 140–177.
Richey CR, Clancy KBH, Lee KM, and Rodgers E (2015). The CSWA Survey on Workplace Climate and an Uncomfortable Conversation about Harassment. I.
Rissler LJ, Hale KL, Joffe NR, and Caruso NM (2020). Gender Differences in Grant Submissions across Science and Engineering Fields at the NSF. Bioscience 70, 814–820.
Rivera LA (2017). When two bodies are (not) a problem: Gender and relationship status discrimination in academic hiring. Am. Sociol. Rev 82, 1111–1138.
Roberson L, Kulik CT, and Tan RY (2013). Effective Diversity Training. In The Oxford Handbook of Diversity and Work, Roberson QM, ed. (Oxford University Press).
Rodgers P (2017). Decisions, decisions. Elife 6, 32011.
Roehling MV, and Huang J (2018). Sexual harassment training effectiveness: An interdisciplinary review and call for research. J. Organ. Behav 39, 134–150.
Schiebinger LL, Gilmartin SK, and Henderson AD (2008). Dual-career academic couples: What universities need to know (Michelle R. Clayman Institute for Gender Research, Stanford University).
Schilt K, and Wiswall M (2008). Before and After: Gender Transitions, Human Capital, and Workplace Experiences. B. E. J. Econom. Anal. Policy 8.
Schmader T, Whitehead J, and Wysocki VH (2007). A Linguistic Comparison of Letters of Recommendation for Male and Female Chemistry and Biochemistry Job Applicants. Sex Roles 57, 509–514.
Schneider KT, Swan S, and Fitzgerald LF (1997). Job-related and psychological effects of sexual harassment in the workplace: Empirical evidence from two organizations. J. Appl. Psychol 82, 401–415.
Schroeder J, Dugdale HL, Radersma R, Hinsch M, Buehler DM, Saul J, Porter L, Liker A, De Cauwer I, Johnson PJ, et al. (2013). Fewer invited talks by women in evolutionary biology symposia. J. Evol. Biol 26, 2063–2069.
Schrouff J, Pischedda D, Genon S, Fryns G, Pinho AL, Vassena E, Liuzzi AG, and Ferreira FS (2019). Gender bias in (neuro)science: Facts, consequences, and solutions. Eur. J. Neurosci 50, 3094–3100.
Shapiro JR, and Williams AM (2012). The role of stereotype threats in undermining girls’ and women’s performance and interest in STEM fields. Sex Roles: A Journal of Research 66, 175–183.
Sheltzer JM, and Smith JC (2014). Elite male faculty in the life sciences employ fewer women. Proc. Natl. Acad. Sci. U. S. A 111, 10107–10112.
Small D, Gelfand M, Babcock L, and Gettman H (2007). Who Goes to the Bargaining Table? The Influence of Gender and Framing on the Initiation of Negotiation. J. Pers. Soc. Psychol 93, 600–613.
Smith DG, Turner CSV, Osei-Kofi N, and Richards S (2004). Interrupting the Usual: Successful Strategies for Hiring Diverse Faculty. J. Higher Educ 75, 133–160.
Smith JL, Handley IM, Zale AV, Rushing S, and Potvin MA (2015). Now Hiring! Empirically Testing a Three-Step Intervention to Increase Faculty Gender Diversity in STEM. Bioscience 65, 1084–1087.
Smyth FL, and Nosek BA (2015). On the gender-science stereotypes held by scientists: explicit accord with gender-ratios, implicit accord with scientific identity. Front. Psychol 6, 415.
Snodgrass R (2006). Single- versus double-blind reviewing: an analysis of the literature. SIGMOD Rec. 35, 8–21.
Spencer SJ, Steele CM, and Quinn DM (1999). Stereotype Threat and Women’s Math Performance. J. Exp. Soc. Psychol 35, 4–28.
Squazzoni F, Grimaldo F, and Marušić A (2017). Publishing: Journals could share peer-review data. Nature 546, 352.
Squazzoni F, Bravo G, Farjam M, Marusic A, Mehmani B, Willis M, Birukou A, Dondio P, and Grimaldo F (2021). Peer review and gender bias: A study on 145 scholarly journals. Sci Adv 7.
Stark P, and Freishtat R (2014). An evaluation of course evaluations. ScienceOpen Res.
Steinpreis RE, Anders KA, and Ritzke D (1999). The Impact of Gender on the Review of the Curricula Vitae of Job Applicants and Tenure Candidates: A National Empirical Study. Sex Roles 41, 509–528.
Stewart AJ, and Valian V (2018). An Inclusive Academy: Achieving Diversity and Excellence (MIT Press).
Sugimoto CR, Ni C, West JD, and Larivière V (2015). The academic advantage: gender disparities in patenting. PLoS One 10, e0128000.
Sumner JL (2018). The Gender Balance Assessment Tool (GBAT): A Web-Based Tool for Estimating Gender Balance in Syllabi and Bibliographies. PS Polit. Sci. Polit 51, 396–400.
Sweet DJ (2021). New at Cell Press: The Inclusion and Diversity Statement. Cell 184, 1–2.
Thompson S, and Parry P (2017). Coping with Gender Inequities: Critical Conversations of Women Faculty (Rowman and Littlefield).
Tomkins A, Zhang M, and Heavlin WD (2017). Reviewer bias in single- versus double-blind peer review. Proc. Natl. Acad. Sci. U. S. A 114, 12708–12713.
Tzovara A, Amarreh I, Borghesani V, Chakravarty MM, DuPre E, Grefkes C, Haugg A, Jollans L, Lee HW, Newman SD, et al. (2021). Embracing diversity and inclusivity in an academic setting: Insights from the Organization for Human Brain Mapping. Neuroimage 229, 117742.
Valantine HA, Grewal D, Ku MC, Moseley J, Shih M-C, Stevenson D, and Pizzo PA (2014). The gender gap in academic medicine: comparing results from a multifaceted intervention for Stanford faculty to peer and national cohorts. Acad. Med 89, 904–911.
Viglione G (2020). Are women publishing less during the pandemic? Here’s what the data say. Nature 581, 365–366.
Waisbren SE, Bowles H, Hasan T, Zou KH, Emans SJ, Goldberg C, Gould S, Levine D, Lieberman E, Loeken M, et al. (2008). Gender differences in research grant applications and funding outcomes for medical school faculty. J. Womens. Health 17, 207–214.
Weisshaar K (2017). Publish and perish? An assessment of gender gaps in promotion to tenure in academia. Soc. Forces 96, 529–560.
West JD, Jacquet J, King MM, Correll SJ, and Bergstrom CT (2013). The role of gender in scholarly authorship. PLoS One 8, e66212.
Whittington KB, and Smith-Doerr L (2008). Women Inventors in Context: Disparities in Patenting across Academia and Industry. Gend. Soc 22, 194–218.
Williams WM, and Ceci SJ (2015). National hiring experiments reveal 2:1 faculty preference for women on STEM tenure track. Proc. Natl. Acad. Sci. U. S. A 112, 5360–5365.
Witteman HO, Hendricks M, Straus S, and Tannenbaum C (2019). Are gender gaps due to evaluations of the applicant or the science? A natural experiment at a national funding agency. Lancet 393, 531–540.
Woolston C (2020). Male authors boost research impact through self-hyping studies. Nature 578, 328.
Zhou D, Cornblath EJ, Stiso J, Teich EG, Dworkin JD, Blevins AS, and Bassett DS (2020). Gender Diversity Statement and Code Notebook v1.0 (Zenodo).
Zhu JM, Pelullo AP, Hassan S, Siderowf L, Merchant RM, and Werner RM (2019). Gender Differences in Twitter Use and Influence Among Health Policy and Health Services Researchers. JAMA Intern. Med 179, 1726–1729.
Zinovyeva N, and Bagues MF (2011). Does gender matter for academic promotion? Evidence from a randomized natural experiment (IZA Discussion Papers).

Quality of evidence revealing subtle gender biases in science is in the eye of the beholder

Ian M. Handley, Elizabeth R. Brown, Corinne A. Moss-Racusin, Jessi L. Smith

Edited by Susan T. Fiske, Princeton University, Princeton, NJ, and approved September 16, 2015 (received for review May 31, 2015)

Author contributions: I.M.H., E.R.B., C.A.M.-R., and J.L.S. designed research; E.R.B. and J.L.S. performed research; I.M.H., E.R.B., and J.L.S. analyzed data; and I.M.H., E.R.B., C.A.M.-R., and J.L.S. wrote the paper.

Issue date 2015 Oct 27.

Freely available online through the PNAS open access option.

Significance

Ever-growing empirical evidence documents a gender bias against women and their research—and favoring men—in science, technology, engineering, and mathematics (STEM) fields. Our research examined how receptive the scientific and public communities are to experimental evidence demonstrating this gender bias, which may contribute to women’s underrepresentation within STEM. Results from our three experiments, using general-public and university faculty samples, demonstrated that men evaluate the quality of research unveiling this bias as less meritorious than do women. These findings may inform and fuel self-correction efforts within STEM to reduce gender bias, bolster objectivity and diversity in STEM workforces, and enhance discovery, education, and achievement.

Keywords: gender bias, science workforce, diversity, science education, sexism

Scientists are trained to evaluate and interpret evidence without bias or subjectivity. Thus, growing evidence revealing a gender bias against women (or favoring men) within science, technology, engineering, and mathematics (STEM) settings is provocative and raises questions about the extent to which gender bias may contribute to women’s underrepresentation within STEM fields. To the extent that research illustrating gender bias in STEM is viewed as convincing, the culture of science can begin to address the bias. However, are men and women equally receptive to this type of experimental evidence? This question was tested with three randomized, double-blind experiments: two involving samples from the general public (n = 205 and 303, respectively) and one involving a sample of university STEM and non-STEM faculty (n = 205). In all experiments, participants read an actual journal abstract reporting gender bias in a STEM context (or an altered abstract reporting no gender bias in experiment 3) and evaluated the overall quality of the research. Results across experiments showed that men evaluate the gender-bias research less favorably than women, and, of concern, this gender difference was especially prominent among STEM faculty (experiment 2). These results suggest a relative reluctance among men, especially faculty men within STEM, to accept evidence of gender biases in STEM. This finding is problematic because broadening the participation of underrepresented people in STEM, including women, necessarily requires a widespread willingness (particularly by those in the majority) to acknowledge that bias exists before transformation is possible.

Objectivity is a fundamental value in the practice of science and is required to optimally assess one’s own research findings, others’ findings, and the merits of others’ abilities and ideas (1). For example, when scientists evaluate data collected on a potentially controversial topic (such as climate change), they strive to set aside their own belief systems and instead focus solely on the strength of the data and the conclusions they warrant. Similarly, when scientists evaluate a resume for a laboratory-manager position or assess the importance of a conference submission, the gender of the applicant or author should be immaterial. If they are truly objective, scientists should focus only on the relevant criteria of applicant qualifications or research merit.

However, despite rigorous training in the objective evaluation of information and resultant values (2), people working and learning within the science, technology, engineering, and mathematics (STEM) community are still prone to the same subtle biases that subvert objectivity and distort the general public’s perceptions of scientific evidence (3, 4). We focus here on the robust gender biases documented repeatedly within the psychological literature (5–7). Some within the STEM community have turned to these methods and ideas to explain the consistent underrepresentation of women in STEM fields (8, 9) and the undervaluation of these women and their work. Specifically, many scientists have systematically documented and reported (including in PNAS) a gender bias against women (or favoring men) in STEM contexts (10–17), including hiring decisions for a laboratory-manager position (10), selection for a mathematical task (11), evaluations of conference abstracts (12), research citations (13), symposia-speaker invitations (14), postdoctoral employment (15), and tenure decisions (16). For example, Moss-Racusin et al. (10) conducted an experiment in which university science professors received the same application for a laboratory-manager position, randomly assigned either a male or a female name. The results demonstrated that the science professors, regardless of their own gender, evaluated the applicant more favorably when the application carried a man’s name than when it carried a woman’s name. These findings mirror past results in which men and women psychology faculty participants evaluated an application from a faculty candidate with a woman’s name less favorably than the identical application with a man’s name (17). As another example, Knobloch-Westerwick et al. (12) found that graduate students evaluate science-related conference abstracts more positively when attributed to a male rather than a female author, particularly in male-gender-typed science fields. These biases are frequently unintentional (18–20), exhibited even by individuals who greatly value fairness and view themselves as objective (21). Indeed, gender biases often result from unconscious processes (22, 23) or manifest so subtly that they escape notice (24).

However unintentional or subtle, systematic gender bias favoring male scientists and their work could significantly hinder scientific progress and communication (12). In fact, the evidence for a gender bias in STEM suggests that our scientific community is not living up to its potential, because homogeneous workforces (including the academic workplace) can deplete the creativity, discovery, and satisfaction of workers, faculty, and students (25–27). STEM fields are fairly homogeneously male; at 4-year US colleges, for example, an average of 71% of STEM faculty are men (28). For these reasons, there is a growing call for broadening the participation of women (and other underrepresented groups) in STEM fields. For instance, the National Science Foundation (NSF) promoted inclusiveness as a core value in its 2014–2018 strategic plan, continues to fund ADVANCE Institutional Transformation grants to broaden the participation of women faculty in STEM, and has created a directorate charged with broadening the participation of all underrepresented people within STEM. Similarly, the National Institutes of Health called for reducing subtle biases and broadening participation in STEM fields (29) and issued at least three large new requests for proposals to help accomplish this goal (30). Indeed, there are growing numbers of research studies, calls to action, strategic plans, and even resources to systematically document, understand, and hopefully ameliorate gender biases within STEM to create a thriving, diverse, and equitable scientific community (31–34). However, are people generally (e.g., taxpayers, voters, and government officials) and STEM practitioners in particular “buying” the mounting evidence of these gender biases within the STEM community? Currently, to our knowledge, there is no experimental research examining how receptive or biased various individuals within the STEM and public communities are to research demonstrating gender bias that undermines women’s participation within STEM. Thus, to address this question, our experimental research investigates potentially biased evaluations, among the general public and STEM practitioners, of evidence demonstrating gender biases against women (or favoring men) within STEM fields.

Of course, to ameliorate gender bias within STEM fields, it is not sufficient to simply herald findings demonstrating that STEM practitioners exhibit these biases. Indeed, there may well be another layer of bias such that men evaluate findings such as those reported by Moss-Racusin et al. ( 10 ) and Knobloch-Westerwick et al. ( 12 ) less favorably than women. In fact, a recent (nonexperimental) analysis of naturally occurring online comments written by readers of popular press articles covering the research of Moss-Racusin et al. ( 10 ) suggests that men were more likely than women to demonstrate negative reactions to experimental evidence of gender bias ( 35 ). Further, several lines of theorizing suggest that men may evaluate such research as less meritorious than would women ( 24 , 36 – 42 ). Among these theories, Social Identity Theory ( 36 – 38 ) and related perspectives ( 39 ) posit that people are motivated to perceive their group favorably and defend that perception against threat, and that people within privileged groups often seek to retain and justify their privileged status ( 39 ). Men clearly hold an advantageous position within the sciences, because they represent the vast majority of STEM university faculty at all ranks, earn higher salaries controlling for rank and related factors ( 43 ), and on average receive more federal grant funding to support their research than their comparable women colleagues ( 44 , 45 ). Indeed, growing evidence reveals an often invisible advantage for men, stemming in part from inequities against women in STEM, which can threaten that advantage ( 10 , 12 , 46 , 47 ). That is, men might find the results reported by Moss-Racusin et al. ( 10 ) threatening, because remedying the gender bias in STEM fields could translate into favoring women over men, especially if one takes a zero-sum-gain perspective ( 47 ). 
Therefore, relative to women, men may devalue such evidence in an unintentional, implicit effort ( 18 – 20 ) to retain their status as the majority group in STEM fields. However, some men might perceive research that exposes gender bias in STEM as more threatening than other men do. According to Social Identity Theory, individuals perceive greater threat toward their group (and defend against it) when they are highly committed to that group ( 37 , 38 ). Thus, men within STEM fields (e.g., physics professors) may feel more threatened by the research of Moss-Racusin et al. ( 10 ) than men within non-STEM fields (e.g., English professors), assuming they are more committed to STEM fields and men’s status therein. Overall, then, men relative to women are likely to devalue research demonstrating bias against women in STEM, but this difference may be most prominent among individuals within (and committed to) STEM fields, and weaker or nonexistent among individuals within non-STEM fields.

Beyond Social Identity Theory, other frameworks could predict a difference between men’s and women’s evaluations of research demonstrating bias against women in STEM, and this difference might in fact result from multiple factors. For instance, the predicted gender difference may also result from a confirmation bias, whereby people evaluate information that is consistent with their beliefs favorably but evaluate information that is inconsistent with their beliefs unfavorably ( 48 ). A classic empirical example of confirmation bias showed that peer reviewers were less favorable toward an essentially identical research manuscript when it was doctored to report results inconsistent with the reviewers’ preferred theoretical viewpoint, but more favorable when it was doctored to report results consistent with that viewpoint ( 49 ). Combined with compelling evidence that women faculty are more likely than men to view gender bias as a problem within their current academic workplace ( 40 ), this finding suggests that women may evaluate research demonstrating a gender bias (belief consistent) more favorably than men, but evaluate research demonstrating no gender bias (belief inconsistent) less favorably than men.

Current Research

We report three experiments designed to provide, to our knowledge, the first test for gender differences in the evaluation of scientific evidence demonstrating that individuals are biased against women within STEM contexts. In each experiment, men and women participants used an online survey instrument to read an actual article abstract from a peer-reviewed scientific journal, accompanied by the date and title of the publication (see Materials and Methods for more details). Participants then rated their agreement with the authors’ interpretation of the results, the importance of the research, how well written the abstract was, and how favorable their overall evaluation of the abstract was. These ratings were highly associated with one another and were averaged to create a measure of participants’ overall evaluation of the abstract (for further details, see SI Materials and Methods, Dependent Variables). Globally, we predicted that male relative to female participants would evaluate the abstract less favorably when it reported a gender bias against women in STEM (hypothesis A; experiments 1–3), and that this difference would be more prominent among participants in STEM (vs. non-STEM) fields, to whom a gender bias in STEM is most germane (hypothesis B; experiment 2). Further, we predicted that this gender difference would manifest for abstracts that reported a gender bias in STEM, but would reverse for abstracts that reported no gender bias in STEM (hypothesis C; experiment 3).

All experiments included two or more factors (some for exploratory purposes in experiments 1 and 2; see SI Materials and Methods for more details), and thus we tested all hypotheses using between-groups factorial analyses of variance. Further, we calculated Cohen’s d for each experiment to index the strength of the predicted difference between men and women participants in a way that accounts for the unequal sample sizes of the two genders. By convention ( 50 ), d values of 0.20, 0.50, and 0.80 indicate small, medium, and large effects, respectively.
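
The text does not spell out which formula was used for Cohen’s d. A common choice for two independent groups divides the mean difference by a pooled standard deviation in which each group’s variance is weighted by its degrees of freedom. The Python sketch below assumes that formula; with the summary statistics reported in the results, it reproduces the d values reported for experiment 1 (0.45) and for both abstract versions in experiment 3 (0.20 and 0.27). It is an illustration, not the authors’ exact computation.

```python
import math


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference (m2 - m1) / pooled SD.

    Assumption: the pooled SD weights each group's variance by its degrees
    of freedom (n - 1); the source does not state the exact formula used.
    """
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m2 - m1) / math.sqrt(pooled_var)


# Experiment 1 (men vs. women), using the means, SDs, and ns reported in the results.
d_exp1 = cohens_d(4.25, 0.91, 146, 4.66, 0.93, 59)

# Experiment 3: original abstract (men vs. women), modified abstract (women vs. men).
d_exp3_original = cohens_d(3.65, 1.03, 78, 3.86, 1.05, 74)
d_exp3_modified = cohens_d(3.59, 0.86, 67, 3.83, 0.92, 84)

for label, d in [("Exp. 1", d_exp1),
                 ("Exp. 3 original", d_exp3_original),
                 ("Exp. 3 modified", d_exp3_modified)]:
    print(f"{label}: d = {d:.2f}")
```

The argument order only sets the sign; the text reports magnitudes.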

The first two experiments tested for participant-gender differences in the evaluation of the actual abstract written by Moss-Racusin et al. ( 10 ). As discussed above, Moss-Racusin et al. ( 10 ) produced experimental evidence that STEM faculty of both genders demonstrate a significant bias against an identical applicant with a female vs. male name. Although this gender bias was empirically demonstrated with a national sample, we predicted that men would be less receptive to these (and related) findings, and women more receptive. Our first experiment involved a general sample of US adults ( n = 205) recruited online through Amazon’s Mechanical Turk. Our second experiment involved a sample of professors ( n = 205) from all STEM and non-STEM departments at a research-intensive university, allowing us to test whether the predicted gender difference in abstract evaluations is larger among individuals within STEM fields of study. A third experiment replicated the first two with a different abstract and is discussed in more detail below.

SI Materials and Methods

Participants and Recruitment for Experiments 1 and 3.

Participation was elicited from workers on Amazon’s Mechanical Turk online job site, who could view our employment opportunity (titled “What do REAL people think about science research results?”) listed alongside other opportunities.

A total of 205 individuals participated in experiment 1, which was active in March 2014, and provided usable data. Originally, 218 individuals participated in the experiment, but 9 were excluded from data analysis because they failed one or more attention-check items (e.g., “If you are reading, respond ‘very much’ to this question;” “If you are reading, respond ‘not at all’ to this question”), 2 because they reported being under 18 y of age, and 2 because they did not specify their gender. Ultimately, 146 men and 59 women from the United States who were 18 y of age or older ( M = 30.13; range = 18–66) were retained for analysis. Of this general sample, 68.12% reported their race as “white,” and 51 individuals reported that they were currently college students.

A total of 303 individuals participated in experiment 3, which was active in November 2014, and provided usable data. Originally, 321 individuals participated in the experiment, but 12 were excluded from data analysis because they failed one or more attention-check items, 2 because they reported being under 18 y of age or did not specify an age, 1 because they did not specify their gender, and 7 because they reported having read the abstract before (some participants met multiple exclusion criteria). Ultimately, 162 men and 141 women from the United States who were 18 y of age or older ( M = 34.22; range = 18–79) were retained for analysis. Of this general sample, 73.93% reported their race as “white,” and 55 individuals reported that they were currently college students.
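
Because a participant can meet more than one exclusion criterion, the per-criterion counts for experiment 3 (12 + 2 + 1 + 7 = 22) exceed the 18 participants actually removed (321 − 303). The sketch below, which uses hypothetical record fields and toy data rather than the actual survey export, shows one way to tally exclusions so that both numbers are available.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    # Hypothetical fields; the actual survey export format is not described.
    pid: str
    passed_attention_checks: bool
    age: Optional[int]
    gender: Optional[str]
    read_abstract_before: bool = False


CRITERIA = {
    "failed attention check": lambda p: not p.passed_attention_checks,
    "under 18 or age missing": lambda p: p.age is None or p.age < 18,
    "gender missing": lambda p: p.gender is None,
    "read abstract before": lambda p: p.read_abstract_before,
}


def apply_exclusions(participants):
    """Return (retained, per-criterion counts, number of unique exclusions)."""
    counts = {name: 0 for name in CRITERIA}
    retained, excluded = [], 0
    for p in participants:
        hits = [name for name, rule in CRITERIA.items() if rule(p)]
        for name in hits:
            counts[name] += 1  # counted under every criterion the participant meets
        if hits:
            excluded += 1      # but removed only once
        else:
            retained.append(p)
    return retained, counts, excluded


# Toy data: participants "b" and "c" each meet two criteria.
sample = [
    Participant("a", True, 25, "woman"),
    Participant("b", False, 16, "man"),
    Participant("c", False, 40, None),
    Participant("d", True, 33, "man", read_abstract_before=True),
]
retained, counts, excluded = apply_exclusions(sample)
print(counts, "excluded:", excluded, "retained:", [p.pid for p in retained])
```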

Participants and Recruitment for Experiment 2.

Participation was initially elicited on November 4, 2013, from all 506 tenure-track faculty at a research-intensive university via an email from the university provost encouraging participation in a larger faculty climate survey. That same day, our research team emailed all tenure-track faculty a message that explained the nature and importance of the survey, contained an informed consent form for faculty to read, explained the compensation faculty would receive for their participation, and contained a link to the survey and experiment, which was hosted on surveymonkey.com. This email included a unique identification code for each person, which preserved respondents’ anonymity and confidentiality but allowed us to trace each respondent’s home department. In this way, we could determine whether faculty resided in STEM departments, which here included the social and behavioral sciences (i.e., Agricultural Economics and Economics, Animal and Range Sciences, Cell Biology and Neuroscience, Center for Biofilm Engineering, Chemical and Biological Engineering, Chemistry, Civil Engineering, Computer Science, Earth Science, Ecology, Electrical Engineering, Industrial and Management Engineering, Institute on Ecosystems, Immunology and Infectious Diseases, Land Resources and Environmental Science, Mathematical Sciences, Mechanical and Industrial Engineering, Microbiology, Native American Studies, Physics, Plant Sciences, Political Science, Psychology, and Sociology and Anthropology), or in non-STEM departments (i.e., Agricultural Education, Art, Nursing, Education, English, Film and Photography, Health and Human Development, History and Philosophy, Honor’s Program, Liberal Studies, Modern Languages and Literature, Science Education, Music, and University Studies). All faculty who had not participated as of November 18 received a reminder email, which also contained a link to the survey and experiment and their unique identification code. The survey was closed on the evening of November 22.

Ultimately, 286 faculty participated in the unrelated survey, and 205 (40.5% of faculty) further elected to participate in our experiment at the end of the survey. Of these, 111 (54%) were men and 94 were women. Further, as specified above, 116 faculty were categorized as residing in STEM departments and 89 as residing in non-STEM departments. A comparable proportion of faculty from STEM (116/289 or 40.1%) and non-STEM (89/217 or 41.0%) departments completed the experiment. Participants indicated their race as white/Caucasian (86.3%), Asian (2%), Hispanic/Latino (1%), Native American (0.5%), or mixed (0.5%), or they opted not to report these data (9.8%). Further, participants reported their faculty rank as assistant (43.9%), associate (27.8%), or full (26.3%), or they did not specify (2%). Participants’ ages ranged from 27 to 73 y ( M = 47.35), and they had worked in their current position between 0 and 35 y ( M = 10.51). The demographics of our sample closely match the population of professors at this university (which is 64% male and 90.9% white/Caucasian), although assistant professors were somewhat overrepresented in our sample relative to the university population (assistant, associate, and full ranks comprise 29.3%, 32.1%, and 38.6% of professors, respectively). Apart from rank, then, we can reasonably infer that no systematic biases influenced individuals’ decisions to participate in the experiment; that is, the results from this sample likely generalize to the population of faculty under investigation.
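
The participation rates above follow directly from the department counts. As a quick arithmetic check (a sketch only; the denominators 289 and 217 are those given in the text and sum to the 506 faculty invited):

```python
# Counts reported in the text for experiment 2.
population = {"STEM": 289, "non-STEM": 217}
completed = {"STEM": 116, "non-STEM": 89}

total_pop = sum(population.values())    # tenure-track faculty invited
total_done = sum(completed.values())    # faculty who completed the experiment
overall_rate = 100 * total_done / total_pop

rates = {group: 100 * completed[group] / population[group] for group in population}

print(f"Overall: {total_done}/{total_pop} = {overall_rate:.1f}%")
for group, rate in rates.items():
    print(f"{group}: {completed[group]}/{population[group]} = {rate:.1f}%")
```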

Procedure for Experiment 1.

For experiment 1, once participants clicked on the title for our experiment on Amazon’s Mechanical Turk, they encountered the following short paragraph: “In the scientific world, peer experts judge the quality of research and decide whether or not to publish it, fund it, or discard it. But what do everyday people think about these articles that get published? We are conducting an academic survey about people's opinions about different types of research that was published back in the last few years. You will be asked to read a very brief research summary and then answer a few questions about your judgments as non-experts about this research. There is no right or wrong answer and we realize you don’t have all the information or background. But just like in the scientific world, many judgments are made on whether something is quality science or not after just reading a short abstract summary. So to create that experience for you, we ask that you just provide your overall reaction as best you can even with the limited information. You will also be asked to provide demographic information about yourself. Select the link below to complete the survey.” Participants were also reminded that they would receive $0.25 in exchange for submitting the job (a Mechanical Turk “HIT”). Participants then accepted the HIT and opened the survey in a separate tab or window. After consenting to participate, participants were again shown the summary of the experiment that they had read before accepting the HIT and were then asked, “Please read the following abstract from a 2012 published research study then provide your opinion with the items below.” Next, participants viewed the abstract written by Moss-Racusin et al. ( 10 ), the first author’s name and affiliation, and keywords, as described in the main text, and then provided their opinions about the abstract using scale ratings ( SI Materials and Methods, Dependent Variables ).
Once they began the survey, participants learned that they could skip any question or task that they wished, ensuring that our procedures were not coercive. Participants then completed demographic information, were debriefed regarding the purpose of the experiment, and were compensated $0.25 for their time.

Procedure for Experiment 2.

For experiment 2, once participants followed the link to the survey website, they first read information about the faculty climate survey and the types of tasks and questions they would encounter. Participants were also reminded that they would receive a $5 coupon from a local coffee shop for completing the survey and would be entered into a raffle to win 1 of 50 gift certificates from the campus bookstore (worth $50). Once they advanced to the survey, participants further learned that they could skip any question or task they wished. This option resulted in several participants providing only partial data for the experiment (addressed in SI Additional Analyses, Experiment 2 ). The faculty climate survey took ∼15 min to complete and primarily contained questions about the university work environment, which were independent of the reported experiment.

Just after the survey, participants were asked to “Please read the following abstract from a 2012 published research study then provide your opinion with the items below.” They then viewed the same abstract and associated information as in experiment 1 and evaluated that abstract using the same scale ratings. Finally, participants entered their unique code and could print off a coupon in compensation for their participation.

Procedure for Experiment 3.

The procedures for experiment 3 were identical to experiment 1, with a few minor, but important, differences. First, participants were randomly assigned to read either the original version of the Knobloch-Westerwick et al. ( 12 ) abstract, which reported a gender bias (e.g., “Publications from male authors were associated with greater scientific quality, in particular if the topic was male-typed”) or a version slightly altered to report no gender differences (e.g., “Publications from male and female authors were associated with comparable scientific quality, even if the topic was male-typed”). Second, unlike in experiments 1 and 2, the abstract was not accompanied by the author’s name or affiliation. Otherwise, the procedures and dependent measures for this experiment were identical to those used in the previous experiments. At the end of the experiment, participants completed demographic information, were debriefed regarding the purpose of the experiment, and were compensated $0.25 for their time.

Dependent Variables.

After reading the abstract, participants in all experiments reported their evaluation of the abstract and research using measures adapted from those commonly used to gauge attitude change and evaluations of persuasive materials ( 59 , 60 ). Specifically, on scales from 1 ( not at all ) to 6 ( very much ), participants responded to the following four questions or statements: “To what extent do you agree with the interpretation of the research results?” “To what extent are the findings of this research important?” “To what extent was the abstract well written?” and “Overall, my evaluation of this abstract is favorable.” These four responses demonstrated high internal consistency in all experiments (Cronbach’s α = 0.84, 0.89, and 0.78 in experiments 1, 2, and 3, respectively) and were therefore averaged to measure participants’ perceived quality of the research.
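
Cronbach’s α compares the sum of the item variances with the variance of the summed scale: α = k/(k − 1) · (1 − Σ var(item) / var(total)) for k items. The Python sketch below implements that standard formula and the simple averaging used to form the composite; the ratings are invented for illustration and are not data from these experiments.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete-case rows of k item scores each."""
    k = len(rows[0])
    columns = list(zip(*rows))  # one tuple of scores per item
    item_var_sum = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in rows])
    # Sample variances (n - 1); population variances would give the same ratio.
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


def composite(row):
    """Average of the four 1-6 ratings, used as the perceived-quality score."""
    return mean(row)


# Invented ratings (agree, important, well written, favorable) for five people.
rows = [
    [4, 5, 4, 4],
    [2, 3, 3, 2],
    [6, 5, 6, 6],
    [3, 3, 2, 3],
    [5, 4, 5, 5],
]
print(f"alpha = {cronbach_alpha(rows):.2f}")
print([composite(r) for r in rows])
```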

For experiment 2 only, participants completed a faculty climate survey before the experiment, which included items assessing the extent to which faculty felt that they had been personally discriminated against due to their gender. Specifically, on scales from 1 ( strongly disagree ) to 7 ( strongly agree ), participants responded to the following three statements: “I have personally been a victim of gender discrimination,” “I consider myself a person who has been deprived of opportunities because of my gender,” and “Prejudice against my gender group has not affected me personally” (the latter of which was reverse-scored). These three responses demonstrated high internal consistency (Cronbach’s α = 0.87) and were therefore averaged to measure participants’ personal experience of gender discrimination.
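
On a 1 to 7 scale, a reverse-scored response is 8 minus the original response, so strongly agreeing that prejudice “has not affected me personally” counts as little experienced discrimination. A minimal sketch, assuming the three items are averaged with equal weight as described (function names are illustrative):

```python
SCALE_MIN, SCALE_MAX = 1, 7


def reverse_score(x):
    """Reverse a response on the 1-7 agreement scale (1 <-> 7, 2 <-> 6, ...)."""
    return SCALE_MIN + SCALE_MAX - x


def discrimination_score(victim, deprived, not_affected):
    """Mean of the three items, with "has not affected me personally" reversed."""
    return (victim + deprived + reverse_score(not_affected)) / 3


# Hypothetical respondent: agrees with the first two statements, disagrees with the third.
print(discrimination_score(6, 5, 2))  # (6 + 5 + 6) / 3
```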

Experiments 1 and 2.

Results from our experiment 1 supported hypothesis A, revealing a main effect of participant gender [ F (1, 197) = 9.85, P = 0.002, partial η² = 0.048], such that men ( M = 4.25, SD = 0.91, n = 146) evaluated the research less favorably than women ( M = 4.66, SD = 0.93, n = 59) in a general sample. Further, this effect was of moderate size ( d = 0.45).
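
Partial η² is the effect sum of squares divided by the sum of the effect and error sums of squares, which can be rewritten in terms of the F statistic as η²p = F·df_effect / (F·df_effect + df_error). The sketch below applies that identity to the F values and degrees of freedom reported for the three experiments; the recovered values match the reported ones after rounding, which serves as a consistency check rather than a reanalysis.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


# (F, df_effect, df_error, reported partial eta squared) from the results text.
reported = {
    "Exp. 1, participant gender": (9.85, 1, 197, 0.048),
    "Exp. 2, participant gender": (6.08, 1, 174, 0.034),
    "Exp. 2, gender x field": (5.19, 1, 174, 0.03),
    "Exp. 3, gender x abstract version": (4.00, 1, 299, 0.013),
}
for label, (f, df1, df2, eta_reported) in reported.items():
    print(f"{label}: {partial_eta_squared(f, df1, df2):.3f} (reported {eta_reported})")
```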

Results from our experiment 2 also supported hypothesis A, revealing a main effect of participant gender [ F (1, 174) = 6.08, P = 0.015, partial η² = 0.034], such that male faculty evaluated the research less favorably ( M = 4.21, SD = 1.05) than female faculty ( M = 4.65, SD = 1.19; d = 0.397 [similar to experiment 1]). Thus, overall, experiments 1 and 2 provide converging evidence from multiple participant populations that men are less receptive than women (and, by the same token, that women are more receptive than men) to experimental evidence of gender bias in STEM. Importantly, results from experiment 2 further reveal that this effect was qualified by a significant interaction between participant gender and field of study [ F (1, 174) = 5.19, P = 0.024, partial η² = 0.03]. This interaction supported hypothesis B, because simple-effect tests confirmed that male faculty evaluated the research less favorably ( M = 4.02, SD = 0.988, n = 66) than female faculty ( M = 4.80, SD = 1.14, n = 38) in STEM fields [ F (1, 174) = 11.94, P < 0.001], whereas male ( M = 4.55, SD = 1.09, n = 37) and female ( M = 4.54, SD = 1.23, n = 49) faculty reported comparable evaluations in non-STEM fields ( F < 1). Further, the effect size for the observed gender difference was large within STEM departments ( d = 0.74). Looking at this interaction another way, simple-effect tests demonstrated that men evaluated the research more negatively if they were in STEM than in non-STEM departments [ F (1, 174) = 4.19, P = 0.042], whereas the opposite trend was not statistically significant among female faculty [ F (1, 174) = 1.45, P = 0.23]. Thus, it seems that men in STEM displayed harsher judgments of Moss-Racusin et al.’s ( 10 ) research, not that women in STEM exhibited more positive evaluations of it. The analysis revealed one other significant interaction that did not involve faculty gender (for further details, see SI Additional Analyses, Experiment 2 ).
No other main effects or interactions reached significance (all other F < 2.07; P > 0.15). Finally, additional measures collected within a faculty survey ( SI Materials and Methods , Dependent Variables ) and analyses thereof provide suggestive evidence for a threat mechanism behind the effects (for the analyses and discussion, see SI Additional Analyses , Experiment 2 ).
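
The gender-by-field interaction can be summarized descriptively as a difference of differences: the men-minus-women gap among STEM faculty minus the same gap among non-STEM faculty. The sketch below computes this from the reported cell means only; it does not reproduce the F tests, which depend on within-cell error variance and on the other factors in the authors’ model.

```python
# Cell means reported for experiment 2 (composite rating on a 1-6 scale).
cell_means = {
    ("STEM", "men"): 4.02,
    ("STEM", "women"): 4.80,
    ("non-STEM", "men"): 4.55,
    ("non-STEM", "women"): 4.54,
}


def gender_gap(field):
    """Men's mean minus women's mean within a field (negative: men less favorable)."""
    return cell_means[(field, "men")] - cell_means[(field, "women")]


gap_stem = gender_gap("STEM")                   # about -0.78
gap_non_stem = gender_gap("non-STEM")           # about +0.01
interaction_contrast = gap_stem - gap_non_stem  # about -0.79

print(f"STEM gap: {gap_stem:+.2f}, non-STEM gap: {gap_non_stem:+.2f}, "
      f"difference of differences: {interaction_contrast:+.2f}")
```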

Experiment 3.

We predicted that, compared with women, men would be prone to more negative evaluations of research that demonstrates a gender bias against women (and favoring men) in STEM, not just of the specific research reported by Moss-Racusin et al. ( 10 ). Further, we predicted that, compared with men, women would be prone to more negative evaluations of research that demonstrates no gender bias against women in STEM. Thus, the gender effect seen in experiments 1 and 2 should replicate for a different abstract that also reports a gender bias, but reverse for an abstract that demonstrates no gender bias. Testing these predictions, we randomly assigned new participants, recruited online through Amazon’s Mechanical Turk ( n = 303), to read either the original abstract published by Knobloch-Westerwick et al. ( 12 ), which reported a gender bias against women’s (relative to men’s) scientific conference submissions, or a version slightly altered to report no gender bias. The only significant effect was an interaction between participant gender and abstract version [ F (1, 299) = 4.00, P = 0.046, partial η² = 0.013] (all other F < 1). Although no simple-effect tests were significant (all F < 2.69, P > 0.10), together these results support the overall pattern predicted by hypothesis C: men evaluated the original (gender-bias-exists) abstract less favorably ( M = 3.65, SD = 1.03, n = 78) than did women ( M = 3.86, SD = 1.05, n = 74; d = 0.20), whereas men evaluated the modified (no-gender-bias) abstract more favorably ( M = 3.83, SD = 0.92, n = 84) than did women ( M = 3.59, SD = 0.86, n = 67; d = 0.27).

SI Additional Analyses

Experiment 1.

For the primary measure, neither author gender nor author affiliation alone influenced evaluations, and neither did any two-way interactions among factors (all P > 0.3). However, the analysis revealed a nonpredicted, significant interaction among participant gender, author gender, and author affiliation [ F (1, 197) = 18.13; P < 0.001]. Consistent with the theme of this work, we describe this interaction in terms of gender differences at each combination of author gender and affiliation. When the abstract author was supposedly a man from Iowa State University, male participants rated the abstract as being of higher quality ( M = 4.57, SD = 0.787) than did women ( M = 4.26, SD = 0.893), whereas when the abstract author was supposedly a woman from Iowa State University, female participants rated the abstract as being of higher quality ( M = 5.03, SD = 0.713) than did men ( M = 3.89, SD = 1.13). Thus, when the author was supposedly affiliated with Iowa State University, participants seemed to favor their own gender: women gave higher ratings than men to a female author, and men gave higher ratings than women to a male author. However, when the abstract author was supposedly a man from Yale University, female participants instead rated the abstract as being of higher quality ( M = 5.02, SD = 0.874) than did men ( M = 4.13, SD = 0.897), whereas when the abstract author was supposedly a woman from Yale University, female participants’ ratings ( M = 4.38, SD = 1.031) were equivalent to those of men ( M = 4.38, SD = 0.697). Interestingly, when evaluating Yale research revealing gender bias, women appeared to show the strongest bias favoring male over female authors.

There are at least two important notes regarding this interaction between participant gender, author gender, and author affiliation. First, this interaction was not observed in the second experiment among university faculty. Thus, although this interaction is interesting, the result was neither predicted nor replicated and may be spurious, so we refrain from emphasizing it until it is replicated in future research. Second, if this interaction pattern does replicate, it may indicate that the lay public and the scientific community manifest bias toward research uncovering gender bias differently under different conditions. Within scientific communities, perhaps the gender bias against such research is unaffected by author gender or affiliation, whereas in the lay public it is more complex and context-dependent. Ultimately, it is important to understand failures in objectivity among the scientific community, as well as the public, regarding research demonstrating gender bias in STEM. After all, it is often nonscientists (the public, government officials, bureaucrats, nonprofit organizations, special-interest groups, etc.) who drive the funding opportunities so critical to scientific progress and discovery.

Experiment 2.

In addition to the predicted effects reported in the paper, the primary analysis also revealed a significant interaction among field of study, author gender, and author affiliation [ F (1, 174) = 8.07; P < 0.01]. The interaction pattern indicated that faculty in STEM evaluated the abstract written by a man more favorably if the author was from Yale (vs. Iowa State), but the abstract written by a woman more favorably if the author was from Iowa State (vs. Yale), whereas the opposite pattern manifested among non-STEM faculty.

Additionally, we conducted the analysis again, removing fields of study associated with the social and behavioral sciences (i.e., Agricultural Economics and Economics, Native American Studies, Political Science, Psychology, and Sociology and Anthropology) from the analysis entirely. Given that the classification of some of these fields as STEM might vary depending on who one consults, we wanted to confirm that the key results held comparing STEM to non-STEM fields, even excluding the social and behavioral sciences. Indeed, this analysis, too, revealed the predicted significant main effect of gender [ F (1, 156) = 8.30, P = 0.005] and the predicted significant interaction between gender and field of study [ F (1, 156) = 7.31, P = 0.008].
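
This robustness check amounts to reclassifying departments and dropping the social and behavioral sciences. A minimal sketch, assuming departments are keyed by the names listed in the recruitment description (the function name and its return values are illustrative, not from the source):

```python
# Department lists as given in the recruitment description for experiment 2.
SOCIAL_BEHAVIORAL = {
    "Agricultural Economics and Economics", "Native American Studies",
    "Political Science", "Psychology", "Sociology and Anthropology",
}

STEM_DEPARTMENTS = SOCIAL_BEHAVIORAL | {
    "Animal and Range Sciences", "Cell Biology and Neuroscience",
    "Center for Biofilm Engineering", "Chemical and Biological Engineering",
    "Chemistry", "Civil Engineering", "Computer Science", "Earth Science",
    "Ecology", "Electrical Engineering", "Industrial and Management Engineering",
    "Institute on Ecosystems", "Immunology and Infectious Diseases",
    "Land Resources and Environmental Science", "Mathematical Sciences",
    "Mechanical and Industrial Engineering", "Microbiology", "Physics",
    "Plant Sciences",
}

NON_STEM_DEPARTMENTS = {
    "Agricultural Education", "Art", "Nursing", "Education", "English",
    "Film and Photography", "Health and Human Development",
    "History and Philosophy", "Honor's Program", "Liberal Studies",
    "Modern Languages and Literature", "Science Education", "Music",
    "University Studies",
}


def classify(department, exclude_social_behavioral=False):
    """Return "STEM", "non-STEM", or None if the department is dropped or unknown."""
    if department in STEM_DEPARTMENTS:
        if exclude_social_behavioral and department in SOCIAL_BEHAVIORAL:
            return None  # removed from the robustness analysis
        return "STEM"
    if department in NON_STEM_DEPARTMENTS:
        return "non-STEM"
    return None


for dept in ("Physics", "Psychology", "English"):
    print(dept, classify(dept), classify(dept, exclude_social_behavioral=True))
```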

Further, given that assistant professors were somewhat overrepresented in our sample, we investigated whether our results held after accounting for faculty rank. For this analysis, we collapsed across author gender and affiliation (including all factors would have created several conditions containing only one participant’s response) and conducted an analysis with faculty gender, field of study, and faculty rank as factors (four participants did not report their rank and were therefore excluded from this analysis). Like the primary analysis, this analysis revealed a significant main effect of gender [ F (1, 174) = 6.04; P = 0.015] and a significant interaction between gender and field of study [ F (1, 174) = 5.27; P = 0.023]. Therefore, the original results hold when controlling for faculty rank. No other main effects or interactions reached significance (all other F < 2.43; P > 0.09).
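
Crossing many factors in a modest sample quickly produces cells with one participant or none, which is why author gender and affiliation were collapsed for this analysis. The sketch below counts participants per cell for a chosen set of factors; the field names and toy records are hypothetical and are not the study’s data.

```python
from collections import Counter
from itertools import product


def cell_sizes(participants, factors):
    """Count participants in every combination of levels of the given factors."""
    observed = Counter(tuple(p[f] for f in factors) for p in participants)
    levels = [sorted({p[f] for p in participants}) for f in factors]
    # Enumerate the full crossing so that empty cells appear with n = 0.
    return {cell: observed.get(cell, 0) for cell in product(*levels)}


def sparse_cells(participants, factors, min_n=2):
    """Cells with fewer than min_n participants."""
    return {cell: n for cell, n in cell_sizes(participants, factors).items() if n < min_n}


# Hypothetical toy records (not study data).
people = [
    {"gender": "man", "field": "STEM", "author_gender": "man"},
    {"gender": "man", "field": "STEM", "author_gender": "woman"},
    {"gender": "woman", "field": "STEM", "author_gender": "man"},
    {"gender": "woman", "field": "STEM", "author_gender": "man"},
    {"gender": "man", "field": "non-STEM", "author_gender": "man"},
    {"gender": "man", "field": "non-STEM", "author_gender": "woman"},
    {"gender": "woman", "field": "non-STEM", "author_gender": "woman"},
    {"gender": "woman", "field": "non-STEM", "author_gender": "woman"},
]
print(len(sparse_cells(people, ["gender", "field"])))
print(len(sparse_cells(people, ["gender", "field", "author_gender"])))
```

Adding a third factor to the same eight toy records leaves six of the eight cells with fewer than two participants.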

Of note, several participants in experiment 2 elected to skip some of our four measures. Of the full 205 participants, 190 completed all four measures, which were averaged for the primary analyses. Thus, we examined whether our predicted findings held when each measure was analyzed independently. Critically, the main effect of participant gender was significant for three of the four measures and marginal for the fourth. Relative to female faculty, male faculty agreed less with the interpretations of the research [ n = 199, F (1, 183) = 6.66, P = 0.011], evaluated the research findings as less important [ n = 202, F (1, 186) = 7.00, P = 0.009], evaluated the abstract as less well written [ n = 196, F (1, 181) = 4.67, P = 0.032], and, marginally, evaluated the abstract less favorably overall [ n = 201, F (1, 185) = 3.45, P = 0.065].
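
Two different missing-data rules are at work here: the primary composite requires all four ratings, whereas each single-item analysis uses everyone who answered that item, which is why the per-measure n values differ. A sketch with hypothetical field names and toy responses:

```python
from statistics import mean

# Hypothetical keys for the four ratings described under Dependent Variables.
ITEMS = ("agree", "important", "well_written", "favorable")


def composite_if_complete(responses):
    """Mean of the four ratings, or None if any rating was skipped."""
    values = [responses.get(item) for item in ITEMS]
    if any(v is None for v in values):
        return None
    return mean(values)


def available_n(all_responses):
    """Number of non-missing responses per item, as used by single-item analyses."""
    return {item: sum(r.get(item) is not None for r in all_responses) for item in ITEMS}


# Toy responses (not study data); None or an absent key marks a skipped item.
data = [
    {"agree": 5, "important": 4, "well_written": 5, "favorable": 4},
    {"agree": 3, "important": None, "well_written": 4, "favorable": 3},
    {"agree": 6, "important": 6, "well_written": 5},
]
composites = [composite_if_complete(r) for r in data]
print(composites)
print(available_n(data))
```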

Additionally, the pattern of means for the interaction between participant gender and their STEM status for each of these measures was identical to that observed for the primary analysis. However, the omnibus test of this interaction was significant for participants’ ratings of how important they evaluated the research findings [ F (1, 186) = 8.31, P = 0.004], how well written they found the abstract [ F (1, 181) = 4.22, P = 0.041], and their overall favorability toward the abstract [ F (1, 185) = 9.80, P = 0.002], but not for their assessment of how much they agreed with the interpretations of the research [ F (1, 183) = 1.55, P = 0.21]. Nonetheless, as in the primary analysis, simple-effect tests for all measures revealed that male faculty reported less favorable evaluations than female faculty in STEM departments (all F > 7.91 and < 17.14; all P < 0.005), but comparable evaluations within non-STEM departments (all F < 1). Overall, then, the critical findings for the primary measure hold well when looking at each individual measure.

Finally, although we did not design experiment 2 to investigate potential mechanisms behind these effects, especially the interaction, some data within the faculty survey (completed just before our experiment) allowed us to explore whether these effects were related to perceptions of threat. Specifically, faculty rated the extent to which they felt they had been personally discriminated against due to their gender ( SI Materials and Methods, Dependent Variables ). We reasoned that the greater men’s experience of gender discrimination (i.e., the more they feel women have had an unjust advantage at men’s expense), the more threatening they should find research demonstrating an actual bias against women in STEM. After all, men who have experienced gender discrimination may worry that such research could promote future “reverse” discrimination against men in STEM. Further, assuming men in STEM are more committed to (or identify more with) STEM than men in non-STEM fields, Social Identity Theory ( 36 , 37 ) predicts that this threat should manifest predominantly among men in STEM. Indeed, there was a negative correlation between the personal experience of gender discrimination and evaluations of the abstract only among men in STEM: the more male faculty in STEM felt they had experienced gender discrimination, the less favorably they evaluated the abstract [ r (63) = −0.404; P = 0.001]. The same correlation among non-STEM men was positive but nonsignificant [ r (34) = 0.157; P = 0.367]. Among women, the correlation was significant within non-STEM fields [ r (48) = 0.35; P = 0.014] but not within STEM fields [ r (36) = 0.262; P = 0.118]. However, the correlations among women do not bear on threat, because the results of Moss-Racusin et al. ( 10 ) affirm, rather than challenge, women’s experience with gender discrimination.
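
The value in parentheses after r is the degrees of freedom, n − 2, so r (63) corresponds to 65 male STEM faculty. The usual significance test for a Pearson correlation converts r to t = r·√(df / (1 − r²)). The sketch below implements both steps in plain Python; turning t into a two-tailed P value needs a t-distribution CDF (for example, `scipy.stats.t.sf`), which is omitted here to keep the example dependency-free.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, df):
    """t statistic for testing rho = 0, where df = n - 2."""
    return r * math.sqrt(df / (1 - r ** 2))


# Male STEM faculty: r(63) = -0.404, i.e., n = 65.
t_stem_men = t_from_r(-0.404, 63)
print(f"t(63) = {t_stem_men:.2f}")  # roughly -3.5
```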

Together, the correlations among men in STEM and non-STEM fields are consistent with Social Identity Theory and with our assumption that men in STEM identify more with STEM than do non-STEM men and were therefore more likely to perceive the abstract as threatening. However, the gender-discrimination measure did not mediate the effects found for the abstract evaluation. To test this possibility, we subjected the gender-discrimination measure to an analysis of variance with gender and field of study as factors (participants completed this measure before reading the abstract, so the factors associated with the abstract were irrelevant). Importantly, this analysis revealed a significant main effect of gender such that women reported greater experience of gender discrimination than men [ F (1, 194) = 16.87; P < 0.001], supporting the validity of the measure. However, this analysis revealed no interaction between gender and field of study [ F (1, 194) = 1.77; P > 0.18], meaning this construct did not mediate our primary results. This finding is not necessarily surprising, given that the gender-discrimination measure was not designed to directly measure the extent to which participants find the results of Moss-Racusin et al. ( 10 ) threatening. Overall, then, the correlational evidence is only suggestive, and we encourage future research to explore this and other possible mechanisms behind our effect.

There is now copious evidence that women are disadvantaged in STEM fields (51–53). This disadvantage may relate to gender stereotypes (11) and to consequent biases against women (or favoring men) traversing the STEM pipeline (10–17). Of course, people should not passively accept such evidence, even when it appears in preeminent peer-reviewed journals (e.g., Science, PNAS, or Nature), where publication suggests that the research is sound. Ideally, especially within the STEM community, people should evaluate as objectively as possible the research producing such evidence, the resulting quality of the evidence, and the interpretation of that evidence.

However, the evidence from our three straightforward experiments indicates that men evaluate research demonstrating bias against women in STEM less favorably than women do (or, equivalently, that women evaluate it more favorably). Specifically, in experiments 1 and 2, male participants (including university faculty) rated the quality of the research by Moss-Racusin et al. (10) lower than female participants did, based simply on its actual abstract. In addition, and perhaps of greatest concern, this gender difference was large in effect size (50) among faculty working within STEM fields and nonexistent among faculty from non-STEM fields (experiment 2). Further, the overall gender difference observed in the first two experiments was replicated in experiment 3 among participants who read the true abstract of Knobloch-Westerwick et al. (12), which also reported a gender bias in STEM. However, this gender difference was reversed among participants who read an altered version reporting no gender bias in STEM.
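The text calls the STEM-faculty effect "large" with reference to Cohen (50). One common index for a two-group mean difference is Cohen's d, and Cohen's rough benchmarks treat d = 0.2, 0.5 and 0.8 as small, medium and large. The text does not state which effect-size index was used, so the sketch below is only an illustration. It assumes d with a pooled standard deviation, uses hypothetical function names, and runs on invented ratings rather than study data.

```python
import math
from statistics import mean, variance


def cohens_d(group1, group2):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1)
                  + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(pooled_var)


def cohen_label(d):
    """Map |d| onto Cohen's rough benchmarks (0.2 small, 0.5 medium, 0.8 large)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"


# Hypothetical quality ratings, NOT study data
women = [4, 5, 6]
men = [2, 3, 4]
d = cohens_d(women, men)
print(d, cohen_label(d))  # 2.0 large
```

The benchmarks are conventions, not thresholds with statistical meaning. Cohen himself presented them as a fallback for when field-specific norms are unavailable.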

The results from this third experiment are important for at least three reasons. First, they indicate that, relative to women, men do not disfavor only the research of Moss-Racusin et al. (10) but research that reports a gender bias hindering women in STEM more generally. Second, these results suggest that men do not generally evaluate research more harshly than women, as it might seem from the first two experiments (but see the results from non-STEM faculty in experiment 2). Rather, relative to women, men actually favor research suggesting there is no gender bias in STEM. Finally, the results indicate that individuals are likely to show a gender bias toward research on the very topic of gender bias in STEM. Men seem to disfavor (and women favor) research demonstrating a gender bias, whereas women seem to disfavor (and men favor) research demonstrating no gender bias. Of course, because we cannot have a gender-free control condition, these biases are relative to the other gender. We cannot conclude that one gender is more biased than the other, only that individuals’ judgments of research on gender bias in STEM are biased by their gender.

Critically, across three experiments, we uncovered a gender difference in how members of the general public and STEM faculty evaluate the quality of research demonstrating women’s documented disadvantage in STEM fields: men think the research is of lower quality, whereas women think it is of higher quality. Why does this gender difference matter? First, it has significant implications for the dissemination and impact of meritorious past, current, and future research on gender bias in STEM fields. Our research suggests that men will relatively disfavor, and women will relatively favor, research demonstrating this bias. Given that men dominate STEM fields throughout industry and academia, scholars whose research programs focus on demonstrating gender bias in STEM settings might:

- face undue obstacles to publication;
- have lower chances of publishing in top-tier outlets;
- experience greater challenges in receiving tenure; and
- overall, have lower-than-warranted impact on the thinking, research, and practice of those in STEM fields.

Such possibilities are highly problematic and call for additional research evaluating biased reactions to scientific evidence demonstrating gender and/or racial biases within STEM.

Second, men represent the majority of individuals in STEM fields, yet they are less likely than women to acknowledge biases against women in STEM. This may make it challenging to fully embrace the numerous calls to broaden the participation of women and minorities in STEM. How can we successfully broaden the participation of women in STEM when the very research underscoring the need for this initiative is less valued by the majority group that dominates and maintains the culture of STEM? Intensifying the challenge, men hold an advantageous position in STEM fields and may feel threatened by research and efforts to “level the playing field” for women. Relatedly, people often unintentionally exhibit in-group favoritism (54), behaving and allocating resources in ways that benefit members of their own group (e.g., men unintentionally conferring advantage on other men).

Fortunately, efforts are already under way to meet these challenges. For example, “Project Implicit” (https://implicit.harvard.edu/implicit/) provides workshops and talks that reveal the subtlety and implicitness of gender bias and consider how to foster broader recognition of these biases and address them. Further, the NSF funds ADVANCE-Institutional Transformation grants specifically to increase the participation of women in STEM and help transform academic cultures to foster equality and inclusivity. Shields et al. (55) created the “WAGES” game and an accompanying discussion platform that effectively highlight male privilege and advantage among STEM faculty and help reduce reactance to acknowledging this advantage (56). Finally, Moss-Racusin et al. developed an evidence-based framework for creating, evaluating, and implementing diversity interventions designed to increase awareness of bias and reduce it across STEM fields (31). Initial evidence for interventions adhering to these guidelines is promising (31). These efforts, along with others that help individuals acknowledge evidence of gender bias in STEM, are critical to bringing about change and increasing the participation of women in STEM.

Limitations and Future Directions

As with any research, ours has limitations. First, we did not directly test the potential mechanisms behind the reported gender effect. However, even before we understand exactly why men are less favorable than women toward research demonstrating a gender bias in STEM, we suggest that it is important for the STEM community to know that this phenomenon exists. That said, we uncovered evidence in experiment 2 suggesting that men in STEM found the abstract of Moss-Racusin et al. (10) threatening (SI Additional Analyses, Experiment 2), which may be one possible explanation for the results (37). In the future, researchers could test this possibility by including a direct measure of how threatening people find the implications of various research results, along with multiple measures of social identity. Future research should also investigate whether confirmation bias (48, 49) contributes to the reported gender effect, by measuring people’s beliefs about gender bias in STEM before they read research demonstrating that bias. We hope our findings will spark research that thoroughly investigates the mechanisms underlying this effect.

Second, we investigated individuals’ evaluations of two abstracts reporting gender bias in STEM, specifically in the contexts of evaluating a laboratory-manager application and evaluating conference abstracts. It is worth investigating whether this bias also generalizes to evaluations of research demonstrating gender bias in other STEM contexts, such as disparities in funding, publication rates, evaluations of faculty and postdoctoral applicants, talk invitations, and tenure decisions. Theoretically, however, there is reason to predict that gender biases toward such research would replicate our current findings. In fact, because these contexts imply a bias against (or in favor of) one’s direct peers and colleagues, gender-biased evaluations of this research would likely be even more pronounced. For instance, STEM faculty might find threatening the possibility that they are biased in judging the quality of their female colleagues’ research. They might prefer (likely implicitly) to find fault with the research rather than face that possibility.

Third, we investigated individuals’ assessments of research quality after they read only an abstract. We chose an abstract as a reasonable basis for assessment because abstracts present key methods and findings, are indexed and freely available, and are often what people read to decide whether to read the full article. Nonetheless, it is conceivable that the gender bias we uncovered is a short-lived reaction that would shrink or disappear after reading the full article or a longer synopsis of the research. However, there is ample reason to predict that the bias would actually strengthen as people receive more information. People (unintentionally) process such information based on their initial impressions and on their motivation to arrive at a particular conclusion (42, 48, 49). Still, we encourage future research into this issue.

As a final point on limitations, our experiments took place online. Participants completed them either at the end of a faculty survey that offered a US$5 coupon or as a short 10-min experiment paying $0.25. Thus, it is possible that our participants were not highly motivated to think about the abstract and simply based their quality assessments on “gut reactions” resulting in part from unconscious biases. Perhaps our findings would not hold among highly motivated participants whose assessments might have actual bearing on the publication of the research described in the abstract (e.g., peer reviewers). This possibility warrants future exploration. However, we note that greater motivation does not always produce greater objectivity. In fact, biases can influence people’s judgments even more strongly when they are motivated to be accurate, particularly if they do not notice that their thought process is biased (21, 42).

Further research might also explore why our first two experiments did not replicate previous research demonstrating an overall bias favoring the research of men over that of women in STEM (SI Additional Analyses). In particular, Knobloch-Westerwick et al. (12) found that graduate students evaluated science-related conference abstracts more positively when the abstracts were attributed to a male (relative to a female) author, particularly in male-gender-typed fields. However, participants in experiments 1 and 2 did not favor the abstract written by Moss-Racusin et al. (10) more when they thought it was written by a man rather than a woman. Participants in our first two experiments may have perceived the topic of gender bias within STEM as “feminine,” or as only somewhat “scientific,” which would reduce bias based on the author’s gender. Future research might reveal that participants’ perception of gender-bias research plays an important role in producing biases against women, and in favor of men, who conduct such research.

Failures in objectivity are problematic for specific research projects, for science generally, and for receptivity to discovery. Yet objectivity is threatened by a multitude of cognitive biases, including gender bias in STEM fields. Numerous experimental findings confirm the existence of this bias, and the research we present here peels back yet another layer of bias: men evaluate research confirming gender bias within STEM contexts as less meritorious than women do. We hope that our findings help inform and fuel self-correction efforts within STEM to reduce this bias, bolster objectivity, and diversify STEM workforces. After all, the success of these efforts can translate into greater STEM discovery, education, and achievement (57).

Materials and Methods

Participants.

In experiments 1 and 3, participation was solicited from workers on Amazon’s Mechanical Turk online job site, who could view our employment opportunity listed alongside other opportunities. In experiment 1, a total of 205 individuals (146 men and 59 women) opted to participate in the experiment and provided usable data (for more details, see SI Materials and Methods, Participants and Recruitment for Experiments 1 and 3). All were from the United States and 18 y of age or older (mean age = 30.13 y; range = 18–66). In experiment 3, a total of 303 individuals (162 men and 141 women) opted to participate and provided usable data. All were from the United States and 18 y of age or older (mean age = 34.22 y; range = 18–79). All participants completed the ∼10-min experiment in exchange for $0.25.

In experiment 2, participation was first solicited from all tenure-track faculty at a research-intensive American university. The invitation came via an email from the university provost encouraging participation in a larger baseline faculty climate survey. The survey and experiment were conducted online, and 506 tenure-track faculty from this university received the email invitation to participate. A total of 268 of these faculty participated in the survey, and 205 of them further elected to participate in our experiment at the end of the survey. The resulting sample included faculty from all departments at the university, both STEM departments (n = 116) and non-STEM departments (n = 89; for more details, see SI Materials and Methods, Participants and Recruitment for Experiment 2). All participants received a $5 coupon for a local coffee shop. Those who chose to could also enter a raffle for 1 of 50 US$50 gift certificates for the campus bookstore.

All procedures were approved by the Montana State University institutional review board. The three experiments were nearly identical, although the experiment stood alone in experiments 1 and 3 and followed a faculty climate survey in experiment 2. All participants completed the experiment on a personal or work computer. They received the experiment materials, provided informed consent, and submitted their responses through surveymonkey.com.

Participants in experiments 1 and 2 were first instructed to read the actual abstract from the Moss-Racusin et al. (10) paper, which was provided in full on a single screen. The abstract was accompanied by that paper’s actual title, publication date, volume and issue number, first author’s full name, keywords, and a fictitious DOI. Further, participants were randomly assigned to receive a version of the abstract that identified the first author’s first name as either “Karen” or “Brian.” Previous research indicates that these names are equally likable and equally common in the United States (58). Independently of this manipulation, participants received a version of the abstract that identified the author as affiliated with either Yale University (Moss-Racusin’s true affiliation at the time of publication) or Iowa State University. After reading the abstract and accompanying information, participants rated the quality of the abstract and of the research it described on several scales (for details, see SI Materials and Methods, Dependent Variables). These scales were adapted from scales commonly used to gauge attitude change and evaluations of persuasive materials. Participants also provided demographic information, including their gender. Participants’ responses were anonymous, but in experiment 2 their status as STEM or non-STEM faculty was identifiable through specialized codes. Overall, the research design allowed us to analyze participants’ quality assessments of the Moss-Racusin et al. (10) research as a function of participant gender, author gender, author affiliation, and participants’ STEM affiliation (experiment 2 only).
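The two manipulations described above (author first name and author affiliation) were assigned independently, which yields a 2 × 2 between-subjects design. The sketch below illustrates such independent random assignment. The text does not describe how SurveyMonkey implemented the randomization, so this is an assumption for illustration only. The function name and seed are hypothetical; only the factor levels come from the text. Because each factor is drawn separately rather than in balanced blocks, cell sizes are only approximately equal.

```python
import random
from collections import Counter

# Factor levels taken from the text; everything else here is hypothetical.
AUTHOR_NAMES = ("Karen", "Brian")
AFFILIATIONS = ("Yale University", "Iowa State University")


def assign_conditions(participant_ids, seed=None):
    """Independently randomize author first name and author affiliation.

    Each factor is drawn on its own, so every participant falls into one of
    the four cells of the 2 x 2 design with probability 1/4.
    """
    rng = random.Random(seed)
    return {pid: (rng.choice(AUTHOR_NAMES), rng.choice(AFFILIATIONS))
            for pid in participant_ids}


assignments = assign_conditions(range(205), seed=2015)
print(Counter(assignments.values()))  # cell sizes vary by chance
```

Drawing the two factors independently is what allows author gender and author affiliation to be analyzed as separate factors. Neither level of one factor is systematically paired with a level of the other.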

Participants in experiment 3 completed a similar procedure, with some key differences. First, participants were randomly assigned to read either the original version of the abstract by Knobloch-Westerwick et al. (12), which reported a gender bias, or a version slightly altered to report no gender differences. Second, unlike in experiments 1 and 2, the abstract was not accompanied by the author’s name or affiliation. Otherwise, the procedures and dependent measures were identical to those used in the previous experiments. This research design allowed us to analyze participants’ quality assessments of the research by Knobloch-Westerwick et al. (12) as a function of participant gender and abstract version (reporting gender bias or no gender bias).

Acknowledgments

We thank the social science research team (especially Rebecca Belou) and project management staff of ADVANCE Project TRACS (Transformation through Relatedness, Autonomy, and Competence Support) for their efforts. This work was supported in part by National Science Foundation Grant 1208831 (to J.L.S.).

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1510649112/-/DCSupplemental.

1. Institute of Medicine, National Academy of Science, and National Academy of Engineering 1992 Responsible Science: Ensuring the Integrity of the Research Process (National Academy, Washington, DC), Vol 1. Available at www.nap.edu/openbook.php?isbn=0309047315. Accessed September 11, 2014.
2. de Melo-Martín I, Intemann K. Interpreting evidence: why values can matter as much as science. Perspect Biol Med. 2012;55(1):59–70. doi: 10.1353/pbm.2012.0007.
3. Carnes M, et al. Promoting institutional change through bias literacy. J Divers High Educ. 2012;5(2):63–77. doi: 10.1037/a0028128.
4. Gilovich T, Griffin D, Kahneman D, editors. Heuristics and Biases: The Psychology of Intuitive Judgment. Cambridge Univ Press; Cambridge, UK: 2002.
5. Al-Gazali L. Remove social barriers. Nature. 2013;495:35–36.
6. Ceci SJ, Williams WM. Understanding current causes of women’s underrepresentation in science. Proc Natl Acad Sci USA. 2011;108(8):3157–3162. doi: 10.1073/pnas.1014871108.
7. Shen H. Inequality quantified: Mind the gender gap. Nature. 2013;495(7439):22–24. doi: 10.1038/495022a.
8. National Science Board 2014 Science and Engineering Indicators 2014 (National Center for Science and Engineering Statistics, Arlington, VA). Available at www.nsf.gov/statistics/seind14/. Accessed September 11, 2014.
9. Snyder TD, Dillow SA, Hoffman CM. 2009 Digest of Education Statistics 2008 (U.S. Department of Education, National Center for Education Statistics, Institute of Education Sciences, Washington, DC), NCES Publ No 2009-020. Available at nces.ed.gov/pubs2009/2009020.pdf. Accessed September 11, 2014.
10. Moss-Racusin CA, Dovidio JF, Brescoll VL, Graham MJ, Handelsman J. Science faculty’s subtle gender biases favor male students. Proc Natl Acad Sci USA. 2012;109(41):16474–16479. doi: 10.1073/pnas.1211286109.
11. Reuben E, Sapienza P, Zingales L. How stereotypes impair women’s careers in science. Proc Natl Acad Sci USA. 2014;111(12):4403–4408. doi: 10.1073/pnas.1314788111.
12. Knobloch-Westerwick S, Glynn CJ, Huge M. The Matilda effect in science communication: An experiment on gender bias in publication quality perceptions and collaboration interest. Sci Commun. 2013;35(5):603–625.
13. Larivière V, Ni C, Gingras Y, Cronin B, Sugimoto CR. Bibliometrics: Global gender disparities in science. Nature. 2013;504(7479):211–213. doi: 10.1038/504211a.
14. Schroeder J, et al. Fewer invited talks by women in evolutionary biology symposia. J Evol Biol. 2013;26(9):2063–2069. doi: 10.1111/jeb.12198.
15. Sheltzer JM, Smith JC. Elite male faculty in the life sciences employ fewer women. Proc Natl Acad Sci USA. 2014;111(28):10107–10112. doi: 10.1073/pnas.1403334111.
16. Jaschik S. 2014 Productivity or Sexism? Inside Higher Education. Available at https://www.insidehighered.com/news/2014/08/18/study-raises-questions-about-why-women-are-less-likely-men-earn-tenure-research. Accessed September 11, 2014.
17. Steinpreis RE, Anders KA, Ritzke D. The impact of gender on the review of the curricula vitae of job applicants and tenure candidates: A national empirical study. Sex Roles. 1999;41(7/8):509–528.
18. Devine PG. Stereotypes and prejudice: Their automatic and controlled components. J Pers Soc Psychol. 1989;56(1):5–18.
19. Greenwald AG, Banaji MR. Implicit social cognition: Attitudes, self-esteem, and stereotypes. Psychol Rev. 1995;102(1):4–27. doi: 10.1037/0033-295x.102.1.4.
20. Swim JK, Scott ED, Sechrist GB, Campbell B, Stangor C. The role of intent and harm in judgments of prejudice and discrimination. J Pers Soc Psychol. 2003;84(5):944–959. doi: 10.1037/0022-3514.84.5.944.
21. Uhlmann EL, Cohen GL. “I think it, therefore it’s true”: Effects of self-perceived objectivity on hiring discrimination. Organ Behav Hum Decis Process. 2007;104(7):207–223.
22. Baron AS, Banaji MR. The development of implicit attitudes: Evidence of race evaluations from ages 6 and 10 and adulthood. Psychol Sci. 2006;17(1):53–58. doi: 10.1111/j.1467-9280.2005.01664.x.
23. Knutson KM, Mah L, Manly CF, Grafman J. Neural correlates of automatic beliefs about gender and race. Hum Brain Mapp. 2007;28(10):915–930. doi: 10.1002/hbm.20320.
24. Becker JC, Swim JK. Seeing the unseen: Attention to daily encounters with sexism as way to reduce sexist beliefs. Psychol Women Q. 2011;35(2):227–242.
25. Apfelbaum EP, Phillips KW, Richeson JA. Rethinking the baseline in diversity research: Should we be explaining the effects of homogeneity? Perspect Psychol Sci. 2014;9(3):235–244. doi: 10.1177/1745691614527466.
26. Page SE. Making the difference: Applying the logic of diversity. Acad Manage Perspect. 2007;21(4):6–20.
27. Freeman RB, Huang W. 2014 Collaborating with People Like Me: Ethnic Co-Authorship Within the US. (National Bureau of Economic Research, Cambridge, MA), NBER Working Paper No 19905. Available at www.nber.org/papers/w19905. Accessed September 11, 2014.
28. National Science Board 2012 Science and Engineering Indicators 2012 (National Center for Science and Engineering Statistics, Arlington, VA). Available at www.nsf.gov/statistics/seind12/. Accessed September 11, 2014.
29. Tabak LA, Collins FS. Sociology. Weaving a richer tapestry in biomedical science. Science. 2011;333(6045):940–941. doi: 10.1126/science.1211704.
30. National Institutes of Health 2014 Enhancing the Diversity of the NIH-Funded Workforce (National Institutes of Health, Bethesda). Available at commonfund.nih.gov/diversity/Initiatives. Accessed September 11, 2014.
31. Moss-Racusin CA, et al. Social science. Scientific diversity interventions. Science. 2014;343(6171):615–616. doi: 10.1126/science.1245936.
32. US Department of Education 2014 Science, Technology, Engineering and Math: Education for Global Leadership. Available at www.ed.gov/stem. Accessed September 11, 2014.
33. Obama BH. 2013 State of the Union Address. Available at https://www.whitehouse.gov/the-press-office/2013/02/12/remarks-president-state-union-address. Accessed September 11, 2014.
34. President’s Council of Advisors on Science and Technology 2012 Engage to Excel: Producing One Million Additional College Graduates with degrees in Science, Technology, Engineering, and Mathematics (Executive Office of the President, Washington, DC). Available at https://www.whitehouse.gov/sites/default/files/microsites/ostp/pcast-engage-to-excel-final_feb.pdf. Accessed September 11, 2014.
35. Moss-Racusin CA, Molenda AK, Cramer CR. Can evidence impact attitudes? Public reactions to evidence of gender bias in STEM fields. Psychol Women Q. 2015;39(2):194–209.
36. Tajfel H, Turner JC. The social identity theory of intergroup behaviour. In: Worchel S, Austin WG, editors. Psychology of Intergroup Relations. Nelson-Hall; Chicago: 1986. pp. 7–24.
37. Ellemers N, Spears R, Doosje B. Self and social identity. Annu Rev Psychol. 2002;53(1):161–186. doi: 10.1146/annurev.psych.53.100901.135228.
38. Schmitt MT, Branscombe NR. The good, the bad, and the manly: Threats to one’s prototypicality and evaluations of fellow in-group members. J Exp Soc Psychol. 2001;37(6):510–517.
39. Jost JT, Banaji MR, Nosek BA. A decade of system justification theory: Accumulated evidence of conscious and unconscious bolstering of the status quo. Polit Psychol. 2004;25(6):881–919.
40. Ecklund EH, Lincoln AE, Tansey C. Gender segregation in elite academic science. Gend Soc. 2012;26(5):693–717.
41. Festinger L. A Theory of Cognitive Dissonance. Stanford Univ Press; Stanford, CA: 1957.
42. Kunda Z. The case for motivated reasoning. Psychol Bull. 1990;108(3):480–498. doi: 10.1037/0033-2909.108.3.480.
43. Curtis JW. 2010 Faculty Salary Equity: Still a Gender Gap? On Campus with Women 39(1). Available at archive.aacu.org/ocww/volume39_1/feature.cfm?section=2. Accessed August 20, 2014.
44. Boyle PJ, Smith LK, Cooper NJ, Williams KS, O'Connor H. Gender balance: Women are funded more fairly in social science. Nature. 2015;525(7568):181–183. doi: 10.1038/525181a.
45. Wenneras C, Wold A. Nepotism and sexism in peer-review. Nature. 1997;387(6631):341–343. doi: 10.1038/387341a0.
46. McIntosh P. 1988 White Privilege: Unpacking the Invisible Knapsack. Peace and Freedom. Available at nationalseedproject.org/white-privilege-unpacking-the-invisible-knapsack. Accessed September 25, 2015.
47. Norton MI, Sommers SR. Whites see racism as a zero-sum game that they are now losing. Perspect Psychol Sci. 2011;6(3):215–218. doi: 10.1177/1745691611406922.
48. Lord CG, Ross L, Lepper MR. Biased assimilation and attitude polarization: The effects of prior theories on subsequently considered evidence. J Pers Soc Psychol. 1979;37(11):2098–2109.
49. Mahoney MJ. Publication prejudices: An experimental study of confirmatory bias in the peer review system. Cognit Ther Res. 1977;1(2):161–175.
50. Cohen J. Statistical Power Analysis for the Behavioral Sciences. Erlbaum; Hillsdale, NJ: 1988.
51. American Association of University Women 2010 Why So Few? Women in Science, Technology, Engineering, and Mathematics (American Association of University Women, Washington, DC). Available at www.aauw.org/resource/why-so-few-women-in-science-technology-engineering-mathematics/. Accessed September 11, 2014.
52. Ginther DK, Kahn S. 2006 Does Science Promote Women? Evidence from Academia 1973-2001 (National Bureau of Economic Research, Cambridge, MA), NBER Working Paper No 12691. Available at www.nber.org/papers/w12691. Accessed September 11, 2014.
53. National Science Foundation 2010 Women, Minorities, and Persons with Disabilities in Science and Engineering (National Center for Science and Engineering Statistics, Arlington, VA). Available at www.nsf.gov/statistics/2015/nsf15311/. Accessed September 11, 2014.
54. Brewer MB. The psychology of prejudice: Ingroup love and outgroup hate. J Soc Issues. 1999;55(3):429–444.
55. Shields SA, Zawadzki MJ, Johnson RN. The impact of a workshop activity for gender equity simulation in the academy (WAGES-Academic) in demonstrating cumulative effects of gender bias. J Divers High Educ. 2011;4(2):120–129.
56. Zawadzki MJ, Danube CL, Shields SA. How to talk about gender inequity in the workplace: Using WAGES as an experiential learning tool to reduce reactance and promote self-efficacy. Sex Roles. 2012;67(11-12):605–616.
57. Hong L, Page SE. Groups of diverse problem solvers can outperform groups of high-ability problem solvers. Proc Natl Acad Sci USA. 2004;101(46):16385–16389. doi: 10.1073/pnas.0403723101.
58. Kasof J. Sex bias in the naming of stimulus persons. Psychol Bull. 1993;113(1):140–163. doi: 10.1037/0033-2909.113.1.140.
59. Mackie DM, Worth LT, Asuncion AG. Processing of persuasive in-group messages. J Pers Soc Psychol. 1990;58(5):812–822. doi: 10.1037//0022-3514.58.5.812.
60. Maitner AT, Mackie DM, Claypool HM, Crisp RJ. Identity salience moderates processing of group-relevant information. J Exp Soc Psychol. 2010;46(2):441–444.
Open access
Published: 15 November 2021

The reduction of race and gender bias in clinical treatment recommendations using clinician peer networks in an experimental setting

Damon Centola ORCID: orcid.org/0000-0002-8084-2333 1 , 2 , 3 , 4 ,
Douglas Guilbeault ORCID: orcid.org/0000-0002-0177-3027 4 , 5 ,
Urmimala Sarkar 4 , 6 ,
Elaine Khoong ORCID: orcid.org/0000-0002-2514-3572 4 , 6 &
Jingwen Zhang ORCID: orcid.org/0000-0003-1733-6857 4 , 7

Nature Communications volume 12, Article number: 6585 (2021)


Bias in clinical practice, in particular in relation to race and gender, is a persistent cause of healthcare disparities. We investigated the potential of a peer-network approach to reduce bias in medical treatment decisions within an experimental setting. We created “egalitarian” information-exchange networks among practicing clinicians. These clinicians provided recommendations for the clinical management of patient scenarios, presented via standardized patient videos of actors portraying patients with cardiac chest pain. The videos were standardized for relevant clinical factors. Each presented either a white male actor or a Black female actor of similar age, wearing the same attire and in the same clinical setting, portraying a patient with clinically significant chest pain symptoms. We found significant disparities in the treatment recommendations given to the white male patient-actor and the Black female patient-actor. Translated into real clinical scenarios, these disparities would make the Black female patient significantly more likely to receive unsafe undertreatment, rather than the guideline-recommended treatment. In the experimental control group, clinicians who were asked to independently reflect on the standardized patient videos did not show any significant reduction in bias. However, clinicians who exchanged real-time information in structured peer networks significantly improved their clinical accuracy and showed no bias in their final recommendations. The findings indicate that clinician network interventions might be used in healthcare settings to reduce significant disparities in patient treatment.

Introduction

Bias is an enduring cause of healthcare disparities by race and gender 1 , 2 , 3 , 4 , 5 , 6 , 7 . Previous experimental work demonstrated that clinicians reviewing video-based vignettes of high-risk patients with chest pain disproportionately referred men compared to women, and white patients compared to Black patients for the guideline-recommended treatment, cardiac catheterization 1 , 7 . Proposed solutions for addressing bias have focused on cognitive strategies that increase clinicians’ awareness of their own biases 6 , 8 , 9 , 10 , 11 . However, no approaches have yet been found that successfully reduce race and gender bias in clinical treatment recommendations 6 , 10 .

Recent research in non-clinical settings has shown that information exchange in large social networks with uniform (i.e. egalitarian 12 ) connectivity can be effective for improving collective intelligence in both health-related and non-health-related risk assessments 13 , 14 , 15 , 16 . Studies of bias reduction in partisan networks 15 , 16 have found that this process of collective learning in egalitarian networks can effectively reduce, and even eliminate, longstanding political biases in the evaluation of novel information 16 . Here, we integrate this recent work on bias reduction in egalitarian information-exchange networks with theoretical research on medical reasoning 17 , 18 , 19 , which has argued that improving the accuracy of clinicians’ diagnostic assessments should improve the quality of their treatment recommendations 19 , 20 , 21 . We hypothesize that creating structured information-exchange networks among clinicians will lead to improved clinical assessments that may be effective for reducing observed patterns of race and gender bias in clinical treatment recommendations 13 , 14 . Despite the broad practical 2 , 3 , 4 and scientific 1 , 5 , 6 , 7 importance of understanding and addressing bias in medical settings, it has not been possible to evaluate this hypothesis, because such a test requires the ability to experimentally isolate and measure the direct effects of clinical networks on reducing medical bias and treatment disparities.

We adopted an experimental approach to evaluate whether large, uniform information-exchange networks among clinicians 13 , 14 , 15 , 16 might significantly reduce observed race and gender bias in clinicians’ treatment recommendations, relative to a control group of independent clinicians who did not participate in information-exchange networks. We recruited 840 practicing clinicians (see Supplementary Information for details) to participate in the online video-based study, which was administered through a proprietary mobile app for clinicians. Each clinician viewed a standardized patient video of either a white male patient-actor or a Black female patient-actor, and provided clinical assessments and treatment recommendations for the depicted clinical case (see SI, Supplementary Fig. 5 and Supplementary Fig. 6 ). Both the white male and Black female “patients” in the videos were portrayed by professional actors who appeared to be 65 years old, were dressed in identical attire, and depicted a patient with clinically significant chest pain symptoms. The actors portraying each patient followed a single script, in which they provided an identical clinical history that included several risk factors for coronary artery disease (age, hyperlipidemia, and discomfort with exertion). Both videos were accompanied by an identical electrocardiogram exhibit showing abnormalities. (Hereafter, we refer to the patient-actors in the standardized patient videos as “patients”.)

After viewing the patient video, clinicians were asked to provide their initial clinical assessments and treatment recommendations. Clinical assessments took the form of a probability estimate (from 0 to 100) of the patient’s chance of having a major adverse cardiac event within the next 30 days. The most accurate assessment based on the patient’s HEART score is 16% 22 , 23 . (Additional analyses show the robustness of our findings across a range of assessment values; see SI “Sensitivity Analyses”, Supplementary Fig. 9 and Supplementary Fig. 10 .) Clinicians then selected a single treatment recommendation from four multiple-choice options: Option A. daily 81 mg aspirin and return to clinic in one week (i.e. unsafe undertreatment); Option B. daily 81 mg aspirin and stress test within two to three days (i.e. undertreatment); Option C. full-dose aspirin and referral to emergency department for evaluation and monitoring (i.e. highest quality, guideline-recommended treatment); or Option D. full-dose aspirin and referral to cardiology for urgent cardiac catheterization (i.e. overtreatment in the context of unconfirmed diagnosis).

Option C is the most appropriate treatment based on currently accepted guidelines from the American College of Cardiology 23 , 24 and represents the highest standard of care 22 , 23 . (In consideration of the fact that some clinicians may choose a less aggressive initial strategy in a patient with atypical symptoms, we conducted sensitivity analyses that accepted both Option B and Option C as correct. As reported in Supplementary Fig. 11 and Supplementary Fig. 12 in SI “Sensitivity Analyses”, these analyses show the robustness of our findings for both Option B and Option C.) Our primary measure of bias is the rate at which the white male patient versus the Black female patient was given the highest quality, guideline-recommended care (Option C). Our secondary measure of bias is inequity in the treatment of the Black female and white male patients, in terms of the relative rates at which the white male patient and the Black female patient were recommended for unsafe undertreatment (Option A) rather than the guideline-recommended care (Option C) (see SI ). We focus on the relative rates of Options A and C because this aligns with the observed racial disparities in workup and referral rates for chest pain in clinical care 25 . Option A is unsafe because it inappropriately defers workup by one week, putting the patient at risk of significant adverse outcomes. Option B (undertreatment) is less unsafe than Option A because it shortens the time to further evaluation from one week to two to three days; however, it is not the guideline-recommended care, which advises immediate evaluation for cardiac tissue damage to appropriately triage the patient. Option D is incorrect because, without a troponin measurement (which assesses for acute damage of cardiac tissue), the patient presentation does not warrant an immediate invasive procedure. Option D exposes the patient to the risk of a potentially unnecessary invasive procedure and to wasteful healthcare spending.
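As a minimal sketch of how the two bias measures above could be computed from one patient's list of treatment choices, the Python below encodes Options A–D, takes the Option C rate as the primary measure, and takes the Option A rate minus the Option C rate as the secondary measure. The function names and the choice of a simple percentage-point difference for the secondary measure are assumptions for illustration; the authors' exact formulation is given in the SI.

```python
from collections import Counter
from enum import Enum


class Option(Enum):
    """The four multiple-choice treatment recommendations in the study."""
    A = "unsafe undertreatment"   # 81 mg aspirin, return to clinic in one week
    B = "undertreatment"          # 81 mg aspirin, stress test in two to three days
    C = "guideline-recommended"   # full-dose aspirin, emergency department referral
    D = "overtreatment"           # full-dose aspirin, urgent cardiac catheterization


def option_rates(responses):
    """Fraction of responses choosing each option (hypothetical helper)."""
    if not responses:
        raise ValueError("need at least one response")
    counts = Counter(responses)
    return {option: counts[option] / len(responses) for option in Option}


def primary_measure(responses):
    """Rate of guideline-recommended care (Option C) for one patient."""
    return option_rates(responses)[Option.C]


def secondary_measure(responses):
    """Rate of unsafe undertreatment (A) minus rate of guideline care (C)."""
    rates = option_rates(responses)
    return rates[Option.A] - rates[Option.C]


# Hypothetical responses for one patient-actor, not study data.
example = [Option.A, Option.A, Option.C, Option.B]
print(primary_measure(example), secondary_measure(example))  # 0.25 0.25
```

Comparing these two numbers between the Black female and white male patients gives the between-patient disparities discussed in the results.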

In each trial, clinicians were randomized to one of four conditions: (i) network condition with the Black female patient; (ii) network condition with the white male patient; (iii) control (independent reflection) condition with the Black female patient or (iv) control condition with the white male patient. In the two network conditions, clinicians were randomly assigned to a single location in a large, anonymous uniform social network ( n = 40), in which every clinician had an equal number of connections ( z = 4), which ensured that no single clinician had greater power over the communication dynamics within the network 13 (see SI, Supplementary Fig. 4 for network details). Clinicians were anonymous and did not have any information about how many peers they were connected to. Clinicians’ contacts in the network remained the same throughout the experiment. In the two control conditions, clinicians provided their responses in isolation. Because clinicians in the control conditions were independent from one another, fewer overall clinicians were required for the control analyses ( n = 20 in each trial). For proper comparison with the experimental conditions, we randomly assigned clinicians in each control condition into bootstrapped “groups” of n = 40, and conducted our analyses at the group level (see SI , “Statistical Analyses”).
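The paper states only that each network had n = 40 clinicians, each with z = 4 contacts; the exact topology is in SI Supplementary Fig. 4. As a minimal sketch under those two constraints, the Python below builds one simple uniform network (a ring lattice in which every node links to its two nearest nodes on each side), randomly places clinicians on it, and forms a bootstrapped control "group" of 40 by resampling 20 independent clinicians with replacement. The ring-lattice topology, the with-replacement resampling, and all function names are assumptions for illustration, not the study's code.

```python
import random


def ring_lattice(n=40, z=4):
    """Uniform network: node i links to the z/2 nearest nodes on each side of a
    ring, so every node has exactly z contacts (an illustrative topology)."""
    if z % 2 or not 0 < z < n:
        raise ValueError("z must be even and between 1 and n - 1")
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        for k in range(1, z // 2 + 1):
            j = (i + k) % n
            neighbors[i].add(j)
            neighbors[j].add(i)
    return neighbors


def assign_positions(clinician_ids, neighbors, rng):
    """Randomly place each clinician at one node, fixed for the whole trial."""
    nodes = sorted(neighbors)
    if len(clinician_ids) != len(nodes):
        raise ValueError("need exactly one clinician per node")
    shuffled = list(clinician_ids)
    rng.shuffle(shuffled)
    return dict(zip(nodes, shuffled))


def bootstrap_group(control_ids, size, rng):
    """Resample independent control clinicians, with replacement, into a pseudo-group."""
    return rng.choices(control_ids, k=size)


rng = random.Random(2019)
network = ring_lattice(n=40, z=4)
positions = assign_positions([f"clinician-{i}" for i in range(40)], network, rng)
control_group = bootstrap_group([f"control-{i}" for i in range(20)], 40, rng)
```

Because every node has the same degree, no position gives one clinician more contacts than another, which is the property the text emphasizes.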

In all conditions, clinicians were given three rounds to provide their assessments and treatment recommendations for the presented patient. In the initial round, all clinicians independently viewed their respective videos and were then given two minutes to provide their clinical assessments and treatment recommendations. In the control conditions, clinicians remained isolated for two additional rounds of evaluation. In round two, they viewed the patient video a second time and were again given two minutes to respond. Clinicians could either provide the same responses or modify their responses. In the final round, clinicians repeated this procedure and provided their final responses. In the network conditions, in round two clinicians were again shown the patient video, were also shown the average assessment responses (i.e. diagnostic estimates) of their network contacts, and were then asked to provide their assessments and recommendations (see SI, Supplementary Fig. 6 ). Clinicians could either provide the same responses they gave in the initial round or modify their responses. In the final round, this procedure was repeated, this time showing the average responses from round two, and clinicians were asked to provide their final responses.
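In the network conditions, each clinician saw the average assessment of their contacts before responding again. The minimal sketch below computes that displayed value from the previous round's estimates. How each clinician then revised was their own choice, so the code models only what was shown, not a revision rule. The function name and the four-clinician example are hypothetical.

```python
from statistics import fmean


def peer_averages(estimates, neighbors):
    """Average of each clinician's contacts' estimates from the previous round.

    estimates: node -> risk estimate (0-100); neighbors: node -> set of contact nodes.
    """
    return {
        node: fmean([estimates[j] for j in contacts])
        for node, contacts in neighbors.items()
    }


# Hypothetical four-clinician cycle (each has two contacts) with round-one estimates.
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
round_one = {0: 10, 1: 20, 2: 30, 3: 40}
shown_in_round_two = peer_averages(round_one, cycle)
print(shown_in_round_two)  # {0: 30.0, 1: 20.0, 2: 30.0, 3: 20.0}
```

In the final round, the same function would be applied to the round-two estimates.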

Each trial lasted ~8 min. Participants’ compensation was based on their performance in the final round: only clinicians who provided the guideline-recommended treatment in their final responses were given a payment of $30, and clinicians who provided other responses were not compensated for their participation. In total, 86% of participating clinicians completed our study. (Analyses provided in the SI show that all of our results are robust to the inclusion or exclusion of attrited participants; see SI “Sensitivity Analyses”.)

We conducted seven independent trials of this study from March 1, 2019 to November 29, 2019. Except where explicitly noted, all statistical analyses were conducted at the trial level ( n = 7 trials × 4 experimental conditions = 28 trial-level observations). The conservative statistical approach we adopt here (reporting trial-level observations) reduces our power to detect effects of the experimental intervention, but it controls for the nonindependence among clinicians in the network conditions, enabling the direct comparison of each trial-level observation across all four experimental conditions. When individual-level analyses are presented using regression techniques, all standard errors are clustered at the trial level to preserve trial-level comparisons. (Additional analyses in the SI show that our findings are confirmed, and significantly strengthened, by individual-level regression analyses with clustered standard errors; see SI, Supplementary Tables 6 to 14 .) Except for the presence of peer information in the network conditions, participant experience was identical across all experimental conditions. Consequently, any significant differences across experimental conditions in the change in clinicians’ treatment recommendations (from initial to final response) can be attributed to the direct effects of peer interaction networks on clinicians’ decision-making.
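Trial-level analysis means each (trial, condition) cell contributes one number, giving 7 × 4 = 28 observations. A minimal sketch of that collapse is shown below, assuming individual responses arrive as (trial, condition, value) records; the record layout, condition labels, and placeholder values are hypothetical. The bootstrapped grouping the paper uses for the control conditions (SI, "Statistical Analyses") is not reproduced here.

```python
from collections import defaultdict
from statistics import fmean

CONDITIONS = ("network/Black female", "network/white male",
              "control/Black female", "control/white male")


def trial_level_means(records):
    """Collapse individual (trial, condition, value) records to one mean per cell."""
    cells = defaultdict(list)
    for trial, condition, value in records:
        cells[(trial, condition)].append(value)
    return {cell: fmean(values) for cell, values in cells.items()}


# Hypothetical records: 7 trials x 4 conditions x 3 clinicians, each scored 0 or 1
# (e.g. 1 = recommended Option C). Placeholder values, not study data.
records = [
    (trial, condition, (trial + k) % 2)
    for trial in range(1, 8)
    for condition in CONDITIONS
    for k in range(3)
]
observations = trial_level_means(records)
print(len(observations))  # 28
```

Any test is then run on these 28 cell means rather than on individual clinicians, which is what makes the approach conservative.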

We now present the results indicating the effects of social networks on clinicians’ revisions to their diagnostic assessments and their treatment recommendations. In the following analyses, diagnostic accuracy is based on the absolute difference, in percentage points, between a clinician’s diagnostic assessment and the most accurate diagnostic assessment. For clarity of presentation, we normalize diagnostic accuracy on a 0–1 scale by applying min-max normalization to the absolute error of clinicians’ diagnostic assessments. Under this procedure, the minimum possible accuracy (indicated by 0) corresponds to the diagnostic assessment with the greatest absolute error (i.e. an estimate that is as far as possible from the most accurate answer of 16%, which in this case is 84 percentage points), while the maximum possible accuracy (indicated by 1) corresponds to a diagnostic assessment that is 0 percentage points away from the most accurate answer, i.e. equal to it (SI, “Statistical Analyses”). As above, in the discussion of our results we refer to the patient-actors in the standardized patient videos as “patients”.
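Because the worst possible estimate on a 0–100 scale is 100 (84 points above 16), min-max normalizing the absolute error and flipping it gives accuracy = 1 − |x − 16| / 84. The minimal sketch below encodes that formula; the constant and function names are illustrative, not taken from the paper's code.

```python
MOST_ACCURATE = 16.0  # percent; HEART-score-based 30-day risk for this patient
# Largest possible absolute error on a 0-100 scale: max(16 - 0, 100 - 16) = 84.
MAX_ERROR = max(MOST_ACCURATE, 100.0 - MOST_ACCURATE)


def normalized_accuracy(estimate):
    """Min-max normalized accuracy: 1 = exactly 16%, 0 = 84 points away."""
    if not 0.0 <= estimate <= 100.0:
        raise ValueError("estimate must be a probability between 0 and 100")
    error = abs(estimate - MOST_ACCURATE)
    return 1.0 - error / MAX_ERROR


for estimate in (16, 0, 50, 100):
    print(estimate, round(normalized_accuracy(estimate), 3))
```

Note that an estimate of 0% still scores about 0.81, because it is only 16 points from the most accurate answer.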

Initial race and gender bias

Clinicians’ initial assessments and treatment recommendations were made independently. Figure 1 shows that for the initial responses of all clinicians in the study, there were no significant differences in the accuracy of the diagnostic assessments (Fig. 1a, b ) given to the Black female patient and the white male patient ( p > 0.5, n = 28, Wilcoxon Rank Sum Test, Two-sided); nor were there any significant differences in the accuracy of initial diagnostic assessments when controlling for experimental condition using a regression approach (β = 1.06, CI = [−3.79 to 5.92], p = 0.67, Supplementary Table 6 ). However, consistent with previous studies of bias in medical care 2 , 3 , 4 , 5 , 6 , despite clinicians providing both patients with similar diagnostic assessments, clinicians’ treatment recommendations varied significantly between patients. Across all clinicians, their initial treatment recommendations (Fig. 1c, d ) show a significant disparity in the rate at which the guideline-recommended treatment was recommended for the white male patient versus the Black female patient. Overall, clinicians recommended Option C, referral to the emergency department for immediate evaluation, for the white male patient in 22% of responses, while only making this recommendation for the Black female patient in 14% of responses ( p = 0.02, n = 28 observations, Wilcoxon Rank Sum Test, Two-sided).
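The between-patient comparisons above use a two-sided Wilcoxon rank-sum test on trial-level observations. As a minimal sketch of that test, the stdlib-only Python below ranks the pooled values (ties share the average rank) and uses the large-sample normal approximation without a tie correction. This is an assumed variant for illustration; the paper does not state which implementation it used, and an exact or tie-corrected version, as provided by statistical packages, can give somewhat different p-values at these sample sizes. The input values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                        # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2            # expected rank sum under the null
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))       # equals 2 * (1 - Phi(|z|))
    return z, p


# Hypothetical trial-level Option C rates, for illustration only.
white_male = [0.30, 0.25, 0.20, 0.15, 0.22, 0.18, 0.26]
black_female = [0.10, 0.12, 0.18, 0.05, 0.16, 0.14, 0.09]
z, p = rank_sum_test(white_male, black_female)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```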

Panels a and b show the change (from the initial assessment to the final assessment) in the average diagnostic accuracy of clinicians. Panel a shows the control conditions. Panel b shows the network conditions. The insets in both panels show the total improvement (in percentage points) in the accuracy of clinicians’ diagnostic assessments. Error bars display 95% confidence intervals; data points display the mean change for each of the trials ( N = 7) in each condition. Panels c and d show the change (from the initial recommendation to the final recommendation) in the proportion of clinicians recommending the guideline-recommended treatment, referral to the emergency department for immediate cardiac evaluation (Option C), for the white male patient-actor and Black female patient-actor. Panel c shows the control conditions. Panel d shows the network conditions. The insets in both panels show the total improvement (in percentage points) in the percent of clinicians recommending the guideline-recommended treatment. Error bars display 95% confidence intervals; data points display the mean change for each of the trials ( N = 7) in each condition. Panels e and f show the change (from the initial response to the final response) in the odds of clinicians recommending Option A (unsafe undertreatment) rather than Option C (highest quality, guideline-recommended treatment) for each patient-actor. Panel e shows the control conditions. Panel f shows the network conditions. The insets in both panels show the total reduction in the likelihood that clinicians would recommend unsafe undertreatment rather than the guideline-recommended treatment for each patient-actor. Error bars display 95% confidence intervals; data points display the mean change for each of the trials ( N = 7) in each condition.

In the control conditions (Fig. 1a ), after two rounds of revision there was no significant change in the accuracy of clinicians’ assessments (i.e. diagnostic estimates) for either the white male patient ( p > 0.9, n = 7, Fig. 1a inset, Wilcoxon Signed Rank Test, Two-sided) or the Black female patient ( p > 0.9, n = 7, Fig. 1a inset, Wilcoxon Signed Rank Test, Two-sided). Correspondingly, Fig. 1c shows that in the control conditions there was no significant change in the rate at which clinicians recommended the guideline-recommended treatment for either the Black female patient or the white male patient (Black female patient showed a 3 percentage point increase, p = 0.81, n = 7 observations, Wilcoxon Signed Rank Test, Two-sided; white male patient showed a 1 percentage point increase, p = 0.93, n = 7 observations, Wilcoxon Signed Rank Test, Two-sided; Fig. 1c ). Clinicians’ final treatment recommendations in the control conditions still showed a significant disparity between the white male patient and the Black female patient in their rates of referral to the emergency department ( p = 0.04, n = 14 observations, Wilcoxon Signed Rank Test, Two-sided; Fig. 1c ).
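The within-condition changes above are tested with a two-sided Wilcoxon signed-rank test on seven trial-level differences. With n = 7 there are only 2^7 = 128 ways to assign signs to the ranks, so the exact null distribution can be enumerated directly. The sketch below does that; dropping zero differences and sharing ranks among ties are standard conventions assumed here, since the paper does not state which software or variant it used. The example differences are hypothetical.

```python
from itertools import product


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def signed_rank_exact(differences):
    """Exact two-sided Wilcoxon signed-rank test via all 2**n sign patterns."""
    d = [v for v in differences if v != 0]      # zero differences are dropped
    if not d:
        raise ValueError("all differences are zero")
    ranks = average_ranks([abs(v) for v in d])
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    center = sum(ranks) / 2                     # expected W+ under the null
    observed = abs(w_plus - center)
    extreme = sum(
        1
        for signs in product((False, True), repeat=len(d))
        if abs(sum(r for r, s in zip(ranks, signs) if s) - center) >= observed - 1e-9
    )
    return w_plus, extreme / 2 ** len(d)


# Hypothetical final-minus-initial changes for seven trials, not study data.
changes = [0.05, 0.12, -0.02, 0.08, 0.15, 0.10, 0.03]
w_plus, p = signed_rank_exact(changes)
print(w_plus, round(p, 3))
```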

Networks reduce race and gender bias

Figure 1b shows that in the network conditions there were significant improvements (from the initial response to the final response) in the accuracy of the assessments given to both the white male patient ( p = 0.04, n = 7, Wilcoxon Signed Rank Test, Two-sided; Fig. 1b inset) and the Black female patient ( p = 0.01, n = 7 observations, Wilcoxon Signed Rank Test, Two-sided; Fig. 1b inset). Figure 1d shows that in the network conditions, after two rounds of revision there was no significant change in the rate at which clinicians recommended the guideline-recommended treatment for the white male patient ( p = 0.57, n = 7 observations, Wilcoxon Signed Rank Test, Two-sided; Fig. 1d inset). This lack of change reflects the fact that, regardless of the accuracy of their initial assessments for the white male patient, clinicians were initially significantly more likely to recommend the guideline-recommended treatment for the white male patient ( p < 0.01, OR = 1.78, CI = [1.2–2.6], Supplementary Table 7 ). Consequently, improvements in assessment accuracy for the white male patient had a smaller positive impact on clinicians’ likelihood of recommending the guideline-recommended treatment. By contrast, clinicians initially were significantly less likely to recommend the guideline-recommended treatment for the Black female patient ( p < 0.01, OR = 0.56, CI = [0.38–0.83], Supplementary Table 7 ), while they were significantly more likely to recommend unsafe undertreatment for this patient ( p < 0.05, OR = 1.5, CI = [1.08–2.04], Supplementary Table 8 ). Consequently, improvements in assessment accuracy had a substantially greater effect on the final treatment recommendations for the Black female patient (Fig. 1d ).
In the network conditions, the rate at which clinicians recommended the guideline-recommended treatment for the Black female patient increased significantly, from 14% in the initial response to 27% in the final response ( p < 0.01, n = 7 observations, Wilcoxon Signed Rank Test, Two-sided; Fig. 1d ). As a result, clinicians’ final treatment recommendations in the network conditions exhibited no significant disparity between the Black female patient and the white male patient in terms of referral rates to the emergency department ( p = 0.22, n = 14 observations, Wilcoxon Rank Sum Test, Two-sided; see Supplementary Table 11 ).
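The odds ratios quoted above (OR = 1.78 and 0.56) come from regressions in Supplementary Table 7 that control for experimental condition. For intuition only, the sketch below computes the crude, unadjusted odds ratio implied by the initial Option C rates reported earlier (22% for the white male patient versus 14% for the Black female patient). It gives about 1.73, or about 0.58 in the other direction: close to, but not the same as, the adjusted estimates. Function names are illustrative.

```python
def odds(p):
    """Convert a probability in (0, 1) to odds p / (1 - p)."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return p / (1.0 - p)


def odds_ratio(p1, p2):
    """Odds of the outcome in group 1 relative to group 2 (unadjusted)."""
    return odds(p1) / odds(p2)


# Initial Option C rates reported in the text: 22% (white male) vs 14% (Black female).
crude = odds_ratio(0.22, 0.14)
print(round(crude, 2), round(1 / crude, 2))
```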

The primary pathway for bias reduction in the network conditions was the effect of improvements in clinicians’ assessment accuracy on reducing the initially high rates at which unsafe undertreatment was recommended for the Black female patient. Figure 1e, f shows the odds of clinicians recommending unsafe undertreatment rather than the guideline-recommended treatment for both patients in both conditions. Consistent with the above discussion, treatment recommendations for the white male patient did not exhibit any bias toward unsafe undertreatment ( p = 0.19, n = 14, Wilcoxon Signed Rank Test, Two-sided). As expected, improvements in assessment accuracy in the network conditions did not significantly impact clinicians’ odds of recommending the guideline-recommended treatment rather than unsafe undertreatment for the white male patient ( p = 0.21, n = 7, Wilcoxon Signed Rank Test, Two-sided). By contrast, clinicians initially had significantly greater odds of recommending unsafe undertreatment rather than the guideline-recommended treatment for the Black female patient (Fig. 1e, f ; p < 0.01, n = 28 observations, Wilcoxon Signed Rank Test, Two-sided). Independent revision in the control conditions did not have any impact on the treatment recommendations for either the white male patient ( p = 1.0, n = 7, Wilcoxon Signed Rank Test, Two-sided) or the Black female patient ( p = 0.81, n = 7, Wilcoxon Signed Rank Test, Two-sided). However, assessment revisions in the network conditions led to a significant change in the odds of clinicians recommending the guideline-recommended treatment rather than unsafe undertreatment for the Black female patient (Fig. 1f , p = 0.01, n = 7, Wilcoxon Signed Rank Test, Two-sided). By the final round in the network conditions, there was no significant difference between patients in their odds of having clinicians recommend the guideline-recommended treatment rather than unsafe undertreatment (Fig. 1f , p = 0.19, n = 14, Wilcoxon Rank Sum Test, Two-sided).

Network mechanism for bias reduction

The network mechanism responsible for improvements in the accuracy of clinicians’ assessments, and the corresponding reduction of race and gender disparity in their treatment recommendations, is the disproportionate impact of accurate individuals in the process of belief revision within egalitarian social networks 13 , 15 , 16 . As demonstrated in earlier studies of networked collective intelligence 13 , 15 , 16 , during the process of belief revision in peer networks there is an expected correlation between the error of an individual’s initial belief and the magnitude of their belief revision, such that accurate individuals revise their responses less; this correlation between error and revision magnitude is referred to as the “revision coefficient” 13 . Within egalitarian social networks, a positive revision coefficient has been found to give greater de facto social influence to more accurate individuals, which is predicted to produce network-wide improvements in the accuracy of individual beliefs. These improvements in collective accuracy have been found to result in a corresponding reduction in biased responses among initially biased participants 12 , 13 , 15 , 16 . Figure 2a tests this prediction for clinicians in our study. The results show, as expected, that there is a significant positive revision coefficient among clinicians in the network conditions ( p < 0.001, r = 0.66, SE = 0.1, clustered by trial, Supplementary Table 14 ), indicating that less accurate clinicians made greater revisions to their responses while more accurate clinicians made smaller revisions, giving greater de facto influence in the social network to more accurate clinicians. This correlation holds equally for clinicians’ assessments for both the white male and Black female patients (Supplementary Table 14 ).
Figure 2b shows that for both patients, improvements in assessment accuracy led to significant improvements in the quality of clinicians’ treatment recommendations ( p < 0.05, OR = 1.04, CI = [1.00, 1.09], Supplementary Table 9 ). Importantly, for clinicians who initially recommended unsafe undertreatment (Option A), we find that improvements in assessment accuracy significantly predict an increased likelihood of recommending the guideline-recommended treatment (Option C) by the final round ( p < 0.01, OR = 1.17, CI = [1.03, 1.33], Supplementary Table 10 ). These improvements translated into a significant reduction in the inequity of recommended care for the Black female patient, for whom clinicians were initially significantly more likely to recommend unsafe undertreatment (see Fig. 3 , below).
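The revision coefficient in Fig. 2a relates how far a clinician's initial estimate was from 16% to how far that clinician moved by the final round. A minimal sketch, assuming a plain Pearson correlation between those two absolute differences, is below. The paper reports r = 0.66 with standard errors clustered by trial, which this sketch does not attempt; the function names and example estimates are hypothetical.

```python
import math

MOST_ACCURATE = 16.0  # percent


def pearson_r(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def revision_coefficient(initial, final, truth=MOST_ACCURATE):
    """Correlation between initial absolute error and absolute size of revision."""
    errors = [abs(e - truth) for e in initial]
    revisions = [abs(f - e) for e, f in zip(initial, final)]
    return pearson_r(errors, revisions)


# Hypothetical estimates: clinicians further from 16% move further.
initial = [16, 26, 46, 66]
final = [16, 21, 31, 41]
print(revision_coefficient(initial, final))
```

A positive value means the least accurate clinicians did most of the moving, which is what gives accurate clinicians their de facto influence.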

Panel a shows clinicians’ propensity to revise their diagnostic assessments in the network conditions according to the initial error in their diagnostic assessments. Clinicians’ accuracy is represented as the absolute number of percentage points of a given assessment from the most accurate assessment of 16% (represented by 0 along the x-axis, indicating a distance of 0 percentage points from the most accurate response). Magnitude of revision is measured as the absolute difference (percentage points) between a clinician’s initial diagnostic assessment and their final diagnostic assessment. Clinicians’ accuracy in their initial assessment significantly predicts the magnitude of their revisions from the initial to the final response. Grey error band displays 95% confidence intervals for the fit of an OLS model regressing magnitude of revision on initial error of diagnostic assessment. Panel b shows the significant positive relationship between the improvement in clinicians’ diagnostic accuracy (from the initial to final assessment) and their likelihood of improving their treatment recommendation (i.e. the probability of switching from recommending Option A, B, or D to Option C) for clinicians in the network conditions. The trend line shows the estimated probability of clinicians improving their treatment recommendations according to a logistic regression, controlling for an interaction between experimental condition (control or network) and patient-actor demographic (Black female or white male) (Supplementary Table 9 ). Error bars show standard errors clustered at the trial level.

Each panel shows the fraction of clinicians providing each treatment recommendation at the initial and final response, averaged first within each of the trials in each condition ( N = 7), and then averaged across trials. Option A. 1 week follow-up (unsafe undertreatment). Option B. Stress test in 2–3 days (undertreatment). Option C. Immediate emergency department evaluation (guideline-recommended treatment). Option D. Immediate cardiac catheterization (overtreatment). Panel a shows the change in control condition recommendations for the Black female patient-actor (initial recommendations light pink, final recommendations dark pink). Panel b shows the change in network condition recommendations for the Black female patient-actor (initial recommendations light pink, final recommendations dark pink). Panel c shows the change in control condition recommendations for the white male patient-actor (initial recommendations light blue, final recommendations dark blue). Panel d shows the change in network condition recommendations for the white male patient-actor (initial recommendations light blue, final recommendations dark blue).

Figure 3 shows the changing rates at which clinicians recommended each option (Option A. unsafe undertreatment, Option B. undertreatment, Option C. guideline-recommended treatment, and Option D. overtreatment) for each patient, from the initial response to the final response, for all conditions. As discussed above, we are particularly interested in the inequity of patient care, defined as the rate at which clinicians made a clearly unsafe recommendation (Option A) versus recommending the guideline-recommended treatment (Option C) 23 , 24 . Initial responses exhibited significant inequity between patients. Initially, across both conditions, 29.9% of clinicians recommended the unsafe undertreatment for the Black female patient, while only 14.1% recommended the guideline-recommended treatment, resulting in a 15.7 percentage point difference in the rate at which clinicians recommended unsafe undertreatment rather than the guideline-recommended treatment for the Black female patient. By contrast, for the white male patient, 23.4% of clinicians recommended the unsafe undertreatment, while 21.4% of clinicians recommended the guideline-recommended treatment, resulting in a 2 percentage point difference in the likelihood of clinicians recommending unsafe undertreatment rather than the guideline-recommended treatment for the white male patient. This resulted in a 13.7 percentage point difference between the Black female patient and the white male patient in their likelihood of having clinicians recommend unsafe undertreatment rather than the guideline-recommended treatment ( p = 0.02, n = 28 observations, Wilcoxon Rank Sum Test, Two-sided). Individual reflection did not reduce this inequity. The control conditions produced no significant change in the inequity between patients from the initial response to the final response ( p = 0.57, n = 14 observations, Wilcoxon Signed Rank Test, Two-sided). 
Accordingly, in the final response in the control conditions, there was a 15.3 percentage point difference between the Black female patient and the white male patient in their likelihood of having the clinician recommend unsafe undertreatment rather than the guideline-recommended treatment ( p = 0.04, n = 14 observations, Wilcoxon Rank Sum Test, Two-sided; see SI Eq. 2). Strikingly, however, improvements in diagnostic accuracy in the network conditions produced a 20 percentage point reduction in the rate at which clinicians recommended unsafe undertreatment rather than the guideline-recommended treatment for the Black female patient ( p = 0.04, n = 14 observations, Wilcoxon Rank Sum Test, Two-sided). By the final response in the network conditions, inequity was eliminated: the Black female patient was no longer more likely than the white male patient to have clinicians recommend unsafe undertreatment rather than the guideline-recommended treatment ( p = 0.16, n = 14 observations, Wilcoxon Rank Sum Test, Two-sided).
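The initial inequity figures are a difference of differences: (rate of Option A minus rate of Option C) for the Black female patient, minus the same quantity for the white male patient. Recomputing them from the rounded percentages, as in the short sketch below, gives 15.8 rather than the reported 15.7 points for the Black female patient, and 13.8 rather than 13.7 for the between-patient difference; the discrepancy presumably comes from rounding of the published rates. Variable and function names are illustrative.

```python
# Initial rates (percent of clinicians, pooled across conditions) reported in the text.
initial_rates = {
    "Black female": {"A": 29.9, "C": 14.1},
    "white male": {"A": 23.4, "C": 21.4},
}


def a_minus_c(rates):
    """Percentage-point excess of unsafe undertreatment (A) over guideline care (C)."""
    return rates["A"] - rates["C"]


gap_black_female = a_minus_c(initial_rates["Black female"])
gap_white_male = a_minus_c(initial_rates["white male"])
inequity = gap_black_female - gap_white_male
print(round(gap_black_female, 1), round(gap_white_male, 1), round(inequity, 1))
```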

Networks increase quality of care for all

Figure 3 (panels a–d) also shows that the network conditions improved the quality of clinical care recommended for both patients (white male and Black female). In particular, for both the Black female and white male patient, the network conditions produced significantly greater reductions in the proportion of clinicians recommending unsafe undertreatment (Option A) than the control conditions (a 1.6 percentage point reduction in the control conditions versus an 11.8 percentage point reduction in the network conditions; p < 0.01, n = 28 observations, Wilcoxon Signed Rank Test, Two-sided). This reduction in the recommendation of unsafe undertreatment (Option A) was associated with significant increases in recommendations for safer care for both patients. Although Option B was not the guideline-recommended treatment, it represents a safer treatment than Option A. Correspondingly, the network conditions significantly increased the proportion of clinicians recommending safer undertreatment (Option B) relative to the control conditions: a 6.5 percentage point increase in the network conditions versus a 3.5 percentage point reduction in the control conditions ( p = 0.03, n = 28 observations, Wilcoxon Signed Rank Test, Two-sided). Strikingly, the rate of overtreatment (i.e. Option D, unnecessary invasive procedure) for both patients decreased significantly in the network conditions, while it increased in the control conditions (a 2.8 percentage point reduction in the network conditions versus a 3.1 percentage point increase in the control conditions; p < 0.01, n = 28 observations, Wilcoxon Signed Rank Test, Two-sided).

These results reveal a tendency for clinicians in the control conditions to increase the acuity (i.e. “urgency”) of care for all patients as a result of independent reflection, leading to an increase in overtreatment. By contrast, in the network conditions, clinicians adjusted their recommendations toward safer, more equitable care for both patients, significantly reducing both unsafe undertreatment (Option A) and overtreatment (Option D). Additional sensitivity analyses show these findings to be robust to variations in clinicians’ characteristics 26 (see SI, “Sensitivity Analyses”).

Past experimental and epidemiologic studies of bias have reported changes in biased attitudes as a result of cognitive interventions (such as cultural competency training) 27, 28. However, these studies have been unable to demonstrate any effect of these interventions on clinical recommendations, or on the reduction of population-level disparities in clinical treatment by race and gender 6, 29.

We found that among a population of clinicians who initially exhibited significant bias in the provision of recommended treatment for a Black female versus a white male patient with chest pain, egalitarian communication networks significantly reduced disparities in treatment recommendations between the two patients. In particular, as a result of information exchange in structured peer networks, significantly fewer clinicians recommended unsafe undertreatment for the Black female patient. Consistent with our predictions about the effects of peer-network communication in reducing biased perceptions 12, 13, 15, 16, these findings suggest that clinical decision-making can be viewed through a behavioral and social lens rather than as a purely individual, rational process 30. Digital technologies may offer new institutional opportunities to connect clinicians in uniform information-sharing networks, particularly in the emerging fields of telemedicine and online clinical support networks 12, 31, 32, 33, 34, 35.

Our study design offered several advantages. First, by using identically clothed standardized patients, the same examination room backdrop, the same electrocardiogram exhibit, identical hand gestures and body language, and a single script, we minimized the influence on our results of individual patient differences (for example, in perceived socioeconomic status) and of other incidental factors such as patient affect (see SI, “Stimuli Design”). The only variation between patient conditions was the race and gender of the patient. These controls enabled us to identify bias in clinicians’ recommendations. Second, clinicians in our study were compensated only on the basis of the quality of their final recommendations, which created strong incentives to provide the highest-quality care. Finally, the use of several rounds of independent reflection in the control conditions ensured that any improvement in the quality of clinicians’ recommendations in the network conditions can be attributed directly to peer networks rather than to the opportunity for reflection.

As with all experimental settings, the controlled design of our study necessarily comes with some limitations. First, rather than in-person clinical visits, we used video recordings of actors portraying patients and a computerized survey instrument to assess clinical treatment recommendations for the management of cardiac chest pain. This enabled us to better isolate patient race and gender as the primary factors differing across patient conditions. Previous studies have demonstrated the external validity of case vignettes for assessing in-person clinical decision-making and treatment recommendations 33, 35. Further work has indicated that using standardized patient videos, rather than written vignettes, substantially increases the likelihood that clinician decision-making observed in a study will match clinical decision-making in real medical settings 36. Second, because of the studio time required to hire actors and to record closely matched videos for both patients (in clothing, posture, gesticulation, and pacing of speech), we were able to produce only a single patient video for each condition. A larger number of patient videos would be desirable in future studies, and we anticipate that future work will explore the extent to which additional factors may be relevant for understanding the impact of patients’ non-medical characteristics on the quality of their medical treatment. To support future studies, we have made the media resources constructed for this study publicly available for use by other scholars. A third limitation is that we recruited participants through social media and through an academic medical center. Clinicians who responded to our invitation were likely to be younger, and more likely to work in an academic practice, than the overall population of practicing clinicians in the US 37, 38. This suggests that the baseline bias detected in this study may differ in other populations. We also anticipate that the growing familiarity with social technologies among early-career clinicians will be a positive factor when considering the use of information-exchange networks to support bias reduction 38. Finally, clinicians in our study were required to select among four possible treatment options, which did not reflect the full breadth of potential clinical care options. However, these options were sufficient to distinguish recommendations for unsafe care from guideline-recommended care, revealing inequity in treatment recommendations according to the race and gender of the patient.

We found that independent reflection in the control conditions produced a consistent shift toward higher-acuity treatments 39, 40. Strikingly, this shift had no significant effect on reducing inequity, but it did significantly increase overtreatment. Peer networks may also be an effective approach to address overtreatment, an area of increasing concern in healthcare and a well-known issue for cardiac catheterization 41, 42, 43, 44, 45. Overtreatment can result in inappropriate care, which has implications not only for patient outcomes but also for healthcare costs. By potentially reducing over-testing and enabling clinicians to reach a guideline-recommended treatment decision more quickly, peer networks may reduce costs associated with diagnostic delays, inappropriate testing, and incorrect treatment 41, 42. While more work is needed to explore the economic implications of peer-network technologies for supporting clinical decisions, our findings suggest that there may be significant economic benefits to leveraging peer-network strategies to reduce both medical bias and patient mistreatment.

We anticipate that, beyond cardiovascular disease, structured peer communication networks may also be effective for reducing bias in other clinical settings known to suffer from race and gender disparities, such as the use of opioids in the management of acute pain 5, 46, imaging for back pain 47 and breast cancer 48, and the management of depression 49. Our findings suggest that bias in healthcare might be treated not only as a cognitive problem, but also as a problem of social norms, which may be addressed through peer networking strategies for bias reduction.

This research was approved by the Institutional Review Board at the University of Pennsylvania where this study was conducted, and it included informed consent by all participants in the study.

Debriefing materials

Immediately following completion of the study, all participants were provided debriefing materials that included the correct diagnostic estimate, the correct treatment recommendation, and a detailed explanation of the clinical case, along with supporting references. The debriefing text is as follows:

“For the risk estimate, the correct answer is: 16% chance of an adverse cardiac event within 30 days. For the treatment recommendation, the correct answer is: Option C: Full-dose aspirin and refer to the emergency department for evaluation and monitoring.

Explanation of the answer

The patient is at intermediate/moderate risk due to: (1) symptoms (discomfort with exertion, dyspnea), (2) history (concern for cardiac origin), (3) age (>65 years old), (4) EKG (T-wave inversion / flattening), (5) risk factors (hyperlipidemia). The patient has a HEART score of 5 (1 point for moderately suspicious history; 1 point for repolarization disturbance; 2 points for age >65; 1 point for 1–2 risk factors) without a troponin level. For a HEART score range from 4 to 6, the most accurate answer is 16% chance of an adverse cardiac event within 30 days. Even a mild troponin increase would place the patient at 7 points (or high risk). The recommendation for this patient who also has T-wave abnormalities is for same day troponin testing or further evaluation in the emergency department. The patient needs to be immediately evaluated for further risk stratification via cardiac enzymes or a same day non-invasive stress testing, and therefore option C is the preferred answer. Option A does not pursue necessary further evaluation. Option B delays this evaluation. Option D is not appropriate for an individual with intermediate risk.

Bosner, S. et al. Ruling out coronary artery disease in primary care: development and validation of a simple prediction rule. Can. Med. Assoc. J. 182 , 1295–1300 (2010).

Ebell, M. H. Evaluation of chest pain in primary care patients. Am. Fam. Physician 83 , 603–605 (2011).

Mahler, S. A. et al. The HEART pathway randomized trial. Circ. Cardiovasc. Qual. Outcomes 8 , 195–203 (2015).

Poldervaart, J. M. et al. Effect of using the HEART Score in patients with chest pain in the Emergency Department: a stepped-wedge, cluster randomized trial. Ann. Intern. Med. 166 , 689–697 (2017).
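The point tally in the debriefing explanation can be reproduced mechanically. The sketch below encodes the commonly published HEART bands (history, ECG, age, risk factors and troponin, each worth 0 to 2 points, with totals of 0–3, 4–6 and 7–10 read as low, moderate and high risk). The function names are hypothetical, the age of 66 is an illustrative stand-in for "over 65", and the code is an arithmetic illustration rather than a clinical tool.

```python
# HEART score: History, ECG, Age, Risk factors, Troponin (0-2 points each).
# Conventional risk bands for the total score.
RISK_TIERS = [(0, 3, "low"), (4, 6, "moderate"), (7, 10, "high")]


def age_points(age_years: int) -> int:
    """Age component: under 45 -> 0, 45-64 -> 1, 65 or older -> 2."""
    if age_years >= 65:
        return 2
    if age_years >= 45:
        return 1
    return 0


def heart_score(history: int, ecg: int, age_years: int,
                risk_factors: int, troponin: int = 0) -> int:
    """Total HEART score; troponin defaults to 0 when no level is available."""
    for name, points in (("history", history), ("ecg", ecg),
                         ("risk_factors", risk_factors), ("troponin", troponin)):
        if points not in (0, 1, 2):
            raise ValueError(f"{name} must be 0, 1 or 2, got {points}")
    return history + ecg + age_points(age_years) + risk_factors + troponin


def risk_tier(score: int) -> str:
    """Map a total score (0-10) to its conventional risk band."""
    for low, high, label in RISK_TIERS:
        if low <= score <= high:
            return label
    raise ValueError(f"score out of range: {score}")


# Debriefing case: moderately suspicious history (1), repolarization
# disturbance (1), age over 65 (2; 66 is illustrative), one to two risk
# factors (1), no troponin level available.
score = heart_score(history=1, ecg=1, age_years=66, risk_factors=1)
print(score, risk_tier(score))  # 5 moderate
```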

Recruitment

A total of 840 clinicians from across the US were recruited to participate in a diagnostic challenge facilitated by a mobile application designed for this study, called “DxChallenge.” Clinicians were recruited between March 1 and November 29, 2019 through online discussion boards on Reddit and Facebook’s advertising platforms, as well as through Penn Medicine’s Graduate Medical Education training program (for resident MD clinicians). Each advertisement directed clinicians to a webpage that described the purpose of the research, the eligibility requirements, and the research compensation. The webpage provided links to Google Play or the Apple App Store, where participants could enroll by downloading the “DxChallenge” app for free. When registering in the app, participants were required to enter a valid email address and a valid 10-digit National Provider Identifier (NPI), the unique identification number assigned to healthcare providers in the US. The webpage informed clinicians that each diagnostic challenge would be announced via a push notification on their phone, which they could tap to enter the trial. The “DxChallenge” app was developed by the authors solely for the purpose of conducting this study, and its use for this research complies with the app’s terms of use.
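The text does not say how NPI validity was checked. One plausible format check, sketched below as an assumption rather than the app's actual logic, uses the NPI's built-in check digit: the tenth digit is a Luhn check digit computed over the first nine digits prefixed with the constant 80840. The function names are hypothetical, and a passing check only shows that a number is well formed; confirming that it belongs to an active provider would require a lookup in the NPPES NPI registry.

```python
import re


def luhn_checksum_ok(digits: str) -> bool:
    """Standard Luhn test: double every second digit counting from the right."""
    total = 0
    for position, ch in enumerate(reversed(digits)):
        value = int(ch)
        if position % 2 == 1:  # the check digit itself sits at position 0
            value *= 2
            if value > 9:
                value -= 9  # same as summing the two digits of the product
        total += value
    return total % 10 == 0


def is_valid_npi(npi: str) -> bool:
    """Format and check-digit test for a 10-digit NPI (not a registry lookup)."""
    if not re.fullmatch(r"[0-9]{10}", npi):
        return False
    # Hypothetical helper usage: the NPI check digit is a Luhn digit over the
    # number prefixed with 80840.
    return luhn_checksum_ok("80840" + npi)


print(is_valid_npi("1234567893"))  # True: 3 is the correct check digit for 123456789
print(is_valid_npi("1234567890"))  # False: wrong check digit
```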

Experimental design

To initiate a trial, the DxChallenge app sent push notifications to all clinicians who had registered for the study (Supplementary Fig. 3). Once 120 clinicians had responded, they were randomized to conditions in a 2:1 ratio: 80 clinicians to the network conditions and 40 clinicians to the control conditions (Supplementary Fig. 1). The 80 clinicians randomized to the network conditions were then randomized in a 1:1 ratio to the two network conditions (a standardized patient video of a white male patient-actor, or a standardized patient video of a Black female patient-actor). The 40 clinicians in the control conditions were likewise randomized in a 1:1 ratio to the two control conditions (the same two standardized patient videos). All randomizations were automated through the app.
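The two-stage allocation described above can be sketched as follows. This is an illustration, not the DxChallenge app's code: the function name, the group labels, and the choice to shuffle once and slice are assumptions. Slicing a single uniform shuffle yields a uniformly random split with the fixed group sizes described in the text (40 + 40 network, 20 + 20 control).

```python
import random


def allocate_trial(clinician_ids, rng):
    """Hypothetical two-stage allocation of one trial's 120 responders.

    Stage 1: 2:1 split into network (80) and control (40) arms.
    Stage 2: 1:1 split of each arm between the two patient videos.
    """
    ids = list(clinician_ids)
    if len(ids) != 120:
        raise ValueError("this sketch assumes exactly 120 responders")
    rng.shuffle(ids)  # one uniform shuffle; fixed-size slices are then random
    network, control = ids[:80], ids[80:]  # 2:1 allocation
    return {
        ("network", "white_male"): network[:40],  # 1:1 within the network arm
        ("network", "black_female"): network[40:],
        ("control", "white_male"): control[:20],  # 1:1 within the control arm
        ("control", "black_female"): control[20:],
    }


groups = allocate_trial(range(120), random.Random(2019))
print({key: len(members) for key, members in groups.items()})
```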

In the control conditions, clinicians were isolated and not embedded in social networks. In the network conditions, clinicians were randomly assigned to a single position in a large uniform social network (n = 40) in which every clinician had four anonymous network contacts (Supplementary Fig. 4). Each network of 40 formed an interconnected chain of clinicians, and clinicians’ contacts in the network remained the same throughout the experiment. This created a structurally uniform network, defined as a topology in which every clinician has an equal number of connections (z = 4), which ensured that no single clinician had greater power over the communication dynamics within the network 13, 16. More technically, for the network condition we generated a random k-regular graph in which every node had exactly four connections: we first generated a k-regular lattice (k = 4) and then randomly rewired each connection while ensuring that every node retained exactly four connections. Clinicians in the network condition were then randomly assigned to positions within this randomly generated egalitarian network. The same network topology was used across all trials in the network condition.
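A minimal sketch of the network construction, assuming the random rewiring step is implemented as degree-preserving double-edge swaps (the text does not name the rewiring procedure): start from a ring lattice in which each of the 40 nodes links to its two nearest neighbours on each side, then repeatedly exchange endpoints between two randomly chosen edges, rejecting any swap that would create a self-loop or a duplicate edge. Every swap leaves all degrees unchanged, so the result stays 4-regular. The function names are hypothetical; NetworkX offers comparable building blocks in `networkx.random_regular_graph` and `networkx.double_edge_swap`.

```python
import random


def ring_lattice(n, k):
    """Ring in which each node links to its k/2 nearest neighbours on each side."""
    if k % 2 or k >= n:
        raise ValueError("k must be even and smaller than n")
    return [frozenset((i, (i + j) % n)) for i in range(n) for j in range(1, k // 2 + 1)]


def degree_preserving_rewire(edges, n_swaps, rng, max_tries=100_000):
    """Randomize a graph by double-edge swaps, which keep every node's degree.

    Each swap replaces edges {a, b} and {c, d} with {a, d} and {c, b}.
    """
    edge_list = list(edges)
    edge_set = set(edge_list)
    swaps = tries = 0
    while swaps < n_swaps and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = sorted(edge_list[i])
        c, d = sorted(edge_list[j])
        if rng.random() < 0.5:
            c, d = d, c  # consider both ways of reconnecting the endpoints
        if len({a, b, c, d}) < 4:
            continue  # a shared endpoint would give a self-loop or duplicate
        new_i, new_j = frozenset((a, d)), frozenset((c, b))
        if new_i in edge_set or new_j in edge_set:
            continue  # would create a multi-edge
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {new_i, new_j}
        edge_list[i], edge_list[j] = new_i, new_j
        swaps += 1
    return edge_list


def degrees(edges, n):
    """Degree of each node 0..n-1."""
    deg = [0] * n
    for edge in edges:
        for node in edge:
            deg[node] += 1
    return deg


rng = random.Random(7)
lattice = ring_lattice(40, 4)
network = degree_preserving_rewire(lattice, n_swaps=len(lattice), rng=rng)
print(len(network), set(degrees(network, 40)))  # 80 {4}
```

The sketch does not check that the rewired graph remains connected; a generator meant to reproduce the interconnected chain described above would need to reject or repair disconnected results.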

Each clinician viewed a standardized patient video of either a white male or a Black female patient-actor, and provided clinical assessments and treatment recommendations for the depicted clinical case (see Supplementary Figs. 5 and 6). Both the white male and the Black female “patients” in the videos were portrayed by professional actors who appeared to be 65 years old, were dressed in identical attire, and depicted a patient with clinically significant chest pain symptoms. The patient-actors were recruited through a local casting company in Philadelphia (Kathy Wickline Casting). Two researchers from the team reviewed an initial pool of 20 actors’ resumes and photos. Two Black female and four white male actors were invited to send in a test video in which they narrated the female or male patient script. All researchers reviewed the test videos, discussed the actors’ acting quality and comparability in patient characteristics, and reached a consensus to select one Black female and one white male actor for the experiment. The two actors came to the media production studio of the Annenberg School for Communication on February 27, 2019, where they were given the same clothes and the same light make-up so that their appearance would be comparable. All videos were filmed by a professional film crew at the studio on the same day. Hereafter, we refer to the patient-actors in the standardized patient videos as “patients”.

In all four conditions, clinicians were asked to provide an initial evaluation of the patient video. All clinicians first viewed the video independently and were then given two minutes to respond to the assessment and recommendation questions. Clinicians in all conditions viewed the same clinical vignette (see SI for a full description of the vignette; Supplementary Figs. 5–7). Every aspect of the vignette was held constant across conditions except the race and gender of the patient in the video. Regardless of the patient’s demographics, the patient wore the same clothing in the same environment and reported their symptoms using the same script (see “Stimuli Design” in the SI for full details of the vignette’s structure). All stimuli are publicly available for use in future research at https://github.com/drguilbe/cliniciansCI .

The vignette was displayed in the app, and the patient’s symptoms were communicated by the patient-actor in an embedded video (Supplementary Fig. 5). In each round, clinicians were given a question concerning the medical status of the patient and were asked to enter a diagnostic assessment in the “provide estimate” field. The “Clinical Recommendation” field provided a dropdown menu from which clinicians selected a clinical recommendation for the patient in the vignette. The case description for each vignette was designed in consultation with clinicians to represent the type of question that clinicians regularly face in board exams or continuing medical education exams, where the question has a preferred answer both for the probability of the specific condition and for the proper clinical recommendation for patient management.

In round one, each clinician was asked to enter a diagnostic assessment and to choose a treatment from a set of options in a dropdown menu (Supplementary Fig. 5). In rounds two and three of the control condition, clinicians were shown the same vignette and asked to answer the same question on their own, with no change to the user experience (Supplementary Fig. 6). In rounds two and three of the network condition, clinicians were shown the average answer of the clinicians they were connected to in the social network structured through the DxChallenge app, and they were again asked to provide a diagnostic assessment and to select a treatment option (Supplementary Fig. 6). The participant experience was identical between the control and network conditions, except that participants in the network condition were exposed to the average assessment of the other clinicians they were connected to in the network. If a participant attempted to advance to the next round without entering a diagnostic assessment or a treatment choice, a message appeared telling them that all required responses had to be entered before advancing. Each trial lasted 8 min. Only clinicians whose final response gave the guideline-recommended clinical recommendation received a financial reward of $30; clinicians who gave incorrect responses were not compensated for their participation.
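In the network condition's second and third rounds, the feedback each clinician sees is an average over their four contacts. The sketch below computes that quantity for the numeric diagnostic estimate, assuming a plain arithmetic mean of the contacts' most recent answers; the text does not say how the categorical treatment choice was summarized, so the sketch leaves it out. Function and variable names are hypothetical.

```python
from statistics import mean


def neighbour_averages(edges, estimates):
    """Return, for each node, the mean estimate of its network contacts.

    `edges` holds 2-element node pairs; `estimates` maps node -> number.
    Nodes without contacts are omitted from the result.
    """
    contacts = {node: [] for node in estimates}
    for edge in edges:
        u, v = tuple(edge)
        contacts[u].append(v)
        contacts[v].append(u)
    return {
        node: mean([estimates[other] for other in others])
        for node, others in contacts.items()
        if others
    }


# Toy 4-cycle 0-1-2-3-0 with hypothetical risk estimates (percent).
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
shown = neighbour_averages(square, {0: 10, 1: 20, 2: 30, 3: 40})
print(shown)  # nodes 0 and 2 are shown 30; nodes 1 and 3 are shown 20
```

Because every clinician in the study's network had exactly four contacts, each displayed average would weight the same number of peers, which is the uniformity the design aims for.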

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The data collected for this study are available for download from the Network Dynamics Group website: https://ndg.asc.upenn.edu/experiments/physician-reasoning/ . The data are also available at https://github.com/drguilbe/cliniciansCI . Source data are provided with this paper.

Code availability

The code for this study was written in R, and is deposited on GitHub, available at: https://github.com/drguilbe/cliniciansCI .

Chen, J., Rathore, S. S., Radford, M. J., Wang, Y. & Krumholz, H. M. Racial differences in the use of cardiac catheterization after acute myocardial infarction. N. Engl. J. Med. 344 , 1443–1449 (2001).

Dehon, E. et al. A Systematic review of the impact of physician implicit racial bias on clinical decision making. Acad. Emerg. Med. 24 , 895–904 (2017).

FitzGerald, C. & Hurst, S. Implicit bias in healthcare professionals: a systematic review. BMC Med. Ethics 18 , 19 (2017).

Hall, W. J. et al. Implicit racial/ethnic bias among health care professionals and its influence on health care outcomes: a systematic review. Am. J. Public Health 105 , E60–E76 (2015).

Hirsh, A. T., Hollingshead, N. A., Ashburn-Nardo, L. & Kroenke, K. The interaction of patient race, provider bias, and clinical ambiguity on pain management decisions. J. Pain 16 , 558–568 (2015).

Maina, I. W., Belton, T. D., Ginzberg, S., Singh, A. & Johnson, T. J. A decade of studying implicit racial/ethnic bias in healthcare providers using the implicit association test. Soc. Sci. Med. 199 , 219–229 (2018).

Schulman, K. A. et al. The effect of race and sex on physicians’ recommendations for cardiac catheterization. N. Engl. J. Med. 340 , 618–626 (1999).

Burgess, D. J., Beach, M. C. & Saha, S. Mindfulness practice: a promising approach to reducing the effects of clinician implicit bias on patients. Patient Educ. Couns. 100 , 372–376 (2017).

Croskerry, P. The importance of cognitive errors in diagnosis and strategies to minimize them. Acad. Med. 78 , 775–780 (2003).

Sherbino, J., Kulasegaram, K., Howey, E. & Norman, G. Ineffectiveness of cognitive forcing strategies to reduce biases in diagnostic reasoning: a controlled trial. CJEM 16 , 34–40 (2014).

Sukhera, J., Gonzalez, C. & Watling, C. J. Implicit bias in health professions: from recognition to transformation. Acad. Med. 95 , 717–723 (2020).

Guilbeault, D. & Centola, D. Networked collective intelligence improves dissemination of scientific information regarding smoking risks. PLoS ONE 15 , e0227813 (2020).

Becker, J., Brackbill, D. & Centola, D. Network dynamics of social influence in the wisdom of crowds. Proc. Natl Acad. Sci. USA 114 , E5070–E5076 (2017).

Becker, J., Guilbeault, D. & Smith, E. B. The crowd classification problem. Manag. Sci. https://doi.org/10.1287/mnsc.2021.4127 (2021).

Becker, J., Porter, E. & Centola, D. The wisdom of partisan crowds. Proc. Natl Acad. Sci. USA 116 , 10717–10722 (2019).

Guilbeault, D., Becker, J. & Centola, D. Social learning and partisan bias in the interpretation of climate trends. Proc. Natl Acad. Sci. USA 115 , 9714–9719 (2018).

Elia, F., Apra, F. & Crupi, V. Understanding and improving decisions in clinical medicine (II): making sense of reasoning in practice. Intern. Emerg. Med. 13 , 287–289 (2018).

Kurvers, R. H. et al. Boosting medical diagnostics by pooling independent judgments. Proc. Natl Acad. Sci. USA 113 , 8777–8782 (2016).

Pauker, S. G. & Kassirer, J. P. The threshold approach to clinical decision making. N. Engl. J. Med. 302 , 1109–1117 (1980).

Brush, J. E., Lee, M., Sherbino, J., Taylor-Fishwick, J. C. & Norman, G. Effect of teaching bayesian methods using learning by concept vs learning by example on medical students’ ability to estimate probability of a diagnosis: a randomized clinical trial. JAMA Netw. Open 2 , e1918023 (2019).

Centor, R. M., Geha, R. & Manesh, R. The pursuit of diagnostic excellence. JAMA Netw. Open 2 , e1918040 (2019).

Byrne, C., Toarta, C., Backus, B. & Holt, T. The HEART score in predicting major adverse cardiac events in patients presenting to the emergency department with possible acute coronary syndrome: protocol for a systematic review and meta-analysis. Syst. Rev. 7 , 148 (2018).

Fernando, S. M. et al. Prognostic accuracy of the HEART score for prediction of major adverse cardiac events in patients presenting with chest pain: a systematic review and meta-analysis. Acad. Emerg. Med. 26 , 140–151 (2019).

Amsterdam, E. A. et al. 2014 AHA/ACC Guideline for the Management of Patients with Non-ST-Elevation Acute Coronary Syndromes: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. J. Am. Coll. Cardiol. 64 , e139–e228 (2014).

Napoli, A. M., Choo, E. K., Dai, J. & Desroches, B. Racial disparities in stress test utilization in an emergency department chest pain unit. Crit. Pathw. Cardiol. 12 , 9–13 (2013).

Krockow, E. M. et al. Harnessing the wisdom of crowds can improve guideline compliance of antibiotic prescribers and support antimicrobial stewardship. Sci. Rep. 10 , 18782 (2020).

Butler, M. et al. in Comparative Effectiveness Reviews, No. 170 (Agency for Healthcare Research and Quality, 2016).

Zeidan, A. J. et al. Implicit bias education and emergency medicine training: step one? awareness. AEM Educ. Train. 3 , 81–85 (2019).

Abimanyi-Ochom, J. et al. Strategies to reduce diagnostic errors: a systematic review. BMC Med. Inf. Decis. Mak. 19 , 174 (2019).

Henriksen, K. & Brady, J. The pursuit of better diagnostic performance: a human factors perspective. BMJ Qual. Saf. 22 , ii1–ii5 (2013).

Barnett, M. L., Boddupalli, D., Nundy, S. & Bates, D. W. Comparative accuracy of diagnosis by collective intelligence of multiple physicians vs individual physicians. JAMA Netw. Open 2 , e190096 (2019).

Chaudhry, B. et al. Systematic review: impact of health information technology on quality, efficiency, and costs of medical care. Ann. Intern. Med. 144 , 742–752 (2006).

Evans, S. C. et al. Vignette methodologies for studying clinicians’ decision-making: validity, utility, and application in ICD-11 field studies. Int. J. Clin. Health Psychol. 15 , 160–170 (2015).

Khoong, E. C. et al. Impact of digitally acquired peer diagnostic input on diagnostic confidence in outpatient cases: a pragmatic randomized trial. J. Am. Med. Inf. Assoc. 28 , 632–637 (2021).

Veloski, J., Tai, S., Evans, A. S. & Nash, D. B. Clinical vignette-based surveys: a tool for assessing physician practice variation. Am. J. Med. Qual. 20 , 151–157 (2005).

Roland, D., Coats, T. & Matheson, D. Towards a conceptual framework demonstrating the effectiveness of audiovisual patient descriptions (patient video cases): a review of the current literature. BMC Med. Educ. 12 , 125 (2012).

Adilman, R. et al. Social media use among physicians and trainees: results of a national medical oncology physician survey. J. Oncol. Pract. 12 , 79–80 (2016).

Cooper, C. P. et al. Physicians who use social media and other internet-based communication technologies. J. Am. Med. Inf. Assoc. 19 , 960–964 (2012).

Lambe, K. A., Hevey, D. & Kelly, B. D. Guided reflection interventions show no effect on diagnostic accuracy in medical students. Front. Psychol. 9 , 2297 (2018).

Monteiro, S. D. et al. Reflecting on diagnostic errors: taking a second look is not enough. J. Gen. Intern. Med. 30 , 1270–1274 (2015).

Carroll, A. E. The high costs of unnecessary care. JAMA 318 , 1748–1749 (2017).

Hess, P. L. et al. Appropriateness of percutaneous coronary interventions in patients with stable coronary artery disease in US Department of Veterans Affairs Hospitals from 2013 to 2015. JAMA Netw. Open 3 , e203144 (2020).

Ko, D. T. et al. Regional variation in cardiac catheterization appropriateness and baseline risk after acute myocardial infarction. J. Am. Coll. Cardiol. 51 , 716–723 (2008).

Lyu, H. et al. Overtreatment in the United States. PLoS ONE 12 , e0181970 (2017).

Jayles, B. & Kurvers, R. Exchanging small amounts of opinions outperforms sharing aggregated opinions of large crowds. arXiv https://arxiv.org/pdf/2003.06802.pdf (2020).

Pletcher, M. J., Kertesz, S. G., Kohn, M. A. & Gonzales, R. Trends in opioid prescribing by race/ethnicity for patients seeking care in US emergency departments. JAMA 299 , 70–78 (2008).

Burgess, D. J. et al. The effect of cognitive load and patient race on physicians’ decisions to prescribe opioids for chronic low back pain: a randomized trial. Pain Med. 15 , 965–974 (2014).

Rauscher, G. H., Allgood, K. L., Whitman, S. & Conant, E. Disparities in screening mammography services by race/ethnicity and health insurance. J. Women’s Health 21 , 154–160 (2012).

Harris, P. A. The impact of age, gender, race, and ethnicity on the diagnosis and treatment of depression. J. Manag. Care Pharm. 10 , S2–S7 (2004).

Acknowledgements

D.C. gratefully acknowledges support from a Robert Wood Johnson Foundation Pioneer Grant, #73593 (D.C.), and thanks Alan Wagner for App development assistance and Joshua Becker for helpful contributions to App development.

Author information

Authors and affiliations

Annenberg School for Communication, University of Pennsylvania, Philadelphia, PA, 19106, USA

Damon Centola

School of Engineering, University of Pennsylvania, Philadelphia, PA, 19106, USA

Department of Sociology, University of Pennsylvania, Philadelphia, PA, 19106, USA

Network Dynamics Group, University of Pennsylvania, Philadelphia, PA, 19106, USA

Damon Centola, Douglas Guilbeault, Urmimala Sarkar, Elaine Khoong & Jingwen Zhang

Haas School of Business, University of California, Berkeley, Berkeley, CA, 94720, USA

Douglas Guilbeault

Division of General Internal Medicine, University of California, San Francisco, San Francisco, CA, 94110, USA

Urmimala Sarkar & Elaine Khoong

Department of Communication, University of California, Davis, Davis, CA, 95616, USA

Jingwen Zhang

Contributions

D.C. designed the project, D.C., E.K., and U.S. developed the research materials, D.G. and J.W. ran the experiments, D.C., D.G., E.K., and J.W. conducted the analyses, D.C. wrote the manuscript. All authors commented on and approved the final manuscript.

Corresponding author

Correspondence to Damon Centola .

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Communications thanks Chloe Fitzgerald, Dhruv Kazi, Ralf Kurvers and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article

Centola, D., Guilbeault, D., Sarkar, U. et al. The reduction of race and gender bias in clinical treatment recommendations using clinician peer networks in an experimental setting. Nat Commun 12 , 6585 (2021). https://doi.org/10.1038/s41467-021-26905-5

Received : 22 July 2020

Accepted : 28 October 2021

Published : 15 November 2021

DOI : https://doi.org/10.1038/s41467-021-26905-5

For the Gender-Career IAT, implicit measures include 34 662 women and 7624 men; explicit measures, 34 835 women and 7675 men. For the Gender-Specialty IAT, implicit and explicit measures included 45 women and 85 men. Error bars represent SE. Standard errors for the Gender-Career IAT data are so small that they are not visible on the graph.

Explicit bias scores are calculated as the difference between the responses to 2 self-reported items about participants’ associations of gender with career and family (Gender-Career Implicit Association Test [IAT]) or with surgery and family medicine (Gender-Specialty IAT).

eTable 1. IAT Design for Gender and Surgery vs Family Medicine

eTable 2. Regression Analysis Predicting Implicit and Explicit Bias From the Gender-Specialty IAT

Implicit Bias in Surgery JAMA Network Open Invited Commentary July 5, 2019 Fahima Dossa, MD; Nancy N. Baxter, MD, PhD



Salles A , Awad M , Goldin L, et al. Estimating Implicit and Explicit Gender Bias Among Health Care Professionals and Surgeons. JAMA Netw Open. 2019;2(7):e196545. doi:10.1001/jamanetworkopen.2019.6545


Estimating Implicit and Explicit Gender Bias Among Health Care Professionals and Surgeons

1 Section of Minimally Invasive Surgery, Department of Surgery, Washington University in St Louis, St Louis, Missouri
2 Medical student, School of Medicine, Washington University in St Louis, St Louis, Missouri
3 Department of Psychological and Brain Sciences, Washington University in St Louis, St Louis, Missouri

Question Do surgeons and health care professionals hold implicit or explicit biases regarding gender and career roles?

Findings A review of 42 991 Implicit Association Test records and a cross-sectional study of 131 surgeons provided evidence of implicit and explicit gender bias. Data suggest that health care professionals and surgeons hold implicit and explicit biases associating men with careers and surgery and women with family and family medicine.

Meaning This work contributes an estimate of the extent of implicit gender bias within medicine; awareness of bias, such as through an Implicit Association Test, is an important first step toward minimizing its potential effect.

Importance The Implicit Association Test (IAT) is a validated tool used to measure implicit biases, which are mental associations shaped by one’s environment that influence interactions with others. Direct evidence of implicit gender biases about women in medicine has not yet been reported, but existing evidence is suggestive of subtle or hidden biases that affect women in medicine.

Objectives To use data from IATs to assess (1) how health care professionals associate men and women with career and family and (2) how surgeons associate men and women with surgery and family medicine.

Design, Setting, and Participants This data review and cross-sectional study collected data from January 1, 2006, through December 31, 2017, from self-identified health care professionals taking the Gender-Career IAT hosted by Project Implicit. A novel Gender-Specialty IAT was also tested at a national surgical meeting in October 2017. All health care professionals who completed the Gender-Career IAT were eligible for the first analysis. Surgeons of any age, gender, title, and country of origin at the meeting were eligible to participate in the second analysis. Data were analyzed from January 1, 2018, through March 31, 2019.

Main Outcomes and Measures Measure of implicit bias derived from reaction times on the IATs and a measure of explicit bias asked directly to participants.

Results Almost 1 million IAT records from Project Implicit were reviewed, and 131 surgeons (64.9% men; mean [SD] age, 42.3 [11.5] years) were recruited to complete the Gender-Specialty IAT. Health care professionals (n = 42 991; 82.0% women; mean [SD] age, 32.7 [11.8] years) held implicit (mean [SD] D score, 0.41 [0.36]; Cohen d = 1.14) and explicit (mean [SD], 1.43 [1.85]; Cohen d = 0.77) biases associating men with career and women with family. Similarly, surgeons implicitly (mean [SD] D score, 0.28 [0.37]; Cohen d = 0.76) and explicitly (men: mean [SD], 1.27 [0.39]; Cohen d = 0.93; women: mean [SD], 0.73 [0.35]; Cohen d = 0.53) associated men with surgery and women with family medicine. There was broad evidence of consensus across social groups in implicit and explicit biases, with one exception. Women in health care (mean [SD], 1.43 [1.86]; Cohen d = 0.77) and surgery (mean [SD], 0.73 [0.35]; Cohen d = 0.53) were less likely than men to explicitly associate men with career ( B coefficient, −0.10; 95% CI, −0.15 to −0.04; P < .001) and surgery ( B coefficient, −0.67; 95% CI, −1.21 to −0.13; P = .001) and women with family and family medicine.

Conclusions and Relevance The main contribution of this work is an estimate of the extent of implicit gender bias within surgery. On both the Gender-Career IAT and the novel Gender-Specialty IAT, respondents had a tendency to associate men with career and surgery and women with family and family medicine. Awareness of the existence of implicit biases is an important first step toward minimizing their potential effect.

Enrollment of women in medical school has been nearly equivalent to that of men in the United States since 1999 1 and has recently surpassed that of men for the first time. 2 Despite this apparent equality, as of 2017 only 41% of all faculty and approximately 24% of full professors were women. 3 These gaps are even larger when looking at department chairs: only 14% are women. 4 Many factors likely contribute to women’s lack of equal representation in medical careers beyond medical school. Perhaps academic medical careers are less interesting or attractive to women than they are to men, or maybe pressures within medical training and academics favor men over women.

Implicit biases, or mental associations outside of conscious awareness or control that influence one’s interactions with others, 5 may hinder the advancement of women in medicine. Sometimes, implicit biases lead people to act in ways that are not in line with their explicit beliefs or values. 6 For example, one may explicitly believe that men and women are equally good at math. However, implicitly or unconsciously, one might be more likely to associate math with men than with women. These biases are shaped by the environment in which we live and are only weakly related to one’s conscious attitudes or beliefs. Importantly, implicit biases are associated with behaviors in socially sensitive contexts, such as interracial interactions. 7 , 8

Direct evidence of implicit biases concerning women in medicine has not yet been reported, to our knowledge, but existing evidence is suggestive of subtle or hidden biases. For example, women physicians are often addressed as Nurse instead of Doctor or are introduced by their first name rather than their title. 9 A study from 2016 showed that Medicare reimbursements to female physicians are lower than reimbursements to male physicians. 10 When Silver et al 11 tracked society awards given out since 1945, they found that many societies had never given an award to a woman. Women are also less likely than men to be invited to give grand rounds, particularly as an outside speaker. 12 One might argue that these discrepancies are due to women being less competent than men. However, these biases persist even in experiments in which candidates are matched on qualifications but differ in gender. For example, despite identical qualifications on a curriculum vitae, evaluators perceive male applicants to be more hirable and worthy of higher salaries than female applicants. 13 Together, these data suggest that bias is an important factor that impedes women’s success in medicine.

The Implicit Association Test (IAT) was developed and validated to measure implicit biases 14 and has demonstrated high internal consistency and robust evidence for predictive validity in numerous studies. 7 , 15 , 16 To understand the degree of gender bias within the broad context of hospitals and health care systems, we examined the data of several thousand health care professionals who took the Gender-Career IAT from Project Implicit, the largest host of online IATs, with more than 26 million IATs started since 1998. Similar to how others have used IATs to assess health care professionals’ weight bias 17 or associations of race with adherence, 18 we developed a novel Gender-Specialty IAT to assess how surgeons associate men and women with surgery and family medicine. Surgery is of particular interest because of the known gender imbalance in the field, with only 25% of assistant professors being women. 19 Previous data suggest that men and women in surgery perceive a gender ability stereotype to exist within this field. 20 We chose family medicine as a comparison field because it may not be widely stereotyped as being masculine or feminine compared with other medical specialties. We hypothesized that men and women would be faster to associate men with surgery and women with family medicine than the reverse.

We followed the Strengthening the Reporting of Observational Studies in Epidemiology ( STROBE ) reporting guideline for reporting cross-sectional studies. Use of the Gender-Career IAT data and recruitment for the Gender-Specialty IAT were approved by the institutional review board of Washington University in St Louis, St Louis, Missouri. Participants taking the Gender-Specialty IAT provided written informed consent.

In an IAT, people sort words that appear on the screen into categories as quickly as they can. Concepts that are closely associated should be easier to sort together quickly. For example, in the Gender-Career IAT, participants sort gender ( male or female ) and career ( career or family ). In 1 part of the Gender-Career IAT, participants sort words related to male or career to one side of the screen and words related to female or family to the opposite side. In the next part, they do the reverse: instead of male/career and female/family being sorted to the same side, male/family are sorted together, as are female/career . The test uses reaction times for these tasks as a measure of the strength of associations between concepts. Thus, if one is faster at pairing male with career and female with family than male with family and female with career , a stronger association for men with careers and women with families than the reverse is suggested.
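The following Python sketch illustrates the block logic by modeling each combined sorting block as the set of categories assigned to each response side. The names are taken from the Gender-Career IAT list given in the Methods. Everything else is a hypothetical illustration and not the Project Implicit implementation: the career and family words, the `Block` class, and the choice to move the attribute categories (rather than the gender categories) between blocks.

```python
from dataclasses import dataclass

# Names come from the article; the career/family words are hypothetical placeholders.
STIMULUS_CATEGORY = {
    "Ben": "male", "Paul": "male",
    "Julia": "female", "Emily": "female",
    "salary": "career", "office": "career",
    "home": "family", "children": "family",
}


@dataclass(frozen=True)
class Block:
    """One combined sorting block: which categories share each response side."""
    name: str
    left: frozenset
    right: frozenset

    def correct_side(self, stimulus: str) -> str:
        """Return the side a stimulus must be sorted to in this block."""
        category = STIMULUS_CATEGORY[stimulus]
        if category in self.left:
            return "left"
        if category in self.right:
            return "right"
        raise ValueError(f"{category!r} is not sorted in block {self.name!r}")


# Stereotype-consistent pairing: male/career on one side, female/family on the other.
CONSISTENT = Block("male+career / female+family",
                   frozenset({"male", "career"}), frozenset({"female", "family"}))
# Reversed pairing: male/family share a side, as do female/career.
REVERSED = Block("male+family / female+career",
                 frozenset({"male", "family"}), frozenset({"female", "career"}))

if __name__ == "__main__":
    for stimulus in ("Ben", "Julia", "salary", "home"):
        print(stimulus, CONSISTENT.correct_side(stimulus), REVERSED.correct_side(stimulus))
```

In this sketch the names keep their side in both blocks, while the career and family words switch sides. A participant who responds more slowly in the reversed block is therefore taken to hold the stronger male-career association.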

The Gender-Career IAT is hosted on the Project Implicit site and has been taken by 953 878 people during the past 12 years. From January 1, 2006, through December 31, 2017, 42 991 people who took the Gender-Career IAT self-identified as working in health care, and approximately one-fourth of these self-identified as diagnosing and treating professionals. The remaining categories of participants in health care are listed in Table 1 . We downloaded the full data set, which is available from Project Implicit. 21 In addition to the measure of implicit bias, the Gender-Career IAT included 2 questions assessing explicit bias: “How strongly do you associate career with males and females?” and “How strongly do you associate family with males and females?” Responses ranged from “strongly female” (1) to “strongly male” (7). As in previous IAT research, the measure of explicit bias was calculated as the difference between these 2 items, ranging from −6 (career is strongly female, whereas family is strongly male) to 6 (career is strongly male, whereas family is strongly female). 22
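The explicit measure is a simple difference score. Below is a minimal Python sketch. It assumes integer responses coded 1 (“strongly female”) through 7 (“strongly male”), as described above, and the function name is hypothetical.

```python
def explicit_bias(career_item: int, family_item: int) -> int:
    """Difference score from two 7-point items (1 = strongly female, 7 = strongly male).

    Positive values mean career is rated as more "male" than family;
    the possible range is -6 to 6.
    """
    for value in (career_item, family_item):
        if not 1 <= value <= 7:
            raise ValueError("items are rated on a 1-7 scale")
    return career_item - family_item


if __name__ == "__main__":
    # Career "strongly male" (7), family "strongly female" (1): maximal score.
    print(explicit_bias(7, 1))
    # Same rating for both items: no explicit association.
    print(explicit_bias(4, 4))
```

The same calculation applies unchanged to the surgery and family medicine items of the Gender-Specialty IAT.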

We developed an IAT with 2 categories (male and female) and 2 attributes (surgery and family medicine) based on the work of Greenwald and Banaji 5 and the Gender-Career IAT available at Project Implicit. 23 We replaced the terms for career and family with terms for surgery and family medicine . To ensure reliability of the IAT, stimuli must be accurate, clear, and similar across categories. 24 Based on pilot data, we revised the terms for this study to make them even more evocative of surgery and family medicine. Initially chosen words, such as scalpel and operating room, could cause indecision for participants because they could be associated with ideas other than surgery. They also had no corresponding terms in family medicine. Professional organizations, on the other hand, are easy to recognize and could be matched to both specialties. Thus, we ultimately used logos from societies such as the American Board of Surgery and the American College of Surgeons. eTable 1 in the Supplement shows the terms and images used for surgery and family medicine as well as the test blocks. The names of men and women we used were the ones used in the Gender-Career IAT (Ben, John, Daniel, Paul, Jeffrey, Julia, Michelle, Anna, Emily, and Rebecca). The order of the blocks was randomly assigned so that some participants were first asked to associate male with surgery and female with family medicine, whereas others were first asked to associate male with family medicine and female with surgery . The IAT was run from the Project Implicit website 21 with support from Project Implicit.
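Random assignment of block order can be sketched as an independent draw for each participant. The order labels, the `assign_orders` helper, and the use of Python's `random.Random` are assumptions made for illustration. The article does not describe how Project Implicit performs the randomization.

```python
import random

# Hypothetical labels for the two critical pairings, in the order they are presented.
ORDER_A = ("male+surgery / female+family medicine",
           "male+family medicine / female+surgery")
ORDER_B = tuple(reversed(ORDER_A))
ORDERS = (ORDER_A, ORDER_B)


def assign_orders(participant_ids, seed=None):
    """Randomly assign each participant one of the two block orders (independent draws)."""
    rng = random.Random(seed)
    return {pid: rng.choice(ORDERS) for pid in participant_ids}


if __name__ == "__main__":
    for pid, (first, _second) in assign_orders(range(1, 6), seed=42).items():
        print(pid, "first block:", first)
```

Independent draws do not guarantee equal group sizes. When balance matters, a common alternative is to shuffle a list that contains equal numbers of each order.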

At the completion of the IAT, participants were asked questions similar to those on the Gender-Career IAT to assess their explicit bias about gender. One read as follows: “How strongly do you associate surgery with males and females? ” with responses ranging from “strongly female” (1) to “strongly male” (7). Similar to the Gender-Career IAT, a parallel question was asked about family medicine. Explicit bias was calculated as the difference between these 2 items, ranging from −6 (surgery is strongly female, whereas family medicine is strongly male) to 6 (surgery is strongly male, whereas family medicine is strongly female). Participants were also asked demographic questions, including gender, race, title, country, and region. For ease of data collection, data were collected using tablet devices.

We collected data from the Gender-Career IAT on Project Implicit and focused most analyses on participants who work in health care fields. For the novel Gender-Specialty IAT, we recruited surgeons (in practice and in training) in attendance at the American College of Surgeons meeting in October 2017 in San Diego, California. They were recruited by volunteers throughout meeting hotels and the convention center. Participants received a $10 Amazon gift card in exchange for their participation.

Data were analyzed from January 1, 2018, through March 31, 2019. The IAT is scored using the D score, a measure of bias based on the reaction times in the experimental blocks of the test (sequences 3 and 5 in eTable 1 in the Supplement ). 15 The D score is a variation on the Cohen d and is calculated by taking the difference in the mean reaction times for those 2 sequences divided by the pooled SD. The D scores range from −2 to 2, with positive D scores indicating a stronger association of men with career (or surgery) and women with family (or family medicine) and negative scores indicating the reverse. D scores are roughly equivalent in interpretation to the Cohen d , with a D score of 0.50 meaning that a participant was 0.5-SD faster in responding to men and career (or surgery) and to women and family (or family medicine) than the reverse. The means reported for the implicit IAT measure as well as those used in regression analyses for that measure are the means of the D scores.
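The core D score calculation described above can be sketched in Python. This is a simplified version that rests on several assumptions:

- reaction times are in milliseconds;
- the "pooled SD" is taken as the SD of all latencies from both critical blocks combined;
- the function name is hypothetical.

Published IAT scoring procedures add further steps, such as excluding very slow trials and handling error trials, which this sketch omits.

```python
from statistics import mean, stdev


def d_score(consistent_rts, reversed_rts):
    """Simplified IAT D score from reaction times (ms) in the two critical blocks.

    consistent_rts: block pairing male with career (or surgery) and female with family
                    (or family medicine)
    reversed_rts:   block with the opposite pairing
    A positive D means faster responses in the consistent block, i.e. a stronger
    association of men with career (or surgery).
    """
    if len(consistent_rts) < 2 or len(reversed_rts) < 2:
        raise ValueError("need at least two trials per block")
    # Assumption: pooled SD = sample SD of all latencies from both blocks together.
    pooled_sd = stdev(list(consistent_rts) + list(reversed_rts))
    return (mean(reversed_rts) - mean(consistent_rts)) / pooled_sd


if __name__ == "__main__":
    # Toy latencies: the consistent block is 200 ms faster on average.
    print(round(d_score([500, 700], [700, 900]), 3))
```

Because the SD is computed over both blocks together, the mean difference cannot exceed about twice that SD. This sketch therefore stays within the −2 to 2 range described above.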

The D score is a within-participants effect size comparing differences between one’s reaction times in 2 IAT blocks. The Cohen d , by contrast, is an effect size comparing the within-participant effect size with an external standard (eg, the point of no preference or a group mean). Thus, although the D score is an estimate of the difference in response times between blocks on the IAT, the Cohen d is an estimate of how different that score is from the point of no preference (in a single-sample test) or how different the scores of 2 different groups are from each other (when comparing means of 2 groups). As is common, we interpret effect sizes of approximately 0.2 to be small, approximately 0.5 to be medium, and approximately 0.8 or greater to be large. 25
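The distinction between the D score and the Cohen d can be illustrated with summary statistics. In the sketch below (the function names are hypothetical), the single-sample Cohen d divides the mean D score by its SD, and the 2-group version uses the pooled SD. Plugging in the means and SDs reported in this article reproduces the single-sample values: 0.41/0.36 ≈ 1.14 for health care professionals and 0.28/0.37 ≈ 0.76 for surgeons.

```python
from math import sqrt


def one_sample_d(mean_score: float, sd: float) -> float:
    """Cohen d against the point of no preference (0): mean divided by SD."""
    return mean_score / sd


def two_group_d(m1: float, sd1: float, n1: int, m2: float, sd2: float, n2: int) -> float:
    """Cohen d for two independent groups, standardized by the pooled SD."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(pooled_var)


if __name__ == "__main__":
    print(round(one_sample_d(0.41, 0.36), 2))  # health care professionals, Gender-Career IAT
    print(round(one_sample_d(0.28, 0.37), 2))  # surgeons, Gender-Specialty IAT
    # Women vs men in health care, implicit measure (sample sizes from the figure legend).
    print(round(two_group_d(0.44, 0.35, 34662, 0.31, 0.39, 7624), 2))
```

For the gender comparison among health care professionals, the 2-group formula gives about 0.36, which is close to the reported 0.35. The small gap may reflect rounding of the published means and SDs, or a different d formula.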

For the Gender-Career IAT, we examined the overall D scores for health care professionals as well as differences in the D score and the measure of explicit bias by type of worker. We also analyzed differences in implicit and explicit bias by gender, age, and region.

We performed similar analyses for our novel Gender-Specialty IAT. Participants received feedback on their performance at the end of the IAT. We analyzed the D scores to assess the overall mean as well as any differences by gender, title, or region.

For both IATs, we used 2-tailed t tests for comparisons between 2 groups and analysis of variance for comparisons among multiple groups. We used linear regression analyses while controlling for demographic variables to examine associations between those variables and implicit and explicit bias. The threshold for statistical significance was set a priori at 2-sided α = .05 for all statistical analyses. All analyses were performed in SAS, version 9.4 (SAS Institute Inc). Only complete responses were included in analyses.
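The Results report fractional degrees of freedom (eg, t 99.04 and t 88.50). These suggest that the 2-group comparisons used the Welch (unequal-variance) t test with the Satterthwaite approximation. This is an inference, because the Methods do not say so. A minimal sketch that computes both quantities from summary statistics (the helper name is hypothetical):

```python
from math import sqrt


def welch_t(m1, sd1, n1, m2, sd2, n2):
    """Return (t, df) for Welch's t test using the Welch-Satterthwaite df."""
    v1 = sd1**2 / n1  # squared standard error of group 1
    v2 = sd2**2 / n2  # squared standard error of group 2
    t = (m1 - m2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


if __name__ == "__main__":
    t, df = welch_t(1.0, 1.0, 10, 0.0, 1.0, 10)
    print(round(t, 3), round(df, 2))
```

With equal SDs and equal group sizes, the Welch df reduces to n1 + n2 − 2. With unequal variances it falls below that value and is generally not an integer. A 2-tailed P value then follows from the t distribution, for example `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available.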

A total of 42 991 health care professionals completed the Gender-Career IAT ( Table 1 ). Consistent with the health care workforce, 82.0% of respondents were women, and 18.0% were men. Mean (SD) age was 32.7 (11.8) years. Most participants (69.2%) were white. A little more than one-third (33.5%) were nursing and home health care assistants, and 24.9% were diagnosing and treating professionals. Data were also available from 910 887 participants who were not health care professionals (67.5% female and 68.3% white).

The IAT scores linking men with career and women with family were significantly different from zero among health care professionals (mean [SD] D score, 0.41 [0.36]; Cohen d = 1.14) and non–health care professionals (mean [SD] D score, 0.37 [0.38]; Cohen d = 0.97). Health care professionals exhibited slightly stronger implicit associations for men with career and women with family than non–health care professionals ( t 46,921 = −23.65; P < .001; Cohen d = 0.11). Both female (mean [SD] D score, 0.44 [0.35]; Cohen d = 1.23) and male (mean [SD] D score, 0.31 [0.39]; Cohen d = 0.79) health care professionals exhibited implicit associations of men with career and women with family that were significantly different from zero. These associations were stronger among female health care professionals than among male health care professionals ( t 10,621 = 26.89; P < .001; Cohen d = 0.35). Scores also differed significantly among categories of health care professionals: diagnosing and treating professionals, whose scores were significantly different from zero (mean [SD] D score, 0.37 [0.38]; Cohen d = 0.97), had significantly lower scores than each of the other categories ( t ≤ −4.76; P < .001 for all pairwise comparisons).

In regression analyses of implicit bias from gender, age, ethnicity, and country, we found that women were slightly more likely than men to associate men with career and women with family ( B coefficient, 0.13; 95% CI, 0.12-0.14; P < .001). Other statistically significant findings are given in Table 2 , such as the findings related to age, race, and country of residence. However, the regression coefficients are so small that these findings are not practically significant.
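The B coefficients reported here come from linear regressions with demographic predictors. The sketch below shows how a categorical predictor such as gender enters such a model as a 0/1 indicator, so that the coefficient for women is the adjusted difference from men. The reference categories (men, white participants) are inferred from the wording of the Results. The helper names and the synthetic data are hypothetical, and the hand-written solver stands in for the SAS procedures the authors used.

```python
def dummy_code(value, levels, reference):
    """Return 0/1 indicator columns for every level except the reference category."""
    if value not in levels:
        raise ValueError(f"unknown level {value!r}")
    return [1.0 if value == level else 0.0 for level in levels if level != reference]


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    rows: predictor rows that already include a leading 1.0 for the intercept.
    Solved by Gauss-Jordan elimination with partial pivoting; adequate for a few
    predictors, not a replacement for a statistics package.
    """
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    a = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X'X | X'y]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        a[col], a[pivot] = a[pivot], a[col]
        pivot_value = a[col][col]
        a[col] = [v / pivot_value for v in a[col]]
        for r in range(p):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][p] for i in range(p)]


# Synthetic data (hypothetical): outcome = 0.30 + 0.10 * woman + 0.01 * age, no noise.
people = [("man", 20), ("woman", 20), ("man", 40), ("woman", 40), ("man", 30), ("woman", 50)]
rows = [[1.0] + dummy_code(g, ["man", "woman"], "man") + [float(age)] for g, age in people]
y = [0.30 + 0.10 * (g == "woman") + 0.01 * age for g, age in people]
coef = ols(rows, y)
print([round(c, 4) for c in coef])  # intercept, B for women, B per year of age
```

Race would be coded the same way, with white participants as the reference. For example, `dummy_code("Asian", ["white", "Asian", "Hispanic", "other"], "white")` yields one indicator each for Asian, Hispanic, and other.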

Explicit bias responses associating men with career and women with family were significantly different from zero for both women (mean [SD], 1.43 [1.86]; Cohen d = 0.77) and men (mean [SD], 1.44 [1.79]; Cohen d = 0.80) in health care ( t 11,585 = −0.64; P = .52; Cohen d = −0.01 for the comparison by gender). Explicit bias was significantly different from zero among health care professionals (mean [SD], 1.43 [1.86]; Cohen d = 0.77) and non–health care professionals (mean [SD], 1.36 [1.73]; Cohen d = 0.79). Health care professionals exhibited more explicit bias than non–health care professionals ( t 46,554 = −7.23; P < .001; Cohen d = 0.04). All categories of health care professionals expressed explicit bias linking men with career and women with family, including diagnosing and treating professionals (mean [SD], 1.50 [1.61]; Cohen d = 0.93), nursing and home health care assistants (mean [SD], 1.41 [1.98]; Cohen d = 0.71), and other health care support (mean [SD], 1.39 [1.87]; Cohen d = 0.74). When we compared categories of health care professionals, those professionals who were diagnosing and treating patients were more likely to explicitly associate men with career and women with family than were nursing and home health care assistants ( t 24,717 = 4.06; P < .001; Cohen d = 0.05) and other health care support ( t 23,298 = 5.07; P < .001; Cohen d = 0.06).

In contrast with the regression analysis of implicit bias, Table 2 demonstrates that women were less likely than men to express an explicit association of men with career and women with family ( B coefficient, −0.10; 95% CI, −0.15 to −0.04; P < .001). Hispanic participants and participants of other races/ethnicities were less likely than white participants to explicitly associate men with career and women with family (Hispanic participants: B coefficient, −0.11 [95% CI, −0.18 to −0.03]; t 32,009 = −2.72; P = .007; participants of other races/ethnicities: B coefficient, −0.18 [95% CI, −0.26 to −0.09]; t 32,009 = −3.96; P < .001).

We collected complete data on the Gender-Specialty IAT from 131 participants. Table 3 provides the demographic characteristics of the participants in the study. Eighty-five participants (64.9%) were men and 45 (34.4%) were women. The mean (SD) age of these participants was 42.3 (11.5) years, and 77 (58.8%) were white. Participants were distributed across all titles (assistant professor, associate professor, and full professor).

The mean IAT score indicated a significant association linking men with surgery and women with family medicine (mean [SD] D score, 0.28 [0.37]; Cohen d = 0.76). No difference in IAT scores was found between male and female participants ( t 99.04 = −0.11; P = .91; Cohen d = −0.03). When we restricted data to those living in the United States, no significant difference in gender bias was found by region ( F 3,80 = 0.89; P = .45).
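The regional comparison is a one-way analysis of variance, and its degrees of freedom carry information. With k groups and N participants, the between-group df is k − 1 and the within-group df is N − k. Assuming a standard one-way ANOVA, F 3,80 therefore implies 4 regions and 84 participants in the US-only analysis. A minimal sketch (the helper name is hypothetical):

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on lists of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least 2 groups and more observations than groups")
    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    print(one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
```

If SciPy is available, the P value is `scipy.stats.f.sf(f_stat, df_between, df_within)`.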

None of the demographic variables we collected correlated with implicit bias. As shown in eTable 2 in the Supplement , no demographic variables were statistically significant in a regression analysis of implicit bias from gender, age, race, and title.

Explicit bias responses associating men with surgery and women with family medicine were significantly different from zero for men (mean [SD], 1.27 [0.39]; Cohen d = 0.93) and women (mean [SD], 0.73 [0.35]; Cohen d = 0.53). Men expressed more explicit bias than did women ( t 88.50 = −2.11; P = .04; Cohen d = 0.39).

As shown in eTable 2 in the Supplement , regression analysis of the explicit bias measure from gender, age, race, and title found that women were less likely than men to associate men with surgery and women with family medicine ( B coefficient, −0.67; 95% CI, −1.21 to −0.13; P = .001). Those in private practice also were less likely to associate men with surgery and women with family medicine than those who had listed their title as “other” ( B coefficient, −1.13; 95% CI, −1.96 to −0.29; P = .009). Those who identified as Asian were more likely than white participants to associate men with surgery and women with family medicine ( B coefficient, 0.81; 95% CI, 0.13-1.48; P = .02). There was no difference in explicit bias in surgery by age ( B coefficient, 0.02; 95% CI, −0.01 to 0.05; P = .26).

Figure 1 and Figure 2 show differences between men and women on levels of implicit and explicit bias. These figures illustrate the finding that women expressed lower levels of explicit gender bias than did men. Data for the implicit measures were mixed, with women expressing slightly higher levels of implicit bias than men on the Gender-Career IAT, whereas no difference by gender was noted on the Gender-Specialty IAT.

The data from Project Implicit’s Gender-Career IAT suggest that men and women in health care strongly implicitly associate men with career and women with family. With regard to explicit bias, however, men in health care were more likely than women to associate men with career and women with family. These findings are similar to what we found with the Gender-Specialty IAT assessing bias among surgeons. Surgeons tended to associate men with surgery and women with family medicine. Thus, from both data sets we found that, although men and women associated men with career and surgery (and women with family and family medicine), men were more likely than women to consciously express a bias linking men with career or surgery and women with family or family medicine. Future research should replicate these findings and assess whether these biases are linked to existing gender disparities. For example, previous studies 26 - 28 suggest that women may be more likely than men to leave surgical residency, and implicit gender biases could play a role. Girod et al 29 have also suggested that implicit bias among senior faculty may contribute to the gender disparity in leadership roles in academic medicine.

On the novel implicit measure of gender bias about surgery and family medicine, we found evidence of consensus. Across all social categories assessed (gender, race, title, region of the United States, and country of origin), participants taking our novel Gender-Specialty IAT expressed implicit and explicit bias about men and women in surgery. We found that male and female surgeons’ implicit gender-specialty biases were large and similar in magnitude to male and female health care workers’ implicit gender-career biases. With explicit biases, we found evidence of a difference between genders. Explicit gender-specialty biases for male surgeons were large and similar in magnitude to explicit gender-career biases for male health care workers. However, explicit gender-specialty biases for female surgeons were smaller than explicit gender-career biases for female health care workers. This difference could be due to variation in sample populations or topics assessed. These data, although not definitive, suggest that biases linking surgery with men and family medicine with women may be widespread across the United States among surgeons. Unlike the Gender-Career IAT, we did not identify a difference between the genders in implicit bias on the Gender-Specialty IAT. However, this finding may be due to the smaller sample size in the Gender-Specialty IAT.

Diversity is important to the success of organizations. 30 , 31 Specifically, organizations with more diverse leadership are more productive and profitable. Patients, who come from many different backgrounds, are more satisfied with their care when it is provided by someone who looks like them. 6 , 32 Given that women are approximately 50% of the population and that an increasing percentage in the United States is of minority race or ethnicity, we must ensure that we foster physicians of all gender and racial groups. Having diverse people in leadership positions ensures that role models and potential mentors are available for all applicants. 33 Role models and mentors, in turn, are important for recruiting trainees who are members of underrepresented groups. 34 To improve recruitment and retention of diverse trainees, we need to better understand the factors that contribute to underrepresentation of women.

For many, awareness of bias is an important first step toward minimizing its effects. 35 The data presented herein help to raise awareness of gender bias within medicine. In addition, these data allow trainees to understand the context in which they will practice, thus better preparing them for their future work environment. Finally, this study adds to the existing evidence that organizations can use to make the case for prioritizing diversity and possibly implicit bias training.

The Gender-Career IAT data lack granularity about health care fields. Thus, we are not able to isolate physicians exclusively, for example. The category of diagnosing and treating professionals may include dentists, nurse practitioners, and physician assistants, for example. In addition, selection bias for the Gender-Career IAT may lead to a lower estimate of the degree of bias present generally. As many as 0.43% of responses may have come from repeated sessions, and we cannot determine whether those sessions were different individuals using the same computer or the same individual. Data from these IATs also do not allow us to assess the effect of intersectionality or of other genders, since both IATs focused on male/female gender alone.

A limitation of the novel Gender-Specialty IAT is that we recruited participants attending a surgical meeting. We appear to have undersampled older surgeons. This limitation is unlikely to affect the results dramatically because results on the Gender-Career IAT and other similar IATs found only small correlations between age and implicit biases. 22 If anything, having fewer older surgeons may underestimate the degree of gender bias in this context. Our sample size for the novel Gender-Specialty IAT is modest.

The main contribution of this work is an initial estimate of the extent of implicit gender bias within health care. Future research could examine the implications of implicit gender bias for gender inequality and discrimination. Other research has already identified interventions that address gender bias regardless of whether it stems from implicit bias or other sources. For example, increasing the transparency of hiring and promotion policies, treating diversity as a performance metric for organizations, and promoting flexible leave all serve to increase the success of female physicians and trainees. 36 - 38 Further documentation of implicit associations and other potential psychological obstacles to women’s success will be important for determining the most effective interventions to reduce gender inequality. It is also important to deliberately study the effects of bias on individuals who hold more than one minority identity, such as Black or Hispanic women. Such research will benefit today’s medical students, who will become tomorrow’s physicians.

Accepted for Publication: May 15, 2019.

Published: July 5, 2019. doi:10.1001/jamanetworkopen.2019.6545

Corresponding Author: Arghavan Salles, MD, PhD, Section of Minimally Invasive Surgery, Department of Surgery, Washington University in St Louis, 4901 S Euclid Ave, Ste 920, St Louis, MO 63108.

Author Contributions: Drs Salles and Lai had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.

Concept and design: Salles, Awad, Goldin, Lai.

Acquisition, analysis, or interpretation of data: All authors.

Drafting of the manuscript: Salles, Goldin, Lai.

Critical revision of the manuscript for important intellectual content: All authors.

Statistical analysis: Salles, Goldin, Lai.

Administrative, technical, or material support: Salles, Lee.

Supervision: Salles, Awad.

Conflict of Interest Disclosures: Dr Salles reported receiving honoraria from Medtronic plc for consulting and speaking. Dr Lai reported serving as the director of research for Project Implicit. No other disclosures were reported.



Gender bias in academia: a lifetime problem that needs ...
Given the prevalent and deep-rooted nature of gender bias in academia, we aim to unravel different forms of bias, evaluate their manifestation over the career-span, and provide suggestions towards resolving gender disparity.
Exposure to sexism can decrease implicit gender stereotype bias
In two experiments, we exposed male and female participants to sexism (either hostile or benevolent sexist beliefs) or no sexism. In Experiment 1, we measured gender stereotype bias with an Implicit Association Test (IAT; Greenwald, McGhee, & Schwartz, 1998).
Gender bias in funding evaluation: A randomized experiment
In this paper we use an experimental design to measure the effects of a cause: the effect of the gender of the principal investigator (PI) on the score of a research funding application (treatment). We embedded a hypothetical research application description in a field experiment.
Promoting concern about gender bias with evidence-based ...
All four experiments examined whether an evidence-based confrontation about gender bias increased negative self-directed affect and, in turn, concern about gender bias and intentions to monitor one's future behavior for possible gender bias.
Quality of evidence revealing subtle gender biases in science ...
Thus, growing evidence revealing a gender bias against women—or favoring men—within science, technology, engineering, and mathematics (STEM) settings is provocative and raises questions about the extent to which gender bias may contribute to women’s underrepresentation within STEM fields.
Quality of evidence revealing subtle gender biases in science ...
Thus, overall, experiments 1 and 2 provide converging evidence from multiple participant populations that men are less receptive than women—and by the same token, that women are more receptive than men—to experimental evidence of gender bias in STEM.
A re-evaluation of gender bias in receptiveness to scientific ...
In the gender bias relative to the no gender bias condition, compared to men, women indicated significantly lower levels of connection to, positive attitudes towards, and desire to pursue STEM. In another study, Moss-Racusin, Molenda and Cramer found further support for a lack of male receptiveness to gender bias.
The reduction of race and gender bias in clinical treatment ...
Bias in clinical practice, in particular in relation to race and gender, is a persistent cause of healthcare disparities. We investigated the potential of a peer-network approach to reduce bias...
Estimating Implicit and Explicit Gender Bias Among Health ...
Explicit bias scores are calculated as the difference between the responses to 2 self-reported items about participants’ associations of gender with career and family (Gender-Career Implicit Association Test [IAT]) or with surgery and family medicine (Gender-Specialty IAT). Table 1.
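The last snippet above defines the explicit bias score as the difference between responses to two self-reported items. The Python sketch below illustrates that arithmetic under stated assumptions: the 1 to 7 response coding (1 = strongly female, 7 = strongly male), the helper name `explicit_bias_score`, and the sample responses are all hypothetical and are not taken from the study.

```python
from statistics import mean

# ASSUMPTION (not from the source): both self-report items use a 1-7 scale,
# where 1 = "strongly female", 4 = "neither", and 7 = "strongly male".
SCALE_MIN, SCALE_MAX = 1, 7


def explicit_bias_score(career_item: int, family_item: int) -> int:
    """Hypothetical helper: signed difference between the two self-report items.

    career_item: rated gender association of "career" (or "surgery").
    family_item: rated gender association of "family" (or "family medicine").
    A positive score means the first domain is rated as more male than the
    second (the stereotype-congruent direction); 0 means no difference.
    """
    for value in (career_item, family_item):
        if not SCALE_MIN <= value <= SCALE_MAX:
            raise ValueError(f"item value {value} is outside the assumed 1-7 scale")
    return career_item - family_item


# Made-up (career, family) response pairs for illustration only, not study data.
responses = [(6, 3), (4, 4), (5, 2)]
scores = [explicit_bias_score(career, family) for career, family in responses]
print(scores)  # [3, 0, 3]
print(mean(scores))
```

Because the score is a signed difference, swapping the item order flips its sign, so the convention (career minus family, or surgery minus family medicine) needs to be fixed before scores from different groups are compared.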



