Excel's Autocorrect Is Interfering with Scientific Research, and It's a Problem
You’ve likely encountered messages mangled by autocorrect or autocomplete, leading to amusing misunderstandings. Countless memes highlight these blunders, often resulting in laughter.
However, the situation turns less amusing when such errors affect scientific publications. Research papers often include supplementary files containing data, charts, and graphs that support their findings, many of which are spreadsheet-based.
Since 2004, researchers have observed that Microsoft Excel, a widely used spreadsheet tool, has a tendency to alter certain gene names into unrelated data formats. For instance, the gene MARCH1, which stands for membrane-associated ring-CH-type finger 1, is often misinterpreted by Excel as a date, transforming it into 1-Mar or a similar March 1st notation.
Excel also misreads some identifiers as floating-point numbers in scientific notation. While it might be possible to deduce that 1-Mar refers to MARCH1, recognizing 2.31E+13 as the RIKEN identifier 2310009E13 is far more challenging. (RIKEN is a prominent Japanese research institute involved in genome projects.) For the record, when we entered the RIKEN identifier into Excel ourselves, it was automatically converted to 2.31E+19, which is still not the original identifier.
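To make the two kinds of misreading concrete, here is a minimal Python sketch. It does not reproduce Excel's actual parsing rules, which are more involved and locale-dependent; the `MONTH_PREFIXES` list and the `guess_interpretation` helper are hypothetical names introduced only for illustration. It also shows why an identifier such as 2310009E13 is a valid number in scientific notation: read as 2310009 × 10^13, it equals 2.310009 × 10^19, which displays as 2.31E+19 at two decimal places.

```python
import re

# Hypothetical month prefixes, chosen for illustration only.
# Excel's real date parser is more involved and depends on locale.
MONTH_PREFIXES = ("JAN", "FEB", "MAR", "MARCH", "APR", "MAY", "JUN",
                  "JUL", "AUG", "SEP", "SEPT", "OCT", "NOV", "DEC")

# A month-like prefix followed by one or two digits, e.g. MARCH1.
DATE_LIKE = re.compile(r"^(%s)(\d{1,2})$" % "|".join(MONTH_PREFIXES),
                       re.IGNORECASE)
# Digits, the letter E, then more digits, e.g. 2310009E13.
SCI_LIKE = re.compile(r"^\d+E\d+$", re.IGNORECASE)


def guess_interpretation(cell: str) -> str:
    """Roughly guess how a spreadsheet might read a typed value."""
    if DATE_LIKE.match(cell):
        return "date"
    if SCI_LIKE.match(cell):
        return "number"
    return "text"


for symbol in ("MARCH1", "2310009E13", "TP53"):
    print(symbol, "->", guess_interpretation(symbol))

# The RIKEN identifier parses as a float: 2310009 x 10^13.
value = float("2310009E13")
print(f"{value:.2E}")  # two-decimal scientific notation
```

Once a value has been stored as a date or a float, the original text is gone, which is why the conversion cannot simply be undone by reformatting the cell afterwards.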
Although this issue was first recognized over a decade ago, it persists. Researchers Mark Ziemann, Yotam Eren, and Assam El-Osta analyzed more than 35,000 supplementary files to assess the extent of the problem. They created automated tools to detect data resembling gene name lists, identifying 7,467 gene lists across 3,597 papers in 18 journals. Of these papers, 704 contained files with Excel conversion errors, a rate of 19.6%. In other words, nearly one in five of the papers with gene lists includes supplementary data corrupted by Excel's formatting.
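The kind of automated check described above can be sketched in a few lines of Python. This is not the researchers' actual tool: the two regular expressions and the helper names `find_suspect_cells` and `percent` are assumptions made for illustration, and a real screen would need to handle more date and number display formats. The last line simply reproduces the reported error rate from the figures quoted above.

```python
import re

MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Assumed display forms of converted values, e.g. "1-Mar" and "2.31E+13".
DATE_ARTIFACT = re.compile(rf"^\d{{1,2}}-({MONTHS})$")
FLOAT_ARTIFACT = re.compile(r"^\d\.\d+E\+\d+$")


def find_suspect_cells(cells):
    """Return the cells that look like converted dates or floats."""
    return [c for c in cells
            if DATE_ARTIFACT.match(c) or FLOAT_ARTIFACT.match(c)]


def percent(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to one decimal."""
    return round(100 * part / whole, 1)


# A made-up gene column containing two converted entries.
column = ["TP53", "1-Mar", "BRCA1", "2.31E+13"]
print(find_suspect_cells(column))

# 704 affected papers out of 3,597 papers with gene lists.
print(percent(704, 3597))
```

A pattern-based screen like this only flags candidates; deciding which gene a flagged value originally stood for still requires going back to the source data.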
This issue goes beyond mere inconvenience. Researchers depend on published data to inform their own studies. Errors in these files can lead to significant delays and complications in identifying accurate information.
Why not simply disable Excel's auto-formatting features? Unfortunately, the research team found no way to turn them off permanently. Cells can be adjusted manually in each new file, but this approach is highly inefficient. The team did note, however, that Google Sheets did not apply these conversions to gene names, and that data entered in Google Sheets kept its original form when later opened in other spreadsheet programs, avoiding such errors.
It’s somewhat ironic that a feature designed to simplify spreadsheet use is causing significant issues in academia. Future versions of Excel may offer a permanent solution to disable auto-formatting. Until then, geneticists are advised to meticulously review their data or consider using Google Sheets as an alternative.
