As researchers, medical practitioners, clinicians and policymakers scramble to uncover the latest science on COVID-19, colleagues across the globe are working to continually update our Novel Coronavirus Information Center, where you can find free, curated information for the research and health community.
An important part of this resource is access to coronavirus-related research on ScienceDirect. As of today, the article count is over 28,000.
To make sure this information is relevant and timely, we have several dedicated tech colleagues applying their machine learning and data science expertise.
“It’s very exciting to be able to help researchers find a cure for COVID,” said Kalyan Ram, a Principal Product Manager of ScienceDirect. He and Dr. Georgios Tsatsaronis, VP of Data Science in Research Content Operations, have collaborated to provide a “goldilocks” data offering: not so many articles as to include irrelevant information, and not so few as to exclude material that might be useful.
For most people, the obvious solution would be to create a search on key terms. Kalyan described that kind of search as a blunt tool:
“The search query casts a wide net, but you can use a supervised machine learning model to then discover other content that is not caught by the net. One supports and augments the other.”
Using this supervised machine learning approach, the results can be much more accurate and targeted. As Georgios explained, most searches start with an information need. A researcher will have something specific in mind that they want to know, and they’ll go to a search engine – whether ScienceDirect or Google – to find out about it.
“That’s where things can go wrong,” he explained. “A researcher in one discipline looking for information may not be using the same terms as an author from another discipline using different terms, or they may not think to exclude certain terms that are ambiguous.”
For Georgios and Kalyan, the first priority is to ensure researchers don’t miss anything that might be important – after all, expert clinicians and researchers can drill down past material they don’t need, but if an article isn’t included in the results, then users won’t have the opportunity to assess it.
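The priority described above is what information retrieval calls favoring recall over precision. As a rough illustration only, the sketch below compares a narrow and a broad result set; the article ids and both result sets are invented for the example and do not come from ScienceDirect.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for retrieved ids measured against relevant ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical article ids: four articles are truly relevant.
relevant = {1, 2, 3, 4}
narrow_results = {1, 2}                    # tight query: no noise, but misses 3 and 4
broad_results = {1, 2, 3, 4, 5, 6, 7, 8}   # wide query: finds everything, plus noise

narrow = precision_recall(narrow_results, relevant)
broad = precision_recall(broad_results, relevant)
print("narrow:", narrow)
print("broad: ", broad)
```

In this toy case the broad set has lower precision but full recall: an expert can skip the four extra articles, whereas the narrow set gives no chance to assess the two it missed.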
Writing the query
The first trick is to apply some smart Boolean algebra when writing the search query. The Boolean method involves combining keywords with operators such as OR, AND and NOT. By following simple Boolean rules, you can minimize the length of the query and make it much simpler than it used to be.
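To make the idea of shortening a query with Boolean rules concrete, here is a minimal sketch. It is not the actual ScienceDirect query: the search terms, the excluded term "canine" and the sample titles are all hypothetical, and simple substring matching stands in for a real search engine. It shows the distributive law, (A AND B) OR (A AND C) = A AND (B OR C), producing a shorter query that returns the same results.

```python
def has(term):
    """Predicate: true when the term occurs in the text (case-insensitive)."""
    return lambda text: term.lower() in text.lower()


def AND(*preds):
    return lambda text: all(p(text) for p in preds)


def OR(*preds):
    return lambda text: any(p(text) for p in preds)


def NOT(pred):
    return lambda text: not pred(text)


# Verbose form: the shared terms are repeated in every branch.
verbose = OR(
    AND(has("coronavirus"), has("vaccine"), NOT(has("canine"))),
    AND(has("coronavirus"), has("treatment"), NOT(has("canine"))),
)

# Factored form: the shared terms are pulled out once.
compact = AND(
    has("coronavirus"),
    OR(has("vaccine"), has("treatment")),
    NOT(has("canine")),
)

# Hypothetical article titles used only to compare the two queries.
titles = [
    "Coronavirus vaccine candidates in early trials",
    "Treatment options for coronavirus pneumonia",
    "Canine coronavirus vaccine field study",
    "Influenza vaccine uptake in adults",
]

verbose_hits = [t for t in titles if verbose(t)]
compact_hits = [t for t in titles if compact(t)]
print(compact_hits)
```

Both forms select the same titles, but the factored form states each shared term once, which matters more as the list of synonyms grows.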
The user’s query therefore provides the starting point for the machine learning approach. Georgios describes training a machine learning system as creating “an algorithm that can learn to mimic your way of deciding what is relevant and what is not.” Each result a search query returns is either relevant or irrelevant.
The algorithm on ScienceDirect is fed specific search terms such as “Coronavirus 19” and “COVID-19,” along with related coronaviruses like SARS and MERS. Machine learning then corrects the classification flaws that show up as irrelevant results. The model is maintained and kept current through ongoing testing and training. Data scientists also use the model to anticipate necessary adaptations to the query: as our research base expands, so do the key terms and vocabulary.
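The interplay between a keyword query and a supervised model can be sketched in a few lines. This is an illustration under stated assumptions, not the model used on ScienceDirect: the article does not say which algorithm is used, so a tiny Naive Bayes classifier is swapped in here, and the labeled titles, `QUERY_TERMS` and the test article are all invented.

```python
import math
import re
from collections import Counter

QUERY_TERMS = ("covid-19", "coronavirus", "sars", "mers")  # hypothetical query terms


def keyword_match(text):
    """The 'wide net': true if any query term appears in the text."""
    lowered = text.lower()
    return any(term in lowered for term in QUERY_TERMS)


def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokens(text))
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)
            for w in tokens(text):
                if w in self.vocab:  # words never seen in training are ignored
                    count = self.word_counts[c][w] + 1
                    score += math.log(count / (total_words + len(self.vocab)))
            scores[c] = score
        return max(scores, key=scores.get)


# Invented training examples, labeled the way a reviewer might judge query results.
training = [
    ("COVID-19 patients with severe respiratory syndrome and pneumonia", "relevant"),
    ("SARS coronavirus spike protein binds the ACE2 receptor", "relevant"),
    ("MERS outbreak: viral transmission among hospital patients", "relevant"),
    ("Corona discharge in high voltage power lines", "irrelevant"),
    ("Solar corona temperature measured during an eclipse", "irrelevant"),
    ("Crop yield response to nitrogen fertilizer", "irrelevant"),
]
model = NaiveBayes().fit([t for t, _ in training], [label for _, label in training])

# An article that uses none of the query terms but shares vocabulary with relevant ones.
missed = "Spike protein binding to the ACE2 receptor in a novel respiratory virus"
caught_by_query = keyword_match(missed)
model_label = model.predict(missed)
print(caught_by_query, model_label)
```

Here the keyword query misses the test article because none of its terms appear, while the classifier labels it relevant based on vocabulary learned from the labeled examples. A production system would need far more training data and a stronger model.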
Georgios and Kalyan also recruited editors from The Lancet to help ensure pertinent search results for users of the Information Center. With their help, Georgios and Kalyan identified two primary user groups: specialized doctors who are treating patients and working in clinical research; and molecular biologists who, for example, are targeting research on drug repurposing. So although the search targets a topic as specific as a single virus, it produces very deep streams of interest and information needs that differ by user. It is up to the user to filter the total database down to the relevant results.
Expanding beyond health and medicine
Of course, the effects of the COVID-19 pandemic extend beyond the realm of medicine to the global economy. For this reason, researchers are considering expanding the search to include results that relate to finance and economics. While this data wouldn’t be crucial to the survival of people with the virus, it is extremely important for our society.
While the pandemic has brought about a lot of challenges, it has also brought about compelling innovation in the world of research and medicine, and here at Elsevier. We hope this will make a profound difference in the lives of many.