Shiro Takagi

Research Paper as a View of a Database


I am an independent researcher in machine learning. My long-term goal is to realize an artificial researcher.

I think it would be useful to treat the paper as one view of a research database. In this article, I briefly introduce this idea. I also propose using Notion as a first step toward building such a database.


A paper is just one form of presenting research results

The research paper was an optimal format in an era when printing was the norm. Given today’s advances in information technology, I believe it is no longer necessarily optimal.

What information a reader wants from a paper depends on who is reading it, when they read it, what background they have, and so on. For example, what readers want from a research paper differs depending on whether they are reading it for a literature survey or while designing an experiment.

Furthermore, the optimal format for displaying results will probably differ between research areas such as literature and physics. However, we currently express all of them as a research paper. In other words, we force a single display format to meet several different demands.

What do I mean by regarding a paper as a view?

To solve this problem, I think it would be a good idea to treat papers as one view of a research database. If you have used Notion before, imagine switching the view of a database to “table view” or “timeline view.”

Research paper as a view of a research database.

Of course, you can adopt any view other than a paper view. For example, you may be able to reproduce the entire process of the research by arranging the data stored in the database in chronological order (timeline view). Suppose that you conducted multiple experiments that require interpretation across them. Then, it may be a good idea to use a format that allows you to follow the logical structure more intuitively (logical structure view).

Logical structure view of experiments.

In a nutshell, my suggestion is to store the research process in a database so that anyone can manipulate the view for their own purposes.
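
As a minimal sketch of this separation, assuming a hypothetical in-memory list of records, each view can be written as a plain function over the same stored data. The `Record` fields, the stage names, the dates, and both view functions are illustrative assumptions, not part of any existing tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    """One stored piece of the research process (hypothetical schema)."""
    title: str
    created: date
    stage: str  # e.g. "hypothesis", "experiment", "analysis"
    body: str


def timeline_view(records):
    """Timeline view: every record, oldest first."""
    return sorted(records, key=lambda r: r.created)


def paper_view(records, section_order=("hypothesis", "experiment", "analysis")):
    """Paper-like view: group record titles by stage, in a fixed section order."""
    return {
        stage: [r.title for r in records if r.stage == stage]
        for stage in section_order
    }


# Illustrative data only; the dates and titles are made up for this example.
records = [
    Record("Run baseline", date(2021, 6, 10), "experiment", "..."),
    Record("Initial hypothesis", date(2021, 6, 1), "hypothesis", "..."),
    Record("Compare runs", date(2021, 6, 20), "analysis", "..."),
]

print([r.title for r in timeline_view(records)])
print(paper_view(records))
```

The stored records never change; only the function applied to them does. A reader who wants a different view writes a different function.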


1. Single responsibility of a view

The first advantage is that it eliminates the need to request that a single display format perform multiple functions. Currently, we ask for the display format of a paper to serve several functions. First, a research paper is a “report.” Therefore, it should convey the necessary information to the intended audience efficiently. Second, a research paper is an “asset” cited by other studies. So, it must be comprehensive and rigorous enough to reproduce the original results. Finally, a research paper is a de facto “submission” (not desirable, though). To pass the peer review process of top journals, researchers try their best to make their papers look more appealing.

It is not hard to imagine that these demands can conflict. I believe these competing demands are why writing a paper requires special technical writing skills. In fact, I had to read many books and research papers to understand how research papers are structured. I am sure some of you have had similar experiences.

Having multiple views of a research database allows us to separate these functions in the form of, for example, “report views” or “stock views.” You would also be able to adopt another structured view suited to your purpose. This would make acquiring information from papers easier. As a result, it might make research more efficient or improve the quality of apprentice researchers’ papers.

You can create a structured view suited for your purpose.

2. More researchers might preserve raw research data

The second benefit is that it might incentivize researchers to preserve raw data. Until now, authors have had to discard information that could have been shared in order to fit their work into the research paper format. However, if readers can manipulate the view, they can filter out content they do not need at reading time by changing the display format, so authors no longer have to cut it in advance. Readers would also ask authors to add any missing information to the database so that they can display the views they want. This would make it a more natural requirement to store information as close to the raw data as possible. I believe this is important for ensuring the reproducibility of academic results.

3. Other researchers may find overlooked findings

A third advantage is that more researchers may be able to find things the authors themselves were not aware of. Because readers can freely change their view, they can check the soundness of each part of the research process. Adopting a different view may also enable other researchers to find implications in what the authors discarded as noise. Separating views from data creates room for other researchers to take part in interpreting the outputs of the research process. I believe this helps research output be used more efficiently.

4. Gradual transition from the paper view to a better view

The fourth advantage is that it eases a gradual transition to a new display format. As mentioned above, I believe the research paper display format is not necessarily optimal. Therefore, I think it is preferable to adopt a new display format. However, immediately abandoning the research paper format and adopting a completely different one would be a disruptive change. It would be hard for those accustomed to the research paper format to support the new one. Above all, every past research result published in the research paper format would have to be converted to the new one. This does not seem very feasible.

Positioning the research paper as one of several views allows you to keep using the research paper format as well. Which display format eventually becomes dominant can be left to each generation to decide over time. Also, those accustomed to the existing research paper format can start using a new display format simply as a supplementary one. In this sense, I believe adopting multiple display formats simultaneously, including the research paper format, will help the gradual transition to a better display format. I believe this is a feasible path for changing display formats.


Using Notion as a research database

As a prerequisite for implementing a view, the entire research process must be stored in a database, and each piece of work in that process must be labeled. I do not think this practice will spread easily.

I mentioned earlier that other researchers can freely manipulate the views. To that end, there must be enough labels attached to each piece of data to display a view readers want. How to label is a difficult question. To answer it, you need to understand what research is and what operations you want to perform on the intermediate outputs of your research.

I believe that repeated trial and error matters, because there are many things you notice only in practice. If you have access to a good server, a full-blown relational database would be the better choice. However, I think Notion is the best fit for getting started.
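
To illustrate what the relational route could look like, here is a minimal sketch using Python’s built-in `sqlite3` module with an in-memory database. The table name `research_data`, its columns, the sample rows, the view name `timeline_view`, and the helper `rows_in_category` are all assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE research_data (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        created TEXT NOT NULL,          -- ISO 8601 date, so text order is date order
        process_category TEXT NOT NULL
    )
    """
)
# Illustrative rows only; titles and dates are made up.
conn.executemany(
    "INSERT INTO research_data (title, created, process_category) VALUES (?, ?, ?)",
    [
        ("Run baseline", "2021-06-10", "experiment"),
        ("Initial hypothesis", "2021-06-01", "hypothesis discovery"),
        ("Compare runs", "2021-06-20", "analysis"),
    ],
)

# A SQL view is a stored query: the data is kept once, and the view only
# changes which columns are selected and how they are arranged.
conn.execute(
    """
    CREATE VIEW timeline_view AS
    SELECT title, created, process_category
    FROM research_data
    ORDER BY created
    """
)


def rows_in_category(connection, category):
    """Return titles in one research process category, oldest first."""
    cursor = connection.execute(
        "SELECT title FROM timeline_view WHERE process_category = ? ORDER BY created",
        (category,),
    )
    return [row[0] for row in cursor]


# The outer ORDER BY makes the ordering explicit rather than relying on the view.
timeline = [
    row[0] for row in conn.execute("SELECT title FROM timeline_view ORDER BY created")
]
print(timeline)
```

Here the database’s own notion of a view lines up with the idea in this article: the table holds the research process, and each reader-facing format is a query over it.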

Research information management in my Notion

I have recently started storing all information and notes generated during my research in a Notion database. I throw anything related to my research into a database called ResearchData.

I give three labels to each piece of data: “project name,” “research process category,” and “page type.” The “page type” indicates the function of the page as a document. Specifically, I use “hypothesis,” “plan,” “verification,” “deliverables,” and “information” to classify the pages. As for the “research process category,” I roughly divide it into “topic determination,” “issue determination,” “hypothesis discovery,” “validation plan,” “experiment,” “analysis,” “writing,” “peer review,” and “sharing.”
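
The three labels can be sketched as a small data structure. The label values below are the ones listed above; the class name `LabeledPage`, the helper `select_pages`, and the sample project name and page titles are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass

PAGE_TYPES = ("hypothesis", "plan", "verification", "deliverables", "information")
PROCESS_CATEGORIES = (
    "topic determination", "issue determination", "hypothesis discovery",
    "validation plan", "experiment", "analysis", "writing", "peer review", "sharing",
)


@dataclass(frozen=True)
class LabeledPage:
    """A page carrying the three labels (hypothetical representation)."""
    title: str
    project_name: str
    process_category: str
    page_type: str

    def __post_init__(self):
        # Reject labels outside the tentative vocabulary.
        if self.process_category not in PROCESS_CATEGORIES:
            raise ValueError(f"unknown research process category: {self.process_category!r}")
        if self.page_type not in PAGE_TYPES:
            raise ValueError(f"unknown page type: {self.page_type!r}")


def select_pages(pages, **labels):
    """Return the pages whose label fields equal all of the given values."""
    return [p for p in pages if all(getattr(p, k) == v for k, v in labels.items())]


# Made-up sample pages for a made-up project.
pages = [
    LabeledPage("Why might X hold?", "demo-project", "hypothesis discovery", "hypothesis"),
    LabeledPage("Ablation plan", "demo-project", "validation plan", "plan"),
    LabeledPage("Ablation results", "demo-project", "experiment", "verification"),
]

print([p.title for p in select_pages(pages, page_type="plan")])
```

Keeping the vocabularies in plain tuples makes them easy to revise as the labeling scheme changes.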

This labeling is only tentative, and another scheme might be better. I also believe more detailed labeling is necessary. The current labels are not yet rich enough to produce meaningful views. To improve them, I think it is important to start from wherever labeling can actually be tried and to repeatedly test hypotheses about which labels work. I believe that by doing so, I will be able to structure the research process in a way that better matches practice. My labels are just a first step toward that goal.

Illustration of research process

Notion API

Notion itself does not allow you to create your own views. However, Notion has officially provided an API since March of this year, so you can read the database through the API. By writing your own code to render a view, you can therefore realize a view of the research database. If you write code that shows a good view, I would be happy to hear from you. If there’s anything I can help with, I’m willing to help!
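
As a rough sketch of reading such a database, the code below builds a query request for Notion’s database query endpoint using only the Python standard library, without sending it. It rests on several assumptions: that your labels are stored as Notion “select” properties, that you target an API version exposing `POST /v1/databases/{database_id}/query` (the pinned `Notion-Version` value is one such version), and that the token, database ID, and property name are placeholders you replace with your own. The function names are mine, not part of any Notion SDK.

```python
import json
import urllib.request

# Assumption: pin whichever API version your integration targets.
NOTION_VERSION = "2022-06-28"


def build_query_request(token, database_id, property_name, option_name):
    """Build (but do not send) a database query filtered on a select property."""
    payload = {
        "filter": {
            "property": property_name,
            "select": {"equals": option_name},
        }
    }
    return urllib.request.Request(
        url=f"https://api.notion.com/v1/databases/{database_id}/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Notion-Version": NOTION_VERSION,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def page_ids(response_body):
    """Extract page IDs from a decoded query response (a dict with "results")."""
    return [page["id"] for page in response_body.get("results", [])]


# Placeholder credentials and IDs; nothing is sent over the network here.
request = build_query_request("YOUR_TOKEN", "DATABASE_ID", "page type", "hypothesis")
print(request.full_url)
# To actually send it: json.load(urllib.request.urlopen(request))
```

The decoded response can then be passed to whatever function renders the view you want, such as a timeline or a paper-like layout.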

Start building with the Notion API: Connect Notion pages and databases to the tools you use every day, creating powerful workflows. (developers.notion.com)

Also, Notion can now work with GitHub and read PRs and Issues from it. So contributions to research via PRs, as I introduced in a previous post, could also be reflected in the Notion database. I expect the API and the GitHub integration to be strengthened further, which may make all of this easier to handle.

Synced Databases bridge the gap between different tools: Synced Databases give you the power to bring information from different sources into Notion. You can create a single… (www.notion.so)


Conclusion

In this article, I presented two ideas: treating the research paper as a view of a research database, and storing all research data in a Notion database. Realizing these ideas will take repeated trial and error. I am currently doing this on my own, so I would love to hear from anyone doing something similar. If you use Notion to take research notes, I would be happy to hear how you do it.

If someone builds a good view with the Notion API, more people will manage their research data in Notion databases. I believe this would help us accumulate more knowledge about how best to label and structure the research process.

As mentioned above, labeling each task in the research process requires structuring the research process and specifying the categories we want to have as views. Structuring the research process also matters for research automation and optimization, which I am interested in. So, if anyone has an idea for this kind of structuring, please let me know. I would love to help as much as I can. I am also looking for someone to work with me on the automation and optimization of research. Let’s work together to automate the research process and broaden the possibilities for humans.