Phresd

A bioinformatics postgraduate student in mainland China.

Documenting the experience of first discovering the novel coronavirus

Repost statement:

I am not the author of this article. Because this very meaningful article has been blocked inside the firewall, I am reposting it here. Salute to the frontline researchers! May this disaster end soon.

-------------------------------

I had just arrived at work on December 26, 2019. As usual, I first glanced at the day's automated interpretation results for mNGS pathogen detection. If there were no problems, I would start the day's R&D work.

Unexpectedly, one sample reported a sensitive pathogen, SARS coronavirus, with dozens of sequences, and it was the only meaningful pathogen in the sample. For a common virus, this would already be a fairly reliable result. Feeling nervous, I quickly checked the detailed analysis data in the back end and found that the similarity was not very high, only about 94.5% (this figure is tied to the similarity threshold, which means only sequences with relatively high similarity were retained). There were several possibilities: 1. the genomes of different SARS strains differ to some degree; 2. RNA viruses mutate easily, and 17 years had passed since the SARS outbreak, so considerable mutation had accumulated; 3. misalignment to a closely related species, and so on. To confirm the reliability of the result, I started a detailed analysis.
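The remark about the threshold can be made concrete with a small sketch. It is not the production pipeline: the function names and the read identities below are hypothetical, chosen only to show that when reads are kept only above an identity cutoff, the average similarity of the retained reads overstates the true similarity to the reference.

```python
from statistics import mean


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical columns between two aligned, equal-length sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Skip columns where both sequences have a gap; they carry no information.
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def keep_above(identities, threshold):
    """Keep only identities at or above the screening threshold."""
    return [x for x in identities if x >= threshold]


# Hypothetical per-read identities (percent) against a reference genome.
all_reads = [95.0, 94.0, 88.0, 82.0, 79.0]
kept = keep_above(all_reads, threshold=90.0)

# The retained reads look more similar than the full set really is.
print("mean of retained reads:", mean(kept))
print("mean of all reads:", mean(all_reads))
```

With a strict cutoff, only the most conserved reads survive, so a figure like 94.5% says more about the filter than about the genome as a whole.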

Fortunately, I had handled this kind of sensitive-pathogen confirmation several times before, and my supervisor had discussed with me several times whether we could build an analysis pipeline for automatically mining new pathogens. I had kept this in mind, and while working on other, higher-priority projects I had also put together a preliminary version, which now came in handy for this sample. I had given it a name: compared with the pipeline used in daily production, it carries an additional suffix, "Exploration Edition", and its database contains almost all sequenced viral genomes.

The results of the Exploration Edition suggested that this pathogen was most similar to Bat SARS like coronavirus, with an overall similarity of about 87%, and a similarity to SARS of about 81%. The number of aligned sequences increased from dozens to more than 500. In addition, 5 contigs were assembled, adding up to more than 1200 bp. At this point it could basically be confirmed that this was a coronavirus, and detailed coronavirus-specific analysis could proceed. During the analysis, I also began confidential discussions within a small group that included the head of interpretation and my supervisor.


On further analysis, whether the raw reads or the assembled sequences were searched with blastn against the NCBI nt database, the best hits were always Bat SARS like coronavirus, but the overall similarity was only about 87%. A blastx protein search against the nr database gave a comparable similarity, so I remained skeptical of the result.

Virus classification has always been confusing, and the rules are not uniform. Some viruses are classified by typing genes (such as influenza A), while others have no clear typing gene and are classified by other methods. I had not previously looked into how Bat SARS like and SARS are classified. In NCBI Taxonomy, Bat SARS like is placed under SARS. In such an urgent situation there was no time to survey the literature, and I had little data in hand. Going by genome similarity, my instinct was that since Bat SARS like is a subordinate taxon of SARS, this unknown virus was at least a species parallel to Bat SARS like, that is, a new type of Bat SARS like (I later saw in the literature that SARS and Bat SARS like are classified by several non-structural proteins).

We then exhaustively analyzed thousands of coronavirus genomes, evaluating them by similarity, coverage, and even distribution across the genome, and finally found the two most similar genomes, bat-SL-CoVZC45 and bat-SL-CoVZXC21 (after the genome sequence was released on January 9, many articles analyzed these two).

(The picture was sent to the group in the afternoon)

This information alone was not enough; at the very least we had to look at the evolutionary relationships, so I started a phylogenetic tree analysis.

I downloaded the genomes of all coronaviruses and, after quality filtering, clustering, and other steps, finally selected 160 coronavirus genomes (covering essentially all known coronaviruses from various animals). The assembled sequences and the 160 coronavirus genomes were then used to build a tree based on whole-genome average similarity. (As an aside, I personally think that for analyzing species evolution, a tree built on average whole-genome similarity should be more accurate and reasonable than one built on a single gene, since it takes the whole picture into account. Of course, single genes are what matter when studying structure and function. In any case, the assembled sequences were still very short at this point and contained no complete gene.) The unknown virus clustered closest to Bat SARS like coronavirus, and it also sat on the large SARS branch.
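To illustrate how a tree can be built from average whole-genome similarity, here is a minimal sketch. The text does not say which clustering method was used; UPGMA is assumed here purely for illustration, and the genome labels and similarity values are toy data, not results from the analysis.

```python
def similarity_to_distance(similarity):
    """Convert {(a, b): percent similarity} into {frozenset({a, b}): distance}."""
    return {frozenset(pair): 1.0 - value / 100.0
            for pair, value in similarity.items()}


def upgma(names, dist):
    """Cluster labels by UPGMA (an assumed method) and return a nested tuple tree."""
    clusters = {name: (name, 1) for name in names}  # key -> (subtree, size)
    d = dict(dist)
    while len(clusters) > 1:
        keys = list(clusters)
        # Pick the pair of clusters with the smallest average distance.
        a, b = min(((x, y) for i, x in enumerate(keys) for y in keys[i + 1:]),
                   key=lambda pair: d[frozenset(pair)])
        (tree_a, size_a) = clusters.pop(a)
        (tree_b, size_b) = clusters.pop(b)
        merged = f"({a},{b})"
        # Distance to the merged cluster is the size-weighted average.
        for other in clusters:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))]
                + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = ((tree_a, tree_b), size_a + size_b)
    return next(iter(clusters.values()))[0]


# Toy whole-genome average similarities (percent) between four made-up genomes.
toy_similarity = {
    ("A", "B"): 90.0, ("A", "C"): 60.0, ("B", "C"): 50.0,
    ("A", "D"): 20.0, ("B", "D"): 20.0, ("C", "D"): 10.0,
}
tree = upgma(["A", "B", "C", "D"], similarity_to_distance(toy_similarity))
print(tree)
```

The two most similar genomes are joined first, and each later join uses the average distance between groups, which is why this kind of tree reflects the genome as a whole rather than a single gene.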

The run lasted until evening before I could draw the figure.
The big red block in the upper left corner is SARS; the less intensely red area at its edge is Bat SARS like; and the large blue border outside is another group of Bat SARS like. The unknown virus, circled in red, clusters with ZC45 and ZXC21 on a relatively independent branch.

It is rather strange that this unknown virus clusters on a relatively independent branch with bat-SL-CoVZC45 and bat-SL-CoVZXC21, while the other Bat SARS like viruses are concentrated in the SARS group. I wondered whether these two had been misclassified, but after reading the source literature I found that the method was no different from that used for the others. I deferred to the classification in the literature and accepted it as correct for the time being. (This was also one of the grounds on which some experts later judged this unknown virus to be a new type of coronavirus.)

Screenshot of part of the evolutionary tree

The front line reported that the patient was seriously ill and that they were anxious to get the test results, but such a major pathogen could not be reported lightly. At noon, I held an emergency meeting with several leaders, and we decided to continue the in-depth analysis, delay the release of the report, and share the data with the Institute of Pathogens of the Chinese Academy of Medical Sciences for joint analysis.

Later, in-depth analysis was carried out at the gene level (orf1ab, S, N, and other genes), but nothing significant was found, mainly because few sequences had been detected, coverage was too low, and all the genes were incomplete.


At noon, we proposed retesting the sample to obtain supplementary data for analysis. Retesting verifies technical reproducibility, rules out false positives caused by contamination or other unknown factors, and confirms that the sample really does contain the pathogen. It also yields more data to analyze: if a complete genome can be assembled, for example, the results are more reliable and deeper analysis becomes possible.

The next day (2019.12.27), once the data came out early in the morning, I quickly ran the assembly analysis and finally assembled a nearly complete genome sequence. The data were also shared with the Institute of Pathogens of the Chinese Academy of Medical Sciences for in-depth analysis. This time, the number of sequences increased from more than 500 to more than 470,000!


Because time was limited and there were other R&D projects to do, I did not carry out detailed gap-filling on the assembly to obtain the complete genome. In addition, the data had been shared with the Institute of Pathogens, who would do this as well, so there was no need for us to assemble a complete genome sequence. The existing assembly could meet most analysis needs.

Some in-depth analysis continued afterward.

The reads mapped back to the assembly were evenly distributed with no obvious bias; the mean depth and median depth were basically the same, and the depth reached 1000x. This indicated that the assembly was sound, the sequencing was good, and the unknown pathogen was represented by a complete genome.
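The check described above, comparing mean and median depth, can be sketched as follows. The function names, the 20% tolerance, and the depth values are all assumptions for illustration; real per-base depths would come from the read alignment.

```python
from statistics import mean, median


def depth_summary(depths):
    """Summarize per-base depth: mean, median, and fraction of covered bases."""
    covered = sum(1 for d in depths if d > 0)
    return {
        "mean": mean(depths),
        "median": median(depths),
        "breadth": covered / len(depths),
    }


def looks_uniform(depths, tolerance=0.2):
    """Treat coverage as even when mean and median differ by at most `tolerance`.

    The tolerance is an arbitrary illustrative cutoff, not a published standard.
    """
    summary = depth_summary(depths)
    if summary["median"] == 0:
        return False  # more than half the positions are uncovered
    relative_gap = abs(summary["mean"] - summary["median"]) / summary["median"]
    return relative_gap <= tolerance


# Hypothetical per-base depths: one even profile, one dominated by a single pile-up.
even_profile = [1000, 990, 1010, 1005, 995]
skewed_profile = [10, 10, 10, 10, 5000]

print(looks_uniform(even_profile), looks_uniform(skewed_profile))
```

A large gap between mean and median usually means a few regions attract most of the reads, which would point to misassembly, repeats, or contamination rather than a cleanly covered genome.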

The evolutionary tree was rebuilt, this time using the reference strains of all coronaviruses in NCBI (accession numbers beginning with NC, which are the officially curated and most trusted records), plus the few most closely related strains identified in the previous day's analysis.

The resulting tree was basically the same as the previous day's.

Genome collinearity analysis, ORF annotation, etc. showed that this unknown coronavirus is a typical BetaCoV (orf1ab, S, M, N, E, etc. genes). The lighter color in the collinearity map is the S protein region, which is the gene with the greatest difference.

Comparison with the 7 PCR target sequences for SARS recommended on the WHO official website showed an average similarity of only about 90%. Crucially, the primer sequences also contained several variations. I inferred that SARS detection kits would not be able to detect this unknown pathogen (later experimental verification in many places confirmed this).
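The primer comparison can be sketched as a simple mismatch count. This is only an illustration: the sequences below are short made-up strings, not the WHO-recommended primers or any real genome, and the function name is hypothetical.

```python
def best_primer_site(primer: str, target: str):
    """Return (position, mismatches) of the best ungapped match of primer in target."""
    primer, target = primer.upper(), target.upper()
    if len(primer) > len(target):
        raise ValueError("primer is longer than the target sequence")
    best = None
    for start in range(len(target) - len(primer) + 1):
        window = target[start:start + len(primer)]
        mismatches = sum(1 for p, t in zip(primer, window) if p != t)
        # Keep the first window with the fewest mismatches.
        if best is None or mismatches < best[1]:
            best = (start, mismatches)
    return best


# Toy example: the same primer against a matching and a variant target.
print(best_primer_site("ACGT", "TTACGTTT"))
print(best_primer_site("ACGT", "TTACCTTT"))
```

Even a few mismatches, especially near the 3' end of a primer, can stop amplification, which is why variation in the primer-binding sites matters more than the average similarity of the whole target.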

In addition, many other detailed in-depth analyses of genes and proteins have been carried out, which will not be repeated here.

The analysis has basically confirmed that there is indeed a virus in the sample of this patient, but this virus is not very similar to all viruses with known genomic information, and may be a new type of virus similar to Bat SARS like coronavirus.

Looking back at what I said at the time, it turned out to be a prophecy. I apologize to the people of the whole country; it is all the fault of my jinxing mouth.

This was not necessarily SARS, and its infectivity and pathogenicity were unknown. At the same time, aware of how serious the problem might be, we thoroughly cleaned and disinfected the laboratory, destroyed the samples safely, and placed the personnel involved in the experiments under monitoring.

The next step was how to report the problem. Reporting directly might scare the doctor, not to mention that this might be a new virus, and a wrong report would be a major accident. Some necessary information still needed to be checked first.

The first thing that came to mind was, of course, a history of contact with wild animals. The information at the time was that the patient had returned to his hometown, so contact with bats, or a bat bite, could not be ruled out.

We also suspected that the infection might have been caused by a leak of an artificial virus. After all, the mass Brucella infection some time earlier had been caused by incomplete sterilization at a certain factory.

With no further information available, it was still necessary to communicate with the doctor quickly. After all, we could guarantee that the submitted sample contained this unknown virus, and the remaining matters would be left to the doctor to investigate and handle.

We communicated with the doctor before noon, and the patient was isolated.

Because there was no other information, the patient had been isolated, and this virus was not true SARS, I assumed it was just a wild Bat SARS like virus of unknown infectivity, so my nervousness eased somewhat. However, because the patient was seriously ill and the matter could not be taken lightly, we stayed in close communication with the hospital, and some in-depth analysis continued during this period. On the 27th and 28th, the company's leaders communicated with the hospital and disease control personnel by telephone. On the 29th and 30th, they went to Wuhan in person to report and discuss the matter with the leaders of the hospital and the CDC, presenting all of our analysis results as well as those of the Institute of Pathogens of the Academy of Medical Sciences. Everything was under intense, confidential, and rigorous investigation (by this time, the hospital and disease control staff already knew that there were many similar patients, and after we communicated the test results an emergency response was started, but I did not know that).

I thought this matter would pass soon. After all, apart from this patient, no one else seemed to have been infected. However, by December 30, I heard that there were quite a few patients with similar symptoms, and I suddenly tensed up. In particular, around the afternoon of the 30th, a peer company apparently detected the same virus in another patient's sample, but they directly issued a report of SARS coronavirus detection, which instantly set the news off. In the evening, the relevant departments issued an announcement of "pneumonia of unknown cause", and in the early hours of the 31st, related rumors began to spread on Weibo.

What really made me nervous again was that the peer company shared its sequence with us for analysis. I analyzed it and found that it was indeed the same virus! My first instinctive thought was "this virus is contagious"! It might really be a new type of SARS!

In the middle of the night on the 30th, I got the peer company's sequence for analysis.

Two unknown viruses are clustered together, and the similarity is more than 99%

Because the peer company's sequences had been screened by comparison with SARS, their similarity to SARS reached 93%, whereas our complete sequence was about 86% similar; yet the two sequences were still nearly 100% identical to each other.

In pairwise comparison, all of their sequences could be aligned to ours, with 99.6% similarity. Such similarity is almost impossible between different species even in conserved regions, and the alignment covered more than 20% of the whole genome, confirming without a doubt that this was the same virus!

My mood at this time was both tense and excited. Tense, because this unknown virus might be as terrifying as SARS; excited, because we had identified and confirmed this pathogen early through mNGS and the patient had been quarantined, so it might be possible to strangle the virus in the cradle through prevention and control before it spread widely!

Feeling complicated and emotional, I immediately posted a WeChat Moments update that few people could understand.

On the morning of December 31, rumors about SARS began to spread wildly on Weibo. I kept waiting to see how the authorities would respond. In the afternoon, the official announcement stated only that it was "pneumonia of unknown cause" and did not mention the pathogen. There were 27 similar cases, 7 of them severe. After seeing this news, I felt that things did not look good, and I guessed that the infectivity of this virus was not low. However, the official notice said that "no obvious human-to-human transmission has been found". There was not much data in the early stage and the situation was hard to judge, all the more so for a new virus. To stabilize public sentiment and avoid excessive panic, such a notice was actually understandable.

At this point, the expert group had begun to step in, and "national team" institutions such as the Wuhan Institute of Virology had also begun analysis and identification. They had more information, more samples and data, and more authority and expertise, so I did little further in-depth analysis and waited for the official results.

At the beginning, I was quite confident in the country's ability to deal with sudden, major public health events. After all, it had been through the baptism of events such as SARS and influenza A. In addition, the Beijing plague incident just over two months earlier had also been detected and reported by us through mNGS. After we reported it that day, they started the emergency response procedure and immediately verified the result with other methods. The next morning I saw the news released; no new infections were found afterward, and prevention and control were handled very well. mNGS made a great contribution in the plague incident, and I thought it could also play a great role in the prevention and control of this unknown virus.

The media began to refute the rumors. First, the Beijing News ran a piece describing the SARS reports as rumors, while the People's Daily and others used slightly more guarded wording: "It cannot be concluded that it is SARS." Later, eight people accused of "spreading rumors" were arrested. After seeing this news, for some reason, I was suddenly a little disappointed. Are things that are not yet settled, or are still disputed in science, simply rumors? I had the feeling that this rumor-refuting rhetoric, together with the overly optimistic publicity, would push the matter into an irreversible situation. The high-profile refutation of rumors by the media would interfere with the scientific characterization of this virus, and the overly optimistic publicity would leave the public without a sense of caution and unwilling to take protective measures. Subsequent developments confirmed my concerns.

After the incident blew up, some friends, especially friends from Wuhan, came to ask me whether I knew anything. After telling them that it must be kept strictly confidential (after all, people had been arrested), I still revealed a little information so that they would be sure to protect themselves.

This dialogue may also help explain why it was necessary to refute the SARS rumor and name it a new coronavirus. After all, the genome similarity is only about 80% (as an aside, different strains of the same enterovirus type are often only about 80% similar), and SARS caused us so much trauma that the public panics at its mention.

The similarity between the new coronavirus and SARS differs from gene to gene, ranging from 75% to 94%. The S gene in particular, which is related to the human cell receptor (ACE2), is only 75% similar. So there is some basis for saying it is not SARS. However, in Shi Zhengli's later article, their analysis placed the new coronavirus within SARSr-CoV (SARS-related coronavirus).

We all know what happened next. Here are a few more questions.

Why, when we had already determined within two days that this unknown virus was a coronavirus very similar to SARS and had reported all the analysis results, did the authorities not announce until January 7 that the pneumonia was caused by a new type of coronavirus?

In fact, what we did was identify such a virus in the samples submitted for testing. Whether the pneumonia was caused by this virus we did not analyze, nor could we. Detecting the virus does not mean that the pneumonia was caused by it. For such a major health event, the authorities naturally have to demonstrate this rigorously, and there is an internationally recognized verification procedure for it (Koch's postulates). The authorities had not only to detect the virus in multiple samples, but also to verify that the pneumonia was caused by this virus, and so on.



Isolation, culture, verification, and so on are all time-consuming, and experts need to discuss the findings to reach a consensus. It can also be seen from many recently published papers on the new coronavirus that much of the sequencing data was completed in the first two or three days of January.

There is also something rather telling about this matter: those who know the truth stay silent, while those who do not are busy with "vigorous popular science" and "in-depth analysis", which is really a kind of marketing. This was true from the plague to the new coronavirus. Heh, how interesting.

As for my views on the whole incident, my strongest feelings are disappointment, sadness, and anger. We found it so early; why could it not be controlled, leaving the whole country to fight a war against the epidemic? The cause lies less in science or technology than in decision-making and the media.

I used to be quite an angry young man. Now that things have come to this, nothing can be helped, and passing on confidence is what matters most. After all, everyone involved is among the elite, and no one expected things to develop this way, so I cannot be bothered to criticize many things. But there are still some things I want to say.

Refuting the SARS rumors and the optimism promoted by the media were not a big problem in the earliest days. After all, understanding of this virus was very limited. In responding to major public health events, the disease control system may follow a rule of "strict inside, relaxed outside": cautious, strictly verified, and carefully evaluated internally, but optimistic in external announcements to avoid causing excessive panic. What is more, in this incident, how to explain matters to the public was obviously not something the disease control system could decide alone.

Wuhan's status as a transportation hub goes without saying. The Spring Festival was approaching, which is the peak season for catering, tourism, movies, and other service industries. Pessimistic publicity would undoubtedly hit these industries hard and would also cause excessive public panic, leading to serious consequences such as panic buying and social disorder. If the virus really could not spread, or if its spread were stifled precisely because such measures were taken, society would never see what harm the virus could do, and policymakers would inevitably be scolded for overreacting and making a fuss, and would naturally take the blame. Conversely, with overly optimistic publicity, everyone is happy if the virus is weak; but if the virus is strong, the public lacks awareness of prevention, control work is hard to carry out, and the result is rapid spread and more serious consequences. So a decision maker has to take all factors into account. Social, economic, and political considerations must be balanced, which tests the ability and foresight of decision makers and experts.

Get it right, and nobody notices; get it wrong, and you are a sinner through the ages.

There are no parallel universes in this world, so we have no way of knowing which decision would have been more right or wrong. In a domestic atmosphere where everyone likes to be optimistic about everything, and with wishful thinking at play, one can imagine what choice decision makers would make.

In the later development of the incident, especially from around January 12, I believe the experts had already seen that things were heading in a bad direction (about 30% of the first 41 confirmed patients had no history of exposure to the seafood market; by this time suspected cases had begun to appear in many places, some of them with initial positive nucleic acid tests). Yet the subsequent publicity remained overly optimistic and artfully worded, kept refuting rumors, and showed no sign of issuing an early warning. "Limited human-to-human transmission is not ruled out, the risk of sustained human-to-human transmission is low, and it is preventable and controllable." The wording is quite cautious. Leaving aside whether it was too optimistic, just ask how many ordinary people know what "limited human-to-human transmission" means, or what "sustained human-to-human transmission" means. They probably understood it as basically no human-to-human transmission, and so were naturally off guard. I also ran a small survey on WeChat Moments.



It can also be seen from many papers published since that many experts understood the spread of this virus well very early on, and the more data they had, the clearer their prediction of the situation. I believe the experts had already given predictions that were not very optimistic. But if some people, to save face for certain individuals or certain media (after all, the earlier publicity had been too optimistic, and a reversal would be a painful slap in the face), ignored the experts' opinions, disregarded people's health, insisted on optimistic announcements, and refused to issue early warnings, that is unforgivable. No matter who they are, those who should be punished should be punished, and those who should be dismissed should be dismissed.

Without further ado, let's talk about hope.

mNGS is indeed a good technology. It plays a great role in diagnosing difficult and critical cases, and in early surveillance and outbreak monitoring for sudden, major public health events like this one. Many providers offer mNGS pathogen detection services. I suggest that the disease control system establish direct communication channels with some technically strong companies, so that it can respond more quickly to emergencies like this. In addition, mNGS companies could set up an information-sharing platform to share information promptly when they encounter such events and check whether an outbreak is under way (I know this is difficult: on the one hand there are commercial secrets, and on the other no one dares to report a sensitive pathogen lightly. But I still think this would be very meaningful, and I hope it can become a reality one day).

I also hope that after this new coronavirus incident, the country's ability to handle major public health events will make great progress. Personally, I believe announcements and publicity could borrow the rule of weather forecasting, "always forecast toward the more harmful side", to remind the public to take more precautions and to lower expectations. As the saying goes: without expectations, you will not be disappointed.

To me, this event felt like a year-end exam. I drew on everything I had learned and handed in a fairly acceptable answer sheet, but did this answer sheet have the greatest effect it could? For the first time, I took part personally in such a major public health event, made a small contribution, and gained a great deal of experience.

As far as I know, we should have been the first to discover this virus, because it was only after we reported the results that the disease control system began to step in. Judging from the data submitted to the GISAID database, our sample collection date is also the earliest. Other institutions may also have detected this virus, but it was an unknown virus with no reference genome in the nucleic acid databases, and they may not have had the ability to analyze and identify it.


So we should have been the first to discover this new type of coronavirus; let me put that on record.

Hope to overcome the epidemic soon.

Go China! Come on, Wuhan!

2020.01.28

CC BY-NC-ND 2.0
