Shocker: China, 250,000+ Covid-19 Deaths, and Forecast Validation… Were We Close?

Xie Huanchi/Xinhua via AP

Evidence from two sources out of China points to the possibility that there were between 250,000 and 300,000 deaths in China. Could the Chinese have hidden this number of deaths from Covid-19?

First the evidence…

Back on February 2nd a little-noticed piece escaped into the wild in Taiwan about what China was hiding in its reporting of coronavirus infections and deaths. In a piece written by Keoni Everington at Taiwan News, the major reveal, if it was that, showed that China’s numbers were a LOT higher than they were claiming to the world. The higher numbers were posted on February 1st and they are way, way, way higher than the numbers posted on the 2nd!

Some background from the piece (Tencent may have accidentally leaked real data on Wuhan virus deaths):

As early as Jan. 26, Netizens were reporting that Tencent [Tencent Holdings Ltd. is a Chinese multinational conglomerate holding company founded in 1998], on its webpage titled “Epidemic Situation Tracker,” briefly showed data on the novel coronavirus (2019-nCoV) in China that was much higher than official estimates, before suddenly switching to lower numbers. Hiroki Lo, a 38-year-old Taiwanese beverage store owner, that day reported that Tencent and NetEase were both posting “unmodified statistics,” before switching to official numbers in short order.

Lo told Taiwan News that on Jan. 26 he checked the numbers on both Tencent and NetEase and found them “really scary.” He said he did not know whether the numbers were real or not but did not have much time to think about it as he had a busy day of work ahead at his store.

Lo said he did not check the numbers again until he went home that evening, when he was shocked to see they had dropped dramatically and “something was wrong.” He said he noticed individuals on a Hong Kong Facebook group also observed the same bizarre occurrence that day.

These are the money lines.

On late Saturday evening (Feb. 1), the Tencent webpage showed confirmed cases of the Wuhan virus in China as standing at 154,023, 10 times the official figure at the time. It listed the number of suspected cases as 79,808, four times the official figure.

The number of cured cases was only 269, well below the official number that day of 300. Most ominously, the death toll listed was 24,589, vastly higher than the 300 officially listed that day.  [See the screen shot below.]

Well, I’m gobsmacked!  The Chinese wouldn’t have made their lie that big, would they? 

More from the piece.

Netizens also noticed that each time the screen with the large numbers appears, a comparison with the previous day’s data appears above, which demonstrates a “reasonable” incremental increase, much like the official numbers. This has led some netizens to speculate that Tencent has two sets of data, the real data and “processed” data [read… fake data].

Some are speculating that a coding problem could be causing the real “internal” data to accidentally appear. Others believe that someone behind the scenes is trying to leak the real numbers.

It is worth repeating: some people, knowledgeable or not, are speculating that China had either one database holding the ACTUAL and ADJUSTED numbers in two columns of the same file or, functionally equivalent, two identically structured databases. In both cases, something in the code pointed to one or the other column or table.
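To make that speculation concrete, here is a purely hypothetical Python sketch of how a single flag could switch a tracker page between two columns of the same record. Nothing in it reflects Tencent’s actual code or schema: the record layout, the field names (actual_deaths, adjusted_deaths) and the show_adjusted flag are all invented for illustration. Only the two death figures come from the reports discussed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DailyRecord:
    """One day's death count stored twice (hypothetical layout)."""
    date: str
    actual_deaths: int     # hypothetical "internal" column
    adjusted_deaths: int   # hypothetical "published" column


def displayed_deaths(record: DailyRecord, show_adjusted: bool = True) -> int:
    """Return whichever column the (hypothetical) display flag points at."""
    return record.adjusted_deaths if show_adjusted else record.actual_deaths


# Figures for Feb. 1: the leaked Tencent number and the official number.
feb_1 = DailyRecord(date="2020-02-01", actual_deaths=24_589, adjusted_deaths=304)

print(displayed_deaths(feb_1))                       # what the page normally shows
print(displayed_deaths(feb_1, show_adjusted=False))  # what shows if the flag flips
```

If something like this were the cause, a one-line slip in the flag would be enough to expose the other column, which is all the “coding problem” theory requires.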

But, as they say, there’s more to the story…

Calculating Adjusted Statistics

Considerations: Trying to find the true values for statistics when we know the Chinese are likely lying requires us to make some, maybe questionable, assumptions. Below are the ones I tried to identify and factor into the process.

The challenge is that we have a tightly controlled, secretive society and, we think, a few leaks (data points) through their veil of secrecy. Those data points are related to, and (maybe) fit onto, graphs that look something like the probability functions below. Both graphs represent the same probability function: the upper one is the bell-shaped curve of legend and the lower one is what you see in many pieces on Covid-19.

Probability density function for the normal distribution (two graphs)
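For readers who want to see the two views in numbers, here is a minimal Python sketch. It assumes the lower graph is the cumulative, S-shaped form of the same normal curve; the function names and the default mean and standard deviation are mine, chosen only for illustration.

```python
import math


def normal_pdf(x: float, mean: float = 0.0, sd: float = 1.0) -> float:
    """Height of the bell curve at x (the 'per day' view)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def normal_cdf(x: float, mean: float = 0.0, sd: float = 1.0) -> float:
    """Area under the bell curve up to x (the 'running total' view)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


# Walk from the left tail through the peak (x = 0) to the right tail.
for x in (-2, -1, 0, 1, 2):
    print(f"x={x:+d}  pdf={normal_pdf(x):.4f}  cdf={normal_cdf(x):.4f}")
```

The point to notice is that at the peak of the bell curve the running total is exactly half of its final value, which is the property Method 2 below leans on.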

It’s worth repeating: we have one point on a graph of deaths that looks like one of the graphs above but that has been re-scaled by the Chinese in some way that we don’t know. We also have a few reports of two other instances where people saw what appear to be unadjusted numbers, but without the screenshots.

What we also don’t know is where the left tail of the graph starts (equivalent to the green lines in each graph). China claims their Patient -0- was in early December 2019. Their adjusted numbers appear to be intentionally adjusted (some would say manipulated) to create exactly that appearance; that is, they have not only hidden the true number of deaths from the world but they’ve also masked the date that the disease hit their nation.

Methods Used to Adjust the Chinese COVID-19 Numbers: Here are a few methods to estimate the true(r) number of deaths in China from COVID-19. All have their deficiencies and, because all numbers are guesses or estimates short of the Chinese providing their real numbers, no claim is made that they are authoritative. Additionally, the list is not exhaustive and other methods of backing into statistics closer to ‘truth’, whatever that is, may yield better estimates.

Method 1: As of the date of this article we are 67 days beyond the above dates from the reports and China is currently reporting about 82,000 total cases and 3,331 deaths.  In this scenario, the numbers we use are the ones China has reported.

Method 2: A second method, one that is fairly conservative, would be to assume that the leaked cumulative number of deaths falls at or near the peak in deaths, so we can simply double the number of cumulative deaths, a re-scaling factor of 2.

Method 3: An alternative method of scaling recognizes that the Chinese adjusted their real numbers by some unknown factor and that a single value that leaked out allows us to determine that factor. I will use the ratio of unadjusted deaths to the reported number of deaths (24,589/304=80.88) and re-scale the graph accordingly.

Special Note: Methods 2 and 3 both raise the question of the date for Patient -0-. They do so because the graphs show peaks that are not, or may not be, consistent with the rate at which COVID-19 has been seen to grow in countries subsequently affected. That is, assuming Patient -0- occurred in early December would likely force the peak in deaths further to the right on the graph by a fair amount but, even in non-Chinese countries, the infection and death rates do not appear to follow that shape. Consequently, centering the graphs on February 1st while also placing Patient -0- in early December is put in doubt and, instead, Patient -0- is assumed to have occurred many months before that date.

Again, we have to remember that we are attempting to discern both the shape and the location of the graph of deaths in China from COVID-19 using a single data point. The uncertainty is very large although, I think, the analysis is useful.

The Results

Method 1: Nothing changes with the reported numbers; total deaths in China from COVID-19 are 3,331.

Method 2: Reflexive up-scaling: Assume February 1st represents the peak for deaths in China and, thus, that the ‘leaked’ number of deaths of 24,589 represents the half-way point of the final death count.

Results: The number of estimated total deaths is 49,178, a large number but… well, it can get worse.
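The Method 2 arithmetic can be sketched in a few lines of Python. The function name and the fraction_at_leak parameter are mine, added only so the half-way assumption is visible in the code rather than buried in a bare multiplication by two.

```python
LEAKED_CUMULATIVE_DEATHS = 24_589  # Tencent figure reported for Feb. 1


def method2_total(cumulative_deaths: int, fraction_at_leak: float = 0.5) -> int:
    """Scale a running total up to a final total.

    fraction_at_leak is the share of all eventual deaths assumed to have
    occurred by the leak date; 0.5 means the leak fell exactly at the peak
    of a symmetric bell curve (the Method 2 assumption).
    """
    if not 0.0 < fraction_at_leak <= 1.0:
        raise ValueError("fraction_at_leak must be in (0, 1]")
    return round(cumulative_deaths / fraction_at_leak)


print(method2_total(LEAKED_CUMULATIVE_DEATHS))  # 49178
```

Everything here hangs on the 0.5; the code simply makes it easy to see how much the answer depends on where February 1st really sat on the curve.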

Method 3: Simple up-scaling: Based on an assumption that the Chinese used the same down-scaling factor on February 1st as they used throughout their data manipulations, use the ratio of the April 9th to the Feb 1st official numbers (3,331/304=10.96) to upscale the total deaths found in the unofficially reported numbers on February 1st (24,589).

This would result in, as of April 9th, roughly 269,500 total deaths in China from COVID-19 (24,589 × 10.96); a very, very, very large number.
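Method 3 is described two ways in this piece: once as scaling the official numbers up by the leaked-to-official ratio (24,589/304), and once as growing the leaked number by the official April 9th to Feb 1st ratio (3,331/304). A short Python sketch, using only those three figures, shows the two descriptions are the same calculation. The variable names are mine.

```python
import math

LEAKED_DEATHS_FEB_1 = 24_589    # Tencent figure
OFFICIAL_DEATHS_FEB_1 = 304     # official figure used in the ratios above
OFFICIAL_DEATHS_APR_9 = 3_331   # official figure as of this article

# Route A: how far the official numbers were scaled down on Feb. 1,
# applied to the official April 9th total.
down_scaling = LEAKED_DEATHS_FEB_1 / OFFICIAL_DEATHS_FEB_1
route_a = OFFICIAL_DEATHS_APR_9 * down_scaling

# Route B: how much the official total grew from Feb. 1 to April 9th,
# applied to the leaked Feb. 1 total.
official_growth = OFFICIAL_DEATHS_APR_9 / OFFICIAL_DEATHS_FEB_1
route_b = LEAKED_DEATHS_FEB_1 * official_growth

print(f"down-scaling factor: {down_scaling:.2f}")
print(f"official growth:     {official_growth:.2f}")
print(f"route A estimate:    {route_a:,.0f}")
print(f"route B estimate:    {route_b:,.0f}")
print("routes agree:", math.isclose(route_a, route_b))
```

Run with unrounded ratios, both routes land near 269,400, a little below the figure quoted above, and either way the answer rounds to about 270,000.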

Some Assumptions:

  1. The assumption is that the daily number of deaths follows, roughly, a bell-shaped curve.
  2. In all scenarios the roughly bell-shaped curve of deaths per day would be preserved, but the scale of the left axis would change.
  3. In both Methods 2 and 3, short of a catastrophic explosion of cases starting in early December 2019, it seems impossible to reach over 154K cases and 24K deaths in a matter of months; this argues for a much earlier inception date for Patient -0-.
  4. With China being ground zero for the pandemic, they should have had the worst outcomes on a per capita basis, so we should combine the above with the facts that:
    1. They were actively trying to suppress news of the disease from both their own population and the world at large, and
    2. Since they were first infected and the source of the disease was unknown, China’s initial attempts at heading off and treating the disease are assumed to be both poorly executed and initially ineffective.
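Assumption 3 above can be put in rough numbers. The Python sketch below asks how fast cases would have had to double to get from a single Patient -0- to the leaked 154,023 cases by February 1st. It assumes clean exponential growth from one case, which is a simplification of mine and not something the leaked data can confirm. The December 1st start is the officially reported date; the October 1st start is included purely as a hypothetical earlier date for comparison.

```python
import math
from datetime import date

LEAKED_CASES = 154_023           # confirmed cases shown on the Tencent page
LEAK_DATE = date(2020, 2, 1)     # date the figure was seen


def implied_doubling_time(first_case: date,
                          cases: int = LEAKED_CASES,
                          observed: date = LEAK_DATE) -> float:
    """Days per doubling if one case grew exponentially to `cases`."""
    elapsed_days = (observed - first_case).days
    doublings = math.log2(cases)  # doublings needed to go from 1 to `cases`
    return elapsed_days / doublings


for start in (date(2019, 12, 1), date(2019, 10, 1)):
    days = implied_doubling_time(start)
    print(f"Patient -0- on {start}: cases double every {days:.1f} days")
```

Under these assumptions the December start requires a doubling roughly every 3.6 days sustained for two months, while the October start needs only about one doubling a week. Readers can judge for themselves which pace looks more plausible.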

Consequently, we should assume that China’s results were proportionately worse than when the virus reached Italy and the US, assuming all other factors are equal. That is, the per capita numbers of cases and deaths in the non-Chinese countries were some fraction of China’s, say 70% for the US and 80% for Italy; I used these two estimates in the table below.

The estimated number of deaths should be proportional to the population differences between China, Italy and the US.
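The population scaling just described can be sketched in Python as follows. The population figures are round, approximate 2020 values that I am assuming for illustration, and I read the 70% and 80% factors as each country’s per capita death rate relative to China’s. The function and dictionary names are mine.

```python
# Approximate 2020 populations (round numbers assumed for illustration).
POPULATION = {
    "China": 1_400_000_000,
    "US": 330_000_000,
    "Italy": 60_000_000,
}

# Per capita death rate relative to China's, per the assumptions above.
RELATIVE_RATE = {"US": 0.70, "Italy": 0.80}


def projected_deaths(china_deaths: float, country: str) -> float:
    """Scale a China death estimate to another country.

    Step 1: shrink by the population ratio (same per capita rate).
    Step 2: shrink again by the assumed relative effectiveness factor.
    """
    population_share = POPULATION[country] / POPULATION["China"]
    return china_deaths * population_share * RELATIVE_RATE[country]


# China totals from Method 2 (49,178) and Method 3 (roughly 270,000).
for label, china_total in (("Method 2", 49_178), ("Method 3", 270_000)):
    for country in ("US", "Italy"):
        estimate = projected_deaths(china_total, country)
        print(f"{label}: {country} projected deaths ~ {estimate:,.0f}")
```

Changing a population figure or a relative rate changes the projections in direct proportion, so it is easy to test how sensitive the sanity check is to any one assumption.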

Sorting Through the Estimates or… Validating the Simple Model (sometimes called performing a sanity check)

In building models of these types it behooves us to see if the results pass the smell test. The table below uses a selection of countries with a significant number of COVID-19 cases and adjusts the current numbers of cases and deaths. By necessity, the choice was somewhat arbitrary on my part but, hopefully, it proves useful. One last note: if Italy’s or the US’s success in treating COVID-19 is significantly different from the assumptions used in the table, the estimates can be adjusted.

Bringing it back to what it means

Current estimates are that the US will reach peak deaths sometime in the coming week. With deaths currently at 22K, the US will exceed even this simpler model’s projections. Italy has, unfortunately, exceeded the estimates of this simple model by a large amount but, given the open question about how Italy did, and does, count COVID-19 deaths, we appear to be facing a similar problem with Italy’s reported COVID-19 deaths as we have with China’s, i.e. we can’t trust them, albeit for different reasons.


If we accept the starting assumption that we have a single value that somehow leaked out of China and the analysis that flowed from that point, the above analysis points to a number of conclusions:

  1. The number of projected deaths in the US from COVID-19 (using the second method of estimating) will be around 44,000 (probably +/- 5,000) and, more importantly, could have been estimated at that level on February 1st; this is far less than the official estimates used by the US government.
  2. Either China had twice the number of deaths from COVID-19 as even the most aggressive assumptions in the above analysis, or the US did not meet the assumption that its death rate would be 70% of that expected in China. Demographics could have played a role in this and/or, equally likely, the disease entered China far earlier than assumed, had an opportunity to spread silently and unhindered through the country, and reached a critical mass that frustrated our public health efforts to control it.
  3. Based only on the Tencent numbers, China had far, far more deaths from COVID-19, likely in excess of 100,000 and possibly in excess of 250,000.
  4. Turning the number of deaths around, i.e. from the US to China, with the US currently experiencing nearly 23,000 deaths and the peak near but not yet passed, it is not unreasonable to assume that the number of deaths in China was likely 2 times the already high estimate of 270k or, in excess of (5)300,000 deaths throughout the country.
  5. Based on the assumption that the peak number of deaths occurred on February 1st, China’s Patient -0- likely occurred months before their reported December 1st date; October 1st seems like a safe assumption.
  6. There are good reasons to assume that the peak number of deaths did not occur on February 1st but, instead occurred somewhat earlier or later. In both cases the peak death numbers would have, possibly, radically revised the total number of deaths in China upward and, subsequently, it would have changed how many deaths we should have expected in the US.
  7. The Validation Model estimation process developed above and summarized in the table should have been followed before the government’s official death estimates were released by the (inter)national health organizations. Performing these reasonableness checks would have urged far more caution in the use of these flawed pandemic forecasts.


This entire analysis flowed from the detection of a set of numbers from China on the Tencent website, but consider this report at a better-known site in the west, The Economist. The article notes two instances where the official numbers were reported as much higher than the previously reported values. In Hubei Province the number of deaths was reported as 12,500+ in mid-February and, at a slightly later date, in the rest of the country as 200+.

In both cases, note also that there were consequences to people in charge when this occurred, including the replacement of the party leadership in Hubei and Wuhan.  Source: China’s data reveal a puzzling link between covid-19 cases and political events

This diary is a set of analyses that I undertook to help me understand what is at its core a complex story that has affected, and still is affecting, us all. By necessity, the work is relatively complex but I have attempted to simplify it as much as possible. It includes both the methods I used to arrive at my conclusions and the data or sources that went into my thinking process; both may contain errors. As they say, YMMV, so draw your own conclusions.

Editor’s Note: Due to the detection of a computational error, the calculations for Italy’s and the US’s projections were updated for Methods 2 and 3. The revised estimates now appear to align well with the current state of the COVID-19 disease progression.