Starting on March 27, the Santa Clara Department of Public Health began publishing a graphic showing the number of cases of COVID-19 in the county. Along with that graph, the county website also has a number of other demographic statistics about the populations affected by the disease. Since one reason I started this blog was to publish a graph of the history of daily cases, and the county now provides one, I will not be updating my graph as frequently.
One statistic that the county’s website does not publish is any projection of the future number of cases. I think leaving it out is a good idea, since future cases are difficult to forecast and any forecast would be based on a probabilistic model. Furthermore, with a favorable projection, the public may become more lax in their measures to protect themselves and the community, which could lead to a flare-up of new cases.
I will still provide my projections just to see how they track with what actually occurs. In my last post, I noted an increasing slope in the linear trend of the rate of increase of daily cases. The trend was decreasing, but less steeply over time, so I am adding a projection based on a logarithmic decrease instead of a linear decrease. Since a logarithmic rate decreases more slowly than a linear one, this might fit the actual data trend more closely.
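For readers who want to experiment with the two kinds of trend, here is a minimal sketch of fitting a linear and a logarithmic curve by ordinary least squares. The numbers in `rates` are made up for illustration and are not the county's data, and the function names are my own. This is one plausible way to fit such trends, not necessarily the exact calculation behind my graphs.

```python
import math

def least_squares(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical daily growth rates in percent (illustrative only, NOT real data).
days = list(range(1, 11))
rates = [30.0, 26.0, 24.0, 22.0, 21.0, 20.0, 19.5, 19.0, 18.5, 18.0]

# Linear trend: rate = a + b * day
lin_a, lin_b = least_squares(days, rates)

# Logarithmic trend: rate = a + b * ln(day), fitted by regressing on ln(day)
log_a, log_b = least_squares([math.log(d) for d in days], rates)

def linear_trend(day):
    return lin_a + lin_b * day

def log_trend(day):
    return log_a + log_b * math.log(day)

# Compare the two projections beyond the fitted range.
for day in (15, 20, 30):
    print(day, round(linear_trend(day), 1), round(log_trend(day), 1))
```

With data shaped like this, the linear trend eventually projects a negative rate, while the logarithmic trend keeps decreasing but ever more slowly.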
Here is the graph for the daily cases with both the linear and logarithmic trends.
Surprisingly, the logarithmic and linear trends give very similar projections for future cases. Using a 2-week moving average of the daily cases, though, shows a more pronounced difference:
It is encouraging that the actual number of cases is tracking these projections, and that the number of daily cases peaks in early to mid-April. However, one must keep in mind that I am trying to fit a curve to data, with no underlying reason why that curve should be the one the data follows. An epidemiologist would be more qualified to suggest likely curves to use to forecast the number of future cases.
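As an aside on the smoothing mentioned above, a 2-week moving average can be sketched as follows. I am assuming a trailing average (each day averaged with the 13 days before it) and a shorter window for the first few days. The case counts are invented for illustration, and the function name is my own.

```python
def trailing_moving_average(values, window=14):
    """Average each value with up to window - 1 values that precede it."""
    averages = []
    running_total = 0.0
    for i, value in enumerate(values):
        running_total += value
        if i >= window:
            # Drop the value that has just fallen out of the window.
            running_total -= values[i - window]
        # Early entries are averaged over however many values exist so far.
        averages.append(running_total / min(i + 1, window))
    return averages

# Hypothetical daily case counts (illustrative only, NOT real data).
daily_cases = [3, 5, 2, 8, 7, 12, 9, 15, 11, 20, 18, 25, 22, 30, 28, 35]
smoothed = trailing_moving_average(daily_cases)
print([round(x, 1) for x in smoothed])
```

A centered window would be an equally reasonable choice. The trailing version has the advantage that it never needs data from days that have not happened yet.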
This type of reasoning, using data from past instances of an event to predict future outcomes, is called inductive reasoning or induction. The problem with using induction on data that seem to fit a pattern to predict subsequent values is that, even if the actual outcomes of a series of events form a number pattern, there are infinitely many patterns that begin with any given finite sequence of values. Of course, some patterns are more common or more recognizable than others, but the frequency or familiarity of a pattern does not mean that it is the pattern of the actual data.
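To make this concrete, here is a small sketch with two rules of my own invention that agree on their first five values and then part ways. The second rule adds a product that is zero for n = 1 through 5, and the multiplier `c` is an arbitrary choice, so every value of `c` gives yet another pattern with the same opening terms.

```python
def simple_rule(n):
    """The 'obvious' pattern: 1, 2, 3, 4, 5, ..."""
    return n

def alternative_rule(n, c=1):
    """Matches simple_rule for n = 1..5 because the product below vanishes there."""
    return n + c * (n - 1) * (n - 2) * (n - 3) * (n - 4) * (n - 5)

# Both rules produce 1, 2, 3, 4, 5 for the first five terms.
print([simple_rule(n) for n in range(1, 6)])
print([alternative_rule(n) for n in range(1, 6)])

# The sixth terms differ: 6 versus 6 + 120 * c.
print(simple_rule(6), alternative_rule(6), alternative_rule(6, c=2))
```

Nothing in the first five terms tells you which rule generated them.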
The prototypical math problem where matching a pattern to a subset of cases of a problem fails to produce the correct results for the general problem is this:
Place n points on a circle and draw all the line segments that connect any two of the points. The points are placed such that no three of the segments intersect at a single interior point. Into how many regions do the segments divide the circle’s interior?
For example, if n=4, then the picture would look like the diagram below, and the answer would be 8 regions.
If you start to tabulate the answers for small values of n, you find:
Can you fill in the missing answers?