I built a #MastoBot to post local #Covid19 numbers. In the process, I found Ontario Public Health’s data feeds and ended up building something much more generic and reusable than I initially intended. Aside from the very local hospitalization and ICU numbers from KHSC, the Ontario feeds can be filtered to be useful anywhere in the province.
My bot instance posts to https://botsin.space/@covid_ygk, if you’re interested in Kingston’s numbers. It posts a couple-three times daily (Ontario data is updated every day, KHSC updates on weekdays).
Initially, I sought out accessible local data. KFL&A Public Health offers a pretty comprehensive COVID dashboard, but as a Microsoft Power BI dashboard. This significantly complicates getting the data back out in a useful manner; I’ll get there, I’m sure, but it’s a bigger job for another time.
The other local source I found is the Kingston Health Sciences Centre’s weekday hospitalization updates. This one is simply text on a webpage, easy to extract.
I use node-fetch to retrieve the page, then cheerio to extract the part I want. If you’re doing any sort of web scraping in Node.js, definitely check out cheerio – it’s more or less “jQuery, but for Node.js”. Once I have the lines I want, I massage them a bit and feed the result to MastoBot to be Fediversed.
That done, I went looking for more tasty data, and found the Ontario Data Catalogue and its sweet sweet trove of CSV goodness.
CSV is an interesting format. It’s deceptively simple; at first glance, what could be easier than “one record per line, fields delimited by commas”? In the wild, however, it’s cursed in subtle ways. How are values containing commas escaped? What even is a “date”? Line endings, invalid rows, comments and blank lines: there are endless variations and subtleties. Ultimately, if you are parsing CSV manually, you are probably doing it wrong; eventually an edge case will find you, and you are likely to be eaten by a grue.
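To make the "values containing commas" curse concrete, here is a tiny demonstration of the naive approach going wrong. The row is an invented sample (the count is made up), not a line from any real feed.

```javascript
// Invented sample row: a quoted field that itself contains a comma.
const row = '"Kingston, Frontenac and Lennox & Addington",42';

// The "obvious" manual parse: split on every comma.
const naiveFields = row.split(',');

console.log(naiveFields.length); // 3, although the row only has 2 fields
console.log(naiveFields[0]);     // '"Kingston' (name cut short, quote still attached)
```

Two fields went in and three came out, and that is before anyone has embedded a newline or an escaped quote inside a value.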
Enter csv-parse. Feed it the raw CSV as a string or a Stream, and you get rows as Stream events or one big array of records in a callback at the end. It took some trial-and-error to get the right options set (see: subtly cursed), but in the end I had the public health data I was after.
After a few refactors, I have a library of fairly reusable component functions (`fetchCsv()` and `postToot()` to do individual steps; `do_csv_post()` to automate the repetitive bits) and an extensible yargs-based CLI.
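As a rough illustration of how those pieces could fit together, here is a sketch of a glue function in the spirit of `do_csv_post()`. The function names come from the post, but the signatures, the options object, and the stubbed `fetchCsv`/`postToot` are guesses made for this example; the real library may be shaped quite differently. The yargs layer is left out.

```javascript
// Hypothetical glue: fetch rows, keep the interesting ones, format, post.
// The step functions are passed in so they can be swapped or stubbed.
async function doCsvPost({ url, filter, format, fetchCsv, postToot }) {
  const rows = await fetchCsv(url);
  const matching = rows.filter(filter);
  if (matching.length === 0) return null; // nothing worth posting
  const text = format(matching);
  await postToot(text);
  return text;
}

// Stubs standing in for the real network calls, so this runs offline.
// The rows are invented sample data.
const fakeFetchCsv = async () => [
  { health_unit: 'Kingston', active_cases: '42' },
  { health_unit: 'Ottawa', active_cases: '17' },
];
const sent = [];
const fakePostToot = async (text) => { sent.push(text); };

const done = doCsvPost({
  url: 'https://example.invalid/cases.csv', // placeholder URL
  filter: (row) => row.health_unit === 'Kingston',
  format: (rows) => `Active cases: ${rows[0].active_cases}`,
  fetchCsv: fakeFetchCsv,
  postToot: fakePostToot,
}).then((text) => {
  console.log(text);
  return text;
});
```

With the steps injected like this, pointing the bot at a different feed or a different region is mostly a matter of changing `url`, `filter`, and `format`.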