Valid Data

Obtaining valid data is not easy. The data are measured. The measurements are made with instruments. The instruments must be calibrated. The procedures for using the instruments must be appropriate for the context. The measurements must be translated into human-understandable parameters. This is a lot of work, and much of it we either don't know much about or take for granted. I'll go into all of these things in more detail later on; for now I'll assume that the data that are available are in good shape.

As a preliminary to the data presentation effort, I want to make the point that pretty much all of my work with data is done using the 'R' computer language. I've come to love it, mostly because of its flexibility, the support of a worldwide community of users and developers, and a very active development effort. This pretty much guarantees an up-to-date collection of tools. It has a bit of a learning curve, but there is a lot of support on the Internet, and how to do almost anything you want can be found by googling. I highly recommend that anyone learn how to work with it. It's very much a tool used in academia. Don't be fooled by the label "Statistical Computing Environment." It is much more than that. For example, it can solve all kinds of differential equations. One can do simulations with it. I've done the Deep Creek Lake bathymetry with it. Check it out here. Above all, it's free. Just to get you going, several resources are listed below in items 1-4.

Data Extraction

Much of the data collected on this website resulted from processing other data sources. One large data source is Brookfield itself. Unfortunately, their data are not currently available directly in digital form and must be extracted from monthly and annual reports. I typically work with their PDF files. Unfortunately again, PDF files are quirky entities and have to be processed separately.

I start by simply trying to copy and paste the data from the PDF file into a text editor. Sometimes this works very well and the data come across clean. Some hand editing may be required to remove artifacts introduced where the data run from one page to the next.

If the above is not possible, I have found that the best way is to make a JPG of the page and use OCR to convert it to text. This often introduces some errors in translation that must be cleaned up. Often this is most easily done by hand, using some of the clever features that come with modern text processors, or, if it's a very regular defect, by writing a small R script that cleans the data.
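A cleaning script of this kind might look like the sketch below. It is only an illustration: the sample lines, the column names, and the helper `clean_ocr` are all made up for the example, and the substitutions are only safe on lines that are supposed to contain nothing but numbers and dates.

```r
# Hypothetical OCR output: date, lake level (ft), flow (cfs).
# The values are invented for illustration only.
ocr_lines <- c(
  "2O12-01-05  2459.l2  35O",
  "2012-01-06  2459,08  4l0",
  "2012-01-07  2458.97  380"
)

# Fix a few regular OCR defects in purely numeric text.
clean_ocr <- function(x) {
  x <- gsub("[Oo]", "0", x)  # letter O read in place of zero
  x <- gsub("[lI]", "1", x)  # lowercase l or capital I read in place of one
  x <- gsub(",", ".", x)     # comma read in place of a decimal point
  x
}

cleaned <- clean_ocr(ocr_lines)

# Parse the cleaned lines into a data frame (whitespace-separated columns).
dat <- read.table(text = cleaned,
                  col.names = c("date", "level_ft", "flow_cfs"),
                  colClasses = c("character", "numeric", "numeric"))
dat$date <- as.Date(dat$date)
print(dat)
```

If a line still fails to parse as numbers after this, that is usually a sign of a less regular error that is better fixed by hand.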

Occasionally, when documents are old and have been copied and copied again, complete manual translation will be required. Fortunately, this does not occur often.

The next thing is to graph the data, either as scatter plots, line plots, or polygons. Often this brings out a few errors in the data, which are then corrected.
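As a rough sketch of how a plot exposes such errors, the example below uses invented lake levels with one deliberate typo (2495.10 in place of 2459.10). The red highlighting relies on a simple median-based threshold, which is an assumption made for this example and not a rule taken from the procedure described above; looking at the plot by eye works just as well.

```r
# Made-up daily lake levels (ft); the fourth value has two digits swapped.
dat <- data.frame(
  date = as.Date("2012-01-01") + 0:9,
  level_ft = c(2459.20, 2459.15, 2459.12, 2495.10, 2459.05,
               2459.01, 2458.98, 2458.95, 2458.90, 2458.88)
)

# Flag points far from the median, scaled by the median absolute deviation.
# The factor of 5 is an arbitrary choice for this illustration.
dev <- abs(dat$level_ft - median(dat$level_ft))
suspect <- dev > 5 * mad(dat$level_ft)

# Line-and-point plot, with the suspect values marked in red.
plot(dat$date, dat$level_ft, type = "b",
     xlab = "Date", ylab = "Lake level (ft)")
points(dat$date[suspect], dat$level_ft[suspect], col = "red", pch = 19)

# List the rows worth checking against the original report.
print(dat[suspect, ])
```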

The final result is data in a text file that anyone can read with any type of computer language.
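A minimal sketch of that last step, assuming a tab-separated layout with a header row (the file name, column names, and values are hypothetical):

```r
# Made-up cleaned data, standing in for the result of the steps above.
dat <- data.frame(
  date = as.Date("2012-01-01") + 0:2,
  level_ft = c(2459.20, 2459.15, 2459.12)
)

# Hypothetical output file, written to a temporary directory for the example.
out_file <- file.path(tempdir(), "lake_levels.txt")

# Plain tab-separated text with a header row, no quotes and no row names.
write.table(dat, out_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back to confirm that it round-trips.
check <- read.delim(out_file, stringsAsFactors = FALSE)
check$date <- as.Date(check$date)
stopifnot(isTRUE(all.equal(check$level_ft, dat$level_ft)))
```

A file written this way has no R-specific formatting in it, so it can be opened in a spreadsheet or parsed by a few lines of code in most other languages.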


1. What is R?
2. Beginner's guide to R: Introduction
3. R Tutorial
4. R-Studio