Literate statistical programming is a useful way to combine text, code, data, and output in a single document. If you have a Linux desktop and your distribution includes proper support for R, chances are you will want to work with R from the command line.
To get our bearings in the Natura 2000 data, we need to run a range of commands on each table, such as dim() and summary(). Any change in a table from one year to the next is of interest, in particular a change that makes a table incomparable with the same table in other years. We therefore need to run the commands on all the tables across all years and present the results for comparison. Given the number of tables involved and the presumed similarity of the data structures, it is better to do this with a script than manually.
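As a rough illustration of what such a script could look like, the sketch below runs dim() and summary() over one table per year and collects the results side by side. The two small data frames and their column names (SITECODE, AREAHA, SITETYPE) are made up for the example and are not taken from the actual Natura 2000 tables. The helper name compare_structure is likewise only an assumption.

```r
# Hypothetical stand-ins for the same table as delivered in two different years.
# The column names are invented for illustration only.
tables_by_year <- list(
  "2011" = data.frame(SITECODE = c("A1", "B2"),
                      AREAHA   = c(10.5, 20.1)),
  "2012" = data.frame(SITECODE = c("A1", "B2", "C3"),
                      AREAHA   = c(10.5, 20.1, 7.3),
                      SITETYPE = c("A", "B", "C"))
)

# Build one row per year with the dimensions from dim() and the column names.
compare_structure <- function(tables) {
  rows <- lapply(names(tables), function(year) {
    tbl <- tables[[year]]
    data.frame(
      year    = year,
      n_rows  = dim(tbl)[1],
      n_cols  = dim(tbl)[2],
      columns = paste(names(tbl), collapse = ", "),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

overview <- compare_structure(tables_by_year)
print(overview)

# Columns that are missing in at least one year are the first candidates
# for making the tables incomparable across years.
all_cols     <- unique(unlist(lapply(tables_by_year, names)))
common_cols  <- Reduce(intersect, lapply(tables_by_year, names))
changed_cols <- setdiff(all_cols, common_cols)
print(changed_cols)

# summary() of every yearly table, kept in a list for later inspection.
summaries <- lapply(tables_by_year, summary)
```

With real data, the list would be filled by reading each year's file instead of typing the data frames by hand; the comparison logic stays the same.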
I have chosen Natura 2000 as a case for analysis because I live close to a protected area that is part of the network, As Fragas do Eume (in English: The Woods of the Eume, a river in the province of A Coruña). This is a beautiful area with a rich biosphere that is definitely worth protecting. But it is under pressure from invasive species, in particular the eucalyptus, which is replacing indigenous species and also ruining the soil. A devastating forest fire in 2012 made matters worse; the eucalyptus got an even better foothold in its aftermath. Local tourism is also on the rise, with an ever-increasing number of visitors. Moreover, the area is surrounded by relatively intensive farming and a dam. Taken together, it is likely that both the indigenous flora and fauna are under a fair amount of pressure.
To be able to analyze the data from Natura 2000, the first step is to get the data from the Natura 2000 page. When acquiring data from any source, it is important to keep in mind that everything you do should be transparent and traceable. You can achieve this by documenting each step properly, by writing an easy-to-read script that handles the steps involved, or both. Why is it important to ensure transparency and traceability? The most important reasons are to be able to prove that the data behind your analysis comes from the source you say it comes from, and to be able to spot possible errors in the analysis.
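One possible way to make the acquisition step traceable is to write a small provenance log next to the data: which file was fetched, from where, when, and with what checksum. The sketch below is only an illustration of that idea. The function name record_provenance and the URL are placeholders, not the real Natura 2000 address, and the download itself is commented out so that the example runs offline on a stand-in file.

```r
# Append one line to a CSV log describing where a data file came from.
# record_provenance() is a hypothetical helper, not part of any package.
record_provenance <- function(path, source_url, log_file) {
  entry <- data.frame(
    file       = basename(path),
    source_url = source_url,
    retrieved  = format(Sys.time(), "%Y-%m-%d %H:%M:%S", tz = "UTC"),
    md5        = unname(tools::md5sum(path)),  # checksum of the file as stored
    stringsAsFactors = FALSE
  )
  is_new_log <- !file.exists(log_file)
  write.table(entry, log_file, sep = ",", qmethod = "double",
              row.names = FALSE, col.names = is_new_log, append = !is_new_log)
  invisible(entry)
}

# Placeholder address: replace with the actual download link.
source_url <- "https://example.org/natura2000.csv"
data_file  <- tempfile(fileext = ".csv")

# The real acquisition step would be something like:
# download.file(source_url, destfile = data_file, mode = "wb")
# For this sketch, a tiny stand-in file is written instead.
writeLines(c("SITECODE,AREAHA", "A1,10.5"), data_file)

log_file <- tempfile(fileext = ".csv")
record_provenance(data_file, source_url, log_file)

# Read the log back as text so the checksum is not misread as a number.
provenance_log <- read.csv(log_file, colClasses = "character")
print(provenance_log)
```

Recomputing tools::md5sum() on the file later and comparing it with the logged value is a cheap way to confirm that the analysis still runs on the data that was originally downloaded.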
Natura 2000 is a network of nature protection areas in the territory of the European Union. It is made up of Special Areas of Conservation and Special Protection Areas designated respectively under the Habitats Directive and Birds Directive. The network includes both terrestrial and marine sites. Natura 2000 protects around 18 percent of land in the EU countries, as well as 251,564 square km of marine environment.