## Thursday, March 17, 2011

### Simple scatter plots using R

Just some basic notes on getting scatter plots up and running without a lot of sweat. Occasionally I need to plot data, and each time the need arises I do the following: (1) Google for the plotting package with the lowest learning curve, (2) work my way up the learning curve, or bail and return to step (1). The next time I need a plot (maybe six months later), it's back to step (1) again.

Now, with this post I will at least have some notes to start from next time, letting me skip step (1) and maybe even step (2). Maybe you can benefit from them as well...

The data below comes from the work I just did on the Kalman filter bandwidth test.

That work, together with the statistical language R (which reminds me of my Matlab days), has finally kicked me in the butt to document the process so that I have something to fall back on.

The plots are scatter plots, but linear interpolation or other analysis shouldn't be hard to apply once the data is loaded.
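For example, here is a minimal sketch of linear interpolation with base R's `approx()` and a straight-line fit with `lm()`. It assumes the first five rows of the "actual" column from the data below have simply been typed in as vectors; the names `t_sec`, `actual`, `est` and `fit` are hypothetical.

```r
# Sketch: linear interpolation with approx() and a least-squares line with lm().
# The vectors hold the first five rows of the "actual" column from the
# sample data, typed in by hand (hypothetical variable names).
t_sec  <- c(1, 2, 3, 4, 5)
actual <- c(157500, 90000, 126000, 157500, 180000)

# Interpolated values halfway between the first few samples
est <- approx(t_sec, actual, xout = c(1.5, 2.5, 3.5))
print(est$y)   # 123750 108000 141750

# Straight line fitted through all five points
fit <- lm(actual ~ t_sec)
print(coef(fit))
```

`approx()` only connects neighbouring samples, while `lm()` fits one line through all of them, so the two answer different questions about the same data.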

First the data needs to be imported. The data in question is arranged in four space-delimited columns: a row identifier followed by three values (the actual, averaged and Kalman-filtered bandwidth). Importing is really easy:

```
1 157500.000000 157500.000000 157500.000000
2 90000.000000 123750.000000 157500.000000
3 126000.000000 124500.000000 134595.593750
4 157500.000000 132750.000000 142266.484375
5 180000.000000 142200.000000 157961.890625
6 140000.000000 141833.328125 153334.156250
7 140000.000000 141833.328125 153334.156250
8 1350.000000 121764.281250 101214.062500
9 114545.000000 120861.875000 114544.515625
10 140000.000000 122988.335938 133289.015625
11 18805.000000 112570.000000 59496.128906
12 96923.000000 111147.546875 96771.718750
13 157500.000000 115010.250000 145469.109375
14 126000.000000 115855.617188 132061.843750
15 157500.000000 118830.218750 149703.421875
16 34054.000000 113178.468750 73995.484375
17 157500.000000 115948.562500 154123.687500
18 157500.000000 115948.562500 154123.687500
19 140000.000000 117363.351562 145258.671875
20 140000.000000 117363.351562 145258.671875
```

Data in the format above can be imported with `scan()`, using a template with one entry per column. The row id is read as a number (`id=0`) so it can serve as the x axis:
```
> inp <- scan("data.txt", list(id=0,x=0,y=0,z=0))
```
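To check the import step without a `data.txt` on disk, here is a self-contained sketch that writes the first three rows of the sample data to a temporary file (standing in for `data.txt`) and reads them back, once with `scan()` and once with `read.table()`. The column names passed to `read.table()` are my own choice, not anything the file defines.

```r
# Write the first three rows of the sample data to a temporary file,
# which stands in for "data.txt" here.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "1 157500.000000 157500.000000 157500.000000",
  "2 90000.000000 123750.000000 157500.000000",
  "3 126000.000000 124500.000000 134595.593750"
), path)

# scan() with one template entry per column returns a named list
inp <- scan(path, what = list(id = 0, x = 0, y = 0, z = 0))
str(inp)

# read.table() returns a data frame instead; column names are our choice
df <- read.table(path, col.names = c("id", "actual", "average", "kalman"))
print(df)
```

`read.table()` splits on whitespace by default, so either call works for this layout; the data frame is handy when you want to refer to columns by name later.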

Next, pull each column out of the list that `scan()` returns into its own vector:
```
> label <- inp[[1]]
> x <- inp[[2]]
> x1 <- inp[[3]]
> x2 <- inp[[4]]
```
Set the display to stack three plots vertically (three rows, one column):
```
> par(mfrow=c(3,1))
```
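`par()` returns the previous settings whenever it changes them, so a sketch like the following can restore the one-plot-per-page layout once the stacked plots are done. The dummy data is a placeholder for the bandwidth series.

```r
# par() returns the old settings, so they can be restored afterwards.
old_par <- par(mfrow = c(3, 1))   # c(rows, columns): three stacked panels
for (i in 1:3) {
  # Placeholder data, one panel per loop iteration
  plot(1:5, (1:5) * i, main = paste("panel", i))
}
par(old_par)                      # back to one plot per page
```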

And finally, the actual plots, with titles, axis labels and y-axis limits, are drawn as follows:
```
> plot(label,x,xlab="seconds",ylab="kbps",ylim=c(0,500000),main="actual")
> plot(label,x1,xlab="seconds",ylab="kbps",ylim=c(0,500000),main="average")
> plot(label,x2,xlab="seconds",ylab="kbps",ylim=c(0,500000),main="kalman")
```
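With more series, the three `plot()` calls above could be written as a loop. Here is a sketch, assuming the columns sit in a data frame (only the first five rows are typed in) and taking the shared y-axis limit from the data instead of the fixed 500000. The names `bw`, `series` and `ylim` are hypothetical.

```r
# First five rows of the sample data in a data frame (hypothetical name bw)
bw <- data.frame(
  id      = 1:5,
  actual  = c(157500, 90000, 126000, 157500, 180000),
  average = c(157500, 123750, 124500, 132750, 142200),
  kalman  = c(157500, 157500, 134595.59375, 142266.484375, 157961.890625)
)

series <- c("actual", "average", "kalman")
ylim <- c(0, max(bw[series]))   # shared scale keeps the panels comparable

old_par <- par(mfrow = c(length(series), 1))
for (s in series) {
  plot(bw$id, bw[[s]], xlab = "seconds", ylab = "kbps",
       ylim = ylim, main = s)
}
par(old_par)
```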
In this case that produces three stacked scatter plots, titled "actual", "average" and "kalman".

All you need to do is download a copy of R and you are off and running.

[UPDATE]:

Great, I just used this for my daughter's science fair project attempting to demonstrate phototropism. Adding line segments to the plots is stupid simple: `type="b"` draws both the points and the lines connecting them:

```
> plot(label,x,type="b",main="Habitat #1",xlab="day",ylab="growth height")
```
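To see how `type="b"` differs from the other common settings, here is a sketch that draws the same six points (the first six rows of the "actual" bandwidth column, reused as stand-in data) four ways: `"p"` points only, `"l"` lines only, `"b"` both with the lines broken around the points, and `"o"` lines drawn through the points.

```r
# First six rows of the "actual" column, reused as stand-in data
t_sec  <- 1:6
actual <- c(157500, 90000, 126000, 157500, 180000, 140000)

old_par <- par(mfrow = c(2, 2))   # 2 x 2 grid, one panel per type
for (tp in c("p", "l", "b", "o")) {
  plot(t_sec, actual, type = tp, main = paste0('type = "', tp, '"'),
       xlab = "seconds", ylab = "kbps")
}
par(old_par)
```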

And here's a great page that goes into far more detail on plot types:

http://www.harding.edu/fmccown/r/

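Finally, to save a plot to a PNG file (which you can then print), open the `png()` device, draw the plot, and close the device with `dev.off()` so the file gets written:

```
> png(filename="hab2.png")
> plot(label,x,type="b",main="Habitat #2",xlab="day",ylab="growth height (inches)",ylim=c(0,8))
> dev.off()
```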
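The same open-device, plot, close-device pattern works with `pdf()`, which writes a vector file that scales cleanly when printed. A sketch, assuming a hypothetical output path in R's temporary directory and reusing the first five rows of the "actual" column as data:

```r
# Hypothetical output path; width and height for pdf() are in inches
out <- file.path(tempdir(), "actual.pdf")
pdf(file = out, width = 7, height = 5)
plot(1:5, c(157500, 90000, 126000, 157500, 180000), type = "b",
     main = "actual", xlab = "seconds", ylab = "kbps")
invisible(dev.off())   # closing the device is what writes the file
file.exists(out)
```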