
Monday, January 23, 2012

Bland - Altman plot in R

Not long ago I did a comparison study of two software packages, more precisely of their ability to estimate random effects. My plain approach was to compute correlations among the outcomes and, if they were high, to assume that both programs were OK. I presented this to my more experienced colleagues, who suggested using Bland-Altman plots to confirm the results. This is a reasonably simple technique for measuring the agreement between two sets of outcomes. As Bland and Altman say in their paper Statistical methods for assessing agreement between two methods of clinical measurement (pdf linked, worthwhile to read):

In clinical measurement comparison of a new measurement technique with an established one is often needed to see whether they agree sufficiently for the new to replace the old. Such investigations are often analysed inappropriately, notably by using correlation coefficients. The use of correlation is misleading. An alternative approach, based on graphical techniques and simple calculations, is described, together with the relation between this analysis and the assessment of repeatability.

A simple, yet beautiful technique.

Naturally, I searched for an implementation in R. I quickly found and installed a package called ResearchMethods, but personally I thought it offered very few tuning options (e.g. scaling or color changes). Digging deeper, I found a quite well documented page with a custom/modified R function.

I used this one, as I could scale the y axis, so the final plot looked nicer. On the other hand, I also found that this particular function was written for demonstration purposes and had difficulties running on other data.

For example, the limits of the y axis were hard-coded to -60 and 60, which is quite problematic if you are interested in much larger or smaller differences (in my case, on the 0.01 level...). Also, the data set names were hard-coded into the function.

So I modified the code to a more generic function like this:

BAplot <- function(x, y, yAxisLim = c(-1, 1), xlab = "Average", ylab = "Difference") {
   avg   <- (x + y) / 2    # average of each pair of measurements
   diffs <- x - y          # difference within each pair
   plot(diffs ~ avg, pch = 16, ylim = yAxisLim, xlab = xlab, ylab = ylab)
   # dashed lines at the mean difference and the limits of agreement (mean +/- 2 SD)
   abline(h = mean(diffs) + c(-2, 0, 2) * sd(diffs), lty = 2)
}
You call it as:
BAplot(testSet1,testSet2,yAxisLim=c(-0.1,0.1))

The "testSet"s are the datasets to compare; yAxisLim adjusts the scaling of the y axis to your needs. You can also change the x and y axis labels with xlab and ylab if you wish.
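If you want to try the function without real data, here is a minimal sketch using simulated measurements (the data and the small bias are made up for illustration); it also prints the mean difference and the limits of agreement that the plot draws as dashed lines:

```r
set.seed(42)

# Simulated paired measurements from two hypothetical methods:
# method 2 adds a small constant bias plus random noise to method 1
testSet1 <- rnorm(50, mean = 5, sd = 0.5)
testSet2 <- testSet1 + rnorm(50, mean = 0.02, sd = 0.03)

diffs <- testSet1 - testSet2
bias  <- mean(diffs)                  # mean difference (systematic bias)
loa   <- bias + c(-2, 2) * sd(diffs)  # limits of agreement at mean +/- 2 SD

round(c(bias = bias, lower = loa[1], upper = loa[2]), 3)
```

On data like this the differences sit on the 0.01 scale, which is exactly the situation where the hard-coded -60 to 60 axis of the original function fails and the yAxisLim argument becomes useful.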


Thursday, November 25, 2010

Survival Kit v6 online


After many, many working hours spent on the project, I can finally say that the new version of the Survival Kit (free and open-source software for survival analysis) is on its official website at BOKU. I am very happy that the program is already out there :)

If you are interested in the Kit in more detail, you might want to browse the introductory paper we published for the 9th World Congress on Genetics Applied to Livestock Production in Leipzig, Germany, this year. There is also a bunch of wiki pages at Wikiversity, intended mostly for novice users.

This does not mean, however, that development will stop. There are many more things to do, and I am looking forward to them as well!

And in case you are wondering about the picture above: this was the computer on which the new version was developed. ;)

Monday, September 20, 2010

The Padova trip

Saturday, September 18, 2010, 9.36 a.m. (in the train)

I am just returning from a week-long journey to Padova, Italy. I attended the course "Statistical methods for genome-enabled selection" given by Daniel Gianola and Gustavo de los Campos, dealing with Bayesian statistics, machine learning and other statistical tools that can be used in the evaluation of genomic information.

The group of attendants was very diverse, consisting mainly of Italians, of course, but also people from Slovenia, Slovakia, Brazil, Indonesia, Australia/Ireland, Austria, the US and elsewhere. The content was very advanced (as one can expect from prof. Gianola), but it was presented enthusiastically and sometimes with very funny commentary (as one can also expect from prof. Gianola).

I liked the course not only because of its content, but also because it helped some new ideas pop up in my mind. In particular, it drew my attention to machine learning. I knew before that something like this existed, but until now I had passed over the topic. After this week, it seems that I will dive into it more deeply. Gregor Gorjanc mentioned a video site dealing with machine learning; perhaps that is a good place to start.

At the end of this short post I would also like to thank Alessio Cecchinato for the great organization!

Update: Here is a video showing a part of the course.