List of things working on for current paper tables and figures. Original Post: 2014-11-21

Github respository + R markdown file + Paper draft

Github repository for this paper is found here. In particular, the R markdown file named “abxD01_analysis.Rmd” (or the pdf version) contains info on all figures and how they were generated.

Figures

Figure 1: Barchart, Differences in community structures following array of antibiotic treatments

figure PDFs here

change the “0” to be min of 0.001 instead of 0.0001 relabund
- completed 11/20/14, just make sure future files are changed for other graphs using the “abxD01.barcharts.xOTU.test.r”
add the 1/2 dashed lines
- completed 11/20/14
log trans anova to compare the CFUs across abx treatments to see if there are signif. differences
- 1/21/15, without the untreated group included it is still significantly different, likely because of the ciprofloxacin
- 1/21/15, removing the ciprofloxacin group and then doing the anova again showed no significant differences in CFU level among the remaining abx treatments
update the graph on github to reflect new min
- 1/22/15
currently shows genera phylotype tx1, do family too tx2
- 1/22/15, both up on github
color shade the bars by phylum
- 2/15/15

Figure 2: Stripchart, Correlation analysis of bacterial species present on Day 0 with C. difficile levels on Day 1

figure PDF here

graph is first filtered by 16min tot requirement, so at least one sample must have 16min sequences of that otu
then based on the avg abundance for each otu, if that otu wasn’t greater than 0.001 relabund of the community in at least one of orig/cef/vanc/strep tit, then was eliminated
make for classification level family
- 1/25/15, fixed graph by family, showed break down by phylum

Figure 3: Barchart, Differences in community structures between abx-titration treated mice

figure PDF here

add possibly a right y axes with cdiff CFU, or make new trio graph alongside
I want to change the graph so that the OTUs are their own graphs and the titrations are side by side
- 11/20/14, completed most of the graphical parameters.
once code is working then put up the other titration data files
- 1/17/15, added to github
change min relabund from 0.0001 to 0.001 for cef and strep
- 11/20/14, changed the vanc titrations min
- 1/17/15, changed the cef and strep titrations min
eliminate the untreated groups
- 1/25/15
Possibly work on getting plots on top of grid lines
- 1/25/15
do the stats to compare the mid/low to the highest dose
- need to know how to graph
- 1/27/15, used letter naming scheme
determine the tx’s with statistical/interesting differences for each titabx, aggregate, use same for 3 paneled graph
- make forlogscale.csv that combines all three cef/strep/vanc data
  - edit barchart function so that it takes the combo file and makes it paneled by the 3 abx groups
- make one common ids file
- after use the combo file, make sure that there are sig diff in the add-on OTUs
  - 2/22/15

Figure 4: Heatmap, Correlation analysis results compared across abx experiments

figure PDF here

for all the significant values for each pair of original + abx rows that are both significant, quantify that significance for each column (abx treatment)… see personal notes for more
use classification names, possibly the genus if have it
figure out what want to use the side panel for? possibly highlight otus included in the model
Add “ns” to the graph where not significant
- 12/8/14
Is this the order of drugs I want? -keep consistent
- 12/8/14
change xlabel names
- 12/8/14
change code to get the key back
- 12/8/14

Figure 5: Barchart, Differences in the community with extra microbiome recovery time following abx treatment

figure PDF here

put metro/amp file together with all tx2 otus
pick subset of otus that important
run with divide=TRUE
possibly look at diffs between strong individual otus or genera
Do for ampicillin too
calculate differences in thetayc between individual days w/recovery
random forest regression to do family level feature selection, use these to inform selected tx2’s
- 1/26/15
family graph: get rid of OTU12, 14, 9
- 1/26/15
lactos decrease with recovery… are these the streptococcus? No
use the barcharts by graph/sort by phylum side by side like did for titration data… show family differences..
- 1/26/15
stats for difference, wilcox for difference
- 1/28/15

Figure 6:

See how few otus i could use before the toptitdel drops off
rewrite up modeling results
RF model built on delay data, tested on other data sets
RF model on 20OTUs based on toptitdel: Tried 0.5% (82 OTUs), 1% (44OTUs), and no cutoff (299 OTUs). RF models based on 1% and toptitdel combined data sets poorly predict delay data set alone (r2=0.4). 0.5% however was able to predict in upper r2=0.9s. Took the top 20 OTUs from feature selection in a random forest model based on toptitdel and still predicted delay data with r2=0.92, which is WAY better than the 44 OTUs (1% cutoff).
RF models using 0.5% cutoff compared to 1% and 0 cutoff
The criteria for picking the otus, these results can be listed in supplementary tables possibly.
- 12/1/14, made file with info for table, now make pretty
Consider rerunning the model building code using a bigger candidate list
- 12/1/14 with the BIC included in new calcs
Calculate the BIC for each–change code for this
- 12/1/14
- 12/2/14, found a faster easier way to determine variable selection using the “leaps” package in r for multiple linear regression analysis
using the best models from the leaps analysis predict the outcome for the newtitration data and the predicted r^2
- make file with newtitration data that contains the OTUs in the candidate list and run, got 0.67 for the 5 parameter model
  - 12/4/14
- now try the best model from both methods of selection…12/4/14, think im just going to go with the exhaustive method for paper
- then model with just the top 3 otus… seems the best trade off point between BIC and adjR2… 12/4/14, gives 0.67 too, but the anova between the 5parameter model and 3 parameter model is significantly different, meaning that the 5 parameter model does improve prediction overall -use random forest for delay data alone to see what features stand out
- how do these OTUs differ from the model built based on the original data set?
  - 12/17/14 OTU11 has most influence, followed by 1, 5, 23, 21, 3, 6, 39 before dropping off in %IncMSE. I’m curious to see what inclusion of OTU11 (Ecoli) will have on the effect of the model.
Make figure! for modeling results, actual vs predicted, line? with error lines?
- 1/29/15
try eliminating 283 from 5 model because is low in abundance, then see how performs
- 1/29/15
correlation calculations for the delay data
- done early february
force OTU11 in the model in addition to the others to test the delay data and see if makes a diff, also look up the results for the 5model+OTU11
- done early february
do leaps with all the filtered otus, 1/28/15
- also rerunning with 50 OTUs, which were picked from after filtering again the abundance, doing the rf, recalculate correlations
I have been semi frustrated with the process. There’s no one absolute way of building models and picking the right one. Trying out two different sets of guidelines – using a curated or an uncurated pool of OTUs from which to build the best models. I may use the results of this to next use with either 2/3-1/3 train-test sets + validation, or without this step? See http://math.furman.edu/~dcs/courses/math47/R/library/DAAG/html/cv.lm.html
2/8/15 - waiting on results for the uncurated pool with all samples… 2/14/15 canceling, I don’t think it’s working
incorporate validation sets for all 3
- 2/22/15

Supplemental Figure 1

make graph showing the inverse Simpson correlation plotted against C. difficile CFU

Tables

Tables are saved within the paper word document uploaded to github (see link at top).

Table 1: Description of antibiotics in first set of experiments.

There are probably other targets to include for a few abx: clinda, metro

change the order of the abx
- 11/21/14
make class/mechanism uniform throughout table
- 11/21/14
add an administration route column
- 11/21/14

Supplemental Table 1: Titration amounts for each antibiotic treatment.

change the order of the abx
- 11/21/14
add an administration route column
- 11/21/14

Supplemental Table 2: Model OTU Selection Criteria

make pretty