Executive Summary

This report contains a detailed statistical analysis of the results of the survey titled “Global survey of models of preclinical TB drug testing”. The analysis includes answers from all respondents who took the survey in the 84-day period from Wednesday, November 7, 2007 to Tuesday, January 29, 2008. Twenty-seven completed responses from 24 institutions were received during this time.

Respondents represented a variety of labs: 18.5% government, 11.1% research institute, 11.1% industry and 56.5% academia. Most (>90%) were willing to be visited or contacted regarding the program, and 79% were willing to share protocols.

In vitro survey summary

70.4% of labs did whole cell screening assays and 40.7% did isolated molecular target assays while only 7.4% did cell-free multicomponent screening. A variety of strains were used for detection of anti-TB activity as demonstrated by the figure below.


Details of screening methods

Only 3 labs reported differences in compound activity between strains. One found differences in the sensitivity of H37Ra vs. H37Rv to new compounds and considered the possibility that deletions in the H37Ra genome affect drug activation. Another lab found at least 2 compounds, one being rifampin, to which H37Ra is more sensitive than H37Rv. One lab observed that BCG is more susceptible to nitroimidazoles than the standard H37Rv strain they use for MIC testing; no mechanistic explanation was given. Another group found that >90% of hits on BCG also had activity against M. tuberculosis. The source or history of the strain used was highly variable, but the majority used H37Rv from ATCC (#27294). Most used drug-susceptible lab strains, but a number of labs incorporated clinical strains.

Media used were highly heterogeneous. Selected comments are as follows: Middlebrook 7H9 OADC; glycine-alanine salts (GAS) medium supplemented with Tween 80, pH 6.6; Bacto-Casitone, citric acid, L-alanine, glycerol and Tween 80; GAS (sterilized by autoclaving at 121°C for 30 min); 7H11 adjusted to pH 5.9 (with HCl prior to autoclaving) for PZA; Sauton’s medium. Some labs filter sterilize, others autoclave, and some do both. The incubation time and temperature depended on the readout and ranged from 45 min for radiometric assays to between 6 hrs and 14 days at 37°C for other readouts. Seven days of incubation was typical for the alamar blue assay. The incubation period also depends on the particular strain used: plates are incubated for 7-10 days for H37Rv and for 2 days at 35°C for M. smegmatis. The figure below indicates the reagents used for readouts by responding labs.


The incubation time between reagent addition and reading ranged from 5 min to at least 16 hours and was usually 24 hours for the alamar blue assay. Many used more than one endpoint readout. Approximately one third of respondents used visual readouts in liquid culture, whereas 30% used CFU readouts.


Many felt that the signal-to-noise ratio was affected by the time of incubation. The time of growth and the time of incubation with the alamar blue reagent were found to be key variables. In a spectrophotometric assay, the purity of the enzyme preparation and the preparation method and source (vendor) of the substrates were important. In general, the inoculum, temperature and evaporation from the plate, as well as precipitation of the compounds, could give misleading results.

Definition of a “hit”

For those groups that conduct primary screening prior to determining MICs, maximum concentrations ranged from 2-10 µg/ml or 30 µM. One group screened at multiple concentrations: 0.12, 1 and 2 µg/ml. Criteria for a hit varied widely, from 30% inhibition in a spectrophotometric assay to 50% inhibition in a radiometric assay. A 10-fold reduction or 90% inhibition in viability/metabolism compared to DMSO controls was commonly used to define a hit.
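The 90% inhibition criterion can be illustrated with a short calculation. The sketch below is not taken from any respondent's protocol: the function names, the optional bacteria-free blank and the example readings are assumptions made purely for illustration.

```python
def percent_inhibition(signal, dmso_control, blank=0.0):
    """Percent inhibition of a test well relative to the DMSO (drug-free) control.

    signal, dmso_control and blank are raw readouts (e.g. fluorescence units).
    blank is an optional bacteria-free background value (an assumption here).
    """
    window = dmso_control - blank
    if window <= 0:
        raise ValueError("DMSO control must exceed the blank")
    return 100.0 * (1.0 - (signal - blank) / window)


def is_hit(signal, dmso_control, blank=0.0, threshold=90.0):
    """Flag a well as a hit when inhibition reaches the chosen threshold."""
    return percent_inhibition(signal, dmso_control, blank) >= threshold


if __name__ == "__main__":
    # Hypothetical readings: DMSO control 1000, blank 100, test well 150.
    print(round(percent_inhibition(150, 1000, 100), 1))
    print(is_hit(150, 1000, 100))
```

Lowering `threshold` to 30 or 50 would correspond to the looser criteria that some respondents reported for spectrophotometric and radiometric assays.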


Positive control drugs included streptomycin, isoniazid, ethambutol, rifampin, moxifloxacin, cefotaxime, ofloxacin, levofloxacin, amikacin and linezolid. DMSO alone, erythromycin and vancomycin were variably used as negative controls. For anaerobic assays, most labs use metronidazole as a positive control and isoniazid as a negative control.

Minimum inhibitory concentration (MIC)

To determine MICs, a visual read is most commonly done on a series of two-fold dilutions of the test compound. This was also done with Alamar Blue. Other definitions include the lowest concentration for which luminescence is lower than that of a 1:100 dilution of test culture with no drug. Agar MIC was also reported and is based on absence of growth on drug-containing 7H11 plates. Some look for total inhibition of growth, and some also look at the IC50 and the slope of the curve. Some used software (WHONET was cited) to calculate MIC50/90, and others calculate or extrapolate MICs from a graph plotted with PRISM or other software. Some conducted primary screens at fixed concentrations, but many determine MICs for every sample. The former approach is often used for large-scale screens, followed by MICs for all hits.
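As an illustration of reading an MIC from a two-fold dilution series, the sketch below takes the lowest concentration at which that well and all higher concentrations reach an inhibition cutoff. The 90% cutoff, the function names and the example data are assumptions for illustration; respondents used a range of definitions, including visual reads.

```python
def twofold_series(top_conc, n_steps):
    """Concentrations of a two-fold dilution series, highest first."""
    return [top_conc / (2 ** i) for i in range(n_steps)]


def mic_from_series(concs, inhibitions, cutoff=90.0):
    """Lowest concentration at which it and every higher concentration
    reach the inhibition cutoff. Returns None if the top concentration fails.
    """
    pairs = sorted(zip(concs, inhibitions), reverse=True)  # highest conc first
    mic = None
    for conc, inhibition in pairs:
        if inhibition >= cutoff:
            mic = conc
        else:
            break  # stop at the first concentration that shows growth
    return mic


if __name__ == "__main__":
    concs = twofold_series(8.0, 6)           # 8, 4, 2, 1, 0.5, 0.25 (e.g. ug/ml)
    inhibitions = [99, 98, 95, 91, 40, 5]    # hypothetical percent inhibition
    print(mic_from_series(concs, inhibitions))  # -> 1.0
```

Stopping at the first well that shows growth is a deliberately conservative choice: an isolated inhibited well below a well with growth is not counted toward the MIC.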

Variables found to affect the MIC of standard anti-TB drugs, or the activity of experimental compounds tested at fixed concentrations, include incubation time, pH, inoculum size, initial OD, growth phase of the cells, compound solubility, batch of compound, use of compounds after prolonged storage, media age and use of H37Ra (MICs 10-fold lower than against H37Rv). The following figure demonstrates the types of libraries used in HTS.


Source of the library

The source of HTS libraries is mostly commercial, but synthetic compounds, natural products and extracts, usually from academic organizations, have also been screened. The screens are typically performed in microtiter plates. Sources mentioned are Chembridge, Analyticon, Broad Institute, Tripos, Nanosyn and the Novartis archive.

Special aspects of assays

Compounds come in solution (usually DMSO) or as powders. Known drugs are included in each assay (on separate plates or, in some cases, on the same plate) and their MICs are used to QC assay performance. Isoniazid and rifampin are often mentioned as positive controls, with solvent alone as a negative control. Often these controls are set up in triplicate for each experiment. Sixty percent of respondents indicated they do interplate controls and 40% do intraplate controls. Assessment of potency included the percentage inhibition relative to drug-free controls; bacteria-free controls were often included. Growth inhibition expressed as an MIC was one method of potency determination. Others used an estimate of the MIC50 using PRISM software. About half of respondents determine MBCs by sub-culture from the MIC test, while others conduct it as a separate test.

Other specialty assays

Macrophage assays are performed with a wide variety of cell sources including rat cell lines, bone-marrow derived macrophages, J774 and THP-1 cells.

Non-replicating or slowly replicating assays

The Wayne, nitric oxide and hypoxia models are used to measure the activity of new compounds against non-replicating (NR) or slowly replicating (SR) TB bacilli. Mycobacterium tuberculosis, Mycobacterium smegmatis and Mycobacterium bovis BCG are usually used for these studies. One group mentions that they always use the H37Rv strain of M. tuberculosis as well as BCG for NR and SR assays; both were chosen because they are lab strains and their genome sequences are available. In one case, a mutant resistant to R207910 is also used. The positive and negative controls for the NR and SR assays included such agents as metronidazole, isoniazid, moxifloxacin and rifampin.

The following responses were amongst those reported:


The NRP and SRP assays are not considered high throughput: one lab runs the assay once every three weeks with only 12-20 compounds per run, another tested 5-10 compounds per week with the assay set up once a week, and a third stated that 2 to 3 experiments can be performed per week.

The major variables that affect the outcome of the NR or SR assay include the head space ratio, abrupt stirring, the rpm and proper sealing of vials. Variation is seen at times with INH (showing slight activity) or with experimental compounds, the latter usually due to solubility issues. Other assay variables included the starting inoculum, degree of anaerobiosis and clumping (for the nutrient starvation model).

Weaknesses of the NR or SR assay system include the unknown in vivo relevance of these assays and their often heterogeneous results. Also mentioned are the lack of suitability for HTS and lower reproducibility than the regular MIC assay. Assay optimization is critical for the Wayne model to obtain meaningful results. A strength is the strong ability to differentiate between active compounds such as rifampicin and inactive compounds such as isoniazid. A perceived strength is that the assay is a good indicator of efficacy against dormant, anaerobic, non-replicating persistent bacilli and could therefore identify compounds with the potential to shorten TB therapy.

2. In vivo results

Choice of animal model

For in vivo testing, the following bar graph demonstrates the range of animal species used for TB drug testing by the respondents to the survey. Mice of a variety of strains were represented (one group used a gamma interferon knockout mouse), and 11% used guinea pigs.


In the survey, most investigators identified as a major advantage of the mouse model for efficacy testing of experimental compounds that it is economical, widely available and easy to handle. The low expense compared to other animals was emphasized. For inbred mouse strains there is little variability, and one can easily include sufficient controls. Hence, the mouse model is used by many researchers because of its standardization, cost, simplicity and reproducibility. One investigator felt that aerosol infection of a susceptible animal species, which mimics human granuloma formation and allows study of extrapulmonary dissemination and hematogenous reseeding of the lung, allows drug effects to be evaluated under conditions that are directly relevant to human TB. Regarding the route of infection, several groups mentioned aerosol as the mode of infection closest to natural infection, and several of the responding investigators found the guinea pig to be the most reflective of human pathology and disease. Also mentioned was that BALB/c mice develop a productive infection with a stable plateau in CFU counts (as opposed to the declining counts observed in some other inbred, more resistant mouse strains). The characteristics of BALB/c mice were felt to hold for Swiss mice, which are less expensive and are outbred (and in that respect similar to humans). Mice are also most commonly used for in vivo toxicity studies on lead compounds. It was noted that C57BL/6 mice are more sensitive to the toxic effects of some drugs than other mouse strains.

As disadvantages of the mouse model, several investigators mentioned that the mouse does not demonstrate all lesion types seen in typical human TB pathology. In addition, pharmacokinetics and drug metabolism in rodents differ from those in humans, and cross-species bridging studies or calculations have to be performed. One group of investigators mentioned the higher variability in CFU counts in Swiss mice as a disadvantage of outbred mice; a higher number of mice therefore has to be used for outbred strains to reach sufficient statistical power. With the current mouse strains, alternative physiological states of the bacilli, such as latency, are not observed, and in most cases mice have mainly intracellular organisms. Finally, it was pointed out that, as with all experimental models, it is an experimental model only.

Oral bioavailability and early pharmacokinetics

Before testing the efficacy of an experimental compound in vivo, most laboratories test oral bioavailability, to a lesser or greater extent depending on the laboratory. Usually, industry (which has access to a formulation) and pharmacology departments include extensive PK analysis prior to any efficacy testing. However, the other institutions and academic groups also appear to include some evaluation of oral bioavailability by assay or by HPLC. Before any long-term mouse studies are started, all respondents perform a more formal pharmacokinetic analysis to establish the appropriate dose (for known TB drugs, the dose equivalent to that used in humans). One lab that performs PK uses curves containing 6-8 time points following single doses or steady-state dosing, obtaining serum or plasma by cardiac puncture from 3 mice per time point, and performs a non-compartmental analysis using WinNonlin. This laboratory has data on bioequivalence for first-line drugs and some second-line drugs. Based on in vivo PK data in the relevant species where efficacy is tested, the WinNonlin software is used to model the PK profile and simulate a range of doses and dosing regimens. PK/PD indices (AUC/MIC, Cmax/MIC, T>MIC) are then determined at each dose in order to select doses expected to be efficacious in the infection model. Some investigators describe using combination drug studies to rule out significant PK interactions. In another lab, maximum tolerated dose (MTD) and PK/PD assays are done for all compounds before any animal efficacy studies.
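The PK/PD indices named above can be sketched from a concentration-time curve with a simple non-compartmental calculation. The code below is an illustrative approximation only, not the WinNonlin implementation: it assumes the linear trapezoidal rule for AUC, linear interpolation for T>MIC, and one mean concentration per time point (for example, the mean of the mice sampled at that time). All names and example values are hypothetical.

```python
def auc_trapezoid(times_h, concs):
    """Linear trapezoidal AUC from the first to the last sampling time."""
    if len(times_h) != len(concs) or len(times_h) < 2:
        raise ValueError("need at least two matched time/concentration points")
    auc = 0.0
    for i in range(1, len(times_h)):
        dt = times_h[i] - times_h[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        auc += dt * (concs[i] + concs[i - 1]) / 2.0
    return auc


def time_above_mic(times_h, concs, mic):
    """Hours with concentration above the MIC, interpolating linearly
    between sampling points."""
    total = 0.0
    for i in range(1, len(times_h)):
        t0, t1 = times_h[i - 1], times_h[i]
        c0, c1 = concs[i - 1], concs[i]
        if c0 > mic and c1 > mic:
            total += t1 - t0
        elif c0 > mic or c1 > mic:
            # The curve crosses (or touches) the MIC within this interval.
            frac = (max(c0, c1) - mic) / abs(c1 - c0)
            total += frac * (t1 - t0)
    return total


def pkpd_indices(times_h, concs, mic):
    """AUC/MIC, Cmax/MIC and T>MIC (hours) for one dose and one MIC."""
    return {
        "AUC/MIC": auc_trapezoid(times_h, concs) / mic,
        "Cmax/MIC": max(concs) / mic,
        "T>MIC_h": time_above_mic(times_h, concs, mic),
    }


if __name__ == "__main__":
    times = [0, 1, 2, 4, 8, 24]            # hours after dosing (hypothetical)
    concs = [0.0, 4.0, 3.0, 2.0, 1.0, 0.0]  # mean plasma conc, ug/ml (hypothetical)
    print(pkpd_indices(times, concs, mic=0.5))
```

Running the same calculation across a range of simulated doses gives a table of indices from which candidate doses for the infection model could be compared.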

Toxicity and ADME

About half of the respondents mention using in vitro cytotoxicity assays. Industry again uses a wider array of cell lines. For instance, one laboratory evaluates compounds against a panel of 4 different cell lines obtained from ATCC: THP-1 (human monocyte cell line, suspension), BHK21 (Syrian golden hamster kidney cells, adherent), HepG2 (human liver carcinoma, adherent) and C6 glioma (rat brain glioma, adherent). In addition, industry has easy access to additional assays, such as genotoxicity (Ames test) and cardiotoxicity (hERG binding and patch clamp) assays, and will therefore implement these sooner in the screening process. Extensive in vivo toxicology is usually not performed by most laboratories at the stage of compound optimization; however, most look, to varying degrees, at the survival of the animals and gross necropsy findings (liver, spleen and kidney pathology analysis) after single or multiple dose administration. A few laboratories dose agents at higher than anticipated concentrations for a week to evaluate weight loss. In other laboratories, acute and chronic maximum tolerated dose assays are performed before any in vivo work is initiated.

Most respondents do not routinely test the ADME (absorption, distribution, metabolism and elimination) of compounds themselves. For oral bioavailability, some investigators use a bioassay with M. tuberculosis to estimate drug levels in serum. For drug metabolism, the microsomal assay is used; this assay usually takes place in parallel with in vivo efficacy testing. Solubility testing (between pH 1.0 and pH 6.8), PAMPA (artificial membrane permeability), the Caco-2 assay, microsomal stability in mouse and human microsomes, and CYP450 inhibition are used in most laboratories at some stage of compound optimization. At a later preclinical stage, more extensive ADME studies are conducted in rat and one non-rodent species, including tissue distribution studies with radiolabeled compounds. Formulation work is begun only after a lead is identified, later in preclinical development.

Infection Models

Routes of infection differed across the research groups, as did the starting inoculum of M. tuberculosis. Inocula ranged from 10-20 CFU per animal (guinea pigs) to more than 7 logs (i.v. in mice). Most groups use aerosol or intranasal infection (using 100 to 3500 CFU per animal); fewer groups use intravenous infection with a high bacterial number. Some investigators mention that direct deposition in the lung allows the infection and disease to develop in a manner very similar to clinical TB. Others, who use the intravenous route, where the early bacterial number is high, describe it as the only route of infection that allows the results of antibiotic activity to be extrapolated to humans. Investigators stated that the IV route is harder to perform than an aerosol infection. At least one group uses the intranasal route of infection for mice, which is felt to at least partially mimic natural infection. For safety reasons, the intravenous route is no longer used by at least one lab. Two thirds of laboratories include a placebo gavage group and have observed stress or adverse outcomes of the placebo gavage (no further details were given). Almost all laboratories include INH or RIF as control compounds in every study, and all report good reproducibility of the results.

Bacterial Stocks

There was major variability in the origin and propagation of the M. tuberculosis strain used for the in vivo experiments. In many of the laboratories the origin of the strain is not entirely clear (a gift from other investigators in the past) and there is no precise log of how many times the strain has been passaged. In half of the labs no record of passage number was kept for the M. tuberculosis strain used. Only one lab has compared TB strains, and it found that Erdman was more virulent than H37Rv in guinea pigs and mice. On handling and aliquoting the bacterial strains: in one laboratory, the H37Rv strain was obtained from the ATCC, then grown to mid-log phase in liquid culture, homogenized, sonicated briefly and filtered to obtain a single-cell suspension before freezing at minus 80°C and determining the CFU from a thawed aliquot. Another laboratory uses filtered bacteria (through a 5 µm syringe filter). Still others use freshly grown in vitro cultures of H37Rv, including the culture medium. On growth conditions: one laboratory uses an early passage of ATCC Erdman grown as a pellicle and frozen as seed stocks. The seed stock is then expanded over three passages to 100-200 ml cultures in Proskauer Beck medium, up to an OD of 0.5-0.7, and aliquoted, with a virulence test in GKO/B6 mice. Others mentioned that the strain is mouse-passaged, frozen and then sub-cultured. The following media were used to propagate the bacterial culture: Dubos broth with ADCC enrichment; 7H9 without Tween 80; 7H9/ADC/glycerol/0.05% Tween 80; Difco 7H9 with OADC and Tween 80; 7H9-ADS medium containing 0.05% Tween 80; and 7H9 broth with 0.02% Tween 80.

In most laboratories frozen bacterial stocks are prepared (typically in 7H9) and stored in small aliquots at minus 80°C. At the time of the experiment, a vial is thawed and diluted to the appropriate concentration based upon the known CFU for that stock. This method allows precise duplication of the infection level each time an experiment is performed. A large batch of aliquots can then last a lab for years. There appears to be no detectable change in infectivity or virulence over many months or years, although some have noticed that viability can drop after prolonged storage at minus 80°C (e.g. 3-4 years). Certain groups found that freezing the inoculum does not adversely affect viability and that the reproducibility of this method over decades has been remarkable. It was reported that the OD is not reliable due to clumping of bacteria. One laboratory grows TB strains in a 500 ml flask only every few years, and large aliquots are made once the OD600 approaches late log phase (0.6-0.9). The titers are confirmed by plate counts and the virulence is checked in mouse models. Another lab grows TB to an OD600 of 0.3-0.5 in 7H9-ADS medium, centrifuges for 10 minutes at 3,300 rpm and then washes the pellet once in warm 7H9 medium. The washed pellet is resuspended at an OD600 of 1.0 in 7H9-ADS medium supplemented with 15% glycerol and frozen at minus 80°C. The viable count of each batch of frozen cultures is determined by plating prior to infection. The frozen aliquots are then diluted and sonicated (25 sec in a water bath sonicator) before infection. Others favor infecting with an actively growing culture; the bacterial culture, including the culture medium, is then used for infection. Finally, one group standardizes the inoculum according to McFarland suspensions.
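Diluting a thawed stock of known titer to a target inoculum is a C1 x V1 = C2 x V2 calculation. The sketch below shows that arithmetic only; the function name and the example titers are hypothetical and do not come from any respondent's protocol.

```python
def dilution_plan(stock_cfu_per_ml, target_cfu_per_ml, final_volume_ml):
    """Volumes of thawed stock and diluent needed to reach the target titer.

    Uses C1 * V1 = C2 * V2. Returns (stock_volume_ml, diluent_volume_ml).
    """
    if stock_cfu_per_ml <= 0 or target_cfu_per_ml <= 0 or final_volume_ml <= 0:
        raise ValueError("titers and volume must be positive")
    if target_cfu_per_ml > stock_cfu_per_ml:
        raise ValueError("target titer cannot exceed the stock titer")
    stock_volume = target_cfu_per_ml * final_volume_ml / stock_cfu_per_ml
    return stock_volume, final_volume_ml - stock_volume


if __name__ == "__main__":
    # Hypothetical stock at 2e8 CFU/ml diluted to 1e6 CFU/ml in 10 ml.
    stock_ml, diluent_ml = dilution_plan(2e8, 1e6, 10.0)
    print(f"{stock_ml:.3f} ml stock + {diluent_ml:.3f} ml diluent")
```

For large fold-dilutions the stock volume becomes too small to pipette accurately, so in practice the same calculation would be applied in serial steps.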

Inoculum for infection

The inoculum of M. tuberculosis used for infection differs widely across laboratories. Some laboratories aim to inoculate with 10 CFU; others indicate that 200-300 CFU/mouse at initial aerosol challenge is better for testing drug candidates because the bacteria grow to higher numbers in the lung and spleen. A higher inoculum of 3.5-4 logs implanted by the aerosol route was felt to be important in order to reach a bacterial burden of 7-8 logs (which is hypothesized to be similar to a human cavity) at the start of combination therapy 14 days later. In this latter model, it takes approximately 6 months of treatment with the standard RHZ-based regimen to cure all mice, which is similar to the length of treatment used in TB patients. It was pointed out that the model to use often depends on the question being asked. For instance, a low-dose aerosol may be used for monotherapy trials, where it will prevent emergence of resistance and better represent a chronic, established TB infection. One group that used an intranasal model considered it a low-dose infection model because the chosen inoculum was the lowest found to establish a lung infection in their animal models. However, if a very low inoculum is used, some bacteria could get trapped in the airway and not reach the lungs (the standard deviation was greater when fewer than 5 x 10^2 bacteria were used).

One respondent suggested that there is no real need to perform CFU determinations the day after infection in most routine experiments, unless the challenge strain changes. Another group feels the same, as long as clumps are removed by filtration. Still others perform all controls for enumeration of bacteria: the bacterial suspension used as inoculum is plated, and lungs are plated on the day of infection or one day after and on the day of treatment initiation. Multiple respondents stated that they use 5 mice at each time point.

Plating of homogenates

For testing of single compounds and short-term experiments, some plate both lungs while others plate the homogenate of the same lung lobe (usually the lower caudal lobe) from each animal; if very low numbers of bacteria are expected, a larger portion (mostly the whole lung) is homogenized and a sizeable fraction of the homogenate (one-third to one-half) is plated. Only a minority of investigators save organs for histopathology, and of these most use formalin fixation for anywhere from 3 to 10 days to decontaminate the samples. No group has ever seen cavitation in the mouse, though necrosis is sometimes seen. Readout methods other than CFU enumeration are only occasionally used in efficacy trials (pathology in 15% of the institutions, gross necropsy in 22%, Ziehl-Neelsen staining in 15% and auramine-rhodamine in 4%).

Regarding the sensitivity of plating after long-term drug treatment trials, some investigators plate a fraction (one-tenth) of the homogenate from one large lung lobe and half of the spleen, while most respondents plate the entire lung lobe and the total spleen. For short-term trials, usually with single compounds, one group collects left lungs and spleens, transfers them into 4.5 ml of saline and plates 1:5 dilutions. For long-term trials looking at the sterilizing activity of compounds, this same group slightly modifies the protocol: it collects whole lungs, adds 1 ml of saline and plates the entire homogenate. Another group homogenizes whole lungs in a total of 2.5 ml PBS and then plates the entire homogenate on five 7H11 plates (0.5 ml/plate). Another group uses whole lungs and total spleen homogenized in 4 ml of PBS supplemented with 0.05% Triton X-100.
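The effect of the plated fraction on sensitivity can be made concrete with a back-calculation from colony counts to organ burden. This is a minimal sketch with hypothetical function names and colony counts; only the 2.5 ml homogenate volume and the whole versus one-tenth plating fractions echo the protocols described above.

```python
import math


def cfu_per_organ(colonies, dilution_factor, plated_volume_ml, homogenate_volume_ml):
    """Back-calculate total CFU in the organ from one countable plate.

    dilution_factor is the fold-dilution of the plated sample (1 = undiluted).
    """
    cfu_per_ml = colonies * dilution_factor / plated_volume_ml
    return cfu_per_ml * homogenate_volume_ml


def log10_cfu(cfu):
    """log10 of the organ burden, or None when no colonies were recovered."""
    return math.log10(cfu) if cfu > 0 else None


def limit_of_detection(total_plated_volume_ml, homogenate_volume_ml):
    """Organ burden that corresponds to a single colony on undiluted homogenate."""
    return homogenate_volume_ml / total_plated_volume_ml


if __name__ == "__main__":
    # Hypothetical count: 45 colonies from 0.1 ml of a 1:100 dilution, 2.5 ml homogenate.
    burden = cfu_per_organ(45, 100, 0.1, 2.5)
    print(round(burden), round(log10_cfu(burden), 2))
    # Plating the entire 2.5 ml homogenate versus one-tenth of it.
    print(limit_of_detection(2.5, 2.5), limit_of_detection(0.25, 2.5))
```

Plating the whole homogenate lowers the detection limit to a single CFU per organ, whereas plating one-tenth raises it tenfold, which is why whole-organ plating is preferred when sterilizing activity is being assessed.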


Relapse experiments have been done by only a few of the respondents. Relapse is assessed anywhere from 12 to 40 weeks following cessation of the drug regimen, with and without immunosuppression (hydrocortisone; dexamethasone and cyclophosphamide). Readouts include observation of clinical signs, enumeration of colony forming units, histopathology, spleen weights and gross lung lesions. Some investigators compare relapse proportions using statistical methods such as Fisher's exact test or the chi-square (Khi2) test.


All respondents used statistical methods in conjunction with their animal experiments, and most do not consult a statistician. Treatment effects are mostly analyzed by ANOVA, followed by a t-test, Tukey's test or Dunnett's test. For relapse studies, Fisher's exact test or a chi-square test is run.
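For readers unfamiliar with the relapse comparison, the following sketch computes a two-sided Fisher's exact test on a 2x2 table of relapsed versus non-relapsed animals in two arms. It is a plain standard-library implementation written for illustration (statistics packages provide equivalent routines), and the relapse counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are treatment arms; columns are relapsed / not relapsed.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x relapses in arm 1 given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


if __name__ == "__main__":
    # Hypothetical relapse data: 1 of 15 mice in arm A versus 8 of 15 in arm B.
    p = fisher_exact_two_sided(1, 14, 8, 7)
    print(f"p = {p:.4f}")  # p = 0.0142
```

The exact test is the natural choice here because relapse arms are small and some cells often hold very few animals, where the chi-square approximation is less reliable.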

Unexpected Results

Some unexpected animal data were mentioned in the survey. One investigator noted that metronidazole and PZA have shown activity in mice in some laboratories but were not found to be effective in others. Some groups typically observe antagonism of INH on the RIF-PZA combination. One group noted that CFU counts are below the limit of detection after only two weeks of treatment with isoniazid (10 mg/kg) and moxifloxacin (100 mg/kg).

Relevance of Animal model

Groups with evidence to support or refute the ability of any animal model to mimic human disease and predict treatment outcomes mentioned the following. Mouse pathology has limited similarity to human pathology; some aim not to mimic human disease but to mimic the response of M. tuberculosis to drugs used at equivalent doses. Certain groups aim for a mouse model in which the HRZ regimen displays almost the same relapse rate as in patients, or aim for a number of bacilli in mice similar to that seen in cavitary lesions in TB patients, or for a route of exposure similar to that in humans. The mouse model has its limitations (only one lesion type is present and all bacilli are intracellular), and several investigators felt there is a need for an additional animal model to address these issues and to confirm results. The acute mouse model used by several investigators has severe limitations in reproducing human pathology and disease (a high, actively replicating bacillary load and a lack of certain pathological characteristics and metabolic states of bacilli). This model is therefore used primarily to confirm in vitro potency and to evaluate PK features, tissue distribution and in vivo activity. Despite its many flaws, all of the respondents find the mouse to be a reasonable compromise for qualitatively and rapidly assessing the efficacy of new compounds in a medium-throughput manner.

Preliminary survey results can be found by clicking on the link below:

Survey Results