Process-based and correlative modeling of desert mistletoe distribution : a multiscalar approach

Because factors affecting distributional areas of species change as scale (extent and grain) changes, different environmental and biological factors must be integrated across geographic ranges at different resolutions, to understand fully the patterns and processes underlying species’ ranges. We expected climate factors to be more important at coarse resolutions and biotic factors at finer resolutions. We used data on occurrence of a parasitic plant (Phoradendron californicum), restricted to parts of the Sonoran and Mojave deserts, to analyze how climate and mobility factors explain its distributional area. We developed analyses at five spatial resolutions (1, 5, 10, 20, 50 km) within the distributional area of the disperser species, and compared ecological niche models from three commonly used correlative methods with a process-based model that estimates colonization and extinction rates in a metapopulation framework. Correlative models improved when layers associated with hosts and disperser were used as predictors, in comparison with models based on climate only; however, they tended to overfit to data as more layers were added. Dispersal-related parameters were more relevant at finer resolutions (1–5 km), but importance of extinction-related parameters did not change with scale. We observed greater coincidence between correlative and process-based models when based only on dimensions of the abiotic niche (i.e., climate), but a clearer and more comprehensive mechanistic understanding was derived from the process-


INTRODUCTION
The relative roles of biotic and abiotic factors in determining distributions of species at specific spatial scales are a central organizing theme in ecology (Levin 1992).Understanding how patterns at one scale are manifestations of or influence processes operating at other scales is a particular challenge (Levin and Pacala 1997, Pearson and Dawson 2003, Hastings et al. 2010).The core of this challenge lies in disentangling how changes in scale affect the importance of different factors in shaping species' distributional patterns and processes.
When studying species' distributions, it is useful to distinguish interacting variables from those that are dynamically uncoupled from the presence or abundance of the species in question (scenopoetic variables; Hutchinson 1978).These variables may have contrasting effects at fine versus coarse spatial resolutions (Wiens 1989).The many ways in which scenopoetic variables operate at different extents and resolutions can be used to explain distributions of the species and propose scale-dependent hypotheses regarding factors most relevant at different scales (Pulliam 2000, Gaston 2003, Sobero ´n and Peterson 2005, Sobero ´n 2007, 2010, Peterson et al. 2011).
Using data available documenting species' occurrence and climatic variation, researchers have been able to estimate environmental requirements of many species across broad regions (Guisan andZimmermann 2000, Elith et al. 2006).Factors manifested at finer resolutions are less well studied at the scope of broad geographic ranges, leaving a gap in understanding as to how effects of these factors vary spatially and temporally.Also, knowledge of species' distributions at local scales may be sparse or biased spatially across geographic ranges, which makes analyzing the relative importance of variables even more difficult (MacArthur 1972, Levin 1992).These considerations explain why studies of spatial distributions of species that combine broad-scale and fine-scale views remain uncommon (e.g., Mackey and Lindenmayer 2001, Gaston et al. 2004, Heikkinen et al. 2007, Cunningham et al. 2009).
Correlative and process-based modeling are two approaches for exploring factors important in determining distributions of species at different spatial scales (Robertson et al. 2003, Kearney et al. 2010).The main difference between these approaches is that correlative methods simply seek associations between environments and occurrences from across broad geographic ranges, whereas process-based methods incorporate explicit hypotheses about biological processes (Robertson et al. 2003, Kearney and Porter 2009, Monahan 2009, Morin and Thuiller 2009, Cabral and Schurr 2010, Kearney et al. 2010).In particular, a set of factors that is emerging as crucial in process-based modeling studies is dispersal: incorporating dispersal factors improves model performance markedly (Allouche et al. 2008, Cabral and Schur 2010, Brotons et al. 2012).
In this paper, we explore how dispersal and scenopoetic factors act synergistically to explain the distribution of the Desert Mistletoe, Phoradendron californicum, at varying spatial resolutions.P. californicum is a hemi-parasitic plant associated with legume hosts in the Sonoran and Mojave deserts, and it is dispersed in largest part by the bird species Phainopepla nitens (Walsberg 1975, Kuijt 2003).This parasite-host-disperser system offers two advantages for understanding how scaling of biological processes govern distributions (Overton 1996, Aukema and Martınez del Rio 2002, Aukema 2003, 2004): (1) hosts are known, and their distributions can be studied at high spatial resolution using aerial photographs; and (2) dispersal is determined chiefly by a single disperser.We modeled the portion of the distribution of P. californicum in the U.S., where detailed data on Phainopepla nitens are available (almost 900,000 km 2 ; Fig. 1), encompassing approximately half of the parasite species' range.We compare commonly-used correlative niche modeling methods (GARP, Maxent, GAM) with a process-based model derived from a metapopulation-dynamic framework coupling rates of colonization and extinction.Following Pearson and Dawson (2003) and others, we hypothesize that, as resolution coarsens, variables associated with abiotic factors (e.g., climate) will become more important, but that the opposite will apply for biotic variables (e.g., dispersal).

Conceptual framework
To help understand causal factors underlying geographic distributions of species, a heuristic device called the BAM diagram is useful (Sobero ´n and Peterson 2005, Peterson et al. 2011).The BAM diagram displays relationships between abiotic or uncoupled (A) and coupled biotic factors (B), and the movement or dispersal capacities of the species (M).This framework can be used to make explicit the possible arrangements of factors that determine distributions of species, and gives the opportunity to generate hypothetical scenarios, depending on the degree and geometry of their overlap (Fig. 2).Specifically, with this framework, we can analyze interactions of factors at fine and coarse spatial resolutions.
In the mistletoe-host-disperser system, the BAM diagram can be simplified significantly by making some reasonable assumptions.The conditions manifested across A (Sobero ´n and Peterson 2005, Sobero ´n 2007) represent those under which the species can have a positive growth rate (Hutchinson 1978); therefore, conditions in A are linked intimately to the environmental dimensions on which mechanisms of establishment and survivorship depend (Wiens 2000).Factors in M are related to movement: what happens across M determines how populations will be structured, demographically and genetically, although testing this particular idea will be accomplished in a later phase of this project (Lira-Noriega et al., unpublished manuscript).
We hypothesize that, to a first degree of approximation, the interaction with the host can be regarded as uncoupled, since hosts have expected lifetimes at least 10-fold longer than those of parasites (Overton 1996(Overton , 1997) ) and therefore their population dynamics can be regarded as approximately static relative to the faster demography of the parasites.The simplifying assumption of uncoupled dynamics of the total host population is often made in epidemiological studies (Anderson 1981).This means that we regard climate and presence of hosts as part of the A circle.Second, we did not include the effect of competitors, which are not known to be present, nor of herbivores or pollinators, because information about them at the spatial extent of our study is simply unavailable.
Finally, in the mistletoe system, one major biotic factor, birds, act directly as dispersers.From a certain point of view the dispersers may be considered as part of B, but their effect is clearly felt in the dispersal circle, making B and M coincident.In consequence, we can simplify our system to a two-factor diagram: the abiotic climatic factors (A) and the hosts substrate, that The study area corresponds to the distributional area of the principal disperser, assumed to represent accessible areas for the mistletoe (M), for which abundance data are available (thickest black line).Notice there are areas where the disperser or hosts can be present but not the mistletoe; these areas are relevant to understand the interaction of factors that limit the distribution of the species as well as challenging areas in which to test the models.
v www.esajournals.orgdetermine overall potential distributions, and the movement or dispersal capacity of the species (M), driven mainly by the bird that disperses seeds among trees.M represents a hypothesis of the regions that have been accessible for the species over relevant time periods; as such, M is the area across which models should be calibrated (Barve et al. 2011; see also Acevedo et al. 2012).Following this reasoning, we designated an M as the area of distribution of the mistletoe's main disperser, P. nitens, for which we have detailed abundance information (Fig. 1).Although other potential dispersers of the mistletoe species in question are known, none is as closely associated as this species (Walsberg 1975); adding the densities of other bird species to the model would not have changed our results, given that they overlap broadly in their distribution and behavior with that of P. nitens.This assumption can be relaxed in future studies, but for the present simplifies the process-based modeling exercise considerably.

METHODS
We designed a comparison of a process-based model with correlative models from three algorithms to understand the distribution of P. californicum at five spatial resolutions (1, 5, 10, 20, 50 km).We first describe the series of steps we followed in order to obtain data on the distribution of the parasite followed by the assemblage of three sets of environmental variables: spatial variation in disperser density, spatial variation in numbers of hosts, and climatic factors across the study area.We then describe the design and implementation of each model type and finally the comparisons among them.

Distribution of species presences and landscape characteristics
To estimate proportions of mistletoe-infected trees, we recorded geographic locations of infected host trees along roads.One of us (A.L.-N.) sampled occurrences of host trees infected by P. californicum across the southwestern United States and northwestern Mexico (Fig. 1).We used roads as sampling transects as an efficient means of surveying the species' range limits, allowing us to gather information regarding the broader set of climatic situations where the species occurs.We used a GPS unit to record geographic coordinates for each infected host tree located within 100 m of the road.In total, we sampled 16,000 km (4153 km within the modeling extent) of roads, and collected 17,371 unique geographic coordinates of infected trees (Fig. 1), 12,578 of which fell within the modeling extent (see  ), assuming that the interaction between host and parasite is uncoupled, and that the positive interaction between disperser and parasite makes biotic conditions (B) and accessible area (M) coincident.As scale changes, limiting factors in A and M will also change.At higher resolutions, almost the entire area is accessible, and climate is not so limiting, whereas at coarser resolutions dispersal becomes more difficult and climate is more constant.v www.esajournals.orgbelow).
Sampling along roads may potentially bias the data, since roads are usually areas of higher water accumulation, thus impacting quality of the hosts (Norton and Smith 1999).However, given that we are generalizing prevalence of the species from host trees at several spatial resolutions (cells of 1, 5, 10, 20 and 50 km by side), we assume that the effect of roads is constant and will not influence our general understanding of mistletoe distributions.Identification of host and mistletoe species was achieved via collection of herbarium specimens at sites every ;50 km across the study area or every time changes in host species composition were suspected.We used a second GPS unit to record landmarks by which to annotate general descriptions of species composition and vegetation physiognomy and landscape characteristics along the road; this information was used in processing aerial photographs (see below).All voucher specimens are deposited at the Ronald L. McGregor Herbarium, University of Kansas; copies of the collection were deposited at different herbaria in Mexico and the United States (see Acknowledgments).The two GPS units were Garmin 60CSx, with antennas to improve precision of coordinate estimates.

Host tree mapping
To derive proportions of infected trees for the process-based model, we extracted geographic coordinates of tree canopies along roads sampled.This step was achieved using National Agriculture Imagery Program (NAIP) aerial photographs, ideal for tree mapping because of their high spatial and spectral resolution (1 m and four bands-red, green, blue, and near infrared).We carried out object-oriented classification of these photographs with the software eCognition 3.0 (Baatz et al. 2003).We first identified all 3.75 0 3 3.75 0 aerial photographs that intersected the roads sampled.We further reduced the set of aerial photographs to those including the land use cover types in which mistletoe infections were found via comparisons with the 2001 National Land Cover Database (30 m resolution; Homer et al. 2004); land cover types considered were barren land, cultivated land, deciduous forest, development low density, development medium intensity, development open space, emergent herbaceous wetlands, hay/pasture, herbaceous, open water, shrub/ scrub, and woody wetlands.We selected at random 10% (67) of the aerial photographs for the states of Arizona (25; from 2007), New Mexico (6;2009), California (13;2009), and Nevada (23;2006).Because of significant computational demands in the classification process, each 3.75 0 3 3.75 0 image was split into four equal portions for processing, generating a total of 268 images to classify.
In eCognition, we first segmented each image into two levels, with the following parameters: level 1 (scale ¼ 10, color ¼ 0.8, shape ¼ 0.2, smoothness and compactness ¼ 0.5) and level 2 (scale ¼ 3, color ¼ 0.8, shape ¼ 0.2, smoothness and compactness ¼ 0.5).Then, to classify trees, we selected polygons manually from the level 2 objects to be used as canopy samples, and classified each image using the standard nearest neighbor with the mean of each of the four bands (R, G, B, and NIR), the mean difference from neighbors, and the mean difference between the level 2 objects nested within the level 1 superobject.These procedures have been used in previous studies that have analyzed mesquite distributions in similar landscapes and regions (e.g., Laliberte et al. 2004).Using ArcGIS 9.3, we dissolved the raw polygon output from eCognition to obtain single-tree polygons.This step was necessary because the output sometimes contained multiple polygons subdividing single tree crowns.Although this procedure sometimes reduced numbers of trees estimated, it was not a common problem overall, and estimates of trees were close to true numbers of trees in each image (R 2 ¼ 0.617, P , 0.001).This step allowed us to extract 5,786,586 polygons corresponding to tree crowns, and count actual numbers of trees within the 200 m strip along roads, to estimate proportions of trees infected (via the GPS coordinates of mistletoes described above).

Climate, host, and disperser summaries
These datasets were each derived at five spatial resolutions, as follows.Climatic layers used were annual mean temperature and annual precipitation, obtained from the WorldClim database (Hijmans et al. 2005), using the 30 00 , 2.5 0 , 5 0 , and 10 0 resolution products to match 1 km, 5 km, 10 km, and 20 km resolutions, respectively; for the 50 km resolution, we scaled the values up with the mean of the 10 0 raster layers using a bilinear interpolation from nearest neighbor cells.Because other climatic factors, particularly freezing temperatures and sunny days, can be important limiting factors for the mistletoe, we tested models with two other variables estimated directly from weather-station data (Easterling et al. 1999) for a 20-yr period (number of continuous days of freezing temperatures and number of continuous rainless days).Because models resulting showed similar or lower performance scores, these variables were not included in our final analyses.
The raster data layer summarizing abundances of Phainopepla nitens was obtained from the Breeding Bird Survey database (v.12.07.2011 [Sauer et al. 2011]), which comes as a vectorformat data layer with cells at a resolution of 25 km, the product of interpolating point abundance information across the bird species' distribution in the United States.In contrast to the rest of the layers used for modeling, this data layer was used statically at its native resolution (i.e., not scale-dependent) for the first four resolutions of our analyses.
To estimate numbers of host trees across the region, we developed models to predict numbers of trees per cell at 1 km resolution using random forests (randomForest in R; Liaw and Wiener 2002).This estimate was based on 580 1-km cells sampled across the region from 85 images from the NAIP (55 images from Arizona, 7 from New Mexico, 7 from California, and 16 from Nevada) that were classified following the protocol described above.Specifically, we asked for a total of 10,000 trees developed under the random forest protocol, from which we estimated the logarithm of the number of host trees with the following statistical model: log 10 of host trees ; The grass/scrub/woodland (GRS 2000) and barren/very sparsely vegetated land (NVG 2000) are raster data layers obtained from the Harmonized World Soil Database v 1.1 (Fischer et al. 2008).The final estimate of the number of host trees was more explanatory (67.0% of total variation) and showed better fit between observed and predicted numbers of trees (r ¼ 0.91, P , 0.01) and between observed and independent external data set aside from the classified images (r ¼ 0.60, P , 0.01) than other random forests based on different combinations of variables, numbers of observations, and numbers of trees within the process.This procedure also gave better results than the two most explanatory (67.7-68.1% of total variation) generalized additive models (GAMs) of different combinations of same predictors as used in the random forest (r ¼À0.22,P .0.01; r ¼À0.16,P .0.01).Although the explained variances are similar, the shape of the relationship in the random forests is more regular and linear than with the GAMs.

Process-based model
To estimate probability of occurrence of P. californicum in grid cells via a process-based modeling approach, we developed a spatially explicit model incorporating the balancing rates of colonization and local extinction, as proposed in the theory of metapopulations (Hanski 1999, Vandermeer and Goldberg 2003, Roberts and Hamann 2012).The model was developed as follows.
After some assumptions related to using a mean-field approximation, we fit the following equation following Hanski's (1999) incidence function model: where " p J represents the proportion of hosts in a composite cell J occupied by mistletoes.The parameter " c J is the mean colonization rate, assumed to be a function of the density of birds and host trees in the focal cell J and neighboring cells.The parameter " e J is the mean death rate of mistletoes, assumed to be a function of the distance between optimal climate, estimated as the difference of average climatic conditions across known occurrence points (for each resolution) and the climate in the cell J.This model was fit for each of the five spatial resolutions using a maximum likelihood routine implemented in R. Derivations of formulas for this model are provided in Appendix A.
To estimate variability in parameters, we resampled the data with replacement 115 times v www.esajournals.orgto obtain a distribution of colonization and extinction-related parameter values with the following number of observations in each iteration: 200 (56% of the original data matrix) at 1 km resolution, 70 (63%) at 5 km, 50 (68%) at 10 km, 30 (54%) at 20 km, and 20 (64%) at 50 km.The distributions obtained were used to assess whether parameters differed from zero, and how different they were among scales, using ttests.Specifically, we evaluated two colonization parameters, disperser abundance and disperser cost of movement across neighboring cells; in the extinction formula, we tested the deviation from a climate optimum as defined by Mahalanobis distance to the environments in the occurrence points for each resolution (see Appendix A).

Correlative models
We ran a total of 45 correlative models: three algorithms, three sets of environmental predictors, and five spatial resolutions.We used two presence-background correlative niche-modeling algorithms: DesktopGARP (best subsets version in OpenModeller v.1.1.0;de Souza Muñoz et al. 2011) andMaxent v.3.3.3 (Phillips et al. 2006), as well as the presence-absence algorithm generalized additive models (GAM; mgcv [Wood 2011]).We set parameters in GARP as follows: occurrence data for training 50%, soft omission threshold, 100 replicate runs, and best subsets option; the rest of the settings were as default.In Maxent, we set 50% of occurrence points for model calibration, and chose logistic output; the rest of settings were as default.To run models, we selected 5% (N ¼ 629) of the total number of infected trees, after considering the spatial lag of spatial autocorrelations in environmental characteristics as follows.The spatial lag was estimated on the first principal component of the 19 bioclimatic variables as a surrogate for the environmental space in the study region.Principal components were calculated using a correlation matrix in R (stats; R Development Core Team 2011) and the spatial lag was estimated as the distance associated with the sill of a semivariogram in the package Spatial Analysis in Macroecology (SAM 4.0; Rangel et al. 2010).We then sampled the 5% of the occurrence points subject to the constraint that they be separated by at least that distance in space.
We explored influences of different environ-mental factors on the distribution of P. californicum in correlative models using three sets of variables: (1) annual mean temperature and annual precipitation (''climate''); (2) abundance of Phainopepla nitens and numbers of host trees (''disperser þ host''); and (3) a combination of all three (''climate þ disperser þ host'').All models were trained and projected within the extent that corresponds to the distribution of the disperser (M) at each spatial resolution.The importance of environmental predictors at each scale was estimated using the jackknife of regularized training gain in Maxent, recalculation of GARP accuracy using jacknife in openModeller, and the P-value for significance of each variable in GAMs.These procedures form part of the output of each algorithm.

Model evaluations and comparisons
All 50 models (45 correlative and 5 processbased) were evaluated using the partial AUC approaches implemented by Peterson et al. (2008), using 262 independent unique occurrence points for P. californicum from the Global Biodiversity Information Facility (www.gbif.org;search in September 2010).This procedure allowed us to evaluate performance of each model as compared to random expectations, as well as to compare performance across scales and modeling methods.Partial AUC approaches limit analysis to portions of the ROC curve that are relevant to the question at hand (i.e., within omission error tolerances); so one calculates the ratio between the area under the curve for observed values against the area under the line of random discrimination, AUC ratios are expected to depart upwards from one as model performance is better than random.The main advantage of this procedure is that the comparison covers only the range of values over which each algorithm predicts, thus avoiding problem caused by using equal scales of values when such is not the case in all comparisons (e.g., Maxent and GARP; Peterson et al. 2008).
To do this testing, we first multiplied grids by 1000, and converted each floating-point grid to an integer grid in ArcGIS 10.Using the modeled suitability values associated with each independent testing point, we implemented partial AUCs by running 1000 bootstrap simulations in a Vi s u a l B a s i c p r o g r a m ( B a r v e , w w w. biodiversity-informatics-training.org), with 50% of points resampled with replacement in each iteration of the bootstrap, and E ¼ 0.05, given that these testing occurrences were obtained from GBIF and may be subject to some georeferencing or identification error.Distributions of the randomized ratios were compared with z-tests to see if ratios were consistently larger than 1 (1 corresponds to random discrimination).The ratios that resulted from this procedure were also used to compare models in a three-way ANOVA for type of algorithm, environmental predictors, resolution, and interactions of these three factors using R (stats; R Development Core Team 2011).
The degree of spatial agreement between model predictions was calculated using a fuzzy Kappa statistic in the software Map Comparison Kit (Visser and de Nijs 2006), and Pearson correlations were estimated between 3000 random points for resolutions 1-20 km and between the 527 pixels at the 50 km resolution following the Dutilleul method to correct for spatial autocorrelation in SAM 4.0 (Rangel et al. 2010), which corrects the number of degrees of freedom of estimated variances.Kappa statistics were calculated for models thresholded using the minimum training presence value (Liu et al. 2005, Wilson et al. 2005), given that we are confident that occurrence points used for model calibration correspond to a P. californicum with no error (E ¼ 0).Finally, we compared the area predicted as suitable for each model using boxplots to examine effects of model type, environmental predictors, and spatial resolution visually.

RESULTS
All models, except for a marginally lower result in the 1 km process-based model, performed better than random expectations when tested against independent occurrence points (all P , 0.05; Appendix B: Table B1).Comparisons of model performance using AUC ratios indicated differences in performance of models depending on the algorithm, environmental predictors, and resolution, as well as interactions among these factors (Table 1).At all resolutions, the lowest AUC ratios were for models using climate only as environmental predictors; AUC ratios were in general higher for models with disperser þ host and climate þ disperser þ host environmental predictors alone (Fig. 3).AUC ratios at resolutions of 1 and 5 km were highest when the environmental predictors were climate þ disperser þ host, followed by the AUC ratios when the environmental predictors were disperser þ host, and comparatively lower AUC ratios when the environmental predictor was climate (Fig. 3).AUC ratios at resolutions of 10 and 20 km had lower mean ratios when the whole set of predictors were used.At the coarsest resolution (50 km), AUC ratios remained low for climate and disperser þ host, but were considerably higher for climate þ disperser þ host (Fig. 3).
Coincidence among models, after converting them to binary maps, was highly variable in terms of both amount of area predicted as suitable and amount of overlap.Proportional areas identified as suitable differed depending on algorithm and environmental predictors, but not on resolution (Fig. 4; see also Appendix B: Fig. B1).On the other hand, amount of overlap between models at each resolution varied considerably when compared with the fuzzy Kappa statistic (Fig. 5).The spatial variation in the area identified as suitable can be divided in two broad groups: contiguous areas with high geographic coincidence and more disjunct areas with lower geographic coincidence (see an example in Appendix B: Fig. B2).In general, most map comparisons fell in the more disjunct category, meaning that completely overlapping areas are unusual (Fig. 5).Moreover, these broad groups v www.esajournals.orgdid not fall in the same categories when compared across scales.Correlations between models using raw output values instead of thresholded values coincided with results from fuzzy Kappa comparisons (Table 2).
In the process-based model, the colonization parameter associated with disperser cost of movement across the landscape showed significant departure from zero for the 1 and 5 km resolutions, and the parameter associated with disperser abundance showed significant departure from zero at all resolutions but was significantly higher at the three finest resolutions (Table 3; Fig. 6).Comparing across resolutions, the parameter associated with disperser cost of movement was significantly higher for the scales of 1 and 5 km.Also, when comparing across resolutions, the parameter associated with disperser abundance at 1 km resolution was significantly higher than those for 20 and 50 km; the value from 5 km was higher than that of the value of 20 km; and the value from 10 km was significantly higher than the values from 20 and 50 km (Table 4).Parameter values associated with the extinction function of the process-based model did not differ significantly from one another across scales.
The importance of variables within each of the correlative models was generally consistent across algorithms and resolutions (Table 5).For the climate model, mean annual temperature was the most important variable, except in GAM at 5 km, where precipitation was more important.When models were trained with climate þ disperser þ host as environmental predictors, the most important variable was the disperser, except GARP at 20 km, where host was more important, and Maxent at 50 km, where annual mean temperature and annual precipitation were overall more relevant; GAMs also showed high importance of temperature at resolutions of 5 and 10 km, precipitation at 20 km, and host at 1 and 5 km.When trained with disperser þ host as predictors, GARP showed host as most relevant, while Maxent and GAM showed disperser as most important (Table 5).

DISCUSSION
The two modeling approaches that we implemented are very different in nature (Morin and Thuiller 2009).In the process-based model, we made explicit assumptions about the way that the mistletoe could be dispersed and in this way ''explore'' the region, and reasons behind it successfully establishing or not given the availability of hosts and abiotic climatic conditions.On the other hand, in the correlative models, it is assumed that the effect of non-scenopoetic mechanisms covary with environmental factors.The differences and similarities between the results of the two approaches illuminate the roles of different factors across different spatial scales (Schurr et al. 2012).We could think of processbased models as a hypothetic-deductive approach by which to test hypothesis, whereas correlative models are an inductive way to propose hypotheses; the differences in results are explained by the ways in which the two methods make use of observations.
The results from the process-based model suggest that dispersal factors are most relevant at fine resolutions, but that they become practically irrelevant at coarser resolutions, where climatic factors dominate, as expected under the Eltonian Noise Hypothesis (Sobero ´n and Nakamura 2009, Peterson et al. 2011).This result is consistent with previous general thinking about effects of scale on the relative importance of different factors on distributional ecology (Whittaker et al. 2001, Pearson andDawson 2003), and also with previous results on spatial aggregation patterns of P. californicum (Aukema 2004).This later study found mistletoe aggregation within trees and at about 1.5-2 km, and again at a distance !4 km, that was correlated with elevation.This finding led the author to hypothesise that dispersal was key to the first two scales (within trees and ;1500 m) and abiotic factors to the .4km scale.Although our simulations did not include these finer resolutions, our results are consistent where comparisons were possible.Coincident aggregation patterns have also been reported in studies of the Indian mistletoe Taxillus tomentosus (Lemieux et al. 2011).
The fact that parameters of the extinction rate did not show significant change across resolutions is consistent with the relative stability of scenopoetic variables across a broad set of scales (Sobero ´n 2007).Scenopoetic variables are related to the abiotic fundamental niche, an adaptive feature related to physiology (James et al. 1984, Fig. 4. Proportion of area predicted as suitable summarized by (a) model or algorithm type, (b) predictor variables, and (c) resolution.Suitable area was considered after converting models into binary predictions using a minimum training presence threshold approach.v www.esajournals.orgJackson and Overpeck 2000).In this case, climate is relevant because it adjusts species' survival limits in each cell of the grid.Overall, what is apparently changing with scale are factors associated with dispersal of the species, as seen in the colonization part of this model, whereas climate is a constant determinant of the species' range across scales.These results support the hypotheses proposed with the BAM diagram in the conceptual framework (Fig. 2).
Process-based models, which are much more time-consuming to implement than correlative models, are nevertheless potentially very informative.After comparing the models generated in this exercise, it is evident that incorporating colonization and extinction processes within a metapopulation framework sheds light on the importance that factors related to dispersal may have at finer scales for this species (Overton 1994, 1996, Aukema 2004, Lemieux et al. 2011), and that more stable factors like climate either remain constant across resolutions or become more important in controlling the distribution of the species as the resolution becomes coarser.
Using correlative methods, we were able to perceive differences in contributions of variables as a function of changes in resolution, as other authors have shown (e.g., Buckley et al. 2010).However, interpretation of the roles of different variables, not being linked to mechanisms, was not straightforward, and potentially would lead to interpretations very different from what we observed using the process-based model (Cabral and Schurr 2010).Arau ´jo and Luoto (2007) used GAMs to test whether biotic interactions played a role at macroscales for the butterfly Parnassius mnemosyne: they found that the type of interaction (contingent on species) and methods considered played significant roles in explaining distributions of species at coarse scales.Other Fig. 5. Summary of degree of coincidence or overlap between pairs of maps (45 in total) using fuzzy Kappa statistics (Hagen 2003).Fuzzy Kappa shows values greater than 0 when maps have a large coincidence between areas predicted as suitable or non-suitable, and values less than 0 (either suitable or not) when maps show less contiguous more disjoint predictions.
v www.esajournals.orgstudies using correlative approaches have shown improvement in predictions of species' distributions when resource-related variables are incorporated (Heikkinen et al. 2007, Real et al. 2009, Wisz et al. 2012).In our exercise, when we used correlative models, we also found that bioticrelated predictors play a role at coarse scales, but were most relevant at scales finer than the macroscale of Arau ´jo and Luoto (2007).These authors worked only at a scale similar to the coarsest of our analysis (50 km); they might have detected additional patterns if their analysis had extended to finer resolutions.Their results match ours at the coarsest resolution if we consider both disperser and host layers as part of the mistletoe's biotic interactions: in our case, the disperser layer showed some explanatory power, although not very significant, whereas, in Arau ´jo and Luoto's example, the host plant distribution of the butterfly did not improve results drastically, but rather echoed climate signals (Figs. 3 and 4).Although these results from correlative models    (Peterson et al. 2007), in this case, it also shows how combinations of abiotic and dispersal-related predictors restrict the area predicted as suitable.An improvement on this exploration would be to test for significance of explanatory power of other (less important) dispersers and perhaps different suites of hosts in the model.
It is interesting to notice that the different modeling approaches produced predictions with only fair to good agreement.One possible explanation for such lack of coincidence is that the climate-only predictions agree closely with the distribution of the disperser, which is the element of the process-based model that covers the entire extent moving the parasite across cells in search for suitable conditions for its establishment (M; Barve et al. 2011).However, in the metapopulation framework of the process-based model, the disperser is just one part of the formula, and how the mistletoe reaches its distributional limits depends on the balance between colonization and extinction.Assuming that the disperser actually carries mistletoe seeds all across the study extent and that hosts are present, our study suggests that the mistletoe is able to reach and potentially establish in areas more distant than the known geographic range edges; however, its populations are restricted to an area smaller than its ''accessible area,'' suggesting that other factors (including climate) constrain its survival.In a more realistic processbased modeling approach, actual seed processing times and dispersal distances could be incorporated (e.g., Overton 1996, 1997, Liu et al. 2011).
The fact that the species is not present in the northwestern and southeastern parts of the disperser's range suggests that the species is  (Dimmit 2000).Although our exercise is realistic according to what we know about the species' distribution and how biotic and abiotic factors operate across scales, it carries some significant assumptions and limitations.First, the environmental predictors we used and the way they were developed is far from ideal.We chose predictors that are likely important to the species, but generating them and matching their resolutions was not always possible.In particular, this concern applies to the disperser layer: it was originally created at a spatial resolution of 461 km 2 (Sauer et al. 2011), but data availability considerations prevented us from scaling it directly to other resolutions.Scaling it to high resolutions requires a larger density of survey routes than is actually available; as a consequence, we used this single resolution for the first four resolutions, and generalized it still more for the coarsest.
Other bioclimatic variables could have been chosen, such as those associated with extremes of temperature and precipitation; however, in tests using other sets of climate variables and the entire set of 19 bioclimatic variables of Hijmans et al. (2005), neither the process-based model nor the correlative approach appeared to benefit from such additional information (data not shown).To check this point more thoroughly, we tested the performance and geographic predictions of correlative models with two other environmental layers that were calculated using daily weather stations information for a period of 20 years: number of continuous days of freezing temperatures and number of continuous rainless days.However, these models did not produce results markedly better than annual mean temperature and annual precipitation visually, nor did models increase in performance, which suggests that different trends would not result in either the model prediction or in model performance.
For the process-based model, it would be possible to create and explore other colonization and extinction functions.In particular, the extinction function used was a simple exponential of the difference between the estimated optimal niche and the environments of a given cell: clearly, more complex may prove more appropriate (Martı ´nez-Meyer et al. 2013).However, based on extensive initial exploration and simulations (not shown), in which our basic results were supported, it is not likely that major differences and conclusions from our current findings would emerge.
Other types of information could be incorporated in new models, as well as in experiments that could analyze details of mistletoe population dynamics in different parts of the range.For example, numbers of viable seeds dispersed by Phainopepla nitens and other bird species may indicate regions with unfavorable niche conditions and sink population dynamics, or perhaps even other physiological mechanisms of hostparasite recognition.Other studies of such dynamics in P. californicum have been carried out at much finer resolutions than ours, and are very illuminating when compared with our macro-geographic approximations (e.g., Overton 1994, 1996, Aukema and Martı ´nez del Rio 2002, v www.esajournals.orgLiu et al. 2011).From an ecological point of view, for example, these analyses would allow exploring whether abundance, genetic variation, population differentiation, or gene flow vary as functions of centrality within the geographic range (Thuiller et al. 2010, Torres et al. 2012, Martı ´nez-Meyer et al. 2013) or the ecological niche of the species, thus presenting different opportunities for local optima and associated adaptation (VanDerWal et al. 2009, Martı ´nez-Meyer et al. 2013).

Derivation of the formulas for the process-based model
We modeled population dynamics of mistletoes using formalism borrowed from metapopulation theory.We consider a grid of cells of maximum resolution, corresponding to the typical area under an average-sized legume tree, on the order of 20 m 2 .We now construct coarserresolution grids that contain maximum-resolution cells, asking for the proportion of the coarseresolution cells that is occupied by infected trees (number of high-resolution infected cells/total number of cells in the coarse-resolution unit), that is, by trees with at least one individual of mistletoe.Let i ¼ 1, 2, 3 . . .n denote the coarseresolution cells in the grid.Then the following equation is proposed (Hanski 1999): where p i is the probability of species presence in cell i, dp i /dt is its time derivative; c i is the colonization rate in cell i and e i is the extinction rate in cell i.In words, the growth rate of the proportion of space occupied by the mistletoe in the i-th cell increases proportionally to a coloni-zation rate c i and in inverse proportion to an extinction rate e i .Now, imagine that the grid is coarsened, such that inside a new larger cell J (we use a capital letter to denote an aggregate of smaller cells) is a collection of the higher-resolution cells i.This can be achieved by using square cells and doubling the side of the cells in the grid: one doubling contains 4 smaller cells, another doubling contains 16 smaller cells, and so on.
We are interested in the mean value of the occupied proportion of a cell: and therefore: Now, we make the reasonable assumption that, inside every larger cell J, the values of colonization rate and extinction rates can be approximated as their averages over the smaller cells: v www.esajournals.orgd" p J dt ¼ 1 jJj X i2J c i À X i2J c i p i À X i2J e i p i !¼ " c J À " c J " p J À " e J " p J ; which, in the steady state, implies: The equations for the colonization and extinction rates are postulated after consideration of the BAM diagram: colonization is proportional to the abundance of birds and trees in cell J, and inversely related to the distance to other occupied cells (Hanski 1999): where b 0 and b 1 represent parameters to be fitted, d J,K is a measure of distance (Euclidean, corrected by topography) between cells J and K; b K is the density of birds present in cell K, and L K is the number of trees in cell K.These three last quantities are obtained from data (see Methods for more detail).
The extinction rate in cell J is assumed to be a function of the distance from the centroid of the fundamental niche of the species to the environmental data in the cell, as represented in an ellipsoid: " e J ¼ f ðx J À lÞ T R À1 J À lÞ h i ; where the parameters l and R represent fitted values that define a multi-normal distribution of optimal, ideal environmental preferences.The values of x J represent the observed environmental values in the cell J.

Fig. 1 .
Fig. 1.Author-collected (A.L.-N.) and Internet-located occurrences of P. californicum and extent of study area.The study area corresponds to the distributional area of the principal disperser, assumed to represent accessible areas for the mistletoe (M), for which abundance data are available (thickest black line).Notice there are areas where the disperser or hosts can be present but not the mistletoe; these areas are relevant to understand the interaction of factors that limit the distribution of the species as well as challenging areas in which to test the models.

Fig. 2 .
Fig. 2. Conceptual framework showing (a) the relationship between biotic-abiotic-mobility (BAM) conditions for a species to be present.(b) Interpretation of the BAM diagram in this modeling exercise restricts to the use of abiotic conditions (A) and accessible areas (M), assuming that the interaction between host and parasite is uncoupled, and that the positive interaction between disperser and parasite makes biotic conditions (B) and accessible area (M) coincident.As scale changes, limiting factors in A and M will also change.At higher resolutions, almost the entire area is accessible, and climate is not so limiting, whereas at coarser resolutions dispersal becomes more difficult and climate is more constant.

Fig. 3 .
Fig. 3. Interaction plot of AUC ratios after 1000 iterations in the partial ROC based on independent occurrences of Phoradendron californicum.Differences in model performances are statistically significant (P , 0.001).Processbased model results were included in the set of variables ''climate þ disperser þ host.''

Fig. 6 .
Fig.6.Importance of parameters from the process-based model associated with (1) cost of movement for the disperser across the landscape given by topographic heterogeneity and (2) disperser abundance.

Fig. B1 .
Fig. B1.Example binary maps for different models and sets of environmental predictors at the resolutions of 5 and 50 km.Binary predictions were generated using a minimum training presence threshold.

Fig. B2 .
Fig. B2.Example of map comparison with fuzzy Kappa statistic.In (a), note an overall high coincidence between two maps (in this case GARP disperser þ host versus GARP climate þ disperser þ host), and in (b), note an example of low coincidence (GARP climate þ disperser þ host versus Maxent climate) at a resolution of 5 km.

Table 2 .
Pairwise correlations between models after correcting for spatial autocorrelation with Dutilleul method in SAM (see Methods for details).g ¼ GARP, gm ¼ GAM, m ¼ Maxent, process ¼ metapopulation process-based model; c ¼ climate, d ¼ disperser, h ¼ host.

Table 3 .
Departure from zero (t-test) of colonization parameters for disperser movement cost across the landscape given topographic heterogeneity and disperser abundance.

Table 5 .
Variable importance in correlative models calculated from jackknifing permutations in GARP and Maxent, and from significance values in GAM.