Biodiversity Management Under Limiting Conditions: Estimating Effective Population Size Using the Molecular Mark and Recapture (MMR) Method

Although many people have been paying attention to the decrease of biodiversity on earth in recent years, many local people, even staff of national parks, live under limiting conditions (such as a shortage of funds, specialists, literature, equipment for experiments and so on). To conserve biodiversity, it is important to be clear about which species decrease or increase. To find such information, it is quite important to know the dynamics of effective population size for each species. Although a large number of papers have been written about how to improve the precision of the estimated effective population size, little has been studied on how to estimate the dynamics of the effective population sizes for many species together under limiting situations, very similar to the management methods of national parks in countries which have biological hot spots. In this paper, we are not concerned with the improvement of the precision of the estimates. We do, however, propose a simple method for the estimation of the effective population size. We named it the “MMR method.” It is not difficult to understand and is easily applied to many species. To show the usefulness of the MMR method we made simple virtual species, which included the first generation and the second generation, on a computer, and then we conducted simulations to estimate the effective population size of the first generation. We calculated three statistics to estimate whether the MMR method is useful or not. The three statistics showed that the MMR method is useful.


INTRODUCTION
A large number of studies on methods for the conservation and management of biological resources have focused on one or a few target species (e.g. Nielsen et al. 2001; Frankham et al. 2007). Such studies require prior information on the target species (such as reproduction patterns and mortality) and the construction of species-specific models. As a result, they need not only an expert but also a lot of time, labor, and money. This kind of biological resource management is impractical when people have to manage many species at the same time within a limited budget. Although the need for biodiversity management has been emphasized for a long time, little is known about methods of biodiversity management that can be implemented under the constraints of a limited budget. We think that a simplified method of estimating population dynamics is one of the best options when low cost is a key factor. However, bioinformaticians have given little attention to this point. In addition, no studies have tried to investigate the effectiveness of a simplified method for estimating population dynamics using bioinformatics.
To solve this problem, we paid attention to the importance of "order of magnitude" information (for example, when we want to know the number of individuals). In fact, when people have to manage biodiversity within a limited budget, they want to know only "order of magnitude" numbers (like 100 or 1000), not highly precise ones (for example, 1011 individuals or 1014 individuals). We developed a simplified method to estimate population dynamics, which has the potential to be used for many species, thereby removing the need to construct species-specific models.
So, what properties are needed in the simplified method to estimate population dynamics within limited budgets in order to maintain many species?
(1) The method should be simple and easy.
Sometimes people, such as local people in national parks in tropical areas, ask us: "How do we maintain the biodiversity in this area without a professional book on taxonomy, a university, a research institute, a specimen room and a specialist?" How many researchers can answer this question? Not only in developing countries but also in developed countries, we have to depend on local people for the maintenance of biodiversity wherever no specialists or researchers are stationed. Examples can be found in mountain villages and on isolated islands. Thus, the method to estimate population dynamics must be simple and easy for local people. For long-term biodiversity management, it is even more important that local people treat biodiversity management as their own problem or challenge. If local people can understand the simplified method, it will help spread the mindset among them that "we will conduct sampling by ourselves to maintain the biodiversity in this area." Moreover, a simple method has many advantages, such as being applicable to regionally specific tasks and enhancing environmental education, so that a shared awareness of the local nature and environment can develop.
(2) The method should be based on the assumption that molecular data analysis is used.
The use of molecular data to estimate population dynamics is becoming an important tool in molecular ecology. However, understanding this study area, especially paternity analysis, usually requires an advanced statistical background (e.g. Foltz & Hogland 1981; Amos et al. 1993; Clapham & Palsbøll 1997; Nielsen et al. 2001).
The simple traditional method for estimating population dynamics is the mark-recapture method, which was first used for ecological study in 1896 by Petersen; later, Lincoln (1930) independently developed the method to estimate waterfowl populations (Southwood & Henderson 2000). These methods, however, cannot estimate detailed ecological and life-cycle information. A new method should give us ecological and life-cycle information, such as blood relations, sub-population structures, genetic diversity, reproductive patterns, and other structures, using molecular data. The new method set out in this paper addresses the estimation of effective population size only. However, once we have DNA data, we can also estimate other ecological and life-cycle information from them. Moreover, using molecular data for a simple estimation of population dynamics has a further advantage: other specialists can reanalyze the data in detail when someone finds a problem or trend using the new simple method.
We think that if local people know the importance of sampling on the assumption of molecular data analysis, many of them will be able to grasp how important the samples are and what valuable information they contain, which has been largely overlooked until recently. For example, how many local people know that by conducting random sampling only once or twice (for example, before and after a road is built), they can learn the extent of the genetic diversity lost, the decrease in population size, and so on?
In recent years, the cost of obtaining molecular data from samples has been decreasing remarkably. Ordinary people can obtain molecular data by commissioning a company to perform the laboratory work. However, to date, there has been little study of sampling strategies for local people based on the assumption that molecular data are available. This study is a first step toward a sampling strategy based on the assumption of molecular data analysis.
(3) The method should be used across generations.
A long-term perspective is essential: for biodiversity maintenance, it is important to know whether an effective population size decreases or increases across several generations. Our method can be used not only between two generations but also across many generations, such as grandparent, parent, child and grandchild generations, by applying it to each pair of adjacent generations. Thus, the method can also be used to estimate simple population dynamics as long as samples from more than two generations are available.
(4) The method should allow use of samples that had been collected for other purposes.
It is better if the new method contributes to resource management not only in the future but also by analyzing data from the past. The new method should estimate the effective population size from samples that have already been collected for other purposes (for example, researchers who wanted to know the mating season of one insect species collected that species monthly with light traps over several years). In most studies, such samples are thrown away after being used only once for a single purpose. In the above example, we lose information not only on the target species but also on the other insect species that were collected by the light traps. Even where studies concentrating on one species (for example, studies of deer) have invested substantial amounts of money, and many samples have been kept in museums or research institutes for ten years or more, the samples have not been used effectively to estimate population dynamics. We think that some samples collected for other purposes can be used in our new method.
(5) The method should be used not only in the biodiversity field but also the medical field.
We add one more property concerning the medical field, because our new method is also useful there. Especially in developing countries with biodiversity hotspot areas, it is not easy to survey the dynamics of many serious diseases under the constraints of a limited budget. In principle, the new method is broadly applicable to many diseases, such as insect-borne diseases, hereditary diseases and infectious diseases.
(6) The method should be applicable to limited populations.
In statistics, an infinite population is often assumed because it is easier to handle than a limited (finite) population. However, most national parks, protected areas, and primary forests occupy only limited areas. We therefore stipulate that a limited population be assumed.
In conclusion, in this study we propose a simple method for estimating effective population size that combines the above six properties (we named it the MMR method). Moreover, we show the usefulness of the MMR method. Field data are not well suited to showing the usefulness of a new method, because field samples include many kinds of environmental effects and species-specific life histories, and we cannot test repeatedly whether the MMR method is useful or not.
Thus, we used a bioinformatics approach to test whether the MMR method is useful or not. First, we made two generations of a simple virtual species in a computer. Next, we collected samples from each generation using simulation programs that we had written. Finally, we compiled three statistics calculated from the results of the simulations to investigate whether the MMR method was useful or not. Moreover, we discuss the costs and benefits of the sampling effort, especially how much one additional sample contributes to improving the precision of the estimate.

MATERIALS AND METHODS
First, we describe the principle of the MMR method. MMR is the abbreviation of "Molecular Mark and Recapture". Next, we describe how we evaluated the MMR method using the three statistics that we defined. Third, we describe our material, which is a simple virtual species in a computer. Finally, we describe the three experiments (the three simulations) that use the MMR method.

The principle of the MMR method
We want to estimate the effective population size of the first generation, but we can obtain only two sample sets, one from the first generation ($F_1$) and one from the second generation ($F_2$). Thus, we estimated the effective population size from the samples of the two generations using formula (1),

$$\hat{N}_{g1} = \frac{n_{g1}\, n_{g2}}{m_{g2}} \qquad (1)$$

where $\hat{N}_{g1}$ is the estimated effective population size of the first generation, $n_{g1}$ is the number of sampled individuals of the first generation, $n_{g2}$ is the number of sampled individuals of the second generation, and $m_{g2}$ is the number of sampled second-generation individuals whose parent is found in the first-generation sample (Fig. 1).
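As a minimal sketch of formula (1), the point estimate can be computed as follows. The function name `mmr_estimate` and the use of `math.inf` for the case in which no parent is found are illustrative assumptions, not part of the original description.

```python
import math


def mmr_estimate(n_g1: int, n_g2: int, m_g2: int) -> float:
    """Point estimate of the first-generation effective population size (formula 1).

    n_g1: number of sampled first-generation individuals
    n_g2: number of sampled second-generation individuals
    m_g2: number of sampled second-generation individuals whose parent
          is found in the first-generation sample
    """
    if n_g1 <= 0 or n_g2 <= 0:
        raise ValueError("sample sizes must be positive")
    if not 0 <= m_g2 <= n_g2:
        raise ValueError("m_g2 must lie between 0 and n_g2")
    if m_g2 == 0:
        # No parent found: the estimate is unmeasurable (assumed here to map to 'Inf.').
        return math.inf
    return n_g1 * n_g2 / m_g2


print(mmr_estimate(20, 20, 4))  # 20 * 20 / 4 = 100.0
```

Returning infinity rather than raising an error keeps the "no parent found" outcome countable in repeated simulations.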
The probability distribution underlying the MMR method is the hypergeometric distribution, because the method is based on sampling without replacement (like the ordinary mark-recapture method).
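For reference, the hypergeometric probability can be written out explicitly. This is a sketch under assumptions that match the virtual species used in this paper rather than a derivation given by the method itself: the second generation contains $N_{g2}$ individuals, each parent has exactly one child, and therefore exactly $n_{g1}$ of the $N_{g2}$ offspring have a sampled parent. The symbol $N_{g2}$ is introduced here only for illustration.

```latex
P(m_{g2} = k) =
  \frac{\binom{n_{g1}}{k}\,\binom{N_{g2} - n_{g1}}{n_{g2} - k}}{\binom{N_{g2}}{n_{g2}}},
\qquad
\mathrm{E}[m_{g2}] = \frac{n_{g1}\, n_{g2}}{N_{g2}}
```

Setting $k = 0$ gives the probability that no parent is found at all. If the two generations have the same size, replacing $m_{g2}$ in formula (1) by its expectation returns the first-generation population size.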
In the ordinary mark and recapture method, we must conduct sampling twice in the field. In the MMR method, if we can collect two generations together, we need to collect samples only once, because the MMR method uses DNA information.
In statistics, a "point estimate" is a single value given as the estimate of a population parameter of interest, in our case the effective population size of the first generation. An "interval estimate" instead specifies a range within which the parameter is estimated to lie. The MMR method can give not only point estimates but also interval estimates. In this paper, however, we do not discuss interval estimates, because our main aim is to show the principle of the MMR method simply.

Estimation methods of the MMR method using three statistics
We calculated three statistics to evaluate whether the MMR method is useful or not. The first statistic is the mean value of the estimated effective population size of the first generation.
The second statistic is the range of $\hat{N}_{g1}$ (i.e. $[\hat{N}_{g1}]_{SD}$), which is obtained by substituting (2) for $m_{g2}$ in equation (1),

$$\bar{m}_{g2} \pm SD \qquad (2)$$

where $\bar{m}_{g2}$ is the mean number of sampled second-generation individuals whose parent is found in the first-generation sample and SD is its standard deviation.
The third statistic is the quartile points (upper quartile and lower quartile) of the estimated effective population size of the first generation.
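The second and third statistics can be sketched as below. This is an illustration only: the function names are hypothetical, the sample standard deviation is assumed because the text does not say which form was used, and the quartiles use a simple nearest-rank rule so that 'Inf.' values sort to the upper end without interpolation.

```python
import math
import statistics


def sd_range(n_g1, n_g2, matches):
    """Second statistic: substitute mean(m_g2) +/- SD for m_g2 in formula (1).

    `matches` holds the m_g2 value of each repetition (at least two values).
    """
    mean_m = statistics.mean(matches)
    sd_m = statistics.stdev(matches)  # sample SD (an assumption)
    if mean_m + sd_m == 0:
        return math.inf, math.inf  # no parent was ever found
    lower = n_g1 * n_g2 / (mean_m + sd_m)
    # If the SD is at least as large as the mean, the upper bound is unmeasurable.
    upper = n_g1 * n_g2 / (mean_m - sd_m) if mean_m > sd_m else math.inf
    return lower, upper


def quartiles(estimates):
    """Third statistic: lower and upper quartile by nearest rank."""
    if not estimates:
        raise ValueError("estimates must not be empty")
    ordered = sorted(estimates)  # math.inf sorts after every finite value
    n = len(ordered)
    lower = ordered[math.ceil(0.25 * n) - 1]
    upper = ordered[math.ceil(0.75 * n) - 1]
    return lower, upper


print(sd_range(20, 20, [3, 4, 5]))
print(quartiles([100.0, 50.0, math.inf, math.inf]))
```

Because a larger $m_{g2}$ gives a smaller estimate, the lower bound of the range comes from the mean plus SD and the upper bound from the mean minus SD.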

A simple virtual species: two generations, 100 individuals in each generation
We made a simple virtual species that had 100 adult individuals in the first generation (or parent generation). This setting means that the true effective population size of the first generation was 100, and this is the value that we want to recover from a smaller number of samples.

The simulation experiments
In all experiments, we first collected samples by random sampling from the first generation. Next, we collected samples by random sampling from the second generation. Finally, we estimated the effective population size of the first generation by the MMR method using the virtual microsatellite data of the two sample sets. We repeated these three steps 1000 times under the same conditions.
When none of the sampled second-generation individuals had a parent in the first-generation sample, we denoted the situation as 'Inf.', which is the abbreviated form of Infinite (i.e. unmeasurable).
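The three steps can be sketched as a small simulation. This is not the original simulation program: it assumes that each of the 100 parents has exactly one child and replaces the virtual microsatellite data with integer identifiers, so that child `i` is recognised as the offspring of parent `i`. The function name, the default arguments and the seed are illustrative.

```python
import math
import random


def simulate_mmr(n_g1, n_g2, pop_size=100, repetitions=1000, seed=0):
    """Repeat the two random samplings and the MMR estimate `repetitions` times."""
    rng = random.Random(seed)
    parents = range(pop_size)   # parent IDs 0 .. pop_size - 1
    children = range(pop_size)  # child i is the single offspring of parent i
    estimates = []
    for _ in range(repetitions):
        sampled_parents = set(rng.sample(parents, n_g1))   # step 1: first generation
        sampled_children = rng.sample(children, n_g2)      # step 2: second generation
        # step 3: count sampled children whose parent was also sampled, then apply formula (1)
        m_g2 = sum(1 for child in sampled_children if child in sampled_parents)
        estimates.append(n_g1 * n_g2 / m_g2 if m_g2 else math.inf)
    return estimates


estimates = simulate_mmr(10, 10)
n_inf = sum(math.isinf(e) for e in estimates)
print("repetitions:", len(estimates), "Inf. count:", n_inf)
```

Both samples are drawn without replacement, which is what makes the number of matches hypergeometric.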

Experiment 1
In experiment 1, we set a strict standard: the number of samples was judged insufficient when a simulation set of 1000 repetitions contained even one Inf. To find the sample number that meets this strict standard, we counted the number of Inf. in the 1000 repetitions. We collected the same number of samples (10, 20, 30, 40, 50, 60, 70, 80 or 90) from each generation. We estimated one effective population size of the first generation from one sample set of the first generation and one sample set of the second generation. We collected all samples randomly.
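How often Inf. should be expected can also be approximated analytically. The sketch below assumes the hypergeometric model with one child per parent and a population of 100 in each generation; the function name `prob_no_match` is hypothetical.

```python
from math import comb


def prob_no_match(n_g1, n_g2, pop_size=100):
    """Probability that none of the n_g2 sampled children has a sampled parent.

    With one child per parent, n_g1 of the pop_size children have a sampled
    parent, so all n_g2 children must come from the other pop_size - n_g1.
    """
    return comb(pop_size - n_g1, n_g2) / comb(pop_size, n_g2)


for n in (10, 20, 30):
    expected_inf = 1000 * prob_no_match(n, n)
    print(f"{n} samples per generation: about {expected_inf:.2f} Inf. expected in 1000 repetitions")
```

Under these assumptions, roughly 330 Inf. are expected per 1000 repetitions with 10 samples per generation, about 7 with 20 samples, and far fewer than one with 30 samples.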

Experiment 2
From experiment 1, the smallest sample number with no occurrence of Inf. among 1000 repetitions was 30 individuals. Thus, in experiment 2, we fixed the sample number of the first generation at 30 individuals, and we collected 10, 20, 30, 40, 50, 60, 70, 80 or 90 individuals from the second generation. We estimated one effective population size of the first generation from one sample set (30 individuals) of the first generation and one sample set (10, 20, 30 … individuals) of the second generation. We collected all samples randomly.

Experiment 3
In experiment 3, we fixed the sample number of the second generation at 30 individuals, and we collected 10, 20, 30, 40, 50, 60, 70, 80 or 90 individuals from the first generation. We estimated one effective population size of the first generation from one sample set (10, 20, 30 … individuals) of the first generation and one sample set (30 individuals) of the second generation. We collected all samples randomly.

RESULTS

Experiment 1
The mean, which is the first statistic, was close to the expected value (100 individuals), independently of the number of samples (Table 1). For the second statistic, the range was wide when the sample number was 10 (54.05125 - 1220.94065), whereas it narrowed sharply when the sample number was 20 (70.52745 - 163.23536) (Table 1). For the third statistic, it was not possible to determine a specific range when the sample number was 10 (50 - Inf.) because there were many Inf., whereas the range was close to the expected value when the sample number was 20 (80.00 - 133.30) (Table 1).
Table 1. The three statistics of experiment 1 of the virtual DNA data by simulation. The number of samples from the first generation is 'N. 1st g.'. The number of samples from the second generation is 'N. 2nd g.'. The number of repetitions of the simulation is 'N. repetition'. The mean value of the estimated effective population size of the first generation (the first statistic) is $\hat{N}_{g1}$. The standard deviation of $\hat{N}_{g1}$ (the second statistic) is $[\hat{N}_{g1}]_{SD}$. The upper quartile and lower quartile of the estimated effective population size of the first generation (the third statistic) are 'Upper - Lower quartile'. If we could not measure $\hat{N}_{g1}$ among the 1000 repetitions, we denoted the situation as 'Inf.', which is the abbreviated form of Infinite (i.e. unmeasurable).
When the sample number was 10, the number of Inf. in the 1000 repetitions was 339. When the sample number was 20, the number of Inf. was 10. When the sample number was 30 or more, the number of Inf. was 0 (Table 1).
We illustrated the same results of experiment 1 as histograms (Fig. 2). The distributions were discrete. Fig. 2 shows that as the sample number increased, the tail of the distribution shortened. With more than 50 samples, the distributions resembled a normal distribution and were highly concentrated around the expected value (100) (Fig. 2).

Experiment 2
The first statistic was close to the expected value (100), independently of the number of samples (Table 2). The second statistic was also close to the expected value (67.64291 - 176.16536) when the sample number of the second generation was 10 or higher. As the number of samples increased, the range narrowed (Table 2). The third statistic was also close to the expected value (75.00 - 150.00) when the sample number was 10 or more. As the number of samples increased, the range narrowed (Table 2). There was no Inf. except when the number of samples was 10.
We illustrated the same results of experiment 2 as histograms (Fig. 3). The distributions were discrete. Fig. 3 shows that as the sample number increased, the tail of the distribution shortened. With more than 50 samples, the distributions resembled a normal distribution and were highly concentrated around the expected value (100); however, the degree of concentration was lower than in experiment 1 (Fig. 3).
Fig. 2 The histograms of the estimated effective population size using the same number of samples (10, 20, 30, 40, 50, 60, 70, 80, 90) from each generation in computer simulations. The upper right number of each graph is the number of samples.
Table 2. The three statistics of experiment 2 of the virtual DNA data by simulation. The number of samples from the first generation is 'N. 1st g.'. The number of samples from the second generation is 'N. 2nd g.'. The number of repetitions of the simulation is 'N. repetition'. The mean value of the estimated effective population size of the first generation (the first statistic) is $\hat{N}_{g1}$. The standard deviation of $\hat{N}_{g1}$ (the second statistic) is $[\hat{N}_{g1}]_{SD}$. The upper quartile and lower quartile of the estimated effective population size of the first generation (the third statistic) are 'Upper - Lower quartile'. If we could not measure $\hat{N}_{g1}$ among the 1000 repetitions, we denoted the situation as 'Inf.', which is the abbreviated form of Infinite (i.e. unmeasurable).

Experiment 3
The first statistic was close to the expected value (100), independently of the number of samples (Table 3). The second statistic was also close to the expected value (68.8818 - 189.3087) when the sample number of the first generation was 10 or more. As the number of samples increased, the range narrowed (Table 3). The third statistic was also close to the expected value (75.00 - 150.00) when the sample number was 10 or more. As the number of samples increased, the range narrowed (Table 3). There was no Inf. except when the number of samples was 10.
We illustrated the same results of experiment 3 as histograms (Fig. 4). The distributions were discrete. Fig. 4 shows that as the sample number increased, the tail of the distribution shortened. With more than 50 samples, the distributions resembled a normal distribution and were highly concentrated around the expected value (100); however, the degree of concentration was lower than in experiment 1 (Fig. 2, 4).

DISCUSSION
Our results showed that the three estimated statistics were stable around the correct answer (100 individuals) in all three simulations. Thus, in these simulations the MMR method gave unbiased estimates. These results indicate that the MMR method is one of the best methods for roughly estimating the effective population size, especially when biodiversity must be managed within the constraints of a limited budget.
As the number of samples increased, the precision of the estimated effective population size of the first generation also increased in all three simulations. This result is consistent with classical sampling theory (Petersen 1896; Lincoln 1930). Our results also indicated that samples should be collected evenly between the two generations (for example, 20 individuals from each generation is better than 10 individuals from one generation and 30 individuals from the other). However, in some cases it is difficult to collect exactly the same number of samples from each generation. Which generation's samples are more informative for estimating the effective population size in such cases? The results of experiment 2 and experiment 3 showed that the effect of additional samples was almost the same for the two generations. This result indicates that samples from both generations have the same value if they are collected randomly. It also indicates that when it is difficult to collect the same number of samples from each generation, it is preferable to collect more samples from one generation so as to increase the precision of the estimated effective population size (provided that the samples are collected randomly), rather than to stop collecting for the sake of numerically equal samples across the two generations.
Regarding the optimal sampling strategy, our results showed that when the number of samples was already large, further increases did not appreciably improve the precision of the estimated effective population size of the first generation (Fig. 2, 3, 4). For example, consider the width of the range $[\hat{N}_{g1}]_{SD}$ in experiment 1. When the number of samples was increased from 10 to 20, the precision of the estimates increased almost 10 times. However, when the number of samples was increased from 80 to 90, the precision was almost the same, even though the increase in the number of samples was the same as before (10 individuals). In field research, many people collect excessive samples because they do not know how many samples are sufficient to estimate ecological information about their target species. Because the MMR method can estimate the effective population size from a small number of preliminary samples, less sampling is required.
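The diminishing return can be illustrated without running a simulation, by inserting the theoretical mean and standard deviation of the hypergeometric distribution into the second statistic. This is a sketch under the same assumptions as the virtual species (100 individuals per generation, one child per parent); the function name is hypothetical, and the values are theoretical approximations rather than the simulated values in Table 1.

```python
import math


def theoretical_sd_range(n_g1, n_g2, pop_size=100):
    """Range of the estimate when m_g2 is replaced by its theoretical mean +/- SD."""
    k = n_g1  # children whose parent is in the first-generation sample
    mean_m = n_g2 * k / pop_size
    var_m = (n_g2 * (k / pop_size) * (1 - k / pop_size)
             * (pop_size - n_g2) / (pop_size - 1))  # hypergeometric variance
    sd_m = math.sqrt(var_m)
    lower = n_g1 * n_g2 / (mean_m + sd_m)
    upper = n_g1 * n_g2 / (mean_m - sd_m) if mean_m > sd_m else math.inf
    return lower, upper


for n in (10, 20, 80, 90):
    lower, upper = theoretical_sd_range(n, n)
    print(n, round(lower, 1), round(upper, 1), "width:", round(upper - lower, 1))
```

Under these assumptions the width of the range shrinks by roughly a factor of ten between 10 and 20 samples per generation, but by only a few individuals between 80 and 90.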
Our results in experiment 1 showed that when the number of samples from each generation was 20 individuals, there was no order-of-magnitude failure of estimation among 1000 replicates. This result indicates that sampling 20% of the total population is one objective target for rare species. Our results in experiment 1 also showed that when the number of samples from each generation was 30 individuals, there was no Inf. among 1000 replicates. This result indicates that sampling 30% of the total population is one objective target for the management of biodiversity under a limited budget.
Our results in experiments 1 and 2 showed that when the number of samples from one generation was 20 and the number from the other generation was 30, there was no Inf. among 1000 replicates. This result indicates that sampling 30% of one generation and 20% of the other is also an objective target for the management of biodiversity within the limits of a budget and/or time. (In the field, however, it is better, if possible, to collect more specimens from the adult generation to avoid genealogical problems; for example, some people may collect offspring samples that come from only one or a few families.)
We think that the MMR method can be used to estimate the effective population size and the population dynamics from samples that were already collected for other research purposes. For example, many colonies of one social ant species may have been collected over 10 years to study their life history, including their social structure, mating strategy and so on. These ant samples can be used to understand population dynamics across the 10-year period using the MMR method. As another example, many samples of harmful animals such as deer and bears have been collected over several years during culling to keep densities low. These samples can also be used to understand the population dynamics of the animals and the effectiveness of the culling.
It is also necessary to consider the relation between cost (the sampling effort, money spent and so on) and benefit. In short, the MMR method is less expensive than other standard methods. In studies of wild bears, researchers use DNA information from hair traps and the software "MARK" to estimate the effective population size (e.g. Boulanger 2002). The cost of this method is more than 10 times as high as that of the MMR method, because there are many hairs in a trap and researchers need to keep collecting hair many times from each trap. It is difficult to raise the considerable budget needed for this kind of research every year. On the other hand, the MMR method needs only one sample set collected at one time (in the case of bears, we can collect two generations together). Moreover, even for species from which it is dangerous to collect samples without special skills such as hunting and capture techniques, e.g. bears, we can easily estimate the effective population size using the MMR method by collecting fresh dung from both adult and young animals at the target study site. Thus all the costs of the MMR method, such as the field sampling effort, the experimental effort in a laboratory, and the cost of chemical reagents, are very low. For species of this type, which attract people's attention, we can construct a new management scheme that combines the MMR method and the standard method (for example, by alternating the two estimation methods depending on the year).
In this study, we used a virtual population to illustrate the principle of the MMR method and the results of the simulations simply. However, can we apply the MMR method to real life? Our answer is "Yes". Even though individual species have their own specific life histories, we can use the MMR method with small improvements or adjustments. For example, Murase applied the MMR method to social insects (K. Murase, unpublished data).
Finally, we want to emphasize our hope that local people who live under limiting conditions (such as a shortage of funds, specialists, literature, equipment for experiments, etc.) will become aware that they can contribute to maintaining biodiversity (sometimes the biodiversity is also a part of local people's living environment). Maintenance of biodiversity requires only samples collected by random sampling. Funds, specialists, literature and equipment are not necessarily needed. Of course, people must record the environmental information for each sample (such as the date and time, the locality, and whether the sample is from an adult or a young specimen), and they must know some basic techniques, such as how to preserve the sample for later use (for example, plant leaves must be dried before being put into a plastic bag). However, if local people all over the world manage their local biodiversity using the MMR method, we think it has great potential to enhance the management of biodiversity on earth.
In the virtual species, we established that each of the 100 adult individuals of the first generation (or parent generation) produced one child individual of the second generation (or child generation). We also established that each adult individual of the first generation can be identified as the parent of its child individual of the second generation by the virtual DNA (microsatellite) data.

Fig. 3 The histograms of the estimated effective population size using two sample sets, which are 30 individuals from the first generation and 10, 20, 30, 40, 50, 60, 70, 80, 90 individuals from the second generation, in computer simulations. The upper right number of each graph is the number of samples of the second generation.

Fig. 4 The histograms of the estimated effective population size using two sample sets, which are 10, 20, 30, 40, 50, 60, 70, 80, 90 individuals from the first generation and 30 individuals from the second generation, in computer simulations. The upper right number of each graph is the number of samples of the first generation.

Table 3. The three statistics of experiment 3 of the virtual DNA data by simulation. The number of samples from the first generation is 'N. 1st g.'. The number of samples from the second generation is 'N. 2nd g.'. The number of repetitions of the simulation is 'N. repetition'. The mean value of the estimated effective population size of the first generation (the first statistic) is $\hat{N}_{g1}$. The standard deviation of $\hat{N}_{g1}$ (the second statistic) is $[\hat{N}_{g1}]_{SD}$. The upper quartile and lower quartile of the estimated effective population size of the first generation (the third statistic) are 'Upper - Lower quartile'. If we could not measure $\hat{N}_{g1}$ among the 1000 repetitions, we denoted the situation as 'Inf.', which is the abbreviated form of Infinite (i.e. unmeasurable).