Hans-Jürgen Bandelt and Andreas W. M. Dress. A relational approach to split decomposition. In H.-H. Bock, W. Lenski and M. M. Richter, editors, Information Systems and Data Analysis, Proceedings of the 17th Annual Conference of the Gesellschaft für Klassifikation (GFKL93), Vol. 42:123-131 of Studies in Classification, Data Analysis, and Knowledge Organization, Springer, 1994. Keywords: characterization, from quartets, phylogenetic network, weakly compatible.
Hans-Jürgen Bandelt. Phylogenetic Networks. In Verhandlungen des Naturwissenschaftlichen Vereins Hamburg, Vol. 34:51-71, 1994.
Hans-Jürgen Bandelt and Andreas W. M. Dress. An order theoretic framework for overlapping clustering. In DM, Vol. 136(1-3):21-37, 1994.
"Cluster analysis deals with procedures which - given a finite collection X of objects together with some kind of local dissimilarity information - identify those subcollections C of objects from X, called clusters, which exhibit a comparatively low degree of internal dissimilarity. In this note we study arbitrary mappings φ which assign to each subcollection A ⊆ X of objects its internal degree of dissimilarity φ(A), subject only to the natural condition that A ⊆ B ⊆ X implies φ(A) ≤ φ(B), and we analyse on a rather abstract, purely order theoretic level how assumptions concerning the way such a mapping φ might be constructed from local data (that is, data involving only a few objects at a time) influence the degree of overlapping observed within the resulting family of clusters, - and vice versa. Hence, unlike previous order theoretic approaches to cluster analysis, we do not restrict our attention to nonoverlapping, hierarchical clustering. Instead, we regard a dissimilarity function φ as an arbitrary isotone mapping from a finite partially ordered set I - e.g. the set P(X) of all subsets A of a finite set X - into a (partially) ordered set R - e.g. the nonnegative real numbers - and we study the correspondence between the two subsets C(φ) and D(φ) of I, formed by the elements whose images are inaccessible from above and from below, respectively. While D(φ) constitutes the local data structure from which φ can be built up, C(φ) embodies the family of clusters associated with φ. Our results imply that in case I := P(X) and R := R≥0 one has #D ≤ n for all D ∈ D(φ) and some fixed n ∈ N if and only if {A figure is presented} for all C0, ..., Cn ∈ C(φ) if and only if this holds for all subsets C0, ..., Cn ⊆ X, generalizing a well-known criterion for n-conformity of hypergraphs as well as corresponding results due to Batbedat, dealing with the case n = 2. © 1994."
Jotun Hein. A heuristic method to reconstruct the history of sequences subject to recombination. In JME, Vol. 36(4):396-405, 1993. Keywords: explicit network, from sequences, heuristic, parsimony, phylogenetic network, phylogeny, Program RecPars, recombination, recombination detection, software. Note: http://dx.doi.org/10.1007/BF00182187.
Hans-Jürgen Bandelt and Andreas W. M. Dress. A canonical decomposition theory for metrics on a finite set. In Advances in Mathematics, Vol. 92(1):47-105, 1992. Keywords: abstract network, circular split system, from distances, split, split decomposition, split network, weak hierarchy, weakly compatible.
"We consider specific additive decompositions d = d1 + ... + dn of metrics, defined on a finite set X (where a metric may give distance zero to pairs of distinct points). The simplest building stones are the split metrics, associated to splits (i.e., bipartitions) of the given set X. While an additive decomposition of a Hamming metric into split metrics is in no way unique, we achieve uniqueness by restricting ourselves to coherent decompositions, that is, decompositions d = d1 + ... + dn such that for every map f: X → R with f(x) + f(y) ≥ d(x, y) for all x, y ∈ X there exist maps f1, ..., fn: X → R with f = f1 + ... + fn and fi(x) + fi(y) ≥ di(x, y) for all i = 1, ..., n and all x, y ∈ X. These coherent decompositions are closely related to a geometric decomposition of the injective hull of the given metric. A metric with a coherent decomposition into a (weighted) sum of split metrics will be called totally split-decomposable. Tree metrics (and more generally, the sum of two tree metrics) are particular instances of totally split-decomposable metrics. Our main result confirms that every metric admits a coherent decomposition into a totally split-decomposable metric and a split-prime residue, where all the split summands and hence the decomposition can be determined in polynomial time, and that a family of splits can occur this way if and only if it does not induce on any four-point subset all three splits with block size two. © 1992."
Jotun Hein. Reconstructing evolution of sequences subject to recombination using parsimony. In MBIO, Vol. 98(2):185-200, 1990. Note: http://dx.doi.org/10.1016/0025-5564(90)90123-G.
"The parsimony principle states that a history of a set of sequences that minimizes the amount of evolution is a good approximation to the real evolutionary history of the sequences. This principle is applied to the reconstruction of the evolution of homologous sequences where recombinations or horizontal transfer can occur. First it is demonstrated that the appropriate structure to represent the evolution of sequences with recombinations is a family of trees each describing the evolution of a segment of the sequence. Two trees for neighboring segments will differ by exactly the transfer of a subtree within the whole tree. This leads to a metric between trees based on the smallest number of such operations needed to convert one tree into the other. An algorithm is presented that calculates this metric. This metric is used to formulate a dynamic programming algorithm that finds the most parsimonious history that fits a given set of sequences. The algorithm is potentially very practical, since many groups of sequences defy analysis by methods that ignore recombinations. These methods give ambiguous or contradictory results because the sequence history cannot be described by one phylogeny, but only a family of phylogenies that each describe the history of a segment of the sequences. The generalization of the algorithm to reconstruct gene conversions and the possibility for heuristic versions of the algorithm for larger data sets are discussed. © 1990."
Hans-Jürgen Bandelt and Andreas W. M. Dress. Weak hierarchies associated with similarity measures: an additive clustering technique. In BMB, Vol. 51:113-166, 1989. Keywords: abstract network, clustering, from distances, from trees, phylogenetic network, phylogeny, Program WeakHierarchies, reconstruction, weak hierarchy. Note: http://dx.doi.org/10.1007/BF02458841.
"A new and apparently rather useful and natural concept in cluster analysis is studied: given a similarity measure on a set of objects, a sub-set is regarded as a cluster if any two objects a, b inside this sub-set have greater similarity than any third object outside has to at least one of a, b. These clusters then form a closure system which can be described as a hypergraph without triangles. Conversely, given such a system, one may attach some weight to each cluster and then compose a similarity measure additively, by letting the similarity of a pair be the sum of weights of the clusters containing that particular pair. The original clusters can be reconstructed from the obtained similarity measure. This clustering model is thus located between the general additive clustering model of Shepard and Arabie (1979) and the standard hierarchical model. Potential applications include fitting dendrograms with few additional nonnested clusters and simultaneous representation of some families of multiple dendrograms (in particular, two-dendrogram solutions), as well as assisting the search for phylogenetic relationships by proposing a somewhat larger system of possibly relevant "family groups", from which an appropriate choice (based on additional insight or individual preferences) remains to be made. © 1989 Society for Mathematical Biology."
Alain Guénoche. Graphical Representation of a Boolean Array. In Computers and the Humanities, Vol. 20(4):277-281, 1986. Keywords: from splits, median network, reconstruction. Note: http://dx.doi.org/10.1007/BF02400118.
"In this paper, we represent a boolean array of data with a median connected graph. Vertices are the different lines of the array plus virtual monomials, and an edge links two vertices that are different for only one variable. We describe an algorithm to compute this graph, that is an exact representation of the symmetrical difference distance between lines, and we show an application to Bronze age pins. © 1986 Paradigm Press, Inc."
Ingo Althöfer. On optimal realizations of finite metric spaces by graphs. In Discrete and Computational Geometry, Vol. 3(1):103-122, 1988. Keywords: NP complete, optimal realization, realization. Note: http://dx.doi.org/10.1007/BF02187901.
"Graph realizations of finite metric spaces have widespread applications, for example, in biology, economics, and information theory. The main results of this paper are: 1. Finding optimal realizations of integral metrics (which means all distances are integral) is NP-complete. 2. There exist metric spaces with a continuum of optimal realizations. Furthermore, two conditions necessary for a weighted graph to be an optimal realization are given and an extremal problem arising in connection with the realization problem is investigated. © 1988 Springer-Verlag New York Inc."
Richard R. Hudson. Properties of the neutral allele model with intragenic recombination. In TPP, Vol. 23:183-201, 1983. Keywords: coalescent. Note: http://dx.doi.org/10.1016/0040-5809(83)90013-8, see also http://www.brics.dk/~compbio/coalescent/hudson_animator.html.
"An infinite-site neutral allele model with crossing-over possible at any of an infinite number of sites is studied. A formula for the variance of the number of segregating sites in a sample of gametes is obtained. An approximate expression for the expected homozygosity is also derived. Simulation results are presented to indicate the accuracy of the approximations. The results concerning the number of segregating sites and the expected homozygosity indicate that a two-locus model and the infinite-site model behave similarly for 4Nu ≤ 2 and r ≤ 5u, where N is the population size, u is the neutral mutation rate, and r is the recombination rate. Simulations of a two-locus model and a four-locus model were also carried out to determine the effect of intragenic recombination on the homozygosity test of Watterson (Genetics 85, 789-814; 88, 405-417) and on the number of unique alleles in a sample. The results indicate that for 4Nu ≤ 2 and r ≤ 10u, the effect of recombination is quite small. © 1983."