Open policy decisions
=====================

http://www.debian.org/doc/debian-policy/ch-controlfields.html#s5.6.10

- what to do about cyclic dependencies ?

  Update: addressed in
  http://www.debian.org/doc/debian-policy/ch-relationships.html#s-binarydeps

  A cyclic dependency can be bad news or something perfectly normal,
  depending on how we define the semantics of package A depending on
  package B, and what policy we adopt with respect to the existence of
  cyclic dependencies:

  1) "B must be installed before A"

     In this case, a cyclic dependency means that the package in
     question cannot be installed using the respective sequence of
     installations. However, this does not mean that no other sequence
     can exist in which the package could be installed.

     Example: A depends on B. There are two versions of B: B_0 depends
     on nothing else while B_1 depends on A. If we try to resolve A's
     dependency with B_1, we enter a circular dependency and fail. If
     we use B_0 instead, there is no problem.

     This means that there are (at least) the following three possible
     policies:

     1A) Cyclic dependencies are tolerated and just mean that the
         package in question may not be installable (for whatever
         reason).
     1B) A cyclic dependency is always considered an error.
     1C) Cyclic dependencies are tolerated as long as there is a way
         around them, as in the example above.

  2) "B must be installed with A"

     In this case, the cyclic dependency would not be a problem as long
     as all the packages in the cycle are installed together. Should an
     installation get interrupted and cause only part of the packages
     to get installed, the system would then be in an anomalous
     configuration.

     If cyclic dependencies are to be interpreted this way, they are
     not a problem per se. Policy may still discourage their use,
     though.

- what to do if we need something that's "provided" ?

  Update: "Provides" is described here, but without answering the
  above question:
  http://www.debian.org/doc/debian-policy/ch-relationships.html#s-virtual

  When determining prerequisites, we may encounter a dependency on an
  item that only appears in the Provides: field of a package but is not
  an installable package itself. Should we

  1) consider installing the package that provides the requested item,
     or
  2) ignore the package, leaving it to the user to choose what to do,
     or
  3) if there's only one choice, do 1), else do 2) ?

  Policy 1 would make sense if this is merely an alias or if a package
  enumerates its constituents, which at some point in time - in the
  past or in the future - are separate packages. Example:

  - package "dwarf-pluto" could provide "planet-pluto", for packages
    that haven't been updated yet,
  - "binutils" could provide "as", "ld", etc., to allow packages that
    only need specific parts to depend on them (with the option of
    breaking binutils into its constituents in the future),
  - similarly, if "as", "ld", etc., were individual packages in the
    past but are now combined into "binutils", "binutils" could still
    provide its constituents for compatibility with packages whose
    dependencies have not been updated yet.

  Policy 2 would seem more appropriate in the common case of multiple
  choices. Example:

  - packages "emacs" and "vim" could both provide "editor", leaving the
    choice to the user.
  - similarly, message packages "foo-en", "foo-zh", etc., could all
    provide "foo-messages".

  In the above example, "Provides" could also be used to prioritize
  choices, e.g., if "foo-en" provides "lang-en" and "foo-zh" provides
  "lang-zh", future installations could prefer prerequisites that
  introduce fewer new items. So a package "bar-en" providing
  "bar-messages" and "lang-en" would be chosen over "bar-zh" providing
  "bar-messages" and "lang-zh" if we have already installed "foo-en"
  but not "foo-zh" (or vice versa).
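
To make policy 3 and the "fewer new items" preference concrete, here is a minimal Python sketch. It is only an illustration, not how the package database works: the PROVIDES table and the function names providers, resolve_policy3 and prefer_fewest_new are made up for this note, and "new items" is assumed to mean provided items that no installed package already provides.

```python
# Hypothetical table: package name -> items listed in its Provides: field.
# The entries mirror the examples in the text above.
PROVIDES = {
    "dwarf-pluto": {"planet-pluto"},
    "emacs": {"editor"},
    "vim": {"editor"},
    "foo-en": {"foo-messages", "lang-en"},
    "foo-zh": {"foo-messages", "lang-zh"},
    "bar-en": {"bar-messages", "lang-en"},
    "bar-zh": {"bar-messages", "lang-zh"},
}


def providers(item):
    """Return the sorted names of all packages that provide `item`."""
    return sorted(pkg for pkg, items in PROVIDES.items() if item in items)


def resolve_policy3(item):
    """Policy 3: use the provider if it is unique, else return None
    (meaning: leave the choice to the user, as in policy 2)."""
    found = providers(item)
    return found[0] if len(found) == 1 else None


def prefer_fewest_new(item, installed):
    """Among the providers of `item`, pick the one that introduces the
    fewest provided items not already present (assumed metric).
    Ties are broken by package name."""
    have = set()
    for pkg in installed:
        have |= PROVIDES.get(pkg, set())
    candidates = providers(item)
    if not candidates:
        return None
    return min(candidates, key=lambda pkg: (len(PROVIDES[pkg] - have), pkg))


if __name__ == "__main__":
    print(resolve_policy3("planet-pluto"))
    print(resolve_policy3("editor"))
    print(prefer_fewest_new("bar-messages", ["foo-en"]))
```

With "foo-en" installed, "bar-en" adds one new item ("bar-messages") while "bar-zh" adds two, so the sketch picks "bar-en"; with "foo-zh" installed instead, the choice flips.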

Still left to do
================

- make comp_versions work according to
  http://www.debian.org/doc/debian-policy/ch-controlfields.html#s-f-Version
- consider reducing the size of the lists of conflicts, e.g., by making
  them unique via a red-black tree
- handle Provides:
  Update: Provides data is now parsed and properly integrated in the
  package database, but not yet used to resolve prerequisites.
- sort prerequisites such that they can be installed in the specified
  order
- consider Architecture:
  Update: we parse and record it now but don't use it yet.
- what to do with explicit and implicit replacement ?
- if we can't resolve the prerequisites, give at least a hint of what
  one can do to improve the situation
- check database for internal consistency
  Update: added detection of cyclic dependencies (in progress)
  Update: added test for QPKG_ADDING cleanup bug
- implement keyword search
- consider also supporting the similar but not identical (parent ?)
  format of /var/lib/dpkg/status and /var/lib/apt/lists/*Packages
  Update: added as much as my Ubuntu system can reach before hitting |

Done
====

- optimize the search trees. Right now, we have 81812 calls to make_id
  for 14601 packages, resulting in 7420560 calls to comp_id. There can
  be at most 2 new identifiers per package (package name and version),
  so a perfectly balanced tree should have a depth of no more than 14.
  If we assume that each call to make_id searches to the bottom, we'd
  get 1145368 calls to comp_id, about 15% of the current number. So the
  tree is clearly degenerated.
  Update: after switching to red-black trees, we get only 1497604 calls
  to comp_id. This is 130% of the "good case" estimate above. Insertion
  of a new node is currently done with two lookups, so we'll get rid of
  some more lookups after further optimization.
  Update: after merging the two lookups per new node into one, we're at
  1172642 calls to comp_id, or 102% of the predicted "good case".
- if there are multiple choices, try to prefer more recent versions
- check whether introducing a new package would cause a conflict
  Update: conflicts among the packages considered for installation are
  now checked.
- compile the list of conflicts of installed packages
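
The call-count estimate in the search tree item above can be re-derived with a few lines of Python. This is just the arithmetic from that item, assuming the depth bound is floor(log2(n)) for at most n = 2 * 14601 identifiers; the variable names are made up for this note.

```python
import math

# Figures quoted in the "optimize the search trees" item.
make_id_calls = 81812
packages = 14601
comp_id_before = 7420560   # unbalanced tree
comp_id_rb = 1497604       # after switching to red-black trees
comp_id_merged = 1172642   # after merging the two lookups per new node

# At most two new identifiers per package (name and version).
max_ids = 2 * packages

# Assumed depth bound of a perfectly balanced binary tree with max_ids nodes.
depth = math.floor(math.log2(max_ids))

# "Good case": every make_id call searches to the bottom of the tree.
good_case = make_id_calls * depth

if __name__ == "__main__":
    print("depth bound:", depth)
    print("good case comp_id calls:", good_case)
    print("good case / before: {:.1%}".format(good_case / comp_id_before))
    print("red-black / good case: {:.1%}".format(comp_id_rb / good_case))
    print("merged lookups / good case: {:.1%}".format(comp_id_merged / good_case))
```

The percentages in the item appear to be truncated rather than rounded, so the printed values may differ from them in the last digit.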