I. C. Parmee
Advanced Computation in Design and Decision-making, CEMS, University of the West of England, Bristol, UK. E-mail: ian.parmee@uwe.ac.uk
This speculative article discusses research and development relating to computational intelligence (CI) technologies comprising powerful machine-based search and exploration techniques that can generate, extract, process and present high-quality information from complex, poorly understood biotechnology domains. The integration and capture of user experiential knowledge within such CI systems in order to support and stimulate knowledge discovery and increase scientific and technological understanding is of particular interest. The manner in which appropriate user interaction can overcome problems relating to poor problem representation within systems utilising evolutionary computation (EC), machine-learning and software agent technologies is investigated. The objective is the development of user-centric intelligent systems that support an improving knowledge-base founded upon gradual problem re-definition and reformulation. Such an approach can overcome initial lack of understanding and associated uncertainty.
Compound design presents a typical example: the chemist faces a problem of such magnitude, in terms of the number of possible solutions, that finding an appropriate starting point upon which to base empirical study is a major task involving extensive experiential knowledge, skill and intuition. Although some computational representations may be available to provide an indication of the performance of, say, reagent combinations against specific criteria, a degree of uncertainty with regard to the fidelity of their output is generally inherent. Human evaluation is therefore needed to eliminate poor reagent combinations that have survived machine-based evaluation whilst identifying high-potential combinations for further empirical investigation. Due to the number of possible combinations across multiple reagent libraries, some form of computational search and exploration capability is essential to identify potential high-performance solutions for further evaluation by the chemist.1 Thus a machine/human procedure could ensure that experimental effort is concentrated upon ‘best’ candidates, thereby significantly reducing design lead time. The above example is used in the paper to aid understanding of the proposed speculative approaches. Given the potential in the compound design domain, it is apparent that the development of similar human/computer-based search, exploration and classification capabilities would also be of significant benefit in other biotechnology domains. The analysis of data sets from gene expression experiments to provide insights into gene activity under differing environmental conditions and the identification of gene regulatory networks is another area currently receiving attention.2
The above could be considered a general description of how we progress when faced with problems that initially seem beyond our perceived analytic capabilities. Using this description, the following sections explore the human-centric utilisation of evolutionary computation, machine learning and agent-based approaches, integrated with enabling computational technologies, to significantly enhance this iterative process of knowledge discovery and representation development. Particular areas requiring attention are:
• the development of meaningful computational representations from experiential knowledge, sparse data and collective reasoning;
• non-linear search and exploration processes that can negotiate the complex solution spaces described by such representations (where the solution space is described by all possible combinations of variables e.g. reagent libraries);
• the capture of user experiential knowledge and intuition during re-definition of machine-based representations and reformulation and subsequent exploration of innovative solution spaces;
• development of software agent-based activities for information extraction, processing and succinct presentation to the user resulting in a reduction in cognitive load.
The overall objective is the establishment of user-interactive computationally intelligent search and exploration environments that support rapid concept and hypothesis formulation, exploration and evaluation. Novel human-centred problem-solving processes integrated with such ‘virtual laboratories’ may lead to innovation and scientific breakthrough within an academic research environment whilst supporting competitive product development through continuous knowledge discovery in an industrial context.
The author has been actively researching the development and integration of such user-centric CI systems, primarily in the field of conceptual engineering design.3,4 Recent involvement with pharmaceutical and biotechnology design through close collaboration with Evotec OAI, Milton Park, Abingdon, UK (http://www.oai.co.uk/) indicates a very real potential for the integration of similar systems with these areas.
Fig. 1 Search space/fitness landscape described by values of two continuous variable parameters (X1, X2) with solution performance indicated on the vertical axis. Region A: multi-modal characteristics. Region B: unimodal characteristics.
Obviously, a problem defined by larger numbers of variables rapidly becomes impossible to visualize or imagine in terms of the resulting high dimensional landscape. The complexity of this high-dimensional landscape is significantly increased when integer variables or when complex mixes of integer and continuous variables are involved in the problem description. In addition, the presence of local optima representing best values relating to the criteria under consideration will create a multi-modal surface (region A in Fig. 1) comprising peaks/troughs upon which any form of search and optimization procedure may prematurely converge. Region B is unimodal i.e. only one peak is evident within the region. This offers a far lesser challenge as a variety of gradient-based optimizers would rapidly converge upon the optimal point. The surface described by the reagent combination example would likely be extremely rugged with many local optima due to the random ordering of reagents within the reagent libraries.
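The contrast between regions A and B can be sketched with a simple local search. The following is a minimal illustration only: the two one-dimensional test functions, the grid step and the search bounds are assumptions chosen for demonstration and do not come from Fig. 1. A greedy hill climber finds the single peak of the unimodal function from any starting point, but on the multi-modal function it stops at whichever peak is nearest to where it starts.

```python
import math

def multimodal(x):
    # Hypothetical rugged landscape: several peaks of differing height.
    return math.sin(3.0 * x) + 0.3 * x

def unimodal(x):
    # Hypothetical smooth landscape with a single peak at x = 2.
    return -(x - 2.0) ** 2

def hill_climb(f, start, step=0.01, lo=0.0, hi=6.0):
    """Greedy local search on a fixed grid: move to the better neighbour
    until neither neighbour improves on the current point."""
    n = round((hi - lo) / step)      # number of grid intervals
    i = round((start - lo) / step)   # grid index of the starting point
    while True:
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j <= n]
        best = max(neighbours, key=lambda j: f(lo + j * step))
        if f(lo + best * step) <= f(lo + i * step):
            return lo + i * step     # no neighbour is better: a (local) peak
        i = best

# Same algorithm, same landscape, different starting points.
x_from_left = hill_climb(multimodal, 0.0)
x_from_right = hill_climb(multimodal, 4.0)
# On the unimodal landscape the starting point does not matter.
u_from_left = hill_climb(unimodal, 0.0)
u_from_right = hill_climb(unimodal, 5.5)
```

Starting from the left, the climber settles on a lower peak than when it starts from the right. This is the premature convergence referred to above.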
Various constraints (e.g. maximum allowable weight) will create infeasible regions of a surface and these regions may be convoluted and disjoint. Several quantitative criteria may be involved (e.g. similarity, QSAR, docking etc.), introducing varying degrees of conflict and creating landscapes with differing characteristics relating to a particular criterion. This introduces a requirement to search for common regions of these landscapes offering best compromise solutions.
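One common way of making ‘best compromise’ concrete is to discard infeasible candidates and then keep only those that no other feasible candidate beats on every criterion (the non-dominated, or Pareto, set). The sketch below assumes this interpretation. The candidate names, weights, weight limit and criterion scores are all hypothetical, and both criteria are assumed to be scaled so that higher is better.

```python
def dominates(a, b):
    """True if score vector a is at least as good as b on every criterion
    and strictly better on at least one (all criteria maximised)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(candidates, max_weight):
    # Step 1: the weight constraint removes infeasible candidates.
    feasible = [c for c in candidates if c["weight"] <= max_weight]
    # Step 2: keep candidates that no other feasible candidate dominates.
    return [c for c in feasible
            if not any(dominates(o["scores"], c["scores"]) for o in feasible)]

# Hypothetical data: scores stand in for two conflicting criteria,
# for example a similarity score and a docking score.
candidates = [
    {"name": "A", "weight": 320.0, "scores": (0.90, 0.20)},
    {"name": "B", "weight": 410.0, "scores": (0.60, 0.60)},
    {"name": "C", "weight": 450.0, "scores": (0.50, 0.50)},  # dominated by B
    {"name": "D", "weight": 620.0, "scores": (0.95, 0.95)},  # exceeds weight limit
    {"name": "E", "weight": 380.0, "scores": (0.30, 0.90)},
]

front = pareto_front(candidates, max_weight=500.0)
names = [c["name"] for c in front]
```

Here candidates A, B and E remain. Each represents a different trade-off between the two criteria, and choosing among them is left to the investigator.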
Searching the two-dimensional landscape in Fig. 1 presents little problem. Given existing computational capability a coarse exhaustive search of solutions would be possible. An increase in the number of variables and the inclusion of the additional complexities described above presents a far greater challenge. An exhaustive search becomes non-viable and the investigator has to rely upon sophisticated search, exploration and optimisation algorithms.
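The growth in search space size can be illustrated with simple arithmetic. The library sizes and evaluation rate below are hypothetical figures chosen only to show the scaling. They assume one reagent is chosen from each library, so the number of combinations is the product of the library sizes.

```python
import math

def combinations_count(library_sizes):
    """Number of distinct solutions when one reagent is chosen per library."""
    return math.prod(library_sizes)

def exhaustive_days(n_solutions, evals_per_second):
    """Days needed to evaluate every solution at a fixed evaluation rate."""
    return n_solutions / evals_per_second / 86400  # 86400 seconds per day

# Hypothetical: libraries of 1000 reagents, 1000 evaluations per second.
three_libraries = combinations_count([1000] * 3)
five_libraries = combinations_count([1000] * 5)
days_three = exhaustive_days(three_libraries, 1000)
years_five = exhaustive_days(five_libraries, 1000) / 365.25
```

Under these assumptions three libraries could be enumerated in under a fortnight, whereas five libraries would take tens of thousands of years.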
If the problem is well-defined in terms of variables, constraints and objectives that are quantifiable and of known relative importance then a range of search and optimisation techniques can be utilised that can handle the above complexities to a varying degree. Modern heuristic techniques involving populations of trial solutions and stochastic operators that promote search and exploration and eventual convergence are particularly well-suited to the negotiation of such complex problem spaces. The term evolutionary computation tends to cover techniques such as these and perhaps the genetic algorithm (GA)5 has become the best known.
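A minimal sketch of a genetic algorithm is given below for illustration. It is not the implementation of ref. 5 or of any particular system. The integer encoding (one reagent index per library), tournament selection, one-point crossover, mutation rate and elitism are all assumptions, and the fitness function is a smooth toy stand-in for a real machine-based criterion.

```python
import random

def genetic_algorithm(fitness, library_sizes, pop_size=30, generations=40,
                      mutation_rate=0.1, seed=0):
    """Evolve integer genomes (one index per library; needs >= 2 libraries)."""
    rng = random.Random(seed)  # seeded so that runs are repeatable

    def random_genome():
        return [rng.randrange(n) for n in library_sizes]

    def tournament(pop):
        # Pick two individuals at random and keep the fitter one.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    population = [random_genome() for _ in range(pop_size)]
    history = []  # best fitness found in each generation
    for _ in range(generations):
        elite = max(population, key=fitness)
        history.append(fitness(elite))
        children = [elite[:]]  # elitism: the best solution survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            cut = rng.randrange(1, len(library_sizes))  # one-point crossover
            child = p1[:cut] + p2[cut:]
            for i, n in enumerate(library_sizes):       # random mutation
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(n)
            children.append(child)
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history

# Hypothetical problem: three libraries of 50 reagents and a hidden
# 'ideal' combination that the toy fitness rewards proximity to.
LIBRARY_SIZES = [50, 50, 50]
TARGET = [7, 31, 12]

def toy_fitness(genome):
    return -sum(abs(g - t) for g, t in zip(genome, TARGET))

best, history = genetic_algorithm(toy_fitness, LIBRARY_SIZES)
```

Because the best individual is always carried forward, the recorded best fitness never decreases from one generation to the next. The population as a whole supplies the exploration that a single-point local search lacks.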
In most cases, however, high problem definition is characteristic of the latter stages of a problem-solving process. These final stages may represent the tip of the iceberg given the time and effort involved in initial problem understanding, definition, formulation and representation. During early stages a high degree of assumption, particularly relating to objective representation, generally provides a starting point for our investigations. An initial variable set may be selected with later addition or removal of variables as the sensitivity of the problem to various aspects becomes apparent. Constraints may be treated in the same way with the added option of softening them to allow exploration of non-feasible regions. Included objectives may change as significant payback becomes apparent through a re-ordering of objective preferences. Some non-conflicting objectives may merge whilst difficulties relating to others may require serious re-thinking with regard to problem formulation. The initial problem space is therefore a moving feast rich in information which, when extracted and coupled with the investigators' experiential knowledge and intuition, supports significant problem insight and subsequent problem re-formulation. It is quite possible that final solutions will be identified from a space that bears little resemblance to the search space that provided a starting point for our investigations.
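One simple way to soften a constraint is to replace outright rejection with a penalty that grows with the size of the violation. The sketch below assumes this penalty approach. The function name, the weight limit and the penalty rate are hypothetical and serve only to show the difference between hard and soft treatment.

```python
import math

def constrained_score(raw_score, weight, max_weight, penalty_per_unit=None):
    """Score a candidate under a maximum-weight constraint.

    penalty_per_unit=None treats the constraint as hard: any violation is
    rejected outright. A numeric value softens it: the score is reduced in
    proportion to the amount by which the limit is exceeded.
    """
    violation = max(0.0, weight - max_weight)
    if violation == 0.0:
        return raw_score               # feasible: score is unchanged
    if penalty_per_unit is None:
        return -math.inf               # hard constraint: never selected
    return raw_score - penalty_per_unit * violation

# Hypothetical candidate slightly over a weight limit of 500.
hard = constrained_score(0.9, weight=520.0, max_weight=500.0)
soft = constrained_score(0.9, weight=520.0, max_weight=500.0,
                         penalty_per_unit=0.01)
feasible = constrained_score(0.8, weight=480.0, max_weight=500.0)
```

With the soft treatment a slightly infeasible but otherwise promising candidate remains visible to the search, so the investigator can judge how sensitive the problem is to that constraint.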
We are, perhaps, considering two problem search spaces:
(1) The machine-based quantitative space that is bounded and inflexible when considered stand-alone (i.e. the space defined by reagent libraries within a compound design situation). Search and exploration algorithms utilizing machine-based criteria representations to evaluate solutions can rapidly provide novel information from this space that aids problem understanding at a human level. Such understanding and subsequent search space redefinition can remove the initial bounds.
(2) The investigators' mental representations of the problem. These representations are only bounded by current knowledge and understanding. The development of this problem space relies upon external stimuli and human intuition and judgement at both a quantitative and qualitative level.
The indication from previous conceptual design work in the engineering domain is that the appropriate melding of these two spaces will support a holistic, knowledge-based approach that can result in significant step changes to machine-based objective representation and in scientific/technological understanding.
Many design research concepts map well onto generic problem-solving and decision-making processes where complexity, high-dimensionality and the inability of the user to concurrently cope with many dimensions of information (cognitive overload) obstruct progress and inhibit exploration. Computational intelligence techniques relevant to and developed within the design domain are now reaching a level of sophistication that allows them to be utilised to support a more holistic approach to problem solving.8 Machine-based exploratory systems can better handle the complexities of high-dimensional space, ensuring that succinct, specific information is available to the investigator and thus enabling greater user concentration upon the significance of emerging results.
The introduction of supporting and enabling technologies such as state-of-the-art data visualisation techniques and high-performance computing (HPC) would result in interactive CI search and exploration systems where the user becomes immersed within an information-rich computing environment accepting and analysing output and introducing change. High-performance computing capabilities would be essential to achieve a seamless interface between interactive processes. On-line data-mining techniques11 coupled with agent-assisted data processing and visualisation would contribute greatly to the immersion concept. Overall integration with e-Science technology could lead to the establishment of Grid-based search and exploration capabilities widely available to the UK research community whilst also enabling remote access to very significant HPC resources and possibly to diverse information sources that enhance current knowledge of the problem at hand.
The establishment of a seamless user/machine-based information generation environment as described is ambitious. However, highly efficient search across changing fitness landscapes with varying objective preferences and changing constraint conditions is achievable. It is also possible to spawn concurrent/complementary local search utilising appropriate algorithms. Constraint-handling techniques can be introduced that allow exploration and information extraction relating to constraint sensitivity. Search space sampling techniques can be integrated with exploration processes to rapidly generate concepts of problem complexity as landscapes change. Statistical and CI-based modelling techniques are now available, and the concurrent utilisation of differing model types to provide better overall representation and increased confidence is accepted practice in some areas.
A possible configuration of the various system components and of user interactivity is simply illustrated in Fig. 2.
Fig. 2 Simple illustration of a possible system configuration.
At any point this relatively continuous exploration process can be paused and relevant information downloaded and presented to the decision-making team for discussion. An easily understood graphic provides a recorded history of user-instigated change thereby supporting traceability and allowing analysis of the logical progression of the team's thinking based upon extracted information. The presentation of such material promotes discussion and allows the perspectives of others to be integrated in further exploratory interactive activity via appropriate problem re-definition and re-formulation.
As this iterative interactive process continues so confidence in the developing criteria representations increases, the knowledge-base becomes well-founded and uncertainty significantly decreases. A natural result is a reduction in user-interaction as we move from a high-risk problem definition phase through an intermediate phase of increasing confidence to the final stages of detailed analysis of a well-defined problem space. This could be considered analogous to the conceptual, embodiment and detailed stages of engineering design.19
Both research council and industrial funding plus close interdisciplinary working will be required to resolve arising problems. From an industrial point of view, however, user-centric CI search and exploration systems could best utilise seemingly endless increases in desktop computational processing capability especially considering that in-house networked machines potentially support access to very high levels of distributed computing power. Such systems continuously running as background processes could support the development of in-house knowledge and expertise whilst reducing lead times to the discovery of innovative products when allied with complementary investigative processes.
From a more academic research-oriented point of view the further development and utilisation of such systems within a research environment could support significant leaps in understanding relating to the characteristics of poorly defined complex problem space. The ability to rapidly and efficiently play ‘what-if’ whilst concurrently gathering high-quality information that either confirms or contradicts current thinking suggests an environment well-suited to the support of knowledge discovery and scientific breakthrough. The role of human intuition, experience and judgement within such an environment would be paramount whilst the inherent support of agent-based entities in terms of information processing and presentation would be invaluable.
This journal is © The Royal Society of Chemistry 2005