AI Magazine Volume 19 Number 4 (1998) (© AAAI)
                                                                                                                             Articles




           The DARPA High-
        Performance Knowledge
              Bases Project
           Paul Cohen, Robert Schrag, Eric Jones, Adam Pease, Albert Lin,
                 Barbara Starr, David Gunning, and Murray Burke


Copyright © 1998, American Association for Artificial Intelligence. All rights reserved. 0738-4602-1998 / $2.00

■ Now completing its first year, the High-Performance Knowledge Bases Project promotes technology for developing very large, flexible, and reusable knowledge bases. The project is supported by the Defense Advanced Research Projects Agency and includes more than 15 contractors in universities, research laboratories, and companies. The evaluation of the constituent technologies centers on two challenge problems, in crisis management and battlespace reasoning, each demanding powerful problem solving with very large knowledge bases. This article discusses the challenge problems, the constituent technologies, and their integration and evaluation.

Although a computer has beaten the world chess champion, no computer has the common sense of a six-year-old child. Programs lack knowledge about the world sufficient to understand and adjust to new situations as people do. Consequently, programs have been poor at interpreting and reasoning about novel and changing events, such as international crises and battlefield situations. These problems are more open ended than chess. Their solution requires shallow knowledge about motives, goals, people, countries, adversarial situations, and so on, as well as deeper knowledge about specific political regimes, economies, geographies, and armies.

The High-Performance Knowledge Bases (HPKB) Project is sponsored by the Defense Advanced Research Projects Agency (DARPA) to develop new technology for knowledge-based systems.¹ It is a three-year program, ending in fiscal year 1999, with funding totaling $34 million. HPKB technology will enable developers to rapidly build very large knowledge bases—on the order of 10⁶ rules, axioms, or frames—enabling a new level of intelligence for military systems. These knowledge bases should be comprehensive and reusable across many applications, and they should be maintained and modified easily. Clearly, these goals require innovation in many areas, from knowledge representation to formal reasoning and special-purpose problem solving, from knowledge acquisition to information gathering on the web to machine learning, from natural language understanding to semantic integration of disparate knowledge bases.

For roughly one year, HPKB researchers have been developing knowledge bases containing tens of thousands of axioms concerning crises and battlefield situations. Recently, the technology was tested in a month-long evaluation involving sets of open-ended test items, most of which were similar to sample (training) items but otherwise novel. Changes to the crisis and battlefield scenarios were introduced during the evaluation to test the comprehensiveness and flexibility of knowledge in the HPKB systems. The requirement for comprehensive, flexible knowledge about general scenarios forces knowledge bases to be large. Challenge problems, which define the scenarios and thus drive knowledge base development, are a central innovation of HPKB. This article discusses HPKB challenge problems, technologies and integrated systems, and the evaluation of these systems.

The challenge problems require significant developments in three broad areas of knowledge-based technology. First, the overriding goal of HPKB—to be able to select, compose, extend, specialize, and modify components from a library of reusable ontologies, common domain theories, and generic problem-solving strategies—is not immediately achievable and requires some research into foundations of very large knowledge bases, particularly research in knowledge representation and ontological engineering. Second, there is the
problem of building on these foundations to populate very large knowledge bases. The goal is for collaborating teams of domain experts (who might lack training in computer science) to easily extend the foundation theories, define additional domain theories and problem-solving strategies, and acquire domain facts. Third, because knowledge is not enough, one also requires efficient problem-solving methods. HPKB supports research on efficient, general inference methods and optimized task-specific methods.

HPKB is a timely impetus for knowledge-based technology, although some might think it overdue. Some of the tenets of HPKB were voiced in 1987 by Doug Lenat and Ed Feigenbaum (Lenat and Feigenbaum 1987), and some have been around for longer. Lenat’s CYC Project has also contributed much to our understanding of large knowledge bases and ontologies. Now, 13 years into the CYC Project and more than a decade after Lenat and Feigenbaum’s paper, there seems to be consensus on the following points:

The first and most intellectually taxing task when building a large knowledge base is to design an ontology. If you get it wrong, you can expect ongoing trouble organizing the knowledge you acquire in a natural way. Whenever two or more systems are built for related tasks (for example, medical expert systems, planning, modeling of physical processes, scheduling and logistics, natural language understanding), the architects of the systems realize, often too late, that someone else has already done, or is in the process of doing, the hard ontological work. HPKB challenges the research community to share, merge, and collectively develop large ontologies for significant military problems. However, an ontology alone is not sufficient. Axioms are required to give meaning to the terms in an ontology. Without them, users of the ontology can interpret the terms differently.

Most knowledge-based systems have no common sense; so, they cannot be trusted. Suppose you have a knowledge-based system for scheduling resources such as heavy-lift helicopters, and none of its knowledge concerns noncombatant evacuation operations. Now, suppose you have to evacuate a lot of people. Lacking common sense, your system is literally useless. With a little common sense, it could not only support human planning but might be superior to it because it could think outside the box and consider using the helicopters in an unconventional way. Common sense is needed to recognize and exploit opportunities as well as avoid foolish mistakes.

Often, one will accept an answer that is roughly correct, especially when the alternatives are no answer at all or a very specific but wrong answer. This is Lenat and Feigenbaum’s breadth hypothesis: “Intelligent performance often requires the problem solver to fall back on increasingly general knowledge, and/or to analogize to specific knowledge from far-flung domains” (Lenat and Feigenbaum 1987, p. 1173). We must, therefore, augment high-power knowledge-based systems, which give specific and precise answers, with weaker but adequate knowledge and inference. The inference methods might not all be sound and complete. Indeed, one might need a multitude of methods to implement what Polya called plausible inference. HPKB encompasses work on a variety of logical, probabilistic, and other inference methods.

It is one thing to recognize the need for commonsense knowledge, another to integrate it seamlessly into knowledge-based systems. Lenat observes that ontologies often are missing a middle level, the purpose of which is to connect very general ontological concepts such as human and activity with domain-specific concepts such as the person who is responsible for navigating a B-52 bomber. Because HPKB is grounded in domain-specific tasks, the focus of much ontological engineering is this middle layer.

The Participants
The HPKB participants are organized into three groups: (1) technology developers, (2) integration teams, and (3) challenge problem developers. Roughly speaking, the integration teams build systems with the new technologies to solve challenge problems. The integration teams are led by SAIC and Teknowledge. Each integration team fields systems to solve challenge problems in an annual evaluation. University participants include Stanford University, Massachusetts Institute of Technology (MIT), Carnegie Mellon University (CMU), Northwestern University, University of Massachusetts (UMass), George Mason University (GMU), and the University of Edinburgh (AIAI). In addition, SRI International, the University of Southern California Information Sciences Institute (USC-ISI), the Kestrel Institute, and TextWise, Inc., have developed important components. Information Extraction and Transport (IET), Inc., with Pacific Sierra Research (PSR), Inc., developed and evaluated the crisis-management challenge problem, and Alphatech, Inc., is responsible for the battlespace challenge problem.


Challenge Problems
A programmatic innovation of HPKB is challenge problems. The crisis-management challenge problem, developed by IET and PSR, is designed to exercise broad, relatively shallow knowledge about international tensions. The battlespace challenge problem, developed by Alphatech, Inc., has two parts, each designed to exercise relatively specific knowledge about activities in armed conflicts. Movement analysis involves interpreting vehicle movements detected and tracked by idealized sensors. The workaround problem is concerned with finding military engineering solutions to traffic-obstruction problems, such as destroyed bridges and blocked tunnels.

Good challenge problems must satisfy several, often conflicting, criteria. A challenge problem must be challenging: It must raise the bar for both technology and science. A problem that requires only technical ingenuity will not hold the attention of the technology developers, nor will it help the United States maintain its preeminence in science. Equally important, a challenge problem for a DARPA program must have clear significance to the United States Department of Defense (DoD). Challenge problems should serve for the duration of the program, becoming more challenging each year. This continuity is preferable to designing new problems every year because the infrastructure to support challenge problems is expensive.

A challenge problem should require little or no access to military subject-matter experts. It should not introduce a knowledge-acquisition bottleneck that results in delays and low productivity from the technology developers. As much as possible, the problem should be solvable with accessible, open-source material. A challenge problem should exercise all (or most) of the contributions of the technology developers, and it should exercise an integration of these technologies. A challenge problem should have unambiguous criteria for evaluating its solutions. These criteria need not be so objective that one can write algorithms to score performance (for example, human judgment might be needed to assess scores), but they must be clear and they must be published early in the program. In addition, although performance is important, challenge problems that value performance above all else encourage “one-off” solutions (a solution developed for a specific problem, once only) and discourage researchers from trying to understand why their technologies work well and poorly. A challenge problem should provide a steady stream of results, so progress can be assessed not only by technology developers but also by DARPA management and involved members of the DoD community.

The HPKB challenge problems are designed to support new and ongoing DARPA initiatives in intelligence analysis and battlespace information systems. Crisis-management systems will assist strategic analysts by evaluating the political, economic, and military courses of action available to nations engaged at various levels of conflict. Battlespace systems will support operations officers and intelligence analysts by inferring militarily significant targets and sites, reasoning about road network trafficability, and anticipating responses to military strikes.

Crisis-Management Challenge Problem
The crisis-management challenge problem is intended to drive the development of broad, relatively shallow commonsense knowledge bases to facilitate intelligence analysis. The client program at DARPA for this problem is Project GENOA—Collaborative Crisis Understanding and Management. GENOA is intended to help analysts more rapidly understand emerging international crises to preserve U.S. policy options. Proactive crisis management—before a situation has evolved into a crisis that might engage the U.S. military—enables more effective responses than reactive management. Crisis-management systems will assist strategic analysts by evaluating the political, economic, and military courses of action available to nations engaged at various levels of conflict.

The challenge problem development team worked with GENOA representatives to identify areas for the application of HPKB technology. This work took three or four months, but the crisis-management challenge problem specification has remained fairly stable since its initial release in draft form in July 1997.

The first step in creating the challenge problem was to develop a scenario to provide context for intelligence analysis in time of crisis. To ensure that the problem requires development of real knowledge about the world, the scenario includes real national actors with a fictional yet plausible story line. The scenario, which takes place in the Persian Gulf, involves hostilities between Saudi Arabia and Iran that culminate in closing the Strait of Hormuz to international shipping.

Next, IET worked with experts at PSR to develop a description of the intelligence analysis process, which involves the following tasks: information gathering—what happened, situation assessment—what does it mean, and scenario development—what might happen next.

Situation assessment (or interpretation) includes factors that pertain to the specific situation at hand, such as motives, intents, risks, rewards, and ramifications, and factors that make up a general context, or “strategic culture,” for a state actor’s behavior in international relations, such as capabilities, interests, policies, ideologies, alliances, and enmities. Scenario development, or speculative prediction, starts with the generation of plausible actions for each actor. Then, options are evaluated with respect to the same factors used in situation assessment, and a likelihood rating is produced. The most plausible actions are reported back to policy makers.

These analytic tasks afford many opportunities for knowledge-based systems. One is to use knowledge bases to retain or multiply corporate expertise; another is to use knowledge and reasoning to “think outside the box,” to generate analytic possibilities that a human analyst might overlook. The latter task requires extensive commonsense knowledge, or “analyst’s sense,” about the domain to rule out implausible options.

The crisis-management challenge problem includes an informal specification for a prototype crisis-management assistant to support analysts. The assistant is tested by asking questions. Some are simple requests for factual information; others require the assistant to interpret the actions of nations in the context of strategic culture. Actions are motivated by interests, balancing risks and rewards. They have impacts and require capabilities. Interests drive the formation of alliances, the exercise of influence, and the generation of tensions among actors. These factors play out in a current and a historical context. Crises can be represented as events or as larger episodes tracking the evolution of a conflict over time, from inception or trigger, through any escalation, to eventual resolution or stasis. The representations being developed in HPKB are intended to serve as a crisis corporate memory to help analysts discover historical precedents and analogies for actions. Much of the challenge-problem specification is devoted to sample questions that are intended to drive the development of general models for reasoning about crisis events.

Sample questions are embedded in an analytic context. The question “What might happen next?” is instantiated as “What might happen following the Saudi air strikes?” as shown in figure 1. Q51 is refined to Q83 in a way that is characteristic of the analytic process; that is, higher-level questions are refined into sets of lower-level questions that provide detail.

    III. What of significance might happen following the Saudi air strikes?
    B. Options evaluation
    Evaluate the options available to Iran.
       Close the Strait of Hormuz to shipping.
       Evaluation: Probable
       Motivation: Respond to Saudi air strikes and deter future strikes.
       Capability:
             (Q51) Can Iran close the Strait of Hormuz to international shipping?
             (Q83) Is Iran capable of firing upon tankers in the Strait of Hormuz? With what weapons?
       Negative outcomes:
             (Q53) What risks would Iran face in closing the strait?

    Figure 1. Sample Questions Pertaining to the Responses to an Event.

The challenge-problem developers (IET with PSR) developed an answer key for sample questions, a fragment of which is shown in figure 2. Although simple factual questions (for example, “What is the gross national product of the United States?”) have just one answer, questions such as Q53 usually have several. The answer key actually lists five answers, two of which are shown in figure 2. Each is accompanied by suitable explanations, including source material. The first source (Convention on the Law of the Sea) is electronic. IET maintains a web site with links to pages that are expected to be useful in answering the questions. The second source is a fragment of a model developed by IET and published in the challenge-problem specification. IET developed these fragments to
indicate the kinds of reasoning they would be testing in the challenge problem.

    Answer(s):
    1. Economic sanctions from {Saudi Arabia, GCC, U.S., U.N.}
       • The closure of the Strait of Hormuz would violate an international norm promoting freedom of the seas and would jeopardize the interests of many states.
       • In response, states might act unilaterally or jointly to impose economic sanctions on Iran to compel it to reopen the strait.
       • The United Nations Security Council might authorize economic sanctions against Iran.
    2. Limited military response from {Saudi Arabia, GCC, U.S., others}…

    Source(s):
    • The Convention on the Law of the Sea.
    • (B5) States may act unilaterally or collectively to isolate and/or punish a group or state that violates international norms. Unilateral and collective action can involve a wide range of mechanisms, such as intelligence collection, military retaliation, economic sanction, and diplomatic censure/isolation.

    Figure 2. Part of the Answer Key for Question 53.

For the challenge-problem evaluation, held in June 1998, IET developed a way to generate test questions through parameterization. Test questions deviate from sample questions in specified, controlled ways, so the teams participating in the challenge problem know the space of questions from which test items will be selected. This space includes billions of questions, so the challenge problem cannot be solved by relying on question-specific knowledge. The teams must rely on general knowledge to perform well in the evaluation. (Semantics provide practical constraints on the number of reasonable instantiations of parameterized questions, as do online sources provided by IET.) To illustrate, Q53 is parameterized in figure 3. Parameterized question 53, PQ53, actually covers 8 of the roughly 100 sample questions in the specification.

    PQ53 [During/After <TimeInterval>,] what {risks, rewards} would <InternationalAgent> face in <InternationalActionType>?

    <InternationalActionType> =
      {[exposure of its] {supporting, sponsoring} <InternationalAgentType> in <InternationalAgent2>,
       successful terrorist attacks against <InternationalAgent2>’s <EconomicSector>,
       <InternationalActionType>(PQ51),
       taking hostage citizens of <InternationalAgent2>,
       attacking targets <SpatialRelationship> <InternationalAgent2> with <Force>}

    <InternationalAgentType> =
      {terrorist group, dissident group, political party, humanitarian organization}

    Figure 3. A Parameterized Question Suitable for Generating Sample Questions and Test Questions.

Parameterized questions and associated class definitions are based on natural language, giving the integration teams responsibility for developing (potentially different) formal representations of the questions. This decision was made at the request of the teams. An instance of a parameterized question, say, PQ53, is mechanically generated; the teams must then create a formal representation and reason with it—without human intervention.
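
The parameterization in figure 3 can be read as a small template grammar: each angle-bracketed slot ranges over a class, and a test question is one consistent choice of fillers. The following Python sketch only illustrates that idea; it is not IET's generator. The `<InternationalAgentType>` values are copied from figure 3. The two agent names, the restriction to a single action type, the inserted article "a", and the rule that an agent does not act inside itself are all assumptions made for the example.

```python
import itertools

# Slot fillers. InternationalAgentType is copied from figure 3; every other
# list is an illustrative assumption, not a class from the specification.
SLOTS = {
    "InternationalAgent": ["Iran", "Saudi Arabia"],
    "InternationalAgent2": ["Iran", "Saudi Arabia"],
    "InternationalAgentType": [
        "terrorist group",
        "dissident group",
        "political party",
        "humanitarian organization",
    ],
    "Outcome": ["risks", "rewards"],
    "Support": ["supporting", "sponsoring"],
}

# One branch of PQ53 (the "supporting/sponsoring" action type), written as a
# format string. The optional time-interval prefix is omitted for brevity.
TEMPLATE = (
    "What {Outcome} would {InternationalAgent} face in {Support} "
    "a {InternationalAgentType} in {InternationalAgent2}?"
)


def instantiate(template, slots):
    """Yield every question obtained by filling each slot with each value."""
    names = sorted(slots)
    for values in itertools.product(*(slots[name] for name in names)):
        binding = dict(zip(names, values))
        # Hypothetical semantic constraint: skip bindings in which the acting
        # agent and the target agent are the same country.
        if binding["InternationalAgent"] == binding["InternationalAgent2"]:
            continue
        yield template.format(**binding)


questions = list(instantiate(TEMPLATE, SLOTS))

if __name__ == "__main__":
    print(len(questions), "questions generated; the first three are:")
    for question in questions[:3]:
        print(" ", question)
```

Even this toy template, with five small slots and one constraint, yields 32 distinct questions. Adding the remaining action types, time intervals, economic sectors, and forces multiplies the count, which fits the article's point that question-specific knowledge cannot cover the space.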

Battlespace Challenge Problems
The second challenge-problem domain for HPKB is battlespace reasoning. Battlespace is an abstract notion that includes not only the physical geography of a conflict but also the plans, goals, and activities of all combatants prior to, and during, a battle and during the activities leading to the battle. Three battlespace programs within DARPA were identified as potential users of HPKB technologies: (1) the dynamic multiinformation fusion program, (2) the dynamic database program, and (3) the joint forces air-component commander (JFACC) program. Two battlespace challenge problems have been developed.

The Movement-Analysis Challenge Problem
The movement-analysis challenge problem concerns high-level analysis of idealized sensor data, particularly the airborne JSTARS moving target indicator radar. This Doppler radar can generate vast quantities of information—one reading every minute for each vehicle in motion within a 10,000-square-mile area.² The movement-analysis scenario involves an enemy mobilizing a full division of ground forces—roughly 200 military units and 2000 vehicles—to defend against a possible attack. A simulation of the vehicle movements of this division was developed, the output of which includes reports of the positions of all the vehicles in the division at 1-minute intervals over a 4-day period for 18 hours each day. These military vehicle movements were then interspersed with plausible civilian traffic to add the problem of distinguishing military from nonmilitary traffic. The movement-analysis task is to monitor the movements of the enemy to detect and identify types of military site and convoy.

Because HPKB is not concerned with signal processing, the inputs are not real JSTARS data but are instead generated by a simulator and preprocessed into vehicle tracks. There is no uncertainty in vehicle location and no radar shadowing, and each vehicle is always accurately identified by a unique bumper number. However, vehicle tracks do not precisely identify vehicle type but instead define each vehicle as either light wheeled, heavy wheeled, or tracked. Low-speed and stationary vehicles are not reported.

Vehicle-track data are supplemented by small quantities of high-value intelligence

form and an order of battle that describes the structure and composition of the enemy forces in the scenario region.

Given these inputs, movement analysis comprises the following tasks:

First is to distinguish military from nonmilitary traffic. Almost all military traffic travels in convoys, which makes this task fairly straightforward except for very small convoys of two or three vehicles.

Second is to identify the sites between which military convoys travel, determine which of these sites are militarily significant, and determine the type of each militarily significant site. Site types include battle positions, command posts, support areas, air-defense sites, artillery sites, and assembly-staging areas.

Third is to identify which units (or parts of units) in the enemy order of battle are participating in each military convoy.

Fourth is to determine the purpose of each convoy movement. Purposes include reconnaissance, movement of an entire unit toward a battle position, activities by command elements, and support activities.

Fifth is to infer the exact types of the vehicles that make up each convoy. About 20 types of military vehicle are distinguished in the enemy order of battle, all of which show up in the scenario data.

To help the technology base and the integration teams develop their systems, a portion of the simulation data was released in advance of the evaluation phase, accompanied by an answer key that supplied model answers for each of the inference tasks listed previously.

Movement analysis is currently carried out manually by human intelligence analysts, who appear to rely on models of enemy behavior at several levels of abstraction. These include models of how different sites or convoys are structured for different purposes and models of military systems such as logistics (supply and resupply). For example, in a logistics model, one might find the following fragment: “Each echelon in a military organization is responsible for resupplying its subordinate echelons. Each echelon, from battalion on up, has a designated area for storing
                    data, including accurate identification of a few       supplies. Supplies are provided by higher ech-
                    key enemy sites, electronic intelligence reports of   elons and transshipped to lower echelons at
                    locations and times at which an enemy radar is        these areas.” Model fragments such as these
                    turned on, communications intelligence reports        are thought to constitute the knowledge of
                    that summarize information obtained by mon-           intelligence analysts and, thus, should be the
                    itoring enemy communications, and human               content of HPKB movement-analysis systems.
                    intelligence reports that provide detailed infor-     Some such knowledge was elicited from mili-
                    mation about the numbers and types of vehi-           tary intelligence analysts during programwide
                    cle passing a given location. Other input             meetings. These same analysts also scripted
                    include a detailed road network in electronic         the simulation scenario.
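
The quoted logistics fragment can be made concrete with a small data structure. The following Python sketch is only an illustration under stated assumptions: the class Unit, the function supply_path, the echelon ladder, and the unit names are all hypothetical, and nothing here reflects how any HPKB system actually encoded such models.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative echelon ladder, lowest first (an assumption, not from the article).
ECHELONS = ["company", "battalion", "regiment", "division"]


@dataclass
class Unit:
    name: str
    echelon: str
    # Excluded from repr/eq because parent and subordinates refer to each other.
    parent: Optional["Unit"] = field(default=None, repr=False, compare=False)
    subordinates: List["Unit"] = field(default_factory=list, repr=False, compare=False)

    def add_subordinate(self, unit: "Unit") -> "Unit":
        """Attach a subordinate unit and return it, so calls can be chained."""
        unit.parent = self
        self.subordinates.append(unit)
        return unit

    def has_supply_area(self) -> bool:
        """Echelons from battalion on up have a designated supply area."""
        return ECHELONS.index(self.echelon) >= ECHELONS.index("battalion")

    def resupplies(self) -> List[str]:
        """Each echelon is responsible for resupplying its subordinate echelons."""
        return [u.name for u in self.subordinates]


def supply_path(unit: Unit) -> List[str]:
    """Supply areas, highest echelon first, that supplies pass through to reach `unit`."""
    path = []
    node = unit.parent
    while node is not None:
        if node.has_supply_area():
            path.append(node.name)
        node = node.parent
    path.reverse()
    return path


# Hypothetical four-level hierarchy.
division = Unit("division-1", "division")
regiment = division.add_subordinate(Unit("regiment-1", "regiment"))
battalion = regiment.add_subordinate(Unit("battalion-1", "battalion"))
company = battalion.add_subordinate(Unit("company-1", "company"))

print(supply_path(company))
```

A movement-analysis system could use a structure of this kind to hypothesize that traffic between two sites is resupply traffic when one site belongs to the echelon directly above the other.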





The Workaround Challenge Problem

The workaround challenge problem supports air-campaign planning by the JFACC and his/her staff. One task for the JFACC is to determine suitable targets for air strikes. Good targets allow one to achieve maximum military effect with minimum risk to friendly forces and minimum loss of life on all sides. Infrastructure often provides such targets: It can be sufficient to destroy supplies at a few key sites or critical nodes in a transportation network, such as bridges along supply routes. However, bridges and other targets can be repaired, and there is little point in destroying a bridge if an available fording site is nearby. If a plan requires an interruption in traffic of several days, and the bridge can be repaired in a few hours, then another target might be more suitable. Target selection, then, requires some reasoning about how an enemy might “work around” the damage to the target.

The task of the workaround challenge problem is to automatically assess how rapidly and by what method an enemy can reconstitute or bypass damage to a target and, thereby, help air-campaign planners rapidly choose effective targets. The focus of the workaround problem in the first year of HPKB is automatic workaround generation.

The workaround task involves detailed representation of targets and the local terrain around the target and detailed reasoning about actions the enemy can take to reconstitute or bypass this damage. Thus, the input to workaround systems include the following elements:

First is a description of a target (for example, a bridge or a tunnel), the damage to it (for example, one span of a bridge is dropped; the bridge and vicinity are mined), and key features of the local terrain (for example, the slope and soil types of a terrain cross section coincident with the road near the bridge, together with the maximum depth and the speed of any river or stream the bridge crosses).

Second is a specific enemy unit or capability to be interdicted, such as a particular armored battalion or supply trucks carrying ammunition.

Third is a time period over which this unit or capability is to be denied access to the targeted route. The presumption is that the enemy will try to repair the damage within this time period; a target is considered to be effective if there appears to be no way for the enemy to make this repair.

Fourth is a detailed description of the enemy resources in the area that could be used to repair the damage. For the most part, repairs to battle damage are carried out by Army engineers; so, this description takes the form of a detailed engineering order of battle.

All input are provided in a formal representation language.

The workaround generator is expected to provide three output: First is a reconstitution schedule, giving the capacity of the damaged link as a function of time since the damage was inflicted. For example, the workaround generator might conclude that the capacity of the link is zero for the first 48 hours, but thereafter, a temporary bridge will be in place that can sustain a capacity of 170 vehicles an hour. Second is a time line of engineering actions that the enemy might carry out to implement the repair, the time these actions require, and the temporal constraints among them. If there appears to be more than one viable repair strategy, a time line should be provided for each. Third is a set of required assets for each time line of actions, a description of the engineering resources that are used to repair the damage and pointers to the actions in the time line that utilize these assets. The reconstitution schedule provides the minimal information required to evaluate the suitability of a given target. The time line of actions provides an explanation to justify the reconstitution schedule. The set of required assets is easily derived from the time line of actions and can be used to suggest further targets for preemptive air strikes against the enemy to frustrate its repair efforts.

A training data set was provided to help developers build their systems. It supplied input and output for several sample problems, together with detailed descriptions of the calculations carried out to compute action durations; lists of simplifying assumptions made to facilitate these calculations; and pointers to text sources for information on engineering resources and their use (mainly Army field manuals available on the World Wide Web).

Workaround generation requires detailed knowledge about what the capabilities of the enemy’s engineering equipment are and how it is typically used by enemy forces. For example, repairing damage to a bridge typically involves mobile bridging equipment, such as armored vehicle-launched bridges (AVLBs), medium girder bridges, Bailey bridges, or float bridges such as ribbon bridges or M4T6 bridges, together with a range of earthmoving equipment such as bulldozers. Each kind of mobile bridge takes a characteristic amount of time to deploy, requires different kinds of bank preparation, and is “owned” by different echelons in the military hierarchy, all of which
affect the time it takes to bring the bridge to a damage site and effect a repair. Because HPKB operates in an entirely unclassified environment, U.S. engineering resources and doctrine were used throughout. Information from Army field manuals was supplemented by a series of programwide meetings with an Army combat engineer, who also helped construct sample problems and solutions.

Integrated Systems

The challenge problems are solved by integrated systems fielded by integration teams led by Teknowledge and SAIC. Teknowledge favors a centralized architecture that contains a large commonsense ontology (CYC); SAIC has a distributed architecture that relies on sharing specialized domain ontologies and knowledge bases, including a large upper-level ontology based on the merging of CYC, SENSUS, and other knowledge bases.

Teknowledge Integration

The Teknowledge integration team comprises Teknowledge, Cycorp, and Kestrel Institute. Its focus is on semantic integration and the creation of massive amounts of knowledge.

Semantic Integration

Three issues make software integration difficult. Transport issues concern mechanisms to get data from one process or machine to another. Solutions include sockets, remote-method invocation (RMI), and CORBA. Syntactic issues concern how to convert number formats, “syntactic sugar,” and the labels of data. The more challenging issues concern semantic integration: To integrate elements properly, one must understand the meaning of each. The database community has addressed this issue (Wiederhold 1996); it is even more pressing in knowledge-based systems.

The current state of practice in software integration consists largely of interfacing pairs of systems, as needed. Pairwise integration of this kind does not scale up, unanticipated uses are hard to cover later, and chains of integrated systems at best evolve into stovepipe systems. Each integration is only as general as it needs to be to solve the problem at hand.

Some success has been achieved in low-level integration and reuse; for example, systems that share scientific subroutine libraries or graphics packages are often forced into similar representational choices for low-level data. DARPA has invested in early efforts to create reuse libraries for integrating large systems at higher levels. Some effort has gone into expressing a generic semantics of plans in an object-oriented format (Pease and Carrico 1997a, 1997b), and applications of this generic semantics to domain-specific tasks are promising (Pease and Albericci 1998). The development of ontologies for integrating manufacturing planning applications (Tate 1998) and work flow (Lee et al. 1996) is ongoing.

Another option for semantic integration is software mediation (Park, Gennari, and Musen 1997). This software mediation can be seen as a variant on pairwise integration, but because integration is done by knowledge-based means, one has an explicit expression of the semantics of the conversion. Researchers at Kestrel Institute have successfully defined formal specifications for data and used these theories to integrate formally specified software. In addition, researchers at Cycorp have successfully applied CYC to the integration of multiple databases.

The Teknowledge approach to integration is to share knowledge among applications and create new knowledge to support the challenge problems. Teknowledge is defining formal semantics for the input and output of each application and the information in the challenge problems.

Many concepts defy simple definitions. Although there has been much success in defining the semantics of mathematical concepts, it is harder to be precise about the semantics of the concepts people use every day. These concepts seem to acquire meaning through their associations with other concepts, their use in situations and communication, and their relations to instances. To give the concepts in our integrated system real meaning, we must provide a rich set of associations, which requires an extremely large knowledge base. CYC offers just such a knowledge base.

CYC (Lenat 1995; Lenat and Guha 1990) consists of an immense, multicontextual knowledge base; an efficient inference engine; and associated tools and interfaces for acquiring, browsing, editing, and combining knowledge. Its premise is that knowledge-based software will be less brittle if and only if it has access to a foundation of basic commonsense knowledge. This semantic substratum of terms, rules, and relations enables application programs to cope with unforeseen circumstances and situations.

The CYC knowledge base represents millions of hand-crafted axioms entered during the 13 years since CYC’s inception. Through careful policing and generalizing, there are now slightly fewer than 1 million axioms in the knowledge base, interrelating roughly 50,000
atomic terms. Fewer than two percent of these axioms represent simple facts about proper nouns of the sort one might find in an almanac. Most embody general consensus information about the concepts. For example, one axiom says one cannot perform volitional actions while one sleeps, another says one cannot be in two places at once, and another says you must be at the same place as a tool to use it. The knowledge base spans human capabilities and limitations, including information on emotions, beliefs, expectations, dreads, and goals; common everyday objects, processes, and situations; and the physical universe, including such phenomena as time, space, causality, and motion.

The CYC inference engine comprises an epistemological and a heuristic level. The epistemological level is an expressive nth-order logical language with clean formal semantics. The heuristic level is a set of some three dozen special-purpose modules that each contains its own algorithms and data structures and can recognize and handle some commonly occurring sorts of inference. For example, one heuristic-level module handles temporal reasoning efficiently by converting temporal relations into a before-and-after graph and then doing graph searching rather than theorem proving to derive an answer. A truth maintenance system and an argumentation-based explanation and justification system are tightly integrated into the system and are efficient enough to be in operation at all times. In addition to these inference engines, CYC includes numerous browsers, editors, and consistency checkers. A rich interface has been defined.

Crisis-Management Integration

The crisis-management challenge problem involves answering test questions presented in a structured grammar. The first step in answering a test question is to convert it to a form that CYC can reason with, a declarative decision tree. When the tree is applied to the test question input, a CYC query is generated and sent to CYC.

Answering the challenge problem questions takes a great deal of knowledge. For the first year’s challenge problem alone, the Cycorp and Teknowledge team added some 8,000 concepts and 80,000 assertions to CYC. To meet the needs of this challenge problem, the team created significant amounts of new knowledge, some developed by collaborators and merged into CYC, some added by automated processes.

The Teknowledge integrated system includes two natural language components: START and TextWise. The START system was created by Boris Katz (1997) and his group at MIT. For each of the crisis-management questions, Teknowledge has developed a template into which user-specified parameters can be inserted. START parses English queries for a few of the crisis-management questions to fill in these templates. Each filled template is a legal CYC query. TextWise Corporation has been developing natural language information-retrieval software primarily for news articles (Liddy, Paik, and McKenna 1995). Teknowledge intends to use the TextWise knowledge base information tools (KNOW-IT) to supply many instances to CYC of facts discovered from news stories. The system can parse English text and return a series of binary relations that express the content of the sentences. There are several dozen relation types, and the constants that instantiate each relation are WORDNET synset mappings (Miller et al. 1993). Each of the concepts has been mapped to a CYC expression, and a portion of WORDNET has been mapped to CYC. For those synsets not in CYC, the WORDNET hyponym links are traversed until a mapped CYC term is found.

Battlespace Integration

Teknowledge supported the movement-analysis and workaround problems.

Movement analysis: Several movement-analysis systems were to be integrated, and much preliminary integration work was done. Ultimately, the time pressure of the challenge problem evaluation precluded a full integration. The MIT and UMass movement-analysis systems are described briefly here; the SMI and SRI systems are described in the section entitled SAIC Integrated System.

The MIT MAITA system provides tools for constructing and controlling networks of distributed-monitoring processes. These tools provide access to large knowledge bases of monitoring methods, organized around the hierarchies of tasks performed, knowledge used, contexts of application, the alerting of utility models, and other dimensions. Individual monitoring processes can also make use of knowledge bases representing commonsense or expert knowledge in conducting their reasoning or reporting their findings. MIT built monitoring processes for sites and convoys with these tools.

The UMass group tried to identify convoys and sites with very simple rules. Rules were developed for three site types: (1) battle positions, (2) command posts, and (3) assembly-staging areas. The convoy detector simply looked for vehicles traveling at fixed distances from one another. Initially, UMass was going to recognize convoys from their dynamics, in which the distances between vehicles fluctuate in a characteristic way, but in the simulated
data, the distances between vehicles remained fixed. UMass also intended to detect sites by the dynamics of vehicle movements between them, but no significant dynamic patterns could be found in the movement data.

Workarounds: Teknowledge developed two workaround integrations, one an internal Teknowledge system, the other from AIAI at the University of Edinburgh.

Teknowledge developed a planning tool based on CYC, essentially a wrapper around CYC’s existing knowledge base and inference engine. A plan is a proof that there is a path from the final goal to the initial situation through a partially ordered set of actions. The rules in the knowledge base driving the planner are rules about action preconditions and about which actions can bring about a certain state of affairs. There is no explicit temporal reasoning; the partial order of temporal precedence between actions is established on the basis of the rules about preconditions and effects.

The planner is a new kind of inference engine, performing its own search but in a much smaller search space. However, each step in the search involves interaction with the existing inference engine by hypothesizing actions and microtheories and doing asks and asserts in these microtheories. This hypothesizing and asserting on the fly in effect amounts to dynamically updating the knowledge base in the course of inference; this capability is new for the CYC inference engine.

Consistent with the goals of HPKB, the Teknowledge workaround planner reused CYC’s knowledge, although it was not knowledge specific to workarounds. In fact, CYC had never been the basis of a planner before, so even stating things in terms of an action’s preconditions was new. What CYC provided, however, was a rich basis on which to build workaround knowledge. For example, the Teknowledge team needed to write only one rule to state “to use something as a device you must have control over that device,” and this rule covered the cases of using an M88 to clear rubble, a mine plow to breach a minefield, a bulldozer to cut into a bank or narrow the gap, and so on. One rule can cover so many cases because clearing rubble, demining an area, narrowing a gap, and cutting into a bank are all specializations of IntrinsicStateChangeEvent, an extant part of the CYC ontology.

The AIAI workaround planner was also implemented in CYC and took data from Teknowledge’s FIRE&ISE-TO-MELD translator as its input. The central idea was to use the scriptlike structure of workaround plans to guide the reasoning process. For this reason, a hierarchical task network (HTN) approach was taken. A planning-specific ontology was defined within the larger CYC ontology, and planning rules only referenced concepts within this more constrained context. The planning application was essentially embedded in CYC.

CYC had to be extended to represent composite actions that have several alternative decompositions and complex preconditions-effects. Although it is not a commonsense approach, AIAI decided to explore HTN planning because it appeared suitable for the workaround domain. It was possible to represent actions, their conditions and effects, the plan-node network, and plan resources in a relational style. The structure of a plan was implicitly represented in the proof that the corresponding composite action was a relation between particular sets of conditions and effects. Once proved, action relations are retained by CYC and are potentially reusable. An advantage of implementing the AIAI planner in CYC was the ability to remove brittleness from the planner-input knowledge format; for example, it was not necessary to account for all the possible permutations of argument order in predicates such as bordersOn and between.

SAIC Integrated System

SAIC built an HPKB integrated knowledge environment (HIKE) to support both crisis-management and battlespace challenge problems. The architecture of HIKE for crisis management is shown in figure 4. For battlespace, the architecture is similar in that it is distributed and relies on the open knowledge base connectivity (OKBC) protocol, but of course, the components integrated by the battlespace architecture are different. HIKE’s goals are to address the distributed communications and interoperability requirements among the HPKB technology components (knowledge servers, knowledge-acquisition tools, question-and-answering systems, problem solvers, process monitors, and so on) and provide a graphic user interface (GUI) tailored to the end users of the HPKB environment.

HIKE provides a distributed computing infrastructure that addresses two types of communications needs: First are input and output data-transportation and software connectivities. These include connections between the HIKE server and technology components, connections between components, and connections between servers. HIKE encapsulates information content and data transportation through JAVA objects, hypertext transfer protocol (HTTP), remote-method invocation (JAVA

Figure 4. SAIC Crisis-Management Challenge Problem Architecture. (Diagram not reproduced. Labels in the diagram: Electronic archival sources, WWW, Real time news feeds, User Interface, HIKE Servers, HIKE Clients, Analyst, Knowledge Engineer, OKBC (Open Knowledge Base Connectivity) Bus, Question Answering, START, KNOW-IT, SKC, ATP, SNARK, SPOOK, Knowledge Services, GKB-Editor, Ontolingua, webKB, Training Data.)



RMI), and database access (JDBC). Second, HIKE provides for knowledge-content assertion and distribution and query requests to knowledge services.
   The OKBC protocol proved essential. SRI used it to interface the theorem prover SNARK to an OKBC server storing the Central Intelligence Agency (CIA) World Fact Book knowledge base. Because this knowledge base is large, SRI did not want to incorporate it into SNARK but instead used the procedural attachment feature of SNARK to look up facts that were available only in the World Fact Book. MIT’s START system used OKBC to connect to SRI’s OCELOT-SNARK OKBC server. This connection will eventually give users the ability to pose questions in English, which are then transformed to a formal representation by START and shipped to SNARK using OKBC; the result is returned using OKBC. ISI built an OKBC server for their LOOM system for GMU. SAIC built a front end to the OKBC server for LOOM that was extensively used by the members of the battlespace challenge problem team.
   With OKBC and other methods, the HIKE infrastructure permits the integration of new technology components (either clients or servers) in the integrated end-to-end HPKB system without introducing major changes, provided that the new components adhere to the specified protocols.

Crisis-Management Integration
   The SAIC crisis-management architecture is focused around a central OKBC bus, as shown in figure 4. The technology components provide user interfaces, question answering, and knowledge services. Some components have overlapping roles. For example, MIT’s START system serves both as a user interface and a question-answering component. Similarly, CMU’s




WEBKB supports both question answering and knowledge services.
   HIKE provides a form-based GUI with which users can construct queries with pull-down menus. Query-construction templates correspond to the templates defined in the crisis-management challenge problem specification. Questions also can be entered in natural language. START and the TextWise component accept natural language queries and then attempt to answer the questions. To answer questions that involve more complex types of reasoning, START generates a formal representation of the query and passes it to one of the theorem provers.
   The Stanford University Knowledge Systems Laboratory ONTOLINGUA, SRI International’s graphic knowledge base (GKB) editor, WEBKB, and TextWise provide the knowledge service components. The GKB editor is a graphic tool for browsing and editing large knowledge bases, used primarily for manual knowledge acquisition. WEBKB supports semiautomatic knowledge acquisition. Given some training data and an ontology as input, a web spider searches in a directed manner and populates instances of classes and relations defined in the ontology. Probabilistic rules are also extracted. TextWise extracts information from text and newswire feeds, converting it into knowledge interchange format (KIF) triples, which are then loaded into ONTOLINGUA. ONTOLINGUA is SAIC’s central knowledge server and information repository for the crisis-management challenge problem. ONTOLINGUA supports KIF as well as compositional modeling language (CML). Flow models developed by Northwestern University (NWU) answer challenge problem questions related to world oil-transportation networks and reside within ONTOLINGUA. Stanford’s system for probabilistic object-oriented knowledge (SPOOK) provides a language for class frames to be annotated with probabilistic information, representing uncertainty about the properties of instances in this class. SPOOK is capable of reasoning with the probabilistic information based on Bayesian networks.
   Question answering is implemented in several ways. SRI International’s SNARK and Stanford’s abstract theorem prover (ATP) are first-order theorem provers. WEBKB answers questions based on the information it gathers. Question answering is also accomplished by START and TextWise taking a query in English as input and using information retrieval to extract the answers from text-based sources (such as the web and newswire feeds).
   The guiding philosophy during knowledge base development for crisis management was to reuse knowledge whenever it made sense. The SAIC team reused three knowledge bases: (1) the HPKB upper-level ontology developed by Cycorp, (2) the World Fact Book knowledge base from the CIA, and (3) the Units and Measures Ontology from Stanford. Reusing the upper-level ontology required translation, comprehension, and reformulation. The ontology was released in MELD (a language used by Cycorp) and was not directly readable by the SAIC system. In conjunction with Stanford, SRI developed a translator to load the upper-level ontology into any OKBC-compliant server. Once the ontology was loaded into the OCELOT server, the GKB editor was used to comprehend it. The graphic nature of the GKB editor illuminated the interrelationships between classes and predicates of the upper-level ontology. Because the upper-level ontology represents functional relationships as predicates but SNARK reasons efficiently with functions, it was necessary to reformulate the ontology to use functions whenever a functional relationship existed.

Battlespace Integration
   The distributed HIKE infrastructure is well suited to support an integrated battlespace challenge problem as it was originally designed: a single information system for movement analysis, trafficability, and workaround reasoning. However, the trafficability problem (establishing routes for various kinds of vehicle given the characteristics) was dropped, and the integration of the other problems was delayed. The components that solved these problems are described briefly later.
   Movement analysis: The movement-analysis problem is solved by MIT’s monitoring, analysis, and interpretation tools arsenal (MAITA); Stanford University’s Section on Medical Informatics’ (SMI) problem-solving methods; and SRI International’s Bayesian networks. The MIT effort was described briefly in the section entitled Teknowledge Integration. Here, we focus on the SMI and SRI movement-analysis systems.
   For scalability, SMI adopted a three-layered approach to the challenge problem: The first layer consisted primarily of simple, context-free data processing that attempted to find important preliminary abstractions in the data set. The most important of these were traffic centers (locations that were either the starting or stopping points for a significant number of vehicles) and convoy segments (a number of vehicles that depart from the same traffic center at roughly the same time, going in roughly the same direction). Spotting these abstractions required setting a number of parameters (for example, how big a traffic center is). Once trained, these first-layer algorithms are linear in





the size of the data set and enabled SMI to use knowledge-intensive techniques on the resulting (much smaller) set of data abstractions.
   The second layer was a repair layer, which used knowledge of typical convoy behaviors and locations on the battlespace to construct a “map” of militarily significant traffic and traffic centers. The end result was a network of traffic centers connected by traffic. Three main tasks remained: (1) classify the traffic centers, (2) figure out what the convoys are doing, and (3) identify which units are involved. SMI iteratively answered these questions by using repeated layers of heuristic classification and constraint satisfaction. The heuristic-classification components operated independently of the network, using known (and deduced) facts about single convoys or traffic centers. Consider the following rule for trying to identify a main supply brigade (MSB) site (paraphrased into English, with abstractions in boldface):
  If we have a current site which is unclassified
     and it’s in the Division support area,
     and the traffic is high enough
     and the traffic is balanced
     and the site is persistent with no major deployments emanating from it
  then it’s probably an MSB
SMI used similar rules for the constraint-satisfaction component of its system, allowing information to propagate through the network in a manner similar to Waltz’s (1975) well-known constraint-satisfaction algorithm for edge labeling.
   The goal of the SRI group was to induce a knowledge base to characterize and identify types of site such as command posts and battle positions. Its approach was to induce a Bayesian classifier and use a generative model approach, producing a Bayesian network that could serve as a knowledge base. This modeling required transforming raw vehicle tracks into features (for example, the frequency of certain vehicles at sites, number of stops) that could be used to predict sites. Thus, it was also necessary to have hypothetical sites to test. SRI relied on SMI to provide hypothetical sites, and it also used some of the features that SMI computed. As a classifier, SRI used tree-augmented naive (TAN) Bayes (Friedman, Geiger, and Goldszmidt 1997).
   Workarounds: The SAIC team integrated two approaches to workaround generation, one developed by USC-ISI, the other by GMU.
   ISI developed course-of-action–generation problem solvers to create alternative solutions to workaround problems. In fact, two alternative course-of-action–generation problem solvers were developed. One is a novel planner whose knowledge base is represented in the ontologies, including its operators, state descriptions, and problem-specific information. It uses a novel partial-match capability developed in LOOM (MacGregor 1991). The other is based on a state-of-the-art planner (Veloso et al. 1995). Each solution lists several engineering actions for this workaround (for example, deslope the banks of the river, install a temporary bridge), includes information about the sources used (for example, what kind of earthmoving equipment or bridge is used), and asserts temporal constraints among the individual actions to indicate which can be executed in parallel. A temporal estimation-assessment problem solver evaluates each of the alternatives and selects one as the most likely choice for an enemy workaround. This problem solver was developed in EXPECT (Swartout and Gil 1995; Gil 1994).
   Several general battlespace ontologies (for example, military units, vehicles), anchored on the HPKB upper ontology, were used and augmented with ontologies needed to reason about workarounds (for example, engineering equipment). Besides these ontologies, the knowledge bases used included a number of problem-solving methods to represent knowledge about how to solve the task. Both ontologies and problem-solving knowledge were used by two main problem solvers.
   EXPECT’s knowledge-acquisition tools were used throughout the evaluation to detect missing knowledge. EXPECT uses problem-solving knowledge and ontologies to analyze which information is needed to solve the task. This capability allows EXPECT to alert a user when there is missing knowledge about a problem (for example, unspecified bridge lengths) or a situation. It also helps debug and refine ontologies by detecting missing axioms and overgeneral definitions.
   GMU developed the DISCIPLE98 system. DISCIPLE is an apprenticeship multistrategy learning system that learns from examples, from explanations, and by analogy and can be taught by an expert how to perform domain-specific tasks through examples and explanations in a way that resembles how experts teach apprentices (Tecuci 1998). For the workaround domain, DISCIPLE was extended into a baseline-integrated system that creates an ontology by acquiring concepts from a domain expert or importing them (through OKBC) from shared ontologies. It learns task-decomposition rules from a domain expert and uses this knowledge to solve workaround problems through hierarchical nonlinear planning.
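   To make the heuristic-classification step concrete, the MSB rule paraphrased earlier in this section can be sketched in Python. This is only an illustration under stated assumptions: the field names, the numeric thresholds, and the way "high enough," "balanced," and "persistent" are computed are hypothetical, because the article does not give SMI's actual parameters or implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Site:
    """A candidate traffic center. All fields are illustrative assumptions."""
    name: str
    classification: Optional[str]    # None means the site is still unclassified
    in_division_support_area: bool
    inbound_vehicles: int            # arrivals during the observation window
    outbound_vehicles: int           # departures during the observation window
    days_observed: int               # how long the site has been active
    major_deployments: int           # large departures emanating from the site


# Hypothetical thresholds standing in for the tuned parameters the text mentions.
MIN_TRAFFIC = 50
MAX_IMBALANCE = 0.25
MIN_PERSISTENCE_DAYS = 3


def traffic_is_high(site: Site) -> bool:
    return site.inbound_vehicles + site.outbound_vehicles >= MIN_TRAFFIC


def traffic_is_balanced(site: Site) -> bool:
    total = site.inbound_vehicles + site.outbound_vehicles
    if total == 0:
        return False
    imbalance = abs(site.inbound_vehicles - site.outbound_vehicles) / total
    return imbalance <= MAX_IMBALANCE


def is_persistent(site: Site) -> bool:
    return site.days_observed >= MIN_PERSISTENCE_DAYS and site.major_deployments == 0


def classify_msb(site: Site) -> Optional[str]:
    """Apply the paraphrased rule; leave the classification unchanged otherwise."""
    if (site.classification is None
            and site.in_division_support_area
            and traffic_is_high(site)
            and traffic_is_balanced(site)
            and is_persistent(site)):
        return "probable MSB"
    return site.classification


depot = Site("site-17", None, True, 40, 36, 5, 0)
forward = Site("site-22", None, False, 40, 36, 5, 0)
print(classify_msb(depot))    # all conditions hold
print(classify_msb(forward))  # outside the Division support area, stays unclassified
```

   Each abstraction in the rule is kept as its own predicate so that a threshold can be retuned without touching the rule itself, which mirrors the separation between parameter setting and rule application described in the text.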




   First, with DISCIPLE’s ontology-building tools, a domain expert assisted by a knowledge engineer built the object ontology from several sources, including expert’s manuals, Alphatech’s FIRE&ISE document, and ISI’s LOOM ontology. Second, a task taxonomy was defined by refining the task taxonomy provided by Alphatech. This taxonomy indicates principled decompositions of generic workaround tasks into subtasks but does not indicate the conditions under which such decompositions should be performed. Third, the examples of hierarchical workaround plans provided by Alphatech were used to teach DISCIPLE. Each such plan provided DISCIPLE with specific examples of decompositions of tasks into subtasks, and the expert guided DISCIPLE to “understand” why each task decomposition was appropriate in a particular situation. From these examples and the explanations of why they are appropriate in the given situations, DISCIPLE learned general task-decomposition rules. After a knowledge base consisting of an object ontology and task-decomposition rules was built, the hierarchical nonlinear planner of DISCIPLE was used to automatically generate workaround plans for new workaround problems.

Evaluation
   The SAIC and Teknowledge integrated systems for crisis management, movement analysis, and workarounds were tested in an extensive study in June 1998. The study followed a two-phase, test-retest schedule. In the first phase, the systems were tested on problems similar to those used for system development, but in the second, the problems required significant modifications to the systems. Within each phase, the systems were tested and retested on the same problems. The test at the beginning of each phase established a baseline level of performance, and the test at the end measured improvement during the phase.
   We claim that HPKB technology facilitates rapid modification of knowledge-based systems. This claim was tested in both phases of the experiment because each phase allows time to improve performance on test problems. Phase 2 provides a more stringent test: Only some of the phase 2 problems can be solved by the phase 1 systems, so the systems were expected to perform poorly in the test at the beginning of phase 2. The improvement in performance on these problems during phase 2 is a direct measure of how well HPKB technology facilitates knowledge capture, representation, merging, and modification.
   Each challenge problem is evaluated by different metrics. The test items for crisis management were questions, and the test was similar to an exam. Overall competence is a function of the number of questions answered correctly, but the crisis-management systems are also expected to “show their work” and provide justifications (including sources) for their answers. Examples of questions, answers, and justifications for crisis management are shown in the section entitled Crisis-Management Challenge Problem.
   Performance metrics for the movement-analysis problem are related to recall and precision. The basic problem is to identify sites, vehicles, and purposes given vehicle track data; so, performance is a function of how many of these entities are correctly identified and how many incorrect identifications are made. In general, identifications can be marked down on three dimensions: First, the identified entity can be more or less like the actual entity; second, the location of the identified entity can be displaced from the actual entity’s true location; and third, the identification can be more or less timely.
   The workaround problem involves generating workarounds to military actions such as bombing a bridge. Here, the criteria for successful performance include coverage (the generation of all workarounds generated), appropriateness (the generation of workarounds appropriate given the action), specificity (the exact implementation of the workaround), and accuracy of timing inferences (the length each step in the workaround takes to implement).
   Performance evaluation, although essential, tells us relatively little about the HPKB integrated systems, still less about the component technologies. We also want to know why the systems perform well or poorly. Answering this question requires credit assignment because the systems comprise many technologies. We also want to gather evidence pertinent to several important, general claims. One claim is that HPKB facilitates rapid construction of knowledge-based systems because ontologies and knowledge bases can be reused. The challenge problems by design involve broad, relatively shallow knowledge in the case of crisis management and deep, fairly specific knowledge in the battlespace problems. It is unclear which kind of problem most favors the reuse claim and why. We are developing analytic models of reuse. Although the predictions of these models will not be directly tested in the first year’s evaluation, we will gather data to calibrate these models for a later evaluation.
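   The movement-analysis metrics in this section are described as related to recall and precision. As a reminder of the standard definitions, here is a minimal Python sketch. It assumes exact matching between identified and actual entities; the partial credit the text describes for similarity, displacement, and timeliness is not modeled, and the entity labels are invented for illustration.

```python
from typing import Set, Tuple


def precision_recall(identified: Set[str], actual: Set[str]) -> Tuple[float, float]:
    """Precision: fraction of identifications that are correct.
    Recall: fraction of actual entities that were identified."""
    correct = len(identified & actual)
    precision = correct / len(identified) if identified else 0.0
    recall = correct / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical ground truth and system output (entity type @ location).
actual_sites = {"command-post@A", "battle-position@B", "MSB@C", "artillery@D"}
identified_sites = {"command-post@A", "MSB@C", "MSB@E"}

p, r = precision_recall(identified_sites, actual_sites)
print(f"precision={p:.2f} recall={r:.2f}")
```

   Precision penalizes the incorrect identification (the spurious site at E), whereas recall penalizes the two actual sites that were missed; the article's phrase "how many of these entities are correctly identified and how many incorrect identifications are made" covers both.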





Results of the Challenge Problem Evaluation
   We present the results of the crisis-management evaluation first, followed by the results of the battlespace evaluation.

Crisis Management
   The evaluation of the SAIC and Teknowledge crisis-management systems involved seven trials or batches of roughly 110 questions. Thus, more than 1500 answers were manually graded by the challenge problem developer, IET, and subject matter experts at PSR on criteria ranging from correctness to completeness of source material to the quality of the representation of the question. Each question in a batch was posed in English accompanied by the syntax of the corresponding parameterized question (figure 3). The crisis-management systems were supposed to translate these questions into an internal representation, MELD for the Teknowledge system and KIF for the SAIC system. The MELD translator was operational for all the trials; the KIF translator was used to a limited extent on later trials.
   The first trial involved testing the systems on the sample questions that had been available for several months for training. The remaining trials implemented the “test and retest with scenario modification” strategy discussed earlier. The first batch of test questions, TQA, was repeated four days later as a retest; it was designated TQA’ for scoring purposes. The difference in scores between TQA and TQA’ represents improvements in the systems. After solving the questions in TQA’, the systems tackled a new set, TQB, designed to be “close to” TQA. The purpose of TQB was to check whether the improvements to the systems generalized to new questions. After a short break, a modification was introduced into the crisis-management scenario, and new fragments of knowledge about the scenario were released. Then, the cycle repeated: A new batch of questions, TQC, tested how well the systems coped with the scenario modification; then after four days, the systems were retested on the same questions, TQC’, and on the same day, a final batch, TQD, was released and answered.
   Each question in a trial was scored according to several criteria, some official and others optional. The four official criteria were (1) the correctness of the answer, (2) the quality of the explanation of the answer, (3) the completeness and quality of cited sources, and (4) the quality of the representation of the question. The optional criteria included lay intelligibility of explanations, novelty of assumptions, quality of the presentation of the explanation, the automatic production by the system of a representation of the question, source novelty, and reconciliation of multiple sources. Each question could garner a score between 0 and 3 on each criterion, and the criteria were themselves weighted. Some questions had multiple parts, and the number of parts was a further weighting criterion. In retrospect, it might have been clearer to assign each question a percentage of the points available, thus standardizing all scores, but in the data that follow, scores are on an open-ended scale. Subject-matter experts were assisted with scoring the quality of knowledge representations when necessary.
   A web-based form was developed for scoring, with clear instructions on how to assign scores. For example, on the correct-answer criterion, the subject-matter expert was instructed to award “zero points if no top-level answer is provided and you cannot infer an intended answer; one point for a wrong answer without any convincing arguments, or most required answer elements; two points for a partially correct answer; three points for a correct answer addressing most required elements.”
   When you consider the difficulty of the task, both systems performed remarkably well. Scores on the sample questions were relatively high, which is not surprising because these questions had been available for training for several months (figure 5). It is also not surprising that scores on the first batch of test questions (TQA) were not high. It is gratifying, however, to see how scores improve steadily between test and retest (TQA and TQA’, TQC and TQC’) and that these gains are general: The scores on TQA’ and TQB were similar, as were the scores on TQC’ and TQD.
   The scores designated auto in figure 5 refer to questions that were translated automatically from English into a formal representation. The Teknowledge system translated all questions automatically, the SAIC system very few. Initially, the Teknowledge team did not manipulate the resulting representations, but in later batches, they permitted themselves minor modifications. The effects of these can be seen in the differences between TQB and TQB-Auto, TQC and TQC-Auto, and TQD and TQD-Auto.
   Although the scores of the Teknowledge and SAIC systems appear close in figure 5, differences between the systems appear in other views of the data. Figure 6 shows the performance of the systems on all official questions plus a few optional questions. Although these extra questions widen the gap between the systems, the real effect comes from adding optional components to the scores.



Hpkb year 1 results

  • 1. AI Magazine Volume 19 Number 4 (1998) (© AAAI) Articles The DARPA High- Performance Knowledge Bases Project Paul Cohen, Robert Schrag, Eric Jones, Adam Pease, Albert Lin, Barbara Starr, David Gunning, and Murray Burke ■ Now completing its first year, the High-Perfor- many applications, and they should be main- mance Knowledge Bases Project promotes technol- tained and modified easily. Clearly, these goals ogy for developing very large, flexible, and require innovation in many areas, from knowl- reusable knowledge bases. The project is supported edge representation to formal reasoning and by the Defense Advanced Research Projects Agency special-purpose problem solving, from knowl- and includes more than 15 contractors in univer- sities, research laboratories, and companies. The edge acquisition to information gathering on evaluation of the constituent technologies centers the web to machine learning, from natural lan- on two challenge problems, in crisis management guage understanding to semantic integration and battlespace reasoning, each demanding pow- of disparate knowledge bases. erful problem solving with very large knowledge For roughly one year, HPKB researchers have bases. This article discusses the challenge prob- been developing knowledge bases containing lems, the constituent technologies, and their inte- tens of thousands of axioms concerning crises gration and evaluation. and battlefield situations. Recently, the tech- nology was tested in a month-long evaluation A involving sets of open-ended test items, most lthough a computer has beaten the of which were similar to sample (training) world chess champion, no computer has items but otherwise novel. Changes to the cri- the commonsense of a six-year-old sis and battlefield scenarios were introduced child. Programs lack knowledge about the during the evaluation to test the comprehen- world sufficient to understand and adjust to new situations as people do. 
Consequently, siveness and flexibility of knowledge in the programs have been poor at interpreting and HPKB systems. The requirement for compre- reasoning about novel and changing events, hensive, flexible knowledge about general sce- such as international crises and battlefield sit- narios forces knowledge bases to be large. Chal- uations. These problems are more open ended lenge problems, which define the scenarios than chess. Their solution requires shallow and thus drive knowledge base development, knowledge about motives, goals, people, coun- are a central innovation of HPKB. This article tries, adversarial situations, and so on, as well discusses HPKB challenge problems, technolo- as deeper knowledge about specific political gies and integrated systems, and the evaluation regimes, economies, geographies, and armies. of these systems. The High-Performance Knowledge Base The challenge problems require significant (HPKB) Project is sponsored by the Defense developments in three broad areas of knowl- Advanced Research Projects Agency (DARPA) edge-based technology. First, the overriding to develop new technology for knowledge- goal of HPKB—to be able to select, compose, based systems.1 It is a three-year program, end- extend, specialize, and modify components ing in fiscal year 1999, with funding totaling from a library of reusable ontologies, common $34 million. HPKB technology will enable domain theories, and generic problem-solving developers to rapidly build very large knowl- strategies—is not immediately achievable and edge bases—on the order of 106 rules, axioms, requires some research into foundations of or frames—enabling a new level of intelligence very large knowledge bases, particularly for military systems. These knowledge bases research in knowledge representation and should be comprehensive and reusable across ontological engineering. Second, there is the Copyright © 1998, American Association for Artificial Intelligence. All rights reserved. 
0738-4602-1998 / $2.00 WINTER 1998 25
  • 2. Articles problem of building on these foundations to Often, one will accept an answer that is populate very large knowledge bases. The goal roughly correct, especially when the alterna- is for collaborating teams of domain experts tives are no answer at all or a very specific but (who might lack training in computer science) wrong answer. This is Lenat and Feigenbaum’s to easily extend the foundation theories, breadth hypothesis: “Intelligent performance define additional domain theories and prob- often requires the problem solver to fall back lem-solving strategies, and acquire domain on increasingly general knowledge, and/or to facts. Third, because knowledge is not enough, analogize to specific knowledge from far-flung one also requires efficient problem-solving domains” (Lenat and Feigenbaum 1987, p. methods. HPKB supports research on efficient, 1173). We must, therefore, augment high-pow- general inference methods and optimized task- er knowledge-based systems, which give spe- specific methods. cific and precise answers, with weaker but ade- HPKB is a timely impetus for knowledge- quate knowledge and inference. The inference based technology, although some might think methods might not all be sound and complete. it overdue. Some of the tenets of HPKB were Indeed, one might need a multitude of meth- voiced in 1987 by Doug Lenat and Ed Feigen- ods to implement what Polya called plausible baum (Lenat and Feigenbaum 1987), and some inference. HPKB encompasses work on a vari- have been around for longer. Lenat’s CYC Pro- ety of logical, probabilistic, and other infer- ject has also contributed much to our under- ence methods. standing of large knowledge bases and ontolo- It is one thing to recognize the need for gies. Now, 13 years into the CYC Project and commonsense knowledge, another to inte- more than a decade after Lenat and Feigen- grate it seamlessly into knowledge-based sys- baum’s paper, there seems to be consensus on tems. 
Lenat observes that ontologies often are the following points: missing a middle level, the purpose of which is The first and most intellectually taxing task to connect very general ontological concepts when building a large knowledge base is to such as human and activity with domain-specif- design an ontology. If you get it wrong, you ic concepts such as the person who is responsible can expect ongoing trouble organizing the for navigating a B-52 bomber. Because HPKB is knowledge you acquire in a natural way. grounded in domain-specific tasks, the focus of Whenever two or more systems are built for much ontological engineering is this middle related tasks (for example, medical expert sys- layer. tems, planning, modeling of physical process- es, scheduling and logistics, natural language understanding), the architects of the systems The Participants realize, often too late, that someone else has The HPKB participants are organized into three already done, or is in the process of doing, the groups: (1) technology developers, (2) integra- hard ontological work. HPKB challenges the tion teams, and (3) challenge problem devel- research community to share, merge, and col- opers. Roughly speaking, the integration teams lectively develop large ontologies for signifi- build systems with the new technologies to cant military problems. However, an ontology solve challenge problems. The integration alone is not sufficient. Axioms are required to teams are led by SAIC and Teknowledge. Each give meaning to the terms in an ontology. integration team fields systems to solve chal- Without them, users of the ontology can inter- lenge problems in an annual evaluation. Uni- pret the terms differently. versity participants include Stanford Universi- Most knowledge-based systems have no ty, Massachusetts Institute of Technology common sense; so, they cannot be trusted. 
(MIT), Carnegie Mellon University (CMU), Suppose you have a knowledge-based system Northwestern University, University of Massa- for scheduling resources such as heavy-lift heli- chusetts (UMass), George Mason University copters, and none of its knowledge concerns (GMU), and the University of Edinburgh noncombatant evacuation operations. Now, (AIAI). In addition, SRI International, the Uni- suppose you have to evacuate a lot of people. versity of Southern California Information Sci- Lacking common sense, your system is literally ences Institute (USC-ISI), the Kestrel Institute, useless. With a little common sense, it could and TextWise, Inc., have developed important not only support human planning but might components. Information Extraction and be superior to it because it could think outside Transport (IET), Inc., with Pacific Sierra the box and consider using the helicopters in Research (PSR), Inc., developed and evaluated an unconventional way. Common sense is the crisis-management challenge problem, and needed to recognize and exploit opportunities Alphatech, Inc., is responsible for the battle- as well as avoid foolish mistakes. space challenge problem. 26 AI MAGAZINE
Challenge Problems

A programmatic innovation of HPKB is challenge problems. The crisis-management challenge problem, developed by IET and PSR, is designed to exercise broad, relatively shallow knowledge about international tensions. The battlespace challenge problem, developed by Alphatech, Inc., has two parts, each designed to exercise relatively specific knowledge about activities in armed conflicts. Movement analysis involves interpreting vehicle movements detected and tracked by idealized sensors. The workaround problem is concerned with finding military engineering solutions to traffic-obstruction problems, such as destroyed bridges and blocked tunnels.

Good challenge problems must satisfy several, often conflicting, criteria. A challenge problem must be challenging: It must raise the bar for both technology and science. A problem that requires only technical ingenuity will not hold the attention of the technology developers, nor will it help the United States maintain its preeminence in science. Equally important, a challenge problem for a DARPA program must have clear significance to the United States Department of Defense (DoD). Challenge problems should serve for the duration of the program, becoming more challenging each year. This continuity is preferable to designing new problems every year because the infrastructure to support challenge problems is expensive.

A challenge problem should require little or no access to military subject-matter experts. It should not introduce a knowledge-acquisition bottleneck that results in delays and low productivity from the technology developers. As much as possible, the problem should be solvable with accessible, open-source material. A challenge problem should exercise all (or most) of the contributions of the technology developers, and it should exercise an integration of these technologies. A challenge problem should have unambiguous criteria for evaluating its solutions. These criteria need not be so objective that one can write algorithms to score performance (for example, human judgment might be needed to assess scores), but they must be clear and they must be published early in the program. In addition, although performance is important, challenge problems that value performance above all else encourage “one-off” solutions (a solution developed for a specific problem, once only) and discourage researchers from trying to understand why their technologies work well and poorly. A challenge problem should provide a steady stream of results, so progress can be assessed not only by technology developers but also by DARPA management and involved members of the DoD community.

The HPKB challenge problems are designed to support new and ongoing DARPA initiatives in intelligence analysis and battlespace information systems. Crisis-management systems will assist strategic analysts by evaluating the political, economic, and military courses of action available to nations engaged at various levels of conflict. Battlespace systems will support operations officers and intelligence analysts by inferring militarily significant targets and sites, reasoning about road network trafficability, and anticipating responses to military strikes.

Crisis-Management Challenge Problem

The crisis-management challenge problem is intended to drive the development of broad, relatively shallow commonsense knowledge bases to facilitate intelligence analysis. The client program at DARPA for this problem is Project GENOA: Collaborative Crisis Understanding and Management. GENOA is intended to help analysts more rapidly understand emerging international crises to preserve U.S. policy options. Proactive crisis management, before a situation has evolved into a crisis that might engage the U.S. military, enables more effective responses than reactive management. Crisis-management systems will assist strategic analysts by evaluating the political, economic, and military courses of action available to nations engaged at various levels of conflict.

The challenge problem development team worked with GENOA representatives to identify areas for the application of HPKB technology. This work took three or four months, but the crisis-management challenge problem specification has remained fairly stable since its initial release in draft form in July 1997.

The first step in creating the challenge problem was to develop a scenario to provide context for intelligence analysis in time of crisis. To ensure that the problem should require development of real knowledge about the world, the scenario includes real national actors with a fictional yet plausible story line. The scenario, which takes place in the Persian Gulf, involves hostilities between Saudi Arabia and Iran that culminate in closing the Strait of Hormuz to international shipping.

Next, IET worked with experts at PSR to develop a description of the intelligence analysis process, which involves the following tasks: information gathering (what happened), situation assessment (what does it mean), and scenario development (what might happen next). Situation assessment (or interpretation) includes factors that pertain to the specific situation at hand, such as motives, intents, risks, rewards, and ramifications, and factors that make up a general context, or “strategic culture,” for a state actor’s behavior in international relations, such as capabilities, interests, policies, ideologies, alliances, and enmities. Scenario development, or speculative prediction, starts with the generation of plausible actions for each actor. Then, options are evaluated with respect to the same factors for situation assessment, and a likelihood rating is produced. The most plausible actions are reported back to policy makers.

These analytic tasks afford many opportunities for knowledge-based systems. One is to use knowledge bases to retain or multiply corporate expertise; another is to use knowledge and reasoning to “think outside the box,” to generate analytic possibilities that a human analyst might overlook. The latter task requires extensive commonsense knowledge, or “analyst’s sense,” about the domain to rule out implausible options.

The crisis-management challenge problem includes an informal specification for a prototype crisis-management assistant to support analysts. The assistant is tested by asking questions. Some are simple requests for factual information; others require the assistant to interpret the actions of nations in the context of strategic culture. Actions are motivated by interests, balancing risks and rewards. They have impacts and require capabilities. Interests drive the formation of alliances, the exercise of influence, and the generation of tensions among actors. These factors play out in a current and a historical context. Crises can be represented as events or as larger episodes tracking the evolution of a conflict over time, from inception or trigger, through any escalation, to eventual resolution or stasis. The representations being developed in HPKB are intended to serve as a crisis corporate memory to help analysts discover historical precedents and analogies for actions. Much of the challenge-problem specification is devoted to sample questions that are intended to drive the development of general models for reasoning about crisis events.

Sample questions are embedded in an analytic context. The question “What might happen next?” is instantiated as “What might happen following the Saudi air strikes?” as shown in figure 1. Q51 is refined to Q83 in a way that is characteristic of the analytic process; that is, higher-level questions are refined into sets of lower-level questions that provide detail.

    III. What of significance might happen following the Saudi air strikes?
        B. Options evaluation
            Evaluate the options available to Iran.
            Close the Strait of Hormuz to shipping.
                Evaluation: Probable
                Motivation: Respond to Saudi air strikes and deter future strikes.
                Capability:
                    (Q51) Can Iran close the Strait of Hormuz to international shipping?
                    (Q83) Is Iran capable of firing upon tankers in the Strait of Hormuz? With what weapons?
                Negative outcomes:
                    (Q53) What risks would Iran face in closing the strait?

    Figure 1. Sample Questions Pertaining to the Responses to an Event.

The challenge-problem developers (IET with PSR) developed an answer key for sample questions, a fragment of which is shown in figure 2. Although simple factual questions (for example, “What is the gross national product of the United States?”) have just one answer, questions such as Q53 usually have several. The answer key actually lists five answers, two of which are shown in figure 2. Each is accompanied by suitable explanations, including source material. The first source (Convention on the Law of the Sea) is electronic. IET maintains a web site with links to pages that are expected to be useful in answering the questions. The second source is a fragment of a model developed by IET and published in the challenge-problem specification. IET developed these fragments to
indicate the kinds of reasoning they would be testing in the challenge problem.

    Answer(s):
    1. Economic sanctions from {Saudi Arabia, GCC, U.S., U.N.}
        • The closure of the Strait of Hormuz would violate an international norm promoting freedom of the seas and would jeopardize the interests of many states.
        • In response, states might act unilaterally or jointly to impose economic sanctions on Iran to compel it to reopen the strait.
        • The United Nations Security Council might authorize economic sanctions against Iran.
    2. Limited military response from {Saudi Arabia, GCC, U.S., others}…
    Source(s):
        • The Convention on the Law of the Sea.
        • (B5) States may act unilaterally or collectively to isolate and/or punish a group or state that violates international norms. Unilateral and collective action can involve a wide range of mechanisms, such as intelligence collection, military retaliation, economic sanction, and diplomatic censure/isolation.

    Figure 2. Part of the Answer Key for Question 53.

For the challenge-problem evaluation, held in June 1998, IET developed a way to generate test questions through parameterization. Test questions deviate from sample questions in specified, controlled ways, so the teams participating in the challenge problem know the space of questions from which test items will be selected. This space includes billions of questions so the challenge problem cannot be solved by relying on question-specific knowledge. The teams must rely on general knowledge to perform well in the evaluation. (Semantics provide practical constraints on the number of reasonable instantiations of parameterized questions, as do online sources provided by IET.) To illustrate, Q53 is parameterized in figure 3. Parameterized question 53, PQ53, actually covers 8 of the roughly 100 sample questions in the specification.

    PQ53 [During/After <TimeInterval>,] what {risks, rewards} would <InternationalAgent> face in <InternationalActionType>?

    <InternationalActionType> =
        {[exposure of its] {supporting, sponsoring} <InternationalAgentType> in <InternationalAgent2>,
         successful terrorist attacks against <InternationalAgent2>’s <EconomicSector>,
         <InternationalActionType>(PQ51),
         taking hostage citizens of <InternationalAgent2>,
         attacking targets <SpatialRelationship> <InternationalAgent2> with <Force>}

    <InternationalAgentType> =
        {terrorist group, dissident group, political party, humanitarian organization}

    Figure 3. A Parameterized Question Suitable for Generating Sample Questions and Test Questions.

Parameterized questions and associated class definitions are based on natural language, giving the integration teams responsibility for developing (potentially different) formal representations of the questions. This decision was made at the request of the teams. An instance of a parameterized question, say, PQ53, is mechanically generated, then the teams must create a formal representation and reason with it, without human intervention.

Battlespace Challenge Problems

The second challenge-problem domain for HPKB is battlespace reasoning. Battlespace is an abstract notion that includes not only the
physical geography of a conflict but also the plans, goals, and activities of all combatants prior to, and during, a battle and during the activities leading to the battle. Three battlespace programs within DARPA were identified as potential users of HPKB technologies: (1) the dynamic multiinformation fusion program, (2) the dynamic database program, and (3) the joint forces air-component commander (JFACC) program. Two battlespace challenge problems have been developed.

The Movement-Analysis Challenge Problem

The movement-analysis challenge problem concerns high-level analysis of idealized sensor data, particularly the airborne JSTARS moving target indicator radar. This Doppler radar can generate vast quantities of information: one reading every minute for each vehicle in motion within a 10,000-square-mile area (note 2). The movement-analysis scenario involves an enemy mobilizing a full division of ground forces (roughly 200 military units and 2000 vehicles) to defend against a possible attack. A simulation of the vehicle movements of this division was developed, the output of which includes reports of the positions of all the vehicles in the division at 1-minute intervals over a 4-day period for 18 hours each day. These military vehicle movements were then interspersed with plausible civilian traffic to add the problem of distinguishing military from nonmilitary traffic. The movement-analysis task is to monitor the movements of the enemy to detect and identify types of military site and convoy.

Because HPKB is not concerned with signal processing, the inputs are not real JSTARS data but are instead generated by a simulator and preprocessed into vehicle tracks. There is no uncertainty in vehicle location and no radar shadowing, and each vehicle is always accurately identified by a unique bumper number. However, vehicle tracks do not precisely identify vehicle type but instead define each vehicle as either light wheeled, heavy wheeled, or tracked. Low-speed and stationary vehicles are not reported.

Vehicle-track data are supplemented by small quantities of high-value intelligence data, including accurate identification of a few key enemy sites, electronic intelligence reports of locations and times at which an enemy radar is turned on, communications intelligence reports that summarize information obtained by monitoring enemy communications, and human intelligence reports that provide detailed information about the numbers and types of vehicle passing a given location. Other inputs include a detailed road network in electronic form and an order of battle that describes the structure and composition of the enemy forces in the scenario region.

Given these inputs, movement analysis comprises the following tasks:

First is to distinguish military from nonmilitary traffic. Almost all military traffic travels in convoys, which makes this task fairly straightforward except for very small convoys of two or three vehicles.

Second is to identify the sites between which military convoys travel, determine which of these sites are militarily significant, and determine the types of each militarily significant site. Site types include battle positions, command posts, support areas, air-defense sites, artillery sites, and assembly-staging areas.

Third is to identify which units (or parts of units) in the enemy order of battle are participating in each military convoy.

Fourth is to determine the purpose of each convoy movement. Purposes include reconnaissance, movement of an entire unit toward a battle position, activities by command elements, and support activities.

Fifth is to infer the exact types of the vehicles that make up each convoy. About 20 types of military vehicle are distinguished in the enemy order of battle, all of which show up in the scenario data.

To help the technology base and the integration teams develop their systems, a portion of the simulation data was released in advance of the evaluation phase, accompanied by an answer key that supplied model answers for each of the inference tasks listed previously.

Movement analysis is currently carried out manually by human intelligence analysts, who appear to rely on models of enemy behavior at several levels of abstraction. These include models of how different sites or convoys are structured for different purposes and models of military systems such as logistics (supply and resupply). For example, in a logistics model, one might find the following fragment: “Each echelon in a military organization is responsible for resupplying its subordinate echelons. Each echelon, from battalion on up, has a designated area for storing supplies. Supplies are provided by higher echelons and transshipped to lower echelons at these areas.” Model fragments such as these are thought to constitute the knowledge of intelligence analysts and, thus, should be the content of HPKB movement-analysis systems. Some such knowledge was elicited from military intelligence analysts during programwide meetings. These same analysts also scripted the simulation scenario.
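The first movement-analysis task, separating convoy traffic from isolated vehicles, can be illustrated with a small sketch. It is not any HPKB team's system: it assumes tracks keyed by bumper number, time-aligned at one position per minute, with coordinates in kilometers, and it uses a simple heuristic of grouping vehicles that stay close and hold a nearly fixed spacing. The function names, the tolerance values, and the example data are all hypothetical.

```python
from itertools import combinations
from math import hypot


def spacing_is_fixed(track_a, track_b, tolerance_km=0.05, max_gap_km=1.0):
    """True if two time-aligned tracks stay close and keep a near-constant gap.

    Each track is a list of (x_km, y_km) positions, one per minute (assumed format).
    """
    gaps = [hypot(ax - bx, ay - by)
            for (ax, ay), (bx, by) in zip(track_a, track_b)]
    return max(gaps) <= max_gap_km and max(gaps) - min(gaps) <= tolerance_km


def find_convoys(tracks, tolerance_km=0.05, max_gap_km=1.0):
    """Group bumper numbers into candidate convoys of two or more vehicles."""
    parent = {vehicle: vehicle for vehicle in tracks}

    def find(vehicle):
        # Union-find lookup with path halving.
        while parent[vehicle] != vehicle:
            parent[vehicle] = parent[parent[vehicle]]
            vehicle = parent[vehicle]
        return vehicle

    # Link every pair of vehicles that holds a fixed, short spacing.
    for a, b in combinations(sorted(tracks), 2):
        if spacing_is_fixed(tracks[a], tracks[b], tolerance_km, max_gap_km):
            parent[find(a)] = find(b)

    groups = {}
    for vehicle in tracks:
        groups.setdefault(find(vehicle), []).append(vehicle)
    return sorted(sorted(group) for group in groups.values() if len(group) >= 2)


# Hypothetical data: three vehicles in column at the same speed, plus one
# faster civilian car on the same road.
example_tracks = {
    "M1": [(0.0 + 0.5 * t, 0.0) for t in range(10)],
    "M2": [(0.1 + 0.5 * t, 0.0) for t in range(10)],
    "M3": [(0.2 + 0.5 * t, 0.0) for t in range(10)],
    "C1": [(0.05 + 0.8 * t, 0.0) for t in range(10)],
}
print(find_convoys(example_tracks))
```

On this toy input the three vehicles in column are grouped and the civilian car is left out. Real tracks would have gaps, because low-speed and stationary vehicles are not reported, so a fuller version would need to align tracks by timestamp before comparing them.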
The Workaround Challenge Problem

The workaround challenge problem supports air-campaign planning by the JFACC and his/her staff. One task for the JFACC is to determine suitable targets for air strikes. Good targets allow one to achieve maximum military effect with minimum risk to friendly forces and minimum loss of life on all sides. Infrastructure often provides such targets: It can be sufficient to destroy supplies at a few key sites or critical nodes in a transportation network, such as bridges along supply routes. However, bridges and other targets can be repaired, and there is little point in destroying a bridge if an available fording site is nearby. If a plan requires an interruption in traffic of several days, and the bridge can be repaired in a few hours, then another target might be more suitable. Target selection, then, requires some reasoning about how an enemy might “work around” the damage to the target.

The task of the workaround challenge problem is to automatically assess how rapidly and by what method an enemy can reconstitute or bypass damage to a target and, thereby, help air-campaign planners rapidly choose effective targets. The focus of the workaround problem in the first year of HPKB is automatic workaround generation.

The workaround task involves detailed representation of targets and the local terrain around the target and detailed reasoning about actions the enemy can take to reconstitute or bypass this damage. Thus, the inputs to workaround systems include the following elements:

First is a description of a target (for example, a bridge or a tunnel), the damage to it (for example, one span of a bridge is dropped; the bridge and vicinity are mined), and key features of the local terrain (for example, the slope and soil types of a terrain cross section coincident with the road near the bridge, together with the maximum depth and the speed of any river or stream the bridge crosses).

Second is a specific enemy unit or capability to be interdicted, such as a particular armored battalion or supply trucks carrying ammunition.

Third is a time period over which this unit or capability is to be denied access to the targeted route. The presumption is that the enemy will try to repair the damage within this time period; a target is considered to be effective if there appears to be no way for the enemy to make this repair.

Fourth is a detailed description of the enemy resources in the area that could be used to repair the damage. For the most part, repairs to battle damage are carried out by Army engineers; so, this description takes the form of a detailed engineering order of battle.

All inputs are provided in a formal representation language.

The workaround generator is expected to provide three outputs: First is a reconstitution schedule, giving the capacity of the damaged link as a function of time since the damage was inflicted. For example, the workaround generator might conclude that the capacity of the link is zero for the first 48 hours, but thereafter, a temporary bridge will be in place that can sustain a capacity of 170 vehicles an hour. Second is a time line of engineering actions that the enemy might carry out to implement the repair, the time these actions require, and the temporal constraints among them. If there appears to be more than one viable repair strategy, a time line should be provided for each. Third is a set of required assets for each time line of actions, a description of the engineering resources that are used to repair the damage and pointers to the actions in the time line that utilize these assets. The reconstitution schedule provides the minimal information required to evaluate the suitability of a given target. The time line of actions provides an explanation to justify the reconstitution schedule. The set of required assets is easily derived from the time line of actions and can be used to suggest further targets for preemptive air strikes against the enemy to frustrate its repair efforts.

A training data set was provided to help developers build their systems. It supplied inputs and outputs for several sample problems, together with detailed descriptions of the calculations carried out to compute action durations; lists of simplifying assumptions made to facilitate these calculations; and pointers to text sources for information on engineering resources and their use (mainly Army field manuals available on the World Wide Web).

Workaround generation requires detailed knowledge about what the capabilities of the enemy’s engineering equipment are and how it is typically used by enemy forces. For example, repairing damage to a bridge typically involves mobile bridging equipment, such as armored vehicle-launched bridges (AVLBs), medium girder bridges, Bailey bridges, or float bridges such as ribbon bridges or M4T6 bridges, together with a range of earthmoving equipment such as bulldozers. Each kind of mobile bridge takes a characteristic amount of time to deploy, requires different kinds of bank preparation, and is “owned” by different echelons in the military hierarchy, all of which
affect the time it takes to bring the bridge to a damage site and effect a repair. Because HPKB operates in an entirely unclassified environment, U.S. engineering resources and doctrine were used throughout. Information from Army field manuals was supplemented by a series of programwide meetings with an Army combat engineer, who also helped construct sample problems and solutions.

Integrated Systems

The challenge problems are solved by integrated systems fielded by integration teams led by Teknowledge and SAIC. Teknowledge favors a centralized architecture that contains a large commonsense ontology (CYC); SAIC has a distributed architecture that relies on sharing specialized domain ontologies and knowledge bases, including a large upper-level ontology based on the merging of CYC, SENSUS, and other knowledge bases.

Teknowledge Integration

The Teknowledge integration team comprises Teknowledge, Cycorp, and Kestrel Institute. Its focus is on semantic integration and the creation of massive amounts of knowledge.

Semantic Integration

Three issues make software integration difficult. Transport issues concern mechanisms to get data from one process or machine to another. Solutions include sockets, remote-method invocation (RMI), and CORBA. Syntactic issues concern how to convert number formats, “syntactic sugar,” and the labels of data. The more challenging issues concern semantic integration: To integrate elements properly, one must understand the meaning of each. The database community has addressed this issue (Wiederhold 1996); it is even more pressing in knowledge-based systems.

The current state of practice in software integration consists largely of interfacing pairs of systems, as needed. Pairwise integration of this kind does not scale up, unanticipated uses are hard to cover later, and chains of integrated systems at best evolve into stovepipe systems. Each integration is only as general as it needs to be to solve the problem at hand.

Some success has been achieved in low-level integration and reuse; for example, systems that share scientific subroutine libraries or graphics packages are often forced into similar representational choices for low-level data. DARPA has invested in early efforts to create reuse libraries for integrating large systems at higher levels. Some effort has gone into expressing a generic semantics of plans in an object-oriented format (Pease and Carrico 1997a, 1997b), and applications of this generic semantics to domain-specific tasks are promising (Pease and Albericci 1998). The development of ontologies for integrating manufacturing planning applications (Tate 1998) and work flow (Lee et al. 1996) is ongoing.

Another option for semantic integration is software mediation (Park, Gennari, and Musen 1997). This software mediation can be seen as a variant on pairwise integration, but because integration is done by knowledge-based means, one has an explicit expression of the semantics of the conversion. Researchers at Kestrel Institute have successfully defined formal specifications for data and used these theories to integrate formally specified software. In addition, researchers at Cycorp have successfully applied CYC to the integration of multiple databases.

The Teknowledge approach to integration is to share knowledge among applications and create new knowledge to support the challenge problems. Teknowledge is defining formal semantics for the input and output of each application and the information in the challenge problems.

Many concepts defy simple definitions. Although there has been much success in defining the semantics of mathematical concepts, it is harder to be precise about the semantics of the concepts people use every day. These concepts seem to acquire meaning through their associations with other concepts, their use in situations and communication, and their relations to instances. To give the concepts in our integrated system real meaning, we must provide a rich set of associations, which requires an extremely large knowledge base. CYC offers just such a knowledge base.

CYC (Lenat 1995; Lenat and Guha 1990) consists of an immense, multicontextual knowledge base; an efficient inference engine; and associated tools and interfaces for acquiring, browsing, editing, and combining knowledge. Its premise is that knowledge-based software will be less brittle if and only if it has access to a foundation of basic commonsense knowledge. This semantic substratum of terms, rules, and relations enables application programs to cope with unforeseen circumstances and situations.

The CYC knowledge base represents millions of hand-crafted axioms entered during the 13 years since CYC’s inception. Through careful policing and generalizing, there are now slightly fewer than 1 million axioms in the knowledge base, interrelating roughly 50,000
atomic terms. Fewer than two percent of these axioms represent simple facts about proper nouns of the sort one might find in an almanac. Most embody general consensus information about the concepts. For example, one axiom says one cannot perform volitional actions while one sleeps, another says one cannot be in two places at once, and another says you must be at the same place as a tool to use it. The knowledge base spans human capabilities and limitations, including information on emotions, beliefs, expectations, dreads, and goals; common everyday objects, processes, and situations; and the physical universe, including such phenomena as time, space, causality, and motion.

The CYC inference engine comprises an epistemological and a heuristic level. The epistemological level is an expressive nth-order logical language with clean formal semantics. The heuristic level is a set of some three dozen special-purpose modules that each contains its own algorithms and data structures and can recognize and handle some commonly occurring sorts of inference. For example, one heuristic-level module handles temporal reasoning efficiently by converting temporal relations into a before-and-after graph and then doing graph searching rather than theorem proving to derive an answer. A truth maintenance system and an argumentation-based explanation and justification system are tightly integrated into the system and are efficient enough to be in operation at all times. In addition to these inference engines, CYC includes numerous browsers, editors, and consistency checkers. A rich interface has been defined.

Crisis-Management Integration

The crisis-management challenge problem involves answering test questions presented in a structured grammar. The first step in answering a test question is to convert it to a form that CYC can reason with, a declarative decision tree. When the tree is applied to the test question input, a CYC query is generated and sent to CYC.

Answering the challenge problem questions takes a great deal of knowledge. For the first year’s challenge problem alone, the Cycorp and Teknowledge team added some 8,000 concepts and 80,000 assertions to CYC. To meet the needs of this challenge problem, the team created significant amounts of new knowledge, some developed by collaborators and merged into CYC, some added by automated processes.

The Teknowledge integrated system includes two natural language components: START and TextWise. The START system was created by Boris Katz (1997) and his group at MIT. For each of the crisis-management questions, Teknowledge has developed a template into which user-specified parameters can be inserted. START parses English queries for a few of the crisis-management questions to fill in these templates. Each filled template is a legal CYC query. TextWise Corporation has been developing natural language information-retrieval software primarily for news articles (Liddy, Paik, and McKenna 1995). Teknowledge intends to use the TextWise knowledge base information tools (KNOW-IT) to supply many instances to CYC of facts discovered from news stories. The system can parse English text and return a series of binary relations that express the content of the sentences. There are several dozen relation types, and the constants that instantiate each relation are WORDNET synset mappings (Miller et al. 1993). Each of the concepts has been mapped to a CYC expression, and a portion of WORDNET has been mapped to CYC. For those synsets not in CYC, the WORDNET hyponym links are traversed until a mapped CYC term is found.

Battlespace Integration

Teknowledge supported the movement-analysis and workaround problems.

Movement analysis: Several movement-analysis systems were to be integrated, and much preliminary integration work was done. Ultimately, the time pressure of the challenge problem evaluation precluded a full integration. The MIT and UMass movement-analysis systems are described briefly here; the SMI and SRI systems are described in the section entitled SAIC Integrated Systems.

The MIT MAITA system provides tools for constructing and controlling networks of distributed-monitoring processes. These tools provide access to large knowledge bases of monitoring methods, organized around the hierarchies of tasks performed, knowledge used, contexts of application, the alerting of utility models, and other dimensions. Individual monitoring processes can also make use of knowledge bases representing commonsense or expert knowledge in conducting their reasoning or reporting their findings. MIT built monitoring processes for sites and convoys with these tools.

The UMass group tried to identify convoys and sites with very simple rules. Rules were developed for three site types: (1) battle positions, (2) command posts, and (3) assembly-staging areas. The convoy detector simply looked for vehicles traveling at fixed distances from one another. Initially, UMass was going to recognize convoys from their dynamics, in which the distances between vehicles fluctuate in a characteristic way, but in the simulated
data, the distances between vehicles remained fixed. UMass also intended to detect sites by the dynamics of vehicle movements between them, but no significant dynamic patterns could be found in the movement data.

Workarounds: Teknowledge developed two workaround integrations, one an internal Teknowledge system, the other from AIAI at the University of Edinburgh.

Teknowledge developed a planning tool based on CYC, essentially a wrapper around CYC’s existing knowledge base and inference engine. A plan is a proof that there is a path from the final goal to the initial situation through a partially ordered set of actions. The rules in the knowledge base driving the planner are rules about action preconditions and about which actions can bring about a certain state of affairs. There is no explicit temporal reasoning; the partial order of temporal precedence between actions is established on the basis of the rules about preconditions and effects.

The planner is a new kind of inference engine, performing its own search but in a much smaller search space. However, each step in the search involves interaction with the existing inference engine by hypothesizing actions and microtheories and doing asks and asserts in these microtheories. This hypothesizing and asserting on the fly in effect amounts to dynamically updating the knowledge base in the course of inference; this capability is new for the CYC inference engine.

Consistent with the goals of HPKB, the Teknowledge workaround planner reused CYC’s knowledge, although it was not knowledge specific to workarounds. In fact, CYC had never been the basis of a planner before, so even stating things in terms of an action’s preconditions was new. What CYC provided, however, was a rich basis on which to build workaround knowledge. For example, the Teknowledge team needed to write only one rule to state “to use something as a device you must have control over that device,” and this rule covered the cases of using an M88 to clear rubble, a mine plow to breach a minefield, a bulldozer to cut into a bank or narrow the gap, and so on. The reason one rule can cover so many cases is because clearing rubble, demining an area, narrowing a gap, and cutting into a bank are all specializations of IntrinsicStateChangeEvent, an extant part of the CYC ontology.

The AIAI workaround planner was also implemented in CYC and took data from Teknowledge’s FIRE&ISE-TO-MELD translator as its input. The central idea was to use the scriptlike structure of workaround plans to guide the reasoning process. For this reason, a hierarchical task network (HTN) approach was taken. A planning-specific ontology was defined within the larger CYC ontology, and planning rules only referenced concepts within this more constrained context. The planning application was essentially embedded in CYC.

CYC had to be extended to represent composite actions that have several alternative decompositions and complex preconditions-effects. Although it is not a commonsense approach, AIAI decided to explore HTN planning because it appeared suitable for the workaround domain. It was possible to represent actions, their conditions and effects, the plan-node network, and plan resources in a relational style. The structure of a plan was implicitly represented in the proof that the corresponding composite action was a relation between particular sets of conditions and effects. Once proved, action relations are retained by CYC and are potentially reusable.

An advantage of implementing the AIAI planner in CYC was the ability to remove brittleness from the planner-input knowledge format; for example, it was not necessary to account for all the possible permutations of argument order in predicates such as bordersOn and between.

SAIC Integrated System

SAIC built an HPKB integrated knowledge environment (HIKE) to support both crisis-management and battlespace challenge problems. The architecture of HIKE for crisis management is shown in figure 4. For battlespace, the architecture is similar in that it is distributed and relies on the open knowledge base connectivity (OKBC) protocol, but of course, the components integrated by the battlespace architecture are different. HIKE’s goals are to address the distributed communications and interoperability requirements among the HPKB technology components (knowledge servers, knowledge-acquisition tools, question-and-answering systems, problem solvers, process monitors, and so on) and provide a graphic user interface (GUI) tailored to the end users of the HPKB environment.

HIKE provides a distributed computing infrastructure that addresses two types of communications needs: First are input and output data-transportation and software connectivities. These include connections between the HIKE server and technology components, connections between components, and connections between servers. HIKE encapsulates information content and data transportation through JAVA objects, hypertext transfer protocol (HTTP), remote-method invocation (JAVA
RMI), and database access (JDBC). Second, HIKE provides for knowledge-content assertion and distribution and query requests to knowledge services.

[Figure 4. SAIC Crisis-Management Challenge Problem Architecture. Labels recoverable from the figure: electronic archival sources, real-time news feeds, WWW, Question Answering, User Interface, KNOW-IT, SKC, START, ATP, SNARK, SPOOK, OKBC (Open Knowledge Base Connectivity) bus, HIKE servers, HIKE clients, analyst, GKB-Editor, WebKB, Ontolingua, Knowledge Services, training data, knowledge engineer.]

The OKBC protocol proved essential. SRI used it to interface the theorem prover SNARK to an OKBC server storing the Central Intelligence Agency (CIA) World Fact Book knowledge base. Because this knowledge base is large, SRI did not want to incorporate it into SNARK but instead used the procedural attachment feature of SNARK to look up facts that were available only in the World Fact Book. MIT’s START system used OKBC to connect to SRI’s OCELOT-SNARK OKBC server. This connection will eventually give users the ability to pose questions in English, which are then transformed to a formal representation by START and shipped to SNARK using OKBC; the result is returned using OKBC. ISI built an OKBC server for their LOOM system for GMU. SAIC built a front end to the OKBC server for LOOM that was extensively used by the members of the battlespace challenge problem team.

With OKBC and other methods, the HIKE infrastructure permits the integration of new technology components (either clients or servers) in the integrated end-to-end HPKB system without introducing major changes, provided that the new components adhere to the specified protocols.

Crisis-Management Integration

The SAIC crisis-management architecture is focused around a central OKBC bus, as shown in figure 4. The technology components provide user interfaces, question answering, and knowledge services. Some components have overlapping roles. For example, MIT’s START system serves both as a user interface and a question-answering component. Similarly, CMU’s
WEBKB supports both question answering and knowledge services.

HIKE provides a form-based GUI with which users can construct queries with pull-down menus. Query-construction templates correspond to the templates defined in the crisis-management challenge problem specification. Questions also can be entered in natural language. START and the TextWise component accept natural language queries and then attempt to answer the questions. To answer questions that involve more complex types of reasoning, START generates a formal representation of the query and passes it to one of the theorem provers.

The Stanford University Knowledge Systems Laboratory ONTOLINGUA, SRI International’s graphic knowledge base (GKB) editor, WEBKB, and TextWise provide the knowledge service components. The GKB editor is a graphic tool for browsing and editing large knowledge bases, used primarily for manual knowledge acquisition. WEBKB supports semiautomatic knowledge acquisition. Given some training data and an ontology as input, a web spider searches in a directed manner and populates instances of classes and relations defined in the ontology. Probabilistic rules are also extracted. TextWise extracts information from text and newswire feeds, converting them into knowledge interchange format (KIF) triples, which are then loaded into ONTOLINGUA. ONTOLINGUA is SAIC’s central knowledge server and information repository for the crisis-management challenge problem. ONTOLINGUA supports KIF as well as compositional modeling language (CML). Flow models developed by Northwestern University (NWU) answer challenge problem questions related to world oil-transportation networks and reside within ONTOLINGUA. Stanford’s system for probabilistic object-oriented knowledge (SPOOK) provides a language for class frames to be annotated with probabilistic information, representing uncertainty about the properties of instances in this class. SPOOK is capable of reasoning with the probabilistic information based on Bayesian networks.

Question answering is implemented in several ways. SRI International’s SNARK and Stanford’s abstract theorem prover (ATP) are first-order theorem provers. WEBKB answers questions based on the information it gathers. Question answering is also accomplished by START and TextWise taking a query in English as input and using information retrieval to extract the answers from text-based sources (such as the web, newswire feeds).

The guiding philosophy during knowledge base development for crisis management was to reuse knowledge whenever it made sense. The SAIC team reused three knowledge bases: (1) the HPKB upper-level ontology developed by Cycorp, (2) the World Fact Book knowledge base from the CIA, and (3) the Units and Measures Ontology from Stanford. Reusing the upper-level ontology required translation, comprehension, and reformulation. The ontology was released in MELD (a language used by Cycorp) and was not directly readable by the SAIC system. In conjunction with Stanford, SRI developed a translator to load the upper-level ontology into any OKBC-compliant server. Once the ontology was loaded into the OCELOT server, the GKB editor was used to comprehend it. The graphic nature of the GKB editor illuminated the interrelationships between classes and predicates of the upper-level ontology. Because the upper-level ontology represents functional relationships as predicates but SNARK reasons efficiently with functions, it was necessary to reformulate the ontology to use functions whenever a functional relationship existed.

Battlespace Integration

The distributed HIKE infrastructure is well suited to support an integrated battlespace challenge problem as it was originally designed: a single information system for movement analysis, trafficability, and workaround reasoning. However, the trafficability problem (establishing routes for various kinds of vehicle given the characteristics) was dropped, and the integration of the other problems was delayed. The components that solved these problems are described briefly later.

Movement analysis: The movement-analysis problem is solved by MIT’s monitoring, analysis, and interpretation tools arsenal (MAITA); Stanford University’s Section on Medical Informatics’ (SMI) problem-solving methods; and SRI International’s Bayesian networks. The MIT effort was described briefly in the section entitled Teknowledge Integration. Here, we focus on the SMI and SRI movement-analysis systems.

For scalability, SMI adopted a three-layered approach to the challenge problem: The first layer consisted primarily of simple, context-free data processing that attempted to find important preliminary abstractions in the data set. The most important of these were traffic centers (locations that were either the starting or stopping points for a significant number of vehicles) and convoy segments (a number of vehicles that depart from the same traffic center at roughly the same time, going in roughly the same direction). Spotting these abstractions required setting a number of parameters (for example, how big a traffic center is). Once trained, these first-layer algorithms are linear in
the size of the data set and enabled SMI to use knowledge-intensive techniques on the resulting (much smaller) set of data abstractions.

The second layer was a repair layer, which used knowledge of typical convoy behaviors and locations on the battlespace to construct a “map” of militarily significant traffic and traffic centers. The end result was a network of traffic centers connected by traffic. Three main tasks remain: (1) classify the traffic centers, (2) figure out what the convoys are doing, and (3) identify which units are involved. SMI iteratively answered these questions by using repeated layers of heuristic classification and constraint satisfaction. The heuristic-classification components operated independently of the network, using known (and deduced) facts about single convoys or traffic centers. Consider the following rule for trying to identify a main supply brigade (MSB) site (paraphrased into English, with abstractions in boldface):

    If we have a current site which is unclassified
      and it’s in the Division support area,
      and the traffic is high enough
      and the traffic is balanced
      and the site is persistent with no major deployments emanating from it
    then it’s probably an MSB

SMI used similar rules for the constraint-satisfaction component of its system, allowing information to propagate through the network in a manner similar to Waltz’s (1975) well-known constraint-satisfaction algorithm for edge labeling.

The goal of the SRI group was to induce a knowledge base to characterize and identify types of site such as command posts and battle positions. Its approach was to induce a Bayesian classifier and use a generative model approach, producing a Bayesian network that could serve as a knowledge base. This modeling required transforming raw vehicle tracks into features (for example, the frequency of certain vehicles at sites, number of stops) that could be used to predict sites. Thus, it was also necessary to have hypothetical sites to test. SRI relied on SMI to provide hypothetical sites, and it also used some of the features that SMI computed. As a classifier, SRI used tree-augmented naive (TAN) Bayes (Friedman, Geiger, and Goldszmidt 1997).

Workarounds: The SAIC team integrated two approaches to workaround generation, one developed by USC-ISI, the other by GMU.

ISI developed course-of-action–generation problem solvers to create alternative solutions to workaround problems. In fact, two alternative course-of-action–generation problem solvers were developed. One is a novel planner whose knowledge base is represented in the ontologies, including its operators, state descriptions, and problem-specific information. It uses a novel partial-match capability developed in LOOM (MacGregor 1991). The other is based on a state-of-the-art planner (Veloso et al. 1995). Each solution lists several engineering actions for this workaround (for example, deslope the banks of the river, install a temporary bridge), includes information about the sources used (for example, what kind of earthmoving equipment or bridge is used), and asserts temporal constraints among the individual actions to indicate which can be executed in parallel. A temporal estimation-assessment problem solver evaluates each of the alternatives and selects one as the most likely choice for an enemy workaround. This problem solver was developed in EXPECT (Swartout and Gil 1995; Gil 1994).

Several general battlespace ontologies (for example, military units, vehicles), anchored on the HPKB upper ontology, were used and augmented with ontologies needed to reason about workarounds (for example, engineering equipment). Besides these ontologies, the knowledge bases used included a number of problem-solving methods to represent knowledge about how to solve the task. Both ontologies and problem-solving knowledge were used by two main problem solvers.

EXPECT’s knowledge-acquisition tools were used throughout the evaluation to detect missing knowledge. EXPECT uses problem-solving knowledge and ontologies to analyze which information is needed to solve the task. This capability allows EXPECT to alert a user when there is missing knowledge about a problem (for example, unspecified bridge lengths) or a situation. It also helps debug and refine ontologies by detecting missing axioms and overgeneral definitions.

GMU developed the DISCIPLE98 system. DISCIPLE is an apprenticeship multistrategy learning system that learns from examples, from explanations, and by analogy and can be taught by an expert how to perform domain-specific tasks through examples and explanations in a way that resembles how experts teach apprentices (Tecuci 1998). For the workaround domain, DISCIPLE was extended into a baseline-integrated system that creates an ontology by acquiring concepts from a domain expert or importing them (through OKBC) from shared ontologies. It learns task-decomposition rules from a domain expert and uses this knowledge to solve workaround problems through hierarchical nonlinear planning.
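
The idea of task-decomposition rules applied by a hierarchical planner can be illustrated with a small sketch. The Python below is not DISCIPLE's implementation: the rule format, the task names (loosely adapted from the bridge and bank examples above), and the state fields `gap_m` and `bridge_span_m` are all assumptions made for illustration. It also produces a totally ordered plan, whereas a nonlinear planner would keep the actions only partially ordered.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]


@dataclass
class DecompositionRule:
    """Hypothetical rule: how to split one task into ordered subtasks."""
    task: str                         # task the rule decomposes
    applies: Callable[[State], bool]  # condition under which the rule may be used
    subtasks: List[str]               # subtasks, in execution order


# Illustrative rules only; the names and conditions are not from the article.
RULES = [
    DecompositionRule(
        "cross-gap",
        lambda s: s["gap_m"] <= s["bridge_span_m"],
        ["install-temporary-bridge"],
    ),
    DecompositionRule(
        "cross-gap",
        lambda s: s["gap_m"] > s["bridge_span_m"],
        ["deslope-banks", "ford-river"],
    ),
    DecompositionRule(
        "install-temporary-bridge",
        lambda s: True,
        ["move-bridge-to-site", "emplace-bridge"],
    ),
]


def decompose(task: str, state: State) -> List[str]:
    """Expand a task into primitive actions using the first applicable rule."""
    candidates = [r for r in RULES if r.task == task]
    if not candidates:
        return [task]  # no rule mentions this task, so treat it as primitive
    for rule in candidates:
        if rule.applies(state):
            plan: List[str] = []
            for sub in rule.subtasks:
                plan.extend(decompose(sub, state))
            return plan
    raise ValueError(f"no applicable decomposition for {task!r}")


if __name__ == "__main__":
    print(decompose("cross-gap", {"gap_m": 20.0, "bridge_span_m": 30.0}))
    print(decompose("cross-gap", {"gap_m": 40.0, "bridge_span_m": 30.0}))
```

Keeping the condition (`applies`) separate from the subtask list reflects the distinction between a decomposition and the situation in which that decomposition is appropriate.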
First, with DISCIPLE’s ontology-building tools, a domain expert assisted by a knowledge engineer built the object ontology from several sources, including expert’s manuals, Alphatech’s FIRE&ISE document and ISI’s LOOM ontology. Second, a task taxonomy was defined by refining the task taxonomy provided by Alphatech. This taxonomy indicates principled decompositions of generic workaround tasks into subtasks but does not indicate the conditions under which such decompositions should be performed. Third, the examples of hierarchical workaround plans provided by Alphatech were used to teach DISCIPLE. Each such plan provided DISCIPLE with specific examples of decompositions of tasks into subtasks, and the expert guided DISCIPLE to “understand” why each task decomposition was appropriate in a particular situation. From these examples and the explanations of why they are appropriate in the given situations, DISCIPLE learned general task-decomposition rules. After a knowledge base consisting of an object ontology and task-decomposition rules was built, the hierarchical nonlinear planner of DISCIPLE was used to automatically generate workaround plans for new workaround problems.

Evaluation

The SAIC and Teknowledge integrated systems for crisis management, movement analysis, and workarounds were tested in an extensive study in June 1998. The study followed a two-phase, test-retest schedule. In the first phase, the systems were tested on problems similar to those used for system development, but in the second, the problems required significant modifications to the systems. Within each phase, the systems were tested and retested on the same problems. The test at the beginning of each phase established a baseline level of performance, but the test at the end measured improvement during the phase.

We claim that HPKB technology facilitates rapid modification of knowledge-based systems. This claim was tested in both phases of the experiment because each phase allows time to improve performance on test problems. Phase 2 provides a more stringent test: Only some of the phase 2 problems can be solved by the phase 1 systems, so the systems were expected to perform poorly in the test at the beginning of phase 2. The improvement in performance on these problems during phase 2 is a direct measure of how well HPKB technology facilitates knowledge capture, representation, merging, and modification.

Each challenge problem is evaluated by different metrics. The test items for crisis management were questions, and the test was similar to an exam. Overall competence is a function of the number of questions answered correctly, but the crisis-management systems are also expected to “show their work” and provide justifications (including sources) for their answers. Examples of questions, answers, and justifications for crisis management are shown in the section entitled Crisis-Management Challenge Problem.

Performance metrics for the movement-analysis problem are related to recall and precision. The basic problem is to identify sites, vehicles, and purposes given vehicle track data; so, performance is a function of how many of these entities are correctly identified and how many incorrect identifications are made. In general, identifications can be marked down on three dimensions: First, the identified entity can be more or less like the actual entity; second, the location of the identified entity can be displaced from the actual entity’s true location; and third, the identification can be more or less timely.

The workaround problem involves generating workarounds to military actions such as bombing a bridge. Here, the criteria for successful performance include coverage (the generation of all workarounds generated), appropriateness (the generation of workarounds appropriate given the action), specificity (the exact implementation of the workaround), and accuracy of timing inferences (the length each step in the workaround takes to implement).

Performance evaluation, although essential, tells us relatively little about the HPKB integrated systems, still less about the component technologies. We also want to know why the systems perform well or poorly. Answering this question requires credit assignment because the systems comprise many technologies. We also want to gather evidence pertinent to several important, general claims. One claim is that HPKB facilitates rapid construction of knowledge-based systems because ontologies and knowledge bases can be reused. The challenge problems by design involve broad, relatively shallow knowledge in the case of crisis management and deep, fairly specific knowledge in the battlespace problems. It is unclear which kind of problem most favors the reuse claim and why. We are developing analytic models of reuse. Although the predictions of these models will not be directly tested in the first year’s evaluation, we will gather data to calibrate these models for a later evaluation.
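
As a rough illustration of the recall and precision measures mentioned above for movement analysis, the sketch below scores a set of site identifications against ground truth. The function name, the site identifiers, and the exact-match criterion are assumptions for illustration. The actual evaluation also marked identifications down for dissimilarity to the true entity, displaced location, and lack of timeliness, none of which is modeled here.

```python
from typing import Iterable, Set, Tuple

Identification = Tuple[str, str]  # (hypothetical site identifier, site type)


def precision_recall(predicted: Iterable[Identification],
                     actual: Iterable[Identification]) -> Tuple[float, float]:
    """Return (precision, recall) for exact-match identifications."""
    pred: Set[Identification] = set(predicted)
    act: Set[Identification] = set(actual)
    correct = len(pred & act)  # identifications that match ground truth exactly
    precision = correct / len(pred) if pred else 0.0  # share of claims that are right
    recall = correct / len(act) if act else 0.0       # share of true entities found
    return precision, recall


if __name__ == "__main__":
    # Made-up example data, not results from the evaluation.
    actual = [("s1", "command post"), ("s2", "battle position"),
              ("s3", "assembly area"), ("s4", "command post")]
    predicted = [("s1", "command post"), ("s2", "command post"),
                 ("s3", "assembly area")]
    p, r = precision_recall(predicted, actual)
    print(f"precision={p:.2f} recall={r:.2f}")
```

Precision falls when a system makes incorrect identifications, and recall falls when it misses entities, which are the two quantities the text names as determining performance.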
Results of the Challenge Problem Evaluation

We present the results of the crisis-management evaluation first, followed by the results of the battlespace evaluation.

Crisis Management

The evaluation of the SAIC and Teknowledge crisis-management systems involved 7 trials or batches of roughly 110 questions. Thus, more than 1500 answers were manually graded by the challenge problem developer, IET, and subject matter experts at PSR on criteria ranging from correctness to completeness of source material to the quality of the representation of the question. Each question in a batch was posed in English accompanied by the syntax of the corresponding parameterized question (figure 3). The crisis-management systems were supposed to translate these questions into an internal representation, MELD for the Teknowledge system and KIF for the SAIC system. The MELD translator was operational for all the trials; the KIF translator was used to a limited extent on later trials.

The first trial involved testing the systems on the sample questions that had been available for several months for training. The remaining trials implemented the “test and retest with scenario modification” strategy discussed earlier. The first batch of test questions, TQA, was repeated four days later as a retest; it was designated TQA’ for scoring purposes. The difference in scores between TQA and TQA’ represents improvements in the systems. After solving the questions in TQA’, the systems tackled a new set, TQB, designed to be “close to” TQA. The purpose of TQB was to check whether the improvements to the systems generalized to new questions. After a short break, a modification was introduced into the crisis-management scenario, and new fragments of knowledge about the scenario were released. Then, the cycle repeated: A new batch of questions, TQC, tested how well the systems coped with the scenario modification; then after four days, the systems were retested on the same questions, TQC’, and on the same day, a final batch, TQD, was released and answered.

Each question in a trial was scored according to several criteria, some official and others optional. The four official criteria were (1) the correctness of the answer, (2) the quality of the explanation of the answer, (3) the completeness and quality of cited sources, and (4) the quality of the representation of the question. The optional criteria included lay intelligibility of explanations, novelty of assumptions, quality of the presentation of the explanation, the automatic production by the system of a representation of the question, source novelty, and reconciliation of multiple sources. Each question could garner a score between 0 and 3 on each criterion, and the criteria were themselves weighted. Some questions had multiple parts, and the number of parts was a further weighting criterion. In retrospect, it might have been clearer to assign each question a percentage of the points available, thus standardizing all scores, but in the data that follow, scores are on an open-ended scale. Subject-matter experts were assisted with scoring the quality of knowledge representations when necessary.

A web-based form was developed for scoring, with clear instructions on how to assign scores. For example, on the correct-answer criterion, the subject-matter expert was instructed to award “zero points if no top-level answer is provided and you cannot infer an intended answer; one point for a wrong answer without any convincing arguments, or most required answer elements; two points for a partially correct answer; three points for a correct answer addressing most required elements.”

When you consider the difficulty of the task, both systems performed remarkably well. Scores on the sample questions were relatively high, which is not surprising because these questions had been available for training for several months (figure 5). It is also not surprising that scores on the first batch of test questions (TQA) were not high. It is gratifying, however, to see how scores improve steadily between test and retest (TQA and TQA’, TQC and TQC’) and that these gains are general: The scores on TQA’ and TQB and TQC’ and TQD were similar.

The scores designated auto in figure 5 refer to questions that were translated automatically from English into a formal representation. The Teknowledge system translated all questions automatically, the SAIC system very few. Initially, the Teknowledge team did not manipulate the resulting representations, but in later batches, they permitted themselves minor modifications. The effects of these can be seen in the differences between TQB and TQB-Auto, TQC and TQC-Auto, and TQD and TQD-Auto.

Although the scores of the Teknowledge and SAIC systems appear close in figure 5, differences between the systems appear in other views of the data. Figure 6 shows the performance of the systems on all official questions plus a few optional questions. Although these extra questions widen the gap between the systems, the real effect comes from adding optional components to the scores. Here,