ORIGINAL PAPER
Vol. 21 no. 7 2005, pages 1246–1256
doi:10.1093/bioinformatics/bti137
Databases and ontologies
ISYMOD: a knowledge warehouse for the identification, assembly and analysis of bacterial integrated systems
Julie Chabalier1,2,∗, Cécile Capponi1,2, Yves Quentin2 and Gwennaele Fichant3
1Laboratoire d’Informatique Fondamentale de Marseille, 39 rue Joliot Curie 13453 Marseille cedex 13, France, 2Laboratoire de Chimie Bactérienne, CNRS, 31 Chemin Joseph Aiguier, 13402 Marseille cedex 20, France and 3Laboratoire de Microbiologie et Génétique Moléculaires, UMR 5100 CNRS-Université Paul Sabatier, Toulouse Cedex, France
Received on April 14, 2004; revised on August 25, 2004; accepted on October 22, 2004
Advance Access publication November 5, 2004
ABSTRACT
Motivation: Complex biological functions emerge from interactions between proteins in stable supra-molecular assemblies and/or through transitory contacts. Most of the time protein partners of the assemblies are composed of one or several domains which exhibit different biochemical functions. Thus the study of cellular process requires the identification of different functional units and their integration in an interaction network; such complexes are referred to as integrated systems. In order to exploit with optimum efficiency the increased release of data, automated bioinformatics strategies are needed to identify, reconstruct and model such systems. For that purpose, we have developed a knowledge warehouse dedicated to the representation and acquisition of bacterial integrated systems involved in the exchange of the bacterial cell with its environment.
Results: ISYMOD is a knowledge warehouse that consistently integrates in the same environment the data and the methods used for their acquisition. This is achieved through the construction of (1) a domain knowledge base (DKB) devoted to the storage of the knowledge about the systems, their functional specificities, their partners and how they are related and (2) a methodological knowledge base (MKB) which depicts the task layout used to identify and reconstruct functional integrated systems. Instantiation of the DKB is obtained by solving the tasks of the MKB, whereas some tasks need instances of the DKB to be solved. AROM, an object-based knowledge representation system, has been used to design the DKB, and its task manager, AROMTasks, for developing the MKB. In this study two integrated systems, ABC transporters and two component systems, both involved in adaptation processes of a bacterial cell to its biotope, have been used to evaluate the feasibility of the approach.
Contact: julie.chabalier@ibsm.cnrs-mrs.fr
∗To whom correspondence should be addressed.
1 INTRODUCTION
With the increasing amount of available genomic data it has become clear that computational approaches are needed to supervise the methods used to exploit them.
Different approaches have been developed in order to address the various problems of safely storing, sharing and exploiting biological data. Ontologies, such as Gene Ontology (Gene Ontology Consortium, 2004) or MetaCyc (Karp et al., 2004; Krieger et al., 2004) have been developed for setting up common terminologies, hence facilitating the sharing and the mapping of biological annotations. Some ontologies are the backbone of knowledge bases and databases. Besides, numerous knowledge bases and data warehouses have been developed, each usually focusing on a specific biological domain, such as RiboWeb (Altman et al., 1999), dedicated to the modeling of the supra-molecular organization of the ribosome, or GemCore (Bronner et al., 2002) devoted to knowledge representation in comparative genomics. Meanwhile, some recent projects aim at structuring and organizing bioinformatics resources (data and programs) distributed over the web (Stevens et al., 2003). Finally, many approaches deal with data warehouses: the data is locally imported from a distant server, then local methods are applied to predict new data from available ones (Perrière et al., 2000).
These various approaches usually require the modeling, and the parsable (consistent) representation, of the numerous underlying objects and the way they interact (biological concepts, bioinformatics methods, problems to be solved, etc.). Once the consistency of the modeling is ensured, automatic computational methods can safely manipulate the described data. Usually, such a consistency is ensured by a knowledge representation system (KRS) built over a knowledge representation language. However, in the case of data warehouses, the consistency cannot always be automatically ensured: as programs are executed daily to produce new data, it gets more and more difficult to guarantee the correctness of the whole database because the host representation system is usually not able to control all the stages of the computational processes.
We present here the development of an operational environment, named Integrated SYstem MODeling (ISYMOD), which is a type of data warehouse whose originality is to be built over a KRS ensuring the consistency of computed and represented data: one could name it a knowledge warehouse (Nemati et al., 2002). It is built over AROM, a KRS which features two connected representation languages: one for representing the knowledge of the domain, and the other for representing the strategies used to infer and store new information from available data. Hence, data and analyzing methods are merged within the same (local) environment, which facilitates their interaction: the former are produced by the latter, while the latter are tuned by the former. AROM controls the consistency of the whole. More precisely, ISYMOD is a knowledge warehouse (KW)
© The Author 2004. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oupjournals.org
Downloaded from http://bioinformatics.oxfordjournals.org/ at Tianjin University Library on October 29, 2011Aknowledgewarehouseforintegratedsystems
which integrates, in the same environment, a domain knowledge base (DKB) and a methodological knowledge base (MKB). Both knowledge bases are extensible, which means that new concepts and new methods can be added at any moment, either to complete the knowledge about integrated systems, or to tune some new strategies and algorithms for identifying them. The current version of ISYMOD is devoted to the representation and acquisition of bacterial integrated systems that are involved in the exchanges between the bacterial cell and its environment. These systems are particularly interesting since they are a driving force in evolution, allowing the adaptation of the bacteria to a large spectrum of biotopes, some of them having environmental, industrial or public health implications. The first systems to be instantiated were ABC transporters and two component systems (TCS). They can share functional links as demonstrated by experimental studies (Joseph et al., 2002 and references therein).
The paper is organized as follows. First, the main characteristics of biological integrated systems are introduced, together with a strategy to target them on a new genome. Then the KRS AROM is presented, which features an object-based kernel and a task manager. We can then describe ISYMOD, which models some integrated systems as well as the strategies to identify and reconstruct them, and eventually stores experimental and predicted data. Before a discussion of that work among others, an operational view of ISYMOD is briefly given.
2 BIOLOGICAL INTEGRATED SYSTEMS
2.1 Definitions
Complex biological functions emerge from interactions between proteins in stable supra-molecular assemblies and/or through transitory contacts. Such emerging functions are most of the time different from the function of the individual proteins involved in the interactions. Thus, if we want to get a systemic view of the organism under study and understand the evolution of biological processes, the question of the prediction and the assembling of individual proteins in such complexes, referred to as integrated systems, should be addressed.
The integrated systems that are involved in the exchanges between the bacterial cell and its environment are particularly interesting for biological reasons and also for illustrating the computational challenges we are faced with. Such systems are important for the adaptation of the bacteria to its media and the genomic comparative analysis of their repertories should help to understand the molecular mechanisms that are involved in the adaptation processes of bacterial genomes. Two types of integrated systems have been analyzed: ABC transporters and TCS.
The ABC transporters, or traffic ATPases, turn up in the three major life kingdoms (Prokaryota, Archaea and Eukaryota) and are involved in many physiological processes. Most ABC transporters mediate the active uptake or efflux of specific molecules across biological membranes, handling a wide variety of compounds that differ in nature and size (oligosaccharides, amino acids, peptides, antibiotics, metallic cations, etc.) (reviewed by Holland and Blight, 1999; Higgins, 2001). They are encoded by large families of paralogous genes and can be arranged in a comprehensive classification well correlated with specificity of transport for the substrate (Linton and Higgins, 1998; Paulsen et al., 1998; Taglicht and Michaelis, 1998; Tomii and Kanehisa, 1998; Quentin et al., 1999; Saurin et al., 1999; Braibant et al., 2000; Dassa and Bouiges, 2001). A typical ABC transporter, either exporter or importer, consists of two membrane-spanning domains (MSDs) and two nucleotide-binding domains (NBDs) (Fig. 1). The import systems are associated with a solute-binding protein (SBP). In bacteria, different genes generally encode the different domains, and in newly sequenced genomes only genes encoding NBDs are correctly annotated. Indeed, among these domains, only the NBDs exhibit much sequence conservation. The MSDs and SBPs in contrast show only fuzzy global sequence conservation.
Fig. 1. Typical ABC import system. Five proteins interact to import the substrate into the cell. The MSDs constitute the membrane channel and the NBDs energize the transport through ATP hydrolysis. The SBP confers specificity for compounds to the transporter. In the case of the export system, the SBP is absent.
The TCS are regulatory systems involved in the detection and the transduction of specific signals that trigger the bacteria to an adaptive process (Parkinson and Kofoid, 1992). These systems are usually composed of a sensor kinase that is able to detect one or several environmental stimuli and which phosphorylates a response regulator, which in turn activates expression of genes necessary for the appropriate physiological response. These two partners each contain two domains: (1) an input domain and a transmitter domain for the sensor kinase and (2) a receiver domain and an output domain for the response regulator. The sensor input domain varies in amino acid sequence, conferring specificity for different stimuli. The output domain of the response regulator can be classified into different subfamilies (Mizuno, 1997; Fabret et al., 1999; Beier and Frank, 2000; Throup et al., 2000; Rodrigue et al., 2000). However, more complex structures can be found and some systems use extra isolated phospho-relay domains (HPT domains) that are encoded by separate genes.
Both systems share common characteristics: (1) they are composed of different domains encoded, in bacteria, by separate genes, some of those partner proteins having only fuzzy sequence conservation, (2) from the computational point of view, they are encoded by two families of paralogous genes which are among the most numerous in bacterial genomes, and show the same identification and reconstruction problems and (3) many experimental studies have shown that they are related at the functional level, allowing an extension of the modeling to a higher level (Joseph et al., 2002 and references therein). Indeed, upon stimulus detection, the response regulator is activated through a cascade of phosphorylations and activates in turn the expression of the genes encoding the different partners of the ABC transporter.
2.2 Identification, reconstruction and storage
In order to establish in a complete genome the repertory of a given integrated system, we have to go farther than the first level of genome
annotation (gene and functional predictions). This higher level of annotation includes the following steps: (1) identifying the different partners using different bioinformatics methods according to their sequence properties (sequence conservation levels and structural characteristics), (2) reconstructing the systems using assembly rules and (3) classifying the system into the correct functional subfamily. Information on the interaction pathway is not directly accessible from the analysis of the complete sequence. However, this knowledge can be inferred either by the analysis of the genomic localization of the genes encoding such systems or through phylogenetic inferences drawn from multiple genomic comparisons. Indeed, functionally related genes are often found in the same chromosomal neighborhood and, when they are dispersed along the chromosome, homology relationships may help to reassemble partners (Quentin et al., 2002). Therefore, complex computational approaches are needed to handle the analysis and modeling of integrated systems and the analyzing methods have to be combined in specific ways referred to as bioinformatics strategies.
As a first step in this direction, we have implemented a general automated strategy (Quentin et al., 2002). Its validation has been done first on the ABC transporters and then extended to the two component systems. It relies upon a learning step for computing the parameters of the methods involved in the identification process. These parameters are of two types, motifs and profiles, since the different bioinformatics methods used are based either on similarity searches (Blast, PsiBlast, Hmmer), or motif identification (MetaMeme, Mast). They were first derived from a set of ABC transporters and two component systems that we had annotated by analyzing the set of proteins encoded by 20 bacterial genomes. For analyzing a new incoming genome, the different bioinformatics methods are launched simultaneously on the encoded protein sequences and their results are combined. The next step is the validation of the prediction. The methods were used with a high sensitivity level in order to maximize the number of true positives. The drawback resides in a high number of false positives. In order to reduce this, we applied the BlastP2 program as follows: each predicted protein was used as query against a dataset composed of all protein sequences encoded by previously processed genomes. Queries for which the first hits did not belong to the integrated system partner sub-families are considered as false positives. We refer to this checking procedure as BackBlast. The remaining proteins are then considered as validated system partners and are classified into functional sub-families, divided into functional domains and assembled into a biological integrated system using two rules: (1) the genes encoding the different system partners are closely located on the chromosome and (2) the partners belong to compatible sub-families (Quentin et al., 1999). The output data have been stored in a specialized database, ABCdb (Quentin and Fichant, 2000).
The strategies developed use rules and parameters updated with the incoming data. The analysis mechanism stores the data obtained, then reuses this to reevaluate the method’s parameters and thereby launches more accurate analysis of new datasets. Thus predictions made in initial states can be updated in the upcoming runs with the possibility of incorrectly propagating the modifications in the complex network of dependencies. This pitfall can be avoided if the data flow between the database and the analysis mechanism is controlled all along the strategies. This control can be fully satisfied only if the data and the bioinformatics methods, which are involved in the strategies, are formally defined. In addition, if we can easily store the individual proteins involved in an integrated system in a database, representing complex relationships among them requires more sophisticated data-processing and semantic tools.
AROM, an object-based knowledge representation system (OBKRS), fits these objectives in several ways: (1) it allows a formal and explicit representation of the integrated system’s objects and their relationships as classes and n-ary instantiable associations, (2) its internal classification mechanism, added with a propagation algorithm, ensures automatic management of the evolving knowledge through the recursive process of identifying partners and reconstructing assemblies, (3) the current integration of a task manager, AROMTasks, provides a declarative way to represent and store bioinformatics strategies and (4) the formal modeling of biological knowledge and sequence analysis tools in the same environment ensures an automatic control of the data flow.
The modeling and storage of the knowledge is achieved through the development of a DKB and the modeling of the methods through the development of an MKB, both being tightly connected.
3 KNOWLEDGE REPRESENTATION IN AROM
3.1 AROM’s kernel and classification
In AROM as well as in other OBKRSs, a concept denotes a set of objects with common properties. Therefore, biological entities are modeled by classes, while their properties are named and specified as typed variables.
Let us take ABC transporters as an illustration of an integrated system (Fig. 1). Three biological entities emerge: domain, protein and assembly. The domain (NBD, MSD or SBP) carries the biochemical function and is a part of a protein sequence. The protein is the physical entity composed of one or many domains. The assembly corresponds to the biological reconstructed system. Its structure results from protein interactions and its function is issued from the combination of domains. These three biological entities can be represented by three classes (Fig. 2a). Each class is described by a set of variables corresponding to object features such as the number of proteins involved in an assembly (variable partnersNb of the class Assembly), the organization of the domains found in a protein (variable StructuralOrganization of the class Protein), or the domain’s type (variable type of the class Domain) (Fig. 2a).
The relationships between n classes (n ≥ 2) are represented, following the UML terminology, by n-ary associations through roles specified by a name and typed by the class involved in the relationships. Named and typed variables can then complete the description of an association. In Fig. 2a, the three classes Domain, Protein and Assembly are linked through the ternary association HasDomain which is described by three variables: begin, end and domainFamily that store the first and last positions of the domain on the protein, and its sub-family.
Objects are instances of a class and tuples are instances of associations. A tuple of an n-ary association is the (n + p)-tuple made of a link involving n partners (objects) and described by p variables. Therefore, a link represents an existing relationship between objects each playing one or more identified roles in the link. Examples of objects and tuples in AROM’s language are given in Fig. 2b. They correspond to the instantiation of the three classes and the association described in Fig. 2a.
Once typed, a variable can be further characterized using different kinds of descriptors. For example, domain descriptors restrict the
domain of values that one can assign to the considered variable. Inference descriptors are either programs, or algebraic equations written using an algebraic modeling language (AML), that allows the expression of equations involving objects and tuples of the KB through a formalism resembling mathematical notation. An inference descriptor is attached to a variable for computing its value for a given instance. Such an inference may involve not only other variables of the class, but also variables of another class associated with the former through an association.
Then, classes and associations provide necessary conditions for membership: an instance can belong to a class (an association) if each variable of the class (association) is assigned a correct value according to the model.
With AROM, classes are organized within a tree-like partial order named specialization, whose set-based semantics is close to that of subsumption. A class inherits all variables (including descriptors) of its super-class; class specialization can also involve the addition of new variables, the restriction of the domain of an inherited variable, and the specification of a new inference method to compute the value of an inherited variable. Associations can be specialized in the same way, although the arity of an association never changes. The ISYMOD DKB developed in AROM language is described in the paragraph presenting the DKB (Section 4.1).
AROM promotes an internal classification mechanism (Capponi and Gensel, 2000) that, given an object and a tree of classes, finds the most specialized class the object can belong to according to variable specifications. Classification also runs over tuples of the association hierarchy. In addition, we recently added an algorithm that recursively propagates the classification to all related objects. This algorithm is based on the association properties of AROM which draw controlled paths among objects (Chabalier et al., 2003). More details on AROM can be found in Page et al. (2001) and Capponi et al. (2001).
Fig. 2. (a) UML-like graphical representation of integrated systems. The three principal biological entities involved in the integrated systems are represented by three root classes depicted by dark gray rectangles: Assembly, Domain and Protein. These classes, described by typed variables, are linked by the ternary association HasDomain (round corner rectangle), described by typed variables. Roles and multiplicities of the association are specified on the black lines. The Assembly class is specialized into the ABC_Assembly sub-class (white rectangle) that represents the ABC transporters. The relation of specialization is depicted by an arrow from sub-class toward the root class. (b) Illustration of instantiation of the domain knowledge base in AROM’s language. The protein BsubA01_OPUBA is an instance of the class Protein. Its function is involved in choline uptake (variable identification) and it carries a domain NBD (variable structuralOrganization). It is linked to the instance BSUBA01_OPUBA of the class ABC_Assembly and to the instance BsubA01_OPUBA_N1 of the class Domain through the tuple of the ternary association HasDomain. This association is described by three attributes: beginPosition and endPosition giving the domain positions on the protein sequence, and domainFamily corresponding to the functional classification of the domain. Assembly, protein and domain correspond to the three roles whose instantiation creates the link between the three objects BsubA01_OPUBA, BSUBA01_OPUBA and BsubA01_OPUBA_N1 belonging to one of the three classes connected through the association. peptideLength and testSBP are attributes whose values are computed using an inference descriptor: for example, the length of the peptide is computed from its position on the chromosome, which is retrieved through the association BelongsTo (data not shown, Fig. 4).
3.2 AROMTasks: the task manager of AROM
The methodological knowledge is the knowledge that produces and exploits the knowledge of the domain under study. The goal of developing an MKB consists of structuring the methodological knowledge in order to select the most appropriate methods for solving one previously identified and described problem within an evolving context represented by the DKB. It allows the user to select and/or control the executable process.
AROM’s task manager (AROMTasks) provides a language for describing problems, and an execution controller which manages their resolution. In order to construct an MKB, we have to: (1) identify and describe the different problems encountered in a given domain (in our case the identification and reconstruction of biological integrated systems), (2) define the solving method(s) associated with each problem and (3) associate problems and solving methods through a solving strategy.
3.2.1 Describing problems. In AROMTasks, a class of problems allows a parsable description of a set of similar problems that can be encountered in a specific domain modeled into the connected DKB (Parmentier and Ziébelin, 1999). Therefore, a class of problem (named a problem) is described by a list of inputs and outputs which can be typed by classes and/or associations of the DKB. This list may be completed with a textual description of the problem. Problems are organized through two kinds of relationships: an is-a relationship and a part-of relationship.
• The is-a relationship corresponds to the specialization of problems. Thus, a sub-problem is a particular case of the more general problem, also called super-problem; it inherits all inputs and outputs of its super-problem and can also be described by
Fig. 3. Textual extract of an MKB. Description of the PredByPsiBlast problem, one of its corresponding solving methods, and a strategy.
new specific inputs. The problem specialization facilitates the description of any new problem that has already been partially described in a more general context. At execution time, when a problem must be solved, a more specific problem is eventually selected and solved according to the state of the DKB and the availability of the inputs of the problem.
• More complex problems can be encountered, which can be solved only by the combination of other ones. The part-of relationship allows a problem, named composite, to be subdivided into several other problems, called components. Therefore, all along a composition hierarchy, a problem can be either a composite or a component depending on the subdividing level. The inputs/outputs of a composite problem are included in the set of the inputs/outputs of all its component problems. When the resolution is launched, additional arguments, named temporary input/output, can be involved in order to ensure the data flow between the different component problems.
An elementary problem is a problem that can be neither further subdivided nor specialized. Usually, one or more executable methods are associated with each elementary problem. An example of a problem description in AROMTasks is given in Fig. 3. The consistency of the problem decompositions along these two kinds of hierarchies is checked by AROMTasks with regard to its underlying language semantics. Moreover, at execution time, the data flow is automatically managed by a module named the execution controller which dynamically selects the most appropriate problem to be solved at any breakpoint, and which integrates its results in a consistent way. If the result of a problem is a modification of the DKB, the consistency checking is passed to AROM’s kernel.
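The two relationships between problems can be pictured with a small sketch. The Python below is not AROMTasks syntax (an actual extract is given in Fig. 3): the class `Problem`, its attributes, the input/output labels and the functions `select_most_specific` and `composite_is_consistent` are hypothetical names chosen for illustration, and the selection rule (descend into a sub-problem as soon as all of its inputs are available) is an assumed simplification of what the execution controller does.

```python
class Problem:
    """Hypothetical stand-in for an AROMTasks problem (illustration only)."""

    def __init__(self, name, inputs=(), outputs=(), parent=None):
        self.name = name
        self.own_inputs = set(inputs)
        self.own_outputs = set(outputs)
        self.parent = parent        # is-a link to the super-problem
        self.specializations = []   # is-a children (sub-problems)
        self.components = []        # part-of children

    def specialize(self, name, extra_inputs=()):
        """Create a sub-problem; it inherits everything and may add inputs."""
        sub = Problem(name, extra_inputs, parent=self)
        self.specializations.append(sub)
        return sub

    def inputs(self):
        """Own inputs plus those inherited along the is-a chain."""
        inherited = self.parent.inputs() if self.parent else set()
        return inherited | self.own_inputs

    def outputs(self):
        """Own outputs plus those inherited along the is-a chain."""
        inherited = self.parent.outputs() if self.parent else set()
        return inherited | self.own_outputs

    def is_elementary(self):
        """Elementary: neither subdivided (part-of) nor specialized (is-a)."""
        return not self.components and not self.specializations


def select_most_specific(problem, available):
    """Assumed rule: descend while a sub-problem has all its inputs available."""
    for sub in problem.specializations:
        if sub.inputs() <= set(available):
            return select_most_specific(sub, available)
    return problem


def composite_is_consistent(composite):
    """Composite inputs/outputs must lie within those of its components."""
    union = set()
    for component in composite.components:
        union |= component.inputs() | component.outputs()
    return (composite.inputs() | composite.outputs()) <= union


# is-a hierarchy (input labels are invented for the example)
prediction = Problem("Prediction", {"sequences"}, {"predictions"})
by_profile = prediction.specialize("PredByProfile", {"profiles"})
by_psiblast = by_profile.specialize("PredByPsiBlast", {"psiblast_profiles"})

# part-of hierarchy
checking = Problem("PredictionChecking", {"predictions"}, {"checked"})
protein_identification = Problem("ProteinIdentification", {"sequences"}, {"checked"})
protein_identification.components = [prediction, checking]
```

In this sketch a sub-problem is chosen only from the inputs that are present, whereas the text also mentions the state of the DKB as a selection criterion; that part is deliberately left out.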
3.2.2 Programming methods to solve problems. Each problem (e.g. sequence alignment) may be associated with several algorithms (e.g. BlastP, Fasta, SmithWaterman, etc.): then each algorithm is described in AROMTasks by a solving executable method. Thus, a method describes the way a class of problems can be solved. It also associates a set of inputs to a set of outputs. However, unlike a problem, a method contains an executable part which specifies how the outputs are computed from the inputs. The executable part contains instructions written in a Java language interpreter (Bean-Shell) (see Fig. 3 for an example of a method’s definition). In order to reuse the methods through different applications and to avoid an overload of the MKB, each method definition relies on an external program library written in Java.
3.2.3 Building and describing a strategy. Finally, a strategy depicts the association of a problem to its solving method(s) (Fig. 3). It is described by the problem name, the set of solving methods and a set of criteria guiding the choice among the different possible methods. A method can be associated with a problem in a strategy if a strict (one-to-one) mapping can be established between the method inputs and outputs and the problem inputs and outputs. At execution time, the right method for solving a problem is selected according to the state and class membership of the actual input provided at that time. The order in which the problems have to be solved depends on the inputs and outputs of the component problems and constitutes the data flow. The ISYMOD bioinformatics strategy modeled with AROMTasks is fully explained in the MKB paragraph (Section 4.2).
4 ISYMOD SCHEME
Ideally the DKB and MKB should be presented in parallel since they are deeply interconnected. However, for clarity, we will first present the DKB which models the biological knowledge and next the MKB used to model the methods employed to infer this knowledge from the raw data. We will show how the outputs of the methods launched in the MKB are used to feed the classes and associations of the DKB and how the data flow is handled in the MKB.
4.1 Schema of the domain knowledge base
Currently ISYMOD models two types of integrated systems, ABC transporters and TCSs, as well as the results of the methods used for their prediction. The different concepts identified for the modeling of this specific biological domain and the relationships between them are represented by root classes and root associations. Properties of concepts and relationships are represented by variables of the corresponding class and association, respectively.
The complete schema of the DKB (Fig. 4) includes 12 root classes, some of which are specialized into more characterized sub-classes. The root classes are linked through 13 root associations, which are specialized in turn along with the class hierarchies. The schema can be subdivided into three distinct parts, which are connected through the root class Protein. The first part contains general information relative to the primary source of data such as the chromosome, the strain, the organism and the references, which are retrieved from EMBL/GenBank files (see next section). The second part is dedicated to the storage of the prediction results, in order to keep track of the analyses performed on each protein. For that purpose, a class Prediction, connected to the class Protein, is specialized according to the type of the approaches used, that are based either on similarity searches (SimilarityPredSubclass) or motif identification (MotifPredSubclass). The deepest classes in the hierarchy correspond to the bioinformatics methods. Another class, CheckingOfPrediction, encloses the result of a method applied to identify the false positives. This knowledge is obtained by solving the problem of partner’s identification modeled in the MKB; it provides the necessary knowledge to instantiate the third part of the DKB that handles the main concepts and covers the knowledge representation of the integrated systems. This last part is centered on the three major biological entities corresponding to the three root classes, Protein, Domain and Assembly, linked by the root association HasDomain, as explained previously (Fig. 2). The functional classification is depicted by the class Subfamily, while the association IsMemberOf connects the integrated system to its current subfamily. While the top classes can model any system, the class and association hierarchies are system-dependent, in order to express more precisely the specific features relative to each system. As an example, the class Assembly is specialized into ABC_Assembly and TCS_Assembly, the former being again specialized into ImportABC and ExportABC according to the type of transport. Some classes are specific to one type of integrated system. Namely, the class Stimulus is only related to TCS_Assembly, while Compound only concerns ABC_Assembly.
4.2 Schema of the methodological knowledge base
The bioinformatics approach described in Section 2.2 is declaratively modeled as a problem layout using AROMTasks. The complete description of the modeling would be too tedious and only the most critical aspects are developed. The whole bioinformatics strategy is available in Quentin et al. (2002). The identification and reconstruction of integrated systems can be organized and solved in four major steps: primary data retrieval, partner identification, system reconstruction and learning update. It has been modeled through a composite problem AnalysisOfIntegratedSystem decomposed into four component problems: DataRetrieval, PartnerIdentification, SystemReconstruction and Learning (Fig. 5).
• Primary data retrieval. The primary sources of data are the EMBL or GenBank files provided by the authors of the genome annotation. When a newly sequenced genome is available in the databanks (EMBL or GenBank), it is automatically retrieved and parsed to extract and format the required data. Therefore, two component problems must be solved: the FileRetrieval and the FileFormatting. The inputs of the FileRetrieval problem are the URL of the genome repository and the list of already processed genomes, in order to retrieve only the new genome files. For each new genome, this file is the input of the problem FileFormatting, which extracts: (1) the protein sequences annotated in the selected genome and (2) the information contained in the EMBL/GenBank file concerning the organism, strain, chromosome, etc. Its outputs are a protein sequence file in Fasta format, and instances of the classes Organism, Strain, Chms, and References as well as instances of the associations linking these classes. The Protein class is also partially instantiated. Indeed, this first step of the solving execution extracts all the proteins encoded by a genome without knowing if they belong to an integrated system. Therefore, for each instance of the class Protein, only the name, description and orientation variables are valued.
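The two component problems of the data retrieval step can be pictured with the following minimal Python sketch. It is an illustration only, not ISYMOD code: the function names, the in-memory record layout (dictionaries with `name`, `description` and `sequence` keys) and the 60-column line width are assumptions, and the actual download and EMBL/GenBank parsing are omitted.

```python
def select_new_genomes(repository_files, processed):
    """FileRetrieval, simplified: keep only the genome files not yet processed."""
    done = set(processed)
    return [name for name in repository_files if name not in done]


def to_fasta(proteins, width=60):
    """FileFormatting, simplified: render protein records as Fasta text."""
    lines = []
    for protein in proteins:
        # Header line: identifier followed by a free-text description.
        lines.append(f">{protein['name']} {protein['description']}")
        sequence = protein["sequence"]
        # Wrap the sequence into fixed-width lines.
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"
```

Passing the list of already processed genomes as an explicit argument mirrors the inputs listed for FileRetrieval in the text; how that list is actually stored is not specified there.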
Fig. 4. UML-like graphical representation of the DKB. The notations are the same as those used in Fig. 3. The modeling of the DKB can be subdivided into three parts connected by the Protein class. The first part, related to the primary source of data, is depicted at the bottom left of the schema. The second part (bottom right) represents the prediction results according to the method used. The core part of the DKB is represented on the top of the schema. Root classes and associations are represented by shaded boxes.
• Partner identification. The identification of all the system’s partners is the major problem encountered. It is solved by the problem PartnerIdentification where the input is the protein sequence file created as output of the DataRetrieval problem, and outputs are instances of the major root classes and associations of the DKB (Protein, Domain, HasDomain), as well as instances of classes and associations dedicated to the prediction storage.
This high level composite problem can be subdivided into two major component problems: the identification of the proteins involved in an integrated system (ProteinIdentification) and the identification of the domains themselves (DomainIdentification), including bound prediction (DomainPrediction) and feature annotation such as the number and location of transmembrane fragments (FeatureAnnotation).
The problem ProteinIdentification requires the solving of two other component problems: Prediction, which identifies the proteins potentially involved in the integrated systems, and PredictionChecking, which confirms or rejects the prediction. The inputs of the problem Prediction are the protein sequence files and the parameters required for applying the corresponding bioinformatics program (e.g. motif, profiles, etc.). Its outputs are objects of the Prediction class and tuples of the IsPredictedBy association that link each protein to its prediction result. As the identification of the protein partner of a system can be achieved by different bioinformatics programs, either based on similarity searches or motif detection, the Prediction problem is specialized into a hierarchy of sub-problems according to the properties of the associated algorithms. The selection of the appropriate sub-problem involves a processing that is performed on the dynamic type of the input data. For example, if the input file of the problem Prediction actually contains profiles, then the sub-problem PredByProfile is selected. Processing down the tree of problems, if the profiles are PsiBlast profiles, then the sub-problem PredByPsiBlast is selected. This latter being an elementary problem, it is solved by launching the UtilPsiBlast method, which runs the program PsiBlastP on sequences from the file given as input (see Fig. 3 for the details). The outputs of each sub-problem are instances of the corresponding sub-class of the DKB Prediction class and tuples of the association
Fig. 5. General modeling of the MKB. The different problems to be solved are organized either through a part-of relationship (downward arrows) or an is-a relationship (upward arrows). The problems represented in light gray correspond to the elementary problems. They are solved by launching an appropriate executable method that may encapsulate bioinformatics programs.
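The is-a arrows of Fig. 5, together with the selection of a sub-problem from the dynamic type of its input, can be illustrated by a short Python sketch. It is only a reading aid under stated assumptions and does not reproduce AROMTasks: the problem names (Prediction, PredByProfile, PredByPsiBlast) and the method name UtilPsiBlast come from the text, while the Problem class, its fields, the dictionary keys and the stand-in method body are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Problem:
    """A node of the problem tree; the fields are hypothetical."""
    name: str
    accepts: Callable[[dict], bool]  # test on the dynamic type of the input
    method: Optional[Callable[[dict], str]] = None  # set only for elementary problems
    specializations: List["Problem"] = field(default_factory=list)  # is-a children

    def select(self, data: dict) -> "Problem":
        """Walk down the is-a links to the most specific problem accepting `data`."""
        for child in self.specializations:
            if child.accepts(data):
                return child.select(data)
        return self

    def solve(self, data: dict) -> str:
        """Solve the selected problem by launching its method, if it is elementary."""
        target = self.select(data)
        if target.method is None:
            raise ValueError(f"{target.name} is not elementary for this input")
        return target.method(data)


def util_psiblast(data: dict) -> str:
    # Stand-in for the method that would wrap the external program.
    return f"PsiBlast run on {data['sequences']}"


pred_by_psiblast = Problem(
    "PredByPsiBlast",
    accepts=lambda d: d.get("profile_format") == "psiblast",
    method=util_psiblast,
)
pred_by_profile = Problem(
    "PredByProfile",
    accepts=lambda d: d.get("kind") == "profile",
    specializations=[pred_by_psiblast],
)
prediction = Problem(
    "Prediction",
    accepts=lambda d: True,
    specializations=[pred_by_profile],
)

request = {"kind": "profile", "profile_format": "psiblast", "sequences": "proteome.fasta"}
selected = prediction.select(request)  # most specific problem for this input
result = prediction.solve(request)     # launches the stand-in method
```

With this request the walk stops at PredByPsiBlast, the only elementary problem in the sketch. An input whose profile format is not recognized stops at PredByProfile, and solve then raises an error because no executable method is attached to that level.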
IsPredictedBy. Then, all the previously stored proteins which are not predicted as involved in an integrated system are deleted from the DKB.

The next step consists in the validation of the prediction by solving the problem PredictionChecking. It takes as input the identified proteins, which are retrieved by querying the DKB, and provides, as outputs, instances of the CheckingOfPrediction class. In addition, the variable computedStatus of the class Protein receives the value ‘confirmed’ if the prediction is validated, or ‘rejected’ if the protein appears to be a false positive. Presently, only one method has been implemented, based on the BackBlast procedure explained in Section 2.2.

In order to complete the solving of the PartnerIdentification problem, the annotation of protein domains should be performed. The DomainIdentification problem allows prediction of: (1) the boundaries of the domain on the protein sequence, (2) the functional type of the domain (e.g. MSD, NBD or SBP in the case of ABC transporters, see Fig. 1) and (3) its sub-family membership. These data are obtained by analyzing the results of the PsiBlast prediction. Therefore, the input of the sub-problem DomainIdentification is the set of instances of the sub-class PsiBlastPred that are linked to protein instances whose computedStatus variable has the value ‘confirmed’. Those instances are obtained by querying the DKB. The outputs are instances of the class Domain and of the association HasDomain.

Once the problem PartnerIdentification is solved, the sub-classes of Prediction are instantiated, as well as the root classes Protein, Domain and CheckingOfPrediction. As the integrated systems have not yet been reconstructed, the tuples of the HasDomain association lack the link with the Assembly class. They will be completed during the solving of the SystemReconstruction problem.

• System reconstruction. The assembly of the validated partners into a functional biological integrated system is performed by solving the SystemReconstruction problem, which is specialized, in the present version, into the single SystemAssemblying sub-problem. The reconstruction relies upon two rules: (1) the close localization on the chromosome of the genes encoding the different partners and (2) the compatibility of the partners' domain families. Therefore, the inputs of this problem are the values of two variables of the BelongTo association (genomicLocalization_begin and genomicLocalization_end) and the value of the DomainFamily variable of the HasDomain association. Solving this problem leads to the instantiation of the Assembly and Subfamily classes, and of the IsMemberOf association.

Solving the SystemReconstruction problem is the last instantiation step of the DKB. However, in order to update, according to the new incoming knowledge, the parameters
Fig. 6. Example of two processing sequences. These sequences share the same goal: to identify and assemble integrated systems in a new proteome. The first one relies on a prediction by the Blast program. The last problem belongs to the learning step, which is coupled with the prediction program used in this sequence. The second sequence, more complex, involves tree breakpoints represented with vertical dashed lines. The processing involves a prediction by PsiBlast completed by a Mast prediction.
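The construction of processing sequences from the inputs and outputs of elementary problems can be pictured with a small Python sketch. It is an illustration under stated assumptions, not the ISYMOD execution controller: the problem names come from the text, while the data-type labels, the PROBLEMS table and both helper functions are hypothetical, and the chain of problems is deliberately shortened.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Elementary problems as (required inputs, produced outputs).
# Problem names follow the text; the data-type labels are hypothetical.
PROBLEMS: Dict[str, Tuple[FrozenSet[str], FrozenSet[str]]] = {
    "DataRetrieval": (frozenset({"proteome"}), frozenset({"sequence_file"})),
    "PredByPsiBlast": (frozenset({"sequence_file"}), frozenset({"prediction"})),
    "PredByHmmer": (frozenset({"sequence_file"}), frozenset({"prediction"})),
    "PredictionChecking": (frozenset({"prediction"}), frozenset({"checked_proteins"})),
    "SystemReconstruction": (frozenset({"checked_proteins"}), frozenset({"assembly"})),
}


def linear_sequences(available: FrozenSet[str], goal: str,
                     used: Tuple[str, ...] = ()) -> List[Tuple[str, ...]]:
    """Enumerate orderings of problems whose inputs are met at every step.

    A problem is appended only if it is runnable and produces a new data type.
    """
    if goal in available:
        return [used]
    found: List[Tuple[str, ...]] = []
    for name, (needs, gives) in PROBLEMS.items():
        if name in used or not needs <= available or gives <= available:
            continue
        found.extend(linear_sequences(available | gives, goal, used + (name,)))
    return found


def branch(seqs: Sequence[Tuple[str, ...]]) -> List[Tuple[str, ...]]:
    """Merge equal-length sequences into one branched sequence.

    Positions where the sequences differ become breakpoints, i.e. groups of
    problems to be solved in parallel.
    """
    return [tuple(sorted(set(step))) for step in zip(*seqs)]


sequences = linear_sequences(frozenset({"proteome"}), "assembly")
branched = branch(sequences)
```

Here the two linear sequences differ only in the prediction step, so the branched sequence contains a single breakpoint where PredByPsiBlast and PredByHmmer are solved in parallel, which corresponds to the third way of solving PredByProfile mentioned in Section 5.1.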
involved in the bioinformatics programs, a learning step is required. It was modeled in the MKB by the Learning problem.

• Learning. In order to build the appropriate parameter files for the associated programs, this problem follows the same specialization hierarchy as the problem Prediction. Solving this problem results in the update of the parameters of the prediction methods using the data retrieved from the DKB. For the analysis of a new genome, these updated files will serve as the input of the problem Prediction.

5 OPERATIONAL ISYMOD

ISYMOD models the domain of some integrated systems in such a way that more integrated systems can be added. The automatic programs to identify and reconstruct such systems are also modeled and described: adding a new problem or a new method is a plug-and-play operation. However, ISYMOD is not only a knowledge base: the modeled strategies can be executed and evaluated and their results stored, while AROM checks the consistency of the overall base. This is the reason why we refer to ISYMOD as a knowledge warehouse.

5.1 Processing a strategy

At execution time, the execution controller allows a dynamic building of processing sequences according to the description and the modeling of problems in the MKB. Based on the matching of the inputs and outputs of each elementary problem, several sequences can be constructed (Fig. 6). The simplest sequences are linear sequences, but more complex sequences, named branched sequences, can be constructed by insertion of one or more breakpoints. A breakpoint involves a parallel solving of two or more problems in the same execution sequence. It concerns the elementary problems located at the same level of specialization in the MKB, and it is defined through the description of solving strategies related to problems of a higher specialization level. For example, in order to solve the PredByProfile problem, the solving strategy specifies three ways: (1) solving the PredByPsiBlast problem, (2) solving the PredByHmmer problem and (3) solving these two problems in parallel. The user must then provide the inputs which are appropriate to the expected execution sequence. The first execution sequence in Fig. 6 represents the linear processing of a complete analysis of integrated systems in a new proteome that involves the prediction with the Blast program by solving the PredByBlast problem. The linear execution is often faster, but in order to obtain the largest repertory of integrated systems in a new proteome, it is better to choose a branched sequence, like the second one in Fig. 6, that applies two kinds of prediction methods to one formatted proteome.

5.2 Consistency control

Once the AnalysisofIntegratedsystem problem is solved, all the root classes and associations of the DKB have been instantiated, except the classes Compound and Stimuli and the corresponding associations. These two classes have been included in the DKB in order to complete the modeling; however, their instances are still read from a text file, apart from the general bioinformatics strategy. Except for the class Prediction, whose sub-classes are directly instantiated according to the solving method, we decided to instantiate only the root classes, in order to keep the consistency of the whole DKB by taking advantage of the classification mechanism included in AROM. The classification algorithm is launched on all objects of the Subfamily class and moves the objects down into the most specialized sub-classes (TCS_Subfamily, ABC_Subfamily, ImportABC_Subfamily, ExportABC_Subfamily) by checking the attachment conditions, based in that case on the values taken by the variables name and type.

In coordination, all objects and tuples of the DKB are classified in specialized sub-classes and sub-associations by applying the propagation algorithm (Chabalier et al., 2003). This algorithm recursively propagates the result of a single object classification to its linked objects, using AROM's association properties, which draw controlled paths among objects.

6 DISCUSSION

ISYMOD is a knowledge warehouse since it integrates, in the same local environment, data and the methods used to produce these data, in such a way that it can safely update not only the raw data retrieved from public databases but also the methods' parameters, improving the analyzing methods themselves. ISYMOD is built over AROM, which ensures the consistency of the data. The computational strategy implemented in ISYMOD has been used
to establish the repertory of ABC transporters in more than 100 bacterial genomes. The produced data have been made publicly available via a new release of the specialized database ABCdb (http://www.lcb.cnrs-mrs.fr/~quentin).

The tight connection between the DKB and the MKB is the keystone of our approach, which prevents ISYMOD from being a mere knowledge base. The DKB is not only a way to store the data, but it also structures it as a piece of knowledge within a model using an appropriate formal language. The consistency of the expressed biological model is thus automatically ensured by the knowledge representation system. The key point is that these pieces of meaningful data, which we call knowledge, are inputs and outputs of the problems of the MKB. Therefore, any output computed from a given input when solving a problem is meaningful with regard to the biological model. Consequently, the interpretation of the whole process represented by the tasks layout is ensured step by step by the model, as are both the final result and the consistency of the overall prediction strategy. We think that such an approach, which merges domain knowledge and methodological knowledge thanks to an appropriate KRS, could be used to build knowledge warehouses for other specific biological fields, bringing together the advantages of local data warehouses and the consistency checking capabilities of KRSs.

Only a few approaches incorporating methods and their generated data have been published. Among these, RiboWeb (Altman et al., 1999) shares the same underlying philosophy as ISYMOD, but has been applied to a different specialized biological field and runs over a different representation system. It is a knowledge base that handles data on the ribosomal subunit structures, and methods for comparing and analyzing these data in order to produce new models.

More closely related to ISYMOD in terms of the underlying structure is GenoStar (Durand et al., 2003), which was developed at the same time under the same OBKRS AROM and task manager AROMTasks. However, the two works have quite different aims. On the one hand, GenoStar is a commercial, user-friendly software that provides a general, fixed instantiable ontology and addresses well-known computational problems concerning mostly the first level of genome annotation, i.e. gene identification and functional predictions. Hence, GenoStar is well adapted to both the solving of classical comparison problems and the analysis of prediction results thanks to a graphical interface. But it is not suited for processing new in silico explorations over specific biological concepts. On the other hand, ISYMOD is a platform which is intended to safely set up and check new methods and strategies for the identification and the reconstruction of functional supra-molecular complexes, which is not a classical problem. ISYMOD has been designed so that the domain and methodological schemas can evolve in order to progressively incorporate higher levels of complexity in the modeling of the biological field, as well as new prediction methods. Indeed, extending the DKB schema to include other systems should be easy enough thanks to the explicit separation of associations and classes, the specialization relation and the recursive classification facilitating object relocation. We can restructure parts of the schema without disturbing the whole. As an example of such an extension, we are currently working on the conservation of the genomic context of the ABC transporters in order to detect new partners in the biological process, such as enzymes involved in the metabolism of the transported substrate. In the same way, the structure of the MKB allows us to add or remove a task, which in our case encapsulates a bioinformatics method, without disturbing the strategy. This modular architecture offers the opportunity to readily test and evaluate different routes available to solve a problem.

In the future we would like to convert ISYMOD into an open resource on the Web, so that other users could participate in its enrichment by integrating their own modules at both domain and methodological levels according to their requirements. ISYMOD is built over AROM, which features translators to XML and to some description logics. This should help us to translate ISYMOD into OWL, a standard language devoted to the development of web ontologies (http://www.w3.org/2001/sw/WebOnt).

A more fundamental evolution of our modeling will address the integration of other types of relationships, such as functional and phylogenetic links, as well as a dynamic view that shows how the processes are ordered over time. Modeling these different types of relations requires an extension of AROM's kernel by adding the possibility of describing algebraic properties of the associations and the way they can be composed. Such an enrichment of association specification, combined with the implementation of a time manager, should allow the exploitation of data through the launching of simulations over the relationship network. All these improvements will lead to a more precise modeling of biological processes.

ACKNOWLEDGEMENTS

We gratefully acknowledge Danielle Ziébelin and François Denis for helpful discussions and Adam Manvell for proofreading of the manuscript. This work was supported by grants from the CNRS (Centre National de la Recherche Scientifique) and ACI ‘Informatique, Mathématiques, Physique en Biologie’ (grant 035360). JC was supported by an MRT fellowship.

REFERENCES

Altman, R., Bada, M., Chai, X., Whirl Carillo, M., Chen, R. and Abernethy, N.F. (1999) RiboWeb: an ontology-based system for collaborative molecular biology. IEEE Intell. Syst., 14, 68–76.
Beier, D. and Frank, R. (2000) Molecular characterization of two-component systems of Helicobacter pylori. J. Bacteriol., 182, 2068–2076.
Braibant, M., Gilot, P. and Content, J. (2000) The ATP binding cassette (ABC) transport systems of Mycobacterium tuberculosis. FEMS Microbiol. Rev., 24, 449–467.
Bronner, G., Spataro, B., Page, M., Gautier, C. and Rechenmann, F. (2002) Modeling comparative mapping using objects and associations. Comput. Chem., 26, 413–420.
Capponi, C. and Gensel, J. (2000) Classifications among classes and associations: the AROM's approach. ECOOP 2000. Proceedings of the Workshop Objects and Classification: A Natural Convergence, Cannes, France.
Capponi, C., Chabalier, J., Quentin, Y. and Fichant, G. (2001) A knowledge base for integrated biological systems. IEEE Intell. Syst., 16, 52–60.
Chabalier, J., Fichant, G. and Capponi, C. (2003) La classification récursive dans AROM. Application à l'identification de systèmes biologiques. Revues des Sciences et Technologies de l'Information (RSTI) série l'Objet 9, 167–181.
Dassa, E. and Bouiges, P. (2001) The ABC of ABCS: a phylogenetic and functional classification of ABC systems in living organisms. Res. Microbiol., 152, 211–229.
Durand, P., Médigue, C., Morgat, A., Vandenbrouck, Y., Viari, A. and Rechenmann, F. (2003) Integration of data and methods for genome analysis. Curr. Opin. Drug Discov. Devel., 6, 346–352.
Fabret, C., Feher, V.A. and Hoch, J.A. (1999) Two-component signal transduction in B. subtilis: how one organism sees its world. J. Bacteriol., 181, 1975–1983.
Gene Ontology Consortium (2004) The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res., 1, D258–D261.
Higgins, C.F. (2001) ABC transporters: physiology, structure and mechanism – an overview. Res. Microbiol., 152, 205–210.
Holland, B. and Blight, M.A. (1999) ABC-ATPases, adaptable energy generators fuelling transmembrane movement of a variety of molecules in organisms from bacteria to humans. J. Mol. Biol., 293, 381–399.
Joseph, P., Fichant, G., Quentin, Y. and Denizot, F. (2002) Regulatory relationship of two-component and ABC transport systems and clustering of their genes in the Bacillus/Clostridium group, suggest a functional link between them. J. Mol. Microbiol. Biotechnol., 4, 503–513.
Karp, P.D., Arnaud, M., Collado-Vides, J., Ingraham, J., Paulsen, I. and Saier, M. (2004) The E. coli EcoCyc Database: no longer just a metabolic pathway database. ASM News, 70, 25–30.
Krieger, C.J., Zhang, P., Mueller, L.A., Wang, A., Paley, S., Arnaud, M., Pick, J., Rhee, S.Y. and Karp, P.D. (2004) MetaCyc: a multiorganism database of metabolic pathways and enzymes. Nucleic Acids Res., 1, D438–D442.
Linton, K.J. and Higgins, C.F. (1998) The Escherichia coli ATP-binding cassette (ABC) proteins. Mol. Microbiol., 28, 5–13.
Mizuno, T. (1997) Compilation of all genes encoding two-component phosphotransfer signal transducers in the genome of Escherichia coli. DNA Res., 28, 161–168.
Nemati, H.R., Steiger, D.M., Iyer, L.S. and Herschel, R.T. (2002) Knowledge warehouse: an architectural integration of knowledge management, decision support, artificial intelligence and data warehousing. Decision Support Systems, 33, 143–161.
Page, M., Gensel, J., Capponi, C., Bruley, C., Genoud, P., Ziebelin, D., Bardou, D. and Dupierris, V. (2001) A new approach to object-based knowledge representation: the AROM System. Proceedings of the 14th International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems (IEAAI&ES 2001), Lecture Notes in Artificial Intelligence.
Parkinson, J. and Kofoid, E. (1992) Communication modules in bacterial signaling proteins. Annu. Rev. Genet., 26, 71–112.
Parmentier, T. and Ziébelin, D. (1999) Distributed problem solving environment dedicated to DNA sequence annotation. Proceedings of the 11th European Workshop (EKAW'99). Lecture Notes in Computer Science, Springer-Verlag, Heidelberg, Vol. 1621, p. 243.
Paulsen, I.T., Sliwinski, M.K. and Saier, M.H. Jr (1998) Microbial genome analyses: global comparisons of transport capabilities based on phylogenies, bioenergetics and substrate specificities. J. Mol. Biol., 277, 573–592.
Perrière, G., Duret, L. and Gouy, M. (2000) HOBACGENE: database system for comparative genomics in bacteria. Genome Res., 10, 379–385.
Quentin, Y., Fichant, G. and Denizot, F. (1999) Inventory, assembly and analysis of Bacillus subtilis ABC transport systems. J. Mol. Biol., 287, 467–484.
Quentin, Y. and Fichant, G. (2000) ABCdb: an ABC transporter database. J. Mol. Microbiol. Biotechnol., 2, 501–504.
Quentin, Y., Chabalier, J. and Fichant, G. (2002) Strategies for the identification, the assembly and the classification of integrated biological systems in completely sequenced genomes. Comput. Chem., 26, 447–457.
Rodrigue, A., Quentin, Y., Lazdunski, A., Mejean, V. and Foglino, M. (2000) Two-component systems in P. aeruginosa: why so many? Trends Microbiol., 8, 498–504.
Saurin, W., Hofnung, M. and Dassa, E. (1999) Getting in or out: early segregation between importers and exporters in the evolution of ATP-binding cassette (ABC) transporters. J. Mol. Evol., 48, 22–41.
Stevens, R., Robinson, A. and Goble, C.A. (2003) myGrid: personalised bioinformatics on the information grid, proceedings of the 11th IBSM (Brisbane). Bioinformatics, 19, i302–i304.
Taglicht, D. and Michaelis, S. (1998) A complete catalogue of Saccharomyces cerevisiae ABC proteins and their relevance to human health and disease. Methods Enzymol., 292, 130–162.
Throup, J.P., Koretke, K.K., Bryant, A.P., Ingraham, K.A., Chalker, A.F., Ge, Y., Marra, A., Wallis, N.G., Brown, J.R., Holmes, D.J., Rosenberg, M. and Burnham, M.K. (2000) A genomic analysis of two-component signal transduction in Streptococcus pneumoniae. Mol. Microbiol., 35, 566–576.
Tomii, K. and Kanehisa, M. (1998) A comparative analysis of ABC transporters in complete microbial genomes. Genome Res., 8, 1048–1059.