2008-12-15 dasari
1) Changed the executeSearchThread function to handle extreme PTM searches without thrashing.
2) Added a variable to SearchStats to keep track of the number of peptides that have been skipped due to an enormous number of PTM variants.
3) Coded an efficient MakePTMVariants iterator to enumerate the exact number of PTM variants for a peptide rather than all permutations. This makes the PTM variants grow linearly with the number of DynamicMods rather than exponentially.

2008-12-04 chambers
- moved Digestion config setup to InitWorkerGlobals to take advantage of precursor mass bounds
- fixed crash removing precursor water loss twice as both single and double (at very high charge states)

2008-11-19 chambers
Freicore:
- changed CalculateSequenceIons to be able to flexibly calculate any of the peptide ion series
- added tracking of dissociation method to BaseSpectrum; read in by PeakSpectrum
- fixed MakePTMVariants to support multiple dynamic mods per residue (between permutations, not within a permutation)
- expanded PTM unit tests
- optimized proteinStore to use shared strings for supporting databases with extreme number of loci (peptide databases)
- added root level Jamfile for building all projects at once
MyriMatch:
- added FragmentationAutoRule and FragmentationRule variables to control peptide fragment prediction

2008-11-17 dasari
1) Updated the SearchResult comparator to keep ambiguous modifications

2008-10-22 chambers
- added static mods to MakePtmVariants
- removed obsolete PTM manipulation functions
- added auto-pruning of non-residue characters in proteinStore

2008-10-22 chambers
- fixed BaseSearchResult to derive from pwiz::proteome::DigestedPeptide
- updated pepXML reading and writing to use pwiz modification semantics
- added non-intrusive serialization for Peptide, DigestedPeptide, and Modification

2008-10-21 chambers
- refactored MyriMatch and Freicore to use double precision floats almost everywhere
- added some
  rudimentary unit tests for checking that the new pwiz-proteome code behaves the same as the old freicore proteome code

2008-10-15 chambers
Freicore:
- updated Bumbershoot to use boost 1.36.0
  * resolved some namespace conflicts (boost::tokenizer, std::exception)
- fixed monoisotopic mass of K
- fixed Jamfile bug calling for mpi compilation of pwiz::proteome
- fixed bad default values for PrecursorMzTolerance and Dynamic/StaticMods
- changed some proteomic types to use pwiz::proteome
  * Peptide replaces std::string w/ ResidueMap::GetMassOfResidues()
  * Digestion replaces DigestProteinSequence
  * DigestedPeptide replaces CandidateSequenceInfo
  * DynamicMods and MakePtmVariants use ModificationMap
  * Fragmentation replaces manual calculation in CalculateSequenceIons
- changed output of qonversion details to fixed width columns
MyriMatch:
- updated for new core internals
- added pwiz versions to exe version output
- added try/catch for thread functions
- added mzFidelity score (again)

2008-10-03 dasari
(1) Minor bug fixes in Myrimatch MPI
(2) Major bug fixes in TagRecon
(3) Code commenting in TagRecon and Associated Freicore

2008-08-31 chambers
- fixed PeakSpectrum to always use NativeCentroider
- fixed freicore Jamfile to only depend on NativeCentroider instead of all of spectrum_processing

2008-06-19 chambers
- moved MPI feature to Jamroot; non-MPI applications will require it to be set to none
- added ProteoWizard requirements for disabled warnings
- MyriMatch
  * reverted pepXML spectrum attribute to DTA format for TPP compatibility
  * added non-standard attributes: spectrumID, spectrumNativeID, spectrumIndex, corresponding to the mzML identifiers
  * separated non-standard numSequenceComparisons into two separate attributes: numTargetComparisons and numDecoyComparisons
  * fixed crash bug reading empty spectra
  * added proper output for peptide_prev_aa and peptide_next_aa attributes; also outputting these attributes for alternative_protein elements
  * converted FASTA database
    command-line argument into configuration variable "ProteinDatabase"

2008-06-11 chambers
- build system conversion to Boost.Build finalized (README files updated accordingly)
- added ProteoWizard subtree to Bumbershoot repository
- FreiCore
  * added preliminary support for mzML-style id attributes (stringID, nativeID, and index)
  * SearchSpectrum writes pepXML with that support
  * refactored scanInfo and BaseSpectrum::scanId to SpectrumId and BaseSpectrum::id
  * switch from LibMSR to ProteoWizard for MS data reading (and soon writing too!)
  * switched to use optional ostream pointer to provide qonversion details instead of global variable
  * switched to use optimized_lexical_cast in ProteoWizard
- DirecTag
  * updated TagFile output to include Zeqiang's requested fields and use mzML-style attributes
  * updated usage line to reflect support for generic MS formats
- MyriMatch
  * fixed to support 64-bit integers for search-time estimates and statistics

2008-04-19 chambers
- FreiCore (long overdue commit with new Boost.Build build system)
  * fixed some capitalization consistency
  * fixed pepXML output for retention time, added "Config: " prefix for configuration parameters
  * qonversion details now configurable at runtime ("WriteQonversionDetails")
  * added segregation by terminal specificity as well as charge state
  * much more robust error checking and handling in pepXmlReader
  * added decoy tracking to proteinStore
  * more robust error checking in optimized lexical_cast specializations
  * cleanup of some old irrelevant files
- MyriMatch
  * new MaxFragmentChargeState option
  * converted ProteinSampleSize to ProteinSamplingTime so that huge mod searches don't take forever to sample
  * also made estimation more accurate by reflecting changes in sequence candidate generation
  * new EstimateSearchTimeOnly option
  * search stats from child processes are now returned to the root process to generate overall stats
  * fixed issue of trying to preprocess spectra emptied out by precursor mass filter

2007-12-07 chambers
- FreiCore
  * added real/decoy tracking to ProteinStore
  * fixed template bug in BaseSpectraList::random_shuffle()
  * fixed some calls to boost::filesystem to use native name checking
  * all builds, not just Windows, will use native name checking for boost::filesystem
- DirecTag
  * updated to work with newer FreiCore code (added a SearchResult, adjusted to work with libmsr, etc.)
  * inline validation currently commented out
- MyriMatch
  * fixed debug configuration to point to libmsr (need to fix MPI builds too)
  * fixed some calls to boost::filesystem to use native name checking
  * updated usage line to not specify mzData as spectra input (need to update docs)
- SQTer: fixed bug in conversion of the OUT format's 12 hour time to 24 hour time

2007-11-28 chambers
- MyriMatch
  * fixed memory leak caused by not deleting WorkerThreadInfo objects
  * fixed bug with using '/' as a path separator so it is now stripped properly when determining scan name
  * fixed MVH bug caused by recent optimization attempts
  * added massError score to SearchResult::serialization

2007-11-20 chambers
- Freicore: major update
  * renamed "common" to "freicore" and updated all code accordingly; adding "freicore" directory and removing "common" directory
  * removed unused expat code in freicore directory
  * replaced mzDataReader with new-fangled libmsr: Bumbershoot apps now support reading from RAW, WIFF, mzData, and mzXML!
  * ScoreSequenceVsSpectrum now fills out an existing SearchResult object instead of creating a new one for each comparison (major performance boost)
  * pepXML output modified slightly
  * SimpleXMLWriter, SHA1, and GetHostEndianType functionality removed and now depends on libmsr for these
  * fixed MvhTable ConvertToPValues function
  * converted internal (Min)Confidence system to MaxFDR to match IDPicker and runtime parameters
  * major update to Histogram to support aggregating a Histogram over a long period of time without having to hold all samples in memory for all Histograms (still kind of buggy)
  * new functions to get vector of keys or values of any map (why didn't I think of that before?)
- MyriMatch
  * added relative score calculation, score histograms, new scores mzSSE and massError
  * added summation of child process search stats to the root process to provide overall search stats
  * new CandidateSequenceInfo objects to carry around information about the candidate's termini specificity and missed cleavages
  * uses new ScoreSequenceVsSpectrum call syntax to get a performance boost
  * new MinResultScore variable to allow score histograms to exclude completely bogus results (like those with mvh score 0)
  * new NumSearchBestAdjustments variable to allow the top N best precursor mass adjustments to be searched instead of just the top adjustment (major benefits on Orbi)
  * to support the new multiple precursor mass feature, MyriMatch gives each spectrum a list of possible precursor masses

2007-10-26 chambers
- FreiCore:
  * updated sequest enzymes to encode protein termini as valid cleavage sites
  * added filtering by charge state to BaseSpectraList
  * gave all SpectraList types a SpectraListType template parameter
  * moved expat includes and library pragma to new expat_xml.h
  * major updates to SearchSpectrum classes which are now more generic and support named scores
  * updates to support num_tol_term and num_internal_cleavages attributes in pepXML
  * whether
    SearchSpectraList::calculateFDRs writes -qonversion.txt files is now set by defining DEBUG_QONVERSION
  * MakePtmVariants uses new CandidateSequenceInfo classes
  * CleavageRuleSet can now be initialized via constructor in addition to istream operator
  * new find_first_of functions for use on std::maps to find the first of a list of keys
  * templated operator<< for std::maps improved: only string keys/values will be quoted now (e.g. "string"->int)
- MyriMatch:
  * candidate generation improved, candidates store their masses, terminal specificities, and missed cleavages
  * renamed WriteOutputToSQT to WriteOutputToFile (yay)
  * created SearchResult class to work with new generic SearchSpectrum classes
  * properly writes numTerminiCleavages and numMissedCleavages to pepXML
  * using conventional parameter name "SearchEngine: Name" and "SearchEngine: Version" as runtime variables to properly convey search engine name and version
  * using conventional parameter name "SearchTime: Started" and "SearchTime: Stopped" as runtime variables to properly convey search time statistics
- SQTer:
  * created SearchResult class to work with new generic SearchSpectrum classes
  * properly writes numTerminiCleavages and numMissedCleavages to pepXML (uses flanking residues from OUT files to check cleavage based on the configured enzyme)
  * using conventional parameter name "SearchEngine: Name" and "SearchEngine: Version" as runtime variables to properly convey search engine name and version
  * start and end search times are determined by taking the minimum and maximum time from the entire set of OUT files
  * using conventional parameter name "SearchTime: Started" and "SearchTime: Stopped" as runtime variables to properly convey search time statistics
  * hollowed out ConvertXTandem function

2007-10-11 chambers
- FreiCore
  * added file type detection support for FASTA and some fixes for XML type detection
  * renamed sqtFile to searchSpectrum
  * common makefile now links to Boost statically and to everything else
    dynamically
  * fix to SimpleXMLWriter so the string specialization of the attr() function will be called appropriately
  * converted most filesystem functions to use boost::filesystem (this is a new library dependency)
  * major update to IsotopeDistribution class to support future enhancements to deisotoping (but not likely on data with LTQ fragment accuracy)
  * added support for writeSvg to return the svg as an xml string instead of writing to file (also added a utility function to write the file directly)
  * fixed bug in writeSvg where peaks with the same intensity would get combined in the peaksByIntensity map (changed to multimap)
  * added support for score metadata in PepXML reader, which allows for varying score polarity and min/max values
  * added score polarity to searchResult and added it as a condition to the sort operator (this may have a performance impact!)
  * TagsSpectrum::readTags() will now use the source name when it reads in spectra
- MyriMatch
  * added file type detection for input protein database (fasta) and input spectra files (mzdata)
- TagValidate
  * now validates from PepXML files instead of SQT
  * added feature to convert scores to p values after reading them in (assumes that they are theoretical expectation values from DirecTag)
  * added feature to show tag scores at percentile intervals

2007-09-21 chambers
- FreiCore:
  * added boost_filesystem
  * BaseRunTimeConfig now supports hiding variables if they are at their default value
  * BaseRunTimeConfig has a DEFINE_MEMBERS macro to make deriving configs easier and less verbose
  * peptide sequence functions now expect symbols to represent the termini of each peptide: ( and )
  * DynamicMods now supports motif-style specificity, the new notation is backward compatible with the old one; peptide terminal modifications are supported
  * DynamicMods now use different symbols internally than externally, so mods on different residues can use the same mod symbol
  * ResidueMap further updated to reflect new termini symbols
    expected on peptide sequences; peptide symbols are given default masses (hydrogen and hydroxide); other various updates as well
  * moved scanId member in baseSpectrum to the top so it gets shown first in the MSVC debugger
  * added function for getting filename without extension (actually, filepath or filename) and a DateTime function that returns the date and time in the XML timestamp format
  * renamed sqtFile to searchSpectrum
  * expanded constants and turned them into C types instead of defines; now there are AVG and MONO variants for every element and molecule
  * updated mzDataReader to support a proprietary, optional, standalone index (nice for random access to spectra from GenerateSpectrumSVG)
  * significant updates to the SpectrumSVG code (fixed m/z axis glitch, expanded de novo mode to support multiple fragment charge states, added detection of isotope series, changed unzoom icon to Greg's version)
  * the proteinStore now supports lazy access: reading of the actual protein sequences can be deferred until they are accessed (validateSqtToXml uses this to minimize memory usage and load times)
  * fix to global profilers init macro
  * new shared function to strip peptide termini symbols from a sequence (GetRawSequence)
  * searchSpectrum class now reads and writes pepXML
  * separated searchResult type from searchSpectrum so pepXML parser could use it
  * added shared SequestRunTimeConfig class
  * added SimpleXMLWriter class, modified from TPP
  * added proteolytic cleavage site testing to CleavageRuleSet
- SQTer:
  * now writes pepXML instead of SQT files (I suggested a name change to "Pepper" but it wasn't received well...apparently this name is already taken?)
  * now needs sequest.params to get a better standardized set of parameters that went into the search
  * X!
    Tandem conversion is probably broken, but since TPP has a Tandem2XML converter, this functionality may disappear
  * furthermore, SQTer may disappear entirely because TPP has an Out2XML converter
- MyriMatch:
  * now writes pepXML instead of SQT
  * now keeps track of the peptide offsets from which each candidate originated
  * uses new cleavage rules testing in CleavageRuleSet, which apparently fixed a bug with protein-C-terminal peptides which weren't being generated properly
  * various ambiguous uses of the WATER constant are now appropriately specific to AVG or MONO masses

2007-08-17 chambers
- MyriMatch:
  * added random shuffling of protein database
  * moved all MPI code to a separate CPP to clean up the main file, moved some definitions out of myrimatch.h (will do this for DirecTag as well)
  * renamed config to RunTimeConfig
  * added profiling convenience code to major functions to allow easy enabling and disabling of profiling output (BUMBERSHOOT_PROFILING)

2007-08-10 chambers
- MyriMatch: updated inline validation to work with new filtering interface (and spectra list assignment)
- SQTer: added a better header integrity check
- Tests: updated to help profiling Windows vs. Linux and fixed old crappy Makefile
- Website: updated versions and added anchors for each entry

2007-07-03 chambers
- MyriMatch: updated build number, I forgot to rebuild MyriMatch when I made the cfg file fix on 6/22

2007-06-19 chambers
- MyriMatch: split spectrum processing functions into a separate .cpp (finally)

2007-06-14 chambers
- FreiCore: Moved the isotope distribution graph code to its own function (instead of being commented out)
- FreiCore: Genericized the MvKey and MultivariableTable code to support any type, though only MvIntKeys and MvhTables are currently used
- FreiCore/FreiTag: Updated TagMetaIndex code for new FreiTag composition score testing
- Various projects: Updated to work with new common code
- FreiTag: New scores tested.
  Complement, Composition, LongestPath, and NeutralLoss all use P values now (stored in the MVH tables)

2007-05-24 chambers
- Wrapped zlib.lib functionality for MPI compression so that it isn't called in non-MPI configurations

2007-05-23 chambers
- Fixed MPI bug; PeakSpectrum, SearchSpectrum, and TagsSpectrum now properly serialize themselves and do not serialize the BaseSpectrum; only the application Spectrum should serialize the BaseSpectrum
- FreiTag and TagRecon will need to be updated with the above fix as well
- Fixed Win32 compile broken by removing expat from the source tree; new requirement that Win32 compilers have "expat.h" in their include path (will need to update README.MSVC8)

2007-05-22 chambers
- Removed expat.mk from Makefile, it will always be included in common.mk now and the include will be expected to be in the system includes
- Fixed some compiling issues that snuck in last update

2007-05-17 chambers
- Changed peak data to work with new application-specific structure
- Added MPI compression to results transference

2007-04-13 chambers
Updates to common code and makefile. Continued work on precursor adjustment with Freitag and Myrimatch. Deisotoping pretty much finalized (hopefully). idpickerWrapper has been stable for a few weeks. Added zlib (compression support in MPI programs)

2007-03-16 chambers
[no log message]

2007-02-09 chambers
Updates to align with common code.

2007-01-12 chambers
Lots of common code changes and all applications except graphtag should be up to date with the common code. TagRecon is working again!

2006-12-29 chambers
lots of refactoring in common code. spectra lists are much more flexible now, as is reading/writing SQTs and Tags files. Will probably expand that kind of flexibility to include the mzData reader (and hopefully a writer too). none of the applications are ready yet though, more work next week.

2006-12-21 chambers
lots of crap.

2006-12-06 chambers
[no log message]

2006-12-01 chambers
- MyriMatch protein digestion updated. Currently not cleaving at end-of-protein properly.

2006-11-30 chambers
- Changed Linux makefiles to keep separate object files and libraries for MPI compiles and non-MPI compiles.
- Added feature to MyriMatch so it is able to utilize charge state information from MS, if it is present and valid. ("UseChargeStateFromMS")

2006-11-22 chambers
Started updating FreiTag (and GraphTag) again. Changed GraphTag to support sequence highlighting (to make the correct/given path more obvious in the graph). Added a class to read and write the new "tags" format, a tab-delimited way of storing tags for each spectrum. XMLTags is no longer going to be used until XML is more widespread across the pipeline in general. Added Preprocess, Deisotope, and FilterByTIC to the BaseSpectrum, and ClassifyPeakIntensities to freitagSpectrum (will need to mirror these changes in MyriMatch soon). With this comes the ability to filter by TIC without classifying into peak classes. So I can filter, deisotope, then classify, or filter, classify, then deisotope, etc.

2006-11-14 chambers
[no log message]

2006-11-09 chambers
It all compiles! Yay! Now to move it over to the new externally accessible repository. Oh boy.

2006-11-01 chambers
MyriMatch 1.0.123: fixed MPI and multithreading bug where loci in search result set were not being added properly

2006-10-27 chambers
myrimatch is pretty much final before it's productionized! sqter has been updated to support new DynamicMods and StaticMods notation (still won't work for OUT files)

2006-10-20 chambers
[no log message]

2006-10-20 chambers
[no log message]

2006-10-19 chambers
[no log message]

2006-10-19 chambers
[no log message]

2006-10-19 chambers
[no log message]

2006-10-19 chambers
Lots of work toward making MyriMatch ready for open-sourcing and a first public release.

2006-10-13 chambers
tons of code cleanup on myrimatch, freitag, and common.
created umbrella solution "everything.sln" in src directory

2006-10-06 chambers
- myrimatch polished up
- freitag broken (working on boost::serialization)
- tagrecon broken
- idpicker mostly working
- common: added sqtFile, proteinStore
- added project: procounter

2006-09-11 chambers
Fixed most Myrimatch VC++ performance issues by making the result sets for spectrum-sequence comparisons local to each thread doing work. The result sets are combined into the global spectra list after all threads finish processing. There are still some issues with VC++ not using full power of the CPU when searching large amounts of spectra across small databases.

2006-09-08 chambers
ugh

2006-09-01 chambers
[no log message]

2006-08-25 chambers
[no log message]

2006-08-22 chambers
dbvalidate: divided average comparison up by charge state (also outputting precursor error files per charge state)
myrimatch: vastly improved effectiveness of precursor adjustment with deisotoping - it's still broken on +3s though
common: added a function to find a peak's complement into baseSpectrum

2006-08-18 chambers
[no log message]

2006-08-11 chambers
[no log message]

2006-08-04 chambers
Final FreiQuest version is committed now. It has been renamed to Myrimatch. Thus, this is the initial import of Myrimatch, and it is the same code as the final version of FreiQuest.