rel: Added data release note v1_05.
@soumyadeep.ghosh.consulting I could use some help putting the release numbers together correctly. Please have a look at this PR and the numbers I added. Please also find attached the output of the update scripts. I am unsure which numbers to take, because both update scripts report, for instance, a "Protein count", but with different numbers, e.g. 14 versus 21.
I calculated the number of updated proteins and condensates simply by subtracting the protein and condensate counts of the two releases, v1.04 and v1.05, which I obtained by parsing the MongoDB dump files. FYI, the dump files for the new release are now uploaded to ownCloud and can be found here
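For cross-checking, here is a minimal sketch of comparing the two releases by ID sets rather than by raw counts. Subtracting counts only gives the net change, and the update_items output in this PR shows proteins being both created and deleted in the same run. The function name `diff_releases` is hypothetical, the sketch assumes the IDs of each release have already been extracted from the dump files, and the two lists are toy data built from IDs that appear in the log, not the real release contents.

```python
def diff_releases(old_ids, new_ids):
    """Compare two collections of IDs (hypothetical helper, not part of the update scripts)."""
    old_ids, new_ids = set(old_ids), set(new_ids)
    return {
        "added": sorted(new_ids - old_ids),      # present only in the new release
        "removed": sorted(old_ids - new_ids),    # present only in the old release
        "net_change": len(new_ids) - len(old_ids),  # what plain count subtraction reports
    }


# Toy data for illustration only; the real ID lists would come from the dump files.
old_release = ["O95816", "H3BR35"]
new_release = ["O95816", "A9QM73", "Q9SUC3", "P41214", "P61221"]

result = diff_releases(old_release, new_release)
print(result)
```

In this toy case four IDs are added and one is removed, while the count difference is only 3, so the two ways of counting can disagree.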
Update script RAW output:
--- Running sync for update_items as script ---
Beginning sync for update_items at 2024-01-23 10:20:53.877643
End sync task for update_items at 2024-01-23 10:20:54.148213; Output stored at /local/ddcode-db-creator/data/output/update_items_2024-01-23_10:20:53_877643.txt
2024-01-23 10:20:54.148269 : Begin Post-processing . . .
-- Registering new condensate proteome members --
2024-01-23 10:20:55.633991 : Created new protein: A9QM73
2024-01-23 10:20:55.809835 : Created new protein: Q9SUC3
2024-01-23 10:20:59.817318 : Created new protein: P41214
2024-01-23 10:20:59.996764 : Created new protein: P61221
2024-01-23 10:21:01.464273 : No. Condensates having proteome reduction: 0
-- Remove proteins details from UniProt --
2024-01-23 10:21:01.481020 : #Proteins with no details: 0
2024-01-23 10:21:01.481602 : Deleted 0 proteins not found in UniProt: []
-- Remove proteins with no condensate membership --
2024-01-23 10:21:02.237771 : Extra UniProt IDs found in proteins collection: 1; Proteins: H3BR35
2024-01-23 10:21:02.240022 : Delete operation response: 1
-- Update condensate_count in proteins --
2024-01-23 10:21:02.330697 : Condensate Protein counts updated : 5
-- Update protein____counts in condensates --
2024-01-23 10:21:02.367309 : Proteins counted for condensates : 9640
2024-01-23 10:21:03.230936 : Proteins updated : 12
-- Update confidence scores --
2024-01-23 10:21:03.701306 : Condensate confidence scores updated : 645
-- Add species to condensates --
2024-01-23 10:21:03.702094 : #Condensates with no species: 0
2024-01-23 10:21:03.702543 : Added species to condensates: 0
--- Running sync for add_condensate_items as script ---
Beginning sync for add_condensate_items at 2024-01-23 10:47:20.935649
End sync task for add_condensate_items at 2024-01-23 10:47:21.029171; Output stored at /local/ddcode-db-creator/data/output/condensate_items_2024-01-23_10:47:20_935649.txt
2024-01-23 10:47:21.029565 : Begin Post-processing . . .
-- Registering new condensate proteome members --
2024-01-23 10:47:26.561864 : No. Condensates having proteome reduction: 0
-- Remove proteins details from UniProt --
2024-01-23 10:47:26.578402 : #Proteins with no details: 13
2024-01-23 10:47:27.160839 : Updated O95816 with data from UniProt
2024-01-23 10:47:27.627047 : Updated Q9VYI0 with data from UniProt
2024-01-23 10:47:28.122778 : Updated Q4Z8K6 with data from UniProt
2024-01-23 10:47:28.588932 : Updated Q8CCS6 with data from UniProt
2024-01-23 10:47:29.130307 : Updated P35568 with data from UniProt
2024-01-23 10:47:29.585883 : Updated Q6PBB2 with data from UniProt
2024-01-23 10:47:30.083060 : Updated P19419 with data from UniProt
2024-01-23 10:47:30.528730 : Updated Q02383 with data from UniProt
2024-01-23 10:47:31.047123 : Updated Q13049 with data from UniProt
2024-01-23 10:47:31.540543 : Updated Q969Q1 with data from UniProt
2024-01-23 10:47:32.068787 : Updated Q9C035 with data from UniProt
Unable to fetch and parse Protein with UniProt ID Q0VBL3 ; Error: URL can't contain control characters. '/uniprot/Q0VBL3 .xml' (found at least ' ')
2024-01-23 10:47:32.074781 : Skipped fetching data for Q0VBL3
Unable to fetch and parse Protein with UniProt ID Q9Y4H4 ; Error: URL can't contain control characters. '/uniprot/Q9Y4H4 .xml' (found at least ' ')
2024-01-23 10:47:32.080569 : Skipped fetching data for Q9Y4H4
2024-01-23 10:47:32.082112 : Deleted 2 proteins not found in UniProt: ['Q0VBL3 ', 'Q9Y4H4 ']
-- Remove proteins with no condensate membership --
2024-01-23 10:47:32.987258 : Extra UniProt IDs found in proteins collection: 0; Proteins:
2024-01-23 10:47:32.988562 : Delete operation response: 0
-- Update condensate_count in proteins --
2024-01-23 10:47:33.056508 : Condensate Protein counts updated : 14
-- Update protein____counts in condensates --
2024-01-23 10:47:33.126377 : Proteins counted for condensates : 9653
2024-01-23 10:47:33.895806 : Proteins updated : 21
-- Update confidence scores --
2024-01-23 10:47:34.385265 : Condensate confidence scores updated : 659
-- Add species to condensates --
2024-01-23 10:47:34.386355 : #Condensates with no species: 14
2024-01-23 10:47:34.405379 : Added species to condensates: 14
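To make the numbers easier to compare side by side, the counters in the output above could be pulled out per script with a short parser. This is only a sketch: `summarize`, `RUN_HEADER` and `METRIC` are hypothetical names, the pattern assumes counter lines of the form "timestamp : label : number" as they appear in the log, and the sample text is a small excerpt of that log rather than the full output.

```python
import re

# Header printed at the start of each script run, e.g.
# "--- Running sync for update_items as script ---"
RUN_HEADER = re.compile(r"--- Running sync for (\w+) as script ---")

# Counter entries of the form "<timestamp> : <label> : <number>"
METRIC = re.compile(
    r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ : ([^:]+?) : (\d+)\b"
)


def summarize(log_text):
    """Return {script_name: {label: count}} for every counter entry found."""
    # Splitting on a pattern with one capture group yields
    # [preamble, name1, body1, name2, body2, ...]
    parts = RUN_HEADER.split(log_text)
    summary = {}
    for name, body in zip(parts[1::2], parts[2::2]):
        summary[name] = {
            label.strip(): int(value) for label, value in METRIC.findall(body)
        }
    return summary


# Excerpt of the log above, used only to illustrate the parser.
SAMPLE = """\
--- Running sync for update_items as script ---
2024-01-23 10:21:02.330697 : Condensate Protein counts updated : 5
2024-01-23 10:21:03.230936 : Proteins updated : 12
--- Running sync for add_condensate_items as script ---
2024-01-23 10:47:33.056508 : Condensate Protein counts updated : 14
2024-01-23 10:47:33.895806 : Proteins updated : 21
"""

summary = summarize(SAMPLE)
for script, counters in summary.items():
    print(script, counters)
```

Grouped this way, each script reports both a "Condensate Protein counts updated" and a "Proteins updated" value, so which of the two belongs in the release note as the protein count is still the open question.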