[galaxy-dev] Error importing BAM file into Library

Greg Von Kuster greg at bx.psu.edu
Tue Mar 15 09:51:01 EDT 2011


Hello Peter,

I have broken this issue into the following two parts; here is the status of each.

1. Don't alter the contents of files being uploaded to a data library if using the "upload_directory" or "upload_paths" options in conjunction with the "Link to files without copying into Galaxy" option.  This issue has been resolved in change set 5221:b5ecb8f4839d.

2. Determine whether a BAM file is sorted before it is introduced into the Galaxy environment, so that it is sorted only if necessary.  We have a very simple test for this in the Bam class's _is_coordinate_sorted() method in ~/lib/galaxy/datatypes/binary.py, but this method clearly needs improvement.  The improved implementation is non-trivial, but it is high priority and should be completed soon.  In the meantime, BAM files cannot be uploaded to a data library using the combination of options described in 1 above if they do not pass the current simple, rigid test in that method.
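
To illustrate the kind of check item 2 refers to, here is a minimal sketch of a header-based test. It is not the code in binary.py; the function names are made up for this example. It assumes the file follows the BAM layout (BGZF blocks readable as ordinary gzip members, the magic bytes "BAM\1", a 32-bit little-endian header length, then the SAM header text) and it only reports what the @HD line claims via the SO:coordinate tag. It does not verify that the records really are in coordinate order.

```python
import gzip
import struct


def read_bam_header_text(path):
    """Return the plain-text SAM header stored at the start of a BAM file."""
    # BGZF blocks are valid gzip members, so the gzip module can decompress them.
    with gzip.open(path, "rb") as handle:
        if handle.read(4) != b"BAM\x01":
            raise ValueError("not a BAM file: %r" % path)
        # l_text: length of the header text, 32-bit little-endian integer.
        (l_text,) = struct.unpack("<i", handle.read(4))
        text = handle.read(l_text)
    # The header text may be NUL-padded.
    return text.rstrip(b"\x00").decode("ascii", "replace")


def header_claims_coordinate_sorted(path):
    """True if the @HD header line carries the tag SO:coordinate."""
    for line in read_bam_header_text(path).splitlines():
        if line.startswith("@HD"):
            return "SO:coordinate" in line.split("\t")[1:]
    # No @HD line at all: the sort order is unknown, so do not assume sorted.
    return False
```

A caller could use header_claims_coordinate_sorted( path ) to decide whether to skip the sort step for a linked file. Because the tag is only a declaration written by whichever tool produced the file, a stricter test would still have to walk the alignment records.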

Thanks for your message,

Greg Von Kuster


On Mar 10, 2011, at 1:18 PM, Peter Cock wrote:

> Hi all,
> 
> I think I have fallen over the same problem Glen Beane reported in Nov 2010,
> http://lists.bx.psu.edu/pipermail/galaxy-dev/2010-November/003943.html
> 
> I recall from reading the mailing list, that when you import a BAM file
> into Galaxy, it gets sorted and indexed. That makes sense since most
> tools need that to be done, and resorting an already sorted file should
> be quick.
> 
> However, I'm trying to import some presorted and indexed BAM files
> into a library in Galaxy via the Admin settings, linking to the file not
> copying it. I'm getting this:
> 
> <quote>
> 
> File size: 4.4 Gb
> Data type: auto
> Build: ?
> Miscellaneous information: uploaded bam fileTraceback (most recent
> call last): File "/opt/galaxy-dist/tools/data_source/upload.py", line
> 447, in __main__() File
> "/opt/galaxy-dist/tools/data_source/upload.py", line 439, in __main__
> add_file( dataset, registry, j
> Job Standard Output
> 
> [bam_sort_core] merging from 28 files...
> 
> Job Standard Error
> 
> Traceback (most recent call last):
>  File "/opt/galaxy-dist/tools/data_source/upload.py", line 447, in
>    __main__()
>  File "/opt/galaxy-dist/tools/data_source/upload.py", line 439, in __main__
>    add_file( dataset, registry, json_file, output_path )
>  File "/opt/galaxy-dist/tools/data_source/upload.py", line 381, in add_file
>    datatype.groom_dataset_content( output_path )
>  File "/opt/galaxy-dist/lib/galaxy/datatypes/binary.py", line 98, in
> groom_dataset_content
>    shutil.move( samtools_created_sorted_file_name, file_name )
>  File "/usr/local/lib/python2.6/shutil.py", line 260, in move
>    copy2(src, real_dst)
>  File "/usr/local/lib/python2.6/shutil.py", line 95, in copy2
>    copyfile(src, dst)
>  File "/usr/local/lib/python2.6/shutil.py", line 51, in copyfile
>    with open(dst, 'wb') as fdst:
> IOError: [Errno 13] Permission denied: '/data/XXX-bwa-out.sorted.bam'
> 
> error
> Database/Build: ?
> Number of data lines: None
> Disk file: /data/XXX-bwa-out.sorted.bam
> 
> </quote>
> 
> Clearly from the error message, Galaxy is trying to edit the source
> file (and the Unix account it is running in only has read permission
> for this file and its containing folder). From the stdout message
> "[bam_sort_core] merging from 28 files..." it looks like Galaxy is
> trying to (re)sort my file, and may well attempt to reindex it. Is that
> likely to be the case?
> 
> If the "copy" option had been used, then sorting and indexing
> should work - but I want Galaxy to link to the file as it is.
> 
> If however "copy" was not selected, then I don't want Galaxy
> trying to alter the file like this. Could the sort+index be disabled
> in this mode? I think it is reasonable to expect administrators
> trying to import BAM files from the local file system to take
> care of this.
> 
> Alternatively, you could actually check if the BAM file is pre-sorted
> or not.
> 
> Peter
> ___________________________________________________________
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
> 
>  http://lists.bx.psu.edu/

Greg Von Kuster
Galaxy Development Team
greg at bx.psu.edu