[galaxy-dev] Error importing BAM file into Library
Greg Von Kuster
greg at bx.psu.edu
Tue Mar 15 09:51:01 EDT 2011
Hello Peter,
I've broken this issue into the following two parts. Here is the status of each.
1. Don't alter the contents of files being uploaded to a data library if using the "upload_directory" or "upload_paths" options in conjunction with the "Link to files without copying into Galaxy" option. This issue has been resolved in change set 5221:b5ecb8f4839d.
2. Determine whether a BAM file is sorted before it is introduced into the Galaxy environment, so that it is sorted only if necessary. We have a very simple test for this in the Bam class's _is_coordinate_sorted() method in ~/lib/galaxy/datatypes/binary.py, but this method clearly needs improvement. A better implementation is somewhat non-trivial, but it is a high priority, so it should be completed soon. In the meantime, BAM files cannot be uploaded to a data library using the combination of options described in 1 above if they do not pass the current simple, rigid test in the Bam class's method.
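For illustration only, here is a minimal Python sketch of a stricter check than a simple header test. It assumes the BAM binary layout from the SAM/BAM format specification: it reads the @HD line's SO tag, then walks every alignment record to confirm (refID, pos) order, with unmapped reads (refID -1) last. This is not Galaxy's _is_coordinate_sorted() implementation. check_bam_sorted, upload_action and the link_data_only flag are hypothetical names, and upload_action just encodes the interim policy from points 1 and 2.

```python
import gzip
import struct

# Hypothetical sort key for unmapped reads (refID == -1): larger than any real refID.
_UNMAPPED_REF = 2 ** 31


def _read_exact(handle, n):
    """Read exactly n bytes, raising EOFError if the file ends early."""
    data = handle.read(n)
    if len(data) != n:
        raise EOFError("truncated BAM file")
    return data


def _read_int32(handle):
    """Read one little-endian signed 32-bit integer."""
    return struct.unpack("<i", _read_exact(handle, 4))[0]


def check_bam_sorted(path):
    """Return (header_says_coordinate, records_in_coordinate_order) for a BAM file."""
    # BAM is BGZF, i.e. concatenated gzip members, which gzip.open reads as one stream.
    with gzip.open(path, "rb") as bam:
        if _read_exact(bam, 4) != b"BAM\x01":
            raise ValueError("not a BAM file: bad magic")
        header_text = _read_exact(bam, _read_int32(bam)).decode("ascii", "replace")
        header_sorted = False
        for line in header_text.splitlines():
            if line.startswith("@HD"):
                header_sorted = "SO:coordinate" in line.split("\t")[1:]
                break
        # Skip the binary reference list: l_name, name (NUL-terminated), l_ref.
        for _ in range(_read_int32(bam)):
            _read_exact(bam, _read_int32(bam) + 4)
        # Each record starts with block_size, then refID and pos (all int32).
        previous = None
        while True:
            size_bytes = bam.read(4)
            if not size_bytes:
                return header_sorted, True  # reached EOF without an out-of-order record
            if len(size_bytes) != 4:
                raise EOFError("truncated BAM record")
            block_size = struct.unpack("<i", size_bytes)[0]
            ref_id, pos = struct.unpack("<ii", _read_exact(bam, block_size)[:8])
            key = (ref_id if ref_id >= 0 else _UNMAPPED_REF, pos)
            if previous is not None and key < previous:
                return header_sorted, False
            previous = key


def upload_action(link_data_only, records_sorted):
    """Hypothetical policy for the upload step, following points 1 and 2 above."""
    if records_sorted:
        return "use as-is"
    if link_data_only:
        # Sorting would rewrite the administrator's file, so refuse instead.
        return "reject"
    return "sort"  # Galaxy owns this copy, so sorting it is safe
```

Trusting the header alone is fast but can be wrong if a tool left a stale or missing SO tag. The full scan avoids that, at the cost of decompressing the whole file, which matters for multi-gigabyte inputs like the 4.4 Gb file in the report below. A production version might cap the number of records it inspects.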
Thanks for your message,
Greg Von Kuster
On Mar 10, 2011, at 1:18 PM, Peter Cock wrote:
> Hi all,
>
> I think I have fallen over the same problem Glen Beane reported in Nov 2010,
> http://lists.bx.psu.edu/pipermail/galaxy-dev/2010-November/003943.html
>
> I recall from reading the mailing list that when you import a BAM file
> into Galaxy, it gets sorted and indexed. That makes sense since most
> tools need that to be done, and resorting an already sorted file should
> be quick.
>
> However, I'm trying to import some presorted and indexed BAM files
> into a library in Galaxy via the Admin settings, linking to the file not
> copying it. I'm getting this:
>
> <quote>
>
> File size: 4.4 Gb
> Data type: auto
> Build: ?
> Miscellaneous information: uploaded bam fileTraceback (most recent
> call last): File "/opt/galaxy-dist/tools/data_source/upload.py", line
> 447, in __main__() File
> "/opt/galaxy-dist/tools/data_source/upload.py", line 439, in __main__
> add_file( dataset, registry, j
> Job Standard Output
>
> [bam_sort_core] merging from 28 files...
>
> Job Standard Error
>
> Traceback (most recent call last):
>   File "/opt/galaxy-dist/tools/data_source/upload.py", line 447, in <module>
>     __main__()
>   File "/opt/galaxy-dist/tools/data_source/upload.py", line 439, in __main__
>     add_file( dataset, registry, json_file, output_path )
>   File "/opt/galaxy-dist/tools/data_source/upload.py", line 381, in add_file
>     datatype.groom_dataset_content( output_path )
>   File "/opt/galaxy-dist/lib/galaxy/datatypes/binary.py", line 98, in groom_dataset_content
>     shutil.move( samtools_created_sorted_file_name, file_name )
>   File "/usr/local/lib/python2.6/shutil.py", line 260, in move
>     copy2(src, real_dst)
>   File "/usr/local/lib/python2.6/shutil.py", line 95, in copy2
>     copyfile(src, dst)
>   File "/usr/local/lib/python2.6/shutil.py", line 51, in copyfile
>     with open(dst, 'wb') as fdst:
> IOError: [Errno 13] Permission denied: '/data/XXX-bwa-out.sorted.bam'
>
> error
> Database/Build: ?
> Number of data lines: None
> Disk file: /data/XXX-bwa-out.sorted.bam
>
> </quote>
>
> Clearly from the error message, Galaxy is trying to edit the source
> file (and the Unix account it is running in only has read permission
> for this file and its containing folder). From the stdout message
> "[bam_sort_core] merging from 28 files..." it looks like Galaxy is
> trying to (re)sort my file, and may well attempt to reindex it. Is that
> likely to be the case?
>
> If the "copy" option had been used, then sorting and indexing
> should work, but I want Galaxy to link to the file as it is.
>
> If however "copy" was not selected, then I don't want Galaxy
> trying to alter the file like this. Could the sort+index be disabled
> in this mode? I think it is reasonable to expect administrators
> trying to import BAM files from the local file system to take
> care of this.
>
> Alternatively, you could actually check if the BAM file is pre-sorted
> or not.
>
> Peter
> ___________________________________________________________
> Please keep all replies on the list by using "reply all"
> in your mail client. To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
>
> http://lists.bx.psu.edu/
Greg Von Kuster
Galaxy Development Team
greg at bx.psu.edu