Bryce Boe

The Adventures of a UCSB Computer Science Ph.D. Student

Amazon S3: Convert Objects to Reduced Redundancy Storage

2 July, 2010 (13:37) | General | By: Bryce Boe

Edit 2010/08/27: I should note that the AWS S3 Console was updated about a month ago with a feature to convert entire folders to or from Reduced Redundancy Storage. Additionally, the boto source has moved to Git, so a few changes are needed to run my script against the latest tree. However, the checkout line below has been modified so that it still works from SVN.

If you are like me, you are pleased that Amazon made it even cheaper to store information in Amazon S3 through its reduced redundancy storage model. Unfortunately, until just recently there wasn’t a simple way to convert your old objects to that model. Using a new revision of boto, the Python package for AWS, I wrote a script (derived from one by the boto author) that automatically converts all the old objects in a bucket to reduced redundancy storage.

The script, shown at the bottom, currently requires an svn version of boto at revision 1595 or later. The script skips objects that are already stored with reduced redundancy, so subsequent runs take significantly less time and stopping it midway through is of minor consequence. Assuming you have Python and Subversion installed, the following will get you up and running with the script, which can be downloaded here.

svn checkout http://boto.googlecode.com/svn/trunk/@1595 boto-read-only
cd boto-read-only
(as root) python setup.py install
cd ..
rm -rf boto-read-only
./convert_to_rss.py your_bucket_name [aws_access_key_id aws_secret_access_key]

Note that you may alternatively put your aws_access_key_id and aws_secret_access_key into the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY respectively.

#!/usr/bin/env python
import os, sys
 
# Fail early with a clear message if boto is missing or too old to provide
# Key.change_storage_class.
try:
    import boto.s3.key
    boto.s3.key.Key.change_storage_class
except ImportError:
    sys.stderr.write('Package boto (svn rev. >= 1595) must be installed.\n')
    sys.exit(1)
except AttributeError:
    sys.stderr.write('Invalid version of boto. Required svn rev. >= 1595.\n')
    sys.exit(1)
 
def convert(bucket_name, aws_id, aws_key):
    """Switch every object in the bucket to reduced redundancy storage."""
    s3 = boto.connect_s3(aws_id, aws_key)
    bucket = s3.lookup(bucket_name)
    if not bucket:
        sys.stderr.write('Invalid authentication, or bucketname. Try again.\n')
        sys.exit(1)
    print 'Found bucket: %s' % bucket_name
    sys.stdout.write('Converting: ')
    sys.stdout.flush()
    found = converted = 0
    try:
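        for key in bucket.list():
            found += 1
            # Skip keys that are already converted so reruns stay cheap.
            if key.storage_class != 'REDUCED_REDUNDANCY':
                key.change_storage_class('REDUCED_REDUNDANCY')
                converted += 1
            # Print one progress dot per 100 keys examined.
            if found % 100 == 0:
                sys.stdout.write('.')
                sys.stdout.flush()
    except KeyboardInterrupt:
        # Ctrl-C stops the loop early; the summary below is still printed.
        pass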
 
    print '\nConverted %d items out of %d to reduced redundancy storage.' % \
        (converted, found)
 
def main():
    def usage(msg=None):
        if msg:
            sys.stderr.write('<error>\n%s\n</error>\n' % msg)
        sys.stderr.write(''.join(['Usage: %s bucket [aws_access_key_id ',
                                  'aws_secret_access_key]\n']) % sys.argv[0])
        sys.exit(1)
 
    if len(sys.argv) == 2:
        bucket = sys.argv[1]
        msg = ''
        if 'AWS_ACCESS_KEY_ID' in os.environ:
            aws_id = os.environ['AWS_ACCESS_KEY_ID']
        else:
            msg += 'Environment does not contain AWS_ACCESS_KEY_ID.\n'
        if 'AWS_SECRET_ACCESS_KEY' in os.environ:
            aws_key = os.environ['AWS_SECRET_ACCESS_KEY']
        else:
            msg += 'Environment does not contain AWS_SECRET_ACCESS_KEY.\n'
        if msg:
            usage(msg + 'Please set values in environment or pass them in.')
    elif len(sys.argv) == 4:
        bucket = sys.argv[1]
        aws_id = sys.argv[2]
        aws_key = sys.argv[3]
    else:
        usage()
 
    convert(bucket, aws_id, aws_key)
 
if __name__ == '__main__':
    main()
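
After a run, it can be reassuring to check what the bucket actually contains. The following is a minimal sketch and not part of the original script: tally_storage_classes is a hypothetical helper that relies only on bucket.list() and key.storage_class, both of which the script above already uses. It assumes the installed boto reports the storage class accurately in a fresh listing, which may not hold for every revision.

```python
from collections import Counter


def tally_storage_classes(bucket):
    """Count keys per storage class from a fresh listing of the bucket.

    `bucket` is expected to behave like the object returned by
    s3.lookup(bucket_name) in the script above: it only needs a list()
    method yielding keys that carry a storage_class attribute.
    """
    counts = Counter()
    for key in bucket.list():
        # Assumption: the listing reflects the current storage class.
        counts[key.storage_class] += 1
    return counts


# Possible usage (needs boto and valid credentials, so it is left commented):
#   import boto
#   bucket = boto.connect_s3(aws_id, aws_key).lookup('your_bucket_name')
#   for storage_class, n in tally_storage_classes(bucket).items():
#       print('%s: %d' % (storage_class, n))
```

If everything was converted, the tally should contain only a REDUCED_REDUNDANCY entry. A remaining STANDARD count means either that some keys were skipped (for example after an interrupted run) or that the listing is not reporting the class correctly.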

Comments

Comment from KT
Time 2010/07/15 at 6:26 PM

I installed boto.Version 2.0a2…the latest from the boto trunk as per the instructions and I get this error:

AttributeError: ‘Provider’ object has no attribute ‘storage_class’

Any workarounds?

Thanks,
KT

Comment from Bryce Boe
Time 2010/07/15 at 10:16 PM

@KT – Perhaps the version you checked out introduced a bug. I am currently unable to test the trunk revision, but you may try running an svn update to the latest revision, or alternatively downgrade to the earlier revision 1595 via `svn up -r 1595`.
Make sure to run python setup.py install again to update the package.

Comment from KT
Time 2010/07/16 at 4:07 PM

Yes, revision 1595 is working. Thank you.

-KT

Comment from todd
Time 2010/11/16 at 2:03 PM

does this actually work? when i read back the keys they still have storage_class == ‘STANDARD’

Comment from todd
Time 2010/11/16 at 5:26 PM

Looks like it is working- for some reason key.storage_class never gets updated

Comment from Bryce Boe
Time 2010/11/29 at 10:49 PM

Glad it’s working todd. You can perform this action now from the AWS console, thus I haven’t sought to update my script with the latest release of boto which should correctly display the storage class.
