How to retrieve large lists?

Nov 7, 2007 at 12:30 AM
Hi there,

Thanks for providing this library. I was evaluating it today and it wasn't clear to me how you could use it to retrieve large lists from S3 (for example, when you have a bucket that contains more than 1000 objects). Am I missing something or is this capability not in the library?

Thanks,

Jeffrey
Nov 8, 2007 at 11:38 PM
Hello,
See the console sample that shows how to list the items in a bucket.

Here is the extract:

private void ListBucket(String bucketName)
{
    BucketListRequest request = null;
    BucketListResponse response = null;
    try
    {
        this.listBoxObjects.Items.Clear();

        request = new BucketListRequest(bucketName);
        response = service.BucketList(request);

        XmlDocument bucketXml = response.StreamResponseToXmlDocument();

        // Select every <Key> element regardless of its XML namespace.
        XmlNodeList objects = bucketXml.SelectNodes("//*[local-name()='Key']");
        foreach (XmlNode obj in objects)
        {
            this.listBoxObjects.Items.Add(obj.InnerXml);
        }
    }
    catch (Exception ex)
    {
        // Release the response stream if the request failed part-way.
        if (response != null && response.DataStream != null)
            response.DataStream.Close();

        Tools.AppendTextFormat(mmoS3Log, ex.Message);
    }
}
Nov 9, 2007 at 12:27 AM
OK...that example is a little more involved than what I found. I originally did this:

BucketListRequest bucketListRequest = new BucketListRequest("files.approver.com");
BucketListResponse bucketListResponse = service.BucketList(bucketListRequest);
Console.WriteLine(bucketListResponse.StreamResponseToString());

which works but only retrieves 1000 items. Next I tried wrapper.ListBucket which has the same shortcoming.

I am assuming that your code will retrieve the entire contents of the bucket, even if it's more than 1000 items?

Thanks,

Jeffrey
Nov 9, 2007 at 2:45 AM
Hello Jeffrey,
Sorry, I didn't understand that this was your problem...
By default a maximum of 1,000 objects are returned in a single Web Service call.

You can find some hints about how to solve the problem here:
http://developer.amazonwebservices.com/connect/entry!default.jspa?categoryID=47&externalID=372&fromSearchPage=true

More specifics here:
http://developer.amazonwebservices.com/connect/thread.jspa?messageID=44915&#44915

It seems that we will have to add this ourselves.

I don't have the time for this right now, but if you do it I would be pleased if you would share it with me :)

Here
http://docs.amazonwebservices.com/AmazonS3/2006-03-01/RESTBucketGET.html
it says that we need to use a marker in the query to get the next result set. The marker to use is the last key of the current result.

Anyway, that approach has its limits. You would be better off storing your keys in a database and using them to make requests.

If the goal is to delete the objects, delete the first 1000, then make a new query. But I guess this is not what you want to do...
Nov 10, 2007 at 7:07 PM
Edited Nov 10, 2007 at 7:34 PM
Hi Jeffrey,

In order to get ThreeSharp to return lists with more than 1000 keys, you have to check the IsTruncated element to see if there are additional pages of keys to be returned. If IsTruncated is true, add the "marker" query parameter and send additional requests until IsTruncated returns false. The marker is simply the name of the last key you received in the previous response.

Here's a sample:

private List<string> GetKeys(string bucketName)
{
    List<string> keys = new List<string>();
    bool isTruncated = true;
    string marker = string.Empty;
    while (isTruncated)
    {
        // Fetch the next page, starting after the last key seen so far.
        XmlDocument bucketXml = GetBucketListXml(bucketName, marker);
        XmlNodeList keyNodes = bucketXml.SelectNodes("//*[local-name()='Key']");
        foreach (XmlNode key in keyNodes)
        {
            keys.Add(key.InnerXml);
        }
        // The last key of this page becomes the marker for the next request.
        marker = keys[keys.Count - 1];
        isTruncated = bool.Parse(bucketXml.SelectSingleNode("//*[local-name()='IsTruncated']").InnerXml);
    }
    return keys;
}
 
private XmlDocument GetBucketListXml(string bucketName, string marker) 
{ 
    BucketListRequest request = new BucketListRequest(bucketName); 
    if (!string.IsNullOrEmpty(marker)) 
    { 
        request.QueryList.Add("marker", marker); 
    } 
    BucketListResponse response = _service.BucketList(request); 
    XmlDocument bucketXml = response.StreamResponseToXmlDocument(); 
    return bucketXml; 
} 

Regards,
Chris Montgomery
Nov 14, 2007 at 8:03 PM
Hello Chris,
Thank you for your sample. It's working great.

Just two very small changes...

I suggest using key.InnerText instead of key.InnerXml. That way it will correctly handle filenames containing the '&' character.
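
To make the difference concrete, here is a minimal sketch using only System.Xml. The XML fragment and the KeyTextDemo class are made up for illustration and are not part of ThreeSharp; a real listing response contains many more elements.

```csharp
using System;
using System.Xml;

public static class KeyTextDemo
{
    // Returns the raw markup and the decoded text of the first <Key> element.
    public static (string InnerXml, string InnerText) ReadFirstKey(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNode key = doc.SelectSingleNode("//*[local-name()='Key']");
        return (key.InnerXml, key.InnerText);
    }

    public static void Main()
    {
        // Hypothetical listing fragment: the key name contains '&',
        // which must be escaped as &amp; inside the XML response.
        string xml = "<ListBucketResult><Contents>"
                   + "<Key>reports/q1&amp;q2.txt</Key>"
                   + "</Contents></ListBucketResult>";

        var result = ReadFirstKey(xml);
        Console.WriteLine(result.InnerXml);  // reports/q1&amp;q2.txt
        Console.WriteLine(result.InnerText); // reports/q1&q2.txt
    }
}
```

InnerXml hands back the escaped markup, so using it as a marker or key name would ask S3 for a key that does not exist. InnerText returns the decoded name.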

Also, we need to add a test to check whether the bucket is empty:
if (keys.Count > 0)
{
    marker = keys[keys.Count - 1];
}

Thanks again,
you saved me a lot of work.
John.
Nov 15, 2007 at 12:27 AM
Hi John,

Good call on using InnerText - I read another post on this forum and already made that change everywhere in my project. :) My application has key name validation further upstream, but in your case you may also want to consider using HttpUtility.UrlEncode and HttpUtility.UrlDecode when creating and retrieving key names if you want to maximize compatibility between S3 and your file system's string requirements.
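
As a rough sketch of that round trip: the KeyNameEncoding class and its method names below are hypothetical helpers, and only HttpUtility.UrlEncode and HttpUtility.UrlDecode come from the framework (System.Web, which needs an assembly reference on .NET Framework).

```csharp
using System;
using System.Web;

public static class KeyNameEncoding
{
    // Hypothetical helper: encode a local file name before using it as a key.
    public static string ToS3Key(string fileName)
    {
        return HttpUtility.UrlEncode(fileName);
    }

    // Hypothetical helper: recover the original file name from a stored key.
    public static string ToFileName(string s3Key)
    {
        return HttpUtility.UrlDecode(s3Key);
    }

    public static void Main()
    {
        string original = "my report&notes.txt";
        string key = ToS3Key(original);

        Console.WriteLine(key);                          // my+report%26notes.txt
        Console.WriteLine(ToFileName(key) == original);  // True
    }
}
```

Whether to encode at all is a design choice: encoded keys are safer to move between systems, but they are harder to read in other S3 tools, and every reader of the bucket has to apply the same decoding.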

Regards,
Chris Montgomery
Coordinator
Nov 28, 2007 at 1:27 AM
Hi Everyone,
I've made a change for release 1.1. The form sample now implements code based on Chris's to get additional items if the list is truncated. Also, it's using InnerText instead of InnerXml. Thanks for the suggestions and ideas!

Thanks,
Joel Wetzel
Affirma Consulting
Nov 28, 2007 at 6:27 AM
Thanks for this, Joel.

When selecting any existing bucket using the 1.1 version of the Affirma.ThreeSharp.FormSample application, I get the error "The underlying connection was closed: Could not establish trust relationship for the SSL/TLS secure channel." I am pretty sure that this sample application worked for all my buckets in the 1.0 version.
Coordinator
Nov 29, 2007 at 6:00 PM
Interesting. The only thing that would have changed in how it accesses an existing bucket is this: if you look in the constructor for the ThreeSharpFormSample form, there's a line that says config.Format = CallingFormat.SUBDOMAIN;. The default used to be CallingFormat.REGULAR, but it had to be changed to support EU buckets. You might try changing it back to CallingFormat.REGULAR and see if that fixes it. If it does, my next question would be, what kind of firewall are you running?
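
For reference, here is a small sketch of how the two calling formats differ in the URL they produce. This is not ThreeSharp code: the S3UrlStyles class and its methods are hypothetical, and the wildcard check is a simplified stand-in for real certificate validation. It illustrates one possible cause that has not been confirmed in this thread: a wildcard certificate for *.s3.amazonaws.com covers only a single host label, so a bucket name that contains dots would not match it in subdomain format.

```csharp
using System;

public static class S3UrlStyles
{
    // Regular (path-style) format: the bucket is part of the path.
    public static string RegularUrl(string bucket, string key)
    {
        return "https://s3.amazonaws.com/" + bucket + "/" + key;
    }

    // Subdomain format: the bucket is part of the host name.
    public static string SubdomainUrl(string bucket, string key)
    {
        return "https://" + bucket + ".s3.amazonaws.com/" + key;
    }

    // Simplified wildcard rule: "*.suffix" matches exactly one label.
    public static bool MatchesWildcard(string host, string suffix)
    {
        if (!host.EndsWith("." + suffix, StringComparison.OrdinalIgnoreCase))
        {
            return false;
        }
        string prefix = host.Substring(0, host.Length - suffix.Length - 1);
        return prefix.Length > 0 && !prefix.Contains(".");
    }

    public static void Main()
    {
        string dotted = new Uri(SubdomainUrl("files.approver.com", "a.txt")).Host;
        string plain = new Uri(SubdomainUrl("mybucket", "a.txt")).Host;

        Console.WriteLine(RegularUrl("files.approver.com", "a.txt"));
        Console.WriteLine(MatchesWildcard(dotted, "s3.amazonaws.com")); // False
        Console.WriteLine(MatchesWildcard(plain, "s3.amazonaws.com"));  // True
    }
}
```

If that is what is happening, the regular format would avoid the mismatch because the host is always s3.amazonaws.com.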

Thanks,
Joel Wetzel
Affirma Consulting