Best Practice for Uploading Many Files

Mar 26, 2008 at 12:19 AM
This is my current code to send a bunch of files to S3:

ThreeSharpConfig config;
IThreeSharp service;
ObjectAddRequest request = null;
ObjectAddResponse response = null;

config = new ThreeSharpConfig();
config.AwsAccessKeyID = AppConstants.AwsAccessKeyID;
config.AwsSecretAccessKey = AppConstants.AwsSecretAccessKey;
config.IsSecure = false;

service = new ThreeSharpQuery(config);

for (int i = 0; i < filesToCopy.Count; i++)
{
    filename = filesToCopy[i].ToString();

    try
    {
        request = new ObjectAddRequest(bucketName, filename);
        request.Headers.Add("x-amz-acl", AppConstants.AclType);
        request.LoadStreamWithFile(filename);
        response = service.ObjectAdd(request);
    }
    catch (Exception exx)
    {
        NLogger.LogError(exx.Message);
    }
    finally
    {
        if (response != null && response.DataStream != null)
            response.DataStream.Close();
        request = null;
        response = null;
    }
}

As you can see, I am reusing the service object for every file being sent.
It works just fine, but I was wondering if I should be creating a new service object for each file, or if you see any other "Best Practices" that I should be following?

Thanks,
EE
Coordinator
Apr 11, 2008 at 6:33 PM
Re-using the service object is definitely best practice. Your code looks fine. One change coming up though: The next release will have the Transfer objects implementing IDisposable, so you won't need your "finally" clause anymore. You'll be able to write something like:

try
{
    using (request = new ObjectAddRequest(bucketName, filename))
    {
        request.Headers.Add("x-amz-acl", AppConstants.AclType);
        request.LoadStreamWithFile(filename);
        using (response = service.ObjectAdd(request)) { }
    }
}
catch (Exception exx)
{
    NLogger.LogError(exx.Message);
}

This will now guarantee that unmanaged resources are released gracefully.

Thanks,
Joel Wetzel
Affirma Consulting
Aug 26, 2008 at 12:12 AM
Looking at version 1.4, I see how you implemented IDisposable on the Transfer class but I still think you may have a problem based on the prior recommendation.

The ThreeSharpStatistics class holds a HashTable of Transfer objects until the ThreeSharpQuery class is disposed.  This means that none of the Transfer objects will be GC'd until you are done with the ThreeSharpQuery instance.

Why hold references to Transfer objects?  Why not just update the ThreeSharpStatistics with byte counts as you go?
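To make the suggestion concrete, here is a rough sketch of a statistics class that keeps only running totals instead of holding every Transfer object.  All of the names here are illustrative, not actual ThreeSharp API:

```csharp
// Hypothetical sketch of the "byte counts as you go" idea.
// Completed Transfer objects are never referenced, so they are
// eligible for GC as soon as the caller is done with them.
public class AggregateStatistics
{
    private long bytesTransferred;
    private long transferCount;

    // Called once as each transfer completes.
    public void RecordTransfer(long bytes)
    {
        System.Threading.Interlocked.Add(ref this.bytesTransferred, bytes);
        System.Threading.Interlocked.Increment(ref this.transferCount);
    }

    public long BytesTransferred
    {
        get { return System.Threading.Interlocked.Read(ref this.bytesTransferred); }
    }

    public long TransferCount
    {
        get { return System.Threading.Interlocked.Read(ref this.transferCount); }
    }
}
```
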

Also, any progress or thoughts on throttling?

Thanks,

Mark Lindell
Coordinator
Sep 11, 2008 at 1:09 AM
Hi Mark,
I've looked over the code after reading your concern about the statistics class.  I don't believe this is a problem.  The using statement is for ensuring that unmanaged resources (the datastreams) get released.  They still do, even if the reference to the Transfer object continues to exist.

And there's really very little performance penalty for keeping the objects around.  They are kept around so a comprehensive dashboard can be drawn about the transfers that have been done.

The problem would come if you wanted to upload a million objects; keeping that many Transfer objects around would be problematic.  But the ThreeSharpStatistics class does have a RemoveTransfer(String id) method that removes the Transfer objects from the HashTable, after which the GC would eventually dispose of them.

Thanks,
Joel Wetzel
Affirma Consulting
Apr 1, 2009 at 6:20 PM
The statistics member in the ThreeSharpQuery object is private, so the suggested work-around of calling RemoveTransfer(String id) isn't available.  Shouldn't there be a method to clear the statistics, or an option to not store statistics at all?  For my particular use case, I have a service that is continually reading objects from S3, and it's accumulating about 100 MB per hour of running time.  I'd prefer just to drop the statistics; otherwise the recycling of the service object will need to be much more active.
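In the meantime, one way to keep the recycling manageable is to dispose and recreate the service object every N transfers, so the private statistics table (and the Transfer objects it holds) can be collected.  This is only a sketch: the batch size is arbitrary, and keysToRead and the read call are placeholders for your own loop:

```csharp
// Hypothetical work-around: recycle the ThreeSharpQuery instance periodically.
// Disposing the service releases its accumulated statistics, bounding memory growth.
const int BatchSize = 100;  // arbitrary; tune to your memory budget

for (int i = 0; i < keysToRead.Count; i += BatchSize)
{
    using (IThreeSharp service = new ThreeSharpQuery(config))
    {
        int end = Math.Min(i + BatchSize, keysToRead.Count);
        for (int j = i; j < end; j++)
        {
            // ... perform the S3 read for keysToRead[j] with this service instance ...
        }
    }   // service disposed here; its Transfer HashTable becomes collectible
}
```
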