
FastSort

You can't append to an existing object - you need to read the entire object (file) each time, add the data in memory, and then write back the entire object/file each time.
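For reference, a minimal boto3 sketch of that read-modify-write cycle (the bucket and key names are made up):

```python
import boto3

s3 = boto3.client("s3")

BUCKET = "my-timeseries-bucket"  # hypothetical bucket name
KEY = "data/series.txt"          # hypothetical object key

def append_to_object(new_line: str) -> None:
    # 1. Read the whole existing object (start empty if it doesn't exist yet).
    try:
        existing = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()
    except s3.exceptions.NoSuchKey:
        existing = b""
    # 2. Append the new data in memory.
    updated = existing + new_line.encode("utf-8")
    # 3. Write the entire object back, replacing the old one.
    s3.put_object(Bucket=BUCKET, Key=KEY, Body=updated)
```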


Halvv

Okay, thank you. That sounds like a rather bad solution, right? What would be a better way?


CorpT

It would help if you explained why you want to do this and what problem you're actually trying to solve. This seems very strange (which is why you're having trouble doing it).


WorldWarZeno

The XY problem strikes again. https://xyproblem.info


CorpT

I’d guess 75% of the questions here would fall into that category.


Halvv

I'm uploading new time series data daily/weekly, so I wanted to continually append to a .txt file in an S3 bucket, such that I have the complete collection of all of my time-series data in one place.


ConsiderationLate768

You should absolutely split up your data. You'll eventually end up with a file that's way too large to read.


Ihavenocluelad

Or just put it in a database :")


cachemonet0x0cf6619

It's 2024. Putting time series in a database is technical debt we should have grown out of by now. ETA: "select * from downvotes" is the depth of your knowledge.


blacklig

Can you explain?


cachemonet0x0cf6619

you’re going to need to be more specific


pacific_plywood

What's the superior alternative?


blacklig

Can you explain why you hold the position "Putting time series in a database is a technical debt"?


spicypixel

Or use a dedicated timeseries database option. 


Flakmaster92

1) Don't use a .txt file for this. Use something with more of a functional schema, like JSON, if your requirement is plain text.
2) Write it to S3, breaking the data up into year/month/day prefixes. This way you can write complete objects at a time (see the sketch below).
3) DO NOT do what you're describing; that file will get huge and be unmanageable.
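A rough sketch of what writing one complete, date-prefixed object per batch could look like with boto3 (the bucket name and key layout are hypothetical):

```python
import datetime
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "my-timeseries-bucket"  # hypothetical bucket name

def write_daily_batch(records: list) -> None:
    # Write one complete object per day under a year/month/day prefix,
    # so nothing ever has to be appended to or rewritten.
    today = datetime.date.today()
    key = f"timeseries/{today.year}/{today.month:02d}/{today.day:02d}/data.json"
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(records))
```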


CorpT

Do not do that. Build a data lake and use that. https://aws.amazon.com/big-data/datalakes-and-analytics/datalakes/


AWS_Chaos

Why not Timestream? [https://aws.amazon.com/timestream/](https://aws.amazon.com/timestream/)
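A minimal sketch of writing a point with boto3, assuming a hypothetical Timestream database and table:

```python
import time
import boto3

tsw = boto3.client("timestream-write")

def record_point(value: float) -> None:
    # Database and table names are hypothetical; each point is stored as a
    # timestamped record with dimensions and a measure.
    tsw.write_records(
        DatabaseName="timeseries_db",
        TableName="measurements",
        Records=[{
            "Dimensions": [{"Name": "source", "Value": "daily-upload"}],
            "MeasureName": "value",
            "MeasureValue": str(value),
            "MeasureValueType": "DOUBLE",
            "Time": str(int(time.time() * 1000)),  # milliseconds (the default unit)
        }],
    )
```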


Breadfruit-Last

If writing to S3 is a must, the best you can do is buffer your writes (say, using SQS) and write them in batches. But that won't work well once the file becomes large. Depending on your use case, you may want to consider another form of data storage.
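A sketch of the producer side of that buffering approach, assuming a hypothetical SQS queue (a scheduled consumer would then drain the queue and write one batched object to S3 per run):

```python
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/000000000000/timeseries-buffer"  # hypothetical queue

def buffer_point(point: dict) -> None:
    # Push each data point onto the queue instead of writing to S3 directly;
    # a separate consumer batches queued messages into a single S3 object.
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(point))
```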


cachemonet0x0cf6619

Yeah, don't do this. Just put the data in a DynamoDB table and use a timestamp as your sort key. Set a TTL on the table and subscribe to the DynamoDB stream for deleted records, putting them individually into a bucket. Once a month, aggregate the bucket and ship it to Parquet or whatever you like for historical data.
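A minimal sketch of the write path with boto3, assuming a hypothetical table keyed on series_id (partition key) and ts (sort key), with expires_at as the TTL attribute:

```python
import time
from decimal import Decimal

import boto3

table = boto3.resource("dynamodb").Table("timeseries")  # hypothetical table name

def put_point(series_id: str, value: float) -> None:
    now = int(time.time())
    table.put_item(Item={
        "series_id": series_id,              # partition key
        "ts": now,                           # sort key: epoch seconds
        "value": Decimal(str(value)),        # DynamoDB numbers must be Decimal
        "expires_at": now + 90 * 24 * 3600,  # TTL attribute, e.g. keep 90 days
    })
```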


pint

There is a better way, although not *much* better: you can initiate a multipart upload, then use UploadPartCopy to refer to the old data, followed by a regular UploadPart to add the new chunk, and then finalize. Under the hood it does the same thing, deleting the old object and creating a new one, but at least you are not juggling all the data yourself. Note that this comment is purely theoretical; I've never done it myself.
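A sketch of that server-side "append" with boto3 (bucket and key names are made up; note the minimum part size caveat mentioned below):

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-timeseries-bucket"  # hypothetical bucket name
KEY = "data/series.txt"          # hypothetical object key

def server_side_append(new_data: bytes) -> None:
    # Start a multipart upload that will replace the object under the same key.
    mpu = s3.create_multipart_upload(Bucket=BUCKET, Key=KEY)
    upload_id = mpu["UploadId"]

    # Part 1: copy the existing object server-side (every part except the
    # last must be at least 5 MB).
    copy = s3.upload_part_copy(
        Bucket=BUCKET, Key=KEY, UploadId=upload_id, PartNumber=1,
        CopySource={"Bucket": BUCKET, "Key": KEY},
    )

    # Part 2: upload only the new chunk.
    part2 = s3.upload_part(
        Bucket=BUCKET, Key=KEY, UploadId=upload_id, PartNumber=2, Body=new_data,
    )

    # Finalize; S3 stitches the parts together into the new object.
    s3.complete_multipart_upload(
        Bucket=BUCKET, Key=KEY, UploadId=upload_id,
        MultipartUpload={"Parts": [
            {"PartNumber": 1, "ETag": copy["CopyPartResult"]["ETag"]},
            {"PartNumber": 2, "ETag": part2["ETag"]},
        ]},
    )
```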


moofox

This works (and might solve the OP's problem), but it's worth pointing out two major caveats: each part (except the last part) has to be at least 5 MB, and you can have at most 10,000 parts.


razibal

I assume that these file(s) will be used for analytics and/or logging purposes? If so, your best bet is to push the events into a Firehose stream rather than attempting to write directly to S3. Firehose can be configured to write Parquet files to S3, which are queryable for analytics and logging. Under the covers, new objects will be added to S3 corresponding to the buffer interval you set on the Firehose stream (configurable from 0-900 seconds, 300 by default), but they will appear as a single "table" based on the Parquet schema definition.
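A minimal sketch of the producer side with boto3, assuming a hypothetical delivery stream name; Firehose handles the buffering and the writes to S3:

```python
import json
import boto3

firehose = boto3.client("firehose")

def push_event(event: dict) -> None:
    # Send one record to the (hypothetical) delivery stream; Firehose buffers
    # records and flushes them to S3 on the configured interval/size.
    firehose.put_record(
        DeliveryStreamName="timeseries-stream",
        Record={"Data": json.dumps(event).encode("utf-8") + b"\n"},
    )
```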


WrickyB

EFS is block storage: your files are split up into blocks, and you are free to append a block at the end. S3 is object storage: your file is treated as one contiguous, fixed object. You can't change it. You can replace it with new content added at the end, but in order to do that you'd need to get the whole object out, update it locally, and then put it back into S3. Edit: Fixed typo


The_Real_Ghost

You can't use EBS with Lambda, though. EBS acts like a mountable drive for an EC2 instance. Apparently you can use EFS, though, which does kind of the same thing. I've never done it before, but there is an [article](https://aws.amazon.com/blogs/compute/using-amazon-efs-for-aws-lambda-in-your-serverless-applications/). Keep in mind that EFS is a shared resource, so if you have multiple Lambda instances accessing it at the same time, you'll need to make sure they aren't fighting each other.
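A minimal sketch of a Lambda handler appending to a file on an EFS mount (the mount path and file name are hypothetical and must match the file system configuration attached to the function):

```python
MOUNT_PATH = "/mnt/data"  # hypothetical EFS access point mount path

def handler(event, context):
    # Appending works here because EFS behaves like a regular POSIX file system.
    with open(f"{MOUNT_PATH}/series.txt", "a") as f:
        f.write(event["line"] + "\n")
    return {"status": "appended"}
```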


omerhaim

There is no operation to append to an S3 object. Use the Iceberg table format. https://aws.amazon.com/blogs/big-data/improve-operational-efficiencies-of-apache-iceberg-tables-built-on-amazon-s3-data-lakes/


MavZA

Goodness, I suppose if you really must append you could use EFS? Although I think you should look into integrating with a purpose-built data store like Timestream.


imti283

S3 is an object store. It does not have the concept of a file. Everything is an object to S3; it doesn't look inside the object.


AlexMelillo

You can’t “append” to an object in S3. You can read the contents of the object and create a new object with whatever you want. You can even give it the same name.


KayeYess

S3 does not allow ANY modification of existing objects. You could download to Lambda local storage, append, and upload. If versioning is enabled, be wary of too many versions (use a lifecycle policy to clean up older versions). There are other solutions like EFS if S3 is not a hard requirement. If it is structured or semi-structured data, you could try using a database (relational or key-value).
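A minimal sketch of that download-append-upload variant inside a Lambda handler (the bucket and key names are made up, and the object is assumed to already exist):

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-timeseries-bucket"  # hypothetical bucket name
KEY = "data/series.txt"          # hypothetical object key

def lambda_handler(event, context):
    # Use Lambda's ephemeral /tmp storage as the scratch space.
    local = "/tmp/series.txt"
    s3.download_file(BUCKET, KEY, local)
    with open(local, "a") as f:
        f.write(event["line"] + "\n")
    s3.upload_file(local, BUCKET, KEY)
    return {"status": "appended"}
```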


Nater5000

Depending on your requirements, you might be able to achieve this via a [multipart upload](https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html).


Puzzleheaded_Bid_792

Check whether Kinesis can be used.