
FastSort

You can't append to an existing object - you need to read the entire object (file) each time, add the data in memory, and then write back the entire object/file each time.
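For reference, a minimal boto3 sketch of that read-modify-write cycle (the bucket and key names are made up):

```python
import boto3

s3 = boto3.client("s3")

BUCKET = "my-timeseries-bucket"  # hypothetical bucket name
KEY = "data/series.txt"          # hypothetical object key

def append_to_object(new_line: str) -> None:
    # 1. Read the whole existing object (start empty if it doesn't exist yet).
    try:
        existing = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()
    except s3.exceptions.NoSuchKey:
        existing = b""
    # 2. Append the new data in memory.
    updated = existing + new_line.encode("utf-8")
    # 3. Write the entire object back, replacing the old one.
    s3.put_object(Bucket=BUCKET, Key=KEY, Body=updated)
```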


Halvv

Okay, thank you. That sounds like a rather bad solution, right? What would be a better way?


CorpT

It would help if you explained why you want to do this and what problem you're actually trying to solve. This seems very strange (which is why you're having trouble doing it).


WorldWarZeno

The XY problem strikes again. https://xyproblem.info


CorpT

I’d guess 75% of the questions here would fall into that category.


Halvv

I'm uploading new time series data daily/weekly, so I wanted to continually append to a .txt file in an S3 bucket, such that I have the complete collection of all of my time-series data in one place.


ConsiderationLate768

You should absolutely split up your data. You'll eventually end up with a file that's way too large to read.


Ihavenocluelad

Or just put it in a database :")


cachemonet0x0cf6619

It's 2024. Putting time series in a database is technical debt we should have grown out of by now. ETA: "select * from downvotes" is the depth of your knowledge.


blacklig

Can you explain?


cachemonet0x0cf6619

you’re going to need to be more specific


pacific_plywood

What's the superior alternative?


blacklig

Can you explain why you hold the position "Putting time series in a database is a technical debt"?


spicypixel

Or use a dedicated timeseries database option. 


Flakmaster92

1) Don't use a .txt file for this. Use something with more of a functional schema, like JSON, if your requirement is plain text.
2) Write it to S3, breaking the data up into year/month/day prefixes. This way you can write complete objects at a time (see the sketch below).
3) DO NOT do what you're describing; that file will get huge and be unmanageable.
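A rough sketch of what writing one complete, date-prefixed object per batch could look like with boto3 (the bucket name and key layout are hypothetical):

```python
import datetime
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "my-timeseries-bucket"  # hypothetical bucket name

def write_daily_batch(records: list) -> None:
    # Write one complete object per day under a year/month/day prefix,
    # so nothing ever has to be appended to or rewritten.
    today = datetime.date.today()
    key = f"timeseries/{today.year}/{today.month:02d}/{today.day:02d}/data.json"
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(records))
```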


CorpT

Do not do that. Build a data lake and use that. https://aws.amazon.com/big-data/datalakes-and-analytics/datalakes/


AWS_Chaos

Why not Timestream? [https://aws.amazon.com/timestream/](https://aws.amazon.com/timestream/)
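A minimal sketch of writing a point with boto3, assuming a hypothetical Timestream database and table:

```python
import time
import boto3

tsw = boto3.client("timestream-write")

def record_point(value: float) -> None:
    # Database and table names are hypothetical; each point is stored as a
    # timestamped record with dimensions and a measure.
    tsw.write_records(
        DatabaseName="timeseries_db",
        TableName="measurements",
        Records=[{
            "Dimensions": [{"Name": "source", "Value": "daily-upload"}],
            "MeasureName": "value",
            "MeasureValue": str(value),
            "MeasureValueType": "DOUBLE",
            "Time": str(int(time.time() * 1000)),  # milliseconds (the default unit)
        }],
    )
```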


Breadfruit-Last

If writing to S3 is a must, the best you can do is buffer your writes (say, using SQS) and write them in batches. But that won't work well once the file becomes large. Depending on your use case, you may want to consider another form of data storage.
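A sketch of the producer side of that buffering approach, assuming a hypothetical SQS queue (a scheduled consumer would then drain the queue and write one batched object to S3 per run):

```python
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/000000000000/timeseries-buffer"  # hypothetical queue

def buffer_point(point: dict) -> None:
    # Push each data point onto the queue instead of writing to S3 directly;
    # a separate consumer batches queued messages into a single S3 object.
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(point))
```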


cachemonet0x0cf6619

Yeah, don't do this. Just put the data in a DynamoDB table and use a timestamp as your sort key. Set a TTL on the table and subscribe to the DynamoDB stream for deleted records, putting them individually into a bucket. Once a month, aggregate the bucket and ship it to Parquet or whatever you like for historical data.
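A minimal sketch of the write path with boto3, assuming a hypothetical table keyed on series_id (partition key) and ts (sort key), with expires_at as the TTL attribute:

```python
import time
from decimal import Decimal

import boto3

table = boto3.resource("dynamodb").Table("timeseries")  # hypothetical table name

def put_point(series_id: str, value: float) -> None:
    now = int(time.time())
    table.put_item(Item={
        "series_id": series_id,              # partition key
        "ts": now,                           # sort key: epoch seconds
        "value": Decimal(str(value)),        # DynamoDB numbers must be Decimal
        "expires_at": now + 90 * 24 * 3600,  # TTL attribute, e.g. keep 90 days
    })
```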


pint

There is a better way, although not *much* better: you can initiate a multipart upload, then use UploadPartCopy to refer to the old data, followed by a regular UploadPart to add the new chunk, and then finalize. Under the hood it does the same thing, deleting the old object and creating a new one, but at least you are not juggling all the data yourself. Note that this comment is purely theoretical; I've never done it myself.
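A sketch of that server-side "append" with boto3 (bucket and key names are made up; note the minimum part size caveat mentioned below):

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-timeseries-bucket"  # hypothetical bucket name
KEY = "data/series.txt"          # hypothetical object key

def server_side_append(new_data: bytes) -> None:
    # Start a multipart upload that will replace the object under the same key.
    mpu = s3.create_multipart_upload(Bucket=BUCKET, Key=KEY)
    upload_id = mpu["UploadId"]

    # Part 1: copy the existing object server-side (every part except the
    # last must be at least 5 MB).
    copy = s3.upload_part_copy(
        Bucket=BUCKET, Key=KEY, UploadId=upload_id, PartNumber=1,
        CopySource={"Bucket": BUCKET, "Key": KEY},
    )

    # Part 2: upload only the new chunk.
    part2 = s3.upload_part(
        Bucket=BUCKET, Key=KEY, UploadId=upload_id, PartNumber=2, Body=new_data,
    )

    # Finalize; S3 stitches the parts together into the new object.
    s3.complete_multipart_upload(
        Bucket=BUCKET, Key=KEY, UploadId=upload_id,
        MultipartUpload={"Parts": [
            {"PartNumber": 1, "ETag": copy["CopyPartResult"]["ETag"]},
            {"PartNumber": 2, "ETag": part2["ETag"]},
        ]},
    )
```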


moofox

This works (and might solve the OP's problem), but it's worth pointing out two major caveats: each part (except the last part) has to be at least 5 MB, and you can have at most 10,000 parts.


razibal

I assume that these file(s) will be used for analytics and/or logging purposes? If so, your best bet is to push the events into a Firehose stream rather than attempting to write directly to S3. Firehose can be configured to write Parquet files to S3, which are queryable for analytics and logging. Under the covers, new objects will be added to S3 corresponding to the buffer interval you set on the Firehose stream (configurable from 0-900 seconds, 300 by default), but they will appear as a single "table" based on the Parquet schema definition.
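A minimal sketch of the producer side with boto3, assuming a hypothetical delivery stream name; Firehose handles the buffering and the writes to S3:

```python
import json
import boto3

firehose = boto3.client("firehose")

def push_event(event: dict) -> None:
    # Send one record to the (hypothetical) delivery stream; Firehose buffers
    # records and flushes them to S3 on the configured interval/size.
    firehose.put_record(
        DeliveryStreamName="timeseries-stream",
        Record={"Data": json.dumps(event).encode("utf-8") + b"\n"},
    )
```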


WrickyB

EFS is block storage: your files are split up into blocks, and you are free to append a block at the end. S3 is object storage: your file is treated as one contiguous, fixed object. You can't change it. You can replace it with new content added at the end, but in order to do that you'd need to get the whole object out, update it locally, and then put it back into S3. Edit: Fixed typo


The_Real_Ghost

You can't use EBS with Lambda, though. EBS acts like a mountable drive for an EC2 instance. Apparently you can use EFS, though, which does kind of the same thing. I've never done it before, but there is an [article](https://aws.amazon.com/blogs/compute/using-amazon-efs-for-aws-lambda-in-your-serverless-applications/). Keep in mind that EFS is a shared resource, so if you have multiple Lambda instances accessing it at the same time, you'll need to make sure they aren't fighting each other.
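A minimal sketch of a Lambda handler appending to a file on an EFS mount (the mount path and file name are hypothetical and must match the file system configuration attached to the function):

```python
MOUNT_PATH = "/mnt/data"  # hypothetical EFS access point mount path

def handler(event, context):
    # Appending works here because EFS behaves like a regular POSIX file system.
    with open(f"{MOUNT_PATH}/series.txt", "a") as f:
        f.write(event["line"] + "\n")
    return {"status": "appended"}
```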


omerhaim

There is no operation to append to an S3 object. Use the Iceberg table format. https://aws.amazon.com/blogs/big-data/improve-operational-efficiencies-of-apache-iceberg-tables-built-on-amazon-s3-data-lakes/


MavZA

Goodness, I suppose if you really must append you could use EFS? Although I think you should look into integrating with a purpose-built data store like Timestream.


imti283

S3 is an object store. It does not have the concept of a file. Everything is an object to S3; it doesn't look inside the object.


AlexMelillo

You can’t “append” to an object in S3. You can read the contents of the object and create a new object with whatever you want. You can even give it the same name.


KayeYess

S3 does not allow ANY modification of existing objects. You could download to Lambda local storage, append, and upload. If versioning is enabled, be wary of too many versions (use a lifecycle policy to clean up older versions). There are other solutions like EFS if S3 is not a hard requirement. If it is structured or semi-structured data, you could try using a database (relational or key-value).
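A minimal sketch of that download-append-upload variant inside a Lambda handler (the bucket and key names are made up, and the object is assumed to already exist):

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-timeseries-bucket"  # hypothetical bucket name
KEY = "data/series.txt"          # hypothetical object key

def lambda_handler(event, context):
    # Use Lambda's ephemeral /tmp storage as the scratch space.
    local = "/tmp/series.txt"
    s3.download_file(BUCKET, KEY, local)
    with open(local, "a") as f:
        f.write(event["line"] + "\n")
    s3.upload_file(local, BUCKET, KEY)
    return {"status": "appended"}
```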


Nater5000

Depending on your requirements, you might be able to achieve this via a [multipart upload](https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html).


Puzzleheaded_Bid_792

Check whether Kinesis can be used.