Re: Archive workflow
Posted: Mon Sep 11, 2023 4:48 pm
hehe, sounds like: challenge accepted
So before we go to implementation, some basics:
-) for validating that a copy worked, we basically have only one option: read the written file back after it has been written.
-) in theory, we could checksum the "source" file "while" it is being read for the copy, BUT that would require a custom copy tool, which we want to avoid. We want the most stable copy tool available out there, preferably one that billions of users already use (e.g. Windows copy or robocopy)
-) this is why we "need" to "read" the source file twice: once for the copy and once for the checksum (see the sketch right after this list)
-) all of the above means we potentially generate a huge amount of storage and network traffic (which is fine, just be aware of what you are doing)
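To make that concrete, here is a minimal sketch of the copy-then-verify pattern in Python (assuming the third-party "xxhash" package; the paths and filenames are placeholders, and robocopy just stands in for whatever stable copy tool you end up using):

[code]
# minimal sketch of the copy-then-verify pattern (Python, third-party
# "xxhash" package assumed; paths/filenames below are placeholders)
import subprocess
import xxhash

def xxh64_of(path, chunk_size=1 << 20):
    """Stream the file through xxHash64 so big files never sit in RAM."""
    h = xxhash.xxh64()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# 1) copy with a battle-tested tool (robocopy here, as an example)
subprocess.run(
    ["robocopy", r"C:\source_dir", r"D:\target_dir", "file.mxf"],
    check=False,  # robocopy exit codes 0-7 still mean "copied something"
)

# 2) read the source a SECOND time and the freshly written target once
src = xxh64_of(r"C:\source_dir\file.mxf")
dst = xxh64_of(r"D:\target_dir\file.mxf")
print("copy verified" if src == dst else "MISMATCH - copy again")
[/code]

Note how the source really does get read twice and the target gets read back in full afterwards: that is exactly where the extra storage and network traffic comes from.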
Now some thoughts for implementation:
-) the "single file" branch is easy for you, you have source filename and target filename, it is easy to use $xxhash function in a populate processor and see if source and target xxhashes are the same.
-) the folder branch instead will require some logic and some smart resource control. We will need e.g. a find files processor and a sub workflow to get this done
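Purely to illustrate what the find files processor + sub workflow combination has to do per file, here is a rough Python sketch (same assumed xxhash package and placeholder paths as above; a real workflow would also throttle this so you do not hash an entire tree at full blast):

[code]
# hypothetical sketch of the folder branch: enumerate every file under
# the source root ("find files" step), then run the same per-file
# compare ("sub workflow" step) against the mirrored target path
import os
import xxhash

def xxh64_of(path, chunk_size=1 << 20):
    # same streaming hash helper as in the sketch above
    h = xxhash.xxh64()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_tree(src_root, dst_root):
    """Yield (relative_path, ok) for every file below src_root."""
    for dirpath, _dirs, files in os.walk(src_root):
        for name in files:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            yield rel, xxh64_of(src) == xxh64_of(dst)

bad = [rel for rel, ok in verify_tree(r"C:\source_dir", r"D:\target_dir") if not ok]
print("all files verified" if not bad else f"{len(bad)} file(s) failed verification")
[/code]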
Are you still on board or does it sound too crazy?