Re: Archive workflow

Posted: Mon Sep 11, 2023 4:48 pm
by emcodem
hehe sounds like, challenge accepted :D
So before we go to implementation, some basics:
-) to validate that a copy worked, we basically have only one option: read the written file back after it has been written.
-) in theory, we could make a checksum of the "source" file "while" it is being read, BUT this would require a custom copy tool, which we want to avoid. We want the most stable copy tool available out there, preferably one that billions of users rely on (e.g. Windows copy or robocopy)
-) this is why we "need" to "read" the source file twice as well: once for the copy and once for the checksum
-) all of the above means we potentially generate a huge amount of storage and network traffic (which is fine, just be aware of what you are doing)
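
A minimal sketch of the "copy first, then read both files again" idea from the list above. It is an illustration only, not the FFAStrans implementation: `copy_and_verify` and `file_digest` are hypothetical helper names, `shutil.copy2` stands in for Windows copy or robocopy, and SHA-256 from the Python standard library stands in for xxhash.

```python
import hashlib
import shutil
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # read in 1 MiB pieces so huge files never sit in memory


def file_digest(path: Path) -> str:
    """Read the whole file once and return its hex digest."""
    digest = hashlib.sha256()  # assumption: stand-in for the xxhash used in the thread
    with open(path, "rb") as handle:
        while chunk := handle.read(CHUNK_SIZE):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, target: Path) -> bool:
    """Copy with a stock copy routine, then re-read both sides and compare."""
    shutil.copy2(source, target)  # first read of the source
    # second read of the source, plus one read of the freshly written target
    return file_digest(source) == file_digest(target)
```

Every archived byte is read three times in total here (source twice, target once), which is where the extra storage and network traffic comes from.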

Now some thoughts for implementation:
-) the "single file" branch is easy: you have the source filename and the target filename, so you can use the $xxhash function in a populate processor and check whether the source and target xxhashes are the same.
-) the folder branch, by contrast, will require some logic and some smart resource control. We will need e.g. a find files processor and a sub workflow to get this done
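
To make the folder branch a bit more concrete, here is a rough sketch outside of FFAStrans: list every file below the source folder, hash source and target for each one, and cap how many comparisons run at the same time. The names `find_files`, `verify_folder` and `MAX_PARALLEL` are made up for this example, and SHA-256 again stands in for xxhash.

```python
from __future__ import annotations

import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

MAX_PARALLEL = 2  # hypothetical cap so a shared network is not flooded


def file_digest(path: Path) -> str:
    digest = hashlib.sha256()  # assumption: stand-in for xxhash
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_files(folder: Path) -> list[Path]:
    """Return every file below folder, as paths relative to folder."""
    return sorted(p.relative_to(folder) for p in folder.rglob("*") if p.is_file())


def check_one(source_root: Path, target_root: Path, relative: Path) -> bool:
    """True if the target copy exists and has the same digest as the source."""
    target = target_root / relative
    return target.is_file() and file_digest(source_root / relative) == file_digest(target)


def verify_folder(source_root: Path, target_root: Path) -> list[Path]:
    """Return the relative paths whose copy is missing or differs."""
    files = find_files(source_root)
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
        results = list(pool.map(lambda rel: check_one(source_root, target_root, rel), files))
    return [rel for rel, ok in zip(files, results) if not ok]
```

An empty list from `verify_folder` would mean every file has a matching copy. The worker cap plays the role of the "smart resource control" mentioned above.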

Are you still on board or does it sound too crazy?

Re: Archive workflow

Posted: Tue Sep 12, 2023 10:24 am
by ring4life70
:-) Absolutely yes, I am here and I am so curious to figure out how to complete this workflow.

Thanks

Re: Archive workflow

Posted: Tue Sep 12, 2023 10:21 pm
by emcodem
Great :D
Before we go on i'd like to know how you want to approach this. It will depend a lot on how exactly things work for you, e.g.

how many files are being archived at once in folder mode, is it like above 100?
do we have to expect lots of stuff going on at once? e.g. multiple parallel archive processes are started on monday morning as weekly routine?
how big are the files to be archived?
how fast is the network (or local drive if you don't work on network)?
and do you delete it directly after archiving?
Do we have any time pressure or is everything totally relaxed?
Do we need to take care a lot about networking/IO resources or can we go as fast as we like?

Re: Archive workflow

Posted: Wed Sep 13, 2023 5:10 am
by ring4life70
thanks Emcodem, below are my answers
how many files are being archived at once in folder mode, is it like above 100?
no more than 100
do we have to expect lots of stuff going on at once? e.g. multiple parallel archive processes are started on monday morning as weekly routine?
I prefer to start the archiving workflow manually and, if the folders are not very large, to run multiple folders in parallel
how big are the files to be archived?
about 1 TB, but we can skip the check for material that has already been archived

how fast is the network (or local drive if you don't work on network)?
The FFAStrans workstation is directly connected to the archive NAS at 10 Gbit/s

and do you delete it directly after archiving?

Yes after a manual spot check

Do we have any time pressure or is everything totally relaxed?

Absolutely no pressure, we can work totally relaxed

Do we need to take care a lot about networking/IO resources or can we go as fast as we like?
The FFAStrans workstation is directly connected to the archive NAS over a 10 Gbit/s network, and to the production network through its second port at 1 Gbit/s. The production network is shared with the rest of the facility, and I don't know what kind of impact the workflow might have on it in terms of network resources.

Re: Archive workflow

Posted: Thu Sep 14, 2023 2:40 pm
by emcodem
OK, thanks for all the info.
So, I think it is kind of overkill, and pretty problematic, to implement the checksum stuff by workflow only; it would be preferable if the copy tool itself did it. Earlier I said we don't want little-used copy tools, but I just saw that there is a new version of FastCopy.exe, a tool I have actually been using at work for a long time.

The news is that there is an "old version 4.x" up to 2017 or so and a new version (5.x) that just arrived. From version 5 on, the author wants users who use it frequently at their workplace to "buy a license" (50 USD). However, the tool itself runs without noticeable restrictions even without a license, so you can use it for testing and setting everything up, and once you are happy with it, you might be so fair as to give the author the credit he deserves.

https://fastcopy.jp/

The tool can be used with a GUI and it is portable: you can just download, extract and test it, so you see what I mean.
It can do the checksums exactly as I described, but with the advantage that the checksum of the reading part is calculated while the file is being read for the copy (I talked about this before). Overall the tool seems to be a perfect match for what you want to do.
Actually you might just use the tool as it is, without an FFAStrans workflow. Anyway, that is just one option. As the tool now (in the new version) has a nice cmd mode, we can pretty easily use it just as we use robocopy in the workflow.
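
For illustration, the "checksum while reading" idea can be sketched as below. This is not how FastCopy is implemented, just the general principle under simple assumptions: `copy_with_checksum` is a hypothetical name and SHA-256 is an arbitrary choice of hash. The source is read only once, and only the written file is read a second time.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read


def copy_with_checksum(source: Path, target: Path) -> bool:
    """Copy source to target, hashing the source on the fly, then verify the target."""
    source_digest = hashlib.sha256()
    with open(source, "rb") as src, open(target, "wb") as dst:
        while chunk := src.read(CHUNK_SIZE):
            source_digest.update(chunk)  # hash the bytes that were just read ...
            dst.write(chunk)             # ... and write exactly those bytes

    # Read the written file back; this is the only extra pass over the data.
    target_digest = hashlib.sha256()
    with open(target, "rb") as written:
        while chunk := written.read(CHUNK_SIZE):
            target_digest.update(chunk)

    return source_digest.digest() == target_digest.digest()
```

One caveat with any read-back right after writing: the operating system may serve part of it from its cache rather than from the disk or NAS.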

Can you check it out and let me know?

The output of the command-line version of the tool looks pretty similar to robocopy's, so it should be trivial to migrate the current workflow from robocopy to FastCopy.

C:\Users\emcod>C:\Users\emcod\Downloads\FastCopy5.3.1_x64\fcp.exe  C:\temp\4.4 /to="c:\temp\archive\"
=================================================
fcp(ver5.3.1) start at 2023/09/14 16:30:31
<Source>  C:\temp\4.4
<DestDir> c:\temp\archive\
<Command> Diff (Size/Date)
<FileLog> C:\Users\emcod\Downloads\FastCopy5.3.1_x64\Log\20230914-163031-0.log
-------------------------------------------------

TotalRead  = 7,451 MiB
TotalWrite = 7,451 MiB
TotalFiles = 26 (2)
TotalTime  = 9.1 sec
TransRate  = 856.2 MB/s
FileRate   = 2.85 files/s

Result : (ErrFiles : 0 / ErrDirs : 0) at 2023/09/14 16:30:41
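
If a script has to decide whether the copy worked, one option is to read the error counters from that summary. A small sketch, assuming the output always has the layout shown above (this is inferred from this single sample, not from the FastCopy documentation); `parse_fcp_result`, `parse_summary` and `copy_succeeded` are hypothetical names.

```python
from __future__ import annotations

import re

# Matches e.g. "Result : (ErrFiles : 0 / ErrDirs : 0) at 2023/09/14 16:30:41"
RESULT_PATTERN = re.compile(r"ErrFiles\s*:\s*(\d+)\s*/\s*ErrDirs\s*:\s*(\d+)")


def parse_fcp_result(output: str) -> tuple[int, int]:
    """Return (ErrFiles, ErrDirs) from the Result line of the fcp output."""
    match = RESULT_PATTERN.search(output)
    if match is None:
        raise ValueError("no Result line found in fcp output")
    return int(match.group(1)), int(match.group(2))


def parse_summary(output: str) -> dict[str, str]:
    """Collect the 'Name = value' lines (TotalRead, TotalFiles, ...) as raw strings."""
    summary = {}
    for line in output.splitlines():
        key, separator, value = line.partition(" = ")
        if separator:
            summary[key.strip()] = value.strip()
    return summary


def copy_succeeded(output: str) -> bool:
    """Treat the run as good only if both error counters are zero."""
    err_files, err_dirs = parse_fcp_result(output)
    return err_files == 0 and err_dirs == 0
```

A missing Result line raises an error on purpose, so a crashed or interrupted run is not mistaken for a clean one.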

Just a short personal note on FastCopy: it is a tool from geeks for geeks, but compared to all the other "local copy tools" I have ever tested, this one gave me the most control over the copy process. The author really knows what he is doing, I was kind of impressed by it. At work I was able to copy from an external SSD to an internal SSD about 10% faster than Windows Explorer did (e.g. 9.9 Gbit/s instead of 8.9 or similar), which was pretty important because I had users waiting for the copy to finish...
Also, it is pretty funny that there is now a new version of FastCopy that allows smooth command-line operation. Initially I thought about giving you my modified old version of FastCopy for your workflow, but it was designed to copy single files only, so I would have needed to change the source code again for you, which is kind of unpleasant, especially when there is already a newer version that does what we want :D

Re: Archive workflow

Posted: Mon Sep 18, 2023 8:36 am
by ring4life70
Thanks Emcodem,
you have convinced me :-) the new version of FastCopy looks like a good alternative. Let me do some tests and I will get back to you.
thanks again for your time

Re: Archive workflow

Posted: Tue Oct 24, 2023 3:29 pm
by ring4life70
Hi,
I am trying to include the archive creation date in the target path, but when I use the date variables:

cmd /C "C:\Windows\System32\Robocopy.exe "%s_source%" "\\192.168.100.1\RemoteArchive2\[MASTER ARCHIVE]\%s_original_folder%\%i_original_day%_%i_original_mon%_%i_original_year%\%s_original_name%" *.* /MOVE /E /NP & IF %ERRORLEVEL% LEQ 1 exit 0"

the process hangs and does not move forward

Re: Archive workflow

Posted: Wed Oct 25, 2023 10:26 am
by emcodem
@ring4life70 I don't see any issue with this command at first glance, what do the logs say?
If it is reproducible, you need to copy the real, resolved command from the log, execute it manually on the command line and see what happens; maybe it prompts for some user input or similar.
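
One way to tell "waiting for input" apart from "just slow" when re-running a command by hand is to start it with stdin closed and a time limit. A rough sketch; `run_with_timeout` is a hypothetical helper, and the command and timeout are whatever you take from the log.

```python
from __future__ import annotations

import subprocess


def run_with_timeout(command: list[str], timeout_seconds: float) -> tuple[str, int | None, str]:
    """Run command without stdin; return (status, exit code, captured output)."""
    try:
        completed = subprocess.run(
            command,
            stdin=subprocess.DEVNULL,   # a prompt gets end-of-file instead of waiting forever
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,   # keep error messages in the same stream
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired as expired:
        partial = expired.output or ""
        if isinstance(partial, bytes):  # some Python versions hand back bytes here
            partial = partial.decode(errors="replace")
        return "timeout", None, partial
    return "finished", completed.returncode, completed.stdout
```

If the status comes back as "timeout", the last lines of the captured output usually hint at where the process got stuck.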

Re: Archive workflow

Posted: Thu Oct 26, 2023 10:03 am
by ring4life70
The problem is the destination: if I use one particular network drive as the destination, the process hangs and does not move forward, but if I use a local drive or a different network drive, robocopy works fine without any issue :-(

Re: Archive workflow

Posted: Thu Oct 26, 2023 3:47 pm
by emcodem
Can you reproduce the issue manually on the command line? Does FFAStrans run as a service, and did you check that the service account has permissions on the share?