Yeah, you've described the distributed version control problem ;)<br><br>If your repositories are basically different versions of the same thing (i.e. you copied part of the original tree out and added/deleted things, but didn't rename files), then Unison might be able to help you.  It's designed to merge two copies of a directory tree based on file paths.  For files with the same name it will try to detect which one is later, and if it can't, it will prompt you to reconcile them. 
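<br><br>To illustrate what merging by file path means, here is a minimal Python sketch. It is not Unison's algorithm (Unison keeps a record of the previous synchronization and is far more careful). It pairs up files in two trees by their relative path, treats byte-identical pairs as already merged, and uses modification time as a crude, assumed stand-in for "which one is later". The function names are hypothetical.<br>

```python
import filecmp
import os
from pathlib import Path


def relative_files(root: Path) -> dict[str, Path]:
    """Map each file's path relative to root (POSIX-style string) to its full path."""
    files = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            files[full.relative_to(root).as_posix()] = full
    return files


def compare_trees(root_a: Path, root_b: Path) -> dict[str, list[str]]:
    """Classify every relative path that exists under either root."""
    a, b = relative_files(root_a), relative_files(root_b)
    result = {key: [] for key in
              ("only_a", "only_b", "identical", "newer_a", "newer_b", "conflict")}
    for rel in sorted(a.keys() | b.keys()):
        if rel not in b:
            result["only_a"].append(rel)
        elif rel not in a:
            result["only_b"].append(rel)
        elif filecmp.cmp(a[rel], b[rel], shallow=False):  # byte-for-byte comparison
            result["identical"].append(rel)
        else:
            mtime_a = a[rel].stat().st_mtime
            mtime_b = b[rel].stat().st_mtime
            if mtime_a > mtime_b:
                result["newer_a"].append(rel)
            elif mtime_b > mtime_a:
                result["newer_b"].append(rel)
            else:
                # Same timestamp but different bytes: a human has to decide.
                result["conflict"].append(rel)
    return result
```

Whatever ends up in <code>conflict</code> is where you would have to reconcile by hand. Timestamps copied between disks are often unreliable, so treat <code>newer_a</code>/<code>newer_b</code> as hints rather than answers.<br>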

<br><br><div style="margin-left: 40px;"><a href="http://www.cis.upenn.edu/~bcpierce/unison/">http://www.cis.upenn.edu/~bcpierce/unison/</a><br></div><br>If your problem is duplicate files with different names, then MD5/SHA1 checksums will help you find the dupes regardless of file name, but they can't solve the file versioning problem.  
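<br><br>A minimal Python sketch of the checksum approach, assuming every disk is mounted under a path you pass in. Files are first bucketed by size (files of different sizes can't be duplicates), and only same-size candidates are hashed. SHA-1 is used here; MD5 would work the same way. The function names are hypothetical.<br>

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-1 of a file's contents, read in chunks so large files don't fill memory."""
    h = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(*roots: Path) -> list[list[Path]]:
    """Return groups (two or more files) with identical contents, whatever their names."""
    by_size: dict[int, list[Path]] = defaultdict(list)
    for root in roots:
        for path in root.rglob("*"):
            if path.is_file() and not path.is_symlink():
                by_size[path.stat().st_size].append(path)

    groups = []
    for candidates in by_size.values():
        if len(candidates) < 2:
            continue  # a file with a unique size cannot have a duplicate; skip hashing it
        by_hash: dict[str, list[Path]] = defaultdict(list)
        for path in candidates:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Each returned group is the same content stored under possibly different names and folders; deciding which name and folder to keep is still up to you.<br>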

W.r.t. the folder structure issue, you can actually preserve the folder structure if you convert the dupe files to symlinks (at least on non-Windows platforms). But if you have files that have changed and also been given different names, then there will probably be some manual effort involved if you want to version them as the same file (either checking them into source control by hand, or making sure the files follow a naming convention and having a script check them in for you).
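<br><br>A minimal Python sketch of the symlink idea, assuming a POSIX system and a group of paths that has already been confirmed to hold identical content. Keeping the first path as the master copy is an arbitrary, hypothetical choice; every other path becomes a symlink to it, so each folder keeps its entry under its original name. The function name and the temporary-name suffix are hypothetical.<br>

```python
import os
from pathlib import Path


def replace_with_symlinks(group: list[Path]) -> Path:
    """Keep the first file of a duplicate group as the master copy and turn every
    other file into a symlink pointing at it, so each folder keeps its entry."""
    master, *copies = group
    master = master.resolve()
    for copy in copies:
        # Hypothetical temp-name scheme: create the link next to the duplicate first.
        tmp = copy.with_name(copy.name + ".dedup-tmp")
        tmp.symlink_to(master)
        # Rename the link over the duplicate (atomic on POSIX within one filesystem).
        os.replace(tmp, copy)
    return master
```

The links are absolute, so they break if the master's disk is later mounted somewhere else, and a link to a file on another disk only resolves while that disk is attached; keeping master copies on the same disk as their links avoids both problems.<br>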

<br><br><div><span class="gmail_quote">On 10/25/07, <b class="gmail_sendername">Joe Armstrong</b> &lt;<a href="mailto:erlang@gmail.com">erlang@gmail.com</a>&gt; wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

I have an interesting? problem. Over the last ? years (> 10) I have been upgrading my home system; this usually involved buying a bigger disk and copying most (or all) of the files from the old disk to the new disk or disks.

<br><br>I've also been backing up the family photos etc. on USB disks.<br><br>Now I have > 1 terabyte of files spread over c. 10 computers and 3 pluggable USB disks. Having made a "backup", both the original and the copy live lives of their own.<br><br>Does anybody know of a good algorithm to consolidate/merge all this data, or do I have to write my own? One immediate thought is to compute the MD5 sums of all files on all disks and thus find all duplicates, then create a master copy of all unique files, but the file names will be wrong and this might result in a big mess. This cannot be an uncommon problem. Any ideas how to solve it?

<br><br>/Joe<br>_______________________________________________<br>erlang-questions mailing list<br><a href="mailto:erlang-questions@erlang.org">erlang-questions@erlang.org</a><br><a href="http://www.erlang.org/mailman/listinfo/erlang-questions">

http://www.erlang.org/mailman/listinfo/erlang-questions</a><br></blockquote></div><br>