Use fdupes or FSlint.
fdupes -r -S /directory > dupes.txt
-r: search directories recursively
-S: show the size of each duplicate file
> dupes.txt: write the list of duplicates to the file dupes.txt
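For reference, here is a sketch of what dupes.txt might look like (an illustrative mock-up with hypothetical paths, not real fdupes output): each group of duplicates sits on consecutive lines, with -S adding a size line above the group, and a blank line separating groups.

```shell
# Illustrative mock-up of dupes.txt (paths are hypothetical).
# fdupes separates groups of duplicates with a blank line,
# and -S prints a size line above each group.
cat > dupes.txt <<'EOF'
2048 bytes each:
/directory/photo.jpg
/directory/backup/photo.jpg

1024 bytes each:
/directory/notes.txt
/directory/old/notes.txt
EOF
```

The blank line between groups is what the replacement trick in the steps below relies on.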
To delete the files you don't want to keep, you can build a list of them and then delete every file in the list:
1. Open the file dupes.txt in Writer.
2. Replace each paragraph break (\p) with a tab (\t). I recommend the Advanced Search extension for OpenOffice. NOTE: each group of duplicate files in dupes.txt is delimited by two paragraph breaks, so one of them survives the replacement and delimits each row of the spreadsheet.
3. Save the file, then change its extension from .txt to .csv.
4. Open dupes.csv in Calc, using the tab character as the column separator.
5. Delete the cells of the files you want to keep, so that only the files to be deleted remain in the list. Also delete the column that contains the file extensions.
6. Save it as a .csv file. Check the option "Edit filter settings" and save it without any field delimiter or text delimiter.
7. Change the extension from .csv back to .txt (this step may not be necessary).
8. Type in terminal:
while read file; do rm "$file"; done < dupes.txt
All the files in the list will be deleted. If any blank rows remain, rm will print an error message for them, but it is harmless.
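A slightly more defensive variant of that loop (my own sketch, not part of the original answer) skips blank lines so rm never receives an empty argument, uses read -r so backslashes in filenames are preserved, and uses rm -- to handle filenames that begin with a dash:

```shell
# Demo setup with a hypothetical file so the sketch is self-contained:
touch old-copy.jpg
printf 'old-copy.jpg\n\n' > dupes.txt   # note the trailing blank line

# Safer deletion loop (a sketch): skip blank lines, preserve
# backslashes (-r), and guard against filenames starting with "-".
while IFS= read -r file; do
  [ -n "$file" ] && rm -- "$file"
done < dupes.txt
```

With this version the blank rows left over from step 2 no longer produce error messages.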
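If you prefer the command line over Writer, the newline-to-tab conversion of steps 2 to 4 can also be sketched with awk in "paragraph mode" (RS=""), which treats each blank-line-separated group as one record; the paths below are hypothetical, and you would still open the result in Calc to remove the files you want to keep:

```shell
# Demo input (hypothetical paths), two groups of duplicates:
printf '/a/file1\n/b/file1\n\n/a/file2\n/c/file2\n' > dupes.txt

# Sketch: turn each blank-line-separated group into one
# tab-separated row ($1 = $1 forces awk to rebuild the record
# with the tab output separator).
awk 'BEGIN { RS=""; FS="\n"; OFS="\t" } { $1 = $1; print }' dupes.txt > dupes.csv
```

Each row of dupes.csv then corresponds to one group of duplicates, exactly as after step 4.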
FSlint is a GUI tool that finds duplicate files and performs other useful searches.