Wednesday, February 10, 2010

Find Duplicate Files Suite (Initial)

FindDup Suite

FindDup Suite contains 3 parts:
- finddup.pl
- dupselect.php
- unlink.pl

finddup.pl is the duplicate file finder which will search duplicate files in
current directory or specified directory. Progress will print to STDERR and
duplicate file list will print to STDOUT.

Usage:
$ perl finddup.pl [Path] > [duplicate-files.txt]


dupselect.php is a PHP web application for helping user to select duplicate
files. Copy/move [duplicate-files.txt] to same directory as dupselect.php in
the web server. For example, creating a web server with PHP in localhost, then
copy dupselect.php and duplicate-files.txt to /var/www/html, and then fire up
a web browser(such as Firefox) and go to http://localhost/dupselect.php and
you will have a screen like this:

duplicate-files.txt DupSelect CalcTotal

The DupSelect link will go to selecting duplicate files using
duplicate-files.txt duplicate file list.
The CalcTotal link show you a list and calculate Total size of all files in the
file.

In DupSelect mode, you will see a group seperated by horizontal line like this:

[generate]
_________________________________________________________________

[ ] ./20080428/kaberen22204.jpg (f66f689f)
[X] ./20080427/kaberen22204.jpg (f66f689f)
_________________________________________________________________

[ ] ./20080105/T2AW_WP_3/T2AW_10/T2AW_10z.psd (274a3199) (*)
[X] ./20080910/Al_4575/T2AW_10/T2AW_10z.psd (274a3199) (**)
[X] ./20071124/T2AW_WP_2/T2AW_10.psd (274a3199)
_________________________________________________________________

[ ] ./20080506/moeura26162.png (37645a01)
[X] ./20081228/moeura46760.png (37645a01)
_________________________________________________________________

The first file in each group is kept by default. dupselect.php will think
the file in deeper directory is more important.
There is two marks, (*) and (**). (*) repersents there is a file in deeper
directory in this group while (**) means there is more than one file in
deeper directory.
The selected files will generate a duplicate-files-delete.txt and
duplicate-files-delete.lst file in server after pressing [generate] button.
duplicate-files-delete.txt is for you to view a report using CalcTotal mode.
duplicate-files-delete.lst is for deleting using unlink.pl.

Inside dupselect.php:
There is some variables in dupselect.php for the sorting and selecting
strategy.

$excludes_order = array('detail');
$includes_order = array('waren','kaberen','moeren','kabeura','moeura');
$deselects = array('this-one-needs-duplicate');
$normal_depth = 2;

The $excludes_order variable controls which file should place in the back.
The $includes_order variable controls which file should place in the fronter,
but the file in deeper directory still have higher priority.
The $deselects variable controls which file should not be selected in
DupSelect mode.
The $normal_depth variable controls which file become more important.


unlink.pl is an utility for deleting files using a .lst list file.
Usage:
$ perl unlink.pl [duplicate-files-delete.lst]


Download:

vnc2flv-20100207 Win32

Compiled with Py2Exe/Python 2.6 with a modified setup.py.
Download: vnc2flv-20100207.7z

setup.py:
#!/usr/bin/env python
import psyco
psyco.full()

import os, sys
from distutils.command.build_ext import build_ext

try:
 # load py2exe distutils extension, if available
 import py2exe
except ImportError:
 pass

from distutils.core import setup, Extension
from vnc2flv import __version__

py2exe_options = dict(includes=['flvscreen'], # Include
                      excludes=['_ssl',  # Exclude _ssl
                                '_hashlib',"bz2", "_ctypes",
                                'doctest',"pywin", "pywin.debugger", "pywin.debugger.dbgcon",
                                "pywin.dialogs", "pywin.dialogs.list",
                                "Tkconstants","Tkinter","tcl",
                                "compiler","email","ctypes","logging","unicodedata",
                                "pydoc_topics","pydoc","httplib","cookielib","cookielib","urllib",
                                'pickle', 'calendar'],  # Exclude standard library
                      dll_excludes=['msvcr71.dll'],  # Exclude msvcr71
                      ascii=1,
                      compressed=1,
                      optimize=2,
                      bundle_files=2
                      )

setup(
  name='vnc2flv',
  version=__version__,
  description='Screen recording tool that captures a VNC session and saves as FLV',
  long_description='Vnc2flv is a screen recorder. It captures a VNC desktop session '
  'and saves it as a Flash Video (FLV) file.',
  license='MIT/X',
  author='Yusuke Shinyama',
  author_email='yusuke at cs dot nyu dot edu',
  url='http://www.unixuser.org/~euske/python/vnc2flv/index.html',
  packages=[
    'vnc2flv'
    ],
  scripts=[
    'tools/flvrec.py',
    'tools/flvcat.py',
    'tools/flvdump.py',
    'tools/flvaddmp3.py',
    'tools/flvsplit.py',
    'tools/recordwin.sh'
    ],
  keywords=['vnc', 'flv', 'video', 'screen recorder'],
  classifiers=[
    'Development Status :: 4 - Beta',
    'Environment :: Console',
    'Intended Audience :: Developers',
    'Intended Audience :: Science/Research',
    'License :: OSI Approved :: MIT License',
  ],
  ext_modules=[Extension('flvscreen',
                         ['flvscreen/flvscreen.c'],
                         #define_macros=[],
                         #include_dirs=[],
                         #library_dirs=[],
                         #libraries=[],
                         )],
  options={'py2exe': py2exe_options},
  console=['tools/flvrec.py', 'tools/flvcat.py', 'tools/flvdump.py', 'tools/flvaddmp3.py', 'tools/flvsplit.py'],
  zipfile = "vnc2flv.lib",
  )