file find
File::Find 5.8.8
File::Find is a Perl module to traverse a directory tree.
SYNOPSIS
use File::Find;
find(\&wanted, @directories_to_search);
sub wanted { ... }
use File::Find;
finddepth(\&wanted, @directories_to_search);
sub wanted { ... }
use File::Find;
find({ wanted => \&process, follow => 1 }, '.');
These are functions for searching through directory trees doing work on each file found similar to the Unix find command. File::Find exports two functions, find and finddepth. They work similarly but have subtle differences.
find
find(\&wanted, @directories);
find(\%options, @directories);
find() does a depth-first search over the given @directories in the order they are given. For each file or directory found, it calls the &wanted subroutine. (See below for details on how to use the &wanted function). Additionally, for each directory found, it will chdir() into that directory and continue the search, invoking the &wanted function on each file or subdirectory in the directory.
finddepth
finddepth(\&wanted, @directories);
finddepth(\%options, @directories);
finddepth() works just like find() except that it invokes the &wanted function for a directory after invoking it for the directory's contents. It does a postorder traversal instead of a preorder traversal, working from the bottom of the directory tree up where find() works from the top of the tree down.
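The difference in visiting order can be seen on a small throwaway tree. The following is a minimal sketch using only core modules; the directory layout (a/b under a temporary root) and the helper name visit_order are assumptions made for illustration.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use File::Path qw(make_path);
use File::Spec;

# Build a tiny temporary tree: <root>/a/b (hypothetical layout).
my $root = tempdir( CLEANUP => 1 );
make_path( File::Spec->catdir( $root, 'a', 'b' ) );

# Hypothetical helper: run a walker (find or finddepth) and return the
# visited paths, relative to $root, in the order they were visited.
sub visit_order {
    my ($walker) = @_;
    my @seen;
    $walker->(
        sub { push @seen, File::Spec->abs2rel( $File::Find::name, $root ) },
        $root,
    );
    return @seen;
}

my @pre  = visit_order( \&find );         # directory first, then its contents
my @post = visit_order( \&finddepth );    # contents first, then the directory

print "find:      @pre\n";
print "finddepth: @post\n";
```

With find() the root appears first in the list; with finddepth() it appears last, which is what makes finddepth() suitable for tasks such as removing directories after emptying them.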
File::Find::Rule 0.30
File::Find::Rule is an alternative Perl interface to File::Find.
SYNOPSIS
use File::Find::Rule;
# find all the subdirectories of a given directory
my @subdirs = File::Find::Rule->directory->in( $directory );
# find all the .pm files in @INC
my @files = File::Find::Rule->file()
                            ->name( '*.pm' )
                            ->in( @INC );
# as above, but without method chaining
my $rule = File::Find::Rule->new;
$rule->file;
$rule->name( '*.pm' );
my @files = $rule->in( @INC );
File 4.21
File attempts to classify files depending on their contents and prints a description if a match is found.
The file command, if you're not familiar with it, is a command-line tool that tells you in words what kind of data a file contains. Unlike MS-Windows, UNIX and other systems don't rely on filename extensions to tell you the type of a file, but look at the file's actual contents. This is, of course, more reliable, but requires a bit of I/O.
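The idea of classifying by content rather than by name can be sketched in a few lines. This is only an illustration, not how file itself is implemented: the helper names guess_type and guess_file_type are hypothetical, and the table holds just a handful of well-known leading byte signatures, whereas the real magic database is far larger and more subtle.

```perl
use strict;
use warnings;

# A few well-known leading byte signatures (a tiny, illustrative subset).
my @signatures = (
    [ "\x89PNG\r\n\x1a\n", 'PNG image data' ],
    [ 'GIF8',              'GIF image data' ],
    [ '%PDF-',             'PDF document' ],
    [ "\x1f\x8b",          'gzip compressed data' ],
    [ "PK\x03\x04",        'Zip archive data' ],
);

# Hypothetical helper: classify a buffer holding the first bytes of a file.
sub guess_type {
    my ($head) = @_;
    for my $sig (@signatures) {
        my ( $magic, $label ) = @$sig;
        return $label if index( $head, $magic ) == 0;    # must match at offset 0
    }
    return 'data';    # fallback when nothing matches
}

# Hypothetical helper: read the first bytes of a file and classify them.
# The file name (and its extension) plays no part in the decision.
sub guess_file_type {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    defined read( $fh, my $head, 16 ) or die "Cannot read $path: $!";
    close $fh;
    return guess_type($head);
}

print guess_type("GIF89a"), "\n";
```

The small read at the start of each file is the "bit of I/O" the description refers to.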
The original file command shipped with Bell Labs UNIX but was unavailable in source form to the masses before Ian's reimplementation.
This file command (and magic file) was originally written by Ian Darwin (who still contributes occasionally) and is now maintained by a group of developers led by Christos Zoulas.
Who's using it?
Every known BSD distribution (FreeBSD, NetBSD, OpenBSD, Darwin/Mac OS X, etc)
Every known Linux distribution
The Apache httpd server's mod_mime_magic module uses the file command's innards to make file type guessing more reliable under Apache HTTPD.
File::Find::Similars 1.1
File::Find::Similars is a similar-files locator.
SYNOPSIS
use File::Find::Similars;
File::Find::Similars->init(0, @ARGV);
similarity_check_name();
Similar-sized and similar-named files are picked as suspicious candidates of duplicated files.
What describes it better than actual output? Sample suspicious duplicated files:
## =========
1574 PopupTest.java /home/tong/.../examples/chap10
1561 CardLayoutTest.java /home/tong/.../examples/chap1
1570 PopupButtonFrame.class /home/tong/.../examples/chap6
## =========
22984 BinderyHelloWorld.jpg /home/tong/...
17509 MacHelloWorld.gif /home/tong/...
The first column is the size of the file, the second the name, and the third the path. The motto for the listing is that I would rather my program overkill (wrongly picking out suspicious files) than neglect something that would otherwise take me years to notice.
By default, File::Find::Similars(3) assumes that similar files within the same folder are OK. Hence you will not get duplicate warnings for generated files (like .o, .class or .aux, and .dvi files) or other file series.
Once you are sure that there are no duplications between folders and want File::Find::Similars(3) to scoop further, specify the first parameter as 1. This is very good to eliminate similar mp3 files within the same folder, or downloaded files from big sites where different packaging methods are used, e.g.:
## =========
66138 jdc-src.tar.gz .../ftp.ora.com/published/oreilly/java/javadc
147904 jdc-src.zip .../ftp.ora.com/published/oreilly/java/javadc
File::FindByRegex 1.2
File::FindByRegex is a Perl wrapper for File::Find.
SYNOPSIS
use File::FindByRegex;
$find = File::FindByRegex->new( {
-srcdir => ['C:\tmp\teradata-sql'],
-tardir => 'C:\tmp\teradata-sql\doc',
-find => {no_chdir => 1},
-callbacks =>
{
qr/\.p(l|m|od|t)$/oi => \&treat_pod,
qr/sql.+?\.sql$/oi => \&treat_pod,
qr/\.html?$/oi => \&treat_html,
qr/\.txt$/oi => \&treat_txt,
qr/\.(jpg|gif|png|bmp|tiff)$/ => sub { &treat_graphic(@_) }
},
-ignore =>
[
qr/eg.+\.sql$/oi, # *.sql in directory eg
qr/java/oi, # All files in java directory.
],
-excepts =>
[
qr/java.*?\.html?$/oi # don't ignore *.html in java/
]
});
sub File::FindByRegex::treat_pod
{
my $this = shift;
...
}
sub File::FindByRegex::treat_html
{
my $this = shift;
...
}
sub File::FindByRegex::treat_txt
{
my $this = shift;
...
}
sub File::FindByRegex::treat_graphic
{
my $this = shift;
...
}
$find->travel_tree;
Remote File Index 1.2
Remote File Index is an add-on for Plone which keeps track of a document only by its URL.
Did you ever find a huge PDF file that you'd like to keep track of but wouldn't like to copy entirely onto your server?
Now RemoteFileIndex indexes the content in the portal Catalog and only keeps the url of that document.
Works with:
- Plone 2.5.2
- Plone 2.5.1
- Plone 2.5
Enhancements:
- better integration with ATContentType
File::Find::Closures 1.06
File::Find::Closures is a Perl module with functions you can use with File::Find.
SYNOPSIS
use File::Find;
use File::Find::Closures qw(:all);
my( $wanted, $list_reporter ) = find_by_name( qw(README) );
File::Find::find( $wanted, @directories );
File::Find::find( { wanted => $wanted, ... }, @directories );
my @readmes = $list_reporter->();
SOME PARTS ARE NOT IMPLEMENTED YET! THIS IS ALPHA ALPHA SOFTWARE: A MERE SHELL OF AN IDEA.
I wrote this module as an example of both using closures and using File::Find. Students are always asking me what closures are good for, and here are some examples. The functions mostly stand alone (i.e. they don't need the rest of the module), so rather than creating a dependency in your code, just lift the parts you want.
When I use File::Find, I have two headaches: coming up with the &wanted function to pass to find(), and accumulating the files.
This module provides the &wanted functions as closures that I can pass directly to find(). Actually, for each pre-made closure, I provide a closure to access the list of files too, so I don't have to create a new array to hold the results.
The filenames are the full path to the file as reported by File::Find.
Unless otherwise noted, the reporter closure returns a list of the filenames in list context and, in scalar context, an anonymous array that is a copy (not a reference) of the original list. The filenames have been normalized by File::Spec::canonpath unless otherwise noted. The list of files has been processed by File::Spec::no_upwards so that "." and ".." (or their equivalents) do not show up in the list.
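The pairing of a &wanted closure with a reporter closure can be sketched as follows. This is a hand-written illustration of the pattern, not the module's own code: the function name make_name_finder and the temporary files are assumptions, and only core modules are used.

```perl
use strict;
use warnings;
use File::Find;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper (not part of File::Find::Closures): returns a
# ( $wanted, $reporter ) pair of closures sharing one private @found array.
sub make_name_finder {
    my %wanted_name = map { $_ => 1 } @_;
    my @found;

    # In the default chdir mode, $_ is the bare file name and
    # $File::Find::name is the full path.
    my $wanted = sub {
        push @found, File::Spec->canonpath($File::Find::name)
            if $wanted_name{$_};
    };

    # A list in list context, a fresh array reference (a copy) otherwise.
    my $reporter = sub { return wantarray ? @found : [@found] };

    return ( $wanted, $reporter );
}

# Demonstration on a throwaway directory holding two empty files.
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw(README NOTES)) {
    my $path = File::Spec->catfile( $dir, $name );
    open my $fh, '>', $path or die "Cannot create $path: $!";
    close $fh;
}

my ( $wanted, $reporter ) = make_name_finder('README');
find( $wanted, $dir );

my @readmes = $reporter->();
print "$_\n" for @readmes;
```

Because @found lives only inside the two closures, the caller never has to declare or reset an accumulator array.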
File::Find::Parallel 0.0.4
File::Find::Parallel allows you to traverse a number of similar directories in parallel.
SYNOPSIS
use File::Find::Parallel;
my $ffp = File::Find::Parallel->new( qw( /foo /bar ) );
print "Union:\n";
my $union = $ffp->any_iterator;
print " $_\n" while $_ = $union->();
print "Intersection:\n";
my $inter = $ffp->all_iterator;
print " $_\n" while $_ = $inter->();
File::Find is the ideal tool for quickly scanning a single directory. But sometimes it's nice to be able to perform operations on multiple similar directories in parallel. Perhaps you need to compare the contents of two directories or convert files that are shared in more than one directory into hard links.
This module manufactures iterators that visit each file and directory in either the union or the intersection of a number of directories. Hmm. What does that mean?
Given two directory trees like this
foo
foo/a
foo/b/c
foo/d
bar
bar/a
bar/b
bar/e
you can choose to work with the intersection of the two directory structures:
.
./a
./b
That is, the subdirectories and files that foo and bar share.
Alternately you can work with the union of the two directory structures:
.
./a
./b
./b/c
./d
./e
Still not clear? Well, if you wanted to do a recursive diff on the two directories, you'd iterate their union so you could report files that were present in foo but missing from bar and vice versa.
If, on the other hand, you wanted to scan the directories and find all the files that are common to all of them, you'd iterate their intersection and receive only files and directories that were present in all the directories being scanned.
The any_iterator and all_iterator are built on a more general purpose method: want_iterator. If, for example, you want to make links between files that are found in more than one directory you might get your iterator like this:
my $iter = $ffp->want_iterator( 2 );
The apparently magic 2 reflects the fact that if you're going to be making links you need at least two files. No matter how many directories you are iterating over in parallel, you will only see files and directories that appear in at least two of those directories.
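The "at least N directories" idea behind union, intersection and want_iterator can be reproduced with core File::Find alone. The sketch below is not how File::Find::Parallel is implemented (it collects everything up front instead of iterating lazily). The helper name paths_in_at_least is hypothetical, and every entry of the example trees is created as a directory for simplicity.

```perl
use strict;
use warnings;
use File::Find;
use File::Spec;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Hypothetical helper: relative paths present under at least $min of @roots.
sub paths_in_at_least {
    my ( $min, @roots ) = @_;
    my %count;
    for my $root (@roots) {
        find(
            {
                no_chdir => 1,
                wanted   => sub {
                    # Each path is seen once per root, so the count is the
                    # number of roots that contain it.
                    $count{ File::Spec->abs2rel( $File::Find::name, $root ) }++;
                },
            },
            $root,
        );
    }
    my @paths = sort grep { $count{$_} >= $min } keys %count;
    return @paths;
}

# Recreate the foo and bar trees from the listing (all entries as directories).
my $base = tempdir( CLEANUP => 1 );
make_path( map { File::Spec->catdir( $base, split m{/} ) }
        qw(foo/a foo/b/c foo/d bar/a bar/b bar/e) );
my @roots = map { File::Spec->catdir( $base, $_ ) } qw(foo bar);

my @any = paths_in_at_least( 1,             @roots );    # union
my @all = paths_in_at_least( scalar @roots, @roots );    # intersection

print "Union:        @any\n";
print "Intersection: @all\n";
```

Paths come back relative to each root (a rather than ./a), and with more than two roots a threshold of 2 corresponds to the want_iterator( 2 ) case described above.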
File::Find::Parallel can scan any number of directories at the same time. Here's an example (on Unix systems) that returns the list of all files and directories that are contained in all home directories.
use File::Glob ':glob';
use File::Find::Parallel;
my $find = File::Find::Parallel->new( bsd_glob( '/home/*' ) );
my @common = ( );
my $iter = $find->all_iterator;
while ( defined( my $obj = $iter->() ) ) {
    push @common, $obj;
}
print "The following files are common to ",
      "all directories below /home :\n";
print " $_\n" for @common;
For a complete concrete example of its use see lncopies in the bin subdirectory of this distribution.
Iterators
The iterator returned by any_iterator, all_iterator or want_iterator is a code reference. Call it to get the next file or directory. When all files and directories have been returned the iterator will return undef.
Once created an iterator is independent of the File::Find::Parallel object that created it. If the object goes out of scope and is destroyed during the life of the iterator it will still function normally.
You may have many active iterators for a single File::Find::Parallel object at any time.
Test::File::Find::Rule 1.00
Test::File::Find::Rule is a Perl module to test files and directories with File::Find::Rule.
SYNOPSIS
use Test::File::Find::Rule;
# Check that all files in $dir have sensible names
my $rule = File::Find::Rule
    ->file
    ->relative
    ->not_name(qr/^[\w]{1,8}\.[a-z]{3,4}$/);
match_rule_no_result($rule, $dir, 'File names ok');
# Check that all our perl scripts have use strict !
my $rule = File::Find::Rule
    ->file
    ->relative
    ->name(@perl_ext)
    ->not_grep(qr/^\s*use\s+strict;/m, sub { 1 });
match_rule_no_result($rule, $dir, 'use strict usage');
# With some help of File::Find::Rule::MMagic
# Check that there is less than 10 images in $dir
# with a size > 1Mo
my $rule = File::Find::Rule
    ->file
    ->relative
    ->magic('image/*')
    ->size('>1Mo');
match_rule_nb_result($rule, $dir, 100, 'A lot of big images');
# Check the exact result from a rule
my $dirs = [qw(web lib data tmp)];
my $rule = File::Find::Rule
    ->directory
    ->mindepth(1)
    ->maxdepth(1)
    ->relative;
match_rule_array($rule, $dir, $dirs, 'Directory structure ok');
File::Find::Rule::XPath 0.03
File::Find::Rule::XPath is a Perl module that contains a rule to match on XPath expressions.
SYNOPSIS
use File::Find::Rule::XPath;
my @files = File::Find::Rule->file
    ->name('*.dkb')
    ->xpath( '//section/title[contains(., "Crustacean")]' )
    ->in($root);
This module extends File::Find::Rule to provide the ability to locate XML files which match a given XPath expression.
METHODS
xpath( $xpath_expression )
Matches XML files which contain one or more nodes matching the given XPath expression. Files which are not well formed XML are silently skipped.
If no XPath expression is supplied, the value '/' is used. This will match all files which are well formed XML.
Email::Find 0.10
Email::Find allows you to find RFC 822 email addresses in plain text.
Email::Find is a module for finding a subset of RFC 822 email addresses in arbitrary text (see "CAVEATS"). The addresses it finds are not guaranteed to exist or even actually be email addresses at all (see "CAVEATS"), but they will be valid RFC 822 syntax.
Email::Find will perform some heuristics to avoid some of the more obvious red herrings and false addresses, but there's only so much which can be done without a human.
It finds email addresses in the text and executes the registered callback.
The callback is given two arguments. The first is a Mail::Address object representing the address found. The second is the actual original email as found in the text. Whatever the callback returns will replace the original text.
File Splitter 1.3
Split large text/html files into smaller files. I find it much faster and more accurate than cut and paste. You embed commands in the big file telling it which pieces of it are to go where, then let Splitter do the work. It is much faster and more accurate than trying to select huge blocks of text in an editor. You don't accidentally lose or duplicate text. Keeping files small makes the site more responsive.
In the following, pretend that [ and ] are actually less-than and greater-than signs.
You embed multiple [split] tags in the file to be split, of
the form:
[split charlie.html]
...
stuff that will end up in the charlie.html file.
...
[/split]
The text between the [split xxx] and [/split] tags is split
off into that named file and the text is removed from the
original file along with the tags.
1. Filenames may be absolute or relative, with no quotes or spaces.
2. Tags may be nested, but they must balance (equal number
of [split xxx] and [/split]).
3. Tags are case-insensitive, i.e. may be lower or upper case.
4. Multiple [split xxx] tags may be directed to the same
file, where they will be appended.
5. If the files mentioned in the split tags already exist,
they will be overwritten.
6. Anything not inside [split xxx].. [/split] is retained in
the original file. Everything else is removed.
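The rules above can be sketched in a few lines. File Splitter itself is a Java program whose source is not shown here, so this is only an illustration of the rules, not its implementation. It uses real angle brackets, returns the pieces in memory rather than writing files, and assumes that text inside a nested tag goes only to the innermost file. The function name split_text is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch: returns ( $remainder, \%pieces ), where %pieces maps
# each target file name to the text destined for it. Nothing is written to disk.
sub split_text {
    my ($text) = @_;
    my %pieces;
    my $remainder = '';
    my @stack;    # file names of the currently open <split> tags

    # Cut the input into tags and the runs of text between them.
    for my $token ( split m{(<split\s+[^\s>]+\s*>|</split\s*>)}i, $text ) {
        next unless length $token;
        if ( $token =~ m{^<split\s+([^\s>]+)\s*>$}i ) {
            push @stack, $1;                    # rule 1: no quotes or spaces
        }
        elsif ( $token =~ m{^</split\s*>$}i ) {
            die "unbalanced </split>\n" unless @stack;    # rule 2
            pop @stack;
        }
        elsif (@stack) {
            $pieces{ $stack[-1] } .= $token;    # rule 4: same file appends
        }
        else {
            $remainder .= $token;               # rule 6: kept in the original
        }
    }
    die "unclosed <split $stack[-1]>\n" if @stack;
    return ( $remainder, \%pieces );
}

my $input = "head\n<split a.html>AAA<SPLIT b.html>BBB</split>CCC</split>"
          . "tail<split a.html>DDD</split>\n";
my ( $rest, $pieces ) = split_text($input);

print "remainder: $rest";
print "$_: $pieces->{$_}\n" for sort keys %$pieces;
```

Writing each entry of %$pieces to its file, overwriting any existing file as rule 5 describes, would complete the picture.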
The file being split must be small enough to fit in RAM.
Java array addressing limits the file to 2GB, though other
considerations mean in practice the largest file you can
handle will be smaller still.
To install, extract the zip download with Winzip, available from
http://www.winzip.com (or a similar unzip utility), into any
directory you please, often C:\, with the "use folder names"
option ticked. To run as an application, type:
java -jar C:\com\mindprod\splitter\splitter.jar x.html
adjusting as necessary to account for where the jar file is.
Enhancements:
Version 1.3
allow you to specify encoding
file*HANDLER 0.13
file*HANDLER is primarily a Perl script which coordinates some free media conversion packages with a PostgreSQL back end.
Since the server caches media into the SQL database on demand, as the network grows, the network improves.
It's gridded directory sharing/browsing/searching with streaming audio/video as well as flat text/doc/pdf/image display for everyone. It's written with a few hooks for tags that would be included in your actual front page so that the UI is discardable: anyone can quickly rewrite a whole new [GT]UI without having to worry about the syntax of the newest version of dojo.licio.r or whatever.
If you wanted to ignore the JS/HTML/CSS hooks then you can easily use the system to make direct requests that just return lists formatted as HTML table-bodies. In other words, the markup IS the markup.
As such, I've whipped up a Dojo 0.2 Widget that coordinates the serving backend with a UI so anyone can embed f*H functionality anywhere, or easily customize a provided default page.
A file*HANDLER server is really a few constituent parts I've tied up for you (top down):
- A local web page providing the UI (served by an HTTP server of your choice) that is generated by a CGI script with embedded AJAX.
- A secondary portion of the same CGI script, acting as middle-ware, which communicates, via AJAX, with the local front page to reconcile asynchronous JavaScript requests with the file*HANDLER sub-network back-end.
- An always-on network server written in PERL that serves the front end and communicates laterally with everyone else's file*HANDLER back-end PERL server; additionally, it manages indexing of the content directories you choose to serve.
- A PostgreSQL database that is accessed only via internal PERL routines called from your front page.
So for example, a remote user comes to your site. First, not only can they browse and search your files, but they can also browse and search the files of anyone else hosting a file*HANDLER server that your local server knows about. (file*HANDLER identifies other servers on the network automatically.) The user can now read/view/listen/watch by stream any content they find from whomever's server. There's no download, so there's no actual sharing, just direct streaming to the user's browser.
Bowzilla for Linux
Bowzilla is a mini game for 2 players.
Particularly the realistic blood is to be considered with the lightning strike. You have to find a hole in the target range. However, you should not expect too much from it.
MUTE File Sharing 0.5.1
MUTE File Sharing is a peer-to-peer network that provides easy search-and-download functionality.
It compiles as a fast, native application for many platforms (no Java, no Python, etc.).
MUTE protects your privacy by avoiding direct connections with your sharing partners in the network. Most other file sharing programs use direct connections to download or upload, making your identity available to spies from the RIAA and other unscrupulous organizations.
MUTE is based on research, and experiments show that it works quite well. MUTE's ant-inspired routing is light-weight, robust, and adaptive. Results from experiments in real MUTE networks show that the collective behavior of MUTE nodes quickly finds the shortest (or fastest) routing path between two nodes on the network.
Enhancements:
- This release fixes bugs in MUTE's initial connection to the network upon startup.
- MUTE has also been upgraded to Crypto++ 5.4, so it should now compile using GCC 4.1.