Free avalon harvest software for linux

avalon harvest

Results 1 - 15 of about 33
avalon-harvest 0.0.2

avalon-harvest provides an Avalon-based integration package.
Harvest is a very simple Java-based program to asynchronously copy data from one location to another.
The data can be anything that can be represented as a Java bean, including files, database records, data available from networks, or JMS messages.
Main features:
- Asynchronous system integration where no other solution exists.
- Data replication between databases.
- Fetching data from otherwise unusable data sources (HTML pages, documents, other files) into a database (or any kind of device capable of storing data).
- Any other scenario where you may need an asynchronous copy of data.
Download (3.3MB)
Added: 2007-02-01 License: The Apache License Price:
996 downloads
harvest 1.9.15

Harvest is a system that collects information and makes it searchable through a web interface. Harvest can collect information from the Internet and intranets using HTTP, FTP and NNTP, as well as from local files such as data on hard disks, CD-ROMs and file servers.
Supported formats, in addition to HTML, include TeX, DVI, PS, full text, mail, man pages, news, troff, WordPerfect, RTF, Microsoft Word/Excel, SGML, C sources and many more. Stubs for PDF support are included in Harvest and will use Xpdf or Acroread to process PDF files. Adding support for a new format is easy thanks to Harvest's modular design.
Harvest is a modular, distributed search system framework with a working set of components that make it a complete search system.
Main features:
- Harvest is designed to work as a distributed system. It can distribute the load among different machines, and several machines can be used to gather data. The full-text indexer does not have to run on the same machine as the broker or web server.
- Harvest is designed to be modular. Every step in collecting data and answering search requests is implemented as a separate program. This makes it easy to modify or replace parts of Harvest to customize its behaviour.
- Harvest allows complete control over the content of the search database. The summarizer can be customized to create the summaries that will be used for searching. Harvest's filtering mechanism allows modifications to the summaries created by summarizers. Manually created summaries can also be inserted into the search database.
- The Search interface is written in Perl to make customization easy, if desired.
Enhancements:
- src/common/qdbm: updated to qdbm-1.8.20.
- components/broker/zebra/yaz: merged yaz-2.0.30.
Download (7.9MB)
Added: 2005-04-03 License: Free For Educational Use Price:
1673 downloads
TMHarvest 0.1 Pre1

TMHarvest provides a tool which harvests topicmaps from structured datasources.
The TMHarvest library is based on TM4J and provides a convenient way to automatically generate topic maps from different data sources. A rules file with embedded templates (written in XML) defines which data sources the topic map constructs should be taken from, as well as how new or existing topics should be associated.
Main features:
- TMHarvest creates topic maps from existing data sources. Currently it supports the following source-types: SQL-Databases, CSV-Files, XPath-Queries. In addition it is possible to integrate custom model-providers, written in Java. See for details.
- The harvesting is driven by a model-file. The model may be processed as often as you like. Since the data sources are queried at processing time, it is possible to build topic maps periodically in order to reflect changing content.
- The model-file has a modular setup. The resulting topic map is populated step by step from distinct templateActions. The static parts of the topic map (the structural parts, for example the topics that define the ontology of the topic map) may be merged from static files and therefore separated from the actual harvest.
Pros and Cons
The process that creates the topic map is serialized in the model-file. Once the process is defined, it can run as often as you like. This enables the creation of topic maps that are refreshed periodically in an automated way.
Working with the templates is an iterative experience. Shaping the model, running TMHarvest, reviewing the results and reshaping the model lets you converge on the final form.
The fact that the model-file must be written by hand excludes many users who lack the necessary technical background. Graphical interfaces or wizards would be really helpful.
Download (1.7MB)
Added: 2007-02-16 License: The Apache License Price:
980 downloads
Pax Logging 0.9.2

Pax Logging is a consolidation effort that aims to make all existing logging APIs in the Java world available for OSGi developments, driven by a Log4J backend.
Each legacy API is loaded as its own bundle. The logging service can be reloaded at run-time.
Main features:
- Log4J driving the backend implementation.
- Log4J API supported.
- Jakarta Commons Logging API supported.
- Pax Logging Service implements the standard OSGi Log Service API.
- JDK Logging API support.
- Avalon Logger API support.
- SLF4J API support.
- Knopflerfish Log service support.
Download (MB)
Added: 2006-08-10 License: GPL (GNU General Public License) Price:
1172 downloads
cadaverserver 1.0.1

cadaverserver project is a realtime artificial intelligence battle game server.

Cadaver is a simulated world of cyborgs and nature in realtime. The battlefield consists of forests, grain, water, grass, carcasses (of course) and lots of other things.

The game server manages the game and the rules. You start a server and connect some clients. The clients communicate with the server using a very primitive protocol. They can order cyborgs to harvest grain, attack enemies or cut forest.

The game is not intended to be played by humans! Instead the idea is that you write artificial intelligence clients to beat the other artificial intelligences.

The server is a program that runs on the console.
It manages the rules of the game in realtime.

It listens on TCP port 8932.

You could connect to it by entering:

# telnet localhost 8932

You will get the Initialisation message.
You can now enter commands.
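
The same connection can be made from a program, which is how an AI client would start. The following is only a minimal sketch in Python: the function name read_greeting is made up for illustration, and it assumes the server sends its initialisation message as soon as the connection opens. The command protocol itself is not described here, so the sketch stops after reading that first message.

```python
import socket


def read_greeting(host="localhost", port=8932, timeout=5.0):
    """Connect to a Cadaver server and return the first data it sends.

    Assumption: the server sends its initialisation message right after
    the TCP connection is accepted, as described for the telnet session.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        data = conn.recv(4096)  # first chunk only; long messages may need more reads
    return data.decode("utf-8", errors="replace")
```

With a server running locally, calling read_greeting() should return the text that telnet shows right after connecting.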

Download (0.052MB)
Added: 2007-01-02 License: GPL (GNU General Public License) Price:
1025 downloads
Virtual Data Center 1.04-11

The Virtual Data Center (VDC) is a digital library system "in a box" for numeric data.

The VDC is a web application which provides everything necessary to maintain and disseminate collections of research studies: including facilities for the storage, archiving, cataloging, translation, and dissemination of each collection.

It includes on-line analysis, powered by the R Statistical environment. It also provides extensive support for distributed and federated collections including: location-independent naming of objects, distributed authentication and access control, federated metadata harvesting, remote repository caching, and distributed virtual
Download (14.5MB)
Added: 2006-04-18 License: GPL (GNU General Public License) Price:
1287 downloads
Document Library 1.2b2

Document Library is a Web application for document management in larger organizations with a lot of documents.
Organizations deal with numerous documents, such as word processor documents and PDFs. These documents often reside on someone's computer and are not network accessible. Versions of documents are hard to track: the same document may be passed around by email in multiple versions over time. In large organizations it therefore becomes important to structure the flow of documents and present them in a common format. This is typically done using a document management system. Document Library is one such document management system.
Information in the Document Library can be accessed using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), meaning that besides being open source, the Document Library is also a good example of an open data application. Because it is open data, the Document Library is easier to integrate with other applications, such as the Silva CMS or any other application capable of OAI-PMH harvesting.
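To give an idea of what OAI-PMH harvesting looks like from the client side, here is a small sketch in Python that builds a ListRecords request URL. The verb and the metadataPrefix and set parameters come from the OAI-PMH protocol; the function name and the example base URL are made up for illustration, and the actual endpoint of a Document Library installation is not given here.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    base_url is the repository's OAI-PMH endpoint (hypothetical here).
    oai_dc (unqualified Dublin Core) is the metadata format that every
    OAI-PMH repository has to support.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec  # optional: restrict harvesting to one set
    return base_url + "?" + urlencode(params)
```

A harvester would fetch this URL over HTTP and parse the XML response; that part is left out of the sketch.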
Main features:
- Automatic conversion service: using OpenOffice, the Document Library can convert Word documents into PDFs and plain text, PDFs into plain text. The plain text version is important in that it allows for full-text indexing of document contents, and also makes documents more accessible to people with disabilities.
- Publication workflow: documents only become available for harvesting and download after a review process.
- Delegation of control: reviewers ("librarians") can be assigned to particular sections.
- Dynamic access: authors have automatic access to all the documents that list them as an author.
- Versions: multiple versions of the same document can coexist, one public and one under preparation.
- Email reminder functionality: users receive emails about the progress of the document through the workflow.
- OAI-PMH data provider: allows other systems to harvest document metadata using a standard protocol.
- Integration with Silva CMS (using OAI-PMH).
- Fast upload and download integration with Apache using Tramline.
- Easy overview screens for librarians.
- Smart file upload user interface: files need to be uploaded only once even if rest of form needs to be amended.
- Document Library is built using the powerful Zope 3 application server platform.
Enhancements:
- Installation was made easier by using zc.buildout.
- Optional LDAP support was added.
- Filesystem storage was integrated with Tramline.
- The conversion provided by OooConv was improved.
- Table rendering was improved using zc.table.
- Zope was updated to 3.3.
Download (0.30MB)
Added: 2006-12-07 License: GPL (GNU General Public License) Price:
1055 downloads
GalaxyHack 1.74

GalaxyHack is a multi-player strategy game based on AI scripts.
The GalaxyHack project allows you to design a fleet of spaceships which can then be tested in AI script based battles against fleets designed by other players.

Though battles take place in real time, the strategy comes beforehand, both in writing short AI scripts in a simple scripting language, and also in the set up and selection of your fleet.

You don't actually have any control over your units mid-battle. Instead, you use the time to see where the set up of your fleet is working, where your fleet's weaknesses lie and changes are needed, and perhaps also to learn from the strategy of your opponent.

The game revolves around very large capital ships, from which smaller ships are launched, but which are not designed for attacking themselves, and which cannot be moved mid-battle. To win a battle you must destroy all of your opponent's capital ships before they destroy yours.

There can be hundreds of units in any one battle, but there is no harvesting, resource management or base building.

Fleet customisation:

There are three basic unit types: capital ships, frigates and small ships. All three are customisable with a small range of different weapons and technologies. Fleet selection and unit customisation are done using a windows-and-menus based editor. Each fleet has a points value, which is increased by adding or upgrading units.

You are able to save a fleet, along with a corresponding set of AI script selections.

AI scripting:

A "group" of units - consisting of up to three actual units - has an AI script attached to it. Scripts are written using a simple proprietary language. There are two generic types of weapons in the game: "small" and "big". "small" weapons are always fired automatically when there is an enemy within range, whilst "big" weapons are fired on the basis of AI script commands. The game translates the AI commands for a group into commands for each individual unit within that group.

AI scripts are written externally to the game using any program capable of editing text files.

The chief principle of GalaxyHack AI scripts is "If this then that else the other". For a more concrete example, here are two lines of code from an AI file:

if aenemy type == smallship health < 100
move nenemy type == smallship health < 100

Scripts also support:

* function calls
* a variety of different sorts of variables which can have their value changed by scripts.

Multiplayer:

The game is designed to allow different commanders to test their fleets in battle against one another. Up to four fleets can participate in any one battle. This said, the game is not properly "multiplayer" in that you never actually play in real time against other players. Instead, you can download fleets designed by other players from an online database. You are encouraged to submit your own fleets to this database for other players to download and battle against.

Graphics and sound:

The game engine is a 2D affair, with ships and what have you shuffling about. It is fairly easy to create your own custom art for your units, should you so wish.

There are various basic graphical effects for firing weapons of various sorts, units being destroyed, etc.

The game features music composed by The Embryo.

The game does not feature any sound effects.

Download (0.073MB)
Added: 2006-09-27 License: GPL (GNU General Public License) Price:
1127 downloads
James 2.3.1

The Java Apache Mail Enterprise Server (a.k.a. Apache James) is a 100% pure Java SMTP and POP3 Mail server and NNTP News server. James is designed to be a complete and portable enterprise mail engine solution based on currently available open protocols.
James is also a mail application platform. We have developed a Java API to let you write Java code to process emails that we call the mailet API. A mailet can generate an automatic reply, update a database, prevent spam, build a message archive, or whatever you can imagine. A matcher determines whether your mailet should process an email in the server. The James project hosts the Mailet API, and James provides an implementation of this mail application platform API.
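The matcher and mailet idea can be pictured with a short sketch. Note that this is not the Mailet API, which is a Java API; it is only a conceptual illustration in Python, and every name in it (Mail, recipient_is_local, archive, auto_reply, process) is made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Mail:
    sender: str
    recipient: str
    subject: str
    log: list = field(default_factory=list)  # records what was done to the mail


def recipient_is_local(mail):
    # Matcher: decides whether the paired mailet should see this mail.
    return mail.recipient.endswith("@example.org")


def archive(mail):
    # Mailet: stands in for "build a message archive".
    mail.log.append("archived")


def auto_reply(mail):
    # Mailet: stands in for "generate an automatic reply".
    mail.log.append("auto-reply to " + mail.sender)


# Each entry pairs a matcher with the mailet it guards.
PIPELINE = [
    (recipient_is_local, archive),
    (lambda mail: mail.subject.lower().startswith("help"), auto_reply),
]


def process(mail, pipeline=PIPELINE):
    """Run the mail through every mailet whose matcher accepts it."""
    for matcher, mailet in pipeline:
        if matcher(mail):
            mailet(mail)
    return mail
```

Keeping the matching decision separate from the processing step is what lets the same mailet be reused under different conditions.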
James is based on the Apache Avalon application framework, which was a product of the Apache Avalon project.
James requires Java 1.4 (For further information you may want to search the web, our dev and user mail archives or our wiki).
Main features:
- SMTP server
- Mailet Engine
- FileSystem mailboxes/spool
- RDBMS mailboxes/spool
- POP3 server
- RDBMS
- LDAP
- TLS
- Remote Manager
- TLS Support
- NNTP server
- FetchPOP
Download (4.9MB)
Added: 2007-05-01 License: Freeware Price:
947 downloads
BW whois 5.0

BW whois is a modern whois client that works as a full-featured Web application or as a commandline tool. BW whois is flexible and configurable, with self-detecting CGI support, multiple security options in CGI mode, a mature TLD table, database caching (using MySQL or PostgreSQL), and many more options and features.
Main features:
- Self-detecting CGI support
- Simple command-line use
- Prevents data harvesting with multiple security features for web use
- Optional result caching with an SQL database
- Database features work with either MySQL or PostgreSQL
- Support for multiple outgoing IP addresses
- Support for available/not available results
- Fully customizable HTML output
- Support for Apache-style SSI (server-side includes)
- External TLD table for support of ALL top-level domains
- Fully configurable disclaimer stripping
- Automatic support for netblocks
- Unpacks packed (single-integer) IP addresses
Enhancements:
- Multiple outgoing IP addresses are supported to help with whois servers that block based on queries per IP address.
- Support was added for PostgreSQL in addition to MySQL for database features.
- Support for mod_perl was improved, and support for non-standard whois servers (like whois.denic.de) was improved.
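As an illustration of the packed IP address feature listed above, a packed address is an IPv4 address written as one 32-bit integer. BW whois itself is not written in Python; the sketch below only shows the conversion, using Python's standard ipaddress module and a made-up function name.

```python
import ipaddress


def unpack_ip(packed):
    """Turn a packed (single-integer) IPv4 address into dotted-quad form."""
    return str(ipaddress.IPv4Address(int(packed)))
```

For example, unpack_ip(3232235777) gives 192.168.1.1, since 192*2^24 + 168*2^16 + 1*2^8 + 1 = 3232235777.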
Download (0.042MB)
Added: 2006-08-11 License: Artistic License Price:
1175 downloads
Excalibur 4.3

Excalibur is an open source software project of The Apache Software Foundation. Our primary product is a lightweight, embeddable Inversion of Control container named Fortress that is written in Java.

Inversion of control, also known as the Hollywood principle ("don't call us, we'll call you"), is a simple but powerful concept. The idea is that we don't "wire up" all the pieces that make up an application (the "components") by writing lots of this-component-uses-that-one-like-so code, nor do we use some kind of lookup directory (like JNDI, for example) where each component decides for itself what components to interact with. Instead, we instruct a smart piece of software, the container, to tell the components how to interact.
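
To make the idea concrete, here is a toy container in Python. It is only a sketch of the general principle and has nothing to do with how Fortress or Avalon-Framework are actually implemented; the Container, Logger and MailSender classes are invented for the example.

```python
class Logger:
    def __init__(self):
        self.lines = []

    def info(self, message):
        self.lines.append(message)


class MailSender:
    # The component never looks its logger up; it is simply handed one.
    def __init__(self, logger):
        self.logger = logger

    def send(self, to):
        self.logger.info("sending to " + to)


class Container:
    """Toy container: knows how to build components and what they depend on."""

    def __init__(self):
        self._factories = {}
        self._instances = {}

    def register(self, name, factory, *dependencies):
        self._factories[name] = (factory, dependencies)

    def get(self, name):
        # Build each component once, resolving its dependencies first.
        if name not in self._instances:
            factory, dependencies = self._factories[name]
            self._instances[name] = factory(*(self.get(d) for d in dependencies))
        return self._instances[name]


container = Container()
container.register("logger", Logger)
container.register("sender", MailSender, "logger")

sender = container.get("sender")
sender.send("alice@example.org")
```

The wiring lives in the two register calls, so swapping in a different logger means changing the container set up, not the MailSender code.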

Fortress (and also its predecessor, "ECM") is such a container. It is lightweight, by which we mean that it doesn't need a lot of resources, take a lot of disk or memory, or impose all sorts of demands on its environment. Fortress is also embeddable, by which we mean that you can use Fortress inside just about every Java environment. More concretely, you can use it as the basis of a large standalone development platform (like the Keel project), at the core of a servlet-based web application (like Cocoon) or even as the basis of a GUI application (like GuiApp).

Fortress knows how to manage components that have been developed using a rigid lifecycle contract called Avalon-Framework. In the next release, Fortress will also be able to manage ordinary JavaBeans, and support for other kinds of Inversion of Control is planned.

Besides providing Fortress, Excalibur also provides a small library of very useful components. We also distribute some of the libraries used to build Fortress (and some other containers) separately. This selection of libraries is called containerkit.

So why is excalibur an interesting project?

Here are a few partial answers.

good code. Code that comes from the mature Avalon project (everything that used to be branded as "Avalon Excalibur" and "Avalon Fortress"). This includes two lightweight Inversion of Control containers. One of these powers, among other things, Cocoon. The other ("Fortress") is the basis of projects like Keel. Excalibur also includes powerful and mature reusable "components" and "libraries", handling tasks ranging from thread management to component pooling to (URI-and-similar-) source resolving.

smart developers. Most of these people are or have been active in the Avalon project in various roles. Several of them are Apache members. Together they have loads and loads of experience under their belts related to inversion of control development. In fact, I'm confident to say this list includes some of the biggest experts on inversion-of-control-style container development.

exciting community. Several open source and commercial projects (both at Apache and elsewhere) depend on and contribute to the Excalibur project. A strong team of enthusiastic developers (with strong ties to several other projects in the same problem domain) has various big and small plans for Excalibur. One of the leading open source organisations in the world, The Apache Software Foundation, is hosting the project, providing insight, advice, infrastructure, legal backing, a time- and battle-proven development process, and much more.
Download (MB)
Added: 2007-01-10 License: The Apache License 2.0 Price:
1223 downloads
NewsBro 2.4.2

NewsBro project is a web application providing usenet news service. It provides support for multiple users accessing multiple news groups via multiple news servers.
Main features:
- web-based Usenet access
- full-featured news reader
- multi-user access with individual profiles
- one click filtered access to all images within a newsgroup
- multi-server binary file support
- XFace support
- yEnc decoder
- stand-alone and Java servlet versions
- News RC File
- Header and article caching
- NZB support
- XPAT threading
- Binary file harvesting
- Watch List for monitoring followups
- RSS feed for each group
- Configurable threads list
Download (0.27MB)
Added: 2007-08-11 License: Free for non-commercial use Price:
808 downloads
Simple Python Distributed Indexing 0.9.17

SPyDI is a powerful engine to create distributed full text indexing systems and distributed search engines.
The Simple Python Distributed Indexing library supports harvesting, crawling (pull methods), and push methods (via a Web interface or SPyRO Web services).
It supports boolean and vector information retrieval models. It has few dependencies, and comes with its own HTTP server, its own HTML embedded pages language (called pyew and wey pages), and a session manager.
It can use the SMTP of the Python library. It supports replacing the default modules with some better modules (Apache, exim, etc).
Enhancements:
- Monarca updates to support SPyRO's new HTTP protocol management.
- Some bugfixes in pyew pages.
- General code cleanup.
Download (0.66MB)
Added: 2006-10-10 License: GPL (GNU General Public License) Price:
1109 downloads
HTML Parser 1.6-20060610

HTML Parser is a Java library used to parse HTML in either a linear or nested fashion.
HTMLParser is a super-fast real-time parser for real-world HTML. What has attracted most developers to HTMLParser has been its simplicity in design, speed and ability to handle streaming real-world html.
The two fundamental use-cases that are handled by the parser are extraction and transformation (the synthesis use-case, where HTML pages are created from scratch, is better handled by other tools closer to the source of data). While prior versions concentrated on data extraction from web pages, Version 1.4 of the HTMLParser has substantial improvements in the area of transforming web pages, with simplified tag creation and editing, and verbatim toHtml() method output.
In order to use HTMLParser you will need to be able to write code in the Java programming language. Although some example programs are provided that may be useful as they stand, it's more than likely you will need (or want) to create your own programs or modify the ones provided to match your intended application.
To use the library, you will need to add either the htmllexer.jar or htmlparser.jar to your classpath when compiling and running. The htmllexer.jar provides low level access to generic string, remark and tag nodes on the page in a linear, flat, sequential manner. The htmlparser.jar, which includes the classes found in htmllexer.jar, provides access to a page as a sequence of nested differentiated tags containing string, remark and other tag nodes. So where the output from calls to the lexer nextNode() method might be:
<html>
<head>
<title>
"Welcome"
</title>
</head>
<body>
etc...
The output from the parser NodeIterator would nest the tags as children of the <html>, <head> and other nodes (here represented by indentation):
<html>
    <head>
        <title>
            "Welcome"
        </title>
    </head>
    <body>
        etc...
The parser attempts to balance opening tags with ending tags to present the structure of the page, while the lexer simply spits out nodes. If your application requires only modest structural knowledge of the page, and is primarily concerned with individual, isolated nodes, you should consider using the lightweight lexer. But if your application requires knowledge of the nested structure of the page, for example processing tables, you will probably want to use the full parser.
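The difference between the flat and the nested view can be tried out without the library itself. The sketch below is not HTMLParser (which is a Java library) but an illustration built on Python's standard html.parser module; the class names FlatLexer and NestingParser and the sample page are made up for the example. The first class just records nodes in order, the second balances opening and closing tags into a tree.

```python
from html.parser import HTMLParser


class FlatLexer(HTMLParser):
    """Collects nodes one after another, like the lexer's flat output."""

    def __init__(self):
        super().__init__()
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        self.nodes.append("<%s>" % tag)

    def handle_endtag(self, tag):
        self.nodes.append("</%s>" % tag)

    def handle_data(self, data):
        if data.strip():
            self.nodes.append(data.strip())


class NestingParser(HTMLParser):
    """Builds (tag, children) tuples by matching end tags to open tags."""

    def __init__(self):
        super().__init__()
        self.root = ("document", [])
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = (tag, [])
        self._stack[-1][1].append(node)
        self._stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open tag with this name; ignore stray end tags.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self._stack[-1][1].append(data.strip())


PAGE = "<html><head><title>Welcome</title></head><body><p>Hi</p></body></html>"

lexer = FlatLexer()
lexer.feed(PAGE)

tree = NestingParser()
tree.feed(PAGE)
```

After running this, lexer.nodes is a plain list of tags and strings, while tree.root holds the same nodes nested under their parents. The tag balancing here is deliberately naive and does not treat void elements such as br specially.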
Extraction
Extraction encompasses all the information retrieval programs that are not meant to preserve the source page. This covers uses like:
- text extraction, for use as input for text search engine databases for example
- link extraction, for crawling through web pages or harvesting email addresses
- screen scraping, for programmatic data input from web pages
- resource extraction, collecting images or sound
- a browser front end, the preliminary stage of page display
- link checking, ensuring links are valid
- site monitoring, checking for page differences beyond simplistic diffs
There are several facilities in the HTMLParser codebase to help with extraction, including filters, visitors and JavaBeans.
Transformation
Transformation includes all processing where the input and the output are HTML pages. Some examples are:
- URL rewriting, modifying some or all links on a page
- site capture, moving content from the web to local disk
- censorship, removing offending words and phrases from pages
- HTML cleanup, correcting erroneous pages
- ad removal, excising URLs referencing advertising
- conversion to XML, moving existing web pages to XML
During or after reading in a page, operations on the nodes can accomplish many transformation tasks "in place", which can then be output with the toHtml() method. Depending on the purpose of your application, you will probably want to look into node decorators, visitors, or custom tags in conjunction with the PrototypicalNodeFactory.
The HTML Parser is an open source library released under GNU Lesser General Public License, which basically says you are free to use the library "as is" in other (even proprietary) products, as long as due credit is given to the authors and the source code for the HTMLParser is included or available with the other product. For modified or embedded use, please consult the LGPL license.
Download (4.2MB)
Added: 2006-06-11 License: LGPL (GNU Lesser General Public License) Price:
1234 downloads
wpoison.php 1.0

wpoison.php is a script that generates page after page of random fake email addresses, and is intended to be used for poisoning spambot email address databases.

wpoison.php is based on version 1.8p of the original Wpoison CGI script. It is a direct port and functionally equivalent. The reason for this port is that quite a few people do not have the ability to use CGI scripts on their sites.

All that is required to run wpoison.php is a web server configured for PHP and some form of a words file. You may download the words file that Ronald makes available on his site below along with the script.

Wpoison helps to combat the junk e-mail problem by effectively thwarting the efforts of junk e-mailers who regularly scan web pages, looking for target e-mail addresses to harvest. (The junk e-mailers subsequently send junk e-mail to all of the e-mail addresses that they harvest from various web sites.)

The idea behind Wpoison is really very simple. Junk e-mailers write programs to automatically scan thousands and thousands of web pages, looking for e-mail addresses which they then send unsolicited junk e-mail to, or which they sell to other spammers. By and large, these address harvesting web crawlers are about as intelligent as the spammers who use and/or develop them, which is to say not very. The majority of these programs can be easily fooled into accepting lots and lots of completely fake and useless e-mail addresses, so long as the bogus addresses in question appear to reside on ordinary nondescript web pages. That is where Wpoison comes in.

Wpoison is what is called a web CGI program. A CGI program is just like any other program, except that its purpose is to generate web pages on the fly and with dynamic content that can be different each time the program runs, which is to say each time the URL where the program is installed is referenced, either by someone's web browser, or else by some web-scanning robotic program.

In the case of the Wpoison CGI program, the dynamic content that is generated each time the program is "visited" (by a web browser, or by a web-scanning robot program) is just a list of randomized bogus e-mail addresses, together with a list of randomized web hyper-links.

Now here's the catch... and this is the clever part. Each of the randomized web hyper-links that Wpoison generates looks exactly like an ordinary web hyper-link that leads off to someplace else, i.e. to some different web page having a different web URL. But in fact, that is just a matter of appearances, and the reality is that if you follow any one of these hyper-links, you will actually end up coming right back and executing the Wpoison CGI program again, at which point you will get yet another randomized dynamically generated web page, and that new page will contain its own totally new set of bogus e-mail addresses and also a fresh new set of randomized hyper-links. And of course, each of those new hyper-links will, if followed, lead right back to the Wpoison CGI program yet again, thus starting the whole cycle all over again.

It is important to note that when Wpoison is generating its randomized bogus e-mail addresses (and also its randomized pseudo-hyper-links) it uses an algorithm which makes the total number of different bogus e-mail addresses and pseudo-hyper-links essentially unlimited. In effect, Wpoison is capable of generating an infinite number of different bogus E-mail addresses!
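
The mechanism described above can be sketched in a few lines. This is not the wpoison.php code, and it is written in Python rather than PHP; it also builds names from random letters, whereas the real script works from a words file. The function names, the example.com-style suffixes and the link layout are all assumptions made for illustration.

```python
import random
import string


def random_word(rng, low=4, high=10):
    """Return a random lowercase string (stand-in for a words-file entry)."""
    length = rng.randint(low, high)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def fake_address(rng):
    """Build one bogus e-mail address."""
    suffix = rng.choice(["com", "net", "org"])
    return "%s@%s.%s" % (random_word(rng), random_word(rng), suffix)


def poison_page(script_url, rng, addresses=5, links=3):
    """Return an HTML page of bogus addresses plus links back to the script.

    Every link looks like a different page but starts with script_url,
    so following it runs the generator again.
    """
    lines = ["<html><body>"]
    for _ in range(addresses):
        addr = fake_address(rng)
        lines.append('<a href="mailto:%s">%s</a><br>' % (addr, addr))
    for _ in range(links):
        token = random_word(rng, 8, 8)
        lines.append('<a href="%s/%s.html">%s</a><br>' % (script_url, token, token))
    lines.append("</body></html>")
    return "\n".join(lines)
```

Passing in a random.Random instance keeps the sketch testable; a live script would seed it differently on every request so that no two pages match.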

So the basic idea behind Wpoison is to trap unwary and badly engineered address harvesting web crawlers, and to fool them into adding enormous quantities of completely bogus e-mail addresses to the E-mail address data bases of the spammers, thus polluting those data bases so badly that they become essentially useless, thereby putting the spammers who are using them out of business, or at least shutting them down for a time and causing them some major headaches while they try to clean up the messes in their now-heavily-polluted e-mail address data bases.
Download (0.003MB)
Added: 2006-06-23 License: GPL (GNU General Public License) Price:
1225 downloads