Wayback Machine 0.8.0
Wayback Machine 0.8.0 Ranking & Summary
File size: 1.8 MB
Platform: Any Platform
License: LGPL (GNU Lesser General Public License)
Price:
Downloads: 1032
Date added: 2007-01-12
Publisher: Brad Tofel
Wayback Machine 0.8.0 description
Wayback Machine is an open source Java implementation of the Internet Archive Wayback Machine.
The current production version of the Wayback Machine is implemented in Perl and is difficult to maintain and extend. Its code is also not open source. The primary motivation for the new version is to address these three issues, enabling public distribution of the application and easy experimentation with new features and access technologies.
The current Java version of the Wayback Machine supports two access (replay) modes of operation: "Archival URL" mode and "Proxy" mode.
Archival URL mode provides a user experience very close to the current production Wayback Machine. All query and replay access requests can be expressed as URLs.
In Archival URL replay mode, HTML documents are delivered with additional JavaScript embedded in the page. This JavaScript alters the document within the browser, attempting to make links and embedded content refer back to the Wayback Machine by rewriting them as Archival URLs.
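The URL arithmetic behind this rewriting can be sketched in Java. This is an illustration, not the project's code: the real rewrite runs as JavaScript in the browser. The sketch assumes an Archival URL is a replay prefix, a 14-digit timestamp, a slash, and the original URL (the layout seen on the public Wayback Machine). The class name, method names, and the `http://localhost:8080/wayback/` prefix are hypothetical.

```java
import java.net.URI;

/** Illustrative only: builds Archival URLs under an assumed layout. */
final class ArchivalUrlSketch {
    // Hypothetical replay prefix; a real deployment would configure its own.
    static final String PREFIX = "http://localhost:8080/wayback/";

    /** Returns PREFIX + timestamp + "/" + originalUrl. */
    static String toArchivalUrl(String timestamp, String originalUrl) {
        if (!timestamp.matches("\\d{14}")) {
            throw new IllegalArgumentException("expected a 14-digit timestamp: " + timestamp);
        }
        return PREFIX + timestamp + "/" + originalUrl;
    }

    /**
     * Resolves a link found in an archived page against the page's original
     * URL, then points it back at the archive using the page's timestamp.
     */
    static String rewriteLink(String timestamp, String pageUrl, String href) {
        String absolute = URI.create(pageUrl).resolve(href).toString();
        return toArchivalUrl(timestamp, absolute);
    }

    public static void main(String[] args) {
        // A relative image link inside a page captured on 2007-01-12.
        System.out.println(rewriteLink(
                "20070112000000",
                "http://example.com/docs/index.html",
                "../img/logo.png"));
    }
}
```

Because the timestamp is part of every rewritten link, following a link keeps the reader at roughly the same point in time, which is what lets all query and replay requests be expressed as plain URLs.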
Proxy URL mode allows archived documents to be replayed within a client browser by configuring the browser to proxy all HTTP requests through the Wayback Machine. This has the strong advantage that no JavaScript page markup is required to coerce the client browser to request additional URLs and embedded content from the Wayback Machine: content just works as-is. One major disadvantage of this mode is that there is no way to forward temporal information with each replay request. Because of this limitation, only the most recently archived version of any resource is accessible through the Wayback Machine in Proxy URL mode.
Another limitation of Proxy URL mode is that it requires special configuration of the client web browser to access the Wayback service. This browser configuration is not complex, but it means that archived content cannot be addressed by a URL that works for any client.
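The same client-side configuration can be expressed for a Java HTTP client, as a rough sketch. The host `localhost` and port `8080` are assumptions standing in for wherever a Wayback instance is listening, and the class and method names are hypothetical. The request URL is the ordinary original URL, so no timestamp travels with the request.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.net.URLConnection;

/** Illustrative only: routes a Java client's HTTP requests through a proxy. */
final class WaybackProxySketch {
    /** Describes an HTTP proxy at the given (assumed) host and port. */
    static Proxy waybackProxy(String host, int port) {
        // createUnresolved defers the DNS lookup until a connection is made.
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    public static void main(String[] args) throws Exception {
        Proxy proxy = waybackProxy("localhost", 8080);
        // openConnection only prepares the request; nothing is sent until
        // the connection is actually used (for example, getInputStream()).
        URLConnection conn = URI.create("http://example.com/").toURL().openConnection(proxy);
        System.out.println("Prepared request for " + conn.getURL() + " via the proxy");
    }
}
```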
See the User Manual to learn more about access modes.
The current Java version is intended to operate as a standalone webapp, maintaining an index on the machine hosting the webapp. This index contains records of the resources within a set of ARC files, which are also assumed to be stored on the same machine hosting the webapp.
This software includes the capability to scan for ARC files in a specified location, and to automatically index and serve content in newly discovered ARC files as they appear. Directing the Wayback Machine to look for ARC files in the directory where an instance of the Heritrix web crawler is writing ARC output should provide the capability to browse content archived by Heritrix as it is crawled.
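One way to picture the scanning behavior is a small poller that reports each ARC file once. This is a minimal sketch, not the project's implementation: the class and method names are hypothetical, and it assumes finished ARC files end in `.arc` or `.arc.gz`. It does not check whether a file is still being written.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Stream;

/** Illustrative only: finds ARC files that have appeared since the last scan. */
final class ArcDirectoryScannerSketch {
    private final Path directory;
    private final Set<String> seen = new HashSet<>();

    ArcDirectoryScannerSketch(Path directory) {
        this.directory = directory;
    }

    /** Assumed naming convention: ARC files end in .arc or .arc.gz. */
    static boolean looksLikeArc(String fileName) {
        return fileName.endsWith(".arc") || fileName.endsWith(".arc.gz");
    }

    /** Returns ARC files not reported by an earlier call, sorted by path. */
    List<Path> scanForNew() {
        List<Path> fresh = new ArrayList<>();
        try (Stream<Path> entries = Files.list(directory)) {
            entries.filter(Files::isRegularFile)
                   .filter(p -> looksLikeArc(p.getFileName().toString()))
                   .sorted()
                   .forEachOrdered(p -> {
                       // Set.add returns false for names already reported.
                       if (seen.add(p.getFileName().toString())) {
                           fresh.add(p);
                       }
                   });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fresh;
    }

    public static void main(String[] args) {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path arc : new ArcDirectoryScannerSketch(dir).scanForNew()) {
            System.out.println("would index: " + arc);
        }
    }
}
```

Calling `scanForNew()` on a timer and handing each returned path to an indexer would approximate the "index as they appear" behavior described above.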
Future versions of this software may integrate more tightly with the Heritrix web crawler application.
Enhancements:
- A sorted CDX flat file ResourceIndex implementation was added, allowing for much larger data sets.
- Support for ArchivalUrl Date-Range requests was added.
- Character set detection was improved so pages are not mangled when server-side modification occurs.
- Several new command-line tools were added for generating and updating each ResourceIndex type.
- Indexing and merging were separated into different threads.
- Bugfixes were made to allow integration with NutchWax full-text searching.
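To illustrate why a sorted flat file can serve large data sets and date-range requests, here is a hedged sketch of a lookup over sorted index lines. It is not the project's ResourceIndex code. The line layout (URL key, a space, a 14-digit timestamp, then other fields) is an assumption made for the example, and the names are hypothetical. A real implementation would binary-search the file on disk instead of an in-memory list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: date-range lookup over lexicographically sorted lines. */
final class SortedIndexSketch {
    /** Index of the first line that is >= key; lines must be sorted. */
    static int lowerBound(List<String> sortedLines, String key) {
        int lo = 0;
        int hi = sortedLines.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedLines.get(mid).compareTo(key) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Returns lines for url whose timestamp lies in [from, to], inclusive. */
    static List<String> captures(List<String> sortedLines, String url, String from, String to) {
        List<String> hits = new ArrayList<>();
        String prefix = url + " ";
        // Binary search jumps to the first candidate; the scan then stops
        // as soon as the URL changes or the timestamp passes the range end.
        for (int i = lowerBound(sortedLines, prefix + from); i < sortedLines.size(); i++) {
            String line = sortedLines.get(i);
            if (!line.startsWith(prefix)) {
                break;
            }
            String timestamp = line.substring(prefix.length(), prefix.length() + 14);
            if (timestamp.compareTo(to) > 0) {
                break;
            }
            hits.add(line);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> index = Arrays.asList(
                "example.com/ 20050101000000 a.arc.gz",
                "example.com/ 20060615120000 b.arc.gz",
                "example.com/ 20070112000000 c.arc.gz",
                "example.com/about 20060101000000 b.arc.gz");
        // Captures of example.com/ made during 2006.
        System.out.println(captures(index, "example.com/", "20060101000000", "20061231235959"));
    }
}
```

Because fixed-width timestamps sort the same way as dates, one sorted file answers both "all captures of this URL" and "captures within this range" without a database.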