NAME

txt2graph - generates a dot file from the word dependencies of a text

DOWNLOAD

The project page is http://andreas-romeyke.de/txt2graph/ and the changelog is at http://andreas-romeyke.de/txt2graph/CHANGES

The latest tarball is at http://andreas-romeyke.de/txt2graph/txt2graph1.2.3.tar.gz

SYNOPSIS

txt2graph visualizes the structure and dependencies of a text read from stdin and writes a dot file for GraphViz to stdout.

DEPENDENCIES

You need the GraphViz tools installed; see http://www.graphviz.org or http://www.research.att.com/sw/tools/graphviz/
Since version 1.2 you also need the Perl module GraphViz.pm by Leon Brocard. You can get it from CPAN: start the CPAN shell with 'perl -MCPAN -e shell' and install the module by typing 'install GraphViz'. The GraphViz module needs additional modules of its own.

DESCRIPTION

What it does

txt2graph visualizes the structure and dependencies of a text. I wrote it because I needed to analyze and understand how my own text compressor should work in order to get good results. With txt2graph it is possible to find the weights in a document. The tool is also useful for comparing documents, much as an n-gram tool would. Using the filterfile and frequency options you get small "topic" maps of documents.

If you find another use for it, please contact me.

How it works

txt2graph reads a text document from stdin, removes all non-alphabetic characters and generates an array (list) of words. Then it converts German umlauts, because GraphViz can only handle clean ASCII in node descriptions, and outputs a dot file for a directed or an undirected graph.
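
The steps described above can be pictured with a small sketch. It is written in Python for brevity and is not the tool's own Perl code. Several points are assumptions rather than documented behaviour: that a "dependency" is one word directly following another, that edge labels carry the number of occurrences, and the exact umlaut replacements. The function names are hypothetical.

```python
from collections import Counter

# Assumed transliteration table; the real tool's mapping may differ.
UMLAUTS = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss",
})


def words_from_text(text):
    """Replace every non-alphabetic character by a space, then split into words."""
    cleaned = "".join(ch if ch.isalpha() else " " for ch in text)
    return cleaned.split()


def to_ascii(word):
    """Transliterate German umlauts so that node names are plain ASCII."""
    return word.translate(UMLAUTS)


def to_dot(text, directed=True):
    """Build dot source with one edge per pair of adjacent words (an assumption)."""
    words = [to_ascii(w) for w in words_from_text(text)]
    pairs = zip(words, words[1:])
    if not directed:
        # In an undirected graph (a, b) and (b, a) are the same edge.
        pairs = (tuple(sorted(p)) for p in pairs)
    counts = Counter(pairs)
    keyword, arrow = ("digraph", "->") if directed else ("graph", "--")
    lines = [f"{keyword} G {{"]
    for (a, b), n in counts.items():
        lines.append(f'  "{a}" {arrow} "{b}" [label="{n}"];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot("Die Bären grüßen. Die Bären schlafen."))
```

Counting repeated pairs instead of emitting every pair keeps the output smaller; whether txt2graph does the same is not stated here.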

Warning

The generated dot file is typically four to five times the size of the original document. This also means that dot or neato can take a very long time to process it.

AUTHOR and COPYRIGHT

LICENSE

txt2graph is distributed under the terms of the GNU General Public License. You will find a copy in the file COPYING in this distribution or at http://www.gnu.org.

If you want a special license for commercial use, please consider it carefully first, then contact me by signed email (signed with PGP or GnuPG!) with the subject "txt2graph license".

THANKS

Special thanks go to my girlfriend Maren for her patience, to Matthias Richter for his ideas, to Leon Brocard for his GraphViz module and to Derek Jones for his bug reports. Last but not least, thanks go to the team behind the GraphViz tools.

HOW TO USE IT

For help, simply type:
perl txt2graph.pl --help

For the manual, simply type:
perldoc txt2graph.pl

To create a weighted undirected graph:
cat mytext.txt | perl txt2graph.pl --undirect > mytext.dot

neato -Tps mytext.dot > mytext.ps

Or, more simply (since version 1.2):

cat mytext.txt | perl txt2graph.pl --undirect --format=ps > mytext.ps

To create an undirected graph:
cat mytext.txt | perl txt2graph.pl --undirect > mytext.dot

dot -Tps mytext.dot > mytext.ps

Or, more simply (since version 1.2):

cat mytext.txt | perl txt2graph.pl --undirect --format=png > mytext.png

To create a directed graph:
cat mytext.txt | perl txt2graph.pl --direct > mytext.dot

dot -Tps mytext.dot > mytext.ps

To create a directed graph with the twopi layout as SVG (since version 1.2):
cat mytext.txt | perl txt2graph.pl --direct --format=svg --layout=twopi > mytext.svg

You can exclude words by using stoplist, minfrequency, maxfrequency, mincount and maxcount. More information is available via:
perl txt2graph.pl --help | more

A more complex example:
txt2graph.pl --infile=TODO --concentrate --layout=dot --format=ps --zeroword --records --casesensitive --minwordlen=3 --remove-multiples --outfile=test.ps

It reads the file 'TODO' and builds a PostScript file 'test.ps'. The nodes are clustered, which means that strongly related words are placed near each other. The graph is undirected, multiple links are suppressed and filtered words are replaced by a special node. Words must be longer than 3 characters and are treated case-sensitively.
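
To illustrate the idea of excluding words by a stop list and by how often they occur, and of replacing filtered words with a special node, here is a minimal Python sketch. It is not the tool's implementation. The function name, its parameters and the placeholder "_" are hypothetical, and the exact semantics of the real options are documented only in the tool's own help output.

```python
from collections import Counter


def filter_words(words, stoplist=(), min_count=1, max_count=None,
                 placeholder=None):
    """Keep words that pass all filters (hypothetical helper).

    A word is dropped if it is in the stop list or if its number of
    occurrences lies outside [min_count, max_count]. If a placeholder
    is given, dropped words are replaced by it instead of vanishing.
    """
    counts = Counter(words)
    stop = set(stoplist)
    kept = []
    for word in words:
        n = counts[word]
        passes = (word not in stop
                  and n >= min_count
                  and (max_count is None or n <= max_count))
        if passes:
            kept.append(word)
        elif placeholder is not None:
            kept.append(placeholder)
    return kept


if __name__ == "__main__":
    words = "the cat saw the dog and the cat ran".split()
    # Only words occurring exactly twice survive.
    print(filter_words(words, stoplist=["and"], min_count=2, max_count=2))
    # Stop words are replaced by one shared placeholder node.
    print(filter_words(words, stoplist=["the"], placeholder="_"))
```

Replacing filtered words with one shared placeholder keeps the remaining words in their original positions, so neighbours separated by a filtered word do not suddenly appear adjacent.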

BUGS

The following bugs are known:

Older versions of GraphViz dump core if the number of nodes is high. This is not a fault in this script, but it does affect it. Use '--format=canon' to check whether this script fails as well.

Of course, there are probably additional bugs that are not yet known. If you find one, please do not hesitate to contact me at andreas.romeyke@web.de

TODO

See the separate file http://andreas-romeyke.de/txt2graph/TODO. If you have suggestions or patches (preferred), please contact me as well.

API Description

... coming soon ...