Warum Octave langsamer als Matlab ist

Zusammenfassung meiner Erkenntnisse bei der Analyse eines Prototyps der NMF in Octave.

Ich hatte folgenden Prototypen der NMF (non negative matrix factorization) nach Lee & Seung in Matlab/Octave geschrieben:

%%%
%%% NMF nach Lee & Seung
%%%
%%% copyright 2008
%%% Andreas Romeyke (art1@andreas-romeyke.de)
%%%
%%% computes the NMF decomposition of given matrix X with
%%% compression-parameter and returns 2 matrices
%%% U and V with X_approx=U*V ~ X
%%%
function [U,V] = nmf3(X, compression)
maxit=10000;
epsilon=0.00000000001;
lasterr=1000000000000000000000000000;
[rows,columns]=size(X);
if compression > rows || compression > columns
    display ('compression should not be larger than rows or columns in
given matrix'); abort;
end
U=(rand(rows,compression));
V=(rand(compression,columns));
X=X / max(max(X));
for n = 1:maxit
    a=transpose(U) * X;
    b=transpose(U) * U * V;
    V=V .* a ./ b;
    a=X * transpose(V);
    b=U * V * transpose(V);
    U=U .* a ./ b;
    % calc error/residue
    Xapprox=U*V;
    err=(sum (sum( abs(X-Xapprox) )));
    if (lasterr - err < epsilon)
	break;
    end
    if (mod(n,200) == 0)
	printf('n:%i err:%f\n', n, err);
    end
    lasterr=err;
end

Der Code war unter Octave 3.1 vier mal langsamer als unter Matlab 7.6. Mich hat das gewundert, da der Code vollvektorisiert ist und Octave auf die schnellere Atlas-Bibliothek setzt.

Auf der Octave-Mailingliste fragte ich nach der Ursache. Zum einen wurde mir vorgeschlagen, den transpose-Befehl durch den transpose-Operator zu ersetzen:

%%% NMF nach Lee & Seung
%%%
%%% copyright 2008
%%% Andreas Romeyke (art1@andreas-romeyke.de)
%%%
%%% computes the NMF decomposition of given matrix X with
%%% compression-parameter and returns 2 matrices
%%% U and V with X_approx=U*V ~ X
%%%
function [U,V] = nmf3(X, compression)
maxit=10000;
epsilon=0.00000000001;
lasterr=1000000000000000000000000000;
[rows,columns]=size(X);
if compression > rows || compression > columns
    display ('compression should not be larger than rows or columns in
given matrix'); abort;
end
U=(rand(rows,compression));
V=(rand(compression,columns));
X=X / max(max(X));
for n = 1:maxit
    a=U.' * X;
    b=U.' * U * V;
    V=V .* a ./ b;
    a=X * V.';
    b=U * V * V.';
    U=U .* a ./ b;
    % calc error/residue
    Xapprox=U*V;
    err=(sum (sum( abs(X-Xapprox) )));
    if (lasterr - err < epsilon)
	break;
    end
    if (mod(n,200) == 0)
	printf('n:%i err:%f\n', n, err);
    end
    lasterr=err;
end

Unter Octave 3.1 spürte ich noch keine Verbesserung. laut David Bateman, einer der Hauptenwickler von Octave werden Operatoren ab Version 3.2 besonders optimiert, John W. Eaton ergänzte:

On 27-Aug-2008, Andreas Romeyke wrote:

| Thanks for your explanation. But I did not see the reason, why the
| transpose-operator and the transpose function will be evaluated
| differently? Should it not be the same in internal representation of
| Octaves's interpreter? I thought Octave uses an AST to walk on it?

Yes, but for historical reasons, operators are handled internally by
the interpreter and function calls are handled differently.  The
interpreter does not currently translate .' to a call to the transpose
function.  However, I think this may need to change to properly handle
operator overloading for classes.  For example, so you can define an
@double/transpose.m function to overload the behavior of the transpose
operator with your own function (of course, doing that will be slow,
so I don't guess you speed demons care about that too much, right?).

jwe

Daraufhin compilierte ich Octave aus dem Entwicklerzweig. Jetzt betrug die Laufzeit zu Matlab statt dem Vierfachen nur noch 7/2-fache. Auf den Unterschied angesprochen erklärten die Octave-Entwickler nach einigen Tests, daß Matlab eine andere Variante der Matrixmultiplikation verwendet, die zwar schneller aber auch ungenauer ist. Ich wurde auf den Thread Behavior-of-mldivide-in-Octave-relative-to-Matlab verwiesen:

It seems
that Matlab is not using the LAPACK xGEDLD function like Octave does,
but rather a formulation based on the QR factorization. The formulation
can be written as

m=10; n=20; A = randn(m,n); b=rand(m,1); [Q,R,E]=qr(A);
[A\b E(:,1:m)*(R(:,1:m)\(Q'*b))]

Result returned by this is not the minimum 2-norm solution, like in
Octave but is equally valid. It also appears that the matlab method is
faster and uses less memory. Marco points out that values like m=50 and
n=20000 in the above will result in an exhaustion of the memory of the
machine, whereas Matlab is still capable of returning a result.

Another issue is what is the result that is returned by Matlab and
Octave for singular matrices. Consider a case like

A = [eps, 0, -eps; 0, eps, -eps; -eps, 0, 10]; x = A \ ones(3,1)

Octave falls back to using xGELSD to solve the matrix in this case
whereas as Matlab returns the result based on LU factorization with a
warning of its inaccuracy.

Als nächstes fiel mir auf, daß meine Atlas-Bibliothek nicht direkt für mein System optimiert wurde. Vielleicht könnte das neucompilieren der Bibliothek Abhilfe schaffen? Via Debians Build-Umgebung dauerte es knapp 8h bis die Bibliothek optimiert und installiert war.

Auf alle Fälle ist Octave eine Versuchung wert. Es entfallen die Lizenzkosten, man kann in die Quellen schauen und Octave auf seine Rechnerarchitektur hin optimieren. Die Unterstützung der Octave-Community ist hervorragend. Zur Zeit fehlen Octave noch folgende Funktionen, die das Arbeiten erleichtern würden:

Profiler
Just-in-time Compiler/Interpreter
Octave2XXX Compiler

Ferner ist die Nutzbarkeit von GUIs noch mangelhaft. Aber wer braucht die schon? ;-)