Pseudo-inverse of a sparse matrix in Python

Question

I am working with neuroimaging data, and because of the large amount of data I would like to use sparse matrices in my code (scipy.sparse). I have found the method sparse.lsqr, but it is not very efficient. Is there a method to compute the Moore-Penrose pseudo-inverse (the equivalent of pinv for dense matrices)?
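For reference, a minimal sketch of what an iterative solve with `scipy.sparse.linalg.lsqr` looks like next to the dense `numpy.linalg.pinv` route. The matrix here is a small random stand-in (its size and density are assumptions made for illustration, not the real data), chosen so that the dense pseudo-inverse is still cheap enough to serve as a check.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(0)

# Small illustrative system; sizes are made up, not the real 600'000 x 2000.
A = sp.random(200, 20, density=0.2, format="csr", random_state=0)
b = rng.random(200)

# Iterative least-squares solve: the pseudo-inverse is never formed.
x_lsqr = lsqr(A, b, atol=1e-12, btol=1e-12)[0]

# Dense reference, only feasible because this example is tiny.
x_pinv = np.linalg.pinv(A.toarray()) @ b

max_diff = np.max(np.abs(x_lsqr - x_pinv))
print(max_diff)
```

Both routes minimize the same residual norm ||Ax - b||, so on a small full-column-rank example like this one they should agree up to the solver tolerance.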

The size of my matrix A is about 600'000 x 2000, and each row has between 0 and 4 nonzero values. The dimensions of A are voxels x fiber bundles (white matter fiber tracts), and we expect at most 4 tracts to cross in a voxel. In most white matter voxels we expect at least 1 tract, but I would say that around 20% of the rows could be all zeros.
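To make the structure concrete, here is a sketch that builds a scaled-down random matrix with the same pattern (at most 4 nonzeros per row, roughly 20% empty rows) in CSR format. The reduced sizes, the uniform random values, and the variable names are all assumptions for illustration.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
n_voxels, n_bundles = 6000, 200   # scaled-down stand-ins for 600'000 x 2000

# Nonzeros per row: 0 for roughly 20% of the rows, otherwise 1 to 4.
counts = rng.integers(1, 5, size=n_voxels)
counts[rng.random(n_voxels) < 0.2] = 0

# Row index repeated once per nonzero; distinct column indices within each row.
rows = np.repeat(np.arange(n_voxels), counts)
cols = np.concatenate(
    [rng.choice(n_bundles, size=c, replace=False) for c in counts]
)
vals = rng.random(rows.size)

A = sp.csr_matrix((vals, (rows, cols)), shape=(n_voxels, n_bundles))

nnz_per_row = np.diff(A.indptr)   # number of stored entries in each row
empty_fraction = np.mean(nnz_per_row == 0)
print(A.shape, A.nnz, nnz_per_row.max(), empty_fraction)
```

At full size this pattern means at most 2.4 million stored values, which is on the order of tens of megabytes in CSR, whereas a dense 600'000 x 2000 array of doubles would need about 9.6 GB.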

The vector b should not be sparse: b contains the measurement for each voxel, which is in general not zero.

I would need to minimize the error, but there are also some conditions on the vector x. As I tried the model on smaller matrices, I never needed to constrain the system in order to satisfy these conditions (in general 0

Is that of any help? Is there a way to avoid taking the pseudo-inverse of A?
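Regarding the conditions on x: if they can be written as simple lower and upper bounds, one option that works directly on a sparse A is `scipy.optimize.lsq_linear`. The sketch below is only an illustration under that assumption. The bounds 0 and 1 are placeholders picked for the example, since the actual conditions are not spelled out above, and the test matrix is random.

```python
import numpy as np
import scipy.sparse as sp
from scipy.optimize import lsq_linear

rng = np.random.default_rng(1)

# Small random stand-in for the voxel x bundle matrix (sizes are made up).
A = sp.random(500, 30, density=0.1, format="csr", random_state=1)
x_true = rng.uniform(0.2, 0.8, size=30)
b = A @ x_true + 0.01 * rng.standard_normal(500)

# Placeholder bounds, chosen only for this example.
lower, upper = 0.0, 1.0

# Bounded least squares; lsq_solver="lsmr" keeps the inner solves iterative,
# so A is only used through matrix-vector products.
res = lsq_linear(A, b, bounds=(lower, upper), lsq_solver="lsmr")
x = res.x
print(res.status, x.min(), x.max())
```

Like `lsqr`, this minimizes ||Ax - b|| without ever forming the pseudo-inverse; the difference is that the returned x stays inside the given bounds.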

Thanks

Update 1st June: Thanks again for your help. I can't show you anything from my data, because the Python code is giving me some problems. However, to understand how I can choose a good k, I tried to create a test function in Matlab.

The code is as follows:

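% Build a 100000 x 1000 test matrix with up to 150000 random nonzero entries.
F=zeros(100000,1000);

for n=1:150000
    p=rand(1);
    a=0;
    b=0;
    % Redraw until both indices are valid (>= 1).
    while a<=0 || b<=0
        a=random('Binomial',100000,p);
        b=random('Binomial',1000,p);
    end
    F(a,b)=rand(1);
end

% Known solution (1000 x 1) and the corresponding measurements.
solution=repmat([0.5,0.5,0.8,0.7,0.9,0.4,0.7,0.7,0.9,0.6],1,100);
size(solution)
solution=solution';
measure=F*solution;
%check=pinvF*measure;

% Rank-k truncated SVD of the sparse matrix.
k=250;
F=sparse(F);
[U,S,V]=svds(F,k);
s=svds(F,k);
plot(s)
% Largest absolute entry of the rank-k approximation error.
max(max(abs(U*S*V'-F)))

% Invert the nonzero singular values (idx avoids overwriting the vector s).
for idx=1:k
    if S(idx,idx)~=0
        S(idx,idx)=1/S(idx,idx);
    end
end

% Rank-k pseudo-inverse (named Fpinv to avoid shadowing the built-in inv).
Fpinv=V*S'*U';
Fpinv*measure
max(abs(Fpinv*measure-solution))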
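For comparison, a rough Python counterpart of the same experiment using `scipy.sparse.linalg.svds`. This is a sketch on a much smaller random matrix (sizes, density and the cutoff `1e-10` are assumptions), and it applies the rank-k pseudo-inverse to the measurement vector without building the dense inverse matrix.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds

rng = np.random.default_rng(2)

# Small random test problem with a known solution (sizes are made up).
F = sp.random(2000, 100, density=0.02, format="csr", random_state=2)
solution = rng.uniform(0.4, 0.9, size=100)
measure = F @ solution

k = 60  # number of singular triplets kept; must be < min(F.shape)
U, s, Vt = svds(F, k=k)

# Ignore numerically zero singular values instead of inverting them.
keep = s > 1e-10 * s.max()

# x_k = V_k * diag(1/s) * U_k^T * measure, evaluated right to left.
x_k = Vt[keep].T @ ((U[:, keep].T @ measure) / s[keep])

# With exact data, x_k is the projection of the true solution onto the
# span of the k retained right singular vectors.
projection = Vt[keep].T @ (Vt[keep] @ solution)

err = np.max(np.abs(x_k - solution))
print(err, np.max(np.abs(x_k - projection)))
```

Because the rank-k result is only the part of the solution lying in the retained singular subspace, some error against the known solution is expected whenever k is smaller than the rank of F, independent of how accurately the k triplets are computed.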

Do you have any idea how large k should be relative to the size of F? I took 250 (out of 1000), and the results are not satisfactory (the waiting time is acceptable, but not short). Also, I can now compare the results with the known solution, but how should one choose k in general? I have also attached a plot of the 250 singular values I obtained, and of their squares, normalized. I don't know exactly how best to make a scree plot in Matlab. I am now running with a larger k to see whether the values become much smaller.
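One way to get scree-plot values without a full SVD, sketched here in Python: the sum of all squared singular values equals the squared Frobenius norm of F, which is just the sum of its squared entries. Dividing the k computed squared singular values by that total gives the share each one explains. The matrix, k and the 50% target below are arbitrary choices for illustration.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds

# Small random stand-in for F (size and density are made up).
F = sp.random(2000, 100, density=0.02, format="csr", random_state=3)

k = 50
s = svds(F, k=k, return_singular_vectors=False)
s = np.sort(s)[::-1]              # sort explicitly into descending order

# Squared Frobenius norm = sum of ALL squared singular values of F.
total = F.multiply(F).sum()

explained = s**2 / total          # scree values: share of each component
cumulative = np.cumsum(explained) # share captured by the first j components

# Smallest number of components reaching the target share, if k is enough.
target = 0.5
idx = np.searchsorted(cumulative, target)
k_needed = int(idx) + 1 if idx < k else None
print(cumulative[-1], k_needed)
```

Plotting `explained` against the component index gives the scree plot, and `cumulative[-1]` shows how much of F the current k captures, so one can tell whether a larger k is likely to help.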

Thanks again, Jennifer

[Figure: the 250 computed singular values and their squared, normalized values.]