For least-squares (LS) problems, QR decomposition (QRD) in VLSI is usually implemented with Givens rotations or Householder reflections rather than the Gram-Schmidt method. As far as I know, the flop counts of these methods are almost the same. The rotations and reflections in the usual approach also need larger hardware, such as CORDIC units, whereas Gram-Schmidt only needs simple multiply-accumulate operations. Of course, the normalization step in Gram-Schmidt is a problem, but it can be handled with workarounds such as look-up tables. Numerical stability should not be a problem either, if one uses the modified version of Gram-Schmidt or a reorthogonalization technique. In spite of all this, hardware designers always seem to prefer systolic arrays based on rotations and reflections. Why?
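
To make the comparison concrete, here is a minimal floating-point sketch of the two approaches contrasted above: modified Gram-Schmidt, which is built from multiply-accumulate loops plus one square root and division per column, and Givens rotations, which apply 2x2 plane rotations to pairs of rows (the operation a CORDIC unit would perform in hardware). The function names `mgs_qr` and `givens_qr_r` are made up for illustration, the code assumes a full-column-rank matrix with more rows than columns, and it says nothing about fixed-point behavior or actual hardware cost.

```python
import math


def mgs_qr(A):
    """Modified Gram-Schmidt QR of an m x n matrix given as a list of rows.

    Assumes m >= n and full column rank (illustrative sketch only).
    Returns (Q, R) where Q is a list of n orthonormal columns of length m
    and R is an n x n upper-triangular matrix.
    """
    m, n = len(A), len(A[0])
    V = [[float(A[i][j]) for i in range(m)] for j in range(n)]  # V[j] = column j
    Q = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Normalization: the step that needs a square root and a division.
        R[j][j] = math.sqrt(sum(x * x for x in V[j]))
        q = [x / R[j][j] for x in V[j]]
        Q.append(q)
        # Projection and update: plain multiply-accumulate operations.
        for k in range(j + 1, n):
            R[j][k] = sum(q[i] * V[k][i] for i in range(m))
            V[k] = [V[k][i] - R[j][k] * q[i] for i in range(m)]
    return Q, R


def givens_qr_r(A):
    """Upper-triangular factor R (n x n) computed with Givens rotations.

    Each rotation combines two adjacent rows to zero one subdiagonal entry.
    """
    m, n = len(A), len(A[0])
    R = [[float(x) for x in row] for row in A]
    for j in range(n):
        for i in range(m - 1, j, -1):
            a, b = R[i - 1][j], R[i][j]
            r = math.hypot(a, b)
            if r == 0.0:
                continue  # nothing to rotate
            c, s = a / r, b / r
            # Rotate rows i-1 and i; columns left of j are already zero there.
            for k in range(j, n):
                x, y = R[i - 1][k], R[i][k]
                R[i - 1][k] = c * x + s * y
                R[i][k] = -s * x + c * y
    return [row[:n] for row in R[:n]]
```

For a well-conditioned tall matrix such as `[[1, 2, 3], [4, 5, 6], [7, 8, 10], [2, 1, 1]]`, both routines should return the same `R` up to rounding, since both produce a non-negative diagonal in that case.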