UserReference:P300ClassifierMethods: Difference between revisions

Latest revision as of 00:08, 14 September 2009

Load BCI2000 Data Files

Once the BCI2000 data files have been checked for compatibility, signals are extracted from each training or testing data file and arranged in matrix form as:

$\mathbf{X_{i}^{k}(n)} = \begin{bmatrix} x_{1}^{1}(0) & x_{1}^{1}(1) & \cdots & x_{1}^{1}(p-1) & \cdots & x_{1}^{l}(0) & x_{1}^{l}(1) & \cdots & x_{1}^{l}(p-1) \\ x_{2}^{1}(0) & x_{2}^{1}(1) & \cdots & x_{2}^{1}(p-1) & \cdots & x_{2}^{l}(0) & x_{2}^{l}(1) & \cdots & x_{2}^{l}(p-1) \\ \vdots & \vdots & \cdots & \vdots & \cdots & \vdots & \vdots & \cdots & \vdots \\ x_{m}^{1}(0) & x_{m}^{1}(1) & \cdots & x_{m}^{1}(p-1) & \cdots & x_{m}^{l}(0) & x_{m}^{l}(1) & \cdots & x_{m}^{l}(p-1) \end{bmatrix}$

for

$i = 1, 2, \dots, m;$

$k = 1, 2, \dots, l;$

$n = 0, 1, \dots, p - 1;$

where $m$ is the total number of observations (stimuli), $l$ is the total number of channels, and $p = t \cdot Fs$ (rounded to the nearest integer) is the total number of samples recorded for each channel. Here $t$ is the recording time per stimulus and $Fs$ is the sampling frequency.

Consider the following example to clarify the notation above. A BCI data set is recorded during a P300 Speller task using a 6 by 6 matrix of characters. Each row and column of the matrix is intensified in random order, resulting in 12 different stimuli. The set of 12 intensifications is repeated 15 times for each character to be spelled. In this example, the subject intends to spell the word "SEND", a total of 4 characters.

A 6 by 6 speller matrix. Here, the user's task is to spell the word "SEND" (one character at a time). For each character, all rows and columns in the matrix are intensified 15 times (here, the third row is shown intensified).

Assume that the data set is recorded from 8 channels at 256 Hz, the elapsed time from the start to the end of each intensification is 800 ms, and the Decimation Frequency provided by the investigator is 20 Hz. For this example, $m = 12 \times 15 \times 4 = 720$, $l = 8$, and $p = \mathrm{round}(256 \times 0.800) = 205$. The total number of columns (variables) of the above matrix is $8 \times 205 = 1640$.
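The arithmetic of this example can be checked with a short Python sketch. The function names `design_matrix_shape` and `column_index` are hypothetical, and the rounding rule (round half up) is an assumption, since the text only says "round". The column layout follows the matrix above, where the samples of each channel are placed side by side.

```python
def design_matrix_shape(n_stimuli, n_repetitions, n_characters,
                        n_channels, fs_hz, epoch_s):
    """Return (m, p, columns) for the signal matrix described in the text."""
    # m: one row per observation (stimulus presentation)
    m = n_stimuli * n_repetitions * n_characters
    # p: samples per channel; int(x + 0.5) rounds half up for x >= 0 (assumed rule)
    p = int(fs_hz * epoch_s + 0.5)
    # channels are concatenated along the columns
    return m, p, n_channels * p


def column_index(k, n, p):
    """Zero-based column of sample n (0..p-1) of channel k (1..l)."""
    return (k - 1) * p + n


m, p, n_columns = design_matrix_shape(12, 15, 4, 8, 256, 0.800)
print(m, p, n_columns)          # 720 205 1640
print(column_index(8, 204, p))  # 1639, the last column
```
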

Get P300 Responses

In this step, the sample indices corresponding to the time Response Window $[t_{1}, t_{2}]$ (in ms) are computed. Following the above example, assume that the Response Window specified by the investigator is [0 800] ms.

$n_{1} = \mathrm{round}\left(\frac{t_{1} Fs}{1000}\right) = \mathrm{round}\left(\frac{0 \times 256}{1000}\right) = 0$

$n_{2} = \mathrm{round}\left(\frac{t_{2} Fs}{1000}\right) = \mathrm{round}\left(\frac{800 \times 256}{1000}\right) = 205$

Signals of interest are extracted from $𝐗_{𝐢}^{𝐤} (𝐧)$ and are defined only for $n_{1} \leq n < n_{2}$ .
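As a minimal sketch of this step, the following Python converts a Response Window in milliseconds to sample indices and slices one epoch accordingly. The helper names are hypothetical, and round half up is again an assumed rounding rule.

```python
def window_to_samples(t1_ms, t2_ms, fs_hz):
    """Convert a response window [t1, t2] in ms to sample indices (n1, n2)."""
    # int(x + 0.5) rounds half up for non-negative x (assumed rounding rule)
    n1 = int(t1_ms * fs_hz / 1000 + 0.5)
    n2 = int(t2_ms * fs_hz / 1000 + 0.5)
    return n1, n2


def extract_window(epoch, n1, n2):
    """Keep the samples with n1 <= n < n2 from one channel of one observation."""
    return epoch[n1:n2]


n1, n2 = window_to_samples(0, 800, 256)
print(n1, n2)  # 0 205

epoch = list(range(205))  # placeholder samples for one channel
print(len(extract_window(epoch, n1, n2)))  # 205
```

The half-open slice `epoch[n1:n2]` matches the condition $n_{1} \leq n < n_{2}$, so the window keeps exactly $n_{2} - n_{1}$ samples.
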

The coefficients $b_{i}$ of the Moving Average (MA) filter are found as

$b_{i} = \frac{1}{N + 1}$

for $i = 0, 1, 2, \dots, N$, where $N$ is the filter order. The filter order can be computed from the sampling frequency $Fs$ and the provided decimation frequency $Df$ as

$N = \mathrm{round}\left(\frac{Fs}{Df}\right) = \mathrm{round}\left(\frac{256}{20}\right) = 13$

Thus, the impulse response $h (n)$ can be computed as

$h(n) = \frac{1}{N + 1} \sum_{i = 0}^{N} \delta(n - i).$

To filter the selected signals, each channel $k$ and each observation (stimulus) $i$ of the matrix $𝐗_{𝐢}^{𝐤} (𝐧)$ is convolved $(⋆)$ with the impulse response $h (n)$ . The next equation shows how to filter a signal for channel $k = 1$ and observation $i = 1$

$y_{1}^{1} (n) = x_{1}^{1} (n) ⋆ h (n) .$

The output $y_{i}^{k}(n)$, obtained by convolving every extracted signal (channels $k = 1, 2, \dots, l$ and observations $i = 1, 2, \dots, m$) with the impulse response, is then downsampled by a factor of $N$.
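The filtering and downsampling can be sketched in plain Python for a single channel and observation. This is an illustration under stated assumptions, not the classifier's actual implementation: it uses the $N + 1$ equal coefficients $b_{i} = 1/(N+1)$ defined above, treats samples before the window as zero, and keeps every $N$-th output sample starting at index 0. The function name `moving_average_decimate` is hypothetical.

```python
def moving_average_decimate(x, fs_hz, df_hz):
    """Moving-average filter one signal, then downsample by the filter order N."""
    order = int(fs_hz / df_hz + 0.5)   # N = round(Fs / Df), round half up (assumed)
    taps = order + 1                   # coefficients b_0 .. b_N
    b = 1.0 / taps                     # every coefficient equals 1 / (N + 1)
    y = []
    for n in range(len(x)):
        acc = 0.0
        for i in range(taps):
            if n - i >= 0:             # assumed zero initial conditions
                acc += b * x[n - i]
        y.append(acc)
    # assumption: decimation starts at the first sample
    return order, y[::order]


signal = [1.0] * 205                   # constant test signal, one channel
order, features = moving_average_decimate(signal, 256, 20)
print(order, len(features))            # 13 16
```

Under these assumptions each 205-sample channel is reduced to 16 values. A constant input settles to the same constant after the first $N$ samples, which confirms that the coefficients sum to one.
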

Generate Feature Weights for a Linear Model using Stepwise Linear Discriminant Analysis (SWLDA)

Consider a data vector $𝐝$ of $m$ observations, a vector $𝐰$ of $n$ model parameters (weights) to estimate, and a matrix $𝐆$ representing the final linear model. This inverse problem can be written as

$𝐆 𝐰 = 𝐝$

An approximate solution to this problem can be found by minimizing the difference (residuals) between the actual data $𝐝$ and $𝐆 𝐰$ .

$𝐫 = 𝐝 - 𝐆 𝐰$

The least squares or 2-norm solution has been adopted to minimize these residuals.

$𝐰 = (𝐆^{𝐓} 𝐆)^{- 1} 𝐆^{𝐓} 𝐝 .$

The superscript $T$ denotes the transpose of the matrix $\mathbf{G}$. Note that the least squares solution is only valid for overdetermined systems ($m > n$); that is, the model must have more observations than variables. If the residuals are normally distributed, the least squares solution corresponds to the maximum likelihood criterion.
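The least squares step alone can be illustrated with a small pure-Python sketch. It covers only the solution of $\mathbf{G}\mathbf{w} = \mathbf{d}$, not the stepwise feature selection of SWLDA. The function name `least_squares` and the toy data are hypothetical, and $\mathbf{G}$ is assumed to have full column rank. The normal equations $(\mathbf{G}^{T}\mathbf{G})\mathbf{w} = \mathbf{G}^{T}\mathbf{d}$ are solved by Gaussian elimination rather than by forming the inverse explicitly.

```python
def least_squares(G, d):
    """Solve the normal equations (G^T G) w = G^T d for w."""
    m, n = len(G), len(G[0])
    # Build A = G^T G (n x n) and b = G^T d (length n)
    A = [[sum(G[r][i] * G[r][j] for r in range(m)) for j in range(n)]
         for i in range(n)]
    b = [sum(G[r][i] * d[r] for r in range(m)) for i in range(n)]
    # Gaussian elimination with partial pivoting on the augmented matrix [A | b]
    M = [A[i] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w


# Toy overdetermined system: m = 4 observations, n = 2 weights
G = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
d = [1.0, 3.0, 5.0, 7.0]
w = least_squares(G, d)
# Residuals r = d - G w
residuals = [d[r] - sum(G[r][j] * w[j] for j in range(2)) for r in range(4)]
print(w, residuals)
```

Here the data lie exactly on the line $d = 1 + 2x$, so the weights come out close to 1 and 2 and the residuals are close to zero, up to floating-point error.
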

@@ Line 1: / Line 1: @@
-Signals are obtained either for each training or testing data file and are arranged in a matrix form as:
+== Load BCI2000 Data Files ==
+Once the BCI2000 data files have been checked for compatibility, signals are extracted from each training or testing data file and arranged in matrix form as:
 <math>
 {\mathbf{X_{i}^{k}(n)}} =
 \begin{bmatrix}
-x_{1}^{1}(1) & x_{1}^{1}(2) & \cdots & x_{1}^{1}(p) & \cdots & x_{1}^{l}(1) & x_{1}^{l}(2) & \cdots & x_{1}^{l}(p) \\
+x_{1}^{1}(0) & x_{1}^{1}(1) & \cdots & x_{1}^{1}(p-1) & \cdots & x_{1}^{l}(0) & x_{1}^{l}(1) & \cdots & x_{1}^{l}(p-1) \\
-x_{2}^{1}(1) & x_{2}^{1}(2) & \cdots & x_{2}^{1}(p) & \cdots &  x_{2}^{l}(1) & x_{2}^{l}(2) & \cdots & x_{2}^{l}(p) \\
+x_{2}^{1}(0) & x_{2}^{1}(1) & \cdots & x_{2}^{1}(p-1) & \cdots &  x_{2}^{l}(0) & x_{2}^{l}(1) & \cdots & x_{2}^{l}(p-1) \\
 \vdots & \vdots & \cdots &\vdots & \cdots & \vdots & \vdots & \cdots & \vdots \\
-x_{m}^{1}(1) & x_{m}^{1}(2) & \cdots & x_{m}^{1}(p) & \cdots &  x_{m}^{l}(1) & x_{m}^{l}(2) & \cdots & x_{m}^{l}(p) \\
+x_{m}^{1}(0) & x_{m}^{1}(1) & \cdots & x_{m}^{1}(p-1) & \cdots &  x_{m}^{l}(0) & x_{m}^{l}(1) & \cdots & x_{m}^{l}(p-1) \\
 \end{bmatrix}
 </math>
@@ Line 17: / Line 19: @@
 <math> k = 1, 2, \ldots, l; </math>
-<math> n = 1, 2, \ldots, p; </math>
+<math> n = 0, 1, \ldots, p-1; </math>
 where <math> m </math> is the total number of observations (stimuli), <math> l </math> is the total number of channels, and <math> p = t*Fs </math> (rounded to the nearest integer) is the total number of samples recorded for each channel. Here <math> t </math> is the recording time per stimulus and <math> Fs </math> is the sampling frequency.
-Let us consider the following example to help you understand the previous mathematical notation. A BCI data set is recorded during a P300 Speller task using a 6 by 6 matrix of characters. Each row and column of the matrix is randomly intensified resulting in 12 different stimuli. The sets of 12 intensification are repeated 15 times for each intended character to spell. For this example, the subject pretend to spell only two characters. The data set is recorded from 8 channels at 256 Hz, and the elapsed time from the start to the end of each intensification is 800 ms. For this example, <math> m = 12x15x2 = 360 </math>, <math> l = 8 </math>, and <math> p = 256 x 0.800 = 205 </math>. The total number of columns of the above matrix is <math> 8 x 256 = 2048 </math>.
+Consider the following example to clarify the notation above. A BCI data set is recorded during a P300 Speller task using a 6 by 6 matrix of characters. Each row and column of the matrix is intensified in random order, resulting in 12 different stimuli. The set of 12 intensifications is repeated 15 times for each character to be spelled. In this example, the subject intends to spell the word "SEND", a total of 4 characters.
+[[Image:P3SpellerScreen.PNG|center|frame|A 6 by 6 speller matrix. Here, the user's task is to spell the word "SEND" (one character at a time).  For each character, all rows and columns in the matrix are intensified 15 times (here, the third row is shown intensified).]]
+Assume that the data set is recorded from 8 channels at 256 Hz, the elapsed time from the start to the end of each intensification is 800 ms, and the [[User_Reference:P300_classifier#Parameters Pane|'''Decimation Frequency''']] provided by the investigator is 20 Hz.  For this example, <math> m = 12 \times 15 \times 4 = 720 </math>, <math> l = 8 </math>, and <math> p = round(256 \times 0.800) = 205 </math>. The total number of columns (variables) of the above matrix is <math> 8 \times 205 = 1640 </math>.
+== Get P300 Responses ==
+In this step, the sample indices corresponding to the time [[User_Reference:P300_classifier#Parameters Pane|'''Response Window''']] <math> [t_{1}, t_{2}] </math> (in ms) are computed. Following the above example, assume that the Response Window specified by the investigator is [0 800] ms.
+<math>
+n_{1} = round\left(\frac{t_{1}Fs}{1000}\right) = round\left(\frac{0*256}{1000}\right) = 0
+</math>
+<math>
+n_{2} = round\left(\frac{t_{2}Fs}{1000}\right) = round\left(\frac{800*256}{1000}\right) = 205
+</math>
+Signals of interest are extracted from <math> \mathbf{X_{i}^{k}(n)} </math> and are defined
+only for <math> n_{1}\le n < n_{2} </math>.
+The coefficients <math> b_{i} </math> of the [http://en.wikipedia.org/wiki/Moving_average Moving Average (MA)] filter are found as
+<math>
+b_{i} = \frac{1}{N+1}
+</math>
+for <math> i = 0, 1, 2, \ldots, N </math> where <math> N </math> is the filter order. The filter order can be computed from the sampling frequency <math> Fs </math>  and the provided decimation frequency <math> Df </math> as
+<math>
+N = round \left(\frac{Fs}{Df}\right) = round \left(\frac{256}{20}\right) = 13
+</math>
+Thus, the impulse response <math> h(n) </math> can be computed as
+<math>
+h(n) = \frac{1}{N+1} \sum_{i=0}^{N} \delta(n-i).
+</math>
+To filter the selected signals, each channel <math> k </math> and each observation (stimulus) <math> i </math> of the matrix <math> \mathbf{X_{i}^{k}(n)} </math> is convolved <math> (\star) </math> with the impulse response <math> h(n) </math>. The next equation shows how to filter a signal for channel <math> k = 1 </math> and observation <math> i = 1 </math>
+<math>
+y_{1}^{1}(n) = x_{1}^{1}(n) \star h(n).
+</math>
+The output <math> y_{i}^{k}(n)</math>, obtained by convolving every extracted signal (channels <math> k = 1, 2, \ldots, l </math> and observations <math> i = 1, 2, \ldots, m </math>) with the impulse response, is then [http://en.wikipedia.org/wiki/Downsampling downsampled] by a factor of <math> N </math>.
+== Generate Feature Weights for a Linear Model using Stepwise Linear Discriminant Analysis (SWLDA) ==
+Consider a data vector <math> \mathbf {d} </math> of <math> m </math> observations, a vector <math> \mathbf{w} </math> of <math> n </math> model parameters (weights) to estimate, and a matrix <math> \mathbf{G} </math> representing the final linear model. This inverse problem can be written as
+<math>
+\mathbf{G}\mathbf{w} = \mathbf{d}
+</math>
+An approximate solution to this problem can be found by minimizing the difference (residuals) between the actual data <math> \mathbf{d} </math> and <math> \mathbf{G}\mathbf{w} </math>.
+<math>
+\mathbf{r} = \mathbf{d} - \mathbf{G}\mathbf{w}
+</math>
+The least squares or 2-norm solution has been adopted to minimize these residuals.
+<math>
+\mathbf{w} = (\mathbf{G^{T}}\mathbf{G})^{-1}\mathbf{G^{T}}\mathbf{d}.
+</math>
+The superscript <math> T </math> denotes the transpose of the matrix <math> \mathbf{G} </math>. Note that the least squares solution is only valid for overdetermined systems (<math> m > n </math>); that is, the model must have more observations than variables. If the residuals are normally distributed, the least squares solution corresponds to the maximum likelihood criterion.