FPGA-Based CNN (Convolutional Neural Network) Accelerator

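Contents

1. Background

2. High-Level Design

2.1 Mathematical Overview

2.2 General Overview

3. Hardware Design

3.1 Input Image

3.2 VGA/Camera

3.3 First Convolutional Layer

3.4 Pooling Layers

3.4 Second Convolutional Layer

3.5 Partial Sums

3.6 First Fully Connected Layer

3.7 Second Fully Connected Layer

4. Software Design

5. System Design

6. Testing

7. Hardware Bugs and Issues

8. Results

9. Resource Usage

10. Usability

11. Conclusion

12. Intellectual Property Considerations

13. Improvements and Future Work

14. Verilog Code and C Code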


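1. Background

A neural network is a machine learning model based on the neural networks of the brain. A series of nodes are arranged in layers and connected to one another through operations and weights. The model has proven successful in image classification tasks, which today have many applications, from self-driving cars to facial recognition. A standard CNN can have floating-point weights and feature maps, which require large amounts of memory and considerable compute to implement the necessary multipliers and related logic.

A binarized neural network (BNN) uses binarized feature maps and weights, which greatly reduces the storage and compute required and makes it possible to synthesize the network in hardware on resource-constrained systems such as FPGAs. The network we implemented is based on a software model written in Python with the Tensorflow machine learning library. The Python code was provided by Ritchie Zhao, a PhD student at Cornell University. The Verilog code implements in hardware the individual layers and functions used to build the software model. The system is designed to classify digits; the model was trained on a subset of the MNIST dataset and produced a test accuracy of about 40%. This could be improved by using non-binarized feature maps and by implementing additional functions such as batch normalization.

The Verilog model performs inference but does not train the weights used in the computation. Instead, the weights are generated by the Python implementation and hard-coded in the Verilog model. When a neural network is used for classification, training the weights is time-consuming and is not done in real time. We therefore chose to focus the model on the classification task and to compute with pre-trained weights. We originally planned to pass the weights to the FPGA from the HPS; however, this used too many logic cells and the design did not fit on the device.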

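2. High-Level Design

2.1 Mathematical Overview

The mathematics involved in computing the different output feature maps is mostly limited to multiplication and addition. Because the weights in our design are binary, each multiplication can be replaced by a ternary operator that decides whether a value is added or subtracted, depending on whether the weight is 1 or -1 (a weight bit of 0 is treated as -1). This greatly reduces the number of DSP blocks needed to implement the design. A convolution is performed by sliding a filter across the input feature map. Overlapping entries are multiplied together and summed to form the value at the corresponding output index. Binarization is done by determining the sign of the value and assigning the output to -1 or 1 accordingly. While true binarization would convert the output to 1 or 0 rather than 1 or -1, the computations this network requires make conversion to 1 or -1 more efficient. For the rest of this report, binarization refers to converting a number to 1 or -1, not 1 or 0. Pooling examines a given set of values and assigns the output to the maximum of that set. The figures below illustrate all of these operations.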

Figure 1: Convolution example

Figure 2: Pooling example

Figure 3: Binarization example
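
The replacement of multiplication by a ternary select can be checked with a small software sketch. This is not the authors' Verilog; the function names are hypothetical, and the 3 x 3 patch values are made up for illustration. The weight bits are those of the first first-layer filter in the Verilog listing.

```python
def binarize(value):
    """Map a signed sum to +1 or -1; zero maps to -1, as in this design."""
    return 1 if value > 0 else -1


def select_accumulate(values, weight_bits):
    """Add a value when its weight bit is 1, subtract it when the bit is 0."""
    total = 0
    for value, bit in zip(values, weight_bits):
        total = total + value if bit == 1 else total - value
    return total


def multiply_accumulate(values, weight_bits):
    """Reference: an explicit multiply by +1/-1 weights."""
    return sum(v * (1 if b == 1 else -1) for v, b in zip(values, weight_bits))


window = [1, -1, -1, 1, 1, -1, 1, 1, -1]  # flattened 3 x 3 patch (illustrative)
weights = [1, 1, 0, 1, 0, 0, 0, 1, 0]     # 1-bit weights; 0 stands for -1

acc = select_accumulate(window, weights)
assert acc == multiply_accumulate(window, weights)
print(acc, binarize(acc))
```

Both routines give the same sum, which is why the hardware needs only adders and subtractors rather than multipliers.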

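2.2 General Overview

The binarized neural network consists of two convolutional layers, two pooling layers, and two fully connected layers. The input image is a 7 x 7 black-and-white image with two-bit entries. It is padded with -1s on the bottom and right to create an 8 x 8 image, which is fed into the network. The first convolutional layer convolves the input image with 16 3 x 3 filters to produce 16 8 x 8 output maps, which are binarized to contain only 1 and -1. These 16 maps are then pooled to form 16 4 x 4 output maps, which are fed into the second convolutional layer. The second convolutional layer contains 512 3 x 3 filters. Each of the 16 input maps is convolved with 32 unique filters, and the results are combined to produce 32 4 x 4 output feature maps. These are binarized and pooled into 2 x 2 output maps, which are passed to the fully connected layers. The first fully connected layer flattens the incoming 32 2 x 2 feature maps into a 128-entry array. This array is matrix-multiplied with a 128 x 32 weight array to produce an output array of size 32. That output array is binarized and multiplied by the 32 x 10 weight matrix in the final fully connected layer to produce a 10-entry array. Each entry in this array corresponds to the likelihood that the input image shows the digit given by that index. For example, entry 0 represents the likelihood that the input image is a 0. If entry 0 holds the maximum value in the array, the BNN infers that the input is the digit 0.

All feature maps and weight arrays are stored in registers, and the convolutions and matrix multiplications are implemented with ternary operators. Using DSP blocks would have left the design short of multipliers. The two-bit feature map entries and 1-bit weight arrays keep storage requirements minimal, eliminating the need for memory units such as M10K blocks. All weights for every layer are hard-coded in Verilog. We originally planned to feed the weights in from the HPS over PIO ports; however, this required more ALMs than are available on the FPGA.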
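
As a quick sanity check on the dimensions above, the sketch below traces the feature-map shapes through the network and counts the 1-bit weights. The helper names are hypothetical; the layer sizes are taken from the description in this section.

```python
def conv_shape(shape, num_filters):
    """3 x 3 convolution over a one-pixel border: height and width are kept."""
    _, height, width = shape
    return (num_filters, height, width)


def pool_shape(shape):
    """2 x 2 max pooling halves height and width."""
    channels, height, width = shape
    return (channels, height // 2, width // 2)


def flatten_shape(shape):
    channels, height, width = shape
    return (channels * height * width,)


stages = [("input", (1, 8, 8))]
stages.append(("conv1", conv_shape(stages[-1][1], 16)))
stages.append(("pool1", pool_shape(stages[-1][1])))
stages.append(("conv2 + partial sums", conv_shape(stages[-1][1], 32)))
stages.append(("pool2", pool_shape(stages[-1][1])))
stages.append(("flatten", flatten_shape(stages[-1][1])))
stages.append(("fc1", (32,)))
stages.append(("fc2", (10,)))

# Number of 1-bit weights per layer.
weight_bits = {
    "conv1": 16 * 3 * 3,
    "conv2": 16 * 32 * 3 * 3,   # 512 filters of 3 x 3
    "fc1": 128 * 32,
    "fc2": 32 * 10,
}

for name, shape in stages:
    print(f"{name:22s} {shape}")
print("total weight bits:", sum(weight_bits.values()))
```

The total comes to a little over nine thousand bits, which is consistent with the statement that the weights fit in registers without M10K blocks.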

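3. Hardware Design

3.1 Input Image

Ten input images from the MNIST test set, one for each of the ten digits, are hard-coded in Verilog on the FPGA. The FPGA receives an input-select signal from the HPS, which picks one of the images as the input fed to the binarized convolutional network to produce a digit prediction. The input images from the MNIST test set are average-pooled into 7 x 7 matrices of 1-bit grayscale values. We use 2 bits per entry because the input is binarized to 1 and -1: 2'b01 represents a black pixel and 2'b11 represents a white pixel. Then, before the image is fed to the first convolutional layer, we pad the bottom row and right column with -1s to form an 8 x 8 matrix. This makes the matrix dimensions even, which is easier to work with in later layers.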
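
The preprocessing described above can be sketched in software as follows. This is an illustration under stated assumptions, not the authors' script: the binarization threshold (127.5) and the convention that stroke pixels map to +1 are assumptions, and the 28 x 28 input is a synthetic vertical stroke rather than a real MNIST sample.

```python
def average_pool(image, factor):
    """Average non-overlapping factor x factor tiles of a square image."""
    size = len(image) // factor
    pooled = []
    for r in range(size):
        row = []
        for c in range(size):
            tile = [image[r * factor + i][c * factor + j]
                    for i in range(factor) for j in range(factor)]
            row.append(sum(tile) / len(tile))
        pooled.append(row)
    return pooled


def binarize_image(image, threshold):
    """+1 where the pooled intensity exceeds the threshold, else -1 (assumed rule)."""
    return [[1 if pixel > threshold else -1 for pixel in row] for row in image]


def pad_bottom_right(image, fill=-1):
    """Append one column on the right and one row at the bottom."""
    padded = [row + [fill] for row in image]
    padded.append([fill] * len(padded[0]))
    return padded


# Synthetic 28 x 28 image: a vertical stroke standing in for an MNIST digit.
raw = [[255 if 4 <= r < 24 and 12 <= c < 16 else 0 for c in range(28)]
       for r in range(28)]

small = binarize_image(average_pool(raw, 4), threshold=127.5)  # 7 x 7
net_input = pad_bottom_right(small)                            # 8 x 8

for row in net_input:
    print(" ".join(f"{v:2d}" for v in row))
```

With this synthetic stroke the result has the same layout as the hard-coded digit "1" array in the Verilog listing: a column of +1 entries on a background of -1.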

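3.2 VGA/Camera

Our original plan was to use an NTSC camera to capture live images of handwritten digits as input and classify them in real time. We started from Bruce's video code on the Avalon Bus Master to HPS page, which stores the video input in on-chip SRAM through the Video_In_Subsystem module in Qsys and uses a bus master to copy pixels from the SRAM to dual-port SDRAM; the VGA controller module then displays the SDRAM data on the VGA screen. We used that code and the Qsys video subsystem module. We were able to convert 8-bit RGB color to 2-bit grayscale, as shown in the figure below, use the Video_In_Clipper and Video_In_Scaler Qsys modules to crop the input from 320x240 to 224x224, and then use pooling on the HPS to create a 7x7 image. We later found this plan infeasible because we ran out of ALMs on the FPGA, most of which went to building the binarized neural network itself. We therefore chose to hard-code some existing input images from the MNIST dataset on the FPGA and to send a select signal to choose among them.

Figure 4: 224x224 2-bit grayscale to 7x7 1-bit grayscale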

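3.3 First Convolutional Layer

The first convolutional layer uses 16 3 x 3 filters with 1-bit entries. The input image is an 8 x 8 matrix with 1-bit entries, and it is convolved with each filter to produce 16 output feature maps of size 8 x 8. The input image is bordered with zeros, making it a 10 x 10 matrix; convolving this with a 3 x 3 filter yields an 8 x 8 matrix.

The convolution is implemented with a ternary operator that checks whether a filter bit is 1 or 0 and accordingly adds the value in the input fmap to, or subtracts it from, a temporary sum. To save space, we use 1-bit weights (1 and 0) with the ternary operator instead of two-bit weights representing 1 and -1. The temporary sums are stored in a temporary output feature map. This is repeated for every entry of the output feature map and happens in parallel for each of the 16 3 x 3 filters. Once all temporary sums have been computed, their sign bits are used to assign +1 or -1 to the corresponding entries of the output feature map: if the temporary sum is greater than 0, the entry is assigned +1; otherwise it is assigned -1. Note that with this implementation a temporary sum of 0 is assigned -1. The layer is implemented with two combinational always blocks, one for the padding and one for the convolution. Each block contains nested for loops, which allow all temporary sums to be computed in parallel. In the main body of the code, a generate loop instantiates 16 such convolution units so that all 16 output feature maps are computed in parallel.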
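
A sequential software sketch of one such convolution unit is given below. It assumes the filter is applied without flipping (the text does not say which orientation the hardware uses), and the function name is hypothetical. The test image and the two filters are copied from the Verilog listing.

```python
def conv3x3_binarized(image, filter_bits):
    """Convolve a square +1/-1 image with one 3 x 3 filter of 1-bit weights."""
    size = len(image)
    # Zero border: an 8 x 8 image becomes 10 x 10.
    padded = ([[0] * (size + 2)]
              + [[0] + list(row) + [0] for row in image]
              + [[0] * (size + 2)])
    output = []
    for r in range(size):
        out_row = []
        for c in range(size):
            temp_sum = 0
            for i in range(3):
                for j in range(3):
                    pixel = padded[r + i][c + j]
                    # Weight bit 1 means +1 (add); weight bit 0 means -1 (subtract).
                    if filter_bits[i][j] == 1:
                        temp_sum += pixel
                    else:
                        temp_sum -= pixel
            out_row.append(1 if temp_sum > 0 else -1)  # a sum of 0 becomes -1
        output.append(out_row)
    return output


# The digit "1" image and two first-layer filters from the Verilog listing.
digit_one = [[1 if 1 <= r <= 5 and c == 3 else -1 for c in range(8)]
             for r in range(8)]
first_filter = [[1, 1, 0], [1, 0, 0], [0, 1, 0]]
all_ones_filter = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

feature_maps = [conv3x3_binarized(digit_one, f)
                for f in (first_filter, all_ones_filter)]
for row in feature_maps[0]:
    print(" ".join(f"{v:2d}" for v in row))
```

In hardware the four nested loops unroll into parallel logic; here they simply run one after another, which gives the same values.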

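3.4 Pooling Layers

There are two max pooling layers in the network, one after each convolutional layer. A pooling layer shrinks the output feature map by a factor of two in each dimension. The first pooling layer converts the 8 x 8 feature maps to 4 x 4 maps, and the second converts the 4 x 4 feature maps to 2 x 2 maps. This is done by taking the maximum of each 2 x 2 square of four values and assigning that value to a single entry that replaces all four in the output feature map, which reduces the size. Both layers use for loops to generate hardware that processes all elements of the input feature map at once.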
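
A minimal sketch of the 2 x 2 max pooling step, with a hypothetical function name and a made-up 4 x 4 map:

```python
def max_pool_2x2(feature_map):
    """Replace each non-overlapping 2 x 2 square by its maximum."""
    pooled = []
    for r in range(0, len(feature_map), 2):
        pooled.append([
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, len(feature_map[r]), 2)
        ])
    return pooled


fmap = [
    [-1, -1,  1, -1],
    [-1, -1, -1, -1],
    [ 1,  1, -1, -1],
    [ 1, -1, -1, -1],
]
print(max_pool_2x2(fmap))  # [[-1, 1], [1, -1]]
```

Because the entries are only +1 or -1, the maximum of a square is +1 whenever at least one of its four entries is +1.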

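3.4 Second Convolutional Layer

The second convolutional layer is implemented in much the same way as the first. Two combinational always blocks pad the image and compute the temporary sums of the convolution, which are stored in the output feature maps. Unlike in the first convolution block, the outputs here are not binarized immediately, because the partial sums must be computed first. Convolving each of the 16 feature maps with 32 unique filters creates 32 output feature maps per input feature map. These outputs are then summed across the 16 input maps and binarized to create the 32 final output maps. In the main code body, nested for loops inside a generate block implement all of the convolutions in parallel.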

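3.5 Partial Sums

The partial sum layer takes in the 16*32 4 x 4 feature maps computed by the second convolutional layer and, for each of the 32 filters, adds together the maps produced from the 16 input feature maps. The partial sums are computed with a 32 x 4 x 4 array of accumulated temporary sums. A state machine first initializes every value in the array to 0 in its first state, then in the next state iterates over the 16 rows of the 16 x 32 x 4 x 4 array passed to the layer. Nested for loops compute the 32 x 4 x 4 partial sums in parallel. After 16 clock cycles in this state the partial sums are complete and the state machine moves to the next state. There the partial sums are binarized and assigned to the 32 x 4 x 4 output feature map, which is passed to the second pooling layer.

Figure 5: Partial sums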
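
The three states of the partial sum layer can be mirrored in software as below. This is a sketch with a hypothetical function name and toy input values; the outer loop stands in for the 16 clock cycles, and the inner loops stand in for logic that runs in parallel on the FPGA.

```python
def partial_sum_layer(conv_maps):
    """conv_maps[i][k] is the map produced from input map i by filter k."""
    num_inputs = len(conv_maps)
    num_outputs = len(conv_maps[0])
    size = len(conv_maps[0][0])

    # State 0: clear the accumulators.
    acc = [[[0] * size for _ in range(size)] for _ in range(num_outputs)]

    # State 1: one row of the input array per clock cycle.
    cycles = 0
    for i in range(num_inputs):
        for k in range(num_outputs):
            for r in range(size):
                for c in range(size):
                    acc[k][r][c] += conv_maps[i][k][r][c]
        cycles += 1

    # State 2: binarize; a sum of 0 maps to -1 as elsewhere in the design.
    output = [[[1 if v > 0 else -1 for v in row] for row in fmap]
              for fmap in acc]
    return output, cycles


# Toy data: channel k receives +3 from every input map if k is even, else -3.
conv_maps = [[[[3 if k % 2 == 0 else -3] * 4 for _ in range(4)]
              for k in range(32)]
             for _ in range(16)]
output, cycles = partial_sum_layer(conv_maps)
print(cycles, output[0][0][0], output[1][0][0])  # 16 1 -1
```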


 

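3.6 First Fully Connected Layer

The fully connected layer takes in the 32 x 2 x 2 matrix output by the second pooling layer and flattens it into a one-dimensional array of length 128. This is multiplied by a 128 x 32 matrix to form an array of length 32. This layer is also implemented with a state machine and a temporary sum array of length 32. In the first state, the temporary sums are all initialized to 0. In the next state, a ternary operator checks whether the value in the weight matrix is a 1 or a 0, and the value stored at the corresponding index of the flattened feature map is respectively added to or subtracted from the temporary sum. This is repeated for 128 iterations, the number of rows in the two-dimensional weight array. A for loop performs 32 such operations in parallel. In the state after this, the temporary sums are binarized to form the length-32 output array described in Section 2.2.

Figure 6: First fully connected layer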
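
The fully connected computation can be sketched as follows. The function names are hypothetical, the flattening order (channel-major) is an assumption, and the sizes are shrunk from 128 x 32 to 8 x 3 so the numbers are easy to follow. The same routine with binarization switched off corresponds to the second fully connected layer.

```python
def flatten(feature_maps):
    """Flatten C maps of H x W into one list (channel-major order is assumed)."""
    return [v for fmap in feature_maps for row in fmap for v in row]


def fully_connected(inputs, weight_bits, binarize_output):
    """inputs: length-N list; weight_bits: N rows of M one-bit weights."""
    num_outputs = len(weight_bits[0])
    temp_sums = [0] * num_outputs                # first state: clear the sums
    for row, value in zip(weight_bits, inputs):  # one weight row per iteration
        for k in range(num_outputs):             # parallel in hardware
            if row[k] == 1:
                temp_sums[k] += value
            else:
                temp_sums[k] -= value
    if binarize_output:
        return [1 if s > 0 else -1 for s in temp_sums]
    return temp_sums


# Toy sizes: 2 maps of 2 x 2 (8 inputs) and 3 outputs.
maps = [[[1, -1], [-1, 1]], [[1, 1], [-1, -1]]]
weights = [[1, 0, 1], [0, 0, 1], [1, 1, 1], [0, 1, 1],
           [1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 1]]
x = flatten(maps)
raw = fully_connected(x, weights, binarize_output=False)
binary = fully_connected(x, weights, binarize_output=True)
print(raw, binary)
```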

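3.7 Second Fully Connected Layer

The second fully connected layer has the same structure as the first. It takes the length-32 array from the previous layer and multiplies it with a 32 x 10 weight matrix, using the same state machine structure described above. The output is an array of size 10 with 8-bit entries; the values are not binarized, so that they carry more information about the digit classification.

Figure 7: Second fully connected layer

Figure 8: Model summary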

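4. Software Design

The final output of the binarized neural network is an array of length 10. The value at a given index of this array corresponds to the likelihood that the processed image shows that digit. For example, if the value at index 0 is the smallest in the array, the processed image is least likely to be a 0. Likewise, if the value at index 5 is the largest, the BNN infers that the image is most likely the digit 5. We pass these 10 final outputs from the FPGA to the HPS through an 8-bit-wide PIO port. The HPS then processes the 10 final outputs and converts them to a probability scale to determine the three most likely classifications of the image. The HPS output on the serial console is shown in Figure 9. To compute the probabilities, we first add up all positive final output values to obtain the sum of the positive inference values. The probability of digit n is then the final output value at index n divided by that sum.

Figure 9: HPS serial console output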
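
The probability scaling can be sketched as below. This is not the authors' HPS program (which is written in C); the function name is hypothetical and the sample outputs are invented. Restricting the ranking to positive entries and returning an empty list when no entry is positive are assumptions, since the text does not say how those cases are handled.

```python
def top_three(final_out):
    """Rank digits by final_out[n] / (sum of the positive final outputs)."""
    positive_total = sum(v for v in final_out if v > 0)
    if positive_total == 0:
        return []  # assumed behaviour when no output is positive
    probabilities = [(digit, value / positive_total)
                     for digit, value in enumerate(final_out) if value > 0]
    probabilities.sort(key=lambda pair: pair[1], reverse=True)
    return probabilities[:3]


final_out = [-4, 10, 2, -6, 0, 6, -2, 2, -8, -10]  # invented signed 8-bit values
for digit, probability in top_three(final_out):
    print(f"digit {digit}: {probability:.0%}")
```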

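5. System Design

The figure below shows the Qsys implementation of our design. The PIO ports are connected from the HPS to the lightweight AXI master bus and exported to the FPGA fabric at different memory addresses. pio_switch is the output signal we use to select which of the hard-coded input images (see Section 3.1) becomes the new input to the BNN. Once pio_switch has been set and output to the FPGA, the HPS toggles pio_start from low to high to restart the BNN digit recognition computation. pio_end is set low when the BNN restarts and is set high by the FPGA only once the BNN has finished computing the final output array. By recording the time at reset and the time at which pio_end goes high, we compute the BNN computation time as the difference between the two, which we found to be about 4-5 µs.

After the FPGA finishes the computation, three PIO ports (the clock signal pio_hps_image_clk, the data signal pio_out_data, and the chip-select signal pio_out_cs) are used to transfer the 10 final outputs from the FPGA to the HPS one at a time. The chip-select line is normally held low to reset the index. While chip select is high, the entry of the final output array at the current index is loaded into the data signal on every rising edge of the clock signal, after which the index is incremented. To start receiving the final outputs, the HPS pulls chip select high, toggles the clock signal, and then reads and stores the value on the data port, which is the value of the final output array at index 0. This process is then repeated 9 more times to receive all of the final output array values.

Figure 10: Qsys PIO ports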
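
The readout handshake can be modelled in software to make the ordering of events explicit. The class and function names below are hypothetical; the FPGA-side behaviour follows the clocked block in the Verilog listing that drives pio_out_data, and the HPS side is a simplified stand-in for the C program.

```python
class OutputPort:
    """FPGA side: serves final_out one entry per rising clock edge."""

    def __init__(self, final_out):
        self.final_out = list(final_out)
        self.out_count = 0
        self.data = 0  # models pio_out_data

    def rising_edge(self, chip_select):
        """Models one rising edge of pio_hps_image_clk."""
        if chip_select:
            if self.out_count < len(self.final_out):
                self.data = self.final_out[self.out_count]
                self.out_count += 1
        else:
            self.out_count = 0  # chip select low resets the index


def hps_read_all(port, count=10):
    """HPS side: reset the index, then clock out `count` values."""
    port.rising_edge(chip_select=False)
    values = []
    for _ in range(count):
        port.rising_edge(chip_select=True)  # toggle the clock with cs high
        values.append(port.data)            # then read the data port
    return values


port = OutputPort([-4, 10, 2, -6, 0, 6, -2, 2, -8, -10])
print(hps_read_all(port))
```

Calling hps_read_all a second time returns the same ten values, because the first edge with chip select low resets the index.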


 

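6. Testing

We tested the initial iterations of our design in Modelsim and used unit tests to make sure each module worked as expected. We instantiated each module, passed in known input values, and checked the simulated results against the expected outputs. Once this was done for every layer, we instantiated all of the layers and connected them together. We then set all weight values and the input image to known values and monitored the data flow through the whole network.

Figure 11: Modelsim output

Once the design simulated correctly, we moved it onto the FPGA and used the LEDs and PIO ports to inspect each layer's output, to make sure the design behaved in hardware as it did in simulation. Because Modelsim only simulates parallel execution, we had to repeat all tests on the FPGA to verify that our layers actually worked as expected. Some of the bugs we found were parallel implementations of inherently sequential operations, such as accumulated sums, that produced inaccurate results on the FPGA. These simulated correctly in Modelsim because execution in software is actually sequential, which is not the case once a real circuit is generated.

When debugging on the FPGA, each layer's implementation was tested by mapping its output to the LEDs or by sending it to the HPS over PIO ports and printing it on the serial console. The hardware-computed values were compared against the Python software model to verify that each layer behaved as expected. Although the most efficient way to debug the model was to pass output values through PIO ports and print them on the serial console, we eventually ran out of adaptive logic modules (ALMs) on the FPGA. At that point we had to switch to mapping outputs to the on-board LEDs to verify that the computed values were accurate.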

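7. Hardware Bugs and Issues

Although we initially hoped to implement the design fully in parallel, some elements of the system made this infeasible. Some components of the network, such as the partial sum module, need multiple cycles to work correctly. For this module, 16 additions must be performed one after another to compute the accumulated sum. These 16 operations cannot be performed in parallel and therefore take several clock cycles. The other problems we ran into were that the board repeatedly ran out of ALMs when we connected PIO ports to pass data between the FPGA and the HPS and when we mapped FPGA outputs to the on-board LEDs. Adding a port or an LED mapping sometimes caused a large jump in the ALM resources needed to implement the design, so that the design no longer fit on the board. We solved these problems by finding workarounds that use fewer ALMs; for example, instead of passing the weights from the HPS, we hard-coded them in the Verilog file. Since the weights never change during classification, this has no effect on functionality.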

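8. Results

The figures below show the LCD display from our final demo. The top three computed probabilities are displayed, along with the 8 x 8 input image passed to the network. The complete binarized neural network accurately performs the computations needed to classify an image. The output of every layer was compared against the output of the corresponding implementation in software to verify that the expected computations were being performed. The expected accuracy of the software model is 33%; since the hardware model mimics the computations of the software model, the expected accuracy of the hardware classifier can also be assumed to be 33%.

Figure 12: Display output: digit 1

Figure 13: Display output: digit 5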

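The computation speed of the FPGA implementation was measured by passing a done signal, which indicates that the computation has finished, back to the HPS and measuring the time between the start signal sent from the HPS to the FPGA and the done signal sent from the FPGA back to the HPS. The FPGA BNN computation time was found to be about 0.004 ms, or 4 µs. By contrast, the same BNN implemented in Python and run on a PC takes about 44 µs per input. This figure comes from timing the Tensorflow eval function on y_conv, y_conv.eval(feed_dict=test_dict), where y_conv is the last tensor layer of the BNN. Within a single batch, we measured about 64.4 ms to process 1 input and about 72.4 ms to process 180 inputs. Because the CNN processing time is the sum of the time to load the weights and the time to compute, we roughly estimate the computation time from the difference: (72.4 ms - 64.4 ms) / 180 inputs = 44 µs per input. Note that we ran the Python code on a quad-core PC. Timing on the PC is not stable, and various factors cause the measurements to vary.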
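
The per-input estimate above, and the ratio it implies, can be reproduced with a few lines of arithmetic. The variable names are hypothetical, and the ratio is only as reliable as the two measurements it is derived from, which the text notes are noisy on the PC side.

```python
# Measurements reported in the text.
pc_one_input_ms = 64.4     # one input in a batch
pc_180_inputs_ms = 72.4    # 180 inputs in a batch
fpga_us = 4.0              # FPGA BNN computation time

# Subtracting the two PC timings removes the fixed (weight-loading) cost.
pc_per_input_us = (pc_180_inputs_ms - pc_one_input_ms) / 180 * 1000.0
ratio = pc_per_input_us / fpga_us

print(f"PC per input: {pc_per_input_us:.1f} us")
print(f"PC / FPGA ratio: {ratio:.1f}x")
```

Taken at face value, these figures put the FPGA at roughly an order of magnitude faster per input than the Python model on the PC.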
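
9. Resource Usage

The table below summarizes some of the resources used by the final implementation of our design. As it shows, the BNN uses only a small fraction of the total memory available on the FPGA, and the use of ternary operators minimizes the need for multipliers/DSP blocks. The most heavily used resource is the ALM, but when the PIO ports (used to transfer output data to the HPS, to communicate the start and end signals to the design, and so on) are excluded, more than half of the ALMs on the board remain available. These results confirm the low resource requirements of a BNN.

Figure 14: Resource usage summary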


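10. Usability

The current design is not very flexible, because input images must be hard-coded in the Verilog code before they can be processed. Since the weights are also hard-coded, any change to them likewise requires modifying and recompiling the code. The design could be made more configurable by transferring the weights from the HPS to the FPGA through PIO ports or SRAM; however, in our current implementation, introducing either of these causes the design not to fit on the FPGA. While digit classification is not in itself a very broadly applicable task, image classification has many uses today. The speedup of a hardware classifier makes it better suited to real-time classification tasks where time is the main constraint.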

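11. Conclusion

For the most part, our implementation met our expectations. We had initially hoped for higher accuracy; it was not until late in development that we noticed a bug in the Python implementation. Correcting this bug was essential to making the Python design truly binary, but it also dropped the accuracy by about 0.4 (from about 0.8 to about 0.4). The network hardware could be modified to accommodate the changes needed to raise the accuracy, but implementing them would have taken us past our deadline. We therefore chose to go ahead with the lower-accuracy model.

One feature we had hoped to include in our model is a camera interface that would capture images live, pool them, and feed them to the BNN. Although we had the Verilog and HPS code needed for such a system, incorporating it into the design pushed the total number of ALMs required beyond what the board provides: before these additions our design used about 28,000 ALMs, and after adding them the count jumped to around 38,000.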

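12. Intellectual Property Considerations

The network we implemented is based on a framework implemented by PhD student Ritchie Zhao. The provided code is also based in part on a class assignment from an advanced course at Cornell University. While there are no patent or trademark issues, there are also no patent opportunities, since the software design our hardware is based on is not our own. Our FPGA code was built using some of the resources available on the ECE 5760 course web page. For example, the code we used to interface with the VGA display comes from example programs on the course website. Apart from consulting online resources on relevant syntax and operations, we did not use any other code from the public domain. To our knowledge, our design raises no legal considerations.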

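13. Improvements and Future Work

If we were to redo this project, the things we would change might include modifying the network design to support binary weights with non-binary output feature maps from every layer, since this could improve accuracy. However, while our current implementation uses very few registers, it uses a large fraction of the available ALMs, so such an implementation may not be feasible. Another potential change is the size of the network. At present, the first convolutional layer has 16 output feature maps, and the second convolutional layer and the first fully connected layer each have 32 outputs. These numbers could be reduced to 8, 16, and 16 respectively. Although this might reduce accuracy, the smaller size would let the design fit on the board without consuming a large share of the available resources.

Further improvements to the model could include extending classification to images from other datasets, such as CIFAR10, rather than digits only. Neural networks for such images are more complex than the one we implemented and generally need more memory and compute. Since we are already pushing the limits of the FPGA's compute resources with this network, we would probably need a larger board to implement anything more complex.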

14. Verilog Code and C Code


//verilog
// synthesis VERILOG_INPUT_VERSION SYSTEMVERILOG_2005
module DE1_SoC_Computer (
	
	// FPGA Pins
	

	// Clock pins
	CLOCK_50,
	CLOCK2_50,
	CLOCK3_50,
	CLOCK4_50,

	// ADC
	ADC_CS_N,
	ADC_DIN,
	ADC_DOUT,
	ADC_SCLK,

	// Audio
	AUD_ADCDAT,
	AUD_ADCLRCK,
	AUD_BCLK,
	AUD_DACDAT,
	AUD_DACLRCK,
	AUD_XCK,

	// SDRAM
	DRAM_ADDR,
	DRAM_BA,
	DRAM_CAS_N,
	DRAM_CKE,
	DRAM_CLK,
	DRAM_CS_N,
	DRAM_DQ,
	DRAM_LDQM,
	DRAM_RAS_N,
	DRAM_UDQM,
	DRAM_WE_N,

	// I2C Bus for Configuration of the Audio and Video-In Chips
	FPGA_I2C_SCLK,
	FPGA_I2C_SDAT,

	// 40-Pin Headers
	GPIO_0,
	GPIO_1,
	
	// Seven Segment Displays
	HEX0,
	HEX1,
	HEX2,
	HEX3,
	HEX4,
	HEX5,

	// IR
	IRDA_RXD,
	IRDA_TXD,

	// Pushbuttons
	KEY,

	// LEDs
	LEDR,

	// PS2 Ports
	PS2_CLK,
	PS2_DAT,
	
	PS2_CLK2,
	PS2_DAT2,

	// Slider Switches
	SW,

	// Video-In
	TD_CLK27,
	TD_DATA,
	TD_HS,
	TD_RESET_N,
	TD_VS,

	// VGA
	VGA_B,
	VGA_BLANK_N,
	VGA_CLK,
	VGA_G,
	VGA_HS,
	VGA_R,
	VGA_SYNC_N,
	VGA_VS,

	
	// HPS Pins
	
	
	// DDR3 SDRAM
	HPS_DDR3_ADDR,
	HPS_DDR3_BA,
	HPS_DDR3_CAS_N,
	HPS_DDR3_CKE,
	HPS_DDR3_CK_N,
	HPS_DDR3_CK_P,
	HPS_DDR3_CS_N,
	HPS_DDR3_DM,
	HPS_DDR3_DQ,
	HPS_DDR3_DQS_N,
	HPS_DDR3_DQS_P,
	HPS_DDR3_ODT,
	HPS_DDR3_RAS_N,
	HPS_DDR3_RESET_N,
	HPS_DDR3_RZQ,
	HPS_DDR3_WE_N,

	// Ethernet
	HPS_ENET_GTX_CLK,
	HPS_ENET_INT_N,
	HPS_ENET_MDC,
	HPS_ENET_MDIO,
	HPS_ENET_RX_CLK,
	HPS_ENET_RX_DATA,
	HPS_ENET_RX_DV,
	HPS_ENET_TX_DATA,
	HPS_ENET_TX_EN,

	// Flash
	HPS_FLASH_DATA,
	HPS_FLASH_DCLK,
	HPS_FLASH_NCSO,

	// Accelerometer
	HPS_GSENSOR_INT,
		
	// General Purpose I/O
	HPS_GPIO,
		
	// I2C
	HPS_I2C_CONTROL,
	HPS_I2C1_SCLK,
	HPS_I2C1_SDAT,
	HPS_I2C2_SCLK,
	HPS_I2C2_SDAT,

	// Pushbutton
	HPS_KEY,

	// LED
	HPS_LED,
		
	// SD Card
	HPS_SD_CLK,
	HPS_SD_CMD,
	HPS_SD_DATA,

	// SPI
	HPS_SPIM_CLK,
	HPS_SPIM_MISO,
	HPS_SPIM_MOSI,
	HPS_SPIM_SS,

	// UART
	HPS_UART_RX,
	HPS_UART_TX,

	// USB
	HPS_CONV_USB_N,
	HPS_USB_CLKOUT,
	HPS_USB_DATA,
	HPS_USB_DIR,
	HPS_USB_NXT,
	HPS_USB_STP
);

//=======================================================
//  PARAMETER declarations
//=======================================================


//=======================================================
//  PORT declarations
//=======================================================


// FPGA Pins


// Clock pins
input						CLOCK_50;
input						CLOCK2_50;
input						CLOCK3_50;
input						CLOCK4_50;

// ADC
inout						ADC_CS_N;
output					ADC_DIN;
input						ADC_DOUT;
output					ADC_SCLK;

// Audio
input						AUD_ADCDAT;
inout						AUD_ADCLRCK;
inout						AUD_BCLK;
output					AUD_DACDAT;
inout						AUD_DACLRCK;
output					AUD_XCK;

// SDRAM
output 		[12: 0]	DRAM_ADDR;
output		[ 1: 0]	DRAM_BA;
output					DRAM_CAS_N;
output					DRAM_CKE;
output					DRAM_CLK;
output					DRAM_CS_N;
inout			[15: 0]	DRAM_DQ;
output					DRAM_LDQM;
output					DRAM_RAS_N;
output					DRAM_UDQM;
output					DRAM_WE_N;

// I2C Bus for Configuration of the Audio and Video-In Chips
output					FPGA_I2C_SCLK;
inout						FPGA_I2C_SDAT;

// 40-pin headers
inout			[35: 0]	GPIO_0;
inout			[35: 0]	GPIO_1;

// Seven Segment Displays
output		[ 6: 0]	HEX0;
output		[ 6: 0]	HEX1;
output		[ 6: 0]	HEX2;
output		[ 6: 0]	HEX3;
output		[ 6: 0]	HEX4;
output		[ 6: 0]	HEX5;

// IR
input						IRDA_RXD;
output					IRDA_TXD;

// Pushbuttons
input			[ 3: 0]	KEY;

// LEDs
output		[ 9: 0]	LEDR;

// PS2 Ports
inout						PS2_CLK;
inout						PS2_DAT;

inout						PS2_CLK2;
inout						PS2_DAT2;

// Slider Switches
input			[ 9: 0]	SW;

// Video-In
input						TD_CLK27;
input			[ 7: 0]	TD_DATA;
input						TD_HS;
output					TD_RESET_N;
input						TD_VS;

// VGA
output		[ 7: 0]	VGA_B;
output					VGA_BLANK_N;
output					VGA_CLK;
output		[ 7: 0]	VGA_G;
output					VGA_HS;
output		[ 7: 0]	VGA_R;
output					VGA_SYNC_N;
output					VGA_VS;




// HPS Pins

	
// DDR3 SDRAM
output		[14: 0]	HPS_DDR3_ADDR;
output		[ 2: 0]  HPS_DDR3_BA;
output					HPS_DDR3_CAS_N;
output					HPS_DDR3_CKE;
output					HPS_DDR3_CK_N;
output					HPS_DDR3_CK_P;
output					HPS_DDR3_CS_N;
output		[ 3: 0]	HPS_DDR3_DM;
inout			[31: 0]	HPS_DDR3_DQ;
inout			[ 3: 0]	HPS_DDR3_DQS_N;
inout			[ 3: 0]	HPS_DDR3_DQS_P;
output					HPS_DDR3_ODT;
output					HPS_DDR3_RAS_N;
output					HPS_DDR3_RESET_N;
input						HPS_DDR3_RZQ;
output					HPS_DDR3_WE_N;

// Ethernet
output					HPS_ENET_GTX_CLK;
inout						HPS_ENET_INT_N;
output					HPS_ENET_MDC;
inout						HPS_ENET_MDIO;
input						HPS_ENET_RX_CLK;
input			[ 3: 0]	HPS_ENET_RX_DATA;
input						HPS_ENET_RX_DV;
output		[ 3: 0]	HPS_ENET_TX_DATA;
output					HPS_ENET_TX_EN;

// Flash
inout			[ 3: 0]	HPS_FLASH_DATA;
output					HPS_FLASH_DCLK;
output					HPS_FLASH_NCSO;

// Accelerometer
inout						HPS_GSENSOR_INT;

// General Purpose I/O
inout			[ 1: 0]	HPS_GPIO;

// I2C
inout						HPS_I2C_CONTROL;
inout						HPS_I2C1_SCLK;
inout						HPS_I2C1_SDAT;
inout						HPS_I2C2_SCLK;
inout						HPS_I2C2_SDAT;

// Pushbutton
inout						HPS_KEY;

// LED
inout						HPS_LED;

// SD Card
output					HPS_SD_CLK;
inout						HPS_SD_CMD;
inout			[ 3: 0]	HPS_SD_DATA;

// SPI
output					HPS_SPIM_CLK;
input						HPS_SPIM_MISO;
output					HPS_SPIM_MOSI;
inout						HPS_SPIM_SS;

// UART
input						HPS_UART_RX;
output					HPS_UART_TX;

// USB
inout						HPS_CONV_USB_N;
input						HPS_USB_CLKOUT;
inout			[ 7: 0]	HPS_USB_DATA;
input						HPS_USB_DIR;
input						HPS_USB_NXT;
output					HPS_USB_STP;

//=======================================================
//  REG/WIRE declarations
//=======================================================

//wire			[15: 0]	hex3_hex0;
//wire			[15: 0]	hex5_hex4;

//assign HEX0 = ~hex3_hex0[ 6: 0]; // hex3_hex0[ 6: 0]; 
//assign HEX1 = ~hex3_hex0[14: 8];
//assign HEX2 = ~hex3_hex0[22:16];
//assign HEX3 = ~hex3_hex0[30:24];
//assign HEX4 = 7'b1111111;
//assign HEX5 = 7'b1111111;
//assign HEX0 = test[6:0]; // hex3_hex0[ 6: 0]; 


//HexDigit Digit0(HEX0, final_out[1][7:4]);//hex3_hex0[3:0]);
//HexDigit Digit1(HEX1, final_out[1][3:0]);
//HexDigit Digit2(HEX2, hex3_hex0[11:8]);
//HexDigit Digit3(HEX3, hex3_hex0[15:12]);

// MAY need to cycle this switch on power-up to get video
assign TD_RESET_N = SW[1];

// get some signals exposed
// connect bus master signals to i/o for probes
//assign GPIO_0[0] = TD_HS ;
//assign GPIO_0[1] = TD_VS ;
//assign GPIO_0[2] = TD_DATA[6] ;
//assign GPIO_0[3] = TD_CLK27 ;
//assign GPIO_0[4] = TD_RESET_N ;


//=======================================================
// Bus controller for AVALON bus-master
//=======================================================
wire [31:0] vga_bus_addr, video_in_bus_addr ; // Avalon addresses
reg  [31:0] bus_addr ;
wire [31:0] vga_out_base_address = 32'h0000_0000 ;  // Avalon address
wire [31:0] video_in_base_address = 32'h0800_0000 ;  // Avalon address
reg [3:0] bus_byte_enable ; // four bit byte read/write mask
reg bus_read  ;       // high when requesting data
reg bus_write ;      //  high when writing data
reg [31:0] bus_write_data ; //  data to send to Avalon bus
wire bus_ack  ;       //  Avalon bus raises this when done
wire [31:0] bus_read_data ; // data from Avalon bus
reg [31:0] timer ;
reg [3:0] state ;
reg last_vs, wait_one;
reg [19:0] vs_count ;
reg last_hs, wait_one_hs ;
reg [19:0] hs_count ;

// pixel address is
logic [9:0] vga_x_cood, vga_y_cood, video_in_x_cood, video_in_y_cood ;
reg [7:0] current_pixel_color1, current_pixel_color2 ;
// compute address
// 640 x 480, ceil(log2 640) = 10
assign vga_bus_addr = vga_out_base_address + {22'b0,video_in_x_cood + vga_x_cood} +
 ({22'b0,video_in_y_cood + vga_y_cood}<<10) ;

 //video in: 320 by 240, x:0-319, y:0-239
 // 320 x 240, ceil(log2 320) = 9
  //video in: 224 by 224, x:0-223, y:0-223
 // 224 x 224, ceil(log2 224) = 8
assign video_in_bus_addr = video_in_base_address + {22'b0,video_in_x_cood} + 
({22'b0,video_in_y_cood}<<8) ;	 

logic [7:0] greyscale8;
logic [1:0] greyscale;
//765 432 10
assign greyscale = (bus_read_data[6:5]>>1) + (bus_read_data[3:2]>>1);
assign greyscale8 = {{2{1'b0, greyscale}}, greyscale};

logic [9:0] vga_x_cood_2, vga_y_cood_2;
logic [31:0] vga_bus_addr_2;
assign vga_bus_addr_2 = vga_out_base_address + {22'b0,video_in_x_cood + vga_x_cood_2} +
 ({22'b0,video_in_y_cood + vga_y_cood_2}<<10) ;
 
//logic [1:0] image_array [320][240];
always @(posedge CLOCK2_50) begin //CLOCK_50

	// reset state machine and read/write controls
	if (~KEY[0]) begin
		state <= 0 ;
		bus_read <= 0 ; // set to one for a read operation from the bus
		bus_write <= 0 ; // set to one for a write operation to the bus
		// base address of upper-left corner of the screen
		vga_x_cood <= 10'd0 ;
		vga_y_cood <= 10'd0 ;
		
		vga_x_cood_2 <= 10'd230;
		vga_y_cood_2 <= 0 ;
		
		video_in_x_cood <= 0 ;
		video_in_y_cood <= 0 ;
		
	   bus_byte_enable <= 4'b0001;

		timer <= 0;
	end
	else begin
		timer <= timer + 1;
	end
	
	// write to the bus-master
	// and put in a small delay to avoid bus hogging
	// timer delay can be set to 2**n-1, so 3, 7, 15, 31
	// bigger numbers mean slower frame update to VGA
	if (state==0 && SW[0] && (timer & 3)==0 ) begin //
		state <= 1;	
		
		// read all the pixels in the video input
		video_in_x_cood <= video_in_x_cood + 10'd1 ;
		if (video_in_x_cood > 10'd223) begin
			video_in_x_cood <= 0 ;
			video_in_y_cood <= video_in_y_cood + 10'd1 ;
			if (video_in_y_cood > 10'd223) begin
				video_in_y_cood <= 10'd0 ;
			end
		end
		// one byte data
		bus_byte_enable <= 4'b0001;
		// read first pixel
		bus_addr <= video_in_bus_addr ;
		// signal the bus that a read is requested
		bus_read <= 1'b1 ;	
	end
	
	// finish the  read
	// You MUST do this check
	if (state==1 && bus_ack==1) begin
		state <= 2 ; //state <= 2 ;
		bus_read <= 1'b0;
		if (!SW[2]) begin
			current_pixel_color1 <= bus_read_data ;
		end
		else begin
			current_pixel_color1 <= 2;
		end
	end
	
	// write a pixel to VGA memory
	if (state==2) begin
		state <= 9 ;
		bus_write <= 1'b1;
		bus_addr <= vga_bus_addr ;
		bus_write_data <= current_pixel_color1  ;
		//image_array[video_in_x_cood][video_in_y_cood] <= greyscale; 
		bus_byte_enable <= 4'b0001;
	end
	
	// and finish write
	if (state==9 && bus_ack==1) begin
		state <= 0 ;
		bus_write <= 1'b0;
	end
	
end

//===============================

logic pio_start;
logic pio_end;
logic [2:0] pio_switch;
 //different input images corresponding to different numbers 
always @ (*) begin 
		if (pio_switch==3'd1) begin
			//LEDR[7:0] = final_out[3];
				//1. idx 171 is 0
				input_image= 
				'{
				'{-1,-1,-1,-1,-1,-1,-1,-1},
				'{-1,-1,-1,-1, 1, 1,-1,-1},
				'{-1,-1, 1, 1, 1, 1,-1,-1},
				'{-1, 1, 1,-1,-1, 1,-1,-1},
				'{-1, 1, 1, 1, 1,-1,-1,-1},
				'{-1,-1, 1, 1,-1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1}
				};
				
		end 
		else if (pio_switch==3'd2) begin 
			//LEDR[7:0] = final_out[3];
				//2. idx 1 is a 1
				input_image= 
				'{
				'{-1,-1,-1,-1,-1,-1,-1,-1},
				'{-1,-1,-1, 1,-1,-1,-1,-1},
				'{-1,-1,-1, 1,-1,-1,-1,-1},
				'{-1,-1,-1, 1,-1,-1,-1,-1},
				'{-1,-1,-1, 1,-1,-1,-1,-1},
				'{-1,-1,-1, 1,-1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1}
				};
		end 
		else if (pio_switch==3'd3) begin 
			//LEDR[7:0] = final_out[3];
				//3. idx 39 is a 2
				input_image= 
				'{
				'{-1,-1,-1,-1,-1,-1,-1,-1},
				'{-1,-1, 1, 1, 1,-1,-1,-1},
				'{-1,-1,-1,-1, 1,-1,-1,-1},
				'{-1,-1,-1,-1, 1,-1,-1,-1},
				'{-1,-1,-1, 1, 1,-1,-1,-1},
				'{-1, 1, 1, 1, 1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1}
				};
		end
		else if (pio_switch==3'd4) begin 
			//LEDR[7:0] = final_out[3];
				//4. idx 85 is 4
				input_image= 
				'{
				'{-1,-1,-1,-1,-1,-1,-1,-1},
				'{-1,-1, 1,-1,-1,-1,-1,-1},
				'{-1, 1, 1,-1, 1, 1,-1,-1},
				'{-1,-1, 1, 1, 1,-1,-1,-1},
				'{-1,-1,-1, 1, 1,-1,-1,-1},
				'{-1,-1,-1, 1, 1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1}
				};
		end 
		else if (pio_switch==3'd5) begin
			//LEDR[7:0] = final_out[3];
			//5. idx 119 is 5
				input_image= 
				'{
				'{-1,-1,-1,-1,-1,-1,-1,-1},
				'{-1,-1, 1, 1,-1,-1,-1,-1},
				'{-1,-1, 1,-1,-1,-1,-1,-1},
				'{-1,-1, 1, 1, 1,-1,-1,-1},
				'{-1,-1,-1,-1,-1, 1,-1,-1},
				'{-1,-1, 1, 1, 1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1}
				};
		end 
		else if (pio_switch==3'd6) begin
			//LEDR[7:0] = final_out[3];
			//6. idx 137 is 7
				input_image= 
				'{
				'{-1,-1,-1,-1,-1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1},
				'{-1,-1, 1, 1, 1,-1,-1,-1},
				'{-1,-1,-1,-1, 1,-1,-1,-1},
				'{-1,-1,-1,-1, 1,-1,-1,-1},
				'{-1,-1,-1,-1, 1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1}
				};
			
		end 
		else begin
			//LEDR[7:0] = final_out[3];
				input_image= 
				'{
				'{-1,-1,-1,-1,-1,-1,1,-1},
				'{-1,-1,-1,-1,1,1,-1,-1},
				'{-1,-1,-1,-1,1,-1,-1,-1},
				'{-1,-1,-1,1,-1,-1,-1,-1},
				'{-1,-1,1,1,-1,-1,-1,-1},
				'{-1,-1,1,-1,-1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1}
				};
		end 
/*
		if (SW[9]) begin
			LEDR[7:0] = final_out[3];
				//1. idx 171 is 0
				input_image= 
				'{
				'{-1,-1,-1,-1,-1,-1,-1,-1},
				'{-1,-1,-1,-1, 1, 1,-1,-1},
				'{-1,-1, 1, 1, 1, 1,-1,-1},
				'{-1, 1, 1,-1,-1, 1,-1,-1},
				'{-1, 1, 1, 1, 1,-1,-1,-1},
				'{-1,-1, 1, 1,-1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1}
				};
				
		end 
		else if (SW[8]) begin 
			LEDR[7:0] = final_out[3];
				//2. idx 1 is a 1
				input_image= 
				'{
				'{-1,-1,-1,-1,-1,-1,-1,-1},
				'{-1,-1,-1, 1,-1,-1,-1,-1},
				'{-1,-1,-1, 1,-1,-1,-1,-1},
				'{-1,-1,-1, 1,-1,-1,-1,-1},
				'{-1,-1,-1, 1,-1,-1,-1,-1},
				'{-1,-1,-1, 1,-1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1}
				};
		end 
		else if (SW[7]) begin 
			LEDR[7:0] = final_out[3];
				//3. idx 39 is a 2
				input_image= 
				'{
				'{-1,-1,-1,-1,-1,-1,-1,-1},
				'{-1,-1, 1, 1, 1,-1,-1,-1},
				'{-1,-1,-1,-1, 1,-1,-1,-1},
				'{-1,-1,-1,-1, 1,-1,-1,-1},
				'{-1,-1,-1, 1, 1,-1,-1,-1},
				'{-1, 1, 1, 1, 1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1}
				};
		end
		else if (SW[6]) begin 
			LEDR[7:0] = final_out[3];
				//4. idx 85 is 4
				input_image= 
				'{
				'{-1,-1,-1,-1,-1,-1,-1,-1},
				'{-1,-1, 1,-1,-1,-1,-1,-1},
				'{-1, 1, 1,-1, 1, 1,-1,-1},
				'{-1,-1, 1, 1, 1,-1,-1,-1},
				'{-1,-1,-1, 1, 1,-1,-1,-1},
				'{-1,-1,-1, 1, 1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1}
				};
		end 
		else if (SW[5]) begin
			LEDR[7:0] = final_out[3];
			//5. idx 119 is 5
				input_image= 
				'{
				'{-1,-1,-1,-1,-1,-1,-1,-1},
				'{-1,-1, 1, 1,-1,-1,-1,-1},
				'{-1,-1, 1,-1,-1,-1,-1,-1},
				'{-1,-1, 1, 1, 1,-1,-1,-1},
				'{-1,-1,-1,-1,-1, 1,-1,-1},
				'{-1,-1, 1, 1, 1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1}
				};
		end 
		else if (SW[4]) begin
			LEDR[7:0] = final_out[3];
			//6. idx 137 is 7
				input_image= 
				'{
				'{-1,-1,-1,-1,-1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1},
				'{-1,-1, 1, 1, 1,-1,-1,-1},
				'{-1,-1,-1,-1, 1,-1,-1,-1},
				'{-1,-1,-1,-1, 1,-1,-1,-1},
				'{-1,-1,-1,-1, 1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1}
				};
			
		end 
		else if (SW[3]) begin
			LEDR[7:0] = final_out[3];
				input_image= 
				'{
				'{-1,-1,-1,-1,-1,-1,1,-1},
				'{-1,-1,-1,-1,1,1,-1,-1},
				'{-1,-1,-1,-1,1,-1,-1,-1},
				'{-1,-1,-1,1,-1,-1,-1,-1},
				'{-1,-1,1,1,-1,-1,-1,-1},
				'{-1,-1,1,-1,-1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1}
				};
		end 
		else if (SW[2]) begin
			LEDR[7:0] = final_out[2];
				input_image= 
				'{
				'{-1,-1,-1,-1,-1,-1,-1,1},
				'{-1,-1,-1,-1,1,1,-1,-1},
				'{-1,-1,-1,-1,1,-1,-1,-1},
				'{-1,-1,-1,1,-1,-1,-1,-1},
				'{-1,-1,1,1,-1,-1,-1,-1},
				'{-1,-1,1,-1,-1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1}
				};
		end 
		else if (SW[1]) begin
			LEDR[7:0] = final_out[1];
				input_image= 
				'{
				'{-1,-1,-1,-1,-1,-1,-1,-1},
				'{-1,-1,-1,-1,1,1,-1,1},
				'{-1,-1,-1,-1,1,-1,-1,-1},
				'{-1,-1,-1,1,-1,-1,-1,-1},
				'{-1,-1,1,1,-1,-1,-1,-1},
				'{-1,-1,1,-1,-1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1}
				};
		end 
		else begin
			LEDR[7:0] = 8'b0;	//final_out[0];
				input_image= 
				'{
				'{-1,-1,-1,-1,-1,-1,-1,-1},
				'{-1,-1,-1,-1,1,1,-1,-1},
				'{-1,-1,-1,-1,1,-1,-1,-1},
				'{-1,-1,-1,1,-1,-1,-1,-1},
				'{-1,-1,1,1,-1,-1,-1,-1},
				'{-1,-1,1,-1,-1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1},
				'{-1,-1,-1,-1,-1,-1,-1,-1}
				};
		end
	*/
end
         

//===========================================================================================================
//					read outputs to HPS
//===========================================================================================================
logic signed [7:0] pio_out_data;
logic pio_out_cs;
integer out_count;

always @ (posedge pio_hps_image_clk) begin 
	if (pio_out_cs) begin
		if (out_count<10) begin 
			pio_out_data <= final_out[out_count];
			out_count <= out_count + 1; 
		end 
	end 
	else out_count <= 0; 
end

//===========================================================================================================
//					weight initialization
//===========================================================================================================

//first conv filters - 16 3x3s : logic filter [16][3][3];
always @ (*) begin 


filter = 
'{  '{ '{1,1,0},'{1,0,0},'{0,1,0} },
    '{ '{1,0,1},'{0,0,0},'{1,0,0} },
    '{ '{0,1,1},'{1,1,1},'{1,0,1} },
    '{ '{1,0,1},'{1,1,1},'{1,1,0} },
    '{ '{0,0,0},'{1,0,1},'{1,1,1} },
    '{ '{1,0,0},'{1,1,1},'{1,0,1} },
    '{ '{0,1,1},'{1,1,0},'{0,1,1} },
    '{ '{0,0,1},'{1,0,1},'{1,1,0} },
    '{ '{0,1,1},'{1,1,1},'{0,0,0} },
    '{ '{0,1,0},'{0,1,1},'{0,0,0} },
    '{ '{1,1,1},'{0,1,1},'{1,1,1} },
    '{ '{1,1,0},'{0,1,1},'{1,0,1} },
    '{ '{0,0,1},'{0,1,0},'{0,1,0} },
    '{ '{0,1,1},'{1,0,0},'{0,0,0} },
    '{ '{0,0,0},'{1,0,0},'{0,0,0} },
    '{ '{1,1,1},'{1,1,1},'{1,1,1} }  };

end 



//second conv filters - 16*32 3x3s in 16 row 32 column format: logic filters_conv2 [16][32][3][3];
always @ (*) begin 

filters_conv2 = 

'{
'{  '{ '{1,0,0},'{0,1,1},'{0,0,1} },
    '{ '{1,0,1},'{1,1,1},'{1,1,1} },
    '{ '{1,1,1},'{1,1,0},'{0,0,0} },
    '{ '{1,1,0},'{1,0,1},'{1,0,0} },
    '{ '{0,0,1},'{1,1,0},'{1,1,0} },
    '{ '{1,1,0},'{0,0,1},'{0,0,1} },
    '{ '{1,1,0},'{1,1,1},'{1,1,1} },
    '{ '{0,1,1},'{1,0,1},'{0,0,0} },
    '{ '{0,0,1},'{1,0,1},'{1,0,0} },
    '{ '{0,0,1},'{1,0,1},'{1,1,0} },
    '{ '{1,0,1},'{1,1,1},'{1,1,1} },
    '{ '{0,0,1},'{1,0,1},'{0,1,0} },
    '{ '{0,0,1},'{0,0,0},'{0,0,0} },
    '{ '{1,0,0},'{1,0,1},'{1,0,0} },
    '{ '{1,1,0},'{0,1,0},'{1,1,0} },
    '{ '{1,1,0},'{0,1,0},'{0,1,0} },
    '{ '{0,1,1},'{0,0,0},'{1,0,0} },
    '{ '{1,1,0},'{1,1,1},'{0,1,0} },
    '{ '{0,0,1},'{1,0,0},'{1,1,1} },
    '{ '{1,1,0},'{0,1,0},'{0,1,0} },
    '{ '{1,1,0},'{0,1,0},'{1,1,1} },
    '{ '{0,1,0},'{1,1,1},'{0,0,0} },
    '{ '{1,0,1},'{0,1,0},'{1,0,1} },
    '{ '{1,0,1},'{0,1,1},'{0,0,1} },
    '{ '{0,0,1},'{1,0,0},'{1,0,0} },
    '{ '{0,1,0},'{0,1,1},'{0,0,1} },
    '{ '{0,1,1},'{0,1,1},'{0,1,1} },
    '{ '{0,0,1},'{1,0,1},'{1,1,1} },
    '{ '{0,0,0},'{1,1,0},'{1,0,0} },
    '{ '{0,0,0},'{0,0,1},'{0,1,1} },
    '{ '{0,1,1},'{0,0,0},'{1,1,1} },
    '{ '{1,0,1},'{1,1,0},'{1,0,1} }  },
'{  '{ '{0,0,0},'{0,1,1},'{0,0,0} },
    '{ '{1,0,1},'{1,0,1},'{0,1,1} },
    '{ '{1,0,1},'{0,1,1},'{0,1,1} },
    '{ '{0,1,1},'{1,0,1},'{1,0,1} },
    '{ '{0,1,1},'{1,1,0},'{0,0,0} },
    '{ '{1,1,0},'{1,0,1},'{1,1,1} },
    '{ '{0,0,1},'{1,0,1},'{0,1,1} },
    '{ '{1,0,0},'{1,1,0},'{1,1,0} },
    '{ '{1,0,0},'{0,0,1},'{1,1,1} },
    '{ '{0,0,0},'{1,1,0},'{0,0,1} },
    '{ '{1,0,1},'{1,0,0},'{1,1,1} },
    '{ '{0,1,1},'{0,0,1},'{1,1,0} },
    '{ '{1,1,0},'{1,1,1},'{1,0,0} },
    '{ '{1,0,1},'{0,0,0},'{0,0,1} },
    '{ '{1,0,1},'{1,1,1},'{1,1,1} },
    '{ '{1,1,0},'{0,1,1},'{1,1,1} },
    '{ '{1,0,0},'{0,0,1},'{1,0,0} },
    '{ '{1,0,1},'{0,1,1},'{0,1,0} },
    '{ '{1,1,0},'{1,1,0},'{1,0,1} },
    '{ '{1,1,1},'{1,0,1},'{1,1,0} },
    '{ '{0,0,0},'{0,1,1},'{1,1,1} },
    '{ '{0,0,1},'{1,1,0},'{1,1,0} },
    '{ '{1,0,1},'{1,0,1},'{1,0,1} },
    '{ '{1,0,1},'{1,1,0},'{1,1,0} },
    '{ '{1,0,1},'{1,1,1},'{0,1,1} },
    '{ '{1,1,0},'{0,1,1},'{1,1,1} },
    '{ '{1,1,1},'{1,1,0},'{0,1,0} },
    '{ '{1,1,0},'{1,1,1},'{1,1,0} },
    '{ '{0,1,0},'{1,0,1},'{1,1,1} },
    '{ '{1,1,1},'{0,1,0},'{0,0,1} },
    '{ '{0,1,0},'{1,1,1},'{1,1,1} },
    '{ '{1,1,0},'{0,1,1},'{0,1,0} }  },
'{  '{ '{0,1,1},'{0,0,0},'{1,1,0} },
    '{ '{1,1,0},'{1,0,0},'{1,0,0} },
    '{ '{0,0,0},'{1,1,1},'{1,0,0} },
    '{ '{0,1,1},'{1,0,1},'{0,0,1} },
    '{ '{0,0,1},'{1,0,0},'{0,0,0} },
    '{ '{1,0,1},'{1,0,1},'{1,1,0} },
    '{ '{0,0,1},'{1,0,1},'{0,1,0} },
    '{ '{1,1,0},'{0,0,1},'{0,1,1} },
    '{ '{1,1,1},'{1,1,0},'{1,0,0} },
    '{ '{1,1,1},'{0,1,0},'{1,1,1} },
    '{ '{0,0,1},'{1,1,1},'{0,1,1} },
    '{ '{1,0,1},'{0,0,0},'{0,0,1} },
    '{ '{1,0,0},'{1,1,0},'{1,1,1} },
    '{ '{0,0,1},'{1,1,1},'{1,0,0} },
    '{ '{0,0,0},'{1,1,1},'{0,1,0} },
    '{ '{1,1,1},'{0,0,0},'{1,0,1} },
    '{ '{0,1,1},'{1,0,1},'{0,0,0} },
    '{ '{0,1,1},'{0,0,1},'{0,0,0} },
    '{ '{0,0,0},'{1,1,1},'{0,1,1} },
    '{ '{0,1,0},'{1,0,1},'{1,1,1} },
    '{ '{1,1,1},'{1,0,0},'{0,0,1} },
    '{ '{0,0,0},'{0,0,1},'{0,1,1} },
    '{ '{0,1,0},'{0,0,0},'{1,1,1} },
    '{ '{0,0,1},'{0,0,0},'{1,1,1} },
    '{ '{0,1,0},'{1,0,1},'{1,0,0} },
    '{ '{1,1,1},'{0,1,0},'{1,1,1} },
    '{ '{0,0,1},'{0,1,0},'{1,0,1} },
    '{ '{0,1,1},'{1,1,0},'{1,0,1} },
    '{ '{1,0,1},'{0,1,0},'{1,0,1} },
    '{ '{1,0,1},'{0,0,0},'{1,1,0} },
    '{ '{1,1,0},'{1,0,1},'{0,0,0} },
    '{ '{0,0,0},'{0,1,0},'{0,1,0} }  },
'{  '{ '{0,1,0},'{0,0,0},'{1,0,1} },
    '{ '{0,1,1},'{0,0,0},'{0,1,1} },
    '{ '{0,1,1},'{0,0,1},'{1,0,1} },
    '{ '{0,0,1},'{1,1,1},'{1,0,0} },
    '{ '{0,1,1},'{0,1,1},'{0,0,1} },
    '{ '{1,0,0},'{1,0,1},'{1,1,1} },
    '{ '{1,0,1},'{1,1,1},'{0,1,0} },
    '{ '{1,1,1},'{0,0,0},'{0,0,0} },
    '{ '{0,1,0},'{1,1,1},'{1,0,1} },
    '{ '{1,1,1},'{0,0,0},'{0,0,0} },
    '{ '{0,1,1},'{0,0,0},'{1,0,1} },
    '{ '{1,0,1},'{1,0,0},'{0,0,0} },
    '{ '{0,1,1},'{1,1,1},'{1,1,0} },
    '{ '{1,1,1},'{1,0,1},'{0,1,1} },
    '{ '{1,1,0},'{0,0,1},'{1,1,0} },
    '{ '{1,0,0},'{0,1,1},'{1,1,0} },
    '{ '{0,1,0},'{1,0,1},'{0,1,1} },
    '{ '{0,1,1},'{1,1,1},'{1,1,0} },
    '{ '{1,1,1},'{1,0,1},'{1,1,1} },
    '{ '{1,1,0},'{1,1,1},'{0,1,0} },
    '{ '{0,0,0},'{1,1,0},'{1,1,0} },
    '{ '{1,1,1},'{0,0,1},'{1,0,0} },
    '{ '{1,1,0},'{1,0,0},'{1,0,0} },
    '{ '{1,1,1},'{1,0,1},'{0,0,0} },
    '{ '{1,1,0},'{1,1,1},'{1,1,1} },
    '{ '{0,0,0},'{1,1,1},'{1,0,1} },
    '{ '{1,0,1},'{1,1,0},'{1,0,1} },
    '{ '{1,1,1},'{0,1,1},'{1,1,1} },
    '{ '{0,1,1},'{1,0,1},'{1,1,1} },
    '{ '{0,0,0},'{0,0,1},'{1,0,0} },
    '{ '{1,0,1},'{0,1,1},'{0,0,0} },
    '{ '{1,1,1},'{0,1,1},'{0,0,1} }  },
'{  '{ '{0,1,0},'{1,1,0},'{1,1,1} },
    '{ '{1,0,0},'{0,1,1},'{0,0,0} },
    '{ '{0,1,0},'{0,0,0},'{1,0,0} },
    '{ '{0,0,1},'{1,1,1},'{0,1,1} },
    '{ '{0,0,1},'{1,1,0},'{1,1,1} },
    '{ '{0,1,0},'{1,1,0},'{0,1,0} },
    '{ '{0,1,1},'{1,1,0},'{0,0,1} },
    '{ '{0,1,1},'{0,0,0},'{1,0,0} },
    '{ '{1,1,0},'{1,1,1},'{0,1,0} },
    '{ '{1,1,1},'{1,0,1},'{1,1,1} },
    '{ '{0,1,1},'{0,1,1},'{1,0,1} },
    '{ '{1,0,0},'{0,1,0},'{1,1,1} },
    '{ '{1,0,0},'{0,0,1},'{1,0,0} },
    '{ '{0,1,1},'{1,0,0},'{1,1,0} },
    '{ '{1,1,1},'{0,1,1},'{0,0,0} },
    '{ '{1,0,0},'{1,0,1},'{1,1,0} },
    '{ '{1,1,1},'{0,0,1},'{0,0,1} },
    '{ '{0,0,0},'{1,0,0},'{0,1,0} },
    '{ '{0,0,1},'{1,0,1},'{1,0,0} },
    '{ '{0,1,0},'{1,0,1},'{0,1,1} },
    '{ '{1,0,1},'{0,1,1},'{1,1,1} },
    '{ '{0,1,1},'{1,1,1},'{0,1,0} },
    '{ '{0,1,0},'{1,1,1},'{0,1,1} },
    '{ '{0,1,1},'{0,0,1},'{1,1,1} },
    '{ '{0,1,1},'{0,1,0},'{0,1,0} },
    '{ '{1,0,0},'{1,0,0},'{0,1,0} },
    '{ '{0,1,1},'{0,1,1},'{1,0,1} },
    '{ '{1,1,1},'{1,0,0},'{1,1,0} },
    '{ '{0,1,0},'{1,0,1},'{0,1,0} },
    '{ '{1,1,0},'{1,1,1},'{1,1,1} },
    '{ '{0,0,0},'{0,0,0},'{0,1,1} },
    '{ '{0,0,0},'{0,0,1},'{0,1,0} }  },
'{  '{ '{1,0,0},'{0,0,0},'{0,1,1} },
    '{ '{0,1,1},'{1,0,0},'{1,1,0} },
    '{ '{0,1,0},'{0,1,0},'{1,1,1} },
    '{ '{1,1,1},'{1,0,1},'{1,1,1} },
    '{ '{1,1,1},'{0,1,0},'{0,1,0} },
    '{ '{0,1,0},'{1,0,1},'{0,0,0} },
    '{ '{0,0,0},'{0,1,1},'{0,0,0} },
    '{ '{0,1,1},'{1,1,1},'{0,1,0} },
    '{ '{1,0,0},'{1,0,1},'{0,1,1} },
    '{ '{1,1,1},'{1,0,0},'{0,0,1} },
    '{ '{0,0,0},'{1,0,0},'{1,0,1} },
    '{ '{1,0,1},'{1,1,0},'{0,0,1} },
    '{ '{1,1,0},'{1,0,0},'{1,0,0} },
    '{ '{0,0,0},'{0,0,0},'{0,1,1} },
    '{ '{0,0,1},'{1,1,0},'{0,1,1} },
    '{ '{0,1,1},'{0,1,0},'{0,1,0} },
    '{ '{0,1,0},'{0,0,1},'{0,1,0} },
    '{ '{1,1,0},'{0,0,0},'{1,0,1} },
    '{ '{1,1,1},'{1,1,1},'{1,1,0} },
    '{ '{0,1,1},'{1,0,1},'{1,0,0} },
    '{ '{0,1,1},'{0,1,1},'{1,1,1} },
    '{ '{1,0,0},'{0,0,0},'{1,0,0} },
    '{ '{1,1,1},'{0,0,1},'{1,0,0} },
    '{ '{1,1,1},'{0,0,1},'{1,1,0} },
    '{ '{1,0,0},'{1,1,1},'{0,0,0} },
    '{ '{1,0,0},'{0,0,1},'{0,1,0} },
    '{ '{1,1,1},'{0,0,1},'{0,0,1} },
    '{ '{1,1,1},'{0,1,1},'{1,0,0} },
    '{ '{0,0,1},'{1,0,1},'{0,0,1} },
    '{ '{0,0,1},'{0,0,0},'{1,0,0} },
    '{ '{1,0,1},'{0,1,0},'{0,0,0} },
    '{ '{1,0,1},'{1,1,1},'{1,0,1} }  },
'{  '{ '{0,0,1},'{1,1,0},'{1,1,1} },
    '{ '{1,0,1},'{0,0,1},'{1,1,1} },
    '{ '{0,1,1},'{1,1,1},'{0,0,0} },
    '{ '{0,0,1},'{0,0,0},'{0,1,0} },
    '{ '{0,1,0},'{1,0,0},'{0,1,0} },
    '{ '{0,1,1},'{0,0,1},'{0,0,0} },
    '{ '{0,1,1},'{0,0,0},'{0,0,1} },
    '{ '{0,1,1},'{1,1,1},'{0,1,1} },
    '{ '{1,0,0},'{1,1,1},'{0,0,1} },
    '{ '{0,1,0},'{1,0,1},'{0,1,0} },
    '{ '{1,0,1},'{0,1,0},'{0,1,0} },
    '{ '{0,0,1},'{1,1,1},'{0,0,1} },
    '{ '{1,0,0},'{0,0,1},'{1,1,0} },
    '{ '{0,1,0},'{1,1,1},'{1,0,1} },
    '{ '{0,1,0},'{1,0,1},'{1,0,0} },
    '{ '{0,0,0},'{0,1,0},'{1,0,1} },
    '{ '{1,1,1},'{1,0,1},'{1,0,0} },
    '{ '{1,1,1},'{1,0,1},'{0,0,1} },
    '{ '{0,1,0},'{1,1,1},'{1,1,0} },
    '{ '{0,1,1},'{0,0,1},'{1,1,0} },
    '{ '{0,1,0},'{1,1,1},'{1,1,1} },
    '{ '{0,0,0},'{1,1,0},'{1,0,0} },
    '{ '{1,0,1},'{1,0,1},'{1,0,0} },
    '{ '{1,1,0},'{1,0,0},'{1,0,0} },
    '{ '{1,1,1},'{0,1,0},'{0,1,1} },
    '{ '{1,1,1},'{0,0,1},'{0,0,0} },
    '{ '{1,0,1},'{1,0,1},'{0,0,0} },
    '{ '{0,1,0},'{0,1,1},'{1,1,1} },
    '{ '{0,1,0},'{0,0,0},'{0,1,0} },
    '{ '{0,0,1},'{0,0,0},'{0,0,1} },
    '{ '{1,0,1},'{1,0,0},'{1,0,1} },
    '{ '{1,0,1},'{0,0,1},'{1,0,1} }  },
'{  '{ '{0,0,1},'{0,0,0},'{1,1,0} },
    '{ '{0,1,0},'{0,1,1},'{0,1,0} },
    '{ '{1,1,0},'{1,1,0},'{0,0,0} },
    '{ '{1,1,1},'{1,1,0},'{0,0,1} },
    '{ '{0,0,1},'{0,0,0},'{1,1,1} },
    '{ '{0,1,0},'{1,1,0},'{1,0,0} },
    '{ '{0,1,0},'{1,0,0},'{0,0,0} },
    '{ '{0,1,0},'{1,0,0},'{1,0,0} },
    '{ '{0,1,1},'{1,0,0},'{1,1,1} },
    '{ '{1,1,0},'{1,1,1},'{1,0,1} },
    '{ '{0,1,0},'{0,1,0},'{1,0,1} },
    '{ '{1,1,1},'{1,0,0},'{1,0,0} },
    '{ '{1,1,0},'{0,1,1},'{1,0,1} },
    '{ '{0,0,1},'{1,1,1},'{0,1,1} },
    '{ '{0,0,1},'{0,1,0},'{1,1,0} },
    '{ '{0,0,0},'{0,0,1},'{0,1,1} },
    '{ '{0,1,1},'{1,0,1},'{1,0,0} },
    '{ '{1,0,1},'{0,1,0},'{1,1,0} },
    '{ '{1,0,1},'{0,1,0},'{1,0,0} },
    '{ '{1,1,0},'{0,1,1},'{0,0,0} },
    '{ '{1,1,0},'{1,0,1},'{1,1,1} },
    '{ '{0,1,0},'{1,0,0},'{1,1,0} },
    '{ '{1,0,0},'{1,1,0},'{1,1,1} },
    '{ '{0,0,0},'{0,1,1},'{1,0,0} },
    '{ '{0,1,1},'{0,1,1},'{0,1,1} },
    '{ '{1,0,0},'{1,1,1},'{1,0,1} },
    '{ '{0,0,0},'{0,0,0},'{0,0,1} },
    '{ '{1,1,1},'{1,0,1},'{0,0,1} },
    '{ '{0,1,1},'{0,1,0},'{1,1,0} },
    '{ '{1,1,1},'{1,1,1},'{1,1,0} },
    '{ '{1,1,0},'{1,0,1},'{0,0,1} },
    '{ '{1,1,1},'{1,1,1},'{0,0,1} }  },
'{  '{ '{0,1,0},'{0,0,0},'{0,0,0} },
    '{ '{1,0,1},'{1,0,1},'{1,1,0} },
    '{ '{1,1,0},'{1,1,1},'{0,0,0} },
    '{ '{1,0,1},'{0,1,0},'{1,0,0} },
    '{ '{1,0,1},'{1,1,0},'{0,0,0} },
    '{ '{1,0,0},'{0,0,1},'{0,0,1} },
    '{ '{0,1,1},'{0,0,1},'{0,1,1} },
    '{ '{0,1,1},'{1,0,0},'{0,1,1} },
    '{ '{1,0,1},'{0,0,0},'{0,0,0} },
    '{ '{0,0,1},'{1,1,1},'{0,0,0} },
    '{ '{0,1,0},'{0,0,0},'{0,1,0} },
    '{ '{0,0,1},'{1,1,0},'{1,0,0} },
    '{ '{1,1,0},'{0,0,1},'{0,0,1} },
    '{ '{0,1,0},'{0,0,1},'{1,0,1} },
    '{ '{1,0,0},'{1,0,0},'{0,1,1} },
    '{ '{0,0,1},'{0,1,1},'{0,1,1} },
    '{ '{1,1,1},'{0,0,0},'{1,1,0} },
    '{ '{1,1,1},'{1,1,1},'{1,1,0} },
    '{ '{1,0,0},'{1,1,1},'{0,0,0} },
    '{ '{1,1,1},'{1,0,0},'{0,0,1} },
    '{ '{0,0,1},'{1,0,0},'{1,1,1} },
    '{ '{1,0,0},'{1,0,0},'{0,0,1} },
    '{ '{0,0,1},'{1,0,0},'{0,1,0} },
    '{ '{1,1,0},'{0,0,0},'{1,1,0} },
    '{ '{0,0,0},'{1,1,1},'{0,1,1} },
    '{ '{0,0,1},'{1,0,1},'{1,0,0} },
    '{ '{0,0,1},'{1,1,0},'{1,1,0} },
    '{ '{0,1,1},'{0,0,0},'{0,1,1} },
    '{ '{1,0,0},'{0,0,1},'{0,1,1} },
    '{ '{0,0,1},'{1,0,1},'{1,1,1} },
    '{ '{0,1,0},'{0,0,0},'{0,0,1} },
    '{ '{0,1,0},'{0,0,0},'{1,0,1} }  },
'{  '{ '{1,0,0},'{1,0,0},'{0,0,1} },
    '{ '{0,0,1},'{1,0,0},'{0,1,0} },
    '{ '{1,1,1},'{0,1,1},'{0,0,1} },
    '{ '{0,1,0},'{0,1,1},'{1,1,1} },
    '{ '{1,0,1},'{1,1,1},'{0,1,1} },
    '{ '{0,0,0},'{0,1,1},'{1,0,1} },
    '{ '{1,1,0},'{1,1,1},'{0,0,0} },
    '{ '{1,1,0},'{1,1,1},'{0,1,0} },
    '{ '{1,0,0},'{1,0,0},'{0,0,0} },
    '{ '{1,0,1},'{0,0,0},'{0,1,0} },
    '{ '{0,0,1},'{1,1,0},'{1,1,1} },
    '{ '{0,0,1},'{1,0,0},'{0,0,1} },
    '{ '{1,0,1},'{1,1,1},'{1,0,0} },
    '{ '{0,1,1},'{0,1,0},'{1,1,0} },
    '{ '{0,1,1},'{0,0,1},'{1,0,0} },
    '{ '{1,0,0},'{0,0,1},'{1,1,1} },
    '{ '{0,1,1},'{1,1,1},'{1,1,1} },
    '{ '{0,0,0},'{1,0,1},'{0,0,1} },
    '{ '{1,0,0},'{1,1,1},'{0,0,0} },
    '{ '{1,0,0},'{1,1,0},'{0,0,1} },
    '{ '{1,0,1},'{1,0,0},'{0,0,0} },
    '{ '{0,1,1},'{0,1,0},'{0,0,1} },
    '{ '{1,0,1},'{1,1,1},'{1,0,1} },
    '{ '{0,1,1},'{1,0,1},'{1,0,1} },
    '{ '{1,1,0},'{1,0,1},'{0,1,1} },
    '{ '{0,1,0},'{0,0,1},'{1,1,0} },
    '{ '{0,0,1},'{1,1,1},'{0,0,1} },
    '{ '{1,0,0},'{1,1,0},'{0,0,0} },
    '{ '{0,0,1},'{0,1,1},'{0,1,1} },
    '{ '{1,1,0},'{1,1,0},'{0,1,1} },
    '{ '{0,1,1},'{1,0,1},'{0,0,1} },
    '{ '{1,0,1},'{0,1,1},'{1,1,0} }  },
'{  '{ '{1,0,1},'{1,1,1},'{1,1,1} },
    '{ '{1,1,1},'{0,0,1},'{0,0,1} },
    '{ '{0,0,0},'{1,1,1},'{0,0,1} },
    '{ '{1,1,0},'{1,0,1},'{1,1,1} },
    '{ '{1,0,1},'{0,0,1},'{0,1,1} },
    '{ '{0,1,0},'{1,1,0},'{1,0,0} },
    '{ '{0,1,1},'{0,1,0},'{1,1,1} },
    '{ '{1,0,1},'{1,0,0},'{1,0,0} },
    '{ '{1,1,1},'{1,0,0},'{1,1,0} },
    '{ '{0,0,1},'{1,0,1},'{0,0,1} },
    '{ '{1,0,1},'{0,1,1},'{1,1,0} },
    '{ '{1,1,1},'{1,0,1},'{1,1,1} },
    '{ '{1,0,1},'{1,0,1},'{1,1,0} },
    '{ '{1,1,1},'{1,0,0},'{1,0,0} },
    '{ '{0,1,0},'{0,0,1},'{0,0,0} },
    '{ '{0,1,0},'{0,0,0},'{0,1,1} },
    '{ '{1,0,0},'{1,1,1},'{0,0,1} },
    '{ '{1,1,0},'{0,1,1},'{0,0,0} },
    '{ '{0,0,0},'{1,1,0},'{1,0,1} },
    '{ '{0,0,0},'{0,0,1},'{0,1,0} },
    '{ '{1,1,0},'{0,0,0},'{0,0,0} },
    '{ '{1,0,0},'{1,0,1},'{0,0,0} },
    '{ '{0,1,0},'{1,1,1},'{0,0,1} },
    '{ '{0,0,0},'{1,1,1},'{1,0,1} },
    '{ '{1,0,0},'{0,0,1},'{0,0,1} },
    '{ '{0,0,1},'{1,1,0},'{0,1,0} },
    '{ '{0,0,0},'{1,0,1},'{1,0,1} },
    '{ '{1,1,0},'{0,0,0},'{1,0,1} },
    '{ '{1,1,1},'{1,1,1},'{1,1,1} },
    '{ '{1,1,0},'{0,0,1},'{0,0,0} },
    '{ '{0,1,0},'{0,0,1},'{0,1,1} },
    '{ '{1,1,1},'{1,1,0},'{1,0,0} }  },
'{  '{ '{1,1,0},'{1,1,0},'{1,1,0} },
    '{ '{0,1,0},'{1,0,1},'{1,0,1} },
    '{ '{0,1,0},'{1,1,0},'{0,0,0} },
    '{ '{0,1,0},'{1,1,0},'{0,1,0} },
    '{ '{0,1,1},'{0,0,1},'{1,0,0} },
    '{ '{1,1,1},'{0,0,0},'{0,1,0} },
    '{ '{0,0,1},'{0,0,1},'{1,0,0} },
    '{ '{1,1,1},'{1,0,1},'{1,1,1} },
    '{ '{1,1,0},'{1,0,1},'{0,0,0} },
    '{ '{1,0,1},'{0,1,0},'{0,0,1} },
    '{ '{1,0,1},'{0,1,0},'{0,1,1} },
    '{ '{1,0,1},'{0,0,0},'{1,0,0} },
    '{ '{0,0,1},'{0,0,1},'{1,0,0} },
    '{ '{1,1,1},'{1,1,1},'{0,1,0} },
    '{ '{0,1,1},'{1,0,0},'{0,0,1} },
    '{ '{0,1,1},'{0,1,1},'{0,1,0} },
    '{ '{1,1,1},'{0,0,0},'{1,0,1} },
    '{ '{0,1,0},'{0,0,1},'{0,0,1} },
    '{ '{1,1,1},'{0,0,1},'{1,1,0} },
    '{ '{0,1,0},'{0,0,0},'{1,1,0} },
    '{ '{1,0,0},'{1,0,0},'{1,0,1} },
    '{ '{0,0,0},'{0,1,0},'{0,1,0} },
    '{ '{0,1,0},'{0,1,1},'{0,1,0} },
    '{ '{1,1,0},'{0,1,0},'{0,0,0} },
    '{ '{1,0,0},'{1,1,0},'{0,0,1} },
    '{ '{1,1,0},'{0,0,1},'{1,1,1} },
    '{ '{1,0,0},'{1,1,1},'{1,0,1} },
    '{ '{1,0,1},'{0,0,1},'{1,0,1} },
    '{ '{0,0,1},'{1,0,1},'{1,0,0} },
    '{ '{1,0,0},'{0,0,0},'{1,1,1} },
    '{ '{0,0,0},'{0,0,1},'{1,1,0} },
    '{ '{1,1,1},'{1,1,1},'{0,0,0} }  },
'{  '{ '{1,0,0},'{0,0,1},'{0,0,0} },
    '{ '{1,1,0},'{0,1,0},'{1,0,0} },
    '{ '{1,0,0},'{1,0,0},'{1,1,0} },
    '{ '{0,1,1},'{1,1,1},'{0,1,0} },
    '{ '{1,0,0},'{1,0,0},'{0,0,0} },
    '{ '{0,1,1},'{0,1,1},'{1,1,1} },
    '{ '{1,1,1},'{0,1,0},'{1,1,1} },
    '{ '{1,0,1},'{0,1,1},'{0,1,1} },
    '{ '{1,1,0},'{0,1,1},'{1,1,0} },
    '{ '{1,0,1},'{0,1,0},'{0,1,0} },
    '{ '{0,1,1},'{0,1,0},'{1,1,1} },
    '{ '{1,0,0},'{1,1,1},'{1,0,0} },
    '{ '{0,1,0},'{0,1,0},'{0,1,1} },
    '{ '{1,0,0},'{1,0,1},'{1,1,1} },
    '{ '{0,1,1},'{0,1,0},'{1,1,0} },
    '{ '{1,1,1},'{1,1,1},'{1,1,1} },
    '{ '{0,0,0},'{1,0,1},'{0,0,1} },
    '{ '{1,1,1},'{1,0,1},'{0,1,0} },
    '{ '{1,0,1},'{1,1,1},'{1,1,0} },
    '{ '{1,0,0},'{0,0,0},'{1,1,1} },
    '{ '{1,1,0},'{0,0,0},'{1,0,1} },
    '{ '{0,1,1},'{1,1,1},'{1,0,1} },
    '{ '{1,1,0},'{1,0,0},'{0,0,1} },
    '{ '{0,0,1},'{0,0,1},'{1,1,1} },
    '{ '{0,1,0},'{0,1,1},'{1,0,0} },
    '{ '{1,1,1},'{1,1,0},'{1,1,0} },
    '{ '{0,1,1},'{0,0,0},'{0,1,1} },
    '{ '{0,0,0},'{1,0,0},'{0,1,1} },
    '{ '{0,1,1},'{1,0,1},'{0,1,1} },
    '{ '{0,0,1},'{1,0,0},'{1,0,0} },
    '{ '{1,0,1},'{0,0,1},'{0,1,1} },
    '{ '{0,0,1},'{1,1,0},'{1,0,0} }  },
'{  '{ '{0,0,1},'{0,1,0},'{0,0,1} },
    '{ '{0,1,0},'{0,0,0},'{1,0,1} },
    '{ '{0,1,1},'{0,0,1},'{1,0,0} },
    '{ '{0,0,0},'{1,1,0},'{0,0,0} },
    '{ '{0,1,0},'{1,1,1},'{1,0,0} },
    '{ '{0,1,0},'{0,1,0},'{0,0,1} },
    '{ '{0,0,1},'{0,0,0},'{0,1,0} },
    '{ '{0,0,0},'{1,1,1},'{1,1,1} },
    '{ '{1,0,0},'{0,1,1},'{0,1,0} },
    '{ '{1,1,0},'{0,1,0},'{0,1,0} },
    '{ '{0,0,1},'{1,0,1},'{1,1,0} },
    '{ '{1,0,1},'{1,1,0},'{0,1,1} },
    '{ '{1,0,0},'{1,0,1},'{1,1,1} },
    '{ '{0,1,1},'{1,1,1},'{1,0,0} },
    '{ '{0,1,1},'{1,1,0},'{0,1,0} },
    '{ '{0,1,1},'{0,0,0},'{0,0,1} },
    '{ '{0,1,0},'{1,1,0},'{0,1,1} },
    '{ '{1,0,1},'{0,1,1},'{1,0,0} },
    '{ '{0,0,1},'{1,1,1},'{1,1,1} },
    '{ '{0,1,0},'{1,1,0},'{0,0,0} },
    '{ '{1,1,1},'{1,1,1},'{1,0,0} },
    '{ '{0,0,1},'{1,1,0},'{0,1,0} },
    '{ '{1,0,1},'{1,0,0},'{0,1,1} },
    '{ '{1,1,0},'{1,0,1},'{0,1,0} },
    '{ '{0,1,0},'{0,0,1},'{1,0,1} },
    '{ '{1,1,1},'{1,1,1},'{0,0,0} },
    '{ '{1,1,1},'{0,0,0},'{1,1,1} },
    '{ '{1,0,0},'{1,1,1},'{1,1,0} },
    '{ '{1,0,1},'{1,0,1},'{0,0,0} },
    '{ '{1,0,0},'{0,0,0},'{1,1,0} },
    '{ '{0,1,1},'{1,1,0},'{1,1,0} },
    '{ '{1,0,0},'{1,0,1},'{1,1,1} }  },
'{  '{ '{1,0,0},'{1,1,1},'{1,1,1} },
    '{ '{0,1,0},'{1,0,0},'{0,0,1} },
    '{ '{0,1,1},'{0,0,0},'{1,1,1} },
    '{ '{1,1,1},'{0,0,0},'{0,1,1} },
    '{ '{0,0,0},'{0,1,1},'{0,0,1} },
    '{ '{1,1,0},'{1,1,1},'{1,1,1} },
    '{ '{0,0,0},'{1,1,0},'{1,1,1} },
    '{ '{1,1,1},'{0,0,1},'{1,0,0} },
    '{ '{1,0,0},'{1,1,1},'{1,0,1} },
    '{ '{0,1,0},'{1,0,0},'{1,1,0} },
    '{ '{1,1,0},'{1,1,0},'{1,0,1} },
    '{ '{0,1,0},'{0,0,1},'{0,1,0} },
    '{ '{1,1,1},'{1,1,1},'{0,1,1} },
    '{ '{0,0,1},'{1,1,0},'{0,1,1} },
    '{ '{0,0,0},'{0,0,0},'{1,1,1} },
    '{ '{1,1,1},'{1,1,1},'{0,0,0} },
    '{ '{1,1,0},'{1,0,1},'{1,0,0} },
    '{ '{0,0,1},'{1,1,0},'{0,1,1} },
    '{ '{1,1,0},'{1,0,0},'{1,0,0} },
    '{ '{1,0,1},'{1,0,1},'{1,1,0} },
    '{ '{0,1,1},'{1,1,0},'{0,0,1} },
    '{ '{0,1,1},'{1,0,0},'{0,1,0} },
    '{ '{0,1,0},'{1,0,0},'{1,0,0} },
    '{ '{0,1,0},'{0,0,1},'{0,0,1} },
    '{ '{0,0,0},'{0,0,1},'{1,1,0} },
    '{ '{1,1,0},'{1,0,1},'{0,0,0} },
    '{ '{1,0,1},'{1,1,0},'{1,0,1} },
    '{ '{1,0,0},'{1,0,0},'{0,1,0} },
    '{ '{0,1,0},'{1,0,0},'{0,1,1} },
    '{ '{1,0,1},'{0,0,1},'{1,1,1} },
    '{ '{0,0,0},'{1,1,1},'{0,0,1} },
    '{ '{0,1,1},'{1,1,0},'{1,0,0} }  },
'{  '{ '{1,0,0},'{0,1,1},'{0,1,1} },
    '{ '{0,0,0},'{1,0,0},'{1,0,1} },
    '{ '{1,0,1},'{0,1,0},'{0,1,1} },
    '{ '{0,1,1},'{0,1,0},'{0,0,1} },
    '{ '{0,1,0},'{0,0,1},'{1,1,1} },
    '{ '{1,0,1},'{0,1,0},'{0,0,0} },
    '{ '{0,1,0},'{1,0,1},'{1,1,1} },
    '{ '{0,1,0},'{1,1,0},'{0,0,0} },
    '{ '{1,1,1},'{1,1,0},'{0,1,1} },
    '{ '{0,1,1},'{1,1,0},'{0,1,1} },
    '{ '{1,1,0},'{0,1,0},'{1,1,1} },
    '{ '{1,1,1},'{0,1,1},'{0,0,1} },
    '{ '{0,1,1},'{0,0,1},'{0,0,1} },
    '{ '{1,0,0},'{0,0,0},'{1,1,1} },
    '{ '{1,0,1},'{0,0,1},'{1,1,1} },
    '{ '{0,0,1},'{0,1,1},'{0,0,1} },
    '{ '{1,0,0},'{0,1,0},'{1,0,1} },
    '{ '{0,1,1},'{0,1,0},'{0,0,0} },
    '{ '{0,1,1},'{1,0,0},'{1,0,1} },
    '{ '{0,0,0},'{1,0,0},'{0,0,1} },
    '{ '{1,0,1},'{1,0,0},'{0,1,0} },
    '{ '{1,1,1},'{1,1,0},'{1,1,1} },
    '{ '{1,1,0},'{0,1,1},'{0,0,1} },
    '{ '{0,1,1},'{1,1,1},'{1,0,1} },
    '{ '{0,1,1},'{1,0,0},'{0,1,1} },
    '{ '{0,1,0},'{1,0,1},'{1,0,0} },
    '{ '{1,1,1},'{1,0,0},'{0,1,1} },
    '{ '{1,1,1},'{1,0,0},'{1,0,1} },
    '{ '{0,1,1},'{1,1,1},'{1,1,0} },
    '{ '{0,0,1},'{0,0,1},'{0,0,1} },
    '{ '{0,0,1},'{1,0,1},'{1,1,0} },
    '{ '{0,1,0},'{1,1,1},'{0,1,0} }  }
};

end



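//weights for the first fully connected layer - 128 rows, 32 columns: logic wa [128][32];
//each entry is one bit: 1 encodes a weight of +1 and 0 encodes a weight of -1

integer h, q;
//always_comb also runs once at time zero, so this constant table is loaded in simulation
//(an always @(*) block with no signals on the right-hand side would never be triggered)
always_comb begin
		wa = 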
'{
'{0,1,0,0,0,1,1,1,1,1,1,0,0,0,1,0,1,1,0,1,0,1,1,0,1,0,0,1,1,0,1,1},
'{0,0,0,1,1,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,1,1,1,0,1,1},
'{0,0,1,1,1,0,0,1,0,1,1,0,0,1,0,1,0,1,0,1,1,0,1,1,0,1,0,1,1,0,1,1},
'{1,0,1,1,0,1,1,1,0,0,0,0,1,0,1,1,1,0,0,1,0,1,0,1,1,1,0,1,1,0,0,0},
'{0,0,0,1,0,1,1,1,0,0,0,1,0,0,0,0,1,1,0,0,1,0,1,1,1,0,1,0,1,0,0,1},
'{0,1,0,1,0,0,1,1,1,0,1,1,0,0,1,1,0,1,0,1,0,1,1,0,0,1,1,1,0,1,1,1},
'{1,1,1,0,1,0,0,1,1,0,1,1,1,1,1,1,1,0,1,0,1,0,1,0,1,0,0,0,0,0,0,0},
'{1,1,0,0,0,0,1,0,0,1,1,0,1,1,1,0,0,0,1,0,0,0,1,1,1,0,1,1,1,0,1,1},
'{0,1,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,1,1,0,0,0,0,1,0,1,0,0,1,0},
'{1,0,0,1,1,1,0,0,0,0,1,1,0,1,1,0,0,1,0,1,0,1,0,1,0,1,0,1,1,1,1,1},
'{0,1,1,0,0,1,0,1,1,0,0,0,0,1,0,1,0,1,0,1,1,1,1,0,0,1,1,1,0,1,0,0},
'{0,0,0,1,1,1,0,1,0,0,1,0,0,1,1,0,1,0,1,0,0,1,1,1,1,0,0,0,1,0,0,0},
'{0,1,0,0,0,0,0,1,1,1,1,0,1,1,0,1,0,0,0,0,1,0,1,0,0,0,1,1,0,0,0,0},
'{1,0,1,0,0,0,0,0,1,1,1,0,0,0,0,0,1,1,1,1,1,0,1,1,0,1,0,0,1,1,1,0},
'{1,0,1,0,1,1,1,0,0,0,1,0,1,1,1,1,0,1,1,0,0,1,1,1,0,0,0,0,1,1,0,1},
'{1,0,1,1,1,1,0,1,1,0,0,1,0,1,0,0,1,0,0,0,0,0,1,1,0,1,0,1,1,1,0,0},
'{0,1,0,0,0,1,0,0,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,0,1,1,0,1,0,0,0,0},
'{0,0,0,0,1,1,0,0,1,0,1,1,1,0,1,0,1,0,0,1,1,1,1,0,1,1,1,0,1,1,1,1},
'{0,1,0,0,0,1,1,0,1,0,1,1,1,0,0,1,0,1,1,1,1,1,0,0,0,1,1,1,0,1,1,0},
'{0,0,0,1,1,1,1,1,0,1,1,1,1,1,0,1,0,1,0,1,1,1,1,1,1,1,0,0,1,1,0,0},
'{0,1,1,1,1,1,0,0,0,0,0,1,0,0,1,0,1,0,1,1,1,1,0,0,0,0,0,0,1,1,1,1},
'{0,1,0,0,0,0,1,1,0,1,0,1,0,1,0,0,1,0,0,1,1,0,0,1,1,0,0,0,0,1,0,1},
'{1,0,1,0,0,1,0,1,0,0,0,1,1,0,1,0,1,1,1,1,0,0,0,0,1,0,1,1,0,0,1,1},
'{1,1,0,0,1,1,1,0,1,0,0,1,0,1,1,0,1,1,0,0,0,1,1,0,1,1,0,0,1,1,1,1},
'{1,0,0,0,0,0,0,0,1,1,0,0,0,1,0,0,1,0,0,1,1,1,0,0,1,1,1,1,0,0,1,1},
'{0,1,1,0,1,1,0,0,1,1,1,1,1,1,1,1,0,0,0,1,0,1,0,0,0,1,0,1,0,1,0,1},
'{1,1,0,1,0,0,1,0,1,0,1,0,1,1,1,0,1,1,0,1,1,1,0,1,1,1,0,0,1,0,1,1},
'{1,1,1,0,1,0,1,1,0,1,1,1,0,0,0,1,0,0,0,0,0,1,0,1,0,1,0,0,1,1,1,1},
'{1,1,1,0,1,1,0,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,1,0,1,0,0},
'{1,0,0,1,0,0,0,1,0,0,0,1,0,1,1,0,1,1,1,1,0,0,1,0,0,1,1,1,1,0,0,1},
'{0,1,1,1,1,0,0,0,1,0,0,1,1,1,0,0,1,0,0,1,0,0,0,1,0,1,0,0,1,1,0,1},
'{0,0,1,0,0,1,1,0,1,1,0,0,1,1,1,1,0,1,0,0,0,0,0,1,0,1,0,1,1,1,0,1},
'{1,0,0,0,0,0,1,1,1,1,0,0,1,0,0,0,0,1,1,1,0,1,1,0,1,0,1,1,1,1,0,1},
'{0,0,0,1,0,0,1,0,1,0,0,0,0,1,0,1,0,0,1,1,1,0,1,1,0,0,1,0,0,0,1,1},
'{1,0,1,1,0,0,1,0,0,1,1,1,1,0,0,1,1,1,0,1,1,1,1,0,1,1,1,0,1,0,0,1},
'{1,0,0,0,0,1,0,0,0,1,0,1,1,1,1,0,0,0,1,1,0,0,0,0,1,0,1,0,1,0,0,1},
'{0,0,0,1,1,0,1,0,1,1,0,0,0,1,0,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,0,0},
'{0,1,1,0,1,0,0,0,0,1,1,0,1,0,0,1,1,1,1,0,0,1,1,1,0,0,1,0,0,1,0,1},
'{1,0,0,1,1,1,0,0,0,1,0,0,0,0,0,0,0,0,1,0,1,1,1,1,1,0,1,0,1,1,1,1},
'{0,0,0,1,1,0,1,0,1,0,0,1,0,0,1,0,0,0,1,1,1,0,0,1,1,0,1,1,0,1,1,0},
'{1,0,1,1,0,1,0,0,0,0,0,1,0,1,1,1,0,0,1,0,1,1,1,1,1,1,1,0,1,1,0,1},
'{1,0,0,1,0,1,0,0,0,1,1,1,0,1,1,1,1,0,0,0,0,1,1,0,1,1,1,0,1,1,0,1},
'{1,1,0,0,0,0,1,0,0,0,0,0,0,1,1,1,0,1,0,1,1,1,0,0,1,1,1,0,0,0,1,1},
'{0,0,0,0,0,0,0,1,0,1,0,1,1,0,0,0,1,0,0,1,0,0,1,0,0,0,1,1,0,0,1,0},
'{0,0,0,1,0,0,0,1,1,0,1,1,1,0,0,0,1,0,0,0,1,1,0,1,0,1,1,0,1,1,0,0},
'{0,1,1,0,0,0,1,1,0,1,1,0,1,1,0,1,0,1,0,0,1,1,1,0,1,1,0,1,1,0,1,1},
'{1,0,0,1,0,1,0,0,1,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,1,0,1,1,0,1,1,0},
'{0,0,0,1,0,0,0,0,0,0,1,1,1,0,0,0,1,1,0,1,1,1,0,1,1,1,1,0,1,0,0,1},
'{0,0,0,0,0,0,0,1,0,0,1,1,1,1,0,1,0,0,0,0,1,1,0,0,1,0,1,0,0,0,1,0},
'{0,1,1,0,0,1,1,1,0,1,0,0,0,1,0,1,1,0,1,0,1,1,1,1,1,1,1,0,1,0,1,0},
'{0,1,1,0,1,1,1,0,1,1,0,0,1,0,0,0,1,0,0,0,1,0,1,1,1,1,1,0,1,0,1,1},
'{1,1,1,1,0,1,1,0,0,1,0,0,1,1,0,0,0,0,1,1,1,1,1,0,1,0,1,0,1,0,0,0},
'{0,0,0,1,1,1,1,1,1,0,0,1,0,1,0,1,0,0,1,0,0,0,1,0,0,1,1,1,1,1,1,0},
'{1,1,0,1,0,1,1,1,0,0,0,0,0,1,0,1,1,0,0,1,0,1,0,1,1,0,1,1,0,0,1,1},
'{0,0,0,1,0,1,1,1,1,0,1,1,0,1,1,0,1,1,1,1,0,1,1,0,1,0,1,0,0,0,1,1},
'{0,1,0,0,1,0,0,1,1,0,1,0,1,1,1,0,1,1,0,1,1,0,0,1,1,0,0,1,0,0,1,1},
'{1,0,1,0,1,0,1,0,0,0,1,0,0,0,1,0,0,1,0,1,1,0,1,0,0,0,0,1,1,1,1,1},
'{1,1,0,0,0,0,0,0,1,1,0,1,1,1,0,0,0,1,1,1,1,0,1,0,1,0,1,1,0,0,0,0},
'{1,0,1,0,1,0,1,1,1,0,1,1,1,0,0,0,1,0,0,1,1,0,0,0,0,0,1,0,1,1,0,1},
'{0,1,1,1,0,0,1,0,1,1,0,0,0,1,1,1,1,1,0,0,1,0,1,1,0,1,0,1,1,1,1,0},
'{1,1,1,1,1,1,1,1,0,1,1,0,1,0,0,1,0,1,1,1,1,1,0,0,1,0,1,0,0,1,1,0},
'{1,1,1,0,1,1,1,0,1,1,1,1,1,0,0,0,1,0,1,0,1,1,1,1,0,0,0,1,0,0,1,1},
'{0,1,1,1,1,0,1,0,1,1,1,0,1,0,1,1,1,1,0,0,1,1,1,0,0,0,1,0,1,0,1,1},
'{0,0,0,1,1,1,1,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,1,0,0,1,1,0,0,1,1,1},
'{0,1,0,0,0,0,1,1,0,1,0,0,1,1,0,1,0,0,1,1,0,1,0,1,1,1,1,0,1,1,1,1},
'{0,1,1,1,0,0,0,1,1,0,1,0,1,0,1,1,1,1,1,1,0,0,1,0,0,0,1,1,0,0,1,0},
'{1,1,1,0,0,0,1,0,1,0,1,0,1,1,1,0,0,1,1,1,0,1,1,1,1,1,1,1,1,0,1,1},
'{1,0,0,1,1,1,0,1,0,0,0,1,1,1,1,1,0,1,0,0,1,0,0,1,0,0,0,0,1,0,0,0},
'{1,0,1,0,0,1,1,0,0,1,0,1,0,1,0,0,1,1,0,1,0,0,0,1,0,1,0,1,1,0,0,1},
'{0,1,0,1,0,1,1,0,1,0,0,1,1,0,0,0,1,0,0,1,1,0,1,1,0,1,0,0,1,1,1,0},
'{0,1,1,1,0,1,1,0,0,1,0,1,0,1,1,0,0,0,0,1,0,1,1,1,1,0,0,1,1,0,1,1},
'{1,1,1,1,1,1,0,1,0,1,1,0,0,1,0,1,1,0,1,0,0,0,1,0,0,0,0,1,0,1,1,0},
'{0,0,1,1,0,0,0,0,1,0,1,0,1,0,1,1,0,1,1,0,1,0,0,1,0,1,0,1,0,1,0,1},
'{1,0,1,1,1,0,0,1,0,1,1,0,0,0,1,1,0,1,1,1,1,0,1,0,1,0,0,1,0,1,1,0},
'{1,0,1,0,1,0,0,1,0,0,0,1,1,1,1,1,0,0,1,1,1,1,0,1,0,1,0,1,1,0,1,1},
'{0,0,1,0,1,0,0,1,1,0,0,1,0,0,0,1,0,1,0,0,1,1,1,0,0,1,1,1,1,0,0,0},
'{1,0,1,1,1,0,0,1,0,1,1,1,1,0,1,0,0,1,0,1,1,1,1,0,1,1,1,1,1,0,0,1},
'{0,0,1,0,0,1,1,0,0,0,0,1,0,1,1,1,0,1,0,0,1,0,1,1,1,1,1,0,1,0,0,1},
'{0,1,1,0,0,1,0,0,1,0,0,1,0,0,1,1,1,0,0,0,1,0,1,1,0,0,0,0,1,1,1,0},
'{0,0,0,0,1,1,1,0,1,1,0,1,1,1,1,1,1,1,1,0,1,0,1,1,0,1,1,1,0,1,1,0},
'{1,1,0,1,1,0,1,1,0,1,1,1,0,0,0,1,0,0,0,1,0,0,1,0,0,0,1,0,1,0,1,1},
'{1,1,0,0,0,1,1,0,0,1,1,0,1,0,0,0,0,0,0,1,1,1,0,0,0,1,0,1,0,1,0,0},
'{1,1,0,0,1,1,0,0,0,1,1,0,0,1,0,0,1,1,0,0,1,0,1,1,1,0,1,1,0,1,1,0},
'{0,0,1,1,0,0,0,1,1,0,0,1,0,1,0,0,1,1,1,1,1,0,0,0,0,0,0,0,1,1,0,1},
'{0,1,1,1,1,1,0,1,1,1,0,0,0,0,0,1,1,1,0,0,0,1,0,1,1,0,1,0,0,0,1,0},
'{0,0,1,0,0,1,0,1,0,0,0,0,1,0,0,1,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0},
'{1,1,1,0,1,0,0,1,0,1,1,0,1,1,1,0,0,0,0,1,0,0,1,0,0,1,1,1,1,1,1,1},
'{1,1,1,1,1,1,0,0,1,0,0,1,1,0,0,0,1,0,1,1,0,0,0,1,1,1,0,1,1,0,0,1},
'{1,0,0,1,1,0,0,0,0,1,1,0,0,0,1,1,0,0,0,0,1,0,1,1,1,0,1,1,0,0,0,1},
'{0,1,0,1,0,1,0,0,1,0,0,1,0,1,1,1,1,1,1,1,1,1,1,0,0,1,0,0,1,0,0,1},
'{0,1,1,0,1,0,1,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,0,1,0,1,1,1,0,0,1,0},
'{0,0,1,0,1,0,1,0,0,0,0,1,1,0,0,1,1,0,0,1,0,0,0,0,0,1,0,1,0,0,0,1},
'{1,1,1,0,1,1,1,1,0,1,1,1,1,1,1,0,1,1,0,0,1,0,1,0,1,0,0,1,0,0,1,0},
'{0,1,0,0,1,0,1,0,0,1,0,0,1,0,0,0,1,1,0,0,1,1,1,1,0,1,0,1,0,0,0,1},
'{1,1,0,0,0,0,0,1,0,1,0,0,0,1,0,1,0,1,0,0,0,0,1,1,1,0,0,1,1,0,1,0},
'{1,1,1,1,0,0,1,1,0,1,1,0,0,1,0,1,0,1,0,1,0,1,1,0,1,1,1,1,0,0,0,1},
'{1,1,1,1,0,1,0,0,1,1,0,1,1,1,1,1,0,0,1,1,0,0,0,1,0,1,1,1,0,1,1,1},
'{0,1,0,1,1,1,1,0,0,1,0,0,0,0,0,0,1,1,0,1,0,0,1,1,1,0,0,1,0,0,0,0},
'{1,1,0,0,1,1,1,1,1,0,1,0,1,1,1,1,1,0,0,1,0,0,1,1,1,0,1,1,1,0,0,0},
'{1,0,0,0,1,1,1,0,1,0,0,1,1,1,1,1,1,0,1,1,0,1,1,1,0,1,0,0,0,0,1,1},
'{1,1,1,1,0,1,0,0,1,1,0,1,0,1,0,1,1,0,0,1,1,0,1,1,1,0,1,1,0,0,0,1},
'{0,0,1,1,1,1,0,0,1,0,1,1,0,1,1,1,0,1,1,1,1,1,0,0,1,1,1,0,1,0,1,0},
'{0,0,0,0,1,1,1,0,1,0,1,0,0,0,1,0,1,1,0,1,1,0,0,1,1,0,1,0,1,0,0,0},
'{0,0,1,1,1,0,1,1,1,1,0,1,1,0,1,1,0,1,0,0,1,1,1,1,1,0,0,1,1,0,0,0},
'{1,0,1,1,0,1,1,1,1,0,1,1,0,0,0,0,0,1,0,0,0,1,0,0,0,1,1,1,1,0,1,0},
'{0,0,1,0,1,1,0,1,1,1,1,1,1,0,0,1,0,0,0,1,0,1,0,1,1,1,1,1,0,0,1,1},
'{0,1,0,1,1,1,1,1,1,1,1,0,1,0,0,0,0,0,0,1,0,1,1,1,1,0,1,1,1,0,1,1},
'{1,1,0,1,0,0,0,1,0,1,0,1,0,0,1,1,1,1,0,1,0,1,0,0,0,0,0,1,0,1,0,1},
'{0,1,0,0,1,0,1,1,0,1,0,1,1,0,0,0,1,1,0,0,0,1,1,1,0,1,0,1,1,1,1,0},
'{1,0,1,1,0,0,1,1,0,1,1,1,0,1,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,1,0},
'{0,0,0,0,0,1,1,1,1,0,1,0,0,0,1,1,1,0,1,0,1,1,1,0,1,1,1,0,0,0,0,0},
'{0,1,1,1,1,1,1,1,0,0,1,1,1,0,0,1,0,1,0,1,1,0,0,0,1,1,0,0,1,1,1,1},
'{1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,0,1,0,0,1,1,0,1,0,0,0,0,1,0,1},
'{0,0,1,0,0,1,1,1,1,1,0,1,1,0,0,1,1,0,0,1,0,0,1,1,0,1,1,0,1,0,0,1},
'{1,1,1,1,1,0,1,0,1,0,1,0,1,1,1,1,0,1,1,1,1,1,1,0,0,1,1,1,0,1,1,0},
'{1,1,1,0,0,1,0,1,0,0,1,1,1,0,0,0,0,0,0,1,0,0,0,1,1,1,1,0,1,0,0,0},
'{0,1,1,1,0,0,0,0,0,1,1,1,0,0,1,1,0,0,0,1,1,1,0,1,1,0,0,0,0,0,1,1},
'{1,1,0,1,1,0,1,1,0,1,0,1,1,0,0,1,0,0,1,1,1,0,0,1,0,1,0,1,0,1,0,1},
'{1,1,0,0,1,1,0,1,1,0,1,1,0,0,1,0,0,0,1,0,0,1,1,0,1,1,1,0,1,0,1,1},
'{0,0,1,1,0,0,0,0,0,1,1,1,1,1,1,1,1,1,0,1,1,0,1,0,1,1,0,1,0,0,0,1},
'{1,0,0,1,1,0,0,0,1,1,1,1,0,0,1,1,0,1,1,1,0,1,0,1,0,0,1,1,0,0,1,1},
'{1,0,0,0,0,0,0,1,1,1,1,1,0,0,1,1,0,1,0,1,0,1,0,1,0,1,0,0,0,1,1,1},
'{1,0,1,0,0,1,1,0,0,0,1,0,1,1,0,1,0,1,0,1,1,0,0,0,1,1,1,1,0,1,0,1},
'{1,0,0,1,1,0,0,0,1,0,1,0,1,1,1,0,1,1,0,1,1,0,0,0,0,1,0,0,1,1,0,0},
'{0,0,1,0,0,1,1,1,0,1,1,1,1,0,0,1,1,0,0,1,0,1,1,0,1,0,1,1,0,0,1,1},
'{1,0,0,0,1,1,0,0,1,0,0,1,1,1,1,1,1,1,0,0,1,0,1,1,1,1,0,0,1,0,0,1},
'{1,1,1,0,1,1,1,1,1,0,1,1,0,0,1,0,1,1,0,0,1,0,1,1,1,1,0,0,0,1,1,0},
'{1,0,1,1,1,0,1,0,1,1,1,0,0,1,1,1,0,1,1,1,1,1,1,1,0,1,0,1,0,1,1,1}
};
end



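//weights for the final mapping layer - 32 rows, 10 columns: logic wa1 [32][10];
//each entry is one bit: 1 encodes a weight of +1 and 0 encodes a weight of -1

integer r, k;
//always_comb also runs once at time zero, so this constant table is loaded in simulation
//(an always @(*) block with no signals on the right-hand side would never be triggered)
always_comb begin
wa1 = 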
'{
'{1,0,1,1,0,1,1,0,1,1},
'{1,0,0,0,1,1,1,0,0,0},
'{1,0,1,1,1,1,0,0,1,0},
'{0,0,0,1,1,0,1,0,0,1},
'{1,0,0,0,0,1,1,1,0,1},
'{0,1,0,1,0,1,0,1,1,0},
'{1,1,0,0,1,0,1,1,0,1},
'{0,1,0,0,1,0,0,1,1,1},
'{1,1,1,0,1,0,1,0,1,1},
'{0,0,0,0,1,0,1,0,1,1},
'{1,0,1,1,0,1,1,1,1,0},
'{0,1,0,0,1,1,1,1,1,1},
'{1,1,1,1,0,0,0,0,1,0},
'{0,1,0,1,0,0,1,1,1,0},
'{1,0,0,1,0,1,0,0,1,1},
'{0,0,1,1,1,1,0,1,0,1},
'{0,1,0,0,0,1,0,1,1,1},
'{1,0,1,1,1,0,1,0,1,1},
'{1,0,0,0,0,1,1,1,0,0},
'{0,1,1,0,1,1,0,1,1,1},
'{1,1,0,1,0,0,0,0,0,1},
'{0,1,0,1,0,0,1,1,1,1},
'{1,1,0,0,0,1,1,0,1,1},
'{0,1,1,1,0,0,1,0,0,0},
'{0,1,0,0,1,0,1,1,0,1},
'{1,1,0,1,1,1,0,1,1,0},
'{0,1,1,0,1,0,1,1,0,0},
'{1,0,1,0,1,1,1,0,1,1},
'{0,1,1,1,0,1,0,1,0,1},
'{1,0,0,1,1,0,0,0,0,0},
'{1,0,1,0,0,1,0,1,1,1},
'{1,0,1,0,1,1,0,0,1,0}
};

end
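
//-----------------------------------------------------------------------------------------------------------
// Illustrative sketch only (not part of the original design, and not called anywhere).
// The weight tables above store each binary weight as one bit, where 1 stands for +1 and
// 0 stands for -1. Assuming that encoding, a layer can apply a weight without a multiplier
// by choosing between adding and subtracting the activation. The function name and the
// 9-bit accumulator width are assumptions made for this example.
//-----------------------------------------------------------------------------------------------------------
function automatic logic signed [8:0] sketch_apply_bin_weight (
	input logic signed [8:0] acc,        //running sum so far
	input logic signed [1:0] activation, //binarized activation, +1 or -1
	input logic              weight_bit  //1 -> weight +1, 0 -> weight -1
);
	//weight +1 adds the activation, weight -1 subtracts it
	return weight_bit ? (acc + activation) : (acc - activation);
endfunction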


//===========================================================================================================
//					1. first convolutional layer 
//===========================================================================================================
logic signed [1:0] input_image [8][8]; //8x8 input image (7x7 image padded with -1 on the bottom and right)
logic filter [16][3][3]; //16 binary 3x3 filters
logic signed [1:0] out_map [16][8][8]; //16 binarized 8x8 output fmaps

genvar m; 
generate 
	for (m=0; m<16; m++ ) begin: conv1
		conv_1 one (.fmap(input_image),  .filter(filter[m]), .partial_sums(out_map[m]), .clk_50(CLOCK_50)); //, .start(start_conv1), .finish(finish_conv1[m])
	end 
endgenerate

//pooling to produce 4x4 from 8x8 
logic signed [1:0] pool_conv1 [16][4][4];
pool1 first (.pool_conv1(pool_conv1), .out_map(out_map), .clk_50(CLOCK_50)); //, .start(finish_conv1[1]), .finish(finish_pool1)
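
//-----------------------------------------------------------------------------------------------------------
// Illustrative sketch only (not part of the original design, and not called anywhere).
// Max pooling reduces each 2x2 window of a feature map to its largest value. The pool1
// module itself is not shown here, so this helper is an assumption about how one window
// could be reduced, not a copy of its logic. The function name is hypothetical.
//-----------------------------------------------------------------------------------------------------------
function automatic logic signed [1:0] sketch_max4 (
	input logic signed [1:0] a, b, c, d  //the four values of one 2x2 window
);
	logic signed [1:0] ab, cd;
	ab = (a > b) ? a : b;        //signed compare, so -1 < +1
	cd = (c > d) ? c : d;
	return (ab > cd) ? ab : cd;  //largest of the four inputs
endfunction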

//===========================================================================================================
//				 2. second convolutional layer 
//===========================================================================================================

//4D array containing all the 16*32 3x3 filters used in the convolution
logic filters_conv2 [16][32][3][3]; //Input filter
//4D array which contains the 16*32 4x4 partial sums generated by convolving 16 input fmaps with 32 filters each 
logic signed [4:0] partials_conv2 [16][32][4][4]; //Output partial from conv2

logic finish_ps;
logic finish_pool2;

genvar n, o;
generate 
	//convolve each of the 16 input fmaps with a unique set of 32 filters to generate 32 sets of 16 partial sums 
	for (n=0; n<16; n++) begin: conv2
		for (o=0; o<32; o++) begin: conv2_inner
			conv_2 second (.fmap(pool_conv1[n]), .filter(filters_conv2[n][o]), .partial_sums(partials_conv2[n][o]), .clk_50(CLOCK_50)
						   );
		end
	end 
endgenerate

//3D array containing the 32 4x4 output fmaps generated at this layer (declared before the instance that drives it)
logic signed [1:0] outmap_conv2d [32][4][4];

//calculate partial sums 
partial_sums conv_layer2 (.outmap_conv2d(outmap_conv2d), .partials_conv2(partials_conv2), .clk_50 (CLOCK_50), .start(pio_start), .finish(finish_ps)); //KEY[3]

//3D array containing the 32 2x2 output fmaps generated at this layer 
logic signed [1:0] pool_conv2 [32][2][2];

//pooling to produce 2x2 from 4x4 
pool2 second (.pool_conv2(pool_conv2), .outmap_conv2d(outmap_conv2d), .clk_50(CLOCK_50), .start(finish_ps), .finish(finish_pool2)); //finish_ps

//===========================================================================================================
//					fully connected layer 
//===========================================================================================================
logic finish_fc1;
logic start_fc2;

//128x32 array of binary weights 
logic wa [128][32];

//1x32 array output from this layer 
logic signed [1:0] fc_out [32];

logic signed [8:0] temp [32];

//feed in output from last pooling layer 
fc1 full_1 (.fmap(pool_conv2), .wa(wa), .clk_50(CLOCK_50), .start(finish_pool2), .finish(finish_fc1), .fc_out(fc_out));


//===========================================================================================================
//					 final map layer 
//===========================================================================================================

//32x10 binary weight array 
logic wa1 [32][10];

logic finish_fc2; 

//1x10 8 bit output from layer 
logic signed [7:0] final_out [10];


//feed in output from fully connected layer 
ten_map last (.fmap(fc_out), .wa1(wa1), .final_out(final_out), .clk_50(CLOCK_50), .start(finish_fc1), .finish(finish_fc2));

assign pio_end = finish_fc2;
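
//-----------------------------------------------------------------------------------------------------------
// Illustrative sketch only (not part of the original design).
// The network infers the digit whose entry in final_out is the largest. Assuming the
// result is wanted on the FPGA side, the index of the maximum could be computed as below.
// The signal sketch_predicted_digit is hypothetical and is not connected to anything;
// on a tie the lowest index is kept.
//-----------------------------------------------------------------------------------------------------------
logic [3:0] sketch_predicted_digit;
always_comb begin
	sketch_predicted_digit = 4'd0;
	for (int i = 1; i < 10; i++) begin
		//signed compare against the best score seen so far
		if (final_out[i] > final_out[sketch_predicted_digit])
			sketch_predicted_digit = i[3:0];
	end
end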
//=======================================================
//  Structural coding
//=======================================================

Computer_System The_System (
	
	// FPGA Side
	
	// PIO ports 
	//.pio_fpga_data_external_connection_export (pio_fpga_data),
	.pio_hps_image_data_external_connection_export (pio_hps_image_data),
	.pio_hps_image_clk_external_connection_export (pio_hps_image_clk),
	.pio_hps_image_cs_external_connection_export (pio_hps_image_cs),
	.pio_out_data_external_connection_export (pio_out_data),
	.pio_out_cs_external_connection_export (pio_out_cs),
	
	.pio_start_external_connection_export (pio_start),
	.pio_end_external_connection_export (pio_end),
	.pio_switch_external_connection_export (pio_switch),
	//.pio_x_external_connection_export(pio_x),
	//.pio_y_external_connection_export(pio_y),
	// Global signals
	.system_pll_ref_clk_clk					(CLOCK_50),
	.system_pll_ref_reset_reset			(1'b0),
	
	
	
	// SRAM shared block with HPS
	.onchip_sram_0_s1_address               (sram_address),               
	.onchip_sram_0_s1_clken                 (sram_clken),                 
	.onchip_sram_0_s1_chipselect            (sram_chipselect),            
	.onchip_sram_0_s1_write                 (sram_write),                 
	.onchip_sram_0_s1_readdata              (sram_readdata),              
	.onchip_sram_0_s1_writedata             (sram_writedata),             
	  
	// AV Config
	.av_config_SCLK							(FPGA_I2C_SCLK),
	.av_config_SDAT							(FPGA_I2C_SDAT),

	// Audio Subsystem
//	.audio_pll_ref_clk_clk					(CLOCK3_50),
//	.audio_pll_ref_reset_reset				(1'b0),
//	.audio_clk_clk								(AUD_XCK),
//	.audio_ADCDAT								(AUD_ADCDAT),
//	.audio_ADCLRCK								(AUD_ADCLRCK),
//	.audio_BCLK									(AUD_BCLK),
//	.audio_DACDAT								(AUD_DACDAT),
//	.audio_DACLRCK								(AUD_DACLRCK),

	// Slider Switches
	//.slider_switches_export					(SW),

	// Pushbuttons (~KEY[3:0]),
	//.pushbuttons_export						(~KEY[3:0]),

	// Expansion JP1
	//.expansion_jp1_export					({GPIO_0[35:19], GPIO_0[17], GPIO_0[15:3], GPIO_0[1]}),

	// Expansion JP2
	//.expansion_jp2_export					({GPIO_1[35:19], GPIO_1[17], GPIO_1[15:3], GPIO_1[1]}),

	// LEDs
	//.leds_export								(LEDR),
	
	// Seven Segs
	//.hex3_hex0_export							(hex3_hex0),
	//.hex5_hex4_export							(hex5_hex4),
	
	// PS2 Ports
	//.ps2_port_CLK								(PS2_CLK),
	//.ps2_port_DAT								(PS2_DAT),
	//.ps2_port_dual_CLK						(PS2_CLK2),
	//.ps2_port_dual_DAT						(PS2_DAT2),

	// IrDA
	//.irda_RXD									(IRDA_RXD),
	//.irda_TXD									(IRDA_TXD),

	// VGA Subsystem
	.vga_pll_ref_clk_clk 					(CLOCK2_50),
	.vga_pll_ref_reset_reset				(1'b0),
	.vga_CLK										(VGA_CLK),
	.vga_BLANK									(VGA_BLANK_N),
	.vga_SYNC									(VGA_SYNC_N),
	.vga_HS										(VGA_HS),
	.vga_VS										(VGA_VS),
	.vga_R										(VGA_R),
	.vga_G										(VGA_G),
	.vga_B										(VGA_B),
	
	// Video In Subsystem
	.video_in_TD_CLK27 						(TD_CLK27),
	.video_in_TD_DATA							(TD_DATA),
	.video_in_TD_HS							(TD_HS),
	.video_in_TD_VS							(TD_VS),
	.video_in_clk27_reset					(),
	.video_in_TD_RESET						(),
	.video_in_overflow_flag					(),
	
	// External bus to Avalon bridge for the Video In subsystem
	.ebab_video_in_external_interface_address     (bus_addr),        // .address
	.ebab_video_in_external_interface_byte_enable (bus_byte_enable), // .byte_enable
	.ebab_video_in_external_interface_read        (bus_read),        // .read
	.ebab_video_in_external_interface_write       (bus_write),       // .write
	.ebab_video_in_external_interface_write_data  (bus_write_data),  // .write_data
	.ebab_video_in_external_interface_acknowledge (bus_ack),         // .acknowledge
	.ebab_video_in_external_interface_read_data   (bus_read_data),   // .read_data
	// clock bridge for EBAb_video_in_external_interface_acknowledge
	.clock_bridge_0_in_clk_clk                    (CLOCK_50),
		
	// SDRAM
	.sdram_clk_clk								(DRAM_CLK),
	.sdram_addr									(DRAM_ADDR),
	.sdram_ba									(DRAM_BA),
	.sdram_cas_n								(DRAM_CAS_N),
	.sdram_cke									(DRAM_CKE),
	.sdram_cs_n									(DRAM_CS_N),
	.sdram_dq									(DRAM_DQ),
	.sdram_dqm									({DRAM_UDQM,DRAM_LDQM}),
	.sdram_ras_n								(DRAM_RAS_N),
	.sdram_we_n									(DRAM_WE_N),
	
	
	// HPS Side
	
	// DDR3 SDRAM
	.memory_mem_a			(HPS_DDR3_ADDR),
	.memory_mem_ba			(HPS_DDR3_BA),
	.memory_mem_ck			(HPS_DDR3_CK_P),
	.memory_mem_ck_n		(HPS_DDR3_CK_N),
	.memory_mem_cke		(HPS_DDR3_CKE),
	.memory_mem_cs_n		(HPS_DDR3_CS_N),
	.memory_mem_ras_n		(HPS_DDR3_RAS_N),
	.memory_mem_cas_n		(HPS_DDR3_CAS_N),
	.memory_mem_we_n		(HPS_DDR3_WE_N),
	.memory_mem_reset_n	(HPS_DDR3_RESET_N),
	.memory_mem_dq			(HPS_DDR3_DQ),
	.memory_mem_dqs		(HPS_DDR3_DQS_P),
	.memory_mem_dqs_n		(HPS_DDR3_DQS_N),
	.memory_mem_odt		(HPS_DDR3_ODT),
	.memory_mem_dm			(HPS_DDR3_DM),
	.memory_oct_rzqin		(HPS_DDR3_RZQ),
		  
	// Ethernet
	.hps_io_hps_io_gpio_inst_GPIO35	(HPS_ENET_INT_N),
	.hps_io_hps_io_emac1_inst_TX_CLK	(HPS_ENET_GTX_CLK),
	.hps_io_hps_io_emac1_inst_TXD0	(HPS_ENET_TX_DATA[0]),
	.hps_io_hps_io_emac1_inst_TXD1	(HPS_ENET_TX_DATA[1]),
	.hps_io_hps_io_emac1_inst_TXD2	(HPS_ENET_TX_DATA[2]),
	.hps_io_hps_io_emac1_inst_TXD3	(HPS_ENET_TX_DATA[3]),
	.hps_io_hps_io_emac1_inst_RXD0	(HPS_ENET_RX_DATA[0]),
	.hps_io_hps_io_emac1_inst_MDIO	(HPS_ENET_MDIO),
	.hps_io_hps_io_emac1_inst_MDC		(HPS_ENET_MDC),
	.hps_io_hps_io_emac1_inst_RX_CTL	(HPS_ENET_RX_DV),
	.hps_io_hps_io_emac1_inst_TX_CTL	(HPS_ENET_TX_EN),
	.hps_io_hps_io_emac1_inst_RX_CLK	(HPS_ENET_RX_CLK),
	.hps_io_hps_io_emac1_inst_RXD1	(HPS_ENET_RX_DATA[1]),
	.hps_io_hps_io_emac1_inst_RXD2	(HPS_ENET_RX_DATA[2]),
	.hps_io_hps_io_emac1_inst_RXD3	(HPS_ENET_RX_DATA[3]),

	// Flash
	.hps_io_hps_io_qspi_inst_IO0	(HPS_FLASH_DATA[0]),
	.hps_io_hps_io_qspi_inst_IO1	(HPS_FLASH_DATA[1]),
	.hps_io_hps_io_qspi_inst_IO2	(HPS_FLASH_DATA[2]),
	.hps_io_hps_io_qspi_inst_IO3	(HPS_FLASH_DATA[3]),
	.hps_io_hps_io_qspi_inst_SS0	(HPS_FLASH_NCSO),
	.hps_io_hps_io_qspi_inst_CLK	(HPS_FLASH_DCLK),

	// Accelerometer
	.hps_io_hps_io_gpio_inst_GPIO61	(HPS_GSENSOR_INT),

	//.adc_sclk                        (ADC_SCLK),
	//.adc_cs_n                        (ADC_CS_N),
	//.adc_dout                        (ADC_DOUT),
	//.adc_din                         (ADC_DIN),

	// General Purpose I/O
	.hps_io_hps_io_gpio_inst_GPIO40	(HPS_GPIO[0]),
	.hps_io_hps_io_gpio_inst_GPIO41	(HPS_GPIO[1]),

	// I2C
	.hps_io_hps_io_gpio_inst_GPIO48	(HPS_I2C_CONTROL),
	.hps_io_hps_io_i2c0_inst_SDA		(HPS_I2C1_SDAT),
	.hps_io_hps_io_i2c0_inst_SCL		(HPS_I2C1_SCLK),
	.hps_io_hps_io_i2c1_inst_SDA		(HPS_I2C2_SDAT),
	.hps_io_hps_io_i2c1_inst_SCL		(HPS_I2C2_SCLK),

	// Pushbutton
	.hps_io_hps_io_gpio_inst_GPIO54	(HPS_KEY),

	// LED
	.hps_io_hps_io_gpio_inst_GPIO53	(HPS_LED),

	// SD Card
	.hps_io_hps_io_sdio_inst_CMD	(HPS_SD_CMD),
	.hps_io_hps_io_sdio_inst_D0	(HPS_SD_DATA[0]),
	.hps_io_hps_io_sdio_inst_D1	(HPS_SD_DATA[1]),
	.hps_io_hps_io_sdio_inst_CLK	(HPS_SD_CLK),
	.hps_io_hps_io_sdio_inst_D2	(HPS_SD_DATA[2]),
	.hps_io_hps_io_sdio_inst_D3	(HPS_SD_DATA[3]),

	// SPI
	.hps_io_hps_io_spim1_inst_CLK		(HPS_SPIM_CLK),
	.hps_io_hps_io_spim1_inst_MOSI	(HPS_SPIM_MOSI),
	.hps_io_hps_io_spim1_inst_MISO	(HPS_SPIM_MISO),
	.hps_io_hps_io_spim1_inst_SS0		(HPS_SPIM_SS),

	// UART
	.hps_io_hps_io_uart0_inst_RX	(HPS_UART_RX),
	.hps_io_hps_io_uart0_inst_TX	(HPS_UART_TX),

	// USB
	.hps_io_hps_io_gpio_inst_GPIO09	(HPS_CONV_USB_N),
	.hps_io_hps_io_usb1_inst_D0		(HPS_USB_DATA[0]),
	.hps_io_hps_io_usb1_inst_D1		(HPS_USB_DATA[1]),
	.hps_io_hps_io_usb1_inst_D2		(HPS_USB_DATA[2]),
	.hps_io_hps_io_usb1_inst_D3		(HPS_USB_DATA[3]),
	.hps_io_hps_io_usb1_inst_D4		(HPS_USB_DATA[4]),
	.hps_io_hps_io_usb1_inst_D5		(HPS_USB_DATA[5]),
	.hps_io_hps_io_usb1_inst_D6		(HPS_USB_DATA[6]),
	.hps_io_hps_io_usb1_inst_D7		(HPS_USB_DATA[7]),
	.hps_io_hps_io_usb1_inst_CLK		(HPS_USB_CLKOUT),
	.hps_io_hps_io_usb1_inst_STP		(HPS_USB_STP),
	.hps_io_hps_io_usb1_inst_DIR		(HPS_USB_DIR),
	.hps_io_hps_io_usb1_inst_NXT		(HPS_USB_NXT)
);


endmodule

//===========================================================================================================
//					helper modules 
//===========================================================================================================

//
module conv_1 (fmap, filter, partial_sums, clk_50); //, start, finish
	input clk_50;
	input signed [1:0] fmap[8][8]; //Input Image - 2 bit 8x8
	input filter [3][3]; //Input Filter - 1 bit 3x3 
	logic signed [1:0] fmap_padded[10][10]; // Padded input image to be 10x10
	logic signed [4:0] temp_sum[8][8];
	output logic signed [1:0] partial_sums[8][8]; 

	//pad to 10 by 10 to maintain size after convolving 
	always @(*) begin
		for (int i = 0; i<10; i++) begin //row
			for (int j =0; j<10; j++) begin //column	
				if ((i==0) || (i==9) || (j==0) || (j==9)) fmap_padded[i][j] = 2'b00; //border is 2'b00, i.e. 0 (not -1), so it adds nothing to the sum
				else fmap_padded[i][j] = fmap[i-1][j-1]; //blocking assignment in a combinational block
			end
		end
	end
	
	always @(*) begin
		for (int k = 1; k<9; k++) begin //row 
			for (int l=1; l<9; l++) begin //column
			//get 3 by 3 surrounded by pixel at fmap_padded[k][l], multiply by filter and add 
				temp_sum[k-1][l-1] = 
				 ((filter[0][0] ? fmap_padded[k-1][l-1] : -fmap_padded[k-1][l-1]) //top-left
				 + (filter[0][1] ? fmap_padded[k-1][l] : -fmap_padded[k-1][l])) //top-middle
				 
				 + ((filter[0][2] ? fmap_padded[k-1][l+1] : -fmap_padded[k-1][l+1]) //top-right
				 + (filter[1][0] ? fmap_padded[k][l-1] : -fmap_padded[k][l-1])) //middle-left
				 
				 + ((filter[1][1] ? fmap_padded[k][l] : -fmap_padded[k][l]) //middle-middle
				 + (filter[1][2] ? fmap_padded[k][l+1] : -fmap_padded[k][l+1])) //middle-right
				 
				 + ((filter[2][0] ? fmap_padded[k+1][l-1] : -fmap_padded[k+1][l-1]) //bottom-left
				 + (filter[2][1] ? fmap_padded[k+1][l] : -fmap_padded[k+1][l])) //bottom-middle
				 
				 + (filter[2][2] ? fmap_padded[k+1][l+1] : -fmap_padded[k+1][l+1]); //bottom-right
				//store temp sum in partial sum matrix 
				if (temp_sum[k-1][l-1] == 0) partial_sums[k-1][l-1] = 2'b11;
				else partial_sums[k-1][l-1] = (temp_sum[k-1][l-1]>>>4) ?  2'b11 : 2'b01; //load in 1 or -1
			end
		end
	end
endmodule 
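
As a cross-check for `conv_1` above, the same arithmetic can be modelled in software. The following is a minimal sketch in C, not part of the original project: the name `conv1_reference` and the macro `CONV1_SIDE` are hypothetical, pixels are assumed to be +1 or -1, and a filter bit of 1 is read as weight +1 and 0 as weight -1, as in the Verilog ternaries.

```c
#include <assert.h>

#define CONV1_SIDE 8 /* feature-map side length, as in conv_1 (hypothetical macro name) */

/* Software model of conv_1: 3x3 binary-weight convolution over an 8x8 map
 * with a zero border, followed by sign binarization.
 * fmap holds +1/-1 pixels; filter holds 1-bit weights (1 -> +1, 0 -> -1). */
void conv1_reference(int fmap[CONV1_SIDE][CONV1_SIDE], int filter[3][3],
                     int out[CONV1_SIDE][CONV1_SIDE])
{
    for (int r = 0; r < CONV1_SIDE; r++) {
        for (int c = 0; c < CONV1_SIDE; c++) {
            int sum = 0;
            for (int fr = 0; fr < 3; fr++) {
                for (int fc = 0; fc < 3; fc++) {
                    int rr = r + fr - 1;
                    int cc = c + fc - 1;
                    /* Positions outside the map are the zero border: skip them. */
                    if (rr < 0 || rr >= CONV1_SIDE || cc < 0 || cc >= CONV1_SIDE)
                        continue;
                    assert(fmap[rr][cc] == 1 || fmap[rr][cc] == -1);
                    /* Weight +1 keeps the pixel, weight -1 negates it. */
                    sum += filter[fr][fc] ? fmap[rr][cc] : -fmap[rr][cc];
                }
            }
            /* Zero and negative sums both map to -1, as in the Verilog. */
            out[r][c] = (sum > 0) ? 1 : -1;
        }
    }
}
```

Comparing the output of such a model with the simulated `partial_sums` values is one way to confirm the border handling and the treatment of a zero sum.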

module pool1 (pool_conv1, out_map, clk_50); //, start, finish
	input clk_50;
	input signed [1:0] out_map [16][8][8];
	output logic signed [1:0] pool_conv1 [16][4][4];
	integer h;
	
	//max pooling - check if any ones in each 2x2 window -- if yes, max = 1, if no max = -1 since out_map is binarized to 1/-1
	//(2'b01 & 2'b11 == 2'b01, so the AND of the four values is 2'b01 exactly when at least one of them is 1)
	always @(*) begin //posedge clk_50
		//if (start) begin
			for (h=0; h<16; h++) begin 
				for (int r=0; r<4; r++) begin //output row
					for (int c=0; c<4; c++) begin //output column
						pool_conv1[h][r][c] = ((out_map[h][2*r][2*c]&out_map[h][2*r][2*c+1]&out_map[h][2*r+1][2*c]&out_map[h][2*r+1][2*c+1])==2'b01) ? 2'b01 : 2'b11;
					end
				end
			end	
			//finish <= 1; 
		//end 
	end
endmodule 
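
The pooling modules rely on an encoding trick: with +1 stored as `2'b01` and -1 as `2'b11`, the bitwise AND of four values equals `2'b01` exactly when at least one of them is +1. The sketch below, in C with hypothetical helper names (`encode2`, `pool_and`, `pool_max`), compares that trick with an ordinary maximum; it is an illustration, not code from the project.

```c
#include <assert.h>

/* 2-bit two's-complement codes used by the Verilog feature maps. */
enum { CODE_POS = 0x1, /* 2'b01 = +1 */
       CODE_NEG = 0x3  /* 2'b11 = -1 */ };

/* Map a +1/-1 value to its 2-bit code. */
unsigned encode2(int v)
{
    assert(v == 1 || v == -1);
    return (v == 1) ? CODE_POS : CODE_NEG;
}

/* Pooling as written in pool1/pool2: AND the four codes, compare with 2'b01. */
int pool_and(int a, int b, int c, int d)
{
    unsigned anded = encode2(a) & encode2(b) & encode2(c) & encode2(d);
    return (anded == CODE_POS) ? 1 : -1;
}

/* Plain arithmetic maximum of four values, for comparison. */
int pool_max(int a, int b, int c, int d)
{
    int m = a;
    if (b > m) m = b;
    if (c > m) m = c;
    if (d > m) m = d;
    return m;
}
```

Looping over all sixteen +1/-1 combinations and checking that `pool_and` and `pool_max` agree shows that the AND form is a valid max pool for this encoding only; it would not hold if a 0 value (`2'b00`) reached the pooling stage.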

module conv_2 (fmap, filter, partial_sums, clk_50); //, start, finish
	input clk_50;
	input signed [1:0] fmap [4][4]; //Input Image - 2 bit 4x4
	input filter [3][3]; //Input Filter - 1 bit 3x3 
	logic signed [1:0] fmap_padded [6][6]; //Input feature map padded to 6x6
	//logic signed [4:0] temp_sum [4][4]; 
	
	output logic signed [4:0] partial_sums [4][4];
	
	//pad to 6 by 6 (zero border) to maintain size after convolving 
	always @(*) begin
		for (int i=0; i<6; i++) begin //row
			for (int j=0; j<6; j++) begin //column	
				if ((i==0) || (i==5) || (j==0) || (j==5)) fmap_padded[i][j] = 2'd0;
				else fmap_padded[i][j] = fmap[i-1][j-1]; 
			end
		end
	end
	always @(*) begin
		for (int k = 1; k<5; k++) begin //row 
			for (int l=1; l<5; l++) begin //column
				partial_sums[k-1][l-1] = 
				 ((filter[0][0] ? fmap_padded[k-1][l-1] : -fmap_padded[k-1][l-1]) //top-left
				 + (filter[0][1] ? fmap_padded[k-1][l] : -fmap_padded[k-1][l])) //top-middle
				 
				 + ((filter[0][2] ? fmap_padded[k-1][l+1] : -fmap_padded[k-1][l+1]) //top-right
				 + (filter[1][0] ? fmap_padded[k][l-1] : -fmap_padded[k][l-1])) //middle-left
				 
				 + ((filter[1][1] ? fmap_padded[k][l] : -fmap_padded[k][l]) //middle-middle
				 + (filter[1][2] ? fmap_padded[k][l+1] : -fmap_padded[k][l+1])) //middle-right
				 
				 + ((filter[2][0] ? fmap_padded[k+1][l-1] : -fmap_padded[k+1][l-1]) //bottom-left
				 + (filter[2][1] ? fmap_padded[k+1][l] : -fmap_padded[k+1][l])) //bottom-middle
				 
				 + (filter[2][2] ? fmap_padded[k+1][l+1] : -fmap_padded[k+1][l+1]); //bottom-right
			end
		end	
	end
endmodule 

module partial_sums (outmap_conv2d, partials_conv2, clk_50, start, finish); 
	input clk_50; 

	//sum each set of 16 partial sums and binarize the result to generate the final 32 output fmaps 
	input logic signed [4:0] partials_conv2 [16][32][4][4]; //Input Range from -9 to 9, so 5 bit
	output logic signed [1:0] outmap_conv2d [32][4][4];
	logic signed [9:0] temp_sum[32][4][4]; //
	integer a, b, c, d, e, f, g, h, i; 
	 
	input logic start; 
	output logic finish; 
	logic [2:0] state; 
	initial begin 
		state = 3'b0;
		finish = 0; 
		b = 0;
		i = 0;
	end
	
	always @ (posedge clk_50) begin
		if (start) begin
			if (state == 3'b0) begin
				for (a=0; a<32; a++) begin //columns
					//reset temp_sum to 0 
					for (g=0; g<4; g++) begin //iterate through all 4x4s and sum partial sums 
						for (h=0; h<4; h++) begin
							temp_sum[a][g][h] <= 10'd0;
						end 
					end
				end 
				state <= 3'd1;
			end 
			
			if (state == 3'd1) begin 
					for (a=0; a<32; a++) begin //rows
						for (c=0; c<4; c++) begin //iterate through all 4x4s and sum partial sums 
							for (d=0; d<4; d++) begin
								temp_sum[a][c][d] <= temp_sum[a][c][d] + partials_conv2[b][a][c][d];
							end 
						end
					end
				b<= b+1;  //iterate 16 times 
				if (b==15) state <= 3'd2;
			end 

			if (state == 3'd2) begin
				for (a=0; a<32; a++) begin //rows
					for (e=0; e<4; e++) begin //transfer sign bit from temporary sums to output fmaps 
						for (f=0; f<4; f++) begin
							if (temp_sum[a][e][f] == 0) outmap_conv2d[a][e][f] <= 2'b11;
							else outmap_conv2d[a][e][f] <= ((temp_sum[a][e][f])>>>8) ?  2'b11 : 2'b01; //store 1 or -1 based on sign bit 
						end 
					end
				end
				state <= 3'd3; 
			end 
			if (state == 3'd3) begin
				state <= 3'd3;
				finish <= 1;
			end 
		end
		else begin 
			state <= 3'd0;
			finish <= 0;
			b<=0;
			i<=0;
		end
	end
endmodule 

module pool2 (pool_conv2, outmap_conv2d, clk_50, start, finish); 
	input clk_50;	
	input logic signed [1:0] outmap_conv2d [32][4][4]; 
	output logic signed [1:0] pool_conv2 [32][2][2];

	integer g;
	input logic start; 
	output logic finish;
	
	initial begin
		finish <= 0;
		g = 0; //start from map 0 even if start is asserted before g is first cleared
	end 
	
	//max pooling - check if any ones in 2x2 square-- if yes, max = 1, if no max = -1 since outmap_conv2 binarized to 1/-1
	always @(posedge clk_50) begin 
		if (start) begin
			if (g<32) begin
				pool_conv2 [g][0][0] <= ((outmap_conv2d[g][0][0]&outmap_conv2d[g][0][1]&outmap_conv2d[g][1][0]&outmap_conv2d[g][1][1])==2'b01) ? 2'b01 : 2'b11;
				pool_conv2 [g][0][1] <= ((outmap_conv2d[g][0][2]&outmap_conv2d[g][0][3]&outmap_conv2d[g][1][2]&outmap_conv2d[g][1][3])==2'b01) ? 2'b01 : 2'b11;
				pool_conv2 [g][1][0] <= ((outmap_conv2d[g][2][0]&outmap_conv2d[g][2][1]&outmap_conv2d[g][3][0]&outmap_conv2d[g][3][1])==2'b01) ? 2'b01 : 2'b11;
				pool_conv2 [g][1][1] <= ((outmap_conv2d[g][2][2]&outmap_conv2d[g][2][3]&outmap_conv2d[g][3][2]&outmap_conv2d[g][3][3])==2'b01) ? 2'b01 : 2'b11;
				g <= g+ 1; 
			end
			else begin 
				finish <= 1; 
			end 
		end
		else begin 
			finish <= 0; 
			g<=0; 
		end 
	end
endmodule 



module fc1 (fmap, wa, clk_50,start, finish, fc_out);
	input logic signed [1:0] fmap [32][2][2]; //output from last pooling layer 
	input logic wa[128][32]; //input weights array
	output logic signed [1:0] fc_out [32]; 
	input logic clk_50;
	integer i, j, k, l; 
	logic signed [1:0] fmap_flat [128];
	logic [7:0] count; 
	logic [2:0] state;
	input logic start; 
	output logic finish; 
	initial begin 
		state <= 3'b0;
		finish <= 0; 
	end
	logic signed [8:0] temp [32];
	
	//flatten the 32 2x2 maps into a 128-entry array 
	always @(*)begin
		for (i=0; i<32; i++) begin
			fmap_flat[i]     = fmap[i][0][0];
			fmap_flat[i+32]  = fmap[i][0][1];
			fmap_flat[i+64]  = fmap[i][1][0];
			fmap_flat[i+96]  = fmap[i][1][1];

		end
	end
	
	always @ (posedge clk_50) begin 
		if (start) begin
			if (state == 3'b0) begin
				for (i=0; i<32; i++) begin
					temp[i] <= 0;
					j <= 0;
				end
				state <= 3'd1;
			end
			
			//calculate cumulative sum 
			if (state == 3'd1) begin
					for (k=0;k<32; k++) begin 
						temp[k] <= temp[k] + (wa[j][k] ? fmap_flat[j] : -fmap_flat[j] );
					end
				
				if (j==127) state <= 3'd2; //iterate 128 times 
				else j <= j+1;				
			end
			if (state == 3'd2) begin
				for (l=0; l<32; l++) begin
					if ( temp[l] == 0) fc_out[l] <= 2'b11;
					else fc_out[l] <= (temp[l]>>>8) ? 2'b11 : 2'b01; //binarize
				end
				state <= 3'd3;
			end 
			if (state == 3'd3) begin
				state <= 3'd3;
				finish <= 1; 
			end
		end
		else begin 
			state <= 3'b0;
			finish <= 0;
			j <= 0; 
			i<= 0; 
		end 
	end 
endmodule
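
The flattening order and the accumulate-then-binarize step of `fc1` above can also be modelled in software. This is a sketch under stated assumptions: the names `fc1_flat_index` and `fc1_reference` and the `FC1_*` macros are hypothetical, inputs are +1 or -1, and a weight bit of 1 means +1 while 0 means -1.

```c
#include <assert.h>

#define FC1_IN 128 /* 32 maps x 2 x 2 */
#define FC1_OUT 32

/* Flattening used by fc1: map ch, row r, column c -> ch + 32 * (2*r + c). */
int fc1_flat_index(int ch, int r, int c)
{
    assert(ch >= 0 && ch < 32 && r >= 0 && r < 2 && c >= 0 && c < 2);
    return ch + 32 * (2 * r + c);
}

/* Software model of fc1: one signed accumulation per output, then the sign
 * is kept, with zero and negative sums mapped to -1. */
void fc1_reference(int flat[FC1_IN], int wa[FC1_IN][FC1_OUT], int fc_out[FC1_OUT])
{
    for (int k = 0; k < FC1_OUT; k++) {
        int sum = 0;
        for (int j = 0; j < FC1_IN; j++) {
            assert(flat[j] == 1 || flat[j] == -1);
            sum += wa[j][k] ? flat[j] : -flat[j];
        }
        fc_out[k] = (sum > 0) ? 1 : -1;
    }
}
```

The sum of 128 values of magnitude 1 lies between -128 and 128, which fits in the 9-bit signed `temp` register used by the hardware.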

module ten_map (fmap, wa1, final_out, clk_50, start, finish);
	input logic clk_50;
	input logic signed [1:0] fmap [32];
	input logic wa1 [32][10];
	output logic signed [7:0] final_out [10];
	input logic start; 
	output logic finish; 
	//multiply matrices 1x32 x 32x10 = 1x10
	integer j, k; 
	logic [2:0] state; 
	initial begin 
		finish <=0; 
		state <= 3'b0;
	end 
	logic signed [7:0] temp[10];

	always @ (posedge clk_50) begin 
		if (start)  begin
			if (state == 3'b0) begin 
				for (k=0;k<10; k++) begin 
					temp[k] <=0;
					j <= 0;
				end 
				if (k==10) state <= 3'd1;
			end 
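			//calculate cumulative sum 
			if (state == 3'd1) begin 
				if (j < 32) begin //accumulate rows 0..31 only; wa1[32] and fmap[32] do not exist
					for (k=0;k<10; k++) begin 
						temp[k] <= temp[k] + (wa1[j][k] ? fmap[j] : - fmap[j]);
					end	
					j <= j+1;	
				end
				else begin //j == 32: all 32 products are accumulated, latch the result
					state <= 3'd2;
					for (k=0;k<10; k++) begin 
						final_out[k] <= temp[k];
					end
				end
			end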
			if (state == 3'd2) begin 
				finish <= 1; 
			end
		end 
		else begin 
			finish <=0; 
			state <= 3'b0;
			j<=0; 
		end 
	end
endmodule
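
The final layer in `ten_map` above keeps the raw signed sums instead of binarizing them, and the class is then chosen as the index of the largest sum. A minimal C sketch of that step follows; `fc2_reference`, `argmax10` and the `FC2_*` macros are hypothetical names, and the tie-breaking rule (lowest index wins) is an assumption chosen for the example.

```c
#include <assert.h>

#define FC2_IN 32
#define FC2_OUT 10

/* Final layer: scores[k] = sum over j of (w[j][k] ? x[j] : -x[j]).
 * No binarization is applied to the result. */
void fc2_reference(int x[FC2_IN], int w[FC2_IN][FC2_OUT], int scores[FC2_OUT])
{
    for (int k = 0; k < FC2_OUT; k++) {
        int sum = 0;
        for (int j = 0; j < FC2_IN; j++) {
            assert(x[j] == 1 || x[j] == -1);
            sum += w[j][k] ? x[j] : -x[j];
        }
        scores[k] = sum;
    }
}

/* Index of the largest score; ties resolve to the lowest index. */
int argmax10(const int scores[FC2_OUT])
{
    int best = 0;
    for (int k = 1; k < FC2_OUT; k++) {
        if (scores[k] > scores[best]) best = k;
    }
    return best;
}
```

Each score lies between -32 and 32, so the 8-bit signed `final_out` entries in the hardware have ample range.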



//HPS
///
/// 640x480 version!
/// test VGA with hardware video input copy to VGA
///

//gcc v1.c -o v1

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/ipc.h> 
#include <sys/shm.h> 
#include <sys/mman.h>
#include <sys/time.h> 
#include "address_map_arm_brl4.h"
#include <math.h>
#include <pthread.h>

/* function prototypes */
void VGA_text (int, int, char *);
void VGA_text_clear();
void VGA_box (int, int, int, int, short);
void VGA_line(int, int, int, int, short) ;
void VGA_disc (int, int, int, short);
int  VGA_read_pixel(int, int) ;
int  video_in_read_pixel(int, int);
void draw_delay(void) ;


// the lightweight bus base
void *h2p_lw_virtual_base;
volatile unsigned int *h2p_lw_video_in_control_addr=NULL;
volatile unsigned int *h2p_lw_video_in_resolution_addr=NULL;
//volatile unsigned int *h2p_lw_video_in_control_addr=NULL;
//volatile unsigned int *h2p_lw_video_in_control_addr=NULL;

volatile unsigned int *h2p_lw_video_edge_control_addr=NULL;

// pixel buffer
volatile unsigned int * vga_pixel_ptr = NULL ;
void *vga_pixel_virtual_base;

// video input buffer
volatile unsigned int * video_in_ptr = NULL ;
void *video_in_virtual_base;

// character buffer
volatile unsigned int * vga_char_ptr = NULL ;
void *vga_char_virtual_base;

// /dev/mem file id
int fd;

// shared memory 
key_t mem_key=0xf0;
int shared_mem_id; 
int *shared_ptr;
int shared_time;
int shared_note;
char shared_str[64];

// pixel macro
#define VGA_PIXEL(x,y,color) do{\
	char  *pixel_ptr ;\
	pixel_ptr = (char *)vga_pixel_ptr + ((y)<<10) + (x) ;\
	*(char *)pixel_ptr = (color);\
} while(0)
	
#define VIDEO_IN_PIXEL(x,y,color) do{\
	char  *pixel_ptr ;\
	pixel_ptr = (char *)video_in_ptr + ((y)<<9) + (x) ;\
	*(char *)pixel_ptr = (color);\
} while(0)
	

// measure time
struct timeval t1, t2;
double elapsedTime;
struct timespec delay_time ;

#define HPS_IMAGE_DATA_BASE  0x00000070
#define HPS_IMAGE_CLK_BASE   0x00000090
#define HPS_IMAGE_CS_BASE    0x00000080

#define OUT_DATA_BASE    0x00000120
#define OUT_CS_BASE      0x00000130

#define PIO_START_BASE    0x00000140
#define PIO_END_BASE      0x00000150
#define PIO_SWITCH_BASE   0x00000160
//1. Function to Read Input Image from File-------------------------------
//   and Send it to FPGA from HPS
volatile signed int * hps_image_data = NULL ;
volatile unsigned int * hps_image_clk = NULL ;
volatile unsigned int * hps_image_cs = NULL ;

volatile unsigned int * pio_start = NULL ;
volatile unsigned int * pio_end = NULL ;
volatile unsigned int * pio_switch = NULL ;

int image_matrix[8*8];
void toggle_image_clk (void){
  *hps_image_clk = 0;
  *hps_image_clk = 1;
  //sleep(1);
}
void load_input(void){
	//Initialize things in FPGA
	*hps_image_cs = 0;
	toggle_image_clk();
  *hps_image_cs = 1; //Set CS high
  //Open input_data.txt file to read output
	FILE *myFile;
	int c; // int, not char, so that EOF can be told apart from data bytes
  int counter = 0;
	int nFile;
	myFile = fopen("input_data.txt", "r");
  //myFile = fopen("input_test.txt", "r");
	if (myFile == NULL) {
		printf("Failed to open Input Image File \n");
		exit(1);
	}
	printf("Input Image File opened successfully \n");
	while ((c = getc(myFile)) != EOF){
		if ((c == '1')){
      *hps_image_data = 1;
      toggle_image_clk();
      image_matrix[counter] = 1;
      counter++;
			//printf("number#%d: %d \n", counter, 1);
      //if (counter%8 == 0) printf("\n");
		}
    else if ((c == '0')){
      *hps_image_data = -1;
      toggle_image_clk();
      image_matrix[counter] = 0;
      counter++;
      //printf("number#%d: %d \n", counter, -1);
      //if (counter%8 == 0) printf("\n");
    }
	}
  *hps_image_cs = 0;
  // Print output
  printf("%d input image is successfully loaded \n", counter);
  int i, offset;
  for (i = 0; i < 8; i++){
    offset = i * 8;
    printf("%d %d %d %d %d %d %d %d \n", image_matrix[offset]
    , image_matrix[offset+1], image_matrix[offset+2], image_matrix[offset+3]
    , image_matrix[offset+4], image_matrix[offset+5], image_matrix[offset+6]
    , image_matrix[offset+7]);
  }
	fclose(myFile);
}

volatile signed int * out_data = NULL ;
volatile unsigned int * out_cs = NULL ;

signed int final_out [10];

void read_output(void){
  //Initialize things in FPGA
 	*out_cs = 0;
	toggle_image_clk();

  //Set CS high
  *out_cs = 1;
  int i;

  for (i = 0; i < 10; i++){
        toggle_image_clk();
        final_out[i] = (signed int) (*out_data);
  }
  *out_cs = 0; 
}
int global_maxIdx, global_maxIdx2, global_maxIdx3;
float global_probability, global_probability2, global_probability3;
//Print Output -------------------------------
void print_output(void){
      printf("Negative Output \n");
      printf("%d %d %d %d %d %d %d %d %d %d\n", final_out[0], final_out[1], 
      final_out[2], final_out[3], final_out[4], final_out[5],
      final_out[6], final_out[7], final_out[8], final_out[9]);
      printf("Convert to Positive and Probability Computation \n");
      int i, sum_magnitude = 0, maxValue = -9999, maxIdx = 0;
      for (i = 0; i < 10; i++){
        if (final_out[i] > 127) final_out[i] = final_out[i] - 256;
        if (final_out[i] > 0) sum_magnitude += final_out[i];
        //else sum_magnitude -= final_out[i];
        // Extract max values in the list
        if (final_out[i] > maxValue) {
          maxValue = final_out[i];
          maxIdx = i;
        }
      }
      printf("%d %d %d %d %d %d %d %d %d %d\n", final_out[0], final_out[1], 
      final_out[2], final_out[3], final_out[4], final_out[5],
      final_out[6], final_out[7], final_out[8], final_out[9]);
      float probability, probability2, probability3;
      probability = (float) maxValue/(float)sum_magnitude;
      printf("Probability that it is #%d is %.3f\n", maxIdx, probability);
      int maxValue2 = -9999, maxIdx2 = 0;
      for (i = 0; i < 10; i++){
        // Extract second max values in the list
        if ((final_out[i] > maxValue2) && (i != maxIdx)) {
          maxValue2 = final_out[i];
          maxIdx2 = i;
        }
      }
      int maxValue3 = -9999, maxIdx3 = 0;
      for (i = 0; i < 10; i++){
        // Extract third max values in the list
        if ((final_out[i] > maxValue3) && (i != maxIdx) && (i != maxIdx2)) {
          maxValue3 = final_out[i];
          maxIdx3 = i;
        }
      }
      probability2 = (float) maxValue2/(float)sum_magnitude;
      probability3 = (float) maxValue3/(float)sum_magnitude;
      printf("Probability that it is #%d is %.3f\n", maxIdx2, probability2);
      printf("Probability that it is #%d is %.3f\n", maxIdx3, probability3);
      global_maxIdx = maxIdx;
      global_maxIdx2 = maxIdx2;
      global_maxIdx3 = maxIdx3;
      global_probability = probability;
      global_probability2 = probability2;
      global_probability3 = probability3;
}
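
Two details of `print_output` above are easy to get wrong: the 8-bit signed result arrives as a wider unsigned word and must be sign-corrected, and the ratio used as a "probability" divides by the sum of the positive scores, which can be zero. The sketch below isolates both steps. The names `sign_extend8` and `score_share` are hypothetical, masking to the low 8 bits is an assumption about the PIO width, and returning 0 for an empty sum is a choice made for this example rather than the original behaviour.

```c
#include <assert.h>

/* Interpret the low 8 bits of a PIO word as a signed two's-complement value.
 * Values 128..255 represent -128..-1. */
int sign_extend8(unsigned int raw)
{
    int v = (int)(raw & 0xFFu);
    return (v > 127) ? v - 256 : v;
}

/* Share of one score in the sum of the positive scores.
 * Returns 0 when there are no positive scores, to avoid dividing by zero. */
float score_share(int value, int positive_sum)
{
    assert(positive_sum >= 0);
    return (positive_sum > 0) ? (float)value / (float)positive_sum : 0.0f;
}
```

Note that this ratio is only a relative score, not a calibrated probability; a negative top score would produce a negative value.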

int control = 1;
double elapsedTime;
void *scan_thread(void * t){
		int input;
		float input_value;
		int input_dt;
    struct timeval t1, t2;
	while(1){
		printf("Note: \n");
		printf("Enter 0 to read and display output \n"); 
		printf("Enter 1 to 6 output prebuilt module \n");
		//printf("Enter 3 to restart drum on HPS \n");
		printf(">");
		scanf("%d", &input);
		while (input < 0 || input > 6){
			printf("Enter a number from 0 to 6 \n");
			printf(">");
			scanf("%d", &input);
		}
 
    if (input == 0){	//Enter 0 to read and display output
			read_output();
      print_output();
		}
		else if ((input >= 1)&&(input <= 6)){	//Enter 1-6 to set the switch, run the compute, then read and display output
      *pio_switch = input; 
			control = input;
      *pio_start = 0;
      *pio_start = 1; //Start compute
      gettimeofday(&t1, NULL);
      while (*pio_end != 1); //Wait until compute finish
      gettimeofday(&t2, NULL);
      elapsedTime = (t2.tv_sec - t1.tv_sec) * 1000.0;      // sec to ms
      elapsedTime += (t2.tv_usec - t1.tv_usec) / 1000.0;   // us to ms
      
      printf ("Compute time is %.3f ms\n", elapsedTime);
      read_output();
      print_output(); 
      VGA_text_clear();
      
		}
		else if (input == 7){	
      
		}
		printf("--------Done---------\n\n");
   
	}
}

int main(void)
{
  printf("Hello \n");
	delay_time.tv_nsec = 10 ;
	delay_time.tv_sec = 0 ;

	// Declare volatile pointers to I/O registers (volatile
	// means that IO load and store instructions will be used
	// to access these pointer locations,
	// instead of regular memory loads and stores) 
  	
	// === need to mmap: =======================
	// FPGA_CHAR_BASE
	// FPGA_ONCHIP_BASE      
	// HW_REGS_BASE        
  
	// === get FPGA addresses ==================
    // Open /dev/mem
	if( ( fd = open( "/dev/mem", ( O_RDWR | O_SYNC ) ) ) == -1 ) 	{
		printf( "ERROR: could not open \"/dev/mem\"...\n" );
		return( 1 );
	}
    
    // get virtual addr that maps to physical
	h2p_lw_virtual_base = mmap( NULL, HW_REGS_SPAN, ( PROT_READ | PROT_WRITE ), MAP_SHARED, fd, HW_REGS_BASE );	
	if( h2p_lw_virtual_base == MAP_FAILED ) {
		printf( "ERROR: mmap1() failed...\n" );
		close( fd );
		return(1);
	}/*
    h2p_lw_video_in_control_addr=(volatile unsigned int *)(h2p_lw_virtual_base+VIDEO_IN_BASE+0x0c);
	h2p_lw_video_in_resolution_addr=(volatile unsigned int *)(h2p_lw_virtual_base+VIDEO_IN_BASE+0x08);
	*(h2p_lw_video_in_control_addr) = 0x04 ; // turn on video capture
	*(h2p_lw_video_in_resolution_addr) = 0x00f00140 ;  // high 240 low 320
	h2p_lw_video_edge_control_addr=(volatile unsigned int *)(h2p_lw_virtual_base+VIDEO_IN_BASE+0x10);
	*h2p_lw_video_edge_control_addr = 0x01 ; // 1 means edges
	*h2p_lw_video_edge_control_addr = 0x00 ; // 1 means edges
	*/
 
	//New PIO
 
  hps_image_data = (signed int*)(h2p_lw_virtual_base + HPS_IMAGE_DATA_BASE);
  hps_image_clk = (unsigned int*)(h2p_lw_virtual_base + HPS_IMAGE_CLK_BASE);
  hps_image_cs = (unsigned int*)(h2p_lw_virtual_base + HPS_IMAGE_CS_BASE);
  
  out_data = (signed int*)(h2p_lw_virtual_base + OUT_DATA_BASE);
  out_cs = (unsigned int*)(h2p_lw_virtual_base + OUT_CS_BASE);
  
  pio_start = (unsigned int*)(h2p_lw_virtual_base + PIO_START_BASE);
  pio_end = (unsigned int*)(h2p_lw_virtual_base + PIO_END_BASE);
  pio_switch = (unsigned int*)(h2p_lw_virtual_base + PIO_SWITCH_BASE);

	// === get VGA char addr =====================
	// get virtual addr that maps to physical
	vga_char_virtual_base = mmap( NULL, FPGA_CHAR_SPAN, ( PROT_READ | PROT_WRITE ), MAP_SHARED, fd, FPGA_CHAR_BASE );	
	if( vga_char_virtual_base == MAP_FAILED ) {
		printf( "ERROR: mmap2() failed...\n" );
		close( fd );
		return(1);
	}
    
    // Get the address that maps to the character 
	vga_char_ptr =(unsigned int *)(vga_char_virtual_base);

	// === get VGA pixel addr ====================
	// get virtual addr that maps to physical
	// SDRAM
	vga_pixel_virtual_base = mmap( NULL, FPGA_ONCHIP_SPAN, ( PROT_READ | PROT_WRITE ), MAP_SHARED, fd, SDRAM_BASE); //SDRAM_BASE	
	
	if( vga_pixel_virtual_base == MAP_FAILED ) {
		printf( "ERROR: mmap3() failed...\n" );
		close( fd );
		return(1);
	}
    // Get the address that maps to the FPGA pixel buffer
	vga_pixel_ptr =(unsigned int *)(vga_pixel_virtual_base);
	
	
	// === get video input =======================
	// on-chip RAM
	video_in_virtual_base = mmap( NULL, FPGA_ONCHIP_SPAN, ( PROT_READ | PROT_WRITE ), MAP_SHARED, fd, FPGA_ONCHIP_BASE); 
	if( video_in_virtual_base == MAP_FAILED ) {
		printf( "ERROR: mmap4() failed...\n" );
		close( fd );
		return(1);
	}
	// format the pointer
	video_in_ptr =(unsigned int *)(video_in_virtual_base);
	
	// ===========================================

	/* create a message to be displayed on the VGA 
          and LCD displays */
	char text_top_row[40] = "DE1-SoC ARM/FPGA\0";
  char text_top_row_2[40] = "BNN Inference on FPGA";
  char text_top_row_1[40] = "By: Vidya and Xitang";
  char text_bottom_row[40] = "Cornell ece5760 - Bruce Land :D\0";
	char num_string[20], time_string[50] ;
	
	// a pixel from the video
	int pixel_color;
	// video input index
	int i,j;
	
	// clear the screen
	VGA_box (0, 0, 639, 479, 0x03);
	// clear the text
	VGA_text_clear();
	VGA_text (1, 56, text_top_row);
	VGA_text (1, 57, text_bottom_row);
	
	// start timer
    //gettimeofday(&t1, NULL);
	
  // Load input
  load_input();
  read_output();
  print_output(); 
  
  pthread_t threads;

  // thread attribute used here to allow JOIN
 	pthread_attr_t attr;

  /* Initialize mutex and condition variable objects */
	pthread_mutex_t run_mutex = PTHREAD_MUTEX_INITIALIZER;
	pthread_mutex_init(&run_mutex, NULL);
		
 	/* For portability, explicitly create threads in a joinable state */
 	pthread_attr_init(&attr);
 	pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
   	
	pthread_create(&threads, &attr, scan_thread, NULL);
  
  int pixel_read;
  struct timeval t3, t4;
  gettimeofday(&t3, NULL);
  int pixel_color_matrix[224][224];
  int sum_color_matrix[8][8];
  
  // 0 is black, 255 is white
  for (i=0; i<224; i++) {
    for (j=0; j<224; j++) {
       pixel_color_matrix[i][j] = 0;
	  }
  }
  //rand()%2 to get 0 or 1
  int temp = 0;
  for (i=0; i<8; i++) {
    for (j=0; j<8; j++) {
       sum_color_matrix[i][j] = image_matrix[temp++];//j; //Initialize as white 3, Greyscale 0-3, 0 is black
	  }
  }
  int x_offset, y_offset, grey_color, x_offset_end, y_offset_end;
  int start_idx_i, end_idx_i, start_idx_j, end_idx_j;

  //images to display on monitor 
  int matrix1[8][8] = {
				{-1,-1,-1,-1,-1,-1,-1,-1},
				{-1,-1,-1,-1, 1, 1,-1,-1},
				{-1,-1, 1, 1, 1, 1,-1,-1},
				{-1, 1, 1,-1,-1, 1,-1,-1},
				{-1, 1, 1, 1, 1,-1,-1,-1},
				{-1,-1, 1, 1,-1,-1,-1,-1},
				{-1,-1,-1,-1,-1,-1,-1,-1},
				{-1,-1,-1,-1,-1,-1,-1,-1}
				};
  int matrix2[8][8] = {
				{-1,-1,-1,-1,-1,-1,-1,-1},
				{-1,-1,-1, 1,-1,-1,-1,-1},
				{-1,-1,-1, 1,-1,-1,-1,-1},
				{-1,-1,-1, 1,-1,-1,-1,-1},
				{-1,-1,-1, 1,-1,-1,-1,-1},
				{-1,-1,-1, 1,-1,-1,-1,-1},
				{-1,-1,-1,-1,-1,-1,-1,-1},
				{-1,-1,-1,-1,-1,-1,-1,-1}
				};
  int matrix3[8][8] = {
				{-1,-1,-1,-1,-1,-1,-1,-1},
				{-1,-1, 1, 1, 1,-1,-1,-1},
				{-1,-1,-1,-1, 1,-1,-1,-1},
				{-1,-1,-1,-1, 1,-1,-1,-1},
				{-1,-1,-1, 1, 1,-1,-1,-1},
				{-1, 1, 1, 1, 1,-1,-1,-1},
				{-1,-1,-1,-1,-1,-1,-1,-1},
				{-1,-1,-1,-1,-1,-1,-1,-1}
				};
  int matrix4[8][8] = {
				{-1,-1,-1,-1,-1,-1,-1,-1},
				{-1,-1, 1,-1,-1,-1,-1,-1},
				{-1, 1, 1,-1, 1, 1,-1,-1},
				{-1,-1, 1, 1, 1,-1,-1,-1},
				{-1,-1,-1, 1, 1,-1,-1,-1},
				{-1,-1,-1, 1, 1,-1,-1,-1},
				{-1,-1,-1,-1,-1,-1,-1,-1},
				{-1,-1,-1,-1,-1,-1,-1,-1}
				};
  int matrix5[8][8] = {
				{-1,-1,-1,-1,-1,-1,-1,-1},
				{-1,-1, 1, 1,-1,-1,-1,-1},
				{-1,-1, 1,-1,-1,-1,-1,-1},
				{-1,-1, 1, 1, 1,-1,-1,-1},
				{-1,-1,-1,-1,-1, 1,-1,-1},
				{-1,-1, 1, 1, 1,-1,-1,-1},
				{-1,-1,-1,-1,-1,-1,-1,-1},
				{-1,-1,-1,-1,-1,-1,-1,-1}
				};
  int matrix6[8][8] = {
				{-1,-1,-1,-1,-1,-1,-1,-1},
				{-1,-1,-1,-1,-1,-1,-1,-1},
				{-1,-1, 1, 1, 1,-1,-1,-1},
				{-1,-1,-1,-1, 1,-1,-1,-1},
				{-1,-1,-1,-1, 1,-1,-1,-1},
				{-1,-1,-1,-1, 1,-1,-1,-1},
				{-1,-1,-1,-1,-1,-1,-1,-1},
				{-1,-1,-1,-1,-1,-1,-1,-1}
				};
  int matrix7[8][8] = {
				{-1,-1,-1,-1,-1,-1,1,-1},
				{-1,-1,-1,-1,1,1,-1,-1},
				{-1,-1,-1,-1,1,-1,-1,-1},
				{-1,-1,-1,1,-1,-1,-1,-1},
				{-1,-1,1,1,-1,-1,-1,-1},
				{-1,-1,1,-1,-1,-1,-1,-1},
				{-1,-1,-1,-1,-1,-1,-1,-1},
				{-1,-1,-1,-1,-1,-1,-1,-1}
				};
	while(1) 
	{
		gettimeofday(&t1, NULL);
		 
		// note that this version of VGA_disc
		// has THROTTLED pixel write
		
		// software copy test.
		// in production, hardware does the copy
		// put a few  pixel in input buffer
		 //VIDEO_IN_PIXEL(160,120,0xff);
		 //VIDEO_IN_PIXEL(0,0,0xff);
		 //VIDEO_IN_PIXEL(319,239,0xff);
		 //VIDEO_IN_PIXEL(300,200,0xff);
		
    // Video input is 224 by 224 
		// read/write video input -- copy to VGA display
    // Copy over every 2s
     gettimeofday(&t4, NULL);
     if ((t4.tv_sec - t3.tv_sec) > 2){
       //VGA_disc((rand()&0x3ff), (rand()&0x1ff), rand()&0x3f, rand()
