I’ve built Assimp 4.1.0 Windows x64 release artifacts (lib/dlls), using VS 2017 (should work for VS 2015 but untested) and put up the libraries.
It is available at – https://github.com/tharindu-mathew/assimp/releases/tag/v4.1.0-Win-x64
Building Caffe on Windows 10 has been a journey (to put it lightly). Since I have a current-generation graphics card (new for 2018), it does not support CUDA 8.0, so all the posts saying you need CUDA 8.0 are outdated (at least for me). And with CUDA 9.0, Windows Caffe does not compile out of the box.
The good news: after a couple of days of trying, I’ve figured out a workaround. The only Boost versions that support CUDA 9.0 as of now are 1.65.1 and above, but interestingly CMake breaks with Boost 1.66.0. I know, welcome to the real-world versioning hell of actually building stuff.
So, if you got your Windows source from: https://github.com/BVLC/caffe/tree/windows
You need to do a couple of things. First, download and install Boost 1.65.1 to some path. Let’s call this root directory my_boost_1_65_1 (typically C:\local\boost_1_65_1); the library directory depends on which VS build you downloaded (typically C:\local\boost_1_65_1\lib64-msvc-14.0 for VS 2015). Yes, it sucks that the MSVC toolset version is 14.0 for VS 2015, but such is life in a Microsoft world.
I assume you have checked out the 1.0 version of windows caffe. Now, open the build_windows.cmd in the scripts directory, and modify the cmake command as follows (note the 3 lines referring to boost and boost paths):
cmake -G"!CMAKE_GENERATOR!" ^
-DBLAS=Open ^
-DCMAKE_BUILD_TYPE:STRING=%CMAKE_CONFIG% ^
-DBUILD_SHARED_LIBS:BOOL=%CMAKE_BUILD_SHARED_LIBS% ^
-DBUILD_python:BOOL=%BUILD_PYTHON% ^
-DBUILD_python_layer:BOOL=%BUILD_PYTHON_LAYER% ^
-DBUILD_matlab:BOOL=%BUILD_MATLAB% ^
-DCPU_ONLY:BOOL=%CPU_ONLY% ^
-DCOPY_PREREQUISITES:BOOL=1 ^
-DINSTALL_PREREQUISITES:BOOL=1 ^
-DUSE_NCCL:BOOL=!USE_NCCL! ^
-DCUDA_ARCH_NAME:STRING=%CUDA_ARCH_NAME% ^
-DBOOST_NO_BOOST_CMAKE=TRUE ^
-DBOOST_ROOT=C:/boost_1_65_1 ^
-DBOOST_LIBRARYDIR=C:/boost_1_65_1/lib64-msvc-14.0 ^
"%~dp0\.."
I make the following edit as well, so that the build fails fast if the Boost version is wrong. Edit the Boost version in Dependencies.cmake (located in the cmake directory):
# ---[ Boost
find_package(Boost 1.65 REQUIRED COMPONENTS system thread filesystem)
list(APPEND Caffe_INCLUDE_DIRS PUBLIC ${Boost_INCLUDE_DIRS})
list(APPEND Caffe_DEFINITIONS PUBLIC -DBOOST_ALL_NO_LIB)
list(APPEND Caffe_LINKER_LIBS PUBLIC ${Boost_LIBRARIES})
And by changing these two files, everything should be good. Now execute scripts\build_windows.cmd and watch your build succeed. The build emits plenty of warnings like the ones below, but with 0 errors it has succeeded:
CUSTOMBUILD : warning : cannot resolve item 'api-ms-win-crt-heap-l1-1-0.dll' [D:\tools\caffe\build\tools\upgrade_solver_proto_text_install_prerequisites.vcxproj]
CUSTOMBUILD : -- warning : gp_resolved_file_type non-absolute file 'api-ms-win-crt-heap-l1-1-0.dll' returning type 'system' -- possibly incorrect [D:\tools\caffe\build\tools\upgrade_solver_proto_text_install_prerequisites.vcxproj]
CUSTOMBUILD : warning : cannot resolve item 'api-ms-win-crt-stdio-l1-1-0.dll' [D:\tools\caffe\build\tools\upgrade_solver_proto_text_install_prerequisites.vcxproj]
CUSTOMBUILD : -- warning : gp_resolved_file_type non-absolute file 'api-ms-win-crt-stdio-l1-1-0.dll' returning type 'system' -- possibly incorrect [D:\tools\caffe\build\tools\upgrade_solver_proto_text_install_prerequisites.vcxproj]
CUSTOMBUILD : warning : cannot resolve item 'api-ms-win-crt-convert-l1-1-0.dll' [D:\tools\caffe\build\tools\upgrade_solver_proto_text_install_prerequisites.vcxproj]
CUSTOMBUILD : -- warning : gp_resolved_file_type non-absolute file 'api-ms-win-crt-convert-l1-1-0.dll' returning type 'system' -- possibly incorrect [D:\tools\caffe\build\tools\upgrade_solver_proto_text_install_prerequisites.vcxproj]
6897 Warning(s)
0 Error(s)
Time Elapsed 00:14:44.49
Let me know in the comments if you run into more issues.
CUDA is fast but painful to debug. It’s similar to working with OpenGL: brilliant results when it works, but you have no idea what’s going on when it doesn’t. Below I list a number of ways to track down issues in your CUDA algorithms. Hopefully it will ease some of the pain I had to go through.
This step seems rather obvious, and Nsight gets installed when you install CUDA. But, surprisingly, it’s not obvious to a beginner how or why to use it. If you are using Visual Studio and are having problems with your CUDA algorithm, follow these steps to start debugging. Make sure the project is built in “Debug” mode. After building it (don’t run it), open the Nsight menu and click CUDA Debugging. Now you should be able to conveniently place breakpoints within your CUDA kernels, and they will actually get hit. Also, look at the Nsight output in your output window, and watch out for error codes.
Always check for memory access violations. Click on the Nsight menu, make sure “Enable CUDA Memory Checker” is checked, and follow the steps under point 1 to debug your application. If there are memory access violations, stop right there! This is the first thing you should fix. Even if your algorithm runs and you are getting some results, plenty of subtle bugs can be lurking when memory access violations happen. A common cause is threads indexing your arrays out of bounds. So you need to stop those threads from proceeding by returning after an index range check, like below:
int x_index = blockDim.x * blockIdx.x + threadIdx.x;
int y_index = blockDim.y * blockIdx.y + threadIdx.y;
if ((x_index >= cols) || (y_index >= rows)) {
    return;
}
Make yourself familiar with the CUDA runtime error codes. Nsight will sometimes give output such as “Program hit error 9 on execution”. What you have to do then is look the error code up in the documentation. Let’s look it up here – http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html#group__CUDART__TYPES_1g3f51e3575c2178246db0a94a430e0038. Aha! Now we know what error 9 means. It says: “This indicates that a kernel launch is requesting resources that can never be satisfied by the current device. Requesting more shared memory per block than the device supports will trigger this error, as will requesting too many threads or blocks. See cudaDeviceProp for more device limitations.” We probably asked the kernel to use 100000 threads per block, or something to that effect, which is beyond what the device supports per block. Now we know to check the values we are passing and adjust them.
This is something that I found extremely helpful. Here’s a simple C++ snippet I use:
#include <chrono>
#include <iostream>

// Aliases the snippet assumes:
using Clock = std::chrono::steady_clock;
using std::chrono::milliseconds;

Clock::time_point t0 = Clock::now();
CUDA_segment(pre_segmentation_img, post_segmentation_img, vis_img);
Clock::time_point t1 = Clock::now();
milliseconds ms = std::chrono::duration_cast<milliseconds>(t1 - t0);
std::cout << "Time taken for segmentation: " << ms.count() << "ms\n";
In addition to telling you your execution time, which probably matters since you are trying to use CUDA, it also tells you if your CUDA execution failed. If you are getting a run time like 1 ms for something that would usually take about 500 ms, hold your enthusiasm. Your algorithm didn’t suddenly become super fast; your CUDA code probably hit an error and exited early.
If there is a problem with your algorithm and you need to understand why it’s failing, try simplifying your kernel execution to a single thread. This allows you to set aside the complexity of parallel execution and debug it like a single-threaded application. Just use a block count of 1 and 1 thread per block. Also, make any additional modifications to your kernel code so that it takes the same path every time you debug; i.e., if you’re processing an image, make sure it operates on the same sequence of pixels by hard-coding the x and y indices (x_index = 200, y_index = 200).
convert_2_closest_color <<<1, 1>>> (cuda_img, valid_colors_);
After following step 3, I prefer to use a lot of printfs for debugging. This lets me execute the code in “Release” mode and see exactly what is going wrong at full execution speed.
NOTE: Make sure you disable all the printfs through a macro before using this code in production.
Even with debugging, the data structures you use are hard to inspect because of the massive parallelism inherent to CUDA. Try writing out the effects of the intermediate steps of your algorithm by doing a cudaMemcpy from device to host. I usually write the data out into CSV files or image files and check the output for any issues I can see. If you can visualize the data, you will notice a lot of issues that result from errors in your code.
I hope this helped to ease some of the pain you are suffering while programming CUDA. Don’t get me wrong, I love CUDA, and I truly love the execution times it gives my algorithms. But debugging is quite a process and takes some getting used to 🙂
I’ve been dabbling in a bit of C for some extremely optimized code, and it has served me well. I’ve also learnt some lessons. Here is one I learnt regarding memset.
I had a large array of floats.
int great_array_size = 100000;
float *the_great_array = (float *) malloc(sizeof(float) * great_array_size);
Now, I wanted a fast way of initializing the array to a specific value. So, without thinking too much, I used memset. I didn’t know at the time that it is meant for filling raw byte buffers, such as strings.
float great_initial_value = 0.005f;
memset(the_great_array, great_initial_value, great_array_size);
Instead of fast initialization, what I got was a world of hurt. memset converts the value you pass to an unsigned char and writes that single byte into every byte of the buffer. Here 0.005f is first truncated to the integer 0, so every byte becomes 0. memset sets bytes, not floats; there is no way to splat a float value like 0.005f with it.
The correct way to initialize a float array is the obvious way.
for (size_t i = 0; i < great_array_size; ++i) {
    the_great_array[i] = great_initial_value;
}
Sigh. It seems easy now that I know what happens. Oh, well.
The difference between malloc and new is subtle, but important if you are mixing C and C++. malloc only allocates the memory needed for your object. new allocates the memory and also calls your constructor, executing any code in it.
The same difference applies to free and delete: free just releases the memory, while delete calls the destructor first.
Here’s a code example.
#include <cstdlib>
#include <iostream>

struct MyClass {
    int property = 0;

    MyClass() {
        property = 10;
    }

    ~MyClass() {
        std::cout << "Object destructor called" << std::endl;
    }
};

int main(int argc, char** argv) {
    MyClass *my_class_malloc = (MyClass*) malloc(sizeof(MyClass)); // just allocated memory, no constructor
    std::cout << "Property : " << my_class_malloc->property << std::endl; // uninitialized: garbage
    MyClass *my_class_new = new MyClass(); // calls constructor and sets property to 10
    std::cout << "Property : " << my_class_new->property << std::endl;
    std::cout << "Calling free..." << std::endl;
    free(my_class_malloc); // no destructor called
    std::cout << "Calling delete..." << std::endl;
    delete my_class_new; // destructor runs
}
On this post I maintain a list of plugins I use with Visual Studio.
If you are used to Ctrl+Click navigation from Eclipse or IntelliJ IDEA, you will love this set of tools. It has tons of useful tweaks that make Visual Studio much easier to use.
Available at: https://visualstudiogallery.msdn.microsoft.com/dbcb8670-889e-4a54-a226-a48a15e4cace
I’m a big vim fan for its ease of navigation, fast editing and movement. If you are too, this will be a life saver, as you will never have to use the arrow keys or the mouse for navigation.
Available at: https://visualstudiogallery.msdn.microsoft.com/59ca71b3-a4a3-46ca-8fe1-0e90e3f79329
This feature is almost expected of IDEs and is one of the primary reasons we use them for complex projects. Unfortunately, this plugin only supports renaming as a refactoring option. I really miss the Extract Method refactoring that was available in other IDEs before I switched to Visual C++ for my work. Anyway, it’s better than nothing.
Available at: https://visualstudiogallery.msdn.microsoft.com/164904b2-3b47-417f-9b6b-fdd35757d194
4. Image Watch (as mentioned by Chris May)
This is a really cool plugin if you are working with OpenCV. It gets rid of the std::cout statements you would otherwise need to see the contents of matrices, and makes working with images a pleasure rather than a pain.
Available at: https://visualstudiogallery.msdn.microsoft.com/e682d542-7ef3-402c-b857-bbfba714f78d
Let me know if there are any other plugins that you find useful for coding C++ in Visual Studio.
Disclaimer: This solution worked for Visual Studio 2013 with Qt 5.3
If your Qt program runs fine but Visual Studio Intellisense does not detect Qt-specific types, one possible reason is that the necessary header files are not being found. If you go to Project Properties -> C/C++ -> Additional Include Directories, you will notice several include entries defined using QTDIR as an environment variable.
I’m sure you have figured out the solution by now. You have to define this system variable to make Intellisense work. Add a new system variable called QTDIR and point it to the Qt installation of the version you are working with, e.g. C:\Qt\5.3\msvc2012_opengl. This should solve your problem.
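For example, from a Windows command prompt (adjust the path to match your own Qt install; newly set variables are only picked up by processes started afterwards, so restart Visual Studio):

```shell
:: Persist QTDIR for the current user
setx QTDIR "C:\Qt\5.3\msvc2012_opengl"
```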
Note: There seems to be other problems that can cause Intellisense to not work with Qt, and they are pointed out in: http://qt-project.org/faq/answer/intellisense_does_not_work_for_my_qt_application._whats_wrong
I was following the excellent tutorial from Alessio on cross-compiling to Raspberry Pi from Windows 7. Then I hit the dreaded ‘Error 127’ from make. After hours of searching I couldn’t find a solution. Then Bryan mentioned that you need to install the 32-bit Cygwin version, and that indeed works.
If you already installed the 64-bit Cygwin version like me and are wondering how to install the 32-bit one, here are some steps:
Another error I ran across is this:
Could not determine GDB version after sending: C:\cygwin32\opt\cross\x-tools\arm-unknown-linux-gnueabi\bin\arm-unknown-linux-gnueabi-gdb.exe --version
If you ran across this, you installed Python 2.7 instead of 2.6. Re-run the Cygwin setup, change the Python version to 2.6.x, and install it. Re-run your program and it should work.
I wrote this in the hope it saves someone else a ton of time. HTH.
This post discusses best practices when programming with the WSO2 Carbon platform, which is the base for all WSO2 products.
Here are the main points discussed in this post:
These points are discussed in detail below, giving reasons and a HOWTO for each. Hope you find the details useful, as this is a long (and probably boring) read.
I personally prefer unit tests, but we have an integration test framework as well as a system test framework (Clarity). Make sure you have tests that cover most, if not all, functionality. Features should not be considered complete without test coverage.
If you find improvements to the points discussed, please do leave a comment and I will incorporate them into the post.
This post should help a lot of people starting out with jQuery and JavaScript. For ease of reference, I have borrowed code from the jQuery ajax docs.
The standard way in jQuery to make an ajax call is by using,
$.ajax({
  type: 'POST',
  url: url,
  data: data,
  success: success,
  dataType: dataType
});
This gives you an infinite number of tweaks that are available (all documented here). But the truth is, 90% of the time you do not need these extra options. So here are a few shortcut functions that let us write even less with jQuery.
Let’s take an example use of $.post:
$.post('bam_data_processor.jsp', 'processAll=true', function(data) {
  $('#hadoop_result').html(data);
});
This is shorthand for,
$.ajax({
  type: 'POST',
  url: 'bam_data_processor.jsp',
  data: 'processAll=true',
  success: function(data) {
    $('#hadoop_result').html(data);
  }
});
Now, let’s take an example use of $.get:
$.get('bam_stored_configs.jsp', function(data) {
  $('#hadoop_result').html(data);
}, "json");
$.get is shorthand for
$.ajax({
  url: 'bam_stored_configs.jsp',
  success: function(data) {
    $('#hadoop_result').html(data);
  },
  dataType: "json"
});
An even shorter method for dealing with JSON is:
$.getJSON('bam_stored_configs.jsp', function(data) {
  $('#hadoop_result').html(data);
});
which is shorthand for:
$.get('bam_stored_configs.jsp', function(data) {
  $('#hadoop_result').html(data);
}, "json");
It’s really great that the jQuery devs introduced conveniences like these so we can write even less code. HTH.