A Language Independent Technique for Tracking Source Code Lines
Motivating Example
Our paper on LHDiff has been accepted as a full technical paper at ICSM 2013. In it we explain the technique in detail and compare it with other state-of-the-art line tracking techniques using different evaluation methods. The data and code used in our experiment are available for download. If you want to use them to replicate the study or in a different study, please feel free to contact us.
No More Waiting: The LHDiff Tool Is Now Available to Download
A command line version of the tool is now available to download. Please download the jar file and run it from the command line using the following instruction:
java -jar lhdiff.jar
The tool requires the Java Runtime Environment, which can be downloaded from here.
The following command line options are available in LHDiff:
Option | Description | Default Setting |
-i | Ignore case differences | disabled |
-k | The size of mapping candidate set | 15 |
-p | Context weight (0<CXW<1) and threshold value (0<TH<1) for the combined similarity score used in Step 4. The content weight is automatically set to 1-CXW | 0.4 and 0.45 |
-cnm | Line content similarity metric | Levenshtein |
-cxm | Line context similarity metric | Cosine |
-cxs | Context size | 4 |
-ls | Detect line split | disabled |
-ob | Display both line number and content | display only line number |
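To make the -p, -cnm and -cxm options more concrete, the following is a minimal Java sketch of how a combined similarity score could be computed from the defaults in the table. It is not taken from the LHDiff source: the class and method names are hypothetical, and the normalization of the Levenshtein distance, the whitespace tokenization of the context lines, the weighted-sum form of the combined score and the "greater than or equal to" threshold test are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; names are hypothetical and not taken from LHDiff's source. */
final class CombinedSimilaritySketch {

    /** Standard dynamic-programming Levenshtein edit distance. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // reuse the two rows
        }
        return prev[b.length()];
    }

    /** Content similarity in [0, 1]; dividing by the longer length is an assumed normalization. */
    static double contentSimilarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    /** Counts whitespace-separated tokens over a list of context lines (assumed tokenization). */
    static Map<String, Integer> tokenCounts(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            for (String tok : line.trim().split("\\s+")) {
                if (!tok.isEmpty()) counts.merge(tok, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Context similarity: cosine between the token-count vectors of the neighbouring lines. */
    static double contextSimilarity(List<String> ctxA, List<String> ctxB) {
        Map<String, Integer> fa = tokenCounts(ctxA);
        Map<String, Integer> fb = tokenCounts(ctxB);
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : fa.entrySet()) {
            na += e.getValue() * e.getValue();
            Integer other = fb.get(e.getKey());
            if (other != null) dot += e.getValue() * other;
        }
        for (int v : fb.values()) nb += v * v;
        return (na == 0 || nb == 0) ? 0.0 : dot / Math.sqrt(na * nb);
    }

    /** Assumed weighted sum: the context weight is cxw, the content weight is 1 - cxw. */
    static double combined(double content, double context, double cxw) {
        return cxw * context + (1.0 - cxw) * content;
    }

    /** Assumed acceptance rule: keep the mapping when the combined score reaches the threshold. */
    static boolean isMatch(double content, double context, double cxw, double threshold) {
        return combined(content, context, cxw) >= threshold;
    }
}
```

With the default values, a call such as `CombinedSimilaritySketch.isMatch(content, context, 0.4, 0.45)` would weight content at 0.6 and context at 0.4 under these assumptions. The context lists would hold the lines around each candidate line, with the number of lines governed by the context size (-cxs).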
Usage Example
usage: java -jar lhdiff.jar [-i] [-k candidateSetSize] [-p contextWeight Threshold] [-cnm contentMetric] [-cxm contextMetric] [-cxs contextSize] [-ls lineSplit] [-ob outputBoth] oldfile newfile
You can also type help on the command line to learn more about the options available within LHDiff.
usage: java -jar lhdiff.jar help
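For callers that want to drive the jar from another Java program, the usage line above can be assembled programmatically. The sketch below is illustrative only: the class and method names are hypothetical, only the -i, -k and -p options are covered, and it assumes that -p takes the context weight and the threshold as two separate arguments, as the usage line suggests. Note that every call to run starts a new JVM process.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; not part of the LHDiff distribution. */
final class LhdiffCommandSketch {

    /** Builds the argument list following the documented usage line. */
    static List<String> buildCommand(String jarPath, boolean ignoreCase, int candidateSetSize,
                                     double contextWeight, double threshold,
                                     String oldFile, String newFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add(jarPath);
        if (ignoreCase) cmd.add("-i");
        cmd.add("-k");
        cmd.add(Integer.toString(candidateSetSize));
        // Assumption: -p takes the two values as separate arguments.
        cmd.add("-p");
        cmd.add(Double.toString(contextWeight));
        cmd.add(Double.toString(threshold));
        cmd.add(oldFile);
        cmd.add(newFile);
        return cmd;
    }

    /** Runs the command, forwarding the tool's output to this process, and returns the exit code. */
    static int run(List<String> cmd) throws IOException, InterruptedException {
        return new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

For example, `LhdiffCommandSketch.buildCommand("lhdiff.jar", true, 15, 0.4, 0.45, "Old.java", "New.java")` yields the arguments for a case-insensitive run with the default candidate set size, context weight and threshold; the file names here are placeholders.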
I have been using LDiff in my research, and it is relatively slow in practice. It seems that LHDiff could be a great replacement for me.
The problem is that my code is in C, and because I have to call LDiff/LHDiff many times (millions of times), I cannot afford to create a process to run LDiff/LHDiff each time. So I basically re-implemented LDiff in C in my tool. Now I would like to replace LDiff with LHDiff. Do you have any suggestions to reduce my work?
Thanks!
Hi Meng,
I am happy that you find the tool interesting. I have the LHDiff code implemented in Java. If you want to use LHDiff in your work, I can share that with you. Running the tool from the command line is very slow compared to running the Java code directly. You could probably port the code to C fairly easily.
But I don't currently have an implementation in C. Thanks.