Hello, and welcome back. In this blog post, I would like to share my week 8 progress on the RPMLint project for Google Summer of Code with OpenSUSE.
As mentioned in the previous blog post, I started to work on DuplicatesCheck.py. Here is a detailed overview of my progress.
Progress [########……..]
For this week, I decided to focus on DuplicatesCheck.py. However, I soon realized that the tests required some additional capabilities of the FakePkg class.
Here is the current interface of FakePkg that we use to create a mockPkg:
|
|
Using the header argument alone doesn’t provide all the required information for some tests. Certain tests require much more information, which cannot be directly passed through the header parameter.
For the current discussion of DuplicatesCheck, the test function I am considering is test_unexpanded_macros in the file test_files.py. For this test, we need MD5 hash values: the hashes of the files contained in a binary RPM file.
To learn more about MD5 follow this wiki link
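To make the idea concrete, here is a minimal sketch of how a file's MD5 hash can be computed with Python's standard hashlib module. The function names md5_of_bytes and md5_of_file are my own, chosen for illustration; they are not taken from rpmlint, which may compute its hashes differently.

```python
import hashlib


def md5_of_bytes(data: bytes) -> str:
    """Return the MD5 digest of data as a 32-character hex string."""
    return hashlib.md5(data).hexdigest()


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # read() returns b"" at end of file, which stops the iteration
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files with identical content always produce the same digest, which is what makes the hash usable as a key for spotting duplicates.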
For example, consider the test function:
|
|
Here, the binary file we are checking is test/binary/duplicates-0-0.x86_64.rpm, and every file in it is hashed with MD5 so that duplicate files can be identified by their matching hashes. The sample output below comes from an RPM command that lists all the files in a binary RPM file:
|
|
After hashing these files with MD5 during the pytest run, I obtained their hash values. Because some of the files are duplicates, they produce identical hash values, which are stored in a key-value data structure (a dictionary). See the output sample below:
|
|
As shown, 9 files are hashed, and some of them share the same hash value. The rpm query, however, lists a total of 11 files. The difference arises because rpmlint ignores very small files: the minimum file size is defined in the configuration file, and any file smaller than that limit is skipped.
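The grouping and the size filter described above can be sketched roughly as follows. This is my own simplified illustration, not the actual code in DuplicatesCheck.py: the function name group_duplicates, the tuple layout, and the MIN_SIZE value of 100 bytes are all assumptions, and the real threshold comes from rpmlint's configuration.

```python
from collections import defaultdict

# Hypothetical threshold in bytes; the real value is read from configuration.
MIN_SIZE = 100


def group_duplicates(files, min_size=MIN_SIZE):
    """Map each MD5 hash to the paths that share it, skipping small files.

    files is an iterable of (path, size_in_bytes, md5_hex) tuples.
    Only hashes shared by two or more paths are returned.
    """
    by_hash = defaultdict(list)
    for path, size, md5_hex in files:
        if size < min_size:
            continue  # below the size limit, so the file is ignored
        by_hash[md5_hex].append(path)
    return {h: sorted(paths) for h, paths in by_hash.items() if len(paths) > 1}
```

With a filter like this, small files never reach the dictionary at all, which would explain why fewer files are hashed than the rpm query lists.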
And yes, I found these hash values using the Python Debugger (Pdb). At runtime, they are stored in a variable named md5s in the DuplicatesCheck.py file.
I believe this would work, provided that we implement the md5 variable within the Pkgfile class and pass header information while creating a mock package using FakePkg. I am not sure whether I should hard-code these hash values into the header object for passing to the test function. I even tried creating real files using the real_files=True argument, but it didn’t work.
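To illustrate what hard-coding the hashes could look like, here is a rough sketch built on stand-in classes. MockFile, MockPkg, add_file and paths_by_md5 are hypothetical names I made up for this example; they are not rpmlint's FakePkg or Pkgfile API, and the hash strings in the usage note are placeholders rather than real MD5 values.

```python
from dataclasses import dataclass, field


@dataclass
class MockFile:
    """Hypothetical stand-in for a packaged file; not rpmlint's real class."""
    name: str
    size: int = 0
    md5: str = ""  # precomputed hash, hard-coded by the test


@dataclass
class MockPkg:
    """Hypothetical stand-in for a mock package that holds files by path."""
    files: dict = field(default_factory=dict)

    def add_file(self, name, size, md5):
        self.files[name] = MockFile(name=name, size=size, md5=md5)


def paths_by_md5(pkg):
    """Collect file paths under their precomputed MD5 value."""
    md5s = {}
    for name, pkgfile in pkg.files.items():
        md5s.setdefault(pkgfile.md5, set()).add(name)
    return md5s
```

A test could then call pkg.add_file("/etc/a", 500, "abc") and pkg.add_file("/etc/b", 500, "abc") and expect both paths to land under the same key, without any real file on disk. Whether this fits FakePkg's design is exactly the open question above.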
Misc.
In addition to working on DuplicatesCheck.py, I have also made some progress with FilesCheck.py. This also requires some more capabilities of FakePkg. However, I haven’t explored all the possible ways to create files and test them yet. I plan to do that in the coming week.
As mentioned in my last post, I will be visiting the SUSE office in Bangalore. I am planning to visit around the 3rd week of August and will share the exact date. I will also share the details on my LinkedIn page. Do follow me on LinkedIn.
Links: