I assume you are already familiar with the first part of this article. If not, I recommend reading 🔗 Cleaning up Laravel app database duplicates - part 1 first. In this second part we're going to extend what we did last time.
💡 Remember to back up your database and file storage first
What the Theme looks like
We use the same set of relations as in the previous article, but this time our Theme also has an image. This is what the grouping is going to look like:
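A minimal sketch of that grouping in plain PHP, so it runs without a Laravel app. The field names `name` and `description` and both helper functions are hypothetical, chosen only for illustration; in the real script the themes would be Eloquent models in a collection, and the text fields would be whatever defines a duplicate in your data.

```php
<?php

/**
 * Build a hash from the text fields that define a duplicate.
 * The field names here are placeholders, not taken from the real Theme model.
 */
function textHash(array $theme): string
{
    $parts = [
        $theme['name'] ?? '',
        $theme['description'] ?? '',
    ];

    // serialize() keeps the field boundaries unambiguous before hashing
    return md5(serialize($parts));
}

/**
 * Group themes by their text hash and keep only groups with more than one member.
 */
function groupByTextHash(array $themes): array
{
    $groups = [];
    foreach ($themes as $theme) {
        $groups[textHash($theme)][] = $theme;
    }

    return array_filter($groups, fn (array $group) => count($group) > 1);
}

$themes = [
    ['id' => 1, 'name' => 'Dark', 'description' => 'Night mode', 'image' => 'a.png'],
    ['id' => 2, 'name' => 'Dark', 'description' => 'Night mode', 'image' => 'b.png'],
    ['id' => 3, 'name' => 'Light', 'description' => 'Day mode', 'image' => null],
];

// Only themes 1 and 2 share the same text, so they form the single candidate group
$candidates = groupByTextHash($themes);
```

The image is deliberately left out of the text hash: it is compared later, and only within these candidate groups.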
The extra step
We need an extra step: generating hash strings for the images that could be duplicates of each other. Comparing images is the heaviest part of our script, so to avoid a painfully long execution time, let's compare images only when all the other fields are equal.
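$toCompareByText->groupBy('textHash')
    // $fs is assumed to be the storage disk that holds the images; the clean-up step needs it
    ->each(function ($sameTextGroup) use ($fs) {
        // A group with a single theme has no duplicates to look for
        if ($sameTextGroup->count() < 2) {
            return;
        }
        $toCompareByImage = collect();
        foreach ($sameTextGroup as $theme) {
            // Themes without an image can only duplicate other image-less themes
            if (!$theme->image) {
                $theme->imageHash = 'no-image';
                $toCompareByImage->push($theme);
                continue;
            }
            // Themes whose file is missing or unreadable are left out of the comparison
            $path = storage_path('app/public/' . $theme->image);
            if (!file_exists($path)) {
                continue;
            }
            $image = file_get_contents($path);
            if (!$image) {
                continue;
            }
            // Identical file contents produce identical hashes
            $theme->imageHash = md5($image);
            $toCompareByImage->push($theme);
        }
        // The clean-up step runs here, while $toCompareByImage is still in scope
    });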
💡 This simple hashing approach assumes we only want to know whether two images are strictly identical, byte for byte. To detect images that are merely similar, look into image similarity checking techniques; there are also ready-made libraries that could be used here
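The hashing step can also be sketched as a standalone helper. This variation uses PHP's built-in `hash_file()`, which hashes the file without first loading it into a PHP string. The function name `imageHash` and the `$baseDir` parameter are made up for this example, and unlike the listing above it also hashes empty files instead of skipping them.

```php
<?php

/**
 * Return a content hash for an image, 'no-image' when the theme has none,
 * or null when the file is missing or unreadable.
 */
function imageHash(?string $relativePath, string $baseDir): ?string
{
    if ($relativePath === null || $relativePath === '') {
        return 'no-image';
    }

    $path = rtrim($baseDir, '/') . '/' . $relativePath;
    if (!is_file($path) || !is_readable($path)) {
        return null;
    }

    // hash_file() reads the file itself, so we never hold the whole image in a variable
    $hash = hash_file('md5', $path);

    return $hash === false ? null : $hash;
}

// Demo with temporary files standing in for stored images
$dir = sys_get_temp_dir() . '/theme-images-' . uniqid();
mkdir($dir);
file_put_contents($dir . '/a.png', 'same bytes');
file_put_contents($dir . '/b.png', 'same bytes');
file_put_contents($dir . '/c.png', 'other bytes');

$hashA = imageHash('a.png', $dir);
$hashB = imageHash('b.png', $dir);
$hashC = imageHash('c.png', $dir);
$hashNone = imageHash(null, $dir);
$hashMissing = imageHash('missing.png', $dir);
```

Files with the same bytes get the same hash regardless of their names, which is exactly the "strictly identical" comparison described above.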
Clean it up
At this point we should have groups of duplicate entries, each marked with an imageHash field. Let's go through each group and keep only one entry (the survivor) per group.
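// This must run inside the closure from the previous step, where $toCompareByImage and $fs are in scope
$toCompareByImage->groupBy('imageHash')
    ->each(function ($sameImageGroup) use ($fs) {
        // The first theme of each group survives; the rest are duplicates
        $survivor = $sameImageGroup->shift();
        $sameImageGroup->each(function ($theme) use ($fs, $survivor) {
            DB::transaction(function () use ($theme, $fs, $survivor) {
                // Re-point configurations from the duplicate to the survivor
                Configuration::where('theme_id', $theme->id)
                    ->update(['theme_id' => $survivor->id]);
                // Don't delete a file the survivor still references
                if ($theme->image && $theme->image !== $survivor->image) {
                    $fs->delete($theme->image);
                }
                $theme->delete();
            });
        });
    });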
💡 Always take care of data consistency and use transactions for actions that must succeed or fail together
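Before deleting anything, it can help to see which entries would survive. The sketch below is a dry run in plain PHP: it applies the same "first entry in each group survives" rule to simple arrays and returns a plan instead of touching the database. The `planCleanup` function and the sample data are hypothetical.

```php
<?php

/**
 * Given themes that already carry an imageHash, return a plan of
 * survivor id => list of duplicate ids, without changing anything.
 */
function planCleanup(array $themes): array
{
    $groups = [];
    foreach ($themes as $theme) {
        $groups[$theme['imageHash']][] = $theme['id'];
    }

    $plan = [];
    foreach ($groups as $ids) {
        // The first id in each group survives, mirroring shift() on the collection
        $survivorId = array_shift($ids);
        if ($ids !== []) {
            $plan[$survivorId] = $ids;
        }
    }

    return $plan;
}

// All of these are assumed to come from one group with equal text fields
$themes = [
    ['id' => 10, 'imageHash' => 'abc'],
    ['id' => 11, 'imageHash' => 'abc'],
    ['id' => 12, 'imageHash' => 'no-image'],
    ['id' => 13, 'imageHash' => 'abc'],
    ['id' => 14, 'imageHash' => 'def'],
];

$plan = planCleanup($themes);
print_r($plan);
```

Here theme 10 would survive and themes 11 and 13 would be merged into it, while themes 12 and 14 have no duplicates and are left alone.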
See also 🔗 Cleaning up Laravel app database duplicates - part 1