In this article I'm going to handle the case when a database has overgrown because of duplicates and dump entries. In this article I'm going to create a handy migration script to find unwanted rows, remove them and replace foreign keys to point to correct places.
đź’ˇ Remember to do a database backup first
The case
Let's imagine that in our application users can create configurations. Unfortunately, it turned out that they keep repeating the job over and over… whoops, our bad!
Search and hash
Let's start with creating proper query in a migration script. The relations are Users → Configurations → Themes and Users → Themes, so users can create themes and assign them to active configurations.
Now we will search for duplicate theme values and organize them in easy-to-manage collections.
- Fetch and iterate users, because we want to compare Themes in scope of given User.
return new class extends Migration
{
public function up(): void
{
User::with('themes')->chunk(100, $this->processUsers($users));
}
}
- Generate hash strings for theme fields to easily compare them. By using Laravel model's only method, we want to have only specified fields, because there may be fields like images which cannot be treated as a text value and we don't want to process them. We'll learn about images in next article - đź”—Â Cleaning up Laravel application database - part 2 - images.
private function processUsers(Collection $users): void
{
foreach ($users as $user) {
$toCompareByText = collect();
foreach ($user->themes as $theme) {
$theme->textHash = md5(serialize(
$theme->only(['heading', 'paragraph', 'primary_color', 'secondary_color', 'font_family', 'base_font_size'])
));
$toCompareByText->push($theme);
}
}
}
Grouping is the key
Let's put themes into groups - collections of themes having all values in common. One check if a group contains duplicates and we can do further processing in cleanUpCollection
function which will do the changes in the database.
$toCompareByText->groupBy('textHash')
->each(function ($sameTextGroup) {
if ($sameTextGroup->count() < 2) {
return; // No duplicates
}
$this->cleanUpCollection($sameTextGroup);
});
Clean up the mess
Finally we will reassign themes which are going to stay in the database and remove unwanted rows. The shift
method called on the collection will simply extract a survivor from the set. Survivor is going to stay, others are free to get deleted, because they're duplicates of survivor.
private function cleanUpCollection(Collection $sameTextGroup): void
{
$survivor = $sameTextGroup->shift();
$sameTextGroup->each(function ($theme) use ($survivor) {
DB::transaction(function () use ($theme, $survivor) {
Configuration::where('theme_id', $theme->id)->update(['theme_id', $survivor->id]);
$theme->delete();
});
});
}
đź’ˇ Always take care of data consistency and use transactions for actions which can't live without each other
Prevent future littering
I hope this handy script did the job for you, but if there were duplicates in your application's database, it's good to know the reason of this state. It's time to rethink the application's logic, structure and user interface to make it more intuitive and promote clean patterns.
See also đź”—Â Cleaning up Laravel application database - part 2 - images