Seed Manager¶
This part of the documentation describes how seeds are stored and loaded, how the Seed Manager is designed, and how it is implemented. The Seed Manager can easily be extended to support a new model.
Store Procedure¶
The store procedure is a service provided by the application. It stores the data of a particular model (e.g. Project, Grammar) together with all of its dependencies.
It follows a simple pattern built on Seed::Base, the parent class in which all the necessary methods are declared.
Naming Convention and necessary configuration¶
Seed services are part of the Seed module and, by design, follow a naming convention for their file names:
<model_name>_seed.rb
E.g. the seed class files project_seed.rb and project_uses_block_language_seed.rb.
A seed class takes three parameters. seed_id is mandatory and must always be passed. dependencies is optional and is only passed if the seed model has dependencies. defer_referential_checks defaults to false unless a seed class provides another value based on the foreign key constraints of its model.
seed_id can be the uid of the model or the model object itself.
The instance variable seed_id is the id of the model that will be stored or processed.
- other configuration parameters
SEED_IDENTIFIER = Project is the model to seed.
SEED_DIRECTORY = "projects" is the directory in which its seeds are stored.
- optional dependencies
def initialize(seed_id)
  # Note: ProjectDatabaseSeed appears twice as a key. In a Ruby hash
  # literal the later entry replaces the earlier one.
  super(seed_id, dependencies = {
    ProjectUsesBlockLanguageSeed => "project_uses_block_languages",
    CodeResourceSeed => "code_resources",
    ProjectSourceSeed => "project_sources",
    ProjectDatabaseSeed => "project_databases",
    ProjectDatabaseSeed => "default_database",
  }, defer_referential_checks = true)
end
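For comparison, a seed class without dependencies only needs the two constants. The following is a minimal sketch, not the application's code: SeedSketch::Base is a hypothetical stand-in for Seed::Base that only records its arguments, Widget is a made-up model, and the default values are assumptions taken from the description above.

```ruby
module SeedSketch
  # Hypothetical stand-in for Seed::Base; it only records its arguments.
  class Base
    attr_reader :seed_id, :dependencies, :defer_referential_checks

    # Assumed defaults: no dependencies, referential checks not deferred.
    def initialize(seed_id, dependencies = [], defer_referential_checks = false)
      @seed_id = seed_id
      @dependencies = dependencies
      @defer_referential_checks = defer_referential_checks
    end

    # Each subclass supplies its own SEED_DIRECTORY constant.
    def seed_directory
      self.class::SEED_DIRECTORY
    end
  end

  # Made-up model; a real seed would point at an ActiveRecord class.
  Widget = Struct.new(:id)

  # By the naming convention this class would live in widget_seed.rb.
  class WidgetSeed < Base
    SEED_IDENTIFIER = Widget
    SEED_DIRECTORY = "widgets"
  end
end

seed = SeedSketch::WidgetSeed.new("some-id")
puts seed.seed_directory
```

Because the subclass does not override initialize, it inherits the defaults of the base class and needs no dependency mapping.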
Seeds are stored in a YAML file named after the seed_id in the corresponding directory. Each dependency is stored in its own SEED_DIRECTORY, and a dependency manifest seed_id-deps.yaml is created in the parent directory. The manifest contains a set of entries with three values each: seed_path, seed_id and seed_name, where seed_name identifies the seed class responsible for that record.
Images and SQLite databases are stored in the respective SEED_DIRECTORY under the corresponding seed_id.
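To make the manifest layout concrete, here is a minimal sketch in plain Ruby. It assumes each entry is a three-element array of seed path, seed id and seed name, and it writes the seed name as a plain string so the file can be read back with YAML.safe_load. manifest_path and write_manifest are hypothetical helpers and the ids are invented.

```ruby
require "yaml"
require "tmpdir"

# Hypothetical helper: the manifest sits in the parent seed directory
# and is named after the parent seed id.
def manifest_path(seed_directory, seed_id)
  File.join(seed_directory, "#{seed_id}-deps.yaml")
end

# Hypothetical helper: dump [seed_path, seed_id, seed_name] entries.
def write_manifest(seed_directory, seed_id, entries)
  path = manifest_path(seed_directory, seed_id)
  File.write(path, YAML.dump(entries))
  path
end

Dir.mktmpdir do |dir|
  entries = [
    ["projects", "p-1", "ProjectSeed"],            # the parent comes first
    ["code_resources", "c-7", "CodeResourceSeed"], # followed by its dependencies
  ]
  path = write_manifest(dir, "p-1", entries)
  puts File.basename(path)
  puts File.read(path)
end
```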
Call to store a seed¶
Once a seed class is defined according to the configuration above and the naming convention (which you are encouraged to follow), you can start storing data.
e.g. Seed::ProjectSeed.new(Project.first).start_store or Seed::ProjectSeed.new(Project.first.id).start_store
The seed class accepts either a model object or its id.
start_store calls the store method, which takes a Set as its argument. The set keeps track of the seeds that have already been stored while the dependencies are processed.
def store_dependencies(processed)
  dependencies.each do |dependent_seed_name, seed_model_attribute|
    data = seed.send(seed_model_attribute)
    to_serialize = (data || [])
    # Wrap a single record so that it can be iterated like a collection
    unless to_serialize.respond_to?(:each)
      to_serialize = [to_serialize]
    end
    to_serialize.each do |dep_seed|
      dependent_seed_name.new(dep_seed).store(processed)
    end
  end
end
seed is the model object being stored. It is either the object passed to the constructor or, if the given seed_id is an id, the result of calling find on the model.
The dependencies hash contains {key => value} pairs, where the key is the dependent seed class and the value is the attribute to call on the parent model to get all related records.
If the returned data is not a collection, which happens when there is only a single record, it is wrapped in an array. Each record is then passed to store on the corresponding seed class.
processed is a set whose entries have three values each. The store method uses it to break circular dependencies.
The after_store_seed hook is called after store_seed. Seed classes can override it if something like an image or a database needs to be stored after the seed itself.
def store(processed)
  # Skip seeds that were already processed; this breaks circular dependencies
  unless processed.include? [seed_directory, seed.id, self.class]
    store_seed
    after_store_seed
    processed << [seed_directory, seed.id, self.class]
    store_dependencies(processed)
  end
  # Write the manifest, named after the first (parent) entry
  if dependencies.present?
    File.open(project_dependent_file(processed.first[0], processed.first[1]), "w") do |file|
      YAML::dump(processed, file)
    end
  end
end
The method itself describes the steps. processed.first contains the information of the parent seed, which is used to name the dependency manifest.
If the seed has no dependencies this is not a problem, as the default value of dependencies is an empty array.
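The way the processed set breaks circular dependencies can be tried out without Rails. The sketch below is an illustration under assumptions: DemoSeed is a hypothetical stand-in for a seed class, the written array stands in for store_seed and after_store_seed, and the "demo" directory name is invented.

```ruby
require "set"

# Hypothetical stand-in for a seed class: a name plus dependent seeds.
class DemoSeed
  attr_reader :name, :dependencies

  def initialize(name)
    @name = name
    @dependencies = []
  end

  # Same shape as store: skip seeds already in processed, otherwise
  # record the seed and recurse into its dependencies.
  def store(processed, written)
    key = ["demo", name, self.class]
    return if processed.include?(key)

    written << name # stands in for store_seed and after_store_seed
    processed << key
    dependencies.each { |dep| dep.store(processed, written) }
  end
end

def store_cycle_demo
  project = DemoSeed.new("project")
  resource = DemoSeed.new("resource")
  project.dependencies << resource
  resource.dependencies << project # circular reference back to the parent

  written = []
  project.store(Set.new, written)
  written
end

p store_cycle_demo
```

Each seed is written once even though the two seeds reference each other, because the second visit to the parent finds its entry in the set and returns immediately.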
Store all seeds of a seed class¶
To store all data of a model, the call looks like this:
Seed::ProjectSeed.store_all
or Seed::GrammarSeed.store_all
store_all is a class method of the seed class, defined as:
def self.store_all
  self::SEED_IDENTIFIER.all.each { |s| new(s.id).start_store }
end
Load procedure¶
The load procedure of a seed is also declared in the Seed::Base class.
It follows a very simple pattern. load_seed_id returns the seed_id if seed_id is not a model object itself, and returns the base name of the file if a YAML file name is given. It is defined as:
def load_seed_id
  # A ".yaml" file name is reduced to its base name
  return File.basename(seed_id, ".*") if File.extname(seed_id.to_s) == ".yaml"
  # Anything that is not a model object is returned as is
  return seed_id unless seed_id.is_a?(seed_name)
end
load_id is derived from load_seed_id. Whatever the type of load_seed_id, it always returns an id.
def load_id
  if load_seed_id
    if string_is_uuid? load_seed_id.to_s
      load_seed_id.to_s
    else
      find_load_seed_id(load_seed_id.to_s)
    end
  end
end
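The resolution steps above can be sketched in plain Ruby. This is an illustration only: uuid_string? is an assumed equivalent of string_is_uuid?, whose real implementation is not shown here, and the block passed to resolve_load_id stands in for find_load_seed_id.

```ruby
# UUID in the canonical 8-4-4-4-12 hexadecimal form (assumed format).
UUID_PATTERN = /\A\h{8}-\h{4}-\h{4}-\h{4}-\h{12}\z/

def uuid_string?(value)
  UUID_PATTERN.match?(value.to_s)
end

# Strip a ".yaml" extension, like the first branch of load_seed_id.
def strip_yaml_extension(value)
  name = value.to_s
  File.extname(name) == ".yaml" ? File.basename(name, ".*") : name
end

# Return the value unchanged when it is a UUID; otherwise hand it to
# the caller-supplied lookup (a stand-in for find_load_seed_id).
def resolve_load_id(value)
  candidate = strip_yaml_extension(value)
  return candidate if uuid_string?(candidate)

  yield candidate
end

known = { "my-project" => "0f8fad5b-d9cb-469f-a165-70867728950e" }
puts resolve_load_id("my-project.yaml") { |slug| known.fetch(slug) }
```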
As described in the store procedure, a seed class is configured with SEED_DIRECTORY and SEED_IDENTIFIER, so when we start loading a particular seed we already know its seed directory.
Upsert seed data¶
Upsert means insert or update. As seed data is stored in a YAML file, we create a seed instance by loading that file.
def seed_instance
  raise "Could not find project with slug or ID \"#{load_id}\"" unless File.exist? seed_file_path
  YAML.load_file(seed_file_path)
end
The data from the seed file is then upserted. After the upsert, after_load_seed is called to load seed-specific data.
def upsert_seed_data
  raise RuntimeError.new "Mismatched types, instance: #{seed_instance.class.name}, instance_type: #{seed_name.name}" if seed_instance.class != seed_name
  Rails.logger.info " Upserting data for #{seed_name}"
  # Find the existing record or build a new one, then copy the attributes
  db_instance = seed_name.find_or_initialize_by(id: load_seed_id)
  db_instance.assign_attributes(seed_instance.attributes)
  db_instance.save! if db_instance.changed?
  db_instance
  Rails.logger.info "Done with #{seed_name}"
  after_load_seed
end
seed_name is the SEED_IDENTIFIER defined in the seed class.
The code shows the steps for initializing the attributes of the model.
Loading also handles dependencies by reading the dependency manifest written during the store procedure.
def load_dependencies
  deps = File.join seed_directory, "#{load_seed_id}-deps.yaml"
  deps = YAML.load_file(deps)
  deps.each do |_, seed_id, seed|
    seed.new(seed_id).upsert_seed_data
  end
end
It loads the ...-deps.yaml file and iterates over each entry. Only the last two values matter here: the seed_id and the seed class.
It then calls upsert_seed_data on a new seed instance as usual.
Depending on the value of defer_referential_checks, the load is wrapped in ActiveRecord::Base.connection.disable_referential_integrity, which takes the transaction as a block so that constraint checks are deferred.
Otherwise it just runs the upsert and the other methods. As a final step it moves the data from the intermediate tmp storage to the original storage defined in the Project seed.
To load a particular seed, the example call would look like:
Seed::ProjectSeed.new(seed_id).start_load
start_load is defined as follows:
def start_load
  run_within_correct_transaction do
    upsert_seed_data
    dep = File.join seed_directory, "#{load_id}-deps.yaml"
    load_dependencies if File.exist? dep
  end
  if @defer_referential_checks
    db_instance = seed_name.find_or_initialize_by(id: load_id)
    db_instance.touch
    db_instance.save!
  end
  move_data_from_tmp_to_data_directory
end
Dependencies are only loaded if a deps file is present in the seed directory.
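The body of run_within_correct_transaction is not shown in this document. One way such a wrapper might look is sketched below, under stated assumptions: RecordingConnection is a hypothetical stand-in for the database connection that only records the order of calls, and the choice to always open a transaction is an assumption, not something the source confirms.

```ruby
# Hypothetical stand-in for a database connection that records calls.
class RecordingConnection
  attr_reader :events

  def initialize
    @events = []
  end

  def disable_referential_integrity
    @events << :checks_off
    yield
  ensure
    @events << :checks_on
  end

  def transaction
    @events << :begin
    yield
    @events << :commit
  end
end

# Sketch of the wrapper: always open a transaction (assumed), and also
# switch off referential checks when the seed asks for it.
def run_within_correct_transaction(connection, defer_referential_checks, &work)
  if defer_referential_checks
    connection.disable_referential_integrity { connection.transaction(&work) }
  else
    connection.transaction(&work)
  end
end

conn = RecordingConnection.new
run_within_correct_transaction(conn, true) { conn.events << :upsert }
p conn.events
```

In a Rails application the connection would be ActiveRecord::Base.connection, whose disable_referential_integrity method takes a block in the same way.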
Load all seed data of a seed class¶
load_all is also a class method, called on the seed class to be loaded. An example call looks like this:
Seed::ProjectSeed.load_all
or Seed::GrammarSeed.load_all
and defined as:
def self.load_all
  Dir.glob(File.join(load_directory, "*.yaml")).each do |f|
    # Dependency manifests are handled by load_dependencies
    next if f =~ /deps/
    new(File.basename(f)).start_load
  end
end
Dependency files are excluded because a deps file name is just the processed seed_id with a suffix appended. Such a file only exists when the seed has dependencies, and the load_dependencies method takes care of it.
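The file selection can be tried out against a temporary directory. The sketch below is a variant, not the method above: loadable_seed_files is a hypothetical helper, and it matches the "-deps.yaml" suffix of the file name only, whereas the pattern /deps/ in load_all is applied to the whole path and would also skip every file inside a directory whose path contains "deps".

```ruby
require "tmpdir"

# Hypothetical helper: list the seed files that would be handed to
# start_load, skipping dependency manifests by file name suffix.
def loadable_seed_files(load_directory)
  Dir.glob(File.join(load_directory, "*.yaml"))
     .reject { |path| File.basename(path).end_with?("-deps.yaml") }
     .map { |path| File.basename(path) }
     .sort
end

Dir.mktmpdir do |dir|
  %w[a.yaml a-deps.yaml b.yaml notes.txt].each do |name|
    File.write(File.join(dir, name), "")
  end
  p loadable_seed_files(dir)
end
```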