"refine") Phase(
<Phase.REFINE: 'refine'>
We use from __future__ import annotations to support forward references in type hints, specifically in the @classmethod we create to keep track of all instances of the class.
We want to use Pydantic Dataclasses to enable typechecking and validation. We also want to use the Dataclasses with the MiniDataAPI to create the tables in the SQLite database. But SQLite only has five datatypes: NULL, INTEGER, REAL, TEXT, and BLOB. So there is no list, nor any of the Dataclass(Enum) types we use.
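To make the limitation concrete, here is a minimal sketch using only the standard library sqlite3 module (not the MiniDataAPI used in this project). The table name demo and the example values are made up for illustration. It shows that a Python list cannot be bound to a column directly, while its JSON text form can be stored and read back.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, tags TEXT)")

tags = ["tags", "folders"]  # illustrative values, not taken from the project

# Binding a Python list directly fails: sqlite3 has no adapter for list.
try:
    conn.execute("INSERT INTO demo (tags) VALUES (?)", (tags,))
    list_accepted = True
except sqlite3.Error:
    list_accepted = False

# Encoding the list as JSON gives SQLite a plain TEXT value it can store.
conn.execute("INSERT INTO demo (tags) VALUES (?)", (json.dumps(tags),))
stored = conn.execute("SELECT tags FROM demo").fetchone()[0]
restored = json.loads(stored)

print(list_accepted, stored, restored)
```

The same idea applies to Enum members: storing their .value as TEXT keeps the row within the types SQLite understands.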
To be able to use both Pydantic and the MiniDataAPI we will do two things:
1. Add @field_serializer methods to the Pydantic Dataclass that convert the fields to JSON strings when we call .model_dump() on an instance of the Pydantic Dataclass. These serialised JSON strings can then be added to the SQLite database.
2. Add methods with the @field_validator decorator that convert the JSON strings back to the correct datatypes when we load the data from the SQLite database back into the Pydantic Dataclass.
This way we can convert an instance to MiniDataAPI and SQLite friendly datatypes using .model_dump(), which we can then add to the database.
The exact implementation can be found below where the Classes are defined.
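The round trip described above could look roughly like the following sketch. It assumes Pydantic v2; the Tag enum and Note model are hypothetical stand-ins and not the classes defined in this module, whose actual implementation may differ.

```python
import json
from enum import Enum

from pydantic import BaseModel, field_serializer, field_validator


class Tag(Enum):  # hypothetical enum, for illustration only
    WORK = "work"
    HOME = "home"


class Note(BaseModel):  # hypothetical model, for illustration only
    title: str
    tags: list[Tag]

    @field_validator("tags", mode="before")
    @classmethod
    def _tags_from_json(cls, value):
        # A str here is assumed to be the JSON text read from a TEXT column.
        if isinstance(value, str):
            return json.loads(value)
        return value

    @field_serializer("tags")
    def _tags_to_json(self, tags: list[Tag]) -> str:
        # Store the enum values as one JSON string that SQLite can hold.
        return json.dumps([tag.value for tag in tags])


note = Note(title="groceries", tags=[Tag.WORK, Tag.HOME])
row = note.model_dump()   # SQLite-friendly dict
restored = Note(**row)    # validator turns the JSON text back into enums

print(row)
print(restored)
```

The validator runs in "before" mode so that it sees the raw string before Pydantic tries to validate it as a list.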
We also want to keep track of the instances available for each class. Therefore we need some higher order magic.
We can’t just add a _instances = [] statement to the Class, because Pydantic will then treat it as a private attribute of the model rather than a class variable. We need to tell Pydantic to treat _instances as a class variable. Therefore we import ClassVar from typing and use it to annotate the _instances variable.
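A minimal sketch of such a registry, assuming Pydantic v2. The Gadget class, the use of model_post_init to register instances, and keying the registry by name are assumptions made for this illustration; the classes in this module may do it differently.

```python
from typing import Any, ClassVar

from pydantic import BaseModel


class Gadget(BaseModel):  # hypothetical model, for illustration only
    # ClassVar marks this as shared class state, not per-instance data.
    _instances: ClassVar[dict[str, Any]] = {}
    name: str

    def model_post_init(self, context: Any) -> None:
        # Runs after validation; register the new instance under its name.
        type(self)._instances[self.name] = self

    @classmethod
    def get_instances(cls) -> dict[str, Any]:
        return cls._instances


Gadget(name="reader")
Gadget(name="obsidian")
registry = Gadget.get_instances()

print(sorted(registry))
```

Because the dictionary lives on the class, every instance created anywhere in the program ends up in the same registry.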
First we define the possible values of the different variables that are available in the classes. We use the module enum
to define Enumerations. We use this to bind the possible values to a variable name, making the code more readable and maintainable.
OrganizationSystem (value, names=None, module=None, qualname=None, type=None, start=1, boundary=None)
How tools organize and structure information.
PhaseQuality (value, names=None, module=None, qualname=None, type=None, start=1, boundary=None)
Quality rating for how well a tool performs in each phase.
Phase (value, names=None, module=None, qualname=None, type=None, start=1, boundary=None)
The five phases of the PKM workflow.
Method (value, names=None, module=None, qualname=None, type=None, start=1, boundary=None)
How actions are performed - manually or automatically.
InformationType (value, names=None, module=None, qualname=None, type=None, start=1, boundary=None)
Information content types that flow through the PKM workflow.
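As an illustration of how such an enumeration binds values to names, here is a sketch of Phase. The five member values are inferred from the phase field names used elsewhere on this page; treat the exact definition and ordering as an assumption.

```python
from enum import Enum


class Phase(Enum):
    """Sketch of the five PKM workflow phases (member values assumed)."""
    COLLECT = "collect"
    RETRIEVE = "retrieve"
    CONSUME = "consume"
    EXTRACT = "extract"
    REFINE = "refine"


# Look up a member by its value, e.g. when reading a string from the database.
by_value = Phase("refine")

# Unknown values are rejected instead of silently accepted.
try:
    Phase("publish")
    rejected = False
except ValueError:
    rejected = True

print(repr(by_value), rejected)
```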
Next we create a dataclass for each item we need to be present in the PKM workflow.
Pydantic Dataclasses
Used for typechecking.
When creating a new instance of an InformationItem, the toolflow must be given as a list of Tool objects. The typechecking makes sure that any Tool object mentioned in the toolflow list exists as an actual Tool instance. So make sure to first create all the Tool instances that are needed for an InformationItem before creating the InformationItem instance.
I had some serious trouble getting the Pydantic dataclass validations to work. One of the issues is described above and is about SQLite not supporting all datatypes. A second major issue is that the Pydantic Dataclasses reference each other. The InformationItem references the Tool in the toolflow field. It would also be convenient to store all the InformationItems that can be used with a certain Tool, but in that case we would create a circular reference between InformationItem and Tool.
We decided to remove the information_items list from Tool. When we need to get all the InformationItems that are supported by a Tool we can write a Python function or run a SQL query on the SQLite database.
But then we are left with the fact that we want a list of Tools that exist. These are the options considered:
toolflow: list[Tool]
toolflow: list[Tool.name]
toolflow: list[str]
The last option is used in combination with validation to ensure each string is a valid Tool.name.
Here’s why this is the best approach:
The same goes for the Improvement class and its tool field.
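The list[str] option with validation can be sketched as follows, assuming Pydantic v2. MiniTool and MiniItem are simplified, hypothetical stand-ins for Tool and InformationItem, and the registry lookup by name is an assumption for this illustration.

```python
from typing import Any, ClassVar

from pydantic import BaseModel, ValidationError, field_validator


class MiniTool(BaseModel):  # simplified stand-in for Tool
    _instances: ClassVar[dict[str, Any]] = {}
    name: str

    def model_post_init(self, context: Any) -> None:
        MiniTool._instances[self.name] = self


class MiniItem(BaseModel):  # simplified stand-in for InformationItem
    name: str
    toolflow: list[str]

    @field_validator("toolflow")
    @classmethod
    def _tools_must_exist(cls, names: list[str]) -> list[str]:
        # Plain strings avoid a circular reference between the two models,
        # while the registry lookup still guarantees each tool exists.
        unknown = [n for n in names if n not in MiniTool._instances]
        if unknown:
            raise ValueError(f"unknown tools: {unknown}")
        return names


MiniTool(name="reader")
ok_item = MiniItem(name="article", toolflow=["reader"])

try:
    MiniItem(name="video", toolflow=["reader", "unknown_tool"])
    rejected = False
except ValidationError:
    rejected = True

print(ok_item, rejected)
```

This is also why the tools have to be created before the items that refer to them: the validator can only accept names that are already registered.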
Tool (name:str, organization_system:list[__main__.OrganizationSystem], phase_quality:__main__.PhaseQualityData, collect:str|None=None, retrieve:str|None=None, consume:str|None=None, extract:str|None=None, refine:str|None=None)
*!!! abstract “Usage Documentation” Models
A base class for creating Pydantic models.
Attributes: class_vars: The names of the class variables defined on the model. private_attributes: Metadata about the private attributes of the model. signature: The synthesized __init__
[Signature
][inspect.Signature] of the model.
__pydantic_complete__: Whether model building is completed, or if there are still undefined fields.
__pydantic_core_schema__: The core schema of the model.
__pydantic_custom_init__: Whether the model has a custom `__init__` function.
__pydantic_decorators__: Metadata containing the decorators defined on the model.
This replaces `Model.__validators__` and `Model.__root_validators__` from Pydantic V1.
__pydantic_generic_metadata__: Metadata for generic models; contains data used for a similar purpose to
__args__, __origin__, __parameters__ in typing-module generics. May eventually be replaced by these.
__pydantic_parent_namespace__: Parent namespace of the model, used for automatic rebuilding of models.
__pydantic_post_init__: The name of the post-init method for the model, if defined.
__pydantic_root_model__: Whether the model is a [`RootModel`][pydantic.root_model.RootModel].
__pydantic_serializer__: The `pydantic-core` `SchemaSerializer` used to dump instances of the model.
__pydantic_validator__: The `pydantic-core` `SchemaValidator` used to validate instances of the model.
__pydantic_fields__: A dictionary of field names and their corresponding [`FieldInfo`][pydantic.fields.FieldInfo] objects.
__pydantic_computed_fields__: A dictionary of computed field names and their corresponding [`ComputedFieldInfo`][pydantic.fields.ComputedFieldInfo] objects.
__pydantic_extra__: A dictionary containing extra values, if [`extra`][pydantic.config.ConfigDict.extra]
is set to `'allow'`.
__pydantic_fields_set__: The names of fields explicitly set during instantiation.
__pydantic_private__: Values of private attributes set on the model instance.*
PhaseQualityData (collect:__main__.PhaseQuality, retrieve:__main__.PhaseQuality, consume:__main__.PhaseQuality, extract:__main__.PhaseQuality, refine:__main__.PhaseQuality)
PhaseToolflowData (collect:Union[str,list[str],tuple[str],NoneType], retrieve:Union[str,list[str],tuple[str],NoneType], consume:Union[str,list[str],tuple[str],NoneType], extract:Union[str,list[str],tuple[str],NoneType], refine:Union[str,list[str],tuple[str],NoneType])
PhaseMethodData (collect:__main__.Method|None, retrieve:__main__.Method|None, consume:__main__.Method|None, extract:__main__.Method|None, refine:__main__.Method|None)
InformationItem (name:str, info_type:__main__.InformationType, method:__main__.PhaseMethodData, toolflow:__main__.PhaseToolflowData)
Improvement (title:str, what:str, why:str, prio:int, tool:str, phase:__main__.Phase)
Test creating instances
def test_phase_quality_data():
    pqd = PhaseQualityData(collect=PhaseQuality.GREAT, retrieve=PhaseQuality.BAD, consume=PhaseQuality.OK, extract=PhaseQuality.NA, refine=PhaseQuality.GREAT)
    test_eq(pqd.collect, PhaseQuality.GREAT)
    test_eq(pqd.retrieve, PhaseQuality.BAD)

def test_tool_creation():
    tool = Tool(name="TestTool", organization_system=[OrganizationSystem.TAGS], phase_quality=PhaseQualityData(collect=PhaseQuality.GREAT, retrieve=PhaseQuality.BAD, consume=PhaseQuality.OK, extract=PhaseQuality.NA, refine=PhaseQuality.GREAT))
    test_eq(tool.name, "TestTool")
    test_eq(tool.slug, "testtool")
    test_eq(tool.phase_quality.collect, PhaseQuality.GREAT)

def test_tool_flatten():
    tool = Tool(name="TestTool", organization_system=[OrganizationSystem.TAGS], phase_quality=PhaseQualityData(collect=PhaseQuality.GREAT, retrieve=PhaseQuality.BAD, consume=PhaseQuality.OK, extract=PhaseQuality.NA, refine=PhaseQuality.GREAT))
    flat = tool.flatten_for_db()
    test_eq(flat['collect_quality'], 'great')
    test_eq(flat['retrieve_quality'], 'bad')
    test_eq(flat['name'], 'TestTool')

def test_information_item():
    methods = PhaseMethodData(collect=Method.MANUAL, retrieve=None, consume=None, extract=None, refine=None)
    tools = PhaseToolflowData(collect="Reader", retrieve="Recall", consume=None, extract=None, refine=None)
    item = InformationItem(name="Test Article", info_type=InformationType.WEB_ARTICLE, method=methods, toolflow=tools)
    test_eq(item.method.collect, Method.MANUAL)
    test_eq(item.toolflow.collect, "Reader")

def test_information_item_flatten():
    methods = PhaseMethodData(collect=Method.MANUAL, retrieve=None, consume=None, extract=None, refine=None)
    tools = PhaseToolflowData(collect=["Reader", "Recall"], retrieve="Recall", consume=None, extract=None, refine=None)
    item = InformationItem(name="Test Article", info_type=InformationType.WEB_ARTICLE, method=methods, toolflow=tools)
    flat = item.flatten_for_db()
    test_eq(flat['collect_method'], 'manual')
    test_eq(flat['retrieve_method'], None)
    test_eq(flat['collect_toolflow'], '["Reader", "Recall"]')
    test_eq(flat['retrieve_toolflow'], 'Recall')

def test_improvement():
    imp = Improvement(title="Fix Search", what="Better search in Reader", why="Current search is bad", prio=1, tool="testtool", phase=Phase.RETRIEVE)
    test_eq(imp.title, "Fix Search")
    test_eq(imp.phase, Phase.RETRIEVE)
    test_eq(imp.flatten_for_db()['phase'], 'retrieve')
{'testtool': Tool(name='TestTool', organization_system=[<OrganizationSystem.TAGS: 'tags'>], phase_quality=PhaseQualityData(collect=<PhaseQuality.GREAT: 'great'>, retrieve=<PhaseQuality.BAD: 'bad'>, consume=<PhaseQuality.OK: 'ok'>, extract=<PhaseQuality.NA: 'na'>, refine=<PhaseQuality.GREAT: 'great'>), collect=None, retrieve=None, consume=None, extract=None, refine=None, slug='testtool')}
{'test_article': InformationItem(name='Test Article', info_type=<InformationType.WEB_ARTICLE: 'web_article'>, method=PhaseMethodData(collect=<Method.MANUAL: 'manual'>, retrieve=None, consume=None, extract=None, refine=None), toolflow=PhaseToolflowData(collect=['Reader', 'Recall'], retrieve='Recall', consume=None, extract=None, refine=None), slug='test_article')}
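The tests above expect flatten_for_db() to turn nested per-phase data into flat, SQLite-friendly columns such as collect_quality or collect_toolflow. The following standalone helper is a sketch of one way to get that shape. The function name flatten_phase_fields and its signature are hypothetical and not part of this module.

```python
import json
from enum import Enum


def flatten_phase_fields(nested: dict, suffix: str) -> dict:
    """Map {'collect': value, ...} to {'collect_<suffix>': db_value, ...}."""
    flat = {}
    for phase, value in nested.items():
        if isinstance(value, Enum):
            value = value.value               # enum member -> its string value
        elif isinstance(value, (list, tuple)):
            value = json.dumps(list(value))   # sequence -> JSON text
        flat[f"{phase}_{suffix}"] = value     # str and None pass through
    return flat


toolflow = {"collect": ["Reader", "Recall"], "retrieve": "Recall", "consume": None}
flat = flatten_phase_fields(toolflow, "toolflow")

print(flat)
```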
Connect to the database in main.py. We should also enable foreign key constraints. These are disabled by default in SQLite.
For testing purposes in this module we will use db = database(":memory:") to create an in-memory database.
create_db (loc='static/infoflow.db')
But for now we won’t use foreign key constraints.
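For reference, this is roughly what enabling foreign key enforcement looks like at the SQLite level. The sketch uses the standard library sqlite3 module rather than the MiniDataAPI, and the tool and improvement tables are hypothetical, simplified schemas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Enforcement is a per-connection setting, so it must be enabled on each connect.
conn.execute("PRAGMA foreign_keys = ON")
fk_enabled = conn.execute("PRAGMA foreign_keys").fetchone()[0]

conn.execute("CREATE TABLE tool (slug TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE improvement (title TEXT, tool TEXT REFERENCES tool(slug))"
)
conn.execute("INSERT INTO tool VALUES ('reader')")
conn.execute("INSERT INTO improvement VALUES ('Fix Search', 'reader')")

# With enforcement on, a row pointing at a non-existent tool is rejected.
try:
    conn.execute("INSERT INTO improvement VALUES ('Orphan', 'missing')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(fk_enabled, orphan_rejected)
```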
Tests and usage examples
[Column(cid=0, name='id', type='INTEGER', notnull=0, default_value=None, is_pk=1),
Column(cid=1, name='slug', type='TEXT', notnull=0, default_value=None, is_pk=0),
Column(cid=2, name='name', type='TEXT', notnull=0, default_value=None, is_pk=0),
Column(cid=3, name='info_type', type='TEXT', notnull=0, default_value=None, is_pk=0),
Column(cid=4, name='method', type='TEXT', notnull=0, default_value=None, is_pk=0),
Column(cid=5, name='toolflow', type='TEXT', notnull=0, default_value=None, is_pk=0)]
Add the previously created instances to the SQLite tables
{'title': 'improvement_a',
'what': 'gras',
'why': 'dus',
'prio': 0,
'tool': 'reader',
'phase': 'collect',
'slug': 'improvement_a'}
Add a single instance to the SQLite table
InformationItemDB(id=1, slug='infoitem_a', name='infoitem_a', info_type='book', method='["manual"]', toolflow='[["reader", "obsidian"], "obsidian", "reader", "obsidian", "reader"]')
Add multiple instances to the SQLite table
<Table tool_db (id, slug, name, organization_system, phase_quality, collect, retrieve, consume, extract, refine)>
[{'id': 1,
'slug': 'reader',
'name': 'reader',
'organization_system': '["tags"]',
'phase_quality': '["great", "ok", "ok", "ok", "ok"]',
'collect': None,
'retrieve': None,
'consume': None,
'extract': None,
'refine': None},
{'id': 2,
'slug': 'obsidian',
'name': 'obsidian',
'organization_system': '["tags"]',
'phase_quality': '["great", "bad", "bad", "bad", "bad"]',
'collect': None,
'retrieve': None,
'consume': None,
'extract': None,
'refine': None}]
Now retrieve the info from the database as instances of the Pydantic Dataclass
Method 1:
[{'id': 1,
'slug': 'reader',
'name': 'reader',
'organization_system': '["tags"]',
'phase_quality': '["great", "ok", "ok", "ok", "ok"]',
'collect': None,
'retrieve': None,
'consume': None,
'extract': None,
'refine': None},
{'id': 2,
'slug': 'obsidian',
'name': 'obsidian',
'organization_system': '["tags"]',
'phase_quality': '["great", "bad", "bad", "bad", "bad"]',
'collect': None,
'retrieve': None,
'consume': None,
'extract': None,
'refine': None}]
Tool(name='reader', organization_system=[<OrganizationSystem.TAGS: 'tags'>], phase_quality=[<PhaseQuality.GREAT: 'great'>, <PhaseQuality.OK: 'ok'>, <PhaseQuality.OK: 'ok'>, <PhaseQuality.OK: 'ok'>, <PhaseQuality.OK: 'ok'>], collect=None, retrieve=None, consume=None, extract=None, refine=None, slug='reader')
Method 2:
[ToolDB(id=1, slug='reader', name='reader', organization_system='["tags"]', phase_quality='["great", "ok", "ok", "ok", "ok"]', collect=None, retrieve=None, consume=None, extract=None, refine=None),
ToolDB(id=2, slug='obsidian', name='obsidian', organization_system='["tags"]', phase_quality='["great", "bad", "bad", "bad", "bad"]', collect=None, retrieve=None, consume=None, extract=None, refine=None)]
ToolDB(id=1, slug='reader', name='reader', organization_system='["tags"]', phase_quality='["great", "ok", "ok", "ok", "ok"]', collect=None, retrieve=None, consume=None, extract=None, refine=None)
Tool(name='reader', organization_system=[<OrganizationSystem.TAGS: 'tags'>], phase_quality=[<PhaseQuality.GREAT: 'great'>, <PhaseQuality.OK: 'ok'>, <PhaseQuality.OK: 'ok'>, <PhaseQuality.OK: 'ok'>, <PhaseQuality.OK: 'ok'>], collect=None, retrieve=None, consume=None, extract=None, refine=None, slug='reader')
instns_to_db (db_tbl, cls_instns)
Add all instances from a given Class to the given database table
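A bulk insert of this kind can be sketched with the standard library alone. The helper below is not the instns_to_db implementation. It is a hypothetical insert_rows function that assumes each instance has already been converted to a flat dict with identical keys, and that the table and column names come from trusted code, since they are interpolated into the SQL.

```python
import sqlite3


def insert_rows(conn: sqlite3.Connection, table: str, rows: list[dict]) -> int:
    """Insert flat dicts (all with the same keys) into `table`."""
    if not rows:
        return 0
    columns = list(rows[0])
    placeholders = ", ".join(f":{col}" for col in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    with conn:  # commit on success, roll back on error
        conn.executemany(sql, rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tool_db (slug TEXT, name TEXT, organization_system TEXT)")

rows = [
    {"slug": "reader", "name": "reader", "organization_system": '["tags"]'},
    {"slug": "obsidian", "name": "obsidian", "organization_system": '["tags"]'},
]
inserted = insert_rows(conn, "tool_db", rows)
count = conn.execute("SELECT COUNT(*) FROM tool_db").fetchone()[0]

print(inserted, count)
```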