WIP: image processing library (or libraries?) #12

Draft
schrottkatze wants to merge 15 commits from schrottkatze/iowo:proc-libs into main
Owner

This PR is supposed to add the image processing library that is going to primarily be used in iOwO until there's time for more complex evaluation concepts.

The library is supposed to be able to be used outside of iOwO as well, roughly in that form (the compile-time checked form):

fn some_pl() {
    let mut pipeline_builder = Image::read("foo.png")
        .blend(Image::read("bar.jpg"), Mode::Normal, 0.5)
        .scale_resize(0.5)
        .write("out.jpg");

    let pipeline = pipeline_builder.build();

    pipeline.run().unwrap();
}

There's going to be 3 ways of using it, this is only some pseudocode I typed down.

1.: The "normal" way for external users:
- with typed high level pipelines
- pipelines can be nested and reused and passed to eachother?
2.: The low level way: This is also going to be internal to the normal way, basically manually inserting and calling check functions. Not statically typed, but useful for generating pipelines in runtime
3.: The oversimplified way: Not able to actually use pipelines, but for simple/single operations like inverting etc., without complexity for building and calling pipelines.

The way of using it above is just my first draft and quite flawed (and lacking versatility), the "actual" implementation is not going to be merged until I'm happy with it.

This PR is supposed to add the image processing library that is going to primarily be used in iOwO until there's time for more complex evaluation concepts. The library is supposed to be able to be used outside of iOwO as well, roughly in that form (the compile-time checked form): ```rs fn some_pl() { let mut pipeline_builder = Image::read("foo.png") .blend(Image::read("bar.jpg"), Mode::Normal, 0.5) .scale_resize(0.5) .write("out.jpg"); let pipeline = pipeline_builder.build(); pipeline.run().unwrap(); } ``` There's going to be 3 ways of using it, this is only some pseudocode I typed down. 1.: The "normal" way for external users: - with typed high level pipelines - pipelines can be nested and reused and passed to eachother? 2.: The low level way: This is also going to be internal to the normal way, basically manually inserting and calling check functions. Not statically typed, but useful for generating pipelines in runtime 3.: The oversimplified way: Not able to actually use pipelines, but for simple/single operations like inverting etc., without complexity for building and calling pipelines. The way of using it above is just my first draft and quite flawed (and lacking versatility), the "actual" implementation is not going to be merged until I'm happy with it.
schrottkatze added 1 commit 2024-02-05 13:31:25 +00:00
schrottkatze added 1 commit 2024-02-19 11:56:53 +00:00
schrottkatze added 1 commit 2024-02-19 16:57:48 +00:00
schrottkatze added 1 commit 2024-02-19 17:07:14 +00:00
schrottkatze added 2 commits 2024-02-19 19:54:30 +00:00
schrottkatze requested review from multisamplednight 2024-02-19 19:57:30 +00:00
schrottkatze added 1 commit 2024-02-21 12:11:43 +00:00
Author
Owner

At this point I think the trait architecture is superior in extensibility, scalability and just by how much simpler and more readable it is.

I'll try to make the data types trait based as well, so it's technically not a pure image processing library.

Another thing I'd want to figure out before writing the final crate in the trait architecture is memory management: We can't just store every intermediate state in-memory, since in image processing if you have a few 4k images, that'd OOM very fast. Full disk caching is possible, but is slooooow and doesn't scale. Also, random files all over the place. Pure instruction pipelineing per pixel or for the entire image would lead to a lot of repeated evaluation of components with the same result.

Maybe a hybrid architecture? but then how do we determine when to do what?

I'll try explicit instructions to handle this (which may also be inserted during build):

  • memory buffer: the previous instructions get pipelined or whatever (depending on what's ahead), and the result is forcably stored in-memory and if we run out of memory, we can maybe still switch to disk caching
  • disk buffer: forcably caches the inputs to disk

Another idea would also be pipelines supporting incremental compilation, so it can remember previous inputs of subgraphs and then if it gets run with only the last node changed it tries to get the data from the cache.

Maybe an explicit option in the builder to cache inc. compilation in pipelines on disk in the background and try in-memory so it also can be cached over diffferent runs of the program?

At this point I think the trait architecture is superior in extensibility, scalability and just by how much simpler and more readable it is. I'll try to make the data types trait based as well, so it's technically not a pure image processing library. Another thing I'd want to figure out before writing the final crate in the trait architecture is memory management: We can't just store every intermediate state in-memory, since in image processing if you have a few 4k images, that'd OOM *very* fast. Full disk caching is possible, but is slooooow and doesn't scale. Also, random files all over the place. Pure instruction pipelineing per pixel or for the entire image would lead to a lot of repeated evaluation of components with the same result. Maybe a hybrid architecture? but then how do we determine when to do what? I'll try explicit instructions to handle this (which may also be inserted during build): - memory buffer: the previous instructions get pipelined or whatever (depending on what's ahead), and the result is forcably stored in-memory and if we run out of memory, we can maybe still switch to disk caching - disk buffer: forcably caches the inputs to disk Another idea would also be pipelines supporting incremental compilation, so it can remember previous inputs of subgraphs and then if it gets run with only the last node changed it tries to get the data from the cache. Maybe an explicit option in the builder to cache inc. compilation in pipelines on disk in the background and try in-memory so it also can be cached over diffferent runs of the program?
schrottkatze added 1 commit 2024-02-21 13:25:02 +00:00
multisamplednight requested changes 2024-02-22 17:53:22 +00:00
Cargo.toml Outdated
@ -3,3 +3,3 @@
"crates/app",
"crates/eval",
"crates/ir",
"crates/ir", "crates/prowocessing",
Collaborator

Nit: Regarding formatting, you may want to split the newly added entry to its own line.

Nit: Regarding formatting, you may want to split the newly added entry to its own line.
Author
Owner

ah cargo keeps doing that and i keep forgetting when adding new crates lol

ah cargo keeps doing that and i keep forgetting when adding new crates lol
schrottkatze marked this conversation as resolved
@ -25,2 +16,2 @@
let config = if let Some(config) = args.config_path {
Ok(config)
pub fn read(args: &CliConfigs) -> Self {
let config = if let Some(config) = &args.config_path {
Collaborator

Could use Option::ok_or with Result::or_else (untested):

let config = args.config_path.as_ref().ok_or(()).or_else(|_| find_config_file());
Could use `Option::ok_or` with `Result::or_else` (untested): ```rust let config = args.config_path.as_ref().ok_or(()).or_else(|_| find_config_file()); ```
Author
Owner

Doesn't work since find_config_file returns an owned value and the other value is unowned, if i add .as_ref() to the find_config_file call, it doesn't work either since then it's a temporary value.

Doesn't work since `find_config_file` returns an owned value and the other value is unowned, if i add `.as_ref()` to the `find_config_file` call, it doesn't work either since then it's a temporary value.
Collaborator

Then how about cloning it instead?

let config = args.config_path.clone().ok_or(()).or_else(|_| find_config_file());
Then how about cloning it instead? ```rust let config = args.config_path.clone().ok_or(()).or_else(|_| find_config_file()); ```
Author
Owner

Works.

Works.
schrottkatze marked this conversation as resolved
@ -12,0 +27,4 @@
},
Dev {
#[command(subcommand)]
dev_command: DevCommands,
Collaborator

Nit: Is there clarity gained in prefixing the variable name with dev_ given it's always taken out of the Dev variant?

Nit: Is there clarity gained in prefixing the variable name with `dev_` given it's always taken out of the `Dev` variant?
Author
Owner

nope, changing that

nope, changing that
schrottkatze marked this conversation as resolved
@ -27,0 +85,4 @@
let pipe = PipelineBuilder::new().add(1).stringify().build();
println!(
"{:?}",
pipe.run(vec![num0.into(), num1.into()].into()).into_inner()[0]
Collaborator

All of these dev commands seem like they'd belong into tests using #[test] instead.

All of these dev commands seem like they'd belong into tests using `#[test]` instead.
Author
Owner

that's what i already mentioned, that i prefer playing around like that

sure, can do that, but i much prefer having them run like that and output there so i can see it more or less interactively.

planning to do that, but the experimental things are all termporary anyway.

that's what i already mentioned, that i prefer playing around like that sure, can do that, but i much prefer having them run like that and output there so i can see it more or less interactively. planning to do that, but the `experimental` things are all termporary anyway.
Collaborator

sure, can do that, but i much prefer having them run like that and output there so i can see it more or less interactively.

They're still just as interactive if one were to use cargo test --no-capture <test_name>.

planning to do that, but the experimental things are all termporary [sic] anyway.

They're what this PR introduces and what you requested my review on.

> sure, can do that, but i much prefer having them run like that and output there so i can see it more or less interactively. They're still just as interactive if one were to use `cargo test --no-capture <test_name>`. > planning to do that, but the `experimental` things are all termporary [sic] anyway. They're what this PR introduces and what you requested my review on.
Author
Owner

They are indeed not what this PR introduces, those will not be in the PR once it's complete. this is basically just a working copy/scratchpad state for now, and i'll be purging it once the actual library is done which is then what will be introduced by this PR.

They are indeed not what this PR introduces, those will not be in the PR once it's complete. this is basically just a working copy/scratchpad state for now, and i'll be purging it once the actual library is done which is then what will be introduced by this PR.
Collaborator

Fair enough.

Fair enough.
multisamplednight marked this conversation as resolved
@ -0,0 +1,7 @@
//! Definitions of the data transfer and storage types.
/// Types for element and pipeline IO
Collaborator

Given that you use a module comment at the very start of this file, it seems like these doc-comments would also belong into their respective modules.
This gives the reader added context when opening the respective module e.g. after following a type definition.

Given that you use a module comment at the very start of this file, it seems like these doc-comments would also belong into their respective modules. This gives the reader added context when opening the respective module e.g. after following a type definition.
schrottkatze marked this conversation as resolved
@ -0,0 +3,4 @@
/// Types for element and pipeline IO
pub mod io;
/// Raw data types contained in `io`
Collaborator

Could use an intralink.

-/// Raw data types contained in `io`
+/// Raw data types contained in [`io`]
Could use an intralink. ```diff -/// Raw data types contained in `io` +/// Raw data types contained in [`io`] ```
schrottkatze marked this conversation as resolved
@ -0,0 +4,4 @@
pub struct Inputs<'a>(Vec<Data<'a>>);
impl<'a> Inputs<'a> {
/// get inner value(s)
Collaborator

Quite ambiguous doc-comment also regarding the rather lengthy doc-comment on the type itself.
How about removing this method altogether and making the content of Inputs directly public,
given that one's free to convert from/to it already?

Quite ambiguous doc-comment also regarding the rather lengthy doc-comment on the type itself. How about removing this method altogether and making the content of `Inputs` directly public, given that one's free to convert from/to it already?
schrottkatze marked this conversation as resolved
@ -0,0 +24,4 @@
impl<'a> From<&'a Outputs> for Inputs<'a> {
fn from(value: &'a Outputs) -> Self {
Self(value.0.iter().map(std::convert::Into::into).collect())
Collaborator

Unnecessary full method path, consider just using From::from or Into::into instead.

Unnecessary full method path, consider just using `From::from` or `Into::into` instead.
Author
Owner

ah yes, rust-analyzer loves completing full paths lol

ah yes, rust-analyzer loves completing full paths lol
schrottkatze marked this conversation as resolved
@ -0,0 +28,4 @@
}
}
/// Newtype struct around `OwnedData` for pipeline/element outputs
Collaborator

I already see that this is a newtype around OwnedData, just re-stating it does not add semantical value. Rather, what would this struct be used for?

I already see that this is a newtype around `OwnedData`, just re-stating it does not add semantical value. Rather, what would this struct be used for?
schrottkatze marked this conversation as resolved
@ -0,0 +33,4 @@
impl Outputs {
/// consume self and return inner value(s)
pub fn into_inner(self) -> Vec<OwnedData> {
Collaborator

Wait, why is Outputs allowed to be consumed for its inner content consumed while Inputs doesn't?

Wait, why is `Outputs` allowed to be consumed for its inner content consumed while `Inputs` doesn't?
Author
Owner

Inputs only contains a Vec of Data which either contains a string slice or an integer, which are really cheap to clone. OwnedData would be much heavier to clone in this case.

(I'm currently not happy how the IO of instructions works anyway, planning on reworking that to be more sensible, clear and flexible))

`Inputs` only contains a `Vec` of `Data` which either contains a string slice or an integer, which are really cheap to clone. `OwnedData` would be much heavier to clone in this case. (I'm currently not happy how the IO of instructions works anyway, planning on reworking that to be more sensible, clear and flexible))
schrottkatze marked this conversation as resolved
@ -0,0 +2,4 @@
/// Owned data type, for use mostly in outputs and storage
#[derive(Clone, Debug)]
pub enum OwnedData {
Collaborator

Given the common pattern of AsRef in the stdlib and InstructionRef already being used in this project, how about renaming OwnedData → Data and Data below → DataRef?

Given the common pattern of `AsRef` in the stdlib and `InstructionRef` already being used in this project, how about renaming `OwnedData` → `Data` and `Data` below → `DataRef`?
Author
Owner

did the renaming, gonna take care of the AsRef impl later (I remember trying that earlier and something causing problems, i forgot what that was though.)

did the renaming, gonna take care of the `AsRef` impl later (I remember trying that earlier and something causing problems, i forgot what that was though.)
schrottkatze marked this conversation as resolved
@ -0,0 +20,4 @@
}
/// Unowned data type, for inputs into runner functions
#[derive(Clone, Copy)]
Collaborator

Why does OwnedData implement Debug while Data doesn't?

Why does `OwnedData` implement `Debug` while `Data` doesn't?
schrottkatze marked this conversation as resolved
@ -0,0 +28,4 @@
impl Data<'_> {
/// converts itself to `OwnedData`
pub fn to_owned_data(&self) -> OwnedData {
Collaborator

Consider implementing ToOwned and AsRef appropiately instead.

OT: Wondering that clippy didn't scream about this. Interesting, since it has a lint for this, but apparently it didn't trigger since the name was (imo unnecessarily) suffixed with _data.

Consider implementing [`ToOwned`] and [`AsRef`] appropiately instead. OT: Wondering that clippy didn't scream about this. Interesting, since [it has a lint for this](https://rust-lang.github.io/rust-clippy/master/index.html#/should_implement_trait), but apparently it didn't trigger since the name was (imo unnecessarily) suffixed with `_data`. [`ToOwned`]: https://doc.rust-lang.org/std/borrow/trait.ToOwned.html [`AsRef`]: https://doc.rust-lang.org/std/convert/trait.AsRef.html
Author
Owner

I did that deliberately to not trigger the lint, but I'll fix that and properly do it after fixing your simpler reviews

I did that deliberately to not trigger the lint, but I'll fix that and properly do it after fixing your simpler reviews
Author
Owner

As I've just realized because I was trying to implement ToOwned/AsRef, it isn't even necessary to have a seperate DataRef, so i removed that entirely and... who would've thought, it's much more readable and concise overall now

As I've just realized because I was trying to implement `ToOwned`/`AsRef`, it isn't even necessary to have a seperate `DataRef`, so i removed that entirely and... who would've thought, it's much more readable and concise overall now
schrottkatze marked this conversation as resolved
@ -0,0 +5,4 @@
use super::data::io::Outputs;
pub(crate) trait PipelineElement {
/// return a static runner function pointer to avoid dynamic dispatch during pipeline execution - Types MUST match the signature
Collaborator

The condition might be more visible if it's put in an explicit # Indirect panics section or the like.

The condition might be more visible if it's put in an explicit `# Indirect panics` section or the like.
Author
Owner

probably gonna refactor this to not panic in the future and add proper handling (Results lol)

probably gonna refactor this to not panic in the future and add proper handling (`Result`s lol)
@ -0,0 +10,4 @@
};
/// Addition
pub struct Add(pub i32);
Collaborator

Why do Add, Subtract, Concatenate hold a pub i32 if it's never used?

Why do `Add`, `Subtract`, `Concatenate` hold a `pub i32` if it's never used?
Author
Owner

because i wanted to do second params like that originally, then realizing it doesn't work that "easily". so basically, still WIP

because i wanted to do second params like that originally, then realizing it doesn't work that "easily". so basically, still WIP
@ -0,0 +68,4 @@
outputs: vec![DataType::String],
}
}
}
Collaborator

Both ops::num and ops::str could very much use the guard condition pattern with let-else instead, so that the actual intended logic of the runner function isn't nested one more.

Both `ops::num` and `ops::str` could very much use the guard condition pattern with [let-else](https://doc.rust-lang.org/stable/rust-by-example/flow_control/let_else.html) instead, so that the actual intended logic of the runner function isn't nested one more.
schrottkatze marked this conversation as resolved
@ -0,0 +7,4 @@
/// TODO:
/// - Bind additional inputs if instruction has more then one and is passd without any additional
/// - allow binding to pointers to other pipelines?
/// - allow referencing earlier data
Collaborator

Those TODO:s seem like they should belong in an issue, so one can

  • discuss on them
  • selectively mark them as done/undone
  • edit them easily
  • link them from a respective PR
Those `TODO:`s seem like they should belong in an issue, so one can - discuss on them - selectively mark them as done/undone - edit them easily - link them from a respective PR
Author
Owner

this is heavily WIP, remember, these are experiments. There will not be a seperate PR until this is not only out of experimental state, but a functioning image processing library that has all that fixed.

this is heavily WIP, remember, these are experiments. There will not be a seperate PR until this is not only out of experimental state, but a functioning image processing library that has all that fixed.
Collaborator

So you'd rather do all the work of finding all TODO:s and converting them to issues (and then creating a PR removing them) after "finishing" prowocessing?

So you'd rather do all the work of finding all `TODO:`s and converting them to issues (and then creating a PR removing them) after "finishing" `prowocessing`?
Collaborator

Also, that just addresses one point, while the other 3 are still standing.

Also, that just addresses one point, while the other 3 are still standing.
Author
Owner

The intention is to perfect the api beforehand, and tbh, you'll be the only other person reviewing stuff anyway. And wdym "editing them easily", tbh i find editing text in my editor much, much easier then working with a forgejo web ui...

and if they're done, i delete them.

The intention is to perfect the api beforehand, and tbh, you'll be the only other person reviewing stuff anyway. And wdym "editing them easily", tbh i find editing text in my editor much, *much* easier then working with a forgejo web ui... and if they're done, i delete them.
Collaborator

Sounds fine to me.

Sounds fine to me.
multisamplednight marked this conversation as resolved
@ -0,0 +90,4 @@
/// Runnable pipeline - at the core of this library
pub struct Pipeline {
runners: Vec<fn(&Inputs) -> Outputs>,
Collaborator

Also regarding the enum-based arch: Why the indirection of fn(&Inputs) -> Outputs? Why does Pipeline not hold Box<dyn Element> as well?

Also regarding the enum-based arch: Why the indirection of `fn(&Inputs) -> Outputs`? Why does `Pipeline` not hold `Box<dyn Element>` as well?
Author
Owner

Unless I misunderstood rusts dyn, this avoids the overhead of dynamic dispatch by saving static function pointers, which can just be called on the fly

Unless I misunderstood rusts `dyn`, this avoids the overhead of dynamic dispatch by saving static function pointers, which can just be called on the fly
Collaborator

They do save 1 indirection, but that is 1 indirection which hasn't even been benchmarked yet against

  1. worse debuggability
    • when Debugging Pipeline if it were to implement Debug, all one sees at the moment is a bunch of addresses of a function in hexadecimal, e.g. 0x000056380fa85320.
  2. when writing fn runner() as opposed to a direct fn eval(&Inputs) -> Outputs or the like
    1. an extra indent for the actual logic
    2. extra noise for defining the return type and the returned closure

Sidenote: If you care this much about indirection, &Inputs is actually &Vec<&Data>, which are 2 indirections already before the element can access any contained data.

They do save 1 indirection, but that is 1 indirection which hasn't even been benchmarked yet against 1. worse debuggability - when `Debug`ging `Pipeline` if it were to implement `Debug`, all one sees at the moment is a bunch of addresses of a function in hexadecimal, e.g. `0x000056380fa85320`. 2. when writing `fn runner()` as opposed to a direct `fn eval(&Inputs) -> Outputs` or the like 1. an extra indent for the actual logic 2. extra noise for defining the return type and the returned closure Sidenote: If you care this much about indirection, `&Inputs` is actually `&Vec<&Data>`, which are 2 indirections already before the element can access any contained data.
Author
Owner
  1. worse debuggability

i have an idea what one could do for that, I'll be implementing that soon-ish when i have the energy

  1. when writing fn runner() as opposed to a direct fn eval(&Inputs) -> Outputs or the like
    1. an extra indent for the actual logic
    2. extra noise for defining the return type and the returned closure

the return types are defined as opaque types deliberately, since during actual execution the pipeline does not know those.
the only thing the pipeline is supposed to do, is to execute the runners in order and get data where it's supposed to go (which is, once again, an unsolved problem currently)

Sidenote: If you care this much about indirection, &Inputs is actually &Vec<&Data>, which are 2 indirections already before the element can access any contained data.

yes, but I don't think that's avoidable.

> 1. worse debuggability i have an idea what one could do for that, I'll be implementing that soon-ish when i have the energy > 1. when writing fn runner() as opposed to a direct fn eval(&Inputs) -> Outputs or the like > 1. an extra indent for the actual logic > 2. extra noise for defining the return type and the returned closure the return types are defined as opaque types deliberately, since during actual execution the pipeline does not know those. the only thing the pipeline is supposed to do, is to execute the runners in order and get data where it's supposed to go (which is, once again, an unsolved problem currently) > Sidenote: If you care this much about indirection, &Inputs is actually &Vec<&Data>, which are 2 indirections already before the element can access any contained data. yes, but I don't think that's avoidable.
Collaborator

i have an idea what one could do for that, I'll be implementing that soon-ish when i have the energy

Does this idea imply a map of fn pointer to debug info? If so, I'm not sure if the size increase outweighs the 1 (still unbenchmarked) indirection.

the return types are defined as opaque types deliberately, since during actual execution the pipeline does not know those.
the only thing the pipeline is supposed to do, is to execute the runners in order and get data where it's supposed to go (which is, once again, an unsolved problem currently)

That does not address the points you quote at all. At no point I questioned the fn pointers being opaque. You just re-stated what the pipeline should do, which I already know, without addressing what I listed.

> i have an idea what one could do for that, I'll be implementing that soon-ish when i have the energy Does this idea imply a map of fn pointer to debug info? If so, I'm not sure if the size increase outweighs the 1 (still unbenchmarked) indirection. > the return types are defined as opaque types deliberately, since during actual execution the pipeline does not know those. > the only thing the pipeline is supposed to do, is to execute the runners in order and get data where it's supposed to go (which is, once again, an unsolved problem currently) That does not address the points you quote at all. At no point I questioned the fn pointers being opaque. You just re-stated what the pipeline should do, which I already know, without addressing what I listed.
Collaborator

yes, but I don't think that's avoidable.

It is. One could clone the vec on every instruction call.

EDIT: Thinking about it, actually the vec will need to be created for every instruction call anyway, since in a graph, multiple instructions may be source for an instruction.

> yes, but I don't think that's avoidable. It is. One could clone the vec on every instruction call. EDIT: Thinking about it, actually the vec will need to be created for every instruction call anyway, since in a graph, multiple instructions may be source for an instruction.
schrottkatze added 3 commits 2024-02-23 11:46:44 +00:00
Collaborator

At this point I think the trait architecture is superior in extensibility, scalability and just by how much simpler and more readable it is.

I'll try to make the data types trait based as well, so it's technically not a pure image processing library.

Another thing I'd want to figure out before writing the final crate in the trait architecture is memory management: We can't just store every intermediate state in-memory, since in image processing if you have a few 4k images, that'd OOM very fast. Full disk caching is possible, but is slooooow and doesn't scale. Also, random files all over the place. Pure instruction pipelineing [sic] per pixel or for the entire image would lead to a lot of repeated evaluation of components with the same result.

Maybe a hybrid architecture? but then how do we determine when to do what?

I'll try explicit instructions to handle this (which may also be inserted during build):

  • memory buffer: the previous instructions get pipelined or whatever (depending on what's ahead), and the result is forcably [sic] stored in-memory and if we run out of memory, we can maybe still switch to disk caching
  • disk buffer: forcably [sic] caches the inputs to disk

Another idea would also be pipelines supporting incremental compilation, so it can remember previous inputs of subgraphs and then if it gets run with only the last node changed it tries to get the data from the cache.

Maybe an explicit option in the builder to cache inc. compilation in pipelines on disk in the background and try in-memory so it also can be cached over diffferent [sic] runs of the program?

The hybrid architecture sounds like the best option to me, since neither of the extremes is desirable, however I don't think always requiring explicit instructions for the user to decide is always desirable, since that'd imply a lot of micro-management for possibly rather simple pipelines.

I'd suggest determining the caching behavior depending on a heuristic instead: If an instruction would result in more than half of the available physical memory to be used in total by all media tracked by iOwO, throw the largest one onto the disk instead.

However, I do not think this is something we have to worry much about while prototyping, as long as we make it possible for elements to refer to media by &mut (which would allow mmapping files and re-using already allocated buffers) rather than forcing them to take ownership.

> At this point I think the trait architecture is superior in extensibility, scalability and just by how much simpler and more readable it is. > > I'll try to make the data types trait based as well, so it's technically not a pure image processing library. > > Another thing I'd want to figure out before writing the final crate in the trait architecture is memory management: We can't just store every intermediate state in-memory, since in image processing if you have a few 4k images, that'd OOM *very* fast. Full disk caching is possible, but is slooooow and doesn't scale. Also, random files all over the place. Pure instruction pipelineing [sic] per pixel or for the entire image would lead to a lot of repeated evaluation of components with the same result. > > Maybe a hybrid architecture? but then how do we determine when to do what? > > I'll try explicit instructions to handle this (which may also be inserted during build): > - memory buffer: the previous instructions get pipelined or whatever (depending on what's ahead), and the result is forcably [sic] stored in-memory and if we run out of memory, we can maybe still switch to disk caching > - disk buffer: forcably [sic] caches the inputs to disk > > Another idea would also be pipelines supporting incremental compilation, so it can remember previous inputs of subgraphs and then if it gets run with only the last node changed it tries to get the data from the cache. > > Maybe an explicit option in the builder to cache inc. compilation in pipelines on disk in the background and try in-memory so it also can be cached over diffferent [sic] runs of the program? The hybrid architecture sounds like the best option to me, since neither of the extremes is desirable, however I don't think always requiring explicit instructions for the user to decide is always desirable, since that'd imply a lot of micro-management for possibly rather simple pipelines. I'd suggest determining the caching behavior depending on a heuristic instead: If an instruction would result in more than half of the available physical memory to be used in total by all media tracked by iOwO, throw the largest one onto the disk instead. However, I do not think this is something we have to worry much about while prototyping, as long as we make it possible for elements to refer to media by `&mut` (which would allow mmapping files and re-using already allocated buffers) rather than forcing them to take ownership.
Collaborator

Non-content-related: I very much appreciate you experimenting around with possible architectures! It's most likely very tiring, but still very important for a nice API.

Non-content-related: I very much appreciate you experimenting around with possible architectures! It's most likely very tiring, but still very important for a nice API.
schrottkatze added 1 commit 2024-02-26 11:37:10 +00:00
schrottkatze added 2 commits 2024-02-27 12:14:07 +00:00
schrottkatze added 1 commit 2024-02-27 12:18:04 +00:00
This pull request has changes conflicting with the target branch.
  • Cargo.lock
  • Cargo.toml
View command line instructions

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u proc-libs:schrottkatze-proc-libs
git checkout schrottkatze-proc-libs

Merge

Merge the changes and update on Forgejo.

Warning: The "Autodetect manual merge" setting is not enabled for this repository, you will have to mark this pull request as manually merged afterwards.

git checkout main
git merge --no-ff schrottkatze-proc-libs
git checkout schrottkatze-proc-libs
git rebase main
git checkout main
git merge --ff-only schrottkatze-proc-libs
git checkout schrottkatze-proc-libs
git rebase main
git checkout main
git merge --no-ff schrottkatze-proc-libs
git checkout main
git merge --squash schrottkatze-proc-libs
git checkout main
git merge --ff-only schrottkatze-proc-libs
git checkout main
git merge schrottkatze-proc-libs
git push origin main
Sign in to join this conversation.
No reviewers
No labels
proposal
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: katzen-cafe/iowo#12
No description provided.