WIP: image processing library (or libraries?) #12

schrottkatze · 2024-02-05T13:31:25Z

schrottkatze commented

2024-02-05 13:31:25 +00:00

This PR is supposed to add the image processing library that is going to primarily be used in iOwO until there's time for more complex evaluation concepts.

The library is supposed to be able to be used outside of iOwO as well, roughly in that form (the compile-time checked form):

fn some_pl() {
    let mut pipeline_builder = Image::read("foo.png")
        .blend(Image::read("bar.jpg"), Mode::Normal, 0.5)
        .scale_resize(0.5)
        .write("out.jpg");

    let pipeline = pipeline_builder.build();

    pipeline.run().unwrap();
}

There's going to be 3 ways of using it, this is only some pseudocode I typed down.

1.: The "normal" way for external users:
- with typed high level pipelines
- pipelines can be nested and reused and passed to eachother?
2.: The low level way: This is also going to be internal to the normal way, basically manually inserting and calling check functions. Not statically typed, but useful for generating pipelines in runtime
3.: The oversimplified way: Not able to actually use pipelines, but for simple/single operations like inverting etc., without complexity for building and calling pipelines.

The way of using it above is just my first draft and quite flawed (and lacking versatility), the "actual" implementation is not going to be merged until I'm happy with it.

This PR is supposed to add the image processing library that is going to primarily be used in iOwO until there's time for more complex evaluation concepts. The library is supposed to be able to be used outside of iOwO as well, roughly in that form (the compile-time checked form): ```rs fn some_pl() { let mut pipeline_builder = Image::read("foo.png") .blend(Image::read("bar.jpg"), Mode::Normal, 0.5) .scale_resize(0.5) .write("out.jpg"); let pipeline = pipeline_builder.build(); pipeline.run().unwrap(); } ``` There's going to be 3 ways of using it, this is only some pseudocode I typed down. 1.: The "normal" way for external users: - with typed high level pipelines - pipelines can be nested and reused and passed to eachother? 2.: The low level way: This is also going to be internal to the normal way, basically manually inserting and calling check functions. Not statically typed, but useful for generating pipelines in runtime 3.: The oversimplified way: Not able to actually use pipelines, but for simple/single operations like inverting etc., without complexity for building and calling pipelines. The way of using it above is just my first draft and quite flawed (and lacking versatility), the "actual" implementation is not going to be merged until I'm happy with it.

schrottkatze added 1 commit 2024-02-05 13:31:25 +00:00

processing library: init 1d85252523

schrottkatze added 1 commit 2024-02-19 11:56:53 +00:00

experimentation: write experiment for enum architecture f86bd52b92

schrottkatze added 1 commit 2024-02-19 16:57:48 +00:00

cli: add subcommand support 33a1d85318

schrottkatze added 1 commit 2024-02-19 17:07:14 +00:00

cli: add dev command for enums experiment 3af262258e

schrottkatze added 2 commits 2024-02-19 19:54:30 +00:00

prowocessing: extract experiment into its own file 78d800b461

prowocessing: add trait based experiment 2d537cc4ee

schrottkatze requested review from multisamplednight 2024-02-19 19:57:30 +00:00

schrottkatze added 1 commit 2024-02-21 12:11:43 +00:00

prowocessing: refactor trait based experiment to individual files 98f6af78be

schrottkatze commented

2024-02-21 12:39:16 +00:00

At this point I think the trait architecture is superior in extensibility, scalability and just by how much simpler and more readable it is.

I'll try to make the data types trait based as well, so it's technically not a pure image processing library.

Another thing I'd want to figure out before writing the final crate in the trait architecture is memory management: We can't just store every intermediate state in-memory, since in image processing if you have a few 4k images, that'd OOM very fast. Full disk caching is possible, but is slooooow and doesn't scale. Also, random files all over the place. Pure instruction pipelineing per pixel or for the entire image would lead to a lot of repeated evaluation of components with the same result.

Maybe a hybrid architecture? but then how do we determine when to do what?

I'll try explicit instructions to handle this (which may also be inserted during build):

memory buffer: the previous instructions get pipelined or whatever (depending on what's ahead), and the result is forcably stored in-memory and if we run out of memory, we can maybe still switch to disk caching
disk buffer: forcably caches the inputs to disk

Another idea would also be pipelines supporting incremental compilation, so it can remember previous inputs of subgraphs and then if it gets run with only the last node changed it tries to get the data from the cache.

Maybe an explicit option in the builder to cache inc. compilation in pipelines on disk in the background and try in-memory so it also can be cached over diffferent runs of the program?

At this point I think the trait architecture is superior in extensibility, scalability and just by how much simpler and more readable it is. I'll try to make the data types trait based as well, so it's technically not a pure image processing library. Another thing I'd want to figure out before writing the final crate in the trait architecture is memory management: We can't just store every intermediate state in-memory, since in image processing if you have a few 4k images, that'd OOM *very* fast. Full disk caching is possible, but is slooooow and doesn't scale. Also, random files all over the place. Pure instruction pipelineing per pixel or for the entire image would lead to a lot of repeated evaluation of components with the same result. Maybe a hybrid architecture? but then how do we determine when to do what? I'll try explicit instructions to handle this (which may also be inserted during build): - memory buffer: the previous instructions get pipelined or whatever (depending on what's ahead), and the result is forcably stored in-memory and if we run out of memory, we can maybe still switch to disk caching - disk buffer: forcably caches the inputs to disk Another idea would also be pipelines supporting incremental compilation, so it can remember previous inputs of subgraphs and then if it gets run with only the last node changed it tries to get the data from the cache. Maybe an explicit option in the builder to cache inc. compilation in pipelines on disk in the background and try in-memory so it also can be cached over diffferent runs of the program?

schrottkatze added 1 commit 2024-02-21 13:25:02 +00:00

prowocessing: add documentation of trait experiment 0ebfed66ed

multisamplednight requested changes 2024-02-22 17:53:22 +00:00

Cargo.toml Outdated

					
				@ -3,3 +3,3 @@

				  "crates/app",

				  "crates/eval",

				  "crates/ir",

				  "crates/ir", "crates/prowocessing",

multisamplednight commented

2024-02-22 17:21:20 +00:00

Nit: Regarding formatting, you may want to split the newly added entry to its own line.

schrottkatze commented

2024-02-23 09:22:34 +00:00

ah cargo keeps doing that and i keep forgetting when adding new crates lol

schrottkatze marked this conversation as resolved

crates/app/src/config.rs Outdated

					
				@ -25,2 +16,2 @@

				        let config = if let Some(config) = args.config_path {

				            Ok(config)

				    pub fn read(args: &CliConfigs) -> Self {

				        let config = if let Some(config) = &args.config_path {

multisamplednight commented

2024-02-22 17:24:33 +00:00

Could use Option::ok_or with Result::or_else (untested):

let config = args.config_path.as_ref().ok_or(()).or_else(|_| find_config_file());

Could use `Option::ok_or` with `Result::or_else` (untested): ```rust let config = args.config_path.as_ref().ok_or(()).or_else(|_| find_config_file()); ```

schrottkatze commented

2024-02-23 09:29:49 +00:00

Doesn't work since find_config_file returns an owned value and the other value is unowned, if i add .as_ref() to the find_config_file call, it doesn't work either since then it's a temporary value.

Doesn't work since `find_config_file` returns an owned value and the other value is unowned, if i add `.as_ref()` to the `find_config_file` call, it doesn't work either since then it's a temporary value.

multisamplednight commented

2024-02-23 14:17:46 +00:00

Then how about cloning it instead?

let config = args.config_path.clone().ok_or(()).or_else(|_| find_config_file());

Then how about cloning it instead? ```rust let config = args.config_path.clone().ok_or(()).or_else(|_| find_config_file()); ```

schrottkatze commented

2024-02-27 12:18:42 +00:00

Works.

schrottkatze marked this conversation as resolved

crates/app/src/main.rs Outdated

					
				@ -12,0 +27,4 @@

				    },

				    Dev {

				        #[command(subcommand)]

				        dev_command: DevCommands,

multisamplednight commented

2024-02-22 17:25:49 +00:00

Nit: Is there clarity gained in prefixing the variable name with dev_ given it's always taken out of the Dev variant?

Nit: Is there clarity gained in prefixing the variable name with `dev_` given it's always taken out of the `Dev` variant?

schrottkatze commented

2024-02-23 09:32:38 +00:00

nope, changing that

schrottkatze marked this conversation as resolved

crates/app/src/main.rs Outdated

					
				@ -27,0 +85,4 @@

				                    let pipe = PipelineBuilder::new().add(1).stringify().build();

				                    println!(

				                        "{:?}",

				                        pipe.run(vec![num0.into(), num1.into()].into()).into_inner()[0]

multisamplednight commented

2024-02-22 17:28:11 +00:00

All of these dev commands seem like they'd belong into tests using #[test] instead.

All of these dev commands seem like they'd belong into tests using `#[test]` instead.

schrottkatze commented

2024-02-23 09:32:03 +00:00

that's what i already mentioned, that i prefer playing around like that

sure, can do that, but i much prefer having them run like that and output there so i can see it more or less interactively.

planning to do that, but the experimental things are all termporary anyway.

that's what i already mentioned, that i prefer playing around like that sure, can do that, but i much prefer having them run like that and output there so i can see it more or less interactively. planning to do that, but the `experimental` things are all termporary anyway.

multisamplednight commented

2024-02-23 14:21:15 +00:00

sure, can do that, but i much prefer having them run like that and output there so i can see it more or less interactively.

They're still just as interactive if one were to use cargo test --no-capture <test_name>.

planning to do that, but the experimental things are all termporary [sic] anyway.

They're what this PR introduces and what you requested my review on.

> sure, can do that, but i much prefer having them run like that and output there so i can see it more or less interactively. They're still just as interactive if one were to use `cargo test --no-capture <test_name>`. > planning to do that, but the `experimental` things are all termporary [sic] anyway. They're what this PR introduces and what you requested my review on.

schrottkatze commented

2024-02-23 19:28:31 +00:00

They are indeed not what this PR introduces, those will not be in the PR once it's complete. this is basically just a working copy/scratchpad state for now, and i'll be purging it once the actual library is done which is then what will be introduced by this PR.

multisamplednight commented

2024-02-24 22:26:21 +00:00

Fair enough.

multisamplednight marked this conversation as resolved

crates/prowocessing/src/experimental/trait_based/data.rs Outdated

					
				@ -0,0 +1,7 @@

				//! Definitions of the data transfer and storage types.

				/// Types for element and pipeline IO

multisamplednight commented

2024-02-22 17:30:59 +00:00

Given that you use a module comment at the very start of this file, it seems like these doc-comments would also belong into their respective modules.
This gives the reader added context when opening the respective module e.g. after following a type definition.

Given that you use a module comment at the very start of this file, it seems like these doc-comments would also belong into their respective modules. This gives the reader added context when opening the respective module e.g. after following a type definition.

schrottkatze marked this conversation as resolved

crates/prowocessing/src/experimental/trait_based/data.rs Outdated

					
				@ -0,0 +3,4 @@

				/// Types for element and pipeline IO

				pub mod io;

				/// Raw data types contained in `io`

multisamplednight commented

2024-02-22 17:31:39 +00:00

Could use an intralink.

-/// Raw data types contained in `io`
+/// Raw data types contained in [`io`]

Could use an intralink. ```diff -/// Raw data types contained in `io` +/// Raw data types contained in [`io`] ```

schrottkatze marked this conversation as resolved

crates/prowocessing/src/experimental/trait_based/data/io.rs

					
				@ -0,0 +4,4 @@

				pub struct Inputs<'a>(Vec<Data<'a>>);

				impl<'a> Inputs<'a> {

				    /// get inner value(s)

multisamplednight commented

2024-02-22 17:35:57 +00:00

Quite ambiguous doc-comment also regarding the rather lengthy doc-comment on the type itself.
How about removing this method altogether and making the content of Inputs directly public,
given that one's free to convert from/to it already?

Quite ambiguous doc-comment also regarding the rather lengthy doc-comment on the type itself. How about removing this method altogether and making the content of `Inputs` directly public, given that one's free to convert from/to it already?

schrottkatze marked this conversation as resolved

crates/prowocessing/src/experimental/trait_based/data/io.rs

					
				@ -0,0 +24,4 @@

				impl<'a> From<&'a Outputs> for Inputs<'a> {

				    fn from(value: &'a Outputs) -> Self {

				        Self(value.0.iter().map(std::convert::Into::into).collect())

multisamplednight commented

2024-02-22 17:37:23 +00:00

Unnecessary full method path, consider just using From::from or Into::into instead.

Unnecessary full method path, consider just using `From::from` or `Into::into` instead.

schrottkatze commented

2024-02-23 09:37:03 +00:00

ah yes, rust-analyzer loves completing full paths lol

schrottkatze marked this conversation as resolved

crates/prowocessing/src/experimental/trait_based/data/io.rs Outdated

					
				@ -0,0 +28,4 @@

				    }

				}

				/// Newtype struct around `OwnedData` for pipeline/element outputs

multisamplednight commented

2024-02-22 17:36:44 +00:00

I already see that this is a newtype around OwnedData, just re-stating it does not add semantical value. Rather, what would this struct be used for?

I already see that this is a newtype around `OwnedData`, just re-stating it does not add semantical value. Rather, what would this struct be used for?

schrottkatze marked this conversation as resolved

crates/prowocessing/src/experimental/trait_based/data/io.rs

					
				@ -0,0 +33,4 @@

				impl Outputs {

				    /// consume self and return inner value(s)

				    pub fn into_inner(self) -> Vec<OwnedData> {

multisamplednight commented

2024-02-22 17:38:15 +00:00

Wait, why is Outputs allowed to be consumed for its inner content consumed while Inputs doesn't?

Wait, why is `Outputs` allowed to be consumed for its inner content consumed while `Inputs` doesn't?

schrottkatze commented

2024-02-23 09:41:21 +00:00

Inputs only contains a Vec of Data which either contains a string slice or an integer, which are really cheap to clone. OwnedData would be much heavier to clone in this case.

(I'm currently not happy how the IO of instructions works anyway, planning on reworking that to be more sensible, clear and flexible))

`Inputs` only contains a `Vec` of `Data` which either contains a string slice or an integer, which are really cheap to clone. `OwnedData` would be much heavier to clone in this case. (I'm currently not happy how the IO of instructions works anyway, planning on reworking that to be more sensible, clear and flexible))

schrottkatze marked this conversation as resolved

crates/prowocessing/src/experimental/trait_based/data/raw.rs Outdated

					
				@ -0,0 +2,4 @@

				/// Owned data type, for use mostly in outputs and storage

				#[derive(Clone, Debug)]

				pub enum OwnedData {

multisamplednight commented

2024-02-22 17:40:30 +00:00

Given the common pattern of AsRef in the stdlib and InstructionRef already being used in this project, how about renaming OwnedData → Data and Data below → DataRef?

Given the common pattern of `AsRef` in the stdlib and `InstructionRef` already being used in this project, how about renaming `OwnedData` → `Data` and `Data` below → `DataRef`?

schrottkatze commented

2024-02-23 09:43:36 +00:00

did the renaming, gonna take care of the AsRef impl later (I remember trying that earlier and something causing problems, i forgot what that was though.)

did the renaming, gonna take care of the `AsRef` impl later (I remember trying that earlier and something causing problems, i forgot what that was though.)

schrottkatze marked this conversation as resolved

crates/prowocessing/src/experimental/trait_based/data/raw.rs Outdated

					
				@ -0,0 +20,4 @@

				}

				/// Unowned data type, for inputs into runner functions

				#[derive(Clone, Copy)]

multisamplednight commented

2024-02-22 17:41:07 +00:00

Why does OwnedData implement Debug while Data doesn't?

Why does `OwnedData` implement `Debug` while `Data` doesn't?

schrottkatze marked this conversation as resolved

crates/prowocessing/src/experimental/trait_based/data/raw.rs Outdated

					
				@ -0,0 +28,4 @@

				impl Data<'_> {

				    /// converts itself to `OwnedData`

				    pub fn to_owned_data(&self) -> OwnedData {

multisamplednight commented

2024-02-22 17:45:19 +00:00

Consider implementing ToOwned and AsRef appropiately instead.

OT: Wondering that clippy didn't scream about this. Interesting, since it has a lint for this, but apparently it didn't trigger since the name was (imo unnecessarily) suffixed with _data.

Consider implementing [`ToOwned`] and [`AsRef`] appropiately instead. OT: Wondering that clippy didn't scream about this. Interesting, since [it has a lint for this](https://rust-lang.github.io/rust-clippy/master/index.html#/should_implement_trait), but apparently it didn't trigger since the name was (imo unnecessarily) suffixed with `_data`. [`ToOwned`]: https://doc.rust-lang.org/std/borrow/trait.ToOwned.html [`AsRef`]: https://doc.rust-lang.org/std/convert/trait.AsRef.html

schrottkatze commented

2024-02-23 09:45:34 +00:00

I did that deliberately to not trigger the lint, but I'll fix that and properly do it after fixing your simpler reviews

schrottkatze commented

2024-02-23 11:48:11 +00:00

As I've just realized because I was trying to implement ToOwned/AsRef, it isn't even necessary to have a seperate DataRef, so i removed that entirely and... who would've thought, it's much more readable and concise overall now

As I've just realized because I was trying to implement `ToOwned`/`AsRef`, it isn't even necessary to have a seperate `DataRef`, so i removed that entirely and... who would've thought, it's much more readable and concise overall now

schrottkatze marked this conversation as resolved

crates/prowocessing/src/experimental/trait_based/element.rs Outdated

					
				@ -0,0 +5,4 @@

				use super::data::io::Outputs;

				pub(crate) trait PipelineElement {

				    /// return a static runner function pointer to avoid dynamic dispatch during pipeline execution - Types MUST match the signature

multisamplednight commented

2024-02-22 17:48:09 +00:00

The condition might be more visible if it's put in an explicit # Indirect panics section or the like.

The condition might be more visible if it's put in an explicit `# Indirect panics` section or the like.

schrottkatze commented

2024-02-23 09:47:37 +00:00

probably gonna refactor this to not panic in the future and add proper handling (Results lol)

probably gonna refactor this to not panic in the future and add proper handling (`Result`s lol)

crates/prowocessing/src/experimental/trait_based/ops/num.rs Outdated

					
				@ -0,0 +10,4 @@

				};

				/// Addition

				pub struct Add(pub i32);

multisamplednight commented

2024-02-22 17:47:20 +00:00

Why do Add, Subtract, Concatenate hold a pub i32 if it's never used?

Why do `Add`, `Subtract`, `Concatenate` hold a `pub i32` if it's never used?

schrottkatze commented

2024-02-23 10:54:30 +00:00

because i wanted to do second params like that originally, then realizing it doesn't work that "easily". so basically, still WIP

crates/prowocessing/src/experimental/trait_based/ops/str.rs Outdated

					
				@ -0,0 +68,4 @@

				            outputs: vec![DataType::String],

				        }

				    }

				}

multisamplednight commented

2024-02-22 17:50:36 +00:00

Both ops::num and ops::str could very much use the guard condition pattern with let-else instead, so that the actual intended logic of the runner function isn't nested one more.

Both `ops::num` and `ops::str` could very much use the guard condition pattern with [let-else](https://doc.rust-lang.org/stable/rust-by-example/flow_control/let_else.html) instead, so that the actual intended logic of the runner function isn't nested one more.

schrottkatze marked this conversation as resolved

crates/prowocessing/src/experimental/trait_based/pipeline.rs

					
				@ -0,0 +7,4 @@

				/// TODO:

				/// - Bind additional inputs if instruction has more then one and is passd without any additional

				/// - allow binding to pointers to other pipelines?

				/// - allow referencing earlier data

multisamplednight commented

2024-02-22 17:51:56 +00:00

Those TODO:s seem like they should belong in an issue, so one can

discuss on them
selectively mark them as done/undone
edit them easily
link them from a respective PR

Those `TODO:`s seem like they should belong in an issue, so one can - discuss on them - selectively mark them as done/undone - edit them easily - link them from a respective PR

schrottkatze commented

2024-02-23 10:56:30 +00:00

this is heavily WIP, remember, these are experiments. There will not be a seperate PR until this is not only out of experimental state, but a functioning image processing library that has all that fixed.

multisamplednight commented

2024-02-23 14:29:18 +00:00

So you'd rather do all the work of finding all TODO:s and converting them to issues (and then creating a PR removing them) after "finishing" prowocessing?

So you'd rather do all the work of finding all `TODO:`s and converting them to issues (and then creating a PR removing them) after "finishing" `prowocessing`?

multisamplednight commented

2024-02-23 14:29:41 +00:00

Also, that just addresses one point, while the other 3 are still standing.

schrottkatze commented

2024-02-23 19:21:28 +00:00

The intention is to perfect the api beforehand, and tbh, you'll be the only other person reviewing stuff anyway. And wdym "editing them easily", tbh i find editing text in my editor much, much easier then working with a forgejo web ui...

and if they're done, i delete them.

The intention is to perfect the api beforehand, and tbh, you'll be the only other person reviewing stuff anyway. And wdym "editing them easily", tbh i find editing text in my editor much, *much* easier then working with a forgejo web ui... and if they're done, i delete them.

multisamplednight commented

2024-02-24 22:25:29 +00:00

Sounds fine to me.

multisamplednight marked this conversation as resolved

crates/prowocessing/src/experimental/trait_based/pipeline.rs

					
				@ -0,0 +90,4 @@

				/// Runnable pipeline - at the core of this library

				pub struct Pipeline {

				    runners: Vec<fn(&Inputs) -> Outputs>,

multisamplednight commented

2024-02-22 17:53:08 +00:00

Also regarding the enum-based arch: Why the indirection of fn(&Inputs) -> Outputs? Why does Pipeline not hold Box<dyn Element> as well?

Also regarding the enum-based arch: Why the indirection of `fn(&Inputs) -> Outputs`? Why does `Pipeline` not hold `Box<dyn Element>` as well?

schrottkatze commented

2024-02-23 11:02:34 +00:00

Unless I misunderstood rusts dyn, this avoids the overhead of dynamic dispatch by saving static function pointers, which can just be called on the fly

Unless I misunderstood rusts `dyn`, this avoids the overhead of dynamic dispatch by saving static function pointers, which can just be called on the fly

multisamplednight commented

2024-02-23 14:48:30 +00:00

They do save 1 indirection, but that is 1 indirection which hasn't even been benchmarked yet against

worse debuggability
- when Debugging Pipeline if it were to implement Debug, all one sees at the moment is a bunch of addresses of a function in hexadecimal, e.g. 0x000056380fa85320.
when writing fn runner() as opposed to a direct fn eval(&Inputs) -> Outputs or the like
1. an extra indent for the actual logic
2. extra noise for defining the return type and the returned closure

Sidenote: If you care this much about indirection, &Inputs is actually &Vec<&Data>, which are 2 indirections already before the element can access any contained data.

They do save 1 indirection, but that is 1 indirection which hasn't even been benchmarked yet against 1. worse debuggability - when `Debug`ging `Pipeline` if it were to implement `Debug`, all one sees at the moment is a bunch of addresses of a function in hexadecimal, e.g. `0x000056380fa85320`. 2. when writing `fn runner()` as opposed to a direct `fn eval(&Inputs) -> Outputs` or the like 1. an extra indent for the actual logic 2. extra noise for defining the return type and the returned closure Sidenote: If you care this much about indirection, `&Inputs` is actually `&Vec<&Data>`, which are 2 indirections already before the element can access any contained data.

schrottkatze commented

2024-02-23 19:26:35 +00:00

worse debuggability

i have an idea what one could do for that, I'll be implementing that soon-ish when i have the energy

when writing fn runner() as opposed to a direct fn eval(&Inputs) -> Outputs or the like

an extra indent for the actual logic

extra noise for defining the return type and the returned closure

the return types are defined as opaque types deliberately, since during actual execution the pipeline does not know those.
the only thing the pipeline is supposed to do, is to execute the runners in order and get data where it's supposed to go (which is, once again, an unsolved problem currently)

Sidenote: If you care this much about indirection, &Inputs is actually &Vec<&Data>, which are 2 indirections already before the element can access any contained data.

yes, but I don't think that's avoidable.

> 1. worse debuggability i have an idea what one could do for that, I'll be implementing that soon-ish when i have the energy > 1. when writing fn runner() as opposed to a direct fn eval(&Inputs) -> Outputs or the like > 1. an extra indent for the actual logic > 2. extra noise for defining the return type and the returned closure the return types are defined as opaque types deliberately, since during actual execution the pipeline does not know those. the only thing the pipeline is supposed to do, is to execute the runners in order and get data where it's supposed to go (which is, once again, an unsolved problem currently) > Sidenote: If you care this much about indirection, &Inputs is actually &Vec<&Data>, which are 2 indirections already before the element can access any contained data. yes, but I don't think that's avoidable.

multisamplednight commented

2024-02-24 22:23:29 +00:00

i have an idea what one could do for that, I'll be implementing that soon-ish when i have the energy

Does this idea imply a map of fn pointer to debug info? If so, I'm not sure if the size increase outweighs the 1 (still unbenchmarked) indirection.

the return types are defined as opaque types deliberately, since during actual execution the pipeline does not know those.
the only thing the pipeline is supposed to do, is to execute the runners in order and get data where it's supposed to go (which is, once again, an unsolved problem currently)

That does not address the points you quote at all. At no point I questioned the fn pointers being opaque. You just re-stated what the pipeline should do, which I already know, without addressing what I listed.

> i have an idea what one could do for that, I'll be implementing that soon-ish when i have the energy Does this idea imply a map of fn pointer to debug info? If so, I'm not sure if the size increase outweighs the 1 (still unbenchmarked) indirection. > the return types are defined as opaque types deliberately, since during actual execution the pipeline does not know those. > the only thing the pipeline is supposed to do, is to execute the runners in order and get data where it's supposed to go (which is, once again, an unsolved problem currently) That does not address the points you quote at all. At no point I questioned the fn pointers being opaque. You just re-stated what the pipeline should do, which I already know, without addressing what I listed.

multisamplednight commented

2024-02-24 23:06:50 +00:00

yes, but I don't think that's avoidable.

It is. One could clone the vec on every instruction call.

EDIT: Thinking about it, actually the vec will need to be created for every instruction call anyway, since in a graph, multiple instructions may be source for an instruction.

> yes, but I don't think that's avoidable. It is. One could clone the vec on every instruction call. EDIT: Thinking about it, actually the vec will need to be created for every instruction call anyway, since in a graph, multiple instructions may be source for an instruction.

schrottkatze added 3 commits 2024-02-23 11:46:44 +00:00

prowocessing: apply most basic reviews ddc1f935c6

prowocessing: let-else refatoring as according to review c900a0f4b3

simplified by entirely removing DataRef a3e357a0e7

multisamplednight commented

2024-02-23 14:57:07 +00:00

At this point I think the trait architecture is superior in extensibility, scalability and just by how much simpler and more readable it is.

I'll try to make the data types trait based as well, so it's technically not a pure image processing library.

Another thing I'd want to figure out before writing the final crate in the trait architecture is memory management: We can't just store every intermediate state in-memory, since in image processing if you have a few 4k images, that'd OOM very fast. Full disk caching is possible, but is slooooow and doesn't scale. Also, random files all over the place. Pure instruction pipelineing [sic] per pixel or for the entire image would lead to a lot of repeated evaluation of components with the same result.

Maybe a hybrid architecture? but then how do we determine when to do what?

I'll try explicit instructions to handle this (which may also be inserted during build):

memory buffer: the previous instructions get pipelined or whatever (depending on what's ahead), and the result is forcably [sic] stored in-memory and if we run out of memory, we can maybe still switch to disk caching

disk buffer: forcably [sic] caches the inputs to disk

Another idea would also be pipelines supporting incremental compilation, so it can remember previous inputs of subgraphs and then if it gets run with only the last node changed it tries to get the data from the cache.

Maybe an explicit option in the builder to cache inc. compilation in pipelines on disk in the background and try in-memory so it also can be cached over diffferent [sic] runs of the program?

The hybrid architecture sounds like the best option to me, since neither of the extremes is desirable, however I don't think always requiring explicit instructions for the user to decide is always desirable, since that'd imply a lot of micro-management for possibly rather simple pipelines.

I'd suggest determining the caching behavior depending on a heuristic instead: If an instruction would result in more than half of the available physical memory to be used in total by all media tracked by iOwO, throw the largest one onto the disk instead.

However, I do not think this is something we have to worry much about while prototyping, as long as we make it possible for elements to refer to media by &mut (which would allow mmapping files and re-using already allocated buffers) rather than forcing them to take ownership.

> At this point I think the trait architecture is superior in extensibility, scalability and just by how much simpler and more readable it is. > > I'll try to make the data types trait based as well, so it's technically not a pure image processing library. > > Another thing I'd want to figure out before writing the final crate in the trait architecture is memory management: We can't just store every intermediate state in-memory, since in image processing if you have a few 4k images, that'd OOM *very* fast. Full disk caching is possible, but is slooooow and doesn't scale. Also, random files all over the place. Pure instruction pipelineing [sic] per pixel or for the entire image would lead to a lot of repeated evaluation of components with the same result. > > Maybe a hybrid architecture? but then how do we determine when to do what? > > I'll try explicit instructions to handle this (which may also be inserted during build): > - memory buffer: the previous instructions get pipelined or whatever (depending on what's ahead), and the result is forcably [sic] stored in-memory and if we run out of memory, we can maybe still switch to disk caching > - disk buffer: forcably [sic] caches the inputs to disk > > Another idea would also be pipelines supporting incremental compilation, so it can remember previous inputs of subgraphs and then if it gets run with only the last node changed it tries to get the data from the cache. > > Maybe an explicit option in the builder to cache inc. compilation in pipelines on disk in the background and try in-memory so it also can be cached over diffferent [sic] runs of the program? The hybrid architecture sounds like the best option to me, since neither of the extremes is desirable, however I don't think always requiring explicit instructions for the user to decide is always desirable, since that'd imply a lot of micro-management for possibly rather simple pipelines. I'd suggest determining the caching behavior depending on a heuristic instead: If an instruction would result in more than half of the available physical memory to be used in total by all media tracked by iOwO, throw the largest one onto the disk instead. However, I do not think this is something we have to worry much about while prototyping, as long as we make it possible for elements to refer to media by `&mut` (which would allow mmapping files and re-using already allocated buffers) rather than forcing them to take ownership.

multisamplednight commented

2024-02-23 15:28:39 +00:00

Non-content-related: I very much appreciate you experimenting around with possible architectures! It's most likely very tiring, but still very important for a nice API.

❤️ 1

schrottkatze added 1 commit 2024-02-26 11:37:10 +00:00

experimentation: use dynamic type ids for signatures and add qol macro 4bc84451e0

multisamplednight referenced this pull request

2024-02-26 18:16:21 +00:00

dyn and pipelines #13

schrottkatze added 2 commits 2024-02-27 12:14:07 +00:00

experimentation: implement some basic traits for io and data types 062138337c

app, prowocessing: move dev commands to tests ecc466d01d

schrottkatze added 1 commit 2024-02-27 12:18:04 +00:00

app: apply review 6d318aa7c9

This pull request has changes conflicting with the target branch.

Cargo.lock
Cargo.toml

View command line instructions

Checkout

From your project repository, check out a new branch and test the changes.

git fetch -u proc-libs:schrottkatze-proc-libs

git checkout schrottkatze-proc-libs

Merge

Merge the changes and update on Forgejo.

Warning: The "Autodetect manual merge" setting is not enabled for this repository, you will have to mark this pull request as manually merged afterwards.

git checkout main

git merge --no-ff schrottkatze-proc-libs

git checkout schrottkatze-proc-libs

git rebase main

git checkout main

git merge --ff-only schrottkatze-proc-libs

git checkout schrottkatze-proc-libs

git rebase main

git checkout main

git merge --no-ff schrottkatze-proc-libs

git checkout main

git merge --squash schrottkatze-proc-libs

git checkout main

git merge --ff-only schrottkatze-proc-libs

git checkout main

git merge schrottkatze-proc-libs

git push origin main

Sign in to join this conversation.

No reviewers

No labels

proposal

No milestone

No project

No assignees

2 participants

Notifications

Due date

The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: katzen-cafe/iowo#12