WIP: image processing library (or libraries?) #12
Loading…
Reference in a new issue
No description provided.
Delete branch "schrottkatze/iowo:proc-libs"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
This PR is supposed to add the image processing library that is going to primarily be used in iOwO until there's time for more complex evaluation concepts.
The library is supposed to be able to be used outside of iOwO as well, roughly in that form (the compile-time checked form):
There's going to be 3 ways of using it, this is only some pseudocode I typed down.
1.: The "normal" way for external users:
- with typed high level pipelines
- pipelines can be nested and reused and passed to eachother?
2.: The low level way: This is also going to be internal to the normal way, basically manually inserting and calling check functions. Not statically typed, but useful for generating pipelines in runtime
3.: The oversimplified way: Not able to actually use pipelines, but for simple/single operations like inverting etc., without complexity for building and calling pipelines.
The way of using it above is just my first draft and quite flawed (and lacking versatility), the "actual" implementation is not going to be merged until I'm happy with it.
At this point I think the trait architecture is superior in extensibility, scalability and just by how much simpler and more readable it is.
I'll try to make the data types trait based as well, so it's technically not a pure image processing library.
Another thing I'd want to figure out before writing the final crate in the trait architecture is memory management: We can't just store every intermediate state in-memory, since in image processing if you have a few 4k images, that'd OOM very fast. Full disk caching is possible, but is slooooow and doesn't scale. Also, random files all over the place. Pure instruction pipelineing per pixel or for the entire image would lead to a lot of repeated evaluation of components with the same result.
Maybe a hybrid architecture? but then how do we determine when to do what?
I'll try explicit instructions to handle this (which may also be inserted during build):
Another idea would also be pipelines supporting incremental compilation, so it can remember previous inputs of subgraphs and then if it gets run with only the last node changed it tries to get the data from the cache.
Maybe an explicit option in the builder to cache inc. compilation in pipelines on disk in the background and try in-memory so it also can be cached over diffferent runs of the program?
@ -3,3 +3,3 @@
"crates/app",
"crates/eval",
"crates/ir",
"crates/ir", "crates/prowocessing",
Nit: Regarding formatting, you may want to split the newly added entry to its own line.
ah cargo keeps doing that and i keep forgetting when adding new crates lol
@ -25,2 +16,2 @@
let config = if let Some(config) = args.config_path {
Ok(config)
pub fn read(args: &CliConfigs) -> Self {
let config = if let Some(config) = &args.config_path {
Could use
Option::ok_or
withResult::or_else
(untested):Doesn't work since
find_config_file
returns an owned value and the other value is unowned, if i add.as_ref()
to thefind_config_file
call, it doesn't work either since then it's a temporary value.Then how about cloning it instead?
Works.
@ -12,0 +27,4 @@
},
Dev {
#[command(subcommand)]
dev_command: DevCommands,
Nit: Is there clarity gained in prefixing the variable name with
dev_
given it's always taken out of theDev
variant?nope, changing that
@ -27,0 +85,4 @@
let pipe = PipelineBuilder::new().add(1).stringify().build();
println!(
"{:?}",
pipe.run(vec![num0.into(), num1.into()].into()).into_inner()[0]
All of these dev commands seem like they'd belong into tests using
#[test]
instead.that's what i already mentioned, that i prefer playing around like that
sure, can do that, but i much prefer having them run like that and output there so i can see it more or less interactively.
planning to do that, but the
experimental
things are all termporary anyway.They're still just as interactive if one were to use
cargo test --no-capture <test_name>
.They're what this PR introduces and what you requested my review on.
They are indeed not what this PR introduces, those will not be in the PR once it's complete. this is basically just a working copy/scratchpad state for now, and i'll be purging it once the actual library is done which is then what will be introduced by this PR.
Fair enough.
@ -0,0 +1,7 @@
//! Definitions of the data transfer and storage types.
/// Types for element and pipeline IO
Given that you use a module comment at the very start of this file, it seems like these doc-comments would also belong into their respective modules.
This gives the reader added context when opening the respective module e.g. after following a type definition.
@ -0,0 +3,4 @@
/// Types for element and pipeline IO
pub mod io;
/// Raw data types contained in `io`
Could use an intralink.
@ -0,0 +4,4 @@
pub struct Inputs<'a>(Vec<Data<'a>>);
impl<'a> Inputs<'a> {
/// get inner value(s)
Quite ambiguous doc-comment also regarding the rather lengthy doc-comment on the type itself.
How about removing this method altogether and making the content of
Inputs
directly public,given that one's free to convert from/to it already?
@ -0,0 +24,4 @@
impl<'a> From<&'a Outputs> for Inputs<'a> {
fn from(value: &'a Outputs) -> Self {
Self(value.0.iter().map(std::convert::Into::into).collect())
Unnecessary full method path, consider just using
From::from
orInto::into
instead.ah yes, rust-analyzer loves completing full paths lol
@ -0,0 +28,4 @@
}
}
/// Newtype struct around `OwnedData` for pipeline/element outputs
I already see that this is a newtype around
OwnedData
, just re-stating it does not add semantical value. Rather, what would this struct be used for?@ -0,0 +33,4 @@
impl Outputs {
/// consume self and return inner value(s)
pub fn into_inner(self) -> Vec<OwnedData> {
Wait, why is
Outputs
allowed to be consumed for its inner content consumed whileInputs
doesn't?Inputs
only contains aVec
ofData
which either contains a string slice or an integer, which are really cheap to clone.OwnedData
would be much heavier to clone in this case.(I'm currently not happy how the IO of instructions works anyway, planning on reworking that to be more sensible, clear and flexible))
@ -0,0 +2,4 @@
/// Owned data type, for use mostly in outputs and storage
#[derive(Clone, Debug)]
pub enum OwnedData {
Given the common pattern of
AsRef
in the stdlib andInstructionRef
already being used in this project, how about renamingOwnedData
→Data
andData
below →DataRef
?did the renaming, gonna take care of the
AsRef
impl later (I remember trying that earlier and something causing problems, i forgot what that was though.)@ -0,0 +20,4 @@
}
/// Unowned data type, for inputs into runner functions
#[derive(Clone, Copy)]
Why does
OwnedData
implementDebug
whileData
doesn't?@ -0,0 +28,4 @@
impl Data<'_> {
/// converts itself to `OwnedData`
pub fn to_owned_data(&self) -> OwnedData {
Consider implementing
ToOwned
andAsRef
appropiately instead.OT: Wondering that clippy didn't scream about this. Interesting, since it has a lint for this, but apparently it didn't trigger since the name was (imo unnecessarily) suffixed with
_data
.I did that deliberately to not trigger the lint, but I'll fix that and properly do it after fixing your simpler reviews
As I've just realized because I was trying to implement
ToOwned
/AsRef
, it isn't even necessary to have a seperateDataRef
, so i removed that entirely and... who would've thought, it's much more readable and concise overall now@ -0,0 +5,4 @@
use super::data::io::Outputs;
pub(crate) trait PipelineElement {
/// return a static runner function pointer to avoid dynamic dispatch during pipeline execution - Types MUST match the signature
The condition might be more visible if it's put in an explicit
# Indirect panics
section or the like.probably gonna refactor this to not panic in the future and add proper handling (
Result
s lol)@ -0,0 +10,4 @@
};
/// Addition
pub struct Add(pub i32);
Why do
Add
,Subtract
,Concatenate
hold apub i32
if it's never used?because i wanted to do second params like that originally, then realizing it doesn't work that "easily". so basically, still WIP
@ -0,0 +68,4 @@
outputs: vec![DataType::String],
}
}
}
Both
ops::num
andops::str
could very much use the guard condition pattern with let-else instead, so that the actual intended logic of the runner function isn't nested one more.@ -0,0 +7,4 @@
/// TODO:
/// - Bind additional inputs if instruction has more then one and is passd without any additional
/// - allow binding to pointers to other pipelines?
/// - allow referencing earlier data
Those
TODO:
s seem like they should belong in an issue, so one canthis is heavily WIP, remember, these are experiments. There will not be a seperate PR until this is not only out of experimental state, but a functioning image processing library that has all that fixed.
So you'd rather do all the work of finding all
TODO:
s and converting them to issues (and then creating a PR removing them) after "finishing"prowocessing
?Also, that just addresses one point, while the other 3 are still standing.
The intention is to perfect the api beforehand, and tbh, you'll be the only other person reviewing stuff anyway. And wdym "editing them easily", tbh i find editing text in my editor much, much easier then working with a forgejo web ui...
and if they're done, i delete them.
Sounds fine to me.
@ -0,0 +90,4 @@
/// Runnable pipeline - at the core of this library
pub struct Pipeline {
runners: Vec<fn(&Inputs) -> Outputs>,
Also regarding the enum-based arch: Why the indirection of
fn(&Inputs) -> Outputs
? Why doesPipeline
not holdBox<dyn Element>
as well?Unless I misunderstood rusts
dyn
, this avoids the overhead of dynamic dispatch by saving static function pointers, which can just be called on the flyThey do save 1 indirection, but that is 1 indirection which hasn't even been benchmarked yet against
Debug
gingPipeline
if it were to implementDebug
, all one sees at the moment is a bunch of addresses of a function in hexadecimal, e.g.0x000056380fa85320
.fn runner()
as opposed to a directfn eval(&Inputs) -> Outputs
or the likeSidenote: If you care this much about indirection,
&Inputs
is actually&Vec<&Data>
, which are 2 indirections already before the element can access any contained data.i have an idea what one could do for that, I'll be implementing that soon-ish when i have the energy
the return types are defined as opaque types deliberately, since during actual execution the pipeline does not know those.
the only thing the pipeline is supposed to do, is to execute the runners in order and get data where it's supposed to go (which is, once again, an unsolved problem currently)
yes, but I don't think that's avoidable.
Does this idea imply a map of fn pointer to debug info? If so, I'm not sure if the size increase outweighs the 1 (still unbenchmarked) indirection.
That does not address the points you quote at all. At no point I questioned the fn pointers being opaque. You just re-stated what the pipeline should do, which I already know, without addressing what I listed.
It is. One could clone the vec on every instruction call.
EDIT: Thinking about it, actually the vec will need to be created for every instruction call anyway, since in a graph, multiple instructions may be source for an instruction.
DataRef
a3e357a0e7The hybrid architecture sounds like the best option to me, since neither of the extremes is desirable, however I don't think always requiring explicit instructions for the user to decide is always desirable, since that'd imply a lot of micro-management for possibly rather simple pipelines.
I'd suggest determining the caching behavior depending on a heuristic instead: If an instruction would result in more than half of the available physical memory to be used in total by all media tracked by iOwO, throw the largest one onto the disk instead.
However, I do not think this is something we have to worry much about while prototyping, as long as we make it possible for elements to refer to media by
&mut
(which would allow mmapping files and re-using already allocated buffers) rather than forcing them to take ownership.Non-content-related: I very much appreciate you experimenting around with possible architectures! It's most likely very tiring, but still very important for a nice API.
dyn
and pipelines #13View command line instructions
Checkout
From your project repository, check out a new branch and test the changes.Merge
Merge the changes and update on Forgejo.Warning: The "Autodetect manual merge" setting is not enabled for this repository, you will have to mark this pull request as manually merged afterwards.