New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New cache layout broke VS code plugin #4069
Comments
What if there was a manifest in {
"https://deno.land/foo/bar.ts": "https/deno.land/11d2ae31f7817c87b53893788205d431da511568dcd7bb9c363c6e52fb21a7fd"
} Potentially more data could be cached in the key, like if there were redirects from the requested URL to the final one. |
Using manifest file was my initial thought, but we decided to skip it because URLs are consistently hashed. Adding such file, should not be a big problem though. Waiting for @axetroy to let us know what are requirements for extension to make it work |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
@axetroy I secondary benefit of the hashed files was that it would fix the collision between importing 'example.com/mymodule' and 'example.com/mymodule/mod.ts', because before this would not work because 'directory could not be created because a file with name mymodule already exists'. I also think creating huge manifest files will create some perf issues, but I think currently they are the best idea due to all of the benefits that the hashed file names provide. What if there was one manifest per origin? |
I'm not sure it would be that horrible versus the current situation. We already write out headers and other stuff, and we don't expire stuff in the cache, so if we are intelligent about it, we wouldn't even touch the file unless we are actually writing out something to the cache. Also Deno shouldn't use it internally as an index or anything, it simply should write stuff out. If someone deletes only a part of a cache but doesn't delete the manifest file, that is there problem. |
This will not work, because of the collisions @lucacasonato mentioned.
I'm not sure I understand this argument - contents of the file are not changed; it's just a filename in
Agreed; providing a "manifest" file (which is basically mapping between |
I don't think they are collisions in the @axetroy proposition the hash is calculate with the part of the last path : after the last / => filename. @lucacasonato exemple :
the part after # is ignored because is a fragment of page Sorry for my bad english ;) |
Deno users should not care about $ DENO_DIR However, developers of third-party tools may need, eg, development of an
This is probably the simplest solution and does not break the current cache layout. But Why the new cache layout broke vscode extension?
In older cache layoutTake the module Remote File: https://deno.land/std/path/mod.ts if we open the cache file in editor you will see // Copyright the Browserify authors. MIT License.
// Ported from https://github.com/browserify/path-browserify/
import * as _win32 from "./win32.ts";
import * as _posix from "./posix.ts";
import { isWindows } from "./constants.ts";
const path = isWindows ? _win32 : _posix;
export const win32 = _win32;
export const posix = _posix;
export const resolve = path.resolve;
export const normalize = path.normalize;
export const isAbsolute = path.isAbsolute;
export const join = path.join;
export const relative = path.relative;
export const toNamespacedPath = path.toNamespacedPath;
export const dirname = path.dirname;
export const basename = path.basename;
export const extname = path.extname;
export const format = path.format;
export const parse = path.parse;
export const sep = path.sep;
export const delimiter = path.delimiter;
export { EOL, SEP, SEP_PATTERN, isWindows } from "./constants.ts";
export * from "./interface.ts";
export * from "./glob.ts";
export * from "./globrex.ts"; Q: So how does the extension find the we can get the current cache file's url( so. The module Everything is alright. In new cache layoutRemote File: https://deno.land/std/path/mod.ts Then in file
hash4win32 = SHA256('/std/path/win32.ts') As long as we can calculate the Unfortunately, we don’t know what the current URL. So we need a mapping to get url: Here we get Finally, we can calculate |
@xpyxel But in the Typescript plugin, it does need to be read frequently Every time the file changes\recompilation in editor, needs to read-> serialize. |
Yeah I realized that after I commented, sorry about that lol @axetroy |
Hi, this is the third time that I'm being mentioned instead of someone else on this project but for the third time THANK YOU FOR ALL THE HARD WORK ON DENO ❤️ |
@IllusionPerdu thanks for the example, but the problem is you used URLs which all could be valid filenames. Take for example |
@axetroy thanks for explanation, now I see what's the problem clearly. I don't have a good solution off top of my head right now besides the manifest/mapping file. I understand your concerns about serialization performance; however maybe we should start with this solution as it's pretty simple and see if it's really too slow? |
@bartlomieju And, can you print what the manifest/mapping file looks like if use this solution? |
@axetroy I think we can do it per origin (so per directory in cache)
Would that work? |
@bartlomieju it works and look good for me. |
Good, I'll get to implementing it shortly |
You would have to add a command in Deno to check the consistency between the cache files and the manifest because for some reason someone can manually delete files in the cache or edit the manifest. For security, the manifest could by the same designation define an imprint of the files.
|
I'm worried about the speed of manifest.json when it's filled with many entries. Reading and writing to the cache becomes O(N) where it was previously O(1). |
Would a distributed manifest be a problem. We already write out the headers, so write out in some sort of meta data file the original URL? That would put the work on the plugin to go and collect all the meta-data files to create a map, and what the FS events on the DENO_DIR for any new files and add them to the cache. For Deno it would be still be O(1). |
@kitsonk By distributed manifest you mean having a manifest per directory? I guess that would work... but I'm still wondering if we can't avoid the manifest entirely. @axetroy I'm a unclear why the VS Code plugin cannot call "deno info" ? We can modify "deno info" to fit your needs. It would be useful if the DENO_DIR was a private structure - that would allow us to do refactors more easily. |
@ry Yes, I agree to privatize This way, even if the cache layout changes again, third-party tools will not need to change If we no longer depend on $DENO_DIR Then eg. $ deno info $DENO_DIR/https/example.com/[hash]
# output
https://example.com/demo/mod.ts
$ deno info https://example.com/demo/mod.ts
# output
$DENO_DIR/https/example.com/[hash] |
Commands can be called, but synchronously This will cause the module parsing to be very slow, as it needs to frequently call |
@axetroy how about we add an option to “deno info” so it will not download nor compile but only look in the cache? We can also add an option for JSON output. Could you provide a link to where in the plugin “deno info” would be called? |
SGTM In fact, I have already implemented the solution of here is how extension found Deno module. But I rethought a while andI think |
It seems that @ry is disagree with the So, If we use the # locate from filepath
deno info $DENO_DIR/deps/https/deno.land/[hash] --cache-only --output=json
# locate from URL
deno info https://deno.land/std/path/mod.ts --cache-only --output=json Stdout output: {
"filepath": "$DENO_DIR/deps/https/deno.land/[hash]",
"url": "https://deno.land/std/path/mod.ts",
"type": "TypeScript"
} miscIf you use this approach, in TypeScript, you really need to call So I hope there are flags to make the json as short as possible eg. module's dependency tree is optional. There are some modules(https://dev.jspm.io/jsdom) with a lot of dependencies and it is not necessary. |
@axetroy yes, I discussed this issue with @ry There's one thing I am not sure about:
Above example is self-explanatory and easy to implement; what about the one below:
Since we want to privatize That's also ref for #4140 |
But what I really need is map Because in actual use case, I open the If without additional information, it seems that Deno cannot infer what its URL is from |
This might be stupid, but why do we actually hash the file names in the first place? To make sure we have a valid set of characters for the filesystem right? Not for security. Couldn't we just use base64 encoding or a fast terrible encryption algorithm? The name doesn't need to be one way hashed because it needs to be secure - it just needs to be rewritten to valid filesystem chars. base64 is reversible so that would fix the issue right? |
That's right
Now that I think about it, it might actually work, but it would rather be Base32 - that way we'd handle both case-sensitive and case-insensitive file systems. EDIT: There might be the issue of filename length though. |
You're right - filename length could be an issue (base32 will increase file length by an average of 60%). A solution could be to split the base32 encoded strings into 255 char segments (this would fit 99% of filesystems), and then have a folder structure like this: Stitching these back together into a single string or splitting them apart is trivial. |
Just to point out, base32 or base64 do seem a bit of overkill... they are designed to encode arbitrary data. URLs can only contain a certain set of characters. Though I am not aware of an encoding that is file safe that only deals with encoding valid URL characters. The following text:
Encodes in base-32 as:
Also, from an encoding perspective, domains are case-insensitive, while that path and query are case sensitive. |
Why not try Punycode? The example string you used above: would result in the following in punycode: compared to the Base32 example you provided, it is 73 characters less. |
@xpyxel how were you encoding it via punycode? Punycode usually is for taking non-ASCII UNICODE chars and encoding them in a URL. When I do a punycode encoding of example |
@kitsonk seems it was just a bug in the website i was using to test different encodings. |
Why not use something like https://github.com/mit-pdos/noria (Rust) That should be more performant than constantly parsing, stringifying, and writing JSON to disk. |
Still, a descriptive filename should be considered a priority. Even with a language server, opening files which don't have the expected filename is not ideal, especially in other editors. At the very least, it should be:
instead of
There are only a few characters which are invalid in filenames:
I would think a better approach would be to replace those characters and then always add a file extension for http/https imports. So |
@brandonkal I think it is unavoidable once you consider query strings would also make unique resources in the cache. I also think you are missing |
You also have to consider:
So, although I do understand where the desire to have recognizable file names comes from, it is not as straightforward as you make it sound -- there are many more gotchas. As a compromise I could see a solution where we strip clusters of undesirable characters from the URL and replace them by an underscore, strip the |
I'm not sure why query strings changes things unless the old behaviour was to strip them off before creating a file path. If you replace As far as I am aware, '&' is legal on unix and Windows but it does create issues with shell interpretation so it would be good to add that to the list. |
Good points. Something like "$DENO_DIR/[id-in-DB or hash]/[stripped-file-basename].[appended-extension-from-content-type]" would be a nice compromise for readable filenames in editors and search. |
Add original URL to metadata. This is so the VS Code Plugin can reverse look up the URL for cache entries. Ref #4069.
@ry I have built Deno v0.35.0 locally and it works find. waiting v0.35.0 release, then extensions will be released soon |
vscode-deno@3.0.0 have been release and it works again with Deno v0.35.0, it is up to @ry to decide whether to close the issue If implement the |
@axetroy Awesome - glad to hear it's unbroken. I will close this issue but keep working on the "deno info" improvements. |
Add original URL to metadata. This is so the VS Code Plugin can reverse look up the URL for cache entries. Ref denoland#4069.
Previously Deno cached HTTP resources in a directory structure that was similar to the URL
It looked like this
This was a very simple and nice system as it's very obvious how URLs map to the cache. VS code plugin made assumptions about this layout.
However we discovered there are some URLs that don't easily map to this structure - in particular URLs with query strings, or URLs that don't have a file extension but serve TS/JS. (See #4030)
So now we use a cache structure which is flat and hashes URL
The VS Code plugin is very important - so let's figure out how to resolve this problem.
cc @axetroy @bartlomieju
The text was updated successfully, but these errors were encountered: