Kobo sync and large libraries #1276

Open
shermp opened this issue Mar 29, 2020 · 31 comments

Comments

@shermp

shermp commented Mar 29, 2020

This is both a problem report and a set of feature requests, hence the blank issue.

The Problem
My Calibre library now has a few thousand books in it (it's been going for 12 years now...), and my Aura H2O simply times out when attempting to sync with calibre-web. The Kobo eventually gave a 'sync failed' message, and my Wireshark capture shows that it gave up at roughly 286K characters.

There needs to be some method of making large libraries more usable for kobo sync. A few ideas off the top of my head:

  • I notice the initial sync sends the full metadata for each book. Is this required? Or is the Kobo smart enough to request additional metadata later?
  • How about "archiving" the library by default, and "unarchiving" the books we want to sync? I actually don't want thousands of books on my Kobo at a time...
  • Is it possible to recreate the store API, such that one could browse one's Calibre library as though we were browsing the Kobo store, then "purchase" books we want to sync?

@shavitmichael do you have any ideas or thoughts on this?

@shavitmichael
Contributor

You raise an interesting problem.

I notice the initial sync sends the full metadata for each book. Is this required? Or is the Kobo smart enough to request additional metadata later?

It turns out the Kobo device already has a solution for this problem; we just haven't been making use of it yet. First some background:
We respond to the sync request with one (possibly) massive reply for all new books, tags, and state since the last request. We recognize which books are new to the device thanks to timestamps we pass around in the SyncToken. Each iteration of the protocol can be thought of as updating the device to the timestamp of the last modified book, say t_latest.
If necessary, we can instead update the device to some earlier timestamp t_earlier, so that the device catches up to t_latest over multiple rounds instead of the single massive one we currently have. It turns out that setting the "x-kobo-sync" header to "continue" informs the device that it's not fully up to date, and causes it to queue another call to the sync endpoint.
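A minimal sketch of the continuation idea described above, not calibre-web's actual code. The `Book` type, `SYNC_CHUNK`, and `build_sync_response` are hypothetical names, timestamps are simplified to integers, and every book is assumed to have a distinct timestamp (ties need extra handling). Only the `x-kobo-sync: continue` header comes from the discussion.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

SYNC_CHUNK = 2  # hypothetical chunk size, kept tiny for demonstration


@dataclass(frozen=True)
class Book:
    id: int
    last_modified: int  # simplified integer timestamp (assumption)


def build_sync_response(
    books: List[Book], token_ts: int
) -> Tuple[List[Book], Dict[str, str], int]:
    """Return one chunk of books newer than token_ts, the response
    headers, and the timestamp to store in the next sync token."""
    pending = sorted(
        (b for b in books if b.last_modified > token_ts),
        key=lambda b: b.last_modified,
    )
    chunk = pending[:SYNC_CHUNK]
    headers: Dict[str, str] = {}
    if len(pending) > len(chunk):
        # Tell the device it is not up to date yet, so it queues another call.
        headers["x-kobo-sync"] = "continue"
    # Advance only to t_earlier (the last book actually sent), not t_latest.
    new_ts = chunk[-1].last_modified if chunk else token_ts
    return chunk, headers, new_ts
```

Each call moves the token forward by at most `SYNC_CHUNK` books, so the device reaches t_latest over several small rounds.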


How about "archiving" the library by default, and "unarchiving" the books we want to sync? I actually don't want thousands of books on my Kobo at a time...

That's definitely doable. I only added "Archiving" to CalibreWeb to support library management for the Kobo, and given your use-case it might have been slightly ill-conceived. Perhaps "Kobo Library View" might be a better name?
We could add a user-controllable toggle so that it can either control:

  1. Books on the device (for you :-) ).
  2. Books not on the device (for me :-) ).

Is it possible to recreate the store API, such that one could browse one's Calibre library as though we were browsing the Kobo store, then "purchase" books we want to sync?

It's likely possible as long as the device gets the store URL from one of:

  1. A value in the device eReader.conf file
  2. A value in the v1/initialization response.

I've never looked into it before, so I have no idea how easy or difficult it would be...

@shermp
Author

shermp commented Mar 30, 2020

Just had an idea, how about using the shelves system to decide what to sync? Create a shelf for kobo books, with a global toggle to add/remove all books, then sync whatever is in that shelf. Would that work?

As an aside, it doesn't look like calibre-web allows selecting and acting on multiple books at a time. That could make matters annoying :(

@shavitmichael
Contributor

Maybe? For what it's worth #1266 is also adding support for syncing CalibreWeb shelves to Kobo Collections but I don't think that should interfere.

@pgaskin

pgaskin commented Apr 8, 2020

It's likely possible as long as the device gets the store URL from one of:

IIRC, it comes from the initial API endpoint list. If not, I can write a patch for it.

BTW, nice work so far with the Kobo sync API!

@shavitmichael
Contributor

Sorry for the lack of activity here; I can start working on the sync continuation mechanism sometime this week since we'll want that implemented for large libraries regardless of how they're managed.


Re-purposing the Archive table so that books can be selected for syncing inclusively as well as exclusively is pretty easy, since we probably don't need to change either the database schema or the Kobo implementation. The CalibreWeb UI can just display the complement of the table content.
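To illustrate the inclusive/exclusive idea, here is a small sketch using plain sets of book ids. The function name and the `inclusive` flag are hypothetical; this is not how calibre-web stores or queries the Archive table.

```python
from typing import Set


def books_to_sync(all_ids: Set[int], table_ids: Set[int], inclusive: bool) -> Set[int]:
    """Interpret the same table in two ways.

    inclusive=True:  the table lists the books that SHOULD be on the device.
    inclusive=False: the table lists archived books, so sync its complement.
    """
    if inclusive:
        return all_ids & table_ids
    return all_ids - table_ids
```

With a library `{1, 2, 3, 4}` and a table holding `{2, 3}`, inclusive mode syncs `{2, 3}` and exclusive mode syncs `{1, 4}`, with no schema change between the two.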

Perhaps we can also add a button to shelves to add/remove-all to the Archive table?
I'm not exactly married to the Archive table either, if people find that using shelves for the UI and/or the implementation is more useful/cleaner.

I noticed @OzzieIsaacs has a tool for importing shelves from Calibre that could also be helpful here.


Thanks for the encouragement @geek1011 :) .

@shermp
Author

shermp commented Apr 14, 2020

My preferred use case would be to have the ability to disable syncing all books by default, then sync only a few titles at any given time. So there would need to be a way of 'archiving'/'unarchiving' easily from the web UI.

@gpulido

gpulido commented May 6, 2020

Hello,
I hope you don't mind me stepping into the discussion.
I also have the same problem: a large library that is shared between several users, each of whom would like to have different books "listed" on their Kobo devices. The shelf functionality seems to fit this use case very nicely: just have one shelf with a special name. However, what if one user has two different devices with different lists? Maybe the sync token could be generated from the shelf, so that it carries the shelf id and there is no need for a special reserved name.
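A rough sketch of what a sync token carrying a shelf id could look like. This is purely illustrative: `ShelfSyncToken`, its fields, and the base64-encoded JSON layout are assumptions made here, not calibre-web's real SyncToken format.

```python
import base64
import json
from dataclasses import asdict, dataclass


@dataclass
class ShelfSyncToken:
    shelf_id: int                 # hypothetical: which shelf this device follows
    books_last_modified: int = 0  # simplified integer timestamp (assumption)

    def encode(self) -> str:
        """Serialize to a header-safe string."""
        raw = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return base64.urlsafe_b64encode(raw).decode("ascii")

    @classmethod
    def decode(cls, text: str) -> "ShelfSyncToken":
        """Rebuild the token from the string produced by encode()."""
        data = json.loads(base64.urlsafe_b64decode(text.encode("ascii")))
        return cls(**data)
```

Two devices belonging to the same user would then simply hold tokens with different `shelf_id` values.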

Besides that, it would be amazing to have all the books listed as a store and be able to "buy/import/sync" them onto the Kobo device.
I would also like to congratulate you on the amazing work on all the calibre-web development.

@DruidGreeneyes

@shavitmichael Any update on the sync continuation mechanism? I'm trying to get a new device up to date against my library and have run into this. If you haven't got the time or energy, I can spare some time to write code if somebody can give me some more detail on how you all would like it to work.

@shavitmichael
Contributor

shavitmichael commented Jul 24, 2020

Sorry for the late response, things are a little busy right now and I probably won't have time to implement this for a few weeks.

I have a very incomplete attempt at implementing sync continuation from a few months back, which I've now pushed in commit a8b90ce for others to peruse, although I left things in a pretty confusing state.
Setting the x-kobo-sync header to continue causes the device to repeat calls to the v1/library/sync endpoint. The problem is figuring out how to break up the protocol so that we don't skip over new books, reading state updates, new collections, deleted books, etc.

IIRC, I was trying to split up the v1/library/sync call into phases, such that we'd only move onto the next "phase" once the previous was up to date. For example, we would:

  1. Go back and forth with the device, syncing new/updated books N entries at a time, incrementing the SyncToken's book_last_modified each time as it catches up, until book_last_modified on the token matches the server's book_last_modified.
  2. We'd then move on and do the same thing with ReadingStates, until the ReadingState timestamp in the SyncToken matches the server's latest reading state.
  3. Finally we'd sync book deletions.
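The phased approach in the list above might be sketched as follows. This is a simplified model, not the code in the commit: phases are represented as lists of distinct integer timestamps, and `PHASES`, `N`, `pending`, and `sync_round` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

PHASES = ("books", "reading_states", "deletions")  # order matters
N = 2  # hypothetical per-round entry cap


def pending(server: Dict[str, List[int]], token: Dict[str, int], phase: str) -> List[int]:
    """Timestamps in this phase that the device has not caught up to yet."""
    return [ts for ts in sorted(server[phase]) if ts > token[phase]]


def sync_round(
    server: Dict[str, List[int]], token: Dict[str, int]
) -> Tuple[Optional[str], List[int], Dict[str, int], bool]:
    """Serve at most N entries from the first phase that is still behind.

    Returns (phase served, entries, updated token, whether to continue).
    """
    for phase in PHASES:
        todo = pending(server, token, phase)
        if todo:
            chunk = todo[:N]
            new_token = {**token, phase: chunk[-1]}
            more = any(pending(server, new_token, p) for p in PHASES)
            return phase, chunk, new_token, more
    return None, [], dict(token), False
```

A later phase is only touched once every earlier phase has nothing pending, which is the "move on once the previous is up to date" rule from the list.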

I forget where exactly I got held up in the implementation, but I think it's possible this idea will run into some edge cases.

In addition, suppose there are M entries to update which all have the same last_modified timestamp, such that M > N. Since the number of entries returned is capped at N, we also add an id to the SyncToken to mark the id of the N'th entry that was returned, so that syncing can resume at that id on the next round. There are some references to this idea in the commit (e.g. books_last_id), although I don't think I finished writing all the code for it.
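The tie-breaking idea can be sketched by treating the token as a `(last_modified, id)` pair and comparing pairs lexicographically. Entries are modelled as plain tuples here, and `next_chunk` is a hypothetical helper, not code from the commit.

```python
from typing import List, Tuple

Entry = Tuple[int, int]  # (last_modified, id), simplified integer timestamp


def next_chunk(
    entries: List[Entry], token: Entry, limit: int
) -> Tuple[List[Entry], Entry, bool]:
    """Return up to `limit` entries strictly after `token`, the new token,
    and whether more entries remain.

    Python compares tuples element by element, so entries sharing a
    timestamp are ordered (and resumed) by id.
    """
    todo = sorted(e for e in entries if e > token)
    chunk = todo[:limit]
    new_token = chunk[-1] if chunk else token
    return chunk, new_token, len(todo) > limit
```

With three entries at timestamp 10 and a limit of 2, the first round ends at `(10, 2)` and the second resumes at `(10, 3)` instead of skipping it.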

@ProtoJazz

I really like the idea of using a shelf or something to sync. I also have a large library I like having access to, but don't need synced all the time

@laurentserena

I would be really interested in this too. Is development still ongoing?

@OzzieIsaacs moved this from To do to In progress in Improve Kobo Sync on Dec 5, 2020
@OzzieIsaacs pinned this issue on Dec 5, 2020
@Floppy

Floppy commented Dec 27, 2020

Adding a bit more data - I seem to be getting the same thing with a library of around 750 ePubs, 1.3k books in total. That's running in docker on a raspberry pi 4, so it might be happening with a smaller library because of slower response time?

@user34756361233

user34756361233 commented Jan 2, 2021

With a library of 7k+ ebooks, sync failed every time. I made a new library with only a handful of books, and syncing works (almost) like a charm. So I would welcome some of the options already mentioned:

  • Having a do/don't sync with Kobo option would be a great thing.
  • Only sync a shelf ("sync with Kobo shelf") would be great as well.

When only having the Kobo at hand the option to work with the library as with the Kobo shop would also be great.

Until that time I am afraid I will have to work with 2 libraries.

@OzzieIsaacs
Collaborator

OzzieIsaacs commented Jan 4, 2021

In the newest commit, a partial sync is implemented: 100 books are synced per request. If more books are left, the Kobo should request the next batch and continue until all books are synced. You can change the number of books via the SYNC_ITEM_LIMIT variable in kobo.py until we have found a good value. I will not expose this setting in the user interface, as I think there will be a number that fits everyone.
Further enhancements are planned regarding the size of the cover file that is sent; this will also improve the situation.
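As a rough illustration of what the per-request limit means for a large library, the sketch below simulates a device that keeps requesting until nothing is left. `simulate_device_sync` is a hypothetical helper; only the default of 100 comes from the comment above.

```python
def simulate_device_sync(total_books: int, limit: int = 100) -> int:
    """Count the sync requests needed when each response carries at most
    `limit` books and the device repeats the call while books remain."""
    synced = 0
    requests = 0
    while True:
        requests += 1
        synced += min(limit, total_books - synced)
        if synced >= total_books:
            return requests
```

Under these assumptions, a 7,000-book library takes 70 requests at the default limit, and raising the limit trades fewer round trips for larger, slower individual responses.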

Having a do/don't sync with Kobo option would be a great thing.

You can set the books you don't want synced to "archived", but I understand that this is not practical for 7k books. I need time to extend the books table page to support mass changes.

@laurentserena

Thank you so much!! Are you planning at some point to add a feature to sync only one shelf to the Kobo?

@cerbaire

cerbaire commented Jan 5, 2021

Thanks. It is working for me. (Kobo Libra H2O)

@Floppy

Floppy commented Jan 15, 2021

I'd happily test this to lend a bit more evidence to the fix but I'm running my setup in docker. Is there a beta build image for this that I could help test?

@otapi
Contributor

otapi commented Jan 18, 2021

Hi guys, can you explain a bit more how the sync with Kobo feature works? Does the Kobo download all the synced books or only a list (catalog) of the books?

@pgaskin

pgaskin commented Jan 18, 2021

A list.

@rolfberkenbosch

rolfberkenbosch commented Jan 19, 2021

In the newest commit, a partial sync is implemented: 100 books are synced per request. If more books are left, the Kobo should request the next batch and continue until all books are synced. You can change the number of books via the SYNC_ITEM_LIMIT variable in kobo.py until we have found a good value. I will not expose this setting in the user interface, as I think there will be a number that fits everyone.
Further enhancements are planned regarding the size of the cover file that is sent; this will also improve the situation.

Having a do/don't sync with Kobo option would be a great thing.

You can set the books you don't want synced to "archived", but I understand that this is not practical for 7k books. I need time to extend the books table page to support mass changes.

I found another problem if you have a lot of ebooks. The following line has no limit, so it is not affected by the sync limit of 100, and it makes my Kobo report an error because the initial sync cannot be started:

book = calibre_db.session.query(db.Books).filter(db.Books.id == kobo_reading_state.book_id).one_or_none()

See the debug log below for what I saw when I pasted the sync URL (https://calibre//kobo/**SECRETTOKEN**/v1/library/sync?Filter=ALL&DownloadUrlFilter=Generic,Android&PrioritizeRecentReads=true) into my browser. The message says it was blocked because of the database, but I think I waited almost 5 minutes.

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/default.py", line 609, in do_execute
    cursor.execute(statement, parameters)
sqlite3.OperationalError: database is locked

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.8/dist-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/app/calibre-web/cps/kobo_auth.py", line 112, in inner
    return f(*args, **kwargs)
  File "/app/calibre-web/cps/web.py", line 105, in inner
    return f(*args, **kwargs)
  File "/app/calibre-web/cps/kobo.py", line 228, in HandleSyncRequest
    book = calibre_db.session.query(db.Books).filter(db.Books.id == kobo_reading_state.book_id).one_or_none()
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/query.py", line 3459, in one_or_none
    ret = list(self)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/query.py", line 3535, in __iter__
    return self._execute_and_instances(context)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/query.py", line 3560, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1011, in execute
    return meth(self, multiparams, params)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
    ret = self._execute_context(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
    self._handle_dbapi_exception(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
    util.raise_(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/default.py", line 609, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) database is locked
[SQL: SELECT books.id AS books_id, books.title AS books_title, books.sort AS books_sort, books.author_sort AS books_author_sort, books.timestamp AS books_timestamp, books.pubdate AS books_pubdate, books.series_index AS books_series_index, books.last_modified AS books_last_modified, books.path AS books_path, books.has_cover AS books_has_cover, books.uuid AS books_uuid, books.isbn AS books_isbn, books.flags AS books_flags
FROM books
WHERE books.id = ?]
[parameters: (3954,)]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
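SQLite raises "database is locked" when a connection cannot obtain the lock it needs before its busy timeout expires (Python's `sqlite3.connect` accepts a `timeout` argument for this). One generic mitigation, sketched below, is to retry the operation with a short backoff. `with_retry` is a hypothetical helper written for illustration; it is not part of calibre-web or SQLAlchemy, and it does not address whatever was holding the lock.

```python
import sqlite3
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retry(operation: Callable[[], T], attempts: int = 5, base_delay: float = 0.01) -> T:
    """Call `operation`, retrying with exponential backoff while SQLite
    reports a locked database. Other errors are re-raised immediately."""
    for attempt in range(attempts):
        try:
            return operation()
        except sqlite3.OperationalError as exc:
            is_last = attempt == attempts - 1
            if "locked" not in str(exc) or is_last:
                raise
            time.sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")
```

Usage would be along the lines of `with_retry(lambda: cursor.execute(query, params).fetchone())`, assuming a plain `sqlite3` cursor.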

@Dinth

Dinth commented Aug 21, 2021

I have finally managed to set up Kobo Sync. My library in calibre-web is almost 900 books, but I can only see around 175 books on my Kobo. Is there a way to get all books visible on my Kobo?

@OzzieIsaacs
Collaborator

OzzieIsaacs commented Aug 21, 2021

There is a bug in the sync: it syncs only 100 books. As a workaround, remove all books from the library except 100, sync, add the next 100, and so on. Alternatively, you could try the development branch; it's working there, but not perfectly. Or edit the kobo.py file: there's a constant for the number of books to sync; set it to 1000. That might work (though there may be a timeout problem).

@Dinth

Dinth commented Aug 21, 2021

That's fine, Ozzie, I'm not in a hurry :) After reading the above I was under the impression that 100 books is the limit for a single sync (and that it would download another 100 during the next one), but that's obviously wrong :)

Keep up with the amazing work!

@OzzieIsaacs
Collaborator

Syncing books should now work in the latest nightly version. Reading status (if changed from the calibre-web side) is now also synchronized in chunks.

@returntrip

I have around 650 books in my collection and around 470 are fetched/displayed via Kobo sync.

I have tried the nightly version (via the linuxserver.io container), but the rest of the books are not being synced. Am I doing something wrong?

I think the books that are not being synced are ones that were not originally in kepub format but were cbz and pdf. Is there a way to force a full resync?

@OzzieIsaacs
Collaborator

@returntrip I'm sorry Kobo only syncs epub and kepub files. PDF or other formats are not synced, and there is nothing you can do to change this (it's a limitation of kobo)

@returntrip

returntrip commented Oct 17, 2021

@returntrip I'm sorry Kobo only syncs epub and kepub files. PDF or other formats are not synced, and there is nothing you can do to change this (it's a limitation of kobo)

No need to apologize :). I am really grateful for all your work on calibre-web and the Kobo sync feature!

For those books that were in PDF and cbz, I have uploaded the epub version and then converted it to kepub, but unfortunately they are still not syncing. Is there any way to force a full resync? Thanks.

@Dinth

Dinth commented Nov 14, 2021

Same issue here. I only have around 185 books synced (out of 1030 epubs in the library)

@aob1au

aob1au commented Dec 4, 2021

I have noticed that my existing library will fail to sync all of the books (which is another issue being reviewed). However, if I remove a book in calibre and manually add it back in, that book will then sync without issue. In this instance it is only the EPUB file being read back into the database, not the metadata XML file.

However, when using the calibre "Add by folder and sub-folder" option or the import/transfer old library options, the sync does not get initiated. These options import the metadata from the OPF XML file.

So it seems there is a flag or variable getting reset when a new item is added directly from an epub that is not getting reset using the other methods. I've tried using SQLite queries to find which variable is responsible, but I haven't come up with a result.
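One untested hypothesis that fits the timestamp mechanism described earlier in the thread is that imported books keep an old last_modified value and are therefore treated as already synced. Nothing in this thread confirms that. The sketch below only demonstrates the mechanism on an in-memory database; the three-column table and `pending_ids` are simplifications invented here, with only the `books.last_modified` column name taken from the SQL in the traceback above.

```python
import sqlite3

# In-memory stand-in for the library database (simplified, hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, last_modified INTEGER)"
)
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [
        (1, "Imported with its old timestamp", 100),
        (2, "Freshly added", 300),
    ],
)


def pending_ids(connection: sqlite3.Connection, token_ts: int) -> list:
    """Ids a timestamp-based sync would pick up for a device at token_ts."""
    rows = connection.execute(
        "SELECT id FROM books WHERE last_modified > ? ORDER BY last_modified, id",
        (token_ts,),
    ).fetchall()
    return [row[0] for row in rows]
```

For a device whose token is at 200, only book 2 is selected; book 1 would only appear after its last_modified is bumped or the token is reset. Comparing last_modified for a synced and an unsynced book in the real database would be one way to check whether this applies.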

@decisoft

decisoft commented Aug 19, 2022

Hi! I've read the whole thread. I'm facing this issue. I have around 2,000 books (only epubs) on my instance, but only around 1,500 appear on my Kobo Clara HD and Kobo Nia. Before reading this issue, my partner and I were re-uploading the books, thinking we didn't have them, and then resyncing; this method works. But after facing this issue several times, I looked it up here.

So, what can I do now? What is the best option to sync the whole library, or to complete the sync with the epubs that are not showing on our ereaders? Syncing only shelves is not an option.

Should I do a fresh install of Calibre-web and try to sync from scratch?
Should I manually re-upload the books that are not on the ereader?

Thanks in advance :)

Edit: My ereader (Kobo Clara HD) shows 1565 books in my library; my partner's (Kobo Nia) shows 1479. Same instance.

@PVaissiere

In the newest commit version a partial sync is implemented, 100 books are getting synced per request. If there are more books left, kobo should request the next books and continue until all books are synced. You can change the no. of books in kobo.py in the variable SYNC_ITEM_LIMIT until we have found a good no of books to sync.

Hi,
It's working, but it depends on CPU performance. A request might exceed the timeout, which is roughly 30s, and trigger a 499 HTTP error code (Client Closed Request) caused by the Kobo.
I can't find this value in eReader.conf, unless you know where and how to change it ;-)
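Since a fixed item count costs more time on slow hardware, one alternative is to cap each response by elapsed time as well as by count. The sketch below is only an idea, not something calibre-web does; `collect_within_budget` and its parameters are hypothetical, and the budget would have to be chosen well below the roughly 30s timeout mentioned above.

```python
import time
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def collect_within_budget(
    items: Sequence[T],
    serialize: Callable[[T], R],
    budget_seconds: float,
    max_items: int,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[R], bool]:
    """Serialize items until the count cap or the time budget is reached.

    At least one item is always processed so that every round makes
    progress. Returns the serialized chunk and whether items remain.
    """
    start = clock()
    out: List[R] = []
    for item in items:
        if out and (len(out) >= max_items or clock() - start >= budget_seconds):
            break
        out.append(serialize(item))
    return out, len(out) < len(items)
```

The second return value would map onto the continue signal, so a slow server would send smaller chunks across more rounds instead of timing out.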

I will not make this setting available to the user interface, as I think there will be a number which fits for all.

Why not add a file in the /config directory so we can change the value 'on the fly'?
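A minimal sketch of how such an override file could be read, assuming a JSON file with a `sync_item_limit` key. The file format, the key name, and `load_sync_item_limit` are all hypothetical; calibre-web has no such file as far as this thread shows.

```python
import json
from pathlib import Path

DEFAULT_SYNC_ITEM_LIMIT = 100  # default mentioned earlier in the thread


def load_sync_item_limit(path: str, default: int = DEFAULT_SYNC_ITEM_LIMIT) -> int:
    """Read a positive integer limit from a JSON file, falling back to
    `default` if the file is missing, unreadable, or malformed."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
        value = int(data["sync_item_limit"])
    except (OSError, ValueError, KeyError, TypeError):
        return default
    return value if value > 0 else default
```

Calling this at the start of each sync request, rather than once at import time, would let the value change without restarting the server.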
